Does the Geometry of Word Embeddings Help Document Classification? A Case Study on Persistent Homology-Based Representations

Paul Michel, Abhilasha Ravichander, Shruti Rijhwani


Abstract
We investigate the pertinence of methods from algebraic topology for text data analysis. These methods enable the development of mathematically principled, isometry-invariant mappings from a set of vectors to a document embedding that is stable with respect to the geometry of the document in the selected metric space. In this work, we evaluate the utility of these topology-based document representations in traditional NLP tasks, specifically document clustering and sentiment classification. We find that the embeddings do not benefit text analysis. In fact, performance is worse than that of simple techniques like tf-idf, indicating that the geometry of the document does not provide enough variability for classification on the basis of topic or sentiment in the chosen datasets.
Anthology ID:
W17-2628
Volume:
Proceedings of the 2nd Workshop on Representation Learning for NLP
Month:
August
Year:
2017
Address:
Vancouver, Canada
Editors:
Phil Blunsom, Antoine Bordes, Kyunghyun Cho, Shay Cohen, Chris Dyer, Edward Grefenstette, Karl Moritz Hermann, Laura Rimell, Jason Weston, Scott Yih
Venue:
RepL4NLP
SIG:
SIGREP
Publisher:
Association for Computational Linguistics
Pages:
235–240
URL:
https://aclanthology.org/W17-2628
DOI:
10.18653/v1/W17-2628
Cite (ACL):
Paul Michel, Abhilasha Ravichander, and Shruti Rijhwani. 2017. Does the Geometry of Word Embeddings Help Document Classification? A Case Study on Persistent Homology-Based Representations. In Proceedings of the 2nd Workshop on Representation Learning for NLP, pages 235–240, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Does the Geometry of Word Embeddings Help Document Classification? A Case Study on Persistent Homology-Based Representations (Michel et al., RepL4NLP 2017)
PDF:
https://aclanthology.org/W17-2628.pdf
Data:
IMDb Movie Reviews