Towards Qualitative Word Embeddings Evaluation: Measuring Neighbors Variation

Bénédicte Pierrejean, Ludovic Tanguy


Abstract
We propose a method to study the variation between word embedding models trained with different parameters. We explore the variation between models that differ in only one parameter by observing how the distributional neighbors of words vary, and show that changing a single parameter can have a massive impact on a given semantic space. We show that the variation does not affect all words of the semantic space equally. Variation is influenced by parameter choices, such as setting a parameter to its minimum or maximum value, but it also depends on intrinsic features of the corpus, such as word frequency. We identify semantic classes of words that remain stable across the trained models, as well as specific words that show high variation.
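The idea of "neighbors variation" can be illustrated with a minimal sketch, assuming that variation for a word is measured as the share of its k nearest neighbors (ranked by cosine similarity) that differ between two models. The function names (`nearest_neighbors`, `neighbor_variation`), the default `k=25`, and the dict-of-vectors model format are illustrative assumptions, not necessarily the authors' exact procedure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(model, word, k):
    """Return the k words closest to `word` in `model`, excluding the word itself.

    `model` is assumed (hypothetically) to be a dict mapping words to vectors.
    """
    target = model[word]
    scored = [(cosine(target, vec), other)
              for other, vec in model.items() if other != word]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


def neighbor_variation(model_a, model_b, word, k=25):
    """Share of the nearest neighbors of `word` that differ between two models.

    0.0 means identical neighborhoods; 1.0 means no neighbor in common.
    k=25 is an assumed default, not a value taken from the paper.
    """
    neigh_a = set(nearest_neighbors(model_a, word, k))
    neigh_b = set(nearest_neighbors(model_b, word, k))
    if not neigh_a and not neigh_b:
        return 0.0
    # Divide by the actual list size in case the vocabulary has fewer than k+1 words.
    size = max(len(neigh_a), len(neigh_b))
    return 1.0 - len(neigh_a & neigh_b) / size


# Toy usage: two hypothetical 2-d "models" that disagree about what is close to "cat".
model_a = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0], "bus": [0.1, 0.9]}
model_b = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "car": [0.9, 0.1], "bus": [0.1, 0.9]}
print(neighbor_variation(model_a, model_b, "cat", k=1))  # nearest neighbor differs
print(neighbor_variation(model_a, model_b, "cat", k=2))  # one of two neighbors shared
```

Averaging such per-word scores over a vocabulary, or grouping them by word frequency, would be one way to look at how unevenly variation is distributed across a semantic space.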
Anthology ID:
N18-4005
Volume:
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
Month:
June
Year:
2018
Address:
New Orleans, Louisiana, USA
Editors:
Silvio Ricardo Cordeiro, Shereen Oraby, Umashanthi Pavalanathan, Kyeongmin Rim
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
32–39
URL:
https://aclanthology.org/N18-4005
DOI:
10.18653/v1/N18-4005
Cite (ACL):
Bénédicte Pierrejean and Ludovic Tanguy. 2018. Towards Qualitative Word Embeddings Evaluation: Measuring Neighbors Variation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 32–39, New Orleans, Louisiana, USA. Association for Computational Linguistics.
Cite (Informal):
Towards Qualitative Word Embeddings Evaluation: Measuring Neighbors Variation (Pierrejean & Tanguy, NAACL 2018)
PDF:
https://aclanthology.org/N18-4005.pdf