Bigger analogy test set (State of the art)
Revision as of 03:06, 6 January 2017 by Anna gladkova
Dataset description
- New dataset proposed by Gladkova et al. (2016) [1]
- available here
- dataset balanced across 4 types of relations (inflectional morphology, derivational morphology, lexicographic semantics, encyclopedic semantics)
- 10 relations of each type, 50 unique pairs per category
- 99,200 questions in total
- more challenging than the Google set because of more diverse relations
- where applicable, more than one correct answer is supplied (e.g. both canine and animal are hypernyms of dog).
- comes with a testing script that implements 5 methods of solving analogies (See Analogy (State of the art))
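Since the dataset can list several correct answers per question, scoring has to accept any of them. A minimal sketch of such a scorer (the function names and data layout are illustrative assumptions, not the dataset's actual testing script):

```python
def accuracy(predictions, gold):
    """Score analogy predictions when several answers may be correct.

    predictions: list of predicted words, one per question.
    gold: list of sets of acceptable answers, one per question
          (e.g. {"canine", "animal"} for hypernyms of "dog").
    A question counts as solved if the prediction matches ANY answer.
    """
    solved = sum(p in answers for p, answers in zip(predictions, gold))
    return solved / len(gold)

# toy usage: 2 of 3 questions solved
preds = ["animal", "cats", "run"]
gold = [{"canine", "animal"}, {"cats"}, {"ran"}]
print(accuracy(preds, gold))  # → 0.6666666666666666
```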
This page reports results obtained with the "vanilla" 3CosAdd method, or vector offset[2].
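For concreteness, 3CosAdd answers a question a : b :: c : ? by returning the vocabulary word closest (by cosine) to b - a + c. A minimal sketch, assuming `emb` is a dict of unit-normalized word vectors (the toy vocabulary below is an illustrative assumption, not part of the dataset):

```python
import numpy as np

def cos_add(emb, a, b, c):
    """Solve a : b :: c : ? with 3CosAdd (the vector offset method):
    return the vocabulary word whose vector is closest to b - a + c.
    `emb` maps words to unit-normalized numpy vectors."""
    target = emb[b] - emb[a] + emb[c]
    target = target / np.linalg.norm(target)
    # the three question words are excluded, as in standard evaluation
    candidates = [w for w in emb if w not in (a, b, c)]
    # on unit vectors, cosine similarity is just a dot product
    return max(candidates, key=lambda w: float(emb[w] @ target))

# toy 2-d "embedding" constructed so that king - man + woman ≈ queen
raw = {
    "man": [1.0, 0.0], "woman": [0.0, 1.0],
    "king": [0.8, 0.6], "queen": [-0.2, 0.98],
    "apple": [0.7, -0.7],
}
emb = {w: np.array(v) / np.linalg.norm(v) for w, v in raw.items()}
print(cos_add(emb, "man", "king", "woman"))  # → queen
```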
Table of results
- Listed in chronological order.
Model | Reference | Inflectional morphology | Derivational morphology | Lexicographic semantics | Encyclopedic semantics | Corpus, window size, vector size
---|---|---|---|---|---|---
SVD | Drozd et al. (2016) [3] | 44.0 | 9.8 | 10.1 | 18.5 | 5B corpus (Araneum + Wikipedia + UkWac), window 3, 1000 dimensions |
GloVe | Drozd et al. (2016) [3] | 59.9 | 10.2 | 10.9 | 31.5 | 5B corpus (Araneum + Wikipedia + UkWac), window 8, 300 dimensions |
Skip-Gram | Drozd et al. (2016) [3] | 61.0 | 11.2 | 9.1 | 26.5 | 5B corpus (Araneum + Wikipedia + UkWac), window 8, 300 dimensions |
Methodological issues
- As with other analogy test sets, accuracy depends not only on the embedding and its parameters, but also on the method with which analogies are solved [4] [5]. Set-based methods[6] considerably outperform pair-based methods, showing that models do in fact encode much of the information that pair-based methods miss.
- Therefore it is more accurate to think of the analogy task as a way to describe and characterize an embedding, rather than to evaluate it.
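To illustrate the set-based idea, here is a hedged sketch of 3CosAvg from Drozd et al. (2016)[6]: instead of a single pair's offset, it averages the offset b - a over all known pairs of a category (with the question's own pair held out). The toy vocabulary and the candidate-exclusion rule below are simplifying assumptions for illustration:

```python
import numpy as np

def cos_avg(emb, pairs, c):
    """3CosAvg sketch: average the offset b - a over the known pairs of a
    category, then return the word closest to c + average offset.
    `emb` maps words to unit-normalized numpy vectors."""
    offset = np.mean([emb[b] - emb[a] for a, b in pairs], axis=0)
    target = emb[c] + offset
    target = target / np.linalg.norm(target)
    # simplification: exclude the cue word and the words of the known pairs
    exclude = {c} | {w for pair in pairs for w in pair}
    candidates = [w for w in emb if w not in exclude]
    return max(candidates, key=lambda w: float(emb[w] @ target))

# toy singular→plural category: the averaged offset points from
# singular vectors toward plural vectors
raw = {
    "cat": [1.0, 0.0], "cats": [0.6, 0.8],
    "dog": [0.8, 0.6], "dogs": [0.28, 0.96],
    "mouse": [0.9, 0.44], "mice": [0.4, 0.92],
    "cheese": [0.7, -0.7],
}
emb = {w: np.array(v) / np.linalg.norm(v) for w, v in raw.items()}
pairs = [("cat", "cats"), ("dog", "dogs")]
print(cos_avg(emb, pairs, "mouse"))  # → mice
```

Averaging over many pairs makes the method less sensitive to the noise in any single pair's offset, which is one reason set-based methods outperform pair-based ones.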
References
- ↑ Gladkova, A., Drozd, A., & Matsuoka, S. (2016). Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn’t. In Proceedings of the NAACL-HLT SRW (pp. 47–54). San Diego, California, June 12-17, 2016: ACL. Retrieved from https://www.aclweb.org/anthology/N/N16/N16-2002.pdf
- ↑ Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In Proceedings of International Conference on Learning Representations (ICLR).
- ↑ 3.0 3.1 3.2 Drozd, A., Gladkova, A., & Matsuoka, S. (2016). Word embeddings, analogies, and machine learning: beyond king - man + woman = queen. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 3519–3530). Osaka, Japan, December 11-17: ACL. Retrieved from https://www.aclweb.org/anthology/C/C16/C16-1332.pdf
- ↑ Linzen, T. (2016). Issues in evaluating semantic spaces using word analogies. In Proceedings of the First Workshop on Evaluating Vector Space Representations for NLP. Association for Computational Linguistics. Retrieved from http://anthology.aclweb.org/W16-2503
- ↑ Levy, O., & Goldberg, Y. (2014). Linguistic Regularities in Sparse and Explicit Word Representations. In Proceedings of CoNLL (pp. 171–180). Retrieved from http://anthology.aclweb.org/W/W14/W14-1618.pdf
- ↑ Drozd, A., Gladkova, A., & Matsuoka, S. (2016). Word embeddings, analogies, and machine learning: beyond king - man + woman = queen. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 3519–3530). Osaka, Japan, December 11-17: ACL. Retrieved from https://www.aclweb.org/anthology/C/C16/C16-1332.pdf