Google analogy test set (State of the art)
Revision as of 03:07, 6 January 2017 by Anna gladkova
- Test set developed by Mikolov et al. (2013b)[1]
- Contains 19,544 analogy questions (8,869 semantic and 10,675 syntactic, i.e. morphological)
- 14 types of relations (9 morphological and 5 semantic)
- The original download link is deprecated; a copy is hosted by TensorFlow
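As a rough illustration of how the test set can be consumed, the sketch below parses the plain-text layout used by the commonly distributed `questions-words.txt` file: each category starts with a `: name` header line, followed by one four-word question per line. The function names and the inline sample are hypothetical, and treating categories whose names start with `gram` as syntactic is an assumption based on that file's naming scheme.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Question = Tuple[str, ...]


def parse_analogy_file(text: str) -> Dict[str, List[Question]]:
    """Group four-word analogy questions under the ': category' header above them."""
    sections: Dict[str, List[Question]] = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(":"):
            # A header line opens a new category.
            current = line[1:].strip()
            continue
        words = line.split()
        if current is None or len(words) != 4:
            continue  # skip malformed lines rather than guessing their meaning
        sections[current].append(tuple(words))
    return dict(sections)


def is_syntactic(category: str) -> bool:
    # Assumption: syntactic (morphological) categories carry a "gram" prefix.
    return category.startswith("gram")


# Tiny hand-written sample in the same layout (not the real file).
sample = """\
: capital-common-countries
Athens Greece Baghdad Iraq
Athens Greece Bangkok Thailand
: gram3-comparative
bad worse big bigger
"""

sections = parse_analogy_file(sample)
for name, questions in sections.items():
    kind = "syntactic" if is_syntactic(name) else "semantic"
    print(name, kind, len(questions))
```

Keeping the questions grouped by category, rather than in one flat list, makes it possible to report per-relation accuracy later.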
This page reports results obtained with the "vanilla" 3CosAdd method, also known as the vector offset method[2]. For other methods, see Analogy (State of the art).
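For readers unfamiliar with 3CosAdd, here is a minimal sketch of the idea: for a question a : b :: c : ?, the answer is the vocabulary word whose vector is closest (by cosine similarity) to b - a + c, with the three question words excluded from the candidates. The function name and the toy three-dimensional vectors are made up for illustration and do not come from any of the models listed on this page.

```python
import numpy as np


def three_cos_add(vocab, vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector offset (3CosAdd) method."""
    index = {word: i for i, word in enumerate(vocab)}
    # Normalise rows so that dot products rank candidates by cosine similarity.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    target = unit[index[b]] - unit[index[a]] + unit[index[c]]
    scores = unit @ target
    # The question words themselves are excluded from the candidate answers.
    for word in (a, b, c):
        scores[index[word]] = -np.inf
    return vocab[int(np.argmax(scores))]


# Toy embedding (hypothetical values chosen so the offsets are easy to see).
vocab = ["man", "woman", "king", "queen", "apple"]
vectors = np.array(
    [
        [1.0, 0.0, 0.0],   # man
        [1.0, 1.0, 0.0],   # woman
        [1.0, 0.0, 1.0],   # king
        [1.0, 1.0, 1.0],   # queen
        [-1.0, 0.2, 0.1],  # apple
    ],
    dtype=float,
)

answer = three_cos_add(vocab, vectors, "man", "woman", "king")
print(answer)
```

Excluding the question words matters in practice: without that step the nearest vector to b - a + c is frequently b or c itself.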
Table of results
- Listed in chronological order.
Model | Reference | Sem accuracy (%) | Syn accuracy (%) | Corpus and window size |
---|---|---|---|---|
CBOW (640 dim) | Mikolov et al (2013) [2] | 24.0 | 64.0 | 6B Google News corpus, window 10 |
Skip-Gram (640 dim) | Mikolov et al (2013) [2] | 55.0 | 59.0 | ibid |
RNNLM (640 dim) | Mikolov et al (2013) [2] | 9.0 | 36.0 | |
NNLM (640 dim) | Mikolov et al (2013) [2] | 23.0 | 53.0 | |
GloVe (300 dim) | Pennington et al (2014) [3] | 81.9 | 69.3 | 42B corpus, window 5 |
SVD | Levy et al (2015) [4] | 55.4 (Sem+Syn combined) | | Wikipedia 1.5B, window 2 |
PPMI | Levy et al (2015) [4] | 55.3 (Sem+Syn combined) | | ibid |
Skip-Gram | Levy et al (2015) [4] | 67.6 (Sem+Syn combined) | | ibid |
GloVe | Levy et al (2015) [4] | 56.9 (Sem+Syn combined) | | ibid |
Skip-Gram (50 dim) | Lai et al (2015) [5] | 44.8 | 44.43 | W&N 2.8B corpus, window 5 |
CBOW (50 dim) | Lai et al (2015) [5] | 44.43 | 55.83 | ibid |
DVRS+SG (300 dim) | Garten et al (2015) [6] | 74.0 | 60.0 | enwiki9, window 10 |
Methodological Issues
- The test set is not balanced: categories contain between 20 and 70 word pairs each, and the numbers of semantic and morphological relations differ. See other sets at Analogy (State of the art).
- In the semantic part, the country:capital relation accounts for over 50% of all semantic questions.
- Researchers usually report only the average accuracy over all semantic or all syntactic questions, but accuracy varies widely across individual relations, from 10.53% to 99.41% [7], and it also depends on the parameters of the model [8]. Because the test set is not balanced, the results above may flatter the embeddings: averaging the mean scores of the individual subcategories would yield lower results.
- Accuracy also depends on the method used to solve the analogies [9] [10]. Set-based methods[11] considerably outperform pair-based methods, which shows that the models do in fact encode much of the information that pair-based methods "miss".
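The effect of the imbalance on reported averages can be sketched with a small calculation. The snippet below contrasts accuracy pooled over all questions (which large categories dominate) with the mean of per-category accuracies. The category names and counts are invented purely for illustration; they are not results from any paper cited here.

```python
def pooled_accuracy(results):
    """Accuracy over all questions pooled together (large categories dominate)."""
    correct = sum(c for c, _ in results.values())
    total = sum(t for _, t in results.values())
    return correct / total


def per_category_mean(results):
    """Mean of the per-category accuracies (every category weighs the same)."""
    return sum(c / t for c, t in results.values()) / len(results)


# Hypothetical (correct, total) counts: one large easy category, two small hard ones.
results = {
    "large-easy-category": (450, 500),
    "small-hard-category-1": (20, 100),
    "small-hard-category-2": (10, 100),
}

pooled = pooled_accuracy(results)
balanced = per_category_mean(results)
print(f"pooled: {pooled:.3f}, per-category mean: {balanced:.3f}")
```

With these made-up counts the pooled figure is noticeably higher than the per-category mean, which is the pattern described in the list above when one well-solved relation supplies most of the questions.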
References
- [1] Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In Proceedings of International Conference on Learning Representations (ICLR).
- [2] Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In Proceedings of International Conference on Learning Representations (ICLR).
- [3] Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014) (Vol. 12, pp. 1532–1543). Retrieved from http://llcao.net/cu-deeplearning15/presentation/nn-pres.pdf
- [4] Levy, O., Goldberg, Y., & Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3, 211–225.
- [5] Lai, S., Liu, K., Xu, L., & Zhao, J. (2015). How to Generate a Good Word Embedding? arXiv Preprint arXiv:1507.05523. Retrieved from http://arxiv.org/abs/1507.05523
- [6] Garten, J., Sagae, K., Ustun, V., & Dehghani, M. (2015). Combining Distributed Vector Representations for Words. In Proceedings of NAACL-HLT (pp. 95–101). Retrieved from http://www.researchgate.net/profile/Volkan_Ustun/publication/277332298_Combining_Distributed_Vector_Representations_for_Words/links/55705a6308aee1eea7586e93.pdf
- [7] Levy, O., & Goldberg, Y. (2014). Linguistic Regularities in Sparse and Explicit Word Representations. In CoNLL (pp. 171–180). Retrieved from http://anthology.aclweb.org/W/W14/W14-1618.pdf
- [8] Gladkova, A., Drozd, A., & Matsuoka, S. (2016). Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn’t. In Proceedings of the NAACL-HLT SRW (pp. 47–54). San Diego, California, June 12-17, 2016: ACL. Retrieved from https://www.aclweb.org/anthology/N/N16/N16-2002.pdf
- [9] Linzen, T. (2016). Issues in evaluating semantic spaces using word analogies. In Proceedings of the First Workshop on Evaluating Vector Space Representations for NLP. Association for Computational Linguistics. Retrieved from http://anthology.aclweb.org/W16-2503
- [10] Levy, O., & Goldberg, Y. (2014). Linguistic Regularities in Sparse and Explicit Word Representations. In CoNLL (pp. 171–180). Retrieved from http://anthology.aclweb.org/W/W14/W14-1618.pdf
- [11] Drozd, A., Gladkova, A., & Matsuoka, S. (2016). Word embeddings, analogies, and machine learning: beyond king - man + woman = queen. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 3519–3530). Osaka, Japan, December 11-17: ACL. Retrieved from https://www.aclweb.org/anthology/C/C16/C16-1332.pdf