Google analogy test set (State of the art)
- Test set developed by Mikolov et al. (2013b)[1]
- Contains 19,544 questions (8,869 semantic and 10,675 syntactic, i.e. morphological, questions)
- 14 types of relations (9 morphological and 5 semantic)
- Original link deprecated; a copy is hosted at TensorFlow
- see also: Analogy (State of the art)
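The test set is distributed as a plain-text file (questions-words.txt in the word2vec distribution) in which lines beginning with ":" name a relation category and every other line holds one a:b :: c:d quadruple as four whitespace-separated words. A minimal loader sketch (the sample lines below follow that format):

```python
from collections import defaultdict

def load_analogy_questions(lines):
    """Parse the Google test set format: ': category' header lines
    followed by whitespace-separated a b c d quadruples."""
    questions = defaultdict(list)
    category = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(":"):
            category = line[1:].strip()
        else:
            a, b, c, d = line.split()
            questions[category].append((a, b, c, d))
    return dict(questions)

# Two categories in the file's format:
sample = [
    ": capital-common-countries",
    "Athens Greece Baghdad Iraq",
    ": gram3-comparative",
    "good better rough rougher",
]
qs = load_analogy_questions(sample)
```

Grouping by category (rather than returning a flat list) makes it easy to report per-relation accuracies, which the "Methodological Issues" section below argues for.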
This page reports results obtained with the "vanilla" 3CosAdd (vector offset) method[2]. For other methods, see Analogy (State of the art).
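3CosAdd answers a:b :: c:? by returning the vocabulary word whose vector is most cosine-similar to b − a + c, with the three query words themselves excluded. A minimal NumPy sketch on a toy vocabulary (the vectors are hand-picked for illustration, not trained embeddings):

```python
import numpy as np

def cos_add(vectors, vocab, a, b, c):
    """3CosAdd: argmax over d of cos(d, b - a + c), excluding a, b, c."""
    # L2-normalise rows so dot products are cosine similarities
    norms = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    idx = {w: i for i, w in enumerate(vocab)}
    offset = norms[idx[b]] - norms[idx[a]] + norms[idx[c]]
    sims = norms @ (offset / np.linalg.norm(offset))
    for w in (a, b, c):  # the query words are conventionally excluded
        sims[idx[w]] = -np.inf
    return vocab[int(np.argmax(sims))]

# Toy vectors chosen so that king - man + woman lands near queen
vocab = ["man", "woman", "king", "queen", "banana"]
vectors = np.array([
    [1.0, 0.0, 0.0],   # man
    [0.0, 1.0, 0.0],   # woman
    [1.0, 0.0, 1.0],   # king
    [0.0, 1.0, 1.0],   # queen
    [0.5, 0.5, 0.5],   # banana (distractor)
])
answer = cos_add(vectors, vocab, "man", "king", "woman")  # -> "queen"
```

Note that excluding the query words is part of the standard protocol; without that step, b or c is often the nearest neighbour of the offset vector.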
Table of results
- Listed in chronological order.
Model | Reference | Sem | Syn | Corpus and window size |
---|---|---|---|---|
CBOW (640 dim) | Mikolov et al. (2013) [2] | 24.0 | 64.0 | 6B Google News corpus, window 10 |
Skip-Gram (640 dim) | Mikolov et al. (2013) [2] | 55.0 | 59.0 | ibid. |
RNNLM (640 dim) | Mikolov et al. (2013) [2] | 9.0 | 36.0 | |
NNLM (640 dim) | Mikolov et al. (2013) [2] | 23.0 | 53.0 | |
GloVe (300 dim) | Pennington et al. (2014) [3] | 81.9 | 69.3 | 42B corpus, window 5 |
SVD | Levy et al. (2015) [4] | 55.4 | | Wikipedia 1.5B, window 2 |
PPMI | Levy et al. (2015) [4] | 55.3 | | ibid. |
Skip-Gram | Levy et al. (2015) [4] | 67.6 | | ibid. |
GloVe | Levy et al. (2015) [4] | 56.9 | | ibid. |
Skip-Gram (50 dim) | Lai et al. (2015) [5] | 44.8 | 44.43 | W&N 2.8B corpus, window 5 |
CBOW (50 dim) | Lai et al. (2015) [5] | 44.43 | 55.83 | ibid. |
DVRS+SG (300 dim) | Garten et al. (2015) [6] | 74.0 | 60.0 | enwiki9, window 10 |
Methodological Issues
- The test set is not balanced: categories contain between 20 and 70 word pairs each, and the numbers of semantic and morphological relations differ. See other sets at Analogy (State of the art).
- In the semantic part, the country:capital relation alone accounts for over 50% of all semantic questions.
- Researchers usually report only the average accuracy over all semantic/syntactic questions, but accuracy varies widely across individual relations (between 10.53% and 99.41% [7]) and also depends on the parameters of the model [8]. Since the test set is not balanced, the averages above may flatter the embeddings: macro-averaging the scores over subcategories would yield lower results.
- Accuracy also depends on the method used to solve the analogies [9] [10]. Set-based methods [11] considerably outperform pair-based methods, showing that the models do in fact encode much of the information that pair-based methods miss.
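The micro- vs. macro-averaging point can be made concrete. With the per-category counts below (hypothetical numbers, chosen only to illustrate the imbalance, not results from any cited paper), pooling all questions lets one large, high-accuracy category dominate, while averaging per-category accuracies gives a noticeably lower figure:

```python
# Hypothetical (correct, total) counts per category -- illustrative only,
# not results from the papers cited above.
results = {
    "capital-world": (4000, 4524),  # large, high-accuracy category
    "currency":      (90,   866),   # small, low-accuracy category
    "family":        (400,  506),
}

# Micro-average: pool all questions (what papers usually report)
micro = sum(c for c, _ in results.values()) / sum(t for _, t in results.values())

# Macro-average: mean of per-category accuracies
macro = sum(c / t for c, t in results.values()) / len(results)
```

Here micro ≈ 0.76 but macro ≈ 0.59, which is the sense in which unweighted overall accuracy can flatter an embedding on an unbalanced test set.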
References
- ↑ Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In Proceedings of International Conference on Learning Representations (ICLR).
- ↑ 2.0 2.1 2.2 2.3 2.4 Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In Proceedings of International Conference on Learning Representations (ICLR).
- ↑ Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP 2014) (pp. 1532–1543). Retrieved from https://aclanthology.org/D14-1162.pdf
- ↑ 4.0 4.1 4.2 4.3 Levy, O., Goldberg, Y., & Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3, 211–225.
- ↑ 5.0 5.1 Lai, S., Liu, K., Xu, L., & Zhao, J. (2015). How to Generate a Good Word Embedding? arXiv Preprint arXiv:1507.05523. Retrieved from http://arxiv.org/abs/1507.05523
- ↑ Garten, J., Sagae, K., Ustun, V., & Dehghani, M. (2015). Combining Distributed Vector Representations for Words. In Proceedings of NAACL-HLT (pp. 95–101). Retrieved from http://www.researchgate.net/profile/Volkan_Ustun/publication/277332298_Combining_Distributed_Vector_Representations_for_Words/links/55705a6308aee1eea7586e93.pdf
- ↑ Levy, O., & Goldberg, Y. (2014). Linguistic Regularities in Sparse and Explicit Word Representations. In CoNLL (pp. 171–180). Retrieved from http://anthology.aclweb.org/W/W14/W14-1618.pdf
- ↑ Gladkova, A., Drozd, A., & Matsuoka, S. (2016). Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn’t. In Proceedings of the NAACL-HLT SRW (pp. 47–54). San Diego, California, June 12-17, 2016: ACL. Retrieved from https://www.aclweb.org/anthology/N/N16/N16-2002.pdf
- ↑ Linzen, T. (2016). Issues in evaluating semantic spaces using word analogies. In Proceedings of the First Workshop on Evaluating Vector Space Representations for NLP. Association for Computational Linguistics. Retrieved from http://anthology.aclweb.org/W16-2503
- ↑ Levy, O., & Goldberg, Y. (2014). Linguistic Regularities in Sparse and Explicit Word Representations. In CoNLL (pp. 171–180). Retrieved from http://anthology.aclweb.org/W/W14/W14-1618.pdf
- ↑ Drozd, A., Gladkova, A., & Matsuoka, S. (2016). Word embeddings, analogies, and machine learning: beyond king - man + woman = queen. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 3519–3530). Osaka, Japan, December 11-17: ACL. Retrieved from https://www.aclweb.org/anthology/C/C16/C16-1332.pdf