Google analogy test set (State of the art)
- Test set developed by Mikolov et al. (2013b)[1]
- Contains 19,544 questions in total: 8,869 semantic and 10,675 syntactic (i.e. morphological) questions (see the loading sketch after this list)
- 14 types of relations (9 morphological and 5 semantic)
- The original download link is deprecated; a copy is hosted by TensorFlow
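A minimal sketch of loading the test set, assuming the format of the commonly distributed questions-words.txt copy (": section-name" headers followed by four space-separated words per line; the file name and helper name here are illustrative):

```python
# Minimal sketch: parse the Google analogy test set into per-section question lists.
from collections import defaultdict

def load_analogies(path="questions-words.txt"):
    sections = defaultdict(list)
    current = None
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith(":"):            # e.g. ": capital-common-countries"
                current = line[1:].strip()
            else:
                a, a_star, b, b_star = line.split()
                sections[current].append((a, a_star, b, b_star))
    return sections

sections = load_analogies()
# In this file, the syntactic (morphological) section names start with "gram".
syn = sum(len(qs) for name, qs in sections.items() if name.startswith("gram"))
sem = sum(len(qs) for name, qs in sections.items() if not name.startswith("gram"))
print(sem, syn)  # expected: 8869 10675
```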
This page reports results obtained with the "vanilla" 3CosAdd method, also known as the vector offset method[2]: given a question a:a′ :: b:?, the answer is the vocabulary word whose vector is closest to a′ − a + b by cosine similarity, with the three query words excluded. For other methods, see Analogy (State of the art)
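A minimal sketch of 3CosAdd; `emb` (assumed to be a dict of unit-normalized numpy vectors) and the function name are illustrative, not from the original papers:

```python
# Minimal sketch of 3CosAdd (vector offset): answer a : a_star :: b : ?
# by argmax over cos(d, a_star - a + b), excluding the query words.
import numpy as np

def three_cos_add(emb, a, a_star, b):
    target = emb[a_star] - emb[a] + emb[b]
    target /= np.linalg.norm(target)
    best_word, best_sim = None, -np.inf
    for word, vec in emb.items():
        if word in (a, a_star, b):   # query words are conventionally excluded
            continue
        sim = float(vec @ target)    # cosine similarity for unit-length vectors
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word
```

For example, `three_cos_add(emb, "man", "king", "woman")` should return `"queen"` for embeddings that capture the relation.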
Table of results
- Listed in chronological order.
Model | Reference | Sem (%) | Syn (%) | Corpus and window size
---|---|---|---|---
CBOW (640 dim) | Mikolov et al. (2013) [2] | 24.0 | 64.0 | Google News 6B corpus, window 10
Skip-Gram (640 dim) | Mikolov et al. (2013) [2] | 55.0 | 59.0 | ibid.
RNNLM (640 dim) | Mikolov et al. (2013) [2] | 9.0 | 36.0 |
NNLM (640 dim) | Mikolov et al. (2013) [2] | 23.0 | 53.0 |
GloVe (300 dim) | Pennington et al. (2014) [3] | 81.9 | 69.3 | 42B corpus, window 5
SVD | Levy et al. (2015) [4] | 55.4 | | Wikipedia 1.5B, window 2
PPMI | Levy et al. (2015) [4] | 55.3 | | ibid.
Skip-Gram | Levy et al. (2015) [4] | 67.6 | | ibid.
GloVe | Levy et al. (2015) [4] | 56.9 | | ibid.
Skip-Gram (50 dim) | Lai et al. (2015) [5] | 44.8 | 44.43 | W&N 2.8B corpus, window 5
CBOW (50 dim) | Lai et al. (2015) [5] | 44.43 | 55.83 | ibid.
DVRS+SG (300 dim) | Garten et al. (2015) [6] | 74.0 | 60.0 | enwiki9, window 10
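The Sem/Syn columns are percentages of correctly answered questions. A sketch of such an evaluation loop, reusing `load_analogies` and `three_cos_add` from the snippets above; skipping out-of-vocabulary questions (rather than counting them as errors) is an assumption here, and implementations differ on this point:

```python
# Sketch: per-section (correct, total) counts with 3CosAdd.
def evaluate(emb, sections):
    results = {}
    for name, questions in sections.items():
        correct = total = 0
        for a, a_star, b, b_star in questions:
            if any(w not in emb for w in (a, a_star, b, b_star)):
                continue                      # skip OOV questions (assumption)
            total += 1
            correct += (three_cos_add(emb, a, a_star, b) == b_star)
        results[name] = (correct, total)
    return results
```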
Methodological issues
- The test set is not balanced: categories contain between 20 and 70 word pairs, and the numbers of semantic and morphological relation types differ. See other sets at Analogy (State of the art).
- In the semantic part, the country:capital relation accounts for over 50% of all semantic questions.
- Researchers usually report only the average accuracy over all semantic/syntactic questions, but accuracy varies widely across individual relations (between 10.53% and 99.41% [7]) and also depends on the parameters of the model [8]. Since the test is not balanced, the averages above may flatter the embeddings: macro-averaging the mean scores over subcategories would yield lower results (see the sketch after this list).
- Accuracy also depends on the method used to solve the analogies [9]. Set-based methods [10] considerably outperform pair-based methods such as 3CosAdd, showing that the models do in fact encode much of the information that pair-based methods miss.
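A small sketch of the micro/macro distinction, assuming `results` maps category names to (correct, total) counts as produced by the `evaluate` sketch above (the helper name is illustrative):

```python
# Sketch: micro-averaged accuracy (over all questions, the usual reported
# score) vs. macro-averaged accuracy (mean of per-category accuracies).
def micro_macro(results):
    correct = sum(c for c, t in results.values())
    total = sum(t for c, t in results.values())
    micro = correct / total
    macro = sum(c / t for c, t in results.values()) / len(results)
    return micro, macro
```

Because categories differ widely in size (and country:capital dominates the semantic part), the two averages can diverge substantially.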
References
1. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In Proceedings of the International Conference on Learning Representations (ICLR).
2. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In Proceedings of the International Conference on Learning Representations (ICLR).
3. Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532–1543).
4. Levy, O., Goldberg, Y., & Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3, 211–225.
5. Lai, S., Liu, K., Xu, L., & Zhao, J. (2015). How to generate a good word embedding? arXiv preprint arXiv:1507.05523. Retrieved from http://arxiv.org/abs/1507.05523
6. Garten, J., Sagae, K., Ustun, V., & Dehghani, M. (2015). Combining distributed vector representations for words. In Proceedings of NAACL-HLT (pp. 95–101). Retrieved from http://www.researchgate.net/profile/Volkan_Ustun/publication/277332298_Combining_Distributed_Vector_Representations_for_Words/links/55705a6308aee1eea7586e93.pdf
7. Levy, O., & Goldberg, Y. (2014). Linguistic regularities in sparse and explicit word representations. In Proceedings of CoNLL (pp. 171–180). Retrieved from http://anthology.aclweb.org/W/W14/W14-1618.pdf
8. Gladkova, A., Drozd, A., & Matsuoka, S. (2016). Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn't. In Proceedings of the NAACL-HLT SRW (pp. 47–54). San Diego, California: ACL. Retrieved from https://www.aclweb.org/anthology/N/N16/N16-2002.pdf
9. Linzen, T. (2016). Issues in evaluating semantic spaces using word analogies. In Proceedings of the First Workshop on Evaluating Vector Space Representations for NLP. ACL. Retrieved from http://anthology.aclweb.org/W16-2503
10. Drozd, A., Gladkova, A., & Matsuoka, S. (2016). Word embeddings, analogies, and machine learning: beyond king − man + woman = queen. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics (pp. 3519–3530). Osaka, Japan: ACL. Retrieved from https://www.aclweb.org/anthology/C/C16/C16-1332.pdf