Google analogy test set (State of the art)


This page reports results obtained with the "vanilla" 3CosAdd (vector offset) method [1]. For other methods, see Analogy (State of the art).
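
For reference, a minimal sketch of the 3CosAdd decision rule, assuming a small vocabulary of unit-normalized numpy vectors. The toy vectors are made up for illustration; following the standard protocol for this test set, the three query words are excluded from the candidate set.

```python
import numpy as np

# Toy embeddings, unit-normalized; in practice these would come from a
# trained model such as word2vec or GloVe. The values are illustrative only.
emb = {w: v / np.linalg.norm(v) for w, v in {
    "king":  np.array([0.8, 0.6, 0.1]),
    "man":   np.array([0.7, 0.1, 0.2]),
    "woman": np.array([0.6, 0.2, 0.9]),
    "queen": np.array([0.7, 0.7, 0.8]),
    "apple": np.array([0.1, 0.9, 0.3]),
}.items()}

def three_cos_add(a, a_star, b, emb):
    """Answer a : a_star :: b : ? by the vector offset rule:
    argmax over b_star of cos(b_star, a_star - a + b),
    excluding the three query words themselves."""
    target = emb[a_star] - emb[a] + emb[b]
    target /= np.linalg.norm(target)
    scores = {w: float(v @ target) for w, v in emb.items()
              if w not in {a, a_star, b}}
    return max(scores, key=scores.get)

print(three_cos_add("man", "king", "woman", emb))  # -> "queen"
```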

Table of results

  • Listed in chronological order.


| Model | Reference | Sem (%) | Syn (%) | Corpus and window size |
|---|---|---|---|---|
| CBOW (640 dim) | Mikolov et al. (2013) [1] | 24.0 | 64.0 | 6B Google News corpus, window 10 |
| Skip-Gram (640 dim) | Mikolov et al. (2013) [1] | 55.0 | 59.0 | ibid. |
| RNNLM (640 dim) | Mikolov et al. (2013) [1] | 9.0 | 36.0 | |
| NNLM (640 dim) | Mikolov et al. (2013) [1] | 23.0 | 53.0 | |
| GloVe (300 dim) | Pennington et al. (2014) [2] | 81.9 | 69.3 | 42B corpus, window 5 |
| SVD | Levy et al. (2015) [3] | 55.4 | | Wikipedia 1.5B, window 2 |
| PPMI | Levy et al. (2015) [3] | 55.3 | | ibid. |
| Skip-Gram | Levy et al. (2015) [3] | 67.6 | | ibid. |
| GloVe | Levy et al. (2015) [3] | 56.9 | | ibid. |
| Skip-Gram (50 dim) | Lai et al. (2015) [4] | 44.8 | 44.43 | W&N 2.8B corpus, window 5 |
| CBOW (50 dim) | Lai et al. (2015) [4] | 44.43 | 55.83 | ibid. |
| DVRS+SG (300 dim) | Garten et al. (2015) [5] | 74.0 | 60.0 | enwiki9, window 10 |


Methodological Issues

  • This test set is not balanced: categories contain 20–70 word pairs each, and the numbers of semantic and morphological relations differ. See other sets at Analogy (State of the art).
  • In the semantic part, the country:capital relation alone accounts for over 50% of all semantic questions.
  • Researchers usually report only the average accuracy over all semantic/syntactic questions, but accuracy on individual relations varies widely, between 10.53% and 99.41% [6], and also depends on the parameters of the model [7]. Since the test is not balanced, the pooled results above may flatter the embeddings: macro-averaging the per-subcategory scores would yield lower numbers (see the first sketch after this list).
  • Accuracy also depends on the method with which the analogies are solved [8]. Set-based methods [9] considerably outperform pair-based methods such as 3CosAdd, showing that the models do in fact encode much of the information that pair-based evaluation "misses" (see the 3CosAvg sketch after this list).
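
On the averaging point above: because the category sizes are so uneven, the pooled (micro-averaged) accuracy and the mean of per-category accuracies (macro average) can diverge substantially. A minimal sketch of the two aggregations; the per-category counts below are invented for illustration, with an imbalance mirroring the dominance of the country:capital questions.

```python
# Hypothetical (correct, total) counts per semantic relation; the numbers
# are invented for illustration, not taken from any published run.
results = {
    "capital-world": (4200, 4524),
    "currency":      (100,  866),
    "city-in-state": (1500, 2467),
    "family":        (400,  506),
}

micro = sum(c for c, _ in results.values()) / sum(t for _, t in results.values())
macro = sum(c / t for c, t in results.values()) / len(results)

print(f"micro average: {micro:.3f}")  # pooled accuracy, dominated by large categories
print(f"macro average: {macro:.3f}")  # mean of per-category accuracies, noticeably lower
```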
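For contrast with pair-based 3CosAdd, a minimal sketch of one set-based method, 3CosAvg [9], which replaces the single pair's offset with the offset averaged over all known example pairs of the relation. The toy vectors and the exclusion policy here are illustrative assumptions, not the exact experimental setup of the paper.

```python
import numpy as np

# Toy vectors, illustrative only.
raw = {
    "france": [0.9, 0.2, 0.1], "paris": [0.8, 0.3, 0.6],
    "japan":  [0.7, 0.4, 0.0], "tokyo": [0.6, 0.5, 0.5],
    "italy":  [0.8, 0.3, 0.1], "rome":  [0.7, 0.4, 0.6],
    "banana": [0.1, 0.9, 0.2],
}
emb = {w: np.array(v) / np.linalg.norm(v) for w, v in raw.items()}

def three_cos_avg(pairs, b, emb):
    """3CosAvg: average the a -> a_star offset over all known pairs of the
    relation, add it to b, and return the nearest neighbour by cosine
    (here excluding b and the words of the known pairs)."""
    offset = np.mean([emb[t] - emb[s] for s, t in pairs], axis=0)
    target = emb[b] + offset
    target /= np.linalg.norm(target)
    exclude = {b} | {w for pair in pairs for w in pair}
    scores = {w: float(v @ target) for w, v in emb.items() if w not in exclude}
    return max(scores, key=scores.get)

# Predict the capital of Italy from the france->paris and japan->tokyo offsets.
print(three_cos_avg([("france", "paris"), ("japan", "tokyo")], "italy", emb))  # -> "rome"
```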


References

  1. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In Proceedings of the International Conference on Learning Representations (ICLR).
  2. Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532–1543).
  3. Levy, O., Goldberg, Y., & Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3, 211–225.
  4. Lai, S., Liu, K., Xu, L., & Zhao, J. (2015). How to generate a good word embedding? arXiv preprint arXiv:1507.05523. Retrieved from http://arxiv.org/abs/1507.05523
  5. Garten, J., Sagae, K., Ustun, V., & Dehghani, M. (2015). Combining distributed vector representations for words. In Proceedings of NAACL-HLT (pp. 95–101). Retrieved from http://www.researchgate.net/profile/Volkan_Ustun/publication/277332298_Combining_Distributed_Vector_Representations_for_Words/links/55705a6308aee1eea7586e93.pdf
  6. Levy, O., & Goldberg, Y. (2014). Linguistic regularities in sparse and explicit word representations. In Proceedings of CoNLL (pp. 171–180). Retrieved from http://anthology.aclweb.org/W/W14/W14-1618.pdf
  7. Gladkova, A., Drozd, A., & Matsuoka, S. (2016). Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn't. In Proceedings of the NAACL-HLT SRW (pp. 47–54). San Diego, California: ACL. Retrieved from https://www.aclweb.org/anthology/N/N16/N16-2002.pdf
  8. Linzen, T. (2016). Issues in evaluating semantic spaces using word analogies. In Proceedings of the First Workshop on Evaluating Vector Space Representations for NLP. ACL. Retrieved from http://anthology.aclweb.org/W16-2503
  9. Drozd, A., Gladkova, A., & Matsuoka, S. (2016). Word embeddings, analogies, and machine learning: beyond king - man + woman = queen. In Proceedings of COLING 2016 (pp. 3519–3530). Osaka, Japan: ACL. Retrieved from https://www.aclweb.org/anthology/C/C16/C16-1332.pdf