Bigger analogy test set (State of the art)

From ACL Wiki

Dataset description

  • New dataset proposed by Gladkova et al. (2016) [1]
  • available here
  • dataset balanced across 4 types of relations (inflectional morphology, derivational morphology, lexicographic semantics, encyclopedic semantics)
  • 10 relations of each type, 50 unique pairs per category
  • 99,200 questions in total
  • more challenging than the Google analogy test set because the relations are more diverse
  • where applicable, more than one correct answer is supplied (e.g. both canine and animal are hypernyms of dog).
  • comes with a testing script that implements 5 methods of solving analogies (see Analogy (State of the art))
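
Because some questions accept more than one answer, a scorer has to compare each prediction against a set of accepted words rather than a single gold word. The following is a minimal sketch of such scoring, not the dataset's own testing script; the function names `is_correct` and `accuracy`, the case-insensitive matching, and the second toy question are assumptions made for illustration.

```python
def is_correct(prediction, accepted_answers):
    """Return True if the predicted word matches any accepted answer."""
    # Case-insensitive comparison is an assumption of this sketch.
    return prediction.lower() in {a.lower() for a in accepted_answers}


def accuracy(predictions, gold):
    """Share of questions whose prediction is among the accepted answers."""
    hits = sum(is_correct(p, g) for p, g in zip(predictions, gold))
    return hits / len(gold)


# Toy data: the first question follows the dog example (canine, animal);
# the second question is hypothetical.
gold = [{"canine", "animal"}, {"feline", "animal"}]
predictions = ["animal", "dog"]

print(accuracy(predictions, gold))  # 0.5
```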

This page reports results obtained with the "vanilla" 3CosAdd method, or vector offset[2].
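
For a question "a is to a' as b is to ?", 3CosAdd picks the vocabulary word whose vector is closest, by cosine similarity, to a' - a + b. The sketch below illustrates this on a toy vocabulary. It assumes unit-normalized vectors and excludes the three question words from the candidates, as is commonly done; the function name `three_cos_add` and the 4-dimensional toy vectors are hypothetical, and this is not the dataset's testing script.

```python
import numpy as np


def three_cos_add(vectors, a, a_prime, b):
    """Answer 'a : a_prime :: b : ?' with the vector offset method."""
    words = list(vectors)
    matrix = np.stack([np.asarray(vectors[w], dtype=float) for w in words])
    # Normalize rows so that dot products are proportional to cosine similarity.
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    unit = dict(zip(words, matrix))

    target = unit[a_prime] - unit[a] + unit[b]
    scores = matrix @ target

    # Assumption: the question words themselves are not allowed as answers.
    for w in (a, a_prime, b):
        scores[words.index(w)] = -np.inf
    return words[int(np.argmax(scores))]


# Hypothetical toy embedding, for illustration only.
toy = {
    "king":  [1, 1, 0, 0],
    "queen": [1, 0, 1, 0],
    "man":   [0, 1, 0, 0],
    "woman": [0, 0, 1, 0],
    "apple": [0, 0, 0, 1],
}

print(three_cos_add(toy, "man", "woman", "king"))  # queen
```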

Table of results

  • Listed in chronological order.
Model      Reference                Inflectional  Derivational  Lexicographic  Encyclopedic  Corpus, window size, vector size
                                    morphology    morphology    semantics      semantics
SVD        Drozd et al. (2016) [3]  44.0          9.8           10.1           18.5          5B corpus (Araneum + Wikipedia + UkWac), window 3, 1000 dimensions
GloVe      Drozd et al. (2016) [3]  59.9          10.2          10.9           31.5          5B corpus (Araneum + Wikipedia + UkWac), window 8, 300 dimensions
Skip-Gram  Drozd et al. (2016) [3]  61.0          11.2          9.1            26.5          5B corpus (Araneum + Wikipedia + UkWac), window 8, 300 dimensions


Methodological issues

  • As with other analogy test sets, accuracy depends not only on the embedding and its parameters, but also on the method used to solve the analogies [4] [5]. Set-based methods [6] considerably outperform pair-based methods, which shows that the models encode much information that pair-based methods miss.
  • It is therefore more accurate to treat the analogy task as a way to describe and characterize an embedding than as a way to evaluate it.
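
To make the contrast concrete: a pair-based method derives the offset from a single example pair, while a set-based method can pool information from all example pairs of a relation. The sketch below averages the offsets over a set of pairs before searching for the answer, in the spirit of the averaging approach described in [6]. It is an illustration under stated assumptions, not the authors' implementation; the function name `average_offset_answer`, the exclusion of only the query word, and the toy vectors are hypothetical.

```python
import numpy as np


def average_offset_answer(vectors, example_pairs, b):
    """Answer 'b : ?' using the mean offset over a set of example pairs."""
    words = list(vectors)
    matrix = np.stack([np.asarray(vectors[w], dtype=float) for w in words])
    # Normalize rows so that dot products are proportional to cosine similarity.
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    unit = dict(zip(words, matrix))

    # Mean of (target - source) over all example pairs of the relation.
    offset = np.mean([unit[t] - unit[s] for s, t in example_pairs], axis=0)
    scores = matrix @ (unit[b] + offset)

    # Assumption: only the query word is excluded from the candidates.
    scores[words.index(b)] = -np.inf
    return words[int(np.argmax(scores))]


# Hypothetical toy embedding, for illustration only.
toy_set = {
    "man":     [0, 1, 0, 0],
    "woman":   [0, 0, 1, 0],
    "actor":   [0, 1, 0, 1],
    "actress": [0, 0, 1, 1],
    "king":    [1, 1, 0, 0],
    "queen":   [1, 0, 1, 0],
}
pairs = [("man", "woman"), ("actor", "actress")]

print(average_offset_answer(toy_set, pairs, "king"))  # queen
```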

References

  1. Gladkova, A., Drozd, A., & Matsuoka, S. (2016). Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn’t. In Proceedings of the NAACL-HLT SRW (pp. 47–54). San Diego, California, June 12-17, 2016: ACL. Retrieved from https://www.aclweb.org/anthology/N/N16/N16-2002.pdf
  2. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In Proceedings of International Conference on Learning Representations (ICLR).
  3. Drozd, A., Gladkova, A., & Matsuoka, S. (2016). Word embeddings, analogies, and machine learning: beyond king - man + woman = queen. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 3519–3530). Osaka, Japan, December 11-17: ACL. Retrieved from https://www.aclweb.org/anthology/C/C16/C16-1332.pdf
  4. Linzen, T. (2016). Issues in evaluating semantic spaces using word analogies. In Proceedings of the First Workshop on Evaluating Vector Space Representations for NLP. Association for Computational Linguistics. Retrieved from http://anthology.aclweb.org/W16-2503
  5. Levy, O., & Goldberg, Y. (2014). Linguistic Regularities in Sparse and Explicit Word Representations. In CoNLL (pp. 171–180). Retrieved from http://anthology.aclweb.org/W/W14/W14-1618.pdf
  6. Drozd, A., Gladkova, A., & Matsuoka, S. (2016). Word embeddings, analogies, and machine learning: beyond king - man + woman = queen. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 3519–3530). Osaka, Japan, December 11-17: ACL. Retrieved from https://www.aclweb.org/anthology/C/C16/C16-1332.pdf