Bigger analogy test set (State of the art)
Revision as of 03:06, 6 January 2017 by Anna gladkova
Dataset description
- New dataset proposed by Gladkova et al. (2016) [1]
- available here
- dataset balanced across 4 types of relations (inflectional morphology, derivational morphology, lexicographic semantics, encyclopedic semantics)
- 10 relations of each type, 50 unique pairs per category
- 99,200 questions in total
- more challenging than the Google set because of more diverse relations
- where applicable, more than one correct answer is supplied (e.g. both canine and animal are hypernyms of dog).
- comes with a testing script that implements 5 methods of solving analogies (See Analogy (State of the art))
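Since the dataset can list several correct answers per question, scoring has to accept any of them. A minimal sketch of such a scorer (the function names and data layout are illustrative assumptions, not the dataset's actual testing script):

```python
def accuracy(predictions, gold):
    """Score analogy predictions when several answers may be correct.

    predictions: list of predicted words, one per question.
    gold: list of sets of acceptable answers, one per question
          (e.g. {"canine", "animal"} for hypernyms of "dog").
    A question counts as solved if the prediction matches ANY answer.
    """
    solved = sum(p in answers for p, answers in zip(predictions, gold))
    return solved / len(gold)

# toy usage: 2 of 3 questions solved
preds = ["animal", "cats", "run"]
gold = [{"canine", "animal"}, {"cats"}, {"ran"}]
print(accuracy(preds, gold))  # → 0.6666666666666666
```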
This page reports results obtained with the "vanilla" 3CosAdd method, or vector offset[2].
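For concreteness, 3CosAdd answers a question a : b :: c : ? by returning the vocabulary word closest (by cosine) to b - a + c. A minimal sketch, assuming `emb` is a dict of unit-normalized word vectors (the toy vocabulary below is an illustrative assumption, not part of the dataset):

```python
import numpy as np

def cos_add(emb, a, b, c):
    """Solve a : b :: c : ? with 3CosAdd (the vector offset method):
    return the vocabulary word whose vector is closest to b - a + c.
    `emb` maps words to unit-normalized numpy vectors."""
    target = emb[b] - emb[a] + emb[c]
    target = target / np.linalg.norm(target)
    # the three question words are excluded, as in standard evaluation
    candidates = [w for w in emb if w not in (a, b, c)]
    # on unit vectors, cosine similarity is just a dot product
    return max(candidates, key=lambda w: float(emb[w] @ target))

# toy 2-d "embedding" constructed so that king - man + woman ≈ queen
raw = {
    "man": [1.0, 0.0], "woman": [0.0, 1.0],
    "king": [0.8, 0.6], "queen": [-0.2, 0.98],
    "apple": [0.7, -0.7],
}
emb = {w: np.array(v) / np.linalg.norm(v) for w, v in raw.items()}
print(cos_add(emb, "man", "king", "woman"))  # → queen
```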
Table of results
- Listed in chronological order.
Model | Reference | Inflectional morphology | Derivational morphology | Lexicographic semantics | Encyclopedic semantics | Corpus, window size, vector size
---|---|---|---|---|---|---
SVD | Drozd et al. (2016) [3] | 44.0 | 9.8 | 10.1 | 18.5 | 5B corpus (Araneum + Wikipedia + UkWac), window 3, 1000 dimensions |
GloVe | Drozd et al. (2016) [3] | 59.9 | 10.2 | 10.9 | 31.5 | 5B corpus (Araneum + Wikipedia + UkWac), window 8, 300 dimensions |
Skip-Gram | Drozd et al. (2016) [3] | 61.0 | 11.2 | 9.1 | 26.5 | 5B corpus (Araneum + Wikipedia + UkWac), window 8, 300 dimensions |
Methodological issues
- As with other analogy test sets, accuracy depends not only on the embedding and its parameters, but also on the method with which analogies are solved [4] [5]. Set-based methods[6] considerably outperform pair-based methods, showing that models do in fact encode much of the information that pair-based methods miss.
- Therefore it is more accurate to think of the analogy task as a way to describe and characterize an embedding, rather than to evaluate it.
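To illustrate the set-based idea, here is a hedged sketch of 3CosAvg from Drozd et al. (2016)[6]: instead of a single pair's offset, it averages the offset b - a over all known pairs of a category (with the question's own pair held out). The toy vocabulary and the candidate-exclusion rule below are simplifying assumptions for illustration:

```python
import numpy as np

def cos_avg(emb, pairs, c):
    """3CosAvg sketch: average the offset b - a over the known pairs of a
    category, then return the word closest to c + average offset.
    `emb` maps words to unit-normalized numpy vectors."""
    offset = np.mean([emb[b] - emb[a] for a, b in pairs], axis=0)
    target = emb[c] + offset
    target = target / np.linalg.norm(target)
    # simplification: exclude the cue word and the words of the known pairs
    exclude = {c} | {w for pair in pairs for w in pair}
    candidates = [w for w in emb if w not in exclude]
    return max(candidates, key=lambda w: float(emb[w] @ target))

# toy singular→plural category: the averaged offset points from
# singular vectors toward plural vectors
raw = {
    "cat": [1.0, 0.0], "cats": [0.6, 0.8],
    "dog": [0.8, 0.6], "dogs": [0.28, 0.96],
    "mouse": [0.9, 0.44], "mice": [0.4, 0.92],
    "cheese": [0.7, -0.7],
}
emb = {w: np.array(v) / np.linalg.norm(v) for w, v in raw.items()}
pairs = [("cat", "cats"), ("dog", "dogs")]
print(cos_avg(emb, pairs, "mouse"))  # → mice
```

Averaging over many pairs makes the method less sensitive to the noise in any single pair's offset, which is one reason set-based methods outperform pair-based ones.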
References
- ↑ Gladkova, A., Drozd, A., & Matsuoka, S. (2016). Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn’t. In Proceedings of the NAACL-HLT SRW (pp. 47–54). San Diego, California, June 12-17, 2016: ACL. Retrieved from https://www.aclweb.org/anthology/N/N16/N16-2002.pdf
- ↑ Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In Proceedings of International Conference on Learning Representations (ICLR).
- ↑ 3.0 3.1 3.2 Drozd, A., Gladkova, A., & Matsuoka, S. (2016). Word embeddings, analogies, and machine learning: beyond king - man + woman = queen. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 3519–3530). Osaka, Japan, December 11-17: ACL. Retrieved from https://www.aclweb.org/anthology/C/C16/C16-1332.pdf
- ↑ Linzen, T. (2016). Issues in evaluating semantic spaces using word analogies. In Proceedings of the First Workshop on Evaluating Vector Space Representations for NLP. Association for Computational Linguistics. Retrieved from http://anthology.aclweb.org/W16-2503
- ↑ Levy, O., & Goldberg, Y. (2014). Linguistic Regularities in Sparse and Explicit Word Representations. In Proceedings of CoNLL (pp. 171–180). Retrieved from http://anthology.aclweb.org/W/W14/W14-1618.pdf
- ↑ Drozd, A., Gladkova, A., & Matsuoka, S. (2016). Word embeddings, analogies, and machine learning: beyond king - man + woman = queen. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 3519–3530). Osaka, Japan, December 11-17: ACL. Retrieved from https://www.aclweb.org/anthology/C/C16/C16-1332.pdf