Bigger analogy test set (State of the art)
Revision as of 02:52, 6 January 2017 by Anna Gladkova. This page lists published results on the Bigger Analogy Test Set (BATS).
Dataset description
- New dataset proposed by Gladkova et al. (2016) [1]
- dataset balanced across 4 types of relations (inflectional morphology, derivational morphology, lexicographic semantics, encyclopedic semantics)
- 10 relations of each type, 50 unique pairs per category
- 99,200 questions in total
- more challenging than the Google set because of more diverse relations
- where applicable, more than one correct answer is supplied (e.g. both canine and animal are hypernyms of dog).
- comes with a testing script that implements 5 methods of solving analogies (see Analogy (State of the art))
This page reports results obtained with the "vanilla" 3CosAdd method, or vector offset[2].
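As an illustration of what 3CosAdd computes, here is a minimal sketch, not the BATS testing script itself. For a question a : b :: c : ?, it picks the vocabulary word whose unit-normalized vector scores highest on cos(d, b) - cos(d, a) + cos(d, c), excluding the three question words. An answer counts as correct if it matches any of the supplied correct answers. The function names `three_cos_add` and `accuracy`, the toy vectors, and the question format are assumptions made for this example.

```python
import numpy as np


def three_cos_add(vectors, a, b, c):
    """Answer a : b :: c : ? with 3CosAdd (vector offset).

    `vectors` maps each word to a 1-D array (hypothetical input format).
    """
    words = list(vectors)
    index = {w: i for i, w in enumerate(words)}
    matrix = np.stack([np.asarray(vectors[w], dtype=float) for w in words])
    # Unit-normalize rows so that dot products equal cosine similarities.
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)
    target = matrix[index[b]] - matrix[index[a]] + matrix[index[c]]
    # For each candidate d: cos(d, b) - cos(d, a) + cos(d, c).
    scores = matrix @ target
    # The question words themselves are not allowed as answers.
    for w in (a, b, c):
        scores[index[w]] = -np.inf
    return words[int(np.argmax(scores))]


def accuracy(vectors, questions):
    """Share of questions whose prediction is among the accepted answers.

    Each question is (a, b, c, set_of_correct_answers).
    """
    hits = sum(
        three_cos_add(vectors, a, b, c) in answers
        for a, b, c, answers in questions
    )
    return hits / len(questions)


# Toy 4-dimensional vectors, invented purely for illustration.
toy_vectors = {
    "man": [0, 1, 0, 0],
    "woman": [0, 0, 1, 0],
    "king": [1, 1, 0, 0],
    "queen": [1, 0, 1, 0],
    "apple": [0, 0, 0, 1],
}
toy_questions = [
    ("man", "king", "woman", {"queen", "monarch"}),
    ("king", "queen", "man", {"woman"}),
]
print(three_cos_add(toy_vectors, "man", "king", "woman"))
print(accuracy(toy_vectors, toy_questions))
```

Real embeddings would be loaded from a trained model rather than written by hand, and a full evaluation would normalize the matrix once instead of on every call.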
Table of results
- Listed in chronological order.
Model | Reference | Inflectional morphology | Derivational morphology | Lexicographic semantics | Encyclopedic semantics | Corpus, window size, vector size |
---|---|---|---|---|---|---|
SVD | Drozd et al. (2016) [3] | 44.0 | 9.8 | 10.1 | 18.5 | 5B corpus (Araneum + Wikipedia + UkWac), window 3, 1000 dimensions |
GloVe | Drozd et al. (2016) [3] | 59.9 | 10.2 | 10.9 | 31.5 | 5B corpus (Araneum + Wikipedia + UkWac), window 8, 300 dimensions |
Skip-Gram | Drozd et al. (2016) [3] | 61.0 | 11.2 | 9.1 | 26.5 | 5B corpus (Araneum + Wikipedia + UkWac), window 8, 300 dimensions |
References
- [1] Gladkova, A., Drozd, A., & Matsuoka, S. (2016). Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn’t. In Proceedings of the NAACL-HLT SRW (pp. 47–54). San Diego, California, June 12-17, 2016: ACL. Retrieved from https://www.aclweb.org/anthology/N/N16/N16-2002.pdf
- [2] Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In Proceedings of International Conference on Learning Representations (ICLR).
- [3] Drozd, A., Gladkova, A., & Matsuoka, S. (2016). Word embeddings, analogies, and machine learning: beyond king - man + woman = queen. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 3519–3530). Osaka, Japan, December 11-17: ACL. Retrieved from https://www.aclweb.org/anthology/C/C16/C16-1332.pdf