Difference between revisions of "SimLex-999 (State of the art)"

Latest revision as of 18:34, 15 September 2019

SimLex-999 aims to be a cleaner benchmark of similarity, as distinct from relatedness. Word pairs were chosen to cover different ranges of similarity, with either high or low association. Annotators were instructed to distinguish similarity from relatedness and to rate only the former.

Algorithm	Reference for algorithm	Reference for reported results	Type	Spearman's rho	Pearson's r	Notes
Re16	Recski et al. (2016)^[1]	Recski et al. (2016)^[1]	Hybrid	0.76	-
SVR4	Banjade et al. (2015)^[2]	Banjade et al. (2015)^[2]	Combined	0.642	0.658
Do19-hybrid	Dobó (2019)^[3]	Dobó (2019)^[3]	Hybrid	0.621	0.481
Sp17	Speer et al. (2017)^[4]	Dobó (2019)^[3]	Hybrid	0.616	0.634
joint(SP+,skip-gram)	Schwartz et al. (2015)^[5]	Schwartz et al. (2015)^[5]	Distributional	0.56	-	Trained on word2vec corpus, best results for pure distributional model.
UMBC	Han et al. (2013)^[6]	Banjade et al. (2015)^[2]		0.558	0.557	without using POS information
SP+	Schwartz et al. (2015)^[5]	Schwartz et al. (2015)^[5]	Distributional	0.52	-
RNNenc	Hill et al. (2014b)^[7]	Hill et al. (2014b)^[7]	Distributional, multilingual	0.52	-
Sa18	Salle et al. (2018)^[8]	Dobó (2019)^[3]	Distributional	0.417	0.426
Word2vec	Mikolov et al. (2013)^[9]	Hill et al. (2014a)^[10]	Distributional	0.414	-	Trained on Wikipedia
Pe14	Pennington et al. (2014)^[11]	Dobó (2019)^[3]	Distributional	0.406	0.433
Lesk		Banjade et al. (2015)^[2]		0.404	0.347
Do19-corpus	Dobó (2019)^[3]	Dobó (2019)^[3]	Distributional	0.393	0.401
ESA		Banjade et al. (2015)^[2]		0.271	0.145
Neural language model	Collobert & Weston (2008)^[12]	Hill et al. (2014a)^[10]	Distributional	0.268	-	Trained on Wikipedia
Neural language model with global context	Huang et al. (2012)^[13]	Hill et al. (2014a)^[10]	Distributional	0.098	-	Trained on Wikipedia
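
The two score columns above are correlation coefficients between a model's similarity scores and the human ratings. As a rough illustration of how such figures are obtained, the sketch below computes Pearson's r and Spearman's rho (Pearson's r on average ranks) in plain Python. The function names and the five example pairs are hypothetical; the numbers are made up and are not SimLex-999 data, and the cited papers may have used different tooling.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's r between two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho, computed as Pearson's r on the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical human ratings and model scores for five word pairs
# (illustrative values only, not taken from SimLex-999).
gold = [9.2, 7.5, 3.1, 1.0, 5.6]
model = [0.81, 0.40, 0.35, 0.05, 0.52]

print("Spearman's rho:", round(spearman(gold, model), 3))
print("Pearson's r:", round(pearson(gold, model), 3))
```

Spearman's rho depends only on the ordering of the scores, so it is unaffected by any monotonic rescaling of a model's output, whereas Pearson's r is sensitive to the shape of the score distribution. This is one reason the two columns can differ noticeably for the same system.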

References

[1] Recski, G., Iklódi, E., Pajkossy, K., & Kornai, A. (2016). Measuring semantic similarity of words using concept networks. In: Proceedings of the 1st Workshop on Representation Learning for NLP, pp. 193–200.
[2] Banjade, R., Maharjan, N., Niraula, N., Rus, V., & Gautam, D. (2015). Lemon and Tea Are Not Similar: Measuring Word-to-Word Similarity by Combining Different Methods. Computational Linguistics and Intelligent Text Processing, 9041, 335–346. doi:10.1007/978-3-319-18111-0_25
[3] Dobó, A. (2019). A comprehensive analysis of the parameters in the creation and comparison of feature vectors in distributional semantic models for multiple languages. University of Szeged. GitHub repository: https://github.com/doboandras/dsm-parameter-analysis
[4] Speer, R., Chin, J., & Havasi, C. (2017). ConceptNet 5.5: An open multilingual graph of general knowledge. AAAI-17, pp. 4444–4451.
[5] Schwartz, R., Reichart, R., & Rappoport, A. (2015). Symmetric Pattern Based Word Embeddings for Improved Word Similarity Prediction. CoNLL 2015.
[6] Han, L., Kashyap, A., Finin, T., Mayfield, J., & Weese, J. (2013). UMBC EBIQUITY-CORE: Semantic textual similarity systems. In: Proceedings of the Second Joint Conference on Lexical and Computational Semantics, vol. 1, pp. 44–52.
[7] Hill, F., Cho, K., Jean, S., Devin, C., & Bengio, Y. (2014b). Not All Neural Embeddings are Born Equal, 1–5.
[8] Salle, A., Idiart, M., & Villavicencio, A. (2018). LexVec.
[9] Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In: Proceedings of International Conference of Learning Representations, Scottsdale, Arizona, USA.
[10] Hill, F., Reichart, R., & Korhonen, A. (2014a). SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation. Computation and Language.
[11] Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. EMNLP 2014, pp. 1532–1543.
[12] Collobert, R., & Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. In: International Conference on Machine Learning, ICML.
[13] Huang, E. H., Socher, R., Manning, C. D., & Ng, A. Y. (2012). Improving word representations via global context and multiple word prototypes. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers, Volume 1, pp. 873–882. Association for Computational Linguistics.

@@ Line 19: / Line 19: @@
 |-
 | Do19-hybrid
-| Dobó (2019)<ref name=dobo19>Dobó, A. (2019). [http://doktori.bibl.u-szeged.hu/10120/1/AndrasDoboThesis2019.pdf A comprehensive analysis of the parameters in the creation and comparison of feature vectors in distributional semantic models for multiple languages]. University of Szeged.</ref>
+| Dobó (2019)<ref name=dobo19>Dobó, A. (2019). [http://doktori.bibl.u-szeged.hu/10120/1/AndrasDoboThesis2019.pdf A comprehensive analysis of the parameters in the creation and comparison of feature vectors in distributional semantic models for multiple languages]. University of Szeged. [https://github.com/doboandras/dsm-parameter-analysis GitHub repository]</ref>
 | Dobó (2019)<ref name=dobo19/>
 | Hybrid || 0.621 || 0.481
