WordSimilarity-353 Test Collection (State of the art)

From ACL Wiki
* introduced by [http://www.cs.technion.ac.il/~gabr/papers/tois_context.pdf Finkelstein et al. (2002)]
* subsequently used by many other researchers
* see also: [[Similarity (State of the art)]]

* '''Listed in order of increasing [http://en.wikipedia.org/wiki/Spearman_rank_correlation Spearman's rho].'''
 
  
 

Latest revision as of 06:59, 29 January 2014


Table of results

{| border="1" cellpadding="5" cellspacing="1"
|-
! Algorithm !! Reference for algorithm !! Reference for reported results !! Type !! Spearman's rho !! Pearson's r
|-
| L&C || Leacock and Chodorow (1998) || Hassan and Mihalcea (2011) || Knowledge-based || 0.302 || 0.356
|-
| WNE || Jarmasz (2003) || Hassan and Mihalcea (2011) || Knowledge-based || 0.305 || 0.271
|-
| J&C || Jiang and Conrath (1997) || Hassan and Mihalcea (2011) || Knowledge-based || 0.318 || 0.354
|-
| L&C || Leacock and Chodorow (1998) || Hassan and Mihalcea (2011) || Knowledge-based || 0.348 || 0.341
|-
| H&S || Hirst and St-Onge (1998) || Hassan and Mihalcea (2011) || Knowledge-based || 0.302 || 0.356
|-
| Lin || Lin (1998) || Hassan and Mihalcea (2011) || Corpus-based || 0.348 || 0.357
|-
| Resnik || Resnik (1995) || Hassan and Mihalcea (2011) || Knowledge-based || 0.353 || 0.365
|-
| ROGET || Jarmasz (2003) || Hassan and Mihalcea (2011) || Knowledge-based || 0.415 || 0.536
|-
| C&W || Collobert and Weston (2008) || Collobert and Weston (2008) || Corpus-based || 0.5 || N/A
|-
| WikiRelate || Strube and Ponzetto (2006) || Strube and Ponzetto (2006) || Corpus-based || N/A || 0.48
|-
| LSA || Landauer et al. (1997) || Hassan and Mihalcea (2011) || Corpus-based || 0.581 || 0.492
|-
| LSA || Landauer et al. (1997) || Hassan and Mihalcea (2011) || Corpus-based || 0.581 || 0.563
|-
| simVB+simWN || Finkelstein et al. (2002) || Finkelstein et al. (2002) || Hybrid || N/A || 0.55
|-
| SSA || Hassan and Mihalcea (2011) || Hassan and Mihalcea (2011) || Knowledge-based || 0.622 || 0.629
|-
| HSMN+csmRNN || Luong et al. (2013) || Luong et al. (2013) || Corpus-based || 0.65 || N/A
|-
| Multi-prototype || Huang et al. (2012) || Huang et al. (2012) || Corpus-based || 0.71 || N/A
|-
| Multi-lingual SSA || Hassan et al. (2011) || Hassan et al. (2011) || Corpus-based || 0.713 || 0.674
|-
| ESA || Gabrilovich and Markovitch (2007) || Gabrilovich and Markovitch (2007) || Corpus-based || 0.748 || 0.503
|-
| TSA || Radinsky et al. (2011) || Radinsky et al. (2011) || Hybrid || 0.80 || N/A
|-
| CLEAR || Halawi et al. (2012) || Halawi et al. (2012) || Corpus-based || 0.81 || N/A
|-
| Y&Q || Yih and Qazvinian (2012) || Yih and Qazvinian (2012) || Hybrid || 0.81 || N/A
|}
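
The two score columns compare a system's relatedness scores with the human judgments over the same word pairs. The following is a minimal sketch of how such scores could be computed, not the evaluation script used by any of the cited papers. The function names (<code>pearson</code>, <code>average_ranks</code>, <code>spearman</code>) and the toy score lists are illustrative assumptions, and tied values are given their average rank.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's r: linear correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks of the scores."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy data (made up, not taken from WordSimilarity-353): the system orders
# the pairs exactly as the humans do, but on a non-linear scale.
human_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
system_scores = [1.0, 4.0, 9.0, 16.0, 25.0]

rho = spearman(human_scores, system_scores)
r = pearson(human_scores, system_scores)
print(f"Spearman's rho = {rho:.3f}, Pearson's r = {r:.3f}")
```

In this toy case the ranking agrees perfectly while the linear fit does not, which illustrates why a system can have quite different values in the two columns.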

References

  • Listed in alphabetical order.

Finkelstein, Lev, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, and Eytan Ruppin. (2002). [http://www.cs.technion.ac.il/~gabr/papers/tois_context.pdf Placing Search in Context: The Concept Revisited]. ACM Transactions on Information Systems, 20(1):116-131.

Gabrilovich, Evgeniy, and Shaul Markovitch. (2007). [http://www.cs.technion.ac.il/~gabr/papers/ijcai-2007-sim.pdf Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis]. Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India.

Halawi, Guy, Gideon Dror, Evgeniy Gabrilovich, and Yehuda Koren. (2012). [http://gabrilovich.com/publications/papers/Halawi2012LSL.pdf Large-scale learning of word relatedness with constraints]. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1406-1414. ACM.

Hassan, Samer, and Rada Mihalcea. (2011). [http://www.samerhassan.com/images/4/48/Hassan.pdf Semantic Relatedness Using Salient Semantic Analysis]. AAAI 2011.

Hirst, Graeme, and David St-Onge. (1998). Lexical chains as representations of context for the detection and correction of malapropisms. In Christiane Fellbaum, editor, WordNet: An Electronic Lexical Database. The MIT Press, Cambridge, MA, pp. 305–332.

Islam, A., and D. Inkpen. (2006). [http://www.site.uottawa.ca/~mdislam/publications/LREC_06_242.pdf Second order co-occurrence PMI for determining the semantic similarity of words]. Proceedings of the International Conference on Language Resources and Evaluation (LREC 2006), pp. 1033–1038.

Jarmasz, M. (2003). [http://www.arxiv.org/pdf/1204.0140 Roget’s thesaurus as a Lexical Resource for Natural Language Processing]. Ph.D. Dissertation, Ottawa Carleton Institute for Computer Science, School of Information Technology and Engineering, University of Ottawa.

Jiang, Jay J., and David W. Conrath. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics (ROCLING X), Taiwan, pp. 19–33.

Landauer, T. K., D. Laham, B. Rehder, and M. E. Schreiner. (1997). How well can passage meaning be derived without using word order? A comparison of latent semantic analysis and humans.

Leacock, Claudia, and Martin Chodorow. (1998). Combining local context and WordNet similarity for word sense identification. In Christiane Fellbaum, editor, WordNet: An Electronic Lexical Database. The MIT Press, Cambridge, MA, pp. 265–283.

Lin, Dekang. (1998). An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning, Madison, WI, pp. 296–304.

Luong, Minh-Thang, Richard Socher, and Christopher D. Manning. (2013). [http://nlp.stanford.edu/~lmthang/data/papers/conll13_morpho.pdf Better word representations with recursive neural networks for morphology]. CoNLL-2013: 104.

Pilehvar, M. T., D. Jurgens, and R. Navigli. (2013). [http://wwwusers.di.uniroma1.it/~navigli/pubs/ACL_2013_Pilehvar_Jurgens_Navigli.pdf Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity]. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Sofia, Bulgaria, August 4-9, 2013, pp. 1341-1351.

Radinsky, Kira, Eugene Agichtein, Evgeniy Gabrilovich, and Shaul Markovitch. (2011). [http://gabrilovich.com/publications/papers/Radinsky2011WTS.pdf A word at a time: computing word relatedness using temporal semantic analysis]. In Proceedings of the 20th international conference on World wide web, pp. 337-346. ACM.

Resnik, Philip. (1995). Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, Canada, pp. 448–453.

Strube, Michael, and Simone Paolo Ponzetto. (2006). [http://www.aaai.org/Papers/AAAI/2006/AAAI06-223.pdf WikiRelate! Computing Semantic Relatedness Using Wikipedia]. Proceedings of the 21st National Conference on Artificial Intelligence (AAAI), Boston, MA.

Yih, W., and V. Qazvinian. (2012). [http://aclweb.org/anthology/N/N12/N12-1077.pdf Measuring Word Relatedness Using Heterogeneous Vector Space Models]. Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2012).