TOEFL Synonym Questions (State of the art)

* '''the TOEFL questions are available on request by contacting [http://lsa.colorado.edu/mail_sub.html LSA Support at CU Boulder]''', the people who manage the [http://lsa.colorado.edu/ LSA web site at Colorado]
* TOEFL = Test of English as a Foreign Language
* 80 multiple-choice synonym questions; 4 choices per question
* introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring the degree of similarity between words (a short sketch of the evaluation protocol follows the sample question below)
* subsequently used by many other researchers
* see also: [[Similarity (State of the art)]]
  
  
== Sample question ==

{| border="1" cellpadding="5" cellspacing="1"
|-
| '''Stem:'''
| levied
|-
| '''Choices:'''
| (a) imposed<br>(b) believed<br>(c) requested<br>(d) correlated
|-
| '''Solution:'''
| (a) imposed
|}
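
The evaluation protocol is the same for every system in the table below: score the similarity between the stem and each of the four choices, answer with the highest-scoring choice, and report the percentage of the 80 questions answered correctly. The following minimal Python sketch illustrates the protocol only; the toy vectors and the cosine measure are invented placeholders, not part of any system listed on this page.

<pre>
# Minimal sketch of the TOEFL synonym evaluation protocol (illustration only).
# The toy vectors are invented; a real evaluation plugs in the similarity
# measure under test (LSA, PMI-IR, a WordNet measure, word embeddings, ...).
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical word vectors, for illustration only.
vectors = {
    "levied":     [0.9, 0.1, 0.2],
    "imposed":    [0.8, 0.2, 0.1],
    "believed":   [0.1, 0.9, 0.3],
    "requested":  [0.3, 0.5, 0.7],
    "correlated": [0.2, 0.3, 0.9],
}

def answer(stem, choices):
    """Pick the choice whose vector is most similar to the stem's vector."""
    return max(choices, key=lambda c: cosine(vectors[stem], vectors[c]))

# One question in the format of the sample above; a full run uses all 80.
questions = [("levied", ["imposed", "believed", "requested", "correlated"], "imposed")]
correct = sum(answer(stem, choices) == solution for stem, choices, solution in questions)
print("Correct: %.2f%%" % (100.0 * correct / len(questions)))
</pre>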
  
 
== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! Algorithm
! Reference for algorithm
! Reference for experiment
! Type
! Correct
! 95% confidence
|-
| RES
| Resnik (1995)
| Jarmasz and Szpakowicz (2003)
| Hybrid
| 20.31%
| 12.89–31.83%
|-
| LC
| Leacock and Chodorow (1998)
| Jarmasz and Szpakowicz (2003)
| Lexicon-based
| 21.88%
| 13.91–33.21%
|-
| LIN
| Lin (1998)
| Jarmasz and Szpakowicz (2003)
| Hybrid
| 24.06%
| 15.99–35.94%
|-
| Random
| Random guessing
| 1 / 4 = 25.00%
| Random
| 25.00%
| 15.99–35.94%
|-
| JC
| Jiang and Conrath (1997)
| Jarmasz and Szpakowicz (2003)
| Hybrid
| 25.00%
| 15.99–35.94%
|-
| LSA
| Landauer and Dumais (1997)
| Landauer and Dumais (1997)
| Corpus-based
| 64.38%
| 52.90–74.80%
|-
| Human
| Average non-English US college applicant
| Landauer and Dumais (1997)
| Human
| 64.50%
| 53.01–74.88%
|-
| RI
| Karlgren and Sahlgren (2001)
| Karlgren and Sahlgren (2001)
| Corpus-based
| 72.50%
| 61.38–81.90%
|-
| DS
| Pado and Lapata (2007)
| Pado and Lapata (2007)
| Corpus-based
| 73.00%
| 62.72–82.96%
|-
| PMI-IR
| Turney (2001)
| Turney (2001)
| Corpus-based
| 73.75%
| 62.72–82.96%
|-
| PairClass
| Turney (2008)
| Turney (2008)
| Corpus-based
| 76.25%
| 65.42–85.06%
|-
| HSO
| Hirst and St-Onge (1998)
| Jarmasz and Szpakowicz (2003)
| Lexicon-based
| 77.91%
| 68.17–87.11%
|-
| JS
| Jarmasz and Szpakowicz (2003)
| Jarmasz and Szpakowicz (2003)
| Lexicon-based
| 78.75%
| 68.17–87.11%
|-
| Sa18
| Salle et al. (2018)
| Dobó (2019)
| Corpus-based
| 80.00%
| 69.56–88.11%
|-
| PMI-IR
| Terra and Clarke (2003)
| Terra and Clarke (2003)
| Corpus-based
| 81.25%
| 70.97–89.11%
|-
| LC-IR
| Higgins (2005)
| Higgins (2005)
| Web-based
| 81.25%
| 70.97–89.11%
|-
| Do19-corpus
| Dobó (2019)
| Dobó (2019)
| Corpus-based
| 81.25%
| 70.97–89.11%
|-
| CWO
| Ruiz-Casado et al. (2005)
| Ruiz-Casado et al. (2005)
| Web-based
| 82.55%
| 72.38–90.09%
|-
| PPMIC
| Bullinaria and Levy (2007)
| Bullinaria and Levy (2007)
| Corpus-based
| 85.00%
| 75.26–92.00%
|-
| GLSA
| Matveeva et al. (2005)
| Matveeva et al. (2005)
| Corpus-based
| 86.25%
| 76.73–92.93%
|-
| SR
| Tsatsaronis et al. (2010)
| Tsatsaronis et al. (2010)
| Lexicon-based
| 87.50%
| 78.21–93.84%
|-
| DC13
| Dobó and Csirik (2013)
| Dobó and Csirik (2013)
| Corpus-based
| 88.75%
| 79.72–94.72%
|-
| Pe14
| Pennington et al. (2014)
| Dobó (2019)
| Corpus-based
| 90.00%
| 81.24–95.58%
|-
| LSA
| Rapp (2003)
| Rapp (2003)
| Corpus-based
| 92.50%
| 84.39–97.20%
|-
| LSA
| Han (2014)
| Han (2014)
| Hybrid
| 95.00%
| 87.69–98.62%
|-
| ADW
| Pilehvar et al. (2013)
| Pilehvar et al. (2013)
| WordNet graph-based (unsupervised)
| 96.25%
| 89.43–99.22%
|-
| PR
| Turney et al. (2003)
| Turney et al. (2003)
| Hybrid
| 97.50%
| 91.26–99.70%
|-
| Sp19
| Speer et al. (2017)
| Dobó (2019)
| Hybrid
| 98.75%
| 93.23–99.97%
|-
| Do19-hybrid
| Dobó (2019)
| Dobó (2019)
| Hybrid
| 98.75%
| 93.23–99.97%
|-
| PCCP
| Bullinaria and Levy (2012)
| Bullinaria and Levy (2012)
| Corpus-based
| 100.00%
| 96.32–100.00%
|}

== Explanation of table ==

 
* '''Type''' = general type of algorithm: corpus-based, lexicon-based, hybrid
 
* '''Type''' = general type of algorithm: corpus-based, lexicon-based, hybrid
 
* '''Correct''' = percent of 80 questions that given algorithm answered correctly
 
* '''Correct''' = percent of 80 questions that given algorithm answered correctly
* '''95% confidence''' = confidence interval calculated using [http://home.clara.net/sisa/onemean.htm Binomial Exact Test]
+
* '''95% confidence''' = confidence interval calculated using the [[Statistical calculators|Binomial Exact Test]]
 
* table rows sorted in order of increasing percent correct
 
* table rows sorted in order of increasing percent correct
 
* several WordNet-based similarity measures are implemented in [http://www.d.umn.edu/~tpederse/ Ted Pedersen]'s [http://www.d.umn.edu/~tpederse/similarity.html WordNet::Similarity] package
 
* several WordNet-based similarity measures are implemented in [http://www.d.umn.edu/~tpederse/ Ted Pedersen]'s [http://www.d.umn.edu/~tpederse/similarity.html WordNet::Similarity] package
 
* LSA = Latent Semantic Analysis
 
* LSA = Latent Semantic Analysis
 +
* PCCP = Principal Component vectors with Caron P
 
* PMI-IR = Pointwise Mutual Information - Information Retrieval
 
* PMI-IR = Pointwise Mutual Information - Information Retrieval
 
* PR = Product Rule
 
* PR = Product Rule
 
* PPMIC = Positive Pointwise Mutual Information with Cosine
 
* PPMIC = Positive Pointwise Mutual Information with Cosine
 
* GLSA = Generalized Latent Semantic Analysis
 
* GLSA = Generalized Latent Semantic Analysis
 +
* CWO = Context Window Overlapping
 +
* DS = Dependency Space
 +
* RI = Random Indexing
 +
 +
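
The confidence intervals in the table can be reproduced (up to rounding and boundary handling) with an exact binomial (Clopper–Pearson) calculation. The sketch below is an illustration using SciPy, not necessarily the exact calculator behind the [[Statistical calculators|Binomial Exact Test]] link above; for example, 59 correct out of 80 (73.75%, the PMI-IR result of Turney, 2001) gives approximately the 62.72–82.96% interval shown in the table.

<pre>
# Exact (Clopper-Pearson) 95% confidence interval for a binomial proportion,
# via the standard beta-quantile formulation; requires SciPy. Illustrative
# sketch only -- the calculator used for the table may treat boundary cases
# such as 80/80 correct slightly differently.
from scipy.stats import beta

def exact_ci(k, n, alpha=0.05):
    """Two-sided exact confidence interval for k successes out of n trials."""
    lower = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lower, upper

lo, hi = exact_ci(59, 80)  # 59/80 = 73.75% correct (PMI-IR, Turney 2001)
print("%.2f%% - %.2f%%" % (100 * lo, 100 * hi))  # approximately 62.7% - 83.0%
</pre>
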
== Notes ==
  
* the performance of a corpus-based algorithm depends on the corpus, so the difference in performance between two corpus-based systems may be due to the different corpora, rather than the different algorithms
* the TOEFL questions include nouns, verbs, and adjectives, but some of the WordNet-based algorithms were only designed to work with nouns; this explains some of the lower scores
* some of the algorithms may have been tuned on the TOEFL questions; read the references for details
* Landauer and Dumais (1997) report scores that were corrected for guessing by subtracting a penalty of 1/3 for each incorrect answer; with this penalty they report a score of 52.5%, and with the penalty removed their performance is 64.4% correct (see the worked example after this list)
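
To make the guessing correction concrete (assuming every question is answered): if <math>c</math> is the proportion answered correctly, the corrected score is <math>c - \tfrac{1}{3}(1 - c)</math>; with <math>c = 0.644</math> this gives <math>0.644 - \tfrac{0.356}{3} \approx 0.525</math>, i.e. the 52.5% reported by Landauer and Dumais (1997).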
  
 
== References ==
  
Bullinaria, J.A., and Levy, J.P. (2007). [http://www.cs.bham.ac.uk/~jxb/PUBS/BRM.pdf Extracting semantic representations from word co-occurrence statistics: A computational study]. ''Behavior Research Methods'', 39(3), 510-526.

Bullinaria, J.A., and Levy, J.P. (2012). [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.228.9582&rep=rep1&type=pdf Extracting semantic representations from word co-occurrence statistics: Stop-lists, stemming, and SVD]. ''Behavior Research Methods'', 44(3), 890-907.

Dobó, A. (2019). [http://doktori.bibl.u-szeged.hu/10120/1/AndrasDoboThesis2019.pdf A comprehensive analysis of the parameters in the creation and comparison of feature vectors in distributional semantic models for multiple languages]. PhD thesis, University of Szeged. [https://github.com/doboandras/dsm-parameter-analysis GitHub repository]

Dobó, A., and Csirik, J. (2013). [http://link.springer.com/chapter/10.1007/978-3-642-35843-2_42 Computing semantic similarity using large static corpora]. In: van Emde Boas, P. et al. (eds.), ''SOFSEM 2013: Theory and Practice of Computer Science, LNCS, Vol. 7741''. Springer-Verlag, Berlin Heidelberg, pp. 491-502.

Han, L. (2014). [http://ebiquity.umbc.edu/paper/html/id/658/Schema-Free-Querying-of-Semantic-Data Schema Free Querying of Semantic Data]. Ph.D. dissertation, University of Maryland, Baltimore County, Baltimore, MD, USA.

Higgins, D. (2005). [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.329.1517 Which statistics reflect semantics? Rethinking synonymy and word similarity]. In: Kepser, S., and Reis, M. (eds.), ''Linguistic Evidence: Empirical, Theoretical and Computational Perspectives''. Mouton de Gruyter, Berlin, pp. 265–284.

Hirst, G., and St-Onge, D. (1998). [http://mirror.eacoss.org/documentation/ITLibrary/IRIS/Data/1997/Hirst/Lexical/1997-Hirst-Lexical.pdf Lexical chains as representation of context for the detection and correction of malapropisms]. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, pp. 305-332.

Jarmasz, M., and Szpakowicz, S. (2003). [http://www.csi.uottawa.ca/~szpak/recent_papers/TR-2003-01.pdf Roget's thesaurus and semantic similarity]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, September, pp. 212-219.

Jiang, J.J., and Conrath, D.W. (1997). [http://wortschatz.uni-leipzig.de/~sbordag/aalw05/Referate/03_Assoziationen_BudanitskyResnik/Jiang_Conrath_97.pdf Semantic similarity based on corpus statistics and lexical taxonomy]. ''Proceedings of the International Conference on Research in Computational Linguistics'', Taiwan.

Karlgren, J., and Sahlgren, M. (2001). [http://www.sics.se/~jussi/Artiklar/2001_RWIbook/KarlgrenSahlgren2001.pdf From words to understanding]. In Uesaka, Y., Kanerva, P., and Asoh, H. (eds.), ''Foundations of Real-World Intelligence''. Stanford: CSLI Publications, pp. 294–308.

Landauer, T.K., and Dumais, S.T. (1997). [http://lsa.colorado.edu/papers/plato/plato.annote.html A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge]. ''Psychological Review'', 104(2), 211–240.

Leacock, C., and Chodorow, M. (1998). [http://books.google.ca/books?id=Rehu8OOzMIMC&lpg=PA265&ots=IpnaLkZUec&lr&pg=PA265#v=onepage&q&f=false Combining local context and WordNet similarity for word sense identification]. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, pp. 265-283.

Lin, D. (1998). [http://www.cs.ualberta.ca/~lindek/papers/sim.pdf An information-theoretic definition of similarity]. ''Proceedings of the 15th International Conference on Machine Learning (ICML-98)'', Madison, WI, pp. 296-304.

Matveeva, I., Levow, G., Farahat, A., and Royer, C. (2005). [http://people.cs.uchicago.edu/~matveeva/SynGLSA_ranlp_final.pdf Generalized latent semantic analysis for term representation]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-05)'', Borovets, Bulgaria.

Pado, S., and Lapata, M. (2007). [http://www.nlpado.de/~sebastian/pub/papers/cl07_pado.pdf Dependency-based construction of semantic space models]. ''Computational Linguistics'', 33(2), 161-199.

Pennington, J., Socher, R., and Manning, C. (2014). [https://www.aclweb.org/anthology/D14-1162 GloVe: Global vectors for word representation]. ''Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014)'', pp. 1532-1543.

Pilehvar, M.T., Jurgens, D., and Navigli, R. (2013). [http://wwwusers.di.uniroma1.it/~navigli/pubs/ACL_2013_Pilehvar_Jurgens_Navigli.pdf Align, disambiguate and walk: A unified approach for measuring semantic similarity]. ''Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013)'', Sofia, Bulgaria.

Rapp, R. (2003). [http://www.amtaweb.org/summit/MTSummit/FinalPapers/19-Rapp-final.pdf Word sense discovery based on sense descriptor dissimilarity]. ''Proceedings of the Ninth Machine Translation Summit'', pp. 315-322.

Resnik, P. (1995). [http://citeseer.ist.psu.edu/resnik95using.html Using information content to evaluate semantic similarity]. ''Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95)'', Montreal, pp. 448-453.

Ruiz-Casado, M., Alfonseca, E., and Castells, P. (2005). [http://alfonseca.org/pubs/2005-ranlp1.pdf Using context-window overlapping in synonym discovery and ontology extension]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-2005)'', Borovets, Bulgaria.

Salle, A., Idiart, M., and Villavicencio, A. (2018). [https://github.com/alexandres/lexvec/blob/master/README.md LexVec]. GitHub repository.

Speer, R., Chin, J., and Havasi, C. (2017). [https://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/download/14972/14051 ConceptNet 5.5: An open multilingual graph of general knowledge]. ''Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)'', pp. 4444-4451.

Terra, E., and Clarke, C.L.A. (2003). [http://acl.ldc.upenn.edu/N/N03/N03-1032.pdf Frequency estimates for statistical word similarity measures]. ''Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL 2003)'', pp. 244–251.

Tsatsaronis, G., Varlamis, I., and Vazirgiannis, M. (2010). [http://arxiv.org/abs/1401.5699 Text relatedness based on a word thesaurus]. ''Journal of Artificial Intelligence Research'', 37, 1–39.

Turney, P.D. (2001). [http://arxiv.org/abs/cs.LG/0212033 Mining the Web for synonyms: PMI-IR versus LSA on TOEFL]. ''Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001)'', Freiburg, Germany, pp. 491-502.

Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). [http://arxiv.org/abs/cs.CL/0309035 Combining independent modules to solve multiple-choice synonym and analogy problems]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, pp. 482-489.

Turney, P.D. (2008). [http://arxiv.org/abs/0809.0124 A uniform approach to analogies, synonyms, antonyms, and associations]. ''Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)'', Manchester, UK, pp. 905-912.

== See also ==

* [[Attributional and Relational Similarity (State of the art)]]
* [[SAT Analogy Questions]]
* [[State of the art]]

[[Category:State of the art]]
[[Category:Similarity]]
