Difference between revisions of "TOEFL Synonym Questions (State of the art)"

From ACL Wiki
Changes to the table source in this revision (old lines 29–113):

  • Type values capitalized in every row: hybrid to Hybrid, lexicon-based to Lexicon-based, corpus-based to Corpus-based, human to Human
  • new row added after the Lin (1998) row: Algorithm = Random, Reference for algorithm = Random guessing, Reference for experiment = (empty), Type = Random, Correct = 25.00%, 95% confidence = 15.99–35.94%
  • Algorithm cell of the "Average non-English US college applicant" row changed from empty to Human

Revision as of 05:25, 13 May 2007

  • TOEFL = Test of English as a Foreign Language
  • 80 multiple-choice synonym questions; 4 choices per question
  • TOEFL questions available from Thomas Landauer
  • introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring similarity
  • subsequently used by many other researchers
  • Algorithm = name of algorithm
  • Reference for algorithm = where to find out more about given algorithm for measuring similarity
  • Reference for experiment = where to find out more about evaluation of given algorithm with TOEFL questions
  • Type = general type of algorithm: Corpus-based, Lexicon-based, or Hybrid; the Random and Human rows are baselines, not algorithms
  • Correct = percent of 80 questions that given algorithm answered correctly
  • 95% confidence = 95% confidence interval for the percent correct, calculated using the exact binomial test (Clopper-Pearson interval)
  • table rows sorted in order of increasing percent correct
  • several WordNet-based similarity measures are implemented in Ted Pedersen's WordNet::Similarity package
  • LSA = Latent Semantic Analysis
  • PMI-IR = Pointwise Mutual Information - Information Retrieval
  • PR = Product Rule
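
As a rough illustration of how a similarity measure is scored on questions of this kind, the sketch below picks, for each question, the choice most similar to the stem word and reports the percent answered correctly. Everything in it is an assumption made for illustration: the two questions are invented (the real TOEFL items must be requested from Thomas Landauer), `toy_similarity` is a hypothetical lookup table standing in for a real measure such as LSA or PMI-IR, and the function names are not taken from any of the cited papers.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Similarity = Callable[[str, str], float]


def answer(stem: str, choices: Sequence[str], similarity: Similarity) -> str:
    """Return the choice with the highest similarity to the stem word."""
    return max(choices, key=lambda choice: similarity(stem, choice))


def percent_correct(questions: List[dict], similarity: Similarity) -> float:
    """Percent of questions for which the top-scoring choice is the key."""
    correct = sum(
        answer(q["stem"], q["choices"], similarity) == q["answer"]
        for q in questions
    )
    return 100.0 * correct / len(questions)


# Invented example questions (NOT real TOEFL items), four choices each.
QUESTIONS = [
    {"stem": "big", "choices": ["large", "small", "red", "quick"], "answer": "large"},
    {"stem": "quick", "choices": ["slow", "fast", "tall", "green"], "answer": "fast"},
]

# Hypothetical similarity scores; unlisted pairs score 0.0.
TOY_SCORES: Dict[Tuple[str, str], float] = {
    ("big", "large"): 0.9,
    ("big", "small"): 0.4,
    ("quick", "slow"): 0.6,  # antonyms often look similar to corpus measures
    ("quick", "fast"): 0.5,
}


def toy_similarity(a: str, b: str) -> float:
    return TOY_SCORES.get((a, b), 0.0)


if __name__ == "__main__":
    print(percent_correct(QUESTIONS, toy_similarity))
```

With four choices per question, picking uniformly at random is expected to score 25%, which is the Random baseline in the table.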


| Algorithm | Reference for algorithm | Reference for experiment | Type | Correct | 95% confidence |
|---|---|---|---|---|---|
| RES | Resnik (1995) | Jarmasz and Szpakowicz (2003) | Hybrid | 20.31% | 12.89–31.83% |
| LC | Leacock and Chodorow (1998) | Jarmasz and Szpakowicz (2003) | Lexicon-based | 21.88% | 13.91–33.21% |
| LIN | Lin (1998) | Jarmasz and Szpakowicz (2003) | Hybrid | 24.06% | 15.99–35.94% |
| Random | Random guessing | | Random | 25.00% | 15.99–35.94% |
| JC | Jiang and Conrath (1997) | Jarmasz and Szpakowicz (2003) | Hybrid | 25.00% | 15.99–35.94% |
| LSA | Landauer and Dumais (1997) | Landauer and Dumais (1997) | Corpus-based | 64.38% | 52.90–74.80% |
| Human | Average non-English US college applicant | Landauer and Dumais (1997) | Human | 64.50% | 53.01–74.88% |
| PMI-IR | Turney (2001) | Turney (2001) | Corpus-based | 73.75% | 62.71–82.96% |
| HSO | Hirst and St-Onge (1998) | Jarmasz and Szpakowicz (2003) | Lexicon-based | 77.91% | 68.17–87.11% |
| JS | Jarmasz and Szpakowicz (2003) | Jarmasz and Szpakowicz (2003) | Lexicon-based | 78.75% | 68.17–87.11% |
| PMI-IR | Terra and Clarke (2003) | Terra and Clarke (2003) | Corpus-based | 81.25% | 70.97–89.11% |
| LSA | Rapp (2003) | Rapp (2003) | Corpus-based | 92.50% | 84.39–97.20% |
| PR | Turney et al. (2003) | Turney et al. (2003) | Hybrid | 97.50% | 91.26–99.70% |
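
The 95% confidence column can be approximated with an exact (Clopper-Pearson) binomial interval over the 80 questions. The sketch below is one way to compute such an interval using only the Python standard library; it is an assumption about how the published intervals were obtained, not the code used for this page, and the function names (`binom_cdf`, `exact_interval`) are hypothetical. It solves for each bound by bisection on the binomial CDF rather than calling a statistics package.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target: float, iters: int = 100) -> float:
    """Find p in [0, 1] with f(p) = target, where f decreases as p grows."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies at larger p
        else:
            hi = mid
    return (lo + hi) / 2


def exact_interval(k: int, n: int, alpha: float = 0.05) -> tuple:
    """Clopper-Pearson interval for k correct answers out of n questions."""
    # Lower bound: the p at which P(X >= k) = alpha / 2,
    # i.e. P(X <= k - 1) = 1 - alpha / 2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2
    )
    # Upper bound: the p at which P(X <= k) = alpha / 2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2
    )
    return lower, upper


if __name__ == "__main__":
    low, high = exact_interval(78, 80)  # 78 of 80 correct = 97.50%
    print(f"{100 * low:.2f}-{100 * high:.2f}%")
```

For 78 of 80 correct, the bounds should come out near 0.9126 and 0.9970, in line with the PR row. Rows whose percent correct is not a whole number of questions (for example 64.38%) cannot be fed to this function directly, since it expects an integer count.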


Hirst, G., and St-Onge, D. (1998). Lexical chains as representation of context for the detection and correction of malapropisms. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 305–332.

Jarmasz, M., and Szpakowicz, S. (2003). Roget's thesaurus and semantic similarity. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, September, pp. 212–219.

Jiang, J.J., and Conrath, D.W. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the International Conference on Research in Computational Linguistics, Taiwan.

Landauer, T.K., and Dumais, S.T. (1997). A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211–240.

Leacock, C., and Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 265–283.

Lin, D. (1998). An information-theoretic definition of similarity. Proceedings of the 15th International Conference on Machine Learning (ICML-98), Madison, WI, pp. 296–304.

Rapp, R. (2003). Word sense discovery based on sense descriptor dissimilarity. Proceedings of the Ninth Machine Translation Summit, pp. 315–322.

Resnik, P. (1995). Using information content to evaluate semantic similarity. Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Montreal, pp. 448–453.

Terra, E., and Clarke, C.L.A. (2003). Frequency estimates for statistical word similarity measures. Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003 (HLT/NAACL 2003), pp. 244–251.

Turney, P.D. (2001). Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001), Freiburg, Germany, pp. 491–502.

Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). Combining independent modules to solve multiple-choice synonym and analogy problems. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, pp. 482–489.