TOEFL Synonym Questions (State of the art)
- TOEFL = Test of English as a Foreign Language
- 80 multiple-choice synonym questions; 4 choices per question
- TOEFL questions available from Thomas Landauer
- introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring the degree of similarity between two words
- subsequently used by many other researchers
- see also SAT Analogy Questions
- see also State of the art
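A minimal sketch of how a word-similarity measure is scored on questions in this format. It assumes the usual protocol: for each question, the choice most similar to the stem word is taken as the algorithm's answer. The data class, the tie-handling rule (1/t credit when the correct answer is among t tied choices), the toy similarity table and the example questions are all hypothetical and are not actual TOEFL items.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# A similarity measure maps a pair of words to a score (higher = more similar).
Similarity = Callable[[str, str], float]


@dataclass
class SynonymQuestion:
    stem: str               # the word in the question
    choices: Sequence[str]  # candidate synonyms (four per TOEFL question)
    answer: str             # the correct choice


def score_question(q: SynonymQuestion, sim: Similarity) -> float:
    """Credit for one question: pick the choice(s) most similar to the stem.

    Assumption: if t choices tie for the highest score, the question earns
    1/t credit when the correct answer is among them, else 0.
    """
    scores = [sim(q.stem, c) for c in q.choices]
    best = max(scores)
    tied = [c for c, s in zip(q.choices, scores) if s == best]
    return 1.0 / len(tied) if q.answer in tied else 0.0


def percent_correct(questions: Sequence[SynonymQuestion], sim: Similarity) -> float:
    """Percentage of questions answered correctly (with fractional tie credit)."""
    total = sum(score_question(q, sim) for q in questions)
    return 100.0 * total / len(questions)


# Hypothetical toy similarity table (NOT taken from any published measure).
TOY_SCORES = {
    frozenset(("big", "large")): 0.9,
    frozenset(("big", "small")): 0.4,
    frozenset(("fast", "rapid")): 0.8,
    frozenset(("fast", "slow")): 0.5,
}


def toy_sim(a: str, b: str) -> float:
    return TOY_SCORES.get(frozenset((a, b)), 0.0)


# Hypothetical questions in the TOEFL format.
questions = [
    SynonymQuestion("big", ["large", "small", "red", "quick"], "large"),
    SynonymQuestion("fast", ["slow", "rapid", "blue", "tall"], "rapid"),
    # No scores known: a four-way tie, so 1/4 credit, the same as random guessing.
    SynonymQuestion("happy", ["glad", "sad", "green", "heavy"], "glad"),
]

print(percent_correct(questions, toy_sim))  # → 75.0
```

With four choices and no information, the tie rule gives 1/4 credit per question, which lines up with the 25% random-guessing baseline.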
Table of results
|Algorithm||Reference for algorithm||Reference for experiment||Type||Correct||95% confidence|
|RES||Resnik (1995)||Jarmasz and Szpakowicz (2003)||Hybrid||20.31%||12.89–31.83%|
|LC||Leacock and Chodorow (1998)||Jarmasz and Szpakowicz (2003)||Lexicon-based||21.88%||13.91–33.21%|
|LIN||Lin (1998)||Jarmasz and Szpakowicz (2003)||Hybrid||24.06%||15.99–35.94%|
|Random||Random guessing||1 / 4 = 25.00%||Random||25.00%||15.99–35.94%|
|JC||Jiang and Conrath (1997)||Jarmasz and Szpakowicz (2003)||Hybrid||25.00%||15.99–35.94%|
|LSA||Landauer and Dumais (1997)||Landauer and Dumais (1997)||Corpus-based||64.38%||52.90–74.80%|
|Human||Average US college applicant from a non-English-speaking country||Landauer and Dumais (1997)||Human||64.50%||53.01–74.88%|
|PMI-IR||Turney (2001)||Turney (2001)||Corpus-based||73.75%||62.71–82.96%|
|HSO||Hirst and St-Onge (1998)||Jarmasz and Szpakowicz (2003)||Lexicon-based||77.91%||68.17–87.11%|
|JS||Jarmasz and Szpakowicz (2003)||Jarmasz and Szpakowicz (2003)||Lexicon-based||78.75%||68.17–87.11%|
|PMI-IR||Terra and Clarke (2003)||Terra and Clarke (2003)||Corpus-based||81.25%||70.97–89.11%|
|PPMIC||Bullinaria and Levy (2006)||Bullinaria and Levy (2006)||Corpus-based||85.00%||75.26–92.00%|
|GLSA||Matveeva et al. (2005)||Matveeva et al. (2005)||Corpus-based||86.25%||76.73–92.93%|
|LSA||Rapp (2003)||Rapp (2003)||Corpus-based||92.50%||84.39–97.20%|
|PR||Turney et al. (2003)||Turney et al. (2003)||Hybrid||97.50%||91.26–99.70%|
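The 95% confidence column appears consistent with the exact (Clopper-Pearson) binomial interval on n = 80 questions: for example, 78 of 80 correct gives roughly 91.26% to 99.70%, as in the PR row. The sketch below computes that interval with the standard library only, by bisecting the binomial CDF; the function names are hypothetical, and it takes a whole number k of correct answers.

```python
from math import comb
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _bisect(g: Callable[[float], float], target: float, increasing: bool,
            tol: float = 1e-12) -> float:
    """Find p in [0, 1] with g(p) == target, for g monotone in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (g(mid) < target) == increasing:
            lo = mid  # crossing point is at or above mid
        else:
            hi = mid  # crossing point is at or below mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Exact (Clopper-Pearson) confidence interval for k successes in n trials."""
    alpha = 1.0 - confidence
    # Lower bound: the p at which P(X >= k) = alpha/2 (0 when k == 0).
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2, increasing=True)
    # Upper bound: the p at which P(X <= k) = alpha/2 (1 when k == n).
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p), alpha / 2, increasing=False)
    return lower, upper


for k in (20, 78):  # 25.00% and 97.50% of 80 questions
    lo, hi = clopper_pearson(k, 80)
    print(f"{k}/80 = {k / 80:.2%}: {lo:.2%} to {hi:.2%}")
```

Where SciPy is available, `scipy.stats.beta.ppf` gives the same bounds directly (lower = `beta.ppf(alpha/2, k, n - k + 1)`, upper = `beta.ppf(1 - alpha/2, k + 1, n - k)`); bisection just keeps the sketch dependency-free.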
Explanation of Table
- Algorithm = name of algorithm
- Reference for algorithm = where to find out more about the given algorithm
- Reference for experiment = where to find out more about the evaluation of the given algorithm with the TOEFL questions
- Type = general type of algorithm: corpus-based, lexicon-based, or hybrid
- Correct = percentage of the 80 questions that the given algorithm answered correctly
- 95% confidence = 95% confidence interval for the percentage correct, calculated using the exact binomial test (Clopper-Pearson interval)
- table rows sorted in order of increasing percent correct
- several WordNet-based similarity measures are implemented in Ted Pedersen's WordNet::Similarity package
- LSA = Latent Semantic Analysis
- PMI-IR = Pointwise Mutual Information and Information Retrieval
- PR = Product Rule
- PPMIC = Positive Pointwise Mutual Information with Cosine
- GLSA = Generalized Latent Semantic Analysis
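PPMIC combines positive pointwise mutual information with cosine similarity. The sketch below is a generic illustration of that combination on a toy corpus. The symmetric two-word window, the absence of smoothing or frequency cut-offs, the corpus and all function names are assumptions, not the settings reported by Bullinaria and Levy (2006).

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def cooccurrence_counts(tokens: List[str], window: int = 2) -> Counter:
    """Count (target, context) pairs within +/- `window` tokens (assumed window size)."""
    counts: Counter = Counter()
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[(w, tokens[j])] += 1
    return counts


def ppmi_vectors(counts: Counter) -> Dict[str, Vector]:
    """PPMI(w, c) = max(0, log p(w,c) / (p(w) p(c))); only positive cells are stored."""
    total = sum(counts.values())
    row: Counter = Counter()
    col: Counter = Counter()
    for (w, c), n in counts.items():
        row[w] += n
        col[c] += n
    vectors: Dict[str, Vector] = {}
    for (w, c), n in counts.items():
        # p(w,c) / (p(w) p(c)) = n * total / (row[w] * col[c])
        pmi = math.log(n * total / (row[w] * col[c]))
        if pmi > 0:
            vectors.setdefault(w, {})[c] = pmi
    return vectors


def cosine(u: Vector, v: Vector) -> float:
    """Cosine of two sparse vectors; 0.0 if either is empty."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical toy corpus; real experiments typically use corpora of millions of words.
corpus = "the cat drinks milk . the dog drinks milk . the car needs fuel".split()
vecs = ppmi_vectors(cooccurrence_counts(corpus))
print(cosine(vecs.get("cat", {}), vecs.get("dog", {})),
      cosine(vecs.get("cat", {}), vecs.get("car", {})))
```

Because "cat" and "dog" share contexts ("drinks", "milk") that "car" lacks, the first cosine should be clearly higher than the second. A function like `cosine` over such vectors can serve as the similarity measure when a synonym question is answered by choosing the most similar candidate.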
Bullinaria, J.A., and Levy, J.P. (2006). Extracting semantic representations from word co-occurrence statistics: A computational study. To appear in Behavior Research Methods, 38.
Hirst, G., and St-Onge, D. (1998). Lexical chains as representations of context for the detection and correction of malapropisms. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 305–332.
Jarmasz, M., and Szpakowicz, S. (2003). Roget’s thesaurus and semantic similarity. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, September, pp. 212–219.
Jiang, J.J., and Conrath, D.W. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the International Conference on Research in Computational Linguistics, Taiwan.
Landauer, T.K., and Dumais, S.T. (1997). A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211–240.
Leacock, C., and Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 265–283.
Lin, D. (1998). An information-theoretic definition of similarity. Proceedings of the 15th International Conference on Machine Learning (ICML-98), Madison, WI, pp. 296–304.
Matveeva, I., Levow, G., Farahat, A., and Royer, C. (2005). Generalized latent semantic analysis for term representation. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-05), Borovets, Bulgaria.
Rapp, R. (2003). Word sense discovery based on sense descriptor dissimilarity. Proceedings of the Ninth Machine Translation Summit, pp. 315–322.
Resnik, P. (1995). Using information content to evaluate semantic similarity. Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Montreal, pp. 448–453.
Terra, E., and Clarke, C.L.A. (2003). Frequency estimates for statistical word similarity measures. Proceedings of the Human Language Technology and North American Chapter of the Association for Computational Linguistics Conference 2003 (HLT/NAACL 2003), pp. 244–251.
Turney, P.D. (2001). Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001), Freiburg, Germany, pp. 491–502.
Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). Combining independent modules to solve multiple-choice synonym and analogy problems. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, pp. 482–489.