TOEFL Synonym Questions (State of the art)
- TOEFL = Test of English as a Foreign Language
- 80 multiple-choice synonym questions; 4 choices per question
- TOEFL questions available from Thomas Landauer
- introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring similarity
- subsequently used by many other researchers
- Reference for algorithm = where to find out more about given algorithm for measuring similarity
- Reference for experiment = where to find out more about evaluation of given algorithm with TOEFL questions
- Algorithm = general type of algorithm: corpus-based, lexicon-based, or hybrid (the human baseline row is labelled human)
- Correct = percent of 80 questions that given algorithm answered correctly
- 95% confidence = confidence interval calculated using Binomial Exact Test
- table rows sorted in order of increasing percent correct
Reference for algorithm | Reference for experiment | Algorithm | Correct | 95% confidence |
---|---|---|---|---|
Resnik (1995) | Jarmasz and Szpakowicz (2003) | hybrid | 20.31% | 12.89–31.83% |
Leacock and Chodorow (1998) | Jarmasz and Szpakowicz (2003) | lexicon-based | 21.88% | 13.91–33.21% |
Lin (1998) | Jarmasz and Szpakowicz (2003) | hybrid | 24.06% | 15.99–35.94% |
Jiang and Conrath (1997) | Jarmasz and Szpakowicz (2003) | hybrid | 25.00% | 15.99–35.94% |
Landauer and Dumais (1997) | Landauer and Dumais (1997) | corpus-based | 64.38% | 52.90–74.80% |
Average US college applicant from a non-English-speaking country | Landauer and Dumais (1997) | human | 64.50% | 53.01–74.88% |
Turney (2001) | Turney (2001) | corpus-based | 73.75% | 62.71–82.96% |
Hirst and St-Onge (1998) | Jarmasz and Szpakowicz (2003) | lexicon-based | 77.91% | 68.17–87.11% |
Jarmasz and Szpakowicz (2003) | Jarmasz and Szpakowicz (2003) | lexicon-based | 78.75% | 68.17–87.11% |
Terra and Clarke (2003) | Terra and Clarke (2003) | corpus-based | 81.25% | 70.97–89.11% |
Rapp (2003) | Rapp (2003) | corpus-based | 92.50% | 84.39–97.20% |
Turney et al. (2003) | Turney et al. (2003) | hybrid | 97.50% | 91.26–99.70% |
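
The 95% confidence column can be approximated with a short script. The sketch below assumes that "Binomial Exact Test" refers to the Clopper-Pearson exact interval, which the page does not state explicitly. The function names (`binom_cdf`, `exact_interval`) are illustrative and do not come from any of the cited papers. It uses only the Python standard library and finds each bound by bisection on the binomial distribution.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_of_increasing(f, iters: int = 60) -> float:
    """Bisection on [0, 1] for a function that rises from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_interval(k: int, n: int = 80, alpha: float = 0.05):
    """Clopper-Pearson interval for k correct answers out of n questions.

    Assumption: this is the 'exact' interval meant by the table; it is a
    sketch, not the script that produced the published figures.
    """
    half = alpha / 2
    # Lower bound: the p at which P(X >= k) equals alpha/2 (0 when k == 0).
    lower = 0.0 if k == 0 else _root_of_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - half)
    # Upper bound: the p at which P(X <= k) equals alpha/2 (1 when k == n).
    upper = 1.0 if k == n else _root_of_increasing(
        lambda p: half - binom_cdf(k, n, p))
    return lower, upper


low, high = exact_interval(78)  # 78 of 80 correct, i.e. 97.50%
print(f"{low:.2%} to {high:.2%}")
```

For 78 correct answers out of 80, the result should come out close to the 91.26–99.70% interval in the last table row. Rows whose percentage does not correspond to a whole number of questions (for example 64.38%) cannot be reproduced directly this way, since the sketch expects an integer count.
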
Hirst, G., and St-Onge, D. (1998). Lexical chains as representation of context for the detection and correction of malapropisms. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 305-332.
Jarmasz, M., and Szpakowicz, S. (2003). Roget’s thesaurus and semantic similarity. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, September, pp. 212-219.
Jiang, J.J., and Conrath, D.W. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the International Conference on Research in Computational Linguistics, Taiwan.
Leacock, C., and Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 265-283.
Lin, D. (1998). An information-theoretic definition of similarity. Proceedings of the 15th International Conference on Machine Learning (ICML-98), Madison, WI, pp. 296-304.
Resnik, P. (1995). Using information content to evaluate semantic similarity. Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Montreal, pp. 448-453.