TOEFL Synonym Questions (State of the art)

From ACL Wiki
Revision as of 20:29, 11 May 2007 by Pdturney (talk | contribs)
  • TOEFL = Test of English as a Foreign Language
  • 80 multiple-choice synonym questions; 4 choices per question
  • TOEFL questions available from Thomas Landauer
  • introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring similarity
  • subsequently used by many other researchers
  • Reference for algorithm = where to find out more about given algorithm for measuring similarity
  • Reference for experiment = where to find out more about evaluation of given algorithm with TOEFL questions
  • Algorithm = general type of algorithm: corpus-based, lexicon-based, hybrid
  • Correct = percent of 80 questions that given algorithm answered correctly
  • 95% confidence = 95% confidence interval for the percent correct, calculated using the Binomial Exact Test
  • table rows sorted in order of increasing percent correct
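
The 95% confidence column can be illustrated with a short sketch. It assumes that "Binomial Exact Test" refers to the Clopper-Pearson construction, which is one common exact-binomial interval; the page does not name the exact procedure. The function names (`binom_cdf`, `solve_increasing`, `exact_binomial_ci`) are hypothetical helpers written for this illustration, and only the Python standard library is used.

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_increasing(g, target, iters=60):
    """Bisection on [0, 1] for g(p) = target, where g is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(k, n, conf=0.95):
    """Clopper-Pearson interval for k successes in n trials (assumed method)."""
    alpha = 1 - conf
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2,
    # i.e. P(X > k) equals 1 - alpha / 2.
    upper = 1.0 if k == n else solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


# Example: 78 of 80 questions correct (97.50%).
low, high = exact_binomial_ci(78, 80)
print(f"{78 / 80:.2%} correct, 95% CI {low:.2%} to {high:.2%}")
```

Under this assumption, 78 correct answers out of 80 give an interval of roughly 91.26% to 99.70%, which agrees with the interval listed for the 97.50% row of the table.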


| Reference for algorithm | Reference for experiment | Algorithm | Correct | 95% confidence |
|---|---|---|---|---|
| Resnik (1995) | Jarmasz and Szpakowicz (2003) | hybrid | 20.31% | 12.89–31.83% |
| Leacock and Chodorow (1998) | Jarmasz and Szpakowicz (2003) | lexicon-based | 21.88% | 13.91–33.21% |
| Lin (1998) | Jarmasz and Szpakowicz (2003) | hybrid | 24.06% | 15.99–35.94% |
| Jiang and Conrath (1997) | Jarmasz and Szpakowicz (2003) | hybrid | 25.00% | 15.99–35.94% |
| Landauer and Dumais (1997) | Landauer and Dumais (1997) | corpus-based | 64.38% | 52.90–74.80% |
| Average non-English US college applicant | Landauer and Dumais (1997) | human | 64.50% | 53.01–74.88% |
| Turney (2001) | Turney (2001) | corpus-based | 73.75% | 62.71–82.96% |
| Hirst and St.-Onge (1998) | Jarmasz and Szpakowicz (2003) | lexicon-based | 77.91% | 68.17–87.11% |
| Jarmasz and Szpakowicz (2003) | Jarmasz and Szpakowicz (2003) | lexicon-based | 78.75% | 68.17–87.11% |
| Terra and Clarke (2003) | Terra and Clarke (2003) | corpus-based | 81.25% | 70.97–89.11% |
| Rapp (2003) | Rapp (2003) | corpus-based | 92.50% | 84.39–97.20% |
| Turney et al. (2003) | Turney et al. (2003) | hybrid | 97.50% | 91.26–99.70% |


Jarmasz, M., and Szpakowicz, S. (2003). Roget’s thesaurus and semantic similarity. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, September, pp. 212–219.