Difference between revisions of "TOEFL Synonym Questions (State of the art)"


Revision as of 19:29, 11 May 2007

  • TOEFL = Test of English as a Foreign Language
  • 80 multiple-choice synonym questions; 4 choices per question
  • TOEFL questions available from Thomas Landauer (http://www.pearsonkt.com/bioLandauer.shtml)
  • introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring similarity
  • subsequently used by many other researchers
  • Reference for algorithm = where to find out more about given algorithm for measuring similarity
  • Reference for experiment = where to find out more about evaluation of given algorithm with TOEFL questions
  • Algorithm = general type of algorithm: corpus-based, lexicon-based, hybrid
  • Correct = percent of 80 questions that given algorithm answered correctly
  • 95% confidence = confidence interval calculated using the Binomial Exact Test (http://home.clara.net/sisa/onemean.htm)
  • table rows sorted in order of increasing percent correct
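
The 95% confidence column can be approximated offline. The following is a minimal sketch, assuming the Binomial Exact Test referred to above corresponds to the Clopper-Pearson exact interval and that the score is a whole number of correct answers out of 80. The function names are illustrative and are not taken from the cited tool.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(h, target, iters=60):
    """Bisection on [0, 1] for an increasing function h with h(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if h(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes in n trials (assumed method)."""
    # Lower bound: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2
    )
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2
    )
    return lower, upper


if __name__ == "__main__":
    # 78 of 80 correct is the 97.50% score in the table.
    low, high = exact_binomial_ci(78, 80)
    print(f"{100 * low:.2f}% to {100 * high:.2f}%")
```

For 78 of 80 correct, the result should land close to the interval given in the table for the 97.50% row. Scores that do not correspond to a whole number of questions (for example 20.31%) are not handled by this sketch.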


| Reference for algorithm | Reference for experiment | Algorithm | Correct | 95% confidence |
|---|---|---|---|---|
| Resnik (1995) | Jarmasz and Szpakowicz (2003) | hybrid | 20.31% | 12.89–31.83% |
| Leacock and Chodorow (1998) | Jarmasz and Szpakowicz (2003) | lexicon-based | 21.88% | 13.91–33.21% |
| Lin (1998) | Jarmasz and Szpakowicz (2003) | hybrid | 24.06% | 15.99–35.94% |
| Jiang and Conrath (1997) | Jarmasz and Szpakowicz (2003) | hybrid | 25.00% | 15.99–35.94% |
| Landauer and Dumais (1997) | Landauer and Dumais (1997) | corpus-based | 64.38% | 52.90–74.80% |
| Average non-English US college applicant | Landauer and Dumais (1997) | human | 64.50% | 53.01–74.88% |
| Turney (2001) | Turney (2001) | corpus-based | 73.75% | 62.71–82.96% |
| Hirst and St.-Onge (1998) | Jarmasz and Szpakowicz (2003) | lexicon-based | 77.91% | 68.17–87.11% |
| Jarmasz and Szpakowicz (2003) | Jarmasz and Szpakowicz (2003) | lexicon-based | 78.75% | 68.17–87.11% |
| Terra and Clarke (2003) | Terra and Clarke (2003) | corpus-based | 81.25% | 70.97–89.11% |
| Rapp (2003) | Rapp (2003) | corpus-based | 92.50% | 84.39–97.20% |
| Turney et al. (2003) | Turney et al. (2003) | hybrid | 97.50% | 91.26–99.70% |


Jarmasz, M., and Szpakowicz, S. (2003). Roget’s thesaurus and semantic similarity, Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, September, pp. 212–219. Available at http://www.site.uottawa.ca/~mjarmasz/pubs/jarmasz_roget_sim.pdf