TOEFL Synonym Questions (State of the art)

From ACL Wiki
Revision as of 04:21, 8 January 2008 by Edward518 (talk | contribs)

Jump to: navigation, search
  • TOEFL = Test of English as a Foreign Language
  • 80 multiple-choice synonym questions; 4 choices per question
  • TOEFL questions available from Thomas Landauer
  • introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring degree of similarity between two words
  • subsequently used by many other researchers

Sample question

Stem: levied
Choices: (a) imposed
(b) believed
(c) requested
(d) correlated
Solution: (a) imposed

Table of results

Algorithm Reference for algorithm Reference for experiment Type Correct 95% confidence
RES Resnik (1995) Jarmasz and Szpakowicz (2003) Hybrid 20.31% 12.89–31.83%
LC Leacock and Chodrow (1998) Jarmasz and Szpakowicz (2003) Lexicon-based 21.88% 13.91–33.21%
LIN Lin (1998) Jarmasz and Szpakowicz (2003) Hybrid 24.06% 15.99–35.94%
Random Random guessing 1 / 4 = 25.00% Random 25.00% 15.99–35.94%
JC Jiang and Conrath (1997) Jarmasz and Szpakowicz (2003) Hybrid 25.00% 15.99–35.94%
LSA Landauer and Dumais (1997) Landauer and Dumais (1997) Corpus-based 64.38% 52.90–74.80%
Human Average non-English US college applicant Landauer and Dumais (1997) Human 64.50% 53.01–74.88%
DS Pado and Lapata (2007) Pado and Lapata (2007) Corpus-based 73.00% 62.72-82.96%
PMI-IR Turney (2001) Turney (2001) Corpus-based 73.75% 62.72–82.96%
HSO Hirst and St.-Onge (1998) Jarmasz and Szpakowicz (2003) Lexicon-based 77.91% 68.17–87.11%
JS Jarmasz and Szpakowicz (2003) Jarmasz and Szpakowicz (2003) Lexicon-based 78.75% 68.17–87.11%
PMI-IR Terra and Clarke (2003) Terra and Clarke (2003) Corpus-based 81.25% 70.97–89.11%
CWO Ruiz-Casado et al. (2005) Ruiz-Casado et al. (2005) Web-based 82.55% 72.38–90.09%
PPMIC Bullinaria and Levy (2006) Bullinaria and Levy (2006) Corpus-based 85.00% 75.26-92.00%
GLSA Matveeva et al. (2005) Matveeva et al. (2005) Corpus-based 86.25% 76.73-92.93%
LSA Rapp (2003) Rapp (2003) Corpus-based 92.50% 84.39-97.20%
PR Turney et al. (2003) Turney et al. (2003) Hybrid 97.50% 91.26–99.70%

Explanation of table

  • Algorithm = name of algorithm
  • Reference for algorithm = where to find out more about given algorithm
  • Reference for experiment = where to find out more about evaluation of given algorithm with TOEFL questions
  • Type = general type of algorithm: corpus-based, lexicon-based, hybrid
  • Correct = percent of 80 questions that given algorithm answered correctly
  • 95% confidence = confidence interval calculated using Binomial Exact Test
  • table rows sorted in order of increasing percent correct
  • several WordNet-based similarity measures are implemented in Ted Pedersen's WordNet::Similarity package
  • LSA = Latent Semantic Analysis
  • PMI-IR = Pointwise Mutual Information - Information Retrieval
  • PR = Product Rule
  • PPMIC = Positive Pointwise Mutual Information with Cosine
  • GLSA = Generalized Latent Semantic Analysis
  • CWO = Context Window Overlapping
  • DS = Dependency Space


  • the performance of a corpus-based algorithm depends on the corpus, so the difference in performance between two corpus-based systems may be due to the different corpora, rather than the different algorithms
  • the TOEFL questions include nouns, verbs, and adjectives, but some of the WordNet-based algorithms were only designed to work with nouns
  • some of the algorithms may have been tuned on the TOEFL questions


Bullinaria, J.A., and Levy, J.P. (2006). Extracting semantic representations from word co-occurrence statistics: A computational study. To appear in Behavior Research Methods, 38.

Hirst, G., and St-Onge, D. (1998). Lexical chains as representation of context for the detection and correction of malapropisms. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, 305-332.

Jarmasz, M., and Szpakowicz, S. (2003). Roget’s thesaurus and semantic similarity, Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, September, pp. 212-219.

Jiang, J.J., and Conrath, D.W. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the International Conference on Research in Computational Linguistics, Taiwan.

Landauer, T.K., and Dumais, S.T. (1997). A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211–240.

Leacock, C., and Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 265-283.

Lin, D. (1998). An information-theoretic definition of similarity. Proceedings of the 15th International Conference on Machine Learning (ICML-98), Madison, WI, pp. 296-304.

Matveeva, I., Levow, G., Farahat, A., and Royer, C. (2005). Generalized latent semantic analysis for term representation. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-05), Borovets, Bulgaria.

Pado, S., and Lapata, M. (2007). Dependency-based construction of semantic space models. Computational Linguistics, 33(2), 161-199.

Rapp, R. (2003). Word sense discovery based on sense descriptor dissimilarity. Proceedings of the Ninth Machine Translation Summit, pp. 315-322.

Resnik, P. (1995). Using information content to evaluate semantic similarity. Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Montreal, pp. 448-453.

Ruiz-Casado, M., Alfonseca, E. and Castells, P. (2005) Using context-window overlapping in Synonym Discovery and Ontology Extension. Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP-2005), Borovets, Bulgaria.

Terra, E., and Clarke, C.L.A. (2003). Frequency estimates for statistical word similarity measures. Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003 (HLT/NAACL 2003), pp. 244–251.

Turney, P.D. (2001). Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001), Freiburg, Germany, pp. 491-502.

Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). Combining independent modules to solve multiple-choice synonym and analogy problems. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, pp. 482-489.

See also


火车票 订火车票 北京火车票 火车票预定 火车票预订 火车票查询 北京火车票预定 北京火车票查询 北京火车票预订 火车票 订火车票 北京火车票 火车票预定 火车票预订 火车票查询 北京火车票预定 北京火车票查询 北京火车票预订 搬场 搬家 上海搬场 上海搬场公司上海搬场 搬家公司 上海搬家公司 上海搬家 婚庆 婚庆公司 婚庆网 搜索引擎优化 网络营销