TOEFL Synonym Questions (State of the art)

TOEFL = Test of English as a Foreign Language
80 multiple-choice synonym questions; 4 choices per question
TOEFL questions available from Thomas Landauer
introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring degree of similarity between two words
subsequently used by many other researchers

Sample question

Stem:		levied
Choices:	(a)	imposed
	(b)	believed
	(c)	requested
	(d)	correlated
Solution:	(a)	imposed
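A similarity-based algorithm answers a question like this by scoring the stem against each choice and picking the highest-scoring choice. The sketch below shows that selection step and percent-correct scoring under simple assumptions. `toy_sim` and its scores are hypothetical stand-ins for a real measure such as LSA or PMI-IR. Ties go to the earliest choice, which may differ from how the papers in the table handle ties.

```python
from typing import Callable, Sequence

# A similarity measure maps a pair of words to a score (higher = more similar).
Similarity = Callable[[str, str], float]


def answer_question(stem: str, choices: Sequence[str], sim: Similarity) -> str:
    """Pick the choice that `sim` rates as most similar to the stem.

    Ties go to the earliest choice, because max() keeps the first maximum.
    """
    return max(choices, key=lambda choice: sim(stem, choice))


def percent_correct(questions, sim: Similarity) -> float:
    """Score a list of (stem, choices, solution) triples as a percentage."""
    right = sum(answer_question(stem, choices, sim) == solution
                for stem, choices, solution in questions)
    return 100.0 * right / len(questions)


# Hypothetical, made-up scores standing in for a real similarity measure.
TOY_SCORES = {
    ("levied", "imposed"): 0.8,
    ("levied", "believed"): 0.1,
    ("levied", "requested"): 0.3,
    ("levied", "correlated"): 0.05,
}


def toy_sim(a: str, b: str) -> float:
    return TOY_SCORES.get((a, b), 0.0)


sample = ("levied", ["imposed", "believed", "requested", "correlated"], "imposed")
print(answer_question(sample[0], sample[1], toy_sim))  # imposed
print(percent_correct([sample], toy_sim))              # 100.0
```

Keeping the measure as a plain function argument means any of the algorithms in the table could, in principle, be plugged into the same scoring loop.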

Table of results

Algorithm	Reference for algorithm	Reference for experiment	Type	Correct	95% confidence
RES	Resnik (1995)	Jarmasz and Szpakowicz (2003)	Hybrid	20.31%	12.89–31.83%
LC	Leacock and Chodorow (1998)	Jarmasz and Szpakowicz (2003)	Lexicon-based	21.88%	13.91–33.21%
LIN	Lin (1998)	Jarmasz and Szpakowicz (2003)	Hybrid	24.06%	15.99–35.94%
Random	Random guessing	1 / 4 = 25.00%	Random	25.00%	15.99–35.94%
JC	Jiang and Conrath (1997)	Jarmasz and Szpakowicz (2003)	Hybrid	25.00%	15.99–35.94%
LSA	Landauer and Dumais (1997)	Landauer and Dumais (1997)	Corpus-based	64.38%	52.90–74.80%
Human	Average US college applicant from a non-English-speaking country	Landauer and Dumais (1997)	Human	64.50%	53.01–74.88%
PMI-IR	Turney (2001)	Turney (2001)	Corpus-based	73.75%	62.71–82.96%
HSO	Hirst and St-Onge (1998)	Jarmasz and Szpakowicz (2003)	Lexicon-based	77.91%	68.17–87.11%
JS	Jarmasz and Szpakowicz (2003)	Jarmasz and Szpakowicz (2003)	Lexicon-based	78.75%	68.17–87.11%
PMI-IR	Terra and Clarke (2003)	Terra and Clarke (2003)	Corpus-based	81.25%	70.97–89.11%
Context-Window Overlapping	Ruiz-Casado et al. (2005)	Ruiz-Casado et al. (2005)	Web-based	82.55%	72.38–90.09%
PPMIC	Bullinaria and Levy (2006)	Bullinaria and Levy (2006)	Corpus-based	85.00%	75.26–92.00%
GLSA	Matveeva et al. (2005)	Matveeva et al. (2005)	Corpus-based	86.25%	76.73–92.93%
LSA	Rapp (2003)	Rapp (2003)	Corpus-based	92.50%	84.39–97.20%
PR	Turney et al. (2003)	Turney et al. (2003)	Hybrid	97.50%	91.26–99.70%
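The 95% confidence column can be approximately reproduced with a short script, assuming it is the exact (Clopper-Pearson) binomial interval for k correct answers out of n = 80 questions. The sketch below finds each bound by bisection on the binomial CDF rather than through a statistics library. The function names are illustrative, not from any of the cited papers. For comparison, the table lists 91.26–99.70% for PR (78 of 80 correct).

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target: float, iters: int = 60) -> float:
    """Bisection for f(p) = target on [0, 1], where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still above target, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, confidence: float = 0.95):
    """Exact (Clopper-Pearson) two-sided interval for k successes in n trials."""
    alpha = 1.0 - confidence
    # Lower bound: P(X >= k | p) = alpha/2, i.e. P(X <= k-1 | p) = 1 - alpha/2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha/2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# PR row: 78 of 80 questions correct (97.50%).
lo, hi = clopper_pearson(78, 80)
print(f"{100 * lo:.2f}-{100 * hi:.2f}%")
```

Rows whose percent correct is not a whole number of questions out of 80 (for example 77.91%) cannot be fed to this function directly. This sketch does not try to guess how those intervals were obtained.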

Explanation of table

Algorithm = name of algorithm
Reference for algorithm = where to find out more about given algorithm
Reference for experiment = where to find out more about evaluation of given algorithm with TOEFL questions
Type = general type of algorithm: corpus-based, lexicon-based, hybrid, or web-based; the Random and Human rows are reference points
Correct = percent of 80 questions that given algorithm answered correctly
95% confidence = 95% confidence interval on percent correct, calculated using the exact binomial test
table rows sorted in order of increasing percent correct
several WordNet-based similarity measures are implemented in Ted Pedersen's WordNet::Similarity package
LSA = Latent Semantic Analysis
PMI-IR = Pointwise Mutual Information - Information Retrieval
PR = Product Rule
PPMIC = Positive Pointwise Mutual Information with Cosine
GLSA = Generalized Latent Semantic Analysis
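PPMIC scores a word pair by the cosine between the two words' positive-PMI context vectors. The sketch below shows the basic computation, assuming a plain word-by-context count matrix. It uses PMI(w, c) = log(p(w, c) / (p(w) p(c))) and clips negative values to zero. Bullinaria and Levy (2006) also vary the corpus, context window, and other settings, and none of those choices is modelled here. The counts and context columns are made up for illustration only.

```python
import math


def ppmi(counts):
    """Positive PMI weights for a word-by-context count matrix (list of rows)."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    weights = []
    for i, row in enumerate(counts):
        out = []
        for j, n_ij in enumerate(row):
            if n_ij == 0:
                out.append(0.0)  # PMI would be -inf; positive PMI clips it to 0
                continue
            # PMI = log(p(w, c) / (p(w) p(c))) = log(n_ij * N / (row_i * col_j))
            pmi = math.log(n_ij * total / (row_sums[i] * col_sums[j]))
            out.append(max(0.0, pmi))
        weights.append(out)
    return weights


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up counts: rows are target words, columns are four hypothetical contexts.
words = ["levied", "imposed", "believed"]
counts = [
    [10, 5, 0, 0],
    [8, 6, 1, 0],
    [0, 1, 7, 9],
]
vectors = dict(zip(words, ppmi(counts)))
print(cosine(vectors["levied"], vectors["imposed"]))
print(cosine(vectors["levied"], vectors["believed"]))  # 0.0: no shared positive contexts
```

A cosine function with this shape could serve as the `sim` argument when answering a TOEFL question: the choice whose vector points in the most similar direction to the stem's vector wins.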

References

Bullinaria, J.A., and Levy, J.P. (2006). Extracting semantic representations from word co-occurrence statistics: A computational study. To appear in Behavior Research Methods, 38.

Hirst, G., and St-Onge, D. (1998). Lexical chains as representations of context for the detection and correction of malapropisms. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 305–332.

Jarmasz, M., and Szpakowicz, S. (2003). Roget’s thesaurus and semantic similarity. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, September, pp. 212–219.

Jiang, J.J., and Conrath, D.W. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the International Conference on Research in Computational Linguistics, Taiwan.

Landauer, T.K., and Dumais, S.T. (1997). A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211–240.

Leacock, C., and Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 265–283.

Lin, D. (1998). An information-theoretic definition of similarity. Proceedings of the 15th International Conference on Machine Learning (ICML-98), Madison, WI, pp. 296–304.

Matveeva, I., Levow, G., Farahat, A., and Royer, C. (2005). Generalized latent semantic analysis for term representation. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-05), Borovets, Bulgaria.

Rapp, R. (2003). Word sense discovery based on sense descriptor dissimilarity. Proceedings of the Ninth Machine Translation Summit, pp. 315–322.

Resnik, P. (1995). Using information content to evaluate semantic similarity. Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Montreal, pp. 448–453.

Ruiz-Casado, M., Alfonseca, E., and Castells, P. (2005). Using context-window overlapping in synonym discovery and ontology extension. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-05), Borovets, Bulgaria.

Terra, E., and Clarke, C.L.A. (2003). Frequency estimates for statistical word similarity measures. Proceedings of the Human Language Technology and North American Chapter of the Association for Computational Linguistics Conference 2003 (HLT/NAACL 2003), pp. 244–251.

Turney, P.D. (2001). Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001), Freiburg, Germany, pp. 491–502.

Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). Combining independent modules to solve multiple-choice synonym and analogy problems. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, pp. 482–489.
