Difference between revisions of "TOEFL Synonym Questions (State of the art)"

Revision as of 06:25, 13 May 2007

TOEFL = Test of English as a Foreign Language
80 multiple-choice synonym questions; 4 choices per question
TOEFL questions available from Thomas Landauer
introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring similarity
subsequently used by many other researchers
Algorithm = name of algorithm
Reference for algorithm = where to find out more about given algorithm for measuring similarity
Reference for experiment = where to find out more about evaluation of given algorithm with TOEFL questions
Type = general type of algorithm: corpus-based, lexicon-based, hybrid
Correct = percent of 80 questions that given algorithm answered correctly
95% confidence = confidence interval calculated using Binomial Exact Test
table rows sorted in order of increasing percent correct
several WordNet-based similarity measures are implemented in Ted Pedersen's WordNet::Similarity package
LSA = Latent Semantic Analysis
PMI-IR = Pointwise Mutual Information - Information Retrieval
PR = Product Rule
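
The "95% confidence" column can be approximated with an exact binomial interval. The sketch below assumes "Binomial Exact Test" means the two-sided Clopper-Pearson interval; the page does not name the construction, and the function names `binom_cdf` and `exact_binomial_ci` are illustrative, not taken from any cited paper or package.

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_binomial_ci(k, n, alpha=0.05):
    """Two-sided Clopper-Pearson interval for k successes in n trials.

    Assumption: this is the construction behind the table's
    "95% confidence" column; the source does not spell it out.
    """

    def solve(f, target):
        # f is non-increasing in p on [0, 1]; bisect for f(p) == target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else solve(
        lambda p: binom_cdf(k, n, p), alpha / 2
    )
    return lower, upper


# Example: 78 of 80 questions correct (97.50%).
low, high = exact_binomial_ci(78, 80)
print(f"{100 * low:.2f}% to {100 * high:.2f}%")
```

For 78 correct answers out of 80, this should come out close to the 91.26–99.70% interval listed for the 97.50% row. Bisection on the binomial CDF keeps the sketch dependency-free; a statistics library's beta quantile function would give the same bounds more directly.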

Algorithm	Reference for algorithm	Reference for experiment	Type	Correct	95% confidence
RES	Resnik (1995)	Jarmasz and Szpakowicz (2003)	Hybrid	20.31%	12.89–31.83%
LC	Leacock and Chodorow (1998)	Jarmasz and Szpakowicz (2003)	Lexicon-based	21.88%	13.91–33.21%
LIN	Lin (1998)	Jarmasz and Szpakowicz (2003)	Hybrid	24.06%	15.99–35.94%
Random	Random guessing		Random	25.00%	15.99–35.94%
JC	Jiang and Conrath (1997)	Jarmasz and Szpakowicz (2003)	Hybrid	25.00%	15.99–35.94%
LSA	Landauer and Dumais (1997)	Landauer and Dumais (1997)	Corpus-based	64.38%	52.90–74.80%
Human	Average non-English US college applicant	Landauer and Dumais (1997)	Human	64.50%	53.01–74.88%
PMI-IR	Turney (2001)	Turney (2001)	Corpus-based	73.75%	62.71–82.96%
HSO	Hirst and St-Onge (1998)	Jarmasz and Szpakowicz (2003)	Lexicon-based	77.91%	68.17–87.11%
JS	Jarmasz and Szpakowicz (2003)	Jarmasz and Szpakowicz (2003)	Lexicon-based	78.75%	68.17–87.11%
PMI-IR	Terra and Clarke (2003)	Terra and Clarke (2003)	Corpus-based	81.25%	70.97–89.11%
LSA	Rapp (2003)	Rapp (2003)	Corpus-based	92.50%	84.39–97.20%
PR	Turney et al. (2003)	Turney et al. (2003)	Hybrid	97.50%	91.26–99.70%

Hirst, G., and St-Onge, D. (1998). Lexical chains as representation of context for the detection and correction of malapropisms. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 305-332.

Jarmasz, M., and Szpakowicz, S. (2003). Roget’s thesaurus and semantic similarity. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, September, pp. 212-219.

Jiang, J.J., and Conrath, D.W. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the International Conference on Research in Computational Linguistics, Taiwan.

Landauer, T.K., and Dumais, S.T. (1997). A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211–240.

Leacock, C., and Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 265-283.

Lin, D. (1998). An information-theoretic definition of similarity. Proceedings of the 15th International Conference on Machine Learning (ICML-98), Madison, WI, pp. 296-304.

Rapp, R. (2003). Word sense discovery based on sense descriptor dissimilarity. Proceedings of the Ninth Machine Translation Summit, pp. 315-322.

Resnik, P. (1995). Using information content to evaluate semantic similarity. Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Montreal, pp. 448-453.

Terra, E., and Clarke, C.L.A. (2003). Frequency estimates for statistical word similarity measures. Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003 (HLT/NAACL 2003), pp. 244-251.

Turney, P.D. (2001). Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001), Freiburg, Germany, pp. 491-502.

Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). Combining independent modules to solve multiple-choice synonym and analogy problems. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, pp. 482-489.

@@ Line 29: / Line 29: @@
 | Resnik (1995)
 | Jarmasz and Szpakowicz (2003)
-| hybrid
+| Hybrid
 | 20.31%
 | 12.89–31.83%
@@ Line 36: / Line 36: @@
 | Leacock and Chodrow (1998)
 | Jarmasz and Szpakowicz (2003)
-| lexicon-based
+| Lexicon-based
 | 21.88%
 | 13.91–33.21%
@@ Line 43: / Line 43: @@
 | Lin (1998)
 | Jarmasz and Szpakowicz (2003)
-| hybrid
+| Hybrid
 | 24.06%
+| 15.99–35.94%
+|-
+| Random
+| Random guessing
+|
+| Random
+| 25.00%
 | 15.99–35.94%
 |-
@@ Line 50: / Line 57: @@
 | Jiang and Conrath (1997)
 | Jarmasz and Szpakowicz (2003)
-| hybrid
+| Hybrid
 | 25.00%
 | 15.99–35.94%
@@ Line 57: / Line 64: @@
 | Landauer and Dumais (1997)
 | Landauer and Dumais (1997)
-| corpus-based
+| Corpus-based
 | 64.38%
 | 52.90–74.80%
 |-
-|
+| Human
 | Average non-English US college applicant
 | Landauer and Dumais (1997)
-| human
+| Human
 | 64.50%
 | 53.01–74.88%
@@ Line 71: / Line 78: @@
 | Turney (2001)
 | Turney (2001)
-| corpus-based
+| Corpus-based
 | 73.75%
 | 62.71–82.96%
@@ Line 78: / Line 85: @@
 | Hirst and St.-Onge (1998)
 | Jarmasz and Szpakowicz (2003)
-| lexicon-based
+| Lexicon-based
 | 77.91%
 | 68.17–87.11%
@@ Line 85: / Line 92: @@
 | Jarmasz and Szpakowicz (2003)
 | Jarmasz and Szpakowicz (2003)
-| lexicon-based
+| Lexicon-based
 | 78.75%
 | 68.17–87.11%
@@ Line 92: / Line 99: @@
 | Terra and Clarke (2003)
 | Terra and Clarke (2003)
-| corpus-based
+| Corpus-based
 | 81.25%
 | 70.97–89.11%
@@ Line 99: / Line 106: @@
 | Rapp (2003)
 | Rapp (2003)
-| corpus-based
+| Corpus-based
 | 92.50%
 | 84.39-97.20%
@@ Line 106: / Line 113: @@
 | Turney et al. (2003)
 | Turney et al. (2003)
-| hybrid
+| Hybrid
 | 97.50%
 | 91.26–99.70%
