Difference between revisions of "TOEFL Synonym Questions (State of the art)"

Revision as of 06:52, 12 May 2007

TOEFL = Test of English as a Foreign Language
80 multiple-choice synonym questions; 4 choices per question
TOEFL questions available from Thomas Landauer
introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring similarity
subsequently used by many other researchers
Algorithm = name of algorithm
Reference for algorithm = where to find out more about given algorithm for measuring similarity
Reference for experiment = where to find out more about evaluation of given algorithm with TOEFL questions
Algorithm (fourth column) = general type of algorithm: corpus-based, lexicon-based, hybrid
Correct = percent of 80 questions that given algorithm answered correctly
95% confidence = 95% confidence interval for the percent correct, calculated using the Binomial Exact Test
table rows sorted in order of increasing percent correct
several WordNet-based similarity measures are implemented in Ted Pedersen's WordNet::Similarity package
LSA = Latent Semantic Analysis
PMI-IR = Pointwise Mutual Information - Information Retrieval
PR = Product Rule
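
How a similarity measure is scored on these questions can be sketched roughly as follows. This is an illustration only: the three questions and the scores in `TOY_SCORES` are made up (they are not TOEFL items), and `toy_similarity` is a hypothetical stand-in for whichever measure is being evaluated. The sketch assumes the usual approach, in which the choice most similar to the stem word is taken as the answer and Correct is the percentage of questions answered correctly.

```python
from typing import Callable, Sequence, Tuple

# (stem word, answer choices, correct choice) -- invented toy data, not TOEFL items
Question = Tuple[str, Sequence[str], str]

# Invented similarity scores; unlisted pairs default to 0.0
TOY_SCORES = {
    ("big", "large"): 0.9, ("big", "small"): 0.4,
    ("fast", "quick"): 0.8, ("fast", "late"): 0.2,
    ("happy", "sad"): 0.7, ("happy", "glad"): 0.6,  # an antonym outscoring the synonym
}

def toy_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for a real similarity measure."""
    return TOY_SCORES.get((a, b), 0.0)

def answer(stem: str, choices: Sequence[str],
           similarity: Callable[[str, str], float]) -> str:
    """Pick the choice with the highest similarity to the stem word."""
    return max(choices, key=lambda choice: similarity(stem, choice))

def percent_correct(questions: Sequence[Question],
                    similarity: Callable[[str, str], float]) -> float:
    """Percentage of questions for which the selected choice is the key."""
    hits = sum(answer(stem, choices, similarity) == key
               for stem, choices, key in questions)
    return 100.0 * hits / len(questions)

toy_questions = [
    ("big", ["small", "large", "red", "slow"], "large"),
    ("fast", ["quick", "tall", "green", "late"], "quick"),
    ("happy", ["sad", "glad", "round", "cold"], "glad"),
]

print(f"{percent_correct(toy_questions, toy_similarity):.2f}% correct")
```

With the real test, `questions` would hold the 80 items with 4 choices each, and `similarity` would be one of the measures listed in the table.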

Algorithm	Reference for algorithm	Reference for experiment	Algorithm	Correct	95% confidence
RES	Resnik (1995)	Jarmasz and Szpakowicz (2003)	hybrid	20.31%	12.89–31.83%
LC	Leacock and Chodorow (1998)	Jarmasz and Szpakowicz (2003)	lexicon-based	21.88%	13.91–33.21%
LIN	Lin (1998)	Jarmasz and Szpakowicz (2003)	hybrid	24.06%	15.99–35.94%
JC	Jiang and Conrath (1997)	Jarmasz and Szpakowicz (2003)	hybrid	25.00%	15.99–35.94%
LSA	Landauer and Dumais (1997)	Landauer and Dumais (1997)	corpus-based	64.38%	52.90–74.80%
	Average non-English US college applicant	Landauer and Dumais (1997)	human	64.50%	53.01–74.88%
PMI-IR	Turney (2001)	Turney (2001)	corpus-based	73.75%	62.71–82.96%
HSO	Hirst and St-Onge (1998)	Jarmasz and Szpakowicz (2003)	lexicon-based	77.91%	68.17–87.11%
JS	Jarmasz and Szpakowicz (2003)	Jarmasz and Szpakowicz (2003)	lexicon-based	78.75%	68.17–87.11%
PMI-IR	Terra and Clarke (2003)	Terra and Clarke (2003)	corpus-based	81.25%	70.97–89.11%
RAP	Rapp (2003)	Rapp (2003)	corpus-based	92.50%	84.39–97.20%
PR	Turney et al. (2003)	Turney et al. (2003)	hybrid	97.50%	91.26–99.70%
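
The 95% confidence column can be approximately reproduced from the number of correct answers. The sketch below is not taken from the cited papers. It assumes that "Binomial Exact Test" refers to the Clopper-Pearson interval, and it finds the bounds by bisection using only the Python standard library. The function names `binom_cdf` and `exact_interval` are illustrative.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def exact_interval(k: int, n: int, alpha: float = 0.05, tol: float = 1e-10):
    """Clopper-Pearson (exact binomial) interval for k successes in n trials.

    Assumption: this is the interval meant by "Binomial Exact Test".
    """
    def solve(cdf_at, target):
        # cdf_at(p) decreases as p grows, so bisect on [0, 1] for cdf_at(p) = target
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if cdf_at(mid) > target:
                lo = mid  # p is still too small
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: the p at which P(X >= k) = alpha/2, i.e. P(X <= k-1) = 1 - alpha/2
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: the p at which P(X <= k) = alpha/2
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper

# 97.50% correct corresponds to 78 of the 80 questions
low, high = exact_interval(78, 80)
print(f"{100 * low:.2f}% to {100 * high:.2f}%")
```

Under that assumption, `exact_interval(78, 80)` should land close to the 91.26–99.70% range listed for 97.50% correct. Rows whose percent correct is not a whole number of questions out of 80 (for example 24.06%) cannot be checked this way without knowing how the score was computed.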

Hirst, G., and St-Onge, D. (1998). Lexical chains as representation of context for the detection and correction of malapropisms. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 305-332.

Jarmasz, M., and Szpakowicz, S. (2003). Roget’s thesaurus and semantic similarity. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, September, pp. 212-219.

Jiang, J.J., and Conrath, D.W. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the International Conference on Research in Computational Linguistics, Taiwan.

Leacock, C., and Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 265-283.

Lin, D. (1998). An information-theoretic definition of similarity. Proceedings of the 15th International Conference on Machine Learning (ICML-98), Madison, WI, pp. 296-304.

Resnik, P. (1995). Using information content to evaluate semantic similarity. Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Montreal, pp. 448-453.

@@ Line 4: / Line 4: @@
 * introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring similarity
 * subsequently used by many other researchers
+* '''Algorithm''' = name of algorithm
 * '''Reference for algorithm''' = where to find out more about given algorithm for measuring similarity
 * '''Reference for experiment''' = where to find out more about evaluation of given algorithm with TOEFL questions
@@ Line 11: / Line 12: @@
 * table rows sorted in order of increasing percent correct
 * several WordNet-based similarity measures are implemented in [http://www.d.umn.edu/~tpederse/ Ted Pedersen]'s [http://www.d.umn.edu/~tpederse/similarity.html WordNet::Similarity] package
+* LSA = Latent Semantic Analysis
+* PMI-IR = Pointwise Mutual Information - Information Retrieval
+* PR = Product Rule
 {| border="1" cellpadding="5" cellspacing="1" width="100%"
 |-
+! Algorithm
 ! Reference for algorithm
 ! Reference for experiment
@@ Line 21: / Line 26: @@
 ! 95% confidence
 |-
+| RES
 | Resnik (1995)
 | Jarmasz and Szpakowicz (2003)
@@ Line 27: / Line 33: @@
 | 12.89–31.83%
 |-
+| LC
 | Leacock and Chodorow (1998)
 | Jarmasz and Szpakowicz (2003)
@@ Line 33: / Line 40: @@
 | 13.91–33.21%
 |-
+| LIN
 | Lin (1998)
 | Jarmasz and Szpakowicz (2003)
@@ Line 39: / Line 47: @@
 | 15.99–35.94%
 |-
+| JC
 | Jiang and Conrath (1997)
 | Jarmasz and Szpakowicz (2003)
@@ Line 45: / Line 54: @@
 | 15.99–35.94%
 |-
+| LSA
 | Landauer and Dumais (1997)
 | Landauer and Dumais (1997)
@@ Line 51: / Line 61: @@
 | 52.90–74.80%
 |-
+|
 | Average non-English US college applicant
 | Landauer and Dumais (1997)
@@ Line 57: / Line 68: @@
 | 53.01–74.88%
 |-
+| PMI-IR
 | Turney (2001)
 | Turney (2001)
@@ Line 63: / Line 75: @@
 | 62.71–82.96%
 |-
+| HSO
 | Hirst and St-Onge (1998)
 | Jarmasz and Szpakowicz (2003)
@@ Line 69: / Line 82: @@
 | 68.17–87.11%
 |-
+| JS
 | Jarmasz and Szpakowicz (2003)
 | Jarmasz and Szpakowicz (2003)
@@ Line 75: / Line 89: @@
 | 68.17–87.11%
 |-
+| PMI-IR
 | Terra and Clarke (2003)
 | Terra and Clarke (2003)
@@ Line 81: / Line 96: @@
 | 70.97–89.11%
 |-
+| RAP
 | Rapp (2003)
 | Rapp (2003)
@@ Line 87: / Line 103: @@
 | 84.39–97.20%
 |-
+| PR
 | Turney et al. (2003)
 | Turney et al. (2003)
