Difference between revisions of "TOEFL Synonym Questions (State of the art)"

Revision as of 07:21, 16 December 2012

TOEFL = Test of English as a Foreign Language
80 multiple-choice synonym questions; 4 choices per question
the TOEFL questions are available on request from LSA Support at CU Boulder, the group that manages the LSA web site at the University of Colorado
introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring degree of similarity between words
subsequently used by many other researchers

Sample question

Stem:		levied
Choices:	(a)	imposed
	(b)	believed
	(c)	requested
	(d)	correlated
Solution:	(a)	imposed
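A question like this is normally answered by scoring each choice against the stem with the similarity measure under test and picking the highest-scoring choice. The Correct column then counts how often that pick matches the solution. The Python sketch below shows the procedure under stated assumptions:

- Cosine similarity stands in for whatever measure a given paper actually uses.
- The VECTORS table is made up purely for illustration. It does not come from any system in the table.
- Ties go to the earliest-listed choice. This is a simplification, and the referenced papers may handle ties differently.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def answer(stem, choices, sim):
    """Pick the choice that sim() rates most similar to the stem (first one wins ties)."""
    return max(choices, key=lambda c: sim(stem, c))

def percent_correct(questions, sim):
    """questions: list of (stem, choices, solution) tuples; returns percent answered correctly."""
    right = sum(answer(stem, choices, sim) == solution for stem, choices, solution in questions)
    return 100.0 * right / len(questions)

# HYPOTHETICAL toy vectors, invented for this illustration only (not from any real model).
VECTORS = {
    "levied": [0.9, 0.1, 0.0],
    "imposed": [0.8, 0.2, 0.1],
    "believed": [0.0, 0.9, 0.3],
    "requested": [0.4, 0.3, 0.8],
    "correlated": [0.1, 0.2, 0.9],
}

def toy_sim(a, b):
    return cosine(VECTORS[a], VECTORS[b])

question = ("levied", ["imposed", "believed", "requested", "correlated"], "imposed")
print(answer(question[0], question[1], toy_sim))  # imposed
print(percent_correct([question], toy_sim))       # 100.0
```

Passing the similarity function in as a parameter keeps the evaluation loop independent of the measure, which is what lets lexicon-based, corpus-based, and hybrid systems be compared on the same 80 questions.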

Table of results

Algorithm	Reference for algorithm	Reference for experiment	Type	Correct	95% confidence
RES	Resnik (1995)	Jarmasz and Szpakowicz (2003)	Hybrid	20.31%	12.89–31.83%
LC	Leacock and Chodorow (1998)	Jarmasz and Szpakowicz (2003)	Lexicon-based	21.88%	13.91–33.21%
LIN	Lin (1998)	Jarmasz and Szpakowicz (2003)	Hybrid	24.06%	15.99–35.94%
Random	Random guessing	1 / 4 = 25.00%	Random	25.00%	15.99–35.94%
JC	Jiang and Conrath (1997)	Jarmasz and Szpakowicz (2003)	Hybrid	25.00%	15.99–35.94%
LSA	Landauer and Dumais (1997)	Landauer and Dumais (1997)	Corpus-based	64.38%	52.90–74.80%
Human	Average non-English US college applicant	Landauer and Dumais (1997)	Human	64.50%	53.01–74.88%
DS	Pado and Lapata (2007)	Pado and Lapata (2007)	Corpus-based	73.00%	62.72–82.96%
PMI-IR	Turney (2001)	Turney (2001)	Corpus-based	73.75%	62.72–82.96%
PairClass	Turney (2008)	Turney (2008)	Corpus-based	76.25%	65.42–85.06%
HSO	Hirst and St-Onge (1998)	Jarmasz and Szpakowicz (2003)	Lexicon-based	77.91%	68.17–87.11%
JS	Jarmasz and Szpakowicz (2003)	Jarmasz and Szpakowicz (2003)	Lexicon-based	78.75%	68.17–87.11%
PMI-IR	Terra and Clarke (2003)	Terra and Clarke (2003)	Corpus-based	81.25%	70.97–89.11%
CWO	Ruiz-Casado et al. (2005)	Ruiz-Casado et al. (2005)	Web-based	82.55%	72.38–90.09%
PPMIC	Bullinaria and Levy (2006)	Bullinaria and Levy (2006)	Corpus-based	85.00%	75.26–92.00%
GLSA	Matveeva et al. (2005)	Matveeva et al. (2005)	Corpus-based	86.25%	76.73–92.93%
LSA	Rapp (2003)	Rapp (2003)	Corpus-based	92.50%	84.39–97.20%
PR	Turney et al. (2003)	Turney et al. (2003)	Hybrid	97.50%	91.26–99.70%

Explanation of table

Algorithm = name of algorithm
Reference for algorithm = where to find out more about the given algorithm
Reference for experiment = where to find out more about the evaluation of the given algorithm on the TOEFL questions
Type = general type of algorithm: corpus-based, web-based, lexicon-based, or hybrid (the Random and Human rows are reference points, not algorithms)
Correct = percentage of the 80 questions that the given algorithm answered correctly
95% confidence = confidence interval calculated using Binomial Exact Test
table rows sorted in order of increasing percent correct
several WordNet-based similarity measures are implemented in Ted Pedersen's WordNet::Similarity package
LSA = Latent Semantic Analysis
PMI-IR = Pointwise Mutual Information - Information Retrieval
PR = Product Rule
PPMIC = Positive Pointwise Mutual Information with Cosine
GLSA = Generalized Latent Semantic Analysis
CWO = Context Window Overlapping
DS = Dependency Space
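The 95% confidence column can be approximately reproduced with an exact (Clopper-Pearson) binomial interval. The sketch below makes two assumptions. First, it assumes this is the method behind the Binomial Exact Test calculator cited above. Second, it handles only a whole number of correct answers out of 80. Rows with fractional counts, such as 24.06%, are outside its scope. The helper names are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _bisect_increasing(f, target, iters=100):
    """Find p in [0, 1] with f(p) == target, assuming f is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_binomial_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k correct answers out of n questions."""
    # Lower bound: p where P(X >= k) = alpha/2; P(X >= k) = 1 - cdf(k-1) rises with p.
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: p where P(X <= k) = alpha/2, i.e. 1 - cdf(k) = 1 - alpha/2.
    upper = 1.0 if k == n else _bisect_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper

lo, hi = exact_binomial_ci(20, 80)  # 20/80 = 25.00%
print(f"{100 * lo:.2f}-{100 * hi:.2f}%")
```

You can check the output against the table in two places:

- For k = 20, compare it with the 15.99–35.94% shown for the 25.00% rows.
- For k = 59 (73.75%), compare it with the 62.72–82.96% shown for PMI-IR (Turney, 2001).

With only 80 questions these intervals are wide, so many adjacent rows in the table overlap.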

Notes

the performance of a corpus-based algorithm depends on the corpus, so the difference in performance between two corpus-based systems may be due to the different corpora, rather than the different algorithms
the TOEFL questions include nouns, verbs, and adjectives, but some of the WordNet-based algorithms were only designed to work with nouns; this explains some of the lower scores
some of the algorithms may have been tuned on the TOEFL questions; read the references for details
Landauer and Dumais (1997) report scores corrected for guessing by subtracting a penalty of 1/3 for each incorrect answer. With this penalty applied, their score is 52.5%. With the penalty removed, their performance is 64.4% correct (64.38% in the table).
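The guessing correction in the last note can be checked directly. With m = 4 choices, a penalty of 1/(m − 1) = 1/3 per wrong answer gives random guessing an expected corrected score of zero. Random guessing gets n/4 right and 3n/4 wrong, so 3n/4 × 1/3 = n/4 is deducted. The sketch below applies the penalty and inverts it; the function names are hypothetical. The 64.38% in the table is 51.5 correct answers out of 80.

```python
def guess_corrected(correct, n, choices=4):
    """Fractional score after subtracting 1/(choices - 1) per wrong answer."""
    wrong = n - correct
    return (correct - wrong / (choices - 1)) / n

def uncorrected(corrected_fraction, choices=4):
    """Invert guess_corrected: recover the raw fraction of questions answered correctly."""
    # corrected = (c - (n - c)/(m - 1)) / n  =>  c/n = (corrected * (m - 1) + 1) / m
    return (corrected_fraction * (choices - 1) + 1) / choices

print(guess_corrected(51.5, 80))     # 0.525
print(guess_corrected(20, 80))       # 0.0  (random guessing on 4 choices)
print(round(uncorrected(0.525), 5))  # 0.64375
```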

References

Bullinaria, J.A., and Levy, J.P. (2006). Extracting semantic representations from word co-occurrence statistics: A computational study. To appear in Behavior Research Methods, 38.

Hirst, G., and St-Onge, D. (1998). Lexical chains as representation of context for the detection and correction of malapropisms. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 305-332.

Jarmasz, M., and Szpakowicz, S. (2003). Roget’s thesaurus and semantic similarity, Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, September, pp. 212-219.

Jiang, J.J., and Conrath, D.W. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the International Conference on Research in Computational Linguistics, Taiwan.

Landauer, T.K., and Dumais, S.T. (1997). A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211–240.

Leacock, C., and Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 265-283.

Lin, D. (1998). An information-theoretic definition of similarity. Proceedings of the 15th International Conference on Machine Learning (ICML-98), Madison, WI, pp. 296-304.

Matveeva, I., Levow, G., Farahat, A., and Royer, C. (2005). Generalized latent semantic analysis for term representation. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-05), Borovets, Bulgaria.

Pado, S., and Lapata, M. (2007). Dependency-based construction of semantic space models. Computational Linguistics, 33(2), 161-199.

Rapp, R. (2003). Word sense discovery based on sense descriptor dissimilarity. Proceedings of the Ninth Machine Translation Summit, pp. 315-322.

Resnik, P. (1995). Using information content to evaluate semantic similarity. Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Montreal, pp. 448-453.

Ruiz-Casado, M., Alfonseca, E., and Castells, P. (2005). Using context-window overlapping in Synonym Discovery and Ontology Extension. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-2005), Borovets, Bulgaria.

Terra, E., and Clarke, C.L.A. (2003). Frequency estimates for statistical word similarity measures. Proceedings of the Human Language Technology and North American Chapter of the Association for Computational Linguistics Conference 2003 (HLT/NAACL 2003), pp. 244–251.

Turney, P.D. (2001). Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001), Freiburg, Germany, pp. 491-502.

Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). Combining independent modules to solve multiple-choice synonym and analogy problems. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, pp. 482-489.

Turney, P.D. (2008). A uniform approach to analogies, synonyms, antonyms, and associations. Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), Manchester, UK, pp. 905-912.

