Difference between revisions of "TOEFL Synonym Questions (State of the art)"

From ACL Wiki
Jump to: navigation, search
(35 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 
* TOEFL = Test of English as a Foreign Language
 
* TOEFL = Test of English as a Foreign Language
 
* 80 multiple-choice synonym questions; 4 choices per question
 
* 80 multiple-choice synonym questions; 4 choices per question
* TOEFL questions available from [http://www.pearsonkt.com/bioLandauer.shtml Thomas Landauer]
+
* the TOEFL questions are available on request by contacting [http://lsa.colorado.edu/mail_sub.html LSA Support at CU Boulder], the people who manage the [http://lsa.colorado.edu/ LSA web site at Colorado]
* introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring similarity
+
* introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring degree of similarity between words
 
* subsequently used by many other researchers
 
* subsequently used by many other researchers
* '''Algorithm''' = name of algorithm
 
* '''Reference for algorithm''' = where to find out more about given algorithm for measuring degree of similarity between two words
 
* '''Reference for experiment''' = where to find out more about evaluation of given algorithm with TOEFL questions
 
* '''Type''' = general type of algorithm: corpus-based, lexicon-based, hybrid
 
* '''Correct''' = percent of 80 questions that given algorithm answered correctly
 
* '''95% confidence''' = confidence interval calculated using [http://home.clara.net/sisa/onemean.htm Binomial Exact Test]
 
* table rows sorted in order of increasing percent correct
 
* several WordNet-based similarity measures are implemented in [http://www.d.umn.edu/~tpederse/ Ted Pedersen]'s [http://www.d.umn.edu/~tpederse/similarity.html WordNet::Similarity] package
 
* LSA = Latent Semantic Analysis
 
* PMI-IR = Pointwise Mutual Information - Information Retrieval
 
* PR = Product Rule
 
  
 +
 +
== Sample question ==
 +
 +
::{| border="0" cellpadding="1" cellspacing="1"
 +
|-
 +
! Stem:
 +
|
 +
| levied
 +
|-
 +
! Choices:
 +
| (a)
 +
| imposed
 +
|-
 +
|
 +
| (b)
 +
| believed
 +
|-
 +
|
 +
| (c)
 +
| requested
 +
|-
 +
|
 +
| (d)
 +
| correlated
 +
|-
 +
! Solution:
 +
| (a)
 +
| imposed
 +
|-
 +
|}
 +
 +
 +
== Table of results ==
  
 
{| border="1" cellpadding="5" cellspacing="1" width="100%"
 
{| border="1" cellpadding="5" cellspacing="1" width="100%"
Line 74: Line 96:
 
| 64.50%
 
| 64.50%
 
| 53.01–74.88%
 
| 53.01–74.88%
 +
|-
 +
| DS
 +
| Pado and Lapata (2007)
 +
| Pado and Lapata (2007)
 +
| Corpus-based
 +
| 73.00%
 +
| 62.72-82.96%
 
|-
 
|-
 
| PMI-IR
 
| PMI-IR
Line 80: Line 109:
 
| Corpus-based
 
| Corpus-based
 
| 73.75%
 
| 73.75%
| 62.71–82.96%
+
| 62.72–82.96%
 +
|-
 +
| PairClass
 +
| Turney (2008)
 +
| Turney (2008)
 +
| Corpus-based
 +
| 76.25%
 +
| 65.42-85.06%
 
|-
 
|-
 
| HSO
 
| HSO
Line 102: Line 138:
 
| 81.25%
 
| 81.25%
 
| 70.97–89.11%
 
| 70.97–89.11%
 +
|-
 +
| CWO
 +
| Ruiz-Casado et al. (2005)
 +
| Ruiz-Casado et al. (2005)
 +
| Web-based
 +
| 82.55%
 +
| 72.38–90.09%
 +
|-
 +
| PPMIC
 +
| Bullinaria and Levy (2006)
 +
| Bullinaria and Levy (2006)
 +
| Corpus-based
 +
| 85.00%
 +
| 75.26-92.00%
 +
|-
 +
| GLSA
 +
| Matveeva et al. (2005)
 +
| Matveeva et al. (2005)
 +
| Corpus-based
 +
| 86.25%
 +
| 76.73-92.93%
 
|-
 
|-
 
| LSA
 
| LSA
Line 119: Line 176:
 
|}
 
|}
  
 +
 +
== Explanation of table ==
 +
 +
* '''Algorithm''' = name of algorithm
 +
* '''Reference for algorithm''' = where to find out more about given algorithm
 +
* '''Reference for experiment''' = where to find out more about evaluation of given algorithm with TOEFL questions
 +
* '''Type''' = general type of algorithm: corpus-based, lexicon-based, hybrid
 +
* '''Correct''' = percent of 80 questions that given algorithm answered correctly
 +
* '''95% confidence''' = confidence interval calculated using [http://www.quantitativeskills.com/sisa/statistics/onemean.htm Binomial Exact Test]
 +
* table rows sorted in order of increasing percent correct
 +
* several WordNet-based similarity measures are implemented in [http://www.d.umn.edu/~tpederse/ Ted Pedersen]'s [http://www.d.umn.edu/~tpederse/similarity.html WordNet::Similarity] package
 +
* LSA = Latent Semantic Analysis
 +
* PMI-IR = Pointwise Mutual Information - Information Retrieval
 +
* PR = Product Rule
 +
* PPMIC = Positive Pointwise Mutual Information with Cosine
 +
* GLSA = Generalized Latent Semantic Analysis
 +
* CWO = Context Window Overlapping
 +
* DS = Dependency Space
 +
 +
== Notes ==
 +
 +
* the performance of a corpus-based algorithm depends on the corpus, so the difference in performance between two corpus-based systems may be due to the different corpora, rather than the different algorithms
 +
* the TOEFL questions include nouns, verbs, and adjectives, but some of the WordNet-based algorithms were only designed to work with nouns; this explains some of the lower scores
 +
* some of the algorithms may have been tuned on the TOEFL questions; read the references for details
 +
* Landauer and Dumais (1997) report scores that were corrected for guessing by subtracting a penalty of 1/3 for each incorrect answer; they report a score of 52.5% when this penalty is applied; when the penalty is removed, their performance is 64.4% correct
 +
 +
== References ==
 +
 +
Bullinaria, J.A., and Levy, J.P. (2006). [http://www.cs.bham.ac.uk/~jxb/PUBS/BRM.pdf Extracting semantic representations from word co-occurrence statistics: A computational study]. To appear in ''Behavior Research Methods'', 38.
  
 
Hirst, G., and St-Onge, D. (1998). [http://mirror.eacoss.org/documentation/ITLibrary/IRIS/Data/1997/Hirst/Lexical/1997-Hirst-Lexical.pdf Lexical chains as representation of context for the detection and correction of malapropisms]. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, 305-332.
 
Hirst, G., and St-Onge, D. (1998). [http://mirror.eacoss.org/documentation/ITLibrary/IRIS/Data/1997/Hirst/Lexical/1997-Hirst-Lexical.pdf Lexical chains as representation of context for the detection and correction of malapropisms]. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, 305-332.
  
Jarmasz, M., and Szpakowicz, S. (2003). [http://www.site.uottawa.ca/~mjarmasz/pubs/jarmasz_roget_sim.pdf Roget’s thesaurus and semantic similarity], ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, September, pp. 212-219.
+
Jarmasz, M., and Szpakowicz, S. (2003). [http://www.csi.uottawa.ca/~szpak/recent_papers/TR-2003-01.pdf Roget’s thesaurus and semantic similarity], ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, September, pp. 212-219.
  
 
Jiang, J.J., and Conrath, D.W. (1997). [http://wortschatz.uni-leipzig.de/~sbordag/aalw05/Referate/03_Assoziationen_BudanitskyResnik/Jiang_Conrath_97.pdf Semantic similarity based on corpus statistics and lexical taxonomy]. ''Proceedings of the International Conference on Research in Computational Linguistics'', Taiwan.
 
Jiang, J.J., and Conrath, D.W. (1997). [http://wortschatz.uni-leipzig.de/~sbordag/aalw05/Referate/03_Assoziationen_BudanitskyResnik/Jiang_Conrath_97.pdf Semantic similarity based on corpus statistics and lexical taxonomy]. ''Proceedings of the International Conference on Research in Computational Linguistics'', Taiwan.
Line 132: Line 218:
 
Lin, D. (1998). [http://www.cs.ualberta.ca/~lindek/papers/sim.pdf An information-theoretic definition of similarity]. ''Proceedings of the 15th International Conference on Machine Learning (ICML-98)'', Madison, WI, pp. 296-304.
 
Lin, D. (1998). [http://www.cs.ualberta.ca/~lindek/papers/sim.pdf An information-theoretic definition of similarity]. ''Proceedings of the 15th International Conference on Machine Learning (ICML-98)'', Madison, WI, pp. 296-304.
  
Rapp, R. (2003). [http://www.amtaweb.org/summit/MTSummit/FinalPapers/19-Rapp-final.pdf Word sense discovery based on sense descriptor dissimilarity], ''Proceedings of the Ninth Machine Translation Summit'', pp. 315-322.
+
Matveeva, I., Levow, G., Farahat, A., and Royer, C. (2005). [http://people.cs.uchicago.edu/~matveeva/SynGLSA_ranlp_final.pdf Generalized latent semantic analysis for term representation]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-05)'', Borovets, Bulgaria.
 +
 
 +
Pado, S., and Lapata, M. (2007). [http://www.coli.uni-saarland.de/~pado/pub/papers/cl07_pado.pdf Dependency-based construction of semantic space models]. ''Computational Linguistics'', 33(2), 161-199.
 +
 
 +
Rapp, R. (2003). [http://www.amtaweb.org/summit/MTSummit/FinalPapers/19-Rapp-final.pdf Word sense discovery based on sense descriptor dissimilarity]. ''Proceedings of the Ninth Machine Translation Summit'', pp. 315-322.
  
 
Resnik, P. (1995). [http://citeseer.ist.psu.edu/resnik95using.html Using information content to evaluate semantic similarity]. ''Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95)'', Montreal, pp. 448-453.
 
Resnik, P. (1995). [http://citeseer.ist.psu.edu/resnik95using.html Using information content to evaluate semantic similarity]. ''Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95)'', Montreal, pp. 448-453.
 +
 +
Ruiz-Casado, M., Alfonseca, E. and Castells, P. (2005) [http://alfonseca.org/pubs/2005-ranlp1.pdf Using context-window overlapping in Synonym Discovery and Ontology Extension]. ''Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP-2005)'', Borovets, Bulgaria.
  
 
Terra, E., and Clarke, C.L.A. (2003). [http://acl.ldc.upenn.edu/N/N03/N03-1032.pdf Frequency estimates for statistical word similarity measures]. ''Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003 (HLT/NAACL 2003)'', pp. 244–251.
 
Terra, E., and Clarke, C.L.A. (2003). [http://acl.ldc.upenn.edu/N/N03/N03-1032.pdf Frequency estimates for statistical word similarity measures]. ''Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003 (HLT/NAACL 2003)'', pp. 244–251.
Line 141: Line 233:
  
 
Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). [http://arxiv.org/abs/cs.CL/0309035 Combining independent modules to solve multiple-choice synonym and analogy problems]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, pp. 482-489.
 
Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). [http://arxiv.org/abs/cs.CL/0309035 Combining independent modules to solve multiple-choice synonym and analogy problems]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, pp. 482-489.
 +
 +
Turney, P.D. (2008). [http://arxiv.org/abs/0809.0124 A uniform approach to analogies, synonyms, antonyms, and associations]. ''Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)'', Manchester, UK, pp. 905-912.
 +
 +
== See also ==
 +
 +
* [[Attributional and Relational Similarity (State of the art)]]
 +
* [[ESL Synonym Questions (State of the art)|ESL Synonym Questions]]
 +
* [[SAT Analogy Questions]]
 +
* [[State of the art]]
 +
 +
 +
[[Category:State of the art]]

Revision as of 07:21, 16 December 2012

  • TOEFL = Test of English as a Foreign Language
  • 80 multiple-choice synonym questions; 4 choices per question
  • the TOEFL questions are available on request by contacting LSA Support at CU Boulder, the people who manage the LSA web site at Colorado
  • introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring degree of similarity between words
  • subsequently used by many other researchers


Sample question

Stem: levied
Choices: (a) imposed
(b) believed
(c) requested
(d) correlated
Solution: (a) imposed


Table of results

Algorithm Reference for algorithm Reference for experiment Type Correct 95% confidence
RES Resnik (1995) Jarmasz and Szpakowicz (2003) Hybrid 20.31% 12.89–31.83%
LC Leacock and Chodrow (1998) Jarmasz and Szpakowicz (2003) Lexicon-based 21.88% 13.91–33.21%
LIN Lin (1998) Jarmasz and Szpakowicz (2003) Hybrid 24.06% 15.99–35.94%
Random Random guessing 1 / 4 = 25.00% Random 25.00% 15.99–35.94%
JC Jiang and Conrath (1997) Jarmasz and Szpakowicz (2003) Hybrid 25.00% 15.99–35.94%
LSA Landauer and Dumais (1997) Landauer and Dumais (1997) Corpus-based 64.38% 52.90–74.80%
Human Average non-English US college applicant Landauer and Dumais (1997) Human 64.50% 53.01–74.88%
DS Pado and Lapata (2007) Pado and Lapata (2007) Corpus-based 73.00% 62.72-82.96%
PMI-IR Turney (2001) Turney (2001) Corpus-based 73.75% 62.72–82.96%
PairClass Turney (2008) Turney (2008) Corpus-based 76.25% 65.42-85.06%
HSO Hirst and St.-Onge (1998) Jarmasz and Szpakowicz (2003) Lexicon-based 77.91% 68.17–87.11%
JS Jarmasz and Szpakowicz (2003) Jarmasz and Szpakowicz (2003) Lexicon-based 78.75% 68.17–87.11%
PMI-IR Terra and Clarke (2003) Terra and Clarke (2003) Corpus-based 81.25% 70.97–89.11%
CWO Ruiz-Casado et al. (2005) Ruiz-Casado et al. (2005) Web-based 82.55% 72.38–90.09%
PPMIC Bullinaria and Levy (2006) Bullinaria and Levy (2006) Corpus-based 85.00% 75.26-92.00%
GLSA Matveeva et al. (2005) Matveeva et al. (2005) Corpus-based 86.25% 76.73-92.93%
LSA Rapp (2003) Rapp (2003) Corpus-based 92.50% 84.39-97.20%
PR Turney et al. (2003) Turney et al. (2003) Hybrid 97.50% 91.26–99.70%


Explanation of table

  • Algorithm = name of algorithm
  • Reference for algorithm = where to find out more about given algorithm
  • Reference for experiment = where to find out more about evaluation of given algorithm with TOEFL questions
  • Type = general type of algorithm: corpus-based, lexicon-based, hybrid
  • Correct = percent of 80 questions that given algorithm answered correctly
  • 95% confidence = confidence interval calculated using Binomial Exact Test
  • table rows sorted in order of increasing percent correct
  • several WordNet-based similarity measures are implemented in Ted Pedersen's WordNet::Similarity package
  • LSA = Latent Semantic Analysis
  • PMI-IR = Pointwise Mutual Information - Information Retrieval
  • PR = Product Rule
  • PPMIC = Positive Pointwise Mutual Information with Cosine
  • GLSA = Generalized Latent Semantic Analysis
  • CWO = Context Window Overlapping
  • DS = Dependency Space

Notes

  • the performance of a corpus-based algorithm depends on the corpus, so the difference in performance between two corpus-based systems may be due to the different corpora, rather than the different algorithms
  • the TOEFL questions include nouns, verbs, and adjectives, but some of the WordNet-based algorithms were only designed to work with nouns; this explains some of the lower scores
  • some of the algorithms may have been tuned on the TOEFL questions; read the references for details
  • Landauer and Dumais (1997) report scores that were corrected for guessing by subtracting a penalty of 1/3 for each incorrect answer; they report a score of 52.5% when this penalty is applied; when the penalty is removed, their performance is 64.4% correct

References

Bullinaria, J.A., and Levy, J.P. (2006). Extracting semantic representations from word co-occurrence statistics: A computational study. To appear in Behavior Research Methods, 38.

Hirst, G., and St-Onge, D. (1998). Lexical chains as representation of context for the detection and correction of malapropisms. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, 305-332.

Jarmasz, M., and Szpakowicz, S. (2003). Roget’s thesaurus and semantic similarity, Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, September, pp. 212-219.

Jiang, J.J., and Conrath, D.W. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the International Conference on Research in Computational Linguistics, Taiwan.

Landauer, T.K., and Dumais, S.T. (1997). A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211–240.

Leacock, C., and Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 265-283.

Lin, D. (1998). An information-theoretic definition of similarity. Proceedings of the 15th International Conference on Machine Learning (ICML-98), Madison, WI, pp. 296-304.

Matveeva, I., Levow, G., Farahat, A., and Royer, C. (2005). Generalized latent semantic analysis for term representation. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-05), Borovets, Bulgaria.

Pado, S., and Lapata, M. (2007). Dependency-based construction of semantic space models. Computational Linguistics, 33(2), 161-199.

Rapp, R. (2003). Word sense discovery based on sense descriptor dissimilarity. Proceedings of the Ninth Machine Translation Summit, pp. 315-322.

Resnik, P. (1995). Using information content to evaluate semantic similarity. Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Montreal, pp. 448-453.

Ruiz-Casado, M., Alfonseca, E. and Castells, P. (2005) Using context-window overlapping in Synonym Discovery and Ontology Extension. Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP-2005), Borovets, Bulgaria.

Terra, E., and Clarke, C.L.A. (2003). Frequency estimates for statistical word similarity measures. Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003 (HLT/NAACL 2003), pp. 244–251.

Turney, P.D. (2001). Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001), Freiburg, Germany, pp. 491-502.

Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). Combining independent modules to solve multiple-choice synonym and analogy problems. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, pp. 482-489.

Turney, P.D. (2008). A uniform approach to analogies, synonyms, antonyms, and associations. Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), Manchester, UK, pp. 905-912.

See also