Difference between revisions of "TOEFL Synonym Questions (State of the art)"
(30 intermediate revisions by 4 users not shown) | |||
Line 1: | Line 1: | ||
* TOEFL = Test of English as a Foreign Language | * TOEFL = Test of English as a Foreign Language | ||
* 80 multiple-choice synonym questions; 4 choices per question | * 80 multiple-choice synonym questions; 4 choices per question | ||
− | * TOEFL questions available | + | * the TOEFL questions are available on request by contacting [http://lsa.colorado.edu/mail_sub.html LSA Support at CU Boulder], the people who manage the [http://lsa.colorado.edu/ LSA web site at Colorado] |
− | * introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring degree of similarity between | + | * introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring degree of similarity between words |
* subsequently used by many other researchers | * subsequently used by many other researchers | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
+ | |||
+ | == Sample question == | ||
+ | |||
+ | ::{| border="0" cellpadding="1" cellspacing="1" | ||
+ | |- | ||
+ | ! Stem: | ||
+ | | | ||
+ | | levied | ||
+ | |- | ||
+ | ! Choices: | ||
+ | | (a) | ||
+ | | imposed | ||
+ | |- | ||
+ | | | ||
+ | | (b) | ||
+ | | believed | ||
+ | |- | ||
+ | | | ||
+ | | (c) | ||
+ | | requested | ||
+ | |- | ||
+ | | | ||
+ | | (d) | ||
+ | | correlated | ||
+ | |- | ||
+ | ! Solution: | ||
+ | | (a) | ||
+ | | imposed | ||
+ | |- | ||
+ | |} | ||
+ | |||
+ | |||
+ | == Table of results == | ||
{| border="1" cellpadding="5" cellspacing="1" width="100%" | {| border="1" cellpadding="5" cellspacing="1" width="100%" | ||
Line 74: | Line 96: | ||
| 64.50% | | 64.50% | ||
| 53.01–74.88% | | 53.01–74.88% | ||
+ | |- | ||
+ | | DS | ||
+ | | Pado and Lapata (2007) | ||
+ | | Pado and Lapata (2007) | ||
+ | | Corpus-based | ||
+ | | 73.00% | ||
+ | | 62.72-82.96% | ||
|- | |- | ||
| PMI-IR | | PMI-IR | ||
Line 80: | Line 109: | ||
| Corpus-based | | Corpus-based | ||
| 73.75% | | 73.75% | ||
− | | 62. | + | | 62.72–82.96% |
+ | |- | ||
+ | | PairClass | ||
+ | | Turney (2008) | ||
+ | | Turney (2008) | ||
+ | | Corpus-based | ||
+ | | 76.25% | ||
+ | | 65.42-85.06% | ||
|- | |- | ||
| HSO | | HSO | ||
Line 102: | Line 138: | ||
| 81.25% | | 81.25% | ||
| 70.97–89.11% | | 70.97–89.11% | ||
+ | |- | ||
+ | | CWO | ||
+ | | Ruiz-Casado et al. (2005) | ||
+ | | Ruiz-Casado et al. (2005) | ||
+ | | Web-based | ||
+ | | 82.55% | ||
+ | | 72.38–90.09% | ||
+ | |- | ||
+ | | PPMIC | ||
+ | | Bullinaria and Levy (2006) | ||
+ | | Bullinaria and Levy (2006) | ||
+ | | Corpus-based | ||
+ | | 85.00% | ||
+ | | 75.26-92.00% | ||
+ | |- | ||
+ | | GLSA | ||
+ | | Matveeva et al. (2005) | ||
+ | | Matveeva et al. (2005) | ||
+ | | Corpus-based | ||
+ | | 86.25% | ||
+ | | 76.73-92.93% | ||
|- | |- | ||
| LSA | | LSA | ||
Line 119: | Line 176: | ||
|} | |} | ||
+ | |||
+ | == Explanation of table == | ||
+ | |||
+ | * '''Algorithm''' = name of algorithm | ||
+ | * '''Reference for algorithm''' = where to find out more about given algorithm | ||
+ | * '''Reference for experiment''' = where to find out more about evaluation of given algorithm with TOEFL questions | ||
+ | * '''Type''' = general type of algorithm: corpus-based, lexicon-based, hybrid | ||
+ | * '''Correct''' = percent of 80 questions that given algorithm answered correctly | ||
+ | * '''95% confidence''' = confidence interval calculated using [http://www.quantitativeskills.com/sisa/statistics/onemean.htm Binomial Exact Test] | ||
+ | * table rows sorted in order of increasing percent correct | ||
+ | * several WordNet-based similarity measures are implemented in [http://www.d.umn.edu/~tpederse/ Ted Pedersen]'s [http://www.d.umn.edu/~tpederse/similarity.html WordNet::Similarity] package | ||
+ | * LSA = Latent Semantic Analysis | ||
+ | * PMI-IR = Pointwise Mutual Information - Information Retrieval | ||
+ | * PR = Product Rule | ||
+ | * PPMIC = Positive Pointwise Mutual Information with Cosine | ||
+ | * GLSA = Generalized Latent Semantic Analysis | ||
+ | * CWO = Context Window Overlapping | ||
+ | * DS = Dependency Space | ||
+ | |||
+ | == Caveats == | ||
+ | |||
+ | * the performance of a corpus-based algorithm depends on the corpus, so the difference in performance between two corpus-based systems may be due to the different corpora, rather than the different algorithms | ||
+ | * the TOEFL questions include nouns, verbs, and adjectives, but some of the WordNet-based algorithms were only designed to work with nouns | ||
+ | * some of the algorithms may have been tuned on the TOEFL questions | ||
+ | |||
+ | |||
+ | == References == | ||
+ | |||
+ | Bullinaria, J.A., and Levy, J.P. (2006). [http://www.cs.bham.ac.uk/~jxb/PUBS/BRM.pdf Extracting semantic representations from word co-occurrence statistics: A computational study]. To appear in ''Behavior Research Methods'', 38. | ||
Hirst, G., and St-Onge, D. (1998). [http://mirror.eacoss.org/documentation/ITLibrary/IRIS/Data/1997/Hirst/Lexical/1997-Hirst-Lexical.pdf Lexical chains as representation of context for the detection and correction of malapropisms]. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, 305-332. | Hirst, G., and St-Onge, D. (1998). [http://mirror.eacoss.org/documentation/ITLibrary/IRIS/Data/1997/Hirst/Lexical/1997-Hirst-Lexical.pdf Lexical chains as representation of context for the detection and correction of malapropisms]. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, 305-332. | ||
− | Jarmasz, M., and Szpakowicz, S. (2003). [http://www. | + | Jarmasz, M., and Szpakowicz, S. (2003). [http://www.csi.uottawa.ca/~szpak/recent_papers/TR-2003-01.pdf Roget’s thesaurus and semantic similarity], ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, September, pp. 212-219. |
Jiang, J.J., and Conrath, D.W. (1997). [http://wortschatz.uni-leipzig.de/~sbordag/aalw05/Referate/03_Assoziationen_BudanitskyResnik/Jiang_Conrath_97.pdf Semantic similarity based on corpus statistics and lexical taxonomy]. ''Proceedings of the International Conference on Research in Computational Linguistics'', Taiwan. | Jiang, J.J., and Conrath, D.W. (1997). [http://wortschatz.uni-leipzig.de/~sbordag/aalw05/Referate/03_Assoziationen_BudanitskyResnik/Jiang_Conrath_97.pdf Semantic similarity based on corpus statistics and lexical taxonomy]. ''Proceedings of the International Conference on Research in Computational Linguistics'', Taiwan. | ||
Line 132: | Line 218: | ||
Lin, D. (1998). [http://www.cs.ualberta.ca/~lindek/papers/sim.pdf An information-theoretic definition of similarity]. ''Proceedings of the 15th International Conference on Machine Learning (ICML-98)'', Madison, WI, pp. 296-304. | Lin, D. (1998). [http://www.cs.ualberta.ca/~lindek/papers/sim.pdf An information-theoretic definition of similarity]. ''Proceedings of the 15th International Conference on Machine Learning (ICML-98)'', Madison, WI, pp. 296-304. | ||
− | Rapp, R. (2003). [http://www.amtaweb.org/summit/MTSummit/FinalPapers/19-Rapp-final.pdf Word sense discovery based on sense descriptor dissimilarity] | + | Matveeva, I., Levow, G., Farahat, A., and Royer, C. (2005). [http://people.cs.uchicago.edu/~matveeva/SynGLSA_ranlp_final.pdf Generalized latent semantic analysis for term representation]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-05)'', Borovets, Bulgaria. |
+ | |||
+ | Pado, S., and Lapata, M. (2007). [http://www.coli.uni-saarland.de/~pado/pub/papers/cl07_pado.pdf Dependency-based construction of semantic space models]. ''Computational Linguistics'', 33(2), 161-199. | ||
+ | |||
+ | Rapp, R. (2003). [http://www.amtaweb.org/summit/MTSummit/FinalPapers/19-Rapp-final.pdf Word sense discovery based on sense descriptor dissimilarity]. ''Proceedings of the Ninth Machine Translation Summit'', pp. 315-322. | ||
Resnik, P. (1995). [http://citeseer.ist.psu.edu/resnik95using.html Using information content to evaluate semantic similarity]. ''Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95)'', Montreal, pp. 448-453. | Resnik, P. (1995). [http://citeseer.ist.psu.edu/resnik95using.html Using information content to evaluate semantic similarity]. ''Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95)'', Montreal, pp. 448-453. | ||
+ | |||
+ | Ruiz-Casado, M., Alfonseca, E. and Castells, P. (2005) [http://alfonseca.org/pubs/2005-ranlp1.pdf Using context-window overlapping in Synonym Discovery and Ontology Extension]. ''Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP-2005)'', Borovets, Bulgaria. | ||
Terra, E., and Clarke, C.L.A. (2003). [http://acl.ldc.upenn.edu/N/N03/N03-1032.pdf Frequency estimates for statistical word similarity measures]. ''Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003 (HLT/NAACL 2003)'', pp. 244–251. | Terra, E., and Clarke, C.L.A. (2003). [http://acl.ldc.upenn.edu/N/N03/N03-1032.pdf Frequency estimates for statistical word similarity measures]. ''Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003 (HLT/NAACL 2003)'', pp. 244–251. | ||
Line 141: | Line 233: | ||
Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). [http://arxiv.org/abs/cs.CL/0309035 Combining independent modules to solve multiple-choice synonym and analogy problems]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, pp. 482-489. | Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). [http://arxiv.org/abs/cs.CL/0309035 Combining independent modules to solve multiple-choice synonym and analogy problems]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, pp. 482-489. | ||
+ | |||
+ | Turney, P.D. (2008). [http://arxiv.org/abs/0809.0124 A uniform approach to analogies, synonyms, antonyms, and associations]. ''Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)'', Manchester, UK, pp. 905-912. | ||
+ | |||
+ | == See also == | ||
+ | |||
+ | * [[Attributional and Relational Similarity (State of the art)]] | ||
+ | * [[ESL Synonym Questions (State of the art)|ESL Synonym Questions]] | ||
+ | * [[SAT Analogy Questions]] | ||
+ | * [[State of the art]] | ||
+ | |||
+ | |||
+ | [[Category:State of the art]] |
Revision as of 05:19, 25 June 2012
- TOEFL = Test of English as a Foreign Language
- 80 multiple-choice synonym questions; 4 choices per question
- the TOEFL questions are available on request by contacting LSA Support at CU Boulder, the people who manage the LSA web site at Colorado
- introduced in Landauer and Dumais (1997) as a way of evaluating algorithms for measuring degree of similarity between words
- subsequently used by many other researchers
Sample question
Stem: levied
Choices:
(a) imposed
(b) believed
(c) requested
(d) correlated
Solution: (a) imposed
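A similarity measure is typically applied to a question like this by scoring each choice against the stem and picking the highest-scoring choice. The following is only a minimal sketch of that evaluation loop; the function names and the `TOY_SCORES` values are hypothetical and invented for illustration, and none of the algorithms in the table is reproduced here.

```python
from typing import Callable, Sequence, Tuple

Question = Tuple[str, Sequence[str], str]  # (stem, choices, solution)

def answer_question(stem: str, choices: Sequence[str],
                    similarity: Callable[[str, str], float]) -> str:
    """Return the choice that the similarity measure rates closest to the stem."""
    return max(choices, key=lambda choice: similarity(stem, choice))

def percent_correct(questions: Sequence[Question],
                    similarity: Callable[[str, str], float]) -> float:
    """Percentage of questions for which the top-scoring choice is the solution."""
    correct = sum(
        answer_question(stem, choices, similarity) == solution
        for stem, choices, solution in questions
    )
    return 100.0 * correct / len(questions)

# Made-up similarity scores for the sample question (an assumption,
# not the output of any real algorithm).
TOY_SCORES = {
    ("levied", "imposed"): 0.9,
    ("levied", "believed"): 0.2,
    ("levied", "requested"): 0.4,
    ("levied", "correlated"): 0.1,
}

def toy_similarity(a: str, b: str) -> float:
    """Look up a hypothetical score; unknown pairs get 0.0."""
    return TOY_SCORES.get((a, b), 0.0)

SAMPLE = ("levied", ["imposed", "believed", "requested", "correlated"], "imposed")

if __name__ == "__main__":
    print(answer_question(SAMPLE[0], SAMPLE[1], toy_similarity))
    print(percent_correct([SAMPLE], toy_similarity))
```

With the full set of 80 questions, `percent_correct` would yield the kind of figure reported in the "Correct" column of the table.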
Table of results
Algorithm | Reference for algorithm | Reference for experiment | Type | Correct | 95% confidence |
---|---|---|---|---|---|
RES | Resnik (1995) | Jarmasz and Szpakowicz (2003) | Hybrid | 20.31% | 12.89–31.83% |
LC | Leacock and Chodorow (1998) | Jarmasz and Szpakowicz (2003) | Lexicon-based | 21.88% | 13.91–33.21% |
LIN | Lin (1998) | Jarmasz and Szpakowicz (2003) | Hybrid | 24.06% | 15.99–35.94% |
Random | Random guessing | 1 / 4 = 25.00% | Random | 25.00% | 15.99–35.94% |
JC | Jiang and Conrath (1997) | Jarmasz and Szpakowicz (2003) | Hybrid | 25.00% | 15.99–35.94% |
LSA | Landauer and Dumais (1997) | Landauer and Dumais (1997) | Corpus-based | 64.38% | 52.90–74.80% |
Human | Average non-English US college applicant | Landauer and Dumais (1997) | Human | 64.50% | 53.01–74.88% |
DS | Pado and Lapata (2007) | Pado and Lapata (2007) | Corpus-based | 73.00% | 62.72–82.96% |
PMI-IR | Turney (2001) | Turney (2001) | Corpus-based | 73.75% | 62.72–82.96% |
PairClass | Turney (2008) | Turney (2008) | Corpus-based | 76.25% | 65.42–85.06% |
HSO | Hirst and St-Onge (1998) | Jarmasz and Szpakowicz (2003) | Lexicon-based | 77.91% | 68.17–87.11% |
JS | Jarmasz and Szpakowicz (2003) | Jarmasz and Szpakowicz (2003) | Lexicon-based | 78.75% | 68.17–87.11% |
PMI-IR | Terra and Clarke (2003) | Terra and Clarke (2003) | Corpus-based | 81.25% | 70.97–89.11% |
CWO | Ruiz-Casado et al. (2005) | Ruiz-Casado et al. (2005) | Web-based | 82.55% | 72.38–90.09% |
PPMIC | Bullinaria and Levy (2006) | Bullinaria and Levy (2006) | Corpus-based | 85.00% | 75.26–92.00% |
GLSA | Matveeva et al. (2005) | Matveeva et al. (2005) | Corpus-based | 86.25% | 76.73–92.93% |
LSA | Rapp (2003) | Rapp (2003) | Corpus-based | 92.50% | 84.39–97.20% |
PR | Turney et al. (2003) | Turney et al. (2003) | Hybrid | 97.50% | 91.26–99.70% |
Explanation of table
- Algorithm = name of algorithm
- Reference for algorithm = where to find out more about given algorithm
- Reference for experiment = where to find out more about evaluation of given algorithm with TOEFL questions
- Type = general type of algorithm: corpus-based, lexicon-based, web-based, or hybrid (the Random and Human rows are baselines)
- Correct = percent of 80 questions that given algorithm answered correctly
- 95% confidence = confidence interval calculated using Binomial Exact Test
- table rows sorted in order of increasing percent correct
- several WordNet-based similarity measures are implemented in Ted Pedersen's WordNet::Similarity package
- LSA = Latent Semantic Analysis
- PMI-IR = Pointwise Mutual Information - Information Retrieval
- PR = Product Rule
- PPMIC = Positive Pointwise Mutual Information with Cosine
- GLSA = Generalized Latent Semantic Analysis
- CWO = Context Window Overlapping
- DS = Dependency Space
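The "95% confidence" column can be approximated with an exact binomial (Clopper-Pearson) interval. The sketch below assumes that this is the method behind the Binomial Exact Test mentioned above; the function names are hypothetical, and the bounds are found by simple bisection using only the Python standard library.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _solve_increasing(g, target: float, iters: int = 60) -> float:
    """Find p in [0, 1] with g(p) = target, for g increasing in p (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Exact two-sided (1 - alpha) confidence interval for k successes in n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2,
    # i.e. P(X > k) equals 1 - alpha / 2.
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper

if __name__ == "__main__":
    low, high = clopper_pearson(78, 80)  # 78 of 80 correct = 97.50%
    print(f"{100 * low:.2f}% to {100 * high:.2f}%")
```

For 78 correct answers out of 80, this should come out close to the 91.26–99.70% interval listed for the PR row.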
Caveats
- the performance of a corpus-based algorithm depends on the corpus, so the difference in performance between two corpus-based systems may be due to the different corpora, rather than the different algorithms
- the TOEFL questions include nouns, verbs, and adjectives, but some of the WordNet-based algorithms were only designed to work with nouns
- some of the algorithms may have been tuned on the TOEFL questions
References
Bullinaria, J.A., and Levy, J.P. (2006). Extracting semantic representations from word co-occurrence statistics: A computational study. To appear in Behavior Research Methods, 38.
Hirst, G., and St-Onge, D. (1998). Lexical chains as representation of context for the detection and correction of malapropisms. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 305-332.
Jarmasz, M., and Szpakowicz, S. (2003). Roget’s thesaurus and semantic similarity. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, September, pp. 212-219.
Jiang, J.J., and Conrath, D.W. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the International Conference on Research in Computational Linguistics, Taiwan.
Landauer, T.K., and Dumais, S.T. (1997). A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211–240.
Leacock, C., and Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 265-283.
Lin, D. (1998). An information-theoretic definition of similarity. Proceedings of the 15th International Conference on Machine Learning (ICML-98), Madison, WI, pp. 296-304.
Matveeva, I., Levow, G., Farahat, A., and Royer, C. (2005). Generalized latent semantic analysis for term representation. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-05), Borovets, Bulgaria.
Pado, S., and Lapata, M. (2007). Dependency-based construction of semantic space models. Computational Linguistics, 33(2), 161-199.
Rapp, R. (2003). Word sense discovery based on sense descriptor dissimilarity. Proceedings of the Ninth Machine Translation Summit, pp. 315-322.
Resnik, P. (1995). Using information content to evaluate semantic similarity. Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Montreal, pp. 448-453.
Ruiz-Casado, M., Alfonseca, E., and Castells, P. (2005). Using context-window overlapping in Synonym Discovery and Ontology Extension. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-2005), Borovets, Bulgaria.
Terra, E., and Clarke, C.L.A. (2003). Frequency estimates for statistical word similarity measures. Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003 (HLT/NAACL 2003), pp. 244–251.
Turney, P.D. (2001). Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001), Freiburg, Germany, pp. 491-502.
Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). Combining independent modules to solve multiple-choice synonym and analogy problems. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, pp. 482-489.
Turney, P.D. (2008). A uniform approach to analogies, synonyms, antonyms, and associations. Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), Manchester, UK, pp. 905-912.