Difference between revisions of "ESL Synonym Questions (State of the art)"

Latest revision as of 11:23, 28 June 2015

ESL = English as a Second Language
50 multiple-choice synonym questions; 4 choices per question
each question includes a sentence, providing context for the question
ESL questions available on request from Peter Turney
introduced in Turney (2001) as a way of evaluating algorithms for measuring degree of similarity between words
subsequently used by many other researchers
see also: Similarity (State of the art)

Sample question

Stem:		"A rusty nail is not as strong as a clean, new one." (target word: rusty)
Choices:	(a)	corroded
	(b)	black
	(c)	dirty
	(d)	painted
Solution:	(a)	corroded
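
A question of this form can be answered by any word-similarity measure: score each choice against the target word and pick the highest-scoring one. The sketch below illustrates that procedure on the sample question. The function names and the similarity scores are invented for illustration only; they are not taken from any of the algorithms listed on this page.

```python
from typing import Callable, Sequence

def answer_question(target: str, choices: Sequence[str],
                    similarity: Callable[[str, str], float]) -> str:
    """Return the choice that the similarity measure rates closest to the target."""
    return max(choices, key=lambda choice: similarity(target, choice))

# Hypothetical scores, made up for this example; a real system would
# compute them from a corpus or a lexicon.
TOY_SCORES = {
    ("rusty", "corroded"): 0.9,
    ("rusty", "black"): 0.2,
    ("rusty", "dirty"): 0.5,
    ("rusty", "painted"): 0.1,
}

def toy_similarity(a: str, b: str) -> float:
    """Look up the invented score for a word pair (0.0 if unknown)."""
    return TOY_SCORES.get((a, b), 0.0)

picked = answer_question("rusty", ["corroded", "black", "dirty", "painted"],
                         toy_similarity)
print(picked)
```

Swapping `toy_similarity` for a real measure and counting how often the picked choice matches the solution gives the percent correct reported for an algorithm.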

Table of results

Algorithm	Reference for algorithm	Reference for experiment	Type	Correct	95% confidence
Random	Random guessing	1 / 4 = 25.00%	Random	25.00%	14.63-40.34%
RES	Resnik (1995)	Jarmasz and Szpakowicz (2003)	Hybrid	32.66%	21.21-48.77%
LC	Leacock and Chodorow (1998)	Jarmasz and Szpakowicz (2003)	Lexicon-based	36.00%	22.92-50.81%
LIN	Lin (1998)	Jarmasz and Szpakowicz (2003)	Hybrid	36.00%	22.92-50.81%
JC	Jiang and Conrath (1997)	Jarmasz and Szpakowicz (2003)	Hybrid	36.00%	22.92-50.81%
HSO	Hirst and St-Onge (1998)	Jarmasz and Szpakowicz (2003)	Lexicon-based	62.00%	47.18-75.35%
PMI-IR	Turney (2001)	Turney (2001)	Corpus-based	74.00%	59.66-85.37%
PMI-IR	Terra and Clarke (2003)	Terra and Clarke (2003)	Corpus-based	80.00%	66.28-89.97%
JS	Jarmasz and Szpakowicz (2003)	Jarmasz and Szpakowicz (2003)	Lexicon-based	82.00%	68.56-91.42%

Explanation of table

Algorithm = name of algorithm
Reference for algorithm = where to find out more about given algorithm
Reference for experiment = where to find out more about evaluation of given algorithm with ESL questions
Type = general type of algorithm: corpus-based, lexicon-based, hybrid
Correct = percent of 50 questions that given algorithm answered correctly
95% confidence = confidence interval calculated using the Binomial Exact Test
table rows sorted in order of increasing percent correct
several WordNet-based similarity measures are implemented in Ted Pedersen's WordNet::Similarity package
PMI-IR = Pointwise Mutual Information - Information Retrieval
Terra and Clarke (2003) call the ESL Synonym Questions "TS1"
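
The confidence intervals can be approximated with a short script. The sketch below assumes that "Binomial Exact Test" refers to the Clopper-Pearson interval, which inverts the binomial tail probabilities; the function names are illustrative, and the bounds are found by simple bisection using only the standard library.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def _root(f, lo: float = 0.0, hi: float = 1.0, iters: int = 60) -> float:
    """Bisection for a function that changes sign once on [lo, hi]."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_interval(k: int, n: int, alpha: float = 0.05):
    """Clopper-Pearson interval for k correct answers out of n questions."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _root(
        lambda p: 1 - binom_cdf(k - 1, n, p) - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _root(
        lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper

# 37 of 50 correct corresponds to the 74.00% row of the table.
low, high = exact_interval(37, 50)
print(f"{low:.2%} - {high:.2%}")
```

The printed interval can be compared against the 74.00% row; with only 50 questions the intervals are wide, so neighbouring rows in the table overlap considerably.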

Caveats

the performance of a corpus-based algorithm depends on the corpus, so the difference in performance between two corpus-based systems may be due to the different corpora, rather than the different algorithms
the ESL questions include nouns, verbs, and adjectives, but some of the WordNet-based algorithms were only designed to work with nouns

References

Hirst, G., and St-Onge, D. (1998). Lexical chains as representation of context for the detection and correction of malapropisms. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 305-332.

Jarmasz, M., and Szpakowicz, S. (2003). Roget’s thesaurus and semantic similarity. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, September, pp. 212-219.

Jiang, J.J., and Conrath, D.W. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the International Conference on Research in Computational Linguistics, Taiwan.

Leacock, C., and Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 265-283.

Lin, D. (1998). An information-theoretic definition of similarity. Proceedings of the 15th International Conference on Machine Learning (ICML-98), Madison, WI, pp. 296-304.

Resnik, P. (1995). Using information content to evaluate semantic similarity. Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Montreal, pp. 448-453.

Terra, E., and Clarke, C.L.A. (2003). Frequency estimates for statistical word similarity measures. Proceedings of the Human Language Technology and North American Chapter of the Association for Computational Linguistics Conference 2003 (HLT/NAACL 2003), pp. 244-251.

Turney, P.D. (2001). Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001), Freiburg, Germany, pp. 491-502.

@@ Line 1: / Line 1: @@
+* ESL = English as a Second Language
+* 50 multiple-choice synonym questions; 4 choices per question
+* each question includes a sentence, providing context for the question
+* ESL questions available on request from [http://www.apperceptual.com/ Peter Turney]
+* introduced in Turney (2001) as a way of evaluating algorithms for measuring degree of similarity between words
+* subsequently used by many other researchers
+* see also: [[Similarity (State of the art)]]
+== Sample question ==
+::{| border="0" cellpadding="1" cellspacing="1"
+|-
+! Stem:
+|
+| "A '''rusty''' nail is not as strong as a clean, new one."
+|-
+! Choices:
+| (a)
+| corroded
+|-
+|
+| (b)
+| black
+|-
+|
+| (c)
+| dirty
+|-
+|
+| (d)
+| painted
+|-
+! Solution:
+| (a)
+| corroded
+|-
+|}
+== Table of results ==
+{| border="1" cellpadding="5" cellspacing="1" width="100%"
+|-
+! Algorithm
+! Reference for algorithm
+! Reference for experiment
+! Type
+! Correct
+! 95% confidence
+|-
+| Random
+| Random guessing
+| 1 / 4 = 25.00%
+| Random
+| 25.00%
+| 14.63-40.34%
+|-
+| RES
+| Resnik (1995)
+| Jarmasz and Szpakowicz (2003)
+| Hybrid
+| 32.66%
+| 21.21-48.77%
+|-
+| LC
+| Leacock and Chodorow (1998)
+| Jarmasz and Szpakowicz (2003)
+| Lexicon-based
+| 36.00%
+| 22.92-50.81%
+|-
+| LIN
+| Lin (1998)
+| Jarmasz and Szpakowicz (2003)
+| Hybrid
+| 36.00%
+| 22.92-50.81%
+|-
+| JC
+| Jiang and Conrath (1997)
+| Jarmasz and Szpakowicz (2003)
+| Hybrid
+| 36.00%
+| 22.92-50.81%
+|-
+| HSO
+| Hirst and St-Onge (1998)
+| Jarmasz and Szpakowicz (2003)
+| Lexicon-based
+| 62.00%
+| 47.18-75.35%
+|-
+| PMI-IR
+| Turney (2001)
+| Turney (2001)
+| Corpus-based
+| 74.00%
+| 59.66-85.37%
+|-
+| PMI-IR
+| Terra and Clarke (2003)
+| Terra and Clarke (2003)
+| Corpus-based
+| 80.00%
+| 66.28-89.97%
+|-
+| JS
+| Jarmasz and Szpakowicz (2003)
+| Jarmasz and Szpakowicz (2003)
+| Lexicon-based
+| 82.00%
+| 68.56-91.42%
+|-
+|}
+== Explanation of table ==
+* '''Algorithm''' = name of algorithm
+* '''Reference for algorithm''' = where to find out more about given algorithm
+* '''Reference for experiment''' = where to find out more about evaluation of given algorithm with ESL questions
+* '''Type''' = general type of algorithm: corpus-based, lexicon-based, hybrid
+* '''Correct''' = percent of 50 questions that given algorithm answered correctly
+* '''95% confidence''' = confidence interval calculated using the [[Statistical calculators|Binomial Exact Test]]
+* table rows sorted in order of increasing percent correct
+* several WordNet-based similarity measures are implemented in [http://www.d.umn.edu/~tpederse/ Ted Pedersen]'s [http://www.d.umn.edu/~tpederse/similarity.html WordNet::Similarity] package
+* PMI-IR = Pointwise Mutual Information - Information Retrieval
+* Terra and Clarke (2003) call the ESL Synonym Questions "TS1"
+== Caveats ==
+* the performance of a corpus-based algorithm depends on the corpus, so the difference in performance between two corpus-based systems may be due to the different corpora, rather than the different algorithms
+* the ESL questions include nouns, verbs, and adjectives, but some of the WordNet-based algorithms were only designed to work with nouns
+== References ==
+Hirst, G., and St-Onge, D. (1998). [http://mirror.eacoss.org/documentation/ITLibrary/IRIS/Data/1997/Hirst/Lexical/1997-Hirst-Lexical.pdf Lexical chains as representation of context for the detection and correction of malapropisms]. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, pp. 305-332.
+Jarmasz, M., and Szpakowicz, S. (2003). [http://www.csi.uottawa.ca/~szpak/recent_papers/TR-2003-01.pdf Roget’s thesaurus and semantic similarity]. ''Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03)'', Borovets, Bulgaria, September, pp. 212-219.
+Jiang, J.J., and Conrath, D.W. (1997). [http://wortschatz.uni-leipzig.de/~sbordag/aalw05/Referate/03_Assoziationen_BudanitskyResnik/Jiang_Conrath_97.pdf Semantic similarity based on corpus statistics and lexical taxonomy]. ''Proceedings of the International Conference on Research in Computational Linguistics'', Taiwan.
+Leacock, C., and Chodorow, M. (1998). [http://books.google.ca/books?id=Rehu8OOzMIMC&lpg=PA265&ots=IpnaLkZUec&lr&pg=PA265#v=onepage&q&f=false Combining local context and WordNet similarity for word sense identification]. In C. Fellbaum (ed.), ''WordNet: An Electronic Lexical Database''. Cambridge: MIT Press, pp. 265-283.
+Lin, D. (1998). [http://www.cs.ualberta.ca/~lindek/papers/sim.pdf An information-theoretic definition of similarity]. ''Proceedings of the 15th International Conference on Machine Learning (ICML-98)'', Madison, WI, pp. 296-304.
+Resnik, P. (1995). [http://citeseer.ist.psu.edu/resnik95using.html Using information content to evaluate semantic similarity]. ''Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95)'', Montreal, pp. 448-453.
+Terra, E., and Clarke, C.L.A. (2003). [http://acl.ldc.upenn.edu/N/N03/N03-1032.pdf Frequency estimates for statistical word similarity measures]. ''Proceedings of the Human Language Technology and North American Chapter of the Association for Computational Linguistics Conference 2003 (HLT/NAACL 2003)'', pp. 244-251.
+Turney, P.D. (2001). [http://arxiv.org/abs/cs.LG/0212033 Mining the Web for synonyms: PMI-IR versus LSA on TOEFL]. ''Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001)'', Freiburg, Germany, pp. 491-502.
+[[Category:State of the art]]
+[[Category:Similarity]]