SAT Analogy Questions (State of the art)

From ACL Wiki

  • SAT = Scholastic Aptitude Test
  • 374 multiple-choice analogy questions; 5 choices per question
  • SAT questions collected by Michael Littman (http://www.cs.rutgers.edu/~mlittman/), available on request from Peter Turney (http://www.apperceptual.com/)
  • introduced in Turney et al. (2003) as a way of evaluating algorithms for measuring relational similarity
  • see also: Similarity (State of the art), Analogy (State of the art)

Latest revision as of 09:03, 22 March 2017


Sample question

Stem:     mason:stone
Choices:  (a) teacher:chalk
          (b) carpenter:wood
          (c) soldier:gun
          (d) photograph:camera
          (e) book:word
Solution: (b) carpenter:wood
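
As a rough illustration of how such a question might be represented and scored, here is a minimal Python sketch. The class name `AnalogyQuestion`, the helper `percent_correct`, and the `random_guess` baseline are hypothetical names chosen for this example; they are not part of any published SAT analogy toolkit, and the actual question file format may differ.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

WordPair = Tuple[str, str]


@dataclass
class AnalogyQuestion:
    """One multiple-choice analogy question (hypothetical representation)."""
    stem: WordPair
    choices: List[WordPair]
    solution: int  # zero-based index of the correct choice


# The sample question from this page, with (b) carpenter:wood as the solution.
sample = AnalogyQuestion(
    stem=("mason", "stone"),
    choices=[
        ("teacher", "chalk"),
        ("carpenter", "wood"),
        ("soldier", "gun"),
        ("photograph", "camera"),
        ("book", "word"),
    ],
    solution=1,
)


def percent_correct(
    questions: List[AnalogyQuestion],
    answer_fn: Callable[[AnalogyQuestion], int],
) -> float:
    """Percentage of questions for which answer_fn picks the solution."""
    correct = sum(1 for q in questions if answer_fn(q) == q.solution)
    return 100.0 * correct / len(questions)


_rng = random.Random(0)  # fixed seed so runs are repeatable


def random_guess(q: AnalogyQuestion) -> int:
    """Baseline that picks one of the choices uniformly at random."""
    return _rng.randrange(len(q.choices))
```

With five choices per question, a guesser like `random_guess` is expected to score about 1 / 5 = 20% over many questions, which is the "Random" row in the table of results.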

Table of results

| Algorithm | Reference for algorithm | Reference for experiment | Type | Correct | 95% confidence |
|---|---|---|---|---|---|
| Random | Random guessing | 1 / 5 = 20.0% | Random | 20.0% | 16.1-24.5% |
| JC | Jiang and Conrath (1997) | Turney (2006b) | Hybrid | 27.3% | 23.1-32.4% |
| LIN | Lin (1998) | Turney (2006b) | Hybrid | 27.3% | 23.1-32.4% |
| LC | Leacock and Chodorow (1998) | Turney (2006b) | Lexicon-based | 31.3% | 26.9-36.5% |
| HSO | Hirst and St-Onge (1998) | Turney (2006b) | Lexicon-based | 32.1% | 27.6-37.4% |
| RES | Resnik (1995) | Turney (2006b) | Hybrid | 33.2% | 28.7-38.5% |
| PMI-IR | Turney (2001) | Turney (2006b) | Corpus-based | 35.0% | 30.2-40.1% |
| LSA+Predication | Mangalath et al. (2004) | Mangalath et al. (2004) | Corpus-based | 42.0% | 37.2-47.4% |
| KNOW-BEST | Veale (2004) | Veale (2004) | Lexicon-based | 43.0% | 38.0-48.2% |
| k-means | Bicici and Yuret (2006) | Bicici and Yuret (2006) | Corpus-based | 44.0% | 39.0-49.3% |
| BagPack | Herdağdelen and Baroni (2009) | Herdağdelen and Baroni (2009) | Corpus-based | 44.1% | 39.0-49.3% |
| VSM | Turney and Littman (2005) | Turney and Littman (2005) | Corpus-based | 47.1% | 42.2-52.5% |
| Dual-Space | Turney (2012) | Turney (2012) | Corpus-based | 51.1% | 46.1-56.5% |
| BMI | Bollegala et al. (2009) | Bollegala et al. (2009) | Corpus-based | 51.1% | 46.1-56.5% |
| PairClass | Turney (2008) | Turney (2008) | Corpus-based | 52.1% | 46.9-57.3% |
| PERT | Turney (2006a) | Turney (2006a) | Corpus-based | 53.5% | 48.5-58.9% |
| SuperSim | Turney (2013) | Turney (2013) | Corpus-based | 54.8% | 49.6-59.9% |
| ConceptNet | Speer et al. (2017) | Speer et al. (2017) | Hybrid | 56.1% | 51.0-61.2% |
| LRA | Turney (2006b) | Turney (2006b) | Corpus-based | 56.1% | 51.0-61.2% |
| Human | Average US college applicant | Turney and Littman (2005) | Human | 57.0% | 52.0-62.3% |
| Human Voting | Lofi (2013) | Lofi (2013) | Human Voting | 81.5% | 77.2-85.4% |

Explanation of table

  • Algorithm = name of algorithm
  • Reference for algorithm = where to find out more about given algorithm
  • Reference for experiment = where to find out more about evaluation of given algorithm with SAT questions
  • Type = general type of algorithm: corpus-based, lexicon-based, hybrid
  • Correct = percent of 374 questions that given algorithm answered correctly
  • 95% confidence = confidence interval calculated using the Binomial Exact Test
  • table rows sorted in order of increasing percent correct
  • several WordNet-based similarity measures are implemented in Ted Pedersen's WordNet::Similarity package
  • KNOW-BEST = KNOWledge-Based Entertainment and Scholastic Testing
  • VSM = Vector Space Model
  • LRA = Latent Relational Analysis
  • PERT = Pertinence
  • PMI-IR = Pointwise Mutual Information - Information Retrieval
  • LSA+Predication = Latent Semantic Analysis + Predication
  • BagPack = Bag of words representation of Paired concept knowledge
  • ConceptNet = ConceptNet Numberbatch 2016.09
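
The "95% confidence" column can be approximated with a short script. The sketch below assumes that the Binomial Exact Test refers to the Clopper-Pearson interval, which is the usual exact binomial interval; the page itself does not say which variant the calculator uses. The function names `binom_cdf` and `exact_interval` are made up for this example, and the count of 210 correct answers for the 56.1% rows is inferred from 0.561 × 374 rather than stated in the table.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target: float) -> float:
    """Find p in [0, 1] with f(p) = target, where f decreases in p (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies at a larger p
        else:
            hi = mid
    return (lo + hi) / 2


def exact_interval(k: int, n: int, alpha: float = 0.05):
    """Clopper-Pearson interval for k successes in n trials (assumed method)."""
    # Lower bound: the p at which P(X >= k) = alpha / 2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2
    )
    # Upper bound: the p at which P(X <= k) = alpha / 2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2
    )
    return lower, upper


# Assumed count: 210 of 374 correct, i.e. about 56.1%.
low, high = exact_interval(210, 374)
print(f"{100 * low:.1f}-{100 * high:.1f}%")
```

Because each interval spans roughly ten percentage points at n = 374, neighbouring rows in the table often have overlapping intervals.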

References

Bicici, E., and Yuret, D. (2006). Clustering word pairs to answer analogy questions. Proceedings of the Fifteenth Turkish Symposium on Artificial Intelligence and Neural Networks (TAINN 2006).

Bollegala, D., Matsuo, Y., and Ishizuka, M. (2009). Measuring the similarity between implicit semantic relations from the web. Proceedings of the 18th International Conference on World Wide Web, ACM, pp. 651-660.

Herdağdelen, A., and Baroni, M. (2009). BagPack: A general framework to represent semantic relations. Proceedings of the EACL 2009 Geometrical Models for Natural Language Semantics (GEMS) Workshop, East Stroudsburg PA: ACL, pp. 33-40.

Hirst, G., and St-Onge, D. (1998). Lexical chains as representation of context for the detection and correction of malapropisms. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 305-332.

Jiang, J.J., and Conrath, D.W. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the International Conference on Research in Computational Linguistics, Taiwan.

Leacock, C., and Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (ed.), WordNet: An Electronic Lexical Database. Cambridge: MIT Press, pp. 265-283.

Lin, D. (1998). An information-theoretic definition of similarity. Proceedings of the 15th International Conference on Machine Learning (ICML-98), Madison, WI, pp. 296-304.

Lofi, C. (2013). Just ask a human?--Controlling Quality in Relational Similarity and Analogy Processing using the Crowd. Proceedings of the Workshop of the 15th BTW Conference on Database Systems for Business, Technology, and Web (BTW 2013), Magdeburg, Germany, pp. 197-210.

Mangalath, P., Quesada, J., and Kintsch, W. (2004). Analogy-making as predication using relational information and LSA vectors. In K.D. Forbus, D. Gentner & T. Regier (Eds.), Proceedings of the 26th Annual Meeting of the Cognitive Science Society. Chicago: Lawrence Erlbaum Associates.

Resnik, P. (1995). Using information content to evaluate semantic similarity. Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Montreal, pp. 448-453.

Speer, R., Chin, J., and Havasi, C. (2017). ConceptNet 5.5: An Open Multilingual Graph of General Knowledge. Proceedings of The 31st AAAI Conference on Artificial Intelligence, San Francisco, CA.

Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). Combining independent modules to solve multiple-choice synonym and analogy problems. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria, pp. 482-489.

Turney, P.D., and Littman, M.L. (2005). Corpus-based learning of analogies and semantic relations. Machine Learning, 60 (1-3), 251-278.

Turney, P.D. (2001). Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001), Freiburg, Germany, pp. 491-502.

Turney, P.D. (2006a). Expressing implicit semantic relations without supervision. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (Coling/ACL-06), Sydney, Australia, pp. 313-320.

Turney, P.D. (2006b). Similarity of semantic relations. Computational Linguistics, 32 (3), 379-416.

Turney, P.D. (2008). A uniform approach to analogies, synonyms, antonyms, and associations. Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), Manchester, UK, pp. 905-912.

Turney, P.D. (2012). Domain and function: A dual-space model of semantic relations and compositions. Journal of Artificial Intelligence Research (JAIR), 44, 533-585.

Turney, P.D. (2013). Distributional semantics beyond words: Supervised learning of analogy and paraphrase. Transactions of the Association for Computational Linguistics (TACL), 1, 353-366.

Veale, T. (2004). WordNet sits the SAT: A knowledge-based approach to lexical analogy. Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004), Valencia, Spain, pp. 606-612.