Difference between revisions of "Question Answering (State of the art)"

From ACL Wiki
(Add TREC QA results with aNMM model from Yang et al. in CIKM 2016)
(2 intermediate revisions by 2 users not shown)
Line 5: Line 5:
 
* [http://cs.stanford.edu/people/mengqiu/data/qg-emnlp07-data.tgz QA Answer Sentence Selection Dataset]: labeled sentences using TREC QA track data, provided by [http://cs.stanford.edu/people/mengqiu/ Mengqiu Wang] and first used in [http://www.aclweb.org/anthology/D/D07/D07-1003.pdf Wang et al. (2007)].  
 
 
* Over time, the original dataset diverged into two versions due to different pre-processing in recent publications: both have the same training set but their development and test sets differ. The Raw version has 82 questions in the development set and 100 questions in the test set; the Clean version (Wang and Ittycheriah 2015, Tan et al. 2015, dos Santos et al. 2016, Wang et al. 2016) removed questions with no answers or with only positive/negative answers, and thus has only 65 questions in the development set and 68 questions in the test set.
 
* Note: MAP/MRR scores on the two versions of TREC QA data (Clean vs Raw) are not comparable according to [http://www.cs.umd.edu/~jinfeng/publications/PairwiseNeuralNetwork_CIKM2016.pdf Rao et al. (2016)].  
+
* Note: MAP/MRR scores on the two versions of TREC QA data (Clean vs Raw) are not comparable according to [https://dl.acm.org/authorize.cfm?key=N27026 Rao et al. (2016)].  
  
  
Line 138: Line 138:
 
| 0.801
 
 
| 0.877
 
 +
|-
 +
| Wang et al.  (2017) - BiMPM
 +
| Wang et al.  (2017)
 +
| 0.802
 +
| 0.875
 
|}
 
|}
  
Line 160: Line 165:
 
* Hua He, Kevin Gimpel and Jimmy Lin. 2015. [http://aclweb.org/anthology/D/D15/D15-1181.pdf Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks]. In EMNLP 2015.
 
 
* Hua He and Jimmy Lin. 2016. [https://cs.uwaterloo.ca/~jimmylin/publications/He_etal_NAACL-HTL2016.pdf Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement]. In NAACL 2016.
 
* Liu Yang, Qingyao Ai, Jiafeng Guo, W. Bruce Croft. 2016. [http://maroo.cs.umass.edu/pub/web/getpdf.php?id=1240 aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model]. In CIKM 2016
+
* Liu Yang, Qingyao Ai, Jiafeng Guo, W. Bruce Croft. 2016. [http://maroo.cs.umass.edu/pub/web/getpdf.php?id=1240 aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model]. In CIKM 2016.
* Jinfeng Rao, Hua He and Jimmy Lin. 2016. [http://www.cs.umd.edu/~jinfeng/publications/PairwiseNeuralNetwork_CIKM2016.pdf Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks]. In CIKM 2016
+
* Jinfeng Rao, Hua He and Jimmy Lin. 2016. [https://dl.acm.org/authorize.cfm?key=N27026 Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks]. In CIKM 2016.
 
[[Category:State of the art]]
 
 +
* Zhiguo Wang, Wael Hamza and Radu Florian. 2017. [https://arxiv.org/pdf/1702.03814.pdf Bilateral Multi-Perspective Matching for Natural Language Sentences]. In eprint arXiv:1702.03814.

Revision as of 20:28, 13 February 2017

Answer Sentence Selection

The task of answer sentence selection is designed for the open-domain question answering setting. Given a question and a set of candidate sentences, the task is to choose the correct sentence that contains the exact answer and can sufficiently support the answer choice.
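
As a minimal sketch of the task's input and output, the snippet below ranks candidate sentences for a question and selects the top one. The word-overlap scorer, the function names (`tokenize`, `overlap_score`, `rank_candidates`) and the toy question and sentences are all assumptions made for illustration; they are not taken from the dataset or from any system listed on this page.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(question, sentence):
    """Hypothetical scorer: fraction of question tokens that occur in the sentence."""
    q_tokens = tokenize(question)
    if not q_tokens:
        return 0.0
    return len(q_tokens & tokenize(sentence)) / len(q_tokens)


def rank_candidates(question, candidates, scorer=overlap_score):
    """Return the candidates sorted by score, best first."""
    return sorted(candidates, key=lambda s: scorer(question, s), reverse=True)


# Toy example (invented for illustration only).
question = "Who wrote Hamlet?"
candidates = [
    "Hamlet is set in Denmark.",
    "William Shakespeare wrote Hamlet around 1600.",
    "The play has five acts.",
]
ranking = rank_candidates(question, candidates)
best = ranking[0]  # the sentence selected as the answer
print(best)
```

Any scoring function with the same signature could be passed as `scorer`; the systems in the tables below differ mainly in how that score is computed.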

  • QA Answer Sentence Selection Dataset: labeled sentences using TREC QA track data, provided by Mengqiu Wang and first used in Wang et al. (2007).
  • Over time, the original dataset diverged into two versions due to different pre-processing in recent publications: both have the same training set but their development and test sets differ. The Raw version has 82 questions in the development set and 100 questions in the test set; the Clean version (Wang and Ittycheriah 2015, Tan et al. 2015, dos Santos et al. 2016, Wang et al. 2016) removed questions with no answers or with only positive/negative answers, and thus has only 65 questions in the development set and 68 questions in the test set.
  • Note: MAP/MRR scores on the two versions of TREC QA data (Clean vs Raw) are not comparable according to Rao et al. (2016).
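
The effect of the Clean filtering on MAP and MRR can be illustrated with a small sketch. It assumes each question is represented by the 0/1 relevance labels of its candidates in ranked order, and that a question without any positive candidate contributes 0 to both averages; actual evaluation tools may treat such questions differently. The function names and the toy labels are hypothetical.

```python
def average_precision(labels):
    """Average precision for one question; labels are 0/1 in ranked order."""
    hits, total = 0, 0.0
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def reciprocal_rank(labels):
    """1 / rank of the first positive candidate, or 0.0 if there is none."""
    for rank, label in enumerate(labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def keep_in_clean(labels):
    """Assumed Clean rule: keep questions with both positive and negative candidates."""
    return any(labels) and not all(labels)


def evaluate(questions):
    """Return (MAP, MRR) averaged over the given questions."""
    if not questions:
        return 0.0, 0.0
    n = len(questions)
    mean_ap = sum(average_precision(q) for q in questions) / n
    mean_rr = sum(reciprocal_rank(q) for q in questions) / n
    return mean_ap, mean_rr


# Toy ranked labels (invented): one list per question.
raw = [[0, 1, 0, 1], [1, 0, 0], [0, 0, 0], [1, 1]]
clean = [q for q in raw if keep_in_clean(q)]

raw_map, raw_mrr = evaluate(raw)
clean_map, clean_mrr = evaluate(clean)
print(raw_map, raw_mrr, clean_map, clean_mrr)
```

On this toy input the same rankings yield different averages before and after filtering, which is the reason scores on the two versions should not be compared directly.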


{|
! Algorithm - Raw Version of TREC QA !! Reference !! MAP !! MRR
|-
| Punyakanok (2004) || Wang et al. (2007) || 0.419 || 0.494
|-
| Cui (2005) || Wang et al. (2007) || 0.427 || 0.526
|-
| Wang (2007) || Wang et al. (2007) || 0.603 || 0.685
|-
| H&S (2010) || Heilman and Smith (2010) || 0.609 || 0.692
|-
| W&M (2010) || Wang and Manning (2010) || 0.595 || 0.695
|-
| Yao (2013) || Yao et al. (2013) || 0.631 || 0.748
|-
| S&M (2013) || Severyn and Moschitti (2013) || 0.678 || 0.736
|-
| Shnarch (2013) - Backward || Shnarch (2013) || 0.686 || 0.754
|-
| Yih (2013) - LCLR || Yih et al. (2013) || 0.709 || 0.770
|-
| Yu (2014) - TRAIN-ALL bigram+count || Yu et al. (2014) || 0.711 || 0.785
|-
| W&N (2015) - Three-Layer BLSTM+BM25 || Wang and Nyberg (2015) || 0.713 || 0.791
|-
| Feng (2015) - Architecture-II || Tan et al. (2015) || 0.711 || 0.800
|-
| S&M (2015) || Severyn and Moschitti (2015) || 0.746 || 0.808
|-
| Yang (2016) - Attention-Based Neural Matching Model || Yang et al. (2016) || 0.750 || 0.811
|-
| H&L (2016) - Pairwise Word Interaction Modelling || He and Lin (2016) || 0.758 || 0.822
|-
| H&L (2015) - Multi-Perspective CNN || He and Lin (2015) || 0.762 || 0.830
|-
| Rao (2016) - PairwiseRank + Multi-Perspective CNN || Rao et al. (2016) || 0.780 || 0.834
|}


{|
! Algorithm - Clean Version of TREC QA !! Reference !! MAP !! MRR
|-
| W&I (2015) || Wang and Ittycheriah (2015) || 0.746 || 0.820
|-
| Tan (2015) - QA-LSTM/CNN+attention || Tan et al. (2015) || 0.728 || 0.832
|-
| dos Santos (2016) - Attentive Pooling CNN || dos Santos et al. (2016) || 0.753 || 0.851
|-
| Wang et al. (2016) - L.D.C Model || Wang et al. (2016) || 0.771 || 0.845
|-
| H&L (2015) - Multi-Perspective CNN || He and Lin (2015) || 0.777 || 0.836
|-
| Rao et al. (2016) - PairwiseRank + Multi-Perspective CNN || Rao et al. (2016) || 0.801 || 0.877
|-
| Wang et al. (2017) - BiMPM || Wang et al. (2017) || 0.802 || 0.875
|}

References