Difference between revisions of "Question Answering (State of the art)"

From ACL Wiki
Line 3: Line 3:
 
The task of answer sentence selection is designed for the open-domain question answering setting. Given a question and a set of candidate sentences, the task is to choose the correct sentence that contains the exact answer and can sufficiently support the answer choice.  
 
The task of answer sentence selection is designed for the open-domain question answering setting. Given a question and a set of candidate sentences, the task is to choose the correct sentence that contains the exact answer and can sufficiently support the answer choice.  
  
* [http://cs.stanford.edu/people/mengqiu/data/qg-emnlp07-data.tgz QA Answer Sentence Selection Dataset]: labeled sentences using TREC QA track data, provided by [http://cs.stanford.edu/people/mengqiu/ Mengqiu Wang] and first used in [http://www.aclweb.org/anthology/D/D07/D07-1003.pdf Wang et al. (2007)].
+
* [http://cs.stanford.edu/people/mengqiu/data/qg-emnlp07-data.tgz QA Answer Sentence Selection Dataset]: labeled sentences using TREC QA track data, provided by [http://cs.stanford.edu/people/mengqiu/ Mengqiu Wang] and first used in [http://www.aclweb.org/anthology/D/D07/D07-1003.pdf Wang et al. (2007)]. Over time, this dataset diverged into two versions: both share the same training set, but their development and test sets differ because of different pre-processing. The raw version has 82 questions in the development set and 100 questions in the test set. The raw dataset was later cleaned by several researchers (Tan et al. 2015, dos Santos et al. 2016, Wang et al. 2016), who removed questions with no answers or with only positive or only negative answers, leaving 65 questions in the development set and 68 questions in the test set.
  
  
 
{| border="1" cellpadding="5" cellspacing="1"
 
{| border="1" cellpadding="5" cellspacing="1"
 
|-
 
|-
! Algorithm
+
! Algorithm -- Raw Version
 
! Reference
 
! Reference
 
! [http://en.wikipedia.org/wiki/Mean_average_precision MAP]
 
! [http://en.wikipedia.org/wiki/Mean_average_precision MAP]
Line 82: Line 82:
 
| 0.746
 
| 0.746
 
| 0.820
 
| 0.820
 +
|-
 +
| H&L (2015)
 +
| He and Lin (2016)
 +
| 0.755
 +
| 0.825
 +
|-
 +
| Rao (2016) - Pairwise + Multiple Perspective CNN
 +
| Rao et al. (2016)
 +
| 0.780
 +
| 0.834
 +
|}
 +
 +
 +
{| border="1" cellpadding="5" cellspacing="1"
 +
|-
 +
! Algorithm -- Clean Version
 +
! Reference
 +
! [http://en.wikipedia.org/wiki/Mean_average_precision MAP]
 +
! [http://en.wikipedia.org/wiki/Mean_reciprocal_rank MRR]
 
|-
 
|-
 
| Tan (2015) - QA-LSTM/CNN+attention  
 
| Tan (2015) - QA-LSTM/CNN+attention  
Line 97: Line 116:
 
| 0.771
 
| 0.771
 
| 0.845
 
| 0.845
 +
|-
 +
| Rao et al.  (2016) - Pairwise + Multiple Perspective CNN
 +
| Rao et al. (2016)
 +
| 0.801
 +
| 0.877
 
|}
 
|}
  
Line 117: Line 141:
 
* Cicero dos Santos, Ming Tan, Bing Xiang & Bowen Zhou. 2016. [http://arxiv.org/abs/1602.03609 Attentive Pooling Networks]. In eprint arXiv:1602.03609.
 
* Cicero dos Santos, Ming Tan, Bing Xiang & Bowen Zhou. 2016. [http://arxiv.org/abs/1602.03609 Attentive Pooling Networks]. In eprint arXiv:1602.03609.
 
* Zhiguo Wang, Haitao Mi and Abraham Ittycheriah. 2016. [http://arxiv.org/pdf/1602.07019v1.pdf Sentence Similarity Learning by Lexical Decomposition and Composition]. In eprint arXiv:1602.07019.
 
* Zhiguo Wang, Haitao Mi and Abraham Ittycheriah. 2016. [http://arxiv.org/pdf/1602.07019v1.pdf Sentence Similarity Learning by Lexical Decomposition and Composition]. In eprint arXiv:1602.07019.
 +
* Hua He and Jimmy Lin. 2016. [https://cs.uwaterloo.ca/~jimmylin/publications/He_etal_NAACL-HTL2016.pdf Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement]. In NAACL 2016.
 +
* Jinfeng Rao, Hua He and Jimmy Lin. 2016. [http://www.cs.umd.edu/~jinfeng/publications/PairwiseNeuralNetwork_CIKM2016.pdf Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks]. In CIKM 2016
 
[[Category:State of the art]]
 
[[Category:State of the art]]

Revision as of 10:17, 20 October 2016

Answer Sentence Selection

The task of answer sentence selection is designed for the open-domain question answering setting. Given a question and a set of candidate sentences, the task is to choose the correct sentence that contains the exact answer and can sufficiently support the answer choice.

  • QA Answer Sentence Selection Dataset: labeled sentences using TREC QA track data, provided by Mengqiu Wang and first used in Wang et al. (2007). Over time, this dataset diverged into two versions: both share the same training set, but their development and test sets differ because of different pre-processing. The raw version has 82 questions in the development set and 100 questions in the test set. The raw dataset was later cleaned by several researchers (Tan et al. 2015, dos Santos et al. 2016, Wang et al. 2016), who removed questions with no answers or with only positive or only negative answers, leaving 65 questions in the development set and 68 questions in the test set.
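
The cleaning step described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the function name `clean_questions`, the dictionary layout (question id mapped to a list of 0/1 candidate labels), and the toy data are all hypothetical, and the cited papers may have implemented the filter differently.

```python
from typing import Dict, List


def clean_questions(labels_by_question: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Keep only questions that have at least one positive (1) and
    at least one negative (0) candidate sentence.

    Hypothetical helper: the data layout is an assumption, not the
    format of the original dataset files.
    """
    return {
        qid: labels
        for qid, labels in labels_by_question.items()
        # 0 < positives < total means the candidates are mixed.
        if 0 < sum(labels) < len(labels)
    }


# Toy data, invented for illustration only.
toy = {
    "q1": [1, 0, 0],  # mixed labels: kept
    "q2": [0, 0],     # only negative candidates: dropped
    "q3": [1, 1],     # only positive candidates: dropped
    "q4": [],         # no candidates at all: dropped
}

cleaned = clean_questions(toy)
print(sorted(cleaned))
```

Questions whose candidates are all positive or all negative carry no ranking signal, which is presumably why they were removed: any ordering of their candidates scores the same.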


{| border="1" cellpadding="5" cellspacing="1"
|-
! Algorithm -- Raw Version !! Reference !! MAP !! MRR
|-
| Punyakanok (2004) || Wang et al. (2007) || 0.419 || 0.494
|-
| Cui (2005) || Wang et al. (2007) || 0.427 || 0.526
|-
| Wang (2007) || Wang et al. (2007) || 0.603 || 0.685
|-
| H&S (2010) || Heilman and Smith (2010) || 0.609 || 0.692
|-
| W&M (2010) || Wang and Manning (2010) || 0.595 || 0.695
|-
| Yao (2013) || Yao et al. (2013) || 0.631 || 0.748
|-
| S&M (2013) || Severyn and Moschitti (2013) || 0.678 || 0.736
|-
| Shnarch (2013) - Backward || Shnarch (2013) || 0.686 || 0.754
|-
| Yih (2013) - LCLR || Yih et al. (2013) || 0.709 || 0.770
|-
| Yu (2014) - TRAIN-ALL bigram+count || Yu et al. (2014) || 0.711 || 0.785
|-
| W&N (2015) - Three-Layer BLSTM+BM25 || Wang and Nyberg (2015) || 0.713 || 0.791
|-
| Feng (2015) - Architecture-II || Tan et al. (2015) || 0.711 || 0.800
|-
| S&M (2015) || Severyn and Moschitti (2015) || 0.746 || 0.808
|-
| W&I (2015) || Wang and Ittycheriah (2015) || 0.746 || 0.820
|-
| H&L (2015) || He and Lin (2016) || 0.755 || 0.825
|-
| Rao (2016) - Pairwise + Multiple Perspective CNN || Rao et al. (2016) || 0.780 || 0.834
|}


{| border="1" cellpadding="5" cellspacing="1"
|-
! Algorithm -- Clean Version !! Reference !! MAP !! MRR
|-
| Tan (2015) - QA-LSTM/CNN+attention || Tan et al. (2015) || 0.728 || 0.832
|-
| dos Santos (2016) - Attentive Pooling CNN || dos Santos et al. (2016) || 0.753 || 0.851
|-
| Wang et al. (2016) - Lexical Decomposition and Composition || Wang et al. (2016) || 0.771 || 0.845
|-
| Rao et al. (2016) - Pairwise + Multiple Perspective CNN || Rao et al. (2016) || 0.801 || 0.877
|}
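
The MAP and MRR figures reported above are mean average precision and mean reciprocal rank over the questions in a test set. A minimal sketch of the two metrics follows, assuming each question is represented as a list of 0/1 relevance labels already sorted by the system's score (best first). The function names and toy rankings are hypothetical, and the evaluation scripts actually used by the cited papers may differ in details such as how questions without a correct answer are handled.

```python
from typing import List


def average_precision(labels: List[int]) -> float:
    """Average precision for one question.

    `labels` holds 0/1 relevance labels in ranked order (best first).
    Returns 0.0 when there is no relevant candidate (an assumption).
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def reciprocal_rank(labels: List[int]) -> float:
    """1 / rank of the first relevant candidate, or 0.0 if none."""
    for rank, label in enumerate(labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def mean_average_precision(rankings: List[List[int]]) -> float:
    """Mean of per-question average precision (rankings must be non-empty)."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


def mean_reciprocal_rank(rankings: List[List[int]]) -> float:
    """Mean of per-question reciprocal rank (rankings must be non-empty)."""
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


# Toy rankings for two questions, invented for illustration only.
rankings = [
    [1, 0, 1],  # AP = (1/1 + 2/3) / 2, RR = 1
    [0, 1, 0],  # AP = 1/2,             RR = 1/2
]

print(round(mean_average_precision(rankings), 3))  # 0.667
print(mean_reciprocal_rank(rankings))              # 0.75
```

MRR only rewards the position of the first correct sentence, while MAP accounts for every correct sentence in the ranking, which is why the two columns can order systems differently.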

References