Difference between revisions of "Question Answering (State of the art)"
Revision as of 10:23, 15 May 2020
Answer Sentence Selection
The task of answer sentence selection is designed for the open-domain question answering setting. Given a question and a set of candidate sentences, the task is to choose the correct sentence that contains the exact answer and can sufficiently support the answer choice.
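The input and output of the task can be illustrated with a minimal sketch. The word-overlap scorer below is a hypothetical baseline chosen only for illustration, and the question and candidate sentences are made-up examples; none of this is taken from the systems listed on this page.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_candidates(question, candidates):
    """Return the candidates sorted by word overlap with the question.

    Hypothetical baseline scorer (an assumption of this sketch): the score
    of a candidate is the number of distinct tokens it shares with the
    question. Ties keep their input order because sorted() is stable.
    """
    q_tokens = tokenize(question)
    scored = [(len(q_tokens & tokenize(c)), c) for c in candidates]
    return [c for _, c in sorted(scored, key=lambda pair: pair[0], reverse=True)]


# Made-up example data, not from TREC QA.
question = "Who wrote the novel Dracula?"
candidates = [
    "Dracula is set partly in Transylvania.",
    "Bram Stoker wrote the novel Dracula in 1897.",
    "The novel was adapted for film many times.",
]

ranked = rank_candidates(question, candidates)
print(ranked[0])
```

A real system replaces the overlap count with a learned scoring function; the ranked list it produces is what the MAP and MRR columns in the tables evaluate.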
- QA Answer Sentence Selection Dataset: labeled sentences using TREC QA track data, provided by Mengqiu Wang and first used in Wang et al. (2007).
- Over time, the original dataset diverged into two versions because of different pre-processing in later publications: both share the same training set, but their development and test sets differ. The Raw version has 82 questions in the development set and 100 questions in the test set. The Clean version (Wang and Ittycheriah 2015, Tan et al. 2015, dos Santos et al. 2016, Wang et al. 2016) removed questions with no answers or with only positive or only negative answers, and thus has only 65 questions in the development set and 68 questions in the test set.
- Note: MAP/MRR scores on the two versions of TREC QA data (Clean vs Raw) are not comparable according to Rao et al. (2016).
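The Clean filtering rule and the two metrics can be sketched as follows. This is a minimal illustration under stated assumptions: the data layout (a dict mapping a question id to `(score, label)` pairs), the function names, and the toy scores are all hypothetical, and scoring a question with no correct candidate as 0 is a choice made for this sketch; evaluation tools may treat such questions differently.

```python
def average_precision(labels):
    """Average precision for one question; labels are 1/0 in ranked order."""
    hits, total = 0, 0.0
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    # Assumption of this sketch: no correct candidate means a score of 0.
    return total / hits if hits else 0.0


def reciprocal_rank(labels):
    """1 / rank of the first correct candidate, or 0 if there is none."""
    for rank, label in enumerate(labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def evaluate(questions):
    """Return (MAP, MRR) over a dict of question id -> [(score, label), ...]."""
    aps, rrs = [], []
    for pairs in questions.values():
        ordered = sorted(pairs, key=lambda p: p[0], reverse=True)
        labels = [label for _, label in ordered]
        aps.append(average_precision(labels))
        rrs.append(reciprocal_rank(labels))
    n = len(questions)
    return sum(aps) / n, sum(rrs) / n


def clean_filter(questions):
    """Keep questions with at least one positive and one negative candidate."""
    return {
        qid: pairs
        for qid, pairs in questions.items()
        if 0 < sum(label for _, label in pairs) < len(pairs)
    }


# Toy model outputs (hypothetical): label 1 marks a correct sentence.
toy = {
    "q1": [(0.9, 0), (0.8, 1), (0.1, 1)],
    "q2": [(0.7, 1), (0.2, 0)],
    "q3": [(0.5, 0), (0.4, 0)],  # no correct answer
    "q4": [(0.6, 1), (0.3, 1)],  # only correct answers
}

raw_map, raw_mrr = evaluate(toy)
clean_map, clean_mrr = evaluate(clean_filter(toy))
print(f"raw:   MAP={raw_map:.3f} MRR={raw_mrr:.3f}")
print(f"clean: MAP={clean_map:.3f} MRR={clean_mrr:.3f}")
```

Because the filter changes which questions enter the average, the same model outputs yield different MAP and MRR values on the two sets, which is one way to see why the two tables should be read separately.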
Algorithm - Raw Version of TREC QA | Reference | MAP | MRR |
---|---|---|---|
Punyakanok (2004) | Wang et al. (2007) | 0.419 | 0.494 |
Cui (2005) | Wang et al. (2007) | 0.427 | 0.526 |
Wang (2007) | Wang et al. (2007) | 0.603 | 0.685 |
H&S (2010) | Heilman and Smith (2010) | 0.609 | 0.692 |
W&M (2010) | Wang and Manning (2010) | 0.595 | 0.695 |
Yao (2013) | Yao et al. (2013) | 0.631 | 0.748 |
S&M (2013) | Severyn and Moschitti (2013) | 0.678 | 0.736 |
Shnarch (2013) - Backward | Shnarch (2013) | 0.686 | 0.754 |
Yih (2013) - LCLR | Yih et al. (2013) | 0.709 | 0.770 |
Yu (2014) - TRAIN-ALL bigram+count | Yu et al. (2014) | 0.711 | 0.785 |
W&N (2015) - Three-Layer BLSTM+BM25 | Wang and Nyberg (2015) | 0.713 | 0.791 |
Feng (2015) - Architecture-II | Tan et al. (2015) | 0.711 | 0.800 |
S&M (2015) | Severyn and Moschitti (2015) | 0.746 | 0.808 |
Yang (2016) - Attention-Based Neural Matching Model | Yang et al. (2016) | 0.750 | 0.811 |
Tay (2017) - Holographic Dual LSTM Architecture | Tay et al. (2017) | 0.750 | 0.815 |
H&L (2016) - Pairwise Word Interaction Modelling | He and Lin (2016) | 0.758 | 0.822 |
H&L (2015) - Multi-Perspective CNN | He and Lin (2015) | 0.762 | 0.830 |
Tay (2017) - HyperQA (Hyperbolic Embeddings) | Tay et al. (2017) | 0.770 | 0.825 |
Rao (2016) - PairwiseRank + Multi-Perspective CNN | Rao et al. (2016) | 0.780 | 0.834 |
Rao (2019) - Hybrid Co-Attention Network (HCAN) | Rao et al. (2019) | 0.774 | 0.843 |
Tayyar Madabushi (2018) - Question Classification + PairwiseRank + Multi-Perspective CNN | Tayyar Madabushi et al. (2018) | 0.836 | 0.863 |
Kamath (2019) - Question Classification + RNN + Pre-Attention | Kamath et al. (2019) | 0.852 | 0.891 |
Laskar et al. (2020) - CETE (RoBERTa-Large) | Laskar et al. (2020) | 0.950 | 0.980 |
Algorithm - Clean Version of TREC QA | Reference | MAP | MRR |
---|---|---|---|
W&I (2015) | Wang and Ittycheriah (2015) | 0.746 | 0.820 |
Tan (2015) - QA-LSTM/CNN+attention | Tan et al. (2015) | 0.728 | 0.832 |
dos Santos (2016) - Attentive Pooling CNN | dos Santos et al. (2016) | 0.753 | 0.851 |
Wang et al. (2016) - L.D.C Model | Wang et al. (2016) | 0.771 | 0.845 |
H&L (2015) - Multi-Perspective CNN | He and Lin (2015) | 0.777 | 0.836 |
Tay et al. (2017) - HyperQA (Hyperbolic Embeddings) | Tay et al. (2017) | 0.784 | 0.865 |
Rao et al. (2016) - PairwiseRank + Multi-Perspective CNN | Rao et al. (2016) | 0.801 | 0.877 |
Wang et al. (2017) - BiMPM | Wang et al. (2017) | 0.802 | 0.875 |
Bian et al. (2017) - Compare-Aggregate | Bian et al. (2017) | 0.821 | 0.899 |
Shen et al. (2017) - IWAN | Shen et al. (2017) | 0.822 | 0.889 |
Tran et al. (2018) - IWAN + sCARNN | Tran et al. (2018) | 0.829 | 0.875 |
Tay et al. (2018) - Multi-Cast Attention Networks (MCAN) | Tay et al. (2018) | 0.838 | 0.904 |
Tayyar Madabushi (2018) - Question Classification + PairwiseRank + Multi-Perspective CNN | Tayyar Madabushi et al. (2018) | 0.865 | 0.904 |
Yoon et al. (2019) - Compare-Aggregate + LanguageModel + LatentClustering | Yoon et al. (2019) | 0.868 | 0.928 |
Lai et al. (2019) - BERT + GSAMN + Transfer Learning | Lai et al. (2019) | 0.914 | 0.957 |
Garg et al. (2019) - TANDA-RoBERTa (ASNQ, TREC-QA) | Garg et al. (2019) | 0.943 | 0.974 |
Laskar et al. (2020) - CETE (RoBERTa-Large) | Laskar et al. (2020) | 0.936 | 0.978 |
References
- Vasin Punyakanok, Dan Roth, and Wen-Tau Yih. 2004. Mapping dependencies trees: An application to question answering. In Proceedings of the 8th International Symposium on Artificial Intelligence and Mathematics, Fort Lauderdale, FL, USA.
- Hang Cui, Renxu Sun, Keya Li, Min-Yen Kan, and Tat-Seng Chua. 2005. Question answering passage retrieval using dependency relations. In Proceedings of the 28th ACM-SIGIR International Conference on Research and Development in Information Retrieval, Salvador, Brazil.
- Wang, Mengqiu and Smith, Noah A. and Mitamura, Teruko. 2007. What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA. In EMNLP-CoNLL 2007.
- Heilman, Michael and Smith, Noah A. 2010. Tree Edit Models for Recognizing Textual Entailments, Paraphrases, and Answers to Questions. In NAACL-HLT 2010.
- Wang, Mengqiu and Manning, Christopher. 2010. Probabilistic Tree-Edit Models with Structured Latent Variables for Textual Entailment and Question Answering. In COLING 2010.
- E. Shnarch. 2013. Probabilistic Models for Lexical Inference. Ph.D. thesis, Bar Ilan University.
- Yao, Xuchen and Van Durme, Benjamin and Callison-Burch, Chris and Clark, Peter. 2013. Answer Extraction as Sequence Tagging with Tree Edit Distance. In NAACL-HLT 2013.
- Yih, Wen-tau and Chang, Ming-Wei and Meek, Christopher and Pastusiak, Andrzej. 2013. Question Answering Using Enhanced Lexical Semantic Models. In ACL 2013.
- Severyn, Aliaksei and Moschitti, Alessandro. 2013. Automatic Feature Engineering for Answer Selection and Extraction. In EMNLP 2013.
- Lei Yu, Karl Moritz Hermann, Phil Blunsom, and Stephen Pulman. 2014. Deep Learning for Answer Sentence Selection. In NIPS deep learning workshop.
- Di Wang and Eric Nyberg. 2015. A Long Short-Term Memory Model for Answer Sentence Selection in Question Answering. In ACL 2015.
- Minwei Feng, Bing Xiang, Michael R. Glass, Lidan Wang, Bowen Zhou. 2015. Applying deep learning to answer selection: A study and an open task. In ASRU 2015.
- Aliaksei Severyn and Alessandro Moschitti. 2015. Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks. In SIGIR 2015.
- Zhiguo Wang and Abraham Ittycheriah. 2015. FAQ-based Question Answering via Word Alignment. In eprint arXiv:1507.02628.
- Ming Tan, Cicero dos Santos, Bing Xiang & Bowen Zhou. 2015. LSTM-Based Deep Learning Models for Nonfactoid Answer Selection. In eprint arXiv:1511.04108.
- Cicero dos Santos, Ming Tan, Bing Xiang & Bowen Zhou. 2016. Attentive Pooling Networks. In eprint arXiv:1602.03609.
- Zhiguo Wang, Haitao Mi and Abraham Ittycheriah. 2016. Sentence Similarity Learning by Lexical Decomposition and Composition. In COLING 2016.
- Hua He, Kevin Gimpel and Jimmy Lin. 2015. Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks. In EMNLP 2015.
- Hua He and Jimmy Lin. 2016. Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement. In NAACL 2016.
- Liu Yang, Qingyao Ai, Jiafeng Guo, W. Bruce Croft. 2016. aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model. In CIKM 2016.
- Jinfeng Rao, Hua He and Jimmy Lin. 2016. Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks. In CIKM 2016.
- Yi Tay, Minh C. Phan, Luu Anh Tuan and Siu Cheung Hui. 2017. Learning to Rank Question Answer Pairs with Holographic Dual LSTM Architecture. In SIGIR 2017.
- Yi Tay, Luu Anh Tuan, Siu Cheung Hui. 2017. Enabling Efficient Question Answer Retrieval via Hyperbolic Neural Networks. In eprint arXiv:1707.07847.
- Zhiguo Wang, Wael Hamza and Radu Florian. 2017. Bilateral Multi-Perspective Matching for Natural Language Sentences. In eprint arXiv:1702.03814.
- Weijie Bian, Si Li, Zhao Yang, Guang Chen, Zhiqing Lin. 2017. A Compare-Aggregate Model with Dynamic-Clip Attention for Answer Selection. In CIKM 2017.
- Gehui Shen, Yunlun Yang, Zhi-Hong Deng. 2017. Inter-Weighted Alignment Network for Sentence Pair Modeling. In EMNLP 2017.
- Quan Hung Tran, Tuan Manh Lai, Gholamreza Haffari, Ingrid Zukerman, Trung Bui, Hung Bui. 2018. The Context-dependent Additive Recurrent Neural Net. In NAACL 2018.
- Yi Tay, Luu Anh Tuan, Siu Cheung Hui. 2018. Multi-Cast Attention Networks. In KDD 2018.
- Harish Tayyar Madabushi, Mark Lee and John Barnden. 2018. Integrating Question Classification and Deep Learning for improved Answer Selection. In COLING 2018.
- Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Kyomin Jung. 2019. A Compare-Aggregate Model with Latent Clustering for Answer Selection. In CIKM 2019.
- Sanjay Kamath, Brigitte Grau and Yue Ma. 2019. Predicting and Integrating Expected Answer Types into a Simple Recurrent Neural Network Model for Answer Sentence Selection. In CICLING 2019.
- Jinfeng Rao, Linqing Liu, Yi Tay, Wei Yang, Peng Shi, Jimmy Lin. 2019. Bridging the Gap between Relevance Matching and Semantic Matching for Short Text Similarity Modeling. In EMNLP 2019.
- Tuan Lai, Quan Hung Tran, Trung Bui, Daisuke Kihara. 2019. A Gated Self-attention Memory Network for Answer Selection. In EMNLP 2019.
- Siddhant Garg, Thuy Vu, Alessandro Moschitti. 2019. TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection. In AAAI 2020.