Simple and Effective Multi-Paragraph Reading Comprehension

Christopher Clark, Matt Gardner


Abstract
We introduce a method of adapting neural paragraph-level question answering models to the case where entire documents are given as input. Most current question answering models cannot scale to document or multi-document input, and naively applying these models to each paragraph independently often results in them being distracted by irrelevant text. We show that it is possible to significantly improve performance by using a modified training scheme that teaches the model to ignore non-answer-containing paragraphs. Our method involves sampling multiple paragraphs from each document and using an objective function that requires the model to produce globally correct output. We additionally identify and improve upon a number of other design decisions that arise when working with document-level data. Experiments on TriviaQA and SQuAD show that our method advances the state of the art, including a 10 point gain on TriviaQA.
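The objective the abstract refers to is a shared-normalization loss: span scores from the several paragraphs sampled for one question are normalized in a single softmax, so confidences are comparable across paragraphs and non-answer paragraphs are pushed toward low probability. The following is a minimal PyTorch sketch of that idea; the function name, list-of-tensors layout, and the marginal_nll helper are illustrative assumptions, not the authors' implementation (their TensorFlow code is in allenai/document-qa).

import torch

def shared_norm_loss(start_logits, end_logits, start_targets, end_targets):
    # Sketch of a shared-normalization span objective (assumed layout):
    # each argument is a list with one 1-D tensor per sampled paragraph.
    # *_logits hold per-token span-boundary scores; *_targets are {0,1}
    # tensors marking correct positions, all zeros when the paragraph
    # does not contain the answer.
    s = torch.cat(start_logits)    # one shared score vector over all paragraphs
    e = torch.cat(end_logits)
    ts = torch.cat(start_targets).bool()
    te = torch.cat(end_targets).bool()

    def marginal_nll(logits, targets):
        # The normalizer runs over all paragraphs jointly, which is what
        # makes per-paragraph confidences globally comparable.
        log_z = torch.logsumexp(logits, dim=0)
        # Marginalize over every correct position; at least one sampled
        # paragraph is assumed to contain the answer.
        log_correct = torch.logsumexp(
            logits.masked_fill(~targets, float("-inf")), dim=0)
        return log_z - log_correct

    return marginal_nll(s, ts) + marginal_nll(e, te)

# Usage: two sampled paragraphs, only the first contains the answer
# (span from token 2 to token 3); all values here are made up.
start_logits = [torch.randn(5, requires_grad=True), torch.randn(4, requires_grad=True)]
end_logits = [torch.randn(5, requires_grad=True), torch.randn(4, requires_grad=True)]
start_targets = [torch.tensor([0, 0, 1, 0, 0]), torch.zeros(4, dtype=torch.long)]
end_targets = [torch.tensor([0, 0, 0, 1, 0]), torch.zeros(4, dtype=torch.long)]
loss = shared_norm_loss(start_logits, end_logits, start_targets, end_targets)
loss.backward()

Because the normalizer spans all paragraphs, lowering scores in a non-answer paragraph directly increases the probability mass on the correct span, which is the mechanism that teaches the model to ignore irrelevant text.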
Anthology ID:
P18-1078
Volume:
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Iryna Gurevych, Yusuke Miyao
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
845–855
URL:
https://aclanthology.org/P18-1078
DOI:
10.18653/v1/P18-1078
Cite (ACL):
Christopher Clark and Matt Gardner. 2018. Simple and Effective Multi-Paragraph Reading Comprehension. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 845–855, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Simple and Effective Multi-Paragraph Reading Comprehension (Clark & Gardner, ACL 2018)
PDF:
https://aclanthology.org/P18-1078.pdf
Presentation:
P18-1078.Presentation.pdf
Video:
https://aclanthology.org/P18-1078.mp4
Code:
allenai/document-qa
Data:
SQuAD, TriviaQA