End-to-End Content and Plan Selection for Data-to-Text Generation

Sebastian Gehrmann, Falcon Dai, Henry Elder, Alexander Rush


Abstract
Learning to generate fluent natural language from structured data with neural networks has become a common approach for NLG. This problem can be challenging when the form of the structured data varies between examples. This paper presents a survey of several extensions to sequence-to-sequence models to account for the latent content selection process, particularly variants of copy attention and coverage decoding. We further propose a training method based on diverse ensembling to encourage models to learn distinct sentence templates during training. An empirical evaluation of these techniques shows an increase in the quality of generated text across five automated metrics, as well as in human evaluation.
Anthology ID:
W18-6505
Volume:
Proceedings of the 11th International Conference on Natural Language Generation
Month:
November
Year:
2018
Address:
Tilburg University, The Netherlands
Editors:
Emiel Krahmer, Albert Gatt, Martijn Goudbeek
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
46–56
URL:
https://aclanthology.org/W18-6505
DOI:
10.18653/v1/W18-6505
Cite (ACL):
Sebastian Gehrmann, Falcon Dai, Henry Elder, and Alexander Rush. 2018. End-to-End Content and Plan Selection for Data-to-Text Generation. In Proceedings of the 11th International Conference on Natural Language Generation, pages 46–56, Tilburg University, The Netherlands. Association for Computational Linguistics.
Cite (Informal):
End-to-End Content and Plan Selection for Data-to-Text Generation (Gehrmann et al., INLG 2018)
PDF:
https://aclanthology.org/W18-6505.pdf
Code:
sebastianGehrmann/diverse_ensembling
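The repository above implements the paper's diverse ensembling training method. A minimal sketch of the core idea, in the spirit of multiple-choice learning (each training example only updates the ensemble member that currently fits it best, so members specialize on distinct patterns), might look like the toy example below. The scalar "models", the toy data, and all names here are illustrative assumptions, not the paper's actual implementation.

```python
import random

random.seed(0)

K = 2  # number of ensemble members ("experts")
# Each toy "model" is a single scalar parameter w, predicting y = w * x.
weights = [random.uniform(-1.0, 1.0) for _ in range(K)]
lr = 0.01

# Toy data drawn from two distinct "templates": y = 2x and y = -3x.
# A single model cannot fit both; a diverse ensemble can split them.
data = [(x, 2.0 * x) for x in range(1, 6)] + [(x, -3.0 * x) for x in range(1, 6)]

for epoch in range(200):
    random.shuffle(data)
    for x, y in data:
        # Assign the example to the expert with the lowest current loss.
        losses = [(w * x - y) ** 2 for w in weights]
        k = min(range(K), key=lambda i: losses[i])
        # Gradient of (w*x - y)^2 w.r.t. w is 2*(w*x - y)*x; update only expert k.
        weights[k] -= lr * 2.0 * (weights[k] * x - y) * x

# After training, the two experts converge to the two underlying templates
# (one weight near 2.0, the other near -3.0).
```

Training a full ensemble of sequence-to-sequence models follows the same assignment rule, with per-example cross-entropy loss in place of the squared error used here.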