Sequence-to-Sequence Models for Data-to-Text Natural Language Generation: Word- vs. Character-based Processing and Output Diversity

Glorianna Jagfeld, Sabrina Jenne, Ngoc Thang Vu
Abstract
We present a comparison of word-based and character-based sequence-to-sequence models for data-to-text natural language generation, which generate natural language descriptions for structured inputs. On the datasets of two recent generation challenges, our models achieve comparable or better automatic evaluation results than the best challenge submissions. Subsequent detailed statistical and human analyses shed light on the differences between the two input representations and the diversity of the generated texts. In a controlled experiment with synthetic training data generated from templates, we demonstrate the ability of neural models to learn novel combinations of the templates and thereby generalize beyond the linguistic structures they were trained on.
Anthology ID:
W18-6529
Volume:
Proceedings of the 11th International Conference on Natural Language Generation
Month:
November
Year:
2018
Address:
Tilburg University, The Netherlands
Editors:
Emiel Krahmer, Albert Gatt, Martijn Goudbeek
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
221–232
URL:
https://aclanthology.org/W18-6529
DOI:
10.18653/v1/W18-6529
Cite (ACL):
Glorianna Jagfeld, Sabrina Jenne, and Ngoc Thang Vu. 2018. Sequence-to-Sequence Models for Data-to-Text Natural Language Generation: Word- vs. Character-based Processing and Output Diversity. In Proceedings of the 11th International Conference on Natural Language Generation, pages 221–232, Tilburg University, The Netherlands. Association for Computational Linguistics.
Cite (Informal):
Sequence-to-Sequence Models for Data-to-Text Natural Language Generation: Word- vs. Character-based Processing and Output Diversity (Jagfeld et al., INLG 2018)
PDF:
https://aclanthology.org/W18-6529.pdf
Data
WebNLG