Char2char Generation with Reranking for the E2E NLG Challenge

Shubham Agarwal, Marc Dymetman, Éric Gaussier


Abstract
This paper describes our submission to the E2E NLG Challenge. Neural seq2seq approaches have recently become mainstream in NLG, often relying on word-level pre-processing (delexicalization) and post-processing (relexicalization) steps to handle rare words. By contrast, we train a simple character-level seq2seq model that requires no pre/post-processing (no delexicalization, tokenization, or even lowercasing), with surprisingly good results. For further improvement, we explore two reranking approaches for scoring candidates. We also introduce a synthetic dataset creation procedure, which opens up a new way of building artificial datasets for Natural Language Generation.
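The abstract only sketches the pipeline at a high level. The short Python snippet below illustrates the two ingredients it mentions: feeding the raw MR string to the model character by character, with no delexicalization, tokenization, or lowercasing, and reranking beam candidates with an external scorer. This is a minimal illustrative sketch, not the authors' code; the function names, the example MR, the beam outputs, and the toy slot-coverage scorer are all assumptions.

    # Illustrative sketch only (not the authors' implementation).
    def to_char_sequence(meaning_representation):
        # The raw MR string is used as-is: one input symbol per character,
        # with no delexicalization, tokenization, or lowercasing.
        return list(meaning_representation)

    def rerank(candidates, score_fn):
        # Pick the candidate scored highest by an external scorer,
        # e.g. one that checks slot coverage against the MR.
        return max(candidates, key=score_fn)

    mr = "name[The Eagle], eatType[coffee shop], area[riverside]"
    chars = to_char_sequence(mr)     # model input: a sequence of characters
    print(chars[:12])

    # Hypothetical beam outputs from a trained char2char model:
    beam = [
        "The Eagle is a coffee shop by the riverside.",
        "The Eagle is a coffee shop.",
    ]
    # Toy scorer: prefer outputs that mention more of the MR's slot values.
    values = ["The Eagle", "coffee shop", "riverside"]
    best = rerank(beam, lambda c: sum(v in c for v in values))
    print(best)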
Anthology ID:
W18-6555
Volume:
Proceedings of the 11th International Conference on Natural Language Generation
Month:
November
Year:
2018
Address:
Tilburg University, The Netherlands
Editors:
Emiel Krahmer, Albert Gatt, Martijn Goudbeek
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
451–456
URL:
https://aclanthology.org/W18-6555
DOI:
10.18653/v1/W18-6555
Cite (ACL):
Shubham Agarwal, Marc Dymetman, and Éric Gaussier. 2018. Char2char Generation with Reranking for the E2E NLG Challenge. In Proceedings of the 11th International Conference on Natural Language Generation, pages 451–456, Tilburg University, The Netherlands. Association for Computational Linguistics.
Cite (Informal):
Char2char Generation with Reranking for the E2E NLG Challenge (Agarwal et al., INLG 2018)
PDF:
https://aclanthology.org/W18-6555.pdf
Data:
E2E