Improving Neural Machine Translation by Incorporating Hierarchical Subword Features

Makoto Morishita, Jun Suzuki, Masaaki Nagata


Abstract
This paper focuses on subword-based Neural Machine Translation (NMT). We hypothesize that in the NMT model, the appropriate subword units for the following three modules (layers) can differ: (1) the encoder embedding layer, (2) the decoder embedding layer, and (3) the decoder output layer. We find that the subword units of Sennrich et al. (2016) have the property that a large vocabulary is a superset of a small vocabulary, and we modify the NMT model so that it can incorporate several different subword units in a single embedding layer. We refer to these small subword features as hierarchical subword features. To empirically investigate our hypothesis, we compare the performance of several different subword units and hierarchical subword features for both the encoder and decoder embedding layers. We confirm that incorporating hierarchical subword features in the encoder consistently improves BLEU scores on the IWSLT evaluation datasets.
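
The superset property mentioned in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: the merge list `MERGES`, the helper names, and the way features are aligned to the finest segmentation are all assumptions made for illustration. It applies byte-pair-encoding style merges in order, shows that the vocabulary from fewer merges is contained in the vocabulary from more merges, and collects, for each unit of the finest segmentation, the unit that covers it at each coarser level.

```python
def apply_bpe(word, merges):
    """Segment `word` by applying the given merge operations in order."""
    symbols = list(word)
    for left, right in merges:
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


def vocabulary(chars, merges):
    """Symbol set: the base characters plus every symbol created by a merge."""
    return set(chars) | {left + right for left, right in merges}


def hierarchical_features(word, merges, levels):
    """For each unit of the finest segmentation, return the covering unit
    at every level. `levels` are merge counts in ascending order.
    This alignment scheme is an illustrative assumption."""
    per_level = []
    for k in levels:
        cover = []
        for unit in apply_bpe(word, merges[:k]):
            cover.extend([unit] * len(unit))  # one entry per character
        per_level.append(cover)
    feats, pos = [], 0
    for unit in apply_bpe(word, merges[:levels[0]]):
        feats.append(tuple(cover[pos] for cover in per_level))
        pos += len(unit)
    return feats


# Hypothetical merge operations, in the order they would have been learned.
MERGES = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]

small_vocab = vocabulary("lower", MERGES[:2])
large_vocab = vocabulary("lower", MERGES)
features = hierarchical_features("lower", MERGES, levels=[2, 3, 4])
```

Here `small_vocab` is a subset of `large_vocab` because the shorter merge list is a prefix of the longer one. In a model, each tuple in `features` could index one embedding table per level, with the resulting vectors combined (for example, summed) into a single input vector; how the paper combines them should be checked against the PDF.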
Anthology ID:
C18-1052
Volume:
Proceedings of the 27th International Conference on Computational Linguistics
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Emily M. Bender, Leon Derczynski, Pierre Isabelle
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
618–629
URL:
https://aclanthology.org/C18-1052
Cite (ACL):
Makoto Morishita, Jun Suzuki, and Masaaki Nagata. 2018. Improving Neural Machine Translation by Incorporating Hierarchical Subword Features. In Proceedings of the 27th International Conference on Computational Linguistics, pages 618–629, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Improving Neural Machine Translation by Incorporating Hierarchical Subword Features (Morishita et al., COLING 2018)
PDF:
https://aclanthology.org/C18-1052.pdf