Deep Neural Machine Translation with Linear Associative Unit

Mingxuan Wang, Zhengdong Lu, Jie Zhou, Qun Liu


Abstract
Deep Neural Networks (DNNs) have provably enhanced the state-of-the-art Neural Machine Translation (NMT) with their capability in modeling complex functions and capturing complex linguistic structures. However, NMT with a deep architecture in its encoder or decoder RNNs often suffers from severe gradient diffusion due to the non-linear recurrent activations, which makes optimization much more difficult. To address this problem we propose novel Linear Associative Units (LAUs) to reduce the gradient propagation path inside the recurrent unit. Different from conventional approaches (the LSTM unit and the GRU), LAUs use linear associative connections between the input and output of the recurrent unit, which allows unimpeded information flow through both space and time. The model is quite simple, but it is surprisingly effective. Our empirical study on Chinese-English translation shows that our model with proper configuration can improve by 11.7 BLEU upon Groundhog and surpass the best reported results in the same setting. On the WMT14 English-German task and the larger WMT14 English-French task, our model achieves results comparable to the state-of-the-art.
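The paper's exact LAU equations are not reproduced on this page. As a minimal, hypothetical sketch of the idea the abstract describes, the following Python code implements a GRU-style recurrent cell augmented with a gated linear projection of the input, so part of the output is a purely linear function of the input; all names (LAUCellSketch, Wz, H, etc.) are illustrative and not taken from the paper.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LAUCellSketch:
    """GRU-style recurrent cell with an extra gated linear input pathway
    (illustrative only; not the paper's exact formulation)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        def mat(rows, cols):
            return rng.uniform(-0.08, 0.08, (rows, cols))
        # Standard GRU-style parameters: update gate z, reset gate r, candidate c.
        self.Wz, self.Uz = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.Wr, self.Ur = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.Wc, self.Uc = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        # Extra gate g and linear projection H for the associative pathway.
        self.Wg, self.Ug = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.H = mat(hidden_size, input_size)

    def step(self, x, h_prev):
        z = sigmoid(self.Wz @ x + self.Uz @ h_prev)        # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h_prev)        # reset gate
        g = sigmoid(self.Wg @ x + self.Ug @ h_prev)        # linear-path gate
        c = np.tanh(self.Wc @ x + self.Uc @ (r * h_prev))  # nonlinear candidate
        # The term H @ x bypasses every nonlinearity, giving gradients a
        # short, activation-free path from the unit's output back to its input.
        out = g * (self.H @ x) + (1.0 - g) * c
        return (1.0 - z) * h_prev + z * out

# Usage: run the cell over a toy sequence of 5 input vectors.
cell = LAUCellSketch(input_size=4, hidden_size=8)
h = np.zeros(8)
for x in np.random.default_rng(1).normal(size=(5, 4)):
    h = cell.step(x, h)
print(h.shape)  # (8,)

The design point the sketch illustrates is that, unlike an LSTM or GRU where every path from input to output passes through a squashing nonlinearity, the gated linear connection leaves one path fully linear, which is the mechanism the abstract credits for easing gradient diffusion in deep encoder/decoder stacks.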
Anthology ID:
P17-1013
Volume:
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2017
Address:
Vancouver, Canada
Editors:
Regina Barzilay, Min-Yen Kan
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
136–145
URL:
https://aclanthology.org/P17-1013
DOI:
10.18653/v1/P17-1013
Cite (ACL):
Mingxuan Wang, Zhengdong Lu, Jie Zhou, and Qun Liu. 2017. Deep Neural Machine Translation with Linear Associative Unit. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 136–145, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Deep Neural Machine Translation with Linear Associative Unit (Wang et al., ACL 2017)
PDF:
https://aclanthology.org/P17-1013.pdf
Video:
https://aclanthology.org/P17-1013.mp4