The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation

Mia Xu Chen; Orhan Firat; Ankur Bapna; Melvin Johnson; Wolfgang Macherey; George Foster; Llion Jones; Mike Schuster; Noam Shazeer; Niki Parmar; Ashish Vaswani; Jakob Uszkoreit; Łukasz Kaiser; Zhifeng Chen; Yonghui Wu; Macduff Hughes

doi:10.18653/v1/P18-1008

Abstract

The past year has witnessed rapid advances in sequence-to-sequence (seq2seq) modeling for Machine Translation (MT). The classic RNN-based approaches to MT were first outperformed by the convolutional seq2seq model, which was then outperformed by the more recent Transformer model. Each of these new approaches consists of a fundamental architecture accompanied by a set of modeling and training techniques that are in principle applicable to other seq2seq architectures. In this paper, we tease apart the new architectures and their accompanying techniques in two ways. First, we identify several key modeling and training techniques and apply them to the RNN architecture, yielding a new RNMT+ model that outperforms all three fundamental architectures on the benchmark WMT’14 English-to-French and English-to-German tasks. Second, we analyze the properties of each fundamental seq2seq architecture and devise new hybrid architectures intended to combine their strengths. Our hybrid models obtain further improvements, outperforming the RNMT+ model on both benchmark datasets.

Anthology ID: P18-1008
Volume: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2018
Address: Melbourne, Australia
Editors: Iryna Gurevych, Yusuke Miyao
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 76–86
URL: https://aclanthology.org/P18-1008
DOI: 10.18653/v1/P18-1008
Cite (ACL): Mia Xu Chen, Orhan Firat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey, George Foster, Llion Jones, Mike Schuster, Noam Shazeer, Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Zhifeng Chen, Yonghui Wu, and Macduff Hughes. 2018. The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 76–86, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal): The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation (Chen et al., ACL 2018)
PDF: https://aclanthology.org/P18-1008.pdf
Note: P18-1008.Notes.pdf
Presentation: P18-1008.Presentation.pdf
Video: https://aclanthology.org/P18-1008.mp4
Code: tensorflow/lingvo + additional community code
Data: WMT 2014
