Minh-Thang Luong

Also published as: Thang Luong


Semi-Supervised Sequence Modeling with Cross-View Training
Kevin Clark | Minh-Thang Luong | Christopher D. Manning | Quoc Le

Unsupervised representation learning algorithms such as word2vec and ELMo improve the accuracy of many supervised NLP models, mainly because they can take advantage of large amounts of unlabeled text. However, the supervised models only learn from task-specific labeled data during the main training phase. We therefore propose Cross-View Training (CVT), a semi-supervised learning algorithm that improves the representations of a Bi-LSTM sentence encoder using a mix of labeled and unlabeled data. On labeled examples, standard supervised learning is used. On unlabeled examples, CVT teaches auxiliary prediction modules that see restricted views of the input (e.g., only part of a sentence) to match the predictions of the full model seeing the whole input. Since the auxiliary modules and the full model share intermediate representations, this in turn improves the full model. Moreover, we show that CVT is particularly effective when combined with multi-task learning. We evaluate CVT on five sequence tagging tasks, machine translation, and dependency parsing, achieving state-of-the-art results.
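The unlabeled-data objective described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and several details are our own assumptions. The function names are hypothetical. The views are raw token slices, whereas real auxiliary modules would read restricted encoder states. The "match the predictions" step is measured with KL divergence. The full model's distribution is treated as a fixed target, and each auxiliary prediction is pulled toward it.

```python
import math
from typing import Dict, List, Sequence


def restricted_views(tokens: List[str], i: int) -> Dict[str, List[str]]:
    """Hypothetical restricted views of position i; each auxiliary module sees one of them."""
    return {
        "forward": tokens[: i + 1],   # left context up to and including token i
        "backward": tokens[i:],       # token i and everything to its right
        "past": tokens[:i],           # strictly left of token i
        "future": tokens[i + 1 :],    # strictly right of token i
    }


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same label set."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def cvt_unlabeled_loss(full_pred: Sequence[float], aux_preds: List[Sequence[float]]) -> float:
    """Assumed consistency loss on one unlabeled position.

    full_pred is the full model's label distribution and acts as a fixed target
    (in a real framework it would be excluded from the gradient); each auxiliary
    module's prediction is penalized for diverging from it.
    """
    return sum(kl_divergence(full_pred, q) for q in aux_preds)


# Toy example: one auxiliary module agrees with the full model, one does not.
full = [0.7, 0.2, 0.1]
aux = [[0.7, 0.2, 0.1], [0.4, 0.4, 0.2]]
print(restricted_views(["the", "cat", "sat"], 1))
print(round(cvt_unlabeled_loss(full, aux), 4))  # → 0.1838
```

Only the disagreeing module contributes to the loss. Because auxiliary modules and the full model would share the encoder in a real system, minimizing this term also shapes the shared representations.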

Proceedings of the 2nd Workshop on Neural Machine Translation and Generation
Alexandra Birch | Andrew Finch | Thang Luong | Graham Neubig | Yusuke Oda

Findings of the Second Workshop on Neural Machine Translation and Generation
Alexandra Birch | Andrew Finch | Minh-Thang Luong | Graham Neubig | Yusuke Oda

This document describes the findings of the Second Workshop on Neural Machine Translation and Generation, held in concert with the annual conference of the Association for Computational Linguistics (ACL 2018). First, we summarize the research trends of papers presented in the proceedings, and note that there is particular interest in linguistic structure, domain adaptation, data augmentation, handling inadequate resources, and analysis of models. Second, we describe the results of the workshop’s shared task on efficient neural machine translation, where participants were tasked with creating MT systems that are both accurate and efficient.


Proceedings of the First Workshop on Neural Machine Translation
Thang Luong | Alexandra Birch | Graham Neubig | Andrew Finch

Efficient Attention using a Fixed-Size Memory Representation
Denny Britz | Melody Guan | Minh-Thang Luong

The standard content-based attention mechanism typically used in sequence-to-sequence models is computationally expensive as it requires the comparison of large encoder and decoder states at each time step. In this work, we propose an alternative attention mechanism based on a fixed-size memory representation that is more efficient. Our technique predicts a compact set of K attention contexts during encoding and lets the decoder compute an efficient lookup that does not need to consult the memory. We show that our approach performs on par with the standard attention mechanism while yielding inference speedups of 20% for real-world translation tasks and more for tasks with longer sequences. By visualizing attention scores we demonstrate that our models learn distinct, meaningful alignments.
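The cost argument above can be illustrated with a minimal pure-Python sketch. It is our own illustration, not the paper's implementation. The names `W_alpha` and `W_beta` are hypothetical, and normalizing each encoder step's K scores with a softmax is an assumption; the paper's exact parameterization may differ. The encoder compresses all |S| states into K context vectors once per sentence. Each decoder step then mixes only those K vectors, so per-step work is O(K·D) rather than O(|S|·D).

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def matvec(W: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def encode_memory(enc_states: List[Vector], W_alpha: Matrix) -> Matrix:
    """Compress |S| encoder states of size D into K context vectors (K x D), once per sentence."""
    K, D = len(W_alpha), len(enc_states[0])
    C = [[0.0] * D for _ in range(K)]
    for h in enc_states:
        alpha = softmax(matvec(W_alpha, h))  # assumed: K weights for this encoder step
        for k in range(K):
            for d in range(D):
                C[k][d] += alpha[k] * h[d]
    return C


def decode_context(C: Matrix, s: Vector, W_beta: Matrix) -> Vector:
    """One decoder step: mix the K fixed contexts; cost does not depend on |S|."""
    beta = softmax(matvec(W_beta, s))  # K weights computed from the decoder state only
    D = len(C[0])
    return [sum(beta[k] * C[k][d] for k in range(len(C))) for d in range(D)]


# Toy usage: three encoder states (|S| = 3, D = 2) compressed into K = 2 contexts.
enc_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
C = encode_memory(enc_states, W_alpha=[[1.0, 0.0], [0.0, 1.0]])
context = decode_context(C, s=[0.5], W_beta=[[1.0], [-1.0]])
print(len(C), len(context))  # → 2 2
```

The decoder never reads `enc_states` directly. This is the sense in which the lookup avoids consulting the full encoder memory, and it is why longer source sequences benefit more.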

Massive Exploration of Neural Machine Translation Architectures
Denny Britz | Anna Goldie | Minh-Thang Luong | Quoc Le

Neural Machine Translation (NMT) has shown remarkable progress over the past few years, with production systems now being deployed to end-users. As the field is moving rapidly, it has become unclear which elements of NMT architectures have a significant impact on translation quality. In this work, we present a large-scale analysis of the sensitivity of NMT architectures to common hyperparameters. We report empirical results and variance numbers for several hundred experimental runs, corresponding to over 250,000 GPU hours on a WMT English to German translation task. Our experiments provide practical insights into the relative importance of factors such as embedding size, network depth, RNN cell type, residual connections, attention mechanism, and decoding heuristics. As part of this contribution, we also release an open-source NMT framework in TensorFlow to make it easy for others to reproduce our results and perform their own experiments.


Models and Inference for Prefix-Constrained Machine Translation
Joern Wuebker | Spence Green | John DeNero | Saša Hasan | Minh-Thang Luong

Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models
Minh-Thang Luong | Christopher D. Manning

Neural Machine Translation
Thang Luong | Kyunghyun Cho | Christopher D. Manning

Neural Machine Translation (NMT) is a simple new architecture for getting machines to learn to translate. Despite being relatively new (Kalchbrenner and Blunsom, 2013; Cho et al., 2014; Sutskever et al., 2014), NMT has already shown promising results, achieving state-of-the-art performance for various language pairs (Luong et al., 2015a; Jean et al., 2015; Luong et al., 2015b; Sennrich et al., 2016; Luong and Manning, 2016). While many of these NMT papers were presented to the ACL community, research and practice of NMT are still at an early stage. This tutorial would be a great opportunity for the whole community of machine translation and natural language processing to learn more about a very promising new approach to MT. This tutorial has four parts. In the first part, we start with an overview of MT approaches, including: (a) traditional methods that have been dominant over the past twenty years and (b) recent hybrid models with the use of neural network components. From these, we motivate why an end-to-end approach like neural machine translation is needed. The second part introduces a basic instance of NMT. We start out with a discussion of recurrent neural networks, including the back-propagation-through-time algorithm and stochastic gradient descent optimizers, as these are the foundation on which NMT builds. We then describe in detail the basic sequence-to-sequence architecture of NMT (Cho et al., 2014; Sutskever et al., 2014), the maximum likelihood training approach, and a simple beam-search decoder to produce translations. The third part of our tutorial describes techniques to build state-of-the-art NMT. We start with approaches to extend the vocabulary coverage of NMT (Luong et al., 2015a; Jean et al., 2015; Chitnis and DeNero, 2015).
We then introduce the idea of jointly learning both translations and alignments through an attention mechanism (Bahdanau et al., 2015); other variants of attention (Luong et al., 2015b; Tu et al., 2016) are discussed too. We describe a recent trend in NMT, translating at the sub-word level (Chung et al., 2016; Luong and Manning, 2016; Sennrich et al., 2016), so that language variations can be handled effectively. We then give tips on training and testing NMT systems, such as batching and ensembling. In the final part of the tutorial, we briefly describe promising approaches, such as (a) how to combine multiple tasks to help translation (Dong et al., 2015; Luong et al., 2016; Firat et al., 2016; Zoph and Knight, 2016) and (b) how to utilize monolingual corpora (Sennrich et al., 2016). Lastly, we conclude with challenges that remain to be solved for future NMT. PS: we would also like to acknowledge the very first paper by Forcada and Ñeco (1997) on sequence-to-sequence models for translation!

Compression of Neural Machine Translation Models via Pruning
Abigail See | Minh-Thang Luong | Christopher D. Manning


Addressing the Rare Word Problem in Neural Machine Translation
Thang Luong | Ilya Sutskever | Quoc Le | Oriol Vinyals | Wojciech Zaremba

A Hierarchical Neural Autoencoder for Paragraphs and Documents
Jiwei Li | Thang Luong | Dan Jurafsky

Deep Neural Language Models for Machine Translation
Thang Luong | Michael Kayser | Christopher D. Manning

Learning Distributed Representations for Multilingual Text Sequences
Hieu Pham | Thang Luong | Christopher Manning

Bilingual Word Representations with Monolingual Quality in Mind
Thang Luong | Hieu Pham | Christopher D. Manning

Evaluating Models of Computation and Storage in Human Sentence Processing
Thang Luong | Timothy O’Donnell | Noah Goodman

Effective Approaches to Attention-based Neural Machine Translation
Thang Luong | Hieu Pham | Christopher D. Manning

When Are Tree Structures Necessary for Deep Learning of Representations?
Jiwei Li | Thang Luong | Dan Jurafsky | Eduard Hovy


Parsing entire discourses as very long strings: Capturing topic continuity in grounded language learning
Minh-Thang Luong | Michael C. Frank | Mark Johnson

Grounded language learning, the task of mapping from natural language to a representation of meaning, has attracted increasing interest in recent years. In most work on this topic, however, utterances in a conversation are treated independently and discourse structure information is largely ignored. In the context of language acquisition, this independence assumption discards cues that are important to the learner, e.g., the fact that consecutive utterances are likely to share the same referent (Frank et al., 2013). The current paper describes an approach to the problem of simultaneously modeling grounded language at the sentence and discourse levels. We combine ideas from parsing and grammar induction to produce a parser that can handle long input strings with thousands of tokens, creating parse trees that represent full discourses. By casting grounded language learning as a grammatical inference task, we use our parser to extend the work of Johnson et al. (2012), investigating the importance of discourse continuity in children’s language acquisition and its interaction with social cues. Our model boosts performance in a language acquisition task and yields good discourse segmentations compared with human annotators.

Better Word Representations with Recursive Neural Networks for Morphology
Thang Luong | Richard Socher | Christopher Manning


WINGNUS: Keyphrase Extraction Utilizing Document Logical Structure
Thuy Dung Nguyen | Minh-Thang Luong

A Hybrid Morpheme-Word Representation for Machine Translation of Morphologically Rich Languages
Minh-Thang Luong | Preslav Nakov | Min-Yen Kan

Enhancing Morphological Alignment for Translating Highly Inflected Languages
Minh-Thang Luong | Min-Yen Kan