Domain Adaptation for Machine Translation by Mining Unseen Words

Hal Daumé III and Jagadeesh Jagarlamudi
University of Maryland


Abstract

We show that unseen words account for a large part of the translation error when moving to new domains. Using an extension of a recent approach to mining translations from comparable corpora (Haghighi et al., 2008), we find translations for terms that would otherwise be out-of-vocabulary (OOV). We present several approaches to integrating such translations into a phrase-based translation system, yielding consistent improvements in translation quality (between 0.5 and 1.5 BLEU points) on four domains and two language pairs.




Full paper: http://www.aclweb.org/anthology/P/P11/P11-2071.pdf