Two Easy Improvements to Lexical Weighting

David Chiang, Steve DeNeefe, Michael Pust
USC Information Sciences Institute


Abstract

We introduce two simple improvements to the lexical weighting features of Koehn, Och, and Marcu for machine translation: one that smooths the probability of translating word f to word e by simplifying English morphology, and one that conditions it on the kind of training data in which f and e co-occurred. These new variations lead to improvements of up to +0.8 BLEU, with an average improvement of +0.6 BLEU across two language pairs, two genres, and two translation systems.
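
As a rough illustration of the two ideas, the following toy sketch estimates a word translation probability t(e | f) by relative frequency in three ways: directly, after simplifying the English word, and separately per kind of training data. Everything in it is an assumption made for illustration: the function names `simplify` and `estimate`, the suffix-stripping rule standing in for morphological simplification, the genre labels, and the word pairs. It is not the procedure or data used in the paper.

```python
from collections import Counter


def simplify(word):
    # Hypothetical stand-in for simplifying English morphology:
    # strip a trailing "s" from words longer than three letters.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def estimate(pairs):
    """Relative-frequency estimate of t(e | f) from (f, e) word pairs."""
    joint = Counter(pairs)
    marginal = Counter(f for f, _ in pairs)
    return {(f, e): count / marginal[f] for (f, e), count in joint.items()}


# Toy aligned word pairs (f, e), each tagged with an invented genre label.
data = [
    ("gato", "cat", "news"),
    ("gato", "cats", "news"),
    ("gato", "cat", "web"),
    ("gato", "kitty", "web"),
]

# Baseline: t(e | f) over all data.
plain = estimate([(f, e) for f, e, _ in data])

# Variant 1: t(simplify(e) | f), pooling counts of morphological variants.
smoothed = estimate([(f, simplify(e)) for f, e, _ in data])

# Variant 2: t(e | f, genre), one table per kind of training data.
genres = {genre for _, _, genre in data}
by_genre = {
    g: estimate([(f, e) for f, e, genre in data if genre == g])
    for g in genres
}

print(plain[("gato", "cat")])
print(smoothed[("gato", "cat")])
print(by_genre["web"][("gato", "kitty")])
```

In this toy data, pooling "cat" and "cats" gives the simplified form more probability mass than either surface form has alone, while the per-genre tables let "kitty" receive a higher probability in the genre where it actually occurs.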


Full paper: http://www.aclweb.org/anthology/P/P11/P11-2080.pdf