An Unsupervised Model for Joint Phrase Alignment and Extraction

Graham Neubig (1), Taro Watanabe (2), Eiichiro Sumita (2), Shinsuke Mori (3), Tatsuya Kawahara (3)
(1) Kyoto University/National Institute of Information and Communications Technology, (2) National Institute of Information and Communications Technology, (3) Kyoto University


Abstract

We present an unsupervised model for joint phrase alignment and extraction using non-parametric Bayesian methods and inversion transduction grammars (ITGs). The key contribution is that phrases of many granularities are included directly in the model through a novel formulation that memorizes phrases generated not only by terminal symbols but also by non-terminal symbols. This yields a completely probabilistic model that can create, directly from unaligned sentence pairs, a phrase table that achieves competitive accuracy on phrase-based machine translation tasks. Experiments on several language pairs demonstrate that the proposed model matches the accuracy of the traditional two-step approach of word alignment followed by phrase extraction, while reducing the phrase table to a fraction of the original size.
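
To make the idea of memorizing phrases at many granularities concrete, the following is a minimal illustrative sketch, not the paper's model or its inference procedure. It assumes an ITG derivation is already given as a binary tree whose internal nodes are either "straight" (target order preserved) or "inverted" (target order swapped), and it simply records the phrase pair spanned by every node, both terminal and non-terminal. The names Leaf, Node, collect_phrase_pairs, and the toy English-French example are hypothetical and chosen only for illustration; the probabilistic (non-parametric Bayesian) part of the model is omitted entirely.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class Leaf:
    """A terminal phrase pair (hypothetical structure for illustration)."""
    src: Tuple[str, ...]
    tgt: Tuple[str, ...]


@dataclass
class Node:
    """A non-terminal ITG node combining two sub-derivations."""
    kind: str  # "straight" or "inverted"
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def collect_phrase_pairs(tree, table):
    """Record the phrase pair spanned by every node and return this node's pair."""
    if isinstance(tree, Leaf):
        pair = (tree.src, tree.tgt)
    else:
        left_src, left_tgt = collect_phrase_pairs(tree.left, table)
        right_src, right_tgt = collect_phrase_pairs(tree.right, table)
        if tree.kind == "straight":
            # Source and target children are concatenated in the same order.
            pair = (left_src + right_src, left_tgt + right_tgt)
        elif tree.kind == "inverted":
            # Target children are concatenated in reverse order.
            pair = (left_src + right_src, right_tgt + left_tgt)
        else:
            raise ValueError(f"unknown node kind: {tree.kind!r}")
    table[pair] += 1  # memorize pairs from terminals and non-terminals alike
    return pair


# Toy derivation (assumed, not from the paper): "red house" <-> "maison rouge".
root = Node("inverted", Leaf(("red",), ("rouge",)), Leaf(("house",), ("maison",)))
table = Counter()
root_pair = collect_phrase_pairs(root, table)
for (src, tgt), count in table.items():
    print(" ".join(src), "|||", " ".join(tgt), "|||", count)
```

In this toy case the table ends up holding the two single-word pairs as well as the larger pair built by the inverted non-terminal, which is the sense in which phrases of several granularities come out of a single derivation.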




Full paper: http://www.aclweb.org/anthology/P/P11/P11-1064.pdf