Confidence-Weighted Learning of Factored Discriminative Language Models

Viet Ha Thuc (1) and Nicola Cancedda (2)
(1) Computer Science Department, University of Iowa; (2) Xerox Research Centre Europe


Abstract

Language models based only on word surface forms cannot benefit from available linguistic knowledge and tend to suffer from poor estimates for rare features. We propose an approach that addresses both limitations. We use factored features that can flexibly capture linguistic regularities, and we adopt confidence-weighted learning, a form of discriminative online learning that can better take advantage of a heavy tail of rare features. Finally, we extend confidence-weighted learning to deal with noise in the training data, a common case in discriminative language modeling.
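
To illustrate why confidence-weighted learning helps with rare features, the following is a minimal sketch of a generic binary confidence-weighted update with a diagonal covariance, written from the general confidence-weighted literature. It is not the ranking formulation or the noise-tolerant extension proposed in the paper. The function name cw_update, the dictionary-based sparse features, and the defaults eta=0.9 and init_var=1.0 are assumptions made for this example. Each weight carries a mean and a variance; a feature that has rarely been updated keeps a large variance and therefore receives a larger correction.

```python
import math
from statistics import NormalDist


def cw_update(mu, sigma, x, y, eta=0.9, init_var=1.0):
    """Apply one diagonal confidence-weighted update in place; return alpha.

    mu, sigma: dicts mapping feature name -> weight mean / weight variance.
    x: dict mapping feature name -> feature value; y: label, +1 or -1.
    eta: required probability of a correct prediction (must exceed 0.5).
    init_var: variance assumed for features never seen before (assumption).
    """
    if not 0.5 < eta < 1.0:
        raise ValueError("eta must lie strictly between 0.5 and 1")
    phi = NormalDist().inv_cdf(eta)  # confidence parameter derived from eta

    # Mean and variance of the signed margin under the current weights.
    margin = y * sum(mu.get(f, 0.0) * v for f, v in x.items())
    var = sum(sigma.get(f, init_var) * v * v for f, v in x.items())
    if var <= 0.0:
        return 0.0

    # Step size: positive root of
    # 2*phi*var^2 * a^2 + var*(1 + 2*phi*margin) * a + (margin - phi*var) = 0,
    # clipped at zero when the constraint margin >= phi*var already holds.
    b = 1.0 + 2.0 * phi * margin
    disc = b * b - 8.0 * phi * (margin - phi * var)
    alpha = max(0.0, (-b + math.sqrt(disc)) / (4.0 * phi * var))
    if alpha == 0.0:
        return 0.0

    for f, v in x.items():
        s = sigma.get(f, init_var)
        # High-variance (rarely updated) features move the most.
        mu[f] = mu.get(f, 0.0) + alpha * y * s * v
        # Each update increases the precision, so the variance shrinks.
        sigma[f] = 1.0 / (1.0 / s + 2.0 * alpha * phi * v * v)
    return alpha
```

Calling cw_update(mu, sigma, {"freq": 1.0}, +1) on empty dictionaries and then cw_update(mu, sigma, {"freq": 1.0, "rare": 1.0}, -1) changes the mean of the unseen feature "rare" by more than that of "freq", because the variance of "freq" was already reduced by the first update.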

Full paper: http://www.aclweb.org/anthology/P/P11/P11-2077.pdf