Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora

Bin Lu1, Chenhao Tan2, Claire Cardie2, Benjamin K. Tsou3
1 City University of Hong Kong, Hong Kong Institute of Education; 2 Cornell University; 3 Hong Kong Institute of Education, City University of Hong Kong


Abstract

Most previous work on multilingual sentiment analysis has focused on adapting sentiment resources from resource-rich languages to resource-poor languages. We present a novel approach for joint bilingual sentiment classification at the sentence level that augments the available labeled data in each language with unlabeled parallel data. Relying on the intuition that parallel sentences should carry similar sentiment labels, we present a model that jointly learns improved monolingual sentiment classifiers for each language. Experiments on multiple data sets show that the proposed approach (1) outperforms the monolingual baselines, significantly improving the accuracy for both languages by 3.44%-8.12%; (2) outperforms two standard approaches for leveraging unlabeled data; and (3) still produces performance gains, albeit smaller ones, when using pseudo-parallel data from machine translation engines.




Full paper: http://www.aclweb.org/anthology/P/P11/P11-1033.pdf