ParaSense or How to Use Parallel Corpora for Word Sense Disambiguation

Els Lefever1, Véronique Hoste1, Martine De Cock2
1LT3, University College Ghent, Belgium, 2Ghent University, Belgium


Abstract

This paper describes a set of exploratory experiments for a multilingual classification-based approach to Word Sense Disambiguation. Instead of using a predefined monolingual sense inventory such as WordNet, we use a language-independent framework where the word senses are derived automatically from word alignments on a parallel corpus. We built five classifiers with English as the input language and translations in the five supported languages (viz. French, Dutch, Italian, Spanish and German) as classification output. The feature vectors incorporate both the more traditional local context features and binary bag-of-words features that are extracted from the aligned translations. Our results show that the ParaSense multilingual WSD system is very competitive with the best systems that were evaluated on the SemEval-2010 Cross-Lingual Word Sense Disambiguation task, for all five target languages.
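
The feature vector described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the window size, the feature-key format and the toy vocabulary are all hypothetical, and the sketch assumes that the bag-of-words features come from the aligned translations in languages other than the one being predicted.

```python
from typing import Dict, List, Union

FeatureValue = Union[str, int]


def local_context_features(tokens: List[str], focus_index: int,
                           window: int = 2) -> Dict[str, FeatureValue]:
    """Collect the focus word and its neighbours within a small window.

    Hypothetical helper: the window size and key names are assumptions.
    """
    features: Dict[str, FeatureValue] = {"focus": tokens[focus_index].lower()}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        position = focus_index + offset
        inside = 0 <= position < len(tokens)
        features[f"tok[{offset:+d}]"] = tokens[position].lower() if inside else "<pad>"
    return features


def binary_bag_of_words(translations: Dict[str, str],
                        vocabulary: Dict[str, List[str]]) -> Dict[str, FeatureValue]:
    """Emit one 0/1 feature per vocabulary word and language.

    `translations` maps a language code to the aligned translation of the
    English sentence; `vocabulary` lists the content words tracked per language.
    Both structures are assumptions made for this illustration.
    """
    features: Dict[str, FeatureValue] = {}
    for language, words in vocabulary.items():
        present = set(translations.get(language, "").lower().split())
        for word in words:
            features[f"bow[{language}]={word}"] = int(word in present)
    return features


def build_feature_vector(tokens: List[str], focus_index: int,
                         translations: Dict[str, str],
                         vocabulary: Dict[str, List[str]],
                         window: int = 2) -> Dict[str, FeatureValue]:
    """Combine local context features with translation bag-of-words features."""
    features = local_context_features(tokens, focus_index, window)
    features.update(binary_bag_of_words(translations, vocabulary))
    return features


if __name__ == "__main__":
    example = build_feature_vector(
        tokens=["The", "bank", "of", "the", "river"],
        focus_index=1,
        translations={"fr": "la rive du fleuve", "nl": "de oever van de rivier"},
        vocabulary={"fr": ["rive", "banque"], "nl": ["oever", "bank"]},
        window=1,
    )
    for name, value in example.items():
        print(name, value)
```

A dictionary of this kind could be passed to any classifier that accepts sparse symbolic features; the label for each training instance would be the translation of the focus word in the target language, taken from the word alignment.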




Full paper: http://www.aclweb.org/anthology/P/P11/P11-2055.pdf