Leveraging Meta-Embeddings for Bilingual Lexicon Extraction from Specialized Comparable Corpora

Amir Hazem, Emmanuel Morin


Abstract
Recent evaluations of bilingual lexicon extraction from specialized comparable corpora have shown mixed performance when using word embedding models. This can be partially explained by the lack of large specialized comparable corpora from which to build efficient representations. Within this context, we try to answer the following questions: (i) among state-of-the-art embedding models, whether trained on specialized corpora or pre-trained on large general data sets, which is the most appropriate for bilingual terminology extraction? (ii) Is it worthwhile to combine multiple embeddings trained on different data sets? To that end, we propose the first systematic evaluation of different word embedding models for bilingual terminology extraction from specialized comparable corpora. We highlight that the character-based embedding model outperforms other models in the quality of the extracted bilingual lexicons. Furthermore, we propose a new, efficient way to combine different embedding models learned from specialized and general-domain data sets. Our approach leads to higher performance than the best individual embedding model.
Anthology ID:
C18-1080
Volume:
Proceedings of the 27th International Conference on Computational Linguistics
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Emily M. Bender, Leon Derczynski, Pierre Isabelle
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
937–949
URL:
https://aclanthology.org/C18-1080
Cite (ACL):
Amir Hazem and Emmanuel Morin. 2018. Leveraging Meta-Embeddings for Bilingual Lexicon Extraction from Specialized Comparable Corpora. In Proceedings of the 27th International Conference on Computational Linguistics, pages 937–949, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Leveraging Meta-Embeddings for Bilingual Lexicon Extraction from Specialized Comparable Corpora (Hazem & Morin, COLING 2018)
PDF:
https://aclanthology.org/C18-1080.pdf