Model Transfer for Tagging Low-resource Languages using a Bilingual Dictionary

Meng Fang, Trevor Cohn


Abstract
Cross-lingual model transfer is a compelling and popular method for predicting annotations in a low-resource language, whereby parallel corpora provide a bridge to a high-resource language and its associated annotated corpora. However, parallel data is not readily available for many languages, limiting the applicability of these approaches. We address this drawback with a framework that takes advantage of cross-lingual word embeddings trained solely on a high-coverage dictionary. We propose a novel neural network model for joint training from both sources of data based on cross-lingual word embeddings, and show substantial empirical improvements over baseline techniques. We also propose several active learning heuristics, which result in improvements over competitive benchmark methods.
Anthology ID:
P17-2093
Volume:
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2017
Address:
Vancouver, Canada
Editors:
Regina Barzilay, Min-Yen Kan
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
587–593
URL:
https://aclanthology.org/P17-2093
DOI:
10.18653/v1/P17-2093
Cite (ACL):
Meng Fang and Trevor Cohn. 2017. Model Transfer for Tagging Low-resource Languages using a Bilingual Dictionary. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 587–593, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Model Transfer for Tagging Low-resource Languages using a Bilingual Dictionary (Fang & Cohn, ACL 2017)
PDF:
https://aclanthology.org/P17-2093.pdf
Dataset:
P17-2093.Datasets.zip
Code:
mengf1/trpos