Portuguese-English Word Alignment: some Experiments

Diana Santos, Alberto Simões


Abstract
In this paper we describe some studies of Portuguese-English word alignment, focusing on (i) measuring the importance of the coupling between dictionaries and corpus; (ii) assessing the relevance of using syntactic information (POS and lemma) or just word forms; and (iii) taking into account the direction of translation. We first provide some motivation for the studies and insist on separating type alignment from token alignment. We then briefly describe the resources employed: the EuroParl and COMPARA corpora, and the alignment tools, NATools, introducing some measures to evaluate the two kinds of dictionaries obtained. We then present the results of several experiments, comparing sizes, overlap, translation fertility and alignment density of the several bilingual resources built. We also report preliminary data on the quality of the resulting dictionaries and alignment results.
Anthology ID:
L08-1364
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/760_paper.pdf
Cite (ACL):
Diana Santos and Alberto Simões. 2008. Portuguese-English Word Alignment: some Experiments. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
Portuguese-English Word Alignment: some Experiments (Santos & Simões, LREC 2008)