Cross-Corpus Evaluation of Word Alignment

Sylwia Ozdowska


Abstract
We present the procedures we implemented to carry out system oriented evaluation of a syntax-based word aligner, ALIBI. While cross-corpus evaluation is still relatively rare in NLP, we take the approach of regarding cross-corpus evaluation as part of system oriented evaluation. Our hypothesis is that the granularity of alignments and the level of syntactic correspondence depend on corpus type; our objective is to assess how this impacts on alignment quality. We test our system on three English-French parallel corpora. The evaluation procedures are defined in accordance with state-of-the-art word alignment evaluation principles. They include, for each corpus, the creation of a reference set containing multiple annotations of the same data, the assessment of inter-annotator agreement rates and an analysis of the reference set obtained. We show that alignment performance varies across corpora according to the multiple reference annotations produced and further motivate our choice of preserving all reference annotations without solving disagreements between annotators.
Anthology ID:
L08-1380
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/207_paper.pdf
DOI:
Bibkey:
Cite (ACL):
Sylwia Ozdowska. 2008. Cross-Corpus Evaluation of Word Alignment. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
Cross-Corpus Evaluation of Word Alignment (Ozdowska, LREC 2008)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/207_paper.pdf