Phrase Level Segmentation and Labelling of Machine Translation Errors

Frédéric Blain, Varvara Logacheva, Lucia Specia


Abstract
This paper presents our work towards a novel approach to Quality Estimation (QE) of machine translation based on sequences of adjacent words, the so-called phrases. This new level of QE aims to provide a natural balance between QE at the word and sentence levels, which are respectively too fine-grained and too coarse for some applications. However, phrase-level QE poses an intrinsic challenge: how to segment a machine translation into sequences of words (contiguous or not) that each represent an error. We discuss three possible segmentation strategies for automatically extracting erroneous phrases. We evaluate these strategies against phrase-level annotations produced by humans, using a new dataset collected for this purpose.
Anthology ID:
L16-1356
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2240–2245
URL:
https://aclanthology.org/L16-1356
Cite (ACL):
Frédéric Blain, Varvara Logacheva, and Lucia Specia. 2016. Phrase Level Segmentation and Labelling of Machine Translation Errors. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2240–2245, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Phrase Level Segmentation and Labelling of Machine Translation Errors (Blain et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1356.pdf