QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages

Arantxa Otegi, Nora Aranberri, Antonio Branco, Jan Hajič, Martin Popel, Kiril Simov, Eneko Agirre, Petya Osenova, Rita Pereira, João Silva, Steven Neale


Abstract
This work presents parallel corpora automatically annotated with several NLP tools, covering lemmatization and part-of-speech tagging, named-entity recognition and classification, named-entity disambiguation, word-sense disambiguation, and coreference resolution. The corpora comprise both the well-known Europarl corpus and a domain-specific question-answer troubleshooting corpus in the IT domain. English is common to all the parallel corpora, which cover five other languages: Basque, Bulgarian, Czech, Portuguese and Spanish. We describe the annotated corpora and the tools used for annotation, and we report annotation statistics for each language. These new resources are freely available and will support research on semantic processing for machine translation and cross-lingual transfer.
Anthology ID:
L16-1483
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3023–3030
URL:
https://aclanthology.org/L16-1483
Cite (ACL):
Arantxa Otegi, Nora Aranberri, Antonio Branco, Jan Hajič, Martin Popel, Kiril Simov, Eneko Agirre, Petya Osenova, Rita Pereira, João Silva, and Steven Neale. 2016. QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3023–3030, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages (Otegi et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1483.pdf