Tagging Spanish Texts: the Problem of Problem of “SE”

Guadalupe Aguado de Cea, Javier Puche, José Ángel Ramos


Abstract
Automatic tagging of Spanish has historically faced many problems because of certain language-specific grammatical constructions. One of these traditional pitfalls is the particle “se”, a multifunctional and polysemous word used in many different contexts. Many taggers do not distinguish the possible uses of “se” and therefore perform poorly on it. In keeping with the philosophy of free software, we took a free annotation tool as a basis and improved its behaviour by adding new rules at different levels and by modifying certain parts of the code to allow its possible integration into other EAGLES-compliant tools. In this paper, we present the analysis of different annotators carried out to select the tool, the results obtained in each case, the improvements we added, and the advantages of the modified tagger.
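The ambiguity described above can be made concrete with a minimal sketch of ordered context rules that assign a coarse label to each “se” in an already tagged sentence. This is not the rule set from the paper. The labels (dative, pronominal, passive, impersonal, reflexive), the simplified tags (V3S, V3P, NS, NP, DET, PROPN, PRON, ADV), the `Token` type, the `classify_se` function and the tiny `PRONOMINAL_ONLY` lexicon are all hypothetical illustrations. No EAGLES tags are used.

```python
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    form: str   # lower-cased surface form
    lemma: str
    tag: str    # hypothetical coarse tag: V3S, V3P, NS, NP, DET, PROPN, PRON, ADV


ACCUSATIVE_CLITICS = {"lo", "la", "los", "las"}
# Hypothetical toy lexicon of verbs that are always used with "se".
PRONOMINAL_ONLY = {"arrepentir", "quejar", "jactar"}
NOUN_TAGS = {"NS", "NP"}
SUBJECT_TAGS = {"NS", "NP", "PROPN", "PRON"}


def classify_se(tokens: List[Token], i: int) -> str:
    """Label the "se" at position i by applying hypothetical ordered context rules."""
    if tokens[i].form != "se":
        raise ValueError(f"token {i} is not 'se'")

    # Rule 1: "se" before an accusative clitic is typically the le/les variant
    # ("se lo dio"). Aspectual uses such as "se lo comió" would be mislabeled here.
    if i + 1 < len(tokens) and tokens[i + 1].form in ACCUSATIVE_CLITICS:
        return "dative"

    # Locate the first verb to the right of "se".
    verb_idx: Optional[int] = next(
        (j for j in range(i + 1, len(tokens)) if tokens[j].tag.startswith("V")), None
    )
    if verb_idx is None:
        return "unknown"
    verb = tokens[verb_idx]

    # Rule 2: lexically pronominal verbs ("se arrepintió").
    if verb.lemma in PRONOMINAL_ONLY:
        return "pronominal"

    # Rules 3 and 4 only apply when no candidate subject precedes "se".
    if not any(t.tag in SUBJECT_TAGS for t in tokens[:i]):
        # First non-determiner token after the verb, kept only if it is a noun.
        rest = [t for t in tokens[verb_idx + 1:] if t.tag != "DET"]
        noun = rest[0] if rest and rest[0].tag in NOUN_TAGS else None
        # Rule 3: verb agrees in number with the following noun ("se venden casas").
        if noun is not None and (verb.tag == "V3P") == (noun.tag == "NP"):
            return "passive"
        # Rule 4: singular verb with no nominal complement ("se vive bien").
        if noun is None and verb.tag == "V3S":
            return "impersonal"

    # Default: reflexive reading ("Juan se lava").
    return "reflexive"


if __name__ == "__main__":
    sentence = [Token("se", "se", "PRON"), Token("venden", "vender", "V3P"),
                Token("casas", "casa", "NP")]
    print(classify_se(sentence, 0))  # passive
```

The clitic-cluster test runs first because it is the cheapest cue in this toy set. The passive and impersonal tests depend on number agreement and on the absence of a preceding subject. A real tagger would need full morphological features and a much larger lexicon to check these conditions reliably.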
Anthology ID: L08-1243
Volume: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month: May
Year: 2008
Address: Marrakech, Morocco
Editors: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Venue: LREC
Publisher: European Language Resources Association (ELRA)
URL: http://www.lrec-conf.org/proceedings/lrec2008/pdf/345_paper.pdf
Cite (ACL): Guadalupe Aguado de Cea, Javier Puche, and José Ángel Ramos. 2008. Tagging Spanish Texts: the Problem of Problem of “SE”. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal): Tagging Spanish Texts: the Problem of Problem of “SE” (de Cea et al., LREC 2008)