Syntactic Dependencies for Multilingual and Multilevel Corpus Annotation

Simon Mille, Leo Wanner


Abstract
The relevance of syntactic dependency annotated corpora is nowadays unquestioned. However, a broad debate on the optimal set of dependency relation tags did not take place yet. As a result, largely varying tag sets of a largely varying size are used in different annotation initiatives. We propose a hierarchical dependency structure annotation schema that is more detailed and more flexible than the known annotation schemata. The schema allows us to choose the level of the desired detail of annotation, which facilitates the use of the schema for corpus annotation for different languages and for different NLP applications. Thanks to the inclusion of semantico-syntactic tags into the schema, we can annotate a corpus not only with syntactic dependency structures, but also with valency patterns as they are usually found in separate treebanks such as PropBank and NomBank. Semantico-syntactic tags and the level of detail of the schema furthermore facilitate the derivation of deep-syntactic and semantic annotations, leading to truly multilevel annotated dependency corpora. Such multilevel annotations can be readily used for the task of ML-based acquisition of grammar resources that map between the different levels of linguistic representation ― something which forms part of, for instance, any natural language text generator.
Anthology ID:
L10-1483
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/697_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Simon Mille and Leo Wanner. 2010. Syntactic Dependencies for Multilingual and Multilevel Corpus Annotation. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
Syntactic Dependencies for Multilingual and Multilevel Corpus Annotation (Mille & Wanner, LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/697_Paper.pdf