Disambiguating Compound Nouns for a Dynamic HPSG Treebank of Wall Street Journal Texts

Valia Kordoni, Yi Zhang


Abstract
The aim of this paper is twofold. We focus, on the one hand, on the task of dynamically annotating English compound nouns, and on the other hand we propose disambiguation methods and techniques which facilitate the annotation task. Both the aforementioned are part of a larger on-going effort which aims to create HPSG annotation for the texts from theWall Street Journal (henceforward WSJ) sections of the Penn Treebank (henceforward PTB) with the help of a hand-written large-scale and wide-coverage grammar of English, the English Resource Grammar (henceforward ERG; Flickinger (2002)). As we show in this paper, such annotations are very rich linguistically, since apart from syntax they also incorporate semantics, which does not only ensure that the treebank is guaranteed to be a truly sharable, re-usable and multi-functional linguistic resource, but also calls for the necessity of a better disambiguation of the internal (syntactic) structure of larger units of words, such as compound nouns, since this has an impact on the representation of their meaning, which is of utmost interest if the linguistic annotation of a given corpus is to be further understood as the practice of adding interpretative linguistic information of the highest quality in order to give “added value” to the corpus.
Anthology ID:
L10-1345
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/498_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Valia Kordoni and Yi Zhang. 2010. Disambiguating Compound Nouns for a Dynamic HPSG Treebank of Wall Street Journal Texts. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
Disambiguating Compound Nouns for a Dynamic HPSG Treebank of Wall Street Journal Texts (Kordoni & Zhang, LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/498_Paper.pdf