Unified Lexicon and Unified Morphosyntactic Specifications for Written and Spoken Italian

Monica Monachini, Nicoletta Calzolari, Khalid Choukri, Jochen Friedrich, Giulio Maltese, Michele Mammini, Jan Odijk, Marisa Ulivieri


Abstract
The goal of this paper is (1) to illustrate a specific procedure for merging different monolingual lexicons, focussing on techniques for detecting and mapping equivalent lexical entries, and (2) to sketch a production model that enables one to obtain lexical resources via unification of existing data. We describe the creation of a Unified Lexicon (UL) from a common sample of the Italian PAROLE-SIMPLE-CLIPS phonological lexicon and of the Italian LCSTAR pronunciation lexicon. We expand previous experiments carried out at ILC-CNR: based on a detailed mechanism for mapping grammatical classifications of candidate UL entries, a consensual set of Unified Morphosyntactic Specifications (UMS) shared by lexica for the written and spoken areas is proposed. The impact of the UL on cross-validation issues is analysed: by looking into conflicts, mismatches and diverging classifications can be detected in both resources. The work presented is in line with the activities promoted by ELRA towards the development of methods for packaging new language resources by combining independently created resources, and was carried out as part of the ELRA Production Committee activities. ELRA aims to exploit the UL experience to carry out such merging activities for resources available on the ELRA catalogue in order to fulfill the users' needs.
Anthology ID:
L06-1265
Volume:
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:
May
Year:
2006
Address:
Genoa, Italy
Editors:
Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/445_pdf.pdf
DOI:
Bibkey:
Cite (ACL):
Monica Monachini, Nicoletta Calzolari, Khalid Choukri, Jochen Friedrich, Giulio Maltese, Michele Mammini, Jan Odijk, and Marisa Ulivieri. 2006. Unified Lexicon and Unified Morphosyntactic Specifications for Written and Spoken Italian. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal):
Unified Lexicon and Unified Morphosyntactic Specifications for Written and Spoken Italian (Monachini et al., LREC 2006)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/445_pdf.pdf