Leaving no token behind: comprehensive (and delicious) annotation of MWEs and supersenses

Nathan Schneider


Abstract
I will describe an unorthodox approach to lexical semantic annotation that prioritizes corpus coverage, democratizing analysis of a wide range of expression types. I argue that a lexicon-free lexical semantics—defined in terms of units and supersense tags—is an appetizing direction for NLP, as it is robust, cost-effective, easily understood, not too language-specific, and can serve as a foundation for richer semantic structure. Linguistic delicacies from the STREUSLE and DiMSUM corpora, which have been multiword- and supersense-annotated, attest to the veritable smörgåsbord of noncanonical constructions in English, including various flavors of prepositions, MWEs, and other curiosities. Bio: Nathan Schneider is an annotation schemer and computational modeler for natural language. As Assistant Professor of Linguistics and Computer Science at Georgetown University, he looks for synergies between practical language technologies and the scientific study of language. He specializes in broad-coverage semantic analysis: designing linguistic meaning representations, annotating them in corpora, and automating them with statistical natural language processing techniques. A central focus in this research is the nexus between grammar and lexicon as manifested in multiword expressions and adpositions/case markers. He has inhabited UC Berkeley (BA in Computer Science and Linguistics), Carnegie Mellon University (Ph.D. in Language Technologies), and the University of Edinburgh (postdoc). Now a Hoya and leader of NERT, he continues to play with data and algorithms for linguistic meaning.
Anthology ID:
W18-4903
Volume:
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Agata Savary, Carlos Ramisch, Jena D. Hwang, Nathan Schneider, Melanie Andresen, Sameer Pradhan, Miriam R. L. Petruck
Venues:
LAW | MWE
SIGs:
SIGLEX | SIGANN
Publisher:
Association for Computational Linguistics
Note:
Pages:
5
Language:
URL:
https://aclanthology.org/W18-4903
DOI:
Bibkey:
Cite (ACL):
Nathan Schneider. 2018. Leaving no token behind: comprehensive (and delicious) annotation of MWEs and supersenses. In Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), page 5, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Leaving no token behind: comprehensive (and delicious) annotation of MWEs and supersenses (Schneider, LAW-MWE 2018)
Copy Citation:
PDF:
https://aclanthology.org/W18-4903.pdf