Thomas Kollar


2018

pdf bib
The Alexa Meaning Representation Language
Thomas Kollar | Danielle Berry | Lauren Stuart | Karolina Owczarzak | Tagyoung Chung | Lambert Mathias | Michael Kayser | Bradford Snow | Spyros Matsoukas
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

This paper introduces a meaning representation for spoken language understanding. The Alexa meaning representation language (AMRL), unlike previous approaches, which factor spoken utterances into domains, provides a common representation for how people communicate in spoken language. AMRL is a rooted graph, links to a large-scale ontology, supports cross-domain queries, fine-grained types, complex utterances and composition. A spoken language dataset has been collected for Alexa, which contains ∼20k examples across eight domains. A version of this meaning representation was released to developers at a trade show in 2016.

2013

pdf bib
Jointly Learning to Parse and Perceive: Connecting Natural Language to the Physical World
Jayant Krishnamurthy | Thomas Kollar
Transactions of the Association for Computational Linguistics, Volume 1

This paper introduces Logical Semantics with Perception (LSP), a model for grounded language acquisition that learns to map natural language statements to their referents in a physical environment. For example, given an image, LSP can map the statement “blue mug on the table” to the set of image segments showing blue mugs on tables. LSP learns physical representations for both categorical (“blue,” “mug”) and relational (“on”) language, and also learns to compose these representations to produce the referents of entire statements. We further introduce a weakly supervised training procedure that estimates LSP’s parameters using annotated referents for entire statements, without annotated referents for individual words or the parse structure of the statement. We perform experiments on two applications: scene understanding and geographical question answering. We find that LSP outperforms existing, less expressive models that cannot represent relational language. We further find that weakly supervised training is competitive with fully supervised training while requiring significantly less annotation effort.