Invited Talk: Argmax Search in Natural Language Processing

Daniel Marcu
Information Sciences Institute
University of Southern California

Tuesday 18th July, 9:00-10:00, Bayside Auditorium A

Abstract

As our field matures, the gap between the mathematical meaning of the argmax symbol and the way the symbol is used in state-of-the-art natural language processing applications is widening.

  • We often claim that a given statistical model has certain empirical properties, although our argmax computations are carried out on severely simplified versions of it.
  • The difficulty of argmax search in large, unfamiliar search spaces slows us down, or stops us altogether, when we want to show that sophisticated, linguistically motivated models are more powerful than simple ones.
  • Since optimal search is impractical in exponential spaces, we often resort to validating statistical models against only well-formed solutions. This significantly diminishes our ability to make progress on many of the modeling challenges our community faces.
  • Argmax approximations are fertilizing the most flourishing hacking grounds. Who in our field has not studied the impact of histogram and probabilistic beam settings, figures of merit, etc. on some end-to-end performance metric?

In my presentation, I first review some of the argmax-related challenges our community is facing and the effect that ignoring these challenges is having on our field. I also present recent developments that have the potential to benefit a wide range of natural language applications where argmax search is critical.

Biography

Daniel Marcu is a Research Project Leader at the Information Sciences Institute and the Chief Operations and Technology Officer of Language Weaver Inc. His published work includes an MIT Press book, “The Theory and Practice of Discourse Parsing and Summarization”, and best paper awards, with his ISI colleagues, at AAAI-2000 and ACL-2001 for research on statistics-based summarization and translation. He is currently focusing on developing efficient learning and decoding algorithms for large-scale natural language processing problems and believes that machine translation is an excellent playground for doing this.

Contact Information
Daniel Marcu
Information Sciences Institute
University of Southern California
4676 Admiralty Way, Suite 1001
Marina del Rey, CA 90292
marcu at isi.edu