This newsletter includes:
1) News from Program Committee of Main Conference
2) Extended Deadline of Student Research Workshop
3) Life Time Achievement Award
6) Important Announcements from Several Associated Conferences and Workshops

1) News from Program Committee of Main Conference

376 papers were submitted to the main conference. This is far more than we expected. Thank you for your interest in ACL2003.

2) Extended Deadline of Student Research Workshop

The paper submission deadline of the Student Research Workshop has been extended:

Paper submission deadline: March 15, 2003 (extended)

We would appreciate it if you could inform your students that the deadline has been extended.

3) Life Time Achievement Award

A ceremony for the second Life
Time Achievement Award will be held during ACL 2003. The LTA
was established at the 40th anniversary conference of ACL
last year. The first winner of the LTA was Prof. Aravind
Joshi of the University of Pennsylvania.

There will be four tutorials, given by leading experts in language and speech processing. The tutorials will take place on July 7. The abstracts of the tutorials and the profiles of the speakers will be described on the ACL-03 web site. For details, see http://www.ec-inc.co.jp/ACL2003/tutorials.html.

4-1) Finite State Language Processing

Finite state automata are well-understood, inherently compact, and efficient models of simple languages. In addition, finite state automata can be combined in various interesting ways, with the guarantee that the result is again a finite state automaton.

In the introductory part of the tutorial, finite state acceptors and finite state transducers (both weighted and unweighted) are introduced, and we briefly review their formal and computational properties.

In the second part of the tutorial, we illustrate the use of finite state methods in dictionary construction. In particular, we present an application of perfect hash automata in tuple dictionaries. Tuple dictionaries provide a very compact representation of huge language models of the kind typically used in NLP applications (including N-gram language models).

In the third part of the tutorial, we focus on regular expressions for NLP. The type of regular expressions used in modern NLP applications has evolved dramatically from the regular expressions found in standard Computer Science textbooks. In recent years, various high-level regular expression operators have been introduced (such as contexted replacement operators). The availability of more and more abstract operators makes the regular expression notation more and more attractive. The tutorial provides an introduction to the regular expression calculus. The examples use the notation of the Fsa Utilities toolkit, a freely available implementation of the regular expression calculus. We introduce various regular expression operators for acceptors and transducers. We then show how new regular expression operators can be defined.

In the last part of the tutorial,
we focus in more detail on regular expression operators that
turned out to be useful for the description of certain
aspects of phonology using ideas from Optimality Theory.
This part of the tutorial describes the lenient composition
operator of Karttunen, and the optimality operator of
Gerdemann and van Noord, as well as a number of alternatives
(Eisner, Jaeger).

Maximum Entropy Models: What maximum entropy models are, from first principles, what they can and cannot do, and how they behave. Lots of examples. The equivalence of maxent models and maximum-likelihood exponential models. The relationship between maxent models and other classifiers. Smoothing methods for maxent models.

Basic Optimization: Unconstrained optimization: convexity, gradient methods (both simple descent and more practical conjugate methods). Constrained optimization: Lagrange multipliers and several ways of turning them into a concrete optimization system. Other fun things to do with optimization. Specialized iterative scaling methods vs. general optimization.

Model Structures: Conditional independence in graphical models (focusing on NB, HMMs, and PCFGs). Practical ramifications of various independence assumptions. Label and observation biases in conditional structures. Survey of sequence models (HMMs, MEMMs, CRFs, and dependency networks).

Prerequisites: Familiarity with
basic calculus and a working knowledge of NB and HMMs are
required. Existent but possibly vague knowledge of general
Bayes' nets or basic information theory is a plus. Most
importantly: a low tolerance for conceptual black boxes
labeled "magic here".

KDT, while deeply rooted in NLP, draws on methods from statistics, machine learning, reasoning, information extraction, knowledge management, cognitive science, and other fields for its discovery process. The emphasis here is on the automatic discovery of new concepts and on the large number of semantic relations that link them. This tutorial presents recent results from KDT research and system implementations. Since the goal of KDT is to gain insight into large quantities of text data and to bring text semantics to bear, it plays an increasingly significant role in emerging applications such as Question Answering, Summarization, Text Understanding and Ontology Development.

This tutorial is aimed at
researchers, practitioners, educators, and research planners
who want to keep in sync with the newly emerging KDT
technology.

This tutorial will chart the main advances that have been made in spoken language processing algorithms and applications over the past few years. The key enabling technologies of 'automatic speech recognition', 'text-to-speech synthesis' and 'spoken language dialogue' will be explained in some detail, with emphasis being placed on how the technology works and, perhaps more importantly, why it sometimes doesn't. Insight will also be given into the linguistic/paralinguistic properties of speech signals and human spoken language, and comparisons will be drawn between the capabilities of 'automatic' and 'natural' spoken language processing systems.

The tutorial is aimed at both specialists and non-specialists in the language processing field, and will be of great interest to anyone who is keen to develop a greater understanding of the main issues involved in spoken language processing. Prof. Moore will cover theoretical and practical aspects of the inner workings of state-of-the-art spoken language systems, as well as providing a balanced overview of their capabilities in relation to other modes of human-machine interaction. The tutorial will incorporate
question-and-answer opportunities, and will conclude with a
survey of open research issues and some predictions for the
future.

The student research workshop, the interactive poster/demo sessions, the associated conferences (EMNLP2003 and IRAL2003) and the workshops have their own submission deadlines and sites. Please see the web sites for the details.

5-1) Student Research Workshop
Paper submission deadline: March 15, 2003 (extended)

Paper submission deadline: May 1, 2003

AC1 The Eighth Conference on Empirical Methods in Natural Language Processing (EMNLP2003)
AC2 The Sixth International Workshop on Information Retrieval with Asian Languages (IRAL2003)

5-4) ACL Workshops
WS1 Multilingual Summarization and Question Answering - Machine Learning and Beyond
WS2 Natural Language Processing in Biomedicine
WS3 The Lexicon and Figurative Language
WS4 Multilingual and Mixed-language Named Entity Recognition: Combining Statistical and Symbolic Models
WS5 The Second International Workshop on Paraphrasing: Paraphrase Acquisition and Applications
WS6 Second SIGHAN Workshop on Chinese Language Processing
WS7 Multiword Expressions: Analysis, Acquisition and Treatment
WS8 Linguistic Annotation: Getting the Model Right
WS9 Workshop on Patent Corpus Processing
WS10 Towards a Resources Information Infrastructure

5-5) Exhibits and Sponsorship
Application deadline for both: April 1, 2003
For details, see Exhibits and Sponsorship at http://www.ec-inc.co.jp/ACL2003/.

6) Important Announcements from Several Associated Conferences and Workshops

AC1 The Eighth Conference on Empirical Methods in Natural Language Processing (EMNLP2003)
Abstract: SIGDAT, the Association for Computational Linguistics' special interest group on linguistic data and corpus-based approaches to NLP, invites submissions to EMNLP 2003. The conference will be held on July 11-12 in Sapporo, Japan, immediately following the 41st meeting of the ACL (ACL 2003).
URL: http://www.ai.mit.edu/people/mcollins/emnlp03

Abstract: Automatic summarization and question answering (QA) aim at producing a concise representation of the key information content. Rule-based and statistical approaches to summarization and QA have shown promising results; it is, however, very difficult to find good evaluation functions or rules that work well across domains. In consequence, various machine learning (ML) techniques have recently been applied to summarization and QA systems. The purpose of this workshop is to provide a forum for exploring the commonality underlying this diversity of problem domains and approaches.

Deadline: April 21, 2003

Invited speaker: Prof. Carol Friedman, CUNY/Columbia University

The aim of this workshop is to bring together NLP researchers in biomedicine and to discuss recent advances in the computational analysis of text, which go beyond traditional keyword-based indexing methods and begin to offer content-based analysis. Knowledge discovery in the rapidly growing area of biomedicine is of paramount importance. Processing biomedical texts is a challenge, especially in areas such as terminology, ontology building, information extraction, annotation tools, sharing and integration of knowledge from factual and textual databases, and evaluation of biomedical applications. One of the aims of the workshop is to create SIGs in areas of common interest such as annotation standards in biology, evaluation metrics, standardisation of terminological resources, etc.

WS3 The Lexicon and Figurative Language
Abstract: The lexicon has variously been treated as a list of word senses, a list of hierarchically related senses (e.g. WordNet), and as a structured entity containing rich lexical representations and means to generate novel uses of words. Figurative language poses problems for all these approaches, and a common claim is that metaphor is a cognitive, not a linguistic, phenomenon; instead, word senses are related in terms of their underlying conceptual domains. The major theme of this SIGLEX-endorsed workshop is to explore and attempt to reconcile these different approaches to figurative language and the lexicon, although papers exploring other aspects of figurative language will also be welcome.

Deadline: April 13, 2003

Invited speaker: David Yarowsky

Named Entity (NE) Recognition systems vary widely, from high-speed bulk methods optimized for indexing, to deep semantic parsers tuned for specific domains. Optimal ways to combine statistical and symbolic models also vary, depending on applications and tasks. Is it possible to:
- maximize use of knowledge-rich resources (e.g. lexicons, NE grammars, parsing) while permitting corpus-based training for domain or language?
- acquire and share resources (including lexicons and grammars) across languages?
- balance performance speed with reasonable accuracy?
- use specific language patterns while permitting rapid transfer to another language?
- minimize variability in results across language types?
We welcome research on combined models, in which these tradeoffs are calculated in particular ways. Demonstrations of implemented NE systems are also welcome. Submit papers by April 4 electronically in Word, PDF or PostScript format. Assign a filename based on the paper's title, transfer it to ftp://ftp.research.microsoft.com/incoming/josephp, then email an identification page with title, author(s), contact details, and filename to molsen@microsoft.com.
URL: http://research.microsoft.com/conferences/mulner-acl03/

Abstract: Paraphrases, variant ways of conveying the same information, are of interest because they present challenges for many NLP tasks, such as MT, IR, and QA. This workshop is open to investigation of all aspects of paraphrase, with a particular focus on the automatic acquisition of paraphrases from corpora, and on the development of a standardized paraphrase framework or resource for use in applications.
URL: http://nlp.nagaokaut.ac.jp/IWP2003/

Abstract: As more resources for Chinese NLP have recently become available to the public, it is crucial to set up a platform that allows easy comparison of different approaches to various NLP tasks. SIGHAN is conducting a word-segmentation bakeoff before the workshop. Researchers all over the world are welcome to participate. As a part of this SIGHAN workshop, we are going to release the bakeoff results, followed by presentations by bakeoff participants and a general discussion of future evaluations. A second part of the workshop will consist of presentations of papers on all aspects of Chinese language processing.

WS7 Multiword Expressions: Analysis, Acquisition and Treatment
The workshop will concentrate on the analysis, acquisition and treatment of multiword expressions (MWEs), such as phrasal verbs (e.g. "add up"), nominal compounds (e.g. "radar footprint"), and institutionalized phrases (e.g. "salt and pepper"). In particular we focus on addressing the problems that MWEs pose for natural language processing applications.
URL: http://www.cl.cam.ac.uk/users/alk23/mwe/mwe.html

Abstract: The goal of this workshop is to foster research and development of the technology for patent corpus processing, by providing a forum in which researchers and practitioners can exchange and share their ideas, approaches, perspectives, and experiences from their work in progress. We invite both research papers and project papers associated with, but not limited to, the rudiments of patent corpus processing. We also invite papers addressing applications and user studies.
Deadline: April 10, 2003