The Third International Joint Conference on Natural Language Processing (IJCNLP-08)

Tutorials

There will be three tutorials on Mon, 7 Jan 2008 at IIIT, Hyderabad from 0900 to 1800 hrs. Each tutorial will take 3 hours. The schedule is given in the table below.

10:00-13:00	T2
13:00-14:00	Lunch
14:00-17:00	T1 & T3

T1. Social Network inspired Models of NLP and Language Evolution

Human language, with all its intricacies, is one of the finest examples of a complex system, which makes it necessary to study the faculty of language within the framework of the emerging science of complexity. Complex systems are often modelled as networks of entities (nodes) and their interactions (edges) – popularly known as complex or social networks. Network-based models are analyzed empirically to understand the structure of the underlying system, and they are synthesized from first principles to study its evolutionary dynamics.

The objective of this tutorial is to show how language and its dynamics can be successfully studied within the framework of social networks, as is evident from the growing body of work in this area. We will particularly demonstrate the relevance of social network-based methods in the development of a large variety of natural language processing applications. The tutorial will also highlight the importance of these methods in understanding the mechanisms of language evolution and change. The tutorial will cover the basics of complex network theory, followed by three case studies related to syntax, the mental lexicon and language evolution.

Presenters

Monojit Choudhury is a post-doctoral researcher in the Multilingual Systems Group, Microsoft Research, India.

Animesh Mukherjee is a PhD student in the Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur.

Niloy Ganguly is an assistant professor in the Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur.

T2. How to Add a New Language on the NLP Map: Building Resources and Tools for Languages with Scarce Resources

Rada Mihalcea, University of North Texas
Vivi Nastase, EML Research gGmbH

Those of us whose mother tongue is not English, or who are curious about applications involving other languages, often find ourselves in a situation where the tools we require are not available. According to recent studies there are about 7200 different languages spoken worldwide -- without including variations or dialects -- out of which very few have automatic language processing tools and machine-readable resources.

In this tutorial we will show how we can take advantage of lessons learned from frequently studied and used languages in NLP, and of the wealth of information and collaborative efforts mediated by the World Wide Web. We structure the presentation around two major themes: mono-lingual and cross-lingual approaches. Within the mono-lingual area, we show how to quickly assemble a corpus for statistical processing, how to obtain a semantic network using on-line resources -- in particular Wikipedia -- and how to obtain automatically annotated corpora for a variety of applications. The cross-lingual half of the tutorial shows how to build upon NLP methods and resources for other languages, and adapt them for a new language. We will review automatic construction of parallel corpora, projecting annotations from one side of the parallel corpus to the other, building language models, and finally we will look at how all these can come together in higher-end applications such as machine translation and cross-language information retrieval.

Presenters

RADA MIHALCEA is an Assistant Professor of Computer Science at the University of North Texas. Her research interests are in lexical semantics, multilingual natural language processing, minimally supervised natural language learning, and graph-based algorithms for natural language processing. She serves on the editorial board of the Journal of Computational Linguistics, the Journal of Language Resources and Evaluations, the Journal of Natural Language Engineering, the Journal of Research in Language in Computation, and the recently established Journal of Interesting Negative Results in Natural Language Processing and Machine Learning.

VIVI NASTASE is a post-doctoral fellow at EML Research gGmbH, Heidelberg, Germany. Her research interests are in lexical semantics, semantic relations, knowledge extraction, multi-document summarization, graph-based algorithms for natural language processing, and multilingual natural language processing. She is a co-founder of the Journal of Interesting Negative Results in Natural Language Processing and Machine Learning.

T3. Introduction to Text Summarization and Other Information Access Technologies

Horacio Saggion, University of Sheffield, UK

In recent years we have witnessed an explosion of on-line unstructured information in multiple languages, making natural language processing technologies such as automatic text summarization increasingly important for the information society. Text Summarization provides users with condensed descriptions of documents, allowing them to make informed decisions based on text summaries. Text summarization can be combined with Information Retrieval (IR) and Question Answering (QA) to provide users with focus-based or query-based summaries which are targeted towards the users' specific needs. When the information a user looks for is spread across multiple sources, text summarization can be used to condense facts and present a non-redundant account of the most relevant facts found across a set of documents.

The objective of this IJCNLP 2008 tutorial is to give an overview of a number of technologies in natural language processing for information access, including single- and multi-document summarization, cross-lingual summarization, and summarization in the context of question answering.

The tutorial will discuss summarization concepts and techniques, as well as the relation and relevance of summarization to other technologies such as information retrieval and question answering. It will also include a description of available resources for the development, training and evaluation of summarization components. A summarization toolkit will be used for demonstration purposes. A number of question answering components relevant for the creation of definitional summaries and profile summaries will also be demonstrated.

Presenter

Dr Saggion is a research fellow in the Natural Language Processing group, Department of Computer Science, University of Sheffield, England, UK. His area of expertise is Text Summarization. He works on national and international projects in information extraction, ontology-based information extraction, question answering, and text summarization. He obtained his PhD in 2000 from Université de Montréal, Département d'Informatique et de Recherche Opérationnelle.

He has published over 50 conference, workshop and journal papers. Alongside his research career, he has been an active teacher: he was assistant professor and researcher at Universidad de Buenos Aires and Universidad Nacional de Quilmes, and teaching assistant at Université de Montréal. He has participated in a number of summarization and question answering evaluations, including DUC (2004-2005), MSE (2005) and TREC/QA (2003-2005). He has recently organised the workshops "Multi-source Multi-lingual Information Extraction and Summarization" and "Crossing Barriers in Text Summarization Research" at the RANLP conferences.

He has given courses and tutorials on Text Summarization and other technologies such as question answering in a number of international venues such as ESSLLI and LREC.

TUTORIAL CHAIRS

Key-Sun Choi, KAIST, Republic of Korea
Kemal Oflazer, Sabanci University, Turkey

Please send further inquiries concerning tutorials at IJCNLP'08 to the tutorial chairs at oflazer@sabanciuniv.edu or kschoi@cs.kaist.ac.kr.