Frequently asked questions about Computational Linguistics
The material below was last updated in 2005. For even more dated information, see the NLP FAQ.
Source: http://web.archive.org/web/20100805080705/http://www.ling.su.se/DaLi/cl_faq/index.htm
The following information is meant for people unfamiliar with Computational Linguistics. Here we try to answer the following questions:
What is Computational Linguistics?
See also a short definition in the ACL Archives.
Computational Linguistics, or Natural Language Processing (NLP), is not a new field. As early as 1946, attempts were made to use computers to process natural language. These attempts concentrated mainly on Machine Translation and, owing to the political situation at the time, almost exclusively on translation from Russian into English. Considerable resources were dedicated to this task in both the U.S.A. and Great Britain during the fifties and sixties. Other countries, mainly in continental Europe, joined the enterprise, and the first systems ("SYSTRAN") became operational at the end of this period. However, the limited performance of these systems made it clear that the underlying theoretical difficulties of the task had been grossly underestimated, and in the following decades much effort was spent on basic research in formal linguistics. Today, a number of Machine Translation systems are commercially available, although there is still no system that produces fully automatic high-quality translations (and there probably will not be for some time). Human intervention in the form of pre- and/or post-editing is still required in all cases.
Another application that has become commercially viable in recent years is the analysis and synthesis of spoken language, i.e., speech understanding and speech generation. Potential applications range from aids for the handicapped (e.g., text-to-speech systems for the blind) to telephony-based information systems (e.g., inquiry systems for train or plane connections, telebanking) and on to office dictation systems (as offered by several vendors). Several text-to-speech systems are commercially available and in daily use in many places. Speech understanding is much more difficult than speech generation, yet some speech understanding systems are also entering the marketplace.
An application that will become at least as important as those already mentioned is the creation, administration, and presentation of texts by computer. Even reliable access to written texts is a major bottleneck in science and commerce. The amount of textual information is enormous (and growing incessantly), and traditional word-based information retrieval methods are increasingly inadequate, because either precision or recall is always low (i.e., you either get a large number of irrelevant documents along with the relevant ones, or you fail to retrieve many of the relevant documents in the collection). Linguistically based retrieval methods, which take into account the meaning of sentences as encoded in the syntactic structure of natural language, promise a way out of this quandary. The creation of texts is also becoming a problem. Manuals for complex technical systems (airplanes, computers, etc.) are constantly out of date because the systems themselves are upgraded ever faster. Writing manuals by hand is therefore becoming ever more expensive and unreliable, and if manuals have to be maintained in several languages, their production becomes increasingly unmanageable. If different versions of the manuals have to be written (for service users, technicians, auditors, etc.), things get out of hand altogether. The automatic creation of manuals from a common knowledge base, in different languages and for different types of readers, is a possible solution to this cluster of problems. The creation of natural language texts has always been something of a "poor cousin" in Computational Linguistics; the situation just described is about to change this fundamentally.
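The precision/recall trade-off of word-based retrieval can be made concrete with a small calculation. Precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that are retrieved. The following Python sketch uses a hypothetical toy collection of document IDs (the function name `precision_recall` and the data are illustrative only) to show a broad query with high recall but low precision, and a narrow query with the opposite profile.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical toy collection: documents 1-5 are the relevant ones for some query.
relevant = {1, 2, 3, 4, 5}

# A broad query returns every relevant document plus 15 irrelevant ones:
# high recall, low precision.
broad = {1, 2, 3, 4, 5} | set(range(100, 115))

# A narrow query returns a single document, which happens to be relevant:
# high precision, low recall.
narrow = {1}

print(precision_recall(broad, relevant))   # (0.25, 1.0)
print(precision_recall(narrow, relevant))  # (1.0, 0.2)
```

Real retrieval systems are usually evaluated over ranked result lists and many queries, so this set-based version only illustrates the basic definitions.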
Another topic that might come to the forefront of research in Computational Linguistics is the presentation of textual information. Traditionally, text generation systems have created standard, i.e., linear, text. If the amount of text is large, and/or if different types of readers must be addressed, hypertext is a better medium of presentation. The automatic creation of hypertext from an underlying knowledge base calls for an extension of this traditional approach.
What are the main application areas of Computational Linguistics?
Computational Linguistics tries to solve problems in the following areas:
- Machine Translation (see also Machine Translation: An Introductory Guide, a complete online book)
- Natural Language Interfaces
- Grammar and style checking
- Document processing and information retrieval
- Computer-Assisted Language Learning
What is the job market for Computational Linguists like?
Many people with a degree in Computational Linguistics work in research groups at universities, governmental research labs, or large enterprises. For example, in Sweden, Computational Linguists work in research groups at the various universities that offer courses in linguistics (such as Göteborg or Uppsala), at research labs like SICS (The Swedish Institute of Computer Science), or for companies like Telia or IBM.
In addition, there are development groups working on commercial products. These range from software houses like Microsoft, which employs Computational Linguists for its work on grammar checkers and automatic summarization, to the Munich-based SailLabs, which develops a machine translation system, to Caterpillar, which employs Computational Linguists for the translation of technical manuals.
In recent years the demand for Computational Linguists has risen with the growing number of language technology products on the Internet. Job offers come from developers improving Internet search engines by linguistic means or enhancing user interfaces with lingubots. Others are integrating speech recognition with language processing techniques.
In general, the job market for Computational Linguists is currently good.
Where and how can I study Computational Linguistics?
Numerous European universities offer degree programs and/or courses in CL. The ACL distributes a "Directory of Graduate Programs in CL". CL is mostly offered as a minor, supplementing a major either in Computer Science or in Linguistics or some other Language Science.
What are the main professional organizations in Computational Linguistics?
The Association for Computational Linguistics (ACL)
The most influential organization.
- They organize the annual ACL conferences and offer a public archive of all past conference papers.
- There is also a European chapter (EACL) with a separate chairing committee.
Association for Computers and the Humanities (ACH)
- ACH Homepage
- Its European counterpart and sister organization: Association for Literary and Linguistic Computing (ALLC)
International Association for Machine Translation (IAMT)
- Its European regional association: The European Association for Machine Translation (EAMT)
- Its American branch: Association for Machine Translation in the Americas (AMTA)
What are the main journals relevant to Computational Linguistics?
A very comprehensive list of journals related to linguistics is maintained by the LINGUIST LIST.
In Computational Linguistics and Linguistics:
- Computational Linguistics (the most important journal; published by the ACL)
- Language Resources and Evaluation (prior to March 2005 this publication was named Computers and the Humanities; published by Springer Science+Business Media)
- Computer Speech & Language (strongly oriented towards spoken language; published by Elsevier)
- Computer Assisted Language Learning (published by Routledge, Taylor & Francis Group)
- International Journal of Corpus Linguistics (published by John Benjamins Publishing Company)
- Grammars. A Journal of mathematical research on formal and natural languages. (Published by Kluwer; discontinued as of December 2003.)
- Journal of Language and Computation. (at the interface of logic, linguistics, formal grammar, and computational linguistics; published by Springer)
- Language (Journal of the Linguistic Society of America)
- Linguistics. An Interdisciplinary Journal of the Language Sciences (published by de Gruyter)
- International Journal of Lexicography (published by Oxford University Press)
- Linguistics and Philosophy (published by Springer)
- Literary and Linguistic Computing (published by the ALLC and Oxford University Press)
- Natural Language Engineering (since 1995, published by Cambridge University Press)
- Natural Language and Linguistic Theory (published by Springer)
- Machine Translation (published by Springer)
- Journal of Semantics (published by Oxford University Press)
- Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication. (published by John Benjamins)
In Psychology:
- Cognition (published by Elsevier)
- Cognitive Linguistics. An Interdisciplinary Journal of Cognitive Science (published by de Gruyter)
In Computer Science:
- AI Magazine
- AI and Society (published by Springer)
- Artificial Intelligence (published by Elsevier)
- International Journal of Man-Machine Studies
- The Journal of Logic Programming (published by North-Holland)
What are the main sources of online papers?
- The ACL Anthology
- The Computation and Language e-print archive
- The CiteSeer scientific literature digital library and search engine
Author: Martin Volk
Date of last modification: 2005-11-22