Core body of knowledge
This page is a community effort to identify the core body of knowledge of the computational linguistics (CL) curriculum, following the ACM model. It seeks to define "a minimal core consisting of those units for which there is a broad consensus that the corresponding material is essential" for any introductory course in computational linguistics. Following ACM's definition, "the core is not a complete curriculum ... the core must be supplemented by additional material."
Participants at the Third Workshop on Issues in Teaching Computational Linguistics agreed to use the ACL wiki to develop and refine the definition of the CL core body of knowledge. Please use the talk page for discussion.
Computational Linguistics Core Body of Knowledge
- CL1. Goals of computational linguistics
  - roots, philosophical underpinnings, ideology, contemporary divides
- CL2. Introduction to language
  - written vs. spoken language; linguistic levels; typology, variation and change
- CL3. Words, morphology and the lexicon
  - tokenization, lexical categories, part-of-speech (POS) tagging, stemming, morphological analysis, finite-state automata (FSAs)
- CL4. Syntax, grammars and parsing
  - grammar formalisms, grammar development, formal complexity of natural language
- CL5. Semantics and discourse
  - lexical semantics, multiword expressions, discourse representation
- CL6. Generation
  - text planning, syntactic realization
- CL7. Language engineering
  - architecture, robustness, evaluation paradigms
- CL8. Language resources
  - corpora, web as corpus, data-intensive linguistics, linguistic annotation, Unicode
- CL9. Language technologies
  - named entity detection, coreference, information extraction (IE), question answering (QA), summarization, machine translation (MT), natural language interfaces
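As a rough illustration of how several CL3 topics (tokenization, morphological analysis, FSAs) fit together, the following minimal Python sketch compiles a small lexicon of verb stems and inflectional suffixes into a deterministic finite-state automaton (a letter trie). It then runs regex-tokenized text through that automaton. The lexicon, the tag names (`BASE`, `3SG`, `PAST`, `PROG`) and the function names are hypothetical choices made for this example only. They are not part of the core definition.

```python
import re

# Hypothetical toy data for illustration only: verb stems and inflectional suffixes.
STEMS = ["walk", "talk", "jump"]
SUFFIXES = {"": "BASE", "s": "3SG", "ed": "PAST", "ing": "PROG"}


def build_fsa(stems, suffixes):
    """Compile every stem+suffix string into a deterministic FSA (a letter trie).

    transitions maps (state, char) -> next state; state 0 is the start state.
    analyses maps each accepting state to a (stem, tag) analysis; if two
    combinations spelled the same string, the later one would overwrite it.
    """
    transitions, analyses = {}, {}
    n_states = 1
    for stem in stems:
        for suffix, tag in suffixes.items():
            state = 0
            for ch in stem + suffix:
                if (state, ch) not in transitions:
                    transitions[(state, ch)] = n_states
                    n_states += 1
                state = transitions[(state, ch)]
            analyses[state] = (stem, tag)
    return transitions, analyses


def analyze(word, transitions, analyses):
    """Run the FSA over the lowercased word; return (stem, tag) or None if rejected."""
    state = 0
    for ch in word.lower():
        state = transitions.get((state, ch))
        if state is None:
            return None
    # Reaching a non-accepting state (e.g. a prefix like "walke") also rejects.
    return analyses.get(state)


def tokenize(text):
    """Naive regex tokenizer: runs of ASCII letters, or single non-space symbols."""
    return re.findall(r"[A-Za-z]+|[^\sA-Za-z]", text)


if __name__ == "__main__":
    transitions, analyses = build_fsa(STEMS, SUFFIXES)
    for token in tokenize("She walked, then jumping."):
        print(token, analyze(token, transitions, analyses))
```

A trie is used here because it is the simplest deterministic automaton for a finite word list. A fuller treatment would minimize the automaton or use a finite-state transducer, so that stems and suffixes share states instead of being spelled out per combination.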
References
Bird, Steven (2008). Defining a Core Body of Knowledge for the Introductory Computational Linguistics Curriculum. In Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics.