Core body of knowledge
Revision as of 09:00, 21 June 2008 by StevenBird
This page is a community-based effort to identify the core body of knowledge of the CL curriculum, following the ACM Model. It seeks to define "a minimal core consisting of those units for which there is a broad consensus that the corresponding material is essential" for any introductory course in computational linguistics. Following ACM's definition, "the core is not a complete curriculum ... the core must be supplemented by additional material." (Please use the talk page for discussion.)
- CL1. Goals of computational linguistics
  - roots, philosophical underpinnings, ideology, contemporary divides
- CL2. Introduction to language
  - written vs. spoken language; linguistic levels; typology, variation and change
- CL3. Words, morphology and the lexicon
  - tokenization, lexical categories, part-of-speech (POS) tagging, stemming, morphological analysis, finite-state automata (FSAs)
- CL4. Syntax, grammars and parsing
  - grammar formalisms, grammar development, formal complexity of natural language
- CL5. Semantics and discourse
  - lexical semantics, multiword expressions, discourse representation
- CL6. Generation
  - text planning, syntactic realization
- CL7. Language engineering
  - architecture, robustness, evaluation paradigms
- CL8. Language resources
  - corpora, the web as corpus, data-intensive linguistics, linguistic annotation, Unicode
- CL9. Language technologies
  - named entity detection, coreference resolution, information extraction (IE), question answering (QA), summarization, machine translation (MT), natural language (NL) interfaces
References
Bird, Steven (2008). Defining a Core Body of Knowledge for the Introductory Computational Linguistics Curriculum. In Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics.