Core body of knowledge

This page is a community-based effort to identify the core body of knowledge of the CL curriculum, following the ACM Model. It seeks to define "a minimal core consisting of those units for which there is a broad consensus that the corresponding material is essential" for any introductory course in computational linguistics. Following ACM's definition, "the core is not a complete curriculum ... the core must be supplemented by additional material." (Please use the talk page for discussion.)

CL1. Goals of computational linguistics: roots, philosophical underpinnings, ideology, contemporary divides

CL2. Introduction to language: written vs. spoken language; linguistic levels; typology, variation and change

CL3. Words, morphology and the lexicon: tokenization, lexical categories, part-of-speech (POS) tagging, stemming, morphological analysis, finite-state automata (FSAs)

CL4. Syntax, grammars and parsing: grammar formalisms, grammar development, formal complexity of natural language

CL5. Semantics and discourse: lexical semantics, multiword expressions, discourse representation

CL6. Generation: text planning, syntactic realization

CL7. Language engineering: architecture, robustness, evaluation paradigms

CL8. Language resources: corpora, web as corpus, data-intensive linguistics, linguistic annotation, Unicode

CL9. Language technologies: named entity detection, coreference resolution, information extraction (IE), question answering (QA), summarization, machine translation (MT), natural language (NL) interfaces
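The units above are topic headings rather than worked material. As one illustration of the level of material CL3 points to, here is a minimal sketch of a finite-state automaton that recognizes a few English noun stems, optionally followed by the plural suffix "s". The stems, state numbering, and function names (build_fsa, accepts) are hypothetical choices made for this example, not part of the curriculum definition.

```python
# Toy deterministic finite-state automaton (FSA) for a tiny fragment of
# English noun morphology: STEM or STEM + "s". The stems are hypothetical.

def build_fsa(stems):
    """Build a transition table shaped like a trie over the stems,
    plus one shared state reached by the plural suffix "s"."""
    transitions = {}  # (state, char) -> next state
    next_state = 1    # state 0 is the start state
    stem_finals = set()
    for stem in stems:
        state = 0
        for ch in stem:
            if (state, ch) not in transitions:
                transitions[(state, ch)] = next_state
                next_state += 1
            state = transitions[(state, ch)]
        stem_finals.add(state)
    plural = next_state
    for state in stem_finals:
        # Add the "s" arc unless a longer stem already uses it from here.
        transitions.setdefault((state, "s"), plural)
    accepting = stem_finals | {plural}
    return transitions, accepting


def accepts(fsa, word):
    """Run the automaton over the word; accept iff it ends in an accepting state."""
    transitions, accepting = fsa
    state = 0
    for ch in word:
        if (state, ch) not in transitions:
            return False  # no arc for this character: reject
        state = transitions[(state, ch)]
    return state in accepting


fsa = build_fsa(["cat", "dog", "fox"])
print(accepts(fsa, "cats"), accepts(fsa, "ca"), accepts(fsa, "catss"))
```

This toy accepts "foxs" as well as "foxes" is rejected, because it has no orthographic rules; handling alternations like fox/foxes is usually done with finite-state transducers rather than plain acceptors.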

References

Bird, Steven (2008). Defining a Core Body of Knowledge for the Introductory Computational Linguistics Curriculum. Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics.
