Resources for Coptic
Corpora
- Multiple corpora from the Coptic Scriptorium project are available for download from http://copticscriptorium.org under a CC-BY license
- The Coptic Universal Dependency Treebank is available at http://corpling.uis.georgetown.edu/coptic-treebank
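Universal Dependencies treebanks are distributed in the tab-separated CoNLL-U format, so a downloaded copy of the treebank can be read with a few lines of code. The following is a minimal sketch rather than an official loader: the function name read_conllu is hypothetical, and the sample rows are made-up placeholders, not real treebank data.

```python
from typing import Dict, Iterator, List

# The ten tab-separated columns defined by the CoNLL-U format.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries.

    Hypothetical helper for illustration: comment lines (starting with
    '#') are skipped, and a blank line closes the current sentence.
    """
    sentence: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # Blank line: sentence boundary.
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):
            # Sentence-level metadata such as sent_id or text.
            continue
        else:
            cols = line.split("\t")
            if len(cols) == len(FIELDS):
                sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        yield sentence


# Placeholder rows (NOT real treebank data) that only show the column layout.
SAMPLE = (
    "# sent_id = example-1\n"
    "1\tw1\tl1\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tw2\tl2\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(SAMPLE))
for token in sentences[0]:
    print(token["id"], token["form"], token["upos"], token["head"], token["deprel"])
```

To use this on a downloaded file, read the file as UTF-8 text and pass its contents to read_conllu; the file name depends on the release you download.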
Tools
- An online NLP pipeline interface and API, including a dependency parser, are available at http://corpling.uis.georgetown.edu/coptic-nlp
- Individual command-line tools used in the pipeline:
  - Tokenizer: accepts UTF-8 plain text or XML input; also performs morphological analysis of tokens into their constituent morphemes
  - Tagging and lemmatization: models are available for use with TreeTagger
  - Language of origin tagger: useful for detecting Greek and other loanwords
  - Automatic normalization
- Coreference resolution and named entity recognition (NER) for Coptic have been implemented in xrenner