Resources for Coptic
Edited by Amir.zeldes
Latest revision as of 09:04, 16 September 2022
Corpora
- Multiple corpora from the Coptic Scriptorium project are available for download from https://copticscriptorium.org under a CC-BY license
- The Coptic Universal Dependency Treebank is available at https://copticscriptorium.org/treebank.html
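Universal Dependencies treebanks are distributed in the CoNLL-U format: one word per line with ten tab-separated columns, `#` comment lines, and a blank line after each sentence. The sketch below assumes you have downloaded the CoNLL-U release of the Coptic treebank; `read_conllu` is a hypothetical helper written for illustration (it is not part of any Coptic Scriptorium tool), and the toy sentence is hand-made, with forms, tags and heads chosen for illustration only rather than copied from the treebank.

```python
import io
from typing import Dict, Iterator, List, TextIO

# The ten CoNLL-U columns, in the order defined by the Universal Dependencies format.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def read_conllu(stream: TextIO) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of word dicts (hypothetical helper).

    Comment lines (starting with '#') are skipped, as are multiword-token
    ranges (IDs like '2-3') and empty nodes (IDs like '5.1'), so only
    syntactic words remain.
    """
    sentence: List[Dict[str, str]] = []
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token range or empty node
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        yield sentence  # the file may not end with a blank line


# Hand-made toy input (illustrative only, not taken from the treebank).
toy = (
    "# text = ⲁⲩⲱ ⲡⲛⲟⲩⲧⲉ\n"
    "1\tⲁⲩⲱ\tⲁⲩⲱ\tCCONJ\t_\t_\t3\tcc\t_\t_\n"
    "2-3\tⲡⲛⲟⲩⲧⲉ\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "2\tⲡ\tⲡ\tDET\t_\t_\t3\tdet\t_\t_\n"
    "3\tⲛⲟⲩⲧⲉ\tⲛⲟⲩⲧⲉ\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

for sent in read_conllu(io.StringIO(toy)):
    print([(w["form"], w["upos"]) for w in sent])
# prints [('ⲁⲩⲱ', 'CCONJ'), ('ⲡ', 'DET'), ('ⲛⲟⲩⲧⲉ', 'NOUN')]
```

Skipping the range lines keeps each sentence at the level of syntactic words; keep them instead if you need the surface tokens they group together.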
Tools
- An online NLP pipeline interface and API, which also include a dependency parser, are available at https://tools.copticscriptorium.org/coptic-nlp/
- Individual command-line tools used in the pipeline:
  - Tokenizer (https://github.com/CopticScriptorium/tokenizers) - accepts UTF-8 plain text or XML input and also analyzes tokens into their constituent morphemes
  - Tagging and lemmatization - models are available for use with TreeTagger
  - Language-of-origin tagger - useful for detecting Greek and other loanwords
  - Automatic normalization (https://github.com/CopticScriptorium/normalizer)
- Coreference resolution and named entity recognition (NER) for Coptic have been implemented in xrenner (https://gucorpling.org/xrenner)