Resources for Coptic
Revisions by Amir.zeldes: "Added Coptic resources page", "More Coptic resources"
Revision as of 06:31, 10 June 2016
Corpora
- Multiple corpora from the Coptic Scriptorium project are available for download from http://copticscriptorium.org under a CC-BY license
- The Coptic Universal Dependency Treebank is available at http://corpling.uis.georgetown.edu/coptic-treebank
Tools
- An online NLP pipeline interface and API, including a dependency parser, are available at http://corpling.uis.georgetown.edu/coptic-nlp
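- Individual command line tools used in the pipeline:
  - Tokenizer (https://github.com/CopticScriptorium/tokenizers) - for UTF-8 plain text or XML input; also performs morphological analysis of tokens into constituent morphemes
  - Tagging and lemmatization (https://github.com/CopticScriptorium/tagger-part-of-speech) - models are available for use with TreeTagger (http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/)
  - Language of origin tagger (https://github.com/CopticScriptorium/lexical-taggers) - useful for detecting Greek and other loanwords
  - Automatic normalization (https://github.com/CopticScriptorium/normalizer)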
- Coreference resolution and NER for Coptic have been implemented in xrenner (http://corpling.uis.georgetown.edu/xrenner)
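
The tagging and lemmatization models listed above are meant to be used with TreeTagger, which, when run with its `-token` and `-lemma` options, prints one token per line as three tab-separated fields (form, tag, lemma). The following is a minimal sketch of reading such output in Python. The names `TaggedToken`, `parse_treetagger_output` and `SAMPLE` are hypothetical, and the sample lines are hand-written for illustration, not actual output of the Coptic models; the tag names in particular are assumptions.

```python
from typing import List, NamedTuple


class TaggedToken(NamedTuple):
    """One line of tagger output: surface form, POS tag, lemma."""
    form: str
    pos: str
    lemma: str


def parse_treetagger_output(text: str) -> List[TaggedToken]:
    """Parse tab-separated form/tag/lemma lines into TaggedToken records.

    Lines that do not have exactly three fields (blank lines, or markup
    lines passed through unchanged) are skipped.
    """
    tokens = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue
        tokens.append(TaggedToken(*fields))
    return tokens


# Hand-written sample (illustrative only, not real tool output).
SAMPLE = "ⲁ\tAPST\tⲁ\nϥ\tPPERS\tⲛⲧⲟϥ\nⲥⲱⲧⲙ\tV\tⲥⲱⲧⲙ\n"

if __name__ == "__main__":
    for tok in parse_treetagger_output(SAMPLE):
        print(f"{tok.form}\t{tok.pos}\t{tok.lemma}")
```

Keeping the three fields in a named tuple makes it straightforward to pass the tokens on to later steps, for example collecting lemmas or filtering by tag.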