Difference between revisions of "Resources for Russian"
(Added: Araneum) |
m (Typo) |
Line 7: | Line 7: | ||
<!-- Please keep this list in alphabetical order --> | <!-- Please keep this list in alphabetical order --> | ||
− | * [http://ucts.uniba.sk/aranea_about/ Araneum | + | * [http://ucts.uniba.sk/aranea_about/ Araneum Russicum], Gigaword Russian web corpus |
* [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages, common annotation style. | * [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages, common annotation style. | ||
* [http://www.helsinki.fi/venaja/english/e-material/hanco/index.htm HANCO: The Helsinki annotated corpus of Russian texts] (searchable, no visible download links) | * [http://www.helsinki.fi/venaja/english/e-material/hanco/index.htm HANCO: The Helsinki annotated corpus of Russian texts] (searchable, no visible download links) |
Revision as of 12:23, 8 March 2015
Corpora
Free open source
- MultiUN, "A Multilingual corpus from United Nation Documents"; the Russian portion is 876 MB. The other languages in the corpus are English, French, Spanish, Arabic, Chinese, and German.
- WMT corpora, including the Yandex 1M corpus, News Commentary, and News Crawl
Unknown license
- Araneum Russicum, Gigaword Russian web corpus
- HamleDT, harmonized dependency treebanks of many languages, common annotation style.
- HANCO: The Helsinki annotated corpus of Russian texts (searchable, no visible download links)
- Russian Corpora (uni-tuebingen.de) (searchable, no visible download links)
- Russian Internet Corpus
- Russian National Corpus
- Russian Newspaper Corpus
- Various texts in Russian (lib.ru)
POS taggers
- AOT, morphological analyser
- Mocky, statistical taggers and lemmatiser
- Mystem, morphological analyser
Grammars
- KPML generation grammar
- Link Grammar Parser, includes Russian dictionaries.