Difference between revisions of "Resources for Russian"
==Corpora==
===Free open source===
* [http://www.euromatrixplus.net/multi-un/ MultiUN], "A Multilingual corpus from United Nation Documents". The Russian portion is 876 MB; the other languages in the corpus are English, French, Spanish, Arabic, Chinese, and German.
* [http://www.statmt.org/wmt15/translation-task.html#download WMT corpora], including the Yandex 1M corpus, News Commentary, and News Crawl

===Unknown license===
<!-- Please keep this list in alphabetical order -->

* [http://ucts.uniba.sk/aranea_about/ Araneum Russicum], Gigaword Russian web corpus
* [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages in a common annotation style
* [http://www.helsinki.fi/venaja/english/e-material/hanco/index.htm HANCO: The Helsinki annotated corpus of Russian texts] (searchable, no visible download links)
* [http://www.sfb441.uni-tuebingen.de/b1/korpora.html Russian Corpora (uni-tuebingen.de)] (searchable, no visible download links)
* [http://corpus.leeds.ac.uk/ruscorpora.html Russian Internet Corpus]
* [http://www.ruscorpora.ru/ Russian National Corpus]
* [http://www.philol.msu.ru/~lex/corpus/ Russian Newspaper Corpus]
* [http://lib.ru/ Various texts in Russian (lib.ru)]

== POS taggers ==

* [http://www.aot.ru/ AOT, morphological analyser]
* [http://corpus.leeds.ac.uk/mocky/ Mocky, statistical taggers and lemmatiser]
* [http://company.yandex.ru/technology/mystem/ Mystem, morphological analyser]

== Grammars ==
* [[Generation grammars|KPML generation grammar]]
* [http://abisource.com/projects/link-grammar/ Link Grammar Parser], includes Russian dictionaries

==Various resources==
* [http://rykov-cl.narod.ru/r.html Russian Corpora (rykov-cl.narod.ru)]
* [http://corpus.leeds.ac.uk/serge/frqlist/ Russian frequency lists]
* [http://www.philol.msu.ru/rus/galya-1 Russian Phonetics on the Web]
* [http://schools.keldysh.ru/uvk1838/Sciper/volume2/langres/russiclr.htm Russicon Resources]

[[Category:Resources by language|Russian]]
Latest revision as of 07:55, 17 June 2015