Difference between revisions of "Resources for Russian"

Latest revision as of 07:55, 17 June 2015

@@ Line 2: / Line 2: @@
 ===Free open source===
 * [http://www.euromatrixplus.net/multi-un/ MultiUN] "A Multilingual corpus from United Nation Documents"; the Russian portion is 876 MB, and the other languages in the corpus are English, French, Spanish, Arabic, Chinese, and German
-* [http://www.statmt.org/wmt13/translation-task.html#download WMT corpora], including the Yandex 1M corpus, News Commentary, and News Crawl
+* [http://www.statmt.org/wmt15/translation-task.html#download WMT corpora], including the Yandex 1M corpus, News Commentary, and News Crawl
 ===Unknown license===
 <!-- Please keep this list in alphabetical order -->
+* [http://ucts.uniba.sk/aranea_about/ Araneum Russicum], Gigaword Russian web corpus
 * [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages in a common annotation style.
 * [http://www.helsinki.fi/venaja/english/e-material/hanco/index.htm HANCO: The Helsinki annotated corpus of Russian texts] (searchable, no visible download links)