Resources for Russian

From ACLWiki
==Corpora==
===Free open source===
* [http://www.euromatrixplus.net/multi-un/ MultiUN] "A Multilingual corpus from United Nation Documents". The Russian portion is 876 MB; the other languages in the corpus are English, French, Spanish, Arabic, Chinese, and German.
* [http://www.statmt.org/wmt13/translation-task.html#download WMT corpora], including the Yandex 1M corpus, News Commentary, and News Crawl

===Unknown license===
<!-- Please keep this list in alphabetical order -->

* [http://www.helsinki.fi/venaja/english/e-material/hanco/index.htm HANCO: The Helsinki annotated corpus of Russian texts] (searchable, no visible download links)
* [http://www.sfb441.uni-tuebingen.de/b1/korpora.html Russian Corpora (uni-tuebingen.de)] (searchable, no visible download links)
* [http://corpus.leeds.ac.uk/ruscorpora.html Russian Internet Corpus]
* [http://www.ruscorpora.ru/ Russian National Corpus]
* [http://www.philol.msu.ru/~lex/corpus/ Russian Newspaper Corpus]
* [http://lib.ru/ Various texts in Russian (lib.ru)]

== POS taggers ==

* [http://www.aot.ru/ AOT, morphological analyser]
* [http://corpus.leeds.ac.uk/mocky/ Mocky, statistical taggers and lemmatiser]
* [http://company.yandex.ru/technology/mystem/ Mystem, morphological analyser]

== Grammars ==
* [[Generation grammars|KPML generation grammar]]

==Various resources==
* [http://rykov-cl.narod.ru/r.html Russian Corpora (rykov-cl.narod.ru)]
* [http://corpus.leeds.ac.uk/serge/frqlist/ Russian frequency lists]
* [http://www.philol.msu.ru/rus/galya-1 Russian Phonetics on the Web]
* [http://schools.keldysh.ru/uvk1838/Sciper/volume2/langres/russiclr.htm Russicon Resources]

[[Category:Resources by language|Russian]]

Revision as of 13:09, 12 October 2013

