Resources for Russian
Revision as of 11:05, 12 October 2013 by Jonsafari.
Corpora
Free open source
- MultiUN, "A Multilingual corpus from United Nation Documents": the Russian portion is 876 MB; the other languages in the corpus are English, French, Spanish, Arabic, Chinese, and German
- WMT (Workshop on Statistical Machine Translation) corpora, including Europarl, News Commentary, and News Crawl
Unknown license
- HANCO: The Helsinki annotated corpus of Russian texts (searchable, no visible download links)
- Russian Corpora (uni-tuebingen.de) (searchable, no visible download links)
- Russian Internet Corpus
- Russian National Corpus
- Russian Newspaper Corpus
- Various texts in Russian (lib.ru)
POS taggers
- AOT, morphological analyser
- Mocky, statistical taggers and lemmatiser
- Mystem, morphological analyser