Difference between revisions of "Resources for Russian"

From ACL Wiki
(Corpora)
(HamleDT)
 
(3 intermediate revisions by 2 users not shown)
Line 2: Line 2:
 
===Free open source===
 
* [http://www.euromatrixplus.net/multi-un/ MultiUN], "A Multilingual corpus from United Nation Documents". The Russian portion is 876 MB; the other languages in the corpus are English, French, Spanish, Arabic, Chinese, and German.
+ * [http://www.statmt.org/wmt13/translation-task.html#download WMT corpora], including the Yandex 1M corpus, News Commentary, and News Crawl
  
 
===Unknown license===
 
<!-- Please keep this list in alphabetical order -->
 
+ * [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages in a common annotation style.
 
* [http://www.helsinki.fi/venaja/english/e-material/hanco/index.htm HANCO: The Helsinki annotated corpus of Russian texts] (searchable, no visible download links)
 
* [http://www.sfb441.uni-tuebingen.de/b1/korpora.html Russian Corpora (uni-tuebingen.de)] (searchable, no visible download links)
 
Line 21: Line 23:
 
== Grammars ==
 
* [[Generation grammars|KPML generation grammar]]
 
+ * [http://abisource.com/projects/link-grammar/ Link Grammar Parser], includes Russian dictionaries.
  
 
==Various resources==
 

Latest revision as of 08:51, 26 May 2014

Corpora

Free open source

  • MultiUN, "A Multilingual corpus from United Nation Documents". The Russian portion is 876 MB; the other languages in the corpus are English, French, Spanish, Arabic, Chinese, and German.
  • WMT corpora, including the Yandex 1M corpus, News Commentary, and News Crawl

Unknown license

POS taggers

Grammars

Various resources