Difference between revisions of "Resources for Slovenian"

From ACL Wiki
(+slovenscina.eu corpora)
(HamleDT)
 
Line 7:
 
===Non-free license===
 
 
* [http://eng.slovenscina.eu/korpusi "Communication in Slovene" corpora], includes written, spoken, web, learner's, and tagged corpora, up to 1.2 billion words
 
+ * [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages, common annotation style.
 
* [http://nl.ijs.si/ME/ Multext EAST] lexica, annotated "1984" corpus, parallel and comparable text and speech corpora.  Languages involved: Bulgarian, Croatian, Czech, English, Estonian, Hungarian, Lithuanian, Macedonian, Persian, Polish, Resian, Romanian, Russian, Serbian, Slovak, Slovene, and Ukrainian
 
  

Latest revision as of 08:52, 26 May 2014

Corpora

Free license

  • Europarl corpus, sentence aligned with English
  • IJS - ELAN Slovene-English Parallel Corpus
  • JRC Acquis parallel texts. Languages involved: Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene and Swedish.

Non-free license

  • "Communication in Slovene" corpora, includes written, spoken, web, learner's, and tagged corpora, up to 1.2 billion words
  • HamleDT, harmonized dependency treebanks for many languages in a common annotation style.
  • Multext EAST lexica, annotated "1984" corpus, parallel and comparable text and speech corpora. Languages involved: Bulgarian, Croatian, Czech, English, Estonian, Hungarian, Lithuanian, Macedonian, Persian, Polish, Resian, Romanian, Russian, Serbian, Slovak, Slovene, and Ukrainian.