Difference between revisions of "Resources for Slovenian"

Latest revision as of 08:52, 26 May 2014

Corpora

Free license

Europarl corpus, sentence aligned with English
IJS - ELAN Slovene-English Parallel Corpus
JRC Acquis parallel texts. Languages involved: Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene and Swedish.

Non-free license

"Communication in Slovene" corpora, includes written, spoken, web, learner's, and tagged corpora, up to 1.2 billion words
HamleDT, harmonized dependency treebanks of many languages, common annotation style.
Multext EAST lexica, annotated "1984" corpus, parallel and comparable text and speech corpora. Languages involved: Bulgarian, Croatian, Czech, English, Estonian, Hungarian, Lithuanian, Macedonian, Persian, Polish, Resian, Romanian, Russian, Serbian, Slovak, Slovene, and Ukrainian

@@ Line 1: / Line 1: @@
 ==Corpora==
-* [http://nl.ijs.si/elan/ Slovene-English IJS - ELAN  Parallel Corpus] License: "freely available for downloading, but please acknowledge in any publications"
+===Free license===
-* [http://langtech.jrc.it/JRC-Acquis.html JRC Acquis] parallel texts in the following 22 languages: Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene and Swedish. License: Public domain.
+* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English
+* [http://nl.ijs.si/elan/ IJS - ELAN] Slovene-English Parallel Corpus
+* [http://langtech.jrc.it/JRC-Acquis.html JRC Acquis] parallel texts.  Languages involved: Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene and Swedish.
+===Non-free license===
+* [http://eng.slovenscina.eu/korpusi "Communication in Slovene" corpora], includes written, spoken, web, learner's, and tagged corpora, up to 1.2 billion words
+* [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages, common annotation style.
+* [http://nl.ijs.si/ME/ Multext EAST] lexica, annotated "1984" corpus, parallel and comparable text and speech corpora.  Languages involved: Bulgarian, Croatian, Czech, English, Estonian, Hungarian, Lithuanian, Macedonian, Persian, Polish, Resian, Romanian, Russian, Serbian, Slovak, Slovene, and Ukrainian
 [[Category:Resources by language|Slovenian]]
