Resources for Slovenian
Corpora
Free license
- Europarl corpus, sentence aligned with English
- IJS - ELAN Slovene-English Parallel Corpus
- JRC Acquis parallel texts. Languages involved: Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene and Swedish.
Non-free license
- "Communication in Slovene" corpora: written, spoken, web, learner, and tagged corpora of up to 1.2 billion words
- HamleDT, harmonized dependency treebanks of many languages in a common annotation style.
- Multext EAST lexica, annotated "1984" corpus, parallel and comparable text and speech corpora. Languages involved: Bulgarian, Croatian, Czech, English, Estonian, Hungarian, Lithuanian, Macedonian, Persian, Polish, Resian, Romanian, Russian, Serbian, Slovak, Slovene, and Ukrainian