Difference between revisions of "Resources for Finnish"

From ACL Wiki
(Added: Araneum)
(distinguish free vs. non-free corpora; +corpus link; etc.)
Line 1:
  ==Corpora==
− * [http://ucts.uniba.sk/aranea_about/ Araneum Finnicum], Gigaword Finnish web corpus
+ ===Free===
  * [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English
+ * [http://www.statmt.org/wmt15/translation-task.html WMT News Crawl] monolingual corpus.  Currently 14M tokens.
  * [http://corpora.informatik.uni-leipzig.de/ Finnish plain text and Co-occurrences at LCC]
− * [http://www.csc.fi/english/research/sciences/linguistics/index_html CSC Kielipankki] Language Bank at the [http://www.csc.fi/ CSC] Scientific Computing Centre, including some 200 million word tokens of Finnish texts.
  
  * [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages, common annotation style.
+ 
+ ===Non-Free===
+ * [http://ucts.uniba.sk/aranea_about/ Araneum Finnicum], Gigaword Finnish web corpus
+ * [http://www.kielipankki.fi CSC Kielipankki] Language Bank at the [http://www.csc.fi/ CSC] Scientific Computing Centre, including some 200 million word tokens of Finnish texts.
  
  ==Morphological analysers==

Revision as of 08:32, 17 June 2015

Corpora

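Free

  • Europarl corpus, sentence aligned with English
  • WMT News Crawl monolingual corpus. Currently 14M tokens.
  • Finnish plain text and Co-occurrences at LCC
  • HamleDT, harmonized dependency treebanks of many languages, common annotation style.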

Non-Free

  • Araneum Finnicum, Gigaword Finnish web corpus
  • CSC Kielipankki Language Bank at the CSC Scientific Computing Centre, including some 200 million word tokens of Finnish texts.

Morphological analysers

Free software