Difference between revisions of "Resources for Finnish"

From ACL Wiki
 
(10 intermediate revisions by 6 users not shown)
Line 1: Line 1:
 
==Corpora==
 
==Corpora==
 +
===Free===
 +
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English
 +
* [http://www.statmt.org/wmt15/translation-task.html WMT News Crawl] monolingual corpus.  Currently 14M tokens.
 +
* [http://corpora.informatik.uni-leipzig.de/ Finnish plain text and Co-occurrences at LCC]
 +
* [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks for many languages in a common annotation style.
 +
 +
===Non-Free===
 +
* [http://ucts.uniba.sk/aranea_about/ Araneum Finnicum], Gigaword Finnish web corpus
 +
* [http://www.kielipankki.fi CSC Kielipankki] Language Bank at the [http://www.csc.fi/ CSC] Scientific Computing Centre, including some 200 million word tokens of Finnish texts.
 +
 +
==NLP Tools==
 +
===Free software===
 +
* [https://github.com/mikahama/uralicNLP UralicNLP] is a Python library that provides morphological tagging, generation, lemmatization and disambiguation for many Uralic languages, including Finnish.
 +
* [https://gna.org/projects/omorfi/ Omorfi] is an open morphology for Finnish, developed in association with the [[voikko]] speller project. See also https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OmorfiHFSTVersion for installing with [[HFST]]. (LGPL/GPL)
 +
* [https://github.com/mikahama/finmeter FinMeter] can be used to analyze Finnish poetry. The functionalities include rhyme, meter, metaphor interpretation and sentiment analysis.
 +
* [https://github.com/mikahama/murre Murre] can normalize spoken or dialectal Finnish text into the standard written norm. It can also generate dialectal forms from standard Finnish.
 +
* [https://github.com/mikahama/syntaxmaker SyntaxMaker]. A surface realization tool for Finnish NLG (natural language generation)
 +
* [http://turkunlp.org/Turku-neural-parser-pipeline/ Turku Neural Parser] is a state-of-the-art syntactic parser for Finnish.
 +
 +
 +
==Models==
 +
 +
* [https://zenodo.org/record/1463685 SemFi]. Co-occurrences of Finnish words given their syntactic relations.
 +
* [http://doi.org/10.23728/b2share.5f1a5add29094d85800e5d4d2b852cdc Finnish relatedness model]. Word vectors to capture word relatedness.
 +
* [http://vectors.nlpl.eu/repository/# NLPL word embeddings]. Different models for Finnish (BERT, ELMo, word2vec)
 +
  
* [http://corpora.informatik.uni-leipzig.de/ Finnish plain text and Co-occurrences at LCC]
+
[[Category:Resources by language|Finnish]]
* [http://www.csc.fi/english/research/sciences/linguistics/index_html CSC Kielipankki] Language Bank at the [http://www.csc.fi/ CSC] Scientific Computing Centre, including some 200 million word tokens of Finnish texts.
 

Latest revision as of 03:38, 29 June 2020

Corpora

Free

  • Europarl corpus, sentence aligned with English
  • WMT News Crawl monolingual corpus. Currently 14M tokens.
  • Finnish plain text and Co-occurrences at LCC
  • HamleDT, harmonized dependency treebanks for many languages in a common annotation style.

Non-Free

  • Araneum Finnicum, Gigaword Finnish web corpus
  • CSC Kielipankki Language Bank at the CSC Scientific Computing Centre, including some 200 million word tokens of Finnish texts.

NLP Tools

Free software

  • UralicNLP is a Python library that provides morphological tagging, generation, lemmatization and disambiguation for many Uralic languages, including Finnish.
  • Omorfi is an open morphology for Finnish, developed in association with the voikko speller project. See also https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OmorfiHFSTVersion for installing with HFST. (LGPL/GPL)
  • FinMeter can be used to analyze Finnish poetry. The functionalities include rhyme, meter, metaphor interpretation and sentiment analysis.
  • Murre can normalize spoken or dialectal Finnish text into the standard written norm. It can also generate dialectal forms from standard Finnish.
  • SyntaxMaker. A surface realization tool for Finnish NLG (natural language generation)
  • Turku Neural Parser is a state-of-the-art syntactic parser for Finnish.


Models