Resources for Finnish
Corpora
Free
- Europarl corpus, sentence aligned with English
- WMT News Crawl monolingual corpus. Currently 14M tokens.
- Finnish plain text and Co-occurrences at LCC
- HamleDT, harmonized dependency treebanks for many languages in a common annotation style.
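A sentence-aligned corpus such as Europarl is typically distributed as two plain-text files in which line N of the Finnish file corresponds to line N of the English file. The following is a minimal sketch of reading such a pair, assuming UTF-8 files with one sentence per line. The function name `read_aligned` and the file names in the usage note are illustrative assumptions, not part of any corpus tooling.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_aligned(fi_path: str, en_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (Finnish, English) sentence pairs from two line-aligned files.

    Assumes one sentence per line and that line N in both files is a
    translation pair (hypothetical helper, not an official API).
    """
    with open(fi_path, encoding="utf-8") as fi, open(en_path, encoding="utf-8") as en:
        for n, (fi_line, en_line) in enumerate(zip_longest(fi, en), start=1):
            # zip_longest pads the shorter file with None, which signals
            # that the two files are not aligned line by line.
            if fi_line is None or en_line is None:
                raise ValueError(f"files differ in length at line {n}")
            fi_line, en_line = fi_line.strip(), en_line.strip()
            # Skip pairs where either side is empty.
            if fi_line and en_line:
                yield fi_line, en_line
```

For example, `for fi, en in read_aligned("corpus.fi", "corpus.en"): ...` iterates over the pairs lazily, so the whole corpus never has to fit in memory. The file names here are placeholders for whatever the downloaded archive contains.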
Non-Free
- Araneum Finnicum, Gigaword Finnish web corpus
- Kielipankki (Language Bank) at the CSC Scientific Computing Centre, including some 200 million word tokens of Finnish text.
NLP Tools
Free software
- UralicNLP is a Python library that provides morphological tagging, generation, lemmatization, and disambiguation for many Uralic languages, including Finnish.
- Omorfi is an open morphology for Finnish, developed in association with the Voikko speller project (LGPL/GPL). See also https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OmorfiHFSTVersion for installing with HFST.
- FinMeter can be used to analyze Finnish poetry. Its features include rhyme, meter, metaphor interpretation, and sentiment analysis.
- Murre can normalize spoken or dialectal Finnish text into the standard written norm. It can also generate dialectal forms from standard Finnish.
- SyntaxMaker is a surface realization tool for Finnish natural language generation (NLG).
- Turku Neural Parser is a state-of-the-art syntactic parser for Finnish.
Models
- SemFi. Co-occurrences of Finnish words given their syntactic relations.
- Finnish relatedness model. Word vectors to capture word relatedness.
- NLPL word embeddings. Different models for Finnish (BERT, ELMo, word2vec).