Resources for Finnish
Corpora
Free
- Europarl corpus, sentence-aligned with English.
- WMT News Crawl monolingual corpus. Currently 14M tokens.
- Finnish plain text and co-occurrences at LCC (Leipzig Corpora Collection).
- HamleDT, harmonized dependency treebanks of many languages in a common annotation style.
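Sentence-aligned corpora such as Europarl are typically distributed as two plain-text files in which line N of one file translates line N of the other. The following is a minimal sketch of reading such a pair, assuming UTF-8 files with one sentence per line; the function name and the file paths passed to it are hypothetical and not part of any corpus distribution.

```python
from itertools import zip_longest
from pathlib import Path
from typing import Iterator, Tuple, Union

PathLike = Union[str, Path]


def read_aligned_pairs(fi_path: PathLike, en_path: PathLike) -> Iterator[Tuple[str, str]]:
    """Yield (Finnish, English) sentence pairs from two line-aligned files.

    Assumes both files are UTF-8 and that line N in one file
    corresponds to line N in the other.
    """
    with open(fi_path, encoding="utf-8") as fi_file, \
         open(en_path, encoding="utf-8") as en_file:
        for fi_line, en_line in zip_longest(fi_file, en_file):
            # zip_longest pads the shorter file with None, which
            # signals that the two files are not aligned line by line.
            if fi_line is None or en_line is None:
                raise ValueError("files have different numbers of lines")
            fi_sent, en_sent = fi_line.strip(), en_line.strip()
            # Skip pairs where either side is empty.
            if fi_sent and en_sent:
                yield fi_sent, en_sent
```

Raising an error on a length mismatch, rather than silently truncating as plain `zip` would, makes a broken alignment visible early.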
Non-Free
- Araneum Finnicum, Gigaword Finnish web corpus
- Kielipankki, the Language Bank at the CSC Scientific Computing Centre, which includes some 200 million word tokens of Finnish text.
NLP Tools
Free software
- UralicNLP is a Python library that provides morphological tagging, generation, lemmatization and disambiguation for many Uralic languages, including Finnish.
- Omorfi is an open morphology for Finnish, developed in association with the Voikko speller project; see also https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OmorfiHFSTVersion for installing with HFST. (LGPL/GPL)
- FinMeter can be used to analyze Finnish poetry. Its functionality includes rhyme and meter analysis, metaphor interpretation and sentiment analysis.
- Murre can normalize spoken or dialectal Finnish text into the standard written norm. It can also generate dialectal forms from standard Finnish.
- SyntaxMaker is a surface realization tool for Finnish NLG (natural language generation).
- Turku Neural Parser is a state-of-the-art syntactic parser for Finnish.
Models
- NLPL word embeddings: several models for Finnish (BERT, ELMo, word2vec).
- Finnish relatedness model: word vectors that capture word relatedness.