Resources for Finnish
Latest revision as of 03:38, 29 June 2020
Corpora
Free
- Europarl corpus, sentence aligned with English
- WMT News Crawl monolingual corpus. Currently 14M tokens.
- Finnish plain text and Co-occurrences at LCC
- HamleDT, harmonized dependency treebanks of many languages, common annotation style.
Non-Free
- Araneum Finnicum, Gigaword Finnish web corpus
- CSC Kielipankki Language Bank at the CSC Scientific Computing Centre, including some 200 million word tokens of Finnish texts.
NLP Tools
Free software
- UralicNLP is a Python library that provides morphological tagging, generation, lemmatization and disambiguation for many Uralic languages, including Finnish.
- Omorfi is an open morphology for Finnish, developed in association with the voikko speller project. See also https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OmorfiHFSTVersion for installing with HFST. (LGPL/GPL)
- FinMeter can be used to analyze Finnish poetry. The functionalities include rhyme, meter, metaphor interpretation and sentiment analysis.
- Murre can normalize spoken or dialectal Finnish text into the standard written norm. It can also generate dialectal forms from standard Finnish.
- SyntaxMaker is a surface realization tool for Finnish NLG (natural language generation).
- Turku Neural Parser is a state-of-the-art syntactic parser for Finnish.
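Syntactic parsers and dependency treebanks of the kind listed above commonly exchange data in the CoNLL-U format (ten tab-separated columns per token). The following is a minimal sketch of reading such output with the Python standard library only. The sample sentence and its annotation are hand-written for illustration and are not the output of any tool listed here; `Token` and `parse_conllu` are hypothetical helper names.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    idx: int      # 1-based token index within the sentence
    form: str     # surface form
    lemma: str    # base form
    upos: str     # universal part-of-speech tag
    head: int     # index of the syntactic head (0 = root)
    deprel: str   # dependency relation to the head


def parse_conllu(text: str) -> List[List[Token]]:
    """Split CoNLL-U text into sentences, each a list of Token tuples."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        current.append(
            Token(int(cols[0]), cols[1], cols[2], cols[3], int(cols[6]), cols[7])
        )
    if current:
        sentences.append(current)
    return sentences


# Hand-written sample: "Minä näen koiran." ("I see a dog.")
SAMPLE = "\n".join([
    "# text = Minä näen koiran.",
    "\t".join(["1", "Minä", "minä", "PRON", "_", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "näen", "nähdä", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["3", "koiran", "koira", "NOUN", "_", "_", "2", "obj", "_", "_"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
    "",
])

if __name__ == "__main__":
    for token in parse_conllu(SAMPLE)[0]:
        print(token.form, token.lemma, token.upos, token.head, token.deprel)
```

Keeping only six of the ten columns is a simplification for readability; morphological features and other fields would need extra handling.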
Models
- SemFi. Co-occurrences of Finnish words given their syntactic relations.
- Finnish relatedness model. Word vectors to capture word relatedness.
- NLPL word embeddings. Different models for Finnish (BERT, ELMo, word2vec).
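Word-vector models such as those above are typically queried by comparing vectors with cosine similarity. The sketch below shows the idea with the Python standard library only. The three-dimensional vectors are invented toy values, not taken from any of the listed models, and `cosine` and `most_related` are hypothetical helper names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors for illustration only; real models have hundreds of dimensions.
TOY_VECTORS: Dict[str, List[float]] = {
    "koira": [0.9, 0.1, 0.0],  # "dog"
    "kissa": [0.8, 0.2, 0.1],  # "cat"
    "auto": [0.0, 0.1, 0.9],   # "car"
}


def most_related(
    word: str, vectors: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Rank all other words by cosine similarity to `word`, highest first."""
    ranked = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for other, score in most_related("koira", TOY_VECTORS):
        print(f"{other}\t{score:.3f}")
```

With a real model, the toy dictionary would be replaced by vectors loaded from the downloaded file; the loading step depends on the model's distribution format and is left out here.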