Resources for French
Latest revision as of 07:57, 17 June 2015
Corpora
- 10^9 French-English corpus
- Araneum Francogallicum, Gigaword French web corpus
- Base Textuelle de Moyen Français
- French plain text and Co-occurrences at LCC
- French Stopword List
- Lexique Morphalou
- Lexique Verbaction
- Le Monde Diplomatique-Die Tageszeitung Translation Corpus - French-German, aligned (parallel)
- UN parallel corpora
- WMT corpora, including Europarl, News Commentary, and News Crawl
- Large SMS corpus in French (88milSMS)
Grammars/parsers
Free software
- HPSG FroG (under the LGPL-LR, according to a linked presentation)
- WOLF – Wordnet Libre du Français (Free French WordNet), distributed under the CeCILL-C licence (LGPL-compatible)
- Lefff – Lexique des Formes Fléchies du Français (Lexicon of French Inflected Forms), a wide-coverage morphological and syntactic lexicon distributed under the free LGPL-LR licence (Lesser General Public License For Linguistic Resources); see also Alexina
- Morfette, a data-driven PoS tagger and lemmatizer, New BSD License
- Apertium has analysers/generators for French in the lttoolbox format, along with statistical disambiguation models; see e.g. the files in fr-ca, fr-es and br-fr
Unknown licence
- KPML generation grammar
- TreeTagger has some French support (gratis for research)
- MeLT, a data-driven PoS tagger
Morphology, dictionaries
Free software
- Dicollecte French lexicon, list of inflected forms (Lexique français, liste des formes fléchies), MPL/GPL/LGPL
- Flemm v3.1 - inflectional morphology parser for French; Perl, GPL license