Difference between revisions of "Resources for French"

From ACL Wiki
m (→‎Corpora: update WMT link)
 
(14 intermediate revisions by 5 users not shown)
Line 1: Line 1:
 
==Corpora==
 
==Corpora==
 
+
* [http://www.statmt.org/wmt10/training-giga-fren.tar 10^9 French-English corpus]
 +
* [http://ucts.uniba.sk/aranea_about/ Araneum Francogallicum], Gigaword French web corpus
 
* [http://atilf.atilf.fr/dmf.htm Base Textuelle de Moyen Français]
 
* [http://atilf.atilf.fr/dmf.htm Base Textuelle de Moyen Français]
 
* [http://corpora.informatik.uni-leipzig.de/ French plain text and Co-occurrences at LCC]
 
* [http://corpora.informatik.uni-leipzig.de/ French plain text and Co-occurrences at LCC]
Line 6: Line 7:
 
* [http://www.cnrtl.fr/lexiques/morphalou/ Lexique Morphalou]
 
* [http://www.cnrtl.fr/lexiques/morphalou/ Lexique Morphalou]
 
* [http://w3.univ-tlse2.fr/erss/verbaction/main.html Lexique Verbaction]
 
* [http://w3.univ-tlse2.fr/erss/verbaction/main.html Lexique Verbaction]
 +
* [http://www.coli.uni-saarland.de/~gparis/LMD-TAZ_corpus/ Le Monde Diplomatique-Die Tageszeitung Translation Corpus] - French-German, aligned (parallel)
 +
* [http://www.euromatrixplus.net/multi-un/ UN parallel corpora]
 +
* [http://www.statmt.org/wmt15/translation-task.html#download WMT corpora], including [http://en.wikipedia.org/wiki/Europarl_corpus Europarl], News Commentary, and News Crawl
 +
* [http://88milsms.huma-num.fr/ Large SMS corpus in French (88milSMS)]
  
 
== Grammars/parsers ==
 
== Grammars/parsers ==
Line 12: Line 17:
 
* [http://alpage.inria.fr/~sagot/wolf.html WOLF] – Wordnet Libre du Français (Free French WordNet), distributed under the CeCILL-C licence (LGPL-compatible)
 
* [http://alpage.inria.fr/~sagot/wolf.html WOLF] – Wordnet Libre du Français (Free French WordNet), distributed under the CeCILL-C licence (LGPL-compatible)
 
* [http://alpage.inria.fr/~sagot/lefff.html Lefff] – (Lexique des Formes Fléchies du Français) is a wide-coverage morphological and syntactic lexicon, distributed under the free LGPL-LR licence (Lesser General Public License For Linguistic Resources); see also [http://gforge.inria.fr/projects/alexina/ Alexina]
 
* [http://alpage.inria.fr/~sagot/lefff.html Lefff] – (Lexique des Formes Fléchies du Français) is a wide-coverage morphological and syntactic lexicon, distributed under the free LGPL-LR licence (Lesser General Public License For Linguistic Resources); see also [http://gforge.inria.fr/projects/alexina/ Alexina]
 +
* [http://sites.google.com/site/morfetteweb/ Morfette], a data-driven PoS tagger and lemmatizer, New BSD License
 +
* [http://wiki.apertium.org/wiki/Main_Page Apertium] has analysers/generators in the [[lttoolbox]] format for French, along with statistical disambiguation models, see e.g. the files in [https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-fr-ca fr-ca], [https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-fr-es fr-es] and [https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-br-fr br-fr]
  
 
===Unknown licence===
 
===Unknown licence===
 
* [[Generation grammars|KPML generation grammar]]
 
* [[Generation grammars|KPML generation grammar]]
 +
* [http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html TreeTagger] has some French support (free of charge for research use)
 +
* [https://gforge.inria.fr/frs/download.php/27240/melt-0.6.tar.gz MeLT], a data-driven PoS tagger
  
 
==Morphology, dictionaries==
 
==Morphology, dictionaries==
 
===Free software===
 
===Free software===
 
* [http://www.dicollecte.org/ Dicollecte] French lexicon, list of inflected forms, MPL/GPL/LGPL
 
* [http://www.dicollecte.org/ Dicollecte] French lexicon, list of inflected forms, MPL/GPL/LGPL
* [http://www.univ-nancy2.fr/pers/namer/Telecharger_Flemm.html Flemmv3.1] - inflectional morphology parser for French --free, GPL license.
+
* [http://www.univ-nancy2.fr/pers/namer/Telecharger_Flemm.html Flemmv3.1] - inflectional morphology parser for French -- Perl, GPL license.
  
 
[[Category:Resources by language|French]]
 
[[Category:Resources by language|French]]

Latest revision as of 07:57, 17 June 2015

Corpora

Grammars/parsers

Free software

  • HPSG FroG (under the LGPL-LR according to this presentation)
  • WOLF – Wordnet Libre du Français (Free French WordNet), distributed under the CeCILL-C licence (LGPL-compatible)
  • Lefff – (Lexique des Formes Fléchies du Français) is a wide-coverage morphological and syntactic lexicon, distributed under the free LGPL-LR licence (Lesser General Public License For Linguistic Resources); see also Alexina
  • Morfette, a data-driven PoS tagger and lemmatizer, New BSD License
  • Apertium has analysers/generators in the lttoolbox format for French, along with statistical disambiguation models, see e.g. the files in fr-ca, fr-es and br-fr

Unknown licence

Morphology, dictionaries

Free software

  • Dicollecte French lexicon, list of inflected forms, MPL/GPL/LGPL
  • Flemmv3.1 - inflectional morphology parser for French -- Perl, GPL license.