Language Identification Tools

From ACLWiki
*** http://opus.lingfil.uu.se/tools/public/language_guesser/textcat – Perl version with more language models and encoding fixes
** http://olivo.net/software/lc4j/ – a Java reimplementation
** http://thomas.mangin.com//content/texcat-in-python.html – a Python implementation by Thomas Mangin
** http://www.mnogosearch.org/guesser/ – another C reimplementation

Revision as of 05:25, 6 December 2012

A listing of language identification tools. Language identification can mean identifying both text type (e.g. news vs literature) and language (e.g. English vs Frisian vs Dutch).

Most of these tools require training on a large corpus (see List of resources by language for corpora per language), but many come with some prebuilt language models.
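
To make "training on a corpus" concrete, here is a minimal sketch of one common approach: ranked character n-gram profiles compared by an out-of-place distance. This is an illustration only, not the code or API of any tool listed on this page. The function names (`ngram_profile`, `out_of_place`, `train`, `identify`) and the parameters `n_max` and `top_k` are hypothetical, and the two one-sentence "corpora" are toy stand-ins for the large corpora a real model would need.

```python
from collections import Counter


def ngram_profile(text, n_max=3, top_k=300):
    """Return the top_k most frequent character n-grams (n = 1..n_max),
    most frequent first. Names and defaults are illustrative assumptions."""
    text = " ".join(text.lower().split())  # lowercase, collapse whitespace
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences between two profiles; an n-gram missing
    from the language profile costs the maximum penalty."""
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(r - rank[gram]) if gram in rank else max_penalty
        for r, gram in enumerate(doc_profile)
    )


def train(corpora, **kw):
    """Build one profile ("language model") per language from raw text."""
    return {lang: ngram_profile(text, **kw) for lang, text in corpora.items()}


def identify(text, models, **kw):
    """Return the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text, **kw)
    return min(models, key=lambda lang: out_of_place(doc, models[lang]))


if __name__ == "__main__":
    # Toy training data; real models need far more text per language.
    models = train({
        "en": "the quick brown fox jumps over the lazy dog and the cat",
        "nl": "de snelle bruine vos springt over de luie hond en de kat",
    })
    print(identify("the dog and the fox", models))
```

With training texts this small the result on unseen input is unreliable, which is why prebuilt models trained on large corpora are useful in practice.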

Free Software



  • Compact Language Detector for JavaScript https://github.com/jaukia/cld-js (3-clause license)
    • Doesn't seem to include a method for adding new languages; the existing ones were presumably generated by Google


Proprietary

See also
