Difference between revisions of "Morphology software for English"

From ACL Wiki
(moved stuff that's for italian and french into their specific pages, corrected license info)
m (Reverted edits by Ipodsoft (talk) to last revision by Erichorn)
 
(11 intermediate revisions by 7 users not shown)
Line 1: Line 1:
'''[[Software]] - Morphology and part of speech tagging'''
+
'''[[Tools and Software for English]] - Morphology and part of speech tagging'''
  
 
For languages other than English, see [[List of resources by language]].
 
Line 9: Line 9:
  
 
*[http://clipdemos.umiacs.umd.edu/catvar/ Catvar 2.0] - The Categorial Variation Database for English (OSL)
 
*[http://wiki.apertium.org/wiki/lttoolbox lttoolbox] -- lexical processing tools for building morphological analysers/generators with XML specification files. Includes data for English (both analysis and disambiguation). (GPL)
+
*[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ HFST] - Helsinki Finite-State Transducer Technology - FST library, command line tools, hfst-twolc (a rule compiler for two-level rules), and several spellers and morphological analyzers (GPL)
*[http://www.ims.uni-stuttgart.de/projekte/gramotron/SOFTWARE/SFST.html SFST] - Stuttgart Finite State Transducer Tools (GPL)
+
*[http://en.wikipedia.org/wiki/Foma_%28software%29 FOMA] - finite-state toolkit (similar to Xerox XFST), created and maintained by Måns Huldén (GPL)
 +
*[[lttoolbox]] -- lexical processing tools for building morphological analysers/generators with XML specification files. Includes data for English (both analysis and disambiguation). (GPL)
 +
*[http://www.sil.org/pckimmo/about_pc-kimmo.html PC-KIMMO] - a Two-level Processor for Morphological Analysis, including KGEN, KTEXT, and Englex
 +
*[[SFST]] - Stuttgart Finite State Transducer Tools (GPL)
 
** Where is the data for English?
 
 
*[http://www.issco.unige.ch/en/research/projects/MULTEXT.html MULTEXT mmorph] - (unmaintained) two-level morphology, package includes some data for English and German, (GPL2 or later)
 
Line 29: Line 32:
 
*[http://www.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/areas/nlp/morph/morfogen/0.html MORFOGEN] - a Morphology Grammar Builder and Dictionary Interface Tool
 
 
*[http://nlp.cs.nyu.edu/nomlex/index.html NOMLEX] - a dictionary of English nominalizations
 
*[http://www.sil.org/pckimmo/about_pc-kimmo.html PC-KIMMO] - a Two-level Processor for Morphological Analysis, including KGEN, KTEXT, and Englex
 
 
*[http://www.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/areas/nlp/morph/tulip/0.html TULIP] - a two level phonological formalism
 
 
*[http://www.fsmbook.com  Xerox/PARC] - finite-state morphological analysis/generation using xfst, lexc, twolc
 
Line 38: Line 40:
 
<!-- Please keep this list in alphabetical order -->
 
 
*[http://acopost.sourceforge.net/ ACOPOST - A Collection Of PoS Taggers] Maximum Entropy Tagger, Trigram Tagger, Transformation-based Tagger, Example-based tagger
 
*[http://l2r.cs.uiuc.edu/~cogcomp/asoftware.php?skey=FLBJPOS LBJ POS Tagger] - Uses averaged perceptron based sequential model. Java API, Free, open source license.
+
*[http://cogcomp.cs.illinois.edu/page/software_view/3 Illinois LBJ POS Tagger] - Uses averaged [http://en.wikipedia.org/wiki/Perceptron Perceptron] based sequential model. Java API, Free, open source license.
 
*[http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/tagger/ GENiA]- part-of-speech tagging, shallow parsing, and named entity recognition for biomedical text. C++, BSD license.
 
 
*[http://nltk.sourceforge.net/ NLTK - Natural Language Toolkit] Regexp Tagger, N-Gram Tagger, Brill Tagger, HMM Tagger, plus a freely downloadable book with a chapter on tagging
 
Line 45: Line 47:
 
*[http://wiki.apertium.org/wiki/Tagger_training Tagger training] on the Apertium Wiki (HMM + constraint based)
 
 
* [http://beta.visl.sdu.dk/cg3.html VISL Constraint Grammar] rule based disambiguation (GPL)
 
 +
** Is there a Free set of rules for English?
  
 
===Proprietary software===
 

Latest revision as of 07:27, 4 November 2012

Tools and Software for English - Morphology and part of speech tagging

For languages other than English, see List of resources by language.

Morphology

Free software

  • Catvar 2.0 - The Categorial Variation Database for English (OSL)
  • HFST - Helsinki Finite-State Transducer Technology - FST library, command line tools, hfst-twolc (a rule compiler for two-level rules), and several spellers and morphological analyzers (GPL)
  • FOMA - finite-state toolkit (similar to Xerox XFST), created and maintained by Måns Huldén (GPL)
  • lttoolbox - lexical processing tools for building morphological analysers/generators with XML specification files. Includes data for English (both analysis and disambiguation). (GPL)
  • PC-KIMMO - a Two-level Processor for Morphological Analysis, including KGEN, KTEXT, and Englex
  • SFST - Stuttgart Finite State Transducer Tools (GPL)
    • Where is the data for English?
  • MULTEXT mmorph - (unmaintained) two-level morphology; the package includes some data for English and German (GPL2 or later)

Unknown license

  • MAP - Cambridge/Edinburgh Morphological Analyzer and Dictionary System (gratis download, no license information)

Proprietary software

Part of speech tagging

Free software

  • ACOPOST - A Collection Of PoS Taggers: Maximum Entropy Tagger, Trigram Tagger, Transformation-based Tagger, Example-based Tagger
  • Illinois LBJ POS Tagger - Uses an averaged perceptron-based sequential model. Java API; free, open-source license.
  • GENIA - part-of-speech tagging, shallow parsing, and named entity recognition for biomedical text. C++, BSD license.
  • NLTK - Natural Language Toolkit: Regexp Tagger, N-Gram Tagger, Brill Tagger, HMM Tagger, plus a freely downloadable book with a chapter on tagging
  • RelEx - provides English-language part-of-speech tagging, entity tagging, and other types of tags (gender, date, money ...) after performing a deep parse, so that the tags agree with the parse. Also provides the resulting stems. Apache 2.0 License.
  • Spejd - Shallow Parsing and Disambiguation Engine: a GPL tool for simultaneous rule-based morphosyntactic disambiguation and partial parsing
  • Tagger training on the Apertium Wiki (HMM + constraint based)
  • VISL Constraint Grammar - rule-based disambiguation (GPL)
    • Is there a free set of rules for English?

Proprietary software

Combined morphology and tagging

Free software

Proprietary software

See also