Difference between revisions of "Morphology software for English"

From ACL Wiki
(Morphology: categorising into free/non-free and removing a 404)
m (Reverted edits by Ipodsoft (talk) to last revision by Erichorn)
 
(31 intermediate revisions by 12 users not shown)
Line 1: Line 1:
'''[[Software]] - Morphology and part of speech tagging'''
+
'''[[Tools and Software for English]] - Morphology and part of speech tagging'''
 +
 
 +
For languages other than English, see [[List of resources by language]].
  
 
== Morphology ==
 
== Morphology ==
Line 7: Line 9:
  
 
*[http://clipdemos.umiacs.umd.edu/catvar/ Catvar 2.0] - The Categorial Variation Database for English (OSL)
 
*[http://clipdemos.umiacs.umd.edu/catvar/ Catvar 2.0] - The Categorial Variation Database for English (OSL)
*[http://wiki.apertium.org/wiki/lttoolbox lttoolbox] -- lexical processing tools for building morphological analysers/generators with XML specification files (GPL)
+
*[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ HFST] - Helsinki Finite-State Transducer Technology - FST library, command line tools, hfst-twolc (a rule compiler for two-level rules), and several spellers and morphological analyzers (GPL)
*[http://sslmitdev-online.sslmit.unibo.it/linguistics/morph-it.php Morph-It! version 0.31] - a free morphological resource for the Italian language
+
*[http://en.wikipedia.org/wiki/Foma_%28software%29 FOMA] - finite-state toolkit (similar to Xerox XFST), created and maintained by Måns Huldén (GPL)
*[http://www.ims.uni-stuttgart.de/projekte/gramotron/SOFTWARE/SFST.html SFST] - Stuttgart Finite State Transducer Tools (GPL)
+
*[[lttoolbox]] - lexical processing tools for building morphological analysers/generators with XML specification files. Includes data for English (both analysis and disambiguation). (GPL)
 +
*[http://www.sil.org/pckimmo/about_pc-kimmo.html PC-KIMMO] - a Two-level Processor for Morphological Analysis, including KGEN, KTEXT, and Englex
 +
*[[SFST]] - Stuttgart Finite State Transducer Tools (GPL)
 +
** Where is the data for English?
 +
*[http://www.issco.unige.ch/en/research/projects/MULTEXT.html MULTEXT mmorph] - (unmaintained) two-level morphology; the package includes some data for English and German (GPL2 or later)
 +
 
 +
===Unknown license===
 +
*[http://www.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/areas/nlp/morph/map/0.html MAP] - Cambridge/Edinburgh Morphological Analyzer and Dictionary System (gratis download, no license information)
  
 
===Proprietary software===
 
===Proprietary software===
Line 16: Line 25:
  
 
*[http://www.ru.nl/celex/ CELEX database] - Dutch, English, and German word forms
 
*[http://www.ru.nl/celex/ CELEX database] - Dutch, English, and German word forms
*[http://www.univ-nancy2.fr/pers/namer/Telecharger_Flemm.html Flemmv3.1] - inflectional morphology parser for French
 
 
*[http://www.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/areas/nlp/morph/fonol/0.html FONOL] - Phonological Programming Language (non-commercial only)
 
*[http://www.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/areas/nlp/morph/fonol/0.html FONOL] - Phonological Programming Language (non-commercial only)
 
*[http://services.canoo.com/MorphologyBrowser.html German Morphology Browser]
 
*[http://services.canoo.com/MorphologyBrowser.html German Morphology Browser]
 
*[http://www.cs.technion.ac.il/~erelsgl/bxi/hmntx/teud.html Hebrew Morphological Parser]
 
*[http://www.cs.technion.ac.il/~erelsgl/bxi/hmntx/teud.html Hebrew Morphological Parser]
*[http://www.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/areas/nlp/morph/map/0.html MAP] - Cambridge/Edinburgh Morphological Analyzer and Dictionary System (freeware)
 
 
*[http://bach.arts.kuleuven.ac.be/~piet/morlex/index.html MORLEX] - A lexical database for French
 
*[http://bach.arts.kuleuven.ac.be/~piet/morlex/index.html MORLEX] - A lexical database for French
 
*[http://www.informatics.susx.ac.uk/research/nlp/carroll/morph.html morpha and morphg] - fast and robust morphological analysis and generation for English, from John A. Carroll (non-commercial only)
 
*[http://www.informatics.susx.ac.uk/research/nlp/carroll/morph.html morpha and morphg] - fast and robust morphological analysis and generation for English, from John A. Carroll (non-commercial only)
 
*[http://www.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/areas/nlp/morph/morfogen/0.html MORFOGEN] - a Morphology Grammar Builder and Dictionary Interface Tool
 
*[http://www.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/areas/nlp/morph/morfogen/0.html MORFOGEN] - a Morphology Grammar Builder and Dictionary Interface Tool
 
*[http://nlp.cs.nyu.edu/nomlex/index.html NOMLEX] - a dictionary of English nominalizations
 
*[http://nlp.cs.nyu.edu/nomlex/index.html NOMLEX] - a dictionary of English nominalizations
*[http://www.sil.org/pckimmo/about_pc-kimmo.html PC-KIMMO] - a Two-level Processor for Morphological Analysis, including KGEN, KTEXT, and Englex
 
 
*[http://www.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/areas/nlp/morph/tulip/0.html TULIP] - a two level phonological formalism
 
*[http://www.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/areas/nlp/morph/tulip/0.html TULIP] - a two level phonological formalism
 +
*[http://www.fsmbook.com Xerox/PARC] - finite-state morphological analysis/generation using xfst, lexc, twolc
  
 
==Part of speech tagging==
 
==Part of speech tagging==
<!-- Please keep this list in alphabetical order -->
 
  
 +
===Free software===
 +
<!-- Please keep this list in alphabetical order -->
 
*[http://acopost.sourceforge.net/ ACOPOST - A Collection Of PoS Taggers] Maximum Entropy Tagger, Trigram Tagger, Transformation-based Tagger, Example-based tagger
 
*[http://acopost.sourceforge.net/ ACOPOST - A Collection Of PoS Taggers] Maximum Entropy Tagger, Trigram Tagger, Transformation-based Tagger, Example-based tagger
*[http://www.cs.jhu.edu/~brill/RBT1_14.tar.Z Brill Tagger] - supervised, trainable
+
*[http://cogcomp.cs.illinois.edu/page/software_view/3 Illinois LBJ POS Tagger] - uses an averaged [http://en.wikipedia.org/wiki/Perceptron perceptron]-based sequential model. Java API; free, open source license.
*[http://www.connexor.com/software/tagger/ Connexor Machinese Phrase Tagger]
+
*[http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/tagger/ GENIA] - part-of-speech tagging, shallow parsing, and named entity recognition for biomedical text. C++, BSD license.
 
*[http://nltk.sourceforge.net/ NLTK - Natural Language Toolkit] Regexp Tagger, N-Gram Tagger, Brill Tagger, HMM Tagger, plus a freely downloadable book with a chapter on tagging
 
*[http://nltk.sourceforge.net/ NLTK - Natural Language Toolkit] Regexp Tagger, N-Gram Tagger, Brill Tagger, HMM Tagger, plus a freely downloadable book with a chapter on tagging
 +
*[http://opencog.org/wiki/RelEx RelEx] - provides English-language part-of-speech tagging, entity tagging, and other types of tags (gender, date, money ...) after performing a deep parse, so that the tags agree with the parse. Also provides the resulting stems. Apache 2.0 License.
 
*[http://nlp.ipipan.waw.pl/Spejd/ Spejd - Shallow Parsing and Disambiguation Engine] a GPL tool for simultaneous rule-based morphosyntactic disambiguation and partial parsing
 
*[http://nlp.ipipan.waw.pl/Spejd/ Spejd - Shallow Parsing and Disambiguation Engine] a GPL tool for simultaneous rule-based morphosyntactic disambiguation and partial parsing
 +
*[http://wiki.apertium.org/wiki/Tagger_training Tagger training] on the Apertium Wiki (HMM + constraint based)
 +
*[http://beta.visl.sdu.dk/cg3.html VISL Constraint Grammar] - rule-based disambiguation (GPL)
 +
** Is there a Free set of rules for English?
 +
 +
===Proprietary software===
 +
<!-- Please keep this list in alphabetical order -->
  
 
==Combined morphology and tagging==
 
==Combined morphology and tagging==
 +
 +
===Free software===
 +
<!-- Please keep this list in alphabetical order -->
 +
 +
*[http://www.cis.upenn.edu/~xtag/swrelease.html XTAG] - tools for parsing and grammar development, including morphological analysis and tagging, as described in [http://www.cs.mu.oz.au/acl/C/C94/C94-2149.pdf XTAG System - A Wide Coverage Grammar for English] and [http://arxiv.org/abs/cmp-lg/9410024 A Freely Available Wide Coverage Morphological Analyzer for English]
 +
 +
===Proprietary software===
 +
 
<!-- Please keep this list in alphabetical order -->
 
<!-- Please keep this list in alphabetical order -->
  
 
*[http://nlp.postech.ac.kr/DownLoad/k_api.html Korean morphological analyzer and part-of-speech tagger]
 
*[http://nlp.postech.ac.kr/DownLoad/k_api.html Korean morphological analyzer and part-of-speech tagger]
 
*[http://www.nlplab.cn/resource/CIP/neucsp.zip NEUCSP] - a tool for Chinese Word Segmentation and POS tagging
 
*[http://www.nlplab.cn/resource/CIP/neucsp.zip NEUCSP] - a tool for Chinese Word Segmentation and POS tagging
*[http://www.cis.upenn.edu/~xtag/swrelease.html XTAG] - tools for parsing and grammar development, including morphological analysis and tagging, as described in [http://www.cs.mu.oz.au/acl/C/C94/C94-2149.pdf XTAG System - A Wide Coverage Grammar for English] and [http://arxiv.org/abs/cmp-lg/9410024 A Freely Available Wide Coverage Morphological Analyzer for English]
+
 
 +
==See also==
 +
*[[Part-of-speech tagging]]
  
 
[[Category:Morphology]]
 
[[Category:Morphology]]
 
[[Category:Software]]
 
[[Category:Software]]
 +
[[Category:Resources for English]]

Latest revision as of 06:27, 4 November 2012

Tools and Software for English - Morphology and part of speech tagging

For languages other than English, see List of resources by language.

Morphology

Free software

  • Catvar 2.0 - The Categorial Variation Database for English (OSL)
  • HFST - Helsinki Finite-State Transducer Technology - FST library, command line tools, hfst-twolc (a rule compiler for two-level rules), and several spellers and morphological analyzers (GPL)
  • FOMA - finite-state toolkit (similar to Xerox XFST), created and maintained by Måns Huldén (GPL)
  • lttoolbox - lexical processing tools for building morphological analysers/generators with XML specification files. Includes data for English (both analysis and disambiguation). (GPL)
  • PC-KIMMO - a Two-level Processor for Morphological Analysis, including KGEN, KTEXT, and Englex
  • SFST - Stuttgart Finite State Transducer Tools (GPL)
    • Where is the data for English?
  • MULTEXT mmorph - (unmaintained) two-level morphology; the package includes some data for English and German (GPL2 or later)

Unknown license

  • MAP - Cambridge/Edinburgh Morphological Analyzer and Dictionary System (gratis download, no license information)

Proprietary software

Part of speech tagging

Free software

  • ACOPOST - A Collection Of PoS Taggers: Maximum Entropy Tagger, Trigram Tagger, Transformation-based Tagger, Example-based Tagger
  • Illinois LBJ POS Tagger - uses an averaged perceptron-based sequential model. Java API; free, open source license.
  • GENIA - part-of-speech tagging, shallow parsing, and named entity recognition for biomedical text. C++, BSD license.
  • NLTK - Natural Language Toolkit: Regexp Tagger, N-Gram Tagger, Brill Tagger, HMM Tagger, plus a freely downloadable book with a chapter on tagging
  • RelEx - provides English-language part-of-speech tagging, entity tagging, and other types of tags (gender, date, money ...) after performing a deep parse, so that the tags agree with the parse. Also provides the resulting stems. Apache 2.0 License.
  • Spejd - Shallow Parsing and Disambiguation Engine: a GPL tool for simultaneous rule-based morphosyntactic disambiguation and partial parsing
  • Tagger training on the Apertium Wiki (HMM + constraint based)
  • VISL Constraint Grammar - rule-based disambiguation (GPL)
    • Is there a Free set of rules for English?

Proprietary software

Combined morphology and tagging

Free software

Proprietary software

See also