Difference between revisions of "Resources for Arabic"
Revision as of 02:21, 20 April 2015
Morphology
Free software
- AraMorph - Perl - An Arabic morphological analyzer and part-of-speech tagger written in Perl (originally by Tim Buckwalter)
- AraMorph - Java - An Arabic morphological analyzer and part-of-speech tagger rewritten in Java for Lucene
Proprietary
- Xerox Arabic Morphological Analyzer and Generator, http://www.arabic-morphology.com
WordNets
Free software
- http://compling.hss.ntu.edu.sg/omw/ Open Multilingual Wordnet, a collection of linked wordnets that includes an Arabic wordnet
Proprietary
- http://babelnet.org/ BabelNet (available for download for non-commercial use)
Parsers
Free software
- Dan Bikel's implementation of the Collins parser.
- Arabic dictionaries for the Link-Grammar parser, by Jon Dehdari. These require the AraMorph stemming package listed under Morphology.
- ElixirFM (an online interface is available) is an implementation of Functional Arabic Morphology written in Haskell and Perl; its lexicon is a re-processed version of the Buckwalter analyser's lexicon.
- Sarf - Arabic Morphology System (all in Java)
Corpora
Proprietary
- Arabic Newswire Part 1, 76 million tokens, annotation: paragraphs
Free/open licence
- Meedan-Memory, Arabic-English TMX (sentence-aligned), ~467,000 words on the English side, Open Database Licence
- Quranic Arabic Corpus, 77,430 words of Quranic Arabic, with manually verified contextual POS, inflection, derivation; dependency grammar annotation is planned.
- Arabic NER corpora by Yassine Benajiba, 150,000+ words.
- UN parallel corpora
- HamleDT, harmonized dependency treebanks of many languages, common annotation style.