Difference between revisions of "Resources for Arabic"

From ACL Wiki
==Morphology==

===Free software===

*[https://sourceforge.net/projects/aramorph/ AraMorph - Perl] - An Arabic morphological analyzer and part-of-speech tagger written in Perl (originally by Tim Buckwalter)
*[http://www.nongnu.org/aramorph/ AraMorph - Java] - An Arabic morphological analyzer and part-of-speech tagger rewritten in Java for [http://lucene.apache.org/ Lucene]
*[http://sourceforge.net/projects/aracomlex/ AraComLex] - An open-source finite-state morphology for Modern Standard Arabic. Its source files can be compiled with the open-source compiler foma or with Xerox xfst.
  
 
===Proprietary===

*[http://www.arabic-morphology.com Xerox Arabic Morphological Analyzer and Generator]

==WordNets==

===Free software===

* [http://compling.hss.ntu.edu.sg/omw/ Arabic WordNet], with links to all the other Open Multilingual Wordnets

===Proprietary===

* [http://babelnet.org/ BabelNet] (available for download for non-commercial use)
  
 
==Parsers==

===Free software===
 
==Corpora==

===Proprietary===
*[http://www.ldc.upenn.edu/Catalog/LDC2001T55.html Arabic Newswire Part 1], 76 million tokens, annotation: paragraphs
  
 
===Free/open licence===

* [http://github.com/anastaw/Meedan-Memory Meedan-Memory], Arabic-English TMX (sentence-aligned), ~467,000 words on the English side, [http://www.opendatacommons.org/licenses/odbl/ Open Database Licence]
* [http://quran.uk.net/ Quranic Arabic Corpus], 77,430 words of Quranic Arabic, with manually verified contextual part-of-speech, inflectional and derivational annotation; [[dependency grammar]] annotation is planned.
* [http://www1.ccls.columbia.edu/~ybenajiba/downloads.html Arabic NER corpora] by [http://www1.ccls.columbia.edu/~ybenajiba/ Yassine Benajiba], 150,000+ words.
* [http://www.euromatrixplus.net/multi-un/ UN parallel corpora]
* [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks for many languages, converted to a common annotation style.
  
 
==Bibliography==

Revision as of 13:11, 9 January 2016

Morphology

Free software

  • AraMorph - Perl - An Arabic morphological analyzer and part-of-speech tagger written in Perl (originally by Tim Buckwalter)
  • AraMorph - Java - An Arabic morphological analyzer and part-of-speech tagger rewritten in Java for Lucene
  • AraComLex - An open source finite state morphology for Modern Standard Arabic. The source files can be compiled by the open source compiler, foma, or Xerox xfst.

Proprietary

WordNets

Free software

Proprietary

Parsers

Free software

Corpora

Proprietary

Free/open licence

Bibliography

External links