Difference between revisions of "Resources for Arabic"

From ACL Wiki
Jump to navigation Jump to search
m (Reverted edits by Creek (talk) to last revision by Makrai)
(4 intermediate revisions by 3 users not shown)
Line 17: Line 17:
 
==Corpora==
 
==Corpora==
 
===Proprietary===
 
===Proprietary===
*[http://www.ldc.upenn.edu/Catalog/LDC2001T55.html Arabic Newswire Part 1]
+
*[http://www.ldc.upenn.edu/Catalog/LDC2001T55.html Arabic Newswire Part 1], 76 million tokens, annotation: paragraphs
  
 
===Free/open licence===
 
===Free/open licence===
 
* [http://github.com/anastaw/Meedan-Memory Meedan-Memory], Arabic-English TMX (sentence-aligned), ~467,000 words on the English side, [http://www.opendatacommons.org/licenses/odbl/ Open Database Licence]
 
* [http://github.com/anastaw/Meedan-Memory Meedan-Memory], Arabic-English TMX (sentence-aligned), ~467,000 words on the English side, [http://www.opendatacommons.org/licenses/odbl/ Open Database Licence]
 
* [http://quran.uk.net/ Quranic Arabic Corpus], 77,430 words of Quranic Arabic, with manually verified contextual POS, inflection, derivation; [[dependency grammar]] annotation is planned.
 
* [http://quran.uk.net/ Quranic Arabic Corpus], 77,430 words of Quranic Arabic, with manually verified contextual POS, inflection, derivation; [[dependency grammar]] annotation is planned.
 +
* [http://www1.ccls.columbia.edu/~ybenajiba/downloads.html Arabic NER corpora] by [http://www1.ccls.columbia.edu/~ybenajiba/ Yassine Benajiba], 150,000+ words.
  
 
==Bibliography==
 
==Bibliography==

Revision as of 05:17, 25 June 2012

Morphology

Free software

  • AraMorph - Perl - An Arabic morphological analyzer and part-of-speech tagger written in Perl (originally by Tim Buckwalter)
  • AraMorph - Java - An Arabic morphological analyzer and part-of-speech tagger rewritten in Java for Lucene

Proprietary

Parsers

Free software

Corpora

Proprietary

Free/open licence

Bibliography

External links