Resources for Arabic
Revision as of 15:42, 10 December 2013
Morphology
Free software
- AraMorph - Perl - An Arabic morphological analyzer and part-of-speech tagger written in Perl (originally by Tim Buckwalter)
- AraMorph - Java - An Arabic morphological analyzer and part-of-speech tagger rewritten in Java for Lucene
Proprietary
Parsers
Free software
- Dan Bikel's implementation of the Collins parser.
- Arabic dictionaries, by Jon Dehdari, for the Link-Grammar parser. These require the AraMorph stemming package listed above.
- ElixirFM (an online interface is also available) is an implementation of Functional Arabic Morphology written in Haskell and Perl; its lexicon is a "re-processed" version of the one in the Buckwalter analyser.
- Sarf - Arabic Morphology System (all in Java)
Corpora
Proprietary
- Arabic Newswire Part 1, 76 million tokens, annotation: paragraphs
Free/open licence
- Meedan-Memory, Arabic-English TMX (sentence-aligned), ~467,000 words on the English side, Open Database Licence
- Quranic Arabic Corpus, 77,430 words of Quranic Arabic, with manually verified contextual POS, inflection, derivation; dependency grammar annotation is planned.
- Arabic NER corpora by Yassine Benajiba, 150,000+ words.
- UN parallel corpora
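
For sentence-aligned TMX resources such as Meedan-Memory, the aligned segments can be read with a few lines of standard-library Python. This is only a minimal sketch: it assumes the file follows the usual TMX layout (`tu` units containing one `tuv` per language, each with a `seg`), that the language codes are `en` and `ar`, and the function name `read_tmx_pairs` and the inline sample are hypothetical, not part of any of the packages listed above.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Tiny hand-made TMX document used only for illustration (not real corpus data).
SAMPLE = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="0"
          o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Peace</seg></tuv>
      <tuv xml:lang="ar"><seg>سلام</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx_pairs(source, src_lang="en", tgt_lang="ar"):
    """Yield (source, target) segment pairs from a TMX file or file-like object."""
    tree = ET.parse(source)
    for tu in tree.getroot().iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain "lang" attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Keep only the primary language subtag, e.g. "ar-EG" -> "ar".
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        # Skip translation units that lack either language.
        if src_lang in segs and tgt_lang in segs:
            yield segs[src_lang], segs[tgt_lang]


pairs = list(read_tmx_pairs(io.BytesIO(SAMPLE.encode("utf-8"))))
```

To read an actual corpus, a file path can be passed in place of the in-memory sample; the language codes may need adjusting to whatever the file really uses.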