Difference between revisions of "Resources for Arabic"

From ACL Wiki
 
Line 28:
 ===Proprietary===
 
 *[http://www.ldc.upenn.edu/Catalog/LDC2001T55.html Arabic Newswire Part 1], 76 million tokens, annotation: paragraphs
+
+==Diacritization==
+
+===Free software===
+
+*[https://github.com/mikahama/haracat hAraCat] a free tool for predicting vowels and other diacritics.
 
 ===Free/open licence===

Latest revision as of 04:36, 29 June 2020

Morphology

Free software

  • AraMorph - Perl - An Arabic morphological analyzer and part-of-speech tagger written in Perl (originally by Tim Buckwalter)
  • AraMorph - Java - An Arabic morphological analyzer and part-of-speech tagger rewritten in Java for Lucene
  • AraComLex - An open-source finite-state morphology for Modern Standard Arabic. The source files can be compiled with the open-source compiler foma or with Xerox xfst.
  • UralicNLP - A Python library that provides morphological tagging, generation, lemmatization and disambiguation for many languages, including Arabic.
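Lexicon-driven analyzers such as AraMorph work by splitting a word into a prefix, a stem and a suffix and looking each part up in a lexicon. The Python sketch below illustrates only that concatenative segmentation step. It is not code from AraMorph or any other tool listed here, and the tiny lexicons (written in Buckwalter transliteration) are hypothetical examples. Real analyzers also check that the prefix, stem and suffix are compatible with one another and return glosses and part-of-speech tags, which this sketch omits.

```python
# Hypothetical toy lexicons in Buckwalter transliteration (illustration only):
# w = "and", Al = "the", p = ta marbuta (feminine singular), At = feminine plural.
PREFIXES = {"": "", "w": "and", "Al": "the", "wAl": "and+the"}
STEMS = {"ktAb": "book", "kAtb": "writer"}
SUFFIXES = {"": "", "p": "FEM.SG", "At": "FEM.PL"}


def segment(word: str) -> list[tuple[str, str, str]]:
    """Return every (prefix, stem, suffix) split whose three parts are all in the lexicons."""
    analyses = []
    for i in range(len(word) + 1):              # i = end of the prefix
        for j in range(i + 1, len(word) + 1):   # j = end of the (non-empty) stem
            prefix, stem, suffix = word[:i], word[i:j], word[j:]
            if prefix in PREFIXES and stem in STEMS and suffix in SUFFIXES:
                analyses.append((prefix, stem, suffix))
    return analyses


# segment("wAlktAb") == [("wAl", "ktAb", "")]   "and the book"
# segment("kAtbAt")  == [("", "kAtb", "At")]    "writers (fem.)"
```

Because every split is tried, an ambiguous word yields more than one analysis; choosing among them is a separate disambiguation step.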

Proprietary

WordNets

Free software

Proprietary

Parsers

Free software

Corpora

Proprietary

  • Arabic Newswire Part 1 - 76 million tokens, annotation: paragraphs

Diacritization

Free software

  • hAraCat - A free tool for predicting vowels and other diacritics.
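Diacritization tools such as hAraCat restore the short-vowel marks and other diacritics that are usually left out of written Arabic. The minimal Python sketch below shows the reverse direction, stripping those marks, to illustrate why restoring them is a prediction problem: several fully vocalized words collapse to the same undiacritized string. It is not part of hAraCat, and the character set it removes (U+064B to U+0652 plus U+0670) is an assumed set covering the common harakat, not an exhaustive list of Arabic marks.

```python
# Assumed set of common Arabic diacritics: U+064B (fathatan) .. U+0652 (sukun),
# plus U+0670 (superscript alef).
ARABIC_DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}


def strip_diacritics(text: str) -> str:
    """Remove the diacritic marks listed above, keeping all other characters."""
    return "".join(ch for ch in text if ch not in ARABIC_DIACRITICS)


vocalized = [
    "\u0643\u064e\u062a\u064e\u0628\u064e",  # kataba, "he wrote"
    "\u0643\u064f\u062a\u0650\u0628\u064e",  # kutiba, "it was written"
    "\u0643\u064f\u062a\u064f\u0628",        # kutub, "books"
]
# All three reduce to the same bare letters k-t-b (U+0643 U+062A U+0628).
print({strip_diacritics(w) for w in vocalized})
```

A diacritizer has to recover which of these readings was intended, usually from the surrounding context.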

Free/open licence

Bibliography

External links