Resources for Persian
Corpora
Free
- VOA Persian Corpus 2003-2008 (public domain)
Proprietary
- Bijankhan corpus (gratis for research/non-commercial purposes)
- CALLFRIEND Farsi (speech), LDC
- Hamshahri corpus (gratis for research/non-commercial purposes)
- Persian speech database Farsdat, ELRA
Lexical resources
Free
- Persian-English dictionary, derived from Wikipedia article names. Retains Wikipedia's CC-BY-SA 3.0 license.
Proprietary
Machine translation
Free
- Tehran English-Persian Parallel Corpus by Mohammad Taher Pilevar, NLP Lab, University of Tehran. For research or non-commercial use.
Proprietary
- The Shiraz project (Persian -> English)
Morphology tools
Free
- Perstem - Persian stemmer, light morphological analyzer, and character set converter.
- Morphological dictionary, compiled using lttoolbox.
- BLARK by Mojgan Seraji: normaliser, tokeniser, segmenter, hunpos model for PoS tagging, and a (Java) dependency parser, all GPL.
Parsing
Free
- HamleDT, harmonized dependency treebanks of many languages, common annotation style.
- Persian dictionaries for the Link-Grammar parser. By Jon Dehdari. These require the Perstem stemming package, above.
- Uppsala Persian Dependency Treebank, Creative Commons Attribution 3.0 License
Proprietary
- Dadegan Dependency Treebank, for research purposes only.
- HPSG Persian Treebank (PerTreeBank), for academic research purposes only.
Bibliography
- Dehdari, Jon, and Deryle Lonsdale. 2008. A link grammar parser for Persian. In Karimi, S., Samiian, V., and Stilo, D., editors, Aspects of Iranian Linguistics, volume 1. Cambridge Scholars Press. ISBN: 978-18-471-8639-3
- Feili, H. and G. Ghassem-Sani (2004) "An Application of Lexicalized Grammars in English-Persian Translation". Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004), 24-27 Aug. 2004, Universidad Politecnica de Valencia, Valencia, Spain, pp. 596-600.
- Megerdoomian, K. (2000) "Unification-Based Persian Morphology". Proceedings of CICLing 2000, Alexander Gelbukh, Center of Investigation on Computation-IPN, Mexico, 2000.
- Megerdoomian, K. (2004) "Finite-State Morphological Analysis of Persian". COLING 2004 Computational Approaches to Arabic Script-based Languages. Ali Farghaly and Karine Megerdoomian editors, Geneva, Switzerland, 2004, pp. 35-41.
- Farajian, M. A. (2011) "PEN: Parallel English-Persian News Corpus". Proceedings of the 2011 International Conference on Artificial Intelligence (ICAI'11), Nevada, USA.
External links
- https://wiki.iranianlinguistics.org/wiki/Main_Page: NLP Resources for Persian
- Jon Dehdari (link parser, small lexicon, stemmer, morphological analysis tools)