Resources for Persian
Corpora
Free
- VOA Persian Corpus 2003-2008 (public domain)
- Orwell's 1984 Corpus in MULTEXT-EAST (public domain)
Proprietary
- Bijankhan corpus (gratis for research/non-commercial purposes)
- CALLFRIEND Farsi (speech), LDC
- Hamshahri corpus (gratis for research/non-commercial purposes)
- Persian speech database Farsdat, ELRA
Online concordance tools
- Orwell's 1984 Corpus (public domain)
Lexical resources
Free
- Persian - English dictionary, derived from Wikipedia article names. Retains Wikipedia's CC-BY-SA 3.0 license.
Proprietary
Machine translation
Free
- Tehran English-Persian Parallel Corpus by Mohammad Taher Pilevar, NLP Lab, University of Tehran. For research or non-commercial use.
Proprietary
- The Shiraz project (Persian -> English)
Morphology tools
Free
- Perstem - Persian stemmer, light morphological analyzer, and character set converter.
- Morphological dictionary, compiled with lttoolbox.
- BLARK by Mojgan Seraji: normaliser, tokeniser, segmenter, HunPoS model for PoS tagging, and a (Java) dependency parser; all GPL.
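Normalisers and character-set converters such as those above typically unify Arabic and Persian code points before stemming or tagging, since text typed on Arabic keyboards uses different characters for yeh and kaf. The following is a minimal Python sketch of that step, not the actual code of Perstem or the BLARK normaliser. It assumes a small hand-picked mapping (Arabic yeh and alef maksura to Farsi yeh, Arabic kaf to keheh, Arabic-Indic digits to Persian digits) and a hypothetical clean-up policy for the zero-width non-joiner (ZWNJ); `normalise` is an illustrative name.

```python
# Hypothetical sketch: unify common Arabic code points with their Persian forms.
# Not the implementation used by Perstem or BLARK.

# Single-character replacements (Arabic -> Persian).
_CHAR_MAP = {
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
}
# Arabic-Indic digits U+0660..U+0669 -> Persian digits U+06F0..U+06F9.
_CHAR_MAP.update({chr(0x0660 + i): chr(0x06F0 + i) for i in range(10)})
_TABLE = str.maketrans(_CHAR_MAP)

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER, e.g. between the prefix mi- and a verb stem


def normalise(text: str) -> str:
    """Map Arabic letters/digits to Persian ones and tidy ZWNJs and spaces."""
    text = text.translate(_TABLE)
    # Collapse repeated ZWNJs into one (assumed policy).
    while ZWNJ + ZWNJ in text:
        text = text.replace(ZWNJ + ZWNJ, ZWNJ)
    # A ZWNJ next to a space is redundant; drop it (assumed policy).
    text = text.replace(" " + ZWNJ, " ").replace(ZWNJ + " ", " ")
    # Collapse runs of whitespace (ZWNJ is not whitespace, so it survives).
    return " ".join(text.split())


if __name__ == "__main__":
    # "ketab" (book) typed with an Arabic kaf becomes the Persian spelling.
    print(normalise("\u0643\u062A\u0627\u0628") == "\u06A9\u062A\u0627\u0628")  # True
```

Real normalisers usually go further (diacritic removal, affix-aware ZWNJ insertion), so treat the mapping above as a starting point rather than a complete rule set.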
Parsing
Free
- HamleDT, harmonized dependency treebanks of many languages, common annotation style.
- Persian dictionaries for the Link Grammar parser, by Jon Dehdari. These require the Perstem stemming package listed above.
- Uppsala Persian Dependency Treebank (Creative Commons Attribution 3.0 License).
Proprietary
- Dadegan Dependency Treebank for research purposes only.
- HPSG Persian Treebank (PerTreeBank) for academic research purposes only.
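Dependency treebanks like those above are commonly distributed as tab-separated, one-token-per-line files. The following minimal Python sketch reads such a file, assuming the 10-column CoNLL-X/CoNLL-U layout (ID, FORM, LEMMA, POS at columns 1 to 4, HEAD and DEPREL at columns 7 and 8). Check each treebank's own documentation for its actual format. `Token` and `read_conll` are hypothetical names, and the sample sentence is toy data rather than an excerpt from any treebank.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class Token:
    id: str      # column 1: token index within the sentence
    form: str    # column 2: surface form
    lemma: str   # column 3
    pos: str     # column 4: CPOSTAG (CoNLL-X) or UPOS (CoNLL-U)
    head: str    # column 7: index of the syntactic head ("0" = root)
    deprel: str  # column 8: dependency relation label


def read_conll(stream: TextIO) -> Iterator[List[Token]]:
    """Yield one list of Tokens per sentence from a 10-column CoNLL file."""
    sentence: List[Token] = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if not line.strip():            # a blank line ends the sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):        # CoNLL-U comment lines
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue                    # CoNLL-U multiword ranges / empty nodes
        sentence.append(Token(cols[0], cols[1], cols[2], cols[3], cols[6], cols[7]))
    if sentence:                        # the file may not end with a blank line
        yield sentence


if __name__ == "__main__":
    # Toy two-token sentence in transliteration (not taken from any treebank).
    sample = (
        "# text = man raftam\n"
        "1\tman\tman\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
        "2\traftam\traftan\tVERB\t_\t_\t0\troot\t_\t_\n"
        "\n"
    )
    for sent in read_conll(io.StringIO(sample)):
        for tok in sent:
            print(tok.form, "->", tok.head, tok.deprel)
```

A generator is used so that large treebanks can be processed one sentence at a time without loading the whole file.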
Bibliography
- Dehdari, Jon and Deryle Lonsdale (2008). "A Link Grammar Parser for Persian". In Karimi, S., Samiian, V., and Stilo, D. (eds.), Aspects of Iranian Linguistics, volume 1. Cambridge Scholars Press. ISBN 978-18-471-8639-3.
- QasemiZadeh, Behrang and Saeed Rahimi (2006). "Persian in MULTEXT-East Framework". FinTAL 2006, pp. 541-551.
- Feili, H. and G. Ghassem-Sani (2004). "An Application of Lexicalized Grammars in English-Persian Translation". Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004), 24-27 Aug. 2004, Universidad Politécnica de Valencia, Valencia, Spain, pp. 596-600.
- Megerdoomian, K. (2000). "Unification-Based Persian Morphology". Proceedings of CICLing 2000, ed. Alexander Gelbukh, Center of Investigation on Computation-IPN, Mexico.
- Megerdoomian, K. (2004). "Finite-State Morphological Analysis of Persian". COLING 2004 Computational Approaches to Arabic Script-based Languages, Ali Farghaly and Karine Megerdoomian (eds.), Geneva, Switzerland, pp. 35-41.
- Farajian, Mohammad Amin (2011). "PEN: Parallel English-Persian News Corpus". Proceedings of the 2011 International Conference on Artificial Intelligence (ICAI'11), Nevada, USA.
See also
External links
- NLP Resources for Persian: https://wiki.iranianlinguistics.org/wiki/Main_Page
- the Jon safari (link parser, small lexicon, stemmer, morphological analysis tools)