Latest revision as of 08:18, 30 June 2014
Hindi computing is gaining momentum quickly. Thousands of Hindi sites, blogs, and portals have appeared as computing tools have become available and easier to use. The following link lists important tools and software for Hindi and Devanagari:
- Hindi Computing : Tools and Techniques (http://bit.ly/ytAT95)
Corpora
- HamleDT (http://ufal.mff.cuni.cz/hamledt): harmonized dependency treebanks for many languages in a common annotation style.
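Dependency Parser
- Download the Parser (http://sivareddy.in/downloads)
- Sample output of the parser (http://sivareddy.in/papers/files/hindi.dependency.parser.out.pdf)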
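POS Tagger, Morphological Analyzer, Lemmatizer, Corpus
- Download the tagger (http://sivareddy.in/downloads)
- Sample output of the tagger (http://sivareddy.in/papers/files/hindi.sample.out.pdf)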
The tagger and its related files are distributed under the GNU GPL license. The corpus is licensed.
Related Publication: Siva Reddy, Serge Sharoff. Cross Language POS Taggers (and other Tools) for Indian Languages: An Experiment with Kannada using Telugu Resources. In Proceedings of the IJCNLP Workshop on Cross Lingual Information Access: Computational Linguistics and the Information Need of Multilingual Societies (CLIA 2011 at IJCNLP 2011), Chiang Mai, Thailand.
Morphological analysis
Free software
- Hindi analyser for lttoolbox (~29,385 lemmata), GPL (by the University of Hyderabad; converted from the Anusaaraka analyser)
Machine translation
Free software
- Anusaaraka: Hindi-English and others.
Shallow Parser
Keywords: Hindi, Part of Speech tagger, Lemmatizer, Morph Analyzer, Corpus