Resources for Indonesian
Corpora
- Kompas and Tempo Online Collection, intended for evaluation purposes.
- 500,000 Word Bahasa Indonesia Corpus and Parallel English Translation (CC BY-NC-SA 3.0 licence)
- 500,000 Word Bahasa Indonesia Parallel Corpus with Penn Treebank (CC BY-NC-SA 3.0 licence)
- One Million POS Tagged Corpus of Bahasa Indonesia (CC BY-NC-SA 3.0 licence)
Tools
- Part of Speech Tagger for Bahasa Indonesia (GPL licence)
- Rule-based Indonesian-Malay Machine Translation by Septina Dian Larasati. It can also be used for morphological tagging.
- Link Grammar Parser, includes prototype Indonesian dictionaries.
Grammars
- Broad-coverage Indonesian Resource Grammar (INDRA) based on HPSG, using the DELPH-IN infrastructure.
Lexicons
- Wordnet Bahasa, a semantic lexicon for Indonesian and Malay, linked to the Open Multilingual Wordnet.