Resources for Indonesian
Corpora
- Kompas and Tempo Online Collection, intended for evaluation purposes.
- 500,000 Word Bahasa Indonesia Corpus and Parallel English Translation (CC BY-NC-SA 3.0 licence)
- 500,000 Word Bahasa Indonesia Parallel Corpus with Penn Treebank (CC BY-NC-SA 3.0 licence)
- One Million POS Tagged Corpus of Bahasa Indonesia (CC BY-NC-SA 3.0 licence)
Tools
- Part of Speech Tagger for Bahasa Indonesia (GPL licence)
- Rule-based Indonesian-Malay Machine Translation by Septina Dian Larasati. It can also be used for morphological tagging.
- Link Grammar Parser, which includes prototype Indonesian dictionaries.
Grammars
- Broad-coverage Indonesian Resource Grammar (INDRA) based on HPSG, using the DELPH-IN infrastructure.
Lexicons
- Wordnet Bahasa: a semantic lexicon for Indonesian and Malay, linked to the Open Multilingual Wordnet.