Resources for Croatian
From ACL Wiki
Revision as of 14:04, 9 April 2009 by Mtadic
- IHJJ - Institute of Croatian Language and Linguistics
- Croatian Language Technologies Portal - extensive lists of corpora, dictionaries, tools, associations, institutions and projects in language technology (LT). Developed at the Institute of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb.
- Croatian National Corpus - synchronic standard reference corpus of Croatian (texts from 1990 onward), 101.2 million tokens; lemmatised and MSD-tagged with the Croatian MULTEXT-East tagset using the hybrid tagger CroTag and a lemmatiser. Developed at the Institute of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb, since 1998.
- Croatian Language Corpus - continuously growing corpus of Croatian (currently approx. 100 million tokens) covering various genres and time periods; searchable online with PhiloLogic.
- Southeast European Times - paragraph-aligned parallel corpus in Albanian, Bosnian, Bulgarian, Croatian, English, Greek, Macedonian, Romanian, Serbian and Turkish; 9,678 paragraphs, 92,450 to 122,912 words per language.
- Croatian Morphological Lexicon - Croatian inflectional lexicon comprising more than 110,000 lemmas, which yield more than 3.8 million word forms; freely searchable. Developed at the Institute of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb.