Resources for Croatian

From ACL Wiki

Revision as of 13:07, 25 March 2010

General

Corpora

  • Croatian National Corpus - 101.2 million tokens; synchronic (texts from 1990 onward); the standard reference corpus of Croatian. Lemmatised and MSD-tagged with the Croatian MULTEXT-East tagset using the hybrid tagger CroTag and a lemmatiser. Under development at the Institute of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb, since 1998.
  • Croatian Language Corpus - a continuously growing corpus of Croatian (currently approx. 100 million tokens) covering various genres and time periods; searchable online via PhiloLogic.

Free

  • Southeast European Times - sentence-aligned parallel corpus of Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian and Turkish; approximately 4.5 million words per language.

Lexicons

Free

Proprietary

  • Croatian Morphological Lexicon - inflectional lexicon of Croatian comprising more than 110,000 lemmas, which yield more than 3.8 million word forms; freely searchable online. Developed at the Institute of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb.