Difference between revisions of "Resources for Croatian"

From ACL Wiki
m (Reverted edits by Creek (talk) to last revision by Mtadic)
 
(5 intermediate revisions by 3 users not shown)
Line 3: Line 3:
 
<!-- Please keep this list in alphabetical order -->
 
<!-- Please keep this list in alphabetical order -->
 
* [http://www.ihjj.hr/index_en.html IHJJ - Institute of Croatian Language and Linguistics]
 
* [http://www.ihjj.hr/index_en.html IHJJ - Institute of Croatian Language and Linguistics]
* [http://jthj.ffzg.hr/default_english.htm UoZ - Institute of linguistics - Croatian Portal]
+
* [http://jthj.ffzg.hr/default_english.htm Croatian Language Technologies Portal] - exhaustive lists of corpora, dictionaries, tools, associations, institutions and projects in language technology (LT). Developed at the Institute of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb.
  
 
==Corpora==
 
==Corpora==
 
+
* [http://hnk.ffzg.hr/ Croatian National Corpus] - 101.2 million tokens; synchronic (texts from 1990 on), standard Croatian reference corpus; lemmatised and MSD-tagged with the Croatian MULTEXT-East tagset using the hybrid tagger CroTag and a lemmatiser. Developed at the Institute of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb since 1998.
 
* [http://riznica.ihjj.hr/en/ Croatian Language Corpus] (continuously growing corpus of Croatian, currently approx. 100 million tokens, covering various genres and time periods; uses PhiloLogic for online search)
 
* [http://riznica.ihjj.hr/en/ Croatian Language Corpus] (continuously growing corpus of Croatian, currently approx. 100 million tokens, covering various genres and time periods; uses PhiloLogic for online search)
  
Line 12: Line 12:
  
 
<!-- Please keep this list in alphabetical order -->
 
<!-- Please keep this list in alphabetical order -->
* [http://xixona.dlsi.ua.es/~fran/setimes/ Southeast European Times] (paragraph aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian, Turkish &mdash; 9,678 paragraphs, 92,450&mdash; 122,912 words per language)
+
* [http://www.statmt.org/setimes/ Southeast European Times] (sentence-aligned corpus: Albanian, Bosnian, Bulgarian, Croatian, English, Greek, Macedonian, Romanian, Serbian, Turkish; approximately 4.5 million words per language)
  
 
==Lexicons==
 
==Lexicons==
 +
===Free===
 +
 +
===Proprietary===
  
 
<!-- Please keep this list in alphabetical order -->
 
<!-- Please keep this list in alphabetical order -->
* [http://hml.ffzg.hr Croatian Morphological Lexicon] - University of Zagreb
+
* [http://hml.ffzg.hr Croatian Morphological Lexicon] - Croatian inflectional lexicon comprising more than 110,000 lemmas yielding more than 3.8 million word forms; freely searchable. Developed at the Institute of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb.
  
  
 
[[Category:Resources by language|Croatian]]
 
[[Category:Resources by language|Croatian]]

Latest revision as of 05:17, 25 June 2012

General

Corpora

  • Croatian National Corpus - 101.2 million tokens; synchronic (texts from 1990 on), standard Croatian reference corpus; lemmatised and MSD-tagged with the Croatian MULTEXT-East tagset using the hybrid tagger CroTag and a lemmatiser. Developed at the Institute of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb since 1998.
  • Croatian Language Corpus (continuously growing corpus of Croatian, currently approx. 100 million tokens, covering various genres and time periods; uses PhiloLogic for online search)

Free

  • Southeast European Times (sentence-aligned corpus: Albanian, Bosnian, Bulgarian, Croatian, English, Greek, Macedonian, Romanian, Serbian, Turkish; approximately 4.5 million words per language)

Lexicons

Free

Proprietary

  • Croatian Morphological Lexicon - Croatian inflectional lexicon comprising more than 110,000 lemmas yielding more than 3.8 million word forms; freely searchable. Developed at the Institute of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb.