Difference between revisions of "Resources for Croatian"
(5 intermediate revisions by 3 users not shown) | |||
Line 3: | Line 3: | ||
<!-- Please keep this list in alphabetical order --> | <!-- Please keep this list in alphabetical order --> | ||
* [http://www.ihjj.hr/index_en.html IHJJ - Institute of Croatian Language and Linguistics] | * [http://www.ihjj.hr/index_en.html IHJJ - Institute of Croatian Language and Linguistics] | ||
− | * [http://jthj.ffzg.hr/default_english.htm | + | * [http://jthj.ffzg.hr/default_english.htm Croatian Language Technologies Portal] - exhaustive lists of corpora, dictionaries, tools, associations, institutions and projects in LT. Developed at the Institute of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb. |
==Corpora== | ==Corpora== | ||
− | + | * [http://hnk.ffzg.hr/ Croatian National Corpus] - 101.2 million tokens, synchronic (texts from 1990 on), standard Croatian reference corpus; lemmatised and MSD-tagged with the Croatian MULTEXT-East tagset using the hybrid tagger CroTag and a lemmatiser. Developed at the Institute of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb since 1998. |
* [http://riznica.ihjj.hr/en/ Croatian Language Corpus] (continuously growing (currently approx. 100 mil. tokens) corpus of Croatian covering various genres and time periods, using Philologic for online search) | * [http://riznica.ihjj.hr/en/ Croatian Language Corpus] (continuously growing (currently approx. 100 mil. tokens) corpus of Croatian covering various genres and time periods, using Philologic for online search) | ||
Line 12: | Line 12: | ||
<!-- Please keep this list in alphabetical order --> | <!-- Please keep this list in alphabetical order --> | ||
− | * [http:// | + | * [http://www.statmt.org/setimes/ Southeast European Times] (sentence-aligned corpus: Albanian, Bosnian, Bulgarian, Croatian, English, Greek, Macedonian, Romanian, Serbian, Turkish; approximately 4.5 million words per language) |
==Lexicons== | ==Lexicons== | ||
+ | ===Free=== | ||
+ | |||
+ | ===Proprietary=== | ||
<!-- Please keep this list in alphabetical order --> | <!-- Please keep this list in alphabetical order --> | ||
− | * [http://hml.ffzg.hr Croatian Morphological Lexicon] - | + | * [http://hml.ffzg.hr Croatian Morphological Lexicon] - Croatian inflectional lexicon comprising more than 110,000 lemmas yielding more than 3.8 million word-forms; freely searchable. Developed at the Institute of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb. |
[[Category:Resources by language|Croatian]] | [[Category:Resources by language|Croatian]] |
Latest revision as of 04:17, 25 June 2012
General
- IHJJ - Institute of Croatian Language and Linguistics
- Croatian Language Technologies Portal - exhaustive lists of corpora, dictionaries, tools, associations, institutions and projects in LT. Developed at the Institute of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb.
Corpora
- Croatian National Corpus - 101.2 million tokens, synchronic (texts from 1990 on), standard Croatian reference corpus; lemmatised and MSD-tagged with the Croatian MULTEXT-East tagset using the hybrid tagger CroTag and a lemmatiser. Developed at the Institute of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb since 1998.
- Croatian Language Corpus (continuously growing corpus of Croatian, currently approx. 100 million tokens, covering various genres and time periods; uses PhiloLogic for online search)
Free
- Southeast European Times (sentence-aligned corpus: Albanian, Bosnian, Bulgarian, Croatian, English, Greek, Macedonian, Romanian, Serbian, Turkish; approximately 4.5 million words per language)
Lexicons
Free
Proprietary
- Croatian Morphological Lexicon - Croatian inflectional lexicon comprising more than 110,000 lemmas yielding more than 3.8 million word-forms; freely searchable. Developed at the Institute of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb.