Difference between revisions of "Resources for German"
(→Free software: revise section title, since corpora aren't software) |
|||
(22 intermediate revisions by 9 users not shown) | |||
Line 1: | Line 1: | ||
==Corpora== | ==Corpora== | ||
+ | ===Free license=== | ||
+ | * [http://www.computing.dcu.ie/~ygraham/software.html RIA Open Source Rule Induction Tool] includes an LFG-parsed German-English phrase-aligned parallel corpus, a subset of the EuroParl corpus (4000 sentences for each language, the tool at least is LGPL) | ||
+ | * [http://www.statmt.org/wmt13/translation-task.html#download WMT corpora], including [http://en.wikipedia.org/wiki/Europarl_corpus Europarl], News Commentary, and News Crawl | ||
+ | |||
+ | ===Unknown license=== | ||
<!-- Please keep this list in alphabetical order --> | <!-- Please keep this list in alphabetical order --> | ||
Line 7: | Line 12: | ||
* [http://www.wortschatz.uni-leipzig.de/ German plain text and Co-occurrences at LCC] | * [http://www.wortschatz.uni-leipzig.de/ German plain text and Co-occurrences at LCC] | ||
* [http://www.coli.uni-sb.de/sfb378/negra-corpus/negra-corpus.html NEGRA Corpus] | * [http://www.coli.uni-sb.de/sfb378/negra-corpus/negra-corpus.html NEGRA Corpus] | ||
− | * [http://www. | + | * [http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/ TIGER treebank] |
+ | * [http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml Tübingen Treebank of Written German (TüBa-D/Z)] | ||
+ | * [http://www.sfs.uni-tuebingen.de/en_tuebads.shtml Tübingen Treebank of Spoken German (TüBa-D/S, aka Verbmobil treebank)] | ||
+ | * [http://www.sfs.uni-tuebingen.de/en_tuepp.shtml Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z)] | ||
+ | * [http://www.coli.uni-saarland.de/~gparis/LMD-TAZ_corpus/ Le Monde Diplomatique-Die Tageszeitung Translation Corpus] - French-German, aligned (parallel) | ||
==Evaluation datasets== | ==Evaluation datasets== | ||
* [http://www.ukp.tu-darmstadt.de/data/semRelDatasets Semantic relatedness evaluation] | * [http://www.ukp.tu-darmstadt.de/data/semRelDatasets Semantic relatedness evaluation] | ||
+ | |||
+ | == Grammars == | ||
+ | * [[Generation grammars|KPML generation grammar]] | ||
+ | |||
+ | == Morphological analysis == | ||
+ | === Free software === | ||
+ | * [https://code.google.com/p/morphisto/ Morphisto], based on [[SMOR]], is an [[SFST]]-based analyser and generator for German. (The morphology is GPLv2, but the lexicon is proprietary/non-commercial: CC-BY-SA-NC v3) | ||
+ | * [http://www.danielnaber.de/morphologie/index_en.html German morphology data], based on [http://www.wolfganglezius.de/doku.php?id=cl:morphy Morphy], licensed under CC-BY-SA 3.0 | ||
+ | |||
+ | ==Lexicons== | ||
+ | ===Free software=== | ||
+ | * [http://www-user.tu-chemnitz.de/~fri/ding/ DING] - German-English Dictionary with approximately 253,000 entries (GPL 2 or later). | ||
+ | * [http://www.openthesaurus.de/ OpenThesaurus] - German synonyms and associated terms (LGPL) | ||
+ | |||
+ | ===Proprietary/gratis=== | ||
+ | * [http://www.ims.uni-stuttgart.de/tcl/RESOURCES/German-Lexicon-en.html Lexical information for German] ("The data is freely available for education, research and other '''non-commercial''' purposes.") | ||
+ | * [http://www.canoo.net/ Canoo.net] - German Dictionaries and Grammars | ||
+ | |||
+ | ===Unknown license=== | ||
+ | * [http://www.ims.uni-stuttgart.de/projekte/IMSLex/ IMSLex German Lexicon] (no license information, but only "sample" download) | ||
+ | * [http://www.cl.uzh.ch/CL/siclemat/sprachanalyse/molif/ mOlif morphological analyzer] (broken link) | ||
==Resource Access== | ==Resource Access== | ||
Line 17: | Line 47: | ||
==Timeline Analysis== | ==Timeline Analysis== | ||
* [http://wortschatz.uni-leipzig.de/wort-des-tages/ German Words of the Day] | * [http://wortschatz.uni-leipzig.de/wort-des-tages/ German Words of the Day] | ||
− | + | * [http://www.sfs.uni-tuebingen.de/~lothar/nw/ Wortwarte (selection of German neologisms for each day) ] | |
[[Category:Resources by language|German]] | [[Category:Resources by language|German]] |
Revision as of 11:00, 12 October 2013
Corpora
Free license
- RIA Open Source Rule Induction Tool includes an LFG-parsed, phrase-aligned German-English parallel corpus, a subset of the EuroParl corpus (4,000 sentences per language; at least the tool itself is LGPL-licensed)
- WMT corpora, including Europarl, News Commentary, and News Crawl
Unknown license
- Bavarian Archive for Speech Signals Corpora
- COSMAS II
- Experimental Corpus Query System (University of Stuttgart, Germany)
- German plain text and Co-occurrences at LCC
- NEGRA Corpus
- TIGER treebank
- Tübingen Treebank of Written German (TüBa-D/Z)
- Tübingen Treebank of Spoken German (TüBa-D/S, aka Verbmobil treebank)
- Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z)
- Le Monde Diplomatique-Die Tageszeitung Translation Corpus - French-German, aligned (parallel)
Evaluation datasets
- Semantic relatedness evaluation
Grammars
- KPML generation grammar
Morphological analysis
Free software
- Morphisto, based on SMOR, is an SFST-based analyser and generator for German. (The morphology is GPLv2, but the lexicon is proprietary/non-commercial: CC-BY-SA-NC v3)
- German morphology data, based on Morphy, licensed under CC-BY-SA 3.0
Lexicons
Free software
- DING - German-English Dictionary with approximately 253,000 entries (GPL 2 or later).
- OpenThesaurus - German synonyms and associated terms (LGPL)
Proprietary/gratis
- Lexical information for German ("The data is freely available for education, research and other non-commercial purposes.")
- Canoo.net - German Dictionaries and Grammars
Unknown license
- IMSLex German Lexicon (no license information; only a "sample" download is available)
- mOlif morphological analyzer (broken link)