Resources for German
Corpora
Free license
- RIA Open Source Rule Induction Tool includes an LFG-parsed, phrase-aligned German-English parallel corpus, a subset of the EuroParl corpus (4000 sentences for each language; the tool, at least, is LGPL)
- UN parallel corpora
- WMT corpora, including Europarl, News Commentary, and News Crawl
Unknown license
- Bavarian Archive for Speech Signals Corpora
- COSMAS II
- Experimental Corpus Query System (University of Stuttgart, Germany)
- German plain text and Co-occurrences at LCC
- HamleDT, harmonized dependency treebanks for many languages in a common annotation style
- NEGRA Corpus
- TIGER treebank
- Tübingen Treebank of Written German (TüBa-D/Z)
- Tübingen Treebank of Spoken German (TüBa-D/S, aka Verbmobil treebank)
- Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z)
- Le Monde Diplomatique-Die Tageszeitung Translation Corpus - French-German, aligned (parallel)
Evaluation datasets
Grammars
Morphological analysis
Free software
- Morphisto, based on SMOR, is an SFST-based analyser and generator for German. (The morphology is GPLv2, but the lexicon is proprietary/non-commercial: CC-BY-SA-NC v3)
- German morphology data, based on Morphy, licensed under CC-BY-SA 3.0
Lexicons
Free software
- DING - German-English Dictionary with approximately 253,000 entries (GPL 2 or later).
- OpenThesaurus - German synonyms and associated terms (LGPL)
Proprietary/gratis
- Lexical information for German ("The data is freely available for education, research and other non-commercial purposes.")
- Canoo.net - German Dictionaries and Grammars
Unknown license
- IMSLex German Lexicon (no license information, but only "sample" download)
- mOlif morphological analyzer (broken link)