Resources for German
Latest revision as of 01:10, 26 August 2016
Corpora
Free license
- RIA Open Source Rule Induction Tool includes an LFG-parsed German-English phrase-aligned parallel corpus, a subset of the EuroParl corpus (4,000 sentences per language; at least the tool itself is LGPL)
- UN parallel corpora
- WMT corpora, including Europarl, News Commentary, and News Crawl
Unknown license
- Araneum Germanicum, Gigaword German web corpus
- Bavarian Archive for Speech Signals Corpora
- COSMAS II
- Experimental Corpus Query System (University of Stuttgart, Germany)
- German plain text and co-occurrences at LCC
- HamleDT - harmonized dependency treebanks for many languages in a common annotation style
- NEGRA Corpus
- TIGER treebank
- Tübingen Treebank of Written German (TüBa-D/Z)
- Tübingen Treebank of Spoken German (TüBa-D/S, aka Verbmobil treebank)
- Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z)
- Le Monde Diplomatique-Die Tageszeitung Translation Corpus - French-German, aligned (parallel)
Evaluation datasets
- Semantic relatedness evaluation
- Named Entity Tagging
- Lexical Substitution
- Distant speech recognition
Grammars
- KPML generation grammar
- Link Grammar Parser, includes prototype German dictionaries.
Morphological analysis
Free software
- Morphisto, based on SMOR, is an SFST-based analyser and generator for German. (The morphology is GPLv2, but the lexicon is proprietary/non-commercial: CC BY-NC-SA 3.0)
- German morphology data, based on Morphy, licensed under CC BY-SA 3.0
Lexicons
Free software
- DING - German-English Dictionary with approximately 253,000 entries (GPL 2 or later).
- OpenThesaurus - German synonyms and associated terms (LGPL)
- GermaNER - German named entity tagger; mixed LGPL/ASL 2.0 license, free for commercial and academic use
- Dependency Collapser/propagator - produces Stanford Collapsed Dependency-style annotations on top of dependency parser output
Proprietary/gratis
- Lexical information for German ("The data is freely available for education, research and other non-commercial purposes.") (broken link)
- Canoo.net - German Dictionaries and Grammars
Unknown license
- IMSLex German Lexicon (no license information, but only "sample" download)
- mOlif morphological analyzer (broken link)