Difference between revisions of "Resources for German"

From ACL Wiki
(→‎Free software: revise section title, since corpora aren't software)
m (The link under "Lexical information for German" seems to be broken)
 
(6 intermediate revisions by 5 users not shown)
Line 2: Line 2:
 
===Free license===
 
* [http://www.computing.dcu.ie/~ygraham/software.html RIA Open Source Rule Induction Tool] includes an LFG-parsed German-English phrase-aligned parallel corpus, a subset of the Europarl corpus (4,000 sentences per language; at least the tool is LGPL)
 
* [http://www.euromatrixplus.net/multi-un/ UN parallel corpora]
* [http://www.statmt.org/wmt15/translation-task.html#download WMT corpora], including [http://en.wikipedia.org/wiki/Europarl_corpus Europarl], News Commentary, and News Crawl
  
 
===Unknown license===
 
<!-- Please keep this list in alphabetical order -->
 
* [http://ucts.uniba.sk/aranea_about/ Araneum Germanicum], Gigaword German web corpus
 
* [http://www.phonetik.uni-muenchen.de/Bas/BasKorporaeng.html Bavarian Archive for Speech Signals Corpora]
 
* [http://corpora.ids-mannheim.de/~cosmas/ COSMAS II]
 
* [http://www.ims.uni-stuttgart.de/projekte/tc/CQP.html Experimental Corpus Query System (University of Stuttgart, Germany)]
 
* [http://www.wortschatz.uni-leipzig.de/ German plain text and Co-occurrences at LCC]
 
* [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks for many languages in a common annotation style.
 
* [http://www.coli.uni-sb.de/sfb378/negra-corpus/negra-corpus.html NEGRA Corpus]
 
* [http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/ TIGER treebank]
 
Line 20: Line 23:
 
==Evaluation datasets==
 
* [http://www.ukp.tu-darmstadt.de/data/semRelDatasets Semantic relatedness evaluation]
 
* [https://www.lt.informatik.tu-darmstadt.de/de/data/german-named-entity-recognition/ Named Entity Tagging]
* [https://www.ukp.tu-darmstadt.de/data/lexical-substitution/lexical-substitution-dataset-german/ Lexical Substitution]
* [https://www.lt.informatik.tu-darmstadt.de/de/data/open-source-acoustic-models-for-german-distant-speech-recognition/ Distant speech recognition]
  
 
== Grammars ==
 
* [[Generation grammars|KPML generation grammar]]
 
* [http://abisource.com/projects/link-grammar/ Link Grammar Parser], includes prototype German dictionaries.
  
 
== Morphological analysis ==
 
Line 33: Line 40:
 
* [http://www-user.tu-chemnitz.de/~fri/ding/ DING] - German-English Dictionary with approximately 253,000 entries (GPL 2 or later).
 
* [http://www.openthesaurus.de/ OpenThesaurus] - German synonyms and associated terms (LGPL)
 
* [https://github.com/tudarmstadt-lt/GermaNER GermaNER] - German Named Entity Tagger, mixed LGPL/ASL 2.0, free for commercial and academic use
* [https://www.lt.informatik.tu-darmstadt.de/de/software/dependency-collapsing/ Dependency Collapser] - collapser/propagator that produces Stanford Collapsed Dependency-style annotations on top of dependency parser output
  
 
===Proprietary/gratis===
 
* [http://www.ims.uni-stuttgart.de/tcl/RESOURCES/German-Lexicon-en.html Lexical information for German] ("The data is freely available for education, research and other '''non-commercial''' purposes.") (broken link)
 
* [http://www.canoo.net/ Canoo.net] - German Dictionaries and Grammars
 

Latest revision as of 01:10, 26 August 2016

Corpora

Free license

Unknown license

Evaluation datasets

Grammars

Morphological analysis

Free software

  • Morphisto, based on SMOR, is an SFST-based analyser and generator for German. (The morphology is GPLv2, but the lexicon is proprietary/non-commercial: CC-BY-NC-SA 3.0.)
  • German morphology data, based on Morphy, licensed under CC-BY-SA 3.0

Lexicons

Free software

  • DING - German-English Dictionary with approximately 253,000 entries (GPL 2 or later).
  • OpenThesaurus - German synonyms and associated terms (LGPL)
  • GermaNER - German Named Entity Tagger, mixed LGPL/ASL 2.0, free for commercial and academic use
  • Dependency Collapser - collapser/propagator that produces Stanford Collapsed Dependency-style annotations on top of dependency parser output

Proprietary/gratis

Unknown license

Resource Access

Timeline Analysis