Difference between revisions of "Resources for Dutch"
Verhoevenben (talk | contribs)
(HamleDT)
Line 3:
  * [http://www.statmt.org/europarl Europarl corpus] - sentence-aligned with English
  * [http://www.clips.uantwerpen.be/datasets/csi-corpus CLiPS Stylometry Investigation (CSI) corpus] - multi-purpose text corpus, main use in stylometry
+ * [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages, common annotation style.
  == Tools ==
Revision as of 08:39, 26 May 2014
Corpora
- Dutch Plain text and Co-occurrences at LCC
- Europarl corpus - sentence-aligned with English
- CLiPS Stylometry Investigation (CSI) corpus - multi-purpose text corpus, main use in stylometry
- HamleDT - harmonized dependency treebanks of many languages in a common annotation style
Tools
- Dutch HPSG-based parser - includes the Alpino treebank (7137 sentences, newspaper text, manually corrected)