Difference between revisions of "Resources for Chinese"
(+MultiUN corpora)
Line 7:
  ** where is the source code?
− ==
+ ==Corpora==
− ===Free
+ ===Free license===
  * [http://corpora.heliohost.org/ HC Corpora] 1606811 lines of [http://en.wikipedia.org/wiki/Fair_use Fair Use] excerpts from news, blogs, twitter
+ * [http://www.euromatrixplus.net/multi-un/ UN parallel corpora]
− ===Unknown license===
+ ===Nonfree or Unknown license===
  * [http://www.chinesecomputing.com Chinese Computing]
  * [http://www.icl.pku.edu.cn/icl_groups/corpus/dwldform1.asp Word Segmented and POS tagged People Daily Corpus at ICL of Peking University]
Revision as of 14:46, 10 December 2013
Tools
Free software
- rseg: word segmentation; written in Ruby (no compilation, no hard dependencies apart from Ruby), comes with a model (MIT license)
- ctbparser: word segmentation, POS tagging, NER, and dependency parsing, all using Conditional Random Fields; written in C++ (LGPL license)
- ZPar: word segmentation, POS tagging, and CFG/dependency/CCG parsing of Chinese and English; written in C++ (GPL3 license)
- DuDuPlus: a graph-based dependency parser for English and Chinese ("Other Open Source" license?)
  - where is the source code?
Corpora
Free license
- HC Corpora: 1,606,811 lines of Fair Use excerpts from news, blogs, and Twitter
- UN parallel corpora