Resources for Chinese
Revision as of 00:19, 20 February 2014 by Bond (→Nonfree or Unknown license: added link to Lancaster)
Tools
Free software
- rseg word segmentation; written in Ruby (no compilation, no hard dependencies apart from Ruby), comes with a model (MIT license)
- ctbparser word segmentation, POS tagging, NER, dependency parsing, all using Conditional Random Fields; written in C++ (LGPL license)
- ZPar word segmentation, POS tagging, CFG/dep/CCG parsing of Chinese and English; written in C++ (GPL3 license)
- DuDuPlus: a graph-based dependency parser for English and Chinese ("Other Open Source" license?)
- where is the source code?
Corpora
Free license
- HC Corpora 1,606,811 lines of fair-use excerpts from news, blogs, and Twitter
- UN parallel corpora
Nonfree or Unknown license
- Chinese Computing
- Word-segmented and POS-tagged People's Daily Corpus at the ICL of Peking University
- Frequency list of characters in the Internet corpus
- Frequency list of lexical items in the Internet corpus
- Lancaster Corpus of Mandarin Chinese
- A collection of Chinese corpora and frequency lists; online query with three corpora