Difference between revisions of "Resources for Chinese"
Revision as of 23:19, 19 February 2014
Tools
Free software
- rseg: word segmentation; written in Ruby (no compilation, no hard dependencies apart from Ruby), comes with a model (MIT license)
- ctbparser: word segmentation, POS tagging, NER, and dependency parsing, all using Conditional Random Fields; written in C++ (LGPL license)
- ZPar: word segmentation, POS tagging, and CFG/dependency/CCG parsing of Chinese and English; written in C++ (GPL3 license)
- DuDuPlus: a graph-based dependency parser for English and Chinese ("Other Open Source" license?)
  - Where is the source code?
Corpora
Free license
- HC Corpora: 1,606,811 lines of fair-use excerpts from news, blogs, and Twitter
- UN parallel corpora
Nonfree or Unknown license
- Chinese Computing
- Word-segmented and POS-tagged People's Daily Corpus at the ICL of Peking University
- Frequency list of characters in the Internet corpus
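- Frequency list of lexical items in the Internet corpus (http://corpus.leeds.ac.uk/frqc/internet-zh.num)
- Lancaster Corpus of Mandarin Chinese (http://www.ling.lancs.ac.uk/corplang/lcmc/)
- A collection of Chinese corpora and frequency lists (http://corpus.leeds.ac.uk/query-zh.html): online query with three corpora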
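
The frequency lists above are plain-text downloads. As a minimal sketch, assuming each data line holds a rank, a frequency, and the item separated by whitespace (this layout is an assumption, not something stated on this page, so check the actual file first), such a list could be loaded like this. The function name and the sample data are hypothetical.

```python
def parse_frequency_list(lines):
    """Parse lines of the ASSUMED form '<rank> <frequency> <item>'.

    Lines that do not match this layout (headers, blank lines) are skipped.
    Returns a dict mapping each item to a (rank, frequency) tuple.
    """
    entries = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # blank line or unexpected layout
        rank, freq, item = parts
        try:
            entries[item] = (int(rank), float(freq))
        except ValueError:
            continue  # header or other non-data line
    return entries


# Hypothetical sample data, made up for illustration only.
sample = [
    "rank freq item",
    "1 100.5 的",
    "2 50.25 是",
    "",
]
table = parse_frequency_list(sample)
```

To read a downloaded file instead of the sample, pass an open file object (opened with the file's actual encoding) in place of `sample`.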