Difference between revisions of "Resources for Chinese"

From ACL Wiki
(Nonfree or Unknown license: added link to Lancaster)
 
(10 intermediate revisions by 6 users not shown)
Line 1:
 +
==Tools==
 +
===Free software===
 +
* [https://github.com/yzhang/rseg rseg] word segmentation; written in Ruby (no compilation, no hard dependencies apart from Ruby), comes with a model (MIT license)
 +
* [https://code.google.com/p/ctbparser/ ctbparser] word segmentation, POS tagging, NER, dependency parsing, all using Conditional Random Fields; written in C++ (LGPL license)
 +
* [http://www.cl.cam.ac.uk/~yz360/zpar.html ZPar] word segmentation, POS tagging, CFG/dep/CCG parsing of Chinese and English; written in C++ (GPL3 license)
 +
* [http://code.google.com/p/duduplus/ DuDuPlus: a graph-based dependency parser for English and Chinese] ("Other Open Source" license?)
 +
** where is the source code?
 +
 +
==Corpora==
 +
===Free license===
 +
* [http://corpora.heliohost.org/ HC Corpora] 1606811 lines of [http://en.wikipedia.org/wiki/Fair_use Fair Use] excerpts from news, blogs, and Twitter
 +
* [http://www.euromatrixplus.net/multi-un/ UN parallel corpora]
 +
 +
===Nonfree or Unknown license===
 +
* [http://www.chinesecomputing.com Chinese Computing]
 
* [http://www.icl.pku.edu.cn/icl_groups/corpus/dwldform1.asp Word Segmented and POS tagged People Daily Corpus at ICL of Peking University]
 
* [http://corpus.leeds.ac.uk/frqc/i-zh-char.num Frequency list of characters in the Internet corpus]
+
* [http://corpus.leeds.ac.uk/frqc/i-zh-char.num.html Frequency list of characters in the Internet corpus]
 
* [http://corpus.leeds.ac.uk/frqc/internet-zh.num Frequency list of lexical items in the Internet corpus]
 
 
* [http://www.ling.lancs.ac.uk/corplang/lcmc/ Lancaster Corpus of Mandarin Chinese]
 
 +
* [http://corpus.leeds.ac.uk/query-zh.html A collection of Chinese corpora and frequency lists] (online query across three corpora)
 +
 +
[[Category:Resources by language|Chinese]]

Latest revision as of 00:19, 20 February 2014

Tools

Free software

  • rseg word segmentation; written in Ruby (no compilation, no hard dependencies apart from Ruby), comes with a model (MIT license)
  • ctbparser word segmentation, POS tagging, NER, dependency parsing, all using Conditional Random Fields; written in C++ (LGPL license)
  • ZPar word segmentation, POS tagging, CFG/dep/CCG parsing of Chinese and English; written in C++ (GPL3 license)
  • DuDuPlus: a graph-based dependency parser for English and Chinese ("Other Open Source" license?)
    • where is the source code?

Corpora

Free license

Nonfree or Unknown license