NP Chunking (State of the art)

From ACL Wiki
Revision as of 09:22, 27 June 2007 by Pdturney

  • Performance measure: F = 2 * Precision * Recall / (Recall + Precision)
  • Precision: percentage of NPs found by the algorithm that are correct
  • Recall: percentage of NPs defined in the corpus that were found by the chunking program
  • Training data: sections 15-18 of Wall Street Journal corpus (Ramshaw and Marcus)
  • Testing data: section 20 of Wall Street Journal corpus (Ramshaw and Marcus)
  • Data source: the original data of the NP chunking experiments by Lance Ramshaw and Mitch Marcus
  • Data format: one word per line; each line contains six fields, of which only the first three are relevant: the word, the part-of-speech tag assigned by the Brill tagger, and the correct IOB tag
  • dataset is available from
  • more information is available from
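
The data format and the performance measure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the official evaluation script. It assumes that fields are separated by whitespace, that sentences are separated by blank lines, and that a chunk begins either at a B tag or at an I tag that does not continue a chunk (so both bare `I`/`O`/`B` tags and tags such as `B-NP` are accepted). The function names `read_sentences`, `np_chunks` and `f_score` are hypothetical.

```python
def read_sentences(lines):
    """Group token lines into sentences, keeping only the first three fields.

    Assumption: fields are whitespace-separated and a blank line ends a sentence.
    """
    sentences, current = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            if current:
                sentences.append(current)
                current = []
            continue
        word, pos, tag = fields[:3]
        current.append((word, pos, tag))
    if current:
        sentences.append(current)
    return sentences


def np_chunks(tags):
    """Return the set of (start, end) NP spans, end exclusive, for one sentence.

    Assumption: only the first character of each tag (I, O or B) matters.
    """
    spans, start = set(), None
    for i, tag in enumerate(tags):
        kind = tag[0]
        # B and O both close any chunk that is currently open.
        if kind in ("B", "O") and start is not None:
            spans.add((start, i))
            start = None
        # B always opens a chunk; I opens one only if none is open.
        if kind == "B" or (kind == "I" and start is None):
            start = i
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def f_score(gold_sentences, predicted_sentences):
    """Compute (precision, recall, F) over exactly matching NP spans."""
    correct = found = expected = 0
    for gold, pred in zip(gold_sentences, predicted_sentences):
        gold_spans, pred_spans = np_chunks(gold), np_chunks(pred)
        correct += len(gold_spans & pred_spans)
        found += len(pred_spans)
        expected += len(gold_spans)
    precision = correct / found if found else 0.0
    recall = correct / expected if expected else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f
```

For example, `f_score([["I", "I", "O", "I", "B", "I"]], [["I", "I", "O", "I", "I", "I"]])` compares three gold NPs against two predicted NPs, of which one matches exactly. An NP counts as correct here only if both of its boundaries agree with the corpus annotation.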

| System Name | Short Description | Main Publications | Software (if available) | Results | Comments (e.g. extra resources used, train/test times) |
|---|---|---|---|---|---|
| KM00 | B-I-O tagging using SVM classifiers with polynomial kernel | KM00 [1] | YAMCHA Toolkit [2] (but models are not provided) | F: 93.79 | |
| KM01 | Learning as in KM00, but with voting between different representations | KM01 [3] | No | F: 94.22 | |
| --- | Specialized HMM + voting between different representations | Sarkar2005 [4] | No | F: 95.23 | |

  • KM00 - Taku Kudo and Yuji Matsumoto. 2000. Use of Support Vector Learning for Chunk Identification. In Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000.
  • KM01 - Taku Kudo and Yuji Matsumoto. 2001. Chunking with Support Vector Machines. In Proceedings of NAACL-2001.
  • Sarkar2005 - Hong Shen and Anoop Sarkar. 2005. Voting between Multiple Data Representations for Text Chunking. In Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence, Canadian AI 2005.