NP Chunking (State of the art)
- Performance measure: F = 2 * Precision * Recall / (Recall + Precision)
- Precision: percentage of NPs found by the algorithm that are correct
- Recall: percentage of NPs defined in the corpus that were found by the chunking program
- Training data: sections 15-18 of Wall Street Journal corpus (Ramshaw and Marcus)
- Testing data: section 20 of Wall Street Journal corpus (Ramshaw and Marcus)
- Original data of the NP chunking experiments by Lance Ramshaw and Mitch Marcus
- The data contains one word per line. Each line has six fields, of which only the first three are relevant: the word, the part-of-speech tag assigned by the Brill tagger, and the correct IOB tag
- The dataset is available from ftp://ftp.cis.upenn.edu/pub/chunker/
- More information is available from the NP Chunking page
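The data format and the precision/recall/F definitions above can be illustrated with a small Python sketch. It assumes whitespace-separated fields with a blank line between sentences, and IOB tags whose first character is `I`, `O` or `B` (so `I-NP` and `I` are treated alike). An NP counts as correct only if both its start and its end match the corpus. The helper names `read_columns`, `np_chunks` and `prf` are hypothetical; this is not the official scoring program used for the results below.

```python
from typing import Iterable, List, Optional, Set, Tuple

# A chunk is identified by (sentence index, first token, end token exclusive).
Chunk = Tuple[int, int, int]


def read_columns(lines: Iterable[str]) -> List[List[Tuple[str, str, str]]]:
    """Group lines into sentences of (word, POS tag, IOB tag).

    Assumes a blank line separates sentences and every non-blank line has at
    least three whitespace-separated fields; any further fields are ignored.
    """
    sentences: List[List[Tuple[str, str, str]]] = []
    current: List[Tuple[str, str, str]] = []
    for line in lines:
        fields = line.split()
        if not fields:  # sentence boundary
            if current:
                sentences.append(current)
                current = []
            continue
        word, pos, iob = fields[:3]
        current.append((word, pos, iob))
    if current:
        sentences.append(current)
    return sentences


def np_chunks(tags: List[str], sent_id: int = 0) -> Set[Chunk]:
    """Extract NP spans from one sentence of IOB tags.

    A chunk starts at a B tag, or at an I tag that does not continue an open
    chunk. This covers both the variant where B only separates adjacent NPs
    and the variant where B starts every NP.
    """
    chunks: Set[Chunk] = set()
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        kind = tag[:1]
        if kind == "B" or (kind == "I" and start is None):
            if start is not None:  # a B tag closes the preceding NP
                chunks.add((sent_id, start, i))
            start = i
        elif kind != "I":  # O (or an unexpected tag) closes an open NP
            if start is not None:
                chunks.add((sent_id, start, i))
            start = None
        # kind == "I" with an open chunk: the NP simply continues
    if start is not None:
        chunks.add((sent_id, start, len(tags)))
    return chunks


def prf(gold: Set[Chunk], predicted: Set[Chunk]) -> Tuple[float, float, float]:
    """Precision, recall and F as fractions (multiply by 100 for percentages)."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f


if __name__ == "__main__":
    # Toy example with hypothetical tags, not taken from the WSJ data.
    gold = np_chunks(["I", "O", "I", "I", "I", "I"])
    predicted = np_chunks(["I", "O", "I", "I", "B", "I"])
    p, r, f = prf(gold, predicted)
    print(f"P={100 * p:.2f}% R={100 * r:.2f}% F={100 * f:.2f}%")
    # prints: P=33.33% R=50.00% F=40.00%
```

For a whole file, the chunk sets of all sentences would be collected with distinct `sent_id` values (for example the index returned by `enumerate(read_columns(lines))`) before calling `prf` once on the combined sets.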
| System name | Short description | Main publications | Software | Results (F) |
|---|---|---|---|---|
| KM00 | B-I-O tagging using SVM classifiers with polynomial kernel | Kudo and Matsumoto (2000) | YAMCHA Toolkit (but models are not provided) | 93.79% |
| KM01 | Learning as in KM00, but with voting between different representations | Kudo and Matsumoto (2001) | No | 94.22% |
| SS05 | Specialized HMM with voting between different representations | Shen and Sarkar (2005) | No | 95.23% |
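KM01 and SS05 both combine the outputs of taggers trained on different chunk representations. The sketch below shows one simple form of this idea, assuming the four representations IOB1, IOB2, IOE1 and IOE2 as commonly defined in the chunking literature, and plain token-level majority voting after every output has been converted to IOB2. It is not necessarily the exact combination scheme of either paper, and the helper names `spans_to_tags`, `tags_to_spans` and `vote` are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # [start, end) token offsets of one NP


def spans_to_tags(spans: List[Span], n: int, scheme: str) -> List[str]:
    """Encode NP spans of an n-token sentence in one representation.

    IOB1: B only on an NP that directly follows another NP.
    IOB2: B on the first token of every NP.
    IOE1: E only on an NP that is directly followed by another NP.
    IOE2: E on the last token of every NP.
    """
    if scheme not in {"IOB1", "IOB2", "IOE1", "IOE2"}:
        raise ValueError(f"unknown scheme: {scheme}")
    tags = ["O"] * n
    spans = sorted(spans)
    for k, (s, e) in enumerate(spans):
        for i in range(s, e):
            tags[i] = "I"
        prev_adjacent = k > 0 and spans[k - 1][1] == s
        next_adjacent = k + 1 < len(spans) and spans[k + 1][0] == e
        if scheme == "IOB2" or (scheme == "IOB1" and prev_adjacent):
            tags[s] = "B"
        if scheme == "IOE2" or (scheme == "IOE1" and next_adjacent):
            tags[e - 1] = "E"
    return tags


def tags_to_spans(tags: List[str]) -> List[Span]:
    """Decode tags in any of the four representations back into spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, t in enumerate(tags):
        if t == "O":
            if start is not None:
                spans.append((start, i))
            start = None
            continue
        if t == "B" and start is not None:  # B closes the preceding NP
            spans.append((start, i))
            start = None
        if start is None:  # B, or an I/E that does not continue an NP
            start = i
        if t == "E":  # E closes the current NP
            spans.append((start, i + 1))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def vote(outputs: List[List[str]]) -> List[Span]:
    """Majority vote per token after mapping every output to IOB2.

    Ties are broken in favour of the tag seen first, i.e. the earliest system
    in `outputs` (Counter.most_common keeps insertion order for equal counts).
    """
    n = len(outputs[0])
    iob2 = [spans_to_tags(tags_to_spans(t), n, "IOB2") for t in outputs]
    voted = [Counter(column).most_common(1)[0][0] for column in zip(*iob2)]
    return tags_to_spans(voted)
```

For instance, if two systems (one tagging in IOB1, one in IOE2) find the NPs `[(0, 1), (2, 4), (4, 6)]` and a third system wrongly merges the last two into `(2, 6)`, the vote at token 4 is two `B` against one `I`, so the adjacent NPs stay separate.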
Kudo, T., and Matsumoto, Y. (2000). Use of support vector learning for chunk identification. Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000, pages 142-144, Lisbon, Portugal.
Kudo, T., and Matsumoto, Y. (2001). Chunking with support vector machines. Proceedings of NAACL-2001.
Shen, H., and Sarkar, A. (2005). Voting between multiple data representations for text chunking. Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence, Canadian AI 2005.