Difference between revisions of "NP Chunking (State of the art)"

Revision as of 10:22, 27 June 2007

Performance measure: F rate, the harmonic mean of precision and recall: F = 2 * Precision * Recall / (Recall + Precision)
Precision: percentage of NPs found by the algorithm that are correct
Recall: percentage of NPs defined in the corpus that were found by the chunking program
Training data: sections 15-18 of the Wall Street Journal corpus (Ramshaw and Marcus)
Testing data: section 20 of the Wall Street Journal corpus (Ramshaw and Marcus)
Data: the original data of the NP chunking experiments by Lance Ramshaw and Mitch Marcus
Format: one word per line; each line contains six fields, of which only the first three are relevant: the word, the part-of-speech tag assigned by the Brill tagger, and the correct IOB tag
Download: ftp://ftp.cis.upenn.edu/pub/chunker/
More information: http://ifarm.nl/erikt/research/np-chunking.html
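The measures and data format above can be made concrete with a short Python sketch. It is not the official scoring script behind the published results. It assumes whitespace-separated fields with a blank line between sentences, keeps only the word, POS tag and IOB tag, decodes IOB tags into NP spans, and counts a predicted NP as correct only when both of its boundaries match a gold NP. The decoder accepts both IOB1 tags (B only between adjacent chunks) and IOB2 tags (B at every chunk start). All function names and the toy sentence are hypothetical; scores are returned as fractions, so multiply by 100 to compare with the percentages in the table.

```python
from typing import Iterable, List, Set, Tuple

# (sentence index, first token, end token exclusive)
Span = Tuple[int, int, int]


def read_sentences(lines: Iterable[str]) -> List[List[Tuple[str, str, str]]]:
    """Group lines into sentences of (word, POS tag, IOB tag) triples.

    Assumption: fields are whitespace-separated and a blank line ends a
    sentence. Only the first three fields are kept."""
    sentences: List[List[Tuple[str, str, str]]] = []
    current: List[Tuple[str, str, str]] = []
    for line in lines:
        fields = line.split()
        if not fields:  # blank line: sentence boundary
            if current:
                sentences.append(current)
                current = []
            continue
        word, pos, iob = fields[:3]
        current.append((word, pos, iob))
    if current:
        sentences.append(current)
    return sentences


def chunks(tags: List[str], sent: int = 0) -> Set[Span]:
    """Convert one sentence's IOB tags into NP spans.

    A chunk starts at a B tag, or at an I tag that follows O or begins the
    sentence, so IOB1 and IOB2 sequences both decode correctly.
    Suffixes such as '-NP' are ignored."""
    spans: Set[Span] = set()
    start = None
    prev = "O"
    for i, tag in enumerate(tags):
        t = tag[0]  # 'B', 'I' or 'O'
        if t == "B" or (t == "I" and prev == "O"):
            if start is not None:  # close a chunk directly preceding this one
                spans.add((sent, start, i))
            start = i
        elif t == "O" and start is not None:
            spans.add((sent, start, i))
            start = None
        prev = t
    if start is not None:  # chunk running to the end of the sentence
        spans.add((sent, start, len(tags)))
    return spans


def precision_recall_f(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Chunk-level scores: a predicted NP counts only if both boundaries match."""
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f = 2 * precision * recall / (recall + precision) if correct else 0.0
    return precision, recall, f


# Toy example (invented sentence, not taken from the WSJ data).
sample = """The DT I
cat NN I
sat VBD O
on IN O
the DT I
mat NN I
. . O""".splitlines()

sentences = read_sentences(sample)
gold = set().union(*(chunks([t for _, _, t in s], k) for k, s in enumerate(sentences)))
pred = chunks(["I", "I", "O", "O", "O", "I", "O"], 0)  # hypothetical system output
print(precision_recall_f(gold, pred))  # (0.5, 0.5, 0.5)
```

In the toy run the system finds "The cat" exactly but returns only "mat" for "the mat", so one of two predicted NPs is correct and one of two gold NPs is found, giving precision, recall and F of 0.5 each.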

System Name	Short Description	Main Publications	Software (if available)	Results (F)
KM00	B-I-O tagging using SVM classifiers with a polynomial kernel	KM00	YamCha toolkit (models not provided)	93.79
KM01	Same learning approach as KM00, with voting between different representations	KM01	No	94.22
---	Specialized HMM with voting between different representations	Sarkar2005	No	95.23
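The "B-I-O tagging" and "voting between different representations" entries in the table can be illustrated with a simplified Python sketch. It encodes NP spans in two tag representations used in this line of work, IOB2 (B marks every chunk start) and IOE2 (E marks every chunk end), decodes tagger output back to spans, and combines systems with a plain chunk-level majority vote. This is an illustration under stated assumptions, not a reproduction of KM01 or Sarkar2005: the function names are hypothetical, the tag sequences are invented, and the weighted voting schemes of those papers are not implemented.

```python
from collections import Counter
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (first token, end token exclusive) within one sentence


def to_iob2(spans: Set[Span], n: int) -> List[str]:
    """IOB2: every chunk starts with B and continues with I; O is outside."""
    tags = ["O"] * n
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def to_ioe2(spans: Set[Span], n: int) -> List[str]:
    """IOE2: every chunk ends with E and earlier tokens are I; O is outside."""
    tags = ["O"] * n
    for start, end in spans:
        for i in range(start, end - 1):
            tags[i] = "I"
        tags[end - 1] = "E"
    return tags


def from_iob2(tags: List[str]) -> Set[Span]:
    """Decode IOB2 tags back into spans."""
    spans, start = set(), None
    for i, t in enumerate(tags + ["O"]):  # sentinel O closes a final chunk
        if t in ("B", "O") and start is not None:
            spans.add((start, i))
            start = None
        if t == "B":
            start = i
        elif t == "I" and start is None:  # tolerate I without a preceding B
            start = i
    return spans


def from_ioe2(tags: List[str]) -> Set[Span]:
    """Decode IOE2 tags back into spans."""
    spans, start = set(), None
    for i, t in enumerate(tags):
        if t in ("I", "E") and start is None:
            start = i
        if t == "E":
            spans.add((start, i + 1))
            start = None
        elif t == "O" and start is not None:  # tolerate I without a closing E
            spans.add((start, i))
            start = None
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def majority_vote(candidates: List[Set[Span]]) -> Set[Span]:
    """Keep spans proposed by more than half of the systems.

    Each system's spans do not overlap, so two overlapping spans can never
    both reach a strict majority; the result is a valid chunking."""
    votes = Counter(span for spans in candidates for span in spans)
    return {span for span, v in votes.items() if 2 * v > len(candidates)}


gold = {(0, 2), (4, 6)}
print(to_iob2(gold, 7))  # ['B', 'I', 'O', 'O', 'B', 'I', 'O']
print(to_ioe2(gold, 7))  # ['I', 'E', 'O', 'O', 'I', 'E', 'O']

# Hypothetical outputs of three taggers, each decoded from its own representation.
outputs = [
    from_iob2(["B", "I", "O", "O", "B", "I", "O"]),
    from_ioe2(["I", "E", "O", "O", "O", "E", "O"]),
    from_ioe2(["I", "E", "O", "O", "I", "E", "O"]),
]
print(sorted(majority_vote(outputs)))  # [(0, 2), (4, 6)]
```

Voting on decoded spans rather than on raw tags sidesteps the fact that different representations use different tag sets; here the second tagger's shorter NP (5, 6) is outvoted by the other two.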

KM00 - Taku Kudo and Yuji Matsumoto. 2000. Use of Support Vector Learning for Chunk Identification. In Proceedings of CoNLL-2000 and LLL-2000.
KM01 - Taku Kudo and Yuji Matsumoto. 2001. Chunking with Support Vector Machines. In Proceedings of NAACL-2001.
Sarkar2005 - Hong Shen and Anoop Sarkar. 2005. Voting Between Multiple Data Representations for Text Chunking. In Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence (Canadian AI 2005).

