NP Chunking (State of the art)
Revision as of 09:22, 27 June 2007
- Performance measure: F = 2 * Precision * Recall / (Recall + Precision)
- Precision: percentage of NPs found by the algorithm that are correct
- Recall: percentage of NPs defined in the corpus that were found by the chunking program
- Training data: sections 15-18 of the Wall Street Journal corpus (Ramshaw and Marcus)
- Testing data: section 20 of the Wall Street Journal corpus (Ramshaw and Marcus)
- This is the original data of the NP chunking experiments by Lance Ramshaw and Mitch Marcus.
- The data contains one word per line. Each line has six fields, of which only the first three are relevant: the word, the part-of-speech tag assigned by the Brill tagger, and the correct IOB tag.
- The dataset is available from ftp://ftp.cis.upenn.edu/pub/chunker/
- More information is available from http://ifarm.nl/erikt/research/np-chunking.html
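The evaluation described above can be sketched in a few lines of Python. This is only an illustration, not the official scoring script. The function names (`read_tags`, `np_chunks`, `evaluate`) are hypothetical, and the sketch assumes that fields are separated by whitespace, that lines with fewer than three fields mark sentence boundaries, and that a chunk starts either at a `B` tag or at an `I` tag that does not continue an open chunk. An NP counts as correct only if both of its boundaries match the corpus annotation.

```python
from typing import Iterable, List, Set, Tuple

Chunk = Tuple[int, int]  # (start, end) token offsets, end exclusive


def read_tags(lines: Iterable[str]) -> List[str]:
    """Return the IOB tag (third field) of each line.

    Assumption: lines with fewer than three fields are sentence
    boundaries; they are mapped to "O" so no chunk spans two sentences.
    """
    tags = []
    for line in lines:
        fields = line.split()
        tags.append(fields[2] if len(fields) >= 3 else "O")
    return tags


def np_chunks(tags: List[str]) -> Set[Chunk]:
    """Collect NP spans from a tag sequence.

    Only the first character of each tag is used, so both "I" and
    "I-NP" style tags are accepted (an assumption of this sketch).
    """
    chunks: Set[Chunk] = set()
    start = None  # start offset of the currently open chunk, if any
    for i, tag in enumerate(tags):
        kind = tag[:1]
        if kind == "B" or (kind == "I" and start is None):
            if start is not None:  # "B" closes the previous chunk
                chunks.add((start, i))
            start = i
        elif kind == "O":
            if start is not None:
                chunks.add((start, i))
            start = None
    if start is not None:
        chunks.add((start, len(tags)))
    return chunks


def evaluate(gold: List[str], predicted: List[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F) as fractions between 0 and 1."""
    gold_chunks = np_chunks(gold)
    pred_chunks = np_chunks(predicted)
    correct = len(gold_chunks & pred_chunks)
    precision = correct / len(pred_chunks) if pred_chunks else 0.0
    recall = correct / len(gold_chunks) if gold_chunks else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f
```

Calling `evaluate(read_tags(gold_lines), read_tags(system_lines))` on two files with identical tokenization gives the three scores as fractions; multiply by 100 to compare with the percentages reported in the table.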
System Name | Short Description | Main Publications | Software (if available) | Results | Comments (e.g. extra resources used, train/test times)
---|---|---|---|---|---
KM00 | B-I-O tagging using SVM classifiers with a polynomial kernel | KM00 [1] | YAMCHA Toolkit [2] (but models are not provided) | F: 93.79 |
KM01 | Learning as in KM00, but with voting between different representations | KM01 [3] | No | F: 94.22 |
--- | Specialized HMM with voting between different representations | Sarkar2005 [4] | No | F: 95.23 |
- KM00 - Taku Kudo and Yuji Matsumoto. 2000. Use of Support Vector Learning for Chunk Identification. In Proceedings of CoNLL-2000 and LLL-2000.
- KM01 - Taku Kudo and Yuji Matsumoto. 2001. Chunking with Support Vector Machines. In Proceedings of NAACL-2001.
- Sarkar2005 - Hong Shen and Anoop Sarkar. 2005. Voting between Multiple Data Representations for Text Chunking. In Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence, Canadian AI 2005.