Difference between revisions of "NP Chunking (State of the art)"
Revision as of 09:58, 27 June 2007
- Performance measure: F = 2 * Precision * Recall / (Recall + Precision)
- Precision: percentage of NPs found by the algorithm that are correct
- Recall: percentage of NPs defined in the corpus that were found by the chunking program
- Training data: sections 15-18 of the Wall Street Journal corpus (Ramshaw and Marcus)
- Testing data: section 20 of the Wall Street Journal corpus (Ramshaw and Marcus)
- Original data of the NP chunking experiments by Lance Ramshaw and Mitch Marcus
- The data contains one word per line; each line has six fields, of which only the first three are relevant: the word, the part-of-speech tag assigned by the Brill tagger, and the correct IOB tag
- The dataset is available from ftp://ftp.cis.upenn.edu/pub/chunker/
- More information is available from the NP Chunking page (http://ifarm.nl/erikt/research/np-chunking.html)
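
The data format and the performance measure described above can be illustrated with a minimal Python sketch. It is not the official evaluation script. The function names (`read_sentences`, `np_spans`, `evaluate`) are hypothetical, and it assumes that fields are separated by whitespace, that sentences are separated by blank lines, and that IOB tags begin with `I`, `O`, or `B`, where `B` always opens a new chunk and `I` opens one only when no chunk is currently open.

```python
def read_sentences(lines):
    """Group input lines into sentences of (word, pos, iob) triples.

    Only the first three whitespace-separated fields of each line are kept.
    Assumption: a blank line marks a sentence boundary.
    """
    sentences, current = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            if current:
                sentences.append(current)
                current = []
            continue
        word, pos, iob = fields[:3]
        current.append((word, pos, iob))
    if current:
        sentences.append(current)
    return sentences


def np_spans(tags):
    """Return the set of NP chunks as (start, end) token spans, end exclusive."""
    spans, start = set(), None
    for i, tag in enumerate(tags):
        kind = tag[0]
        if kind == "B" or (kind == "I" and start is None):
            # A new chunk opens here; close the previous one if it is still open.
            if start is not None:
                spans.add((start, i))
            start = i
        elif kind == "O":
            if start is not None:
                spans.add((start, i))
            start = None
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def evaluate(gold_tag_seqs, predicted_tag_seqs):
    """Compute precision, recall, and F (all in percent) over whole NP chunks."""
    correct = found = defined = 0
    for gold, pred in zip(gold_tag_seqs, predicted_tag_seqs):
        gold_spans, pred_spans = np_spans(gold), np_spans(pred)
        correct += len(gold_spans & pred_spans)  # exact span matches only
        found += len(pred_spans)
        defined += len(gold_spans)
    precision = 100.0 * correct / found if found else 0.0
    recall = 100.0 * correct / defined if defined else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f
```

To use it, read the gold file with `read_sentences`, take the third element of each triple as the gold tag sequence, and pass the gold and predicted tag sequences to `evaluate`. A predicted chunk counts as correct only if both of its boundaries match a chunk in the corpus.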
System name | Short description | Main publications | Software | Results (F) |
---|---|---|---|---|
KM00 | B-I-O tagging using SVM classifiers with polynomial kernel | Kudo and Matsumoto (2000) | YAMCHA Toolkit (but models are not provided) | 93.79 |
KM01 | learning as in KM00, but voting between different representations | Kudo and Matsumoto (2001) | No | 94.22 |
SS05 | specialized HMM + voting between different representations | Shen and Sarkar (2005) | No | 95.23 |
Kudo, T., and Matsumoto, Y. (2000). Use of support vector learning for chunk identification. Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000, pages 142-144, Lisbon, Portugal. http://acl.ldc.upenn.edu/W/W00/W00-0730.pdf
Kudo, T., and Matsumoto, Y. (2001). Chunking with support vector machines. Proceedings of NAACL-2001. http://acl.ldc.upenn.edu/N/N01/N01-1025.pdf
Shen, H., and Sarkar, A. (2005). Voting between multiple data representations for text chunking. Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence, Canadian AI 2005. http://www.cs.sfu.ca/~anoop/papers/pdf/ai05.pdf