NP Chunking (State of the art)

From ACL Wiki

Revision as of 09:21, 2 July 2007

  • Performance measure: F = 2 * Precision * Recall / (Recall + Precision)
  • Precision: percentage of NPs found by the algorithm that are correct
  • Recall: percentage of NPs defined in the corpus that were found by the chunking program
  • Training data: sections 15-18 of the Wall Street Journal corpus (Ramshaw and Marcus)
  • Testing data: section 20 of the Wall Street Journal corpus (Ramshaw and Marcus)
  • Source: the original data of the NP chunking experiments by Lance Ramshaw and Mitch Marcus
  • Data format: one word per line; each line contains six fields, of which only the first three are relevant: the word, the part-of-speech tag assigned by the Brill tagger, and the correct IOB tag
  • The dataset is available from ftp://ftp.cis.upenn.edu/pub/chunker/
  • More information is available from NP Chunking (http://ifarm.nl/erikt/research/np-chunking.html)
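
The performance measure and data format listed above can be illustrated with a minimal Python sketch. It is not the official evaluation script. It assumes that fields are separated by whitespace, that a blank line marks a sentence boundary, and that a chunk starts at a B tag or at an I tag that does not continue a chunk. The function names and the sample lines (including the "-" placeholders for the three unused fields) are made up for illustration.

```python
from typing import Iterable, List, Set, Tuple

Chunk = Tuple[int, int]  # (start index, end index exclusive)


def read_chunk_tags(lines: Iterable[str]) -> List[str]:
    """Return the third field (the IOB tag) of every token line.

    Assumption: fields are whitespace-separated and a blank line is a
    sentence boundary, recorded here as "O" so chunks never span sentences.
    """
    tags = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            tags.append(fields[2])
        else:
            tags.append("O")
    return tags


def np_chunks(tags: List[str]) -> Set[Chunk]:
    """Collect NP chunks as (start, end) spans from a sequence of IOB tags."""
    chunks = set()
    start = None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a final chunk
        kind = tag[0]
        if start is not None and kind in ("O", "B"):
            chunks.add((start, i))
            start = None
        if kind == "B" or (kind == "I" and start is None):
            start = i
    return chunks


def f_measure(gold: Set[Chunk], predicted: Set[Chunk]) -> Tuple[float, float, float]:
    """Precision, recall, and F = 2 * P * R / (R + P) over exact chunk matches."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f


# Hypothetical sample in the six-field layout; only the first three fields are used.
sample = ["The DT I - - -", "cat NN I - - -", "saw VBD O - - -", "him PRP I - - -"]
gold_chunks = np_chunks(read_chunk_tags(sample))
predicted_chunks = np_chunks(["I", "I", "O", "O"])
precision, recall, f = f_measure(gold_chunks, predicted_chunks)
```

A chunk counts as correct only if both of its boundaries match the corpus annotation, which is why the comparison is done on whole spans rather than on individual tags.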


Table of results

| System name | Short description | Main publications | Software | Results (F) |
|---|---|---|---|---|
| KM00 | B-I-O tagging using SVM classifiers with polynomial kernel | Kudo and Matsumoto (2000) | YAMCHA Toolkit (but models are not provided) | 93.79% |
| KM01 | learning as in KM00, but voting between different representations | Kudo and Matsumoto (2001) | No | 94.22% |
| SS05 | specialized HMM + voting between different representations | Shen and Sarkar (2005) | No | 95.23% |
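
The table describes KM01 and SS05 only as "voting between different representations". As a rough illustration of the general idea, and not of either system's actual method, the sketch below takes an unweighted per-token majority vote. It assumes that the outputs of the individual systems have already been converted to one common tag set and aligned token by token. The published systems may weight the votes or combine them differently. The function name is hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Pick, for each token, the tag proposed by the most systems.

    Assumption: every sequence uses the same tag set and has the same length.
    Ties go to the tag proposed by the earliest system in the list.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all tag sequences must have the same length")
    voted = []
    for token_tags in zip(*predictions):
        # most_common keeps first-seen order among equal counts
        voted.append(Counter(token_tags).most_common(1)[0][0])
    return voted


# Three hypothetical system outputs for the same three tokens.
combined = majority_vote([
    ["B", "I", "O"],
    ["B", "O", "O"],
    ["I", "I", "O"],
])
```

Voting on each token independently can produce a tag sequence that none of the systems proposed, so a real combination scheme may need an extra consistency check.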


References

Kudo, T., and Matsumoto, Y. (2000). Use of support vector learning for chunk identification. Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000, pages 142-144, Lisbon, Portugal. http://acl.ldc.upenn.edu/W/W00/W00-0730.pdf

Kudo, T., and Matsumoto, Y. (2001). Chunking with support vector machines. Proceedings of NAACL-2001.

Shen, H., and Sarkar, A. (2005). Voting between multiple data representations for text chunking. Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence, Canadian AI 2005. http://www.cs.sfu.ca/~anoop/papers/pdf/ai05.pdf