NP Chunking (State of the art)

From ACL Wiki
Revision as of 10:28, 10 January 2009

  • Performance measure: F = 2 * Precision * Recall / (Recall + Precision)
  • Precision: percentage of NPs found by the algorithm that are correct
  • Recall: percentage of NPs defined in the corpus that were found by the chunking program
  • Training data: sections 15-18 of the Wall Street Journal corpus (Ramshaw and Marcus)
  • Testing data: section 20 of the Wall Street Journal corpus (Ramshaw and Marcus)
  • Original data of the NP chunking experiments by Lance Ramshaw and Mitch Marcus
  • The data contains one word per line. Each line has six fields, of which only the first three are relevant: the word, the part-of-speech tag assigned by the Brill tagger, and the correct IOB tag
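
The evaluation described above can be sketched in a few lines of Python. This is a minimal illustration, not the official scoring script. It assumes the IOB tags are written as "I", "O" and "B" (optionally with a suffix such as "I-NP"), that an NP is counted as correct only when both of its boundaries match the corpus annotation, and that the function names read_tags, np_chunks and evaluate are hypothetical helpers introduced here.

```python
from typing import Iterable, List, Set, Tuple

Chunk = Tuple[int, int]  # (start, end) token indices, end exclusive


def read_tags(lines: Iterable[str], column: int = 2) -> List[str]:
    """Return the tag in the given zero-based column of every non-empty line.

    Column 2 is the third field, which the page describes as the correct IOB tag.
    """
    return [line.split()[column] for line in lines if line.strip()]


def np_chunks(tags: List[str]) -> Set[Chunk]:
    """Extract NP spans from a tag sequence.

    Assumption: only the first character of each tag matters. "B" always
    opens a new NP, "I" opens one only when no NP is open, "O" closes any
    open NP.
    """
    chunks: Set[Chunk] = set()
    start = None
    for i, tag in enumerate(tags):
        kind = tag[0]
        if kind in ("B", "O") and start is not None:
            chunks.add((start, i))  # close the NP that was open
            start = None
        if kind == "B" or (kind == "I" and start is None):
            start = i  # open a new NP at this token
    if start is not None:
        chunks.add((start, len(tags)))  # NP running to the end of the data
    return chunks


def evaluate(gold_tags: List[str], pred_tags: List[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F) over exactly matching NP spans."""
    gold, pred = np_chunks(gold_tags), np_chunks(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (recall + precision)
    return precision, recall, f


if __name__ == "__main__":
    # Toy tag sequences (not taken from the corpus).
    gold = ["I", "I", "O", "I", "B", "I", "O"]  # three NPs
    pred = ["I", "I", "O", "I", "I", "I", "O"]  # two NPs, one of them correct
    p, r, f = evaluate(gold, pred)
    print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

On the toy sequences the predicted tagging finds two NPs, one of which matches one of the three gold NPs, so precision is 1/2, recall is 1/3 and F is 0.4. Missing a single "B" tag merges two adjacent NPs and costs both of them, which is why chunk-level F is stricter than per-token tag accuracy.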


Table of results

System name | Short description | Main publications | Software | Results (F)
KM00 | B-I-O tagging using SVM classifiers with polynomial kernel | Kudo and Matsumoto (2000), CoNLL | YAMCHA Toolkit, http://chasen.org/~taku/software/yamcha/ (but models are not provided) | 93.79%
KM01 | Learning as in KM00, but voting between different representations | Kudo and Matsumoto (2001), NAACL | No | 94.22%
SP03 | Second order conditional random fields | Fei Sha and Fernando Pereira (2003), HLT/NAACL | No | 94.3%
SS05 | Specialized HMM + voting between different representations | Shen and Sarkar (2005) | No | 95.23%
M05 | Second order conditional random fields + multi-label classification | Ryan McDonald, Koby Crammer and Fernando Pereira (2005), HLT/EMNLP | No | 94.29%
S08 | Second order latent-dynamic conditional random fields + an improved inference method based on A* search | Xu Sun, Louis-Philippe Morency, Daisuke Okanohara and Jun'ichi Tsujii (2008), COLING | HCRF Library | 94.34%

References

Kudo, T., and Matsumoto, Y. (2000). Use of support vector learning for chunk identification. Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000, pages 142-144, Lisbon, Portugal.

Kudo, T., and Matsumoto, Y. (2001). Chunking with support vector machines. Proceedings of NAACL-2001.

Sha, F., and Pereira, F. (2003). Shallow Parsing with Conditional Random Fields. Proceedings of HLT-NAACL 2003, pages 213-220, Edmonton, Canada.

Shen, H., and Sarkar, A. (2005). Voting between multiple data representations for text chunking. Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence, Canadian AI 2005.

McDonald, R., Crammer, K., and Pereira, F. (2005). Flexible Text Segmentation with Structured Multilabel Classification. Human Language Technologies and Empirical Methods in Natural Language Processing (HLT-EMNLP), 2005.

Sun, X., Morency, L.P., Okanohara, D., and Tsujii, J. (2008). Modeling Latent-Dynamic in Shallow Parsing: A Latent Conditional Model with Improved Inference. Proceedings of the 22nd International Conference on Computational Linguistics (COLING 2008), pages 841-848, Manchester, UK.
