Difference between revisions of "NP Chunking (State of the art)"

Revision as of 05:16, 25 June 2012

Performance measure: F = 2 * Precision * Recall / (Recall + Precision)
Precision: percentage of NPs found by the algorithm that are correct
Recall: percentage of NPs defined in the corpus that were found by the chunking program
Training data: sections 15-18 of Wall Street Journal corpus (Ramshaw and Marcus)
Testing data: section 20 of Wall Street Journal corpus (Ramshaw and Marcus)
Data source: the original data of the NP chunking experiments by Lance Ramshaw and Mitch Marcus
Data format: one word per line, with six fields per line, of which only the first three are relevant: the word, the part-of-speech tag assigned by the Brill tagger, and the correct IOB tag
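
The measure and data format described above can be sketched in a few lines of Python. This is a minimal illustration, not the official evaluation script. It assumes whitespace-separated fields, blank lines between sentences, and bare tags `I`, `O` and `B`, where a chunk starts at a `B` or at an `I` that follows an `O` (this reading covers both the case where `B` marks every chunk start and the case where it only separates adjacent chunks). The function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Token = Tuple[str, str, str]   # (word, POS tag, IOB tag)
Chunk = Tuple[int, int]        # (start, end) token offsets, end exclusive


def read_sentences(lines: Iterable[str]) -> List[List[Token]]:
    """Keep the first three fields of each line; blank lines end a sentence (assumed)."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in lines:
        fields = line.split()
        if not fields:
            if current:
                sentences.append(current)
                current = []
            continue
        current.append((fields[0], fields[1], fields[2]))
    if current:
        sentences.append(current)
    return sentences


def extract_chunks(tags: List[str]) -> Set[Chunk]:
    """Return the NP spans encoded by a sequence of I/O/B tags."""
    chunks: Set[Chunk] = set()
    start = None
    for i, tag in enumerate(tags):
        if tag in ("B", "O"):
            # Both tags close any open chunk; B also opens a new one.
            if start is not None:
                chunks.add((start, i))
            start = i if tag == "B" else None
        elif tag == "I" and start is None:
            # An I after an O (or at sentence start) opens a chunk.
            start = i
    if start is not None:
        chunks.add((start, len(tags)))
    return chunks


def precision_recall_f(gold_seqs: List[List[str]],
                       pred_seqs: List[List[str]]) -> Tuple[float, float, float]:
    """Chunk-level precision, recall and F over parallel tag sequences."""
    gold = {(s, c) for s, tags in enumerate(gold_seqs) for c in extract_chunks(tags)}
    pred = {(s, c) for s, tags in enumerate(pred_seqs) for c in extract_chunks(tags)}
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f
```

A chunk counts as correct only if both of its boundaries match the corpus annotation, so a system that merges two adjacent NPs into one gets no credit for either.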

Table of results

System name	Short description	Main publications	Software	Reports (F)
KM00	B-I-O tagging using SVM classifiers with polynomial kernel	Kudo and Matsumoto (2000), CoNLL	YAMCHA Toolkit (but models are not provided)	93.79%
KM01	learning as in KM00, but voting between different representations	Kudo and Matsumoto (2001), NAACL	No	94.22%
SP03	Second order conditional random fields	Fei Sha and Fernando Pereira (2003), HLT/NAACL	No	94.3%
SS05	specialized HMM + voting between different representations	Shen and Sarkar (2005)	No	95.23%
M05	Second order conditional random fields + multi-label classification	Ryan McDonald, Koby Crammer and Fernando Pereira (2005), HLT/EMNLP	No	94.29%
V06	Conditional random fields + Stochastic Meta-Descent (SMD)	S. V. N. Vishwanathan, Nicol N. Schraudolph, Mark Schmidt, and Kevin Murphy (2006), ICML	No	93.6%
S08	Second order latent-dynamic conditional random fields + an improved inference method based on A* search	Xu Sun, Louis-Philippe Morency, Daisuke Okanohara and Jun'ichi Tsujii (2008), COLING	HCRF Library	94.34%
C00	Chunks from the Charniak Parser	Hollingshead, Fisher and Roark (2005), Charniak (2000)	?	94.20%

References

E. Charniak (2000). A Maximum-Entropy-Inspired Parser. NAACL 2000.

K. Hollingshead, S. Fisher and B. Roark (2005). Comparing and combining finite-state and context-free parsers. HLT/EMNLP 2005.

T. Kudo and Y. Matsumoto (2000). Use of support vector learning for chunk identification. Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000, pages 142-144, Lisbon, Portugal.

T. Kudo and Y. Matsumoto (2001). Chunking with support vector machines. Proceedings of NAACL-2001.

F. Sha and F. Pereira (2003). Shallow Parsing with Conditional Random Fields. Proceedings of HLT-NAACL 2003, pages 213-220. Edmonton, Canada.

H. Shen and A. Sarkar (2005). Voting between multiple data representations for text chunking. Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence, Canadian AI 2005.

R. McDonald, K. Crammer and F. Pereira (2005). Flexible Text Segmentation with Structured Multilabel Classification. Human Language Technologies and Empirical Methods in Natural Language Processing (HLT-EMNLP), 2005.

S. V. N. Vishwanathan, N. Schraudolph, M. Schmidt and K. Murphy (2006). Accelerated Training of Conditional Random Fields with Stochastic Gradient Methods. Proceedings of the International Conference on Machine Learning, pages 969-976. ACM Press, New York, NY, USA.

X. Sun, L.P. Morency, D. Okanohara and J. Tsujii (2008). Modeling Latent-Dynamic in Shallow Parsing: A Latent Conditional Model with Improved Inference. Proceedings of The 22nd International Conference on Computational Linguistics (COLING 2008). Pages 841-848. Manchester, UK.

External links

The dataset is available from ftp://ftp.cis.upenn.edu/pub/chunker/
More information is available on the NP Chunking page

@@ Line 17: / Line 17: @@
 ! Main publications
 ! Software
-! Results (F)
+! Reports (F)
 |-
 | KM00
 | B-I-O tagging using SVM classifiers with polynomial kernel
-| Kudo and Matsumoto (2000)
+| Kudo and Matsumoto (2000), CONLL
 | [http://chasen.org/~taku/software/yamcha/ YAMCHA Toolkit] (but models are not provided)
 | 93.79%
@@ Line 27: / Line 27: @@
 | KM01
 | learning as in KM00, but voting between different representations
-| Kudo and Matsumoto (2001)
+| Kudo and Matsumoto (2001), NAACL
 | No
 | 94.22%
+|-
+| SP03
+| Second order conditional random fields
+| Fei Sha and Fernando Pereira (2003), HLT/NAACL
+| No
+| 94.3%
 |-
 | SS05
@@ Line 37: / Line 43: @@
 | 95.23%
 |-
+| M05
+| Second order conditional random fields + multi-label classification
+| Ryan McDonald, KOby Crammer and Fernando Pereira (2005), HLT/EMNLP
+| No
+| 94.29%
+|-
+| V06
+| Conditional random fields + Stochastic Meta Decent (SMD)
+| S. V. N. Vishwanathan, Nicol N. Schraudolph, Mark Schmidt, and Kevin Murphy (2006), ICML
+| No
+| 93.6%
+|-
+| S08
+| Second order latent-dynamic conditional random fields + an improved inference method based on A* search
+| Xu Sun, Louis-Philippe Morency, Daisuke Okanohara and Jun'ichi Tsujii (2008), COLING
+| HCRF Library
+| 94.34%
+|-
+| C00
+| Chunks from the Charniak Parser
+| Hollingshead, Fisher and Roark (2005), Charniak (2000)
+| ?
+| 94.20%
 |}
+== References ==
+E. Charniak (2000). [http://aclweb.org/anthology-new/A/A00/A00-2018.pdf A Maximum-Entropy inspired parser], NAACL 2000
+K. Hollingshead, S. Fisher and B. Roark (2005). [http://www.aclweb.org/anthology-new/H/H05/H05-1099.pdf Comparing and combining finite-state and context-free parsers.]  HLT/EMNLP 2005.
+T. Kudo and Y. Matsumoto (2000). [http://acl.ldc.upenn.edu/W/W00/W00-0730.pdf Use of support vector learning for chunk identification]. ''Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000'', pages 142-144, Lisbon, Portugal.
-== References ==
+T. Kudo and Y. Matsumoto (2001). [http://acl.ldc.upenn.edu/N/N01/N01-1025.pdf Chunking with support vector machines]. ''Proceedings of NAACL-2001''.
+F. Sha and F. Pereira (2003). [http://www-rcf.usc.edu/~feisha/htmls/Papers.html Shallow Parsing with Conditional Random Fields]. ''Proceedings of HLT-NAACL 2003'', pages 213-220. Edmonton, Canada.
-Kudo, T., and Matsumoto, Y. (2000). [http://acl.ldc.upenn.edu/W/W00/W00-0730.pdf Use of support vector learning for chunk identification]. ''Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000'', pages 142-144, Lisbon, Portugal.
+H. Shen and A. Sarkar (2005). [http://www.cs.sfu.ca/~anoop/papers/pdf/ai05.pdf Voting between multiple data representations for text chunking]. ''Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence, Canadian AI 2005''.
-Kudo, T., and Matsumoto, Y. (2001). [http://acl.ldc.upenn.edu/N/N01/N01-1025.pdf Chunking with support vector machines]. ''Proceedings of NAACL-2001''.
+R. McDonald, K. Crammer and F. Pereira (2005). [http://ryanmcd.googlepages.com/segmentationHLT-EMNLP2005.pdf Flexible Text Segmentation with Structured Multilabel Classification]. ''Human Language Technologies and Empirical Methods in Natural Language Processing (HLT-EMNLP), 2005''
-Shen, H., and Sarkar, A. (2005). [http://www.cs.sfu.ca/~anoop/papers/pdf/ai05.pdf Voting between multiple data representations for text chunking]. ''Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence, Canadian AI 2005''.
+S. V. N. Vishwanathan, N. Schraudolph, M. Schmidt, and K. Murphy. Accelerated Training Conditional Random Fields with Stochastic Gradient Methods. In Proc. Intl. Conf. Machine Learning, pp. 969 – 976, ACM Press, New York, NY, USA, 2006.
+X. Sun, L.P. Morency, D. OKanohara and J. Tsujii (2008). [http://www.aclweb.org/anthology-new/C/C08/C08-1106.pdf Modeling Latent-Dynamic in Shallow Parsing: A Latent Conditional Model with Improved Inference]. ''Proceedings of The 22nd International Conference on Computational Linguistics (COLING 2008)''. Pages 841-848. Manchester, UK.
 == See also ==
