POS Tagging (State of the art)
Revision as of 23:09, 1 January 2010 by ChristopherManning
- Performance measure: per-token accuracy. (By convention, this is measured over all tokens, including punctuation tokens and other unambiguous tokens.)
- Training data: sections 0-18 of the Wall Street Journal (WSJ) corpus
- Testing data: sections 22-24 of the Wall Street Journal (WSJ) corpus
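The performance measure above can be sketched in a few lines of Python. This is only an illustrative sketch, not the evaluation script used by any of the systems listed on this page: the function name `per_token_accuracy` and the representation of a sentence as a list of (token, tag) pairs are assumptions made for the example. Every token counts toward the score, including punctuation.

```python
from typing import Sequence, Tuple

# Assumed representation: a sentence is a sequence of (token, tag) pairs.
TaggedSentence = Sequence[Tuple[str, str]]


def per_token_accuracy(gold: Sequence[TaggedSentence],
                       predicted: Sequence[TaggedSentence]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag.

    All tokens are counted, including punctuation and other
    unambiguous tokens, following the convention stated above.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted differ in number of sentences")

    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentences differ in number of tokens")
        for (_, gold_tag), (_, pred_tag) in zip(gold_sent, pred_sent):
            total += 1
            if gold_tag == pred_tag:
                correct += 1

    if total == 0:
        raise ValueError("no tokens to evaluate")
    return correct / total


# Toy example (made-up data): one wrong tag out of four tokens.
gold = [[("The", "DT"), ("dog", "NN"), ("barks", "VBZ"), (".", ".")]]
pred = [[("The", "DT"), ("dog", "NN"), ("barks", "NNS"), (".", ".")]]
print(per_token_accuracy(gold, pred))  # 0.75
```

Note that the final period is scored like any other token, which is why figures computed this way are higher than accuracy over ambiguous tokens only.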
Table of results
|System name||Short description||Main publications||Software||Results|
|SVMTool||SVM-based tagger and tagger generator||Giménez and Márquez (2004)||SVMTool||97.16%|
|Stanford Tagger||learning with cyclic dependency network||Toutanova et al. (2003)||Stanford Tagger||97.24%|
|LTAG-spinal||bidirectional perceptron learning||Shen et al. (2007)||LTAG-spinal||97.33%|
|GENIA Tagger||?||Tsuruoka et al. (2005)||GENIA||96.94% on WSJ, 98.26% on biomedical text|
References
- Giménez, J., and Márquez, L. (2004). SVMTool: A general POS tagger generator based on Support Vector Machines. Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC'04). Lisbon, Portugal.
- Shen, L., Satta, G., and Joshi, A. (2007). Guided learning for bidirectional sequence classification. Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL 2007), pages 760-767.
- Toutanova, K., Klein, D., Manning, C.D., and Singer, Y. (2003). Feature-rich part-of-speech tagging with a cyclic dependency network. Proceedings of HLT-NAACL 2003, pages 252-259.
- Tsuruoka, Y., Tateishi, Y., Kim, J.-D., Ohta, T., McNaught, J., Ananiadou, S., and Tsujii, J. (2005). Developing a robust part-of-speech tagger for biomedical text. Advances in Informatics: 10th Panhellenic Conference on Informatics, LNCS 3746, pages 382-392.
- Tsuruoka, Y., and Tsujii, J. (2005). Bidirectional inference with the easiest-first strategy for tagging sequence data. Proceedings of HLT/EMNLP 2005, pages 467-474.