Difference between revisions of "CONLL-2003 (State of the art)"

Revision as of 12:25, 27 August 2015

Performance measure: F = 2 * Precision * Recall / (Recall + Precision)
Precision: percentage of named entities found by the system that are correct
Recall: percentage of named entities defined in the corpus that were found by the system
A named entity counts as correct only on an exact match of all words of the chunk (see CONLL scoring software)
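
The measure above can be illustrated with a minimal Python sketch. It is not the CONLL scoring software itself: the function names (extract_entities, precision_recall_f1) and the example tag sequences are hypothetical, and the sketch assumes that tags are given as one flat IOB-style sequence (O, B-TYPE, I-TYPE) and that an entity must match in both boundaries and type to count as correct.

```python
from typing import List, Optional, Set, Tuple

# (entity type, index of first token, index of last token), all hypothetical names
Entity = Tuple[str, int, int]


def extract_entities(tags: List[str]) -> Set[Entity]:
    """Collect entity spans from a flat sequence of IOB-style tags."""
    entities: Set[Entity] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        if tag == "O":
            prefix, label = "O", None
        else:
            prefix, label = tag.split("-", 1)
        # The open entity continues only on an I- tag of the same type.
        continues = prefix == "I" and label == etype and start is not None
        if not continues:
            if start is not None:
                entities.add((etype, start, i - 1))
            # B- (or an I- with no matching open entity) starts a new span.
            start, etype = (i, label) if prefix in ("B", "I") else (None, None)
    return entities


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F with exact matching."""
    gold_entities = extract_entities(gold)
    pred_entities = extract_entities(pred)
    correct = len(gold_entities & pred_entities)
    precision = correct / len(pred_entities) if pred_entities else 0.0
    recall = correct / len(gold_entities) if gold_entities else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f


# Made-up example: the predicted ORG entity is one word too short,
# so it does not count as a match under exact matching.
gold = ["B-ORG", "I-ORG", "O", "B-PER", "O", "B-LOC"]
pred = ["B-ORG", "O", "O", "B-PER", "O", "B-LOC"]
p, r, f = precision_recall_f1(gold, pred)
print(f"precision={p:.4f} recall={r:.4f} F={f:.4f}")
```

In this example two of the three predicted entities are correct and two of the three gold entities are found, so precision, recall and F are all 2/3. A partially matched entity contributes nothing to either measure.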

Training data: Train split of CONLL-2003 corpus
Dry-run (development) data: Testa split of CONLL-2003 corpus
Testing data: Testb split of CONLL-2003 corpus
The corpus contains a very high proportion of metonymic references (for example, city names standing for sports teams)

System name	Short description	System type (1)	Main publications	Software	Results (F)
FIJZ	Best CONLL-2003 participant	S	Florian, Ittycheriah, Jing and Zhang (2003)	-	88.76%
Baseline	Vocabulary transfer from training to testing	S	Tjong Kim Sang and De Meulder (2003)	-	59.61%
Balie	Unsupervised approach: no prior training	U	Nadeau, Turney and Matwin (2006)	sourceforge.net	55.98%
BI-LSTM-CRF	Bidirectional LSTM-CRF Model	S	Huang et al. (2015)	-	90.10%

(1) System type: R = hand-crafted rules, S = supervised learning, U = unsupervised learning, H = hybrid

Florian, R., Ittycheriah, A., Jing, H. and Zhang, T. (2003) Named Entity Recognition through Classifier Combination. Proceedings of CoNLL-2003. Edmonton, Canada.

Nadeau, D., Turney, P. D. and Matwin, S. (2006) Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity. Proceedings 19th Canadian Conference on Artificial Intelligence. Québec, Canada.

Tjong Kim Sang, E. F. and De Meulder, F. (2003) Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. Proceedings of CoNLL-2003. Edmonton, Canada.

Huang, Z., Xu, W. and Yu, K. (2015) Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv:1508.01991.

@@ Line 44: / Line 44: @@
 | 55.98%
 |-
+| BI-LSTM-CRF
+| Bidirectional LSTM-CRF Model
+| S
+| Huang et al. (2015)
+| -
+| 90.10%
 |}
 * (1) '''System type''': R = hand-crafted rules, S = supervised learning, U = unsupervised learning, H = hybrid
 == References ==
@@ Line 55: / Line 60: @@
 Nadeau, D., Turney, P. D. and Matwin, S. (2006) [http://iit-iti.nrc-cnrc.gc.ca/publications/nrc-48727_e.html Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity]. ''Proceedings 19th Canadian Conference on Artificial Intelligence''. Québec, Canada.
 Tjong Kim Sang, E. F. and De Meulder, F. (2003) [http://www.cnts.ua.ac.be/conll2003/pdf/14247tjo.pdf Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition]. ''Proceedings of CoNLL-2003''. Edmonton, Canada.
+Z. H. Huang, W. Xu, and K. Yu. (2015) [http://arxiv.org/abs/1508.01991 Bidirectional LSTM-CRF Models for Sequence Tagging]. ''In arXiv:1508.01991''. 2015.
 == See also ==