CONLL-2003 (State of the art)

  • Performance measure: F = 2 * Precision * Recall / (Recall + Precision)
  • Precision: percentage of named entities found by the system that are correct
  • Recall: percentage of named entities defined in the corpus that were found by the system
  • Exact match (for all words of a chunk) is used in the calculation of precision and recall (see CONLL scoring software)
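
The exact-match scoring described above can be sketched as follows. This is a minimal illustration, not the CONLL scoring software itself: it assumes IOB-style tags (B-PER, I-PER, O, ...) for a single token sequence, and the function names extract_spans and precision_recall_f1 are hypothetical. An entity counts as correct only if its boundaries and its type both match the gold chunk.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity chunks from IOB-style tags such as B-PER, I-PER, O."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes the last chunk
        prefix, _, label = tag.partition("-")
        # An open chunk ends at O, at a B- tag, or when the type changes.
        if start is not None and (prefix != "I" or label != etype):
            spans.add((start, i, etype))
            start, etype = None, None
        # A chunk starts at B-, or at an I- tag when no chunk is open.
        if prefix in ("B", "I") and start is None:
            start, etype = i, label
    return spans


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F over entity chunks."""
    gold_spans = extract_spans(gold)
    pred_spans = extract_spans(pred)
    correct = len(gold_spans & pred_spans)  # same boundaries and same type
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = ["B-PER", "I-PER", "O", "B-LOC", "O", "O"]
    pred = ["B-PER", "O", "O", "B-LOC", "O", "B-ORG"]
    p, r, f = precision_recall_f1(gold, pred)
    print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # P=0.33 R=0.50 F=0.40
```

In the example, the predicted one-token PER chunk only partially overlaps the two-token gold chunk, so it counts as an error for both precision and recall.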

  • Training data: Train split of CONLL-2003 corpus
  • Dryrun data: Testa split of CONLL-2003 corpus
  • Testing data: Testb split of CONLL-2003 corpus
  • The corpus contains a very high ratio of metonymic references (city names standing for sports teams)

Table of results

System name | Short description                            | System type (1) | Main publications                           | Software | Results (F)
FIJZ        | Best CONLL-2003 participant                  | S               | Florian, Ittycheriah, Jing and Zhang (2003) | -        | 88.76%
Baseline    | Vocabulary transfer from training to testing | S               | Tjong Kim Sang and De Meulder (2003)        | -        | 59.61%
Balie       | Unsupervised approach: no prior training     | U               | Nadeau, Turney and Matwin (2006)            |          | 55.98%
BI-LSTM-CRF | Bidirectional LSTM-CRF Model                 | S               | Huang et al. (2015)                         | -        | 90.10%
  • (1) System type: R = hand-crafted rules, S = supervised learning, U = unsupervised learning, H = hybrid
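
To make the Baseline row concrete, here is a rough sketch of what a "vocabulary transfer" system could look like. It is one plausible reading, not the shared-task code: it assumes the baseline keeps only phrases that carry a single entity type in the training data and prefers the longest match at test time. The names build_lexicon and tag_sentence, and the input format (token tuples paired with a type), are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Phrase = Tuple[str, ...]


def build_lexicon(train_entities: Iterable[Tuple[Sequence[str], str]]) -> Dict[Phrase, str]:
    """Keep only entity phrases that have a unique type in the training data."""
    seen = defaultdict(set)
    for tokens, etype in train_entities:
        seen[tuple(tokens)].add(etype)
    return {phrase: next(iter(types)) for phrase, types in seen.items() if len(types) == 1}


def tag_sentence(tokens: Sequence[str], lexicon: Dict[Phrase, str]) -> List[str]:
    """Label a sentence by greedy longest-match lookup in the lexicon."""
    max_len = max((len(p) for p in lexicon), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            etype = lexicon.get(tuple(tokens[i:i + n]))
            if etype is not None:
                tags[i] = "B-" + etype
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + etype
                i += n
                break
        else:
            i += 1  # no known phrase starts here
    return tags


if __name__ == "__main__":
    train = [
        (("New", "York"), "LOC"),
        (("York",), "PER"),
        (("Washington",), "LOC"),
        (("Washington",), "PER"),  # ambiguous in training, so it is dropped
    ]
    lexicon = build_lexicon(train)
    print(tag_sentence(["Washington", "visited", "New", "York"], lexicon))
```

A lookup of this kind cannot label any entity that is unseen or ambiguous in the training data, which is consistent with the baseline scoring far below the supervised systems in the table.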


