CONLL-2003 (State of the art)
Latest revision as of 06:29, 12 July 2019
- Performance measure: F = 2 * Precision * Recall / (Precision + Recall)
- Precision: percentage of named entities found by the system that are correct
- Recall: percentage of named entities defined in the corpus that were found by the system
- Exact match (all words of a chunk must match) is used when computing precision and recall (see the CoNLL scoring software)
- Training data: Train split of the CoNLL-2003 corpus
- Dry-run data: Testa split of the CoNLL-2003 corpus
- Testing data: Testb split of the CoNLL-2003 corpus
- The corpus contains a very high ratio of metonymic references (city names standing for sports teams)
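The exact-match rule can be made concrete with a minimal Python sketch of chunk-level scoring. This is not the official CoNLL scoring script. It assumes each sentence is given as a list of IOB-style tags (e.g. `B-PER`, `I-PER`, `O`), and it counts a predicted entity as correct only when its type and both boundaries match a gold entity, as the official script does. The function names `extract_chunks` and `score` are illustrative.

```python
from typing import List, Set, Tuple

# (entity type, start token index, end token index exclusive)
Chunk = Tuple[str, int, int]


def extract_chunks(tags: List[str]) -> Set[Chunk]:
    """Collect entity chunks from one sentence of IOB-style tags.

    A chunk starts at a B- tag, or at an I- tag that follows O or a tag of a
    different type, so both IOB1 and IOB2 input are accepted.
    """
    chunks: Set[Chunk] = set()
    start, ctype = 0, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a final chunk
        prefix, _, label = tag.partition("-")
        begins = prefix == "B" or (prefix == "I" and label != ctype)
        if ctype is not None and (prefix == "O" or begins):
            chunks.add((ctype, start, i))
            ctype = None
        if begins:
            start, ctype = i, label
    return chunks


def score(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F) over a corpus, using exact chunk match."""
    gold_chunks, pred_chunks = set(), set()
    for sent_id, (g, p) in enumerate(zip(gold, pred)):
        # Prefix with the sentence index so chunks from different sentences differ.
        gold_chunks |= {(sent_id, *c) for c in extract_chunks(g)}
        pred_chunks |= {(sent_id, *c) for c in extract_chunks(p)}
    correct = len(gold_chunks & pred_chunks)  # same type and same boundaries
    precision = correct / len(pred_chunks) if pred_chunks else 0.0
    recall = correct / len(gold_chunks) if gold_chunks else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


gold = [["B-PER", "I-PER", "O", "B-LOC"], ["B-ORG", "O"]]
pred = [["B-PER", "O", "O", "B-LOC"], ["B-ORG", "O"]]
# Each value is about 0.667: the truncated PER chunk earns no credit at all.
print(score(gold, pred))
```

Under exact match, a prediction covering only "Peter" of a gold entity "Peter Blackburn" counts as one false positive and one missed entity. Boundary errors are therefore penalized twice.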
Table of results
System name | Short description | System type (1) | Main publications | Software | Results |
---|---|---|---|---|---|
FIJZ | Best CONLL-2003 participant | S | Florian, Ittycheriah, Jing and Zhang (2003) | - | 88.76% |
Baseline | Vocabulary transfer from training to testing | S | Tjong Kim Sang and De Meulder (2003) | - | 59.61% |
Balie | Unsupervised approach: no prior training | U | Nadeau, Turney and Matwin (2006) | sourceforge.net | 55.98% |
BI-LSTM-CRF | Bidirectional LSTM-CRF Model | S | Huang et al. (2015) | - | 90.10% |
BI-LSTM-CRF | Bidirectional LSTM-CRF with contextual string embeddings | S | Akbik, Blythe and Vollgraf (2018) | https://github.com/zalandoresearch/flair | 93.09% |
- (1) System type: R = hand-crafted rules, S = supervised learning, U = unsupervised learning, H = hybrid
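The baseline row is described only as "vocabulary transfer from training to testing". One plausible reading is a lexicon lookup, sketched here in Python as an assumption rather than the shared task's own code. Phrases that occur in the training data with exactly one entity class are collected, and the same phrases are tagged in new text, preferring the longest match. Phrases seen with several classes are skipped, which is how metonymic city names (a location in one sentence, a sports team in another) drop out of the lexicon. The names `build_lexicon` and `tag_sentence` and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Phrase = Tuple[str, ...]
# (tokens, [(entity type, start, end exclusive), ...]) for one training sentence
Sentence = Tuple[List[str], List[Tuple[str, int, int]]]


def build_lexicon(train: List[Sentence]) -> Dict[Phrase, str]:
    """Map each training entity phrase to its class, keeping unambiguous phrases only."""
    classes: Dict[Phrase, Set[str]] = defaultdict(set)
    for tokens, chunks in train:
        for etype, start, end in chunks:
            classes[tuple(tokens[start:end])].add(etype)
    # Drop phrases seen with more than one class (e.g. a city that is also a team).
    return {phrase: next(iter(c)) for phrase, c in classes.items() if len(c) == 1}


def tag_sentence(tokens: List[str], lexicon: Dict[Phrase, str]) -> List[Tuple[str, int, int]]:
    """Greedy left-to-right tagging that prefers the longest known phrase."""
    max_len = max((len(p) for p in lexicon), default=0)
    chunks = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in lexicon:
                chunks.append((lexicon[phrase], i, i + n))
                i += n
                break
        else:  # no known phrase starts at position i
            i += 1
    return chunks


train = [
    (["Peter", "Blackburn", "visited", "Paris"], [("PER", 0, 2), ("LOC", 3, 4)]),
    (["Paris", "beat", "Lyon"], [("ORG", 0, 1), ("ORG", 2, 3)]),
]
lexicon = build_lexicon(train)
print(tag_sentence(["Peter", "Blackburn", "left", "Paris", "for", "Lyon"], lexicon))
# [('PER', 0, 2), ('ORG', 5, 6)]: "Paris" is ambiguous in training, so it is skipped
```

Greedy longest-match is a simplifying choice made for this sketch. A system like this finds no entity it has not seen in training, which helps explain why the baseline sits far below the supervised learners in the table.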
References
Florian, R., Ittycheriah, A., Jing, H. and Zhang, T. (2003) Named Entity Recognition through Classifier Combination. Proceedings of CoNLL-2003. Edmonton, Canada.
Nadeau, D., Turney, P. D. and Matwin, S. (2006) Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity. Proceedings 19th Canadian Conference on Artificial Intelligence. Québec, Canada.
Tjong Kim Sang, E. F. and De Meulder, F. (2003) Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. Proceedings of CoNLL-2003. Edmonton, Canada.
Huang, Z., Xu, W. and Yu, K. (2015) Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv:1508.01991.
Akbik, A., Blythe, D. and Vollgraf, R. (2018) Contextual String Embeddings for Sequence Labeling. Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018), pp. 1638-1649.