A Named Entity Recognition Shootout for German

Martin Riedl, Sebastian Padó


Abstract
We ask how to practically build a model for German named entity recognition (NER) that performs at the state of the art for both contemporary and historical texts, i.e., a big-data and a small-data scenario. We pit the two best-performing model families, linear-chain CRFs and BiLSTMs, against each other to observe the trade-off between expressiveness and data requirements. The BiLSTM outperforms the CRF when large datasets are available but performs worse on the smallest dataset. BiLSTMs profit substantially from transfer learning, which enables them to be trained on multiple corpora. This results in a new state-of-the-art model for German NER on two contemporary German corpora (CoNLL 2003 and GermEval 2014) and two historical corpora.
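The abstract contrasts linear-chain CRFs with BiLSTMs. A CRF is "linear-chain" because label interactions are restricted to adjacent pairs, so the highest-scoring tag sequence can be found exactly with Viterbi dynamic programming. The following is a minimal, illustrative Python sketch of that decoding step only. It is not the authors' implementation or any NER toolkit's API: the function name `viterbi_decode`, the BIO label set, and all scores are hypothetical, and learning the weights is omitted. The per-token emission scores could come from hand-crafted features or from a neural encoder.

```python
from typing import Dict, List, Sequence, Tuple

NEG_INF = float("-inf")


def viterbi_decode(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: Sequence[str],
) -> List[str]:
    """Return the highest-scoring label sequence under a first-order (linear-chain) model.

    emissions[t][y]     : score of label y at token t
    transitions[(a, b)] : score of label b directly following label a
    Pairs missing from `transitions` are treated as forbidden (score -inf).
    """
    if not emissions:
        return []
    # best[t][y]: score of the best label prefix that ends in label y at token t
    best: List[Dict[str, float]] = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = []  # back[t-1][y]: best predecessor of y at token t
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            # Only the previous label is consulted: the linear-chain assumption.
            prev, score = max(
                ((p, best[-1][p] + transitions.get((p, y), NEG_INF)) for p in labels),
                key=lambda item: item[1],
            )
            scores[y] = score + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final label and follow the back-pointers.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical toy scores for the tokens "Angela Merkel spricht".
LABELS = ["O", "B-PER", "I-PER"]
TRANSITIONS = {
    ("O", "O"): 0.0, ("O", "B-PER"): 0.0,          # ("O", "I-PER") is forbidden
    ("B-PER", "I-PER"): 0.5, ("B-PER", "O"): 0.0, ("B-PER", "B-PER"): -1.0,
    ("I-PER", "I-PER"): 0.0, ("I-PER", "O"): 0.0, ("I-PER", "B-PER"): -1.0,
}
EMISSIONS = [
    {"O": 0.1, "B-PER": 1.0, "I-PER": 0.5},
    {"O": 0.2, "B-PER": 0.4, "I-PER": 0.9},
    {"O": 1.0, "B-PER": 0.0, "I-PER": 0.3},
]
```

With these toy scores, `viterbi_decode(EMISSIONS, TRANSITIONS, LABELS)` yields `["B-PER", "I-PER", "O"]`. Forbidding the `O -> I-PER` transition is one simple way to keep the decoded BIO sequence well-formed. The dynamic program runs in time linear in sentence length and quadratic in the number of labels.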
Anthology ID: P18-2020
Volume: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month: July
Year: 2018
Address: Melbourne, Australia
Editors: Iryna Gurevych, Yusuke Miyao
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 120–125
URL: https://aclanthology.org/P18-2020
DOI: 10.18653/v1/P18-2020
Cite (ACL): Martin Riedl and Sebastian Padó. 2018. A Named Entity Recognition Shootout for German. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 120–125, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal): A Named Entity Recognition Shootout for German (Riedl & Padó, ACL 2018)
PDF: https://aclanthology.org/P18-2020.pdf
Poster: P18-2020.Poster.pdf
Data: CoNLL 2003