Contextual String Embeddings for Sequence Labeling

Alan Akbik, Duncan Blythe, Roland Vollgraf


Abstract
Recent advances in language modeling using recurrent neural networks have made it viable to model language as distributions over characters. By learning to predict the next character on the basis of previous characters, such models have been shown to automatically internalize linguistic concepts such as words, sentences, subclauses and even sentiment. In this paper, we propose to leverage the internal states of a trained character language model to produce a novel type of word embedding which we refer to as contextual string embeddings. Our proposed embeddings have the distinct properties that they (a) are trained without any explicit notion of words and thus fundamentally model words as sequences of characters, and (b) are contextualized by their surrounding text, meaning that the same word will have different embeddings depending on its contextual use. We conduct a comparative evaluation against previous embeddings and find that our embeddings are highly useful for downstream tasks: across four classic sequence labeling tasks we consistently outperform the previous state-of-the-art. In particular, we significantly outperform previous work on English and German named entity recognition (NER), allowing us to report new state-of-the-art F1-scores on the CoNLL03 shared task. We release all code and pre-trained language models in a simple-to-use framework to the research community, to enable reproduction of these experiments and application of our proposed embeddings to other tasks: https://github.com/zalandoresearch/flair
Anthology ID:
C18-1139
Volume:
Proceedings of the 27th International Conference on Computational Linguistics
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Venue:
COLING
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
1638–1649
URL:
https://www.aclweb.org/anthology/C18-1139
DOI:
Bib Export formats:
BibTeX MODS XML EndNote