Evaluation of a Sequence Tagging Tool for Biomedical Texts

Julien Tourille, Matthieu Doutreligne, Olivier Ferret, Aurélie Névéol, Nicolas Paris, Xavier Tannier


Abstract
Many applications in biomedical natural language processing rely on sequence tagging as an initial step toward more complex analyses. To support text analysis in the biomedical domain, we introduce Yet Another SEquence Tagger (YASET), an open-source, multipurpose sequence tagger that implements state-of-the-art deep learning algorithms for sequence tagging. Herein, we evaluate YASET on part-of-speech tagging and named entity recognition in a variety of text genres, including articles from the biomedical literature in English and clinical narratives in French. To further characterize performance, we report score distributions over 30 runs and for training datasets of different sizes. YASET provides state-of-the-art performance on the CoNLL 2003 NER dataset (F1=0.87), the MEDPOST corpus (F1=0.97), the MERLoT corpus (F1=0.99) and the NCBI disease corpus (F1=0.81). We believe that YASET is a versatile and efficient tool that can be used for sequence tagging in biomedical and clinical texts.
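The NER scores above are F1 values. As a rough illustration of how such a score is commonly computed for named entity recognition, here is a minimal sketch assuming CoNLL-style exact-match evaluation over BIO-tagged entity spans, micro-averaged across sentences. The function names bio_spans and span_f1 are hypothetical; this is not YASET's own evaluation code, and part-of-speech tagging is typically scored per token instead.

```python
from typing import List, Set, Tuple

# An entity span: (entity type, start token index, end token index exclusive).
Span = Tuple[str, int, int]


def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence (hypothetical helper)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A sentinel "O" at the end closes any entity still open.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, label = tag.partition("-")
        # An I- tag continues the open entity only if its type matches.
        continues = prefix == "I" and label == etype
        if not continues:
            if etype is not None:
                spans.add((etype, start, i))
            if prefix in ("B", "I"):
                # B- starts an entity; a stray I- is treated as a start too.
                start, etype = i, label
            else:
                start, etype = None, None
    return spans


def span_f1(gold_sents: List[List[str]],
            pred_sents: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged exact-match precision, recall and F1 over sentences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sents, pred_sents):
        g, p = bio_spans(gold), bio_spans(pred)
        tp += len(g & p)  # a span counts only if type and boundaries match
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


gold = [["B-Disease", "I-Disease", "O", "B-Disease"]]
pred = [["B-Disease", "I-Disease", "O", "O"]]
print(span_f1(gold, pred))
```

In this toy example one of the two gold disease mentions is found exactly, so precision is 1.0, recall is 0.5 and F1 is 2/3. Pooling counts across sentences before dividing (micro-averaging) keeps long sentences with many entities from being underweighted.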
Anthology ID:
W18-5622
Volume:
Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis
Month:
October
Year:
2018
Address:
Brussels, Belgium
Editors:
Alberto Lavelli, Anne-Lyse Minard, Fabio Rinaldi
Venue:
Louhi
Publisher:
Association for Computational Linguistics
Pages:
193–203
URL:
https://aclanthology.org/W18-5622
DOI:
10.18653/v1/W18-5622
Cite (ACL):
Julien Tourille, Matthieu Doutreligne, Olivier Ferret, Aurélie Névéol, Nicolas Paris, and Xavier Tannier. 2018. Evaluation of a Sequence Tagging Tool for Biomedical Texts. In Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis, pages 193–203, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Evaluation of a Sequence Tagging Tool for Biomedical Texts (Tourille et al., Louhi 2018)
PDF:
https://aclanthology.org/W18-5622.pdf
Code
strayMat/bio-medical_ner
Data
CoNLL 2003, NCBI Datasets