Annotation of a Large Clinical Entity Corpus

Pinal Patel, Disha Davey, Vishal Panchal, Parth Pathak


Abstract
An entity-annotated corpus of the clinical domain is a basic requirement for detecting clinical entities with machine learning (ML) approaches. Past research has shown the superiority of statistical/ML approaches over rule-based approaches, but taking full advantage of ML approaches requires an accurately annotated corpus. Although a few annotated corpora are available, they are either built on small data sets or cover a narrower domain (such as cancer patient records or lab reports); an annotated large data set representing the entire clinical domain has not yet been created. In this paper, we describe in detail the annotation guidelines, the annotation process, and our approach to creating a CER (clinical entity recognition) corpus of 5,160 clinical documents from forty different clinical specialities. The clinical entities span various types such as diseases, procedures, medications, medical devices and so on. We have classified them into eleven categories for annotation. Our annotation also reflects the relations among groups of entities that together constitute larger concepts.
Anthology ID:
D18-1228
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Editors:
Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2033–2042
URL:
https://aclanthology.org/D18-1228
DOI:
10.18653/v1/D18-1228
Cite (ACL):
Pinal Patel, Disha Davey, Vishal Panchal, and Parth Pathak. 2018. Annotation of a Large Clinical Entity Corpus. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2033–2042, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Annotation of a Large Clinical Entity Corpus (Patel et al., EMNLP 2018)
PDF:
https://aclanthology.org/D18-1228.pdf