Kirk Roberts


2023

pdf bib
Proceedings of the 5th Clinical Natural Language Processing Workshop
Tristan Naumann | Asma Ben Abacha | Steven Bethard | Kirk Roberts | Anna Rumshisky
Proceedings of the 5th Clinical Natural Language Processing Workshop

2022

pdf bib
DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records For Medicine Related Queries
Jayetri Bardhan | Anthony Colas | Kirk Roberts | Daisy Zhe Wang
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper develops the first question answering dataset (DrugEHRQA) containing question-answer pairs from both structured tables and unstructured notes from a publicly available Electronic Health Record (EHR). EHRs contain patient records, stored in structured tables and unstructured clinical notes. The information in structured and unstructured EHRs is not strictly disjoint: information may be duplicated, contradictory, or provide additional context between these sources. Our dataset has medication-related queries, containing over 70,000 question-answer pairs. To provide a baseline model and help analyze the dataset, we have used a simple model (MultimodalEHRQA) which uses the predictions of a modality selection network to choose between EHR tables and clinical notes to answer the questions. This is used to direct the questions to the table-based or text-based state-of-the-art QA model. In order to address the problem arising from complex, nested queries, this is the first time Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers (RAT-SQL) has been used to test the structure of query templates in EHR data. Our goal is to provide a benchmark dataset for multi-modal QA systems, and to open up new avenues of research in improving question answering over EHR structured data by using context from unstructured clinical data.

pdf bib
A Cross-document Coreference Dataset for Longitudinal Tracking across Radiology Reports
Surabhi Datta | Hio Cheng Lam | Atieh Pajouhi | Sunitha Mogalla | Kirk Roberts
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper proposes a new cross-document coreference resolution (CDCR) dataset for identifying co-referring radiological findings and medical devices across a patient’s radiology reports. Our annotated corpus contains 5872 mentions (findings and devices) spanning 638 MIMIC-III radiology reports across 60 patients, covering multiple imaging modalities and anatomies. There are a total of 2292 mention chains. We describe the annotation process in detail, highlighting the complexities involved in creating a sizable and realistic dataset for radiology CDCR. We apply two baseline methods–string matching and transformer language models (BERT)–to identify cross-report coreferences. Our results indicate the requirement of further model development targeting better understanding of domain language and context to address this challenging and unexplored task. This dataset can serve as a resource to develop more advanced natural language processing CDCR methods in the future. This is one of the first attempts focusing on CDCR in the clinical domain and holds potential in benefiting physicians and clinical research through long-term tracking of radiology findings.

pdf bib
RadQA: A Question Answering Dataset to Improve Comprehension of Radiology Reports
Sarvesh Soni | Meghana Gudala | Atieh Pajouhi | Kirk Roberts
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We present a radiology question answering dataset, RadQA, with 3074 questions posed against radiology reports and annotated with their corresponding answer spans (resulting in a total of 6148 question-answer evidence pairs) by physicians. The questions are manually created using the clinical referral section of the reports that take into account the actual information needs of ordering physicians and eliminate bias from seeing the answer context (and, further, organically create unanswerable questions). The answer spans are marked within the Findings and Impressions sections of a report. The dataset aims to satisfy the complex clinical requirements by including complete (yet concise) answer phrases (which are not just entities) that can span multiple lines. We conduct a thorough analysis of the proposed dataset by examining the broad categories of disagreement in annotation (providing insights on the errors made by humans) and the reasoning requirements to answer a question (uncovering the huge dependence on medical knowledge for answering the questions). The advanced transformer language models achieve the best F1 score of 63.55 on the test set, however, the best human performance is 90.31 (with an average of 84.52). This demonstrates the challenging nature of RadQA that leaves ample scope for future method research.

pdf bib
Proceedings of the 4th Clinical Natural Language Processing Workshop
Tristan Naumann | Steven Bethard | Kirk Roberts | Anna Rumshisky
Proceedings of the 4th Clinical Natural Language Processing Workshop

2020

pdf bib
A Hybrid Deep Learning Approach for Spatial Trigger Extraction from Radiology Reports
Surabhi Datta | Kirk Roberts
Proceedings of the Third International Workshop on Spatial Language Understanding

Radiology reports contain important clinical information about patients which are often tied through spatial expressions. Spatial expressions (or triggers) are mainly used to describe the positioning of radiographic findings or medical devices with respect to some anatomical structures. As the expressions result from the mental visualization of the radiologist’s interpretations, they are varied and complex. The focus of this work is to automatically identify the spatial expression terms from three different radiology sub-domains. We propose a hybrid deep learning-based NLP method that includes – 1) generating a set of candidate spatial triggers by exact match with the known trigger terms from the training data, 2) applying domain-specific constraints to filter the candidate triggers, and 3) utilizing a BERT-based classifier to predict whether a candidate trigger is a true spatial trigger or not. The results are promising, with an improvement of 24 points in the average F1 measure compared to a standard BERT-based sequence labeler.

pdf bib
Rad-SpatialNet: A Frame-based Resource for Fine-Grained Spatial Relations in Radiology Reports
Surabhi Datta | Morgan Ulinski | Jordan Godfrey-Stovall | Shekhar Khanpara | Roy F. Riascos-Castaneda | Kirk Roberts
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper proposes a representation framework for encoding spatial language in radiology based on frame semantics. The framework is adopted from the existing SpatialNet representation in the general domain with the aim to generate more accurate representations of spatial language used by radiologists. We describe Rad-SpatialNet in detail along with illustrating the importance of incorporating domain knowledge in understanding the varied linguistic expressions involved in different radiological spatial relations. This work also constructs a corpus of 400 radiology reports of three examination types (chest X-rays, brain MRIs, and babygrams) annotated with fine-grained contextual information according to this schema. Spatial trigger expressions and elements corresponding to a spatial frame are annotated. We apply BERT-based models (BERT-Base and BERT- Large) to first extract the trigger terms (lexical units for a spatial frame) and then to identify the related frame elements. The results of BERT- Large are decent, with F1 of 77.89 for spatial trigger extraction and an overall F1 of 81.61 and 66.25 across all frame elements using gold and predicted spatial triggers respectively. This frame-based resource can be used to develop and evaluate more advanced natural language processing (NLP) methods for extracting fine-grained spatial information from radiology text in the future.

pdf bib
Evaluation of Dataset Selection for Pre-Training and Fine-Tuning Transformer Language Models for Clinical Question Answering
Sarvesh Soni | Kirk Roberts
Proceedings of the Twelfth Language Resources and Evaluation Conference

We evaluate the performance of various Transformer language models, when pre-trained and fine-tuned on different combinations of open-domain, biomedical, and clinical corpora on two clinical question answering (QA) datasets (CliCR and emrQA). We perform our evaluations on the task of machine reading comprehension, which involves training the model to answer a question given an unstructured context paragraph. We conduct a total of 48 experiments on different combinations of the large open-domain and domain-specific corpora. We found that an initial fine-tuning on an open-domain dataset, SQuAD, consistently improves the clinical QA performance across all the model variants.

pdf bib
Extracting Adherence Information from Electronic Health Records
Jordan Sanders | Meghana Gudala | Kathleen Hamilton | Nishtha Prasad | Jordan Stovall | Eduardo Blanco | Jane E Hamilton | Kirk Roberts
Proceedings of the 28th International Conference on Computational Linguistics

Patient adherence is a critical factor in health outcomes. We present a framework to extract adherence information from electronic health records, including both sentence-level information indicating general adherence information (full, partial, none, etc.) and span-level information providing additional information such as adherence type (medication or nonmedication), reasons and outcomes. We annotate and make publicly available a new corpus of 3,000 de-identified sentences, and discuss the language physicians use to document adherence information. We also explore models based on state-of-the-art transformers to automate both tasks.

pdf bib
Proceedings of the 3rd Clinical Natural Language Processing Workshop
Anna Rumshisky | Kirk Roberts | Steven Bethard | Tristan Naumann
Proceedings of the 3rd Clinical Natural Language Processing Workshop

pdf bib
Towards an Ontology-based Medication Conversational Agent for PrEP and PEP
Muhammad Amith | Licong Cui | Kirk Roberts | Cui Tao
Proceedings of the First Workshop on Natural Language Processing for Medical Conversations

ABSTRACT: HIV (human immunodeficiency virus) can damage a human’s immune system and cause Acquired Immunodeficiency Syndrome (AIDS) which could lead to severe outcomes, including death. While HIV infections have decreased over the last decade, there is still a significant population where the infection permeates. PrEP and PEP are two proven preventive measures introduced that involve periodic dosage to stop the onset of HIV infection. However, the adherence rates for this medication is low in part due to the lack of information about the medication. There exist several communication barriers that prevent patient-provider communication from happening. In this work, we present our ontology-based method for automating the communication of this medication that can be deployed for live conversational agents for PrEP and PEP. This method facilitates a model of automated conversation between the machine and user can also answer relevant questions.

2019

pdf bib
Proceedings of the 2nd Clinical Natural Language Processing Workshop
Anna Rumshisky | Kirk Roberts | Steven Bethard | Tristan Naumann
Proceedings of the 2nd Clinical Natural Language Processing Workshop

pdf bib
A Paraphrase Generation System for EHR Question Answering
Sarvesh Soni | Kirk Roberts
Proceedings of the 18th BioNLP Workshop and Shared Task

This paper proposes a dataset and method for automatically generating paraphrases for clinical questions relating to patient-specific information in electronic health records (EHRs). Crowdsourcing is used to collect 10,578 unique questions across 946 semantically distinct paraphrase clusters. This corpus is then used with a deep learning-based question paraphrasing method utilizing variational autoencoder and LSTM encoder/decoder. The ultimate use of such a method is to improve the performance of automatic question answering methods for EHRs.

pdf bib
Extraction of Lactation Frames from Drug Labels and LactMed
Heath Goodrum | Meghana Gudala | Ankita Misra | Kirk Roberts
Proceedings of the 18th BioNLP Workshop and Shared Task

This paper describes a natural language processing (NLP) approach to extracting lactation-specific drug information from two sources: FDA-mandated drug labels and the NLM Drugs and Lactation Database (LactMed). A frame semantic approach is utilized, and the paper describes the selected frames, their annotation on a set of 900 sections from drug labels and LactMed articles, and the NLP system to extract such frame instances automatically. The ultimate goal of the project is to use such a system to identify discrepancies in lactation-related drug information between these resources.

2018

pdf bib
A FrameNet for Cancer Information in Clinical Narratives: Schema and Annotation
Kirk Roberts | Yuqi Si | Anshul Gandhi | Elmer Bernstam
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

pdf bib
Annotating Named Entities in Consumer Health Questions
Halil Kilicoglu | Asma Ben Abacha | Yassine Mrabet | Kirk Roberts | Laritza Rodriguez | Sonya Shooshan | Dina Demner-Fushman
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We describe a corpus of consumer health questions annotated with named entities. The corpus consists of 1548 de-identified questions about diseases and drugs, written in English. We defined 15 broad categories of biomedical named entities for annotation. A pilot annotation phase in which a small portion of the corpus was double-annotated by four annotators was followed by a main phase in which double annotation was carried out by six annotators, and a reconciliation phase in which all annotations were reconciled by an expert. We conducted the annotation in two modes, manual and assisted, to assess the effect of automatic pre-annotation and calculated inter-annotator agreement. We obtained moderate inter-annotator agreement; assisted annotation yielded slightly better agreement and fewer missed annotations than manual annotation. Due to complex nature of biomedical entities, we paid particular attention to nested entities for which we obtained slightly lower inter-annotator agreement, confirming that annotating nested entities is somewhat more challenging. To our knowledge, the corpus is the first of its kind for consumer health text and is publicly available.

pdf bib
Annotating Logical Forms for EHR Questions
Kirk Roberts | Dina Demner-Fushman
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper discusses the creation of a semantically annotated corpus of questions about patient data in electronic health records (EHRs). The goal is provide the training data necessary for semantic parsers to automatically convert EHR questions into a structured query. A layered annotation strategy is used which mirrors a typical natural language processing (NLP) pipeline. First, questions are syntactically analyzed to identify multi-part questions. Second, medical concepts are recognized and normalized to a clinical ontology. Finally, logical forms are created using a lambda calculus representation. We use a corpus of 446 questions asking for patient-specific information. From these, 468 specific questions are found containing 259 unique medical concepts and requiring 53 unique predicates to represent the logical forms. We further present detailed characteristics of the corpus, including inter-annotator agreement results, and describe the challenges automatic NLP systems will face on this task.

pdf bib
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)
Anna Rumshisky | Kirk Roberts | Steven Bethard | Tristan Naumann
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)

pdf bib
Assessing the Corpus Size vs. Similarity Trade-off for Word Embeddings in Clinical NLP
Kirk Roberts
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)

The proliferation of deep learning methods in natural language processing (NLP) and the large amounts of data they often require stands in stark contrast to the relatively data-poor clinical NLP domain. In particular, large text corpora are necessary to build high-quality word embeddings, yet often large corpora that are suitably representative of the target clinical data are unavailable. This forces a choice between building embeddings from small clinical corpora and less representative, larger corpora. This paper explores this trade-off, as well as intermediate compromise solutions. Two standard clinical NLP tasks (the i2b2 2010 concept and assertion tasks) are evaluated with commonly used deep learning models (recurrent neural networks and convolutional neural networks) using a set of six corpora ranging from the target i2b2 data to large open-domain datasets. While combinations of corpora are generally found to work best, the single-best corpus is generally task-dependent.

2014

pdf bib
Decomposing Consumer Health Questions
Kirk Roberts | Halil Kilicoglu | Marcelo Fiszman | Dina Demner-Fushman
Proceedings of BioNLP 2014

pdf bib
Structuring Operative Notes using Active Learning
Kirk Roberts | Sanda Harabagiu | Michael Skinner
Proceedings of BioNLP 2014

pdf bib
Annotating Question Decomposition on Complex Medical Questions
Kirk Roberts | Kate Masterton | Marcelo Fiszman | Halil Kilicoglu | Dina Demner-Fushman
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents a method for annotating question decomposition on complex medical questions. The annotations cover multiple syntactic ways that questions can be decomposed, including separating independent clauses as well as recognizing coordinations and exemplifications. We annotate a corpus of 1,467 multi-sentence consumer health questions about genetic and rare diseases. Furthermore, we label two additional medical-specific annotations: (1) background sentences are annotated with a number of medical categories such as symptoms, treatments, and family history, and (2) the central focus of the complex question (a disease) is marked. We present simple baseline results for automatic classification of these annotations, demonstrating the challenging but important nature of this task.

2013

pdf bib
Recognizing Spatial Containment Relations between Event Mentions
Kirk Roberts | Michael A. Skinner | Sanda M. Harabagiu
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers

2012

pdf bib
UTD-SpRL: A Joint Approach to Spatial Role Labeling
Kirk Roberts | Sanda Harabagiu
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf bib
UTDHLT: COPACETIC System for Choosing Plausible Alternatives
Travis Goodwin | Bryan Rink | Kirk Roberts | Sanda Harabagiu
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf bib
EmpaTweet: Annotating and Detecting Emotions on Twitter
Kirk Roberts | Michael A. Roach | Joseph Johnson | Josh Guthrie | Sanda M. Harabagiu
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The rise of micro-blogging in recent years has resulted in significant access to emotion-laden text. Unlike emotion expressed in other textual sources (e.g., blogs, quotes in newswire, email, product reviews, or even clinical text), micro-blogs differ by (1) placing a strict limit on length, resulting radically in new forms of emotional expression, and (2) encouraging users to express their daily thoughts in real-time, often resulting in far more emotion statements than might normally occur. In this paper, we introduce a corpus collected from Twitter with annotated micro-blog posts (or “tweets”) annotated at the tweet-level with seven emotions: ANGER, DISGUST, FEAR, JOY, LOVE, SADNESS, and SURPRISE. We analyze how emotions are distributed in the data we annotated and compare it to the distributions in other emotion-annotated corpora. We also used the annotated corpus to train a classifier that automatically discovers the emotions in tweets. In addition, we present an analysis of the linguistic style used for expressing emotions our corpus. We hope that these observations will lead to the design of novel emotion detection techniques that account for linguistic style and psycholinguistic theories.

pdf bib
Annotating Spatial Containment Relations Between Events
Kirk Roberts | Travis Goodwin | Sanda M. Harabagiu
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

A significant amount of spatial information in textual documents is hidden within the relationship between events. While humans have an intuitive understanding of these relationships that allow us to recover an object's or event's location, currently no annotated data exists to allow automatic discovery of spatial containment relations between events. We present our process for building such a corpus of manually annotated spatial relations between events. Events form complex predicate-argument structures that model the participants in the event, their roles, as well as the temporal and spatial grounding. In addition, events are not presented in isolation in text; there are explicit and implicit interactions between events that often participate in event structures. In this paper, we focus on five spatial containment relations that may exist between events: (1) SAME, (2) CONTAINS, (3) OVERLAPS, (4) NEAR, and (5) DIFFERENT. Using the transitive closure across these spatial relations, the implicit location of many events and their participants can be discovered. We discuss our annotation schema for spatial containment relations, placing it within the pre-existing theories of spatial representation. We also discuss our annotation guidelines for maintaining annotation quality as well as our process for augmenting SpatialML with spatial containment relations between events. Additionally, we outline some baseline experiments to evaluate the feasibility of developing supervised systems based on this corpus. These results indicate that although the task is challenging, automated methods are capable of discovering spatial containment relations between events.

2011

pdf bib
Unsupervised Learning of Selectional Restrictions and Detection of Argument Coercions
Kirk Roberts | Sanda Harabagiu
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

pdf bib
UTDMet: Combining WordNet and Corpus Data for Argument Coercion Detection
Kirk Roberts | Sanda Harabagiu
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf bib
C-3: Coherence and Coreference Corpus
Cristina Nicolae | Gabriel Nicolae | Kirk Roberts
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The phenomenon of coreference, covering entities, their mentions and their properties, is intricately linked to the phenomenon of coherence, covering the structure of rhetorical relations in a discourse. A text corpus that has both phenomena annotated can be used to test hypotheses about their interrelation or to detect other phenomena. We present the process by which C-3, a new corpus, was obtained by annotating the Discourse GraphBank coherence corpus with entity and mention information. The annotation followed a set of ACE guidelines adapted to favor coreference and to include entities of unknown types in the annotation. Together with the corpus we offer a new annotation tool specifically designed to annotate entity and mention information within a simple and functional graphical interface that combines the “best of all worlds” from available annotation tools. The potential usefulness of C-3 is discussed, as well as an application in which the corpus proved to be a valuable resource.

pdf bib
A Linguistic Resource for Semantic Parsing of Motion Events
Kirk Roberts | Srikanth Gullapalli | Cosmin Adrian Bejan | Sanda Harabagiu
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents a corpus of annotated motion events and their event structure. We consider motion events triggered by a set of motion evoking words and contemplate both literal and figurative interpretations of them. Figurative motion events are extracted into the same event structure but are marked as figurative in the corpus. To represent the event structure of motion, we use the FrameNet annotation standard, which encodes motion in over 70 frames. In order to acquire a diverse set of texts that are different from FrameNet's, we crawled blog and news feeds for five different domains: sports, newswire, finance, military, and gossip. We then annotated these documents with an automatic FrameNet parser. Its output was manually corrected to account for missing and incorrect frames as well as missing and incorrect frame elements. The corpus, UTD-MotionEvent, may act as a resource for semantic parsing, detection of figurative language, spatial reasoning, and other tasks.

2009

pdf bib
Building an Annotated Textual Inference Corpus for Motion and Space
Kirk Roberts
Proceedings of the 2009 Workshop on Applied Textual Inference (TextInfer)

2008

pdf bib
Scaling Answer Type Detection to Large Hierarchies
Kirk Roberts | Andrew Hickl
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper describes the creation of a state-of-the-art answer type detection system capable of recognizing more than 200 different expected answer types with greater than 85% precision and recall. After describing how we constructed a new, multi-tiered answer type hierarchy from the set of entity types recognized by Language Computer Corporation’s CICEROLITE named entity recognition system, we describe how we used this hierarchy to annotate a new corpus of more than 10,000 English factoid questions. We show how an answer type detection system trained on this corpus can be used to enhance the accuracy of a state-of-the-art question-answering system (Hickl et al., 2007; Hickl et al., 2006b) by more than 7% overall.