BioNLP Workshop
IMPORTANT DATES
- Workshop: Co-located with ACL 2026, which will be held in San Diego, CA, July 2-7, 2026. More information will be announced soon.
WORKSHOP TOPIC AND CONTEXT
Interest in biomedical and clinical language continues to broaden due to unprecedented advances supported by success stories in improving health through supporting patients and clinicians. Access to biomedical information has become easier, and more people generate and access health-related text. Only language technologies can enable and support adequate use of biomedical and clinical text in most use cases. Advances in pre-trained language models and foundation models have led all parties involved in healthcare to turn to language technologies in the hope of getting tangible support in satisfying information needs, facilitating research, and improving clinical documentation and healthcare. In addition to exposing BioNLP researchers to mainstream ACL research, the workshop is a venue for informing mainstream ACL researchers about the fast-growing and important domain of biomedical and clinical language processing.
BioNLP 2026 will focus on evaluation frameworks and metrics that reflect the needs of health-related use cases and provide a good estimate of the reliability of the proposed solutions. BioNLP 2026 will continue focusing on transparency of the generative approaches and factuality of the generated text. Language processing that supports DEIA (Diversity, Equity, Inclusion and Accessibility) continues to be of utmost importance. The work on detection and mitigation of bias and misinformation continues to be paramount. Research on languages other than English, particularly under-represented languages, and on health disparities is always of interest to BioNLP. Other areas of interest include, but are not limited to:
- Extraction of complex relations and events;
- Discourse analysis; Anaphora & coreference resolution;
- Question Answering; Summarization; Text simplification;
- Resources and strategies for system testing and evaluation;
- Synthetic data generation & data augmentation;
- Translating NLP research into practice: tangible explainable results of biomedical language processing applications;
- Reproducibility of the published findings.
SHARED TASKS
BioNLP has a long-standing tradition of sponsoring Shared Tasks. This year, we invited SIGBioMed members to submit a description of a shared task to be included with the BioNLP proposal. We received four strong, detailed task descriptions, which were reviewed by the workshop organizers. These well-defined and timely tasks are briefly described below.
MedExACT
This task involves the detection and labeling of medical decisions in ICU discharge summaries. Evaluation metrics emphasize both accuracy and fairness across demographic and disease subgroups: systems are scored at the span and token levels, and stratified analyses measure robustness against biases in sex, race, English proficiency, and disease type. Results from baseline models such as RoBERTa indicate that the task is challenging. Participants will be supported with expedited access to MedDec through PhysioNet, a public leaderboard, and a starter kit in Python. The training and validation splits of MedDec are currently available on PhysioNet; the test split will remain withheld until the evaluation phase.
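To make the idea of a stratified span-level analysis concrete, the following is a minimal sketch, not the official MedExACT scorer. It assumes exact-match span F1 and a hypothetical record layout (a subgroup attribute such as "sex" plus gold and predicted spans as (start, end, label) tuples). The label names DECISION_A and DECISION_B and all data are invented for illustration.

```python
from collections import defaultdict


def span_f1(gold_spans, pred_spans):
    """Exact-match F1 between two collections of spans."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def stratified_span_f1(records, attribute):
    """Pool spans per subgroup and score each subgroup separately."""
    gold_by_group = defaultdict(set)
    pred_by_group = defaultdict(set)
    for doc_id, rec in enumerate(records):
        group = rec[attribute]
        # Prefix each span with the document index so that spans from
        # different documents never collide.
        gold_by_group[group].update((doc_id, *s) for s in rec["gold"])
        pred_by_group[group].update((doc_id, *s) for s in rec["pred"])
    groups = set(gold_by_group) | set(pred_by_group)
    return {g: span_f1(gold_by_group[g], pred_by_group[g]) for g in groups}


# Hypothetical toy records; not drawn from MedDec.
records = [
    {"sex": "F",
     "gold": [(0, 5, "DECISION_A"), (10, 14, "DECISION_B")],
     "pred": [(0, 5, "DECISION_A")]},
    {"sex": "M",
     "gold": [(3, 8, "DECISION_A")],
     "pred": [(3, 8, "DECISION_A"), (20, 25, "DECISION_B")]},
    {"sex": "F",
     "gold": [(1, 4, "DECISION_B")],
     "pred": [(1, 4, "DECISION_A")]},
]

scores = stratified_span_f1(records, "sex")
# Gap between the best- and worst-scoring subgroup, one simple fairness signal.
gap = max(scores.values()) - min(scores.values())
print(scores, gap)
```

A large gap between subgroups suggests that a system's performance is uneven across the population, even when its overall score looks acceptable.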
Detecting Psychological Defense Mechanisms in Conversations
This task is the first benchmark to capture psychologically grounded constructs (defense mechanisms) in real conversational data. It focuses on fine-grained coping strategies in dialogue, supporting research at the intersection of clinical psychology, counseling, and NLP. Given the dialogue history and a target seeker utterance, participants will (Task A) predict whether any defense is present and (Task B) classify the specific type (e.g., Disavowal, Obsessional, Highly Adaptive). The organizers provide an expert-annotated dataset, train/dev/test splits, baselines, and a starter kit. Submissions will be evaluated on a held-out test set using F1 for Task A and Macro-F1 for Task B.
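As an illustration of the two metrics named above, here is a minimal pure-Python sketch of F1 for a binary decision (Task A) and Macro-F1 over defense types (Task B). It is not the organizers' evaluation script: the data are invented, and the label set is assumed to consist only of the three example types mentioned in the description.

```python
def f1_for_label(gold, pred, label):
    """F1 for a single label, treating it as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-label F1, so rare labels count equally."""
    return sum(f1_for_label(gold, pred, lab) for lab in labels) / len(labels)


# Task A (hypothetical data): 1 = a defense is present, 0 = none.
gold_a = [1, 0, 1, 1, 0]
pred_a = [1, 1, 1, 0, 0]
task_a_f1 = f1_for_label(gold_a, pred_a, label=1)

# Task B (hypothetical data): assumed three-way label set.
labels_b = ["Disavowal", "Obsessional", "Highly Adaptive"]
gold_b = ["Disavowal", "Obsessional", "Highly Adaptive", "Disavowal"]
pred_b = ["Disavowal", "Disavowal", "Highly Adaptive", "Obsessional"]
task_b_macro_f1 = macro_f1(gold_b, pred_b, labels_b)

print(task_a_f1, task_b_macro_f1)
```

Because Macro-F1 averages per-class scores without weighting by frequency, a system that ignores an infrequent defense type is penalized as heavily as one that misses a common type.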
BioGen
The task focuses on grounding answers with reference attribution to mitigate generation of false statements by LLMs when answering biomedical questions. BioGen 2026 introduces generation of multimodal answers from textual and visual sources with citations, leveraging PubMed and HealthVidQA as multimodal sources. The test set is based on the information requests submitted by self-identified non-clinicians to the MedlinePlus service provided by the National Library of Medicine. The evaluation will leverage BioACE, an automated metric that strongly correlates with human evaluation on the BioGen 2024 textual dataset.
Clinical Skill QA
This task extends evaluation to a multimodal setting. Given an image of a medical student’s procedure, a question, and four answer options, a participant-trained model must generate the correct response. The dataset will be constructed from approximately 80 video clips of medical student clinical procedures, collected from a partner medical school. This task provides a unified framework for benchmarking, diagnosing, and advancing LLM capabilities for both clinical decision support and medical training. Evaluation will follow a multiple-choice QA setup with accuracy as the primary metric, with additional stratified analyses by skill type and modality.
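The multiple-choice setup described above could be scored along the following lines. This is a sketch under stated assumptions rather than the official scorer: the item layout, the option letters A to D, and the skill names are all hypothetical.

```python
from collections import defaultdict


def accuracy_by(items, key):
    """Accuracy computed separately for each value of the given field."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item[key]] += 1
        correct[item[key]] += int(item["pred"] == item["gold"])
    return {group: correct[group] / total[group] for group in total}


# Hypothetical items: each has one gold option out of four (A-D).
items = [
    {"skill": "suturing", "gold": "B", "pred": "B"},
    {"skill": "suturing", "gold": "A", "pred": "C"},
    {"skill": "injection", "gold": "D", "pred": "D"},
    {"skill": "injection", "gold": "A", "pred": "A"},
]

overall_accuracy = sum(i["pred"] == i["gold"] for i in items) / len(items)
per_skill_accuracy = accuracy_by(items, "skill")
print(overall_accuracy, per_skill_accuracy)
```

With four answer options, uniform random guessing yields an expected accuracy of 0.25, which is a useful floor when reading per-skill results.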