BioNLP Workshop

SIGBIOMED | BioNLP 2025

IMPORTANT DATES

(All submission deadlines are 11:59 p.m. UTC-12:00 “anywhere on Earth”)

Paper submission deadline: ~~April 17 (Friday)~~ April 20 (Monday), 2026
Notification of acceptance: May 8 (Friday), 2026
Camera-ready paper due: May 19 (Tuesday), 2026
Pre-recorded video due (hard deadline): June 4, 2026
Workshop: July 3 - 4, 2026, co-located with ACL 2026 in San Diego, CA

Schedule

Invited Talk

Annika Marie Schoene, PhD, Assistant Professor, Bouvé College of Health Sciences

AI Safety in Healthcare: Ethical and Technical Considerations

Abstract: The rapid integration of artificial intelligence into healthcare, spanning ambient documentation, virtual nursing, and medical imaging, has outpaced the development of robust oversight mechanisms. While the global AI-in-healthcare market is projected to exceed $868 billion by 2030, incidents of algorithmic bias and unequal care delivery have exposed significant gaps between responsible AI principles and clinical implementation. This talk examines AI safety as a foundational pillar of responsible AI in health contexts, arguing that fairness, transparency, accountability, and human autonomy cannot remain aspirational without corresponding technical and governance infrastructure. I survey existing frameworks and identify a critical missing layer of concrete, operationalizable evaluation methods that bridge ethical principles and technical practice. To address this gap, I present work toward an actionable framework for responsible AI integration in clinical settings, grounded in the AI Ethics Box, a structured taxonomy derived from biomedical ethics. The framework maps ethical domains to specific technical tests, supporting both pre- and post-deployment evaluation. I also describe an ongoing co-design process with clinical and community stakeholders to ensure real-world feasibility. I conclude that AI in healthcare demands genuinely interdisciplinary research to build tools that improve health outcomes without compromising patient safety.

Biography: Annika Marie Schoene, PhD is a computer scientist and researcher in AI safety, working on the evaluation, robustness, and security of large-scale AI systems, including large language models. She develops technical methods and evaluation frameworks to identify and mitigate high-risk behaviors such as jailbreaks, harmful outputs, and unsafe system behavior, with the goal of enabling the safe and trustworthy deployment of AI in public health, health systems, and healthcare settings. She is currently an Assistant Professor in the Department of Public Health and Health Sciences and Technical Lead for the Responsible AI Practice at Northeastern University. Beyond her core research, she works across academic, industry, public-sector and not-for-profit settings to generate evidence that supports non-technical stakeholders and policymakers in making informed decisions that reduce algorithmic harm and inform organizational decision-making. She is also a Visiting Scientist at MaineHealth and the University of Southampton (UK) and a Faculty Fellow at the Institute for Social Justice and Health Equity, and serves as a Scientific Expert Advisor at Meta on AI safety. She holds a PhD in Computer Science from the University of Hull (UK). During her doctoral training, she conducted research on machine and deep learning methods for analyzing complex real-world text and interned at IBM Research (UK), continuing this collaboration throughout her PhD and postdoctoral work. She completed her postdoctoral training at the University of Manchester’s National Centre for Text Mining (NaCTeM), where she worked on natural language processing in health-related contexts. Prior to her current role, she was a Research Scientist at the Institute for Experiential AI (EAI), working primarily on research involving the development and evaluation of AI methods in applied health contexts.

Program Committee

Abdulrahman AAl Abdulsalam, Sultan Qaboos University, Oman
Sophia Ananiadou, National Centre for Text Mining and University of Manchester, UK
Rohit Agarwal, UiT The Arctic University of Norway
Aizierjiang Aiersilan, The George Washington University, USA
Ebrahim Alharbi, The University of Sheffield, UK
Daniel Andrade, Hiroshima University, Japan
Eiji Aramaki, University of Tokyo, Japan
Niloofar Arazkhani, University of Pittsburgh, USA
Steven Au, University of California, Santa Cruz, USA
Davis Bartels, National Library of Medicine, USA
Hadas Ben Atya, Technion -- Israel Institute of Technology, Haifa, Israel
Krushil Bhojani, SUNY Polytechnic Institute, USA
Madeline Bittner, National Library of Medicine, USA
Leandra Budau, Toronto Metropolitan University, Canada
Ioana Buhnila, Center for Data Science in Humanities, Chosun University, South Korea
Leonardo Campillos Llanos, Universidad Autonoma de Madrid, Spain
Liuliu Chen, University of Melbourne, Australia
Brandon Colelough, National Library of Medicine, USA
Brian Connolly, Cincinnati Children's Hospital Medical Center, USA
An Dao, The University of Tokyo, Japan
Dina Demner-Fushman, National Library of Medicine, USA
Oumaima El Khettari, Aix-Marseille University, France
Mohamed Elmofty, Humboldt University of Berlin, Germany
Pietro Ferrazzi, Fondazione Bruno Kessler - University of Padova, Italy
Kathleen C. Fraser, National Research Council Canada
Natalia Grabar, CNRS, France
Cyril Grouin, LIMSI - CNRS, France
Deepak Gupta, US National Library of Medicine
Thierry Hamon, LIMSI-CNRS, France
Yikun Han, University of Illinois, Urbana Champaign, USA
Keno Hanken, Independent Researcher, EU
Moustafa Hassan, Qatar University
Sam Henry, Christopher Newport University, USA
Ben Holgate, King's College London
Bahar Ilgen, Robert Koch Institute, Germany
Antonio Jimeno Yepes, RMIT, Australia
Vani Kanjirangat, IDSIA, Switzerland
Sarvnaz Karimi, CSIRO, Australia
Nazmul Kazi, Montana State University, USA
Halil Kilicoglu, University of Illinois at Urbana-Champaign
Won Gyu Kim, National Library of Medicine, USA
Ashwin Kirubakaran, Edison Academy Magnet School, New Jersey, USA
Gaurav Kumar, University of California San Diego, USA
Thomas Labbe, Orange Research, USA
Andre Lamurias, University of Lisbon, Portugal
Vojtech Lanz, Charles University, Czech Republic
Majid Latifi, University of York, York, UK
Alberto Lavelli, FBK-ICT, Italy
Robert Leaman, US National Library of Medicine
Lung-Hao Lee, National Central University, Taiwan
Ulf Leser, Humboldt Universitat zu Berlin, Germany
Yuan Liang, Queen Mary University of London
Siting Liang, German Research Center for Artificial Intelligence
Livia Lilli, Fondazione Policlinico Universitario Agostino Gemelli IRCCS and Catholic University of the Sacred Heart, Rome, Italy
Ying-Jia Lin, Chang Gung University, Taiwan
Jinghui Liu, CSIRO, Australia
Fabien Maury, Inserm, Universite Paris Cite, France
Makoto Miwa, Toyota Technological Institute, Japan
Rodrigo Morales-Sanchez, Universidad Nacional de Educacion a Distancia (UNED), Spain
Hyeonyeong Nam, Korea University, South Korea
Claire Nedellec, INRA, France
Guenter Neumann, DFKI, Saarland, Germany
Aurelie Neveol, LIMSI - CNRS, France
Brian Ondov, Yale University, USA
John E. Ortega, Northeastern University, Massachusetts, USA
Olga Pelloni, UiT The Arctic University of Norway
Noon Pokaratsiri Goldstein, DFKI, Germany
Juan Prieto, Universidad de Los Andes, Bogota, Colombia
Francois Remy, Ghent University, Belgium
Francisco J. Ribadas-Pena, University of Vigo, Spain
Fabio Rinaldi, IDSIA, Switzerland
Kirk Roberts, UTHealth, Houston, Texas, USA
Roland Roller, DFKI GmbH, Berlin, Germany
Mahule Roy, University of Oxford, UK
Nourah Salem, University of Colorado, USA
Vicente Ivan Sanchez Carmona, Ricoh Software Research Center (Beijing) Co., Ltd
Mustafa Sikder, U.S. Food and Drug Administration
Saurabh Singh, Oracle Health, USA
Ujjwal Singh, Max Healthcare Institute Limited, India
Sarvesh Soni, National Library of Medicine, USA
Adam Sutton, King's College London, UK
Mario Sanger, Humboldt Universitat zu Berlin, Germany
Sanya Taneja, Johnson and Johnson Innovative Medicine, USA
Karin Verspoor, RMIT University, Australia
Xing David Wang, Humboldt University of Berlin, Germany
Nathan M. White, James Cook University, Australia
Dongfang Xu, Cedars-Sinai, USA
Ken Yano, The National Institute of Advanced Industrial Science and Technology (AIST), Japan
Hyunwoo Yoo, Drexel University, USA
Xiao Yu Cindy Zhang, University of British Columbia, Canada
Jingqing Zhang, Pangaea Data Limited, UK, USA
Angelo Ziletti, Bayer AG, Germany
Pierre Zweigenbaum, LIMSI - CNRS, France
Secondary Reviewers:
Wei-Chun Chen, Chang Gung University, Taiwan
Joseph Cornelius, Dalle Molle Institute for Artificial Intelligence Research -- IDSIA USI-SUPSI, Switzerland
Samuele Garda, Humboldt University of Berlin, Germany
Athlene Jones, University of North Florida, USA
Sangwoon Lee, Korea University, South Korea
Hima Bindu Nandyala, SUNY Polytechnic Institute, USA
Oguz Serbetci, Humboldt University of Berlin, Germany
Vishwaa Shah, University of North Florida, USA
Mayank Nilesh Waghmare, SUNY Polytechnic Institute, USA
Zhangfei Yang, The George Washington University, USA

SUBMISSION INSTRUCTIONS

Two types of submissions are invited: full papers and short papers.

Full papers should not exceed eight (8) pages of text, plus unlimited references. THE FINAL VERSIONS FOLLOW ACL RULES on +1 page. These are intended to be reports of original research. BioNLP aims to be the forum for interesting, innovative, and promising work involving biomedicine and language technology, whether or not yielding high performance at the moment. This by no means precludes our interest in and preference for mature results, strong performance, and thorough evaluation. Both types of research and combinations thereof are encouraged.

Short papers may consist of up to four (4) pages of content, plus unlimited references. THE FINAL VERSIONS FOLLOW ACL RULES on +1 page. Appropriate short paper topics include preliminary results, application notes, descriptions of work in progress, etc.

Electronic Submission: Submissions must be electronic and in PDF format, using the Softconf START conference management system. Submissions need to be anonymous.

Submission site for the workshop: https://softconf.com/acl2026/bionlp2026

Submission site for the Shared Tasks: https://softconf.com/acl2026/bionlp2026-st

Please follow the ACL formatting guidelines: https://github.com/acl-org/acl-style-files

Dual submission policy: papers may NOT be submitted to the BioNLP workshop if they are or will be concurrently submitted to another meeting or publication.

WORKSHOP TOPIC AND CONTEXT

Interest in biomedical and clinical language continues to broaden due to unprecedented advances supported by success stories in improving health through supporting patients and clinicians. Access to biomedical information has become easier, and more people generate and access health-related text. Only language technologies can enable and support adequate use of biomedical and clinical text in most use cases. The advances in pre-trained language models and foundation models make all parties involved in healthcare turn to language technologies in the hope of getting tangible support in satisfying information needs, facilitating research, and improving clinical documentation and healthcare. In addition to exposing BioNLP researchers to mainstream ACL research, the workshop is a venue for informing mainstream ACL researchers about the fast-growing and important domain of biomedical / clinical language processing.

BioNLP 2026 will focus on evaluation frameworks and metrics that reflect the needs of health-related use cases and provide a good estimate of reliability of the proposed solutions. BioNLP 2026 will continue focusing on transparency of the generative approaches and factuality of the generated text. Language processing that supports DEIA (Diversity, Equity, Inclusion and Accessibility) continues to be of utmost importance. The work on detection and mitigation of bias and misinformation continues to be paramount. Research in languages other than English, particularly, under-represented languages, and health disparities are always of interest to BioNLP. Other areas of interest include, but are not limited to:

Extraction of complex relations and events;
Discourse analysis; Anaphora & coreference resolution;
Question Answering; Summarization; Text simplification;
Resources and strategies for system testing and evaluation;
Synthetic data generation & data augmentation;
Translating NLP research into practice: tangible explainable results of biomedical language processing applications;
Reproducibility of the published findings.

SHARED TASKS

BioNLP has a long-standing tradition of sponsoring Shared Tasks. This year, we invited SIGBioMed members to submit a description of a shared task to be included with the BioNLP proposal. We received four strong detailed descriptions of the tasks, which were reviewed by the workshop organizers. These well-defined and timely tasks are briefly described below.

MedExACT

This task involves detection and labeling of medical decisions in ICU discharge summaries, with evaluation metrics emphasizing both accuracy and fairness across demographic and disease subgroups at the span and token levels, as well as through stratified analyses to measure robustness against biases in sex, race, English proficiency, and disease type. Baseline models such as RoBERTa indicated the complexity of the task, and participants will be supported with expedited access to MedDec through PhysioNet, a public leaderboard, and a starter kit in Python. The training and validation splits of MedDec are currently available on PhysioNet, while the test split has not been released and will remain withheld until the evaluation phase.

Please join the Google Group to receive notifications and register your team: https://groups.google.com/g/medexact-acl2026.
If you have any questions, feel free to send an email to medexact-acl2026+owner@googlegroups.com.
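
The stratified analysis mentioned for MedExACT can be illustrated with a minimal sketch. It is not the official scorer: the task description does not specify the exact metrics, so token-level accuracy, the record layout, the tag values, and the function names below are all assumptions made for illustration.

```python
from collections import defaultdict

def stratified_token_accuracy(records):
    """Token-level accuracy per subgroup.

    `records` is an iterable of (subgroup, gold_tags, pred_tags) tuples.
    This layout is hypothetical, not the MedDec format.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for subgroup, gold_tags, pred_tags in records:
        if len(gold_tags) != len(pred_tags):
            raise ValueError("gold and predicted tag sequences must align")
        total[subgroup] += len(gold_tags)
        correct[subgroup] += sum(g == p for g, p in zip(gold_tags, pred_tags))
    # Skip subgroups that contributed no tokens to avoid division by zero.
    return {g: correct[g] / total[g] for g in total if total[g]}

def worst_case_gap(per_group):
    """Difference between the best and worst subgroup scores."""
    if not per_group:
        raise ValueError("no subgroup scores to compare")
    scores = list(per_group.values())
    return max(scores) - min(scores)

# Illustrative data only: two hypothetical subgroups with BIO-style tags.
example = [
    ("group_a", ["O", "B", "I"], ["O", "B", "O"]),
    ("group_b", ["O", "O"], ["O", "O"]),
    ("group_a", ["B"], ["B"]),
]
scores = stratified_token_accuracy(example)
gap = worst_case_gap(scores)
```

A small gap between subgroups suggests more uniform performance; the leaderboard may use different metrics or aggregation.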

PsyDefDetect

Detecting Psychological Defense Mechanisms in Conversations. This task focuses on classifying Seeker’s utterances in supportive conversations into specific Psychological Defense Levels based on the Defense Mechanism Rating Scales (DMRS) framework. The benchmark addresses the challenge of capturing subtle linguistic cues of deep-seated psychological mechanisms within highly informal and context-dependent emotional dialogues. This initiative supports research at the intersection of clinical psychology and NLP, aiming to operationalize complex psychological constructs for computational analysis. Participating systems will be ranked using Accuracy, Precision, Recall, and F1-score.

Task Homepage: https://psydefdetect-shared-task.github.io/
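
As a rough illustration of the four ranking metrics named above, the sketch below computes accuracy together with precision, recall, and F1. The description does not say how per-class scores are averaged, so macro-averaging is an assumption here, as are the integer labels and the function name.

```python
from collections import Counter

def classification_scores(gold, pred):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Macro-averaging is an assumption; the official scorer may differ.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # true label g was missed

    def safe_div(num, den):
        return num / den if den else 0.0

    precisions = [safe_div(tp[l], tp[l] + fp[l]) for l in labels]
    recalls = [safe_div(tp[l], tp[l] + fn[l]) for l in labels]
    f1s = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Hypothetical defense-level labels encoded as integers.
result = classification_scores([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
```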

MedGenVidQA

The task focuses on grounding answers with reference attribution to mitigate generation of false statements by LLMs when answering biomedical questions. BioGen 2026 introduces generation of multimodal answers from textual and visual sources with citations, leveraging PubMed and HealthVidQA as multimodal sources. The test set is based on the information requests submitted by self-identified non-clinicians to the MedlinePlus service provided by the National Library of Medicine. The evaluation will leverage BioACE, an automated metric that strongly correlates with human evaluation on the BioGen 2024 textual dataset.

The CodaBench registration and submission portal for the MedGenVidQA shared task is now open.

Participants can access the test dataset and submit their system runs for evaluation through the portal.

Task A: Multimodal Retrieval (MMR) https://www.codabench.org/competitions/13989/

Task B: Multimodal Answer Generation (MAG) https://www.codabench.org/competitions/14014/

Task C: Visual Answer Localization (VAL) https://www.codabench.org/competitions/14015/

Submission Deadline: March 31, 2026

More details can be found on the shared task webpage: https://medgenvidqa.github.io/

ClinicalSkillQA

ClinSkill QA formulates clinical skill understanding and continuous perception for clinical skill assessment as an ordering task: the MLLM is required to arrange shuffled key frames into a coherent sequence of clinical actions and to provide explanations for the resulting order. The dataset is constructed from video clips of medical student clinical procedures, collected from Zhongnan Hospital of Wuhan University and Cofun. For evaluation, we use Task Accuracy (exact ordering) and Pairwise Accuracy (the fraction of adjacent pairs correctly ordered) for the ordering results, and BERTScore as well as an LLM-as-judge (G-Eval) for assessing the quality of the ordering explanations.

Dataset access: After joining our Google Group (https://groups.google.com/g/clinskill-qa2026), participants can download the dataset via Google Drive: https://drive.google.com/file/d/1PdaBthI3OeDEbMlxUAxKqsKWw_t6hdJ2/view?usp=drive_link

Additional information: https://whunextgen.github.io/ClinicalskillQA/
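
The two ordering metrics described for ClinicalSkillQA can be sketched as follows. This is one plausible reading, not the official scorer: Pairwise Accuracy is assumed to check, for each adjacent pair in the reference order, whether the first frame still precedes the second in the prediction. The frame identifiers and function names are hypothetical.

```python
def task_accuracy(pred, gold):
    """1.0 if the predicted order matches the reference exactly, else 0.0."""
    return 1.0 if list(pred) == list(gold) else 0.0

def pairwise_accuracy(pred, gold):
    """Fraction of adjacent reference pairs whose relative order is preserved.

    Assumed reading: for each adjacent pair (a, b) in `gold`, count it as
    correct when a appears before b in `pred`.
    """
    pred, gold = list(pred), list(gold)
    if sorted(map(str, pred)) != sorted(map(str, gold)):
        raise ValueError("pred must be a permutation of gold")
    if len(gold) < 2:
        return 1.0  # no adjacent pairs to get wrong
    position = {frame: i for i, frame in enumerate(pred)}
    pairs = list(zip(gold, gold[1:]))
    correct = sum(1 for a, b in pairs if position[a] < position[b])
    return correct / len(pairs)

# Hypothetical key-frame identifiers: frames B and C are swapped.
gold_order = ["A", "B", "C", "D"]
pred_order = ["A", "C", "B", "D"]
exact = task_accuracy(pred_order, gold_order)
pairwise = pairwise_accuracy(pred_order, gold_order)
```

Under this reading a single swap gives zero Task Accuracy but still earns partial credit on Pairwise Accuracy.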

Workshop Organizers

* Dina Demner-Fushman, US National Library of Medicine
* Sophia Ananiadou, National Centre for Text Mining and University of Manchester, UK
* Kirk Roberts, UTHealth, Houston, Texas
* Jun-ichi Tsujii, National Institute of Advanced Industrial Science and Technology, Japan
