Shared Task: Pedagogical Ability Assessment of AI-powered Tutors @ BEA 2025

Event Notification Type: Call for Participation
Location: BEA 2025, Wednesday, 23 April 2025
Country: Austria
City: Vienna
Contact: Ekaterina Kochmar (MBZUAI), Kaushal Kumar Maurya (MBZUAI), Kseniia Petukhova (MBZUAI), KV Aditya Srivatsa (MBZUAI), Justin Vasselli (Nara Institute of Science and Technology), Anaïs Tack (KU Leuven)
Submission Deadline: Wednesday, 9 April 2025

Conversational agents offer promising opportunities for education: they can
fulfill various roles (e.g., intelligent tutors and service-oriented
assistants) and pursue different objectives (e.g., improving student skills
and increasing instructional efficiency), among which serving as an AI
tutor is one of the most prevalent. Recent advances in Large Language
Models (LLMs) provide promising ways of building AI-based conversational
tutors that can generate human-sounding dialogues on the fly. The key
question posed in previous research, however, remains: *How can we test
whether state-of-the-art generative models are good AI teachers, capable of
replying to a student in an educational dialogue?*

In this shared task, we focus on educational dialogues between a student
and a tutor in the mathematical domain, grounded in student mistakes or
confusion, where the AI tutor aims to remediate those mistakes. The goal is
to evaluate the quality of tutor responses along four key dimensions of the
tutor's ability to (1) identify the student's mistake, (2) point to its
location, (3) provide the student with relevant pedagogical guidance, and
(4) make that guidance actionable. Dialogues used in this shared task
include dialogue contexts from the MathDial (Macina et al., 2023) and
Bridge (Wang et al., 2024) datasets, each ending with a student utterance
containing a mistake, together with a set of responses to that utterance
from a range of LLM-based tutors and, where available, human tutors, aimed
at mistake remediation and annotated for their quality.

*Tracks*
This shared task will include five tracks. Participating teams are welcome
to take part in any number of tracks.
- Track 1 - Mistake Identification: Participants are invited to develop
systems to detect whether tutors' responses recognize mistakes in students'
solutions.
- Track 2 - Mistake Location: Participants are invited to develop systems
to assess whether tutors' responses accurately point to genuine mistakes
and their locations in the students' responses.
- Track 3 - Pedagogical Guidance: Participants are invited to develop
systems to evaluate whether tutors' responses offer correct and relevant
guidance, such as an explanation, elaboration, hint, or examples.
- Track 4 - Actionability: Participants are invited to develop systems to
assess whether tutors' feedback is actionable, i.e., whether it makes clear
what the student should do next.
- Track 5 - Guess the Tutor Identity: Participants are invited to develop
systems to identify which tutors the anonymized responses in the test set
originated from.

*Participant registration*
All participants should register using the following link:
https://forms.gle/fKJcdvL2kCrPcu8X6

*Important dates*
All deadlines are 11:59 pm UTC-12 (anywhere on Earth).

- March 12, 2025: Development data release
- April 9, 2025: Test data release
- April 23, 2025: System submissions from teams due
- April 30, 2025: Evaluation of the results by the organizers
- May 21, 2025: System papers due
- May 28, 2025: Paper reviews returned
- June 9, 2025: Final camera-ready submissions
- July 31 and August 1, 2025: BEA 2025 workshop at ACL

*Shared task website*: https://sig-edu.org/sharedtask/2025

*Organizers*
- Ekaterina Kochmar (MBZUAI)
- Kaushal Kumar Maurya (MBZUAI)
- Kseniia Petukhova (MBZUAI)
- KV Aditya Srivatsa (MBZUAI)
- Justin Vasselli (Nara Institute of Science and Technology)
- Anaïs Tack (KU Leuven)

*Contact*: bea.sharedtask.2025@gmail.com