Call For Participation: Shared Task on Automatic Evaluation for Code-Switched Text Generation

Event Notification Type: 
Call for Participation
Location: 
NAACL
Saturday, 3 May 2025
State: 
New Mexico
Country: 
USA
City: 
Albuquerque
Contact: 
Genta Winata
Sudipta Kar
Marina Zhukova
Submission Deadline: 
Friday, 21 February 2025

Task Description

This shared task focuses on developing automatic evaluation metrics for code-switched (CS) text generation. Participants will build systems that assess the quality of synthetically generated CS text in terms of both fluency and accuracy. This is crucial because:

  • Scarcity of CS Data: CS text data is limited, making automatic generation vital for data augmentation and improving model performance.
  • Growing Demand: Demand for CS text is increasing, particularly in dialogue systems and chatbots, where it enables more natural and inclusive interactions.
  • Lack of Robust Evaluation: Current methods for evaluating CS text are insufficient, hindering progress in this field.

This shared task aims to address this gap and drive further research in automatic evaluation metrics for CS text generation.

Goal

The goal of this shared task is to encourage the development of robust and reliable automatic evaluation metrics for CS text generation, ultimately leading to more fluent and accurate CS language models.

Important Dates

  • Jan 23: Platform release (ready for submissions)
  • Feb 14: Test set release
  • Feb 21: Results submission
  • Feb 28: Paper submission
  • Mar 8: Acceptance notification

Languages Supported

  • Public Leaderboard: English-Hindi, English-Tamil, English-Malayalam
  • Private Leaderboard: English-Indonesian, Indonesian-Javanese, Singlish (English-Chinese)

Metric

Accuracy: Systems will be evaluated on how accurately they predict human preferences for CS text. This is measured by comparing the system's preferred sentence (Sent 1, Sent 2, or Tie) with the human annotations in the CSPref dataset.
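For concreteness, here is a minimal sketch of the accuracy computation, assuming predictions and gold labels are parallel lists of "Sent 1" / "Sent 2" / "Tie" strings (the helper name is ours, not part of the task infrastructure):

    # Minimal sketch: fraction of instances where the system's preference
    # matches the human annotation in the CSPref "Chosen" field.
    def preference_accuracy(predictions, gold):
        assert len(predictions) == len(gold), "prediction/gold length mismatch"
        correct = sum(p == g for p, g in zip(predictions, gold))
        return correct / len(gold)

    # Example: 2 of 3 predictions match the human preference -> ~0.667
    print(preference_accuracy(["Sent 1", "Tie", "Sent 2"],
                              ["Sent 1", "Sent 2", "Sent 2"]))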

Dataset

The CSPref dataset will be used for this task. It contains the following fields:

  • Original L1: English sentences
  • Original L2: Hindi, Tamil, or Malayalam sentences
  • Sent 1, Sent 2: Two different CS generations based on the original sentences
  • Chosen: Human annotation indicating the preferred sentence (Sent 1, Sent 2, or Tie)
  • Lang: Language pair
The data is available at https://huggingface.co/datasets/garrykuwanto/cspref; a minimal loading sketch follows.
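As a rough sketch (not official starter code), the dataset can be loaded with the Hugging Face datasets library; the split name "train" is an assumption, so check the dataset card for the actual splits and column names:

    from datasets import load_dataset

    # Load CSPref from the Hugging Face Hub. The "train" split name is an
    # assumption -- consult the dataset card for the released splits.
    cspref = load_dataset("garrykuwanto/cspref", split="train")

    # Print one instance to see the fields described above
    # (Original L1/L2, Sent 1, Sent 2, Chosen, Lang).
    print(cspref[0])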

Evaluation

  • Systems will be ranked on a public leaderboard based on their accuracy in predicting human preferences on the English-Hindi, English-Tamil, and English-Malayalam language pairs.
  • A private leaderboard will evaluate system performance on unseen language pairs (English-Indonesian, Indonesian-Javanese, Singlish) to assess generalization ability.
  • Final rankings on each leaderboard will be determined by this prediction accuracy.

Submission

Participants will submit their system's prediction for each instance in the test set, indicating the preferred sentence (Sent 1, Sent 2, or Tie).
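The exact submission format will be specified on the platform; purely as an illustration, here is a hypothetical sketch that writes one prediction per test instance to a CSV file (the column names and layout are our assumptions):

    import csv

    # Hypothetical submission writer: the platform's actual file format is
    # not specified here, so the "id"/"prediction" columns are assumptions.
    def write_submission(predictions, path="predictions.csv"):
        # predictions: list of "Sent 1" / "Sent 2" / "Tie" strings,
        # ordered to match the test set instances.
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["id", "prediction"])
            for i, pred in enumerate(predictions):
                writer.writerow([i, pred])

    write_submission(["Sent 1", "Tie", "Sent 2"])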