Call for Shared Task Participation on Machine Translation in Code-Switching Environments

Event Notification Type: 
Call for Participation
Abbreviated Title: 
Shared Task on MT for Code-Switching Data
Location: 
NAACL 2021
Contact: 
Shuguang Chen
Anirudh Srinivasan
Mona Diab
Sunayana Sitaram
Thamar Solorio
Submission Deadline: 
Thursday, 1 April 2021

Shared Tasks on Machine Translation in Code-Switching Settings

In the past few years, we have organized a series of shared tasks focused primarily on enabling technologies for code-switching, including language identification, part-of-speech tagging, and named entity recognition. This year, we are organizing a series of shared tasks on machine translation in code-switching settings, covering multiple language combinations and translation directions.

Task 1. Supervised Setting: MT for English → Hinglish

In this task, we provide gold-standard data for training and evaluating MT models that take English as input and generate Hinglish (code-switched Hindi-English) output.

Task 2. Unsupervised Setting: MT for multiple language combinations

We provide raw data without gold-standard translations. Participants are challenged to build systems that generate high-quality translations for the language pairs listed below. More language directions may be added soon.

Spanish-English → English
Spanish-English → Spanish
English → Spanish-English
Spanish → Spanish-English
Spanish → English
English → Spanish
Modern Standard Arabic-Egyptian Arabic → English
Modern Standard Arabic-Egyptian Arabic → Spanish

Evaluation

The leaderboard will rank systems by BLEU score. We also plan to conduct a smaller-scale human evaluation, the results of which will be presented at the workshop.
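For participants who want a rough local sanity check before submitting, the sketch below computes corpus-level BLEU (clipped n-gram precisions for n = 1 to 4, combined by geometric mean and scaled by a brevity penalty). It is a minimal illustration under stated assumptions, not the leaderboard's scorer: it assumes one reference per sentence, plain whitespace tokenization, and no smoothing. The function names `ngrams` and `corpus_bleu` are hypothetical. The official ranking may use different tokenization or settings, so scores from this sketch will not necessarily match the leaderboard.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale (hypothetical helper, not the official scorer).

    Assumptions: one reference per hypothesis, whitespace tokenization,
    no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order n
    totals = [0] * max_n   # number of hypothesis n-grams, per order n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    # Without smoothing, a zero precision at any order makes the geometric mean zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter overall than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

For example, `corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"])` gives roughly 53.7, while an output identical to its reference scores 100.0. Scores are computed over the whole corpus rather than averaged per sentence, which is how BLEU is normally reported.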

Datasets

The datasets are available at: https://ritual.uh.edu/lince/datasets

Timeline

Shared Task training data release: February 26
Shared Task test phase: April 1-7
Shared Task system description papers due: April 15
Shared Task reviews back to authors: April 22
Camera-ready papers due: April 29
Workshop date: June 11