1st Open Language Data Initiative shared task at WMT24

Event Notification Type: Call for Papers
Abbreviated Title: 1st OLDI shared task
Event Dates: Tuesday, 12 November 2024 to Wednesday, 13 November 2024
Submission Deadline: Tuesday, 20 August 2024

We are excited to announce the 1st edition of the Open Language Data Initiative shared task at WMT24, co-located with EMNLP 2024.

The Open Language Data Initiative (OLDI, https://oldi.org) empowers language communities around the globe to contribute to the datasets that form the foundation of today’s machine translation and natural language processing work.

**Task description**

The goal of this shared task is to expand OLDI’s open datasets to more languages. In this first iteration, we are soliciting contributions to three different tracks:

* Track 1: The addition of new languages, varieties or dialects to the MT evaluation dataset FLORES+ <https://github.com/openlanguagedata/flores>, or substantial improvements to existing data.
* Track 2: The addition of new languages, varieties or dialects to the MT Seed dataset <https://github.com/openlanguagedata/seed>, or substantial improvements to existing data.
* Track 3: The contribution of high-quality, human-verified monolingual text data in under-resourced languages.

Participants will be asked to submit a dataset card and a systems description paper to WMT24, carefully describing the data collection and quality assurance process. Additionally, participants are strongly encouraged to provide experimental validation of the quality of the data they are submitting. Please see <https://oldi.org/guidelines> for the full contribution guidelines.

**Important Dates**

* Indication of interest (recommended): 20th May 2024
* Paper and dataset submission deadline: 20th August 2024 (follows WMT/EMNLP)
* Notification of acceptance: 20th September 2024 (follows WMT/EMNLP)
* Conference: 12-13 November 2024 (follows WMT/EMNLP)


**Organizers**

* Antonios Anastasopoulos, George Mason University
* Laurie Burchell, University of Edinburgh
* Christian Federmann, Microsoft
* Jean Maillard, FAIR, Meta
* Philipp Koehn, Johns Hopkins University
* Skyler Wang, UC Berkeley

**More information**

For more information, please refer to the following pages:

* <https://www2.statmt.org/wmt24/open-data.html>
* <https://oldi.org/>