Call for Participation: AmericasNLP 2024 Shared Tasks (Yes, Two of Them!)

Event Notification Type: 
Call for Participation
Abbreviated Title: 
AmericasNLP2024 ST
Location: 
NAACL 2024
Friday, 21 June 2024
State: 
Mexico City
Country: 
Mexico
City: 
Mexico City
Contact: 
Manuel Mager
Luis Chiruzzo
Submission Deadline: 
Wednesday, 10 April 2024

The AmericasNLP 2024 organizers are excited to announce that, in 2024, the workshop will feature two independent shared tasks on Indigenous languages! Please find the two calls for participation below.

#### AmericasNLP 2024 Shared Task 1: Machine Translation Systems for Indigenous Languages ####

First Call for Participation

The AmericasNLP 2024 Shared Task on machine translation systems for Indigenous languages is a competition aimed at encouraging the development of machine translation (MT) systems for Indigenous languages of the Americas. Participants will build systems that translate between Spanish and an Indigenous language. Systems submitted to the shared task will be presented at the Fourth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP) in June 2024, which will be co-located with the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL 2024) and held in Mexico City.

Why?
Many of the Indigenous languages of the Americas are so-called low-resource languages: the parallel data needed to train MT systems is limited. This means that many approaches designed for translating between high-resource languages, such as English and Chinese, are not directly applicable or perform poorly. Additionally, many Indigenous languages exhibit linguistic properties uncommon among languages frequently studied in natural language processing (NLP). For instance, many are polysynthetic, which constitutes an additional difficulty. The goal of the AmericasNLP 2024 shared task on machine translation systems for Indigenous languages is to motivate researchers to take on the challenge of developing MT systems for Indigenous languages.

How?
AmericasNLP invites the submission of MT results obtained by systems built for Indigenous languages. Participants can use the training and development data we provide, and there are no limits on what additional resources they may use. If participants want to translate additional data to improve their systems, that's great! If they want to use pretrained models, that's great, too! The only limitation is that we ask participants not to have the test input translated by hand and not to train on the development or test sets.
The main metric of the shared task is ChrF++ (Popović, 2017). Participants can enter the competition with as many language pairs as they like. Systems will be evaluated separately for every language pair, and the overall average score will be used to determine the shared task’s winner. We provide an evaluation script and a baseline MT system to help participants get started quickly. If you are interested in this shared task, please register here: Google form (https://forms.gle/bvSP3BUJj9YSDU5T6).
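
To give a feel for what ChrF++ rewards, here is a minimal sentence-level sketch in Python. It is an illustration only, not the official evaluation script: the function names `ngrams` and `chrf_pp` are hypothetical, and the defaults (character n-grams up to order 6, word n-grams up to order 2, beta = 2, plain whitespace tokenization, per-order F-scores averaged) are assumptions based on the common formulation of the metric. The evaluation script provided by the organizers is authoritative and may differ in tokenization and edge cases.

```python
from collections import Counter


def ngrams(items, n):
    """Count all contiguous n-grams of length n in a string or token list."""
    return Counter(tuple(items[i:i + n]) for i in range(len(items) - n + 1))


def chrf_pp(hypothesis, reference, char_order=6, word_order=2, beta=2.0):
    """Simplified sentence-level chrF++ on a 0-100 scale (illustrative only)."""
    # Character n-grams are taken with whitespace removed (an assumption);
    # word n-grams use simple whitespace tokenization.
    hyp_chars = "".join(hypothesis.split())
    ref_chars = "".join(reference.split())
    hyp_words, ref_words = hypothesis.split(), reference.split()

    pairs = [(hyp_chars, ref_chars, n) for n in range(1, char_order + 1)]
    pairs += [(hyp_words, ref_words, n) for n in range(1, word_order + 1)]

    scores = []
    for hyp, ref, n in pairs:
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        if not hyp_counts or not ref_counts:
            continue  # this order is longer than one of the sentences
        matches = sum((hyp_counts & ref_counts).values())
        precision = matches / sum(hyp_counts.values())
        recall = matches / sum(ref_counts.values())
        denom = beta ** 2 * precision + recall
        # F-beta for this n-gram order; beta = 2 weights recall more heavily.
        scores.append((1 + beta ** 2) * precision * recall / denom if denom else 0.0)

    # Average the per-order F-scores over all orders that could be computed.
    return 100 * sum(scores) / len(scores) if scores else 0.0
```

Because the score is built mostly from character n-grams, a hypothesis that gets a long, morphologically complex word partly right still receives partial credit, which is one reason character-based metrics are often preferred for polysynthetic languages.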

Which languages?
The following languages are featured in the AmericasNLP 2024 shared task on machine translation systems for Indigenous languages (AmericasNLP 2024 Shared Task 1):
Asháninka–Spanish
Aymara–Spanish
Bribri–Spanish
Chatino–Spanish
Guarani–Spanish
Hñähñu–Spanish
Nahuatl–Spanish
Quechua–Spanish
Rarámuri–Spanish
Shipibo-Konibo–Spanish
Wixarika–Spanish
All data and baseline systems will be made available in this GitHub repository (https://github.com/AmericasNLP/americasnlp2024/tree/master/ST1_MachineTr...).

Important Dates:
Release of pilot data: January 29, 2024
Release of training and development sets: February 5, 2024
Release of baseline systems and baseline results: February 12, 2024
Release of test inputs: April 1, 2024
Submission of results (shared task deadline): April 10, 2024
Announcement of winners: April 12, 2024
Submission of system description papers: April 19, 2024
Notification of acceptance: April 22, 2024
Camera-ready papers due: April 26, 2024
All deadlines are 11:59 pm UTC -12h (AoE).

Organizers
Abteen Ebrahimi, Arturo Oncevay, Pavel Denisov, Robert Pugh, Ona de Gibert Bonet, Raúl Vázquez, Manuel Mager, Rolando Coto-Solano, Katharina von der Wense, Shruti Rijhwani

Contact: americas.nlp.workshop [at] gmail.com
Website: https://turing.iimas.unam.mx/americasnlp/2024_st_1.html

#### AmericasNLP 2024 Shared Task 2: Creation of Educational Materials for Indigenous Languages ####

First Call for Participation

The AmericasNLP 2024 shared task on the creation of educational materials for Indigenous languages is a competition aimed at encouraging the development of natural language processing (NLP) systems to help with the teaching and diffusion of Indigenous languages of the Americas. Participants will build systems that can automatically create exercises by converting a base sentence into another sentence that differs with regard to one specific property (such as negation or tense). Systems submitted to the shared task will be presented at the Fourth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP) in June 2024, which will be co-located with the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL 2024) and held in Mexico City.

Why?
Many of the Indigenous languages of the Americas are vulnerable or endangered. This means that, depending on the language, no or only a few children are learning them and, generally, they are only spoken by a few small groups of people. Because of this, these languages are at a high risk of becoming extinct in the near future. Many communities are carrying out revitalization efforts, including teaching their languages to their community members. Creating materials to teach these languages is an urgent priority, but this process is expensive and time-consuming. NLP presents an opportunity to help with these efforts.
In addition to being endangered, most Indigenous languages of the Americas are so-called low-resource languages: the data needed to train any NLP systems, let alone deep learning-based systems, is severely limited. This means that many approaches used for high-resource languages, such as English and Chinese, are not directly applicable or perform poorly. Finally, many Indigenous languages exhibit linguistic properties uncommon among languages frequently studied in NLP. This constitutes an additional difficulty. The goal of AmericasNLP is to motivate researchers to take on the challenge of developing systems for these Indigenous languages.

How?
AmericasNLP invites the submission of results obtained by systems built for the creation of educational materials for Indigenous languages. Participants can use the training and development data we provide and there are no limits on what additional resources participants may use. If participants want to leverage additional data to improve their systems, that's great! If they want to use pretrained models, that's great, too! The only limitation is that we ask participants to not create the test outputs manually or train on the development or test sets.

In this shared task, participants will be given a dataset with base sentences. The dataset will also contain an indication of the change we expect systems to make to each base sentence. Systems will transform the base sentence into a target sentence according to the indicated change.

Base sentence: Ye' shka' (Bribri for "I walked")
Expected change: Polarity: Negative
Target sentence: Ye' kë̀ shkàne̠ (Bribri for "I didn't walk")

The main metric of the shared task is accuracy. Participants can enter the competition for as many languages as they like. Systems will be evaluated separately for every language, and the overall average score will be used to determine the shared task’s winner. We provide an evaluation script and a baseline system to help participants get started quickly. If you are interested in this shared task, please register here: Google form (https://forms.gle/RQztkDM7ddziM6eP7).
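
As a rough illustration of how per-language accuracy and the overall average could be computed, here is a minimal Python sketch. It is not the official evaluation script: the function names `accuracy` and `average_over_languages` are hypothetical, and we assume here that a prediction counts as correct only if it matches the target sentence exactly after trimming surrounding whitespace, and that the overall score is an unweighted mean over languages. The evaluation script provided by the organizers is authoritative.

```python
def accuracy(predictions, targets):
    """Fraction of predictions that exactly match their target sentence."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    # Assumption: exact string match after stripping surrounding whitespace.
    correct = sum(p.strip() == t.strip() for p, t in zip(predictions, targets))
    return correct / len(targets)


def average_over_languages(per_language_scores):
    """Unweighted mean of per-language scores (an assumed aggregation)."""
    if not per_language_scores:
        return 0.0
    return sum(per_language_scores.values()) / len(per_language_scores)


# Usage with the Bribri example above: one correct output and one that
# simply copies the base sentence unchanged.
targets = ["Ye' kë̀ shkàne̠", "Ye' kë̀ shkàne̠"]
predictions = ["Ye' kë̀ shkàne̠", "Ye' shka'"]
bribri_accuracy = accuracy(predictions, targets)
```

Under exact matching, an output that is almost right (for example, a missing diacritic) scores the same as a completely wrong one, so it is worth inspecting near misses on the development set.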

Which languages?
The following languages are featured in the AmericasNLP 2024 shared task on the creation of educational materials for Indigenous languages (AmericasNLP 2024 Shared Task 2):
Bribri from Costa Rica
Guarani from Paraguay
Maya from Mexico
All data and baseline systems will be made available in this GitHub repository (https://github.com/AmericasNLP/americasnlp2024/tree/master/ST2_Education...).

Important Dates:
Release of pilot data: January 29, 2024
Release of training and development sets: February 5, 2024
Release of baseline systems and baseline results: February 12, 2024
Release of test inputs: April 1, 2024
Submission of results (shared task deadline): April 10, 2024
Announcement of winners: April 12, 2024
Submission of system description papers: April 19, 2024
Notification of acceptance: April 22, 2024
Camera-ready papers due: April 26, 2024
All deadlines are 11:59 pm UTC -12h (AoE).

Organizers
Manuel Mager, Pavel Denisov, Silvia Fernandez Sabido, Samuel Canul Yah, Alejandro Molina-Villegas, Lorena Hau Ucán, Arturo Oncevay, Rolando Coto-Solano, Luis Chiruzzo, Marvin Agüero-Torales, Aldo Alvarez, Katharina von der Wense

Contact: americas.nlp.workshop [at] gmail.com
Website: https://turing.iimas.unam.mx/americasnlp/2024_st_2.html