CALL FOR PARTICIPATION
We are happy to announce the CoNLL-SIGMORPHON 2017 Shared Task:
Universal Morphological Reinflection
Website: http://sigmorphon.org/conll2017
Email contact: conll-sigmorphon-2017@googlegroups.com
Morphologically rich languages are the norm among the languages of the world. Indeed, the linguistic typology database WALS shows that 80% of the world's languages mark verb tense through morphology, while 65% mark grammatical case. Yet, in the computational literature, morphology has received less attention than many other aspects of language.
This year, we are hosting the first CoNLL shared task on the learning of morphology from labeled data. The specific task will be morphological reinflection — producing previously unseen inflected forms of words given exposure to other such inflections. The shared task will be conducted in over 40 languages of varying typological characteristics. Participants in the shared task will build systems that can learn to solve reinflection problems. All submitted systems will be compared on a held-out test set.
The data format used in the task is a simple UTF-8 encoded text format. The task does not presuppose any approach based on knowledge of morphological processes; it can be addressed by developing string-to-string transformation algorithms that learn from examples.
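To illustrate what learning a string-to-string transformation from examples can look like, here is a minimal sketch of a suffix-rewriting learner. It is not an official baseline or a recommended method. The function names (learn_rules, inflect), the in-memory (lemma, tags, form) triples, and the English example words are assumptions made up for illustration; they are not taken from the task data, and the sketch does not reflect the actual file format.

```python
from collections import Counter, defaultdict


def common_prefix_len(a, b):
    """Return the length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def learn_rules(examples):
    """Collect suffix-rewrite rules from (lemma, tags, form) triples.

    For every training pair, each split point inside the shared prefix
    yields one candidate rule (old_suffix -> new_suffix), ranging from
    the whole word down to the minimal change.
    """
    rules = defaultdict(Counter)
    for lemma, tags, form in examples:
        k = common_prefix_len(lemma, form)
        for i in range(k + 1):
            rules[tags][(lemma[i:], form[i:])] += 1
    return rules


def inflect(lemma, tags, rules):
    """Apply the most specific matching rule; break ties by frequency.

    Falls back to returning the lemma unchanged when no rule is known
    for the requested tag combination.
    """
    best = None
    for (old, new), count in rules.get(tags, {}).items():
        if lemma.endswith(old):
            key = (len(old), count)
            if best is None or key > best[0]:
                best = (key, old, new)
    if best is None:
        return lemma
    _, old, new = best
    return lemma[: len(lemma) - len(old)] + new


# Hypothetical toy training data (illustrative only).
TRAIN = [
    ("walk", "V;PST", "walked"),
    ("talk", "V;PST", "talked"),
    ("love", "V;PST", "loved"),
]

if __name__ == "__main__":
    rules = learn_rules(TRAIN)
    for lemma in ("jump", "move"):
        print(lemma, "->", inflect(lemma, "V;PST", rules))
```

A learner of this kind only handles suffixing morphology and uses no knowledge of the language beyond the examples it has seen; prefixing, infixing, or stem-changing patterns would need a more general alignment or sequence model.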
We feature two tracks. The first track requires morphological generation given sparse training data, something that can be practically useful for MT and other downstream tasks in NLP. Here, participants are given varying amounts of individual labeled inflected forms of lemmas, not organized into paradigms, and are asked to produce other, previously unseen inflected forms. The training data is sparse in the sense that each lemma is linked to only a few observed inflected forms.
The second track attempts to mimic generalization from basic resources that might be available to a human second-language learner — for example, limited access to a native speaker informant or a short reference grammar. In this track, participants are provided with a few full inflectional paradigms such as can be found in grammars or elicited from an informant, and are asked to produce full paradigms from partially filled paradigms.
Due to recent interest in computational natural language learning in low-resource scenarios, both tracks will also feature subtasks where varying amounts of training data are available to learn from. All data in the task will be consistently annotated with the UniMorph Schema (http://unimorph.org/).
The task is open to everyone. The organizers trust that participants will avoid unfair use of language-specific knowledge they may possess of the languages in the shared task, and that submission results will be the output of a reasonably language-agnostic learning algorithm. The chair, who does not participate in a team, will adjudicate any disputes that may arise regarding proper submission methodology.
Timeline:
Feb 24 ... Shared task web page open. Registration for the Shared Task open.
Feb 24 ... Trial data for a few languages will be available.
Mar 1 ... Training and development data released.
May 10 ... Surprise languages will be announced and small sample data released.
May 20-25 ... Test phase. Participants need to register before test submissions.
May 27 ... Results will be announced.
May 30 ... Submission of system description papers.
Jun 2 ... Reviews due.
Jun 9 ... Final papers due.
Aug 3-4 ... CoNLL 2017, Vancouver, Canada.
Organization:
Ryan Cotterell, Johns Hopkins University
John Sylak-Glassman, Johns Hopkins University
Christo Kirov, Johns Hopkins University
Géraldine Walther, University of Zurich
Ekaterina Vylomova, University of Melbourne
Patrick Xia, Johns Hopkins University
Manaal Faruqui, Google Research
Sandra Kübler, Indiana University
David Yarowsky, Johns Hopkins University
Jason Eisner, Johns Hopkins University
Mans Hulden (chair), University of Colorado
See http://sigmorphon.org/conll2017 for more information.