CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

Event Notification Type: Call for Participation
Dates: Wednesday, 31 October 2018 to Thursday, 1 November 2018
Location: Brussels, Belgium
Contact: Daniel Zeman, Jan Hajič
Submission Deadline: Tuesday, 26 June 2018

We are excited to announce the second round of the CoNLL shared task on parsing Universal Dependencies!

The focus of the task is learning syntactic dependency parsers that work in a real-world setting, starting from raw text, and across many typologically different languages, including low-resource languages with little or no training data, by exploiting a common syntactic annotation standard. The task has been made possible by the Universal Dependencies initiative (UD, http://universaldependencies.org/), which has developed treebanks for more than 60 languages with cross-linguistically consistent annotation and recoverability of the original raw text.

Participating systems will have to find labeled syntactic dependencies between words, i.e., predict a syntactic head for each word and a label classifying the type of the dependency relation. In addition to syntactic dependencies, morphological analysis and lemmatization will also be evaluated. There will be multiple test sets in various languages, but all data sets will adhere to the common UD annotation style. Participants will be asked to parse raw text for which no gold-standard preprocessing (tokenization, lemmas, morphology) is available. We will provide data preprocessed by a baseline system (UDPipe, https://ufal.mff.cuni.cz/udpipe/) so that participants who wish to can focus on improving just one part of the processing pipeline. We believe this makes the task reasonably accessible to everyone.
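As a rough illustration of what "a syntactic head for each word and a label" means in practice: UD treebanks are distributed in the tab-separated CoNLL-U format, where each word occupies one line and the HEAD and DEPREL columns hold the head index (0 for the root) and the relation label. The sketch below reads those two columns and computes the labeled attachment score (LAS), a standard metric for labeled dependencies: the share of words whose head and label are both correct. It is a simplified, hypothetical illustration, not the shared task's official evaluation. In particular, it assumes gold and system output share identical sentence splitting and tokenization, whereas parsing from raw text means a real scorer must first align system words with gold words. The function names and the example sentence are illustrative only.

```python
from typing import List, Tuple

# One syntactic word: (FORM, HEAD, DEPREL); HEAD 0 marks the root.
Word = Tuple[str, int, str]


def read_conllu(text: str) -> List[List[Word]]:
    """Minimal CoNLL-U reader returning each sentence as a list of words.

    Skips comment lines, multiword-token ranges (IDs like "1-2") and
    empty nodes (IDs like "8.1"), so only basic syntactic words remain.
    """
    sentences: List[List[Word]] = []
    current: List[Word] = []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):  # sentence-level comments such as "# text = ..."
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        current.append((cols[1], int(cols[6]), cols[7]))
    if current:  # input may not end with a blank line
        sentences.append(current)
    return sentences


def labeled_attachment_score(gold: List[List[Word]], system: List[List[Word]]) -> float:
    """Share of words whose predicted head and label both match the gold tree.

    Hypothetical simplification: assumes identical sentence splitting and
    tokenization in gold and system output (no word alignment is attempted).
    """
    if len(gold) != len(system):
        raise ValueError("sentence counts differ")
    total = correct = 0
    for g_sent, s_sent in zip(gold, system):
        if len(g_sent) != len(s_sent):
            raise ValueError("word counts differ; a real scorer must align words")
        for (_, g_head, g_rel), (_, s_head, s_rel) in zip(g_sent, s_sent):
            total += 1
            if g_head == s_head and g_rel == s_rel:
                correct += 1
    return correct / total if total else 0.0


# Hypothetical gold annotation for a single sentence.
GOLD = (
    "# text = Dogs bark loudly.\n"
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tloudly\tloudly\tADV\t_\t_\t2\tadvmod\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

# Hypothetical system output: wrong label on "loudly", wrong head on ".".
SYSTEM = (
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tloudly\tloudly\tADV\t_\t_\t2\tobl\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
)

if __name__ == "__main__":
    las = labeled_attachment_score(read_conllu(GOLD), read_conllu(SYSTEM))
    print(f"LAS = {las:.2f}")  # LAS = 0.50
```

Running the file prints `LAS = 0.50`: "Dogs" and "bark" are fully correct, "loudly" has the right head but the wrong label, and the final period has the wrong head. Requiring both head and label to match is what distinguishes a labeled score from an unlabeled one, which would count "loudly" as correct here.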

We do not plan on running separate open and closed tracks. All our tracks will be formally closed, but the list of permitted resources is rather broad and includes large raw corpora and parallel corpora (see the Data description).

Dan Zeman, Jan Hajič, Joakim Nivre, Filip Ginter, Milan Straka, Slav Petrov and Martin Popel