EMNLP 2015 Workshop on Discourse in Machine Translation

Event Notification Type: 
Call for Papers
Abbreviated Title: 
DiscoMT'15
Thursday, 17 September 2015
Country: 
Portugal
City: 
Lisbon
Submission Deadline: 
Sunday, 28 June 2015

EMNLP 2015 Workshop on Discourse in Machine Translation (DiscoMT'15)
(http://www.idiap.ch/workshop/DiscoMT)
17 September 2015 -- Lisbon, Portugal

Second call for papers

It is well known that texts have properties that go beyond those of their individual sentences and that reveal themselves in the frequency and distribution of words, word senses, referential forms and syntactic structures, including:
- document-wide properties, such as style, register, reading level and genre;
- patterns of topical or functional sub-structure;
- patterns of discourse coherence, as realized through explicit and/or implicit relations between sentences, clauses or referring forms;
- anaphoric and elliptic expressions, in which speakers exploit the previous discourse context to convey subsequent information very succinctly.

By the end of the 1990s, these properties had stimulated considerable research in Machine Translation, aimed at endowing machine-translated texts with the same document and discourse properties as their source texts. A period of ten years then elapsed before interest resumed in these topics, now from the perspectives of Statistical and/or Hybrid Machine Translation. This led to the first ACL Workshop on Discourse in Machine Translation (DiscoMT) in 2013, held in Sofia, Bulgaria.

Since then, SMT has itself evolved in ways that allow more access to needed linguistic knowledge, through the availability of feature-rich statistical models. Accordingly, we are now holding a second DiscoMT workshop (DiscoMT'15), this time with a complementary Shared Task (see below).

DiscoMT'15 solicits submissions on any of the following topics and any language pairs, but also welcomes submissions that link discourse studies with machine translation in some other way.

- discourse processing in support of MT, including:
. textual coherence, including anaphora, coreference, tense, aspect and modality
. textual cohesion, including lexical consistency
. discourse structure, including use of connectives and information structuring devices
. topic structure
. consistency in style and register;
- MT techniques for obtaining document-level consistency and domain adaptability;
- MT techniques for structured documents;
- methods and algorithms to handle discourse-level phenomena in MT training and decoding;
- uses of MT in processing discourse-level phenomena;
- techniques for evaluating the effect of efforts targeting discourse-level phenomena in SMT;
- techniques for assessing the impact of discourse-level processing on MT quality;
- quantitative studies on the impact of discourse-level phenomena on current MT systems vs. discourse-aware ones.

SUBMISSION INSTRUCTIONS

We solicit previously unpublished work, presented as either long or short papers, following the ACL 2015 formatting guidelines at

http://www.acl2015.org/call_for_papers.html

Long papers should have at most 8 pages of content, not including references. Short papers are limited to 4 pages of content, not including references. There is no constraint on the size of the reference list. Submissions should be anonymous and not disclose in any way the identity of the author(s). Submissions should be made using the START system at

https://www.softconf.com/emnlp2015/DiscoMT15

IMPORTANT DATES

Submission deadline: 28 June 2015
Notification of acceptance: 21 July 2015
Final versions due: 11 August 2015
Workshop: 17 September 2015

CO-CHAIRS

Bonnie Webber, University of Edinburgh
Andrei Popescu-Belis, Idiap Research Institute
Marine Carpuat, University of Maryland

ORGANIZING COMMITTEE

Ani Nenkova, University of Pennsylvania
Christian Hardmeier, Uppsala University
Jörg Tiedemann, Uppsala University
Lori Levin, Carnegie Mellon University
Lucia Specia, University of Sheffield
Mark Fishel, University of Zurich
Min Zhang, Soochow University
Preslav Nakov, Qatar Computing Research Institute

PROGRAM COMMITTEE

Liane Guillou, University of Edinburgh
Beata Beigman Klebanov, Educational Testing Service, New Jersey
Francisco Guzmán, Qatar Computing Research Institute, Doha, Qatar
Shafiq Joty, Qatar Computing Research Institute, Doha, Qatar
Thomas Meyer, Google, Zurich
Michal Novak, Charles University, Prague
Lucie Poláková, Charles University, Prague
Maja Popovic, DFKI, Berlin
Sara Stymne, Uppsala University
Yannick Versley, University of Heidelberg
Marion Weller, University of Stuttgart

SHARED TASK

The DiscoMT shared task will consist of two subtasks, designed to make it interesting to both the MT and discourse communities. For the MT community, there is a practical MT task; for the discourse community, there is a classification task that requires no specific MT expertise. Both subtasks will be run on transcripts from the TED conference series. Both subtasks use the English-French language pair, for which baseline performance is high enough to produce basically intelligible output, and whose two languages differ in interesting ways in their pronoun systems.

Subtask A: Pronoun-focused Translation (submission deadline: May 10, 2015)

The first subtask is a regular end-to-end statistical machine translation (SMT) task, where participants are provided with training data for an SMT system and are asked to generate a translation of an unseen test set for the evaluation. Unlike other MT shared tasks, our primary evaluation will focus not on general MT quality, but specifically on the correctness of pronoun translation. Thanks to a grant from the European Association for Machine Translation, the evaluation of pronoun correctness will be carried out manually and is free of charge for the participants.

Subtask B: Cross-Lingual Pronoun Prediction (submission deadline: May 18, 2015)

The second subtask requires participating systems to predict the correct translation of a source language pronoun from a small set of classes. The input data will consist of the source language text and a complete manual reference translation from which the target pronouns have been removed. The evaluation of this subtask will be fully automatic, by matching the predictions against the pronouns found in the reference translation.
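As a rough illustration only, the automatic evaluation described above could be sketched as follows. The class labels, the function name score_predictions and the choice of plain accuracy with per-class counts are assumptions made for this example; the official class inventory, data format and scoring metric are those given on the shared task page, not the ones shown here.

```python
from collections import Counter


def score_predictions(predicted, reference):
    """Compare predicted pronoun classes with the reference pronouns.

    Hypothetical helper: returns overall accuracy and, for each reference
    class, a (correct, total) pair.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    correct = Counter()
    total = Counter()
    for pred, ref in zip(predicted, reference):
        total[ref] += 1
        if pred == ref:
            correct[ref] += 1

    accuracy = sum(correct.values()) / len(reference) if reference else 0.0
    per_class = {label: (correct[label], total[label]) for label in total}
    return accuracy, per_class


# Toy data with made-up labels; one entry per removed target pronoun.
reference = ["il", "elle", "ce", "ils", "il"]
predicted = ["il", "il", "ce", "ils", "elle"]

accuracy, per_class = score_predictions(predicted, reference)
print(f"accuracy: {accuracy:.2f}")
for label, (hits, count) in sorted(per_class.items()):
    print(f"{label}: {hits}/{count}")
```

Reporting per-class counts alongside accuracy is one possible design choice here, since a system that always predicts the most frequent pronoun could otherwise look deceptively strong.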

Further details on the shared task can be found at

http://www.idiap.ch/workshop/DiscoMT/shared-task

Shared Task Coordinators

Christian Hardmeier, Uppsala University
Preslav Nakov, Qatar Computing Research Institute
Sara Stymne, Uppsala University
Yannick Versley, University of Heidelberg
Jörg Tiedemann, Uppsala University