The 1st Workshop on Evaluating Vector-Space Representations for NLP

Event Notification Type: Call for Papers
Abbreviated Title: RepEval 2016
Location: ACL 2016
Date: Friday, 12 August 2016
Country: Germany
City: Berlin
Submission Deadline: Sunday, 8 May 2016

Mission Statement: To foster the development of new and improved ways of measuring the quality and understanding the properties of vector space representations in NLP.

===Motivation===

Models that learn real-valued vector representations of words, phrases, sentences, and even documents are ubiquitous in today's NLP landscape. These representations are usually obtained by training a model on large amounts of unlabeled data and are then employed in NLP tasks and downstream applications. Ideally, such representations should be evaluated by their value in these applications, but doing so is laborious, and it can be hard to rigorously isolate the effects of different representations for comparison. There is therefore a need for evaluation via simple and generalizable proxy tasks. To date, these proxy tasks have focused mainly on lexical similarity and relatedness, and they do not capture the full spectrum of linguistic properties that are useful for downstream applications. This workshop challenges its participants to propose methods and/or design benchmarks for evaluating the next generation of vector space representations, for presentation and detailed discussion at the event. Following the workshop, the highest-quality proposals will receive the support of the organizers and participants, as well as some financial support, to help produce their proposed resource to the highest standard.

===Submissions===

We encourage researchers at all levels of experience to consider contributing to the discussion at RepEval by making a short submission. This can take the form of either an *analysis* of existing benchmarks or a *proposal* of new ones.

=Analysis Track=

An analysis submission should analyze and discuss the strengths and weaknesses of existing evaluation tasks, providing helpful insights for designers of new tasks. Analysis papers will be reviewed, accepted, and published *before* the proposal track's camera-ready deadline, so that new task proposals can benefit from their findings.

As part of their analysis, papers in this track may wish to consider the following questions:
What are the pros and cons of existing evaluations?
What are the limitations of task-independent representations or of their evaluation?
Given a specific downstream application, which existing evaluation (or family of evaluations) is a good predictor of performance improvement?
Which linguistic/semantic/psychological properties are captured by existing evaluations? Which are not?
What methodological mistakes were made in the creation of existing evaluation datasets?

The analysis track is *not* limited to these topics. We believe that any manuscript presenting a sound argument on representation evaluation would be a great addition to the workshop.

=Proposal Track=

A proposal submission should propose a novel method for evaluating representations. It does not have to construct an actual dataset, but it should describe a way (or several alternative ways) of collecting one. Proposals are expected to provide roughly 5-10 examples as a proof of concept.

In addition, each proposal should explicitly mention:
Which type of representation it evaluates (e.g., word, sentence, document)
For which downstream application(s) it functions as a proxy
Any linguistic/semantic/psychological properties it captures

Among other important points, proposals should take the following into consideration:
If the task captures some linguistic phenomenon via annotators, what evidence is there that it is robustly observed in humans (e.g., inter-annotator agreement)?
How easy would it be for other researchers to accurately reproduce the evaluation (not necessarily the dataset)?
Will the dataset be cost-effective to produce?
Is a specific family of models expected to perform particularly well (or poorly) on the task? In other words, which types of models is this evaluation targeted at?
How should the evaluation's results be interpreted?

=Submission Format=

Submissions to both tracks should be 2-4 pages of content in ACL format, with an unlimited number of pages for references. For the proposal track, we encourage shorter content (2-3 pages), leaving more room for examples and their visualization.

===Best Proposal Awards *Sponsored by Facebook AI Research*===

Two proposal-track papers will be selected by a special committee and awarded financial support for turning their idea into a large-scale, high-quality dataset via crowdsourcing or other annotation efforts. We hope that the workshop community's endorsement will also promote the use of these new evaluations.

===Important Dates===

Submission: May 8th 2016
Notification: June 5th 2016
Camera-Ready (Analysis Track): June 12th 2016
Camera-Ready (Proposal Track): June 26th 2016*
Workshop Date: August 12th 2016

*This will give proposal-track authors enough time to go over any relevant results that may arise from the analysis track and cite them as motivation.

===Organizers===

Omer Levy, Bar-Ilan University
Felix Hill, Cambridge University
Roi Reichart, Technion - Israel Institute of Technology
Kyunghyun Cho, New York University
Anna Korhonen, Cambridge University
Yoav Goldberg, Bar-Ilan University
Antoine Bordes, Facebook AI Research