*SEM 2013 SHARED TASK: Semantic Textual Similarity (STS)
Final call for participation and train data release for pilot on typed similarity
REGISTRATION CLOSES FEB. 15
NEW: TRAIN DATA FOR PILOT ON TYPED SIMILARITY
STS was selected by the *SEM conference as the 2013 shared task.
Following the success of the 2012 task on STS (Agirre et al. 2012), with 35 teams and 88 runs, we solicit participation in the second STS task, which will be held at *SEM 2013, co-located with the NAACL-2013 conference in Atlanta.
STS in 2013
Participants will submit systems that examine the degree of semantic equivalence between two sentences. The goal of the STS task is to create a unified framework for the evaluation of semantic textual similarity modules and to characterize their impact on NLP applications. We particularly encourage submissions from the lexical semantics, summarization, machine translation evaluation metric, and textual entailment communities.
The task will follow a design similar to last year's SemEval pilot, but instead of providing train/test data from the same datasets, we will provide all the 2012 data as training data, and the test data will be drawn from related but different datasets. This setting is more realistic, and teams that prefer to train and test on the same datasets can run experiments using the 2012 data.
There will be two tasks this year:
The core STS task
A pilot task on typed-similarity between semi-structured records
Given two sentences, s1 and s2, participants will return a similarity score that quantifies how similar s1 and s2 are. Participants will also provide a confidence score for each pair, indicating how confident the system is in the returned similarity. The output of participant systems will be compared to the manual scores, which range from 5 (semantic equivalence) to 0 (no relation).
The test data will include the following datasets:
Paraphrase sentence pairs
MT evaluation pairs including those from HyTER graphs and GALE HTER data
Please check http://ixa2.si.ehu.es/sts/data for the trial data and details on the core task. No new train data will be released for the core task in 2013. The trial data contains all the 2012 data.
Pilot task on typed-similarity
In addition, we will hold a pilot task on typed-similarity between semi-structured records. The types of similarity to be studied include location, author, people involved, time, events or actions, subject, and description. Please check http://ixa2.si.ehu.es/sts/data for data and details on this pilot task. New: train data for the pilot task has just been released, including a tool for visualizing pairs.
After in-house theoretical and empirical analysis, we have selected Pearson correlation as the main evaluation metric. The core task will be evaluated according to the weighted mean of the correlations across the evaluation datasets. The pilot task will be evaluated according to the mean across the similarity types.
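For participants who want to estimate their scores on the 2012 data, the evaluation described above can be sketched roughly as follows. This is an illustrative sketch, not the official scorer: the function names are hypothetical, and the choice of weighting each dataset by its number of pairs is an assumption, since the weights are not specified here. Please rely on the scorer distributed with the trial data for official numbers.

```python
from math import sqrt


def pearson(system_scores, gold_scores):
    """Pearson correlation between system scores and manual (gold) scores."""
    n = len(system_scores)
    if n != len(gold_scores) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_s = sum(system_scores) / n
    mean_g = sum(gold_scores) / n
    cov = sum((s - mean_s) * (g - mean_g)
              for s, g in zip(system_scores, gold_scores))
    var_s = sum((s - mean_s) ** 2 for s in system_scores)
    var_g = sum((g - mean_g) ** 2 for g in gold_scores)
    if var_s == 0 or var_g == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_s * var_g)


def weighted_mean_pearson(datasets):
    """Core task sketch: mean of per-dataset correlations.

    `datasets` maps a dataset name to (system_scores, gold_scores).
    ASSUMPTION: each dataset is weighted by its number of sentence pairs.
    """
    total_pairs = sum(len(gold) for _, gold in datasets.values())
    return sum(len(gold) * pearson(system, gold)
               for system, gold in datasets.values()) / total_pairs


def mean_pearson(similarity_types):
    """Pilot task sketch: unweighted mean of correlations across types.

    `similarity_types` maps a type name (e.g. "location", "author")
    to (system_scores, gold_scores).
    """
    correlations = [pearson(system, gold)
                    for system, gold in similarity_types.values()]
    return sum(correlations) / len(correlations)
```

A system would call `weighted_mean_pearson` with one entry per core-task dataset and `mean_pearson` with one entry per similarity type in the pilot task.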
Open source pipeline for STS
An open source pipeline for STS will be made available shortly. Strong open-source baselines such as DKPro can be found on the STS wiki: http://www-nlp.stanford.edu/wiki/STS.
Nov 11: Initial training dataset
Jan 3: Trial dataset, with documentation and scorer
Jan 30: Training dataset for typed similarity
Feb 15: Registration for the task closes
Mar 1: Start of evaluation period
Mar 15: End of evaluation period (23:59, UTC-11)
Apr 9: Paper due [TBC]
Apr 23: Reviews due [TBC]
May 4: Camera ready [TBC]
Please use the following mailing list: sts-semeval at googlegroups com.
If you are interested in the task, please join the mailing list for updates: http://groups.google.com/group/STS-semeval.
Eneko Agirre, University of the Basque Country, Basque Country
Daniel Cer, Stanford University, USA
Mona Diab, The George Washington University, USA
Aitor Gonzalez-Agirre, University of the Basque Country, Basque Country
Weiwei Guo, Columbia University, USA