This page serves as a community portal for everything related to Semantic Evaluation (SemEval).
Semantic Evaluation Exercises
SemEval (Semantic Evaluation) is an ongoing series of evaluations of computational semantic analysis systems; it evolved from the Senseval word sense evaluation series. The evaluations are intended to explore the nature of meaning in language. While meaning is intuitive to humans, transferring those intuitions to computational analysis has proved elusive.
This series of evaluations provides a mechanism for characterizing, in more precise terms, exactly what is needed to compute meaning. As such, the evaluations provide an emergent mechanism to identify the problems and solutions for computations with meaning. These exercises have evolved to articulate more of the dimensions that are involved in our use of language. They began with apparently simple attempts to identify word senses computationally. They have evolved to investigate the interrelationships among the elements in a sentence (e.g., semantic role labeling), relations between sentences (e.g., coreference), and the nature of what we are saying (semantic relations and sentiment analysis).
The purpose of the SemEval exercises and SENSEVAL is to evaluate semantic analysis systems. The first three evaluations, Senseval-1 through Senseval-3, were focused on word sense disambiguation, each time growing in the number of languages offered in the tasks and in the number of participating teams. Beginning with the 4th workshop, SemEval-2007 (SemEval-1), the nature of the tasks evolved to include semantic analysis tasks outside of word sense disambiguation. This portal will be used to provide a comprehensive view of the issues involved in semantic evaluations.
Upcoming and Past Events
|Event||Year||Location||Co-located with||Proceedings||Notes|
|SemEval-2015||2015||TBA||TBA||TBA||discussion at SemEval Group|
|SemEval-2014||2014||Dublin, Ireland||COLING 2014||TBA||discussion at SemEval Group|
|SemEval-2013||2013||Atlanta, USA||*SEM 2013 and NAACL 2013||ACL Anthology||discussion at SemEval Group|
|SemEval-2012||2012||Montreal, Canada||*SEM 2012 and NAACL-HLT 2012||ACL Anthology||discussion at SemEval Group|
|SemEval-2010||2010||Uppsala, Sweden||ACL 2010||ACL Anthology|
|SemEval-2007||2007||Prague, Czech Republic||ACL 2007||ACL Anthology||copy of website at Internet Archive|
|SENSEVAL 3||2004||Barcelona, Spain||ACL 2004||ACL Anthology|
|SENSEVAL 2||2001||Toulouse, France||ACL-EACL 2001||ACL Anthology and Natural Language Engineering, Volume 8, Issue 4, 2002||copy of website at Internet Archive|
|SENSEVAL 1||1998||East Sussex, UK||independent||Language Resources and Evaluation, Volume 34, Issue 1-2, April 2000||Language Resources and Evaluation was called Computers and the Humanities in 1998|
Overview of Issues in Semantic Analysis
The SemEval exercises provide a mechanism for examining issues in semantic analysis of texts. The topics of interest are concerned with identifying and characterizing the kinds of issues relevant to human understanding of language; the topics are generally different from the concerns of the logic-based approach of formal computational semantics. The primary goal is to replicate human processing by means of computer systems. The tasks (shown below) are developed by individuals and groups to deal with identifiable issues, as they take on some concrete form.
The first major area in semantic analysis is the identification of the intended meaning at the word level (taken to include idiomatic expressions). This is word-sense disambiguation (a concept that is evolving away from the notion that words have discrete senses and toward the view that words are characterized by the ways in which they are used, i.e., their contexts). The tasks in this area include lexical sample and all-words disambiguation, multi- and cross-lingual disambiguation, and lexical substitution. Given the difficulties of identifying word senses, other tasks relevant to this topic include word-sense induction, subcategorization acquisition, and evaluation of lexical resources. The tasks in this area may be characterized as dealing with dictionary issues.
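As a rough illustration of what a lexical sample system has to do, the following Python sketch picks a sense for one target word by counting the words shared between each sense gloss and the context. This is a simplified gloss-overlap heuristic in the spirit of the Lesk algorithm, not a method prescribed by any SemEval task. The sense inventory `SENSES`, the sense identifiers, the stopword list, and the function names are all hypothetical and chosen only for the example.

```python
import re

# Hypothetical two-sense inventory for the target word "bank";
# the glosses are illustrative, not taken from any SemEval data set.
SENSES = {
    "bank%finance": "an institution that accepts deposits of money and makes loans",
    "bank%river": "sloping land beside a river or lake",
}

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "and", "at", "of", "or", "that", "the", "to", "on", "in"}


def tokens(text):
    """Lowercase the text, keep alphabetic runs, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(context, senses=SENSES):
    """Return the sense whose gloss shares the most words with the context."""
    context_words = tokens(context)
    # Iterating over sorted sense identifiers makes tie-breaking deterministic.
    return max(sorted(senses), key=lambda s: len(tokens(senses[s]) & context_words))


print(disambiguate("She walked along the bank of the river"))
print(disambiguate("He deposits money at the bank every Friday"))
```

In a lexical sample setting the function would be applied only to the preselected target words; in an all-words setting it would be applied to every content word, each with its own sense inventory.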
The second major area in semantic analysis is the understanding of how different sentence and textual elements fit together. Tasks in this area include semantic role labeling, semantic relation analysis, and coreference resolution. Other tasks in this area look at more specialized issues of semantic analysis, such as temporal information processing, metonymy resolution, and sentiment analysis. The tasks in this area have many potential applications, such as information extraction, question answering, document summarization, machine translation, construction of thesauri and semantic networks, language modeling, paraphrasing, and recognizing textual entailment. In each of these potential applications, determining how much each type of semantic analysis contributes remains the most outstanding research issue.
Tasks in Semantic Evaluation
The major tasks in semantic evaluation include:
- Word sense disambiguation (WSD): the process of identifying which sense (i.e., meaning) of a word is used in a sentence when the word has multiple meanings (polysemy). The WSD task has two variants: the "lexical sample" task and the "all words" task. The former comprises disambiguating the occurrences of a small sample of previously selected target words, while in the latter all the words in a piece of running text need to be disambiguated. Tasks have been performed for many languages. Tasks have covered disambiguation of nouns, verbs, adjectives, and prepositions. A new task is evaluating phrasal semantics (compositionality and semantic similarity of phrases).
- Multi-lingual or cross-lingual word-sense disambiguation: word senses are defined according to translation distinctions, e.g., a polysemous word in Japanese is translated differently in a given context. The WSD task provides texts with target words and requires identification of the appropriate translation. A related task is cross-language information retrieval, where participants disambiguate in one language (e.g., with WordNet synsets) and retrieve documents in another language; standard information retrieval metrics are used to assess the quality of the disambiguation. New tasks include cross-lingual content-based recommendation (where user profiles are built to recommend items of interest in another language), examining semantic textual similarity with a view toward evaluating modular semantic components, and linking noun phrases across Wikipedia articles in different languages.
- Word-sense induction: comparison of sense-induction and discrimination systems. The task is to cluster corpus instances (word uses, rather than word senses) and to evaluate systems on how well they correspond to pre-existing sense inventories or to various sense mapping systems. New tasks are to provide an evaluation framework for web search result clustering, induction for graded or non-graded senses, and tags used in folksonomies.
- Lexical substitution or simplification: find an alternative substitute word or phrase for a target word in context. The task involves both finding the synonyms and disambiguating the context. It allows the use of any kind of lexical resource or technique, including word sense disambiguation and word sense induction. A cross-lingual task was also defined. This topic also includes textual entailment and paraphrasing tasks.
- Evaluation of lexical resources: the task evaluates the submitted lexical resources indirectly, running a simple WSD based on topic signatures (sets of words related to each target sense). A lexical sample tagged with English WordNet senses was used for evaluation.
- Subcategorization acquisition: semantically similar verbs are similar in terms of subcategorization frames. The task is to use any available method for disambiguating verb senses, so that the results can then be fed into automatic methods used for acquiring subcategorization frames, with the hypothesis that the disambiguation will cluster the instances.
- Semantic role labeling: identifying and labeling constituents of sentences with their semantic roles. The basic task began with attempts to replicate FrameNet data, specifically frame elements. This task has expanded to inferring and developing new frames and frame elements, in individual sentences and in full running texts, with identification of intersentential links and coreference chains. New tasks focus on extraction of spatial information from natural language (spatial role labeling) and the utility of semantic dependency parsing in semantic role labeling.
- Semantic relation identification: examining relations between lexical items in a sentence. The task, given a sample of semantic relation types, is to identify and classify semantic relations between nominals (i.e., nouns and base noun phrases, excluding named entities); a main purpose of this task is to assess different classification methods. Another task is, given a sentence and two tagged nominals, to predict the relation between those nominals and the direction of the relation. New tasks seek to measure the relational similarity between pairs of words, to extract drug-drug interactions from biomedical texts, and to develop methods in causal reasoning.
- Metonymy resolution: the figurative substitution of an attribute of a name for the thing specified. The task is a lexical sample task (1) to classify preselected expressions of a particular semantic class (such as country names) as having a literal or a metonymic reading, and, if metonymic, (2) to identify a further specification into prespecified metonymic patterns (such as place-for-event or company-for-stock) or, alternatively, recognition as an innovative reading. A second task is to identify when the arguments of a specified predicate do not satisfy selectional restrictions and, when they do not, to identify both the type mismatch and the type shift (coercion).
- Temporal information processing: the temporal location and order of events in newspaper articles, narratives, and similar texts. The task is to identify the events described in a text and locate these in time, i.e., identification of temporal referring expressions, events and temporal relations within a text. A further task requires systems to recognize which of a fixed set of temporal relations holds between (a) events and time expressions within the same sentence, (b) events and the document creation time, (c) main events in consecutive sentences, and (d) two events where one syntactically dominates the other.
- Coreference resolution: detection and resolution of coreferences. The task is to detect full coreference chains, composed of named entities, pronouns, and full noun phrases, and to resolve pronouns, i.e., to find their antecedents.
- Sentiment analysis: emotion annotation, polarity orientation labeling. The task is to classify the titles of newspaper articles with the appropriate emotion label and/or with a valence indication (positive/negative), given a predefined set of six emotion labels (i.e., Anger, Disgust, Fear, Joy, Sadness, Surprise). A new task is to examine polarity in Twitter.
This list is expected to grow as the field progresses.
Some tasks are closely related to each other. For instance, word sense disambiguation (monolingual, multi-lingual and cross-lingual), word sense induction, lexical substitution, subcategorization acquisition and evaluation of lexical resources are all related to word senses.
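To make the link between word sense induction and pre-existing sense inventories more concrete, the sketch below scores a set of induced clusters against gold sense labels using cluster purity. Purity is used here only as a simple, widely known clustering measure; it is an assumption for illustration and is not claimed to be the measure used in any SemEval task. The instance identifiers, cluster labels, sense labels, and the function name `purity` are all hypothetical.

```python
from collections import Counter, defaultdict


def purity(induced, gold):
    """Share of instances that carry the majority gold sense of their cluster.

    induced: dict mapping instance id -> induced cluster label
    gold:    dict mapping instance id -> gold sense label
    """
    by_cluster = defaultdict(list)
    for instance, cluster in induced.items():
        by_cluster[cluster].append(gold[instance])
    # For each cluster, count the instances carrying its most frequent gold sense.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in by_cluster.values()
    )
    return majority_total / len(induced)


# Hypothetical toy data: six uses of one target word.
induced = {"i1": "c1", "i2": "c1", "i3": "c1", "i4": "c2", "i5": "c2", "i6": "c2"}
gold = {"i1": "s1", "i2": "s1", "i3": "s2", "i4": "s2", "i5": "s2", "i6": "s2"}

print(purity(induced, gold))
```

Purity rewards many small clusters, so an actual evaluation would normally pair it with a complementary measure that penalizes over-splitting.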
SIGLEX, the ACL Special Interest Group on the Lexicon, is the umbrella organization for the SemEval semantic evaluations and the SENSEVAL word-sense evaluation exercises. The SENSEVAL website is the home page for SENSEVAL 1-3. Each exercise is usually organized by two individuals, who make the call for tasks and handle the overall administration. Within the general guidelines, each task is then organized and run by individuals or groups.