Difference between revisions of "RTE Knowledge Resources"
Line 16: | Line 16: | ||
=== Publicly available Resources === | === Publicly available Resources === | ||
− | {|class="wikitable sortable" cellpadding="3" cellspacing="0" border="1" | + | {|class="wikitable sortable" cellpadding="3" cellspacing="0" style="margin-left: 20px;" border="1" |
|- bgcolor="#CDCDCD" | |- bgcolor="#CDCDCD" | ||
− | ! Resource | + | ! width="50"|Resource |
− | ! Type | + | ! width="80"|Type |
− | ! Author | + | ! width="180"|Author |
− | ! class="unsortable"|Brief description | + | ! class="unsortable" width="500"|Brief description |
− | ! | + | ! width="30"|<small>PAST Users</small> |
− | ! class="unsortable"|Usage info | + | ! width="30"|<small>RTE4 Users</small> |
+ | ! width="30"|<small>RTE5 Users</small> | ||
+ | ! class="unsortable" width="50"|Usage info | ||
|- bgcolor="#ECECEC" "align="left" | |- bgcolor="#ECECEC" "align="left" | ||
Line 31: | Line 33: | ||
| Princeton University | | Princeton University | ||
| Lexical database of English nouns, verbs, adjectives and adverbs | | Lexical database of English nouns, verbs, adjectives and adverbs | ||
− | | style="text-align: center;"| | + | | style="text-align: center;"|3 |
+ | | style="text-align: center;"|21 | ||
+ | | style="text-align: center;"| | ||
| [[WordNet - RTE Users|Users]] | | [[WordNet - RTE Users|Users]] | ||
Line 39: | Line 43: | ||
| University of Colorado Boulder | | University of Colorado Boulder | ||
| Lexicon for English verbs organized into classes extending Levin (1993) classes through refinement and addition of subclasses to achieve syntactic and semantic coherence among members of a class | | Lexicon for English verbs organized into classes extending Levin (1993) classes through refinement and addition of subclasses to achieve syntactic and semantic coherence among members of a class | ||
− | | style="text-align: center;"| | + | | style="text-align: center;"|2 |
+ | | style="text-align: center;"|2 | ||
+ | | style="text-align: center;"| | ||
| [[Verbnet - RTE Users|Users]] | | [[Verbnet - RTE Users|Users]] | ||
Line 47: | Line 53: | ||
| Information Sciences Institute, University of Southern California | | Information Sciences Institute, University of Southern California | ||
| Broad-coverage semantic network of verbs | | Broad-coverage semantic network of verbs | ||
− | | style="text-align: center;"| | + | | style="text-align: center;"|2 |
+ | | style="text-align: center;"|3 | ||
+ | | style="text-align: center;"| | ||
| [[VerbOcean - RTE Users|Users]] | | [[VerbOcean - RTE Users|Users]] | ||
Line 55: | Line 63: | ||
| ICSI (International Computer Science Institute) - Berkeley University | | ICSI (International Computer Science Institute) - Berkeley University | ||
| Lexical resource for English words, based on frame semantics (valences) and supported by corpus evidence | | Lexical resource for English words, based on frame semantics (valences) and supported by corpus evidence | ||
− | | style="text-align: center;"| | + | | style="text-align: center;"|1 |
+ | | style="text-align: center;"|1 | ||
+ | | style="text-align: center;"| | ||
| [[Framenet - RTE Users|Users]] | | [[Framenet - RTE Users|Users]] | ||
Line 63: | Line 73: | ||
| New York University | | New York University | ||
| Lexical resource containing syntactic frames for nouns, extracted from annotated corpora | | Lexical resource containing syntactic frames for nouns, extracted from annotated corpora | ||
− | | style="text-align: center;"| | + | | style="text-align: center;"|2 |
+ | | style="text-align: center;"|1 | ||
+ | | style="text-align: center;"| | ||
| [[NomBank Resource - RTE Users|Users]] | | [[NomBank Resource - RTE Users|Users]] | ||
Line 71: | Line 83: | ||
| University of Colorado Boulder | | University of Colorado Boulder | ||
| Lexical resource containing syntactic frames for verbs, extracted from annotated corpora | | Lexical resource containing syntactic frames for verbs, extracted from annotated corpora | ||
− | | style="text-align: center;"| | + | | style="text-align: center;"|2 |
+ | | style="text-align: center;"|1 | ||
+ | | style="text-align: center;"| | ||
| [[PropBank Resource - RTE Users|Users]] | | [[PropBank Resource - RTE Users|Users]] | ||
Line 79: | Line 93: | ||
| New York University | | New York University | ||
| Dictionary of English nominalizations: it describes the allowed complements for a nominalization and relates the nominal complements to the arguments of the corresponding verb | | Dictionary of English nominalizations: it describes the allowed complements for a nominalization and relates the nominal complements to the arguments of the corresponding verb | ||
+ | | style="text-align: center;"|0 | ||
| style="text-align: center;"|1 | | style="text-align: center;"|1 | ||
+ | | style="text-align: center;"| | ||
| [[Nomlex Plus - RTE Users|Users]] | | [[Nomlex Plus - RTE Users|Users]] | ||
Line 87: | Line 103: | ||
| | | | ||
| Free encyclopedia. Used for extraction of lexical-semantic rules (from its more structured parts), named entity recognition, geographical information etc. | | Free encyclopedia. Used for extraction of lexical-semantic rules (from its more structured parts), named entity recognition, geographical information etc. | ||
+ | | style="text-align: center;"|0 | ||
| style="text-align: center;"|3 | | style="text-align: center;"|3 | ||
+ | | style="text-align: center;"| | ||
| [[Wikipedia - RTE Users|Users]] | | [[Wikipedia - RTE Users|Users]] | ||
Line 96: | Line 114: | ||
| Output of the TEASE algorithm | | Output of the TEASE algorithm | ||
| style="text-align: center;"|0 | | style="text-align: center;"|0 | ||
+ | | style="text-align: center;"|0 | ||
+ | | style="text-align: center;"| | ||
| [[Tease Collection - RTE Users|Users]] | | [[Tease Collection - RTE Users|Users]] | ||
Line 103: | Line 123: | ||
| BADC (British Atmospheric Data Centre) | | BADC (British Atmospheric Data Centre) | ||
| Acronym and Abbreviation List | | Acronym and Abbreviation List | ||
+ | | style="text-align: center;"|0 | ||
| style="text-align: center;"|1 | | style="text-align: center;"|1 | ||
+ | | style="text-align: center;"| | ||
| [[BADC Acronym and Abbreviation List - RTE Users|Users]] | | [[BADC Acronym and Abbreviation List - RTE Users|Users]] | ||
Line 111: | Line 133: | ||
| Acronym-Guide.com | | Acronym-Guide.com | ||
| Acronym and Abbreviation Lists for English, branched in thematic directories | | Acronym and Abbreviation Lists for English, branched in thematic directories | ||
+ | | style="text-align: center;"|0 | ||
| style="text-align: center;"|1 | | style="text-align: center;"|1 | ||
+ | | style="text-align: center;"| | ||
| [[Acronym Guide - RTE Users|Users]] | | [[Acronym Guide - RTE Users|Users]] | ||
Line 119: | Line 143: | ||
| University of Alberta | | University of Alberta | ||
| Thesaurus automatically constructed using a parsed corpus, based on distributional similarity scores | | Thesaurus automatically constructed using a parsed corpus, based on distributional similarity scores | ||
+ | | style="text-align: center;"|0 | ||
| style="text-align: center;"|1 | | style="text-align: center;"|1 | ||
+ | | style="text-align: center;"| | ||
| [[Dekang Lin’s Thesaurus - RTE Users|Users]] | | [[Dekang Lin’s Thesaurus - RTE Users|Users]] | ||
Line 128: | Line 154: | ||
| Roget's Thesaurus is a widely-used English thesaurus, created by Dr. Peter Mark Roget in 1805. The original edition had 15,000 words, and each new edition has been larger. The electronic edition ([http://machaut.uchicago.edu/rogets version 1.02]) is made available by University of Chicago. | | Roget's Thesaurus is a widely-used English thesaurus, created by Dr. Peter Mark Roget in 1805. The original edition had 15,000 words, and each new edition has been larger. The electronic edition ([http://machaut.uchicago.edu/rogets version 1.02]) is made available by University of Chicago. | ||
| style="text-align: center;"|1 | | style="text-align: center;"|1 | ||
+ | | style="text-align: center;"|0 | ||
+ | | style="text-align: center;"| | ||
| [[Roget's Thesaurus - RTE Users|Users]] | | [[Roget's Thesaurus - RTE Users|Users]] | ||
Line 135: | Line 163: | ||
| Linguistic Data Consortium, University of Pennsylvania; Google Inc. | | Linguistic Data Consortium, University of Pennsylvania; Google Inc. | ||
| Data set containing English word n-grams and their observed frequency counts. The n-gram counts were generated from approximately 1 trillion word tokens of text from publicly accessible Web pages | | Data set containing English word n-grams and their observed frequency counts. The n-gram counts were generated from approximately 1 trillion word tokens of text from publicly accessible Web pages | ||
+ | | style="text-align: center;"|0 | ||
| style="text-align: center;"|1 | | style="text-align: center;"|1 | ||
+ | | style="text-align: center;"| | ||
| [[Web1T - RTE Users|Users]] | | [[Web1T - RTE Users|Users]] | ||
Line 143: | Line 173: | ||
| USGS (United States Geological Survey) | | USGS (United States Geological Survey) | ||
| Database containing the Federal and national standard toponyms for USA, associated areas and Antarctica | | Database containing the Federal and national standard toponyms for USA, associated areas and Antarctica | ||
+ | | style="text-align: center;"|0 | ||
| style="text-align: center;"|1 | | style="text-align: center;"|1 | ||
+ | | style="text-align: center;"| | ||
| [[GNIS - RTE Users|Users]] | | [[GNIS - RTE Users|Users]] | ||
Line 151: | Line 183: | ||
| | | | ||
| Database containing eight million geographical names. It is integrating geographical data such as names of places in various languages, elevation, population and others from various sources. | | Database containing eight million geographical names. It is integrating geographical data such as names of places in various languages, elevation, population and others from various sources. | ||
+ | | style="text-align: center;"|0 | ||
| style="text-align: center;"|1 | | style="text-align: center;"|1 | ||
+ | | style="text-align: center;"| | ||
| [[Geonames - RTE Users|Users]] | | [[Geonames - RTE Users|Users]] | ||
Line 159: | Line 193: | ||
| Department of Computer Science, New York University | | Department of Computer Science, New York University | ||
| Data-base created using Sekine's method, NOT cleaned up by human. It includes 19,975 sets of paraphrases with 191,572 phrases. | | Data-base created using Sekine's method, NOT cleaned up by human. It includes 19,975 sets of paraphrases with 191,572 phrases. | ||
− | | style="text-align: center;"| 0 | + | | style="text-align: center;"|0 |
+ | | style="text-align: center;"|0 | ||
+ | | style="text-align: center;"| | ||
| [[Sekine's Paraphrase Database - RTE Users|Users]] | | [[Sekine's Paraphrase Database - RTE Users|Users]] | ||
Line 167: | Line 203: | ||
| Microsoft Research | | Microsoft Research | ||
| Text file containing 5800 pairs of sentences which have been extracted from news sources on the web, along with human annotations indicating whether each pair captures a paraphrase/semantic equivalence relationship. | | Text file containing 5800 pairs of sentences which have been extracted from news sources on the web, along with human annotations indicating whether each pair captures a paraphrase/semantic equivalence relationship. | ||
− | | style="text-align: center;"| 0 | + | | style="text-align: center;"|0 |
+ | | style="text-align: center;"|0 | ||
+ | | style="text-align: center;"| | ||
| [[Microsoft Research Paraphrase Corpus - RTE Users|Users]] | | [[Microsoft Research Paraphrase Corpus - RTE Users|Users]] | ||
Line 175: | Line 213: | ||
| Department of Computer Science, Cornell University, Ithaca NY | | Department of Computer Science, Cornell University, Ithaca NY | ||
| System output of an unsupervised algorithm recovering many Downward Entailing operators, like 'doubt'. | | System output of an unsupervised algorithm recovering many Downward Entailing operators, like 'doubt'. | ||
− | | style="text-align: center;"| 0 | + | | style="text-align: center;"|0 |
+ | | style="text-align: center;"|0 | ||
+ | | style="text-align: center;"| | ||
| [[Downward entailing operators - RTE Users|Users]] | | [[Downward entailing operators - RTE Users|Users]] | ||
Line 183: | Line 223: | ||
| Bar-Ilan University | | Bar-Ilan University | ||
| Extraction of lexical reference rules from the text body (first sentence) and from metadata (links, redirects, parentheses) of Wikipedia | | Extraction of lexical reference rules from the text body (first sentence) and from metadata (links, redirects, parentheses) of Wikipedia | ||
− | | style="text-align: center;"|1 | + | | style="text-align: center;"|0 |
+ | | style="text-align: center;"|1 | ||
+ | | style="text-align: center;"| | ||
| [[WikiRules! - RTE Users|Users]] | | [[WikiRules! - RTE Users|Users]] | ||
Line 192: | Line 234: | ||
| ''Participants are encouraged to contribute'' | | ''Participants are encouraged to contribute'' | ||
| style="text-align: center;"| | | style="text-align: center;"| | ||
− | | [[New | + | | style="text-align: center;"| |
+ | | style="text-align: center;"| | ||
+ | | [[New Resource2 - RTE Users|Users]] | ||
|- bgcolor="#ECECEC" "align="left" | |- bgcolor="#ECECEC" "align="left" | ||
Line 199: | Line 243: | ||
| | | | ||
| ''Participants are encouraged to contribute'' | | ''Participants are encouraged to contribute'' | ||
+ | | style="text-align: center;"| | ||
+ | | style="text-align: center;"| | ||
| style="text-align: center;"| | | style="text-align: center;"| | ||
| [[New Resource2 - RTE Users|Users]] | | [[New Resource2 - RTE Users|Users]] |
Revision as of 03:32, 24 November 2009
Knowledge resources have shown their relevance for applied semantic inference, and are extensively used by applied inference systems, such as those developed within the Textual Entailment framework.
This page lists the knowledge resources used by systems that participated in recent RTE challenges. The first table lists the publicly available resources; the second lists unpublished resources. Both tables are sortable by resource name, type, author and number of users.
RTE participants are encouraged to add information about all kinds of knowledge resources they have used, from standard existing resources (e.g. WordNet) to knowledge collections created for specific purposes, which can be made available to the community.
Call for Resources
Ablation Tests
Publicly available Resources
Resource | Type | Author | Brief description | PAST Users | RTE4 Users | RTE5 Users | Usage info |
---|---|---|---|---|---|---|---|
WordNet | Lexical DB | Princeton University | Lexical database of English nouns, verbs, adjectives and adverbs | 3 | 21 | | Users |
Verbnet | Lexical DB | University of Colorado Boulder | Lexicon for English verbs organized into classes extending Levin (1993) classes through refinement and addition of subclasses to achieve syntactic and semantic coherence among members of a class | 2 | 2 | | Users |
VerbOcean | Lexical DB | Information Sciences Institute, University of Southern California | Broad-coverage semantic network of verbs | 2 | 3 | | Users |
FrameNet | Lexical DB | ICSI (International Computer Science Institute) - Berkeley University | Lexical resource for English words, based on frame semantics (valences) and supported by corpus evidence | 1 | 1 | | Users |
NomBank | Lexical DB | New York University | Lexical resource containing syntactic frames for nouns, extracted from annotated corpora | 2 | 1 | | Users |
PropBank | Lexical DB | University of Colorado Boulder | Lexical resource containing syntactic frames for verbs, extracted from annotated corpora | 2 | 1 | | Users |
Nomlex Plus | Lexical DB | New York University | Dictionary of English nominalizations: it describes the allowed complements for a nominalization and relates the nominal complements to the arguments of the corresponding verb | 0 | 1 | | Users |
Wikipedia | Encyclopedia | | Free encyclopedia. Used for extraction of lexical-semantic rules (from its more structured parts), named entity recognition, geographical information etc. | 0 | 3 | | Users |
TEASE Collection | Collection of Entailment Rules | Bar-Ilan University | Output of the TEASE algorithm | 0 | 0 | | Users |
BADC Acronym and Abbreviation List | Word List | BADC (British Atmospheric Data Centre) | Acronym and Abbreviation List | 0 | 1 | | Users |
Acronym Guide | Word List | Acronym-Guide.com | Acronym and Abbreviation Lists for English, branched in thematic directories | 0 | 1 | | Users |
Dekang Lin’s Thesaurus | Thesaurus | University of Alberta | Thesaurus automatically constructed using a parsed corpus, based on distributional similarity scores | 0 | 1 | | Users |
Roget's Thesaurus | Thesaurus | Peter Mark Roget (Electronic version distributed by University of Chicago) | Roget's Thesaurus is a widely used English thesaurus, created by Dr. Peter Mark Roget in 1805. The original edition had 15,000 words, and each new edition has been larger. The electronic edition (version 1.02) is made available by the University of Chicago. | 1 | 0 | | Users |
Web1T 5-grams | Word List | Linguistic Data Consortium, University of Pennsylvania; Google Inc. | Data set containing English word n-grams and their observed frequency counts. The n-gram counts were generated from approximately 1 trillion word tokens of text from publicly accessible Web pages | 0 | 1 | | Users |
GNIS - Geographic Names Information System | Gazetteer | USGS (United States Geological Survey) | Database containing the Federal and national standard toponyms for USA, associated areas and Antarctica | 0 | 1 | | Users |
Geonames | Gazetteer | | Database containing eight million geographical names. It integrates geographical data such as names of places in various languages, elevation, population and others from various sources. | 0 | 1 | | Users |
Sekine's Paraphrase Database | Collection of paraphrases | Department of Computer Science, New York University | Database created using Sekine's method, NOT cleaned up by humans. It includes 19,975 sets of paraphrases with 191,572 phrases. | 0 | 0 | | Users |
Microsoft Research Paraphrase Corpus | Collection of paraphrases | Microsoft Research | Text file containing 5800 pairs of sentences which have been extracted from news sources on the web, along with human annotations indicating whether each pair captures a paraphrase/semantic equivalence relationship. | 0 | 0 | | Users |
Downward entailing operators | Collection of entailing operators | Department of Computer Science, Cornell University, Ithaca NY | System output of an unsupervised algorithm recovering many Downward Entailing operators, like 'doubt'. | 0 | 0 | | Users |
WikiRules! | Lexical Reference rule-base | Bar-Ilan University | Extraction of lexical reference rules from the text body (first sentence) and from metadata (links, redirects, parentheses) of Wikipedia | 0 | 1 | | Users |
New resource | | | Participants are encouraged to contribute | | | | Users |
New resource | | | Participants are encouraged to contribute | | | | Users |
Not available Resources
The following table lists the unpublished resources used by RTE participants. Some of them were developed by the users themselves specifically for RTE. Anyone interested may contact the authors for further information.
Resource | Type | Author | Brief description | RTE Users* | Usage info |
---|---|---|---|---|---|
PARC Polarity Lexicon | Lexical DB | PARC - Palo Alto Research Center | Verbs classification with respect to semantic polarity | 1 | Users |
DIRT Paraphrase Collection | Collection of paraphrases | University of Alberta | Output of the DIRT algorithm | 5 | Users |
Gazetteer from TREC | Gazetteer | NIST - National Institute of Standards and Technology | Cities and other geographical names | 1 | Users |
DFKI Geographic Ontology (to be released) | Ontology | DFKI - German Research Center for Artificial Intelligence | Ontology containing geographic terms and two kinds of relations: the directional part-of relation, and the equal relation for synonyms and abbreviations of the same geographic area (e.g. the United Kingdom, the UK, Great Britain, etc.) | 1 | Users |
Syntactic rule base (to be released) | Collection of Entailment Rules | Bar-Ilan University; Tel-Aviv University | A manually-composed collection of entailment rules which define parse tree transformations. The rules cover generic syntactic phenomena such as appositions, conjunctions, passive, relative clause, etc. (Bar-Haim et al., AAAI-07) | 1 | Users |
Polarity rule base (to be released) | Collection of Entailment Rules | Bar-Ilan University; Tel-Aviv University | A manually-composed collection of entailment rules which detect predicates whose polarity is negative (e.g. didn't dance) or unknown (e.g. plans to dance). The rules capture diverse phenomena that affect polarity, e.g. verbal negation, modal verbs, conditionals, and certain verbs that induce negative or "unknown" polarity context. The latter were taken mainly from VerbNet. Extends a resource described in (Bar-Haim et al., AAAI-07) | 1 | Users |
Lexical-Syntactic rule base combining WordNet, NomLex-plus and Unary DIRT | Collection of Entailment Rules | Bar-Ilan University; Tel-Aviv University | Extract lexical-syntactic entailment rules for predicates (verbal and nominal), including argument mapping. The resource is based on WordNet, Nomlex-Plus and Unary DIRT (Szpektor and Dagan, Coling 08) | 1 | Users |
OPENU Collection | Collection of Entailment Rules and Patterns | Open University | Collections of rules, patterns etc. for RTE purposes, extracted from the Reuters corpus parsed using Minipar. | 1 | Users |
New resource | | | Participants are encouraged to contribute | | Users |
New resource | | | Participants are encouraged to contribute | | Users |
[*] The number of Users (see "Usage Info" links for details) refers to participants in the last two RTE challenges.
RTE-3 data were provided only by participants, whereas RTE-4 data were supplemented with information extracted from the related proceedings.