Difference between revisions of "RTE Knowledge Resources"

Revision as of 09:33, 19 June 2009

Knowledge resources have proven relevant to applied semantic inference and are used extensively by applied inference systems, such as those developed within the Textual Entailment framework.

This page lists the knowledge resources used by systems that participated in the most recent RTE challenges. The first table lists publicly available resources; the second lists unpublished resources. Both tables are sortable by resource name, type, author and number of users.

RTE participants are encouraged to add information about all kinds of knowledge resources they have used, from standard existing resources (e.g. WordNet) to knowledge collections created for specific purposes that can be made available to the community.

Publicly available Resources

Resource	Type	Author	Brief description	RTE Users*	Usage info
WordNet	Lexical DB	Princeton University	Lexical database of English nouns, verbs, adjectives and adverbs	23	Users
Verbnet	Lexical DB	University of Colorado Boulder	Lexicon for English verbs organized into classes extending Levin (1993) classes through refinement and addition of subclasses to achieve syntactic and semantic coherence among members of a class	3	Users
VerbOcean	Lexical DB	Information Sciences Institute, University of Southern California	Broad-coverage semantic network of verbs	5	Users
FrameNet	Lexical DB	ICSI (International Computer Science Institute) - University of California, Berkeley	Lexical resource for English words, based on frame semantics (valences) and supported by corpus evidence	2	Users
NomBank	Lexical DB	New York University	Lexical resource containing syntactic frames for nouns, extracted from annotated corpora	2	Users
PropBank	Lexical DB	University of Colorado Boulder	Lexical resource containing syntactic frames for verbs, extracted from annotated corpora	2	Users
Nomlex Plus	Lexical DB	New York University	Dictionary of English nominalizations: it describes the allowed complements for a nominalization and relates the nominal complements to the arguments of the corresponding verb	1	Users
Wikipedia	Encyclopedia		Free encyclopedia. Used for extraction of lexical-semantic rules (from its more structured parts), named entity recognition, geographical information etc.	3	Users
TEASE Collection	Collection of Entailment Rules	Bar-Ilan University	Output of the TEASE algorithm	0	Users
BADC Acronym and Abbreviation List	Word List	BADC (British Atmospheric Data Centre)	Acronym and Abbreviation List	1	Users
Acronym Guide	Word List	Acronym-Guide.com	Acronym and Abbreviation Lists for English, branched in thematic directories	1	Users
Dekang Lin’s Thesaurus	Thesaurus	University of Alberta	Thesaurus automatically constructed using a parsed corpus, based on distributional similarity scores	1	Users
Roget's Thesaurus	Thesaurus	Peter Mark Roget (Electronic version distributed by University of Chicago)	Widely used English thesaurus, created by Dr. Peter Mark Roget in 1805. The original edition had 15,000 words, and each new edition has been larger. The electronic edition (version 1.02) is made available by the University of Chicago.	1	Users
Web1T 5-grams	Word list	Linguistic Data Consortium, University of Pennsylvania; Google Inc.	Data set containing English word n-grams and their observed frequency counts. The n-gram counts were generated from approximately 1 trillion word tokens of text from publicly accessible Web pages	1	Users
GNIS - Geographic Names Information System	Gazetteer	USGS (United States Geological Survey)	Database containing the Federal and national standard toponyms for USA, associated areas and Antarctica	1	Users
Geonames	Gazetteer		Database containing eight million geographical names. It integrates geographical data from various sources, such as place names in various languages, elevation, population and others.	1	Users
Sekine's Paraphrase Database	Collection of paraphrases	Department of Computer Science, New York University	Database created using Sekine's method, not cleaned up by humans. It includes 19,975 sets of paraphrases with 191,572 phrases.	0	Users
Microsoft Research Paraphrase Corpus	Collection of paraphrases	Microsoft Research	Text file containing 5800 pairs of sentences which have been extracted from news sources on the web, along with human annotations indicating whether each pair captures a paraphrase/semantic equivalence relationship.	0	Users
Downward entailing operators	Collection of entailing operators	Department of Computer Science, Cornell University, Ithaca NY	System output of an unsupervised algorithm recovering many Downward Entailing operators, like 'doubt'.	0	Users
New resource			Participants are encouraged to contribute		Users
New resource			Participants are encouraged to contribute		Users
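
The sorting that the page offers can also be reproduced offline. The following is a minimal sketch, assuming a few rows have been copied from the table above as tab-separated text; the "Brief description" and "Usage info" columns are omitted for brevity, and the names `ROWS`, `load_resources` and `rank_by_users` are hypothetical helpers, not part of the wiki or of any RTE tooling.

```python
import csv
import io

# A few rows copied from the table above (tab-separated, as on the page).
# Only four of the six columns are kept; this subset is an assumption
# made for illustration.
ROWS = (
    "Resource\tType\tAuthor\tRTE Users\n"
    "WordNet\tLexical DB\tPrinceton University\t23\n"
    "VerbOcean\tLexical DB\tInformation Sciences Institute, "
    "University of Southern California\t5\n"
    "Wikipedia\tEncyclopedia\t\t3\n"
    "TEASE Collection\tCollection of Entailment Rules\tBar-Ilan University\t0\n"
)


def load_resources(text):
    """Parse tab-separated rows into dicts, converting the user count to int."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    resources = []
    for row in reader:
        row["RTE Users"] = int(row["RTE Users"])
        resources.append(row)
    return resources


def rank_by_users(resources):
    """Return the resources ordered from most to least used."""
    return sorted(resources, key=lambda r: r["RTE Users"], reverse=True)


ranked = rank_by_users(load_resources(ROWS))
for resource in ranked:
    print(f'{resource["Resource"]}: {resource["RTE Users"]}')
```

Using a tab delimiter keeps author fields that contain commas intact, and an empty author cell (as for Wikipedia) is read as an empty string.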

Not available Resources

The following table lists the unpublished resources used by RTE participants. Some of them were developed by the users themselves specifically for RTE. Readers interested in a resource may contact its authors for further information.

Resource	Type	Author	Brief description	RTE Users*	Usage info
PARC Polarity Lexicon	Lexical DB	PARC - Palo Alto Research Center	Classification of verbs with respect to semantic polarity	1	Users
DIRT Paraphrase Collection	Collection of paraphrases	University of Alberta	Output of the DIRT algorithm	4	Users
Gazetteer from TREC	Gazetteer	NIST - National Institute of Standards and Technology	Cities and other geographical names	1	Users
DFKI Geographic Ontology (to be released)	Ontology	DFKI - German Research Center for Artificial Intelligence	Ontology containing geographic terms and two kinds of relations: the directional part-of relation, and the equal relation for synonyms and abbreviations of the same geographic area (e.g. the United Kingdom, the UK, Great Britain, etc.)	1	Users
Syntactic rule base (to be released)	Collection of Entailment Rules	Bar-Ilan University; Tel-Aviv University	A manually-composed collection of entailment rules which define parse tree transformations. The rules cover generic syntactic phenomena such as appositions, conjunctions, passive, relative clause, etc. (Bar-Haim et al., AAAI-07)	1	Users
Polarity rule base (to be released)	Collection of Entailment Rules	Bar-Ilan University; Tel-Aviv University	A manually-composed collection of entailment rules which detect predicates whose polarity is negative (e.g. didn't dance) or unknown (e.g. plans to dance). The rules capture diverse phenomena that affect polarity, e.g. verbal negation, modal verbs, conditionals, and certain verbs that induce negative or "unknown" polarity context. The latter were taken mainly from VerbNet. Extends a resource described in (Bar-Haim et al., AAAI-07)	1	Users
Lexical-Syntactic rule base combining WordNet, NomLex-plus and Unary DIRT	Collection of Entailment Rules	Bar-Ilan University; Tel-Aviv University	Lexical-syntactic entailment rules for predicates (verbal and nominal), including argument mapping. The resource is based on WordNet, Nomlex-Plus and Unary DIRT (Szpektor and Dagan, Coling 08)	1	Users
Lexical reference rules extracted from Wikipedia (to be released)	Collection of Entailment Rules	Bar-Ilan University; Tel-Aviv University	Lexical entailment rules extracted from the text body (first sentence) and from metadata (links, redirects, parentheses)	1	Users
OPENU Collection	Collection of Entailment Rules and Patterns	Open University	Collections of rules, patterns, etc. for RTE purposes, extracted from the Reuters corpus parsed using Minipar.	1	Users
New resource			Participants are encouraged to contribute		Users
New resource			Participants are encouraged to contribute		Users

[*] The number of Users (see "Usage Info" links for details) refers to participants in the last two RTE challenges.
RTE-3 data have been provided only by participants, whereas RTE-4 data have been integrated with information extracted from the related proceedings.

@@ Line 12: / Line 12: @@
 ! Resource
 ! Type
-! Authors
+! Author
 ! class="unsortable"|Brief description
 ! RTE Users*
@@ Line 19: / Line 19: @@
 | [[WordNet]]
 | Lexical DB
-| George A. Miller (project director) - <br>Princeton University
+| Princeton University
 | Lexical database of English nouns, verbs, adjectives and adverbs
 | style="text-align: center;"|23
@@ Line 26: / Line 26: @@
 | [http://verbs.colorado.edu/~mpalmer/projects/verbnet.html Verbnet]
 | Lexical DB
-| Martha Palmer, Karin Kipper - <br>University of Colorado Boulder
+| University of Colorado Boulder
 | Lexicon for English verbs organized into classes extending Levin (1993) classes through refinement and addition of subclasses to achieve syntactic and semantic coherence among members of a class
 | style="text-align: center;"|3
@@ Line 33: / Line 33: @@
 | [[VerbOcean]]
 | Lexical DB
-| Timothy Chklovski and Patrick Pantel - <br>Information Sciences Institute, University of Southern California
+| Information Sciences Institute, University of Southern California
 | Broad-coverage semantic network of verbs
 | style="text-align: center;"|5
@@ Line 40: / Line 40: @@
 | [http://framenet.icsi.berkeley.edu/ FrameNet]
 | Lexical DB
-| Charles J. Fillmore (project director) - <br>ICSI (International Computer Science Institute) - Berkley University
+| ICSI (International Computer Science Institute) - Berkley University
 | Lexical resource for English words, based on frame semantics (valences) and supported by corpus evidence
 | style="text-align: center;"|2
@@ Line 47: / Line 47: @@
 | [http://nlp.cs.nyu.edu/meyers/NomBank.html NomBank]
 | Lexical DB
-| Adam Meyers, Ruth Reeves, Catherine Macleod, Rachel Szekely, Veronika Zielinska, Brian Young - <br>New York University
+| New York University
 | Lexical resource containing syntactic frames for nouns, extracted from annotated corpora
 | style="text-align: center;"|2
@@ Line 54: / Line 54: @@
 | [http://verbs.colorado.edu/~mpalmer/projects/ace.html PropBank]
 | Lexical DB
-| Martha Palmer, Mitch Marcus - <br>University of Colorado Boulder
+| University of Colorado Boulder
 | Lexical resource containing syntactic frames for verbs, extracted from annotated corpora
 | style="text-align: center;"|2
@@ Line 61: / Line 61: @@
 | [http://nlp.cs.nyu.edu/nomlex/index.html Nomlex] Plus
 | Lexical DB
-| Catherine Macleod, Ralph Grishman, Adam Meyers, Leslie Barrett and Ruth Reeves - <br>New York University
+| New York University
 | Dictionary of English nominalizations: it describes the allowed complements for a nominalization and relates the nominal complements to the arguments of the corresponding verb
 | style="text-align: center;"|1
@@ Line 75: / Line 75: @@
 | [[TEASE]] Collection
 | Collection of Entailment Rules
-| Idan Szpektor - <br>Bar Ilan University
+| Bar-Ilan University
 | Output of the TEASE algorithm
 | style="text-align: center;"|0
@@ Line 96: / Line 96: @@
 | [http://www.cs.ualberta.ca/~lindek/downloads.htm Dekang Lin’s Thesaurus]
 | Thesaurus
-| Dekang Lin - <br>University of Alberta
+| University of Alberta
 | Thesaurus automatically constructed using a parsed corpus, based on distributional similarity scores
 | style="text-align: center;"|1
@@ Line 110: / Line 110: @@
 | [http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13 Web1T 5-grams]
 | Word list
-| Thorsten Brants, Alex Franz -  <br>Linguistic Data Consortium, University of Pennsylvania; Google Inc.
+| Linguistic Data Consortium, University of Pennsylvania; Google Inc.
 | Data set containing English word n-grams and their observed frequency counts. The n-gram counts were generated from approximately 1 trillion word tokens of text from publicly accessible Web pages
 | style="text-align: center;"|1
@@ Line 131: / Line 131: @@
 | [http://nlp.cs.nyu.edu/paraphrase/ Sekine's Paraphrase Database]
 | Collection of paraphrases
-| Satoshi Sekine - <br>Department of Computer Science, New York University
+| Department of Computer Science, New York University
 | Data-base created using Sekine's method, NOT cleaned up by human. It includes 19,975 sets of paraphrases with 191,572 phrases.
 | style="text-align: center;"| 0
@@ Line 145: / Line 145: @@
 | [http://www.cs.cornell.edu/~cristian/Without_a_doubt_-_Data.html Downward entailing operators]
 | Collection of entailing operators
-| Cristian Danescu-Niculescu-Mizil - <br>Department of Computer Science, Cornell University, Ithaca NY
+| Department of Computer Science, Cornell University, Ithaca NY
 | System output of an unsupervised algorithm recovering many Downward Entailing operators, like 'doubt'.
 | style="text-align: center;"| 0
@@ Line 174: / Line 174: @@
 ! Resource
 ! Type
-! Authors
+! Author
 ! class="unsortable"|Brief description
 ! RTE Users*
@@ Line 188: / Line 188: @@
 | [[DIRT Paraphrase Collection]]
 | Collection of paraphrases
-| Dekang Lin and Patrick Pantel -<br>University of Alberta
+| University of Alberta
 | Output of the DIRT algorithm
 | style="text-align: center;"|4
