Difference between revisions of "Textual Entailment Resource Pool"

From ACL Wiki
(Undo revision 8570 by Erel Segal (Talk))
(Add a link to the Guardian Headline Entailment Training Dataset)
(21 intermediate revisions by 9 users not shown)
Line 1: Line 1:
 +
[[Textual Entailment]] > '''Resources''':
 +
----
 +
 
[[Textual Entailment|Textual entailment]] systems rely on many different types of [[Natural Language Processing|NLP]] resources, including term banks, paraphrase lists, parsers, named-entity recognizers, etc. With so many resources being continuously released and improved, it can be difficult to know which particular resource to use when developing a system.
 
[[Textual Entailment|Textual entailment]] systems rely on many different types of [[Natural Language Processing|NLP]] resources, including term banks, paraphrase lists, parsers, named-entity recognizers, etc. With so many resources being continuously released and improved, it can be difficult to know which particular resource to use when developing a system.
  
Line 13: Line 16:
 
* [http://l2r.cs.uiuc.edu/~cogcomp/kindleDemo.php Entailment Demo] (from the University of Illinois at Urbana-Champaign) - INACTIVE (as of 2010-12-22)
 
* [http://l2r.cs.uiuc.edu/~cogcomp/kindleDemo.php Entailment Demo] (from the University of Illinois at Urbana-Champaign) - INACTIVE (as of 2010-12-22)
 
* [http://edits.fbk.eu/ EDITS - Edit Distance Textual Entailment Suite] (open source software developed by [http://hlt.fbk.eu/ Human Language Technology (HLT) group at FBK-Irst])
 
* [http://edits.fbk.eu/ EDITS - Edit Distance Textual Entailment Suite] (open source software developed by [http://hlt.fbk.eu/ Human Language Technology (HLT) group at FBK-Irst])
 +
* [http://www.cs.biu.ac.il/~nlp/downloads/biutee/index.html BIUTEE] - Bar Ilan University Textual Entailment Engine (open source).
  
 
== RTE data sets ==
 
== RTE data sets ==
 +
=== Past campaigns data sets ===
 +
* [http://pascallin.ecs.soton.ac.uk/Challenges/RTE/Datasets RTE1 dataset] - provided by [http://pascallin.ecs.soton.ac.uk PASCAL]
 +
* [http://pascallin.ecs.soton.ac.uk/Challenges/RTE2/Datasets RTE2 dataset] - provided by [http://pascallin.ecs.soton.ac.uk PASCAL]
 +
* [http://pascallin.ecs.soton.ac.uk/Challenges/RTE3/Datasets RTE3 dataset] - provided by [http://pascallin.ecs.soton.ac.uk PASCAL]
 +
* [http://www.nist.gov/tac/data/past/2008/RTE-4.html RTE4 dataset] - provided by [http://www.nist.gov/index.html NIST] - freely available upon request. For details see [http://www.nist.gov/tac/data/forms/index.html TAC User Agreements]
 +
* [http://www.nist.gov/tac/data/past/2009/RTE-5.html RTE5 dataset] - provided by [http://www.nist.gov/index.html NIST] - freely available upon request. For details see [http://www.nist.gov/tac/data/forms/index.html TAC User Agreements]
 +
* [http://www.nist.gov/tac/data/past/2010/RTE-6_Main_Task.html RTE6 dataset] - provided by [http://www.nist.gov/index.html NIST] - freely available upon request. For details see [http://www.nist.gov/tac/data/forms/index.html TAC User Agreements]
 +
* [http://www.nist.gov/tac/2011/RTE/index.html RTE7 dataset] - provided by [http://www.nist.gov/index.html NIST] - freely available upon request. For details see [http://www.nist.gov/tac/data/forms/index.html TAC User Agreements]
 +
 +
 +
=== Other data sets ===
 
* [http://www.coli.uni-saarland.de/projects/salsa/fate FrameNet manually annotated RTE 2006 Test Set.] Provided by  [http://www.coli.uni-saarland.de/projects/salsa/ SALSA project, Saarland University.]
 
* [http://www.coli.uni-saarland.de/projects/salsa/fate FrameNet manually annotated RTE 2006 Test Set.] Provided by  [http://www.coli.uni-saarland.de/projects/salsa/ SALSA project, Saarland University.]
 
* [http://www.cs.biu.ac.il/~nlp/files/RTE_2006_Aligned.zip Manually Word Aligned RTE 2006 Data Sets.] Provided by  [http://research.microsoft.com/nlp/ the Natural Language Processing Group, Microsoft Research.]
 
* [http://www.cs.biu.ac.il/~nlp/files/RTE_2006_Aligned.zip Manually Word Aligned RTE 2006 Data Sets.] Provided by  [http://research.microsoft.com/nlp/ the Natural Language Processing Group, Microsoft Research.]
Line 22: Line 37:
 
* [http://www.nist.gov/tac/data/ RTE-5 Search Pilot Data Set annotated with anaphora and coreference information] - RTE-5 Search Data Set annotated with anaphora/coreference information + Augmented RTE-5 Search Data Set, where all the referring expressions which need to be resolved in the entailing sentences are substituted by explicit expressions on the basis of the anaphora/coreference annotation. Provided by [http://www.celct.it/ CELCT] and distributed by [http://www.nist.gov/index.html NIST] at the [http://www.nist.gov/tac/data/ Past TAC Data] web page (2009 Search Pilot, annotated test/dev data).
 
* [http://www.nist.gov/tac/data/ RTE-5 Search Pilot Data Set annotated with anaphora and coreference information] - RTE-5 Search Data Set annotated with anaphora/coreference information + Augmented RTE-5 Search Data Set, where all the referring expressions which need to be resolved in the entailing sentences are substituted by explicit expressions on the basis of the anaphora/coreference annotation. Provided by [http://www.celct.it/ CELCT] and distributed by [http://www.nist.gov/index.html NIST] at the [http://www.nist.gov/tac/data/ Past TAC Data] web page (2009 Search Pilot, annotated test/dev data).
 
* [http://www.investigacion.frc.utn.edu.ar/mslabs/~jcastillo/Sagan-test-suite/ RTE-3-Expanded, RTE-4-Expanded, RTE-5-Expanded.] RTE data set expanded in the two and three way task, at least 2000 pairs in each data set.
 
* [http://www.investigacion.frc.utn.edu.ar/mslabs/~jcastillo/Sagan-test-suite/ RTE-3-Expanded, RTE-4-Expanded, RTE-5-Expanded.] RTE data set expanded in the two and three way task, at least 2000 pairs in each data set.
 +
* [https://agora.cs.illinois.edu/display/rtedata/Explanation+Based+Analysis+of+RTE+Data Explanation-Based Analysis annotation of RTE 5 Main Task subset] described in [http://l2r.cs.uiuc.edu/~danr/Papers/SammonsVyRo10.pdf  this ACL 2010 paper]
 +
* [http://art.uniroma2.it/zanzotto/resources/WIKI_FINAL_CORPUS_v1.zip Wiki Entailment Corpus] A RTE-like set of entailment pairs extracted from Wikipedia revisions described in [http://aclweb.org/anthology/W/W10/W10-3504.pdf  this paper]
 +
* [https://github.com/daoudclarke/rte-experiment The Guardian Headlines Entailment Training Dataset] An automatically generated dataset of 32,000 pairs similar to the RTE-1 dataset.
  
 
== Knowledge Resources ==
 
== Knowledge Resources ==
Line 29: Line 47:
 
* [[RTE Knowledge Resources#Ablation tests|the ablation tests]] carried out in the RTE challenges in order to evaluate the impact of knowledge resources and tools on TE system performance;
 
* [[RTE Knowledge Resources#Ablation tests|the ablation tests]] carried out in the RTE challenges in order to evaluate the impact of knowledge resources and tools on TE system performance;
 
* [[RTE Knowledge Resources#Publicly available Resources|lists of knowledge resources]], both publicly available and unpublished, used by systems participating in the last RTE challenges.
 
* [[RTE Knowledge Resources#Publicly available Resources|lists of knowledge resources]], both publicly available and unpublished, used by systems participating in the last RTE challenges.
* [https://agora.cs.illinois.edu/display/rtedata/Explanation+Based+Analysis+of+RTE+Data Explanation-Based Analysis annotation of RTE 5 Main Task subset] described in [http://l2r.cs.uiuc.edu/~danr/Papers/SammonsVyRo10.pdf  this ACL 2010 paper]
+
<!-- * [https://agora.cs.illinois.edu/display/rtedata/Explanation+Based+Analysis+of+RTE+Data Explanation-Based Analysis annotation of RTE 5 Main Task subset] described in [http://l2r.cs.uiuc.edu/~danr/Papers/SammonsVyRo10.pdf  this ACL 2010 paper] -->
  
 
== Tools ==
 
== Tools ==
Line 39: Line 57:
  
 
=== Role Labelling ===
 
=== Role Labelling ===
* [http://cemantix.org/assert ASSERT]
+
* [http://cemantix.org/assert.html ASSERT]
 
* [http://www.coli.uni-saarland.de/projects/salsa/shal/ Shalmaneser]
 
* [http://www.coli.uni-saarland.de/projects/salsa/shal/ Shalmaneser]
 
* [http://l2r.cs.uiuc.edu/~cogcomp/asoftware.php?skey=SRL Semantic Role Labeler] - from the University of Illinois at Urbana-Champaign, see a [http://l2r.cs.uiuc.edu/~cogcomp/srl-demo.php web demo] of this tool
 
* [http://l2r.cs.uiuc.edu/~cogcomp/asoftware.php?skey=SRL Semantic Role Labeler] - from the University of Illinois at Urbana-Champaign, see a [http://l2r.cs.uiuc.edu/~cogcomp/srl-demo.php web demo] of this tool
Line 56: Line 74:
  
 
* [http://www.semantilog.org/pypes.html PyPES] general purpose library containing evaluation environment for RTE and McPIET text inference engine based on the ERG (English Resource Grammar)
 
* [http://www.semantilog.org/pypes.html PyPES] general purpose library containing evaluation environment for RTE and McPIET text inference engine based on the ERG (English Resource Grammar)
 +
 +
=== Text Normalizers ===
 +
[http://u.cs.biu.ac.il/~nlp/downloads/normalizer.html Java number normalizer (Beta)]
 +
A tool for converting textual representations of numbers to a standard numerical string.
 +
 +
== References ==
 +
 +
*[[Textual Entailment References#Workshops and Tutorials | Workshops and Tutorials ]]
 +
*[[Textual Entailment References#Papers in recent conferences and other workshops | Papers in recent conferences and other workshops ]]
 +
*[[Textual Entailment References#Journal papers | Journal papers ]]
  
 
== Links ==
 
== Links ==
 
* [http://homepages.inf.ed.ac.uk/jbos/rte/ Textual Entailment site by Johan Bos]
 
* [http://homepages.inf.ed.ac.uk/jbos/rte/ Textual Entailment site by Johan Bos]
* [http://ai-nlp.info.uniroma2.it/te/ Textual Entailment at the University of Rome "Tor Vergata"]
+
* [http://ai-nlp.info.uniroma2.it/research/te/ Textual Entailment at the University of Rome "Tor Vergata"]
 
[[Category:Textual Entailment Portal]]
 
[[Category:Textual Entailment Portal]]
* [http://l2r.cs.uiuc.edu/~cogcomp/entailment-module-demos.php Illinois Textual Entailment System Component demos]
+
* [http://cogcomp.cs.illinois.edu/page/demo_view/18 Illinois Textual Entailment System Component demos]

Revision as of 08:11, 12 October 2012



Textual entailment systems rely on many different types of NLP resources, such as term banks, paraphrase lists, parsers, and named-entity recognizers. With so many resources being continuously released and improved, it can be difficult to know which one to use when developing a system.

In response, the Recognizing Textual Entailment (RTE) shared task community initiated a new activity for building this Textual Entailment Resource Pool. RTE participants and any other member of the NLP community are encouraged to contribute to the pool.

To help determine the relative impact of the resources, RTE participants are strongly encouraged to report, whenever possible, how much each resource they use contributes to overall performance. Formal qualitative and quantitative results should be included in a separate section of the system report and also posted on the talk pages of this Textual Entailment Resource Pool.

Adding a new resource is very easy. See how to use existing templates to do this in Help:Using Templates.

Complete RTE Systems

RTE data sets

Past campaigns data sets


Other data sets

Knowledge Resources

The RTE Knowledge Resources page presents:

  • a call for resources, inviting system developers to share the resources used by their own TE engines, to both help improve the TE technology and further test and evaluate such resources;
  • the ablation tests carried out in the RTE challenges in order to evaluate the impact of knowledge resources and tools on TE system performance;
  • lists of knowledge resources, both publicly available and unpublished, used by systems participating in the last RTE challenges.

Tools

Parsers

Role Labelling

Entity Recognition Tools

Similarity / Relatedness Tools

  • UKB: Open-source WordNet-based similarity/relatedness tool; it also includes pre-computed semantic vectors for all words

Corpus Readers

  • NLTK provides a corpus reader for the data from RTE Challenges 1, 2, and 3 - see the Corpus Readers Guide for more information.
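
For readers who want to inspect RTE-style files without a toolkit, the following is a minimal sketch using only the Python standard library. It does not use NLTK's reader. It assumes an XML layout in which each `pair` element has `t` (text) and `h` (hypothesis) children and carries its gold label in either an `entailment` attribute (YES/NO) or a `value` attribute (TRUE/FALSE). The sample data, the `RTEPair` class, and the `parse_pairs` function are invented for illustration and are not taken from any released data set or library.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Made-up sample in an assumed RTE-style layout (not real challenge data).
SAMPLE_XML = """<entailment-corpus>
  <pair id="1" entailment="YES" task="IE">
    <t>The cat sat on the mat in the kitchen.</t>
    <h>The cat was in the kitchen.</h>
  </pair>
  <pair id="2" value="FALSE" task="QA">
    <t>Anna bought a red bicycle.</t>
    <h>Anna sold a bicycle.</h>
  </pair>
</entailment-corpus>"""


@dataclass
class RTEPair:
    """One text/hypothesis pair with its gold label (hypothetical container)."""
    pair_id: str
    task: str
    text: str
    hypothesis: str
    entails: bool


def parse_pairs(xml_string):
    """Parse RTE-style XML into a list of RTEPair objects."""
    root = ET.fromstring(xml_string)
    pairs = []
    for node in root.iter("pair"):
        # Assumption: the label lives in "entailment" or, failing that, "value".
        label = node.get("entailment") or node.get("value") or ""
        pairs.append(
            RTEPair(
                pair_id=node.get("id", ""),
                task=node.get("task", ""),
                text=(node.findtext("t") or "").strip(),
                hypothesis=(node.findtext("h") or "").strip(),
                entails=label.upper() in {"YES", "TRUE"},
            )
        )
    return pairs


if __name__ == "__main__":
    for pair in parse_pairs(SAMPLE_XML):
        print(pair.pair_id, pair.task, pair.entails, "|", pair.hypothesis)
```

Three-way labels (for example, a separate "unknown" or "contradiction" class) are not handled here; a boolean is used only to keep the sketch short.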

Related Libraries

  • PyPES: a general-purpose library containing an evaluation environment for RTE and the McPIET text inference engine, which is based on the ERG (English Resource Grammar)

Text Normalizers

Java number normalizer (Beta): a tool for converting textual representations of numbers to a standard numerical string.
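
To make the idea of number normalization concrete, here is a small Python sketch of one possible approach. It is not the Java tool listed above and makes no claim about how that tool works; the function name `words_to_number`, the lookup tables, and the choice to cover only English cardinals up to the millions are all assumptions made for illustration.

```python
import re

# Lookup tables for English cardinal number words (illustrative, not exhaustive).
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_number(phrase):
    """Convert an English number phrase to a digit string, e.g. for matching
    'twelve' in a text against '12' in a hypothesis."""
    tokens = re.findall(r"[a-z]+", phrase.lower())  # also splits "fifty-three"
    total = 0    # value of completed thousand/million groups
    current = 0  # value of the group being read
    for token in tokens:
        if token == "and":
            continue
        if token in UNITS:
            current += UNITS[token]
        elif token in TENS:
            current += TENS[token]
        elif token == "hundred":
            current = max(current, 1) * 100
        elif token in SCALES:
            total += max(current, 1) * SCALES[token]
            current = 0
        else:
            raise ValueError(f"not a number word: {token!r}")
    return str(total + current)


if __name__ == "__main__":
    print(words_to_number("two hundred and fifty-three"))
    print(words_to_number("three thousand twelve"))
```

Normalizing both the text and the hypothesis to the same digit string lets a simple string or token comparison treat "twelve" and "12" as the same value.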

References

Links