Difference between revisions of "RTE5 - Ablation Tests"

Latest revision as of 04:18, 25 June 2012

The following table lists the results of the ablation tests submitted by participants. Ablation tests were introduced as a mandatory track in the RTE5 campaign.

The first column contains the specific resources which have been ablated.
The second column lists the Team Run in the form [name_of_the_Team][number_of_the_submitted_run].[submission_task] (e.g. BIU1.2way, Boeing3.3way).
The third and fourth columns present the normalized difference between the accuracy of the complete system run and the accuracy of the ablation run (i.e. the output of the complete system without the ablated resource), showing the impact of the resource on the performance of the system: a positive value means the complete system scored higher than the ablated one, a negative value that it scored lower. The third column refers to the score obtained in the 2-way task, the fourth to that obtained in the 3-way task. For all runs submitted as 3-way, the derived 2-way accuracy has also been calculated.
Finally, the fifth column contains a brief description of the specific usage of the resource. It is based on the information provided both in the "readme" files submitted together with the ablation tests and in the system reports published in the RTE5 proceedings.
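
As a rough illustration of the second, third and fourth columns, the following Python sketch splits a Team Run identifier into its parts and computes a resource impact value. The function names are hypothetical, the single-digit run number is an assumption made so that identifiers such as UAIC20091.3way split sensibly, and the exact normalization used for the table is not spelled out on this page, so the plain difference in accuracy percentages is only one plausible reading.

```python
import re
from typing import NamedTuple


class TeamRun(NamedTuple):
    team: str
    run: int
    task: str


# Assumption: the run number is a single trailing digit, so that
# "UAIC20091.3way" splits into team "UAIC2009" and run 1.
_RUN_ID = re.compile(r"^(?P<team>.+?)(?P<run>\d)\.(?P<task>[23]way)$")


def parse_team_run(run_id: str) -> TeamRun:
    """Split a Team Run identifier into team name, run number and task."""
    match = _RUN_ID.match(run_id)
    if match is None:
        raise ValueError(f"unrecognised Team Run identifier: {run_id!r}")
    return TeamRun(match["team"], int(match["run"]), match["task"])


def resource_impact(full_accuracy: float, ablated_accuracy: float) -> float:
    """Accuracy of the complete run minus accuracy of the ablation run.

    Assumption: both accuracies are percentages, so the result is in
    percentage points. A positive value means the resource helped.
    """
    return round(full_accuracy - ablated_accuracy, 2)


print(parse_team_run("Boeing3.3way"))
print(resource_impact(60.0, 58.5))  # made-up accuracies, not taken from the table
```

The accuracies passed to `resource_impact` here are invented for the example; the table itself reports only the resulting differences.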

Participants are kindly invited to check that all the information entered is correct and complete.

Ablated Resource	Team Run^[1]	Resource impact - 2way	Resource impact - 3way	Resource Usage Description
Acronym guide	Siel_093.3way	0	0	The acronyms are expanded using the acronym database, so the acronyms are also matched with the expanded acronyms, and entailment is predicted accordingly
Acronym guide + UAIC_Acronym_rules	UAIC20091.3way	0.17	0.16	We start from the acronym guide, but additionally use a rule that treats XYZ as the acronym of any expression of the form Xaaaa Ybbbb Zcccc, regardless of the length of the text in this form.
DIRT	BIU1.2way	1.33	—	Inference rules
DIRT	Boeing3.3way	-1.17	0	Verb paraphrases
DIRT	UAIC20091.3way	0.17	0.33	We transform text and hypothesis with MINIPAR into dependency trees: use of DIRT relations to map verbs in T with verbs in H
Framenet+ WordNet	DLSIUAES1.2way	1.16	—	Frame-to-frame similarity metric
Framenet+ WordNet	DLSIUAES1.3way	-0.17	-0.17	Frame-to-frame similarity metric
Framenet	UB.dmirg3.2way	0	—	If two lexical items are covered in a single FrameNet frame, then the two items are treated as semantically related.
Grady Ward’s MOBY Thesaurus + Roget's Thesaurus	Venses2.2way	2.83	—	Semantic fields are used for semantic similarity matching in all cases of non-identical lemmas
MontyLingua Tool	Siel_093.3way	0	0	For the VerbOcean, the verbs have to be in the base form. We used the "MontyLingua" tool to convert the verbs into their base form
NEGATION_rules by UAIC	UAIC20091.3way	0	-1.34	Negation rules check the descending branches of verbs in the dependency trees to see whether certain categories of words that change the meaning are found.
NER (RASP Parser nertag)	JU_CSE_TAC1.2way	0	—	Named Entity match: measure based on the number of NEs in the hypothesis that match in the corresponding text. For named entity recognition, the RASP Parser (Briscoe et al., 2006) nertag component has been used.
NE component	UI_ccg1.2way	4.83	—	Named Entity recognition/comparison
PropBank	cswhu1.3way	2	3.17	syntactic and semantic parsing
Stanford NER	QUANTA1.2way	0.67	—	We use Named Entity similarity as a feature
Stopword list	FBKirst1.2way	1.5	—	A list of the 572 most frequent English words has been collected in order to prevent assigning high costs to the deletion/insertion of terms that are unlikely to bring relevant information to detect entailment, and to avoid substituting these terms with any content word.
Training data from RTE1, 2, 3	PeMoZa3.2way	0	—
Training data from RTE2	PeMoZa3.2way	0.66	—
Training data from RTE2, 3	PeMoZa3.2way	0	—
VerbOcean	DFKI1.3way	0	0.17	VerbOcean relations are used to calculate relatedness between verbs in T and H
VerbOcean	DFKI2.3way	0.33	0.5	VerbOcean relations are used to calculate relatedness between verbs in T and H
VerbOcean	DFKI3.3way	0.17	0.17	VerbOcean relations are used to calculate relatedness between verbs in T and H
VerbOcean	FBKirst1.2way	-0.16	—	Extraction of 18232 entailment rules for all the English verbs connected by the "stronger-than" relation. For instance, if "kill [stronger-than] injure", then the rule "kill ENTAILS injure" is added to the rules repository.
VerbOcean	QUANTA1.2way	0	—	We use "opposite-of" relation in VerbOcean as a feature
VerbOcean	Siel_093.3way	0	0	Similarity/antonymy/unrelatedness between verbs
WikiPedia	BIU1.2way	-1	—	Lexical rules extracted from Wikipedia definition sentences, title parenthesis, redirect and hyperlink relations
WikiPedia	cswhu1.3way	1.33	3.34	Lexical semantic rules
WikiPedia	FBKirst1.2way	1	—	Rules extracted from WP using Latent Semantic Analysis (LSA)
WikiPedia	UAIC20091.3way	1.17	1.5	Relations between named entities
Wikipedia + NER's (LingPipe, GATE) + Perl patterns	UAIC20091.3way	6.17	5	NE module: NERs, used to identify Persons, Locations, Jobs, Languages, etc.; Perl patterns built by us for RTE4 to identify numbers and dates; our own resources extracted from Wikipedia to compute a "distance" between a named entity in the hypothesis and the named entities in the text
WordNet	AUEBNLP1.3way	-2	-2.67	Synonyms
WordNet	BIU1.2way	2.5	—	Synonyms, hyponyms (2 levels away from the original term), hyponym_instance and derivations
WordNet	Boeing3.3way	4	5.67	WordNet synonym and hypernym relationships between (senses of) words, together with "similar" (SIM), "pertains" (PER), and "derivational" (DER) links, used to recognize equivalence between T and H
WordNet	DFKI1.3way	-0.17	0	Argument alignment between T and H
WordNet	DFKI2.3way	0.16	0.34	Argument alignment between T and H
WordNet	DFKI3.3way	0.17	0.17	Argument alignment between T and H
WordNet	DLSIUAES1.2way	0.83	—	Similarity between lemmata, computed by WordNet-based metrics
WordNet	DLSIUAES1.3way	-0.5	-0.33	Similarity between lemmata, computed by WordNet-based metrics
WordNet	JU_CSE_TAC1.2way	0.34	—	WordNet based Unigram match: if any synset for the H unigram matches with any synset of a word in T then the hypothesis unigram is considered as a WordNet based unigram match.
WordNet	PeMoZa1.2way	-0.5	—	Derivational Morphology from WordNet
WordNet	PeMoZa1.2way	1.33	—	Verb Entailment from Wordnet
WordNet	PeMoZa2.2way	1	—	Derivational Morphology from WordNet
WordNet	PeMoZa2.2way	-0.33	—	Verb Entailment from Wordnet
WordNet	QUANTA1.2way	-0.17	—	We use several relations from WordNet, such as synonyms, hyponyms and hypernyms, among others.
WordNet	Rhodes1.3way	3.17	4	Lexicon based match: we chose a very simple metric: matching between words in T and H based on a path of distance at most 2 in the WordNet graph, using any links (hyponymy, hypernymy, meronymy, pertainymy, etc.)
WordNet	Sagan1.3way	0	-0.83	The system is based on a machine learning approach. The ablation test was obtained with two fewer features using WordNet (namely, string similarity based on Levenshtein distance and semantic similarity) in the training and testing steps.
WordNet	Siel_093.3way	0.34	-0.17	Similarity between nouns using WN tool
WordNet	ssl1.3way	0	0.67	WordNet Analysis
WordNet	UB.dmirg3.2way	0	—	Synonyms, hypernyms (2 levels away from the original term)
WordNet	UI_ccg1.2way	4	—	Word similarity == identity
WordNet + FrameNet	UB.dmirg3.2way	0	—	WN: synonyms, hypernyms (2 levels away from the original term). FN: if two lexical items are covered in a single FrameNet frame, then the two items are treated as semantically related.
WordNet + VerbOcean	DFKI1.3way	0	0.17	VerbOcean is used to calculate relatedness between nominal predicates in T and H, after using WordNet to change the nouns into verbs.
WordNet + VerbOcean	DFKI2.3way	0.5	0.67	VerbOcean is used to calculate relatedness between nominal predicates in T and H, after using WordNet to change the nouns into verbs.
WordNet + VerbOcean	DFKI3.3way	0.17	0.17	VerbOcean is used to calculate relatedness between nominal predicates in T and H, after using WordNet to change the nouns into verbs.
WordNet + VerbOcean	UAIC20091.3way	2	1.50	Contradiction identification
WordNet + VerbOcean + DLSIUAES_negation_list	DLSIUAES1.2way	0.66	—	Antonym relations between verbs (VO+WN); polarity based on negation terms (short list constructed by the participants themselves)
WordNet + VerbOcean + DLSIUAES_negation_list	DLSIUAES1.3way	-1	-0.5	Antonym relations between verbs (VO+WN); polarity based on negation terms (short list constructed by the participants themselves)
WordNet + XWordNet	UAIC20091.3way	1	1.33	Synonymy, hyponymy and hypernymy and eXtended WordNet relation
System component	DirRelCond3.2way	4.67	—	The ablation test (abl-1) was meant to test one component of the most complex condition for entailment used in step 3 of the system
System component	DirRelCond3.2way	-1.5	—	The ablation test (abl-2) was meant to test one component of the most complex condition for entailment used in step 3 of the system
System component	DirRelCond3.2way	0.17	—	The ablation test (abl-3) was meant to test one component of the most complex condition for entailment used in step 3 of the system
System component	DirRelCond3.2way	-1.16	—	The ablation test (abl-4) was meant to test one component of the most complex condition for entailment used in step 3 of the system
System component	DirRelCond3.2way	4.17	—	The ablation test (abl-5) was meant to test one component of the most complex condition for entailment used in step 3 of the system
Other	UAIC20091.3way	4.17	4	Pre-processing module, using MINIPAR, TreeTagger tool and some transformations, e.g. hasn't > has not
Other	DLSIUAES1.2way	1	—	Everything ablated except lexical-based metrics
Other	DLSIUAES1.2way	3.33	—	Everything ablated except semantic-derived inferences
Other	DLSIUAES1.3way	-0.17	-0.33	Everything ablated except lexical-based metrics
Other	DLSIUAES1.3way	2.33	3.17	Everything ablated except semantic-derived inferences
Other	FBKirst1.2way	2.84	—	The automatic estimation of operation costs from run-1 modules was removed: the costs were assigned manually instead.
Other	JU_CSE_TAC1.2way	0	—	Skip bigram match
Other	JU_CSE_TAC1.2way	0	—	Bigram match
Other	JU_CSE_TAC1.2way	-0.5	—	Longest Common Subsequence
Stemmer	JU_CSE_TAC1.2way	-0.5	—	Stemming, using WordNet stemmer
Other	PeMoZa1.2way	-2.5	—	Idf score
Other	PeMoZa1.2way	-0.66	—	Proper Noun Levenshtein Distance
Other	PeMoZa1.2way	0.34	—	J&C (Jiang and Conrath, 1997) similarity score on nouns, adjectives
Other	PeMoZa2.2way	1	—	Idf score
Other	PeMoZa2.2way	0.17	—	Proper Noun Levenshtein Distance
Other	PeMoZa2.2way	0.5	—	J&C (Jiang and Conrath, 1997) similarity score on nouns, adjectives
Other	Rhodes1.3way	-0.17	-0.17	Acronym match: we match words in all caps against sequences of capitalized words whose initial characters concatenate to form the acronym
Other	Rhodes1.3way	3.33	1.83	Proper nouns match: exact string match between T and H, for proper nouns
Other	Rhodes1.3way	0.33	0.17	Numbers match: exact string match between T and H, for numbers
Other	Rhodes1.3way	3.17	4	Edit-distance-based matching: two words match if 80% of the letters of an H word occur in one or more adjacent T words in the same order
Other	UI_ccg1.2way	1	—	Less sophisticated NE similarity metric: mainly Jaro-Winkler-based
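
To make the table layout concrete, here is a minimal Python sketch that reads a few tab-separated rows copied from the table above and averages the 2-way impact per ablated resource. The helper names are hypothetical, and averaging across teams is only an illustration of how the columns can be read, not an analysis carried out by the RTE5 organizers. The dash placeholder is treated as a missing value, and rows without a description column are tolerated.

```python
from statistics import mean
from typing import Dict, List, Optional

# Four rows copied from the table above (tab-separated, as on the page).
ROWS = (
    "DIRT\tBIU1.2way\t1.33\t—\tInference rules\n"
    "DIRT\tBoeing3.3way\t-1.17\t0\tVerb paraphrases\n"
    "WikiPedia\tcswhu1.3way\t1.33\t3.34\tLexical semantic rules\n"
    "WikiPedia\tUAIC20091.3way\t1.17\t1.5\tRelations between named entities\n"
)


def parse_score(cell: str) -> Optional[float]:
    """Return the numeric impact, or None for the dash placeholder."""
    cell = cell.strip()
    return None if cell in {"—", ""} else float(cell)


def mean_impact_2way(rows: str) -> Dict[str, float]:
    """Average the 2-way impact (third column) per ablated resource."""
    by_resource: Dict[str, List[float]] = {}
    for line in rows.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip lines that are not table rows
        score = parse_score(fields[2])
        if score is not None:
            by_resource.setdefault(fields[0], []).append(score)
    return {name: round(mean(scores), 2) for name, scores in by_resource.items()}


print(mean_impact_2way(ROWS))
```

Because the systems differ widely, such averages say little about a resource in isolation; the per-run values in the table remain the primary data.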

Footnotes

[1] For further information about participants, see: RTE Challenges - Data about participants

   Return to RTE Knowledge Resources

@@ Line 1: / Line 1: @@
+The following table lists the results of the ablation tests submitted by participants, which have been introduced as a mandatory track in the RTE5 campaign.<br><br>
+The first column contains the specific resources which have been ablated.<br>The second column lists the Team Run in the form ''[name_of_the_Team][number_of_the_submitted_run].[submission_task]'' (e.g. BIU1.2way, Boeing3.3way).<br>The third and fourth columns present the normalized difference between the accuracy of the complete system run and the accuracy of the ablation run (i.e. the output of the complete system without the ablated resource), showing the impact of the resource on the performance of the system. The third refers to the score obtained in the 2-way task, the fourth to that obtained in the 3-way task. For all the runs submitted as 3-way, also the 2-way derived accuracy has been calculated.<br>
+Finally, the fifth column contains a brief description of the specific usage of the resource. It is based on the information provided both in the "readme" files submitted together with the ablation tests and in the system reports published in the RTE5 proceedings.<br><br>
+Participants are kindly invited to check if all the inserted information is correct and complete.
 {|class="wikitable sortable" cellpadding="3" cellspacing="0" border="1"
 |- bgcolor="#CDCDCD"
 ! Ablated Resource
-! Team Run
+! Team Run<ref>For further information about participants, click here: [[RTE Challenges - Data about participants]]</ref>
-! <small>&Delta; Accuracy % - 2way</small>
+! <small>Resource impact - 2way</small>
-! <small>&Delta; Accuracy % - 3way</small>
+! <small>Resource impact - 3way</small>
 ! Resource Usage Description
@@ Line 13: / Line 20: @@
 | style="text-align: center;"| 0
 | style="text-align: center;"| 0
-| Acronym Resolution
+| The acronyms are expanded using the acronym database, so the acronyms are also matched with the expanded acronyms, and entailment is predicted accordingly
 |- bgcolor="#ECECEC" "align="left"
@@ Line 34: / Line 41: @@
 | style="text-align: center;"| -1.17
 | style="text-align: center;"| 0
-|
+| Verb paraphrases
 |- bgcolor="#ECECEC" "align="left"
@@ Line 44: / Line 51: @@
 |- bgcolor="#ECECEC" "align="left"
-| Framenet
+| Framenet+ <br/>WordNet
 | DLSIUAES1.2way
 | style="text-align: center;"| 1.16
 | style="text-align: center;"| &mdash;
-| frame-to-frame similarity metric
+| Frame-to-frame similarity metric
 |- bgcolor="#ECECEC" "align="left"
-| Framenet
+| Framenet+ <br/>WordNet
 | DLSIUAES1.3way
 | style="text-align: center;"| -0.17
 | style="text-align: center;"| -0.17
-| frame-to-frame similarity metric
+| Frame-to-frame similarity metric
 |- bgcolor="#ECECEC" "align="left"
@@ Line 62: / Line 69: @@
 | style="text-align: center;"| 0
 | style="text-align: center;"| &mdash;
-|
+| If two lexical items are covered in a single FrameNet frame, then the two items are treated as semantically related.
 |- bgcolor="#ECECEC" "align="left"
 | Grady Ward’s MOBY Thesaurus + <br>Roget's Thesaurus
-| VensesTeam2.2way
+| Venses2.2way
 | style="text-align: center;"| 2.83
 | style="text-align: center;"| &mdash;
@@ Line 86: / Line 93: @@
 |- bgcolor="#ECECEC" "align="left"
-| NER
+| NER (RASP Parser nertag)
+| JU_CSE_TAC1.2way
+| style="text-align: center;"| 0
+| style="text-align: center;"| &mdash;
+| Named Entity match: measure based on the number of Nes in the hypothesis that match in the corresponding text. For named entity recognition, the RASP Parser (Briscoe et al., 2006) nertag  component has been used.
+|- bgcolor="#ECECEC" "align="left"
+| NE component
 | UI_ccg1.2way
 | style="text-align: center;"| 4.83
@@ Line 111: / Line 125: @@
 | style="text-align: center;"| 1.5
 | style="text-align: center;"| &mdash;
-|
+| A list of the 572 most frequent English words has been collected in order to prevent assigning high costs to the deletion/insertion of terms that are unlikely to bring relevant information to detect entailment,and to avoid substituting these terms with any content word.
 |- bgcolor="#ECECEC" "align="left"
@@ Line 161: / Line 175: @@
 | style="text-align: center;"| -0.16
 | style="text-align: center;"| &mdash;
-| Rules extracted from VerbOcean
+| Extraction of 18232 entailment rules for all the English verbs connected by the ”stronger-than” relation. For instance, if ”kill [stronger-than] injure”, then the rule ”kill ENTAILS injure” is added to the rules repository.
 |- bgcolor="#ECECEC" "align="left"
@@ Line 231: / Line 245: @@
 | style="text-align: center;"| 4
 | style="text-align: center;"| 5.67
-|
+| Wordnet synonyms, hypernyms relationships between (senses of) words, "similar" (SIM), "pertains" (PER), and "derivational" (DER) links to recognize equivalence between T and H
 |- bgcolor="#ECECEC" "align="left"
@@ Line 238: / Line 252: @@
 | style="text-align: center;"| -0.17
 | style="text-align: center;"| 0
-|
+| Argument alignment between T and H
 |- bgcolor="#ECECEC" "align="left"
@@ Line 245: / Line 259: @@
 | style="text-align: center;"| 0.16
 | style="text-align: center;"| 0.34
-|
+| Argument alignment between T and H
 |- bgcolor="#ECECEC" "align="left"
@@ Line 252: / Line 266: @@
 | style="text-align: center;"| 0.17
 | style="text-align: center;"| 0.17
-|
+| Argument alignment between T and H
 |- bgcolor="#ECECEC" "align="left"
@@ Line 273: / Line 287: @@
 | style="text-align: center;"| 0.34
 | style="text-align: center;"| &mdash;
-| WordNet based Unigram match
+| WordNet based Unigram match: if any synset for the H unigram matches with any synset of a word in T then the hypothesis unigram is considered as a WordNet based unigram match.
 |- bgcolor="#ECECEC" "align="left"
@@ Line 309: / Line 323: @@
 | style="text-align: center;"| &mdash;
 | We use several relations from wordnet, such as synonyms, hyponym, hypernym et al.
+|- bgcolor="#ECECEC" "align="left"
+| WordNet
+| Rhodes1.3way
+| style="text-align: center;"| 3.17
+| style="text-align: center;"| 4
+| Lexicon based match: we chose a very simple metric: matching between words in T and H based on a path of distance at most 2 in the WordNet graph, using any links (hyponymy, hypernymy, meronymy, pertainymy, etc.)
 |- bgcolor="#ECECEC" "align="left"
@@ Line 315: / Line 336: @@
 | style="text-align: center;"| 0
 | style="text-align: center;"| -0.83
-| The system is based on machine learning approach. The ablation test was obtained with 2 less features using WordNet in the training and testing steps.
+| The system is based on machine learning approach. The ablation test was obtained with 2 less features using WordNet (namely, string similarity based on Levenshtein distance and semantic similarity) in the training and testing steps.
@@ Line 337: / Line 358: @@
 | style="text-align: center;"| 0
 | style="text-align: center;"| &mdash;
-|
+| Synonyms, hypernyms (2 levels away from the original term)
 |- bgcolor="#ECECEC" "align="left"
@@ Line 344: / Line 365: @@
 | style="text-align: center;"| 4
 | style="text-align: center;"| &mdash;
-| word similarity == identity
+| Word similarity == identity
 |- bgcolor="#ECECEC" "align="left"
@@ Line 351: / Line 372: @@
 | style="text-align: center;"| 0
 | style="text-align: center;"| &mdash;
-|
+| WN: synonyms, hypernyms (2 levels away from the original term). FN: if two lexical items are covered in a single FrameNet frame, then the two items are treated as semantically related.
 |- bgcolor="#ECECEC" "align="left"
@@ Line 358: / Line 379: @@
 | style="text-align: center;"| 0
 | style="text-align: center;"| 0.17
-| VerbOcean is used to calculate relatedness between nominal predicates in T and H, after using WordNet to change the verbal nouns into verbs.
+| VerbOcean is used to calculate relatedness between nominal predicates in T and H, after using WordNet to change the nouns into verbs.
 |- bgcolor="#ECECEC" "align="left"
@@ Line 365: / Line 386: @@
 | style="text-align: center;"| 0.5
 | style="text-align: center;"| 0.67
-| Used to calculate relatedness between nominal predicates in T and H
+| VerbOcean is used to calculate relatedness between nominal predicates in T and H, after using WordNet to change the nouns into verbs.
 |- bgcolor="#ECECEC" "align="left"
@@ Line 372: / Line 393: @@
 | style="text-align: center;"| 0.17
 | style="text-align: center;"| 0.17
-| Used to calculate relatedness between nominal predicates in T and H
+| VerbOcean is used to calculate relatedness between nominal predicates in T and H, after using WordNet to change the nouns into verbs.
 |- bgcolor="#ECECEC" "align="left"
@@ Line 386: / Line 407: @@
 | style="text-align: center;"| 0.66
 | style="text-align: center;"| &mdash;
-| Antonym relations between verbs (VO+WN); polarity based on negation terms (short list constructed by ourselves)
+| Antonym relations between verbs (VO+WN); polarity based on negation terms (short list constructed by participant themselves)
 |- bgcolor="#ECECEC" "align="left"
@@ Line 393: / Line 414: @@
 | style="text-align: center;"| -1
 | style="text-align: center;"| -0.5
-| Antonym relations between verbs (VO+WN); polarity based on negation terms (short list constructed by ourselves)
+| Antonym relations between verbs (VO+WN); polarity based on negation terms (short list constructed by participant themselves)
 |- bgcolor="#ECECEC" "align="left"
@@ Line 438: / Line 459: @@
 |- bgcolor="#ECECEC" "align="left"
-| System component
+| Other
 | UAIC20091.3way
 | style="text-align: center;"| 4.17
 | style="text-align: center;"| 4
-| Pre-processing module
+| Pre-processing module, using MINIPAR, TreeTagger tool and some transformations, e.g. ''hasn't'' > ''has not''
 |- bgcolor="#ECECEC" "align="left"
@@ Line 484: / Line 505: @@
 | style="text-align: center;"| 0
 | style="text-align: center;"| &mdash;
-| Named Entity match
+| Skip bigram match
-|- bgcolor="#ECECEC" "align="left"
-| Other
-| JU_CSE_TAC1.2way
-| style="text-align: center;"| 0
-| style="text-align: center;"| &mdash;
-| Skip bigram
 |- bgcolor="#ECECEC" "align="left"
@@ Line 508: / Line 522: @@
 |- bgcolor="#ECECEC" "align="left"
-| Other
+| Stemmer
 | JU_CSE_TAC1.2way
 | style="text-align: center;"| -0.5
 | style="text-align: center;"| &mdash;
-| Unigram match after stemming
+| Stemming, using WordNet stemmer
 |- bgcolor="#ECECEC" "align="left"
@@ Line 555: / Line 569: @@
 | style="text-align: center;"| &mdash;
 | J&C (Jiang and Conrath, 1997) similarity score on nouns, adjectives
+|- bgcolor="#ECECEC" "align="left"
+| Other
+| Rhodes1.3way
+| style="text-align: center;"| -0.17
+| style="text-align: center;"| -0.17
+| Acronym match: we match words in all caps against sequences of capitalized words whose initial characters  concatenate to form the acronym
+|- bgcolor="#ECECEC" "align="left"
+| Other
+| Rhodes1.3way
+| style="text-align: center;"| 3.33
+| style="text-align: center;"| 1.83
+| Proper nouns match: exact string match between T and H, for proper nouns
+|- bgcolor="#ECECEC" "align="left"
+| Other
+| Rhodes1.3way
+| style="text-align: center;"| 0.33
+| style="text-align: center;"| 0.17
+| Numbers match: exact string match between T and H, for numbers
+|- bgcolor="#ECECEC" "align="left"
+| Other
+| Rhodes1.3way
+| style="text-align: center;"| 3.17
+| style="text-align: center;"| 4
+| Edit-distance-based matching: 2 words match if 80% of the letters of a H word occur in one or more adjacent T words in the same order
 |- bgcolor="#ECECEC" "align="left"
@@ Line 564: / Line 606: @@
 |}
+<br>
+==Footnotes==
+<references />
+    Return to [[RTE Knowledge Resources]]