RTE6 - Ablation Tests

From ACL Wiki
Latest revision as of 04:18, 25 June 2012

The following table lists the results of the ablation tests submitted by participants to RTE6.
The exploratory effort, started in RTE5 as a mandatory track, was not only repeated in RTE6 but also extended to tools. The first column contains the specific resources that were ablated.
The second column lists the Team Run in the form [name_of_the_Team][number_of_the_submitted_run].[submission_task] (e.g. BIU1.2way, Boeing3.3way).
The third column presents the normalized difference between the accuracy of the complete system run and the accuracy of the ablation run (i.e. the output of the complete system without the ablated resource), showing the impact of the resource on the performance of the system.
The fourth column contains a brief description of the specific usage of the resource. It is based on the information provided both in the "readme" files submitted together with the ablation tests and in the system reports published in the RTE6 proceedings.
If the ablated resource is highlighted in yellow, it is a tool; otherwise it is a knowledge resource.
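
As a rough illustration of how the third column can be read, the sketch below computes the difference between a complete-system score and an ablation-run score. The function name `resource_impact` is hypothetical, and because the page does not spell out the normalization that was applied, the sketch assumes a plain difference in percentage points. The inputs are the Dev-set figures quoted in the BIU rows of the table, not the test-set scores behind the reported impact values.

```python
def resource_impact(full_score: float, ablated_score: float) -> float:
    """Return the score change attributable to a resource (assumed: plain difference).

    Positive: the complete system scores higher, so the resource helps.
    Negative: the system scores higher without the resource.
    """
    return full_score - ablated_score


# Dev-set scores (percent) quoted in the BIU rows of the table.
wordnet_impact = resource_impact(40.73, 39.18)  # complete run vs. run without WordNet
coref_impact = resource_impact(40.73, 41.62)    # complete run vs. run without coreference

print(round(wordnet_impact, 2))
print(round(coref_impact, 2))
```

A negative value, as in the coreference case, means the system performed better with the component removed.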

Participants are kindly invited to check if all the inserted information is correct and complete.


| Ablated Component | Ablation Run [1] | Resource impact - F1 | Resource Usage Description |
|---|---|---|---|
| WordNet | BIU1_abl-1 | 0.9 | No WordNet. On Dev set: 39.18% (compared to 40.73% when WN is used). |
| CatVar | BIU1_abl-2 | 0.63 | No CatVar. On Dev set achieved about 40.20% (compared to 40.73% when CatVar is used). |
| Coreference resolver | BIU1_abl-3 | -0.88 | No coreference resolver. On Dev set 41.62% (compared to 40.73% when the coreference resolver is used). This is an unusual ablation test, since it shows that the coreference resolution component has a negative impact. |
| DIRT | Boeing1_abl-1 | 3.97 | DIRT removed. |
| WordNet | Boeing1_abl-2 | 4.42 | No WordNet. |
| Name Normalization | budapestcad2_abl-2 | 0.65 | No name normalization was performed (e.g. George W. Bush -> Bush). |
| Named Entities Recognition | budapestcad2_abl-3 | -1.23 | No NER. |
| WordNet | budapestcad2_abl-4 | -1.11 | No WordNet. (In the original run, WordNet was used to find the synonyms of words in the triplets, and additional triplets were generated from all possible combinations.) |
| WordNet | deb_iitb1_abl-1 | 8.68 | WordNet is ablated in this test. No change of code required; only the WordNet module is removed while matching. |
| VerbOcean | deb_iitb1_abl-2 | 1.87 | VerbOcean is ablated in this test. No change of code required; only the VerbOcean module is removed while matching. |
| WordNet | deb_iitb2_abl-1 | 7.9 | WordNet is ablated in this test. No change of code required; only the WordNet module is removed while matching. |
| VerbOcean | deb_iitb2_abl-2 | 0.94 | VerbOcean is ablated in this test. No change of code required; only the VerbOcean module is removed while matching. |
| WordNet | deb_iitb3_abl-1 | 11.43 | WordNet is ablated in this test. No change of code required; only the WordNet module is removed while matching. |
| VerbOcean | deb_iitb3_abl-2 | 2.54 | VerbOcean is ablated in this test. No change of code required; only the VerbOcean module is removed while matching. |
| POS-Tagger | DFKI1_abl-4 | 4.99 | No wordform/POS-tags included for the comparison. |
| POS-Tagger | DFKI1_abl-6 | 2.22 | No named entity recognition for the comparison. |
| WordNet | DFKI1_abl-7 | -0.23 | No WordNet similarity for the comparison. |
| Coreference resolver | DFKI1_Main | -1.54 | Coreference resolution used for the comparison. |
| WordNet | DirRelCond23_abl-1 | 8.43 | WordNet removed. Only basic word comparison used instead of word relations. |
| Wikipedia | FBK_irst3_Main | -23.91 | This run is produced by the system configuration for run3 and uses rules extracted from Wikipedia. |
| Wikipedia | FBK_irst3_Main | -3.58 | This run is produced by the system configuration for run3 and uses rules extracted from Wikipedia with probability above 0.7. |
| Proximity similarity dictionary of Dekang Lin | FBK_irst3_Main | -7.79 | This run is produced by the system configuration for run3 and uses rules extracted from the proximity similarity dictionary of Dekang Lin. |
| WordNet | FBK_irst3_Main | -3.21 | This run is produced by the system configuration for run3 and uses rules extracted from WordNet. |
| WordNet | FBK_irst3_Main | -2.08 | This run is produced by the system configuration for run3 and uses rules extracted from WordNet with probability above 0.7. |
| VerbOcean | FBK_irst3_Main | -4 | This run is produced by the system configuration for run3 and uses rules extracted from VerbOcean. |
| Dependency similarity dictionary of Dekang Lin | FBK_irst3_Main | -13.56 | This run is produced by the system configuration for run3 and uses rules extracted from the dependency similarity dictionary of Dekang Lin. |
| Dictionary of Named Entities Acronyms and Synonyms | IKOMA2_abl-3 | -0.76 | Remove synonym dictionaries: an acronym dictionary constructed automatically from the corpus and a synonym dictionary that contains geographical terms. |
| WordNet | JU_CSE_TAC1_abl-1 | 13.29 | Run-1 is based on the composition of lexical-based RTE methods and a syntactic RTE method. The lexical-based RTE methods are: WordNet-based unigram match, bigram match, longest common sub-sequence, skip-gram and stemming. Here we ablated the WordNet-based unigram match only. |
| WordNet | JU_CSE_TAC2_abl-1 | 10.19 | Run-2 is based on the composition of lexical-based RTE methods, a syntactic RTE method, and Chunk and Named Entity methods. The lexical-based RTE methods are: WordNet-based unigram match, bigram match, longest common sub-sequence, skip-gram and stemming. Here we ablated the WordNet-based unigram match only. |
| WordNet | JU_CSE_TAC3_abl-1 | 3.86 | Run-3 is based on a Support Vector Machine that uses twenty-five features for lexical similarity, the output tag from a rule-based syntactic two-way TE system as a feature, and output from a rule-based Chunk Module and Named Entity Module. The important lexical features used in the present system are: WordNet-based unigram match, bigram match, longest common sub-sequence, skip-gram, stemming and lexical distance (17 features). Here we ablated the WordNet-based unigram match only. |
| LingPipe co-reference | PKUTM2_abl-1 | 0.17 | LingPipe co-reference is removed; the experiment was based on named-entity, WordNet, VerbOcean. |
| VerbOcean | PKUTM2_abl-2 | 1.02 | VerbOcean is removed; the experiment was based on named-entity, WordNet, co-reference. |
| LingPipe Named Entities | PKUTM2_abl-3 | 13.84 | LingPipe named-entity is removed; the experiment was based on WordNet, co-reference, VerbOcean. |
| WordNet | saicnlp1_abl-1 | -0.02 | Ablation run, with WordNet stubbed. |
| Shalmaneser Parser | Sangyan1_abl-1 | -1.76 | This ablation run was executed with the Shalmaneser parser ablated from our system. Our system uses the Shalmaneser Parser for FrameNet frame assignment. Our system required minor tweaking of the source code in order to ablate the Shalmaneser Parser. |
| WordNet | Sangyan1_abl-2 | -0.39 | This ablation run was executed with WordNet ablated from our system. Our system uses WordNet for word matching. Our system required minor tweaking of the source code in order to ablate WordNet. |
| VerbOcean | Sangyan1_abl-3 | 0.07 | This ablation run was executed with VerbOcean ablated from our system. Our system uses the VerbOcean resource for detection of antonyms. Our system required minor tweaking of the source code in order to ablate the VerbOcean resource. |
| Named Entities Recognition | SINAI1_abl-1 | 7.62 | Only PPVs on WN 1.7 |
| WordNet | SINAI1_abl-2 | 15.87 | Only NEs on WN 1.7 |
| Named Entities Recognition | SINAI2_abl-1 | 20.25 | Only PPVs on WN 3.0 |
| WordNet | SINAI2_abl-2 | 18.28 | Only NEs on WN 3.0 |
| WordNet | SJTU_CIT3_abl-1 | 0.03 | The resource that has been ablated is WordNet. |
| Wikipedia | SJTU_CIT3_abl-2 | 4.7 | The resource that has been ablated is Wikipedia. |
| VerbOcean | SJTU_CIT3_abl-3 | -1.15 | The resource that has been ablated is VerbOcean. |
| Wikipedia | UAIC20101_abl-1 | 1.93 | Run 1 with no BK. |
| DIRT | UAIC20101_abl-2 | -1.56 | Run 1 without DIRT. |
| Wikipedia | UAIC20102_abl-1 | 1.08 | Run 2 with no BK. |
| DIRT | UAIC20102_abl-2 | -0.72 | Run 2 without DIRT. |
| Wikipedia | UAIC20103_abl-1 | 1.3 | Run 3 with no BK. |
| DIRT | UAIC20103_abl-2 | -0.98 | Run 3 without DIRT. |
| WordNet | UB.dmirg1_abl-1 | 1.7 | WordNet has been removed. No system changes required. |
| FrameNet | UB.dmirg1_abl-2 | -0.1 | FrameNet has been removed. No system changes required. |
| WordNet | UB.dmirg2_abl-1 | 0.6 | WordNet has been removed. No system changes required. |
| FrameNet | UB.dmirg2_abl-2 | -1.84 | FrameNet has been removed. No system changes required. |
| WordNet | UB.dmirg3_abl-1 | -0.78 | WordNet has been removed. No system changes required. |
| FrameNet | UB.dmirg3_abl-2 | -1.81 | FrameNet has been removed. No system changes required. |
| WordNet | UIUC1_abl-1 | -3.07 | Ablation 1 - Removed WordNet (no locations and no synset expansion for non-entity terms). Threshold tuned on development collection to achieve similar precision and recall as Run 1. |
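
The run identifiers in the second column appear to follow a regular pattern: a team name, a run number, and either an `_abl-N` suffix or `_Main`. The sketch below splits an identifier into those parts. The names `RunId` and `parse_run_id` are hypothetical, and the pattern assumes the run number is the single digit immediately before the underscore (so `UAIC20101_abl-1` is read as team `UAIC2010`, run 1); that reading is an inference from the table, not something the page states.

```python
import re
from typing import NamedTuple, Optional


class RunId(NamedTuple):
    team: str
    run: int
    ablation: Optional[int]  # None when the identifier ends in "_Main"


# Assumed shape: <team><single-digit run>_abl-<n>  or  <team><single-digit run>_Main
_RUN_ID = re.compile(r"^(?P<team>.+)(?P<run>\d)_(?:abl-(?P<abl>\d+)|Main)$")


def parse_run_id(identifier: str) -> RunId:
    """Split a run identifier from the table into team, run number and ablation number."""
    match = _RUN_ID.match(identifier)
    if match is None:
        raise ValueError(f"unrecognized run identifier: {identifier!r}")
    abl = match.group("abl")
    return RunId(
        team=match.group("team"),
        run=int(match.group("run")),
        ablation=int(abl) if abl is not None else None,
    )


print(parse_run_id("BIU1_abl-1"))
print(parse_run_id("FBK_irst3_Main"))
```

Grouping rows by the parsed `team` and `run` makes it easier to compare the impact of different resources within one system.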


Footnotes

  1. For further information about participants, see RTE Challenges - Data about participants.

