Difference between revisions of "RTE7 - Ablation Tests"

From ACL Wiki
 
(6 intermediate revisions by 2 users not shown)
Line 28: Line 28:
 
|- bgcolor="#ECECEC" "align="left"
 
|- bgcolor="#ECECEC" "align="left"
 
|Direct
 
|Direct
| BIU1_abl-2
+
| BIU2_abl-2
 
| style="text-align: center;"| 0.94
 
| style="text-align: center;"| 0.94
 
| style="text-align: center;"| Without Bap (AKA "Direct"), which is used as a lexical rulebase  resource
 
| style="text-align: center;"| Without Bap (AKA "Direct"), which is used as a lexical rulebase  resource
Line 34: Line 34:
 
|- bgcolor="#ECECEC" "align="left"
 
|- bgcolor="#ECECEC" "align="left"
 
| Wikipedia
 
| Wikipedia
| BIU1_abl-3
+
| BIU2_abl-3
 
| style="text-align: center;"| 1.56
 
| style="text-align: center;"| 1.56
 
| style="text-align: center;"| Without Wikipedia, which is used as a lexical rulebase resource
 
| style="text-align: center;"| Without Wikipedia, which is used as a lexical rulebase resource
 
 
**********************************
 
  
 
|- bgcolor="#FFFFF" "align="left"
 
|- bgcolor="#FFFFF" "align="left"
 
| Coreference resolver
 
| Coreference resolver
| BIU1_abl-3
+
| BIU2_abl-4
| style="text-align: center;"| 1.56
+
| style="text-align: center;"| 0.69
| style="text-align: center;"| Wikipedia
+
| style="text-align: center;"| Without any coreference resolution engine, instead of using ArkRef to obtain coreference information from the text when preprocessing it
On Dev set 41.62% (Compared to 40.73% when Coreference resolver is used).
 
This ablation test is an unusual ablation test, since it shows that the co-reference resolution component has a negative impact.
 
 
 
|- bgcolor="#ECECEC" "align="left"
 
| DIRT
 
| Boeing1_abl-1
 
| style="text-align: center;"| 3.97
 
| style="text-align: center;"| DIRT removed
 
  
 
|- bgcolor="#ECECEC" "align="left"
 
|- bgcolor="#ECECEC" "align="left"
 
| WordNet
 
| WordNet
| Boeing1_abl-2
+
| DFKI1_abl-1
| style="text-align: center;"| 4.42
+
| style="text-align: center;"| -0.14
| style="text-align: center;"| No WordNet
+
| style="text-align: center;"| Features based on WordNet similarity measures (JWNL).
  
 
|- bgcolor="#FFFFF" "align="left"
 
|- bgcolor="#FFFFF" "align="left"
| Name Normalization
+
| Named Entity Recognition
| budapestcad2_abl-2
+
| DFKI1_abl-2
| style="text-align: center;"| 0.65
+
| style="text-align: center;"| 2.08
| style="text-align: center;"| no name normalization was performed (e.g. George W. Bush -> Bush).
+
| style="text-align: center;"| Features based on WordNet similarity measures (JWNL).
 
 
|- bgcolor="#FFFFF" "align="left"
 
| Named Entities Recognition
 
| budapestcad2_abl-3
 
| style="text-align: center;"| -1.23
 
| style="text-align: center;"| no NER
 
  
 
|- bgcolor="#ECECEC" "align="left"
 
|- bgcolor="#ECECEC" "align="left"
| WordNet
+
| Wikipedia
| budapestcad2_abl-4
+
| FBK_irst3_abl-2
| style="text-align: center;"| -1.11
+
| style="text-align: center;"| -2.64
| style="text-align: center;"| No WordNet. (In the original run, WordNet was used to find the synonyms of words in the triplets, and additional triplets were generated from all possible combinations.)
+
| style="text-align: center;"| Ablating wikipedia LSA similarity scores.
 
 
|- bgcolor="#ECECEC" "align="left"
 
| WordNet
 
| deb_iitb1_abl-1
 
| style="text-align: center;"| 8.68
 
| style="text-align: center;"| WordNet is ablated in this test. No code change was required; only the WordNet module is removed during matching.
 
 
 
|- bgcolor="#ECECEC" "align="left"
 
| VerbOcean
 
| deb_iitb1_abl-2
 
| style="text-align: center;"| 1.87
 
| style="text-align: center;"| VerbOcean is ablated in this test. No code change was required; only the VerbOcean module is removed during matching.
 
 
 
|- bgcolor="#ECECEC" "align="left"
 
| WordNet
 
| deb_iitb2_abl-1
 
| style="text-align: center;"| 7.9
 
| style="text-align: center;"| WordNet is ablated in this test. No code change was required; only the WordNet module is removed during matching.
 
 
 
|- bgcolor="#ECECEC" "align="left"
 
| VerbOcean
 
| deb_iitb2_abl-2
 
| style="text-align: center;"| 0.94
 
| style="text-align: center;"| VerbOcean is ablated in this test. No code change was required; only the VerbOcean module is removed during matching.
 
 
 
|- bgcolor="#ECECEC" "align="left"
 
| WordNet
 
| deb_iitb3_abl-1
 
| style="text-align: center;"| 11.43
 
| style="text-align: center;"| WordNet is ablated in this test. No code change was required; only the WordNet module is removed during matching.
 
 
 
|- bgcolor="#ECECEC" "align="left"
 
| WordNet
 
| deb_iitb3_abl-2
 
| style="text-align: center;"| 2.54
 
| style="text-align: center;"| VerbOcean is ablated in this test. No code change was required; only the VerbOcean module is removed during matching.
 
 
 
|- bgcolor="#FFFFF" "align="left"
 
| POS-Tagger
 
| DFKI1_abl-4
 
| style="text-align: center;"| 4.99
 
| style="text-align: center;"| No wordform/POS-tags included for the comparison.
 
  
 
|- bgcolor="#FFFFF" "align="left"
 
|- bgcolor="#FFFFF" "align="left"
| POS-Tagger
+
| Named Entity Recognition
| DFKI1_abl-6
+
| FBK_irst3_abl-3
| style="text-align: center;"| 2.22
+
| style="text-align: center;"| -0.89
| style="text-align: center;"| No named entity recognition for the comparison.
+
| style="text-align: center;"| Ablating named entities matching module.
  
 
|- bgcolor="#ECECEC" "align="left"
 
|- bgcolor="#ECECEC" "align="left"
| WordNet
+
| Paraphrase Table
| DFKI1_abl-7
+
| FBK_irst3_abl-4
| style="text-align: center;"| -0.23
+
| style="text-align: center;"| -1.43
| style="text-align: center;"| No WordNet similarity for the comparison.
+
| style="text-align: center;"| Ablating paraphrase matching module. The paraphrases were extracted from parallel corpora.
 
 
|- bgcolor="#FFFFF" "align="left"
 
| Coreference resolver
 
| DFKI1_Main
 
| style="text-align: center;"| -1.54
 
| style="text-align: center;"| Coreference resolution used for the comparison.
 
 
 
|- bgcolor="#ECECEC" "align="left"
 
| WordNet
 
| DirRelCond23_abl-1
 
| style="text-align: center;"| 8.43
 
| style="text-align: center;"| WordNet removed. Only basic word comparison used instead of word relations.
 
  
 
|- bgcolor="#ECECEC" "align="left"
 
|- bgcolor="#ECECEC" "align="left"
| Wikipedia
+
| Acronym List
| FBK_irst3_Main
+
| IKOMA3_abl-1
| style="text-align: center;"| -23.91
+
| style="text-align: center;"| -0.16
| style="text-align: center;"| This run is produced by the system configuration for run3 and uses rules extracted from Wikipedia
+
| style="text-align: center;"| No acronyms of organization names extracted from the corpus.
  
 
|- bgcolor="#ECECEC" "align="left"
 
|- bgcolor="#ECECEC" "align="left"
| Wikipedia
+
| CatVar
| FBK_irst3_Main
+
| IKOMA3_abl-2
| style="text-align: center;"| -3.58
+
| style="text-align: center;"| 0.84
| style="text-align: center;"| This run is produced by the system configuration for run3 and uses rules extracted from Wikipedia with probability above 0.7
+
| style="text-align: center;"| No CatVar.
 
 
|- bgcolor="#ECECEC" "align="left"
 
| Proximity similarity dictionary of Dekang Lin
 
| FBK_irst3_Main
 
| style="text-align: center;"| -7.79
 
| style="text-align: center;"| This run is produced by the system configuration for run3 and uses rules extracted from proximity similarity dictionary of Dekang Lin
 
 
 
|- bgcolor="#ECECEC" "align="left"
 
| WordNet
 
| FBK_irst3_Main
 
| style="text-align: center;"| -3.21
 
| style="text-align: center;"| This run is produced by the system configuration for run3 and uses rules extracted from WordNet
 
  
 
|- bgcolor="#ECECEC" "align="left"
 
|- bgcolor="#ECECEC" "align="left"
 
| WordNet
 
| WordNet
| FBK_irst3_Main
+
| IKOMA3_abl-3
| style="text-align: center;"| -2.08
+
| style="text-align: center;"| 0.85
| style="text-align: center;"| This run is produced by the system configuration for run3 and uses rules extracted from WordNet with probability above 0.7
+
| style="text-align: center;"| No WordNet.
 
 
|- bgcolor="#ECECEC" "align="left"
 
| VerbOcean
 
| FBK_irst3_Main
 
| style="text-align: center;"| -4
 
| style="text-align: center;"| This run is produced by the system configuration for run3 and uses rules extracted from Verbocean
 
 
 
|- bgcolor="#ECECEC" "align="left"
 
| Dependency similarity dictionary of Dekang Lin
 
| FBK_irst3_Main
 
| style="text-align: center;"| -13.56
 
| style="text-align: center;"| This run is produced by the system configuration for run3 and uses rules extracted from dependency similarity dictionary of Dekang Lin
 
 
 
|- bgcolor="#ECECEC" "align="left"
 
| Dictionary of Named Entities Acronyms and Synonyms
 
| IKOMA2_abl-3
 
| style="text-align: center;"| -0.76
 
| style="text-align: center;"| Remove synonym dictionaries: an acronym dictionary constructed automatically from the corpus and a synonym dictionary that contains geographical terms.
 
  
 
|- bgcolor="#ECECEC" "align="left"
 
|- bgcolor="#ECECEC" "align="left"
 
| WordNet
 
| WordNet
 
| JU_CSE_TAC1_abl-1
 
| JU_CSE_TAC1_abl-1
| style="text-align: center;"| 13.29
+
| style="text-align: center;"| 9.81
| style="text-align: center;"| The Run-1 is based on the composition of lexical based RTE methods and Syntactic RTE Method. The lexical based RTE methods are: WordNet based unigram match, bigram match, longest common sub-sequence, skip-gram and stemming. Here we ablated the WordNet based unigram match only.
+
| style="text-align: center;"| WordNet Ablated
 
 
|- bgcolor="#ECECEC" "align="left"
 
| WordNet
 
| JU_CSE_TAC2_abl-1
 
| style="text-align: center;"| 10.19
 
| style="text-align: center;"| The Run-2 is based on the composition of lexical based RTE methods, Syntactic RTE Method, Chunk and Named Entity Methods. The lexical based RTE methods are: WordNet based unigram match, bigram match, longest common sub-sequence, skip-gram and stemming. Here we ablated the WordNet based unigram match only.
 
 
 
|- bgcolor="#ECECEC" "align="left"
 
| WordNet
 
| JU_CSE_TAC3_abl-1
 
| style="text-align: center;"| 3.86
 
| style="text-align: center;"| The Run-3 is based on the Support Vector Machine that uses twenty five features for lexical similarity, the output tag from a rule based syntactic two-way TE system as feature, and output from a rule based Chunk Module and Named Entity Module. The important lexical features that are used in the present system are: WordNet based unigram match, bigram match, longest common sub-sequence, skip-gram, stemming and lexical distance (17 features). Here we ablated the WordNet based unigram match only.
 
 
 
|- bgcolor="#FFFFF" "align="left"
 
| LingPipe co-reference
 
| PKUTM2_abl-1
 
| style="text-align: center;"| 0.17
 
| style="text-align: center;"| LingPipe co-reference is removed; the experiment was based on named entities, WordNet, and VerbOcean
 
 
 
|- bgcolor="#ECECEC" "align="left"
 
| VerbOcean
 
| PKUTM2_abl-2
 
| style="text-align: center;"| 1.02
 
| style="text-align: center;"| VerbOcean is removed; the experiment was based on named entities, WordNet, and co-reference
 
 
 
|- bgcolor="#FFFFF" "align="left"
 
| LingPipe Named Entities
 
| PKUTM2_abl-3
 
| style="text-align: center;"| 13.84
 
| style="text-align: center;"| LingPipe named-entity recognition is removed; the experiment was based on WordNet, co-reference, and VerbOcean
 
 
 
|- bgcolor="#ECECEC" "align="left"
 
| WordNet
 
| saicnlp1_abl-1
 
| style="text-align: center;"| -0.02
 
| style="text-align: center;"| Ablation run, with WordNet stubbed.
 
  
 
|- bgcolor="#FFFFF" "align="left"
 
|- bgcolor="#FFFFF" "align="left"
| Shalmaneser Parser
+
| Named Entity Recognition
| Sangyan1_abl-1
+
| JU_CSE_TAC1_abl-2
| style="text-align: center;"| -1.76
+
| style="text-align: center;"| 7.97
| style="text-align: center;"| This ablation run was executed with the Shalmaneser parser ablated from our system. Our system uses the Shalmaneser Parser for Framenet frame assignment.
+
| style="text-align: center;"| NER Ablated
Our system required minor tweaking of the source code in order to ablate the Shalmaneser Parser.
 
  
 
|- bgcolor="#ECECEC" "align="left"
 
|- bgcolor="#ECECEC" "align="left"
 
| WordNet
 
| WordNet
| Sangyan1_abl-2
 
| style="text-align: center;"| -0.39
 
| style="text-align: center;"| This ablation run was executed with WordNet ablated from our system. Our system uses WordNet for word matching.
 
Our system required minor tweaking of the source code in order to ablate WordNet.
 
 
|- bgcolor="#ECECEC" "align="left"
 
| VerbOcean
 
| Sangyan1_abl-3
 
| style="text-align: center;"| 0.07
 
| style="text-align: center;"| This ablation run was executed with VerbOcean ablated from our system. Our system uses the VerbOcean resource for detection of antonyms. Our system required minor tweaking of the source code in order to ablate the VerbOcean resource.
 
 
|- bgcolor="#FFFFF" "align="left"
 
| Named Entities Recognition
 
 
| SINAI1_abl-1
 
| SINAI1_abl-1
| style="text-align: center;"| 7.62
+
| style="text-align: center;"| -0.12
| style="text-align: center;"| Only PPVs on WN 1.7
+
| style="text-align: center;"| Resource ablated: lexical similarity module based on Personalized Page Rank vectors over WordNet 3.0
 
 
|- bgcolor="#ECECEC" "align="left"
 
| Wordnet
 
| SINAI1_abl-2
 
| style="text-align: center;"| 15.87
 
| style="text-align: center;"| Only NEs on WN 1.7
 
 
 
|- bgcolor="#FFFFF" "align="left"
 
| Named Entities Recognition
 
| SINAI2_abl-1
 
| style="text-align: center;"| 20.25
 
| style="text-align: center;"| Only PPVs on WN 3.0
 
 
 
|- bgcolor="#ECECEC" "align="left"
 
| Wordnet
 
| SINAI2_abl-2
 
| style="text-align: center;"| 18.28
 
| style="text-align: center;"| Only NEs on WN 3.0
 
 
 
|- bgcolor="#ECECEC" "align="left"
 
| Wordnet
 
| SJTU_CIT3_abl-1
 
| style="text-align: center;"| 0.03
 
| style="text-align: center;"| The resource that has been ablated is WordNet.
 
  
 
|- bgcolor="#ECECEC" "align="left"
 
|- bgcolor="#ECECEC" "align="left"
 
| Wikipedia
 
| Wikipedia
| SJTU_CIT3_abl-2
+
| SJTU_CIT1_abl-1
| style="text-align: center;"| 4.7
+
| style="text-align: center;"| 8.89
| style="text-align: center;"| The resource that has been ablated is Wikipedia.
+
| style="text-align: center;"| We removed the Wikipedia resource
  
 
|- bgcolor="#ECECEC" "align="left"
 
|- bgcolor="#ECECEC" "align="left"
 
| VerbOcean
 
| VerbOcean
| SJTU_CIT3_abl-3
+
| SJTU_CIT1_abl-2
| style="text-align: center;"| -1.15
+
| style="text-align: center;"| 5.93
| style="text-align: center;"| The resource that has been ablated is VerbOcean.
+
| style="text-align: center;"| We removed the VerbOcean resource
 
 
|- bgcolor="#ECECEC" "align="left"
 
| Wikipedia
 
| UAIC20101_abl-1
 
| style="text-align: center;"| 1.93
 
| style="text-align: center;"| Run 1 with no BK
 
 
 
|- bgcolor="#ECECEC" "align="left"
 
| DIRT
 
| UAIC20101_abl-2
 
| style="text-align: center;"| -1.56
 
| style="text-align: center;"| Run 1 without DIRT.
 
 
 
|- bgcolor="#ECECEC" "align="left"
 
| Wikipedia
 
| UAIC20102_abl-1
 
| style="text-align: center;"| 1.08
 
| style="text-align: center;"| Run 2 with no BK
 
 
 
|- bgcolor="#ECECEC" "align="left"
 
| DIRT
 
| UAIC20102_abl-2
 
| style="text-align: center;"| -0.72
 
| style="text-align: center;"| Run 2 without DIRT.
 
 
 
|- bgcolor="#ECECEC" "align="left"
 
| Wikipedia
 
| UAIC20103_abl-1
 
| style="text-align: center;"| 1.3
 
| style="text-align: center;"| Run 3 with no BK
 
 
 
|- bgcolor="#ECECEC" "align="left"
 
| DIRT
 
| UAIC20103_abl-2
 
| style="text-align: center;"| -0.98
 
| style="text-align: center;"| Run 3 without DIRT.
 
  
 
|- bgcolor="#ECECEC" "align="left"
 
|- bgcolor="#ECECEC" "align="left"
 
| WordNet
 
| WordNet
| UB.dmirg1_abl-1
+
| u_tokyo1_abl-1
| style="text-align: center;"| 1.7
+
| style="text-align: center;"| 0.83
| style="text-align: center;"| WordNet has been removed. No system changes required.
+
| style="text-align: center;"| Ablated resource is WordNet
 
 
|- bgcolor="#ECECEC" "align="left"
 
| Framenet
 
| UB.dmirg1_abl-2
 
| style="text-align: center;"| -0.1
 
| style="text-align: center;"| FrameNet has been removed. No system changes required.
 
  
 
|- bgcolor="#ECECEC" "align="left"
 
|- bgcolor="#ECECEC" "align="left"
 
| WordNet
 
| WordNet
| UB.dmirg2_abl-1
+
| u_tokyo2_abl-1
| style="text-align: center;"| 0.6
+
| style="text-align: center;"| 0.64
| style="text-align: center;"| WordNet has been removed. No system changes required.
+
| style="text-align: center;"| Ablated resource is WordNet
 
 
|- bgcolor="#ECECEC" "align="left"
 
| Framenet
 
| UB.dmirg2_abl-2
 
| style="text-align: center;"| -1.84
 
| style="text-align: center;"| FrameNet has been removed. No system changes required.
 
  
 
|- bgcolor="#ECECEC" "align="left"
 
|- bgcolor="#ECECEC" "align="left"
 
| WordNet
 
| WordNet
| UB.dmirg3_abl-1
+
| u_tokyo3_abl-1
| style="text-align: center;"| -0.78
+
| style="text-align: center;"| 0.99
| style="text-align: center;"| WordNet has been removed. No system changes required.
+
| style="text-align: center;"| Ablated resource is WordNet
  
 
|- bgcolor="#ECECEC" "align="left"
 
|- bgcolor="#ECECEC" "align="left"
| Framenet
+
| UAIC Knowledge Resource
| UB.dmirg3_abl-2
+
| UAIC20112_abl-1
| style="text-align: center;"| -1.81
+
| style="text-align: center;"| 0
| style="text-align: center;"| FrameNet has been removed. No system changes required.
+
| style="text-align: center;"| Ablation of the BK (acronym database and world knowledge component)
  
|- bgcolor="#ECECEC" "align="left"
+
|- bgcolor="#FFFFF" "align="left"
| WordNet
+
| Named Entity Recognition
| UIUC1_abl-1
+
| UAIC20112_abl-3
| style="text-align: center;"| -3.07
+
| style="text-align: center;"| -8.29
| style="text-align: center;"| Ablation 1 - Removed WordNet (no locations and no synset expansion for non-entity terms). Threshold tuned on development collection to achieve similar precision and recall as Run 1.
+
| style="text-align: center;"| Ablation of the NE resources.
  
 
|}
 
|}

Latest revision as of 07:40, 27 March 2012

The following table lists the results of the ablation tests submitted by participants to RTE7.
The exploratory study of knowledge resources, started in RTE5 and extended to tools in RTE-6, was also proposed in RTE-7.

In the table below, the first column contains the specific resources which have been ablated.
The second column lists the Team Run in the form [name_of_the_Team][number_of_the_submitted_run].[submission_task] (e.g. BIU1.2way, Boeing3.3way).
The third column presents the normalized difference between the accuracy of the complete system run and the accuracy of the ablation run (i.e. the output of the complete system without the ablated resource), showing the impact of the resource on the performance of the system.
The fourth column contains a brief description of the specific usage of the resource. It is based on the information provided both in the "readme" files submitted together with the ablation tests and in the system reports published in the RTE7 proceedings.
If the ablated resource is highlighted in yellow, it is a tool; otherwise it is a knowledge resource.
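
As an illustration of how the third column can be read, the following minimal sketch computes a resource impact as the score of the complete run minus the score of the ablation run. The function name `resource_impact` and the sample scores are hypothetical, and the sketch assumes scores are expressed in percentage points; the exact normalization used by the organizers is not specified here.

```python
def resource_impact(full_score: float, ablated_score: float) -> float:
    """Return the impact of a resource in percentage points.

    full_score:    score of the complete system run (assumed in percent)
    ablated_score: score of the same system without the ablated resource
    A positive value means the resource helped; a negative value means
    the system scored higher without it.
    """
    return round(full_score - ablated_score, 2)


# Hypothetical scores, not taken from any submitted run.
print(resource_impact(45.00, 44.06))  # resource helped
print(resource_impact(40.00, 41.50))  # system did better without the resource
```

Under this reading, a negative entry in the table indicates that the ablation run outperformed the complete run.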

Participants are kindly invited to check if all the inserted information is correct and complete.


{|
! Ablated Component !! Ablation Run[1] !! Resource impact - F1 !! Resource Usage Description
|-
| WordNet || BIU2_abl-1 || -0.05 || Without WordNet, which is used as a lexical rulebase resource
|-
| Direct || BIU2_abl-2 || 0.94 || Without Bap (AKA "Direct"), which is used as a lexical rulebase resource
|-
| Wikipedia || BIU2_abl-3 || 1.56 || Without Wikipedia, which is used as a lexical rulebase resource
|-
| Coreference resolver || BIU2_abl-4 || 0.69 || Without any coreference resolution engine, instead of using ArkRef to obtain coreference information from the text when preprocessing it
|-
| WordNet || DFKI1_abl-1 || -0.14 || Features based on WordNet similarity measures (JWNL).
|-
| Named Entity Recognition || DFKI1_abl-2 || 2.08 || Features based on WordNet similarity measures (JWNL).
|-
| Wikipedia || FBK_irst3_abl-2 || -2.64 || Ablating Wikipedia LSA similarity scores.
|-
| Named Entity Recognition || FBK_irst3_abl-3 || -0.89 || Ablating named entities matching module.
|-
| Paraphrase Table || FBK_irst3_abl-4 || -1.43 || Ablating paraphrase matching module. The paraphrases were extracted from parallel corpora.
|-
| Acronym List || IKOMA3_abl-1 || -0.16 || No acronyms of organization names extracted from the corpus.
|-
| CatVar || IKOMA3_abl-2 || 0.84 || No CatVar.
|-
| WordNet || IKOMA3_abl-3 || 0.85 || No WordNet.
|-
| WordNet || JU_CSE_TAC1_abl-1 || 9.81 || WordNet ablated
|-
| Named Entity Recognition || JU_CSE_TAC1_abl-2 || 7.97 || NER ablated
|-
| WordNet || SINAI1_abl-1 || -0.12 || Resource ablated: lexical similarity module based on Personalized PageRank vectors over WordNet 3.0
|-
| Wikipedia || SJTU_CIT1_abl-1 || 8.89 || We removed the Wikipedia resource
|-
| VerbOcean || SJTU_CIT1_abl-2 || 5.93 || We removed the VerbOcean resource
|-
| WordNet || u_tokyo1_abl-1 || 0.83 || Ablated resource is WordNet
|-
| WordNet || u_tokyo2_abl-1 || 0.64 || Ablated resource is WordNet
|-
| WordNet || u_tokyo3_abl-1 || 0.99 || Ablated resource is WordNet
|-
| UAIC Knowledge Resource || UAIC20112_abl-1 || 0 || Ablation of the BK (acronym database and world knowledge component)
|-
| Named Entity Recognition || UAIC20112_abl-3 || -8.29 || Ablation of the NE resources.
|}
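
For readers who want a per-resource view of the table, the following sketch groups a few rows by ablated component and averages their impact. The helper name `mean_impact_by_component` and the choice of rows are illustrative assumptions; averaging across different systems gives only a rough indication, since each system uses a resource in its own way.

```python
from collections import defaultdict
from statistics import mean

# A few (component, impact) pairs copied from the table above.
ROWS = [
    ("WordNet", 0.83),     # u_tokyo1_abl-1
    ("WordNet", 0.64),     # u_tokyo2_abl-1
    ("WordNet", 0.99),     # u_tokyo3_abl-1
    ("Wikipedia", 1.56),   # BIU2_abl-3
    ("Wikipedia", -2.64),  # FBK_irst3_abl-2
]


def mean_impact_by_component(rows):
    """Group impacts by ablated component and return the mean of each group."""
    grouped = defaultdict(list)
    for component, impact in rows:
        grouped[component].append(impact)
    return {component: mean(values) for component, values in grouped.items()}


for component, average in mean_impact_by_component(ROWS).items():
    print(f"{component}: {average:+.2f}")
```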


Footnotes

  1. For further information about participants, see: RTE Challenges - Data about participants


   Return to RTE Knowledge Resources