Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction

Shasha Liao and Ralph Grishman
New York University


Abstract

Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not enough. In this paper, we present a novel self-training strategy that uses Information Retrieval (IR) to collect a cluster of related documents as the resource for bootstrapping. In addition, based on the particular characteristics of this corpus, global inference is applied to make more confident and informative data selections. We compare this approach to self-training on a normal newswire corpus and show that IR can provide a better corpus for bootstrapping and that global inference can further improve instance selection. We obtain gains of 1.7% in trigger labeling and 2.3% in role labeling through IR, and an additional 1.1% in trigger labeling and 1.3% in role labeling by applying global inference.
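The abstract gives no implementation details, so the following is only a minimal Python sketch of the general idea, not the authors' method. It rests on several assumptions: bag-of-words cosine similarity stands in for the IR system; a confidence-weighted majority vote over each trigger word across the retrieved cluster stands in for global inference; and the function names (`retrieve_cluster`, `select_instances`, `self_train`) as well as the pluggable `train_fn`/`predict_fn` classifier hooks are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; a real system would use a proper analyzer.
    return text.lower().split()


def cosine(a, b):
    # Bag-of-words cosine similarity between two token lists.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_cluster(query, corpus, k):
    # Hypothetical IR step: rank unlabeled documents by similarity to the
    # query and keep the top k as the bootstrapping cluster.
    q = tokenize(query)
    return sorted(corpus, key=lambda d: cosine(q, tokenize(d)), reverse=True)[:k]


def select_instances(predictions, threshold):
    # predictions: (trigger_word, event_type, confidence) triples from the
    # whole cluster. Assumed stand-in for global inference: keep a confident
    # prediction only if it agrees with the cluster-wide, confidence-weighted
    # majority label for the same word.
    votes = defaultdict(Counter)
    for word, label, conf in predictions:
        votes[word][label] += conf
    return [
        (word, label)
        for word, label, conf in predictions
        if conf >= threshold and label == votes[word].most_common(1)[0][0]
    ]


def self_train(labeled, cluster, train_fn, predict_fn, rounds=3, threshold=0.8):
    # Generic self-training loop: train, label the cluster, add the selected
    # instances to the training data, retrain; stop when nothing new is added.
    model = train_fn(labeled)
    for _ in range(rounds):
        preds = [p for doc in cluster for p in predict_fn(model, doc)]
        new = dict.fromkeys(select_instances(preds, threshold))  # dedupe, keep order
        added = [x for x in new if x not in labeled]
        if not added:
            break
        labeled = labeled + added
        model = train_fn(labeled)
    return model, labeled
```

In this sketch, a caller would first build the cluster with `retrieve_cluster(seed_query, unlabeled_docs, k)` and then pass it to `self_train` together with whatever trigger classifier is available. The voting filter illustrates why a topically related cluster can help: if most confident predictions label a word such as "fired" as an Attack trigger, an isolated conflicting label for the same word is not added to the training data.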

Full paper: http://www.aclweb.org/anthology/P/P11/P11-2045.pdf