Semi-supervised condensed nearest neighbor for part-of-speech tagging

Anders Søgaard
University of Copenhagen


Abstract

This paper introduces a new training-set condensation technique designed for mixtures of labeled and unlabeled data. The technique selects a condensed set of labeled and unlabeled data points that is typically smaller than the set obtained by applying condensed nearest neighbor to the labeled data alone, while also improving classification accuracy. We evaluate the algorithm on semi-supervised part-of-speech tagging and report the best published result on the Wall Street Journal data set.
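
For context, the supervised baseline named in the abstract, condensed nearest neighbor, can be sketched roughly as follows. This is only an illustration of the classic condensation idea on labeled data, not the semi-supervised algorithm the paper introduces. The function name `condense`, the use of Euclidean distance, and the toy one-dimensional feature vectors are all assumptions made for the example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def condense(points, labels):
    """Return indices of a condensed subset of the labeled data.

    Hypothetical helper: starts from the first point and keeps adding
    any point that the current subset misclassifies under 1-nearest
    neighbor, until a full pass adds nothing.
    """
    store = [0]        # indices kept in the condensed set
    in_store = {0}     # same indices, for fast membership checks
    changed = True
    while changed:
        changed = False
        for i, (x, y) in enumerate(zip(points, labels)):
            if i in in_store:
                continue
            # 1-NN prediction using only the condensed set
            nearest = min(store, key=lambda j: dist(points[j], x))
            if labels[nearest] != y:
                store.append(i)
                in_store.add(i)
                changed = True
    return store


# Toy data (assumed): two well-separated clusters with tag-like labels.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
labels = ["DT", "DT", "DT", "NN", "NN", "NN"]
kept = condense(points, labels)
```

On this toy input one representative per cluster is enough to classify every remaining point correctly, which is the sense in which the condensed set is smaller than the full training set.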




Full paper: http://www.aclweb.org/anthology/P/P11/P11-2009.pdf