Automatic Labelling of Topic Models

Jey Han Lau (1), Karl Grieser (2), David Newman (3), Timothy Baldwin (1)
(1) University of Melbourne/NICTA, (2) University of Melbourne, (3) UCI/NICTA


Abstract

We propose a method for automatically labelling topics learned by LDA topic models. We generate the label candidate set from three sources: the top-ranking topic terms, the titles of Wikipedia articles containing those terms, and sub-phrases extracted from the Wikipedia article titles. We rank the label candidates using a combination of association measures and lexical features, optionally fed into a supervised ranking model. The method performs strongly over four independent sets of topics and significantly outperforms a benchmark method.
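
The ranking step can be illustrated with a minimal Python sketch. It is not the implementation from the paper: it assumes pointwise mutual information (PMI) as the only association measure, leaves out the lexical features and the supervised ranker, and uses a tiny invented corpus in place of Wikipedia. The function names, the toy documents, and the choice to score unseen pairs as 0 are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count document frequencies of single items and of unordered item pairs."""
    single = Counter()
    pair = Counter()
    for doc in documents:
        items = set(doc)
        single.update(items)
        pair.update(frozenset(p) for p in combinations(sorted(items), 2))
    return single, pair


def pmi(a, b, single, pair, n_docs):
    """PMI of two items, estimated from document-level co-occurrence."""
    joint = pair[frozenset((a, b))]
    if joint == 0 or single[a] == 0 or single[b] == 0:
        return 0.0  # assumption: unseen pairs count as "no association"
    return math.log((joint * n_docs) / (single[a] * single[b]))


def rank_candidates(candidates, topic_terms, documents):
    """Order label candidates by their mean PMI with the top topic terms."""
    single, pair = cooccurrence_counts(documents)
    n_docs = len(documents)
    scores = {
        c: sum(pmi(c, t, single, pair, n_docs) for t in topic_terms) / len(topic_terms)
        for c in candidates
    }
    return sorted(candidates, key=lambda c: scores[c], reverse=True)


# Toy corpus (invented): each document is a bag of items, and a multiword
# label candidate such as "stock market" is treated as a single item.
documents = [
    ["stock market", "stock", "trade", "investor"],
    ["stock market", "stock", "investor"],
    ["football", "team", "goal"],
    ["football", "stock"],
    ["stock market", "trade"],
]
topic_terms = ["stock", "trade", "investor"]
candidates = ["football", "stock market"]

ranked = rank_candidates(candidates, topic_terms, documents)
print(ranked)  # ['stock market', 'football']
```

In this toy setting "stock market" co-occurs with every topic term more often than chance, so its mean PMI is positive, while "football" scores below zero. A fuller version would draw candidates from all three sources named above and combine several association measures with lexical features.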




Full paper: http://www.aclweb.org/anthology/P/P11/P11-1154.pdf