Recognizing Named Entities in Tweets

Xiaohua LIU¹, Shaodian ZHANG², Furu WEI³, Ming ZHOU³
¹HIT; MSRA, ²Shanghai Jiao Tong University, ³MSRA


Abstract

The challenges of Named Entity Recognition (NER) for tweets lie in the limited information contained in a single tweet and the unavailability of training data. We propose to combine a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model under a semi-supervised learning framework to tackle these challenges. The KNN-based classifier conducts pre-labeling to collect global coarse evidence across tweets, while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet. Semi-supervised learning, together with gazetteers, alleviates the lack of training data. Extensive experiments show the advantages of our method over the baselines, as well as the effectiveness of KNN and semi-supervised learning.
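The pipeline described in the abstract (KNN pre-labeling across tweets, pre-labels feeding a sequential CRF labeler, and a semi-supervised loop) can be illustrated with a minimal sketch. It assumes a word-level KNN over bag-of-words context features with cosine similarity, and it uses the KNN vote share as a stand-in confidence for self-training. All names (`context_features`, `KNNPreLabeler`, `crf_token_features`, `self_train`), the window size, `k`, the thresholds, and the `LOC` label are hypothetical choices for illustration, not the paper's exact method. The CRF itself is not implemented; the per-token feature dicts are roughly the form that toolkits such as sklearn-crfsuite accept.

```python
import math
from collections import Counter


def context_features(tokens, i, window=2):
    """Bag-of-words features for token i: the word itself plus its neighbours
    within +/- `window` positions (a hypothetical feature choice)."""
    feats = Counter()
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if j != i:
            feats["ctx=" + tokens[j].lower()] += 1
    feats["word=" + tokens[i].lower()] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class KNNPreLabeler:
    """Word-level KNN: each token takes the majority label of its k most
    similar labelled token occurrences, pooled across all stored tweets."""

    def __init__(self, k=3):
        self.k = k
        self.memory = []  # (feature Counter, label) pairs

    def add(self, tokens, labels):
        for i, label in enumerate(labels):
            self.memory.append((context_features(tokens, i), label))

    def predict(self, tokens):
        """Return (label, confidence) per token; confidence is the fraction
        of the k nearest neighbours voting for the winning label."""
        if not self.memory:
            raise ValueError("no labelled examples stored")
        out = []
        for i in range(len(tokens)):
            query = context_features(tokens, i)
            ranked = sorted(self.memory, key=lambda m: cosine(query, m[0]), reverse=True)
            votes = Counter(label for _, label in ranked[: self.k])
            label, count = votes.most_common(1)[0]
            out.append((label, count / min(self.k, len(ranked))))
        return out


def crf_token_features(tokens, prelabels, threshold=0.6):
    """Per-token feature dicts for a sequential (CRF) labeler; confident KNN
    pre-labels are added as an extra feature (hypothetical threshold)."""
    feats = []
    for tok, (label, conf) in zip(tokens, prelabels):
        f = {"word": tok.lower(), "is_title": tok.istitle()}
        if conf >= threshold:
            f["knn_label"] = label
        feats.append(f)
    return feats


def self_train(knn, unlabeled, threshold=0.9):
    """Hypothetical self-training step: tweets whose tokens are all labelled
    with confidence >= threshold join the labelled memory. KNN vote share is
    a stand-in for the full model's confidence (an assumption)."""
    added = 0
    for tokens in unlabeled:
        preds = knn.predict(tokens)
        if all(conf >= threshold for _, conf in preds):
            knn.add(tokens, [label for label, _ in preds])
            added += 1
    return added


if __name__ == "__main__":
    knn = KNNPreLabeler(k=1)
    knn.add(["I", "love", "Paris"], ["O", "O", "LOC"])
    knn.add(["flying", "to", "London", "today"], ["O", "O", "LOC", "O"])
    preds = knn.predict(["I", "love", "London"])
    print([label for label, _ in preds])  # ['O', 'O', 'LOC']
```

In the demo, "London" is labelled `LOC` because its context ("I love ___") matches the stored "Paris" occurrence from a different tweet; this illustrates the cross-tweet evidence the KNN stage is meant to contribute to the per-tweet labeler.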




Full paper: http://www.aclweb.org/anthology/P/P11/P11-1037.pdf