Self-Discriminative Learning for Unsupervised Document Embedding

Hong-You Chen, Chin-Hua Hu, Leila Wehbe, Shou-De Lin


Abstract
Unsupervised document representation learning is an important task that provides pre-trained features for NLP applications. Unlike most previous work, which learns the embedding through self-prediction of the surface text, we explicitly exploit inter-document information and directly model the relations between documents in the embedding space with a discriminative network and a novel objective. Extensive experiments on both small and large public datasets show the competitiveness of the proposed method. In evaluations on standard document classification, our model has errors that are 5 to 13% lower than those of state-of-the-art unsupervised embedding models. The reduction in error is even more pronounced in the scarce-label setting.
Anthology ID:
N19-1255
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Editors:
Jill Burstein, Christy Doran, Thamar Solorio
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
2465–2474
URL:
https://aclanthology.org/N19-1255
DOI:
10.18653/v1/N19-1255
Cite (ACL):
Hong-You Chen, Chin-Hua Hu, Leila Wehbe, and Shou-De Lin. 2019. Self-Discriminative Learning for Unsupervised Document Embedding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2465–2474, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
Self-Discriminative Learning for Unsupervised Document Embedding (Chen et al., NAACL 2019)
PDF:
https://aclanthology.org/N19-1255.pdf
Video:
https://aclanthology.org/N19-1255.mp4
Data:
IMDb Movie Reviews