Learning Disentangled Representations of Texts with Application to Biomedical Abstracts

Sarthak Jain, Edward Banner, Jan-Willem van de Meent, Iain J. Marshall, Byron C. Wallace


Abstract
We propose a method for learning disentangled representations of texts that code for distinct and complementary aspects, with the aim of affording efficient model transfer and interpretability. To induce disentangled embeddings, we propose an adversarial objective based on the (dis)similarity between triplets of documents with respect to specific aspects. Our motivating application is embedding biomedical abstracts describing clinical trials in a manner that disentangles the populations, interventions, and outcomes in a given trial. We show that our method learns representations that encode these clinically salient aspects, and that these can be effectively used to perform aspect-specific retrieval. We demonstrate that the approach generalizes beyond our motivating application in experiments on two multi-aspect review corpora.
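The abstract describes training on triplets of documents judged similar or dissimilar with respect to one aspect, then using each aspect's embedding for retrieval. The sketch below illustrates that general idea with a standard hinge-style triplet loss over cosine similarity. It assumes each document is already encoded as one vector per aspect ("population", "intervention", "outcome"). It is not the authors' exact (adversarial) objective. The dictionary layout, the margin of 1.0, and the helper names `aspect_triplet_loss` and `rank_by_aspect` are hypothetical choices made for illustration.

```python
import math
from typing import Dict, List

Vector = List[float]
# Hypothetical layout: one embedding vector per aspect name.
AspectEmbedding = Dict[str, Vector]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def aspect_triplet_loss(anchor: AspectEmbedding, positive: AspectEmbedding,
                        negative: AspectEmbedding, aspect: str,
                        margin: float = 1.0) -> float:
    """Hinge loss (assumed form) that is zero once, in this aspect's space,
    the anchor is at least `margin` more similar to the positive than to the negative."""
    s_pos = cosine(anchor[aspect], positive[aspect])
    s_neg = cosine(anchor[aspect], negative[aspect])
    return max(0.0, margin - s_pos + s_neg)


def rank_by_aspect(query: AspectEmbedding, corpus: Dict[str, AspectEmbedding],
                   aspect: str) -> List[str]:
    """Aspect-specific retrieval: order document ids by cosine similarity
    to the query, using only the embeddings for the chosen aspect."""
    scored = [(doc_id, cosine(query[aspect], emb[aspect])) for doc_id, emb in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored]


if __name__ == "__main__":
    anchor = {"population": [1.0, 0.0], "intervention": [0.0, 1.0]}
    same_population = {"population": [1.0, 0.0], "intervention": [1.0, 0.0]}
    other_population = {"population": [0.0, 1.0], "intervention": [0.0, 1.0]}
    print(aspect_triplet_loss(anchor, same_population, other_population, "population"))
    print(rank_by_aspect(anchor, {"d1": other_population, "d2": same_population}, "population"))
```

In the toy example, the triplet is already correctly ordered for the population aspect, so the loss is 0.0. Querying on population ranks `d2` ahead of `d1`. The same two documents could rank differently under the intervention aspect, and that difference is what aspect-specific retrieval relies on.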
Anthology ID:
D18-1497
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Editors:
Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
4683–4693
URL:
https://aclanthology.org/D18-1497
DOI:
10.18653/v1/D18-1497
Cite (ACL):
Sarthak Jain, Edward Banner, Jan-Willem van de Meent, Iain J. Marshall, and Byron C. Wallace. 2018. Learning Disentangled Representations of Texts with Application to Biomedical Abstracts. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4683–4693, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Learning Disentangled Representations of Texts with Application to Biomedical Abstracts (Jain et al., EMNLP 2018)
PDF:
https://aclanthology.org/D18-1497.pdf
Attachment:
D18-1497.Attachment.pdf
Code:
successar/neural-nlp