Learning Word Embeddings for Data Sparse and Sentiment Rich Data Sets

Prathusha Kameswara Sarma


Abstract
This research proposal describes two algorithms for learning word embeddings on data-sparse, sentiment-rich data sets. The goal is to use word embeddings adapted to domain-specific data sets in downstream applications such as sentiment classification. The first approach learns word embeddings in a supervised fashion via SWESA (Supervised Word Embeddings for Sentiment Analysis), an algorithm for sentiment analysis on data sets of modest size. SWESA leverages document labels to jointly learn polarity-aware word embeddings and a classifier for unseen documents. In the second approach, domain-adapted (DA) word embeddings are learned by exploiting both the specificity of domain-specific data sets and the breadth of generic word embeddings. The new embeddings are formed by aligning corresponding word vectors using Canonical Correlation Analysis (CCA) or the related nonlinear Kernel CCA. Experimental results for both approaches on binary sentiment classification tasks over standard data sets are presented.
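To make the CCA alignment step concrete, here is a minimal linear CCA sketch in NumPy. It assumes two embedding matrices whose rows are already aligned over a shared vocabulary: `X` holds the generic word vectors and `Y` the domain-specific ones. The function names `cca` and `domain_adapted`, the ridge term `reg`, the output dimension `k`, and combining the two projected views by a weighted average `alpha` are illustrative assumptions, not necessarily the exact choices made in the paper. Kernel CCA is not shown.

```python
import numpy as np


def _inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca(X, Y, k, reg=1e-3):
    """Linear CCA between row-aligned views X (n x d1) and Y (n x d2).

    Returns projection matrices Wx (d1 x k), Wy (d2 x k) and the top-k
    canonical correlations. `reg` is an assumed ridge term that keeps
    the covariance inverses well conditioned.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten each view, then take the SVD of the whitened cross-covariance.
    Kx, Ky = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :k]
    Wy = Ky @ Vt.T[:, :k]
    return Wx, Wy, s[:k]


def domain_adapted(X, Y, k, reg=1e-3, alpha=0.5):
    """Hypothetical DA embedding: weighted average of both CCA projections."""
    Wx, Wy, _ = cca(X, Y, k, reg)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    return alpha * (Xc @ Wx) + (1.0 - alpha) * (Yc @ Wy)
```

Whitening each view and taking an SVD of the whitened cross-covariance is one standard way to solve linear CCA. The ridge term matters most when the shared vocabulary is small relative to the embedding dimensions, which is the data-sparse setting the abstract targets.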
Anthology ID:
N18-4007
Volume:
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
Month:
June
Year:
2018
Address:
New Orleans, Louisiana, USA
Editors:
Silvio Ricardo Cordeiro, Shereen Oraby, Umashanthi Pavalanathan, Kyeongmin Rim
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
46–53
URL:
https://aclanthology.org/N18-4007
DOI:
10.18653/v1/N18-4007
Cite (ACL):
Prathusha Kameswara Sarma. 2018. Learning Word Embeddings for Data Sparse and Sentiment Rich Data Sets. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 46–53, New Orleans, Louisiana, USA. Association for Computational Linguistics.
Cite (Informal):
Learning Word Embeddings for Data Sparse and Sentiment Rich Data Sets (Kameswara Sarma, NAACL 2018)
PDF:
https://aclanthology.org/N18-4007.pdf