Simple Algorithms For Sentiment Analysis On Sentiment Rich, Data Poor Domains.

Prathusha K Sarma, William Sethares


Abstract
Standard word embedding algorithms learn vector representations from large corpora of text documents in an unsupervised fashion. However, the quality of word embeddings learned from these algorithms is affected by the size of training data sets. Thus, applications of these algorithms in domains with only moderate amounts of available data are limited. In this paper, we introduce an algorithm that learns word embeddings jointly with a classifier. Our algorithm is called SWESA (Supervised Word Embeddings for Sentiment Analysis). SWESA leverages document label information to learn vector representations of words from a modest corpus of text documents by solving an optimization problem that minimizes a cost function with respect to both word embeddings and the weight vector used for classification. Experiments on several real-world data sets show that SWESA has superior performance on domains with limited data when compared to previously suggested approaches to word embeddings and sentiment analysis tasks.
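The joint optimization described in the abstract can be illustrated with a small sketch. The abstract only states that a cost function is minimized with respect to both the word embeddings and the classifier weight vector; everything else here is an assumption made for illustration: the regularized logistic loss, the representation of a document as a weighted sum of its word vectors, the unit-norm projection of each word vector, the alternating gradient steps, and all names (`joint_fit`, `Phi`, `lam`, `lr`) and hyperparameter values. It is not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def joint_fit(Phi, y, dim=5, lam=0.1, lr=0.5, outer_iters=200, seed=0):
    """Alternately update a weight vector theta and word embeddings W.

    Phi : (V, N) matrix, column i holds the word weights (here: counts)
          of document i over a vocabulary of V words.
    y   : (N,) labels in {-1, +1}.
    Assumed cost (not taken from the paper):
        mean_i log(1 + exp(-y_i * theta^T W Phi[:, i])) + lam * ||theta||^2
    """
    rng = np.random.default_rng(seed)
    V, N = Phi.shape
    W = rng.normal(size=(dim, V))
    W /= np.linalg.norm(W, axis=0, keepdims=True)  # unit-norm word vectors
    theta = np.zeros(dim)

    for _ in range(outer_iters):
        # Step 1: gradient step on theta with W held fixed.
        X = W @ Phi                                # (dim, N) document vectors
        margins = y * (theta @ X)
        grad_theta = -(X * (y * sigmoid(-margins))).mean(axis=1) + 2 * lam * theta
        theta -= lr * grad_theta

        # Step 2: gradient step on W with theta held fixed.
        margins = y * (theta @ (W @ Phi))
        coeff = -(y * sigmoid(-margins)) / N       # per-document loss derivative
        grad_W = np.outer(theta, Phi @ coeff)      # (dim, V)
        W -= lr * grad_W
        # Project each word vector back onto the unit sphere.
        W /= np.maximum(np.linalg.norm(W, axis=0, keepdims=True), 1e-12)

    return W, theta

# Toy corpus over the hypothetical vocabulary [good, great, bad, awful].
Phi = np.array([
    [1, 2, 0, 0, 0, 0],   # good
    [1, 0, 1, 0, 0, 0],   # great
    [0, 0, 0, 1, 2, 0],   # bad
    [0, 0, 0, 1, 0, 1],   # awful
], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)

W, theta = joint_fit(Phi, y)
preds = np.sign(theta @ (W @ Phi))
```

In this sketch the labels influence the embeddings directly: words that occur in positive documents are pulled toward `theta` and words in negative documents are pushed away from it, which is one way supervision can compensate for a small corpus.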
Anthology ID:
C18-1290
Volume:
Proceedings of the 27th International Conference on Computational Linguistics
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Emily M. Bender, Leon Derczynski, Pierre Isabelle
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
3424–3435
URL:
https://aclanthology.org/C18-1290
Cite (ACL):
Prathusha K Sarma and William Sethares. 2018. Simple Algorithms For Sentiment Analysis On Sentiment Rich, Data Poor Domains. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3424–3435, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Simple Algorithms For Sentiment Analysis On Sentiment Rich, Data Poor Domains. (Sarma & Sethares, COLING 2018)
PDF:
https://aclanthology.org/C18-1290.pdf
Data
IMDb Movie Reviews