RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian

Anna Rogers, Alexey Romanov, Anna Rumshisky, Svitlana Volkova, Mikhail Gronas, Alex Gribov


Abstract
This paper presents RuSentiment, a new dataset for sentiment analysis of social media posts in Russian, and a new set of comprehensive annotation guidelines that are extensible to other languages. RuSentiment is currently the largest in its class for Russian, with 31,185 posts annotated with Fleiss’ kappa of 0.58 (3 annotations per post). To diversify the dataset, 6,950 posts were pre-selected with an active learning-style strategy. We report baseline classification results, and we also release the best-performing embeddings trained on 3.2B tokens of Russian VKontakte posts.
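The agreement figure quoted in the abstract is a Fleiss' kappa, which compares the observed agreement among a fixed number of annotators per item against the agreement expected by chance. A minimal sketch of that computation follows. The function name `fleiss_kappa`, the toy labels, and the toy posts are illustrative assumptions; they are not taken from the dataset or from the authors' code.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that each received the same number of labels.

    ratings: list of lists; each inner list holds the labels assigned to one item.
    categories: list of all possible labels.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item must have the same number of annotations")

    # Per-item label counts, e.g. {"positive": 2, "neutral": 1}.
    counts = [Counter(item) for item in ratings]

    # Observed agreement per item: share of annotator pairs that agree.
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall proportion of each label.
    p_cat = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(p ** 2 for p in p_cat)

    if p_e == 1.0:
        # Only one label was ever used, so kappa is undefined.
        raise ValueError("kappa is undefined when a single category is used")
    return (p_bar - p_e) / (1 - p_e)


# Toy data (hypothetical): 4 posts, 3 annotations each.
toy_labels = ["positive", "negative", "neutral"]
toy_ratings = [
    ["positive", "positive", "positive"],
    ["negative", "negative", "neutral"],
    ["neutral", "neutral", "neutral"],
    ["positive", "neutral", "negative"],
]
print(round(fleiss_kappa(toy_ratings, toy_labels), 3))
```

With three annotations per post, each item contributes three annotator pairs, so the per-item agreement can only be 0, 1/3, or 1.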
Anthology ID:
C18-1064
Volume:
Proceedings of the 27th International Conference on Computational Linguistics
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Emily M. Bender, Leon Derczynski, Pierre Isabelle
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
755–763
URL:
https://aclanthology.org/C18-1064
Cite (ACL):
Anna Rogers, Alexey Romanov, Anna Rumshisky, Svitlana Volkova, Mikhail Gronas, and Alex Gribov. 2018. RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian. In Proceedings of the 27th International Conference on Computational Linguistics, pages 755–763, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian (Rogers et al., COLING 2018)
PDF:
https://aclanthology.org/C18-1064.pdf