Linguistic Characteristics of Censorable Language on SinaWeibo

Kei Yin Ng, Anna Feldman, Jing Peng, Chris Leberknight


Abstract
This paper investigates censorship from a linguistic perspective. We collect a corpus of censored and uncensored posts on a number of topics and build a classifier that predicts censorship decisions independent of discussion topic. Our investigation reveals that the strongest linguistic indicator of censored content in our corpus is its readability.
Anthology ID:
W18-4202
Volume:
Proceedings of the First Workshop on Natural Language Processing for Internet Freedom
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Chris Brew, Anna Feldman, Chris Leberknight
Venue:
NLP4IF
Publisher:
Association for Computational Linguistics
Pages:
12–22
URL:
https://aclanthology.org/W18-4202
Cite (ACL):
Kei Yin Ng, Anna Feldman, Jing Peng, and Chris Leberknight. 2018. Linguistic Characteristics of Censorable Language on SinaWeibo. In Proceedings of the First Workshop on Natural Language Processing for Internet Freedom, pages 12–22, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Linguistic Characteristics of Censorable Language on SinaWeibo (Ng et al., NLP4IF 2018)
PDF:
https://aclanthology.org/W18-4202.pdf