A Corpus of Corporate Annual and Social Responsibility Reports: 280 Million Tokens of Balanced Organizational Writing

Sebastian G.M. Händschke, Sven Buechel, Jan Goldenstein, Philipp Poschmann, Tinghui Duan, Peter Walgenbach, Udo Hahn


Abstract
We introduce JOCo, a novel text corpus for NLP analytics in economics, business and management. The corpus comprises corporate annual and social responsibility reports of the top 30 US, UK and German companies in each country's major (DJIA, FTSE 100, DAX), middle-sized (S&P 500, FTSE 250, MDAX) and technology (NASDAQ, FTSE AIM 100, TECDAX) stock indices. Altogether, this amounts to 5,000 reports from 270 companies headquartered in three of the world's most important economies. The corpus covers the years 2000 to 2015 and contains 282M tokens in total. We also use JOCo in a small-scale experiment to demonstrate its potential for NLP-fueled studies in economics, business and management research.
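The sampling design described in the abstract can be read as a 3 × 3 grid (country × index tier), with the top 30 companies drawn from each cell. The Python sketch below reproduces the 270-company total under two stated assumptions: the tier labels "major", "mid" and "tech" are hypothetical names chosen here for illustration, and no company is counted in more than one index. It does not model which reports each company actually published, so it says nothing about the 5,000-report figure.

```python
# Hypothetical encoding of the JOCo sampling frame: country -> tier -> index.
# Index names are taken from the abstract; tier labels are illustrative only.
INDICES = {
    "US": {"major": "DJIA", "mid": "S&P 500", "tech": "NASDAQ"},
    "UK": {"major": "FTSE 100", "mid": "FTSE 250", "tech": "FTSE AIM 100"},
    "DE": {"major": "DAX", "mid": "MDAX", "tech": "TECDAX"},
}
TOP_N = 30  # top companies sampled per index


def sampling_frame(indices):
    """Return one (country, tier, index) tuple per cell of the design."""
    return [
        (country, tier, index)
        for country, tiers in indices.items()
        for tier, index in tiers.items()
    ]


cells = sampling_frame(INDICES)
# Assumes no overlap between indices, so companies = cells * TOP_N.
n_companies = len(cells) * TOP_N
print(len(cells), n_companies)  # 9 270
```

If companies could appear in several indices, the total would have to be computed over the set of distinct company identifiers rather than by multiplication.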
Anthology ID: W18-3103
Volume: Proceedings of the First Workshop on Economics and Natural Language Processing
Month: July
Year: 2018
Address: Melbourne, Australia
Editors: Udo Hahn, Véronique Hoste, Ming-Feng Tsai
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 20–31
URL: https://aclanthology.org/W18-3103
DOI: 10.18653/v1/W18-3103
Cite (ACL): Sebastian G.M. Händschke, Sven Buechel, Jan Goldenstein, Philipp Poschmann, Tinghui Duan, Peter Walgenbach, and Udo Hahn. 2018. A Corpus of Corporate Annual and Social Responsibility Reports: 280 Million Tokens of Balanced Organizational Writing. In Proceedings of the First Workshop on Economics and Natural Language Processing, pages 20–31, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal): A Corpus of Corporate Annual and Social Responsibility Reports: 280 Million Tokens of Balanced Organizational Writing (Händschke et al., ACL 2018)
PDF: https://aclanthology.org/W18-3103.pdf