Wronging a Right: Generating Better Errors to Improve Grammatical Error Detection

Sudhanshu Kasewa, Pontus Stenetorp, Sebastian Riedel


Abstract
Grammatical error correction, like other machine learning tasks, greatly benefits from large quantities of high-quality training data, which is typically expensive to produce. While writing a program to automatically generate realistic grammatical errors would be difficult, one could learn the distribution of naturally occurring errors and attempt to introduce them into other datasets. Initial work on inducing errors in this way using statistical machine translation has shown promise; we investigate cheaply constructing synthetic samples, given a small corpus of human-annotated data, using an off-the-rack attentive sequence-to-sequence model and a straightforward post-processing procedure. Our approach yields error-filled artificial data that helps a vanilla bi-directional LSTM outperform the previous state of the art at grammatical error detection, and helps a previously introduced model gain a further improvement of over 5% in F0.5 score. When attempting to determine whether a given sentence is synthetic, a human annotator achieves an F1 score of at best 39.39, indicating that our model generates mostly human-like instances.
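The abstract reports results as F0.5 and F1 scores. As a reminder of how these relate, the sketch below computes the general F-beta score from precision and recall; beta = 0.5 weights precision more heavily than recall, and beta = 1 gives the balanced F1. The function name `f_beta` and the example precision/recall values are illustrative assumptions, not figures or code from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Return the F-beta score, a weighted harmonic mean of precision and recall.

    beta < 1 favours precision (e.g. F0.5); beta = 1 is the balanced F1.
    The name and defaults are illustrative, not taken from the paper.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # convention: avoid division by zero when both are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical values: the same pair of numbers scores differently
# depending on which one is the precision.
high_precision = f_beta(precision=0.8, recall=0.4, beta=0.5)
high_recall = f_beta(precision=0.4, recall=0.8, beta=0.5)
balanced = f_beta(precision=0.8, recall=0.4, beta=1.0)
```

With beta = 0.5, the high-precision case scores higher than the high-recall case, whereas F1 treats the two symmetrically.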
Anthology ID:
D18-1541
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Editors:
Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
4977–4983
URL:
https://aclanthology.org/D18-1541
DOI:
10.18653/v1/D18-1541
Cite (ACL):
Sudhanshu Kasewa, Pontus Stenetorp, and Sebastian Riedel. 2018. Wronging a Right: Generating Better Errors to Improve Grammatical Error Detection. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4977–4983, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Wronging a Right: Generating Better Errors to Improve Grammatical Error Detection (Kasewa et al., EMNLP 2018)
PDF:
https://aclanthology.org/D18-1541.pdf
Code:
skasewa/wronging
Data:
FCE