Ted Briscoe

Also published as: E.J. Briscoe, Edward Briscoe

2023

Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task not only includes the correction of grammatical errors, such as missing prepositions and mismatched subject–verb agreement, but also orthographic and semantic errors, such as misspellings and word choice errors, respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems, which represent the current dominant state of the art. In this survey paper, we condense the field into a single article and first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarize the various methods and techniques that have been developed with a particular focus on artificial error generation. We next describe the many different approaches to evaluation as well as concerns surrounding metric reliability, especially in relation to subjective human judgments, before concluding with an overview of recent progress and suggestions for future work and remaining challenges. We hope that this survey will serve as a comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments.

2020

Analyzing Neural Discourse Coherence Models
Youmna Farag | Josef Valvoda | Helen Yannakoudakis | Ted Briscoe
Proceedings of the First Workshop on Computational Approaches to Discourse

In this work, we systematically investigate how well current models of coherence can capture aspects of text implicated in discourse organisation. We devise two datasets of various linguistic alterations that undermine coherence and test model sensitivity to changes in syntax and semantics. We furthermore probe discourse embedding space and examine the knowledge that is encoded in representations of coherence. We hope this study will provide further insight into how to frame the task and improve models of coherence assessment. Finally, we make our datasets publicly available as a resource for researchers to use to test discourse coherence models.

2019

Automatic learner summary assessment for reading comprehension
Menglin Xia | Ekaterina Kochmar | Ted Briscoe
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Automating the assessment of learner summaries provides a useful tool for assessing learner reading comprehension. We present a summarization task for evaluating non-native reading comprehension and propose three novel approaches to automatically assess the learner summaries. We evaluate our models on two datasets we created and show that our models outperform traditional approaches that rely on exact word match on this task. Our best model produces quality assessments close to professional examiners.

The BEA-2019 Shared Task on Grammatical Error Correction
Christopher Bryant | Mariano Felice | Øistein E. Andersen | Ted Briscoe
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

This paper reports on the BEA-2019 Shared Task on Grammatical Error Correction (GEC). As with the CoNLL-2014 shared task, participants are required to correct all types of errors in test data. One of the main contributions of the BEA-2019 shared task is the introduction of a new dataset, the Write&Improve+LOCNESS corpus, which represents a wider range of native and learner English levels and abilities. Another contribution is the introduction of tracks, which control the amount of annotated data available to participants. Systems are evaluated in terms of ERRANT F_0.5, which allows us to report a much wider range of performance statistics. The competition was hosted on Codalab and remains open for further submissions on the blind test set.

Active Learning for Financial Investment Reports
Sian Gooding | Ted Briscoe
Proceedings of the Second Financial Narrative Processing Workshop (FNP 2019)

2018

Language Model Based Grammatical Error Correction without Annotated Training Data
Christopher Bryant | Ted Briscoe
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

Since the end of the CoNLL-2014 shared task on grammatical error correction (GEC), research into language model (LM) based approaches to GEC has largely stagnated. In this paper, we re-examine LMs in GEC and show that it is entirely possible to build a simple system that not only requires minimal annotated data (∼1000 sentences), but is also fairly competitive with several state-of-the-art systems. This approach should be of particular interest for languages where very little annotated training data exists, although we also hope to use it as a baseline to motivate future research.

The Effect of Adding Authorship Knowledge in Automated Text Scoring
Meng Zhang | Xie Chen | Ronan Cummins | Øistein E. Andersen | Ted Briscoe
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

Some language exams have multiple writing tasks. When a learner writes multiple texts in a language exam, it is not surprising that the quality of these texts tends to be similar, and the existing automated text scoring (ATS) systems do not explicitly model this similarity. In this paper, we suggest that it could be useful to include the other texts written by this learner in the same exam as extra references in an ATS system. We propose various approaches of fusing information from multiple tasks and pass this authorship knowledge into our ATS model on six different datasets. We show that this can positively affect the model performance at a global level.

Neural Automated Essay Scoring and Coherence Modeling for Adversarially Crafted Input
Youmna Farag | Helen Yannakoudakis | Ted Briscoe
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We demonstrate that current state-of-the-art approaches to Automated Essay Scoring (AES) are not well-suited to capturing adversarially crafted input of grammatical but incoherent sequences of sentences. We develop a neural model of local coherence that can effectively learn connectedness features between sentences, and propose a framework for integrating and jointly training the local coherence model with a state-of-the-art AES model. We evaluate our approach against a number of baselines and experimentally demonstrate its effectiveness on both the AES task and the task of flagging adversarial input, further contributing to the development of an approach that strengthens the validity of neural essay scoring models.

2017

An Error-Oriented Approach to Word Embedding Pre-Training
Youmna Farag | Marek Rei | Ted Briscoe
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

We propose a novel word embedding pre-training approach that exploits writing errors in learners’ scripts. We compare our method to previous models that tune the embeddings based on script scores and the discrimination between correct and corrupt word contexts, in addition to the generic commonly-used embeddings pre-trained on large corpora. The comparison is achieved by using the aforementioned models to bootstrap a neural network that learns to predict a holistic score for scripts. Furthermore, we investigate augmenting our model with error corrections and monitor the impact on performance. Our results show that our error-oriented approach outperforms other comparable ones, which is further demonstrated when training on more data. Additionally, extending the model with corrections provides further performance gains when data sparsity is an issue.

Artificial Error Generation with Machine Translation and Syntactic Patterns
Marek Rei | Mariano Felice | Zheng Yuan | Ted Briscoe
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

Shortage of available training data is holding back progress in the area of automated error detection. This paper investigates two alternative methods for artificially generating writing errors, in order to create additional resources. We propose treating error generation as a machine translation task, where grammatically correct text is translated to contain errors. In addition, we explore a system for extracting textual patterns from an annotated corpus, which can then be used to insert errors into grammatically correct sentences. Our experiments show that the inclusion of artificially generated errors significantly improves error detection accuracy on both FCE and CoNLL 2014 datasets.

Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction
Christopher Bryant | Mariano Felice | Ted Briscoe
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Until now, error type performance for Grammatical Error Correction (GEC) systems could only be measured in terms of recall because system output is not annotated. To overcome this problem, we introduce ERRANT, a grammatical ERRor ANnotation Toolkit designed to automatically extract edits from parallel original and corrected sentences and classify them according to a new, dataset-agnostic, rule-based framework. This not only facilitates error type evaluation at different levels of granularity, but can also be used to reduce annotator workload and standardise existing GEC datasets. Human experts rated the automatic edits as “Good” or “Acceptable” in at least 95% of cases, so we applied ERRANT to the system output of the CoNLL-2014 shared task to carry out a detailed error type analysis for the first time.

2016

Grammatical error correction using neural machine translation
Zheng Yuan | Ted Briscoe
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Text Readability Assessment for Second Language Learners
Menglin Xia | Ekaterina Kochmar | Ted Briscoe
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

Unsupervised Modeling of Topical Relevance in L2 Learner Text
Ronan Cummins | Helen Yannakoudakis | Ted Briscoe
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

Candidate re-ranking for SMT-based grammatical error correction
Zheng Yuan | Ted Briscoe | Mariano Felice
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

Constrained Multi-Task Learning for Automated Essay Scoring
Ronan Cummins | Meng Zhang | Ted Briscoe
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Automatic Extraction of Learner Errors in ESL Sentences Using Linguistically Enhanced Alignments
Mariano Felice | Christopher Bryant | Ted Briscoe
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We propose a new method of automatically extracting learner errors from parallel English as a Second Language (ESL) sentences in an effort to regularise annotation formats and reduce inconsistencies. Specifically, given an original and corrected sentence, our method first uses a linguistically enhanced alignment algorithm to determine the most likely mappings between tokens, and secondly employs a rule-based function to decide which alignments should be merged. Our method beats all previous approaches on the tested datasets, achieving state-of-the-art results for automatic error extraction.

2008

We have integrated the RASP system with the UIMA framework (RASP4UIMA) and used this to parse the XML-encoded version of the British National Corpus (BNC). All original annotation is preserved, and parsing information, mainly in the form of grammatical relations, is added in an XML format. A few specific adaptations of the system to give better results with the BNC are discussed briefly. The RASP4UIMA system is publicly available and can be used to parse other corpora or document collections, and the final parsed version of the BNC will be deposited with the Oxford Text Archive.

Statistical Anaphora Resolution in Biomedical Texts
Caroline Gasperin | Ted Briscoe
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

A System for Large-Scale Acquisition of Verbal, Nominal and Adjectival Subcategorization Frames from Corpora
Judita Preiss | Ted Briscoe | Anna Korhonen
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

Weakly Supervised Learning for Hedge Classification in Scientific Literature
Ben Medlock | Ted Briscoe
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

Semi-supervised Training of a Statistical Parser from Unlabeled Partially-bracketed Data
Rebecca Watson | Ted Briscoe | John Carroll
Proceedings of the Tenth International Conference on Parsing Technologies

Adapting the RASP System for the CoNLL07 Domain-Adaptation Task
Rebecca Watson | Ted Briscoe
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

Evaluating the Accuracy of an Unlexicalized Statistical Parser on the PARC DepBank
Ted Briscoe | John Carroll
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

The Second Release of the RASP System
Ted Briscoe | John Carroll | Rebecca Watson
Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions

A Large Subcategorization Lexicon for Natural Language Processing Applications
Anna Korhonen | Yuval Krymolowski | Ted Briscoe
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We introduce a large computational subcategorization lexicon which includes subcategorization frame (SCF) and frequency information for 6,397 English verbs. This extensive lexicon was acquired automatically from five corpora and the Web using the current version of the comprehensive subcategorization acquisition system of Briscoe and Carroll (1997). The lexicon is provided freely for research use, along with a script which can be used to filter and build sub-lexicons suited for different natural language processing (NLP) purposes. Documentation is also provided which explains each sub-lexicon option and evaluates its accuracy.

2005

Efficient Extraction of Grammatical Relations
Rebecca Watson | John Carroll | Ted Briscoe
Proceedings of the Ninth International Workshop on Parsing Technology

Automatic Acquisition of Adjectival Subcategorization from Corpora
Jeremy Yallop | Anna Korhonen | Ted Briscoe
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

Extended Lexical-Semantic Classification of English Verbs
Anna Korhonen | Ted Briscoe
Proceedings of the Computational Lexical Semantics Workshop at HLT-NAACL 2004

Can Anaphoric Definite Descriptions be Replaced by Pronouns?
Judita Preiss | Caroline Gasperin | Ted Briscoe
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

Intermediate Parsing for Anaphora Resolution? Implementing the Lappin and Leass non-coreference filters
Judita Preiss | Ted Briscoe
Proceedings of the 2003 EACL Workshop on The Computational Treatment of Anaphora

2002

Subcategorization Acquisition as an Evaluation Method for WSD
Judita Preiss | Anna Korhonen | Ted Briscoe
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Robust Accurate Statistical Annotation of General Text
Ted Briscoe | John Carroll
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

High Precision Extraction of Grammatical Relations
John Carroll | Ted Briscoe
COLING 2002: The 19th International Conference on Computational Linguistics

2001

High Precision Extraction of Grammatical Relations
John Carroll | Ted Briscoe
Proceedings of the Seventh International Workshop on Parsing Technologies

1999

Lexical rules in constraint based grammars
Ted Briscoe | Ann Copestake
Computational Linguistics, Volume 25, Number 4, December 1999

1998

Can Subcategorisation Probabilities Help a Statistical Parser?
John Carroll | Guido Minnen | Ted Briscoe
Sixth Workshop on Very Large Corpora

1997

Co-Evolution of Language and of the Language Acquisition Device
Ted Briscoe
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

Automatic Extraction of Subcategorization from Corpora
Ted Briscoe | John Carroll
Fifth Conference on Applied Natural Language Processing

Learning Stochastic Categorial Grammars
Miles Osborne | Ted Briscoe
CoNLL97: Computational Natural Language Learning

1996

Apportioning Development Effort in a Probabilistic LR Parsing System Through Evaluation
John Carroll | Ted Briscoe
Conference on Empirical Methods in Natural Language Processing

Controlling the Application of Lexical Rules
Ted Briscoe | Ann Copestake
Breadth and Depth of Semantic Lexicons

1995

Developing and Evaluating a Probabilistic LR Parser of Part-of-Speech and Punctuation Labels
Ted Briscoe | John Carroll
Proceedings of the Fourth International Workshop on Parsing Technologies

We describe an approach to robust domain-independent syntactic parsing of unrestricted naturally-occurring (English) input. The technique involves parsing sequences of part-of-speech and punctuation labels using a unification-based grammar coupled with a probabilistic LR parser. We describe the coverage of several corpora using this grammar and report the results of a parsing experiment using probabilities derived from bracketed training data. We report the first substantial experiments to assess the contribution of punctuation to deriving an accurate syntactic analysis, by parsing identical texts both with and without naturally-occurring punctuation marks.