Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems

Emily Bender, Hal Daumé III, Allyson Ettinger, Sudha Rao (Editors)


Anthology ID: W17-54
Month: September
Year: 2017
Address: Copenhagen, Denmark
Venue: WS
Publisher: Association for Computational Linguistics
URL: https://aclanthology.org/W17-54
DOI: 10.18653/v1/W17-54
PDF: https://aclanthology.org/W17-54.pdf

Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems
Emily Bender | Hal Daumé III | Allyson Ettinger | Sudha Rao

Towards Linguistically Generalizable NLP Systems: A Workshop and Shared Task
Allyson Ettinger | Sudha Rao | Hal Daumé III | Emily M. Bender

This paper presents a summary of the first Workshop on Building Linguistically Generalizable Natural Language Processing Systems and the associated shared task, Build It, Break It: The Language Edition. The goal of this workshop was to bring together researchers in NLP and linguistics around a carefully designed shared task aimed at testing the generalizability of NLP systems beyond the distributions of their training data. We describe the motivation, setup, and participation of the shared task, discuss selected results, and reflect on lessons learned.

Analysing Errors of Open Information Extraction Systems
Rudolf Schneider | Tom Oberhauser | Tobias Klatt | Felix A. Gers | Alexander Löser

We report results on benchmarking Open Information Extraction (OIE) systems using RelVis, a toolkit for benchmarking OIE systems. Our comprehensive benchmark contains three data sets from the news domain and one data set from Wikipedia, with a total of 4522 labeled sentences and 11243 binary or n-ary OIE relations. On these data sets, we compared the performance of four popular OIE systems: ClausIE, OpenIE 4.2, Stanford OpenIE, and PredPatt. In addition, we evaluated the impact of five common error classes on a subset of 749 n-ary tuples. From our in-depth analysis, we reveal important research directions for the next generation of OIE systems.

Massively Multilingual Neural Grapheme-to-Phoneme Conversion
Ben Peters | Jon Dehdari | Josef van Genabith

Grapheme-to-phoneme conversion (g2p) is necessary for text-to-speech and automatic speech recognition systems. Most g2p systems are monolingual: they require language-specific data or handcrafted rules. Such systems are difficult to extend to low-resource languages, for which data and handcrafted rules are not available. As an alternative, we present a neural sequence-to-sequence approach to g2p that is trained on spelling–pronunciation pairs in hundreds of languages. The system shares a single encoder and decoder across all languages, allowing it to exploit the intrinsic similarities between different writing systems. We show an 11% improvement in phoneme error rate over an approach based on adapting high-resource monolingual g2p models to low-resource languages. Our model is also much more compact than previous approaches.

BIBI System Description: Building with CNNs and Breaking with Deep Reinforcement Learning
Yitong Li | Trevor Cohn | Timothy Baldwin

This paper describes our submission to the sentiment analysis sub-task of “Build It, Break It: The Language Edition (BIBI)”, on both the builder and breaker sides. As a builder, we use convolutional neural nets trained on both phrase and sentence data. As a breaker, we use Q-learning to learn minimal-change pairs and apply a token substitution method automatically. We analyse the results to gauge the robustness of NLP systems.

Breaking NLP: Using Morphosyntax, Semantics, Pragmatics and World Knowledge to Fool Sentiment Analysis Systems
Taylor Mahler | Willy Cheung | Micha Elsner | David King | Marie-Catherine de Marneffe | Cory Shain | Symon Stevens-Guille | Michael White

This paper describes our “breaker” submission to the 2017 EMNLP “Build It Break It” shared task on sentiment analysis. In order to cause the “builder” systems to make incorrect predictions, we edited items in the blind test data according to linguistically interpretable strategies that allow us to assess the ease with which the builder systems learn various components of linguistic structure. On the whole, our submitted pairs break all systems at a high rate (72.6%), indicating that sentiment analysis as an NLP task may still have a lot of ground to cover. Of the breaker strategies that we consider, we find our semantic and pragmatic manipulations to pose the most substantial difficulties for the builder systems.

An Adaptable Lexical Simplification Architecture for Major Ibero-Romance Languages
Daniel Ferrés | Horacio Saggion | Xavier Gómez Guinovart

Lexical Simplification is the task of reducing the lexical complexity of textual documents by replacing difficult words with expressions that are easier to read (or understand) while preserving the original meaning. The development of robust pipelined multilingual architectures able to adapt to new languages is of paramount importance in lexical simplification. This paper describes and evaluates a modular hybrid linguistic-statistical Lexical Simplifier that deals with the four major Ibero-Romance languages: Spanish, Portuguese, Catalan, and Galician. The architecture of the system is the same for all four languages; only the language resources used during simplification are language-specific.

Cross-genre Document Retrieval: Matching between Conversational and Formal Writings
Tomasz Jurczyk | Jinho D. Choi

This paper addresses a cross-genre document retrieval task, where the queries are in formal writing and the target documents are in conversational writing. In this task, a query is a sentence extracted from either a summary or a plot of an episode of a TV show, and the target document consists of transcripts from the corresponding episode. To establish a strong baseline, we employ a state-of-the-art search engine to perform document retrieval on the dataset collected for this work. We then introduce a structure reranking approach that improves the initial ranking by utilizing syntactic and semantic structures generated by NLP tools. Our evaluation shows an improvement of more than 4% when structure reranking is applied, which is very promising.

ACTSA: Annotated Corpus for Telugu Sentiment Analysis
Sandeep Sricharan Mukku | Radhika Mamidi

Sentiment analysis deals with the task of determining the polarity of a document or sentence and has received a lot of attention in recent years for English. With the rapid growth of social media, a lot of data is available in regional languages besides English. Telugu is one such regional language with abundant data available on social media, but labelled sentence-level data for Telugu sentiment analysis is hard to find. In this paper, we describe an effort to build a gold-standard annotated corpus of Telugu sentences to support Telugu sentiment analysis. The corpus, named ACTSA (Annotated Corpus for Telugu Sentiment Analysis), consists of Telugu sentences taken from different sources, which were pre-processed and then manually annotated by native Telugu speakers using our annotation guidelines. In total, we have annotated 5457 sentences, which makes our corpus the largest such resource currently available. The corpus and the annotation guidelines are publicly available.

Strawman: An Ensemble of Deep Bag-of-Ngrams for Sentiment Analysis
Kyunghyun Cho

This paper describes a builder entry, named “strawman”, to the sentence-level sentiment analysis task of the “Build It, Break It” shared task of the First Workshop on Building Linguistically Generalizable NLP Systems. The goal of a builder is to provide an automated sentiment analyzer that would serve as a target for breakers whose goal is to find pairs of minimally-differing sentences that break the analyzer.

Breaking Sentiment Analysis of Movie Reviews
Ieva Staliūnaitė | Ben Bonfil

This paper covers several strategies we used to ‘break’ the predictions of sentiment analysis systems participating in the BLGNLP2017 workshop. Specifically, we identify difficulties that participating systems have in understanding modals, subjective judgments, world-knowledge-based references, and certain differences in syntax and perspective.