Caroline Sporleder

2022

MONAPipe: Modes of Narration and Attribution Pipeline for German Computational Literary Studies and Language Analysis in spaCy
Tillmann Dönicke | Florian Barth | Hanna Varachkina | Caroline Sporleder
Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)

2021

Employing Wikipedia as a resource for Named Entity Recognition in Morphologically complex under-resourced languages
Aravind Krishnan | Stefan Ziehe | Franziska Pannach | Caroline Sporleder
Proceedings of the 14th Workshop on Building and Using Comparable Corpora (BUCC 2021)

We propose a novel approach for rapid prototyping of named entity recognisers through the development of semi-automatically annotated datasets. We demonstrate the proposed pipeline on two under-resourced agglutinating languages: the Dravidian language Malayalam and the Bantu language isiZulu. Our approach is weakly supervised and bootstraps training data from Wikipedia and Google Knowledge Graph. Moreover, our approach is relatively language independent and can consequently be ported quickly (and hence cost-effectively) from one language to another, requiring only minor language-specific tailoring.

2016

You Shall Know People by the Company They Keep: Person Name Disambiguation for Social Network Construction
Mariona Coll Ardanuy | Maarten van den Bos | Caroline Sporleder
Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

2015

Clustering of Novels Represented as Social Networks
Mariona Coll Ardanuy | Caroline Sporleder
Linguistic Issues in Language Technology, Volume 12, 2015 - Literature Lifts up Computational Linguistics

Within the field of literary analysis, there are few branches as confusing as that of genre theory. Literary criticism has so far failed to reach a consensus on what makes a genre a genre. In this paper, we examine the degree to which the character structure of a novel is indicative of the genre it belongs to. With the premise that novels are societies in miniature, we build static and dynamic social networks of characters as a strategy to represent the narrative structure of novels in a quantifiable manner. For each novel, we compute a vector of literarily motivated features extracted from its network representation. We perform clustering on the vectors and analyze the resulting clusters in terms of genre and authorship.

Proceedings of the Second Workshop on Extra-Propositional Aspects of Meaning in Computational Semantics (ExProM 2015)
Eduardo Blanco | Roser Morante | Caroline Sporleder
Proceedings of the Second Workshop on Extra-Propositional Aspects of Meaning in Computational Semantics (ExProM 2015)

This paper describes a methodology for testing and evaluating the performance of Machine Reading systems through Question Answering and Reading Comprehension Tests. The methodology is being used in QA4MRE (QA for Machine Reading Evaluation), one of the labs of CLEF. The task was to answer a series of multiple choice tests, each based on a single document. This allows complex questions to be asked but makes evaluation simple and completely automatic. The evaluation architecture is completely multilingual: test documents, questions, and their answers are identical in all the supported languages. Background text collections are comparable collections harvested from the web for a set of predefined topics. Each test received an evaluation score between 0 and 1 using c@1. This measure encourages systems to reduce the number of incorrect answers while maintaining the number of correct ones by leaving some questions unanswered. 12 groups participated in the task, submitting 62 runs in 3 different languages (German, English, and Romanian). All runs were monolingual; no team attempted a cross-language task. We report here the conclusions and lessons learned after the first campaign in 2011.

Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics
Roser Morante | Caroline Sporleder
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics

2011

Enhancing Active Learning for Semantic Role Labeling via Compressed Dependency Trees
Chenhua Chen | Alexis Palmer | Caroline Sporleder
Proceedings of 5th International Joint Conference on Natural Language Processing

In Search of Missing Arguments: A Linguistic Approach
Josef Ruppenhofer | Philip Gorinski | Caroline Sporleder
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

Robust Semantic Analysis for Unseen Data in FrameNet
Alexis Palmer | Afra Alishahi | Caroline Sporleder
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

Linguistic Cues for Distinguishing Literal and Non-Literal Usages
Linlin Li | Caroline Sporleder
Coling 2010: Posters

Evaluating FrameNet-style semantic parsing: the role of coverage gaps in FrameNet
Alexis Palmer | Caroline Sporleder
Coling 2010: Posters

SemEval-2010 Task 10: Linking Events and Their Participants in Discourse
Josef Ruppenhofer | Caroline Sporleder | Roser Morante | Collin Baker | Martha Palmer
Proceedings of the 5th International Workshop on Semantic Evaluation

Using Gaussian Mixture Models to Detect Figurative Language in Context
Linlin Li | Caroline Sporleder
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Proceedings of the Workshop on Negation and Speculation in Natural Language Processing
Roser Morante | Caroline Sporleder
Proceedings of the Workshop on Negation and Speculation in Natural Language Processing

Speaker Attribution in Cabinet Protocols
Josef Ruppenhofer | Caroline Sporleder | Fabian Shirokov
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Historical cabinet protocols are a useful resource which enables historians to identify the opinions expressed by politicians on different subjects and at different points in time. While cabinet protocols are often available in digitized form, so far the only method to access their information content is by keyword-based search, which often returns sub-optimal results. We present a method for enriching German cabinet protocols with information about the originators of statements. This requires automatic speaker attribution. Unlike many other approaches, our method can also deal with cases in which the speaker is not explicitly identified in the sentence itself. Such cases are very common in our domain. To avoid costly manual annotation of training data, we design a rule-based system which exploits morpho-syntactic cues. We show that such a system obtains good results, especially with respect to recall, which is particularly important for information access.

Idioms in Context: The IDIX Corpus
Caroline Sporleder | Linlin Li | Philip Gorinski | Xaver Koch
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Idioms and other figuratively used expressions pose considerable problems to natural language processing applications because they are very frequent and often behave idiosyncratically. Consequently, there has been much research on the automatic detection and extraction of idiomatic expressions. Most studies focus on type-based idiom detection, i.e., distinguishing whether a given expression can (potentially) be used idiomatically. However, many expressions such as "break the ice" can have both literal and non-literal readings and need to be disambiguated in a given context (token-based detection). So far relatively few approaches have attempted context-based idiom detection. One reason for this may be that few annotated resources are available that disambiguate expressions in context. With the IDIX corpus, we aim to address this. IDIX is available as an add-on to the BNC and disambiguates different usages of a subset of idioms. We believe that this resource will be useful both for linguistic and computational linguistic studies.

Constructing a Textual Semantic Relation Corpus Using a Discourse Treebank
Rui Wang | Caroline Sporleder
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we present our work on constructing a textual semantic relation corpus by making use of an existing treebank annotated with discourse relations. We extract adjacent text span pairs and group them into six categories according to the different discourse relations between them. After that, we present the details of our annotation scheme, which includes six textual semantic relations: 'backward entailment', 'forward entailment', 'equality', 'contradiction', 'overlapping', and 'independent'. We also discuss some ambiguous examples to show the difficulty of such an annotation task, which cannot be easily done by an automatic mapping between discourse relations and semantic relations. We have two annotators, and each of them performs the task twice. The basic statistics on the constructed corpus look promising: we achieve 81.17% agreement on the six-way semantic relation annotation with a kappa score of .718, and agreement increases to 91.21% (kappa .775) if we collapse the last two labels.

Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection
Linlin Li | Benjamin Roth | Caroline Sporleder
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2009

Classifier Combination for Contextual Idiom Detection Without Labelled Data
Linlin Li | Caroline Sporleder
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Assessing the benefits of partial automatic pre-labeling for frame-semantic annotation
Ines Rehbein | Josef Ruppenhofer | Caroline Sporleder
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

A Cohesion Graph Based Approach for Unsupervised Recognition of Literal and Non-literal Use of Multiword Expressions
Linlin Li | Caroline Sporleder
Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing (TextGraphs-4)

Semantic Argument Structure in DiscoursE: The SEASIDE project (project abstract)
Caroline Sporleder
Proceedings of the Eighth International Conference on Computational Semantics

Unsupervised Recognition of Literal and Non-Literal Use of Idiomatic Expressions
Caroline Sporleder | Linlin Li
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

2008

Semantic Role Assignment for Event Nominalisations by Leveraging Verbal Data
Sebastian Padó | Marco Pennacchiotti | Caroline Sporleder
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

ILK: Machine learning of semantic relations with shallow features and almost no data
Iris Hendrickx | Roser Morante | Caroline Sporleder | Antal van den Bosch
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007).
Caroline Sporleder | Antal van den Bosch | Claire Grover
Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007).

Bootstrapping Information Extraction from Field Books
Sander Canisius | Caroline Sporleder
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

Spotting the ‘Odd-one-out’: Data-Driven Error Detection and Correction in Textual Databases
Caroline Sporleder | Marieke van Erp | Tijn Porcelijn | Antal van den Bosch
Proceedings of the Workshop on Adaptive Text Extraction and Mining (ATEM 2006)

Identifying Named Entities in Text Databases from the Natural History Domain
Caroline Sporleder | Marieke van Erp | Tijn Porcelijn | Antal van den Bosch | Pim Arntzen
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper, we investigate whether it is possible to bootstrap a named entity tagger for textual databases by exploiting the database structure to automatically generate domain and database-specific gazetteer lists. We compare three tagging strategies: (i) using the extracted gazetteers in a look-up tagger, (ii) using the gazetteers to automatically extract training data to train a database-specific tagger, and (iii) using a generic named entity tagger. Our results suggest that automatically built gazetteers in combination with a look-up tagger lead to a relatively good performance and that generic taggers do not perform particularly well on this type of data.