Héctor Martínez Alonso

Also published as: Héctor Martínez, Hector Martínez Alonso, Hector Martinez, Hector Martinez Alonso, Héctor Martinez Alonso, Héctor Martínez Alonso


Automatic Annotation of Semantic Term Types in the Complete ACL Anthology Reference Corpus
Anne-Kathrin Schumann | Héctor Martínez Alonso

Cheating a Parser to Death: Data-driven Cross-Treebank Annotation Transfer
Djamé Seddah | Eric de la Clergerie | Benoît Sagot | Héctor Martínez Alonso | Marie Candito

Grotoco@SLAM: Second Language Acquisition Modeling with Simple Features, Learners and Task-wise Models
Sigrid Klerke | Héctor Martínez Alonso | Barbara Plank

We present our submission to the 2018 Duolingo Shared Task on Second Language Acquisition Modeling (SLAM). We focus on evaluating a range of features for the task, including user-derived measures, while examining how far we can get with a simple linear classifier. Our analysis reveals that errors differ per exercise format, which motivates our final and best-performing system: a task-wise (per exercise-format) model.


CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Daniel Zeman | Martin Popel | Milan Straka | Jan Hajič | Joakim Nivre | Filip Ginter | Juhani Luotolahti | Sampo Pyysalo | Slav Petrov | Martin Potthast | Francis Tyers | Elena Badmaeva | Memduh Gokirmak | Anna Nedoluzhko | Silvie Cinková | Jan Hajič jr. | Jaroslava Hlaváčová | Václava Kettnerová | Zdeňka Urešová | Jenna Kanerva | Stina Ojala | Anna Missilä | Christopher D. Manning | Sebastian Schuster | Siva Reddy | Dima Taji | Nizar Habash | Herman Leung | Marie-Catherine de Marneffe | Manuela Sanguinetti | Maria Simi | Hiroshi Kanayama | Valeria de Paiva | Kira Droganova | Héctor Martínez Alonso | Çağrı Çöltekin | Umut Sulubacak | Hans Uszkoreit | Vivien Macketanz | Aljoscha Burchardt | Kim Harris | Katrin Marheinecke | Georg Rehm | Tolga Kayadelen | Mohammed Attia | Ali Elkahky | Zhuoran Yu | Emily Pitler | Saran Lertpradit | Michael Mandl | Jesse Kirchner | Hector Fernandez Alcalde | Jana Strnadová | Esha Banerjee | Ruli Manurung | Antonio Stella | Atsuko Shimada | Sookyoung Kwak | Gustavo Mendonça | Tatiana Lando | Rattima Nitisaroj | Josie Li

The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, the task was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe how the data sets were prepared, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.

Annotating omission in statement pairs
Héctor Martínez Alonso | Amaury Delamaire | Benoît Sagot

We focus on the identification of omission in statement pairs. We compare three annotation schemes, namely two different crowdsourcing schemes and manual expert annotation. We show that the simpler of the two crowdsourcing approaches yields better annotation quality than the more complex one. We use a dedicated classifier to assess whether the annotators’ behavior can be explained by straightforward linguistic features. The classifier benefits from modeling that uses lexical information beyond length and overlap measures. However, for our task, we argue that expert and not crowdsourcing-based annotation is the best compromise between annotation cost and quality.

Benchmarking Joint Lexical and Syntactic Analysis on Multiword-Rich Data
Matthieu Constant | Héctor Martinez Alonso

This article evaluates the extension of a dependency parser that performs joint syntactic analysis and multiword expression identification. We show that, given sufficient training data, the parser benefits from explicit multiword information and improves the overall labeled accuracy score in eight of the ten evaluation cases.

Improving neural tagging with lexical information
Benoît Sagot | Héctor Martínez Alonso

Neural part-of-speech tagging has achieved competitive results with the incorporation of character-based and pre-trained word embeddings. In this paper, we show that a state-of-the-art bi-LSTM tagger can benefit from using information from morphosyntactic lexicons as additional input. The tagger, trained on several dozen languages, shows a consistent, average improvement when using lexical information, even when also using character-based embeddings, thus showing the complementarity of the different sources of lexical information. The improvements are particularly important for the smaller datasets.

When is multitask learning effective? Semantic sequence prediction under varying data conditions
Héctor Martínez Alonso | Barbara Plank

Multitask learning (MTL) has been applied successfully to a range of tasks, mostly morphosyntactic. However, little is known about when MTL works and whether there are data characteristics that help to determine the success of MTL. In this paper we evaluate a range of semantic sequence labeling tasks in an MTL setup. We examine different auxiliary task configurations, including a novel setup, and correlate their impact with data-dependent conditions. Our results show that MTL is not always effective, because significant improvements are obtained only for 1 out of 5 tasks. When MTL is successful, auxiliary tasks with compact and more uniform label distributions are preferable.

Parsing Universal Dependencies without training
Héctor Martínez Alonso | Željko Agić | Barbara Plank | Anders Søgaard

We present UDP, the first training-free parser for Universal Dependencies (UD). Our algorithm is based on PageRank and a small set of specific dependency head rules. UDP features two-step decoding to guarantee that function words are attached as leaf nodes. The parser requires no training, and it is competitive with a delexicalized transfer system. UDP offers a linguistically sound unsupervised alternative to cross-lingual parsing for UD. The parser has very few parameters and is distinctly robust to domain change across languages.


Supersense tagging with inter-annotator disagreement
Héctor Martínez Alonso | Anders Johannsen | Barbara Plank

Learning Paraphrasing for Multiword Expressions
Seid Muhie Yimam | Héctor Martínez Alonso | Martin Riedl | Chris Biemann

From Noisy Questions to Minecraft Texts: Annotation Challenges in Extreme Syntax Scenario
Héctor Martínez Alonso | Djamé Seddah | Benoît Sagot

User-generated content presents many challenges for automatic processing. While many of them come from out-of-vocabulary effects, others stem from different linguistic phenomena such as unusual syntax. In this work we present a French three-domain data set made up of question headlines from a cooking forum, game chat logs and associated forums from two popular online games (MINECRAFT & LEAGUE OF LEGENDS). We chose these domains because they encompass different degrees of lexical and syntactic compliance with canonical language. We conduct an automatic and manual evaluation of the difficulties of processing these domains for part-of-speech prediction, and introduce a pilot study to determine whether dependency analysis lends itself well to annotating these data. We also discuss the development cost of our data set.

The SemDaX Corpus ― Sense Annotations with Scalable Sense Inventories
Bolette Pedersen | Anna Braasch | Anders Johannsen | Héctor Martínez Alonso | Sanni Nimb | Sussi Olsen | Anders Søgaard | Nicolai Hartvig Sørensen

We launch the SemDaX corpus, a recently completed Danish human-annotated corpus available through a CLARIN academic license. The corpus includes approx. 90,000 words, comprises six textual domains, and is annotated with sense inventories of different granularity. The aim of the developed corpus is twofold: i) to assess the reliability of the different sense annotation schemes for Danish measured by qualitative analyses and annotation agreement scores, and ii) to serve as training and test data for machine learning algorithms with the practical purpose of developing sense taggers for Danish. To these ends, we take a new approach to human-annotated corpus resources by double annotating a much larger part of the corpus than is normally seen: for the all-words task we double annotated 60% of the material and for the lexical sample task 100%. We include in the corpus not only the adjudicated files, but also the diverging annotations. In other words, we do not consider all disagreement to be noise, but rather to contain valuable linguistic information that can help us improve our annotation schemes and our learning algorithms.

Multilingual Projection for Parsing Truly Low-Resource Languages
Željko Agić | Anders Johannsen | Barbara Plank | Héctor Martínez Alonso | Natalie Schluter | Anders Søgaard

We propose a novel approach to cross-lingual part-of-speech tagging and dependency parsing for truly low-resource languages. Our annotation projection-based approach yields tagging and parsing models for over 100 languages. All that is needed are freely available parallel texts, and taggers and parsers for resource-rich languages. The empirical evaluation across 30 test languages shows that our method consistently provides top-level accuracies, close to established upper bounds, and outperforms several competitive baselines.

CoastalCPH at SemEval-2016 Task 11: The importance of designing your Neural Networks right
Joachim Bingel | Natalie Schluter | Héctor Martínez Alonso

MSejrKu at SemEval-2016 Task 14: Taxonomy Enrichment by Evidence Ranking
Michael Schlichtkrull | Héctor Martínez Alonso


Inverted indexing for cross-lingual NLP
Anders Søgaard | Željko Agić | Héctor Martínez Alonso | Barbara Plank | Bernd Bohnet | Anders Johannsen

Do dependency parsing metrics correlate with human judgments?
Barbara Plank | Héctor Martínez Alonso | Željko Agić | Danijela Merkler | Anders Søgaard

Mining for unambiguous instances to adapt part-of-speech taggers to new domains
Dirk Hovy | Barbara Plank | Héctor Martínez Alonso | Anders Søgaard

Learning to parse with IAA-weighted loss
Héctor Martínez Alonso | Barbara Plank | Arne Skjærholt | Anders Søgaard

Non-canonical language is not harder to annotate than canonical language
Barbara Plank | Héctor Martínez Alonso | Anders Søgaard

Supersense tagging for Danish
Héctor Martínez Alonso | Anders Johannsen | Sussi Olsen | Sanni Nimb | Nicolai Hartvig Sørensen | Anna Braasch | Anders Søgaard | Bolette Sandford Pedersen

Looking hard: Eye tracking for detecting grammaticality of automatically compressed sentences
Sigrid Klerke | Héctor Martínez Alonso | Anders Søgaard

Active learning for sense annotation
Héctor Martínez Alonso | Barbara Plank | Anders Johannsen | Anders Søgaard

Coarse-grained sense annotation of Danish across textual domains
Sussi Olsen | Bolette S. Pedersen | Héctor Martínez Alonso | Anders Johannsen

Predicting word sense annotation agreement
Héctor Martínez Alonso | Anders Johannsen | Oier Lopez de Lacalle | Eneko Agirre

CPH: Sentiment analysis of Figurative Language on Twitter #easypeasy #not
Sarah McGillion | Héctor Martínez Alonso | Barbara Plank

Any-language frame-semantic parsing
Anders Johannsen | Héctor Martínez Alonso | Anders Søgaard


Crowdsourcing as a preprocessing for complex semantic annotation tasks
Héctor Martínez Alonso | Lauren Romeo

What’s in a p-value in NLP?
Anders Søgaard | Anders Johannsen | Barbara Plank | Dirk Hovy | Hector Martínez Alonso

More or less supervised supersense tagging of Twitter
Anders Johannsen | Dirk Hovy | Héctor Martínez Alonso | Barbara Plank | Anders Søgaard

Copenhagen-Malmö: Tree Approximations of Semantic Parsing Problems
Natalie Schluter | Anders Søgaard | Jakob Elming | Dirk Hovy | Barbara Plank | Héctor Martínez Alonso | Anders Johanssen | Sigrid Klerke


Annotation of regular polysemy and underspecification
Héctor Martínez Alonso | Bolette Sandford Pedersen | Núria Bel

Finding Dependency Parsing Limits over a Large Spanish Corpus
Muntsa Padró | Miguel Ballesteros | Héctor Martínez | Bernd Bohnet

Down-stream effects of tree-to-dependency conversions
Jakob Elming | Anders Johannsen | Sigrid Klerke | Emanuele Lapponi | Hector Martinez Alonso | Anders Søgaard

Class-based Word Sense Induction for dot-type nominals
Lauren Romeo | Héctor Martínez Alonso | Núria Bel

Using Crowdsourcing to get Representations based on Regular Expressions
Anders Søgaard | Hector Martinez | Jakob Elming | Anders Johannsen


EMNLP@CPH: Is frequency all there is to simplicity?
Anders Johannsen | Héctor Martínez | Sigrid Klerke | Anders Søgaard

A voting scheme to detect semantic underspecification
Héctor Martínez Alonso | Núria Bel | Bolette Sandford Pedersen


Shared Task System Description: Frustratingly Hard Compositionality Prediction
Anders Johannsen | Hector Martinez | Christian Rishøj | Anders Søgaard

Identification of sense selection in regular polysemy using shallow features
Héctor Martínez Alonso | Núria Bel | Bolette Sandford Pedersen