Maarten de Rijke

Also published as: Maarten De Rijke


2023

pdf bib
Generalizing Few-Shot Named Entity Recognizers to Unseen Domains with Type-Related Features
Zihan Wang | Ziqi Zhao | Zhumin Chen | Pengjie Ren | Maarten de Rijke | Zhaochun Ren
Findings of the Association for Computational Linguistics: EMNLP 2023

Few-shot named entity recognition (NER) has shown remarkable progress in identifying entities in low-resource domains. However, few-shot NER methods still struggle with out-of-domain (OOD) examples due to their reliance on manual labeling for the target domain. To address this limitation, recent studies enable generalization to an unseen target domain with only a few labeled examples using data augmentation techniques. Two important challenges remain: First, augmentation is limited to the training data, resulting in minimal overlap between the generated data and OOD examples. Second, knowledge transfer is implicit and insufficient, severely hindering model generalizability and the integration of knowledge from the source domain. In this paper, we propose a framework, prompt learning with type-related features (PLTR), to address these challenges. To identify useful knowledge in the source domain and enhance knowledge transfer, PLTR automatically extracts entity type-related features (TRFs) based on mutual information criteria. To bridge the gap between training and OOD data, PLTR generates a unique prompt for each unseen example by selecting relevant TRFs. We show that PLTR achieves significant performance improvements on in-domain and cross-domain datasets. The use of PLTR facilitates model adaptation and increases representation similarities between the source and unseen domains.

pdf bib
From Relevance to Utility: Evidence Retrieval with Feedback for Fact Verification
Hengran Zhang | Ruqing Zhang | Jiafeng Guo | Maarten de Rijke | Yixing Fan | Xueqi Cheng
Findings of the Association for Computational Linguistics: EMNLP 2023

Retrieval-enhanced methods have become a primary approach in fact verification (FV); it requires reasoning over multiple retrieved pieces of evidence to verify the integrity of a claim. To retrieve evidence, existing work often employs off-the-shelf retrieval models whose design is based on the probability ranking principle. We argue that, rather than relevance, for FV we need to focus on the utility that a claim verifier derives from the retrieved evidence. We introduce the feedback-based evidence retriever (FER) that optimizes the evidence retrieval process by incorporating feedback from the claim verifier. As a feedback signal we use the divergence in utility between how effectively the verifier utilizes the retrieved evidence and the ground-truth evidence to produce the final claim label. Empirical studies demonstrate the superiority of FER over prevailing baselines.

pdf bib
MultiTabQA: Generating Tabular Answers for Multi-Table Question Answering
Vaishali Pal | Andrew Yates | Evangelos Kanoulas | Maarten de Rijke
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent advances in tabular question answering (QA) with large language models are constrained in their coverage and only answer questions over a single table. However, real-world queries are complex in nature, often over multiple tables in a relational database or web page. Single table questions do not involve common table operations such as set operations, Cartesian products (joins), or nested queries. Furthermore, multi-table operations often result in a tabular output, which necessitates table generation capabilities of tabular QA models. To fill this gap, we propose a new task of answering questions over multiple tables. Our model, MultiTabQA, not only answers questions over multiple tables, but also generalizes to generate tabular answers. To enable effective training, we build a pre-training dataset comprising of 132,645 SQL queries and tabular answers. Further, we evaluate the generated tables by introducing table-specific metrics of varying strictness assessing various levels of granularity of the table structure. MultiTabQA outperforms state-of-the-art single table QA models adapted to a multi-table QA setting by finetuning on three datasets: Spider, Atis and GeoQuery.

pdf bib
Answering Ambiguous Questions via Iterative Prompting
Weiwei Sun | Hengyi Cai | Hongshen Chen | Pengjie Ren | Zhumin Chen | Maarten de Rijke | Zhaochun Ren
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In open-domain question answering, due to the ambiguity of questions, multiple plausible answers may exist. To provide feasible answers to an ambiguous question,one approach is to directly predict all valid answers, but this can struggle with balancing relevance and diversity. An alternative is to gather candidate answers and aggregate them, but this method can be computationally costly and may neglect dependencies among answers. In this paper, we present AmbigPrompt to address the imperfections of existing approaches to answering ambiguous questions. Specifically, we integrate an answering model with a prompting model in an iterative manner. The prompting model adaptively tracks the reading process and progressively triggers the answering model to compose distinct and relevant answers. Additionally, we develop a task-specific post-pretraining approach for both the answering model and the prompting model, which greatly improves the performance of our framework. Empirical studies on two commonly-used open benchmarks show that AmbigPrompt achieves state-of-the-art or competitive results while using less memory and having a lower inference latency than competing approaches. Additionally, AmbigPrompt also performs well in low-resource settings.

2022

pdf bib
News Article Retrieval in Context for Event-centric Narrative Creation
Nikos Voskarides | Edgar Meij | Sabrina Sauer | Maarten de Rijke
Proceedings of the First Workshop on Intelligent and Interactive Writing Assistants (In2Writing 2022)

Writers such as journalists often use automatic tools to find relevant content to include in their narratives. In this paper, we focus on supporting writers in the news domain to develop event-centric narratives. Given an incomplete narrative that specifies a main event and a context, we aim to retrieve news articles that discuss relevant events that would enable the continuation of the narrative. We formally define this task and propose a retrieval dataset construction procedure that relies on existing news articles to simulate incomplete narratives and relevant articles. Experiments on two datasets derived from this procedure show that state-of-the-art lexical and semantic rankers are not sufficient for this task. We show that combining those with a ranker that ranks articles by reverse chronological order outperforms those rankers alone. We also perform an in-depth quantitative and qualitative analysis of the results that sheds light on the characteristics of this task.

2021

pdf bib
A Human-machine Collaborative Framework for Evaluating Malevolence in Dialogues
Yangjun Zhang | Pengjie Ren | Maarten de Rijke
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Conversational dialogue systems (CDSs) are hard to evaluate due to the complexity of natural language. Automatic evaluation of dialogues often shows insufficient correlation with human judgements. Human evaluation is reliable but labor-intensive. We introduce a human-machine collaborative framework, HMCEval, that can guarantee reliability of the evaluation outcomes with reduced human effort. HMCEval casts dialogue evaluation as a sample assignment problem, where we need to decide to assign a sample to a human or a machine for evaluation. HMCEval includes a model confidence estimation module to estimate the confidence of the predicted sample assignment, and a human effort estimation module to estimate the human effort should the sample be assigned to human evaluation, as well as a sample assignment execution module that finds the optimum assignment solution based on the estimated confidence and effort. We assess the performance of HMCEval on the task of evaluating malevolence in dialogues. The experimental results show that HMCEval achieves around 99% evaluation accuracy with half of the human effort spared, showing that HMCEval provides reliable evaluation outcomes while reducing human effort by a large amount.

pdf bib
Learning to Ask Conversational Questions by Optimizing Levenshtein Distance
Zhongkun Liu | Pengjie Ren | Zhumin Chen | Zhaochun Ren | Maarten de Rijke | Ming Zhou
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Conversational Question Simplification (CQS) aims to simplify self-contained questions into conversational ones by incorporating some conversational characteristics, e.g., anaphora and ellipsis. Existing maximum likelihood estimation based methods often get trapped in easily learned tokens as all tokens are treated equally during training. In this work, we introduce a Reinforcement Iterative Sequence Editing (RISE) framework that optimizes the minimum Levenshtein distance through explicit editing actions. RISE is able to pay attention to tokens that are related to conversational characteristics. To train RISE, we devise an Iterative Reinforce Training (IRT) algorithm with a Dynamic Programming based Sampling (DPS) process to improve exploration. Experimental results on two benchmark datasets show that RISE significantly outperforms state-of-the-art methods and generalizes well on unseen data.

2020

pdf bib
WN-Salience: A Corpus of News Articles with Entity Salience Annotations
Chuan Wu | Evangelos Kanoulas | Maarten de Rijke | Wei Lu
Proceedings of the Twelfth Language Resources and Evaluation Conference

Entities can be found in various text genres, ranging from tweets and web pages to user queries submitted to web search engines. Existing research either considers all entities in the text equally important, or heuristics are used to measure their salience. We believe that a key reason for the relatively limited work on entity salience is the lack of appropriate datasets. To support research on entity salience, we present a new dataset, the WikiNews Salience dataset (WN-Salience), which can be used to benchmark tasks such as entity salience detection and salient entity linking. WN-Salience is built on top of Wikinews, a Wikimedia project whose mission is to present reliable news articles. Entities in Wikinews articles are identified by the authors of the articles and are linked to Wikinews categories when they are salient or to Wikipedia pages otherwise. The dataset is built automatically, and consists of approximately 7,000 news articles, and 90,000 in-text entity annotations. We compare the WN-Salience dataset against existing datasets on the task and analyze their differences. Furthermore, we conduct experiments on entity salience detection; the results demonstrate that WN-Salience is a challenging testbed that is complementary to existing ones.

pdf bib
Guided Dialogue Policy Learning without Adversarial Learning in the Loop
Ziming Li | Sungjin Lee | Baolin Peng | Jinchao Li | Julia Kiseleva | Maarten de Rijke | Shahin Shayandeh | Jianfeng Gao
Findings of the Association for Computational Linguistics: EMNLP 2020

Reinforcement learning methods have emerged as a popular choice for training an efficient and effective dialogue policy. However, these methods suffer from sparse and unstable reward signals returned by a user simulator only when a dialogue finishes. Besides, the reward signal is manually designed by human experts, which requires domain knowledge. Recently, a number of adversarial learning methods have been proposed to learn the reward function together with the dialogue policy. However, to alternatively update the dialogue policy and the reward model on the fly, we are limited to policy-gradient-based algorithms, such as REINFORCE and PPO. Moreover, the alternating training of a dialogue agent and the reward model can easily get stuck in local optima or result in mode collapse. To overcome the listed issues, we propose to decompose the adversarial training into two steps. First, we train the discriminator with an auxiliary dialogue generator and then incorporate a derived reward model into a common reinforcement learning method to guide the dialogue policy learning. This approach is applicable to both on-policy and off-policy reinforcement learning methods. Based on our extensive experimentation, we can conclude the proposed method: (1) achieves a remarkable task success rate using both on-policy and off-policy reinforcement learning methods; and (2) has potential to transfer knowledge from existing domains to a new domain.

pdf bib
Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems
Ziming Li | Julia Kiseleva | Maarten de Rijke
Findings of the Association for Computational Linguistics: EMNLP 2020

Dialogue policy learning for task-oriented dialogue systems has enjoyed great progress recently mostly through employing reinforcement learning methods. However, these approaches have become very sophisticated. It is time to re-evaluate it. Are we really making progress developing dialogue agents only based on reinforcement learning? We demonstrate how (1) traditional supervised learning together with (2) a simulator-free adversarial learning method can be used to achieve performance comparable to state-of-the-art reinforcement learning-based methods. First, we introduce a simple dialogue action decoder to predict the appropriate actions. Then, the traditional multi-label classification solution for dialogue policy learning is extended by adding dense layers to improve the dialogue agent performance. Finally, we employ the Gumbel-Softmax estimator to alternatively train the dialogue agent and the dialogue reward model without using reinforcement learning. Based on our extensive experimentation, we can conclude the proposed methods can achieve more stable and higher performance with fewer efforts, such as the domain knowledge required to design a user simulator and the intractable parameter tuning in reinforcement learning. Our main goal is not to beat RL with supervised learning, but to demonstrate the value of rethinking the role of reinforcement learning and supervised learning in optimizing task-oriented dialogue systems.

2018

pdf bib
Why are Sequence-to-Sequence Models So Dull? Understanding the Low-Diversity Problem of Chatbots
Shaojie Jiang | Maarten de Rijke
Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI

Diversity is a long-studied topic in information retrieval that usually refers to the requirement that retrieved results should be non-repetitive and cover different aspects. In a conversational setting, an additional dimension of diversity matters: an engaging response generation system should be able to output responses that are diverse and interesting. Sequence-to-sequence (Seq2Seq) models have been shown to be very effective for response generation. However, dialogue responses generated by Seq2Seq models tend to have low diversity. In this paper, we review known sources and existing approaches to this low-diversity problem. We also identify a source of low diversity that has been little studied so far, namely model over-confidence. We sketch several directions for tackling model over-confidence and, hence, the low-diversity problem, including confidence penalties and label smoothing.

2016

pdf bib
Siamese CBOW: Optimizing Word Embeddings for Sentence Representations
Tom Kenter | Alexey Borisov | Maarten de Rijke
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

pdf bib
Learning to Explain Entity Relationships in Knowledge Graphs
Nikos Voskarides | Edgar Meij | Manos Tsagkias | Maarten de Rijke | Wouter Weerkamp
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

pdf bib
Prior-informed Distant Supervision for Temporal Evidence Classification
Ridho Reinanda | Maarten de Rijke
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2010

pdf bib
Mining User Experiences from Online Forums: An Exploration
Valentin Jijkoun | Wouter Weerkamp | Maarten de Rijke | Paul Ackermans | Gijs Geleijnse
Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media

pdf bib
Generating Focused Topic-Specific Sentiment Lexicons
Valentin Jijkoun | Maarten de Rijke | Wouter Weerkamp
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2009

pdf bib
A Generative Blog Post Retrieval Model that Uses Query Expansion based on External Collections
Wouter Weerkamp | Krisztian Balog | Maarten de Rijke
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

pdf bib
Credibility Improves Topical Blog Post Retrieval
Wouter Weerkamp | Maarten de Rijke
Proceedings of ACL-08: HLT

2007

pdf bib
Learning to Transform Linguistic Graphs
Valentin Jijkoun | Maarten de Rijke
Proceedings of the Second Workshop on TextGraphs: Graph-Based Algorithms for Natural Language Processing

pdf bib
UVA: Language Modeling Techniques for Web People Search
Krisztian Balog | Leif Azzopardi | Maarten de Rijke
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

pdf bib
A Cascaded Machine Learning Approach to Interpreting Temporal Expressions
David Ahn | Joris van Rantwijk | Maarten de Rijke
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

2006

pdf bib
Why Are They Excited? Identifying and Explaining Spikes in Blog Mood Levels
Krisztian Balog | Gilad Mishne | Maarten de Rijke
Demonstrations

pdf bib
The Multilingual Question Answering Track at CLEF
Bernardo Magnini | Danilo Giampiccolo | Lili Aunimo | Christelle Ayache | Petya Osenova | Anselmo Peñas | Maarten de Rijke | Bogdan Sacaleanu | Diana Santos | Richard Sutcliffe
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper presents an overview of the Multilingual Question Answering evaluation campaigns which have been organized at CLEF (Cross Language Evaluation Forum) since 2003. Over the years, the competition has registered a steady increment in the number of participants and languages involved. In fact, from the original eight groups which participated in 2003 QA track, the number of competitors in 2005 rose to twenty-four. Also, the performances of the systems have steadily improved, and the average of the best performances in the 2005 saw an increase of 10% with respect to the previous year.

pdf bib
Representing and Querying Multi-dimensional Markup for Question Answering
Wouter Alink | Valentin Jijkoun | David Ahn | Maarten de Rijke | Peter Boncz | Arjen de Vries
Proceedings of the 5th Workshop on NLP and XML (NLPXML-2006): Multi-Dimensional Markup in Natural Language Processing

pdf bib
Learning to Recognize Blogs: A Preliminary Exploration
Erik Elgersma | Maarten de Rijke
Proceedings of the Workshop on NEW TEXT Wikis and blogs and other dynamic text sources

pdf bib
Finding Similar Sentences across Multiple Languages in Wikipedia
Sisay Fissaha Adafre | Maarten de Rijke
Proceedings of the Workshop on NEW TEXT Wikis and blogs and other dynamic text sources

2005

pdf bib
Feature Engineering and Post-Processing for Temporal Expression Recognition Using Conditional Random Fields
Sisay Fissaha Adafre | Maarten de Rijke
Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing

2004

pdf bib
Comparing the Ambiguity Reduction Abilities of Probabilistic Context-Free Grammars
Gabriel Infante-Lopez | Maarten de Rijke
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Using WordNet to Measure Semantic Orientations of Adjectives
Jaap Kamps | Maarten Marx | Robert J. Mokken | Maarten de Rijke
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Enriching the Output of a Parser Using Memory-based Learning
Valentin Jijkoun | Maarten de Rijke
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

pdf bib
Alternative approaches for Generating Bodies of Grammar Rules
Gabriel Infante-Lopez | Maarten de Rijke
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

pdf bib
Information Extraction for Question Answering: Improving Recall Through Syntactic Patterns
Valentin Jijkoun | Jori Mur | Maarten de Rijke
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf bib
BioGrapher: Biography Questions as a Restricted Domain Question Answering Task
Oren Tsur | Maarten de Rijke | Khalil Sima’an
Proceedings of the Conference on Question Answering in Restricted Domains

pdf bib
The University of Amsterdam at Senseval-3: Semantic roles and Logic forms
David Ahn | Sisay Fissaha | Valentin Jijkoun | Maarten De Rijke
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text