Danushka Bollegala


2024

pdf bib
In-Contextual Gender Bias Suppression for Large Language Models
Daisuke Oba | Masahiro Kaneko | Danushka Bollegala
Findings of the Association for Computational Linguistics: EACL 2024

Despite their impressive performance in a wide range of NLP tasks, Large Language Models (LLMs) have been reported to encode worrying levels of gender bias. Prior work has proposed debiasing methods that require human-labelled examples, data augmentation and fine-tuning of LLMs, which are computationally costly. Moreover, one might not even have access to the model parameters needed for debiasing, as in the case of closed LLMs such as GPT-4. To address this challenge, we propose bias suppression, which prevents biased generations of LLMs by simply providing textual preambles constructed from manually designed templates and real-world statistics, without accessing model parameters. We show that, using the CrowsPairs dataset, our textual preambles covering counterfactual statements can suppress gender biases in English LLMs such as LLaMA2. Moreover, we find that gender-neutral descriptions of gender-biased objects can also suppress their gender biases. Finally, we show that bias suppression has only an acceptable adverse effect on downstream task performance, evaluated with HellaSwag and COPA.

2023

pdf bib
Together We Make Sense–Learning Meta-Sense Embeddings
Haochen Luo | Yi Zhou | Danushka Bollegala
Findings of the Association for Computational Linguistics: ACL 2023

Sense embedding learning methods learn multiple vectors for a given ambiguous word, corresponding to its different word senses. For this purpose, different methods have been proposed in prior work on sense embedding learning that use different sense inventories, sense-tagged corpora and learning methods. However, not all existing sense embeddings cover all senses of ambiguous words equally well due to the discrepancies in their training resources. To address this problem, we propose the first-ever meta-sense embedding method – Neighbour Preserving Meta-Sense Embeddings – which learns meta-sense embeddings by combining multiple independently trained source sense embeddings such that the sense neighbourhoods computed from the source embeddings are preserved in the meta-embedding space. Our proposed method can combine source sense embeddings that cover different sets of word senses. Experimental results on Word Sense Disambiguation (WSD) and Word-in-Context (WiC) tasks show that the proposed meta-sense embedding method consistently outperforms several competitive baselines. An anonymised version of the source code implementation for our proposed method has been submitted to the reviewing system. Both the source code and the learnt meta-sense embeddings will be publicly released upon paper acceptance.

pdf bib
Unsupervised Semantic Variation Prediction using the Distribution of Sibling Embeddings
Taichi Aida | Danushka Bollegala
Findings of the Association for Computational Linguistics: ACL 2023

Languages are dynamic entities, where the meanings associated with words constantly change with time. Detecting the semantic variation of words is an important task for various NLP applications that must make time-sensitive predictions. Existing work on semantic variation prediction has predominantly focused on comparing some form of an averaged contextualised representation of a target word computed from a given corpus. However, some of the previously associated meanings of a target word can become obsolete over time (e.g. the meaning of gay as happy), while novel usages of existing words are observed (e.g. the meaning of cell as a mobile phone). We argue that mean representations alone cannot accurately capture such semantic variations and propose a method that uses the entire cohort of the contextualised embeddings of the target word, which we refer to as the sibling distribution. Experimental results on the SemEval-2020 Task 1 benchmark dataset for semantic variation prediction show that our method outperforms prior work that considers only the mean embeddings, and is comparable to the current state-of-the-art. Moreover, a qualitative analysis shows that our method detects important semantic changes in words that are not captured by the existing methods.

pdf bib
Solving Cosine Similarity Underestimation between High Frequency Words by ℓ2 Norm Discounting
Saeth Wannasuphoprasit | Yi Zhou | Danushka Bollegala
Findings of the Association for Computational Linguistics: ACL 2023

Cosine similarity between two words, computed using their contextualised token embeddings obtained from masked language models (MLMs) such as BERT, has been shown to underestimate the actual similarity between those words. This similarity underestimation problem is particularly severe for highly frequent words. Although this problem has been noted in prior work, no solution has been proposed thus far. We observe that the ℓ2 norm of contextualised embeddings of a word correlates with its log-frequency in the pretraining corpus. Consequently, the larger ℓ2 norms associated with highly frequent words reduce the cosine similarity values measured between them, thus underestimating the similarity scores. To solve this issue, we propose a method to discount the ℓ2 norm of a contextualised word embedding by the frequency of that word in a corpus when measuring the cosine similarities between words. We show that the so-called stop words behave differently from the rest of the words, and require special consideration during their discounting process. Experimental results on a contextualised word similarity dataset show that our proposed discounting method accurately solves the similarity underestimation problem. An anonymized version of the source code of our proposed method is submitted to the reviewing system.
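
The core operation, replacing each word's ℓ2 norm with a frequency-dependent discounted norm inside the cosine computation, can be pictured with the sketch below. This is a minimal illustration rather than the paper's exact formulation: the discount function (1 + alpha * log(1 + frequency)) and the value of alpha are assumptions made for the example.

```python
import numpy as np

def discounted_cosine(u, v, freq_u, freq_v, alpha=0.05):
    """Cosine-style similarity in which each vector's l2 norm is discounted
    according to the word's corpus frequency. The monotone discount used
    here is an illustrative placeholder, not the published formula."""
    disc_u = 1.0 + alpha * np.log1p(freq_u)
    disc_v = 1.0 + alpha * np.log1p(freq_v)
    # Dividing the norms by larger discounts for more frequent words raises
    # the score, counteracting the reported similarity underestimation.
    denom = (np.linalg.norm(u) / disc_u) * (np.linalg.norm(v) / disc_v)
    return float((u @ v) / denom)

# Toy usage with random embeddings and hypothetical corpus frequencies.
rng = np.random.default_rng(0)
u, v = rng.normal(size=768), rng.normal(size=768)
print(discounted_cosine(u, v, freq_u=2_000_000, freq_v=1_500_000))
```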

pdf bib
Can Word Sense Distribution Detect Semantic Changes of Words?
Xiaohang Tang | Yi Zhou | Taichi Aida | Procheta Sen | Danushka Bollegala
Findings of the Association for Computational Linguistics: EMNLP 2023

Semantic Change Detection (SCD) of words is an important task for various NLP applications that must make time-sensitive predictions. Some words are used over time in novel ways to express new meanings, and these new meanings establish themselves as novel senses of existing words. On the other hand, Word Sense Disambiguation (WSD) methods associate ambiguous words with sense ids, depending on the context in which they occur. Given this relationship between WSD and SCD, we explore the possibility of predicting whether a target word has had its meaning changed between two corpora collected at different time steps, by comparing the distributions of senses of that word in each corpus. For this purpose, we use pretrained static sense embeddings to automatically annotate each occurrence of the target word in a corpus with a sense id. Next, we compute the distribution of sense ids of a target word in a given corpus. Finally, we use different divergence or distance measures to quantify the semantic change of the target word across the two given corpora. Our experimental results on the SemEval 2020 Task 1 dataset show that word sense distributions can be accurately used to predict semantic changes of words in English, German, Swedish and Latin.
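
The scoring pipeline described above can be summarised in a short sketch: collect the sense ids assigned to the target word in each corpus, normalise them into distributions, and compare the two distributions with a divergence measure. Jensen-Shannon divergence is used here as one possible measure, and the sense annotations are made-up placeholders.

```python
import numpy as np

def sense_distribution(sense_ids, num_senses):
    """Normalised histogram over the sense ids assigned to a target word's
    occurrences in one corpus (with add-one smoothing to avoid zeros)."""
    counts = np.ones(num_senses)
    for s in sense_ids:
        counts[s] += 1
    return counts / counts.sum()

def js_divergence(p, q):
    """Jensen-Shannon divergence, one of several measures that could be used
    to quantify the change between the two sense distributions."""
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log2(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical sense ids of "cell" in two time-separated corpora.
p = sense_distribution([0, 0, 1, 0, 2], num_senses=3)  # older corpus
q = sense_distribution([2, 2, 2, 1, 2], num_senses=3)  # newer corpus
print(f"semantic change score: {js_divergence(p, q):.3f}")
```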

pdf bib
Swap and Predict – Predicting the Semantic Changes in Words across Corpora by Context Swapping
Taichi Aida | Danushka Bollegala
Findings of the Association for Computational Linguistics: EMNLP 2023

Meanings of words change over time and across domains. Detecting the semantic changes of words is an important task for various NLP applications that must make time-sensitive predictions. We consider the problem of predicting whether a given target word, w, changes its meaning between two different text corpora, 𝒞1 and 𝒞2. For this purpose, we propose Swapping-based Semantic Change Detection (SSCD), an unsupervised method that randomly swaps contexts between 𝒞1 and 𝒞2 where w occurs. We then look at the distribution of contextualised word embeddings of w, obtained from a pretrained masked language model (MLM), representing the meaning of w in its occurrence contexts in 𝒞1 and 𝒞2. Intuitively, if the meaning of w does not change between 𝒞1 and 𝒞2, we would expect the distributions of contextualised word embeddings of w to remain the same before and after this random swapping process. Despite its simplicity, we demonstrate that even by using pretrained MLMs without any fine-tuning, our proposed context swapping method accurately predicts the semantic changes of words in four languages (English, German, Swedish, and Latin) and across different time spans (over 50 years and about five years). Moreover, our method achieves significant performance improvements compared to strong baselines for the English semantic change prediction task. Source code is available at https://github.com/a1da4/svp-swap.
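
A simplified sketch of the swapping idea follows. It swaps a random fraction of the target word's contextualised embeddings between the two corpora and compares a single summary statistic (the distance between mean vectors) before and after the swap; the paper compares the full embedding distributions, so this is only an approximation of SSCD, and the inputs below are random placeholders.

```python
import numpy as np

def sscd_sketch(emb_c1, emb_c2, swap_ratio=0.5, seed=0):
    """Swap a fraction of the target word's contextualised embeddings between
    corpora C1 and C2 and measure how much a simple distribution statistic
    (here, the distance between mean vectors) changes. If the word's meaning
    is stable, random swapping should barely change that statistic."""
    rng = np.random.default_rng(seed)
    before = np.linalg.norm(emb_c1.mean(axis=0) - emb_c2.mean(axis=0))
    c1, c2 = emb_c1.copy(), emb_c2.copy()
    n_swap = int(swap_ratio * min(len(c1), len(c2)))
    i = rng.choice(len(c1), n_swap, replace=False)
    j = rng.choice(len(c2), n_swap, replace=False)
    c1[i], c2[j] = emb_c2[j], emb_c1[i]
    after = np.linalg.norm(c1.mean(axis=0) - c2.mean(axis=0))
    return abs(before - after)  # larger change suggests a semantic change

rng = np.random.default_rng(1)
print(sscd_sketch(rng.normal(0.0, 1.0, (40, 8)), rng.normal(0.5, 1.0, (30, 8))))
```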

pdf bib
A Neighbourhood-Aware Differential Privacy Mechanism for Static Word Embeddings
Danushka Bollegala | Shuichi Otake | Tomoya Machide | Ken-ichi Kawarabayashi
Findings of the Association for Computational Linguistics: IJCNLP-AACL 2023 (Findings)

pdf bib
A Predictive Factor Analysis of Social Biases and Task-Performance in Pretrained Masked Language Models
Yi Zhou | Jose Camacho-Collados | Danushka Bollegala
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Various types of social biases have been reported with pretrained Masked Language Models (MLMs) in prior work. However, multiple underlying factors are associated with an MLM such as its model size, size of the training data, training objectives, the domain from which pretraining data is sampled, tokenization, and languages present in the pretrained corpora, to name a few. It remains unclear as to which of those factors influence social biases that are learned by MLMs. To study the relationship between model factors and the social biases learned by an MLM, as well as the downstream task performance of the model, we conduct a comprehensive study over 39 pretrained MLMs covering different model sizes, training objectives, tokenization methods, training data domains and languages. Our results shed light on important factors often neglected in prior literature, such as tokenization or model objectives.

pdf bib
Learning to Predict Concept Ordering for Common Sense Generation
Tianhui Zhang | Danushka Bollegala | Bei Peng
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated
Masahiro Kaneko | Danushka Bollegala | Naoaki Okazaki
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Evaluating the Robustness of Discrete Prompts
Yoichi Ishibashi | Danushka Bollegala | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Discrete prompts have been used for fine-tuning Pre-trained Language Models for diverse NLP tasks. In particular, automatic methods that generate discrete prompts from a small set of training instances have reported superior performance. However, a closer look at the learnt prompts reveals that they contain noisy and counter-intuitive lexical constructs that would not be encountered in manually-written prompts. This raises an important yet understudied question regarding the robustness of automatically learnt discrete prompts when used in downstream tasks. To address this question, we conduct a systematic study of the robustness of discrete prompts by applying carefully designed perturbations to prompts learnt using AutoPrompt and then measuring their performance on two Natural Language Inference (NLI) datasets. Our experimental results show that although discrete prompt-based methods remain relatively robust against perturbations to NLI inputs, they are highly sensitive to other types of perturbations such as shuffling and deletion of prompt tokens. Moreover, they generalize poorly across different NLI datasets. We hope our findings will inspire future work on robust discrete prompt learning.

pdf bib
Comparing Intrinsic Gender Bias Evaluation Measures without using Human Annotated Examples
Masahiro Kaneko | Danushka Bollegala | Naoaki Okazaki
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Numerous types of social biases have been identified in pre-trained language models (PLMs), and various intrinsic bias evaluation measures have been proposed for quantifying those social biases. Prior work has relied on human-annotated examples to compare existing intrinsic bias evaluation measures. However, this approach is not easily adaptable to different languages nor amenable to large-scale evaluations due to the cost and difficulty of recruiting human annotators. To overcome this limitation, we propose a method to compare intrinsic gender bias evaluation measures without relying on human-annotated examples. Specifically, we create multiple bias-controlled versions of PLMs using varying amounts of male vs. female gendered sentences, mined automatically from an unannotated corpus using gender-related word lists. Next, each bias-controlled PLM is evaluated using an intrinsic bias evaluation measure, and the rank correlation between the computed bias scores and the gender proportions used to fine-tune the PLMs is computed. Experiments on multiple corpora and PLMs repeatedly show that the correlations reported by our proposed method, which does not require human-annotated examples, are comparable to those computed using human-annotated examples in prior work.

pdf bib
Learning Dynamic Contextualised Word Embeddings via Template-based Temporal Adaptation
Xiaohang Tang | Yi Zhou | Danushka Bollegala
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Dynamic contextualised word embeddings (DCWEs) represent the temporal semantic variations of words. We propose a method for learning DCWEs by time-adapting a pretrained Masked Language Model (MLM) using time-sensitive templates. Given two snapshots C1 and C2 of a corpus taken respectively at two distinct timestamps T1 and T2, we first propose an unsupervised method to select (a) pivot terms related to both C1 and C2, and (b) anchor terms that are associated with a specific pivot term in each individual snapshot. We then generate prompts by filling manually compiled templates using the extracted pivot and anchor terms. Moreover, we propose an automatic method to learn time-sensitive templates from C1 and C2, without requiring any human supervision. Next, we use the generated prompts to adapt the pretrained MLM to T2 by fine-tuning on those prompts. Multiple experiments show that our proposed method significantly reduces the perplexity of test sentences in C2, outperforming the current state-of-the-art.
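
The prompt-generation step can be pictured with a small sketch: manually compiled, time-sensitive templates are filled with the extracted pivot and anchor terms, and the resulting sentences are used to fine-tune the MLM. The template and the pivot/anchor terms below are invented examples, not those used in the paper.

```python
# Hypothetical pivot term with anchors from each corpus snapshot (C1 and C2).
pivots = {"mask": {"anchors_c1": ["costume"], "anchors_c2": ["vaccine"]}}
templates = [
    "In the past, {pivot} was associated with {anchor_c1}, "
    "but now {pivot} is associated with {anchor_c2}.",
]

def generate_prompts(pivots, templates):
    """Fill every template with every (pivot, anchor_c1, anchor_c2) combination."""
    prompts = []
    for pivot, terms in pivots.items():
        for a1 in terms["anchors_c1"]:
            for a2 in terms["anchors_c2"]:
                prompts.extend(
                    t.format(pivot=pivot, anchor_c1=a1, anchor_c2=a2)
                    for t in templates
                )
    return prompts

for prompt in generate_prompts(pivots, templates):
    print(prompt)  # these sentences would then be used to fine-tune the MLM to T2
```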

pdf bib
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Danushka Bollegala | Ruihong Huang | Alan Ritter
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

2022

pdf bib
On the Curious Case of ℓ2 norm of Sense Embeddings
Yi Zhou | Danushka Bollegala
Findings of the Association for Computational Linguistics: EMNLP 2022

We show that the ℓ2 norm of a static sense embedding encodes information related to the frequency of that sense in the training corpus used to learn the sense embeddings. This finding can be seen as an extension of a previously known relationship for word embeddings to sense embeddings. Our experimental results show that, in spite of its simplicity, the ℓ2 norm of sense embeddings is a surprisingly effective feature for several word sense related tasks such as (a) most frequent sense prediction, (b) word-in-context (WiC), and (c) word sense disambiguation (WSD). In particular, by simply including the ℓ2 norm of a sense embedding as a feature in a classifier, we show that we can improve WiC and WSD methods that use static sense embeddings.
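
The practical recipe is simply to append the ℓ2 norm of a sense embedding as an extra feature. A minimal sketch with a logistic regression classifier is shown below; the embeddings and labels are random placeholders, and scikit-learn is only one possible choice of classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def add_norm_feature(sense_embs):
    """Append each sense embedding's l2 norm as an additional feature column."""
    norms = np.linalg.norm(sense_embs, axis=1, keepdims=True)
    return np.hstack([sense_embs, norms])

# Toy binary task (e.g. a WiC-style same-sense vs. different-sense decision);
# both the embeddings and the labels here are random placeholders.
rng = np.random.default_rng(0)
X = add_norm_feature(rng.normal(size=(200, 50)))
y = rng.integers(0, 2, size=200)
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.score(X, y))
```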

pdf bib
Gender Bias in Meta-Embeddings
Masahiro Kaneko | Danushka Bollegala | Naoaki Okazaki
Findings of the Association for Computational Linguistics: EMNLP 2022

Different methods have been proposed to develop meta-embeddings from a given set of source embeddings. However, the source embeddings can contain unfair gender-related biases, and how these influence the meta-embeddings has not been studied yet. We study the gender bias in meta-embeddings created under three different settings: (1) meta-embedding multiple sources without performing any debiasing (Multi-Source No-Debiasing), (2) meta-embedding multiple sources debiased by a single method (Multi-Source Single-Debiasing), and (3) meta-embedding a single source debiased by different methods (Single-Source Multi-Debiasing). Our experimental results show that meta-embedding amplifies the gender biases compared to input source embeddings. We find that debiasing not only the sources but also their meta-embedding is needed to mitigate those biases. Moreover, we propose a novel debiasing method based on meta-embedding learning where we use multiple debiasing methods on a single source embedding and then create a single unbiased meta-embedding.

pdf bib
Position-based Prompting for Health Outcome Generation
Micheal Abaho | Danushka Bollegala | Paula Williamson | Susanna Dodd
Proceedings of the 21st Workshop on Biomedical Language Processing

Probing factual knowledge in Pre-trained Language Models (PLMs) using prompts has indirectly implied that language models (LMs) can be treated as knowledge bases. This approach has proven effective, especially when these LMs are fine-tuned towards not just the data, but also the style or linguistic pattern of the prompts themselves. We observe that satisfying a particular linguistic pattern in prompts is an unsustainable, time-consuming constraint in the probing task, especially because prompts are often manually designed and the range of possible prompt template patterns can vary depending on the prompting task. To alleviate this constraint, we propose using a position-attention mechanism to capture positional information of each word in a prompt relative to the mask to be filled, hence avoiding the need to re-construct prompts when the prompts’ linguistic pattern changes. Using our approach, we demonstrate the ability to elicit answers (in a case study on health outcome generation) not only for common prompt templates like Cloze and Prefix, but also for rare ones such as Postfix and Mixed patterns, whose masks are respectively at the start and in multiple random places of the prompt. Moreover, using various biomedical PLMs, our approach consistently outperforms a baseline in which the default PLM representation is used to predict masked tokens.

pdf bib
Zero-shot Cross-Lingual Counterfactual Detection via Automatic Extraction and Prediction of Clue Phrases
Asahi Ushio | Danushka Bollegala
Proceedings of the 2nd Workshop on Multi-lingual Representation Learning (MRL)

Counterfactual statements describe events that did not or cannot take place unless some conditions are satisfied. Existing counterfactual detection (CFD) methods assume the availability of manually labelled statements for each language they consider, limiting the broad applicability of CFD. In this paper, we consider the problem of zero-shot cross-lingual transfer learning for CFD. Specifically, we propose a novel loss function based on the clue phrase prediction for generalising a CFD model trained on a source language to multiple target languages, without requiring any human-labelled data. We obtain clue phrases that express various language-specific lexical indicators of counterfactuality in the target language in an unsupervised manner using a neural alignment model. We evaluate our method on the Amazon Multilingual Counterfactual Dataset (AMCD) for English, German, and Japanese languages in the zero-shot cross-lingual transfer setup where no manual annotations are used for the target language during training. The best CFD model fine-tuned on XLM-R improves the macro F1 score by 25% for German and 20% for Japanese target languages compared to a model that is trained only using English source language data.

pdf bib
Query Obfuscation by Semantic Decomposition
Danushka Bollegala | Tomoya Machide | Ken-ichi Kawarabayashi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We propose a method to protect the privacy of search engine users by decomposing the queries using semantically related and unrelated distractor terms. Instead of a single query, the search engine receives multiple decomposed query terms. Next, we reconstruct the search results relevant to the original query term by aggregating the search results retrieved for the decomposed query terms. We show that the word embeddings learnt using a distributed representation learning method can be used to find semantically related and distractor query terms. We derive the relationship between the obfuscity achieved through the proposed query anonymisation method and the reconstructability of the original search results using the decomposed queries. We analytically study the risk of discovering the search engine users’ information intents under the proposed query obfuscation method, and empirically evaluate its robustness against clustering-based attacks. Our experimental results show that the proposed method can accurately reconstruct the search results for user queries, without compromising the privacy of the search engine users.
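
At a high level, the pipeline issues decomposed queries instead of the original one and reconstructs the result set by aggregation. The sketch below is an assumption-laden illustration: search() stands in for a real search-engine call, the related and distractor terms would in practice be selected using word embeddings, and the voting-based aggregation is a simplification of the reconstruction step.

```python
from collections import Counter

def obfuscated_search(query, related_terms, distractor_terms, search):
    """Issue the decomposed sub-queries (never the original query itself) and
    score each returned document by how many *related* sub-queries retrieved
    it. `search(term)` is assumed to return a list of document ids."""
    votes = Counter()
    for term in related_terms:
        for doc in search(term):
            votes[doc] += 1
    for term in distractor_terms:
        search(term)  # issued only to hide the true intent; results ignored
    # Documents retrieved by many related sub-queries approximate the results
    # the original query would have returned.
    return [doc for doc, _ in votes.most_common(10)]

# Toy usage with a mocked search engine.
corpus = {"big cat speed": [2, 3], "feline speed": [2, 5], "car speed": [4]}
print(obfuscated_search("jaguar speed",
                        related_terms=["big cat speed", "feline speed"],
                        distractor_terms=["weather today"],
                        search=lambda t: corpus.get(t, [])))
```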

pdf bib
Unsupervised Attention-based Sentence-Level Meta-Embeddings from Contextualised Language Models
Keigo Takahashi | Danushka Bollegala
Proceedings of the Thirteenth Language Resources and Evaluation Conference

A variety of contextualised language models have been proposed in the NLP community, which are trained on diverse corpora to produce numerous Neural Language Models (NLMs). However, different NLMs have reported different levels of performance in downstream NLP applications when used as text representations. We propose a sentence-level meta-embedding learning method that takes independently trained contextualised word embedding models and learns a sentence embedding that preserves the complementary strengths of the input source NLMs. Our proposed method is unsupervised and is not tied to a particular downstream task, which makes the learnt meta-embeddings in principle applicable to different tasks that require sentence representations. Specifically, we first project the token-level embeddings obtained from the individual NLMs and learn attention weights that indicate the contributions of source embeddings towards their token-level meta-embeddings. Next, we apply mean and max pooling to produce sentence-level meta-embeddings from token-level meta-embeddings. Experimental results on semantic textual similarity benchmarks show that our proposed unsupervised sentence-level meta-embedding method outperforms previously proposed sentence-level meta-embedding methods as well as a supervised baseline.
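
The two stages can be sketched with numpy: per-token embeddings from each source NLM are projected into a common space, combined with attention weights into token-level meta-embeddings, and then mean- and max-pooled into a sentence vector. In the method the projections and attention weights are learnt; here they are random or fixed placeholders, and concatenating the two pooled vectors is an assumption made for the example.

```python
import numpy as np

def sentence_meta_embedding(source_token_embs, projections, attn_weights):
    """source_token_embs: list of (num_tokens, d_s) arrays, one per source NLM.
    projections: list of (d_s, d) matrices mapping each source to a common space.
    attn_weights: (num_sources,) array of source contributions, summing to 1."""
    projected = [E @ W for E, W in zip(source_token_embs, projections)]
    # Token-level meta-embeddings: attention-weighted sum over the sources.
    token_meta = sum(a * P for a, P in zip(attn_weights, projected))
    # Sentence-level meta-embedding: mean and max pooling over the tokens.
    return np.concatenate([token_meta.mean(axis=0), token_meta.max(axis=0)])

rng = np.random.default_rng(0)
embs = [rng.normal(size=(7, 768)), rng.normal(size=(7, 1024))]   # two NLMs
projs = [rng.normal(size=(768, 300)), rng.normal(size=(1024, 300))]
print(sentence_meta_embedding(embs, projs, np.array([0.6, 0.4])).shape)  # (600,)
```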

pdf bib
Sense Embeddings are also Biased – Evaluating Social Biases in Static and Contextualised Sense Embeddings
Yi Zhou | Masahiro Kaneko | Danushka Bollegala
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Sense embedding learning methods learn different embeddings for the different senses of an ambiguous word. One sense of an ambiguous word might be socially biased while its other senses remain unbiased. In comparison to the numerous prior work evaluating the social biases in pretrained word embeddings, the biases in sense embeddings have been relatively understudied. We create a benchmark dataset for evaluating the social biases in sense embeddings and propose novel sense-specific bias evaluation measures. We conduct an extensive evaluation of multiple static and contextualised sense embeddings for various types of social biases using the proposed measures. Our experimental results show that even in cases where no biases are found at word-level, there still exist worrying levels of social biases at sense-level, which are often ignored by the word-level bias evaluation measures.

pdf bib
Debiasing Isn’t Enough! – on the Effectiveness of Debiasing MLMs and Their Social Biases in Downstream Tasks
Masahiro Kaneko | Danushka Bollegala | Naoaki Okazaki
Proceedings of the 29th International Conference on Computational Linguistics

We study the relationship between task-agnostic intrinsic and task-specific extrinsic social bias evaluation measures for MLMs, and find that there exists only a weak correlation between these two types of evaluation measures. Moreover, we find that MLMs debiased using different methods still re-learn social biases during fine-tuning on downstream tasks. We identify the social biases in both training instances as well as their assigned labels as reasons for the discrepancy between intrinsic and extrinsic bias evaluation measurements. Overall, our findings highlight the limitations of existing MLM bias evaluation measures and raise concerns on the deployment of MLMs in downstream applications using those measures.

pdf bib
Gender Bias in Masked Language Models for Multiple Languages
Masahiro Kaneko | Aizhan Imankulova | Danushka Bollegala | Naoaki Okazaki
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Masked Language Models (MLMs) pre-trained by predicting masked tokens on large corpora have been used successfully in natural language processing tasks for a variety of languages. Unfortunately, it was reported that MLMs also learn discriminative biases regarding attributes such as gender and race. Because most studies have focused on MLMs in English, the bias of MLMs in other languages has rarely been investigated. Manual annotation of evaluation data for languages other than English has been challenging due to the cost and difficulty of recruiting annotators. Moreover, the existing bias evaluation methods require stereotypical sentence pairs consisting of the same context with attribute words (e.g. He/She is a nurse). We propose the Multilingual Bias Evaluation (MBE) score, to evaluate bias in various languages using only English attribute word lists and parallel corpora between the target language and English, without requiring manually annotated data. We evaluated MLMs in eight languages using the MBE and confirmed that gender-related biases are encoded in MLMs for all those languages. We manually created datasets for gender bias in Japanese and Russian to evaluate the validity of the MBE. The results show that the bias scores reported by the MBE significantly correlate with those computed from the above manually created datasets and the existing English datasets for gender bias.

pdf bib
Learning to Borrow – Relation Representation for Without-Mention Entity-Pairs for Knowledge Graph Completion
Huda Hakami | Mona Hakami | Angrosh Mandya | Danushka Bollegala
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Prior work on integrating text corpora with knowledge graphs (KGs) to improve Knowledge Graph Embedding (KGE) has obtained good performance for entities that co-occur in sentences in text corpora. Such sentences (textual mentions of entity-pairs) are represented as Lexicalised Dependency Paths (LDPs) between two entities. However, it is not possible to represent relations between entities that do not co-occur in a single sentence using LDPs. In this paper, we propose and evaluate several methods to address this problem, where we borrow LDPs from the entity pairs that co-occur in sentences in the corpus (i.e. with-mention entity pairs) to represent entity pairs that do not co-occur in any sentence in the corpus (i.e. without-mention entity pairs). We propose a supervised borrowing method, SuperBorrow, that learns to score the suitability of an LDP to represent a without-mention entity pair using pre-trained entity embeddings and contextualised LDP representations. Experimental results show that SuperBorrow improves the link prediction performance of multiple widely-used prior KGE methods such as TransE, DistMult, ComplEx and RotatE.

2021

pdf bib
Dictionary-based Debiasing of Pre-trained Word Embeddings
Masahiro Kaneko | Danushka Bollegala
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Word embeddings trained on large corpora have been shown to encode high levels of unfair discriminatory gender, racial, religious and ethnic biases. In contrast, human-written dictionaries describe the meanings of words in a concise, objective and unbiased manner. We propose a method for debiasing pre-trained word embeddings using dictionaries, without requiring access to the original training resources or any knowledge regarding the word embedding algorithms used. Unlike prior work, our proposed method does not require the types of biases to be pre-defined in the form of word lists, and learns the constraints that must be satisfied by unbiased word embeddings automatically from dictionary definitions of the words. Specifically, we learn an encoder to generate a debiased version of an input word embedding such that it (a) retains the semantics of the pre-trained word embedding, (b) agrees with the unbiased definition of the word according to the dictionary, and (c) remains orthogonal to the vector space spanned by any biased basis vectors in the pre-trained word embedding space. Experimental results on standard benchmark datasets show that the proposed method can accurately remove unfair biases encoded in pre-trained word embeddings, while preserving useful semantics.

pdf bib
Debiasing Pre-trained Contextualised Embeddings
Masahiro Kaneko | Danushka Bollegala
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

In comparison to the numerous debiasing methods proposed for static non-contextualised word embeddings, the discriminative biases in contextualised embeddings have received relatively little attention. We propose a fine-tuning method that can be applied at token- or sentence-level to debias pre-trained contextualised embeddings. Our proposed method can be applied to any pre-trained contextualised embedding model, without requiring those models to be retrained. Using gender bias as an illustrative example, we then conduct a systematic study using several state-of-the-art (SoTA) contextualised representations on multiple benchmark datasets to evaluate the level of biases encoded in different contextualised embeddings before and after debiasing using the proposed method. We find that applying token-level debiasing for all tokens and across all layers of a contextualised embedding model produces the best performance. Interestingly, we observe that there is a trade-off between creating an accurate vs. an unbiased contextualised embedding model, and different contextualised embedding models respond differently to this trade-off.

pdf bib
RelWalk - A Latent Variable Model Approach to Knowledge Graph Embedding
Danushka Bollegala | Huda Hakami | Yuichi Yoshida | Ken-ichi Kawarabayashi
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Embedding entities and relations of a knowledge graph in a low-dimensional space has shown impressive performance in predicting missing links between entities. Although progress has been achieved, existing methods are heuristically motivated and theoretical understanding of such embeddings is comparatively underdeveloped. This paper extends the random walk model of word embeddings to Knowledge Graph Embeddings (KGEs) to derive a scoring function that evaluates the strength of a relation R between two entities h (head) and t (tail). Moreover, we show that marginal loss minimisation, a popular objective used in much prior work on KGE, follows naturally from the log-likelihood ratio maximisation under the probabilities estimated from the KGEs according to our theoretical relationship. We propose a learning objective motivated by the theoretical analysis to learn KGEs from a given knowledge graph. Using the derived objective, accurate KGEs are learnt from the FB15K237 and WN18RR benchmark datasets, providing empirical evidence in support of the theory.

pdf bib
I Wish I Would Have Loved This One, But I Didn’t – A Multilingual Dataset for Counterfactual Detection in Product Review
James O’Neill | Polina Rozenshtein | Ryuichi Kiryo | Motoko Kubota | Danushka Bollegala
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Counterfactual statements describe events that did not or cannot take place. We consider the problem of counterfactual detection (CFD) in product reviews. For this purpose, we annotate a multilingual CFD dataset from Amazon product reviews covering counterfactual statements written in English, German, and Japanese languages. The dataset is unique as it contains counterfactuals in multiple languages, covers a new application area of e-commerce reviews, and provides high quality professional annotations. We train CFD models using different text representation methods and classifiers. We find that these models are robust against the selectional biases introduced due to cue phrase-based sentence selection. Moreover, our CFD dataset is compatible with prior datasets and can be merged to learn accurate CFD models. Applying machine translation on English counterfactual examples to create multilingual data performs poorly, demonstrating the language-specificity of this problem, which has been ignored so far.

pdf bib
Detect and Classify – Joint Span Detection and Classification for Health Outcomes
Micheal Abaho | Danushka Bollegala | Paula Williamson | Susanna Dodd
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

A health outcome is a measurement or an observation used to capture and assess the effect of a treatment. Automatic detection of health outcomes from text would undoubtedly speed up access to evidence necessary in healthcare decision making. Prior work on outcome detection has modelled this task as either (a) a sequence labelling task, where the goal is to detect which text spans describe health outcomes, or (b) a classification task, where the goal is to classify a text into a predefined set of categories depending on an outcome that is mentioned somewhere in that text. However, this decoupling of span detection and classification is problematic from a modelling perspective and ignores global structural correspondences between sentence-level and word-level information present in a given text. To address this, we propose a method that uses both word-level and sentence-level information to simultaneously perform outcome span detection and outcome type classification. In addition to injecting contextual information to hidden vectors, we use label attention to appropriately weight both word and sentence level information. Experimental results on several benchmark datasets for health outcome detection show that our proposed method consistently outperforms decoupled methods, reporting competitive results.

pdf bib
Unsupervised Abstractive Opinion Summarization by Generating Sentences with Tree-Structured Topic Guidance
Masaru Isonuma | Junichiro Mori | Danushka Bollegala | Ichiro Sakata
Transactions of the Association for Computational Linguistics, Volume 9

This paper presents a novel unsupervised abstractive summarization method for opinionated texts. While the basic variational autoencoder-based models assume a unimodal Gaussian prior for the latent code of sentences, we alternate it with a recursive Gaussian mixture, where each mixture component corresponds to the latent code of a topic sentence and is mixed by a tree-structured topic distribution. By decoding each Gaussian component, we generate sentences with tree-structured topic guidance, where the root sentence conveys generic content, and the leaf sentences describe specific topics. Experimental results demonstrate that the generated topic sentences are appropriate as a summary of opinionated texts, which are more informative and cover more input contents than those generated by the recent unsupervised summarization model (Bražinskas et al., 2020). Furthermore, we demonstrate that the variance of latent Gaussians represents the granularity of sentences, analogous to Gaussian word embedding (Vilnis and McCallum, 2015).

2020

pdf bib
Multi-Source Attention for Unsupervised Domain Adaptation
Xia Cui | Danushka Bollegala
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

We model source-selection in multi-source Unsupervised Domain Adaptation (UDA) as an attention-learning problem, where we learn attention over the sources for each given target instance. We first independently learn source-specific classification models, and a relatedness map between the source and target domains using pseudo-labelled target domain instances. Next, we learn domain-attention scores over the sources for aggregating the predictions of the source-specific models. Experimental results on two cross-domain sentiment classification datasets show that the proposed method reports consistently good performance across domains, and at times outperforms more complex prior proposals. Moreover, the computed domain-attention scores enable us to find explanations for the predictions made by the proposed method.
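
The aggregation step can be illustrated with a short sketch: for a given target instance, the source-specific classifiers' class probabilities are combined using a softmax over source-target relatedness scores. The scores and predictions below are placeholders, and the softmax weighting is an assumption standing in for the learnt domain-attention.

```python
import numpy as np

def attention_aggregate(source_probs, relatedness):
    """source_probs: (num_sources, num_classes) class probabilities from the
    independently trained source-specific classifiers for one target instance.
    relatedness: (num_sources,) relatedness scores between each source domain
    and the target instance (e.g. estimated from pseudo-labelled target data)."""
    attn = np.exp(relatedness - relatedness.max())
    attn /= attn.sum()               # softmax attention over the sources
    return attn @ source_probs       # attention-weighted prediction

probs = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])  # three source domains
print(attention_aggregate(probs, relatedness=np.array([2.0, 0.5, 0.1])))
```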

pdf bib
Do not let the history haunt you: Mitigating Compounding Errors in Conversational Question Answering
Angrosh Mandya | James O’ Neill | Danushka Bollegala | Frans Coenen
Proceedings of the Twelfth Language Resources and Evaluation Conference

The Conversational Question Answering (CoQA) task involves answering a sequence of inter-related conversational questions about a contextual paragraph. Although existing approaches employ human-written ground-truth answers for answering conversational questions at test time, in a realistic scenario the CoQA model will not have any access to ground-truth answers for the previous questions, compelling the model to rely upon its own previously predicted answers for answering the subsequent questions. In this paper, we find that compounding errors occur when using previously predicted answers at test time, significantly lowering the performance of CoQA systems. To solve this problem, we propose a sampling strategy that dynamically selects between target answers and model predictions during training, thereby closely simulating the situation at test time. Further, we analyse the severity of this phenomenon as a function of the question type, conversation length and domain type.

pdf bib
Language-Independent Tokenisation Rivals Language-Specific Tokenisation for Word Similarity Prediction
Danushka Bollegala | Ryuichi Kiryo | Kosuke Tsujino | Haruki Yukawa
Proceedings of the Twelfth Language Resources and Evaluation Conference

Language-independent tokenisation (LIT) methods that do not require labelled language resources or lexicons have recently gained popularity because of their applicability in resource-poor languages. Moreover, they compactly represent a language using a fixed-size vocabulary and can efficiently handle unseen or rare words. On the other hand, language-specific tokenisation (LST) methods have a long and established history, and are developed using carefully created lexicons and training resources. Unlike subtokens produced by LIT methods, LST methods produce valid morphological subwords. Despite the contrasting trade-offs between LIT and LST methods, their performance on downstream NLP tasks remains unclear. In this paper, we empirically compare the two approaches using semantic similarity measurement as an evaluation task across a diverse set of languages. Our experimental results covering eight languages show that LST consistently outperforms LIT when the vocabulary size is large, but LIT can produce comparable or better results than LST in many languages with comparatively smaller (i.e. less than 100K words) vocabulary sizes, encouraging the use of LIT when language-specific resources are unavailable, incomplete or a smaller model is required. Moreover, we find smoothed inverse frequency (SIF) to be an accurate method for creating word embeddings from subword embeddings for multilingual semantic similarity prediction tasks. Further analysis of the nearest neighbours of tokens shows that semantically and syntactically related tokens are closely embedded in subword embedding spaces.

pdf bib
Tree-Structured Neural Topic Model
Masaru Isonuma | Junichiro Mori | Danushka Bollegala | Ichiro Sakata
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This paper presents a tree-structured neural topic model, which has a topic distribution over a tree with an infinite number of branches. Our model parameterizes an unbounded ancestral and fraternal topic distribution by applying doubly-recurrent neural networks. With the help of autoencoding variational Bayes, our model improves data scalability and achieves competitive performance when inducing latent topics and tree structures, as compared to a prior tree-structured topic model (Blei et al., 2010). This work extends the tree-structured topic model such that it can be incorporated with neural models for downstream tasks.

pdf bib
Autoencoding Improves Pre-trained Word Embeddings
Masahiro Kaneko | Danushka Bollegala
Proceedings of the 28th International Conference on Computational Linguistics

Prior work investigating the geometry of pre-trained word embeddings has shown that word embeddings are distributed in a narrow cone and that, by centering and projecting using principal component vectors, one can increase the accuracy of a given set of pre-trained word embeddings. However, theoretically, this post-processing step is equivalent to applying a linear autoencoder to minimize the squared L2 reconstruction error. This result contradicts prior work (Mu and Viswanath, 2018) that proposed to remove the top principal components from pre-trained embeddings. We experimentally verify our theoretical claims and show that retaining the top principal components is indeed useful for improving pre-trained word embeddings, without requiring access to additional linguistic resources or labeled data.
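
The post-processing analysed above, centering the embeddings and reconstructing them from their top principal components (which is what a linear autoencoder with a squared-error objective recovers), can be sketched as follows; the embedding matrix and the number of retained components are placeholders.

```python
import numpy as np

def autoencode_embeddings(E, k=100):
    """Centre the embedding matrix and reconstruct it from its top-k principal
    components. A linear autoencoder with a k-dimensional bottleneck that
    minimises the squared reconstruction error converges to this projection,
    so the top components are retained rather than removed."""
    mu = E.mean(axis=0)
    X = E - mu
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:k].T                       # top-k principal directions
    return X @ V @ V.T + mu            # post-processed embeddings

rng = np.random.default_rng(0)
E = rng.normal(size=(1000, 300))       # placeholder pre-trained embeddings
print(autoencode_embeddings(E, k=100).shape)
```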

pdf bib
Graph Convolution over Multiple Dependency Sub-graphs for Relation Extraction
Angrosh Mandya | Danushka Bollegala | Frans Coenen
Proceedings of the 28th International Conference on Computational Linguistics

We propose a contextualised graph convolution network over multiple dependency-based sub-graphs for relation extraction. A novel method to construct multiple sub-graphs, using words in the shortest dependency path and words linked to entities in the dependency parse, is proposed. A graph convolution operation is performed over the resulting multiple sub-graphs to obtain more informative features useful for relation extraction. Our experimental results show that the proposed method achieves superior performance over existing GCN-based models, achieving state-of-the-art performance on a cross-sentence n-ary relation extraction dataset and the SemEval 2010 Task 8 sentence-level relation extraction dataset. Our model also achieves a comparable performance to the SoTA on the TACRED dataset.

2019

pdf bib
Self-Adaptation for Unsupervised Domain Adaptation
Xia Cui | Danushka Bollegala
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Lack of labelled data in the target domain for training is a common problem in domain adaptation. To overcome this problem, we propose a novel unsupervised domain adaptation method that combines projection and self-training based approaches. Using the labelled data from the source domain, we first learn a projection that maximises the distance among the nearest neighbours with opposite labels in the source domain. Next, we project the source domain labelled data using the learnt projection and train a classifier for the target class prediction. We then use the trained classifier to predict pseudo labels for the target domain unlabelled data. Finally, we learn a projection for the target domain as we did for the source domain using the pseudo-labelled target domain data, where we maximise the distance between nearest neighbours having opposite pseudo labels. Experiments on a standard benchmark dataset for domain adaptation show that the proposed method consistently outperforms numerous baselines and returns competitive results comparable to those of SoTA methods, including self-training, tri-training, and neural adaptations.

pdf bib
Gender-preserving Debiasing for Pre-trained Word Embeddings
Masahiro Kaneko | Danushka Bollegala
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Word embeddings learnt from massive text collections have demonstrated significant levels of discriminative biases such as gender, racial or ethnic biases, which in turn bias the down-stream NLP applications that use those word embeddings. Taking gender-bias as a working example, we propose a debiasing method that preserves non-discriminative gender-related information, while removing stereotypical discriminative gender biases from pre-trained word embeddings. Specifically, we consider four types of information: feminine, masculine, gender-neutral and stereotypical, which represent the relationship between gender vs. bias, and propose a debiasing method that (a) preserves the gender-related information in feminine and masculine words, (b) preserves the neutrality in gender-neutral words, and (c) removes the biases from stereotypical words. Experimental results on several previously proposed benchmark datasets show that our proposed method can debias pre-trained word embeddings better than existing SoTA methods proposed for debiasing word embeddings while preserving gender-related but non-discriminative information.

2018

pdf bib
An Empirical Study on Fine-Grained Named Entity Recognition
Khai Mai | Thai-Hoang Pham | Minh Trung Nguyen | Tuan Duc Nguyen | Danushka Bollegala | Ryohei Sasano | Satoshi Sekine
Proceedings of the 27th International Conference on Computational Linguistics

Named entity recognition (NER) has attracted a substantial amount of research. Recently, several neural network-based models have been proposed and achieved high performance. However, there is little research on fine-grained NER (FG-NER), in which hundreds of named entity categories must be recognized, especially for non-English languages. It is still an open question whether there is a model that is robust across various settings or the proper model varies depending on the language, the number of named entity categories, and the size of training datasets. This paper first presents an empirical comparison of FG-NER models for English and Japanese and demonstrates that LSTM+CNN+CRF (Ma and Hovy, 2016), one of the state-of-the-art methods for English NER, also works well for English FG-NER but does not work well for Japanese, a language that has a large number of character types. To tackle this problem, we propose a method to improve the neural network-based Japanese FG-NER performance by removing the CNN layer and utilizing dictionary and category embeddings. Experiment results show that the proposed method improves Japanese FG-NER F-score from 66.76% to 75.18%.

pdf bib
Learning Word Meta-Embeddings by Autoencoding
Danushka Bollegala | Cong Bao
Proceedings of the 27th International Conference on Computational Linguistics

Distributed word embeddings have shown superior performances in numerous Natural Language Processing (NLP) tasks. However, their performances vary significantly across different tasks, implying that the word embeddings learnt by those methods capture complementary aspects of lexical semantics. Therefore, we believe that it is important to combine the existing word embeddings to produce more accurate and complete meta-embeddings of words. We model the meta-embedding learning problem as an autoencoding problem, where we would like to learn a meta-embedding space that can accurately reconstruct all source embeddings simultaneously. Thereby, the meta-embedding space is enforced to capture complementary information in different source embeddings via a coherent common embedding space. We propose three flavours of autoencoded meta-embeddings motivated by different requirements that must be satisfied by a meta-embedding. Our experimental results on a series of benchmark evaluations show that the proposed autoencoded meta-embeddings outperform the existing state-of-the-art meta-embeddings in multiple tasks.
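
The objective, learning a single meta-embedding per word from which all source embeddings can be reconstructed, is sketched below in a deliberately simplified form: the sources are concatenated and a linear autoencoder (solved in closed form via SVD) provides the bottleneck codes. The paper's three flavours use trained non-linear autoencoders, so this is an illustration of the idea rather than any of the proposed variants.

```python
import numpy as np

def meta_embed_by_autoencoding(sources, k=256):
    """sources: list of (vocab_size, d_i) source embedding matrices whose rows
    are aligned to the same vocabulary. Concatenate them and take the k-dim
    bottleneck codes of a linear autoencoder (closed form via SVD) as the
    meta-embeddings. A simplified, linear stand-in for the paper's autoencoders."""
    X = np.hstack(sources)
    X = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * S[:k]            # one meta-embedding row per word

rng = np.random.default_rng(0)
src1, src2 = rng.normal(size=(5000, 300)), rng.normal(size=(5000, 200))
print(meta_embed_by_autoencoding([src1, src2], k=256).shape)  # (5000, 256)
```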

pdf bib
Why does PairDiff work? - A Mathematical Analysis of Bilinear Relational Compositional Operators for Analogy Detection
Huda Hakami | Kohei Hayashi | Danushka Bollegala
Proceedings of the 27th International Conference on Computational Linguistics

Representing the semantic relations that exist between two given words (or entities) is an important first step in a wide range of NLP applications such as analogical reasoning, knowledge base completion and relational information retrieval. A simple, yet surprisingly accurate method for representing a relation between two words is to compute the vector offset (PairDiff) between their corresponding word embeddings. Despite the empirical success, it remains unclear as to whether PairDiff is the best operator for obtaining a relational representation from word embeddings. We conduct a theoretical analysis of generalised bilinear operators that can be used to measure the ℓ2 relational distance between two word-pairs. We show that, if the word embeddings are standardised and uncorrelated, such an operator will be independent of bilinear terms, and can be simplified to a linear form, where PairDiff is a special case. For numerous word embedding types, we empirically verify the uncorrelation assumption, demonstrating the general applicability of our theoretical result. Moreover, we experimentally discover PairDiff from the bilinear relational compositional operator on several benchmark analogy datasets.
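
PairDiff itself is just the vector offset between the two word embeddings, and the ℓ2 relational distance compares two such offsets; a minimal sketch with placeholder embeddings:

```python
import numpy as np

def pair_diff(a, b):
    """PairDiff relational representation of the word pair (a, b)."""
    return b - a

def relational_distance(pair1, pair2):
    """l2 distance between the PairDiff representations of two word pairs."""
    return float(np.linalg.norm(pair_diff(*pair1) - pair_diff(*pair2)))

# With real embeddings, analogous pairs such as (man, woman) and (king, queen)
# should have a small relational distance; random vectors are used here.
rng = np.random.default_rng(0)
man, woman = rng.normal(size=300), rng.normal(size=300)
king, queen = rng.normal(size=300), rng.normal(size=300)
print(relational_distance((man, woman), (king, queen)))
```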

pdf bib
Joint Learning of Sense and Word Embeddings
Mohammed Alsuhaibani | Danushka Bollegala
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Sentiment-Stance-Specificity (SSS) Dataset: Identifying Support-based Entailment among Opinions.
Pavithra Rajendran | Danushka Bollegala | Simon Parsons
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
A Dataset for Inter-Sentence Relation Extraction using Distant Supervision
Angrosh Mandya | Danushka Bollegala | Frans Coenen | Katie Atkinson
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Learning Neural Word Salience Scores
Krasen Samardzhiev | Andrew Gargett | Danushka Bollegala
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

Measuring the salience of a word is an essential step in numerous NLP tasks. Heuristic approaches such as tfidf have been used so far to estimate the salience of words. We propose Neural Word Salience (NWS) scores which, unlike heuristics, are learnt from a corpus. Specifically, we learn word salience scores such that, using pre-trained word embeddings as the input, they can accurately predict the words that appear in a sentence, given the words that appear in the sentences preceding or succeeding that sentence. Experimental results on sentence similarity prediction show that the learnt word salience scores perform comparably or better than some of the state-of-the-art approaches for representing sentences on benchmark datasets for sentence similarity, while using only a fraction of the training and prediction times required by prior methods. Moreover, our NWS scores positively correlate with psycholinguistic measures such as concreteness and imageability, implying a close connection to the salience as perceived by humans.

pdf bib
Solving Feature Sparseness in Text Classification using Core-Periphery Decomposition
Xia Cui | Sadamori Kojaku | Naoki Masuda | Danushka Bollegala
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

Feature sparseness is a problem common to cross-domain and short-text classification tasks. To overcome this feature sparseness problem, we propose a novel method based on graph decomposition to find candidate features for expanding feature vectors. Specifically, we first create a feature-relatedness graph, which is subsequently decomposed into core-periphery (CP) pairs, and use the peripheries as the expansion candidates of the cores. We expand both training and test instances using the computed related features and use them to train a text classifier. We observe that prioritising features that are common to both training and test instances as cores during the CP decomposition further improves the accuracy of text classification. We evaluate the proposed CP-decomposition-based feature expansion method on benchmark datasets for cross-domain sentiment classification and short-text classification. Our experimental results show that the proposed method consistently outperforms all baselines on short-text classification tasks, and performs competitively with pivot-based cross-domain sentiment classification methods.

pdf bib
Is Something Better than Nothing? Automatically Predicting Stance-based Arguments Using Deep Learning and Small Labelled Dataset
Pavithra Rajendran | Danushka Bollegala | Simon Parsons
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Online reviews have become a popular portal among customers making decisions about purchasing products. A number of corpora of reviews have been widely investigated in NLP in general, and, in particular, in argument mining. This is a subfield of NLP that deals with extracting arguments and the relations among them from user-based content. A major problem faced by argument mining research is the lack of human-annotated data. In this paper, we investigate the use of weakly supervised and semi-supervised methods for automatically annotating data, and thus providing large annotated datasets. We do this by building on previous work that explores the classification of opinions present in reviews based on whether the stance is expressed explicitly or implicitly. In the work described here, we automatically annotate stance as implicit or explicit and our results show that the datasets we generate, although noisy, can be used to learn better models for implicit/explicit opinion classification.

pdf bib
Frustratingly Easy Meta-Embedding – Computing Meta-Embeddings by Averaging Source Word Embeddings
Joshua Coates | Danushka Bollegala
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Creating accurate meta-embeddings from pre-trained source embeddings has received attention lately. Methods based on global and locally-linear transformation and concatenation have been shown to produce accurate meta-embeddings. In this paper, we show that the arithmetic mean of two distinct word embedding sets yields a performant meta-embedding that is comparable to or better than more complex meta-embedding learning methods. The result seems counter-intuitive given that vector spaces in different source embeddings are not comparable and cannot simply be averaged. We give insight into why averaging can still produce an accurate meta-embedding despite the incomparability of the source vector spaces.
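
The method really is as simple as described: average the two source vectors for each word. The sketch below additionally ℓ2-normalises the sources and zero-pads them to a common dimensionality; both steps are assumptions made so that differently scaled and sized sources can be combined, not claims about the paper's exact preprocessing.

```python
import numpy as np

def average_meta_embedding(e1, e2):
    """Average two source embeddings of the same word. The vectors are
    l2-normalised and zero-padded to a common dimensionality first
    (illustrative preprocessing assumptions)."""
    d = max(len(e1), len(e2))
    v1 = np.pad(e1 / np.linalg.norm(e1), (0, d - len(e1)))
    v2 = np.pad(e2 / np.linalg.norm(e2), (0, d - len(e2)))
    return 0.5 * (v1 + v2)

rng = np.random.default_rng(0)
print(average_meta_embedding(rng.normal(size=300), rng.normal(size=200)).shape)
```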

2016

pdf bib
Contextual stance classification of opinions: A step towards enthymeme reconstruction in online reviews
Pavithra Rajendran | Danushka Bollegala | Simon Parsons
Proceedings of the Third Workshop on Argument Mining (ArgMining2016)

2015

pdf bib
Unsupervised Cross-Domain Word Representation Learning
Danushka Bollegala | Takanori Maehara | Ken-ichi Kawarabayashi
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

pdf bib
Learning to Predict Distributions of Words Across Domains
Danushka Bollegala | David Weir | John Carroll
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

pdf bib
Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification
Danushka Bollegala | David Weir | John Carroll
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib
Towards Semi-Supervised Classification of Discourse Relations using Feature Correlations
Hugo Hernault | Danushka Bollegala | Mitsuru Ishizuka
Proceedings of the SIGDIAL 2010 Conference

pdf bib
A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension
Hugo Hernault | Danushka Bollegala | Mitsuru Ishizuka
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2009

pdf bib
A Relational Model of Semantic Similarity between Words using Automatically Extracted Lexical Pattern Clusters from the Web
Danushka Bollegala | Yutaka Matsuo | Mitsuru Ishizuka
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

pdf bib
A Co-occurrence Graph-based Approach for Personal Name Alias Extraction from Anchor Texts
Danushka Bollegala | Yutaka Matsuo | Mitsuru Ishizuka
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

2007

pdf bib
An Integrated Approach to Measuring Semantic Similarity between Words Using Information Available on the Web
Danushka Bollegala | Yutaka Matsuo | Mitsuru Ishizuka
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

2006

pdf bib
A Bottom-Up Approach to Sentence Ordering for Multi-Document Summarization
Danushka Bollegala | Naoaki Okazaki | Mitsuru Ishizuka
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
Extracting Key Phrases to Disambiguate Personal Name Queries in Web Search
Danushka Bollegala | Yutaka Matsuo | Mitsuru Ishizuka
Proceedings of the Workshop on How Can Computational Linguistics Improve Information Retrieval?

2005

pdf bib
A Machine Learning Approach to Sentence Ordering for Multidocument Summarization and Its Evaluation
Danushka Bollegala | Naoaki Okazaki | Mitsuru Ishizuka
Second International Joint Conference on Natural Language Processing: Full Papers