Jiaxin Pei


2023

pdf bib
When Do Annotator Demographics Matter? Measuring the Influence of Annotator Demographics with the POPQUORN Dataset
Jiaxin Pei | David Jurgens
Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII)

Annotators are not fungible. Their demographics, life experiences, and backgrounds all contribute to how they label data. However, NLP has only recently considered how annotator identity might influence their decisions. Here, we present POPQUORN (the Potato-Prolific dataset for Question-Answering, Offensiveness, text Rewriting and politeness rating with demographic Nuance). POPQUORN contains 45,000 annotations from 1,484 annotators, drawn from a representative sample regarding sex, age, and race as the US population. Through a series of analyses, we show that annotators’ background plays a significant role in their judgments. Further, our work shows that backgrounds not previously considered in NLP (e.g., education), are meaningful and should be considered. Our study suggests that understanding the background of annotators and collecting labels from a demographically balanced pool of crowd workers is important to reduce the bias of datasets. The dataset, annotator background, and annotation interface are available at https://github.com/Jiaxin-Pei/potato-prolific-dataset.

pdf bib
SuperTweetEval: A Challenging, Unified and Heterogeneous Benchmark for Social Media NLP Research
Dimosthenis Antypas | Asahi Ushio | Francesco Barbieri | Leonardo Neves | Kiamehr Rezaee | Luis Espinosa-Anke | Jiaxin Pei | Jose Camacho-Collados
Findings of the Association for Computational Linguistics: EMNLP 2023

Despite its relevance, the maturity of NLP for social media pales in comparison with general-purpose models, metrics and benchmarks. This fragmented landscape makes it hard for the community to know, for instance, given a task, which is the best performing model and how it compares with others. To alleviate this issue, we introduce a unified benchmark for NLP evaluation in social media, SuperTweetEval, which includes a heterogeneous set of tasks and datasets combined, adapted and constructed from scratch. We benchmarked the performance of a wide range of models on SuperTweetEval and our results suggest that, despite the recent advances in language modelling, social media remains challenging.

pdf bib
SemEval-2023 Task 9: Multilingual Tweet Intimacy Analysis
Jiaxin Pei | Vítor Silva | Maarten Bos | Yozen Liu | Leonardo Neves | David Jurgens | Francesco Barbieri
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

Intimacy is an important social aspect of language. Computational modeling of intimacy in language could help many downstream applications like dialogue systems and offensiveness detection. Despite its importance, resources and approaches on modeling textual intimacy remain rare. To address this gap, we introduce MINT, a new Multilingual intimacy analysis dataset covering 13,372 tweets in 10 languages including English, French, Spanish, Italian, Portuguese, Korean, Dutch, Chinese, Hindi, and Arabic along with SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis. Our task attracted 45 participants from around the world. While the participants are able to achieve overall good performance on languages in the training set, zero-shot prediction of intimacy in unseen languages remains challenging. Here we provide an overview of the task, summaries of the common approaches, and potential future directions on modeling intimacy across languages. All the relevant resources are available at https: //sites.google.com/umich.edu/ semeval-2023-tweet-intimacy.

pdf bib
Exploring Linguistic Style Matching in Online Communities: The Role of Social Context and Conversation Dynamics
Aparna Ananthasubramaniam | Hong Chen | Jason Yan | Kenan Alkiek | Jiaxin Pei | Agrima Seth | Lavinia Dunagan | Minje Choi | Benjamin Litterer | David Jurgens
Proceedings of the First Workshop on Social Influence in Conversations (SICon 2023)

Linguistic style matching (LSM) in conversations can be reflective of several aspects of social influence such as power or persuasion. However, how LSM relates to the outcomes of online communication on platforms such as Reddit is an unknown question. In this study, we analyze a large corpus of two-party conversation threads in Reddit where we identify all occurrences of LSM using two types of style: the use of function words and formality. Using this framework, we examine how levels of LSM differ in conversations depending on several social factors within Reddit: post and subreddit features, conversation depth, user tenure, and the controversiality of a comment. Finally, we measure the change of LSM following loss of status after community banning. Our findings reveal the interplay of LSM in Reddit conversations with several community metrics, suggesting the importance of understanding conversation engagement when understanding community dynamics.

pdf bib
Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark
Minje Choi | Jiaxin Pei | Sagar Kumar | Chang Shu | David Jurgens
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) have been shown to perform well at a variety of syntactic, discourse, and reasoning tasks. While LLMs are increasingly deployed in many forms including conversational agents that interact with humans, we lack a grounded benchmark to measure how well LLMs understand social language. Here, we introduce a new theory-driven benchmark, SocKET, that contains 58 NLP tasks testing social knowledge which we group into five categories: humor & sarcasm, offensiveness, sentiment & emotion, and trustworthiness. In tests on the benchmark, we demonstrate that current models attain only moderate performance but reveal significant potential for task transfer among different types and categories of tasks, which were predicted from theory. Through zero-shot evaluations, we show that pretrained models already possess some innate but limited capabilities of social language understanding and training on one category of tasks can improve zero-shot testing on others. Our benchmark provides a systematic way to analyze model performance on an important dimension of language and points to clear room for improvement to build more socially-aware LLMs. The resources are released at https://github.com/minjechoi/SOCKET.

2022

pdf bib
Modeling Information Change in Science Communication with Semantically Matched Paraphrases
Dustin Wright | Jiaxin Pei | David Jurgens | Isabelle Augenstein
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Whether the media faithfully communicate scientific information has long been a core issue to the science community. Automatically identifying paraphrased scientific findings could enable large-scale tracking and analysis of information changes in the science communication process, but this requires systems to understand the similarity between scientific information across multiple domains. To this end, we present the SCIENTIFIC PARAPHRASE AND INFORMATION CHANGE DATASET (SPICED), the first paraphrase dataset of scientific findings annotated for degree of information change. SPICED contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and full texts of original papers. We demonstrate that SPICED poses a challenging task and that models trained on SPICED improve downstream performance on evidence retrieval for fact checking of real-world scientific claims. Finally, we show that models trained on SPICED can reveal large-scale trends in the degrees to which people and organizations faithfully communicate new scientific findings. Data, code, and pre-trained models are available at http://www.copenlu.com/publication/2022_emnlp_wright/.

pdf bib
POTATO: The Portable Text Annotation Tool
Jiaxin Pei | Aparna Ananthasubramaniam | Xingyao Wang | Naitian Zhou | Apostolos Dedeloudis | Jackson Sargent | David Jurgens
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We present POTATO, the Portable text annotation tool, a free, fully open-sourced annotation system that 1) supports labeling many types of text and multimodal data; 2) offers easy-to-configure features to maximize the productivity of both deployers and annotators (convenient templates for common ML/NLP tasks, active learning, keypress shortcuts, keyword highlights, tooltips); and 3) supports a high degree of customization (editable UI, inserting pre-screening questions, attention and qualification tests). Experiments over two annotation tasks suggest that POTATO improves labeling speed through its specially-designed productivity features, especially for long documents and complex tasks. POTATO is available at https://github.com/davidjurgens/potato and will continue to be updated.

2021

pdf bib
Measuring Sentence-Level and Aspect-Level (Un)certainty in Science Communications
Jiaxin Pei | David Jurgens
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Certainty and uncertainty are fundamental to science communication. Hedges have widely been used as proxies for uncertainty. However, certainty is a complex construct, with authors expressing not only the degree but the type and aspects of uncertainty in order to give the reader a certain impression of what is known. Here, we introduce a new study of certainty that models both the level and the aspects of certainty in scientific findings. Using a new dataset of 2167 annotated scientific findings, we demonstrate that hedges alone account for only a partial explanation of certainty. We show that both the overall certainty and individual aspects can be predicted with pre-trained language models, providing a more complete picture of the author’s intended communication. Downstream analyses on 431K scientific findings from news and scientific abstracts demonstrate that modeling sentence-level and aspect-level certainty is meaningful for areas like science communication. Both the model and datasets used in this paper are released at https://blablablab.si.umich.edu/projects/certainty/.

2020

pdf bib
Pre-train and Plug-in: Flexible Conditional Text Generation with Variational Auto-Encoders
Yu Duan | Canwen Xu | Jiaxin Pei | Jialong Han | Chenliang Li
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Conditional Text Generation has drawn much attention as a topic of Natural Language Generation (NLG) which provides the possibility for humans to control the properties of generated contents. Current conditional generation models cannot handle emerging conditions due to their joint end-to-end learning fashion. When a new condition added, these techniques require full retraining. In this paper, we present a new framework named Pre-train and Plug-in Variational Auto-Encoder (PPVAE) towards flexible conditional text generation. PPVAE decouples the text generation module from the condition representation module to allow “one-to-many” conditional generation. When a fresh condition emerges, only a lightweight network needs to be trained and works as a plug-in for PPVAE, which is efficient and desirable for real-world applications. Extensive experiments demonstrate the superiority of PPVAE against the existing alternatives with better conditionality and diversity but less training effort.

pdf bib
MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Answering and Summarization
Canwen Xu | Jiaxin Pei | Hongtao Wu | Yiyu Liu | Chenliang Li
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recently, large-scale datasets have vastly facilitated the development in nearly all domains of Natural Language Processing. However, there is currently no cross-task dataset in NLP, which hinders the development of multi-task learning. We propose MATINF, the first jointly labeled large-scale dataset for classification, question answering and summarization. MATINF contains 1.07 million question-answer pairs with human-labeled categories and user-generated question descriptions. Based on such rich information, MATINF is applicable for three major NLP tasks, including classification, question answering, and summarization. We benchmark existing methods and a novel multi-task baseline over MATINF to inspire further research. Our comprehensive comparison and experiments over MATINF and other datasets demonstrate the merits held by MATINF.

pdf bib
Quantifying Intimacy in Language
Jiaxin Pei | David Jurgens
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Intimacy is a fundamental aspect of how we relate to others in social settings. Language encodes the social information of intimacy through both topics and other more subtle cues (such as linguistic hedging and swearing). Here, we introduce a new computational framework for studying expressions of the intimacy in language with an accompanying dataset and deep learning model for accurately predicting the intimacy level of questions (Pearson r = 0.87). Through analyzing a dataset of 80.5M questions across social media, books, and films, we show that individuals employ interpersonal pragmatic moves in their language to align their intimacy with social settings. Then, in three studies, we further demonstrate how individuals modulate their intimacy to match social norms around gender, social distance, and audience, each validating key findings from studies in social psychology. Our work demonstrates that intimacy is a pervasive and impactful social dimension of language.

2018

pdf bib
S2SPMN: A Simple and Effective Framework for Response Generation with Relevant Information
Jiaxin Pei | Chenliang Li
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

How to generate relevant and informative responses is one of the core topics in response generation area. Following the task formulation of machine translation, previous works mainly consider response generation task as a mapping from a source sentence to a target sentence. To realize this mapping, existing works tend to design intuitive but complex models. However, the relevant information existed in large dialogue corpus is mainly overlooked. In this paper, we propose Sequence to Sequence with Prototype Memory Network (S2SPMN) to exploit the relevant information provided by the large dialogue corpus to enhance response generation. Specifically, we devise two simple approaches in S2SPMN to select the relevant information (named prototypes) from the dialogue corpus. These prototypes are then saved into prototype memory network (PMN). Furthermore, a hierarchical attention mechanism is devised to extract the semantic information from the PMN to assist the response generation process. Empirical studies reveal the advantage of our model over several classical and strong baselines.