Rakesh Verma


2024

pdf bib
DetectiveReDASers at HSD-2Lang 2024: A New Pooling Strategy with Cross-lingual Augmentation and Ensembling for Hate Speech Detection in Low-resource Languages
Fatima Zahra Qachfar | Bryan Tuck | Rakesh Verma
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

This paper addresses hate speech detection in Turkish and Arabic tweets, contributing to the HSD-2Lang Shared Task. We propose a specialized pooling strategy within a soft-voting ensemble framework to improve classification in Turkish and Arabic language models. Our approach also includes expanding the training sets through cross-lingual translation, introducing a broader spectrum of hate speech examples. Our method attains F1-Macro scores of 0.6964 for Turkish (Subtask A) and 0.7123 for Arabic (Subtask B). While achieving these results, we also consider the computational overhead, striking a balance between the effectiveness of our unique pooling strategy, data augmentation, and soft-voting ensemble. This approach advances the practical application of language models in low-resource languages for hate speech detection.

2023

pdf bib
DetectiveRedasers at ArAIEval Shared Task: Leveraging Transformer Ensembles for Arabic Deception Detection
Bryan Tuck | Fatima Zahra Qachfar | Dainis Boumber | Rakesh Verma
Proceedings of ArabicNLP 2023

This paper outlines a methodology aimed at combating disinformation in Arabic social media, a strategy that secured a first-place finish in tasks 2A and 2B at the ArAIEval shared task during the ArabicNLP 2023 conference. Our team, DetectiveRedasers, developed a hyperparameter-optimized pipeline centered around singular BERT-based models for the Arabic language, enhanced by a soft-voting ensemble strategy. Subsequent evaluation on the test dataset reveals that ensembles, although generally resilient, do not always outperform individual models. The primary contributions of this paper are its multifaceted strategy, which led to winning solutions for both binary (2A) and multiclass (2B) disinformation classification tasks.

pdf bib
ReDASPersuasion at ArAIEval Shared Task: Multilingual and Monolingual Models For Arabic Persuasion Detection
Fatima Zahra Qachfar | Rakesh Verma
Proceedings of ArabicNLP 2023

To enhance persuasion detection, we investigate the use of multilingual systems on Arabic data by conducting a total of 22 experiments using baselines, multilingual, and monolingual language transformers. Our aim is to provide a comprehensive evaluation of the various systems employed throughout this task, with the ultimate goal of comparing their performance and identifying the most effective approach. Our empirical analysis shows that *ReDASPersuasion* system performs best when combined with multilingual “XLM-RoBERTa” and monolingual pre-trained transformers on Arabic dialects like “CAMeLBERT-DA SA” depending on the NLP classification task.

pdf bib
ReDASPersuasion at SemEval-2023 Task 3: Persuasion Detection using Multilingual Transformers and Language Agnostic Features
Fatima Zahra Qachfar | Rakesh Verma
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper describes a multilingual persuasion detection system that incorporates persuasion technique attributes for a multi-label classification task. The proposed method has two advantages. First, it combines persuasion features with a sequence classification transformer to classify persuasion techniques. Second, it is a language agnostic approach that supports a total of 100 languages, guaranteed by the multilingual transformer module and the Google translator interface. We found that our persuasion system outperformed the SemEval baseline in all languages except zero shot prediction languages, which did not constitute the main focus of our research. With the highest F1-Micro score of 0.45, Italian achieved the eighth position on the leaderboard.

2021

pdf bib
Claim Verification Using a Multi-GAN Based Model
Amartya Hatua | Arjun Mukherjee | Rakesh Verma
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

This article describes research on claim verification carried out using a multiple GAN-based model. The proposed model consists of three pairs of generators and discriminators. The generator and discriminator pairs are responsible for generating synthetic data for supported and refuted claims and claim labels. A theoretical discussion about the proposed model is provided to validate the equilibrium state of the model. The proposed model is applied to the FEVER dataset, and a pre-trained language model is used for the input text data. The synthetically generated data helps to gain information that improves classification performance over state of the art baselines. The respective F1 scores after applying the proposed method on FEVER 1.0 and FEVER 2.0 datasets are 0.65+-0.018 and 0.65+-0.051.

2017

pdf bib
ICE: Idiom and Collocation Extractor for Research and Education
Vasanthi Vuppuluri | Shahryar Baki | An Nguyen | Rakesh Verma
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

Collocation and idiom extraction are well-known challenges with many potential applications in Natural Language Processing (NLP). Our experimental, open-source software system, called ICE, is a python package for flexibly extracting collocations and idioms, currently in English. It also has a competitive POS tagger that can be used alone or as part of collocation/idiom extraction. ICE is available free of cost for research and educational uses in two user-friendly formats. This paper gives an overview of ICE and its performance, and briefly describes the research underlying the extraction algorithms.

2016

pdf bib
University of Houston at CL-SciSumm 2016: SVMs with tree kernels and Sentence Similarity
Luis Moraes | Shahryar Baki | Rakesh Verma | Daniel Lee
Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL)

2015

pdf bib
A New Approach for Idiom Identification Using Meanings and the Web
Rakesh Verma | Vasanthi Vuppuluri
Proceedings of the International Conference Recent Advances in Natural Language Processing