Sudeep Kumar Sahoo

2020

Team Solomon at SemEval-2020 Task 4: Be Reasonable: Exploiting Large-scale Language Models for Commonsense Reasoning
Vertika Srivastava | Sudeep Kumar Sahoo | Yeon Hyang Kim | Rohit R.R | Mayank Raj | Ajay Jaiswal
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In this paper, we present our submission for SemEval 2020 Task 4 - Commonsense Validation and Explanation (ComVE). The objective of this task was to develop a system that can differentiate statements that make sense from those that don’t. ComVE comprises three subtasks that challenge and test a system’s capability in understanding commonsense knowledge from various dimensions. Commonsense reasoning is a challenging task in the domain of natural language understanding, and systems augmented with it can improve performance in various other tasks such as reading comprehension and inference. We have developed a system that leverages commonsense knowledge from language models pretrained on huge corpora, such as RoBERTa and GPT2. Our proposed system validates the reasonability of a given statement against the backdrop of commonsense knowledge acquired by these models and generates a logical reason to support its decision. Our system ranked 2nd in subtask C, with a BLEU score of 19.3; this is by far the most challenging subtask, as it required systems to generate the rationale behind the choice of an unreasonable statement. In subtasks A and B, we achieved 96% and 94% accuracy respectively, placing 4th in both subtasks.

This paper describes the details of our system (Solomon) and the results of our participation in SemEval 2020 Task 11, “Detection of Propaganda Techniques in News Articles”. We participated in the “Technique Classification” (TC) task, a multi-class classification task. To address the TC task, we fine-tuned a RoBERTa-based transformer architecture on the propaganda dataset. The predictions of RoBERTa were further fine-tuned by class-dependent-minority-class classifiers. A special classifier, which employs a dynamically adapted Least Common Sub-sequence algorithm, is used to adapt to the intricacies of the repetition class. Compared to the other participating systems, our submission is ranked 4th on the leaderboard.

2019

Vernon-fenwick at SemEval-2019 Task 4: Hyperpartisan News Detection using Lexical and Semantic Features
Vertika Srivastava | Ankita Gupta | Divya Prakash | Sudeep Kumar Sahoo | Rohit R.R | Yeon Hyang Kim
Proceedings of the 13th International Workshop on Semantic Evaluation

In this paper, we present our submission for SemEval-2019 Task 4: Hyperpartisan News Detection. Hyperpartisan news articles are sharply polarized and extremely biased (one-sided). They show blind beliefs, opinions and unreasonable adherence to a party, idea, faction or person. Through this task, we aim to develop an automated system that can be used to detect hyperpartisan news and serve as a prescreening technique for fake news detection. The proposed system jointly uses a rich set of handcrafted textual and semantic features. Our system ranked 2nd on the primary metric (82.0% accuracy) and 1st on the secondary metric (82.1% F1-score) among all participating teams. Comparison with the best-performing system on the leaderboard shows that our system trails by only 0.2% in absolute accuracy.

SolomonLab at SemEval-2019 Task 8: Question Factuality and Answer Veracity Prediction in Community Forums
Ankita Gupta | Sudeep Kumar Sahoo | Divya Prakash | Rohit R.R | Vertika Srivastava | Yeon Hyang Kim
Proceedings of the 13th International Workshop on Semantic Evaluation

We describe our system for SemEval-2019 Task 8 on “Fact-Checking in Community Question Answering Forums (cQA)”. cQA forums are very prevalent nowadays, as they provide an effective means for communities to share knowledge. Unfortunately, this shared information is not always factual or fact-verified. In this task, we aim to identify factual questions posted on cQA forums and verify the veracity of answers to these questions. Our approach relies on data augmentation and aggregates cues from several dimensions such as semantics, linguistics, syntax, writing style and evidence obtained from trusted external sources. In subtask A, our submission is ranked 3rd, with an accuracy of 83.14%. Our current best solution stands 1st on the leaderboard with 88% accuracy. In subtask B, our present solution is ranked 2nd, with a MAP score of 58.33%.

Co-authors

Mayank Raj 2

Ajay Jaiswal 2

Venues

semeval 4