Paheli Bhattacharya


2022

pdf bib
Legal Case Document Summarization: Extractive and Abstractive Methods and their Evaluation
Abhay Shukla | Paheli Bhattacharya | Soham Poddar | Rajdeep Mukherjee | Kripabandhu Ghosh | Pawan Goyal | Saptarshi Ghosh
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Summarization of legal case judgement documents is a challenging problem in Legal NLP. However, not much analyses exist on how different families of summarization models (e.g., extractive vs. abstractive) perform when applied to legal case documents. This question is particularly important since many recent transformer-based abstractive summarization models have restrictions on the number of input tokens, and legal documents are known to be very long. Also, it is an open question on how best to evaluate legal case document summarization systems. In this paper, we carry out extensive experiments with several extractive and abstractive summarization methods (both supervised and unsupervised) over three legal summarization datasets that we have developed. Our analyses, that includes evaluation by law practitioners, lead to several interesting insights on legal summarization in specific and long document summarization in general.

2016

pdf bib
Query Translation for Cross-Language Information Retrieval using Multilingual Word Clusters
Paheli Bhattacharya | Pawan Goyal | Sudeshna Sarkar
Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016)

In Cross-Language Information Retrieval, finding the appropriate translation of the source language query has always been a difficult problem to solve. We propose a technique towards solving this problem with the help of multilingual word clusters obtained from multilingual word embeddings. We use word embeddings of the languages projected to a common vector space on which a community-detection algorithm is applied to find clusters such that words that represent the same concept from different languages fall in the same group. We utilize these multilingual word clusters to perform query translation for Cross-Language Information Retrieval for three languages - English, Hindi and Bengali. We have experimented with the FIRE 2012 and Wikipedia datasets and have shown improvements over several standard methods like dictionary-based method, a transliteration-based model and Google Translate.