Novelty Goes Deep. A Deep Neural Solution To Document Level Novelty Detection

Tirthankar Ghosal, Vignesh Edithal, Asif Ekbal, Pushpak Bhattacharyya, George Tsatsaronis, Srinivasa Satya Sameer Kumar Chivukula


Abstract
The rapid growth of documents across the web has necessitated finding means of discarding redundant documents and retaining novel ones. Capturing redundancy is challenging as it may involve investigating at a deep semantic level. Techniques for detecting such semantic redundancy at the document level are scarce. In this work we propose a deep Convolutional Neural Network (CNN) based model to classify a document as novel or redundant with respect to a set of relevant documents already seen by the system. The system is simple and does not require any manual feature engineering. Our novel scheme encodes relevant and relative information from both source and target texts to generate an intermediate representation, which we call the Relative Document Vector (RDV). The proposed method outperforms the existing state-of-the-art on a document-level novelty detection dataset by a margin of ∼5% in terms of accuracy. We further demonstrate the effectiveness of our approach on a standard paraphrase detection dataset, where paraphrased passages closely resemble semantically redundant documents.
Anthology ID:
C18-1237
Volume:
Proceedings of the 27th International Conference on Computational Linguistics
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Emily M. Bender, Leon Derczynski, Pierre Isabelle
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
2802–2813
URL:
https://aclanthology.org/C18-1237
Cite (ACL):
Tirthankar Ghosal, Vignesh Edithal, Asif Ekbal, Pushpak Bhattacharyya, George Tsatsaronis, and Srinivasa Satya Sameer Kumar Chivukula. 2018. Novelty Goes Deep. A Deep Neural Solution To Document Level Novelty Detection. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2802–2813, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Novelty Goes Deep. A Deep Neural Solution To Document Level Novelty Detection (Ghosal et al., COLING 2018)
PDF:
https://aclanthology.org/C18-1237.pdf
Code:
edithal-14/A-Deep-Neural-Solution-To-Document-Level-Novelty-Detection-COLING-2018-
Data:
SNLI