Query-based summarization using MDL principle

Marina Litvak, Natalia Vanetik


Abstract
Query-based text summarization aims to extract from the original text the essential information that answers a given query, presenting it in a minimal, often predefined, number of words. In this paper, we introduce a new unsupervised approach to query-based extractive summarization based on the minimum description length (MDL) principle, which employs the Krimp compression algorithm (Vreeken et al., 2011). The key idea of our approach is to select frequent word sets related to a given query that compress document sentences better and therefore describe the document better. A summary is extracted by selecting the sentences that best cover the query-related frequent word sets. The approach is evaluated on the DUC 2005 and DUC 2006 datasets, which are specifically designed for query-based summarization (DUC, 2005, 2006). Its performance is competitive with the best results on these datasets.
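The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general idea, not the authors' implementation. Several choices are assumptions made purely for illustration: the regex tokenizer (no stop-word removal), brute-force mining of frequent word sets of size 2 to 3, treating a word set as "query-related" when it shares at least one word with the query, a simplified Krimp loop (standard candidate and cover orders, Shannon-optimal code lengths, no pruning), and a greedy coverage rule for choosing sentences under a word budget. The function names `tokenize`, `mine_candidates`, `cover`, `total_size`, `cover_order`, `krimp`, and `summarize` are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric words, no stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def mine_candidates(db, min_support, max_len=3):
    # Brute-force frequent word sets of size 2..max_len (fine for short sentences).
    counts = Counter()
    for t in db:
        for k in range(2, max_len + 1):
            counts.update(frozenset(c) for c in combinations(sorted(t), k))
    return {s: c for s, c in counts.items() if c >= min_support}


def cover(t, code_table):
    # Krimp-style cover: use code-table itemsets in order, without overlap.
    remaining, used = set(t), []
    for itemset in code_table:
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
        if not remaining:
            break
    return used


def total_size(db, code_table, st_len):
    # L(CT) + L(D | CT) in bits, with code lengths derived from cover usage.
    usage = Counter(s for t in db for s in cover(t, code_table))
    total = sum(usage.values())
    size = 0.0
    for itemset, u in usage.items():
        code_len = -math.log2(u / total)
        size += u * code_len                                # data encoded with CT
        size += code_len + sum(st_len[i] for i in itemset)  # code-table entry
    return size


def cover_order(itemsets, support):
    # Standard cover order: longer first, then more frequent, then lexicographic.
    return sorted(itemsets, key=lambda s: (-len(s), -support[s], sorted(s)))


def krimp(db, candidates):
    item_freq = Counter(i for t in db for i in t)
    n = sum(item_freq.values())
    st_len = {i: -math.log2(c / n) for i, c in item_freq.items()}  # standard code
    support = {frozenset([i]): c for i, c in item_freq.items()}
    code_table = cover_order(list(support), support)  # singletons only
    support.update(candidates)
    best = total_size(db, code_table, st_len)
    # Candidate order: support desc, then length desc, then lexicographic.
    for cand in sorted(candidates, key=lambda s: (-candidates[s], -len(s), sorted(s))):
        trial = cover_order(code_table + [cand], support)
        size = total_size(db, trial, st_len)
        if size < best:  # keep the candidate only if it shortens the description
            code_table, best = trial, size
    return code_table


def summarize(sentences, query, word_limit, min_support=2):
    db = [frozenset(tokenize(s)) for s in sentences]
    q = set(tokenize(query))
    # Assumption: a word set is query-related if it shares a word with the query.
    cands = {s: c for s, c in mine_candidates(db, min_support).items() if s & q}
    targets = {s for s in krimp(db, cands) if len(s) > 1}
    chosen, covered, words = [], set(), 0
    while True:
        best_i, best_gain = None, 0
        for i, t in enumerate(db):
            n_words = len(tokenize(sentences[i]))
            if i in chosen or words + n_words > word_limit:
                continue
            gain = sum(1 for s in targets - covered if s <= t)
            if gain > best_gain:
                best_i, best_gain = i, gain
        if best_i is None:
            break
        chosen.append(best_i)
        covered |= {s for s in targets if s <= db[best_i]}
        words += len(tokenize(sentences[best_i]))
    return [sentences[i] for i in sorted(chosen)]
```

As a usage example, with the toy sentences "Solar energy storage costs fell", "Solar energy storage improves grids", "Solar energy storage needs batteries", "The football match ended late", and "Fans watched the football match", the query "solar energy storage", and `word_limit=10`, the code table gains the word set {solar, energy, storage} because it shortens the total description length, and `summarize` returns only the first sentence: once that word set is covered, no other sentence adds coverage. The greedy stopping rule (stop when nothing new is covered) is one simple design choice among many.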
Anthology ID: W17-1004
Volume: Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres
Month: April
Year: 2017
Address: Valencia, Spain
Editors: George Giannakopoulos, Elena Lloret, John M. Conroy, Josef Steinberger, Marina Litvak, Peter Rankel, Benoit Favre
Venue: MultiLing
Publisher: Association for Computational Linguistics
Pages: 22–31
URL: https://aclanthology.org/W17-1004
DOI: 10.18653/v1/W17-1004
Cite (ACL): Marina Litvak and Natalia Vanetik. 2017. Query-based summarization using MDL principle. In Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres, pages 22–31, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal): Query-based summarization using MDL principle (Litvak & Vanetik, MultiLing 2017)
PDF: https://aclanthology.org/W17-1004.pdf