Markus Zopf


2019

Learning Analogy-Preserving Sentence Embeddings for Answer Selection
Aïssatou Diallo | Markus Zopf | Johannes Fürnkranz
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Answer selection aims at identifying the correct answer for a given question from a set of potentially correct answers. Contrary to previous works, which typically focus on the semantic similarity between a question and its answer, our hypothesis is that question-answer pairs are often in analogical relation to each other. Using analogical inference as our use case, we propose a framework and a neural network architecture for learning dedicated sentence embeddings that preserve analogical properties in the semantic space. We evaluate the proposed method on benchmark datasets for answer selection and demonstrate that our sentence embeddings indeed capture analogical properties better than conventional embeddings, and that analogy-based question answering outperforms a comparable similarity-based technique.
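The analogical view in this abstract can be illustrated with a toy sketch. It assumes sentence embeddings are plain lists of floats and that a question-answer pair is represented by the offset between the answer and question vectors; the function names, the 2-D vectors, and the cosine-based ranking are illustrative assumptions, not the architecture proposed in the paper.

```python
import math

def offset(answer_vec, question_vec):
    """Offset vector from a question embedding to an answer embedding."""
    return [a - q for a, q in zip(answer_vec, question_vec)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_by_analogy(proto_q, proto_a, question, candidates):
    """Rank candidate answers by how closely their offset from the new
    question matches the offset of a known (prototype) question-answer pair.
    Hypothetical helper for illustration only."""
    target = offset(proto_a, proto_q)
    scored = [(cosine(offset(c, question), target), i)
              for i, c in enumerate(candidates)]
    return [i for _, i in sorted(scored, reverse=True)]

# Toy 2-D embeddings (made up for the example).
ranking = rank_by_analogy(
    proto_q=[1.0, 0.0], proto_a=[1.0, 1.0],
    question=[2.0, 0.0],
    candidates=[[3.0, 0.0], [2.0, 1.0], [2.0, -1.0]],
)
print(ranking)
```

In this toy setting the candidate whose offset points in the same direction as the prototype pair's offset is ranked first, regardless of how close it lies to the question itself.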

2018

Auto-hMDS: Automatic Construction of a Large Heterogeneous Multilingual Multi-Document Summarization Corpus
Markus Zopf
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Estimating Summary Quality with Pairwise Preferences
Markus Zopf
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Automatic evaluation systems in the field of automatic summarization have been relying on the availability of gold standard summaries for over ten years. Gold standard summaries are expensive to obtain and often require the availability of domain experts to achieve high quality. In this paper, we propose an alternative evaluation approach based on pairwise preferences of sentences. In comparison to gold standard summaries, they are simpler and cheaper to obtain. In our experiments, we show that humans are able to provide useful feedback in the form of pairwise preferences. The new framework performs better than the three most popular versions of ROUGE with less expensive human input. We also show that our framework can reuse already available evaluation data and achieve even better results.

Which Scores to Predict in Sentence Regression for Text Summarization?
Markus Zopf | Eneldo Loza Mencía | Johannes Fürnkranz
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

The task of automatic text summarization is to generate a short text that summarizes the most important information in a given set of documents. Sentence regression is an emerging branch in automatic text summarization. Its key idea is to estimate the importance of information via learned utility scores for individual sentences. These scores are then used for selecting sentences from the source documents, typically according to a greedy selection strategy. Recently proposed state-of-the-art models learn to predict ROUGE recall scores of individual sentences, which seems reasonable since the final summaries are evaluated according to ROUGE recall. In this paper, we show in extensive experiments that following this intuition leads to suboptimal results and that learning to predict ROUGE precision scores leads to better results. The crucial difference is to aim not at covering as much information as possible but at wasting as little space as possible in every greedy step.
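The difference between recall and precision as sentence-level targets can be seen in a small sketch. It assumes a simplified unigram overlap on whitespace tokens as a stand-in for ROUGE-1 (no stemming or stopword handling); the function name and the example sentences are made up for illustration and are not taken from the paper.

```python
from collections import Counter

def unigram_recall_precision(sentence, reference):
    """Simplified ROUGE-1-style scores of one candidate sentence against
    a reference summary, using clipped unigram counts."""
    cand = Counter(sentence.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped matches
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    return recall, precision

reference = "the cat sat on the mat"
short = "the cat sat"
long_ = "the cat sat on a big red sofa in the house today"

short_scores = unigram_recall_precision(short, reference)
long_scores = unigram_recall_precision(long_, reference)
print(short_scores)
print(long_scores)
```

Here the long sentence has the higher recall because it covers more reference words, while the short sentence has the higher precision because none of its words are wasted. Under a length budget, a greedy selector driven by recall would favor the long sentence and one driven by precision the short one.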

2016

Beyond Centrality and Structural Features: Learning Information Importance for Text Summarization
Markus Zopf | Eneldo Loza Mencía | Johannes Fürnkranz
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning

Sequential Clustering and Contextual Importance Measures for Incremental Update Summarization
Markus Zopf | Eneldo Loza Mencía | Johannes Fürnkranz
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Unexpected events such as accidents, natural disasters and terrorist attacks represent an information situation where it is crucial to give users access to important and non-redundant information as early as possible. Incremental update summarization (IUS) aims at summarizing events which develop over time. In this paper, we propose a combination of sequential clustering and contextual importance measures to identify important sentences in a stream of documents in a timely manner. Sequential clustering is used to cluster similar sentences. The created clusters are scored by a contextual importance measure which identifies important information as well as redundant information. Experiments on the TREC Temporal Summarization 2015 shared task dataset show that our system achieves superior results compared to the best participating systems.
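A single-pass clustering of a sentence stream, in the spirit of the sequential clustering mentioned above, might look like the following sketch. The Jaccard token similarity, the 0.5 threshold, and the use of a cluster's first sentence as its representative are assumptions chosen for brevity; the abstract does not specify these details, and the contextual importance scoring is not shown.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def sequential_clusters(sentences, threshold=0.5):
    """Process sentences in arrival order. Each sentence joins the most
    similar existing cluster if the similarity reaches the threshold,
    otherwise it starts a new cluster."""
    clusters = []  # each cluster: {"tokens": set, "members": [str]}
    for sentence in sentences:
        tokens = set(sentence.lower().split())
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(tokens, cluster["tokens"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(sentence)
        else:
            clusters.append({"tokens": tokens, "members": [sentence]})
    return [c["members"] for c in clusters]

stream = [
    "quake hits city",
    "quake hits the city",
    "rescue teams arrive",
    "quake hits city center",
]
result = sequential_clusters(stream)
print(result)
```

Because every sentence is handled once, in order, the clusters can be updated as new documents arrive, and a large cluster signals information that many sources repeat.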

The Next Step for Multi-Document Summarization: A Heterogeneous Multi-Genre Corpus Built with a Novel Construction Approach
Markus Zopf | Maxime Peyrard | Judith Eckle-Kohler
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Research in multi-document summarization has focused on newswire corpora since the early beginnings. However, the newswire genre provides genre-specific features such as sentence position which are easy to exploit in summarization systems. Such easy to exploit genre-specific features are available in other genres as well. We therefore present the new hMDS corpus for multi-document summarization, which contains heterogeneous source documents from multiple text genres, as well as summaries with different lengths. For the construction of the corpus, we developed a novel construction approach which is suited to build large and heterogeneous summarization corpora with little effort. The method reverses the usual process of writing summaries for given source documents: it combines already available summaries with appropriate source documents. In a detailed analysis, we show that our new corpus is significantly different from the homogeneous corpora commonly used, and that it is heterogeneous along several dimensions. Our experimental evaluation using well-known state-of-the-art summarization systems shows that our corpus poses new challenges in the field of multi-document summarization. Last but not least, we make our corpus publicly available to the research community at the corpus web page https://github.com/AIPHES/hMDS.