Evaluating Summaries Automatically - A system Proposal

Paulo C F de Oliveira, Edson Wilson Torrens, Alexandre Cidral, Sidney Schossland, Evandro Bittencourt


Abstract
We propose in this paper an automatic evaluation procedure based on metrics that can assess summaries without human assistance. Our system comprises two metrics, which are presented and discussed. The first is based on a well-known and powerful statistical test, the χ² goodness-of-fit test, which has been used in several applications. The second is derived from three metrics commonly used to evaluate Natural Language Processing (NLP) systems, namely precision, recall and F-measure. Together, these two metrics are intended to allow one to assess the quality of summaries quickly, cheaply and without human intervention, thus minimizing the role of subjective judgment and bias.
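The abstract names the two metric families but gives no implementation details here. A minimal sketch of both is shown below, under the assumption that they compare word-frequency distributions between a candidate summary and a source or reference text; the function names and this framing are illustrative, not the authors' exact formulation.

```python
from collections import Counter

def chi_square_gof(summary_tokens, source_tokens):
    """Chi-square goodness-of-fit statistic: compares the summary's
    observed word counts against counts expected from the source
    text's word distribution (illustrative formulation)."""
    src_counts = Counter(source_tokens)
    sum_counts = Counter(summary_tokens)
    total_src = sum(src_counts.values())
    n = len(summary_tokens)
    chi2 = 0.0
    for word, observed in sum_counts.items():
        expected = n * src_counts.get(word, 0) / total_src
        if expected > 0:
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def precision_recall_f1(summary_tokens, reference_tokens):
    """Token-overlap precision, recall and F-measure of a candidate
    summary against a reference summary."""
    overlap = sum((Counter(summary_tokens) & Counter(reference_tokens)).values())
    p = overlap / len(summary_tokens) if summary_tokens else 0.0
    r = overlap / len(reference_tokens) if reference_tokens else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f
```

A summary whose word distribution matches the source yields a χ² statistic near zero, while precision/recall/F-measure reward overlap with a human reference; combining the two is the kind of pairing the paper proposes.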
Anthology ID:
L08-1063
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/123_paper.pdf
Cite (ACL):
Paulo C F de Oliveira, Edson Wilson Torrens, Alexandre Cidral, Sidney Schossland, and Evandro Bittencourt. 2008. Evaluating Summaries Automatically - A system Proposal. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
Evaluating Summaries Automatically - A system Proposal (de Oliveira et al., LREC 2008)
PDF:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/123_paper.pdf