MSF2 The Portuguese/Spanish corpus of Multi-Sentence Fusion (Repository)



  • ADCR ID: ADCR2020T001
  • Name of Dataset: MSF2 Corpus
  • Citation: If you use the MSF2 corpus in your research, please include the following citation in any resulting papers:
Elvys Linhares Pontes, Juan-Manuel Torres-Moreno, Stéphane Huet, Andréa Linhares. A New Annotated Portuguese/Spanish Corpus for the Multi-Sentence Compression Task. Proceedings of the 11th edition of the Language Resources and Evaluation Conference, May 2018, Miyazaki, Japan.
Preprint versions: HAL (hal-01722130) and arXiv.
  • Description: The MSF2 corpus consists of three directories:
      • src: sentence clusters in raw and tokenized formats;
      • ref: manual compressions to be used for ROUGE/BLEU automatic evaluation;
      • pos: tokenized and part-of-speech tagged sentences (tagged with TreeTagger).
    For more information, please see the documentation file included in the package.
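As a quick sanity check after unpacking the package, the three-directory layout described above can be verified with a short script. This is only a minimal sketch: the function name `summarize_corpus` is hypothetical, and it assumes nothing about file names or formats inside src, ref and pos (those are described in the documentation file shipped with the corpus). It simply confirms that the three directories exist and lists the files found in each.

```python
import sys
from pathlib import Path

# The three top-level directories named in the corpus description.
EXPECTED_DIRS = ("src", "ref", "pos")


def summarize_corpus(root):
    """Map each expected directory to a sorted list of the files it contains.

    Paths are relative to the directory itself. File names and formats are
    not assumed here; see the documentation file shipped with the corpus.
    """
    root = Path(root)
    summary = {}
    for name in EXPECTED_DIRS:
        directory = root / name
        if not directory.is_dir():
            raise FileNotFoundError(f"missing directory: {directory}")
        summary[name] = sorted(
            path.relative_to(directory).as_posix()
            for path in directory.rglob("*")
            if path.is_file()
        )
    return summary


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_msf2.py /path/to/unpacked/corpus
    for name, files in summarize_corpus(sys.argv[1]).items():
        print(f"{name}: {len(files)} file(s)")
```

Raising an error on a missing directory, rather than silently returning an empty list, makes an incomplete download or a wrong root path obvious before any evaluation is run.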