Paraphrase Identification (State of the art)

From ACL Wiki
Revision as of 12:32, 24 March 2009 by Pdturney (talk | contribs)
  • Microsoft Research Paraphrase Corpus (MSRP)
  • see Dolan, Quirk, and Brockett (2004)
  • train: 4076 sentence pairs (2753 positive: 67.5%)
  • test: 1725 sentence pairs (1147 positive: 66.5%)
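
The percentages above follow directly from the counts. As a minimal sketch, they can be recomputed along with the accuracy of a trivial classifier that labels every pair as a paraphrase. The names SPLITS, positive_rate, and majority_baseline_accuracy are hypothetical and used only for illustration; the majority-class baseline is not a result reported on this page, only a consequence of the class proportions.

```python
# Counts as listed above for the MSRP train and test splits.
SPLITS = {
    "train": {"pairs": 4076, "positive": 2753},
    "test": {"pairs": 1725, "positive": 1147},
}


def positive_rate(split):
    """Fraction of sentence pairs labelled as true paraphrases."""
    return split["positive"] / split["pairs"]


def majority_baseline_accuracy(split):
    """Accuracy of always predicting the more frequent class."""
    p = positive_rate(split)
    return max(p, 1.0 - p)


if __name__ == "__main__":
    for name, split in SPLITS.items():
        print(
            f"{name}: {positive_rate(split):.1%} positive, "
            f"majority-class baseline accuracy {majority_baseline_accuracy(split):.1%}"
        )
```

Because roughly two thirds of the pairs in each split are positive, any reported accuracy on the test set is best read against this baseline of about 66.5%.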


Sample data

  • Sentence 1: Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.
  • Sentence 2: Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.
  • Class: 1 (true paraphrase)


References

Dolan, B., Quirk, C., and Brockett, C. (2004). [http://acl.ldc.upenn.edu/C/C04/C04-1051.pdf Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources], Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), Geneva, Switzerland, pp. 350-356.


See also