Paraphrase Identification (State of the art)
Revision as of 12:32, 24 March 2009
- [http://research.microsoft.com/en-us/downloads/607D14D9-20CD-47E3-85BC-A2F65CD28042/default.aspx Microsoft Research Paraphrase Corpus] (MSRP)
- see Dolan, Quirk, and Brockett (2004)
- train: 4076 sentence pairs (2753 positive: 67.5%)
- test: 1725 sentence pairs (1147 positive: 66.5%)
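The percentages in the list above follow directly from the counts. The short Python sketch below recomputes them and also derives the accuracy a trivial classifier would reach by always predicting the majority class. The names `SPLITS`, `positive_rate`, and `majority_baseline_accuracy` are illustrative and are not part of the corpus distribution.

```python
# Counts copied from the corpus statistics listed above.
SPLITS = {
    "train": {"pairs": 4076, "positive": 2753},
    "test": {"pairs": 1725, "positive": 1147},
}


def positive_rate(pairs: int, positive: int) -> float:
    """Fraction of sentence pairs labelled class 1 (true paraphrase)."""
    return positive / pairs


def majority_baseline_accuracy(pairs: int, positive: int) -> float:
    """Accuracy obtained by always predicting the more frequent class."""
    rate = positive_rate(pairs, positive)
    return max(rate, 1.0 - rate)


for name, counts in SPLITS.items():
    print(
        f"{name}: {positive_rate(**counts):.1%} positive, "
        f"majority baseline {majority_baseline_accuracy(**counts):.1%}"
    )
```

Because positives are the majority in both splits, the baseline equals the positive rate here, so any system reported on the test split should be read against roughly 66.5% accuracy.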
Sample data
- Sentence 1: Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.
- Sentence 2: Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.
- Class: 1 (true paraphrase)
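One way to hold a labelled pair such as the sample above in code is sketched below. The three-column tab-separated layout (class, sentence 1, sentence 2) is an assumption made for illustration only; the files distributed with the corpus may use different or additional columns. `SentencePair` and `parse_pair` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SentencePair:
    label: int       # 1 = true paraphrase, 0 = not a paraphrase
    sentence1: str
    sentence2: str

    @property
    def is_paraphrase(self) -> bool:
        return self.label == 1


def parse_pair(line: str) -> SentencePair:
    """Parse one record in the assumed layout: class<TAB>sentence 1<TAB>sentence 2."""
    label, sentence1, sentence2 = line.rstrip("\n").split("\t")
    return SentencePair(int(label), sentence1, sentence2)


# The sample pair from this page, written in the assumed layout.
SAMPLE_LINE = (
    '1\t'
    'Amrozi accused his brother, whom he called "the witness", '
    'of deliberately distorting his evidence.\t'
    'Referring to him as only "the witness", Amrozi accused his brother '
    'of deliberately distorting his evidence.'
)

pair = parse_pair(SAMPLE_LINE)
print(pair.is_paraphrase, pair.sentence1)
```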
References
Dolan, B., Quirk, C., and Brockett, C. (2004). [http://acl.ldc.upenn.edu/C/C04/C04-1051.pdf Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources]. Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), Geneva, Switzerland, pp. 350-356.