Paraphrase Identification (State of the art)


Revision as of 13:46, 24 March 2009

  • Microsoft Research Paraphrase Corpus (MSRP)
  • see Dolan, Quirk, and Brockett (2004)
  • train: 4076 sentence pairs (2753 positive: 67.5%)
  • test: 1725 sentence pairs (1147 positive: 66.5%)
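The class balance quoted above can be checked directly from the pair counts. The following is a minimal sketch; the helper name `positive_rate` is made up for illustration and is not part of any MSRP tooling.

```python
def positive_rate(positive: int, total: int) -> float:
    """Return the share of positive (paraphrase) pairs as a percentage."""
    return 100.0 * positive / total


# Counts taken from the corpus description above.
train_rate = positive_rate(2753, 4076)
test_rate = positive_rate(1147, 1725)

print(f"train: {train_rate:.1f}% positive")
print(f"test: {test_rate:.1f}% positive")
```

Because roughly two thirds of the pairs are positive in both splits, accuracy figures on this corpus are best read against that skew.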


Sample data

  • Sentence 1: Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.
  • Sentence 2: Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.
  • Class: 1 (true paraphrase)


Table of results

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! Algorithm
! Reference
! Type
! Accuracy
! F-measure
|-
| MCS
| Mihalcea et al. (2006)
| combination of several word similarity measures
| 70.3%
| 81.3%
|-
|}
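For reading the Accuracy and F columns, the sketch below computes both metrics from confusion counts. It assumes that F is the balanced F-measure (harmonic mean of precision and recall) on the positive class, which this page does not state explicitly. The "label everything as a paraphrase" baseline is an illustration derived from the test-set counts above, not a result reported in the table.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all pairs that were labelled correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure on the positive class (assumed definition)."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative baseline: predict "paraphrase" for every test pair.
# The test set has 1725 pairs, of which 1147 are positive.
tp, fn = 1147, 0
fp, tn = 1725 - 1147, 0

baseline_accuracy = accuracy(tp, fp, tn, fn)
baseline_f = f_measure(tp, fp, fn)

print(f"baseline accuracy: {100 * baseline_accuracy:.1f}%")
print(f"baseline F: {100 * baseline_f:.1f}%")
```

Under these assumptions the trivial baseline already reaches about 66.5% accuracy and about 79.9% F, which gives a reference point for the figures in the table.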

References

Dolan, B., Quirk, C., and Brockett, C. (2004). [http://acl.ldc.upenn.edu/C/C04/C04-1051.pdf Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources], Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), Geneva, Switzerland, pp. 350-356.

Mihalcea, R., Corley, C., and Strapparava, C. (2006). Corpus-based and knowledge-based measures of text semantic similarity, Proceedings of the National Conference on Artificial Intelligence (AAAI 2006), Boston, Massachusetts, pp. 775-780.


See also