Paraphrase Identification (State of the art)
Jump to navigation
Jump to search
The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.
- source: Microsoft Research Paraphrase Corpus (MSRP)
- task: given a pair of sentences, classify them as paraphrases or not paraphrases
- see: Dolan et al. (2004)
- train: 4,076 sentence pairs (2,753 positive: 67.5%)
- test: 1,725 sentence pairs (1,147 positive: 66.5%)
Sample data
- Sentence 1: Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.
- Sentence 2: Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.
- Class: 1 (true paraphrase)
Table of results
Algorithm | Reference | Description | Accuracy | F |
---|---|---|---|---|
MCS | Mihalcea et al. (2006) | unsupervised combination of several word similarity measures | 70.3% | 81.3% |
WDDP | Wan et al. (2006) | supervised dependency-based features | 75.6% | 83.0% |
References
Dolan, B., Quirk, C., and Brockett, C. (2004). Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources, Proceedings of the 20th international conference on Computational Linguistics (COLING 2004), Geneva, Switzerland, pp. 350-356.
Mihalcea, R., Corley, C., and Strapparava, C. (2006). Corpus-based and knowledge-based measures of text semantic similarity, Proceedings of the National Conference on Artificial Intelligence (AAAI 2006), Boston, Massachusetts, pp. 775-780.
Wan, S., Dras, M., Dale, R., and Paris, C. (2006). Using dependency-based features to take the "para-farce" out of paraphrase, Proceedings of the Australasian Language Technology Workshop (ALTW 2006), pp. 131-138.