Paraphrase Identification (State of the art)
Revision as of 13:46, 24 March 2009
- source: Microsoft Research Paraphrase Corpus (MSRP)
- task: given a pair of sentences, classify them as paraphrases or non-paraphrases
- see: Dolan et al. (2004)
- train: 4,076 sentence pairs (2,753 positive: 67.5%)
- test: 1,725 sentence pairs (1,147 positive: 66.5%)
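About two thirds of the test pairs are positive, so a trivial classifier that labels every pair a paraphrase is already a strong reference point for the Accuracy and F figures in the results table. The sketch below computes both metrics from confusion-matrix counts. It assumes, as is usual for MSRP, that F means F1 on the positive (paraphrase) class. The numbers it prints are a worked illustration from the test-set counts above, not results reported by any of the cited papers.

```python
def accuracy_and_f1(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (accuracy, F1 on the positive class) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # F1 = 2PR / (P + R), written directly in terms of counts.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, f1


# Test split figures from the dataset description: 1,725 pairs, 1,147 positive.
TEST_PAIRS = 1725
TEST_POSITIVE = 1147

# An "all paraphrase" classifier: every positive pair is a true positive,
# every negative pair is a false positive, and nothing is predicted negative.
acc, f1 = accuracy_and_f1(tp=TEST_POSITIVE, fp=TEST_PAIRS - TEST_POSITIVE, fn=0, tn=0)
print(f"accuracy = {acc:.1%}, F1 = {f1:.1%}")  # accuracy = 66.5%, F1 = 79.9%
```

Under that assumption the all-positive baseline reaches about 66.5% accuracy and 79.9% F1, so the F-measure gaps between the systems in the table are smaller than the accuracy gaps suggest.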
Sample data
- Sentence 1: Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.
- Sentence 2: Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.
- Class: 1 (true paraphrase)
Table of results
Algorithm | Reference | Description | Accuracy | F-measure |
---|---|---|---|---|
RMLMG | Rus et al. (2008) | unsupervised lexico-syntactic graph subsumption | 70.6% | 80.5% |
MCS | Mihalcea et al. (2006) | unsupervised combination of several word similarity measures | 70.3% | 81.3% |
QKC | Qiu et al. (2006) | supervised sentence dissimilarity classification | 72.0% | 81.6% |
matrixJcn | Fernando and Stevenson (2008) | unsupervised matrix similarity using JCN WordNet word similarity | 74.1% | 82.4% |
WDDP | Wan et al. (2006) | supervised dependency-based features | 75.6% | 83.0% |
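The matrixJcn entry scores a sentence pair with a word-similarity matrix instead of exact word overlap. A minimal sketch of that idea, assuming the form sim(a, b) = a W b^T / (|a| |b|), where a and b are binary word-presence vectors and W[i][j] is the similarity of words i and j. The JCN WordNet measure is replaced by a hypothetical stand-in (`word_sim` with a hand-written `RELATED` table), and the `threshold` argument of the hypothetical `is_paraphrase` helper is an assumption too; this is not Fernando and Stevenson's implementation.

```python
import math
import re

# Hypothetical stand-in for a WordNet measure such as JCN. Real JCN scores
# need WordNet plus information-content counts; this toy version only knows
# exact matches and one hand-written related pair (illustrative value).
RELATED = {frozenset({"called", "referring"}): 0.5}


def word_sim(w1: str, w2: str) -> float:
    if w1 == w2:
        return 1.0
    return RELATED.get(frozenset({w1, w2}), 0.0)


def tokenize(sentence: str) -> list[str]:
    return re.findall(r"[a-z]+", sentence.lower())


def matrix_similarity(s1: str, s2: str) -> float:
    """sim(a, b) = a W b^T / (|a| |b|) over the pair's joint vocabulary.

    With off-diagonal entries in W the score is not guaranteed to stay in [0, 1].
    """
    words1, words2 = set(tokenize(s1)), set(tokenize(s2))
    vocab = sorted(words1 | words2)
    a = [1.0 if w in words1 else 0.0 for w in vocab]
    b = [1.0 if w in words2 else 0.0 for w in vocab]
    # a W b^T, where W[i][j] = word_sim(vocab[i], vocab[j])
    numerator = sum(
        a[i] * word_sim(vocab[i], vocab[j]) * b[j]
        for i in range(len(vocab))
        for j in range(len(vocab))
    )
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return numerator / norm if norm else 0.0


def is_paraphrase(s1: str, s2: str, threshold: float) -> bool:
    # Unsupervised decision: compare the score against a (hypothetical) threshold.
    return matrix_similarity(s1, s2) >= threshold


S1 = 'Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.'
S2 = 'Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.'
print(round(matrix_similarity(S1, S2), 3))  # 0.752
```

On the sample pair, exact matches alone give the plain cosine (about 0.716); the single off-diagonal entry for called/referring lifts the score to about 0.752. That is the motivation for the matrix: related but non-identical words still contribute to the pair's similarity.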
References
Dolan, B., Quirk, C., and Brockett, C. (2004). Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources, Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), Geneva, Switzerland, pp. 350-356.
Fernando, S., and Stevenson, M. (2008). A semantic similarity approach to paraphrase detection, Computational Linguistics UK (CLUK 2008) 11th Annual Research Colloquium.
Mihalcea, R., Corley, C., and Strapparava, C. (2006). Corpus-based and knowledge-based measures of text semantic similarity, Proceedings of the National Conference on Artificial Intelligence (AAAI 2006), Boston, Massachusetts, pp. 775-780.
Qiu, L., Kan, M.Y., and Chua, T.S. (2006). Paraphrase recognition via dissimilarity significance classification, Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006), pp. 18-26.
Rus, V., McCarthy, P.M., Lintean, M.C., McNamara, D.S., and Graesser, A.C. (2008). Paraphrase identification with lexico-syntactic graph subsumption, FLAIRS 2008, pp. 201-206.
Wan, S., Dras, M., Dale, R., and Paris, C. (2006). Using dependency-based features to take the "para-farce" out of paraphrase, Proceedings of the Australasian Language Technology Workshop (ALTW 2006), pp. 131-138.