Difference between revisions of "Paraphrase Identification (State of the art)"

From ACL Wiki

Revision as of 15:03, 24 March 2009

  • source: Microsoft Research Paraphrase Corpus (MSRP)
  • task: given a pair of sentences, classify them as paraphrases or not paraphrases
  • see: Dolan et al. (2004)
  • train: 4,076 sentence pairs (2,753 positive: 67.5%)
  • test: 1,725 sentence pairs (1,147 positive: 66.5%)


Sample data

  • Sentence 1: Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.
  • Sentence 2: Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.
  • Class: 1 (true paraphrase)
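
To make the task concrete, here is a minimal sketch of a pair classifier applied to the sample above. It is only an illustration and is not any of the systems listed on this page: the word-overlap (Jaccard) score, the helper names `tokens`, `jaccard` and `is_paraphrase`, and the 0.5 threshold are all assumptions chosen for the example.

```python
import re


def tokens(sentence):
    """Lowercase a sentence and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(s1, s2):
    """Word-overlap score: shared word types divided by all word types."""
    a, b = tokens(s1), tokens(s2)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_paraphrase(s1, s2, threshold=0.5):
    """Return 1 (paraphrase) or 0 (not paraphrase).

    The threshold is a hypothetical value, not one tuned on MSRP.
    """
    return 1 if jaccard(s1, s2) >= threshold else 0


s1 = ('Amrozi accused his brother, whom he called "the witness", '
      'of deliberately distorting his evidence.')
s2 = ('Referring to him as only "the witness", Amrozi accused his '
      'brother of deliberately distorting his evidence.')

score = jaccard(s1, s2)   # 10 shared word types out of 18 in total
label = is_paraphrase(s1, s2)
print(f"overlap = {score:.3f}, class = {label}")
```

On this pair the two sentences share 10 of 18 distinct words, so the sketch assigns class 1. A pure overlap score like this ignores word order and meaning, which is one reason the approaches in the table below rely on semantic similarity measures or syntactic features instead.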


Table of results

| Algorithm | Reference | Description | Accuracy | F |
|---|---|---|---|---|
| RMLMG | Rus et al. (2008) | unsupervised graph subsumption | 70.6% | 80.5% |
| MCS | Mihalcea et al. (2006) | unsupervised combination of several word similarity measures | 70.3% | 81.3% |
| STS | Islam and Inkpen (2007) | unsupervised combination of semantic and string similarity | 72.6% | 81.3% |
| QKC | Qiu et al. (2006) | supervised sentence dissimilarity classification | 72.0% | 81.6% |
| matrixJcn | Fernando and Stevenson (2008) | unsupervised JCN WordNet similarity with matrix | 74.1% | 82.4% |
| WDDP | Wan et al. (2006) | supervised dependency-based features | 75.6% | 83.0% |
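
For context when reading the Accuracy and F columns, the sketch below computes both measures from confusion counts and applies them to a trivial classifier that labels every test pair as a paraphrase. It assumes that F is the balanced F-measure (harmonic mean of precision and recall) on the paraphrase class, which this page does not state explicitly. The function name `accuracy_and_f` is hypothetical; the counts come from the test split described above.

```python
def accuracy_and_f(tp, fp, fn, tn):
    """Return (accuracy, F) from confusion counts for the positive class.

    F is assumed here to be the balanced F-measure:
    2 * precision * recall / (precision + recall).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Test split: 1,725 sentence pairs, of which 1,147 are positive.
test_pairs = 1725
test_positive = 1147

# A classifier that always answers "paraphrase" gets every positive pair
# right (true positives) and every negative pair wrong (false positives).
baseline_acc, baseline_f = accuracy_and_f(
    tp=test_positive,
    fp=test_pairs - test_positive,
    fn=0,
    tn=0,
)
print(f"accuracy = {baseline_acc:.1%}, F = {baseline_f:.1%}")
# prints "accuracy = 66.5%, F = 79.9%"
```

Under that assumption, the all-positive baseline already reaches about 66.5% accuracy and 79.9% F on the test set, so the reported F values in particular should be read against a high floor.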

References

Dolan, B., Quirk, C., and Brockett, C. (2004). Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources, Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), Geneva, Switzerland, pp. 350-356.

Fernando, S., and Stevenson, M. (2008). A semantic similarity approach to paraphrase detection, Computational Linguistics UK (CLUK 2008) 11th Annual Research Colloquium.

Islam, A., and Inkpen, D. (2007). Semantic similarity of short texts, Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2007), Borovets, Bulgaria, pp. 291-297.

Mihalcea, R., Corley, C., and Strapparava, C. (2006). Corpus-based and knowledge-based measures of text semantic similarity, Proceedings of the National Conference on Artificial Intelligence (AAAI 2006), Boston, Massachusetts, pp. 775-780.

Qiu, L., Kan, M.Y., and Chua, T.S. (2006). Paraphrase recognition via dissimilarity significance classification, Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006), pp. 18-26.

Rus, V., McCarthy, P.M., Lintean, M.C., McNamara, D.S., and Graesser, A.C. (2008). Paraphrase identification with lexico-syntactic graph subsumption, FLAIRS 2008, pp. 201-206.

Wan, S., Dras, M., Dale, R., and Paris, C. (2006). Using dependency-based features to take the "para-farce" out of paraphrase, Proceedings of the Australasian Language Technology Workshop (ALTW 2006), pp. 131-138.


See also