Identifying Paraphrases between Technical and Lay Corpora

Louise Deléger, Pierre Zweigenbaum


Abstract
In previous work, we presented a preliminary study to identify paraphrases between technical and lay discourse types from medical corpora dedicated to the French language. In this paper, we test the hypothesis that the same kinds of paraphrases as for French can be detected between English technical and lay discourse types and report the adaptation of our method from French to English. Starting from the constitution of monolingual comparable corpora, we extract two kinds of paraphrases: paraphrases between nominalizations and verbal constructions and paraphrases between neo-classical compounds and modern-language phrases. We do this relying on morphological resources and a set of extraction rules we adapt from the original approach for French. Results show that paraphrases could be identified with a rather good precision, and that these types of paraphrase are relevant in the context of the opposition between technical and lay discourse types. These observations are consistent with the results obtained for French, which demonstrates the portability of the approach as well as the similarity of the two languages as regards the use of those kinds of expressions in technical and lay discourse types.
Anthology ID:
L10-1327
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/472_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Louise Deléger and Pierre Zweigenbaum. 2010. Identifying Paraphrases between Technical and Lay Corpora. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
Identifying Paraphrases between Technical and Lay Corpora (Deléger & Zweigenbaum, LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/472_Paper.pdf