Dhouha Bouamor


2016

Transfer-Based Learning-to-Rank Assessment of Medical Term Technicality
Dhouha Bouamor | Leonardo Campillos Llanos | Anne-Laure Ligozat | Sophie Rosset | Pierre Zweigenbaum
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

While measuring the readability of texts has been a long-standing research topic, assessing the technicality of terms has only been addressed more recently and mostly for the English language. In this paper, we train a learning-to-rank model to determine a specialization degree for each term found in a given list. Since no training data for this task exist for French, we train our system with non-lexical features on English data, namely the Consumer Health Vocabulary, and then apply it to French. The features include the likelihood ratio of the term based on specialized and lay language models, and tests for whether the term contains morphologically complex words. We evaluate this approach on 134 terms from the UMLS Metathesaurus and 868 terms from the Eugloss thesaurus. The Normalized Discounted Cumulative Gain (NDCG) obtained by our system is over 0.8 on both test sets. Moreover, thanks to the learning-to-rank approach, adding morphological features to the language model features improves the results on the Eugloss thesaurus.
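NDCG rewards a ranking that places the more technical terms near the top of the list. As a minimal sketch, assuming graded relevance labels (hypothetical technicality levels, higher meaning more specialized) and the common linear-gain variant rel_i / log2(i + 1) (the abstract does not say which gain or discount the authors used), NDCG can be computed as:

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain: sum of rel_i / log2(i + 1), ranks starting at 1."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(ranked_relevances):
    """DCG of the system's ranking divided by the DCG of the ideal (descending) ranking."""
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0


# Hypothetical technicality levels of five terms, listed in the order the system ranked them.
system_ranking = [3, 2, 3, 1, 0]
print(ndcg(system_ranking))
```

A perfect ranking scores 1.0, and swapping a highly technical term down the list lowers the score because its gain is divided by a larger logarithmic discount.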

Managing Linguistic and Terminological Variation in a Medical Dialogue System
Leonardo Campillos Llanos | Dhouha Bouamor | Pierre Zweigenbaum | Sophie Rosset
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We introduce a dialogue task between a virtual patient and a doctor in which the dialogue system, playing the patient in a simulated consultation, must reconcile a specialized level, to understand what the doctor says, and a lay level, to output realistic patient-language utterances. This makes both the analysis and the generation phases of the dialogue more challenging. This paper proposes methods to manage linguistic and terminological variation in that situation and illustrates how they help produce realistic dialogues. Our system makes use of lexical resources for processing synonyms, inflectional and derivational variants, and pronoun/verb agreement. In addition, specialized knowledge is used for processing medical roots and affixes, ontological relations and concept mapping, and for generating lay variants of terms that match the patient’s non-expert discourse. We also report the results of a first evaluation carried out by 11 users interacting with the system. We evaluated the non-contextual analysis module, which supports the Spoken Language Understanding step. The annotation of task domain entities obtained 91.8% precision, 82.5% recall, 86.9% F-measure, a 19.0% slot error rate, and a 32.9% sentence error rate.
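The reported scores are consistent with the balanced F-measure, F = 2PR / (P + R). The Python sketch below checks this and also shows the usual slot error rate definition (substitutions + deletions + insertions, divided by the number of slots in the reference annotation). The paper's raw error counts are not given here, so the slot error rate inputs are hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def slot_error_rate(substitutions: int, deletions: int, insertions: int, reference_slots: int) -> float:
    """All slot errors divided by the number of slots in the reference annotation."""
    return (substitutions + deletions + insertions) / reference_slots


print(round(f_measure(0.918, 0.825), 3))  # 0.869, matching the reported 86.9%
print(slot_error_rate(1, 1, 0, 10))  # hypothetical counts: 2 errors over 10 slots -> 0.2
```

Because insertions are counted, the slot error rate is not bounded by 1 and is not simply 1 minus the F-measure, which is why it is reported separately.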

2015

A Conversing Virtual Patient (Un patient virtuel dialogant) [in French]
Leonardo Campillos | Dhouha Bouamor | Éric Bilinski | Anne-Laure Ligozat | Pierre Zweigenbaum | Sophie Rosset
Proceedings of TALN 2015, the 22nd Conference on Traitement Automatique des Langues Naturelles (Demonstrations)

The demonstrator described here is a prototype dialogue system whose goal is to simulate a patient. We describe how it works overall, focusing on language-related aspects, and above all on the relationship between specialized medical language and general language.

Description of the PatientGenesys Dialogue System
Leonardo Campillos Llanos | Dhouha Bouamor | Éric Bilinski | Anne-Laure Ligozat | Pierre Zweigenbaum | Sophie Rosset
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2013

Building Specialized Bilingual Lexicons Using Large Scale Background Knowledge
Dhouha Bouamor | Adrian Popescu | Nasredine Semmar | Pierre Zweigenbaum
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Towards a Generic Approach for Bilingual Lexicon Extraction from Comparable Corpora
Dhouha Bouamor | Nasredine Semmar | Pierre Zweigenbaum
Proceedings of Machine Translation Summit XIV: Papers

Using Semantic Similarity for Bilingual Lexicon Extraction from Comparable Corpora (Utilisation de la similarité sémantique pour l’extraction de lexiques bilingues à partir de corpus comparables) [in French]
Dhouha Bouamor | Nasredine Semmar | Pierre Zweigenbaum
Proceedings of TALN 2013 (Volume 1: Long Papers)

Mining a Bilingual Lexicon of MultiWord Expressions: A Statistical Machine Translation Evaluation Perspective (Acquisition de lexique bilingue d’expressions polylexicales: Une application à la traduction automatique statistique) [in French]
Dhouha Bouamor
Proceedings of RECITAL 2013

Building Specialized Bilingual Lexicons Using Word Sense Disambiguation
Dhouha Bouamor | Nasredine Semmar | Pierre Zweigenbaum
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Using WordNet and Semantic Similarity for Bilingual Terminology Mining from Comparable Corpora
Dhouha Bouamor | Nasredine Semmar | Pierre Zweigenbaum
Proceedings of the Sixth Workshop on Building and Using Comparable Corpora

Context Vector Disambiguation for Bilingual Lexicon Extraction from Comparable Corpora
Dhouha Bouamor | Nasredine Semmar | Pierre Zweigenbaum
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

Automatic Construction of a MultiWord Expressions Bilingual Lexicon: A Statistical Machine Translation Evaluation Perspective
Dhouha Bouamor | Nasredine Semmar | Pierre Zweigenbaum
Proceedings of the 3rd Workshop on Cognitive Aspects of the Lexicon

Identifying bilingual Multi-Word Expressions for Statistical Machine Translation
Dhouha Bouamor | Nasredine Semmar | Pierre Zweigenbaum
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

MultiWord Expressions (MWEs) represent a key issue for numerous applications in Natural Language Processing (NLP), especially for Machine Translation (MT). In this paper, we describe a strategy for detecting translation pairs of MWEs in a French-English parallel corpus. In addition, we introduce three methods aiming to integrate the extracted bilingual MWEs into MOSES, a phrase-based Statistical Machine Translation (SMT) system. We experimentally show that these textual units can improve translation quality.