Design and acquisition of a telephone spontaneous speech dialogue corpus in Spanish: DIHANA

José-Miguel Benedí, Eduardo Lleida, Amparo Varona, María-José Castro, Isabel Galiano, Raquel Justo, Iñigo López de Letona, Antonio Miguel


Abstract
In the framework of the DIHANA project, we present the acquisitionprocess of a spontaneous speech dialogue corpus in Spanish. Theselected application consists of information retrieval by telephone for nationwide trains. A total of 900 dialogues from 225 users were acquired using the “Wizard of Oz” technique. In this work, we present the design and planning of the dialogue scenes and the wizard strategy used for the acquisition of the corpus. Then, we also present the acquisition tools and a description of the acquisition process.
Anthology ID:
L06-1304
Volume:
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:
May
Year:
2006
Address:
Genoa, Italy
Editors:
Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/504_pdf.pdf
DOI:
Bibkey:
Cite (ACL):
José-Miguel Benedí, Eduardo Lleida, Amparo Varona, María-José Castro, Isabel Galiano, Raquel Justo, Iñigo López de Letona, and Antonio Miguel. 2006. Design and acquisition of a telephone spontaneous speech dialogue corpus in Spanish: DIHANA. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal):
Design and acquisition of a telephone spontaneous speech dialogue corpus in Spanish: DIHANA (Benedí et al., LREC 2006)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/504_pdf.pdf