A Unified Neural Architecture for Joint Dialog Act Segmentation and Recognition in Spoken Dialog System

Tianyu Zhao, Tatsuya Kawahara


Abstract
In spoken dialog systems (SDSs), dialog act (DA) segmentation and recognition provide essential information for response generation. Most previous work assumed ground-truth segmentation of DA units, which is not available from automatic speech recognition (ASR) output in an SDS. We propose a unified architecture based on neural networks, which consists of a sequence tagger for segmentation and a classifier for recognition. The DA recognition model is based on hierarchical neural networks that incorporate the context of preceding sentences. We investigate sharing some layers of the two components so that they can be trained jointly and learn generalized features from both tasks. An evaluation on the Switchboard Dialog Act (SwDA) corpus shows that the jointly-trained models outperform independently-trained models, single-step models, and other reported results on DA segmentation, recognition, and the joint task.
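The core idea in the abstract, a shared representation feeding two task heads (a per-token segmentation tagger and a per-unit DA classifier) trained on a summed joint loss, can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the plain softmax heads, mean pooling, sizes, and all variable names are illustrative assumptions standing in for the paper's hierarchical recurrent architecture.

```python
# Sketch of joint training with a shared layer: gradients from BOTH the
# segmentation loss and the DA-classification loss update the shared
# embedding, so it learns features useful for both tasks.
# All sizes and the linear-softmax heads are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMB, N_SEG_TAGS, N_DA_TAGS = 50, 16, 2, 4

E_shared = rng.normal(0, 0.1, (VOCAB, EMB))   # shared between the two tasks
W_seg = rng.normal(0, 0.1, (EMB, N_SEG_TAGS)) # segmentation head
W_da = rng.normal(0, 0.1, (EMB, N_DA_TAGS))   # DA-recognition head

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(token_ids):
    h = E_shared[token_ids]               # shared representation, (T, EMB)
    seg_p = softmax(h @ W_seg)            # per-token tag: inside / end of unit
    da_p = softmax(h.mean(0) @ W_da)      # unit-level DA label via mean pooling
    return h, seg_p, da_p

def joint_step(token_ids, seg_gold, da_gold, lr=0.5):
    """One SGD step on the summed (joint) cross-entropy of both tasks."""
    global E_shared, W_seg, W_da
    h, seg_p, da_p = forward(token_ids)
    T = len(token_ids)
    loss = -np.log(seg_p[np.arange(T), seg_gold]).mean() - np.log(da_p[da_gold])
    # Softmax cross-entropy gradients for each head.
    d_seg = seg_p.copy(); d_seg[np.arange(T), seg_gold] -= 1; d_seg /= T
    d_da = da_p.copy(); d_da[da_gold] -= 1
    # The shared layer receives gradient from BOTH tasks.
    dh = d_seg @ W_seg.T + np.outer(np.ones(T) / T, d_da @ W_da.T)
    W_seg -= lr * (h.T @ d_seg)
    W_da -= lr * np.outer(h.mean(0), d_da)
    np.add.at(E_shared, token_ids, -lr * dh)
    return float(loss)

tokens = np.array([3, 7, 7, 9])
seg_gold = np.array([0, 0, 0, 1])  # last token ends the DA unit
da_gold = 2
losses = [joint_step(tokens, seg_gold, da_gold) for _ in range(50)]
print(round(losses[0], 3), round(losses[-1], 3))  # joint loss decreases
```

The point of the sketch is the last gradient line: `E_shared` is updated with the sum of both tasks' gradients, which is what "sharing some layers so they can be trained jointly" means operationally.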
Anthology ID:
W18-5021
Volume:
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Kazunori Komatani, Diane Litman, Kai Yu, Alex Papangelis, Lawrence Cavedon, Mikio Nakano
Venue:
SIGDIAL
SIG:
SIGDIAL
Publisher:
Association for Computational Linguistics
Pages:
201–208
URL:
https://aclanthology.org/W18-5021
DOI:
10.18653/v1/W18-5021
Cite (ACL):
Tianyu Zhao and Tatsuya Kawahara. 2018. A Unified Neural Architecture for Joint Dialog Act Segmentation and Recognition in Spoken Dialog System. In Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, pages 201–208, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
A Unified Neural Architecture for Joint Dialog Act Segmentation and Recognition in Spoken Dialog System (Zhao & Kawahara, SIGDIAL 2018)
PDF:
https://aclanthology.org/W18-5021.pdf