Automatic Token and Turn Level Language Identification for Code-Switched Text Dialog: An Analysis Across Language Pairs and Corpora

Vikram Ramanarayanan; Robert Pugh

doi:10.18653/v1/W18-5009

Abstract

We examine the efficacy of various feature–learner combinations for language identification in different types of text-based code-switched interactions – human-human dialog, human-machine dialog as well as monolog – at both the token and turn levels. In order to examine the generalization of such methods across language pairs and datasets, we analyze 10 different datasets of code-switched text. We extract a variety of character- and word-based text features and pass them into multiple learners, including conditional random fields, logistic regressors and recurrent neural networks. We further examine the efficacy of novel character-level embedding and GloVe features in improving performance and observe that our best-performing text system significantly outperforms a majority vote baseline across language pairs and datasets.
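As a rough illustration of the token-level and turn-level setup the abstract describes, the sketch below extracts character n-gram features, trains a small classifier on them, and compares it with a majority vote baseline. It is not the authors' system. A tiny Naive Bayes model stands in for the conditional random field, logistic regression and recurrent network learners, the English/Spanish toy tokens are invented for the example, and all names (`char_ngrams`, `NgramNaiveBayes`, `turn_label`, the `ENG`/`SPA` labels and the rule for calling a turn code-switched) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(token, n_min=2, n_max=3):
    """Return character n-grams of a token padded with boundary markers."""
    padded = f"<{token.lower()}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing.

    A stand-in learner for illustration only, not one of the paper's models.
    """

    def fit(self, tokens, labels):
        self.label_counts = Counter(labels)
        self.ngram_counts = defaultdict(Counter)
        for token, label in zip(tokens, labels):
            self.ngram_counts[label].update(char_ngrams(token))
        self.vocab = {g for c in self.ngram_counts.values() for g in c}
        return self

    def predict(self, token):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.ngram_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for gram in char_ngrams(token):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def turn_label(model, turn):
    """Assumed turn-level rule: one language if all tokens agree, else code-switched."""
    langs = {model.predict(tok) for tok in turn.split()}
    return langs.pop() if len(langs) == 1 else "code-switched"


def accuracy(predicted, gold):
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Invented toy data (token, language label); real corpora are far larger.
TRAIN = [("the", "ENG"), ("house", "ENG"), ("is", "ENG"), ("very", "ENG"),
         ("pretty", "ENG"), ("and", "ENG"),
         ("la", "SPA"), ("casa", "SPA"), ("es", "SPA"), ("muy", "SPA"),
         ("bonita", "SPA")]
# Sanity check only: these tokens also occur in TRAIN, so this is not a
# held-out evaluation.
CHECK = [("casa", "SPA"), ("house", "ENG"), ("is", "ENG")]

train_tokens, train_labels = zip(*TRAIN)
check_tokens, check_labels = zip(*CHECK)

model = NgramNaiveBayes().fit(train_tokens, train_labels)

# Majority vote baseline: always predict the most frequent training label.
majority = Counter(train_labels).most_common(1)[0][0]
baseline_acc = accuracy([majority] * len(check_tokens), check_labels)
model_acc = accuracy([model.predict(t) for t in check_tokens], check_labels)

print("majority baseline accuracy:", round(baseline_acc, 3))
print("n-gram model accuracy:", round(model_acc, 3))
print("turn label:", turn_label(model, "la casa is pretty"))
```

Character n-grams are used here because they pick up spelling patterns without a word list, which is one plausible reading of the "character-based text features" mentioned in the abstract. The numbers this toy prints say nothing about the results reported in the paper.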

Anthology ID: W18-5009
Volume: Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue
Month: July
Year: 2018
Address: Melbourne, Australia
Editors: Kazunori Komatani, Diane Litman, Kai Yu, Alex Papangelis, Lawrence Cavedon, Mikio Nakano
Venue: SIGDIAL
SIG: SIGDIAL
Publisher: Association for Computational Linguistics
Pages: 80–88
URL: https://aclanthology.org/W18-5009
DOI: 10.18653/v1/W18-5009
Cite (ACL): Vikram Ramanarayanan and Robert Pugh. 2018. Automatic Token and Turn Level Language Identification for Code-Switched Text Dialog: An Analysis Across Language Pairs and Corpora. In Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, pages 80–88, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal): Automatic Token and Turn Level Language Identification for Code-Switched Text Dialog: An Analysis Across Language Pairs and Corpora (Ramanarayanan & Pugh, SIGDIAL 2018)
PDF: https://aclanthology.org/W18-5009.pdf
