Detecting Code-Switching between Turkish-English Language Pair

Zeynep Yirmibeşoğlu; Gülşen Eryiğit

doi:10.18653/v1/W18-6115

Abstract

Code-switching (alternating between different languages within a single conversation) is an increasingly common phenomenon in social media and colloquial usage, and it poses distinct challenges for natural language processing. This paper presents the first study on detecting Turkish-English code-switching, along with a small test data set collected from social media to pave the way for further studies. The proposed system, which uses character-level n-grams and conditional random fields (CRFs), obtains a micro-averaged F1-score of 95.6% on the introduced test data set.
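As a rough illustration of what "character-level n-grams" can mean as input to a token-level sequence labeler, the sketch below turns each token of a sentence into a dictionary of n-gram features plus its neighboring tokens. The function names, the boundary markers `^` and `$`, the n-gram range of 1 to 3, and the neighbor features are all assumptions made for illustration; the abstract does not specify the paper's actual feature set, and this is not the authors' implementation.

```python
from typing import Dict, List


def char_ngrams(token: str, n_min: int = 1, n_max: int = 3) -> List[str]:
    """Return all character n-grams of a token, with assumed boundary markers."""
    # "^" and "$" mark token start and end (an illustrative choice).
    padded = f"^{token.lower()}$"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i (hypothetical feature set)."""
    feats: Dict[str, object] = {"bias": 1.0}
    for gram in char_ngrams(tokens[i]):
        feats[f"ng={gram}"] = 1.0
    # Neighboring tokens give the sequence model some context.
    feats["prev"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """One feature dictionary per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

A list of such per-token dictionaries, paired with per-token language labels, is the kind of input that CRF toolkits accepting dictionary features typically expect; which toolkit and label set to use is left open here.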

Anthology ID: W18-6115
Volume: Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text
Month: November
Year: 2018
Address: Brussels, Belgium
Editors: Wei Xu, Alan Ritter, Tim Baldwin, Afshin Rahimi
Venue: WNUT
Publisher: Association for Computational Linguistics
Pages: 110–115
URL: https://aclanthology.org/W18-6115
DOI: 10.18653/v1/W18-6115
Cite (ACL): Zeynep Yirmibeşoğlu and Gülşen Eryiğit. 2018. Detecting Code-Switching between Turkish-English Language Pair. In Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text, pages 110–115, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): Detecting Code-Switching between Turkish-English Language Pair (Yirmibeşoğlu & Eryiğit, WNUT 2018)
PDF: https://aclanthology.org/W18-6115.pdf