We invite submissions for the Workshop on Computational Approaches to Linguistic Code-Switching. This will be the seventh edition of the workshop, co-located with NAACL 2025.
Bilingual and multilingual speakers often engage in code-switching (CS), mixing languages within a conversation, influenced by cultural nuances. CS can occur at the inter-sentential, intra-sentential, and morphological levels, posing challenges for language understanding and generation. Models trained on a single language often struggle with mixed-language input, and even multilingual pre-trained language models (LMs) may still perform poorly on CS data. Research on LMs' ability to process CS data, taking into account cultural nuances, reasoning, coverage, and performance biases, remains underexplored.
As CS becomes more common in informal communication such as newsgroups, tweets, and other social media, research on how LMs process mixed-language data is urgently needed. This workshop aims to unite researchers working on spoken and written CS technologies, promoting collaboration to improve AI's handling of CS across diverse linguistic contexts.
Website: https://code-switching.github.io/2025
Topics of Interest
The workshop invites contributions from researchers working on NLP and speech approaches for the analysis and processing of mixed-language data. Topics of relevance to the workshop include the following:
- Development of data and model resources to support research on CS data
- New data augmentation techniques for improving robustness on CS data
- New approaches for NLP downstream tasks on CS data: question answering, conversational agents, named entity recognition, sentiment analysis, machine translation, language generation, and automatic speech recognition (ASR)
- NLP techniques for the syntactic analysis of CS data
- Domain, dialect, and genre adaptation techniques applied to CS data processing
- Language modeling approaches to CS data processing
- Sociolinguistic and/or sociopragmatic aspects of CS
- Techniques and metrics for automatically evaluating synthetically generated CS text
- Utilization of large language models (LLMs) and assessment of their performance on NLP tasks for CS data
- Survey and position papers discussing the challenges that CS data poses to NLP techniques
- Ethical issues and considerations in CS applications
Important Dates
- Workshop submission deadline (regular and non-archival submissions): 21 February 2025
- Notification of acceptance: 8 March 2025
- Camera ready papers due: 17 March 2025
- Workshop date: 3 or 4 May 2025
All deadlines are 11:59 pm UTC-12 ("anywhere on Earth").
Submission Portal
https://openreview.net/group?id=aclweb.org/NAACL/2025/Workshop/CALCS
Shared Task
We are also organizing a shared task competition focused on automatically evaluating synthetically generated CS text. Automatic CS text generation is valuable for many tasks, especially given the scarcity of such data, and data augmentation has proven effective in improving model performance across tasks and languages. Furthermore, enabling chatbots to produce code-switched responses has shown clear benefits, underscoring the need for CS text generation in dialogue systems. As the demand for generating CS text increases, robust evaluation methods are essential to assess the quality of generations in terms of accuracy and fluency. Research on evaluation data and methodologies in this area remains limited, and our shared task aims to enable further progress in this field.
We invite all interested researchers to get in touch about participation and to follow the website for further updates. A separate CFP will be sent out for the shared task.
Organizing Committee
- Barid Xi Ai, National University of Singapore
- Injy Hamed, MBZUAI
- Mahardika Krisna Ihsani, MBZUAI
- Sudipta Kar, Amazon Alexa AI
- Garry Kuwanto, Boston University
- Thamar Solorio, MBZUAI and University of Houston
- Derry Tanti Wijaya, Monash University Indonesia
- Genta Indra Winata, Capital One AI Foundations
- Marina Zhukova, University of California
Contact email: calcsworkshops@gmail.com