Workshop on RESOURCEs and representations For Under-resourced Languages and domains

Event Notification Type: Call for Papers
Abbreviated Title: RESOURCEFUL-2025
Location: Hestia Hotel Europa
Date: Sunday, 2 March 2025
Country: Estonia
City: Tallinn
Contact: Nikolai Ilinykh, Špela Arhar Holdt, Barbara Scalvini
Submission Deadline: Monday, 9 December 2024

============================================
Call for Papers and Extended Abstracts
RESOURCEFUL 2025
2 March, Tallinn, Estonia
============================================

Important dates
============================================
* Submission deadline (both papers and abstracts): December 9th 2024
* Notification of acceptance: January 20th 2025
* Camera-ready version: February 3rd 2025
* Workshop date: March 2nd 2025
All deadlines are 11:59PM UTC-12:00 ("anywhere on Earth").
============================================

We would like to invite you to submit papers to the 3rd Workshop on RESOURCEs and representations For Under-resourced Languages and domains (RESOURCEFUL-2025), co-located with the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025) in Tallinn, Estonia, on March 2nd, 2025.

Overview
============================================
The workshop is a part of the workshop series RESOURCEFUL, focusing on RESOURCEs and representations For Under-resourced Languages and domains. The main goal of the workshop is to further explore the role of the type and quality of resources that are available to computational linguists as well as challenges and directions for constructing new resources in light of the latest trends in natural language processing, computational linguistics and artificial intelligence.

On the one hand, data-driven machine learning techniques in natural language processing have achieved remarkable performance in many tasks, but to do so they require large quantities of quality data (mostly text). One question that has been raised is whether text-only data is enough to capture semantics, or whether other modalities such as images, sounds, situated context and embodiment are required. Interpretability studies of large language models have revealed that even with large datasets the models still do not cover all the contexts of human social activity and are prone to capturing unwanted bias where the data is skewed towards only some contexts. Collecting, managing and understanding linguistic data in the age of machine learning is challenging, and different tools are required to address these questions.

On the other hand, expert-driven annotator-based resources have been constructed over the years based on theoretical work in linguistics, psychology and related fields, and a large amount of work has been done both theoretically and practically. One challenge is understanding to what degree such resources, which have traditionally been aimed at rule-based natural language processing approaches, are relevant today for both machine learning techniques and neuro-symbolic methods. Both types of resources are used by computational linguists. How can they be adapted for one another? To what degree can data-driven approaches be used to facilitate expert-driven annotation? What are the current challenges for expert-based annotation and data-driven methods? How can crowdsourcing and citizen science be used in building resources? How can we evaluate and reduce unwanted bias? How can these resources contribute to machine learning approaches to these tasks?

Intended participants are researchers, PhD students and practitioners from diverse backgrounds (linguistics, psychology, computational linguistics, speech, computer science, machine learning, computer vision, etc.). We foresee an interactive workshop with plenty of time for discussion, complemented with invited talks and presentations of ongoing or completed research.

Topics of interest
============================================
We would like to open a forum by bringing together students, researchers, and experts to address and discuss the following points:

* The types of linguistic knowledge that should be captured by the models across different contexts and tasks.
* Practical methods for sampling and extracting knowledge.
* Relevance of traditional NLP resources for use in data-driven approaches.
* Use of data-driven approaches to enhance expert-driven annotation processes.
* Current challenges faced in expert-based annotation.
* Crowdsourcing and citizen science initiatives to build and enrich linguistic resources.
* Methods to evaluate and mitigate unwanted biases in linguistic models and data.
* Creating anonymised and pseudonymised datasets and models.
* Evaluating the role of modern LLMs in the creation of new linguistic resources.

Submission details
============================================
We invite submissions of both long (8 pages) and short (4 pages) papers, with any number of pages for references. All submissions must follow the NoDaLiDa template, which is available in both LaTeX and MS Word from the official conference website: https://www.nodalida-bhlt2025.eu/call-for-papers#h.v2k63awq0fpe. Submissions must be anonymous and submitted in PDF format through OpenReview.

We also invite submissions of non-anonymous extended abstracts of at most 2 pages, with any number of pages for references, describing work in progress, negative results and opinion pieces. The abstracts, which should follow the same formatting templates as the archival track, will be considered by the workshop organisers, and the accepted ones will be posted on the workshop website.

Papers of any length related to our theme and already published elsewhere will be considered for acceptance for presentation. However, these will not be considered for publication in the proceedings.

We will make the submission link available in late October.

Organisers
============================================
Špela Arhar Holdt, University of Ljubljana, Slovenia [core organiser]
Mattias Appelgren, University of Gothenburg, Sweden
Micaella Bruton, Stockholm University, Sweden
Dana Dannélls, Språkbanken Text, University of Gothenburg, Sweden
Simon Dobnik, CLASP, University of Gothenburg, Sweden
Nikolai Ilinykh, CLASP, University of Gothenburg, Sweden [core organiser]
Crina Tudor, Stockholm University, Sweden
Beáta Megyesi, Stockholm University, Sweden
Joakim Nivre, RISE and Uppsala University, Sweden
Iben Nyholm Debess, University of the Faroe Islands, Faroe Islands
Barbara Scalvini, University of the Faroe Islands, Faroe Islands [core organiser]
Sara Stymne, Uppsala University, Sweden
Jörg Tiedemann, University of Helsinki, Finland
Lilja Øvrelid, University of Oslo, Norway

Contact information
============================================
For questions and comments, please email Spela.ArharHoldt [at] ff.uni-lj.si, barbaras [at] setur.fo, nikolai.ilinykh [at] gu.se.