Dear colleagues!
Publication of negative results is difficult in most fields, but in NLP the problem is exacerbated by the near-universal focus on benchmark improvements. This situation implicitly discourages hypothesis-driven research, and it turns the creation and fine-tuning of NLP models into an art rather than a science. Furthermore, it increases the time, effort, and carbon emissions spent on developing and tuning models, as researchers have no opportunity to learn what has already been tried and has failed.
This workshop invites unexpected or negative results, both practical and theoretical, that have important implications for future research, highlight methodological issues with existing approaches, and/or point out pervasive misunderstandings or bad practices. In particular, the most successful NLP models currently rely on Transformer-based large language models (LLMs). To complement all the success stories, it would be insightful to see where and possibly why they fail. Any NLP task is welcome: sequence labeling, question answering, inference, dialogue, machine translation - you name it.
A successful negative results paper would contribute one of the following:
- broadly applicable recommendations for training/fine-tuning/prompting, especially if the X that did not work is something many practitioners would consider reasonable to try, and if the demonstration of X's failure is accompanied by an explanation or hypothesis;
- ablation studies of components in previously proposed models, showing that their contributions are different from what was initially reported;
- datasets or probing tasks showing that previous approaches do not generalize to other domains or language phenomena;
- trivial baselines that work suspiciously well for a given task/dataset;
- cross-lingual studies showing that a technique X is only successful for a certain language or language family;
- experiments on (in)stability of the previously published results due to hardware, random initializations, preprocessing pipeline components, etc.;
- theoretical arguments and/or proofs for why X should not be expected to work;
- demonstration of issues with data processing/collection/annotation pipelines, especially if they are widely used;
- demonstration of issues with evaluation metrics (e.g., accuracy, F1, or BLEU) that prevent their use for fair comparison of methods;
- demonstration of issues with under-reporting of training details of pre-trained models, including test data contamination and invalid comparisons;
- preliminary results of a non-mainstream approach or architecture that are valuable for the community.
Important Dates
- Submission deadline: January 30, 2025
- Submission due for papers reviewed through ACL Rolling Review: February 20, 2025
- Notification of acceptance: March 1, 2025
- Camera-ready papers due: March 10, 2025
- Workshop: TBA, May 3 or 4, 2025
Submission
Please see the workshop website at https://insights-workshop.github.io/2025/cfp/ for the submission link and more details.