Deadline extended: "Dimensions of Meaning: Distributional and Curated Semantics" Workshop at NAACL

Event Notification Type: 
Call for Papers
Abbreviated Title: 
DistCurate 2022
Location: 
Collocated with NAACL
Thursday, 14 July 2022
State: 
Washington
Country: 
USA
City: 
Seattle
Contact: 
Collin F. Baker
Miriam R. L. Petruck
Submission Deadline: 
Tuesday, 12 April 2022

Important dates

  • April 12, 2022: Workshop papers due (extended deadline)
  • May 6, 2022: Notification of Acceptance
  • May 20, 2022: Camera-ready papers due
  • July 14, 2022: Workshop (probably hybrid: in-person and virtual)

CALL FOR PAPERS
Broadly speaking, computational linguistics research can be divided into two main streams: The first consists of work that relies primarily on operationalizing prior knowledge about language and its use, such as scripts, planning, scenarios, scripts for virtual assistants and FrameNet (FN) frames (Ruppenhofer et al., 2016) as well as lexical databases such as WordNet (Fellbaum 1998), VerbNet (Kipper et al., 2000), and PropBank (Palmer et al., 2005), among others. The second seeks to derive knowledge directly from data (text, speech, and increasingly vision) with unsupervised (or distantly supervised) methods, which are distributional and frequency-based, in Linguistics (Biber et al. 2020), Cognitive Science (Xu and Xu 2021), and Computational Linguistics, notably vector embeddings like BERT (Devlin et al. 2019). They are often complementary: Kuznetsov and Gurevych (2018) combine POS tagging and lemmatization to improve vector embeddings; Qian et al. (2021) combine syntactic knowledge with neural language models to improve accuracy.

These issues are as pertinent today as they were at the 1994 ACL workshop "The Balancing Act: Combining Symbolic and Statistical Approaches to Language". Despite great advances in statistical approaches to meaning, many questions remain unresolved. Specifically, what important dimensions of meaning can vectors obtain that FrameNet or other human-curated resources cannot and vice versa? What techniques best recover different dimensions of meaning? We invite papers that explore the space at the intersection of these two methodologies.
The goal of the workshop is to bring together researchers working in each of these two main approaches to address questions such as:

  • What are the strengths and limitations of each approach?
  • Are there types of knowledge that can be extracted from text/speech by one of them and not the other? Why?
  • How well can each represent relations and enable reasoning over text?
  • What limits the further progress of each approach?
  • In a perfect world, how could the field overcome these limitations? Would combining the two approaches solve all the problems?
  • What would overcoming such limitations accomplish for NLP/NLU?

We seek papers that explore the differences between knowledge-based approaches (particularly frame-based approaches) and distributional approaches; papers that present alignment tools or compare tasks done with distributional techniques versus FrameNet; papers that focus on one approach or the other as it pertains to obtaining dimensions of meaning; and papers that demonstrate ways to combine both approaches. These topics are inherently crosslinguistic, so we encourage submissions from all regions and all languages. We also encourage submissions by undergraduate and graduate students, whom we will be happy to mentor.

Submissions may be short papers (up to 4 pages plus references) or long papers (up to 8 pages plus references). They must conform to the ACL format; paper formatting guidelines are at ACLPUB. We strongly advise using one of the ACL templates for LaTeX or MS Word available from that website or on Overleaf.

We are using the START system for reviewing; please submit your papers via the workshop submissions page at https://www.softconf.com/naacl2022/DistCurate2022/. If you have any difficulty, please email DistCurate2022_naacl2022 [at] softconf.com.

Organizers

  • Collin Baker, ICSI
  • Michael Ellsworth, ICSI
  • Miriam R. L. Petruck, ICSI

Invited Speaker

  • Christopher Potts (Stanford U)

Program Committee

  • Omri Abend (Hebrew U Jerusalem)
  • Gerard de Melo (U Potsdam)
  • Katrin Erk (U Texas Austin)
  • Annette Frank (U Heidelberg)
  • Richard Futrell (U CA Irvine)
  • Christopher Potts (Stanford U)
  • Michael Roth (U Saarland)
  • Nathan Schneider (Georgetown U)

References

Baker, Collin F. and Arthur Lorenzi. 2020. Exploring Crosslinguistic Frame Alignment. In Proceedings of the International FrameNet Workshop 2020: Towards a Global, Multilingual FrameNet. 77–84. Marseille, France. European Language Resources Association.

Biber, Douglas, Jesse Egbert, and Daniel Keller. 2020. Reconceptualizing register in a continuous situational space. Corpus Linguistics and Linguistic Theory 16.3.581–616.

Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 4171–4186. Minneapolis, Minnesota. Association for Computational Linguistics.

Ellsworth, Michael, Collin Baker, and Miriam R. L. Petruck. 2021. FrameNet and Typology. In Proceedings of the 3rd Workshop on Computational Typology and Multilingual NLP. 61–66. Association for Computational Linguistics.

Fellbaum, Christiane. 1998. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

Kipper, Karen, Hoa Trang Dang, and Martha Palmer. 2000. Class-Based Construction of a Verb Lexicon. In Proceedings of the 17th National Conference on Artificial Intelligence and the 12th Conference on Innovative Applications of Artificial Intelligence. 691–696. Austin, TX. AAAI Press.

Kuznetsov, Ilia and Iryna Gurevych. 2018. From Text to Lexicon: Bridging the Gap between Word Embeddings and Lexical Resources. In Proceedings of the 27th International Conference on Computational Linguistics. 233–244. Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Palmer, Martha, Paul Kingsbury, and Dan Gildea. 2005. The Proposition Bank: An Annotated Corpus of Semantic Roles. Computational Linguistics 31.1.71–106.

Qian, Peng, Tahira Naseem, Roger Levy, and Ramón Fernandez Astudillo. 2021. Structural Guidance for Transformer Language Models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 3735–3745. Online. Association for Computational Linguistics.

Ruppenhofer, Josef, Michael Ellsworth, Miriam R. L. Petruck, Christopher R. Johnson, Collin F. Baker, and Jan Scheffczyk. 2016. FrameNet II: Extended Theory and Practice. Online. Berkeley. ICSI.

Xu, Aotao and Yang Xu. 2021. Chaining and the formation of spatial semantic categories in childhood. In Proceedings of the 43rd Annual Meeting of the Cognitive Science Society.