2024Q3 Reports: SIGWAC


SIGWAC is the Special Interest Group of the Association for Computational Linguistics (ACL) on Web as Corpus.


  • to promote interest in the use of the web as a source of linguistic data, either for language research or language modelling
  • to serve as a communication channel for exchanging news, findings, and experience in using the web as a source of linguistic data
  • to endorse projects, resources and technologies that are considered examples of good practice or state-of-the-art
  • to sponsor meetings and workshops on topics related to using the web as a source of linguistic data


  • Co-Chair: Nikola Ljubešić (email: nikola.ljubesic(at)ijs.si)
  • Co-Chair: Benoît Sagot (email: benoit.sagot(at)inria.fr)
  • Co-Secretary: Veronika Laippala (email: mavela(at)utu.fi)
  • Co-Secretary: Pedro Ortiz Suarez (email: pedro.ortiz(at)dfki.de)

The current SIGWAC officers' term is January 2023 - December 2025.


The current officers were elected in December 2022.


The focus of the officers is on the following activities:

  • Updating the SIG's web page (https://www.sigwac.org.uk) to reflect the ACL community's current heavy use of web data
  • Improving the visibility of the SIG by endorsing prominent projects, resources and technologies
  • Encouraging discussion of the most pressing topics in using the web as a source of linguistic data, including technical aspects and, especially, the future of the web as a source of human linguistic data in light of generative AI technologies
  • Endorsing events in the first part of the term, with a SIG-related event planned for the final year


The website of the SIG is https://www.sigwac.org.uk. The officers are working on a new visual design for the website.


Membership of the SIG is measured by mailing list subscriptions. There are currently 175 subscriptions.