2016Q3 Reports: SIGWAC



The Special Interest Group on the Web as Corpus (SIGWAC) has 178 members as of 11 June 2016 (based on subscriptions to the SIGWAC mailing list).

The SIGWAC community keeps in touch through a mailing list (http://devel.sslmit.unibo.it/mailman/listinfo/sigwac) and the SIGWAC home page (http://sigwac.org.uk/).

The SIGWAC board

Elections were held in July/August 2015. Only two candidates were nominated for the two vacant positions, so both were elected unopposed.

Chair: Roland Schäfer (http://rolandschaefer.net/), Researcher at Freie Universität Berlin

Secretary: Egon W. Stemle (http://iiegn.eu/work/), Researcher at European Academy of Bozen/Bolzano

The current board serves from 1 August 2015 to 31 July 2018.

10th Web as Corpus Workshop (WAC-X) in 2016

The 10th Web as Corpus workshop (WAC-X) will be co-located with the ACL Conference in Berlin on 12 August 2016. It will feature five oral and four poster presentations. The workshop program will be published on the workshop homepage.


WAC-X will also feature two satellite events: (1) the final meeting of the EmpiriST shared task on Automatic Linguistic Annotation of (German) Computer-Mediated Communication/Social Media, organized by the German Society for Computational Linguistics and Language Technology (GSCL);

(2) a panel discussion on "Corpora, open science, and copyright reforms", which focuses on the problematic legal situation for corpus designers in the many EU countries whose copyright law has no fair use exemptions.

Events planned for 2017

WAC-XI: SIGWAC intends to organize the 11th Web as Corpus Workshop (WAC-XI) in 2017, co-located with one of the major computational linguistics or corpus linguistics conferences (ACL, LREC, etc.). Organizers, schedule, and details are to be confirmed at WAC-X in August 2016.

CleanerEval: Based on an online survey of SIGWAC members conducted by the organizers of WAC-9 in 2014 and on discussions at WAC-9, SIGWAC plans to organize a shared task on combined boilerplate detection and text quality evaluation in 2017. Because it would take place ten years after the 2007 SIGWAC CleanEval shared task, it might be called CleanerEval. The SIG on Digital Humanities of the German Society for Computational Linguistics and Language Technology (GSCL), known as GSCL AK DigHum, has expressed interest in co-organizing CleanerEval. According to the preliminary discussions, tracks for at least English, German, and French will most likely be organized. Interest was also expressed in including at least one lesser-resourced language represented on the WWW (such as Malay). The final discussion at WAC-X will serve as a starting point for the organization of CleanerEval.