ACL Anthology
A Digital Archive of Research Papers in Computational Linguistics

Special Interest Group on Web as Corpus (SIGWAC)

2016 Proceedings of the 10th Web as Corpus Workshop
2014 Proceedings of the 9th Web as Corpus Workshop (WaC-9)
2010 Proceedings of the NAACL HLT 2010 Sixth Web as Corpus Workshop
2007 WAC3, Louvain-la-Neuve, Belgium, 15-16 September 2007
2006 Proceedings of the 2nd International Workshop on Web as Corpus
2005 WAC1, at Corpus Linguistics conference, Birmingham, UK, July 2005


  1. Proceedings of the 10th Web as Corpus Workshop

  2. W16-2601: Roland Schäfer; Felix Bildhauer
    Automatic Classification by Topic Domain for Meta Data Generation, Web Corpus Evaluation, and Corpus Comparison
  3. W16-2602: Adrien Barbaresi
    Efficient construction of metadata-enhanced web corpora
  4. W16-2603: Andrew Salway; Dag Elgesem; Knut Hofland; Øystein Reigem; Lubos Steskal
    Topically-focused Blog Corpora for Multiple Languages
  5. W16-2604: Anne Krause
    The Challenges and Joys of Analysing Ongoing Language Change in Web-based Corpora: a Case Study
  6. W16-2605: Quirin Würschinger; Mohammad Fazleh Elahi; Desislava Zhekova; Hans-Jörg Schmid
    Using the Web and Social Media as Corpora for Monitoring the Spread of Neologisms. The case of 'rapefugee', 'rapeugee', and 'rapugee'.
  7. W16-2606: Michael Beißwenger; Sabine Bartsch; Stefan Evert; Kay-Michael Würzner
    EmpiriST 2015: A Shared Task on the Automatic Linguistic Annotation of Computer-Mediated Communication and Web Corpora
  8. W16-2607: Thomas Proisl; Peter Uhrig
    SoMaJo: State-of-the-art tokenization for German web and social media texts
  9. W16-2608: Jakob Prange; Andrea Horbach; Stefan Thater
    UdS-(retrain|distributional|surface): Improving POS Tagging for OOV Words in German CMC and Web Data
  10. W16-2609: Gideon Mendels; Erica Cooper; Julia Hirschberg
    Babler - Data Collection from the Web to Support Speech Recognition and Keyword Search
  11. W16-2610: Nikola Ljubešić; Darja Fišer
    A Global Analysis of Emoji Usage
  12. W16-2611: Erika Dalan; Serge Sharoff
    Genre classification for a corpus of academic webpages
  13. W16-2612: Roland Schäfer
    On Bias-free Crawling and Representative Web Corpora
  14. W16-2613: Steffen Remus; Gerold Hintz; Chris Biemann; Christian M. Meyer; Darina Benikova; Judith Eckle-Kohler; Margot Mieskes; Thomas Arnold
    EmpiriST: AIPHES - Robust Tokenization and POS-Tagging for Different Genres
  15. W16-2614: Egon Stemle
    bot.zen @ EmpiriST 2015 - A minimally-deep learning PoS-tagger (trained for German CMC and Web data)
  16. W16-2615: Tobias Horsmann; Torsten Zesch
    LTL-UDE @ EmpiriST 2015: Tokenization and PoS Tagging of Social Media Text


  1. Proceedings of the 9th Web as Corpus Workshop (WaC-9)

  2. W14-04 [bib]: Entire Volume
  3. W14-0400 [bib]: Front Matter

  4. W14-0401 [bib]: Adrien Barbaresi
    Finding Viable Seed URLs for Web Corpora: A Scouting Approach and Comparative Study of Available Sources
  5. W14-0402 [bib]: Roland Schäfer; Adrien Barbaresi; Felix Bildhauer
    Focused Web Corpus Crawling
  6. W14-0403 [bib]: Maik Stührenberg
    Less Destructive Cleaning of Web Documents by Using Standoff Annotation
  7. W14-0404 [bib]: Magali Sanches Duran; Lucas Avanço; Sandra Aluísio; Thiago Pardo; Maria da Graça Volpe Nunes
    Some Issues on the Normalization of a Corpus of Products Reviews in Portuguese
  8. W14-0405 [bib]: Nikola Ljubešić; Filip Klubička
    {bs,hr,sr}WaC - Web Corpora of Bosnian, Croatian and Serbian
  9. W14-0406 [bib]: Verena Lyding; Egon Stemle; Claudia Borghetti; Marco Brunello; Sara Castagnoli; Felice Dell'Orletta; Henrik Dittmann; Alessandro Lenci; Vito Pirrelli
    The PAISÀ Corpus of Italian Web Texts


  1. Proceedings of the NAACL HLT 2010 Sixth Web as Corpus Workshop

  2. W10-15 [bib]: Entire Volume
  3. W10-1500 [bib]: Front Matter

  4. W10-1501 [bib]: Emiliano Raul Guevara
    NoWaC: a large web-based corpus for Norwegian
  5. W10-1502 [bib]: Markus Dickinson; Ross Israel; Sun-Hee Lee
    Building a Korean Web Corpus for Analyzing Learner Language
  6. W10-1503 [bib]: Amit Goyal; Jagadeesh Jagarlamudi; Hal Daumé III; Suresh Venkatasubramanian
    Sketching Techniques for Large Scale NLP
  7. W10-1504 [bib]: George Dillon
    Building Webcorpora of Academic Prose with BootCaT
  8. W10-1505 [bib]: Stefan Evert
    Google Web 1T 5-Grams Made Easy (but not for the computer)


  1. WAC3, Louvain-la-Neuve, Belgium, 15-16 September 2007

    To Meeting Home Page


  1. Proceedings of the 2nd International Workshop on Web as Corpus

  2. W06-1700: Front Matter

  3. W06-1701: András Kornai; Péter Halácsy; Viktor Nagy; Csaba Oravecz; Viktor Trón; Dániel Varga
    Web-based frequency dictionaries for medium density languages
  4. W06-1702: Mike Cafarella; Oren Etzioni
    BE: A search engine for NLP research
  5. W06-1703: Masatsugu Tonoike; Mitsuhiro Kida; Toshihiro Takagi; Yasuhiro Sasaki; Takehito Utsuro; S. Sato
    A comparative study on compositional translation estimation using a domain/topic-specific corpus collected from the Web
  6. W06-1704: Gemma Boleda; Stefan Bott; Rodrigo Meza; Carlos Castillo; Toni Badia; Vicente López
    CUCWeb: A Catalan corpus built from the Web
  7. W06-1705: Paul Rayson; James Walkerdine; William H. Fletcher; Adam Kilgarriff
    Annotated Web as corpus
  8. W06-1706: Arno Scharl; Albert Weichselbraun
    Web coverage of the 2004 US Presidential election
  9. W06-1707: Cédrick Fairon
    Corporator: A tool for creating RSS-based specialized corpora
  10. W06-1708: Davide Fossati; Gabriele Ghidoni; Barbara Di Eugenio; Isabel Cruz; Huiyong Xiao; Rajen Subba
    The problem of ontology alignment on the Web: A first report
  11. W06-1709: Kie Zuraw
    Using the Web as a phonological corpus: A case study from Tagalog
  12. W06-1710: Rüdiger Gleim; Alexander Mehler; Matthias Dehmer
    Web corpus mining by instance of Wikipedia


  1. WAC1, at Corpus Linguistics conference, Birmingham, UK, July 2005

    To Meeting Home Page