Workshop on Statistical Natural Language Processing and Weighted Automata

Event Notification Type: 
Call for Papers
Abbreviated Title: 
StatFSM 2016
Location: 
Humboldt University (ACL 2016)
Friday, 12 August 2016
State: 
Berlin
Country: 
Germany
Contact Email: 
statfsm2016 [at] bbaw.de
City: 
Berlin
Contact: 
Bryan Jurish (Berlin Brandenburg Academy of Sciences and Humanities)
Andreas Maletti (Stuttgart University)
Uwe Springmann (HU Berlin, LMU Munich)
Kay-Michael Würzner (Berlin Brandenburg Academy of Sciences and Humanities)
Submission Deadline: 
Sunday, 8 May 2016

====================================
StatFSM 2016 - First Call For Papers
====================================

ACL 2016 Workshop
Statistical Natural Language Processing and Weighted Automata
=============================================================

August 12, 2016 Berlin, Germany
http://zwei.dwds.de/statfsm/

Deadline for paper submissions: 8 May 2016 (11:59pm GMT-12)

This workshop is endorsed by SIGFSM, the Special Interest Group on
Finite State Automata and Natural Language Processing, and by OCR-D,
the DFG coordination project for the improvement of OCR methods.

Workshop description
----------------------------------
The past 20 years have seen a fundamental paradigm shift in the field
of automated natural language processing: though long dominated by
rule-based techniques, the vast majority of contemporary approaches
are now based on underlying statistical models. Many classes of
statistical models, such as Hidden Markov Models, have direct
connections to graph and automata theory. However, open research
questions remain regarding the formal relation between automata and
other popular statistical models such as Conditional Random Fields or
Support Vector Machines.

The purpose of the workshop is to bring together researchers
interested in statistical natural language processing, automata
theory, and their applications. While the interests and methods of these
different communities overlap considerably, there has been little
institutional recognition of shared problems and techniques.

Special Theme:
Automata-based techniques in Optical Character Recognition
-------------------------------------------------------------------------------------------
Increasing efforts by libraries and publishing houses to digitize
sources not originating in electronic form, and the resulting vast
quantity of digitally available books, have in recent years led to a
commensurate demand for high-quality, flexible, and cost-efficient
text transcription techniques, among which Optical Character
Recognition (OCR) is of particular interest. Paralleling the increased
use of OCR techniques on the part of text providers, interest in
computational linguistic research on the topic has grown as well,
since many typical OCR-related tasks touch on the discipline's core
issues. Our special theme is intended to reflect the growing interest
in OCR-related topics from the fields of computational linguistics and
digital humanities on the one hand, and to raise awareness of the
associated challenges among the automata research community on the
other.

Keynote Speaker
--------------------------
We are happy to announce that Jason Eisner (Johns Hopkins University) has
agreed to give a keynote at the workshop.

Focus of content
-------------------------
We invite researchers to submit papers containing substantial,
original, and unpublished research, potentially including strong work
in progress. Appropriate topics include (but are not limited to) the
following:

- weighted automata, their theory and applications,
- statistical NLP, in particular approaches using finite-state
techniques,
- results concerning the relation of statistical models and weighted
automata,
- automata-based formalizations or implementations of statistical
methods,
- machine learning approaches relating to the other topics,
- machine learning of finite-state models of natural language,
- systems and frameworks for OCR/OLR with a connection to
automata-based methods,
- statistical approaches to automated page segmentation and document
analysis,
- supervised or unsupervised extraction of lexica, language- or
error-models for OCR post-correction, and
- systems and frameworks for post-correction or -segmentation of OCR
output texts, especially those making use of weighted automata.

Submissions
--------------------
All submissions should follow the ACL 2016 style guidelines and must
be in PDF format. Style files are available for download from the ACL
2016 website at http://acl2016.org/files/acl2016.zip.

Long papers which describe substantial, original, completed and
unpublished work may consist of up to eight (8) pages of content,
plus references. Short papers which report focused contributions,
ongoing research, negative results or system descriptions may consist
of up to four (4) pages of content, plus references.

Reviewing will be double-blind, and thus no author information should
be included in the papers; self-reference should be avoided as well.
Papers that do not conform to these requirements will be rejected
without review. Accepted papers will appear in the workshop
proceedings.

Papers should be submitted electronically using the Softconf START
conference management system via:

https://www.softconf.com/acl2016/StatFSM/

Please choose the appropriate submission type from the submission
page. Submissions must be uploaded by the submission deadline of
8 May 2016 (11:59pm GMT-12).

Schedule
--------------

May 8, 2016
Long and Short Paper submission deadline

June 5, 2016
Notification of acceptance

June 22, 2016
Camera-ready deadline

August 12, 2016
Workshop

Programme Committee
-----------------------------------
- Borja Balle (Lancaster University, UK)
- Francisco Casacuberta (Instituto Tecnológico de Informática, Spain)
- Simon Clematide (University of Zurich, Switzerland)
- Gregory Crane (University of Leipzig, Germany)
- Frank Drewes (Umeå University, Sweden)
- Jason Eisner (Johns Hopkins University, Baltimore, MD, USA)
- Colin de la Higuera (Nantes University, France)
- Mans Hulden (University of Colorado, Boulder, CO, USA)
- Krister Lindén (University of Helsinki, Finland)
- Kevin Knight (University of Southern California, CA, USA)
- Marcus Eichenberger-Liwicki (University of Kaiserslautern, Germany)
- Stoyan Mihov (Bulgarian Academy of Sciences, Sofia, Bulgaria)
- Mark-Jan Nederhof (University of St Andrews, UK)
- Michael Riley (Google Inc., USA)
- Martin Reynaert (Tilburg University, the Netherlands)
- Brian Roark (Google Inc., USA)
- Richard Sproat (Google Inc., USA)
- Heiko Vogler (Dresden University of Technology, Germany)
- Bruce Watson (Stellenbosch University, South Africa)

Organizers
----------------
- Bryan Jurish (Berlin Brandenburg Academy of Sciences and Humanities)
- Andreas Maletti (Stuttgart University)
- Uwe Springmann (HU Berlin, LMU Munich)
- Kay-Michael Würzner (Berlin Brandenburg Academy of Sciences and
Humanities)

Contact
------------
For any inquiries regarding the workshop please send an email to
statfsm2016 [at] bbaw.de