ACL'04 Programme Co-chairs Report
Walter Daelemans & Marilyn Walker

Our first action was to write the Call for Papers for the conference. The major decision we took here was to depart from previous practice and describe the conference topics in terms of a large set of inclusive keywords, rather than listing specific areas under which papers could be submitted. This decision was taken to encourage researchers in interdisciplinary or under-represented areas to submit papers to the conference. We then had to verify that the START conference system would allow papers to be submitted by checking off multiple keywords, rather than selecting a single area. We also attempted to coordinate our submission and notification dates with the COLING conference in Geneva, so that our notification date would fall before the COLING submission date, but this proved impossible.

Next we selected a PC consisting of 11 area chairs. To ensure that a comparable number of papers could be assigned to every area chair, we looked for people with some breadth in the field who, apart from their primary area of expertise, could also manage additional topics. We believed this would support our decision to list a large, inclusive set of keywords in the Call for Papers rather than describing area chairs in terms of a single area they would handle, and that it would make it easier to decide on 'grey area' papers at the PC meeting, since the area chairs' areas of expertise would overlap. The area chairs, their main areas, and (in brackets) their secondary areas were:

Elisabeth Andre: multimodal/multimedia processing and HCI (dialogue interaction)
Jill Burstein: NLP applications (TTS and ASR, lexical semantics, summarization and discourse structure)
Claire Cardie: information extraction (NLP at large, applications, natural language understanding)
Pascale Fung: statistical methods for NLP (machine learning, speech, multilinguality, information extraction, machine translation)
Hitoshi Isahara: machine translation and multilinguality (semantics, resources)
Michael Johnston: syntax/semantics/parsing (multimodal/multimedia processing, dialogue interaction)
Rada Mihalcea: lexical semantics, ontologies, word sense disambiguation (parallel corpora, data-oriented machine translation)
Jon Oberlander: discourse and dialogue (computational psycholinguistics, multimodal processing and multimodal interaction, generation)
Kemal Oflazer: finite state methods, dependency parsing (grammars, morphology, phonology, machine translation)
Kees Van Deemter: text, document and concept-to-speech generation, psycholinguistic models (multimodal generation, semantics/pragmatics, mathematical models of language)
Antal van den Bosch: machine learning of language (morphology, phonology, computational psycholinguistics, statistical methods)

As soon as the area chairs were appointed, they recruited reviewers (about 20 each), which provided us with a pool of more than 200 reviewers. There was no central coordination of this, and some reviewers ended up reviewing for multiple areas. We received 348 submissions for the main session, which were then allocated to the 11 area chairs so that each chair had approximately the same number of papers. While START nominally assigns papers to areas, it makes no attempt to balance papers among areas, so the initial assignment was highly skewed and had to be redone largely by hand.
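By way of illustration only, the kind of balancing we ended up doing by hand can be approximated by a simple greedy pass: give each paper to a chair whose expertise overlaps its keywords and who currently holds the fewest papers. The sketch below is our own; the data structures and names are assumptions, not START functionality.

    # Illustrative sketch (our own, not START functionality): greedily balance
    # submissions over area chairs while respecting keyword expertise.

    def assign_papers(papers, chairs):
        """papers: list of (paper_id, set of keywords);
        chairs: dict mapping chair name -> set of expertise keywords."""
        load = {name: [] for name in chairs}
        for paper_id, keywords in papers:
            # Chairs whose primary or secondary areas overlap the paper's keywords.
            qualified = [c for c in chairs if chairs[c] & keywords]
            candidates = qualified or list(chairs)  # grey-area papers: anyone
            lightest = min(candidates, key=lambda c: len(load[c]))
            load[lightest].append(paper_id)
        return load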
As these areas were defined heterogeneously (the same area chair would be responsible for several topics), the distribution of papers over areas is not very informative. More informative is the distribution of keywords over the papers. Authors could assign as many keywords as they wished to label their submission. The following list gives the number of times each keyword was selected by authors in their submissions:

corpus-based modeling of language: 122
machine learning for language: 112
applications, tools and resources: 77
syntax: 74
linguistic, mathematical and psychological models of language: 71
semantics: 66
lexicon: 59
information extraction: 58
evaluation of systems: 52
machine translation and translation aids: 52
multi-lingual processing: 40
language-oriented information retrieval: 32
discourse: 31
discourse and dialogue: 30
multi-modal and natural language interfaces and dialogue systems: 24
spoken language recognition and understanding: 23
morphology: 18
text, document and concept-to-speech generation: 16
pragmatics: 13
question answering: 13
phonetics: 9
phonology: 9
multi-modal language processing and multi-media systems: 7
message and narrative understanding systems: 5

The keywords were useful in assigning papers to area chairs, especially when authors selected multiple keywords, although authors cannot be relied on to select all relevant keywords. For example, a paper about coreference resolution was submitted under the sole keyword of information extraction, because the algorithm was intended for that application. Note also that the keywords corpus-based modeling of language and machine learning for language together account for 234 selections across the 348 submitted papers, and that these keywords co-occurred with almost every other one, indicating that these methods have permeated all areas of the field. (A short sketch of how such counts can be reproduced from the submission metadata appears at the end of this section.)

Reviewing was blind, but the area chairs had access to the names of the authors, to make it easier to detect conflicts of interest. A one-day Programme Committee meeting was held in Brighton at the ITRI premises, kindly made available by Donia Scott, the General Chair, with the help of Kees Van Deemter (area chair) and Petra Tank, Professor Scott's PA, who handled local arrangements. The meeting resulted in the acceptance of 88 papers, an acceptance rate of 25%: once more an extremely competitive selection. Of the accepted papers, 57% originate from North America (the US, Canada and Mexico), 11% from Asia and the Pacific Rim, and 32% from Europe. A large proportion of the submitted (and of the accepted) papers indicated double submission; however, all accepted papers chose to be presented at ACL rather than at the other venue to which they had been submitted.

During and after the PC meeting, the programme committee drew up a shortlist of candidates for keynote talks, and invited Anne Cutler (Max Planck Institute for Psycholinguistics, The Netherlands) and Jack Mostow (Carnegie Mellon University, Robotics Institute). Both accepted our invitation. We also selected a best paper award winner for ACL'04. As in previous incarnations of the ACL conference, the programme is structured into three parallel paper sessions, demo/poster sessions, and the student workshop. The accepted papers were organized into the 27 available session slots and labeled with session names. Finally, we invited 27 expert session chairs to guide the speakers in the technical programme sessions and to moderate discussion.
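For the record, counts like those above (and the co-occurrence observation about the two method-oriented keywords) are easy to reproduce from the submission metadata. A minimal sketch, assuming each submission is represented simply as the set of keywords its authors ticked; the function name and representation are ours:

    from collections import Counter
    from itertools import combinations

    def keyword_stats(submissions):
        """submissions: iterable of keyword sets, one per submitted paper."""
        counts = Counter()   # how often each keyword was selected
        cooccur = Counter()  # how often each unordered pair co-occurred
        for keywords in submissions:
            counts.update(keywords)
            cooccur.update(combinations(sorted(keywords), 2))
        return counts, cooccur

For example, counts['corpus-based modeling of language'] plus counts['machine learning for language'] gives the 234 selections mentioned above.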
We received feedback from area chairs, reviewers, and authors of submitted papers on various aspects of the reviewing procedure, which we summarize here:

* Character sets. Asian character sets in PDF files often caused problems for reviewers and area chairs with some versions of some PDF readers. A general solution, or at least a support page, should probably be set up for this.

* Areas. Both Pascale Fung (statistical methods for NLP) and Antal van den Bosch (machine learning of language) felt that having a separate area for these methods no longer makes sense: these areas are about methods, and the methods have permeated every topic in NLP, whether syntax, semantics or discourse. They also found it very difficult to recruit reviewers with expertise covering methods that can conceivably be applied to every topic in the field.

* Reviewer recruitment. After the paper allocation was done, three or four area chairs asked to recruit additional reviewers because they had received papers for which they felt they had no appropriate reviewers. For example, there were so many papers in discourse and dialogue that many of the spoken dialogue papers were directed to Elisabeth Andre (multimodal interaction), and a large number of discourse papers relied on statistical methods. Some area chairs felt the system could be improved by waiting to recruit reviewers until after the area assignments had been made, but this would require a submission date at least a month earlier than ours. Another possibility would be to use START's bidding process to let area chairs or reviewers bid for papers they want to review: it seems likely that the whole pool of 200 recruited reviewers contained appropriate reviewers for every paper, even though those reviewers were not necessarily recruited by the area chair who ended up responsible for the paper. (A sketch of how such a bid-driven assignment might work follows this list.)

* Review form. A field for innovation was added to the review form, in response to concern among the executive committee that the competitive selection process for ACL was eliminating papers with high novelty. Reviewers found the difference between originality and innovation somewhat unclear (originality was to be interpreted within the scope of the topic of the paper, whereas innovation was to be measured within the scope of the field as a whole). In addition, some reviewers missed the opportunity to indicate their level of expertise; others missed a category for the reusability of software and resources.

* Notification feedback. Some authors deplored the absence of numeric scores in the notifications. While scores do provide some useful feedback, they should be interpreted in the context of the scores of other competing papers in the same and other areas, the textual comments, the expertise of the reviewers, confidential comments, and sometimes additional reading by members of the PC. Without this background, it may seem strange that one paper is accepted with an average of 6.33 whereas another is rejected with an average of 7. It therefore seemed wise to us not to include the numeric scores, and we adapted START so that it removed this information from the reviews before they were sent to the authors.

* PC meeting. One day is possibly not enough to support a thorough and complete decision-making process. For example, we only had time to discuss the unclear cases, and did not explicitly review with the complete PC the papers with very high or very low scores (although the reviews for these papers were of course checked by the PC chairs and at least one area chair). We believe a one-and-a-half-day meeting, finishing at lunch time on the second day, would work much better. Ideally this would leave enough time for the programme schedule to be organized, the invited speakers to be selected, and the best paper to be chosen at the end of the meeting, rather than leaving these as tasks for the PC chairs. Finally, more time than the current 10 days should have been scheduled between the reviewing deadline and the PC meeting.
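To make the bidding suggestion concrete, the toy sketch below shows one way bids from the full reviewer pool could drive assignment. It is our own illustration of the idea, not a description of START's actual bidding feature, and all names and parameters are assumptions.

    # Toy bid-driven assignment (our own illustration, not START's actual
    # bidding feature): take bids in order of enthusiasm, filling each paper
    # up to three reviews while capping any single reviewer's load.

    def assign_by_bids(bids, reviews_per_paper=3, max_load=8):
        """bids: list of (reviewer, paper_id, score), higher score = keener."""
        assigned = {}  # paper_id -> list of reviewers
        load = {}      # reviewer -> number of papers taken on
        for reviewer, paper_id, score in sorted(bids, key=lambda b: -b[2]):
            panel = assigned.setdefault(paper_id, [])
            if len(panel) >= reviews_per_paper or reviewer in panel:
                continue
            if load.get(reviewer, 0) >= max_load:
                continue
            panel.append(reviewer)
            load[reviewer] = load.get(reviewer, 0) + 1
        return assigned

A greedy pass like this is crude (a fairer scheme would optimize globally), but it illustrates how the pool as a whole, rather than each chair's personal recruits, could cover every paper.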
It would have been impossible to achieve a fast, efficient, and hardcopy-less (i.e. cheap) submission-review-notification cycle without an electronic conference management system like START. START is a relatively mature and stable system, and the support is adequate. However, we did experience a number of problems with, and shortcomings of, the version we used. We add as an appendix to this report a list of feedback items, written by the local ACL-04 START system maintainer, Guy De Pauw, which may be of interest to future users of the software. One START-related problem for which the software was not to blame is that we chose to host the system on a local machine at the University of Antwerp. This created major problems at a crucial moment (area chairs sending out review information to reviewers) because of a sudden, badly communicated change in the network security set-up of the University of Antwerp (disallowing the use of mail servers outside a limited trusted set). We were also threatened a few times by power cuts, which fortunately did not affect the procedure. Our advice would therefore be to host START on a softconf.com server, as suggested by softconf itself.

APPENDIX: Feedback about START

* Submission
- The 'Submission Report' could perhaps point out that the registration information is also sent by e-mail.
- It would be good if authors could be forced to select at least one keyword.
- Perhaps an (optional) checkbox could be included: 'This paper is under consideration for other conferences' yes/no.
- It is currently possible to submit without attaching an actual file. It would be useful to be able to make the file submission field mandatory. In any case, if no paper is submitted, the confirmation should not read 'We successfully received your submission to...'.
- It is possible to submit after the deadline if you open submit.html before the deadline closes.
- It would be useful if an archive of all submitted papers could be created, similar to the 'make archive' option for final submissions.

* Reviewing
- After submitting a review, a link back to the list of papers to be reviewed would be handy.
- In the review form, you can use both 'Upload Comments File' and 'Enter Comments Here' at the same time. Please indicate that the 'Enter Comments Here' box is the default.
- In the review form, it would be handy to be able to add a short piece of information for each evaluation category.
- It is possible to access abstracts of submissions that one has not been assigned to review, by manipulating ID numbers in the assignment URLs.
- When an author contacts you to withdraw a paper, you can delete it from the system; but if you then want to contact the reviewers who had been assigned to that paper, it is no longer possible to retrieve this information.
- One user commented on the location of the buttons on the review form, noting that the buttons for accept and clear are the "wrong way round" and that "Where the clear button is is on most Windows applications the OK button!!!"
- One of the track administrators decided to access all the reviews with a wget operation. This process also followed the 'delete reviews' link, thereby effectively deleting all the reviews. There should perhaps be some kind of confirmation before reviews can be deleted. (A sketch of the safer request pattern is given at the end of this appendix.)
- It would be useful if START could automatically extract a spreadsheet of the reviewers and secondary reviewers, possibly associated with tracks and papers.

* Tracks
- Using a conference with tracks can get very complicated. Programme committee members added by track chairs should also be added centrally.
- A dedicated programme committee page would be handy, indicating for each committee member the tracks in which they are reviewing.
- It would also be handy if the number of reviews per reviewer could be viewed centrally (especially for those reviewing in several tracks at once).
- A reviewer assigned to different tracks receives a separate review assignment e-mail for each track, which often confuses them. It would be useful if a reviewer could log in centrally and see all the reviews they are doing across tracks.
- In a track-based conference, it would be useful to classify incoming papers by default into a temporary track rather than directly into one of the actual tracks.
- The central Review Progress overview seems to be broken: to get a correct overview of review progress per reviewer, you have to check track by track.

* General Management
- The various set-up pages should have a direct link back to the Manager's Console at the top and bottom of the page.
- Many users (authors, programme committee members, ...) lose their password. A password retrieval option would be handy, perhaps as an option in the Manager's Console: retrieve the personal information for a given account name, e-mail address, paper number, etc.
- START was not able to process an (admittedly weird) e-mail address containing a "+" sign.
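As a postscript to the wget incident reported under 'Reviewing' above: the root cause is that a destructive operation was reachable through a plain GET link, which any crawler will follow. The toy handler below is our own sketch of the standard remedy, not START code; deletion is accepted only via a POST request carrying an explicit confirmation field.

    # Toy request handler (our own sketch, not START code): destructive
    # actions should require a POST plus an explicit confirmation field,
    # so that link-following crawlers such as wget cannot trigger them.

    def handle_request(method, params, reviews):
        """method: 'GET' or 'POST'; params: dict of request parameters;
        reviews: dict mapping paper_id -> list of review texts."""
        if params.get("action") == "delete_reviews":
            if method != "POST":
                return "405 Method Not Allowed: deletion requires POST"
            if params.get("confirm") != "yes":
                return "409 Confirmation Required: resubmit with confirm=yes"
            reviews.clear()
            return "200 OK: all reviews deleted"
        return "200 OK"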