ACL'04 Programme Co-chairs Report
Walter Daelemans & Marilyn Walker

Our first action was to write the Call for Papers for the conference. The major decision we took here was to depart from previous practice and describe the conference topics in terms of a large set of inclusive keywords, rather than listing specific areas under which papers could be submitted. This decision was taken to encourage researchers in interdisciplinary or under-represented areas to submit papers to the conference. We then had to verify that the START conference system would allow papers to be submitted by checking off multiple keywords, rather than selecting a single area. We also attempted to coordinate our submission and notification dates with the COLING conference in Geneva, so that our notification date would fall before the COLING submission date, but this proved impossible.

Next we selected a PC consisting of 11 area chairs. To ensure that a comparable number of papers could be assigned to every area chair, we looked for people with some breadth in the field who, apart from their primary area of expertise, could also manage additional topics. We believed this would support our decision to list a large, inclusive set of keywords in the Call for Papers rather than describing area chairs in terms of a single area they would handle, and that it would make it easier to decide on 'grey area' papers at the PC meeting, since the area chairs' areas of expertise would overlap. The area chairs, their main areas, and (in brackets) their secondary areas were:

Elisabeth Andre: multimodal/multimedia processing and HCI (dialogue interaction)
Jill Burstein: NLP applications (TTS and ASR, lexical semantics, summarization and discourse structure)
Claire Cardie: information extraction (NLP at large, applications, natural language understanding)
Pascale Fung: statistical methods for NLP (machine learning, speech, multilinguality, information extraction, machine translation)
Hitoshi Isahara: machine translation and multilinguality (semantics, resources)
Michael Johnston: syntax/semantics/parsing (multimodal/multimedia processing, dialogue interaction)
Rada Mihalcea: lexical semantics, ontologies, word sense disambiguation (parallel corpora, data-oriented machine translation)
Jon Oberlander: discourse and dialogue (computational psycholinguistics, multimodal processing and multimodal interaction, generation)
Kemal Oflazer: finite state methods, dependency parsing (grammars, morphology, phonology, machine translation)
Kees Van Deemter: text, document and concept-to-speech generation, psycholinguistic models (multimodal generation, semantics/pragmatics, mathematical models of language)
Antal van den Bosch: machine learning of language (morphology, phonology, computational psycholinguistics, statistical methods)

As soon as the area chairs were appointed, they recruited reviewers (about 20 each), which provided us with a pool of more than 200 reviewers. There was no central coordination of this, and some reviewers ended up reviewing for multiple areas. We received 348 submissions for the main session, which were then allocated to the 11 area chairs so that each chair had approximately the same number of papers. While START nominally assigns papers to areas, it makes no attempt to balance papers among areas, so the initial assignment was highly skewed and had to be redone largely by hand.
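By way of illustration only, the kind of balancing we ended up doing by hand can be approximated by a simple greedy pass: give each paper to a chair whose expertise overlaps its keywords and who currently holds the fewest papers. The sketch below is our own; the data structures and names are assumptions, not START functionality.

    # Illustrative sketch (our own, not START functionality): greedily balance
    # submissions over area chairs while respecting keyword expertise.

    def assign_papers(papers, chairs):
        """papers: list of (paper_id, set of keywords);
        chairs: dict mapping chair name -> set of expertise keywords."""
        load = {name: [] for name in chairs}
        for paper_id, keywords in papers:
            # Chairs whose primary or secondary areas overlap the paper's keywords.
            qualified = [c for c in chairs if chairs[c] & keywords]
            candidates = qualified or list(chairs)  # grey-area papers: anyone
            lightest = min(candidates, key=lambda c: len(load[c]))
            load[lightest].append(paper_id)
        return load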
As these areas were defined heterogeneously (the same area chair would be responsible for several topics), the distribution of papers over areas is not very informative. More informative is the distribution of keywords over the papers. Authors could assign as many keywords as they wished to label their submission. The following list gives the number of times each keyword was selected by authors in their submissions:

corpus-based modeling of language: 122
machine learning for language: 112
applications, tools and resources: 77
syntax: 74
linguistic, mathematical and psychological models of language: 71
semantics: 66
lexicon: 59
information extraction: 58
evaluation of systems: 52
machine translation and translation aids: 52
multi-lingual processing: 40
language-oriented information retrieval: 32
discourse: 31
discourse and dialogue: 30
multi-modal and natural language interfaces and dialogue systems: 24
spoken language recognition and understanding: 23
morphology: 18
text, document and concept-to-speech generation: 16
pragmatics: 13
question answering: 13
phonetics: 9
phonology: 9
multi-modal language processing and multi-media systems: 7
message and narrative understanding systems: 5

The keywords were useful in assigning papers to area chairs, especially when authors selected multiple keywords, although authors cannot be relied on to select all relevant keywords. For example, a paper about coreference resolution was submitted under the sole keyword of information extraction, because the algorithm was intended for that application. Note also that the keywords corpus-based modeling of language and machine learning for language together account for 234 selections across the 348 submitted papers, and that these keywords co-occurred with almost every other one, indicating that these methods have permeated all areas of the field. (A short sketch of how such counts can be reproduced from the submission metadata appears at the end of this section.)

Reviewing was blind, but the area chairs had access to the names of the authors, to make it easier to detect conflicts of interest. A one-day Programme Committee meeting was held in Brighton at the ITRI premises, kindly made available by Donia Scott, the General Chair, with the help of Kees Van Deemter (area chair) and Petra Tank, Professor Scott's PA, who handled local arrangements. The meeting resulted in the acceptance of 88 papers, an acceptance rate of 25%: once more an extremely competitive selection. Of the accepted papers, 57% originate from North America (the US, Canada and Mexico), 11% from Asia and the Pacific Rim, and 32% from Europe. A large proportion of the submitted (and of the accepted) papers indicated double submission; however, all accepted papers chose to be presented at ACL rather than at the other venue to which they had been submitted.

During and after the PC meeting, the programme committee drew up a shortlist of candidates for keynote talks, and invited Anne Cutler (Max Planck Institute for Psycholinguistics, The Netherlands) and Jack Mostow (Carnegie Mellon University, Robotics Institute). Both accepted our invitation. We also selected a best paper award winner for ACL'04. As in previous incarnations of the ACL conference, the programme is structured into three parallel paper sessions, demo/poster sessions, and the student workshop. The accepted papers were organized into the 27 available session slots and labeled with session names. Finally, we invited 27 expert session chairs to guide the speakers in the technical programme sessions and to moderate discussion.
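For the record, counts like those above (and the co-occurrence observation about the two method-oriented keywords) are easy to reproduce from the submission metadata. A minimal sketch, assuming each submission is represented simply as the set of keywords its authors ticked; the function name and representation are ours:

    from collections import Counter
    from itertools import combinations

    def keyword_stats(submissions):
        """submissions: iterable of keyword sets, one per submitted paper."""
        counts = Counter()   # how often each keyword was selected
        cooccur = Counter()  # how often each unordered pair co-occurred
        for keywords in submissions:
            counts.update(keywords)
            cooccur.update(combinations(sorted(keywords), 2))
        return counts, cooccur

For example, counts['corpus-based modeling of language'] plus counts['machine learning for language'] gives the 234 selections mentioned above.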
We received feedback from area chairs, reviewers, and authors of submitted papers on various aspects of the reviewing procedure, which we summarize here:

* Character sets. Asian character sets in PDF files often caused problems for reviewers and area chairs with some versions of some PDF readers. A general solution, or at least a support page, should probably be set up for this.

* Areas. Both Pascale Fung (statistical methods for NLP) and Antal van den Bosch (machine learning of language) felt that having a separate area for these methods no longer makes sense: these areas are about methods, and the methods have permeated every topic in NLP, whether syntax, semantics or discourse. They also found it very difficult to recruit reviewers with expertise covering methods that can conceivably be applied to every topic in the field.

* Reviewer recruitment. After the paper allocation was done, three or four area chairs asked to recruit additional reviewers because they had received papers for which they felt they had no appropriate reviewers. For example, there were so many papers in discourse and dialogue that many of the spoken dialogue papers were directed to Elisabeth Andre (multimodal interaction), and a large number of discourse papers relied on statistical methods. Some area chairs felt the system could be improved by waiting to recruit reviewers until after the area assignments had been made, but this would require a submission date at least a month earlier than ours. Another possibility would be to use START's bidding process to let area chairs or reviewers bid for papers they want to review: it seems likely that the whole pool of 200 recruited reviewers contained appropriate reviewers for every paper, even though those reviewers were not necessarily recruited by the area chair who ended up responsible for the paper. (A sketch of how such a bid-driven assignment might work follows this list.)

* Review form. A field for innovation was added to the review form, in response to concern among the executive committee that the competitive selection process for ACL was eliminating papers with high novelty. Reviewers found the difference between originality and innovation somewhat unclear (originality was to be interpreted within the scope of the topic of the paper, whereas innovation was to be measured within the scope of the field as a whole). In addition, some reviewers missed the opportunity to indicate their level of expertise; others missed a category for the reusability of software and resources.

* Notification feedback. Some authors deplored the absence of numeric scores in the notifications. While scores do provide some useful feedback, they should be interpreted in the context of the scores of other competing papers in the same and other areas, the textual comments, the expertise of the reviewers, confidential comments, and sometimes additional reading by members of the PC. Without this background, it may seem strange that one paper is accepted with an average of 6.33 whereas another is rejected with an average of 7. It therefore seemed wise to us not to include the numeric scores, and we adapted START so that it removed this information from the reviews before they were sent to the authors.

* PC meeting. One day is possibly not enough to support a thorough and complete decision-making process. For example, we only had time to discuss the unclear cases, and did not explicitly review with the complete PC the papers with very high or very low scores (although the reviews for these papers were of course checked by the PC chairs and at least one area chair). We believe a one-and-a-half-day meeting, finishing at lunch time on the second day, would work much better. Ideally this would leave enough time for the programme schedule to be organized, the invited speakers to be selected, and the best paper to be chosen at the end of the meeting, rather than leaving these as tasks for the PC chairs. Finally, more time than the current 10 days should have been scheduled between the reviewing deadline and the PC meeting.
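To make the bidding suggestion concrete, the toy sketch below shows one way bids from the full reviewer pool could drive assignment. It is our own illustration of the idea, not a description of START's actual bidding feature, and all names and parameters are assumptions.

    # Toy bid-driven assignment (our own illustration, not START's actual
    # bidding feature): take bids in order of enthusiasm, filling each paper
    # up to three reviews while capping any single reviewer's load.

    def assign_by_bids(bids, reviews_per_paper=3, max_load=8):
        """bids: list of (reviewer, paper_id, score), higher score = keener."""
        assigned = {}  # paper_id -> list of reviewers
        load = {}      # reviewer -> number of papers taken on
        for reviewer, paper_id, score in sorted(bids, key=lambda b: -b[2]):
            panel = assigned.setdefault(paper_id, [])
            if len(panel) >= reviews_per_paper or reviewer in panel:
                continue
            if load.get(reviewer, 0) >= max_load:
                continue
            panel.append(reviewer)
            load[reviewer] = load.get(reviewer, 0) + 1
        return assigned

A greedy pass like this is crude (a fairer scheme would optimize globally), but it illustrates how the pool as a whole, rather than each chair's personal recruits, could cover every paper.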
It would have been impossible to achieve a fast, efficient, and hardcopy-less (i.e. cheap) submission-review-notification cycle without an electronic conference management system like START. START is a relatively mature and stable system, and the support is adequate. However, we did experience a number of problems with, and shortcomings of, the version we used. We add as an appendix to this report a list of feedback items, written by the local ACL-04 START system maintainer, Guy De Pauw, which may be of interest to future users of the software. One START-related problem for which the software was not to blame is that we chose to host the system on a local machine at the University of Antwerp. This created major problems at a crucial moment (area chairs sending out review information to reviewers) because of a sudden, badly communicated change in the network security set-up of the University of Antwerp (disallowing the use of mail servers outside a limited trusted set). We were also threatened a few times by power cuts, which fortunately did not affect the procedure. Our advice would therefore be to host START on a softconf.com server, as suggested by softconf itself.

APPENDIX: Feedback about START

* Submission
- The 'Submission Report' could perhaps point out that the registration information is also sent by e-mail.
- It would be good if authors could be forced to select at least one keyword.
- Perhaps an (optional) checkbox could be included: 'This paper is under consideration for other conferences' yes/no.
- It is currently possible to submit without attaching an actual file. It would be useful to be able to make the file submission field mandatory. In any case, if no paper is submitted, the confirmation should not read 'We successfully received your submission to...'.
- It is possible to submit after the deadline if you open submit.html before the deadline closes.
- It would be useful if an archive of all submitted papers could be created, similar to the 'make archive' option for final submissions.

* Reviewing
- After submitting a review, a link back to the list of papers to be reviewed would be handy.
- In the review form, you can use both 'Upload Comments File' and 'Enter Comments Here' at the same time. Please indicate that the 'Enter Comments Here' box is the default.
- In the review form, it would be handy to be able to add a short piece of information for each evaluation category.
- It is possible to access abstracts of submissions that one has not been assigned to review, by manipulating ID numbers in the assignment URLs.
- When an author contacts you to withdraw a paper, you can delete it from the system; but if you then want to contact the reviewers who had been assigned to that paper, it is no longer possible to retrieve this information.
- One user commented on the location of the buttons on the review form, noting that the buttons for accept and clear are the "wrong way round" and that "Where the clear button is is on most Windows applications the OK button!!!"
- One of the track administrators decided to access all the reviews with a wget operation. This process also followed the 'delete reviews' link, thereby effectively deleting all the reviews. There should perhaps be some kind of confirmation before reviews can be deleted. (A sketch of the safer request pattern is given at the end of this appendix.)
- It would be useful if START could automatically extract a spreadsheet of the reviewers and secondary reviewers, possibly associated with tracks and papers.

* Tracks
- Using a conference with tracks can get very complicated. Programme committee members added by track chairs should also be added centrally.
- A dedicated programme committee page would be handy, indicating for each committee member the tracks in which they are reviewing.
- It would also be handy if the number of reviews per reviewer could be viewed centrally (especially for those reviewing in several tracks at once).
- A reviewer assigned to different tracks receives a separate review assignment e-mail for each track, which often confuses them. It would be useful if a reviewer could log in centrally and see all the reviews they are doing across tracks.
- In a track-based conference, it would be useful to classify incoming papers by default into a temporary track rather than directly into one of the actual tracks.
- The central Review Progress overview seems to be broken: to get a correct overview of review progress per reviewer, you have to check track by track.

* General Management
- The various set-up pages should have a direct link back to the Manager's Console at the top and bottom of the page.
- Many users (authors, programme committee members, ...) lose their password. A password retrieval option would be handy, perhaps as an option in the Manager's Console: retrieve the personal information for a given account name, e-mail address, paper number, etc.
- START was not able to process an (admittedly weird) e-mail address containing a "+" sign.
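As a postscript to the wget incident reported under 'Reviewing' above: the root cause is that a destructive operation was reachable through a plain GET link, which any crawler will follow. The toy handler below is our own sketch of the standard remedy, not START code; deletion is accepted only via a POST request carrying an explicit confirmation field.

    # Toy request handler (our own sketch, not START code): destructive
    # actions should require a POST plus an explicit confirmation field,
    # so that link-following crawlers such as wget cannot trigger them.

    def handle_request(method, params, reviews):
        """method: 'GET' or 'POST'; params: dict of request parameters;
        reviews: dict mapping paper_id -> list of review texts."""
        if params.get("action") == "delete_reviews":
            if method != "POST":
                return "405 Method Not Allowed: deletion requires POST"
            if params.get("confirm") != "yes":
                return "409 Confirmation Required: resubmit with confirm=yes"
            reviews.clear()
            return "200 OK: all reviews deleted"
        return "200 OK"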