2023Q3 Reports: Anthology Director

From Admin Wiki
Revision as of 10:49, 22 July 2023 by Matt Post (talk | contribs) (Created Anthology report)

Anthology 2023 Q2 report 

Report

The Anthology is keeping up with its responsibilities. I think, however, that we are stagnating a bit: we are falling behind on our development goals and on our ability to adapt proactively to the changing conference and publishing scene.

This report covers April through early July, 2023 (including the ACL conference).

Accomplishments

  • Since the start of the year, the Anthology assistant (Xinru Yan) has logged about 20 hours a month. Her work mostly involves handling the large stream of ingestion requests and processing corrections (PDF revisions, as well as corrections to metadata such as author names and paper titles). She also contributes improvements to the underlying codebase as time permits.
  • A second assistant, David Stap, has helped with the ingestion of videos after conferences. This involves interfacing with our Underline contact(s), naming and placing the videos, and updating our database. We just ingested the videos for EMNLP 2022 and EACL 2023.
  • Seza Doğruöz continues to lead our indexing effort, interfacing with third-party indexers such as SCOPUS. This is an important component for many academics in Europe and Asia.
  • Volunteers continue to contribute crucial services: this includes bug fixes, code reviews, and development of the software that builds our site. Chief among these is Marcel Bollmann (Linköping University); other contributors include Nathan Schneider (Georgetown), Dan Gildea, and Arne Köhn.
  • Marcel in particular contributed a number of updates to our GitHub Issues that clarified and simplified the submission of corrections.
  • I spend about 3 hours a week on average. My time is spent managing the assistants, evaluating ingestion requests, approving code and changes via GitHub, developing our code base and, increasingly, helping with major *ACL conference ingestions.

Pain points

  • EMNLP 2022 front matter has not been delivered. This means EMNLP’s proceedings will not be accepted by SCOPUS and other indexers.
  • EMNLP 2022 attachments have not been ingested, due to formatting issues.
  • EACL 2023 was somewhat easier, but again stretched out for some time as workshop organizers submitted and sometimes later corrected their proceedings.
  • ACL 2023 presented a number of difficulties, more on which below.

ACL 2023 was difficult. I spent about 12 hours, unexpectedly, helping with ACL 2023 ingestion, including a large portion of the Saturday before the first conference day, and then throughout the conference itself. Looking back, more proactivity on my part would have mitigated some of this, but much of the difficulty seems inherent in the process.

We moved to a two-stage ingestion process (main conference papers, then later workshops), which in practice was six or seven steps. This corrected our mistake for EMNLP, where we insisted on sticking to our rule of ingesting nothing until we had the complete proceedings for all main volumes and workshops. That insistence meant the proceedings were not ingested in time, and it created lots of extra work. The basic problem is that tough love is impossible; there are no real deadlines as there were in the days of print, and everyone knows this.

The process is improved over EMNLP 2022, with all workshops and main conference volumes being created as separate repositories in a single GitHub org. However, we received the data at the last minute, and there were many changes and new issues that had to be accommodated and addressed. The volumes come together at the last minute, and workshop organizers are often late or nonresponsive. Many proceedings did not have the correct format despite the nice documentation developed by the ACLPUB2 team. It is often easier just to correct these ourselves than to try to get submitters to correct them.

No doubt, conference and workshop organizers also put in inordinate amounts of time and have reasons for these delays—my goal here is only to describe our perspective. The whole setup begs for automation, increased centralized coordination, and professional support.

Looking forward

The Anthology would likely not function were it not for the financial support we receive for operations. I would like to request support for development, which would help us get ahead of some of our problems. There are three projects in particular that I think would have high payoff in reducing pain and grief among volunteers in the community:

  • Automating revisions and metadata corrections. As noted, we spend a lot of time entering corrections and revisions ourselves. With GitHub Actions, however, we could automate much of this process by automatically creating pull requests.
  • Development of the Anthology Python library. The (software) library we use to build the site could use some updating and refactoring. This would improve our internal projects. It would also benefit the community, since there are many research projects that make use of code for reading our metadata.
  • A web-based interface for proceedings construction and submission. Softconf made ACLPUBv1 usable by busy academics by integrating the code into STARTv2. We should build a standalone GUI tool that would assemble watermarked PDFs, receive conference metadata, and generate validated proceedings for ingestion by the Anthology.
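
To make the first project concrete, here is a minimal sketch of the core step an automated correction workflow might perform: turning a structured correction request into a validated metadata change plus a summary suitable for a pull request description. Everything in it is an assumption for illustration: the "key: value" request format, the field names, the `Correction` class, and both functions are hypothetical and are not part of the existing Anthology code base or of any GitHub API.

```python
from dataclasses import dataclass

# Hypothetical set of metadata fields a correction request may change.
ALLOWED_FIELDS = {"title", "author", "abstract"}


@dataclass
class Correction:
    anthology_id: str
    changes: dict


def parse_correction(issue_body: str) -> Correction:
    """Parse 'key: value' lines from a (hypothetical) issue template."""
    fields = {}
    for line in issue_body.splitlines():
        if ":" not in line:
            continue  # skip blank lines and free-form text
        key, value = line.split(":", 1)  # values may themselves contain ':'
        fields[key.strip().lower()] = value.strip()
    if "anthology_id" not in fields:
        raise ValueError("correction request must name an anthology_id")
    anthology_id = fields.pop("anthology_id")
    unknown = set(fields) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    return Correction(anthology_id, fields)


def apply_correction(record: dict, correction: Correction) -> tuple[dict, list[str]]:
    """Return an updated copy of the record and a human-readable change summary."""
    updated = dict(record)
    summary = []
    for key, new_value in correction.changes.items():
        old_value = record.get(key)
        if old_value != new_value:
            updated[key] = new_value
            summary.append(f"{key}: {old_value!r} -> {new_value!r}")
    return updated, summary


# Example with made-up data; the identifier below is a placeholder.
body = "anthology_id: 2023.example-main.1\ntitle: A Corrected Title\n"
record = {"title": "A Corected Title", "author": "Jane Doe"}
updated, summary = apply_correction(record, parse_correction(body))
print("\n".join(summary))
```

In a setup like this, a workflow would run the parsing step when a correction issue is filed, reject malformed requests early, and attach the summary to an automatically opened pull request, so that a human only needs to review and approve the change.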