2025Q1 Reports: Anthology Director
Overview
The Anthology staff comprises Matt Post (director since 2019) and four paid assistants who have been helping with day-to-day operations and improvements. We also continue to benefit from our volunteers, especially longtime contributor Marcel Bollmann of Linköping University, Sweden.
Accomplishments
In addition to normal operations, we have completed the following (October 2024 through January 2025).
- In a push over Christmas, we simplified and improved the process for correcting metadata, a task that consumes a lot of our time. Each paper page in the Anthology now has a "Fix data" button that opens a dialog for easily correcting key data. The corrections are then used to populate a machine-readable GitHub issue. There is an announcement here. This was an effort by Marcel Bollmann, Nathan Schneider, and me.
- The site was rewritten on top of a new Python library contributed by Marcel Bollmann. This library is available from the Python Package Index (https://pypi.org/project/acl-anthology/). We also managed to obtain the project name from Takahiro Kubo, who had published his own module under that name many years ago and generously turned it over to us.
- Nathan Schneider also contributed a button, similar to "Fix data", for correcting author disambiguation information.
- David Stap has taken over ingestion of TACL and CL and has been doing it consistently, which addresses a small problem.
We have also continued work on improving documentation.
Issues
We have had a few videos disappear from Vimeo (#3121). We pay for the service, but have not been able to get help from their support team, despite reaching out many times.
Plans
My major goal for this year is to modify the Anthology codebase to maintain an explicit representation of people. In the current system, people are inferred from the author names on papers, which are pooled and merged with the help of a "name variants" file that identifies people whose names appear in different forms (e.g., Aravind Joshi). Interested parties can follow the discussion in issue #1179. We plan to incorporate ORCID iDs.
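For readers unfamiliar with the current approach, here is a minimal sketch of name-based inference. It is not the Anthology's actual code: the data layout, the function names, the paper IDs, and the second author are all hypothetical, and the real "name variants" file has its own format.

```python
from collections import defaultdict

# Hypothetical, simplified stand-in for a "name variants" file:
# each entry lists a canonical name and its alternative spellings.
NAME_VARIANTS = [
    {"canonical": "Aravind K. Joshi", "variants": ["Aravind Joshi"]},
]


def build_variant_index(entries):
    """Map every known spelling (canonical or variant) to its canonical name."""
    index = {}
    for entry in entries:
        canonical = entry["canonical"]
        index[canonical] = canonical
        for variant in entry["variants"]:
            index[variant] = canonical
    return index


def infer_people(papers, entries):
    """Pool author names across papers, merging listed variants.

    A "person" here is only a canonical name string; names that are
    not listed in the variants data are treated as their own person.
    """
    index = build_variant_index(entries)
    people = defaultdict(list)
    for paper_id, authors in papers:
        for name in authors:
            people[index.get(name, name)].append(paper_id)
    return dict(people)


# Hypothetical papers: (paper ID, list of author name strings).
PAPERS = [
    ("paper-1", ["Aravind Joshi"]),
    ("paper-2", ["Aravind K. Joshi", "Jane Doe"]),
]

people = infer_people(PAPERS, NAME_VARIANTS)
for name, paper_ids in people.items():
    print(name, paper_ids)
```

In a scheme like this, identity is a by-product of string matching: two different people who share a name would be merged, and a new spelling stays separate until someone adds it to the variants data. An explicit representation of people, keyed on something like an ORCID iD, would avoid both problems.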
We are also working on the ability to host and cite plenary videos (#3603, #4309).