2020Q1 Reports: Anthology Director

From Admin Wiki
Revision as of 19:16, 3 March 2020 by Matt Post (added report)

New ID format

Starting this year, to enable growth, we will move to a new ID format for Anthology articles. We will replace the DOS-inspired ###-#### format (e.g., P19-1017) with a more explicit format of the form {year}.{name}-{volume}.{#} (e.g., 2020.acl-long.17). This new format will solve a number of problems we are facing:

  • We were running out of conference codes, with everything piled under "W" (which originally meant "workshop" but had been overloaded to also include most non-ACL events)
  • We could only have 99 non-ACL events, a limit we exceeded last year
  • Workshops could only have 99 papers, a limit that WMT (which is also no longer a workshop) exceeded in 2018, leading to unnatural volume breaks
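As a rough illustration of the difference between the two formats, the sketch below splits a new-style ID into its parts and recognizes an old-style one. The regular expressions are assumptions inferred only from the two examples above (P19-1017 and 2020.acl-long.17), and the names NEW_ID, OLD_ID, NewId, parse_new_id, and is_old_id are hypothetical; the actual Anthology rules may allow other characters.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical patterns inferred from the two example IDs in this report;
# the real Anthology rules may differ.
NEW_ID = re.compile(r"(\d{4})\.([a-z0-9]+)-([a-z0-9]+)\.(\d+)")
OLD_ID = re.compile(r"[A-Z]\d{2}-\d{4}")


class NewId(NamedTuple):
    year: int
    name: str
    volume: str
    number: int


def parse_new_id(anthology_id: str) -> Optional[NewId]:
    """Split a new-style ID such as 2020.acl-long.17 into its parts.

    Returns None if the string does not match the assumed pattern.
    """
    match = NEW_ID.fullmatch(anthology_id)
    if match is None:
        return None
    year, name, volume, number = match.groups()
    return NewId(int(year), name, volume, int(number))


def is_old_id(anthology_id: str) -> bool:
    """Return True for old-style IDs such as P19-1017."""
    return OLD_ID.fullmatch(anthology_id) is not None


print(parse_new_id("2020.acl-long.17"))
print(is_old_id("P19-1017"))
```

Because the paper number in the new format is an open-ended integer rather than a fixed-width field, a volume is no longer capped at 99 papers under this reading of the format.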

Videos

ACL continues to spend a lot of money on videos, which are not always turned over and ingested into the Anthology. It requires a lot of effort post-conference to chase down organizers and convince video providers to upload their videos and produce the mapping file that we can use for ingestion. For example, for 2019:

  • NAACL: Not ingested
  • ACL: Fully ingested, thanks to post-conference help from Lluis Marquez, David Traum, Simonetta Montemagni, Maria Cristina Schiavone, and the folks at Studio Visio.
  • EMNLP: Not ingested (though contact established)

It is a ton of work to track this down and it gets pushed aside. Moving forward, I have written a contract that should be used before hiring companies so that the upload and delivery are part of the contract. ACL 2020 is on top of this.

Hiring an Assistant

Thank you for permission to hire an assistant. I will be advertising for this soon. This should help with tasks such as video ingestion.

The catalog

Backfilling

We have been working on identifying papers missing from the Anthology. Notable efforts include:

  • IWPT: Headed up by Kilian Gebhardt, we have hired out the scanning of years 1991 to 2000, and are now in the process of manually entering metadata.
  • MT Archive: With funding provided by IAMT, I have hired Marie Dubremetz to convert a large back-catalog of MT papers and ingest them into the Anthology.
  • LiLT journal: Via a request from Martha Palmer, we will soon host LiLT (Linguistic Issues in Language Technology), previously hosted at Stanford and edited by Annie Zaenen.

These should be done in the coming months.

Looking forward

We continue to receive requests to host proceedings. In general I accept them if they look to be of good quality, can be attested by senior folks in our area, and are at least marginally related to NLP or CL. In this coming year, we will be adding:

  • CNL 2020: Seventh International Workshop on Controlled Natural Language (CNL 2020)