2014Q1 Reports: Info Officer

From Admin Wiki

[Link to 2013 Q3 Report] [Link to 2013 Q1 Report]

The Information Officer (IO) portfolio covers the integration of ACL-wide activities related to information dissemination, including the Anthology, website, wiki, portal and archive. Plans include integrating logins (through OpenID and OAuth; IN PROGRESS) and updating our information services so that they are current and professionally designed (PLANNED).

Long-term goals include moving the aclweb.org infrastructure to a more modern webhost, ensuring the accessibility and long-term maintenance of aclweb.org and our other sites, and making the information services cost-neutral through sponsorship by corporate interests.

IO Overview

Budget. The IO has budget to oversee part-time manpower allocated to help improve our association's websites, which includes maintenance, upgrading, migrating and backup. So far, we have incurred costs of BUG, which is well in line with our projections.

DOIs/CrossRef. We are now a registered body for assigning DOIs to scholarly materials in CrossRef. This costs USD 275 per year plus USD 1 per resource registered for a DOI. Under a previous agreement, we decided to start DOI assignment with our new TACL journal, but the journal's EICs have not had the bandwidth to deal with this issue, as it requires some reworking of how cited items are reported in their system. One important consideration in joining CrossRef and being able to assign DOIs is CrossRef's mandate that all journal articles do outbound citation linking within 18 months of joining CrossRef (see here, rule #6). We will work with TACL to establish this workflow first, and then try to propagate it to our conference systems (this may have to be tied in with START development).

Given this bottleneck and the unforeseeable delay in getting a workflow established, I suggest that we start with our voluminous conference proceedings data instead. I will start with the annual ACL proceedings, then deal with the chapter conferences (EACL, NAACL) next.
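As a rough guide to what DOI registration for the proceedings might cost, the fee structure quoted above can be sketched as a small calculation. This is only an illustrative estimate: the function name `doi_cost_usd` is hypothetical, and it assumes that the flat rates stated in this report (USD 275 per year, USD 1 per resource) apply uniformly to every item registered, which CrossRef's actual fee schedule may not do.

```python
# Fee figures as quoted in this report; assumed to apply uniformly to all items.
ANNUAL_FEE_USD = 275   # yearly CrossRef membership cost
PER_DOI_FEE_USD = 1    # cost per resource registered for a DOI


def doi_cost_usd(num_resources: int, years: int = 1) -> int:
    """Estimate the total cost of registering `num_resources` DOIs over `years` years."""
    if num_resources < 0 or years < 0:
        raise ValueError("num_resources and years must be non-negative")
    return ANNUAL_FEE_USD * years + PER_DOI_FEE_USD * num_resources


if __name__ == "__main__":
    # Example: registering every paper currently in the Anthology in one year.
    print(doi_cost_usd(24500))
```

Under these assumptions, registering all of the roughly 24,500 papers reported in the Anthology section within a single year would come to USD 24,775, so a staged rollout by venue spreads the per-resource cost over several budget periods.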

Collaboration with ELRA. We have also discussed joint work on information services with ELRA's executive officers, Nicoletta Calzolari and Khalid Choukri. As a result, we have a few late-breaking items for discussion.

  • Use of the LRE Map at all ACL conferences. The LRE Map is a project to catalog language resource (LR) use in CL publications, which started with LREC 2010. Since then it has become more popular, and a number of ACL-sponsored events have also been using this framework. START (our de facto conference management system) has a module that allows authors to declare the language resources used in their publications. ELRA has asked whether ACL could officially sanction the use of the LRE Map in conferences, rather than having ELRA officials ask conference chairs on a case-by-case basis. ELRA maintains that it will keep the LRE Map and associated resources free for all use. A discussion between the Information Office and the Conference Office was initiated, and we concur that such a policy would benefit both the ACL and CL/NLP in general. As such, we propose the following motion:

Formal Motion. The ACL exec endorses the optional use of ELRA's LRE Map for use in any ACL event and publication by individual authors and by event chairs.

  • ISLRN. ELRA is also planning to assign a unique identifier to all language resources (akin to a DOI for scholarly materials). ELRA would like ACL to lend its support and backing of such a scheme. The information office feels this will benefit the ACL and CL/NLP community as a whole, so we have backed this plan. We assume no official motion is needed for this at this stage, as there is no standard yet and as it falls under the purview of the Information Office, but if it does need an official motion, we propose the following:

We note that both of ELRA's initiatives already have wide support (LDC, AFNLP, etc.) in the CL/NLP community.

Plans. ELRA is also interested in our login integration initiative, and may decide to adopt our system as a cross-organization login system; we'll keep them informed of our decisions and may consider their feedback when we deem that it benefits our membership.

Other plans include:

  • Adding a vetting service for bibliographies, so that any ACL paper referencing an ACL publication would have a proper identifier to the work. This is likely to take the industry standard form of the DOI.
  • Investigating the citation indexing of our materials -- it appears that ACL materials are somewhat haphazardly indexed by Elsevier and SCI. We need to address this because many of our members depend on the availability of indexing in determining whether or not to publish in our venues.

Anthology

The ACL Anthology is a digital archive of research papers in computational linguistics, sponsored by the CL community, and freely available to all. We employ a Creative Commons Attribution Non-Commercial, Share-Alike license for materials published by ACL; dual licensing for a fee is presumably possible, but is not exercised currently.

This past half year, aside from regular ingestion, videos from NAACL have been made available as links.

The Anthology now contains over 24,500 papers (up from 21,900 a year ago, and 23,000 six months ago). The new ACL Anthology is now active but not yet out of beta, so we are going to maintain both sites for the time being. The domain name aclanthology.info has been registered for 10 years to point towards the webhost that provides the Anthology.

Mailing List. The Anthology mailing list (http://groups.google.com/group/acl-anthology) has grown, and now has 469 members (up from 394 a year ago, and 426 six months ago). This is an announcement-only list, where we notify members of newly released materials listed online.

Plans

A key thrust this year is becoming a DOI assignee as part of the CrossRef publishers' cooperative. This will allow us to register our own DOIs for publications, which will route to the ACL Anthology or TACL pages (see other report from the Information Office). Currently we have an agreement with the ACM to assign DOIs through them, but this costs us pageviews and the opportunity to control where we want the information to go.

A second thrust will be to best handle the other forms of scientific knowledge that we are interested in archiving. These include software, datasets and video. The procedures for integrating these with START and the submission process need to be worked out, and the space requirements for these services assessed. For the time being, we will concentrate on videos (as NAACL is supposed to be making these available).

A third thrust for this year will be to incorporate the results of the R50 workshop into the Anthology, and to allow third-party applications to automatically annotate articles and papers in the Anthology with new metadata as it becomes available. Such an API will raise the visibility of the Anthology as an object of study, complementing our earlier work to make the Anthology's text a corpus.

We have long-term plans to work on the following problems, which are less urgent:

  • Collaboration with START and aclpub (may also involve the work of the Conference Officer, Jian)
  • PDF metadata fixing for all articles. Crucially, Google Scholar uses this information but it is not always correctly generated.
  • One PDF file per article. This is especially problematic for the J79 series, which largely represents one issue per PDF file.
  • Incorporation of TACL accepted articles into the Anthology. Currently one difficulty is that TACL submissions can also appear as an ACL publication. Likely, we will just list both publications as (unlinked) records for now.

ELRA has contacted us (via Nicoletta and Khalid) to ask about joint initiatives between ACL, ELRA and other sister CL organizations. I cover this in the IO report, but one item is relevant to the Anthology office. ELRA has back issues of some of its conference materials and inquired about how we scan and digitize our materials for the Anthology. The bulk of the ACL legacy materials were scanned by a third party, organized by Steven Bird, prior to 2004, so we only do one-off digitization for now. I may volunteer to assist ELRA with some of their materials as needed. Relative to the other items that the IO and CO wish to work on with ELRA, this is considered low priority for now.

Website / Portal

The ACL website continues to serve as the primary online resource for the organization. It contains the main ACL site; an ACL Wiki, which serves as a resource to the general computational linguistics community; an ACL Admin wiki, used to store and maintain ACL-specific resources such as reports, handbooks and policies; and an exec wiki reserved for the use of ACL execs. We also maintain mirrors of individual ACL conference websites, membership email lists for ACL announcements and a listing of resolutions of the ACL Exec Committee.

The ACL Portal was created to provide a web-based platform to house facilities for the benefit of members. The Portal currently serves little function other than maintaining a list of current members and a payment gateway for membership. We are currently working towards integrating the Portal into the website's functionality, now that both systems are run on a common platform (Drupal 7). Integration will involve upgrading existing custom modules developed by Ben Phelan (the previous developer) for the Portal to Drupal 7; this is ongoing work.

We are now working in parallel on consolidating the ACL Website and the ACL Portal, and on establishing a central login for ACL services (something akin to an "ACL Account" a la Google or Facebook). We are planning to use OpenID and OAuth, which would allow members to link their ACL account with other services (e.g., Google, LinkedIn, Twitter, Microsoft/Hotmail), so that one could use login credentials from those services for ACL use.

Update. From 2013 Q3 to now, we have reached several milestones:

  • We completed updates to our software installations to fix security concerns and updated the website to be responsive (so that it functions somewhat better on mobile devices such as smartphones and tablets). In particular, our MediaWiki installations were updated, and the main website was upgraded to a Drupal 7 installation that is responsive. This makes the website largely compatible with the code for the Portal.
  • We have also mirrored a few additional old conference websites (especially where those domains have gone offline and not been renewed) and listed them within the website, and now have a better policy for this.
  • ACL Elections were also run by the webmaster, with help from Drago. We inherited his code, installed it, and were able to run the elections smoothly.
  • 2 new resolutions were passed by the ACL Exec Committee and they were added to the list maintained on the Admin Wiki.
  • There were small problems that were a bit difficult to trace in the registration and payment code in the Portal. These are now fixed.
  • There are some communication problems with respect to migrating and updating information in the ACL website. The webmaster needs to be aware that the business manager, secretary and the information officer all have jurisdiction over the work. This was not made clear in the first few tasks, which caused confusion for the webmaster.
  • As time management became an issue, our current webmaster, Joshua Herring, has voluntarily resigned, effective the end of March. He will work to close out his duties on the integration between the Portal and the website in the time remaining. We are currently planning to source a replacement, without the involvement of the ACL Exec.

We would like to place on the record a note of thanks to Josh for his service. Josh's work has incurred a cost of about 2K USD for his time on the projects, which has been paid out of the 10K USD budget allocated for the IO portfolio.