2014Q1 Reports: Info Officer

From Admin Wiki
Revision as of 01:16, 14 February 2014 by Knmnyn (talk | contribs)

[Link to 2013 Q3 Report] [Link to 2013 Q1 Report]

The Information Officer (IO) portfolio covers the integration of ACL-wide activities related to information dissemination, including the Anthology, website, wiki, portal and archive. Plans include integrating logins (through OpenID and OAuth; IN PROGRESS) and updating our information services so that they are current and professionally designed (PLANNED). Long-term goals are to ensure the accessibility and long-term maintenance of aclweb.org and other sites, and to have the costs of the information services sponsored and absorbed by corporate interests.

I/O Overview

Update. From 2013 Q3 to now, we have completed updates to our software installations to fix security concerns and updated the website to be responsive (so that it functions somewhat better on mobile devices such as smartphones and tablets).

Budget. A key part of the work done in the IO is to oversee part-time manpower allocated to help improve our association's websites, which includes maintenance, upgrading, migrating and backup. These are not small jobs and require budget. We proposed to the ACL Exec a budget of 10,000 USD (motion here), which was approved, to hire a professional webmaster who could do part of the regular work with help from the IO and Secretary positions in ACL.

DOIs/CrossRef. We are now a registered body for assigning DOIs to scholarly materials in CrossRef. This costs USD 275 per year and USD 1 per resource registered for a DOI. Under a previous agreement, we decided to start DOI assignment with our new TACL journal, but it seems that the journal's EIC staff are encumbered by other issues. One important consideration in joining CrossRef and being able to assign DOIs is CrossRef's mandate that all journal articles do outbound citation linking within 18 months of joining CrossRef (see here, rule #6). We will be working with TACL to establish this workflow first, and will then try to propagate it for use in our conference systems (this may have to be tied to START development). The EICs have not had the bandwidth to deal with this issue, as it requires some reworking of how cited items are reported in their system.

Given this bottleneck and the unforeseeable delay in getting a workflow established, I would suggest that we start with our voluminous conference proceedings data instead. I will start with the annual ACL proceedings and deal with chapter conferences (EACL, NAACL) next.
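As a rough illustration of the fee structure described above, the following sketch estimates the CrossRef cost of registering a batch of items. The helper name and the batch size are hypothetical; only the USD 275 annual fee and the USD 1 per-resource fee come from this report.

```python
ANNUAL_FEE_USD = 275   # yearly CrossRef fee quoted in this report
PER_DOI_FEE_USD = 1    # per-resource registration fee quoted in this report


def crossref_cost(new_dois: int, years: int = 1) -> int:
    """Estimate total cost in USD (hypothetical helper, not a CrossRef API)."""
    if new_dois < 0 or years < 1:
        raise ValueError("new_dois must be >= 0 and years must be >= 1")
    return ANNUAL_FEE_USD * years + PER_DOI_FEE_USD * new_dois


# Illustrative batch size only; not an actual proceedings count.
example_cost = crossref_cost(new_dois=500)
```

Under these assumptions the fixed annual fee dominates for small batches, while a large backfile registration would be driven almost entirely by the per-resource fee.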

Collaboration with ELRA. We have also discussed joint work on information services with ELRA's executive officers, Nicoletta Calzolari and Khalid Choukri. As a result, we have a few late-breaking items for discussion.

  • Use of the LRE Map at all ACL conferences. The LRE Map is a project to catalog language resource (LR) use in CL publications, which started with LREC 2010. Since then it has become more popular, and a number of ACL-sponsored events have also been using this framework. START (our de facto conference management system) has a module that allows authors to declare the LRs used in their publications. ELRA has asked whether ACL could officially sanction the use of the LRE Map in conferences, rather than having ELRA officials ask conference chairs on a per-case basis. ELRA maintains that it will keep the LRE Map and associated resources free for all use. A discussion between the Information Office and Conference Office was initiated, and we concur that such a policy would benefit both the ACL and CL/NLP in general. As such, we propose the following motion:

Formal Motion. The ACL exec endorses the optional use of ELRA's LRE Map for use in any ACL event and publication by individual authors and by event chairs.

  • ISLRN. ELRA is also planning to assign a unique identifier to all language resources (akin to a DOI for scholarly materials). ELRA would like ACL to lend its support to such a scheme. The Information Office feels this will benefit the ACL and the CL/NLP community as a whole, so we have backed this plan. We assume no official motion is needed for this at this stage, as there is no standard yet and as it falls under the purview of the Information Office, but if it does need an official motion, we propose the following:

Formal Motion. The ACL exec supports the development and standardization of a Language Resource unique identifier, and is open to work with any association to achieve these aims.

We note that both of ELRA's initiatives already have wide support (LDC, AFNLP, etc.) in the CL/NLP community.

Plans. The hired part-time help for the webmaster position is now working on consolidating the ACL Website and the ACL Portal, and will then work on the first major milestone of establishing a central login for ACL services (something akin to an "ACL Account" a la Google or Facebook). We are planning to use OpenID and OAuth, which would allow members to link their ACL account with other services (e.g., Google, LinkedIn, Twitter, Microsoft/Hotmail), so that one could use login credentials from those services for ACL use. This is ongoing work at the moment. ELRA is also interested in this initiative and may decide to adopt our system as a cross-organization login system; we'll be keeping them informed of our decisions and may consider their feedback when we deem that it benefits our membership.
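To make the linked-login idea concrete, here is a minimal sketch of the first step of an OAuth 2.0 authorization-code flow: building the URL that sends a member to an external provider, and checking the state value when the provider redirects back. The endpoint, client identifier, redirect URI and scope are all hypothetical placeholders, and this is not the design adopted for the ACL sites.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values: none of these come from the report.
AUTHORIZE_ENDPOINT = "https://provider.example/oauth2/authorize"
CLIENT_ID = "acl-account-demo"
REDIRECT_URI = "https://acl.example/login/callback"


def build_authorization_url(state: str) -> str:
    """Build an OAuth 2.0 authorization-code request URL."""
    params = {
        "response_type": "code",       # ask for an authorization code
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": "email",              # scope names are provider-specific
        "state": state,                # echoed back by the provider
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}"


def callback_state_matches(callback_url: str, expected_state: str) -> bool:
    """Check that the callback carries the state we sent (CSRF protection)."""
    returned = parse_qs(urlparse(callback_url).query).get("state", [""])[0]
    return secrets.compare_digest(returned.encode(), expected_state.encode())


state = secrets.token_urlsafe(16)   # unguessable per-login value
login_url = build_authorization_url(state)
```

A real deployment would also exchange the returned code for a token over a server-to-server call and would store the state in the member's session; both are omitted here for brevity.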

Other plans include:

  • Adding a vetting service for bibliographies, so that any ACL paper referencing an ACL publication would have a proper identifier to the work. This is likely to take the industry standard form of the DOI.
  • Investigating the citation indexing of our materials -- it appears that ACL materials are somewhat haphazardly indexed by Elsevier and SCI. We need to address this because many of our members depend on the availability of indexing in determining whether to publish in our venues or not.
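The bibliography vetting idea in the first item above could start from a check like the following sketch, which flags entries that lack a well-formed DOI. The entry layout (dictionaries with "key" and "doi" fields) and the sample data are assumptions made for illustration, and the pattern only tests the general shape of a DOI, not whether it resolves.

```python
import re

# Syntactic shape of most modern DOIs; an assumption, not a full validator.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def entries_missing_doi(entries):
    """Return the keys of bibliography entries with no well-formed DOI."""
    missing = []
    for entry in entries:
        doi = (entry.get("doi") or "").strip()
        if not DOI_PATTERN.match(doi):
            missing.append(entry["key"])
    return missing


# Made-up entries for illustration only.
sample = [
    {"key": "smith2013", "doi": "10.1234/example.5678"},
    {"key": "lee2012", "doi": ""},
    {"key": "kim2011"},
]
flagged = entries_missing_doi(sample)
```

A vetting service would additionally need to look up the correct identifier for each flagged entry, which this sketch does not attempt.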

Anthology

(Updated 31 Jul; see text with yellow background; may not be reflected in any hard copy)

The ACL Anthology is a digital archive of research papers in computational linguistics, sponsored by the CL community, and freely available to all.

This year, TACL articles have started being indexed, and we hope to make the videos from NAACL available soon as well. We have also started indexing JEP-TALN-RECITAL, a conference series related to CL/NLP in French.

The Anthology now contains over 23,000 papers (up from 20,200 a year ago).

The new ACL Anthology went live in Feb 2012. Unfortunately, due to a number of maintainer problems, the system has not been live for very long. An important part of the Anthology work is to keep the new version stable and continuously up and running for 2013.

Mailing List. The membership of the Anthology mailing list (http://groups.google.com/group/acl-anthology) has grown to 426 members (up from 363 a year ago). This is an announcement-only list, where we notify members of materials newly listed online.

Plans

A key thrust this year is becoming a DOI assignee as part of the CrossRef publishers' cooperative. This will allow us to register our own DOIs for publications, which will route to the ACL Anthology or TACL pages (see the other report from the Information Office). Currently we have an agreement with the ACM to assign DOIs through them, but this costs us pageviews and the opportunity to control where we want the information to go.

A second thrust will be to best handle the other forms of scientific knowledge that we are interested in archiving. These include software, datasets and video. The procedures for integrating these with START and the submission process need to be worked out, and the space requirements for these services assessed. For the time being, we will concentrate on videos (as NAACL is supposed to be making these available).

A third thrust for this year will be to incorporate the results of the R50 workshop into the Anthology, and to allow third-party applications to automatically annotate articles and papers in the Anthology with new metadata as it becomes available. Such an API will raise the visibility of the Anthology as an object of study, complementing our earlier work to make the Anthology's text a corpus.

The closely related work of the Information Officer (of which the Anthology is a part) is also available as a 2013 Q3 report here.

We have long-term plans to work on the following other problems, which are less urgent:

  • Collaboration with START and aclpub (this may also involve the work of the Conference Officer, Jian).
  • PDF metadata fixing for all articles. Crucially, Google Scholar uses this information but it is not always correctly generated.
  • One PDF file per article. This is especially problematic for the J79 series, which largely represents one issue per PDF file.
  • Incorporation of TACL accepted articles into the Anthology. Currently one difficulty is that TACL submissions can also appear as an ACL publication. Likely, we will just list both publications as (unlinked) records for now.

Late-Breaking Plans

ELRA has contacted us (via Nicoletta and Khalid) to ask for some joint initiatives between ACL/ELRA and other sister CL organizations. I report this in the IO report, but one item is relevant to the Anthology office. ELRA has back issues of some of their conference materials and inquired about how we scan and digitize our materials for the Anthology. The bulk of the ACL legacy materials were scanned by Steven Bird, through a third party, prior to 2004, so we only do one-off digitization for now. I may voluntarily assist ELRA with some of their materials as they need. With respect to the items that the IO/CO roles wish to work on with ELRA, this is considered a low priority for now.


Portal

The ACL Portal was created to provide a web-based platform to house facilities for the benefit of members.

Over the last two years, I have been working part-time in maintaining the Portal. This has generally amounted to minor maintenance tasks - bug fixes, some administrative tools for Priscilla, and so on. The Portal currently serves little function other than maintaining a list of current members and a payment gateway for membership. If these are the only functions that the portal is to serve, then I would recommend integrating it with the main ACL website (see similar comments along these lines from Robert Dale in last year's report). The portal is built on top of the Drupal content management system, as is the proposed replacement for the ACL main page (built on Drupal 7), so integration involves little more than upgrading existing custom modules developed by Ben Phelan for the Portal to Drupal 7.

Website

The ACL website continues to serve as the primary online resource for the organization. It contains the main ACL site; an ACL Wiki, which serves as a resource to the general computational linguistics community; an ACL Admin wiki, used to store and maintain ACL-specific resources such as reports, handbooks, and policies; and an exec wiki reserved for the use of ACL execs. We also maintain mirrors of individual ACL conference websites, membership email lists for ACL announcements, and a listing of resolutions of the ACL Exec Committee.

This year, 5 new resolutions were passed by the ACL Exec Committee and they were added to the list maintained on the Admin Wiki.

As part of the 7 May ACL resolution, we have hired a new webmaster, Joshua Herring, who was previously the developer of the ACL Portal. Work has begun on a replacement site for the current main website. This was undertaken because the current main site is built on quite an old version of Joomla! from which there is no direct upgrade path. The replacement site is being built in Drupal 7 with the goals of giving it a more responsive design and a broader range of content, and of integrating it with the existing ACL Portal. Current work focuses on creating an integrated login system so that the website, Portal, Anthology, etc. can use a single set of logins and user metadata to enable downstream activities (registration of papers, membership and bibliometrics).

To date, Josh has worked about 55 hours on the webmaster portfolio, accruing about USD 850 in expenses, out of the USD 10,000 allocated for the IO budget as a whole. Most of this (about two-thirds) has been spent on website maintenance and upgrading, and the remainder on the integration/login focus as well as routine duties.