2013Q1 Reports: Info Officer
Preface: As I am the inaugural Information Officer (IO) executive member at-large, the scope of my duties is not yet clear. I will therefore attempt to define them while reporting and planning in this document. This is written without prior knowledge of the ACL Exec's structure, so please forgive me if I overstep the bounds of my presumed responsibilities.
The IO portfolio includes integration of the different ACL-wide activities that are related to information dissemination, including the Anthology, website, wiki, portal and archive. Plans include short-term goals of integrating back-end resources (mailing lists) and updating software installations; a mid-term goal of central sign-in for ACL services; and a long-term goal of having the costs of the information services sponsored and absorbed by corporate interests.
As relayed by Drago and Jian, I understand that the IO's duties are to look after the integration of the different ACL-wide (exclusive of SIGs) activities that are related to information dissemination. These include the Anthology, journals, website, and archive. Each of these information services is served by a different coordinator, so the IO needs an understanding of the individual components and the ability to help with coordination and delegation to address issues.
Currently, I'm largely unaware of the pressing points with respect to information services, so I hope the teleconference will help bring out a list of problems that individual Exec members have noted with the information services. I hope to address some of the easier points within the next 2-3 months and then turn attention to 1) harder problems on the list and 2) forward planning to address longer-term information services.
The ACL Anthology is a digital archive of research papers in computational linguistics, sponsored by the CL community, and freely available to all. Conference proceedings are published in the Anthology around the same time as the conference (subject to general/program chairs' discretion). CL articles are published within a few days of publication on the MIT Press website, now that CL is open access. With TACL going into circulation soon, this venue will also need to be incorporated into the Anthology this coming year. NAACL is also planning video recordings, so working out how to integrate and archive these other materials will be part of our ongoing work.
The Anthology now contains over 21,900 papers (up from 20,200 six months ago).
The new ACL Anthology went live in Feb 2012. Unfortunately, due to a number of maintainer problems, the system has not remained live for long. An important part of the Anthology work for 2013 is to make the new version stable and keep it up and running on a constant basis.
We have also gotten the Exec's approval to use a separate domain (aclanthology.info) and hosting company (Amazon EC2) for the service. This service is not yet active.
With respect to materials, we continue to integrate other CL-related venues into the Anthology to increase the prestige of ACL as well as to make the Anthology even more useful. We have integrated TALN (in French) and RANLP (in English) over the last period.
Mailing List. The membership of the Anthology mailing list (http://groups.google.com/group/acl-anthology) has grown to 394 members (up from 363 six months ago). This is an announcement-only list, where we notify members of newly released materials listed online.
A key thrust for this year will be to incorporate the results of the R50 workshop into the Anthology, and to allow third-party applications to automatically annotate articles and papers in the Anthology with new metadata as they become available. Such an API will raise the visibility of the Anthology as an object of study, complementing our earlier work to make the Anthology's text a corpus.
A second thrust will be to best handle the other forms of scientific knowledge that we are interested in archiving. These include software, datasets and video. The procedures for integrating these with START and the submission process need to be worked out, and the space requirements for these services assessed.
A third thrust will be to re-investigate whether we want to become our own DOI assignee. Currently we have an agreement with the ACM to assign DOIs through them, but this costs us pageviews and the opportunity to control where we want the information to go. A big drawback of becoming a DOI provider is that it adds to our administrative burden and costs money to assign DOIs.
We also plan to work on the following problems, which are less urgent:
- collaboration with START and aclpub.
- PDF metadata fixing for all articles. Crucially, Google Scholar uses this information but it is not always correctly generated.
- One PDF file per article. This is especially problematic for the J79 series, which is largely stored as one issue per PDF file.
- Incorporation of TACL accepted articles into the Anthology. Currently one difficulty is that TACL submissions can also appear as an ACL publication. Likely, we will just list both publications as (unlinked) records for now.
The website duties are governed by our webmaster, who looks after aclweb.org, domain registration (on behalf of the business office), electronic mailing lists, Wiki, ExecWiki and AdminWiki.
A current difficulty is that our former webmaster (for aclweb.org) has resigned. Ali Hakim (through Drago) worked as webmaster for over 5 years and resigned for personal reasons. Let us extend our sincere thanks to him on the record for fixing and updating the website with minimal delays.
We are currently seeking a new webmaster through Drago's solicitations, and so far only one member (Baoli Li from China) has shown interest. I'm currently assessing him through some technical challenges regarding Wikis, HTML/CSS and PHP. An important responsibility of the webmaster is to reply and act in a timely manner. If you have suggestions or would like to nominate someone, please let me know, as personal connections usually work better than open calls for help. This is a paid position (as I understand it, at 20 USD per hour).
Action items. Hire a new webmaster.
Joomla (Main Website)
Our main ACL website runs on Joomla CMS version 1.x. This is an obsolete version of Joomla that should be upgraded as soon as possible. The stable version is 2.5, but the development version 3.x offers responsive design (i.e., useful for tablets and phones). By Joomla's own prediction, 3.x should be stable by mid-2013.
Unfortunately, upgrading is not a simple matter, requiring careful back-up and installation. It will be time-intensive and should be delegated to the incoming (paid) webmaster.
Action items. Through the new webmaster, back up our current installation and upgrade Joomla to 2.5, and later, a stable version of 3.x.
(This should probably be a separate report, as it was in the past, but I have incorporated it here. Apologies for any territorial trespassing.)
Robert Dale runs the ACL Portal. It is run, as far as I know, to provide a member-updatable database (to help the business office) and to give members some benefits for being ACL members (since Computational Linguistics is open access, there is little else that is precious to our membership). As of 2012 Q3, it was being run with the part-time assistance of Josh Herring. Robert recommends that we either downscale it or upgrade it to a larger system. It depends on what the membership and Exec envision. Currently the portal is at http://www.aclweb.org/portal/ and serves as a membership portal (events management, paying dues, etc.).
Action items. Decide what we want to do with the portal, and either promote it or integrate it better with the website (Joomla) or Anthology.
Peter Turney runs our Wiki systems. They are running old versions of MediaWiki (the same software that is used to run Wikipedia). They are stable but outdated.
Action Item. We need to first hire the webmaster and then have him or her upgrade the Wiki system to a more modern version. This is one of the first action items for the webmaster to perform. Peter Turney, who runs the general Wiki, also notes that the general Wiki has spam account creation problems. We should also have the webmaster take care of this. This hurdle seems more urgent than the Joomla upgrades.
aclweb.org is hosted by 1and1.com. We are under a developer package that has 300 GB of space and 5 included domains and a quota of 100 MySQL databases (currently we use 12).
Assessment of webhost. My current feeling is that 1and1 is a satisfactory web host. It is certainly not very competitive with respect to features but it is stable and hasn't been affected much by outages at other cloud services (e.g., Amazon). I believe we are not spending a lot on web hosting so changing this for pricing reasons would be counterproductive, as there is a lot of software and data incorporated into the system. For other systems and pilot projects that require more features, we'll have to look elsewhere for hosting.
Action Item. The credit card on file can no longer be accepted by 1and1. We need to change the card details before the next billing period (23 March 2013). I currently do not know the price for the package.
Forward Plans. As noted in the assessment, we should not be changing our webhost. We do need to look after our disk quota as space will start to be a concern if we start hosting archival quality videos on our server. We may get the incoming webmaster to assess how easy it would be to migrate from this webhost in case of problems or the need to set up a backup solution.
ACL hosts domains as part of our webhosting account with 1and1.com and in other separate services.
This is our main website, under two separate domains. Note that this is separate from our web hosting; this just secures our right to point aclweb.org to a particular IP address. Domain registration and renewal for five domains (3 used so far) is part of our hosting contract. This means that the domain renewal is done by the webhost in a just-in-time manner.
Domain ID:D3240355-LROR Domain Name:ACLWEB.ORG Created On:24-Mar-1997 05:00:00 UTC Last Updated On:26-Mar-2012 01:22:37 UTC Expiration Date:25-Mar-2013 05:00:00 UTC
Domain ID:D104923136-LROR Domain Name:ASSOCIATIONFORCOMPUTATIONALLINGUISTICS.ORG Created On:23-Sep-2004 17:42:23 UTC Last Updated On:24-Sep-2012 01:31:23 UTC Expiration Date:23-Sep-2013 17:42:23 UTC
This is another domain that we hold for reasons unknown to me. It apparently redirects to the main website. There is no reason to let it go until we need to register a new name.
Domain ID:D104923137-LROR Domain Name:ACL2005.ORG Created On:23-Sep-2004 17:42:24 UTC Last Updated On:24-Sep-2012 01:31:26 UTC Expiration Date:23-Sep-2013 17:42:24 UTC
This is the domain that the ACL Anthology will be migrating to over the next few years. This was registered separately by Min. As our webhost is a bit underpowered with respect to programming capabilities, this points to an Amazon EC2 instance that is currently used only for the pilot Anthology beta (not charged).
Domain ID:D47707006-LRMS Domain Name:ACLANTHOLOGY.INFO Created On:06-Sep-2012 15:49:02 UTC Last Updated On:05-Nov-2012 20:30:17 UTC Expiration Date:06-Sep-2022 15:49:02 UTC
We also have a block of registered domains that don't point to our aclweb.org webhost. As I understand it, they are redirected to the local hosts of the respective conferences.
- acl2010.org; ADNS Services; June 04, 2018
- acl2011.org; ADNS Services; June 04, 2018
- acl2012.org; ns2.cafe24.com ns1.cafe24.com; June 04, 2018
- acl2013.org; ADNS Services; June 04, 2018
- acl2014.org; Under Construction Page; June 04, 2018
- acl2015.org; Under Construction Page; June 04, 2018
Action Items. The business office needs to update the credit card information at networksolutions.com.
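Since renewals are handled just-in-time and credit card details are out of date, it may help to check periodically which domains are close to expiry. The following is a minimal sketch only: it hard-codes the expiration dates from the WHOIS records quoted above, and the function names and the reference date of 1 March 2013 are illustrative assumptions, not part of any existing ACL tooling.

```python
from datetime import datetime, timezone

# Month abbreviations as they appear in the WHOIS records quoted in this report.
MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

# Expiration dates copied from the WHOIS records in this report.
EXPIRATIONS = {
    "aclweb.org": "25-Mar-2013 05:00:00 UTC",
    "associationforcomputationallinguistics.org": "23-Sep-2013 17:42:23 UTC",
    "acl2005.org": "23-Sep-2013 17:42:24 UTC",
    "aclanthology.info": "06-Sep-2022 15:49:02 UTC",
}


def parse_whois_date(text: str) -> datetime:
    """Parse a timestamp such as '25-Mar-2013 05:00:00 UTC'."""
    date_part, time_part, zone = text.split()
    if zone != "UTC":
        raise ValueError(f"unexpected time zone: {zone}")
    day, month, year = date_part.split("-")
    hour, minute, second = (int(p) for p in time_part.split(":"))
    return datetime(int(year), MONTHS[month], int(day),
                    hour, minute, second, tzinfo=timezone.utc)


def expiring_within(days: int, now: datetime) -> list:
    """Return (domain, whole_days_left) pairs expiring within `days`, soonest first."""
    due = []
    for domain, stamp in EXPIRATIONS.items():
        days_left = (parse_whois_date(stamp) - now).days
        if days_left <= days:
            due.append((domain, days_left))
    return sorted(due, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical reference date, chosen only for illustration.
    reference = datetime(2013, 3, 1, tzinfo=timezone.utc)
    for domain, days_left in expiring_within(90, reference):
        print(f"{domain} expires in {days_left} days")
```

A real check would query WHOIS rather than hard-code the dates; the hard-coded table keeps the sketch self-contained.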
We keep organizational mailing lists as part of the secretary's post. However, our webhosting package allows for the construction of mailing lists. We may want to assess whether we'd like to offload this responsibility to the webmaster, either by using the hosting package or by using a public free mail system (Yahoo! mail, Google Groups). I believe Drago is currently ok with the system he is using to manage the multiple sources for mailing lists.
Future Plans - IMPORTANT FOR ACL EXEC CONFERENCE CALL
This is an area where I need more feedback from the ACL Exec about what we'd like to see as an organization. I'll briefly outline my two most urgent perspectives (aside from the aforementioned maintenance problems).
Single Sign-On. We can use OpenID and other technologies (such as ORCID, which is meant for uniquely identifying research authors) to allow our members to use their Google, Facebook, LinkedIn or other account to log in to the portal, Anthology (and eventually START). Having a single convenient method for sign-on will hopefully boost uptake and continued servicing of member accounts.
Sponsorship. A long-term goal that I'd like to introduce is to make much of the IO portfolio self-sustaining in cost. This means that the corporate interests in CL/NLP would help to defray the costs of not only conferences but also the Anthology, its sister projects and the website. A coordinated plan will involve our rotating conferences (for example, the conferences could contribute some funds for the running of the website and Anthology on a per-article basis).
Drago also maintained a wish list of things related to information services. I've tried to format his list to cluster related items together. I confess I don't completely understand what all of the topics mean. I hope to be able to review these with all of you in the conference call, so that we can prioritize the issues that need to be addressed.
- links to Google Scholar, ArnetMiner, ACL Author Network
- similar people to me (using ACL Anthology)
- software to comment on papers
- video recordings
- past tutorials and workshops
- showcasing IE/Bib software - (long term)
- taxonomy of NLP topics - (long term)
- SIG membership
- volunteering intentions
- reviewing availability DB (Min: would this really be used by members and kept updated?)
- integration with TACL (Min: ?)
- integration with conf. content (Min: ?)
- all ACL rules, etc.
- submissions dates (Min: for conferences?)
- opt out (Min: ?)
- ACL fellows (Min: ?)
- elections (Min: ?)
- ACL answers (e.g. StackExchange like site) - (long term)
Funding for the IO responsibilities: While the webmaster and portal positions are paid, we don't have any staff who are paid for the development of the Anthology. I think we should consider adding this to the Portal or Webmaster responsibility (to be directed by the IO). Information services cost the ACL a small but constant amount of funding. But further development by grassroots initiatives (AAN, Searchbench, Saffron) needs to be funded to have long-term sustainability for our membership. We could seek corporate sponsorship of the Anthology's maintenance, if the Exec feels that this is an appropriate way to lessen costs (we have started to ask whether certain commercial entities who traditionally sponsor ACL events would be interested).
The ACL Exec should consider two proposals about important, long-term publishing properties: DOI and long-term archiving. The first is more important. Currently Digital Object Identifiers (DOIs) are assigned to our materials by the ACM, under the ACL-ACM agreement negotiated previously by Steven Bird. That agreement gives ACM the right to register DOIs for our materials (and thus the right for the DOIs to resolve to the ACM Portal) in exchange for paying the necessary fees (US $1 per article/publication). With TACL coming up as a journal, it will be important for ACL to establish itself as a DOI assigner. Taking charge of our own DOIs will also allow us to print the DOIs on proceedings before they are published. The downside is the cost. To assign DOIs to scholarly articles, the standard registration agency is CrossRef. Their annual fee is US $275, with an additional US $1 per registered item. Since ACL and its associated workshops generate about 2000 proceedings papers per year, it would cost between US $2K and $3K per year to allow us to assign DOIs. As ACL Anthology Editor, I think the presence of TACL may better motivate our own need to register as a DOI provider with CrossRef (note: I have briefly corresponded with Michael and Dekang about TACL registering DOIs; they are interested but, to my understanding, not aware of the process or costs).
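The cost estimate above can be reproduced with a short calculation. This is a minimal sketch using only the figures quoted in this report (US $275 annual fee, US $1 per registered item, about 2000 items per year); the function and constant names are illustrative, and the 1500 and 2500 item scenarios are assumptions added to show the sensitivity of the total.

```python
# Figures quoted in this report; all names here are illustrative.
CROSSREF_ANNUAL_FEE_USD = 275    # annual fee
CROSSREF_FEE_PER_ITEM_USD = 1    # fee per registered item
ITEMS_PER_YEAR = 2000            # approximate yearly output cited in the report


def annual_doi_cost(items: int,
                    annual_fee: float = CROSSREF_ANNUAL_FEE_USD,
                    per_item_fee: float = CROSSREF_FEE_PER_ITEM_USD) -> float:
    """Return the yearly cost in USD of registering `items` DOIs."""
    if items < 0:
        raise ValueError("items must be non-negative")
    return annual_fee + per_item_fee * items


if __name__ == "__main__":
    # 1500 and 2500 are hypothetical volumes bracketing the cited figure.
    for items in (1500, ITEMS_PER_YEAR, 2500):
        print(f"{items} items: US ${annual_doi_cost(items):,.0f} per year")
```

At the cited volume the total comes to US $2,275, which sits within the range stated above.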
For long term archiving, the CLOCKSS initiative is a non-profit organization that keeps copies of digital materials for archiving in case a publisher goes away. This is less important since the Anthology was conceived for just this purpose. Currently, I would not recommend subscription for this. Costs for this service currently are US $200 per year and $0.25 per object to be archived. This would amount to about $1K per year. I can investigate this further if needed.
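As a rough check of the archiving estimate, the sketch below applies the quoted CLOCKSS fees (US $200 per year plus US $0.25 per object). The report does not state how many objects would be archived each year, so the 2000-object figure used here is an assumption borrowed from the DOI discussion, and the function names are illustrative.

```python
# Fees quoted in this report; the object count below is an assumption.
CLOCKSS_ANNUAL_FEE_USD = 200.0
CLOCKSS_FEE_PER_OBJECT_USD = 0.25


def annual_archiving_cost(objects: int) -> float:
    """Return the yearly cost in USD of archiving `objects` items."""
    if objects < 0:
        raise ValueError("objects must be non-negative")
    return CLOCKSS_ANNUAL_FEE_USD + CLOCKSS_FEE_PER_OBJECT_USD * objects


def objects_for_budget(budget_usd: float) -> int:
    """Return how many objects a yearly budget covers after the annual fee."""
    if budget_usd < CLOCKSS_ANNUAL_FEE_USD:
        return 0
    return int((budget_usd - CLOCKSS_ANNUAL_FEE_USD) // CLOCKSS_FEE_PER_OBJECT_USD)


if __name__ == "__main__":
    assumed_objects = 2000  # hypothetical yearly volume, not stated for archiving
    print(f"{assumed_objects} objects: US ${annual_archiving_cost(assumed_objects):,.2f}")
    print(f"US $1,000 covers {objects_for_budget(1000)} objects")
```

Under this assumption the yearly cost is US $700, and a budget of about US $1K would cover roughly 3200 objects, so the estimate above leaves some headroom for additional material such as back issues or non-paper objects.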