Difference between revisions of "2013Q3 Reports: ACL Anthology"


Revision as of 09:12, 12 July 2013

[ Link to 2013 Q1 report ] [ Link to 2012 Q3 report ] [ Link to 2011 Q3 report ] [ Link to 2010 Q3 report ] [ Link to 2009 Q3 report ]

The ACL Anthology is a digital archive of research papers in computational linguistics, sponsored by the CL community, and freely available to all.

This year, TACL articles have started being indexed, and we hope to make the videos from NAACL available soon. We have also started indexing JEP-TALN-RECITAL, a French-language CL/NLP conference series.

The Anthology now contains over 23,000 papers (up from 20,200 a year ago).

The new ACL Anthology went live in February 2012. Unfortunately, due to a number of maintainer problems, the system has not been live for very long. Keeping the new version stable and continuously up and running in 2013 is an important part of the Anthology work.

Mailing List. The Anthology mailing list (http://groups.google.com/group/acl-anthology) has grown to 426 members (up from 363 a year ago). This is an announcement-only list, where we notify members of newly released materials online.

Plans

A key thrust this year is becoming a DOI assignee as part of the CrossRef publishers' cooperative. This will allow us to register our own DOIs for publications, which will route to the ACL Anthology or TACL pages. Currently we have an agreement with the ACM to assign DOIs through them, but this costs us pageviews and the opportunity to control where the information goes.

A second thrust will be to best handle the other forms of scientific knowledge that we are interested in archiving. These include software, datasets and video. The procedures for integrating these with START and the submission process need to be worked out, and the space requirements for these services need to be assessed. For the time being, we will concentrate on videos (as NAACL is making these available).

A third thrust for this year will be to incorporate the results of the R50 workshop into the Anthology, and to allow third-party applications to automatically annotate articles and papers in the Anthology with new metadata as they become available. Such an API will raise the visibility of the Anthology as an object of study, complementing our earlier work to make the Anthology's text a corpus.

Closely related work by the Information Officer (of which the Anthology is a part) is also available as a 2013 Q3 report.

We have long-term plans to work on the following problems, which are less urgent:

  • Collaboration with START and aclpub.
  • PDF metadata fixing for all articles. Crucially, Google Scholar uses this information, but it is not always correctly generated.
  • One PDF file per article. This is especially problematic for the J79 series, which largely represents one issue per PDF file.
  • Incorporation of TACL accepted articles into the Anthology. Currently one difficulty is that TACL submissions can also appear as an ACL publication. Most likely, we will just list both publications as (unlinked) records for now.