2013Q3 Reports: ACL Anthology
Revision as of 08:12, 12 July 2013
[ Link to 2013 Q1 report ] [ Link to 2012 Q3 report ] [ Link to 2011 Q3 report ] [ Link to 2010 Q3 report ] [ Link to 2009 Q3 report ]
The ACL Anthology is a digital archive of research papers in computational linguistics, sponsored by the CL community, and freely available to all.
This year, we have started indexing TACL articles, and we hope to make the videos from NAACL available soon as well. We have also started indexing JEP-TALN-RECITAL, a French-language conference series on CL/NLP.
The Anthology now contains over 23,000 papers (up from 20,200 a year ago).
The new ACL Anthology went live in February 2012. Unfortunately, due to a number of maintainer problems, the system has not remained live for very long. An important part of the Anthology work for 2013 is to keep the new version stable and continuously up and running.
Mailing List. Membership of the Anthology mailing list (http://groups.google.com/group/acl-anthology) has grown to 426 members (up from 363 a year ago). This is an announcement-only list, on which we notify members of newly released materials listed online.
Plans
A key thrust this year is to become a DOI assignee as part of the CrossRef publishers' cooperative. This will allow us to register our own DOIs for publications, which will route to the ACL Anthology or TACL pages. Currently we have an agreement with the ACM to assign DOIs through them, but this costs us pageviews and the opportunity to control where the information goes.
A second thrust will be to better handle the other forms of scientific knowledge that we are interested in archiving. These include software, datasets and video. The procedures for integrating these with START and the submission process need to be worked out, and the space requirements for these services assessed. For the time being, we will concentrate on videos (as NAACL is making these available).
A third thrust for this year will be to incorporate the results of the R50 workshop into the Anthology, and allow third-party applications to automatically annotate articles with new metadata and papers in the Anthology, as they become available. Such an API will raise the visibility of the Anthology as an object of study, complementing our earlier work to make the Anthology's text a corpus.
The closely related work of the Information Officer (of which the Anthology is a part) is also available as a 2013 Q3 report here.
We have long-term plans to work on the following problems, which are less urgent:
- Collaboration with START and aclpub.
- PDF metadata fixing for all articles. Crucially, Google Scholar uses this information, but it is not always correctly generated.
- One PDF file per article. This is especially problematic for the J79 series, in which each PDF file largely represents a whole issue.
- Incorporation of TACL accepted articles into the Anthology. Currently one difficulty is that TACL submissions can also appear as an ACL publication. Most likely, we will just list both publications as (unlinked) records for now.