Difference between revisions of "2012Q3 Reports: ACL Anthology"
Revision as of 07:56, 22 June 2012
[ Link to 2011 Q3 report ] [ Link to 2010 Q3 report ] [ Link to 2009 Q3 report ]
ACL ANTHOLOGY Report June 2012, Min-Yen Kan
EXECUTIVE SUMMARY: BUG We now have two versions of the ACL Anthology site working in parallel. In the past year, we have started ingesting attachments to papers (datasets, software, slides) and allowing users to submit errata and revisions. Within the next year, we plan to convert fully to the new ACL Anthology, ingest the TCL journal into the archives upon its upcoming debut, and work with STARTV2 on further integration with the publication roles to smooth the work of proceedings chairs.
INTRO The ACL Anthology is a digital archive of research papers in computational linguistics, sponsored by the CL community and freely available to all. Conference proceedings are published in the Anthology around the same time as the conference (subject to the general/program chairs' discretion). CL articles are added within a few days of their publication on the MIT Press website, now that CL is open access. With TCL going into circulation soon, this venue will also need to be incorporated into the Anthology this coming year.
The Anthology now contains over 20,200 papers (up from 18,000 twelve months ago).
CHANGES OVER LAST 12 MONTHS: BUG
The new ACL Anthology went live in February 2012. In this new version, we have completed the following past milestones:
- XML import and export of single, multiple papers and volumes
- BibTeX and other bibliographic formats import and export of single, multiple papers and volumes
- Addition of custom fields (e.g., SIG info, attachment-types)
- Suggestion of corrections to metadata or added fields by public (to be moderated by the Anthology editor)
- EACL 2003
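As an illustration of the BibTeX export milestone above, the following is a minimal sketch of how a single paper record might be rendered as a BibTeX entry. The record layout (the `key`, `title`, `authors`, `booktitle` and `year` fields) and the function name `to_bibtex` are assumptions made for this example; they are not taken from the Anthology's actual data model or code, and the sample record is invented.

```python
def to_bibtex(record):
    """Render one paper record (a plain dict) as a BibTeX @inproceedings entry.

    The dict keys used here are hypothetical, chosen only for illustration.
    """
    fields = [
        ("title", record["title"]),
        # BibTeX separates multiple authors with the word "and".
        ("author", " and ".join(record["authors"])),
        ("booktitle", record["booktitle"]),
        ("year", str(record["year"])),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@inproceedings{{{record['key']},\n{body}\n}}"


# Invented sample record, not a real Anthology entry.
sample = {
    "key": "demo2012",
    "title": "An Invented Paper Title",
    "authors": ["First Author", "Second Author"],
    "booktitle": "Proceedings of an Example Workshop",
    "year": 2012,
}

print(to_bibtex(sample))
```

Exporting multiple papers or a whole volume would then amount to joining the entries for each record with blank lines.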
With the help of Praveen Bysani at NUS, we have completed a new prototype of the ACL Anthology (http://aclanthology.heroku.com), which features faceted navigation, search and an underlying data model. Technically, it is built using Ruby on Rails with a Project Blacklight plug-in and features OAI-PMH integration to allow third parties to ingest and list article metadata from the Anthology, and offsite Lucene indices to allow faceted search. In creating the prototype, we have unified the metadata of all articles in the ACL Anthology, a non-trivial task since the original Anthology metadata was not of uniform quality. Currently, minor changes are being made to the prototype to ensure that all functionality of the current Anthology remains intact. Once finished, we will seek the ACL Exec's approval to launch the prototype as our production Anthology, which will need to be hosted by a (commercial) third party.
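To make the OAI-PMH integration more concrete, here is a minimal sketch of how a third party might harvest article metadata from an OAI-PMH endpoint. The `ListRecords` verb, the `oai_dc` metadata prefix and the XML namespaces are part of the OAI-PMH standard; the endpoint URL (`http://example.org/oai`) is a placeholder, since the report does not state the Anthology's actual endpoint path, and the sample response is invented. The sketch parses an embedded response rather than making a network request.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Standard OAI-PMH and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier, title and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "creators": [c.text for c in rec.iterfind(".//dc:creator", NS)],
        })
    return records


# Invented response for illustration; not real Anthology data.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:paper-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Invented Paper Title</dc:title>
          <dc:creator>First Author</dc:creator>
          <dc:creator>Second Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://example.org/oai"))  # placeholder endpoint
print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would also need to follow the `resumptionToken` element that OAI-PMH uses to page through large result sets, which this sketch omits.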
We have also finished our work to ensure that DBLP and ACM Portal accurately cover the Anthology materials; however, some of these changes may not yet have been finalized by our counterparts at DBLP and ACM Portal. With assistance from Praveen Bysani, ACM now has a complete list of proceedings from ACL and should finalize DOI assignments for legacy materials (particularly workshops) this year and provide this information back to the ACL Anthology for our records.
We continue to integrate other CL-related venues into the Anthology to increase the prestige of ACL as well as to make the Anthology even more useful. We are incorporating TANL and have completed the incorporation of LREC and PACLIC into the Anthology. We expect to incorporate RANLP when they can make their proceedings available to us in the ACL Anthology ingestion format.
MAILING LIST: DONE - Membership of the Anthology mailing list (http://groups.google.com/group/acl-anthology) has grown to 363 members (up from 312 in the last report). This is an announcement-only list, where we notify members of newly released materials online.
ONGOING WORK: A key thrust for this year will be to incorporate the results of the R50 workshop into the Anthology, and to allow third-party applications to automatically annotate articles and papers in the Anthology with new metadata as it becomes available. Such an API will raise the visibility of the Anthology as an object of study, complementing our earlier work to make the Anthology's text a corpus.
We also plan to work on the following problems, which are less urgent:
- collaboration with START and aclpub
- PDF metadata fixing for all articles. Crucially, Google Scholar uses this information but it is not always correctly generated.
- One PDF file per article. This is especially problematic for the J79 series, which largely represents one issue per PDF file.