Difference between revisions of "2016Q3 Reports: ACL Anthology"

 
Latest revision as of 11:10, 15 July 2016

The ACL Anthology is a digital archive of research papers in computational linguistics, sponsored by the CL community, and freely available to all. As of 2016, we employ a Creative Commons Attribution license for materials published by ACL. This makes our content usable by the general public with attribution to the ACL (although it is not mandatory for any user to inform us of their use of our materials). Dual licensing for a fee is presumably possible (although not exercised currently).

The Anthology now contains over 37,000 papers (up from 34,800 papers in the last report in Q3). The new ACL Anthology is now active and will become the primary Anthology site around ACL this year, once production issues (detailed below) are fixed and the solutions judged maintainable. However, we know a portion of our membership will still want to use the older version, so we will maintain both sites at least until the end of 2016. [Current version] (http://aclanthology.info/) [Legacy version] (http://www.aclweb.org/anthology/)

Mailing List. Membership of the Anthology mailing list (http://groups.google.com/group/acl-anthology) has grown to 634 members (up from 533 a year ago, and 555 at the last report 6 months ago). This is an announcement-only list, on which we notify members of newly released materials.

Auxiliary Ingestion. The Anthology now has ingestion workflows for software, datasets, general attachments, slides and posters that are hosted with the ACL Anthology. Hyperlinks to videos are also saved; typically, proceedings chairs will ask for videos to be saved to the third-party techtalks.tv site, which has thus far been happy to absorb ACL content.

Digital Object Identifiers. We have assigned DOIs to all ACL materials in 2015 and are currently assigning them to 2016 materials (NAACL, SemEval, and associated workshops). With our current practice of assigning DOIs to all materials, our costs are likely to escalate to at least US$ 2K, as we digitally publish at least this number of scholarly articles.

ACL Anthology Reference Corpus version 2 (ACL ARC 2). We have released an updated version of the ACL Anthology Reference Corpus, which standardizes a release of the scholarly articles in the Anthology, alongside a machine-readable version processed using open-source software from Min-Yen Kan's research group. The release was made on 1 March 2016, and distribution is currently solely through the ACL ARC website (http://acl-arc.comp.nus.edu.sg). Distribution by LDC may not be possible, as LDC now needs certain re-distribution rights that conflict with the earlier CC licensing.

Work Queue. The current state of ingestion and development of the ACL Anthology is publicly available from the ACL Anthology's footer: https://docs.google.com/spreadsheets/d/166W-eIJX2rzCACbjpQYOaruJda7bTZrY7MBw_oa7B2E/pubhtml

Anthology Steering Committee. We recognise that the ACL Anthology has become a significant asset for the ACL, reflecting its central role in the NLP/CL research communities. It is too important for a single editor to be responsible for its policymaking. The Exec approved the creation of the Anthology Steering Committee (ASC) to provide oversight for the Anthology. The ASC currently consists of Jing-Shin Chang, Min-Yen Kan and Paola Merlo. The ASC met virtually on 14 Jul 2016, before the ACL Exec meeting, to discuss the Anthology's management and the Plans section below, so that the Anthology Editor could obtain consensus on priorities for the matters listed there. The ASC also discussed supplementary materials in the ACL Anthology; other authority networks (for papers, authors, etc.) that may be in use or have been proposed by other institutions (e.g., MIT Press for the CL journal); and revisions to, and better placement of, the contributor instructions for ingesting new material into the Anthology.

Plans, Prioritized

  1. While the new Anthology is live, it is hosted on a university virtual machine in Singapore, and is unlikely to provide adequate bandwidth when faced with full access from the ACL membership and the general public. We are investigating which service to move to. It will likely require a virtual private server (VPS) account, which is also costly, as we need to install certain software and libraries that usually require root privileges. We hope to carry out this migration soon. The ASC also asked us to investigate other cloud services such as Amazon's EC2.
  2. Min-Yen Kan, current Anthology Editor, will relinquish editorship of the ACL Anthology in 2018. It is time to begin searching for qualified individuals to fulfil this important, voluntary role, to ensure that service to the community will not be interrupted while enjoying the benefits of having new leadership rejuvenate the Anthology with new ideas. The ASC concurs that this process needs to start, and recommends that the ACL Exec come up with a selection process. It adds that the existing Nominating Committee (NC) could be asked to help with the process, but that the NC should be enlarged to add at least the current Anthology Editor, and/or a member who could advise on the technical expertise of candidates. The ASC additionally recommends that the NC:
    • Insist that prospective Editors have access to volunteers (i.e., students) who have the technical ability to help with the infrastructural maintenance work.
    • Conduct an open call for (self-) nominations that might dovetail with a call for general volunteers.
  3. A previous discussion (with Ken Church) proposed that we create a single BibTeX file for all Anthology materials. The beta Anthology can generate such information fairly easily with its database backing; we plan to have this file available during the ACL 2016 conference.
  4. By ACL 2017, we hope to include hyperlinked DOIs alongside the bibliographic reference strings in the bibliography sections of every ACL conference publication.
  5. To add abstracts, as submitted by contributors, to the indexed materials in the Anthology.
  6. For long-term preservation, to create an XML representation of all of the metadata used to create the Anthology. This is similar in nature to the XML dump of DBLP or Wikipedia. It allows a clean separation of the underlying data in the Anthology from the code used to present it.
  7. Collaboration with START (also may involve the Conference Officer's work) to integrate user accounts in their system. This would allow START to have authority records for authors such that new paper submissions might start with correct, canonical forms of author names. The ASC is aware of ORCIDs and other name authority systems that might also be useful in this process.
  8. Collaboration with ELRA to allow the categorization of papers against the LRE Map and ISLRNs.
  9. To allow third-party applications to automatically annotate articles with new metadata on existing papers via an API. Such an API would be a production API, allowing third parties to add auto-analyzed materials to the Anthology (e.g., auto-extracted keywords, summaries). This will raise the visibility of the Anthology as an object of study, complementing work on the ACL ARC.
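
Items 3 and 6 above both amount to exporting the same underlying metadata in a flat, presentation-independent form. The following is a minimal sketch of that idea only. The record fields, the example entries, the XML element names and the function names are all hypothetical and are not the Anthology's actual schema or code; it assumes the metadata can be read out of the database as simple per-paper records.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata records; IDs, titles and names are made up for illustration.
PAPERS = [
    {"id": "X00-0001", "title": "An Example Paper",
     "authors": ["Ada Example", "Ben Sample"],
     "booktitle": "Proceedings of an Example Workshop", "year": "2016"},
    {"id": "X00-0002", "title": "Another Example Paper",
     "authors": ["Cy Placeholder"],
     "booktitle": "Proceedings of an Example Workshop", "year": "2016"},
]


def to_bibtex(paper):
    """Render one record as a BibTeX @inproceedings entry.

    Note: no escaping of special LaTeX characters is attempted here.
    """
    fields = [
        ("title", paper["title"]),
        ("author", " and ".join(paper["authors"])),
        ("booktitle", paper["booktitle"]),
        ("year", paper["year"]),
    ]
    body = ",\n".join(f"  {key} = {{{value}}}" for key, value in fields)
    return f"@inproceedings{{{paper['id']},\n{body}\n}}"


def all_bibtex(papers):
    """Concatenate every entry into the text of a single BibTeX file."""
    return "\n\n".join(to_bibtex(p) for p in papers) + "\n"


def to_xml(papers):
    """Serialize the same records as one XML document (element names assumed)."""
    root = ET.Element("anthology")
    for p in papers:
        el = ET.SubElement(root, "paper", id=p["id"])
        ET.SubElement(el, "title").text = p["title"]
        for name in p["authors"]:
            ET.SubElement(el, "author").text = name
        ET.SubElement(el, "booktitle").text = p["booktitle"]
        ET.SubElement(el, "year").text = p["year"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(all_bibtex(PAPERS))
    print(to_xml(PAPERS))
```

Because both exports read from the same records, the BibTeX file and the XML dump stay consistent with each other, and neither depends on the code that renders the site.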