2008Q1 Reports: Anthology

ACL ANTHOLOGY Report, January 2008
Steven Bird & Min-Yen Kan

The ACL Anthology is a digital archive of research papers in
computational linguistics, sponsored by the CL community, and freely
available to all.  It includes the Computational Linguistics journal,
and proceedings of many conferences and workshops including: ACL,
EACL, NAACL, ANLP, TINLAP, COLING, HLT, MUC, and Tipster.  Conference
proceedings are published in the anthology around the same time as the
conference.  CL articles are published in the anthology one year in
arrears (but individual subscribers can access recent issues
electronically via the MIT Press website).

The anthology now contains 14,000 papers (up from 12,500 papers twelve
months ago), along with full-text search.  The materials are now
hosted on the ACL website, at http://aclweb.org/anthology-index/,
thanks to Drago Radev.  Most of the papers are also indexed by
Citeseer and Google Scholar, helping the citation counts of ACL
authors.  The ACM Digital Library creates full metadata for all
anthology materials and registers digital object identifiers for ACL
papers (e.g. http://dx.doi.org/10.3115/1118693.1118695), costing the
ACL $275 annually.  The new ACL Anthology Network (AAN) website at
Michigan provides detailed citation analysis for the anthology.
Updates to the anthology are announced on the mailing list at
http://groups.google.com/group/acl-anthology

Steven Bird has now stepped down as editor, and has passed on the role
to Min-Yen Kan.  This transition marks the conclusion of the
development phase of the Anthology: (a) materials from the ACL's
hardcopy and microfiche eras are now all digitized; (b) born-digital
materials published in ad hoc formats have been manually converted;
(c) the anthology has been incorporated into the ACL's operation,
including the publications process and web hosting.  The ongoing
maintenance of the anthology involves several challenges: streamlining
the proceedings upload process; incorporating richer bibliographic
metadata as it becomes available via DOI services; and supporting
community initiatives that build on the Anthology.

ONGOING ACTIVITIES

PACLIC PROCEEDINGS: The steering committee of PACLIC -- the Pacific
Asia Conference on Language, Information and Computation -- has
approached the Anthology editor to request that PACLIC proceedings be
included in the Anthology.  PACLIC has been an important regional
conference on language in the Pacific Asia region for the past
twenty years.  Recently, with great help from Professor Harada's team
at Waseda University, all PACLIC proceedings have been digitized, and
posted at http://www.decode.waseda.ac.jp/PACLIC-STEERING/.  Including
these materials would add to the geographical and linguistic diversity
of the Anthology.  The Executive needs to establish the scope of the
Anthology beyond the ACL's own publications.

IJCNLP PROCEEDINGS: The 2005 proceedings were excluded from the ACL
Anthology because of an agreement with Springer.  Once the required
three-year period elapses during 2008, the IJCNLP-05 proceedings can
be incorporated into the Anthology.  Su Jian is the contact person for
organizing this.  IJCNLP-08 proceedings will also be added to
the anthology later this year, pending the final list of
archived papers from the IJCNLP conference chairs.

HIGHER-QUALITY BIBLIOGRAPHIC METADATA: The ACM Digital Library is
creating high-quality bibliographic metadata for each individual
paper, in conjunction with registering each paper with a DOI.  It
should be possible to extract that metadata and improve the quality of
metadata on the Anthology site (e.g. removing OCR errors in the
spelling of author and paper names).
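
As an illustration only, the following minimal Python sketch shows one
way such a correction could work.  It assumes that a list of trusted
values (e.g. author names taken from the DOI metadata) is already
available for a given paper.  The function names and the 0.85
similarity threshold are hypothetical choices, not part of any existing
Anthology or ACM tooling.

```python
from difflib import SequenceMatcher
from typing import Sequence


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def correct_field(ocr_value: str, trusted_values: Sequence[str],
                  threshold: float = 0.85) -> str:
    """Replace an OCR-derived value with its closest trusted counterpart.

    The OCR value is returned unchanged when no trusted value is similar
    enough, so that an unrelated name is never substituted by mistake.
    The threshold is an assumed value and would need tuning on real data.
    """
    best = max(trusted_values,
               key=lambda v: similarity(ocr_value, v),
               default=None)
    if best is not None and similarity(ocr_value, best) >= threshold:
        return best
    return ocr_value


if __name__ == "__main__":
    # Hypothetical trusted author list for one paper.
    trusted = ["Dragomir Radev", "Steven Bird", "Min-Yen Kan"]
    print(correct_field("Stcven Bird", trusted))   # OCR error: "c" for "e"
    print(correct_field("John Smith", trusted))    # no close match, kept as-is
```

Matching within a single paper's record, rather than against the whole
collection, keeps the candidate list short and reduces the risk of
confusing two authors with similar names.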

PUBLICATION INSTRUCTIONS: The instructions for the publication
software need to be updated to cover two further tasks: (i) obtaining
the workshop identifiers from the Anthology editor, and (ii) uploading
the materials to the anthology by FTP.  Conferences and workshops not
held in conjunction with a regular ACL meeting are not automatically
included in the Anthology.  Organizers of such events should consider
using the ACL publication software and contacting the Anthology editor
to ensure timely incorporation of the proceedings in the Anthology.

SIG RELATED MATERIALS: Min is now working on expanding the scope of
Anthology materials where feasible.  In particular, SIGs are likely to
have their own specialized Anthology pages, featuring links to
materials relevant to or supported by each SIG.  Once this is done,
we hope to expand the archiving of materials to workshops and
conferences related to SIGs.

TIMING: Conference and workshop organizers have a variety of opinions
about exactly when proceedings should appear in the Anthology
(e.g. before, during, or after the event).  We recommend that the
Executive establish a standard practice here.

ACM DL: Our ACM Digital Library contact, Bernard Rous, has asked to
receive CD-ROMs of ACL conferences as they are published, so that he
can initiate the process of assigning DOIs.  His address is:
Bernard Rous, Electronic Publishing Program Director,
ACM, 2 Penn Plaza Suite 701, New York NY 10121-0701

TEXT EXTRACTION: There is an ongoing initiative to extract plain text
from the ACL Anthology materials, involving Dragomir Radev, Min-Yen
Kan and others.  Most of the Anthology has been converted, and can be
found at http://wing.comp.nus.edu.sg/~min/dAnth/acl/.  This will
facilitate the application of NLP techniques to our own publications.
In particular, the Linked anthology proposal submitted to the ACL
Exec grassroots initiative plans to create a standardized test corpus
for future bibliographic and bibliometric studies, and we expect this
work to be reported later this year.

TOPICAL INDEXING: The existence of persistent URLs makes it easy for
individuals and special interest groups to set up annotated
bibliographies with pointers to papers in the anthology.  Moreover,
the community's own text categorization techniques ought to be applied
to its own text collection.  The anthology site should link to any
well-curated, comprehensive categorizations of its content, so that
members of the CL community can benefit from them.  The new ACL Wiki
would be a convenient place for members to maintain topical indexes of
ACL papers.

WIKIFIED EDITING: A longer-term goal, planned for late this year, is
to have the Anthology incorporate edits from the user community.  These
edits to metadata would be reviewed by the Anthology editor, but
letting users submit corrections directly, in context, would make such
feedback much easier to provide.