[BioCreative VII] Track 5 - LitCovid track: Multi-label topic classification for COVID-19 literature annotation

Event Notification Type: 
Call for Participation
Abbreviated Title: 
LitCovid shared task
Contact Email: 
Contact: 
Qingyu Chen
Submission Deadline: 
Sunday, 12 September 2021

Introduction
BioCreative (http://www.biocreative.org), running since 2003, is the first and longest-running community-wide effort for assessing text mining and information extraction systems applied to the biological domain.

We are organizing the LitCovid track specifically focusing on COVID-19 literature annotation.

The rapid growth of biomedical literature poses a significant challenge for manual curation and interpretation. This challenge has become more evident during the COVID-19 pandemic: the number of COVID-19-related articles in the literature is growing by about 10,000 articles per month. LitCovid, a literature database of COVID-19-related papers in PubMed, has accumulated more than 100,000 articles, with millions of accesses each month by users worldwide. LitCovid is updated daily, and this rapid growth significantly increases the burden of manual curation. In particular, annotating each article with up to eight possible topics, e.g., Treatment and Diagnosis, has been a bottleneck in the LitCovid curation pipeline.

This track calls for a community effort to tackle automated topic annotation for COVID-19 literature. Topic annotation in LitCovid is a standard multi-label classification task that assigns one or more labels to each article. These topics have been demonstrated to be effective for information retrieval and have been used in many downstream applications related to LitCovid. However, annotating these topics has been a primary bottleneck for manual curation. Increasing the accuracy of automated topic prediction in COVID-19-related literature would be a timely improvement beneficial to curators and researchers worldwide.
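As an illustration of the multi-label setup described above, each article's topics can be represented as a 0/1 indicator vector with one position per topic. The following is a minimal sketch, not the track's data format: only "Treatment" and "Diagnosis" are named in this announcement, and the other topic names, the TOPICS list, and the encode/decode helpers are assumptions made for the example.

```python
# Hypothetical label set: only "Treatment" and "Diagnosis" come from the
# announcement; the remaining names are placeholders for illustration.
TOPICS = ["Treatment", "Diagnosis", "Prevention", "Mechanism"]


def encode(labels, topics=TOPICS):
    """Return a 0/1 indicator vector with one position per topic."""
    unknown = set(labels) - set(topics)
    if unknown:
        raise ValueError(f"unknown topics: {sorted(unknown)}")
    return [1 if topic in labels else 0 for topic in topics]


def decode(vector, topics=TOPICS):
    """Return the topic names whose position in the vector is set."""
    return [topic for topic, bit in zip(topics, vector) if bit]


# An article may carry more than one topic at the same time.
vector = encode(["Diagnosis", "Treatment"])
print(vector)          # [1, 1, 0, 0]
print(decode(vector))  # ['Treatment', 'Diagnosis']
```

A classifier for this task would then predict one such vector per article, typically by thresholding a score for each topic independently.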

Registration

Please follow https://docs.google.com/forms/d/e/1FAIpQLScdMnKFMncL8qDkcRx6aV6lYRm8Pbuf... to register for the track.

Training and development datasets

The training and development datasets contain the publicly available text of over 30 thousand COVID-19-related articles and their metadata (e.g., title, abstract, journal). Articles in both datasets have been annotated by in-house models and manually reviewed.

Evaluation dataset

Like the training and development datasets, the evaluation dataset contains articles that have been manually reviewed. Participants will return predictions for the entire set. Submissions will be evaluated using both label-based and instance-based metrics that are commonly applied for multi-label classification. Evaluation scripts will be provided.
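To make the distinction between the two metric families concrete, here is a minimal sketch that computes F1 both ways. It is not the official evaluation script, and the exact metrics used by the track may differ. The function names, the topic names other than "Treatment" and "Diagnosis", and the toy data are assumptions made for the example. Label-based scores are computed per topic across all articles (then macro- or micro-averaged), while the instance-based score is computed per article and averaged over articles.

```python
def f1(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def label_based_f1(gold, pred, topics):
    """Return (macro_f1, micro_f1), counting errors per topic."""
    per_label = []
    total_tp = total_fp = total_fn = 0
    for topic in topics:
        tp = sum(1 for g, p in zip(gold, pred) if topic in g and topic in p)
        fp = sum(1 for g, p in zip(gold, pred) if topic not in g and topic in p)
        fn = sum(1 for g, p in zip(gold, pred) if topic in g and topic not in p)
        per_label.append(f1(tp, fp, fn))
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
    macro = sum(per_label) / len(per_label)   # mean of per-topic F1
    micro = f1(total_tp, total_fp, total_fn)  # F1 over pooled counts
    return macro, micro


def instance_based_f1(gold, pred):
    """Return the mean of per-article F1 scores."""
    scores = [f1(len(g & p), len(p - g), len(g - p)) for g, p in zip(gold, pred)]
    return sum(scores) / len(scores)


# Toy data: one set of topics per article (hypothetical, for illustration).
topics = ["Treatment", "Diagnosis", "Prevention"]
gold = [{"Treatment"}, {"Treatment", "Diagnosis"}, {"Prevention"}]
pred = [{"Treatment"}, {"Treatment"}, {"Diagnosis", "Prevention"}]

macro, micro = label_based_f1(gold, pred, topics)
print(round(macro, 4), round(micro, 4))          # 0.6667 0.75
print(round(instance_based_f1(gold, pred), 4))   # 0.7778
```

In this toy case the macro average is pulled down by the one topic that is never predicted correctly, whereas the micro and instance-based averages are less affected, which is why reporting several of these scores together is common for multi-label tasks.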

Webinar:
We held a webinar for the track from 9 am to 10 am EST on 22nd July 2021. The slides and video can be accessed via https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/.

Important dates:

-Training and development set release: 15th June. The datasets can be accessed via https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/.
-Evaluation script: 15th July. The evaluation script can be accessed via https://github.com/ncbi/biocreative_litcovid.
-Test set release: 22nd August. The test set can be accessed via https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/.
-Test set prediction submission instructions: 2nd September. The instructions are summarized in https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/BC7-LitCovid-Re... (under the Submission instructions section).
-Test set prediction submission due: 12th September.
-Test set evaluation returned to participants: early September
-Short technical systems description paper due: mid September
-Paper acceptance and review returned: late September

Task organizers:
-Qingyu Chen, National Library of Medicine
-Alexis Allot, National Library of Medicine
-Rezarta Islamaj, National Library of Medicine
-Robert Leaman, National Library of Medicine
-Zhiyong Lu, National Library of Medicine

Contact
If you have any questions, please contact qingyu.chen [at] nih.gov with the subject heading "BioCreative Track 5 LitCovid questions".

More information
-BioCreative VII Track 5 (https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-5/) provides detailed information on the track (such as registration, timelines, and FAQ).
-BioCreative VII (https://biocreative.bioinformatics.udel.edu/) provides general information on all the tracks.
