This competition is the first of the shared tasks introduced in the AAAI-21 Workshop on Scientific Document Understanding. The task aims to identify acronyms (i.e., short-forms) and their meanings (i.e., long-forms) in documents. For instance:
Input: Existing methods for learning with noisy labels (LNL) primarily take a loss correction approach.
Output: Existing methods for learning with noisy labels (LNL) primarily take a loss correction approach.
In this example, the acronym (short-form) is "LNL" and its long-form is "learning with noisy labels". The task is modeled as a sentence-level sequence labeling problem. Participants are provided with manually labeled training and development sets consisting of 17,506 sentences extracted from English scientific papers published on arXiv.
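Concretely, in the sequence labeling view each sentence is tokenized and every token receives one tag. Below is a minimal sketch of how the example sentence might be tagged, assuming the common BIO scheme with B-short/I-short and B-long/I-long labels (the exact label names are illustrative; check the shared task GitHub page for the format of the released data).

```python
# Illustrative BIO tagging of the example sentence, assuming a
# B-short/I-short/B-long/I-long/O label scheme (label names here
# are assumptions; see the shared task GitHub page for the release format).
tokens = ["Existing", "methods", "for", "learning", "with", "noisy",
          "labels", "(", "LNL", ")", "primarily", "take", "a", "loss",
          "correction", "approach", "."]
labels = ["O", "O", "O", "B-long", "I-long", "I-long",
          "I-long", "O", "B-short", "O", "O", "O", "O", "O",
          "O", "O", "O"]

# Each sentence pairs a token list with a label list of equal length;
# models predict one label per token.
assert len(tokens) == len(labels)
for tok, lab in zip(tokens, labels):
    print(f"{tok}\t{lab}")
```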
The Acronym Identification competition has two phases:
Development phase (from September 1, 2020): participants use the training and development sets to design their models.
Evaluation phase (from November 20, 2020): participants submit their model runs on the test set.
To participate, first fill out this form to provide the details of your team: https://rb.gy/m7frwz. To submit the results of your model runs, use the CodaLab participate page; on the submit results page, please make sure to use the same team name you provided in the registration form. For more information on the shared task, see the shared task GitHub page and the workshop website.
Competition participants are invited to present their work in the poster session of the SDU@AAAI-21 workshop, and the winner of the competition will be invited to give an oral presentation. In addition, SDU@AAAI-21 strongly encourages participants to submit their system papers to the workshop; system papers will appear in the shared task track of the workshop proceedings. For more information on the workshop, please see the SDU@AAAI-21 website.
Training and development set release: September 1, 2020
Test set release: November 20, 2020
System runs due date: December 4, 2020
System papers due date: December 11, 2020
Presentation at SDU@AAAI-21: February 8 or 9, 2021
The submitted results will be evaluated based on their macro-averaged precision, recall, and F1 scores on the test set, computed for correct predictions of short-form (i.e., acronym) and long-form (i.e., phrase) boundaries in the sentences. A short-form or long-form boundary prediction is counted as correct only if the predicted beginning and end both match the ground-truth beginning and end of the short-form or long-form boundary, respectively. The official score is the macro average of the short-form and long-form prediction F1 scores.
The evaluation script is provided on the GitHub page of this competition.
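The official script is authoritative for reporting results; the following is only a minimal sketch of the exact-boundary matching logic described above, assuming gold and predicted mentions are given as (start, end) token-index pairs per sentence (the input format of the official scorer may differ).

```python
def prf1(gold, pred):
    """Exact-boundary precision/recall/F1 over (start, end) token spans.

    gold, pred: lists (one entry per sentence) of sets of spans.
    A prediction counts only if both its start and end match a gold span.
    """
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy data: one sentence with a gold short-form at tokens (8, 9) and a
# gold long-form at tokens (3, 7), both predicted correctly.
gold_short, pred_short = [{(8, 9)}], [{(8, 9)}]
gold_long, pred_long = [{(3, 7)}], [{(3, 7)}]

_, _, f1_short = prf1(gold_short, pred_short)
_, _, f1_long = prf1(gold_long, pred_long)
official_score = (f1_short + f1_long) / 2  # macro average of the two F1s
print(official_score)  # 1.0 on this toy example
```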
The dataset provided for this competition is licensed under the CC BY-NC-SA 4.0 international license, and the evaluation script and the baseline are licensed under the MIT license. By accepting the terms and conditions, you agree to these licensing terms.