We have migrated the competition to a new Codalab page to fix some leaderboard bugs ahead of our evaluation period. This page will remain available for your records, but please make new submissions on the new competition page. You will not be able to make submissions on this page during the formal evaluation period.
Welcome! Definition extraction has been a popular topic in NLP research for well over a decade, but has historically been limited to well-defined, structured, and narrow conditions. In reality, natural language is complicated, and complicated data requires both complex solutions and data that reflects that reality. The DEFT corpus expands on these cases to include term-definition pairs that cross sentence boundaries, lack explicit definitors (definition-like verb phrases such as is, means, or is defined as), or appear in non-hypernym structures.
DeftEval is split into three subtasks:
Subtask 1: Sentence Classification
Given a sentence, classify whether or not it contains a definition. This is the traditional definition extraction task.
Subtask 2: Sequence Labeling
Label each token with BIO tags according to the corpus' tag specification (see Data page).
Subtask 3: Relation Classification
Given the tag sequence labels, label the relations between the tags according to the corpus' relation specification (see Data page).
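As an illustration of the BIO scheme named in Subtask 2, the following minimal Python sketch groups a per-token tag sequence into labeled spans. The example sentence, the tag names `Term` and `Definition`, and the helper `bio_to_spans` are assumptions made for illustration only; the authoritative tag inventory is the corpus' tag specification on the Data page, and this is not the official evaluation code.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group BIO tags into (label, start, end) spans; end is exclusive."""
    spans = []
    label, start = None, None
    for i, tag in enumerate(tags):
        # An I- tag continues the open span only if its label matches.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the open span, if there is one.
        if label is not None:
            spans.append((label, start, i))
            label, start = None, None
        # B- opens a new span; a stray I- is treated like B- (a lenient choice).
        if tag.startswith(("B-", "I-")):
            label, start = tag[2:], i
    # Close a span that runs to the end of the sentence.
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Hypothetical sentence and tags, not taken from the DEFT corpus.
tokens = ["A", "cell", "is", "the", "basic", "unit", "of", "life", "."]
tags = ["O", "B-Term", "O", "B-Definition", "I-Definition",
        "I-Definition", "I-Definition", "I-Definition", "O"]

for label, start, end in bio_to_spans(tags):
    print(label, tokens[start:end])
```

Spans recovered this way could serve as the units between which Subtask 3 relations are labeled, though the exact input format for that subtask is defined on the Data page.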
You may participate in any combination of the three subtasks, but note that the evaluation period for Subtask 3 will occur only after the end of the evaluation period for Subtask 2 in order to avoid any unfair release of test data.
Please note that there are new evaluation dates as of 3 Dec 2019 to reflect the new SemEval deadlines:
For questions and issues related to the task, please see the DeftEval-2020 forum. For questions and issues related to the data, please log issues on GitHub. To contact the organizers, please email them at firstname.lastname@example.org.
You can run these metrics locally by using the evaluation code available on the DEFT GitHub repo. The test set is drawn from the same distribution as the train and dev sets.
You may wish to run your evaluation via the Codalab framework to check your input formatting and to submit to the public leaderboard before the evaluation period begins. You must submit through the Codalab evaluation framework during the evaluation period in order for your submission to count towards the official competition ranking.
Please note the following phases and their corresponding evaluation data:
Practice Data: Trial data, found here
Training: Dev data, found here
All Evaluation phases: Test data, unlabeled data to be posted during the appropriate evaluation phase dates for each task.
To submit through Codalab, follow these steps:
Data for this competition consists of annotations on excerpts from freely available textbooks at www.cnx.org. All data, including annotations, is provided under the CC-BY-NA 4.0 license.
Start: Aug. 15, 2019, midnight
Start: Sept. 4, 2019, midnight
Start: Feb. 19, 2020, midnight
Start: Feb. 19, 2020, midnight
Start: March 1, 2020, midnight
Start: March 12, 2020, midnight