PharmaCoNER: Pharmacological substances, Compounds and proteins and Named Entity Recognition Track
BioNLP-OST 2019 Workshop @ EMNLP-IJCNLP 2019 (Hong Kong)
SEAD – Plan TL Sponsoring the PharmaCoNER Task Awards for track winners:
There is a prize for both sub-tracks: 1,000€ to each sub-track winner, 500€ to each second-place team and 200€ to each third-place team.
About the task
Efficient access to mentions of drugs, medications and chemical entities is a pressing need shared by biomedical researchers, clinicians and the pharmaceutical industry. The recognition of pharmaceutical drugs and chemical entities is a critical step required for the subsequent detection of relations of chemicals with other biomedically relevant entities.
The critical importance of chemical and drug name recognition has motivated several shared tasks in the past, such as the CHEMDNER tracks or the i2b2 medication challenge. However, most BioNLP and clinical NLP research is currently done on English documents, and only a few tasks have been carried out on non-English texts or as multilingual tracks. Nonetheless, a considerable amount of biomedically relevant content is published in languages other than English, and clinical texts in particular are, with few exceptions, written entirely in the native language of each country.
Following the outline of previous chemical and drug NER efforts, in particular the BioCreative CHEMDNER tracks, we organize the first task on chemical and drug mention recognition from Spanish medical texts, namely from a corpus of Spanish clinical case studies. This task will thus address the automatic extraction of chemical, drug and gene/protein mentions from clinical case studies written in Spanish. The main aim is to promote the development of named entity recognition tools of practical relevance, that is, tools that recognize chemical and drug mentions in non-English content, while determining the current state of the art, identifying challenges and comparing the strategies and results to those published for English data.
For this task, we have prepared a manually classified collection of clinical case sections derived from open access Spanish medical publications, named the Spanish Clinical Case Corpus (SPACCC). The corpus contains a total of 1000 clinical cases / 396,988 words. It is worth noting that this kind of narrative shows properties of both the biomedical and medical literature and clinical records.
Please fill in the registration form at http://temu.bsc.es/pharmaconer/index.php/register/
The annotation of the entire set of entity mentions was carried out by medicinal chemistry experts and it includes the following four entity types:
– “NORMALIZABLES”: mentions of chemicals that can be manually normalized to a unique concept identifier (primarily SNOMED-CT).
– “NO_NORMALIZABLES”: mentions of chemicals that could not be normalized manually to a unique concept identifier.
– “PROTEINAS”: mentions of proteins and genes following an adaptation of the BioCreative GPRO track annotation guidelines. This class includes also peptides, peptide hormones and antibodies.
– “UNCLEAR”: cases of general substance class mentions of clinical and biomedical relevance, including certain pharmaceutical formulations, general treatments, chemotherapy programs, vaccines and a predefined set of general substances (e.g.: Estragón, Silimarina, Bromelaína, Melanina, Vaselina, Lanolina, Alcohol, Tabaco, Marihuana, Cannabis, Opio and Gluten). Mentions of this class will not be part of the entities evaluated by this track but serve as additional annotations of medical relevance.
Evaluation of automatic predictions for this task will have two different scenarios or tracks:
1) Track 1: NER offset and entity classification.
2) Track 2: Concept indexing.
The PharmaCoNER corpus has been randomly split into three subsets: the training, development and test sets. The training set contains 500 clinical cases, and the development and test sets contain 250 clinical cases each.
More information at the PharmaCoNER webpage.
The sample set is composed of 15 clinical cases extracted from the training set. This sample set is also included in the evaluation script (see Resources). Download the sample set from here.
The train set is composed of 500 clinical cases. Download the train set from here.
The Development set is composed of 250 clinical cases. Download the development set from here.
Test set (including background set)
The Test set with the background set is composed of at least 2,500 clinical cases. Available for download according to the established dates (see Schedule).
Test set with Gold Standard annotations
The Test set with Gold Standard annotations is composed of 250 clinical cases. Available for download according to the established dates (see Schedule).
Evaluation of automatic predictions for the PharmaCoNER task will have two different scenarios or sub-tracks: the NER offset and entity type classification sub-track and the concept indexing sub-track:
• NER offset and entity type classification: The first evaluation scenario will consist of the classical entity-based or instance-based evaluation, which requires that system outputs match exactly the beginning and end locations of each entity tag, as well as the entity annotation type of the gold standard annotations.
• Concept indexing: The second evaluation scenario will consist of a concept indexing task where, for each document, the list of unique SNOMED concept identifiers has to be generated by participating teams. This list will be compared to the manually annotated concept ids corresponding to chemical compounds and pharmacological substances.
The primary evaluation metrics will consist of micro-averaged precision, recall and F1-scores:
Precision (P) = true positives/(true positives + false positives)
Recall (R) = true positives/(true positives + false negatives)
F-score (F1) = 2*((P*R)/(P+R))
For both sub-tracks, the official evaluation and the ranking of the submitted systems will be based exclusively on the F-score (F1) measure.
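The metrics above can be illustrated with a minimal sketch. It is not the official evaluation script; it assumes that annotations are pooled over all documents and compared as exact tuples, and the document ids, offsets and labels in the example are made up for illustration.

```python
from typing import Hashable, Iterable, Tuple


def micro_prf(gold: Iterable[Hashable],
              predicted: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over pooled annotations."""
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)   # predictions that match gold exactly
    fp = len(pred_set - gold_set)   # predictions with no exact gold match
    fn = len(gold_set - pred_set)   # gold annotations that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical items: (document id, start offset, end offset, entity type)
gold = [
    ("doc1", 10, 20, "NORMALIZABLES"),
    ("doc1", 35, 42, "PROTEINAS"),
    ("doc2", 5, 12, "NO_NORMALIZABLES"),
]
pred = [
    ("doc1", 10, 20, "NORMALIZABLES"),  # exact match
    ("doc1", 35, 41, "PROTEINAS"),      # wrong end offset, counts as an error
    ("doc2", 5, 12, "NORMALIZABLES"),   # wrong entity type, counts as an error
]
p, r, f = micro_prf(gold, pred)
```

For the concept indexing sub-track the same function could be applied to (document id, concept id) pairs, since each document contributes a list of unique identifiers.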
As part of the evaluation process, we plan to carry out statistical significance testing between system runs using approximate randomization, following the settings previously used in the context of the i2b2 challenges. The evaluation scripts, together with documentation and README files with instructions, will be freely available on GitHub so that participating teams can test the evaluation tools locally.
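A rough sketch of an approximate randomization test between two system runs is given here for orientation. The exact settings used in the i2b2 challenges are not stated on this page, so the number of trials, the per-document swapping and the use of micro-averaged F1 as the test statistic are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Counts = Tuple[int, int, int]  # (tp, fp, fn) of one system on one document


def micro_f1(per_doc: Sequence[Counts]) -> float:
    """Micro-averaged F1 from per-document counts: 2TP / (2TP + FP + FN)."""
    tp = sum(c[0] for c in per_doc)
    fp = sum(c[1] for c in per_doc)
    fn = sum(c[2] for c in per_doc)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def approximate_randomization(run_a: Sequence[Counts],
                              run_b: Sequence[Counts],
                              trials: int = 1000,
                              seed: int = 0) -> float:
    """Estimate a two-sided p-value for the F1 difference between two runs."""
    if len(run_a) != len(run_b):
        raise ValueError("both runs must cover the same documents")
    rng = random.Random(seed)
    observed = abs(micro_f1(run_a) - micro_f1(run_b))
    at_least_as_extreme = 0
    for _ in range(trials):
        shuffled_a: List[Counts] = []
        shuffled_b: List[Counts] = []
        for counts_a, counts_b in zip(run_a, run_b):
            # Swap the two systems' outputs for this document with probability 0.5
            if rng.random() < 0.5:
                counts_a, counts_b = counts_b, counts_a
            shuffled_a.append(counts_a)
            shuffled_b.append(counts_b)
        if abs(micro_f1(shuffled_a) - micro_f1(shuffled_b)) >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimate strictly above zero
    return (at_least_as_extreme + 1) / (trials + 1)
```

A small p-value suggests that the observed F1 difference is unlikely to arise from randomly reassigning outputs between the two runs.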
As a prediction baseline, we will use vocabulary transfer: entity names derived from the training/development sets will be matched against the test set corpus using gazetteer lookup.
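One way such a gazetteer-lookup baseline could work is sketched below. This is an illustration, not the organizers' baseline: case-insensitive whole-word matching, the longest-match preference and the "first label seen wins" rule are all assumptions, and the example mentions are invented.

```python
import re
from typing import Dict, Iterable, List, Tuple


def build_gazetteer(training_mentions: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each lower-cased training mention to an entity type."""
    gazetteer: Dict[str, str] = {}
    for text, label in training_mentions:
        gazetteer.setdefault(text.lower(), label)  # assumption: first label seen wins
    return gazetteer


def gazetteer_lookup(document: str,
                     gazetteer: Dict[str, str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) for non-overlapping dictionary matches."""
    if not gazetteer:
        return []
    # Longest terms first, so the alternation prefers the longest entry
    terms = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in terms) + r")(?!\w)",
        re.IGNORECASE,
    )
    matches = []
    for m in pattern.finditer(document):
        label = gazetteer.get(m.group(0).lower())
        if label is not None:
            matches.append((m.start(), m.end(), label))
    return matches


# Invented training mentions and test sentence, for illustration only
gazetteer = build_gazetteer([
    ("paracetamol", "NORMALIZABLES"),
    ("insulina", "PROTEINAS"),
])
doc = "Se inicia Paracetamol e insulina."
found = gazetteer_lookup(doc, gazetteer)
```

Each returned triple holds character offsets into the document, which is the information the NER offset sub-track is scored on.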
See evaluation examples here.
A submission.zip for this competition would look similar to the following (this example uses brat format; XML format can also be submitted):
Test Set - Sub task 1:
| - subtask1
| - S0004-06142005000500011-1.ann
| - ...
Test Set - Sub task 2:
| - subtask2
| - S0004-06142005000500011-1.tsv
| - ...
1. The root of the zip file should contain the subtask1 and subtask2 directories.
2. Users must annotate the entire test set; otherwise, the submission will not be processed.
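The layout above could be packaged with a short script along these lines. It is a sketch under assumptions: the function name and the local predictions directory are hypothetical, brat (.ann) files are assumed for sub-task 1, and a missing sub-task directory is skipped instead of being reported as an error.

```python
import zipfile
from pathlib import Path
from typing import List


def build_submission(predictions_dir: str,
                     zip_path: str = "submission.zip") -> List[str]:
    """Zip prediction files so that subtask1/ and subtask2/ sit at the archive root."""
    root = Path(predictions_dir)
    written: List[str] = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for subtask, suffix in (("subtask1", ".ann"), ("subtask2", ".tsv")):
            folder = root / subtask
            if not folder.is_dir():
                continue  # assumption: skip a sub-task that has no local directory
            for path in sorted(folder.glob("*" + suffix)):
                arcname = f"{subtask}/{path.name}"  # path as stored inside the zip
                archive.write(path, arcname)
                written.append(arcname)
    return written
```

Calling build_submission("predictions") on a directory that contains subtask1 and subtask2 folders returns the archive paths that were written, which can be compared against the list of test set documents before uploading.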