This is the CodaLab Competition for SemEval-2021 Task 10: Source-Free Domain Adaptation for Semantic Processing.
Please join our Google Group to ask questions and get the most up-to-date information on the task.
20 Aug 2020: Pre-trained models release
3 Dec 2020: Test data release
10 Jan 2021: Evaluation start
31 Jan 2021: Evaluation end
Data sharing restrictions are common in NLP datasets. For example, Twitter policies do not allow sharing of tweet text, though tweet IDs may be shared. The situation is even more common in clinical NLP, where patient health information must be protected, and annotations over health text, when released at all, often require the signing of complex data use agreements. The SemEval-2021 Task 10 framework asks participants to develop semantic annotation systems in the face of data sharing constraints. A participant's goal is to develop an accurate system for a target domain when annotations exist for a related domain but cannot be distributed. Instead of annotated training data, participants are given a model trained on the annotations. Then, given unlabeled target domain data, they are asked to make predictions.
We propose two different semantic tasks to which this framework will be applied: negation detection and time expression recognition.
Egoitz Laparra, Yiyun Zhao, Steven Bethard (University of Arizona)
Tim Miller (Boston Children's Hospital and Harvard Medical School)
Özlem Uzuner (George Mason University)
Laparra E., Xu D., Elsayed A., Bethard S., and Palmer M. SemEval 2018 task 6: Parsing time normalizations. In Proceedings of The 12th International Workshop on Semantic Evaluation, New Orleans, Louisiana. 2018.
Negation detection will be evaluated using the standard precision, recall, and F1 scores as used in most published work: recall points are gained by correctly predicting that a negated entity is negated, and precision points are obtained when a predicted negation is correct.
Time expression recognition will be evaluated using the standard precision, recall and F1 previously used for the entity-finding portion of SemEval 2018 Task 6.
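Both tracks score exact matches between predicted and gold annotations. A minimal sketch of set-based precision, recall, and F1 (illustrative only, not the official scorer) might look like:

```python
# Sketch of set-based precision/recall/F1. Items can be negated-entity
# identifiers (negation track) or (start, end) character offsets
# (time track); exact matching is assumed.

def precision_recall_f1(gold, predicted):
    """Compute precision, recall, and F1 for two sets of annotations."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: 2 of 3 predicted spans are correct; 2 of 4 gold spans are found.
p, r, f = precision_recall_f1({(0, 4), (10, 14), (20, 24), (30, 34)},
                              {(0, 4), (10, 14), (40, 44)})
```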
By submitting results to this competition, you consent to the public release of your scores at the SemEval-2021 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.
You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.
You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.
You agree not to redistribute the test data except in the manner prescribed by its licence.
Since the scenario proposed by SemEval-2021 Task 10 is domain adaptation with no access to the source data, no annotated training set is distributed. Instead, participants are provided with models trained on that source data, the development data representing a new domain on which participants can explore domain adaptation algorithms, and the test data representing another new domain on which the participant's approaches will be evaluated.
For negation detection, the development data is the i2b2 2010 Challenge Dataset, a de-identified dataset of notes from Partners HealthCare, containing 2886 unlabeled train instances (entities in sentence context) and 5545 dev instances with corresponding negation-status labels. The original i2b2 dataset had multi-label annotations in the set Asserted, Negated, Uncertain, Hypothetical, Conditional, FamilyRelated; to align with other challenge datasets we have kept the Negated category but mapped all other categories to "Not negated." The i2b2 2010 Challenge data requires a Data Use Agreement with Partners HealthCare, so in order to access the development data, participants must first obtain access through the n2c2/DBMI Data Portal. After downloading the 2010 data, participants can then run the scripts in the GitHub repo for this task.
For time expression recognition, the development data is the annotated news portion of the SemEval 2018 Task 6 data. The source text is from the freely available TimeBank, and the 2,000+ time entity annotations are stored in Anafora XML format.
Participants should also obtain access to the MIMIC III corpus v1.4, as a portion of it may be used for one or both of the test sets. Access to the MIMIC data requires participants to complete a CITI "Data or Specimens Only Research" online course, and then make an online request through PhysioNet. The course takes only a couple of hours online, and access requests are typically approved within a few days.
Participants are provided with trained models for both negation detection and time expression recognition. In both cases, we have used the RoBERTa-base (Liu et al., 2019) pretrained model included in the Hugging Face Transformers library:
Liu Y., Ott M., Goyal N., Du J., Joshi M., Chen D., Levy O., Lewis M., Zettlemoyer L., and Stoyanov V. Roberta: A robustly optimized bert pretraining approach. arXiv preprint. 2019.
The practice data (development data) is a subset of the i2b2 2010 Challenge on concepts, assertions, and relations in clinical text. If you do not already have access to this data, you will need to request access at the DBMI Data Portal. If you have obtained access, follow the portal link above and download the data by expanding the "2010 Relations Challenge Downloads" tab, and downloading the three files with the following titles:
At the time of writing, these are the last 3 links for the 2010 data. This should give you the following files, which you should save to a single directory:
Extract each of these with:
tar xzvf concept_assertion_relation_training_data.tar.gz
tar xzvf reference_standard_for_test_data.tar.gz
tar xzvf test_data.tar.gz
Next we will extract an unlabeled training set, unlabeled evaluation set, and a label file for the evaluation set (to test submissions and see the format). If you don't already have the task repo checked out, do so and enter the project directory:
$ git clone https://github.com/Machine-Learning-for-Medical-Language/source-free-domain-adaptation.git && cd source-free-domain-adaptation
Then to extract the training files, run the i2b2 extraction script with:
$ mkdir -p practice_text/negation && python3 extract_i2b2_negation.py <directory with three extracted i2b2 2010 folders> practice_text/negation
This will extract the three files into practice_text/negation.
The idea during the practice time is to use train.tsv as representative target-domain data to improve the system, and then evaluate any improvements to your system on dev.tsv.
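One common strategy for source-free adaptation is self-training: label the target-domain data with the provided model, keep only confident predictions, and use them as pseudo-labeled training data. A toy sketch of the selection step (the `predict_proba` interface and all names here are illustrative, not part of the task baselines):

```python
# Toy sketch of one self-training step for source-free adaptation:
# keep only the predictions the source model is confident about and
# treat them as pseudo-labels for further fine-tuning.

def select_confident(examples, predict_proba, threshold=0.9):
    """Return (example, pseudo_label) pairs the model is confident about."""
    selected = []
    for example in examples:
        probs = predict_proba(example)  # hypothetical: {label: probability}
        label, prob = max(probs.items(), key=lambda kv: kv[1])
        if prob >= threshold:
            selected.append((example, label))
    return selected

# Example with a dummy probability function standing in for the model:
dummy = lambda ex: {1: 0.95, -1: 0.05} if "no" in ex else {1: 0.6, -1: 0.4}
pseudo = select_confident(["no evidence of fracture", "possible fracture"], dummy)
```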
To use the trained model to make predictions, install the requirements and run the run_negation.py script to process the practice data as follows:
$ pip3 install -r baselines/negation/requirements.txt
$ python3 baselines/negation/run_negation.py -f practice_text/negation/dev.tsv -o submission/negation/
This script will write a file called submission/negation/system.tsv with one label per line.
The trial data for the practice phase consists of 99 articles from the AQUAINT, TimeBank and te3-platinum subsets of TempEval-2013, i.e. "Newswire" domain.
You can automatically download and prepare the input data for this phase by running the prepare_time_dataset.py script available in the task repository. If you don't already have the task repo checked out and the requirements installed, you need to do so first:
$ git clone https://github.com/Machine-Learning-for-Medical-Language/source-free-domain-adaptation.git && cd source-free-domain-adaptation
$ pip3 install -r baselines/time/requirements.txt
$ python3 prepare_time_dataset.py practice_text/
This will create a practice_text/time directory containing the plain text of the documents used in this task.
The baseline for time expression recognition is based on the PyTorch implementation of RoBERTa by Hugging Face. We have used the RobertaForTokenClassification architecture from the Hugging Face transformers library to fine-tune roberta-base on 25,000+ time expressions in de-identified clinical notes. The resulting model is a sequence tagger that we have made available in the Hugging Face model hub: clulab/roberta-timex-semeval. The following table shows the in-domain and out-of-domain (practice_data) performances:
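Because the model is a token-level sequence tagger, its per-token labels must be collapsed into character spans before they can be written out as annotations. A minimal sketch assuming BIO-style labels and (start, end) token offsets (the actual label scheme used by clulab/roberta-timex-semeval may differ):

```python
# Sketch: collapse BIO token labels into (start, end, type) character
# spans. Assumes labels like "B-Year"/"I-Year"/"O" and per-token
# character offsets; the baseline's real label inventory may differ.

def bio_to_spans(labels, offsets):
    """Collapse BIO labels into (start, end, type) character spans."""
    spans = []
    start = end = entity_type = None
    for label, (tok_start, tok_end) in zip(labels, offsets):
        if label.startswith("B-"):
            if entity_type is not None:      # close any open span
                spans.append((start, end, entity_type))
            start, end, entity_type = tok_start, tok_end, label[2:]
        elif label.startswith("I-") and entity_type == label[2:]:
            end = tok_end                    # extend the current span
        else:                                # "O" or an inconsistent "I-"
            if entity_type is not None:
                spans.append((start, end, entity_type))
            start = end = entity_type = None
    if entity_type is not None:
        spans.append((start, end, entity_type))
    return spans

# "In 2019 ..." with "2019" tagged B-Year/I-Year across two subword tokens:
spans = bio_to_spans(["O", "B-Year", "I-Year", "O"],
                     [(0, 2), (3, 7), (8, 10), (11, 15)])
```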
The task repository contains scripts to load and run the time baseline. These scripts are based on the Hugging Face transformers library, which makes it easy to incorporate the model into your code. See, for example, the code from the baseline that loads the model and its tokenizer. The first time you run such code, the model will be automatically downloaded to your computer. The scripts also include the basic functionality to read the input data and produce the output Anafora annotations. You can use the run_time.py script to parse raw text and obtain time expressions. For example, to process the practice data, run:
$ python3 baselines/time/run_time.py -p practice_text/time/ -o submission/time/
This will create one directory per document in submission/time, containing one .xml file with predictions in Anafora format.
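If you want to inspect or post-process those predictions, the Anafora files can be read with the standard library. A minimal sketch (the element names follow the Anafora schema's entity/span/type layout; verify against the released data before relying on it):

```python
# Sketch: read (start, end, type) spans out of an Anafora XML string.
# Anafora stores character offsets as "start,end" in a <span> element.
import xml.etree.ElementTree as ET

def read_anafora_spans(xml_text):
    """Yield (start, end, type) tuples from an Anafora XML string."""
    root = ET.fromstring(xml_text)
    for entity in root.iter("entity"):
        start, end = entity.findtext("span").split(",")
        yield int(start), int(end), entity.findtext("type")

example = """<data><annotations>
  <entity><id>1@e@doc1@system</id><span>0,4</span><type>Year</type></entity>
</annotations></data>"""
spans = list(read_anafora_spans(example))
```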
There are many ways to try to improve the performance of this baseline on the practice text (and later, on the evaluation text). Should you need to continue training the clulab/roberta-timex-semeval model on annotated data that you have somehow produced, you can run the train_time.py script:

$ python3 baselines/time/train_time.py -t /path/to/train-data -s /path/to/save-model
The train-data directory must follow a similar structure to the practice_text/time folder and include, for each document, the raw text file (with no extension) and an Anafora annotation file (with .xml extension). After running the training, the save-model directory will contain the configuration (config.json) and weights of the final model, along with the vocabulary and configuration files used by the tokenizer.
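If you generate your own annotated data for train_time.py, each document needs an accompanying Anafora .xml file. A minimal sketch of writing one with the standard library (the id format and fields here follow the Anafora schema as I understand it; check the practice data for the exact fields the trainer expects):

```python
# Sketch: write a minimal Anafora annotation file for a document, given
# (start, end, type) character-offset spans. Field layout (id/span/type)
# is assumed from the Anafora format; verify against the practice data.
import xml.etree.ElementTree as ET

def write_anafora(doc_name, spans, out_path):
    """spans: iterable of (start, end, type) character-offset tuples."""
    data = ET.Element("data")
    annotations = ET.SubElement(data, "annotations")
    for i, (start, end, type_) in enumerate(spans, 1):
        entity = ET.SubElement(annotations, "entity")
        ET.SubElement(entity, "id").text = f"{i}@e@{doc_name}@gold"
        ET.SubElement(entity, "span").text = f"{start},{end}"
        ET.SubElement(entity, "type").text = type_
    ET.ElementTree(data).write(out_path, xml_declaration=True,
                               encoding="UTF-8")
```

For example, write_anafora("doc1", [(0, 4, "Year")], "train-data/doc1.xml") would produce the annotation file sitting next to the raw text file train-data/doc1.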
To upload your predictions to CodaLab, first make sure that your predictions are formatted correctly, then create a submission.zip and upload it to CodaLab.
For negation detection, the output format is one classifier output per line, where the lines correspond to the lines in the input. A prediction of "Negated" should be output as 1, while a prediction of "Not negated" should be output as -1.
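Writing the file is straightforward; a minimal sketch of mapping label strings to the required 1/-1 encoding (the function name and the string labels used as input here are illustrative):

```python
# Sketch: produce the negation submission file, one integer label per
# line, aligned with the lines of the input file.
# 1 = "Negated", -1 = "Not negated".

LABEL_MAP = {"Negated": "1", "Not negated": "-1"}

def write_system_tsv(predictions, path):
    """predictions: iterable of 'Negated' / 'Not negated' strings."""
    with open(path, "w") as out:
        for label in predictions:
            out.write(LABEL_MAP[label] + "\n")
```

For example, write_system_tsv(["Negated", "Not negated"], "submission/negation/system.tsv") would emit a file containing the lines 1 and -1.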
For time expression recognition, your system must produce Anafora XML format files in Anafora's standard directory organization.
Make sure that you comply with the following rules when you create your output directory:
- Include a directory for each track, named negation or time. If you are not participating in one of the tracks, do not include its directory.
- In the negation directory, include a single tsv file with the name system.tsv.
- In the time directory, follow the same structure and names as in the dataset.
For example, for the development data, your directory structure should look like this (document names are placeholders):

    negation/
        system.tsv
    time/
        <document-id>/
            <document-id>.xml
        ...
The easiest way to generate submission.zip is to use the Makefile provided in the sample code repository. First, place your prediction files, including the entire directory structure described above, under a submission directory in the root of the sample code checkout. Then run make submission.zip. This will zip up all your prediction files and produce a file named submission.zip.
To upload your submission, go to the CodaLab competition page. Find the "Participate" tab, then the "Submit/View Results" navigation element, then make sure the "Practice" button is highlighted, and click the "Submit" button. Find your submission.zip with the file chooser and upload it. The scoring will run in the background; usually you can refresh the page in about a minute to see the result in the table below.
You may see the error:
Traceback (most recent call last):
  File "/worker/worker.py", line 330, in run
    if input_rel_path not in bundles:
TypeError: argument of type 'NoneType' is not iterable
This is a known issue with CodaLab. The solution for now is to make a new submission with the same zip file.