SMM4H'20 - Shared Task



Task Definition

The SMM4H shared tasks pose NLP challenges in mining social media for health monitoring and surveillance. This requires processing imbalanced, noisy, real-world, and highly creative language from social media. Participating systems must handle the many linguistic variations and semantic complexities in the ways people express medication-related concepts and outcomes. Past research has shown that automated systems frequently underperform on social media text because of novel or creative phrasing, misspellings, and the frequent use of idiomatic, ambiguous, and sarcastic expressions. The tasks will thus act as a discovery and verification process for which approaches work best on social media data.

As in the first four runs of the shared tasks, the data consist of annotated collections of Twitter posts. The training data are already prepared and will be made available to teams that register to participate.

The five shared tasks proposed this year are:

  • Task 1: Automatic classification of tweets that mention medications
  • Task 2: Automatic classification of multilingual tweets that report adverse effects
  • Task 3: Automatic extraction and normalization of adverse effects in English tweets
  • Task 4: Automatic characterization of chatter related to prescription medication abuse in tweets
  • Task 5: Automatic classification of tweets reporting a birth defect pregnancy outcome

Timeline (Tentative)

Jan. 15, 2020   Training Data Release
June 1, 2020    Test Data Release; Evaluation Phase starts
June 4, 2020    Evaluation Phase ends; Post-Evaluation Phase starts

Organizers

  • Graciela Gonzalez-Hernandez, University of Pennsylvania, USA
  • Davy Weissenbacher, University of Pennsylvania, USA
  • Ari Z. Klein, University of Pennsylvania, USA
  • Karen O’Connor, University of Pennsylvania, USA
  • Abeed Sarker, Emory University, USA
  • Elena Tutubalina, Kazan Federal University, Russia
  • Martin Krallinger, Barcelona Supercomputing Center, Spain
  • Anne-Lyse Minard, Université d’Orléans, France

Evaluation Metrics

The evaluation metric for each task is as follows:

  • Task 1: F1-score for the “positive” class (i.e., tweets that mention medications)
  • Task 2: F1-score for the “positive” class (i.e., tweets that report AEs)
  • Task 3: F1-score for the “positive” class (i.e., the correct AE spans and MedDRA IDs for tweets that report AEs)
  • Task 4: F1-score for the “potential abuse/misuse” class
  • Task 5: micro-averaged F1-score for the “defect” and “possible defect” classes

F1-score = 2 * (Precision * Recall) / (Precision + Recall)
  Precision = TP / (TP + FP)
  Recall = TP / (TP + FN)

Micro-averaged F1-score = 2 * (Precision * Recall) / (Precision + Recall), with counts pooled over the “defect” (D) and “possible defect” (PD) classes:
  Precision = (TP(D) + TP(PD)) / (TP(D) + TP(PD) + FP(D) + FP(PD))
  Recall = (TP(D) + TP(PD)) / (TP(D) + TP(PD) + FN(D) + FN(PD))

Abbreviations

TP	true positives
FP	false positives
FN	false negatives
D	“defect” tweet class
PD	“possible defect” tweet class
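
For concreteness, the sketch below computes these metrics in plain Python. It is illustrative only, not the official scorer; the function names and example label values are assumptions.

def f1_for_class(gold, pred, positive):
    """Per-class F1-score (Tasks 1-4): F1 for one designated class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def micro_f1(gold, pred, classes):
    """Micro-averaged F1 over a subset of classes (Task 5: D and PD pooled)."""
    tp = sum(1 for g, p in zip(gold, pred) if p in classes and g == p)
    fp = sum(1 for g, p in zip(gold, pred) if p in classes and g != p)
    fn = sum(1 for g, p in zip(gold, pred) if g in classes and g != p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Task 5 example: pool the "defect" (1) and "possible defect" (2) classes.
gold = ["1", "2", "3", "3", "1"]
pred = ["1", "3", "3", "2", "1"]
print(micro_f1(gold, pred, {"1", "2"}))  # 2 TP, 1 FP, 1 FN -> F1 = 0.667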

Terms and Conditions

By submitting results to this competition, you consent to the public release of your scores at the SMM4H'20 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers. You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science. You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers. You further agree to submit and present a short paper describing your system during the workshop. You agree not to redistribute the training and test data except in the manner prescribed by its licence.


Task Details

Task 1: Automatic classification of tweets that mention medications

This binary classification task involves distinguishing tweets that mention a medication or dietary supplement (annotated as “1”) from those that do not (annotated as “0”). In 2018, this task was organized using a data set that contained an artificially balanced distribution of the two classes. This year, the data set represents the natural, highly imbalanced distribution of the two classes among tweets posted by 112 women during pregnancy, with only approximately 0.2% of the tweets mentioning a medication. Training and evaluating classifiers on this year’s data set will more closely model the detection of tweets that mention medications in practice.

Data

  • Training data: 69,272 (181 “positive” tweets; 69,091 “negative” tweets)
  • Evaluation data: 29,687 tweets
  • Evaluation metric: F1-score for the “positive” class (i.e., tweets that mention medications)
  • Contact information: Davy Weissenbacher

* Participating teams should submit their results to CodaLab as a ZIP file containing a TSV file in the same format as the training data (shown below). The TSV file should not be in a folder in the ZIP file, and the ZIP file should not contain any folders or files other than the TSV file.

Tweet ID            User ID      Tweet            Class
354256195432882177  54516759     tweet text       0
352456944537178112  1267743056   tweet text       1
332479707004170241  273421529    tweet text       0
340660708364677120  135964180    tweet text       1
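
A minimal packaging sketch in Python, assuming the file names predictions.tsv and submission.zip (both arbitrary, not prescribed by the organizers); it writes the TSV in the column order above and stores it at the top level of the ZIP, as required:

import csv
import zipfile

# Hypothetical Task 1 predictions: (tweet ID, user ID, tweet text, class).
rows = [
    ("354256195432882177", "54516759", "tweet text", "0"),
    ("352456944537178112", "1267743056", "tweet text", "1"),
]

with open("predictions.tsv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t")
    # Header row mirrors the sample above; check the released training
    # files to confirm whether a header is actually expected.
    writer.writerow(["Tweet ID", "User ID", "Tweet", "Class"])
    writer.writerows(rows)

# A bare arcname keeps the TSV at the top level, with no enclosing folder.
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("predictions.tsv", arcname="predictions.tsv")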


Task 2: Automatic classification of multilingual tweets that report adverse effects

This binary classification task involves distinguishing tweets that report an adverse effect (AE) of a medication (annotated as “1”) from those that do not (annotated as “0”), taking into account subtle linguistic variations between AEs and indications (i.e., the reason for using the medication). This classification task has been organized for every past #SMM4H Shared Task, but only for tweets posted in English. This year, this task also includes distinct sets of tweets posted in Spanish, French, Chinese, and Russian.

Data

  • English
    • Training data: 25,678 tweets (2,377 “positive” tweets; 23,301 “negative” tweets)
    • Evaluation data: ~5,000 tweets
    • Evaluation metric: F1-score for the “positive” class (i.e., tweets that report AEs)
    • Contact information: Arjun Magge
  • French
    • Training data: 2,426 tweets (39 “positive” tweets; 2,387 “negative” tweets)
    • Evaluation data: 607 tweets
    • Evaluation metric: F1-score for the “positive” class (i.e., tweets that report AEs)
    • Contact information: Anne-Lyse Minard
  • Russian
    • Training data: 7,612 tweets (666 “positive” tweets; 6,946 “negative” tweets)
    • Evaluation data: 1,903 tweets
    • Evaluation metric: F1-score for the “positive” class (i.e., tweets that report AEs)
    • Contact information: Elena Tutubalina

* Participating teams should submit their results to CodaLab as a ZIP file containing a TSV file in the same format as the training data (shown below). The TSV file should not be in a folder in the ZIP file, and the ZIP file should not contain any folders or files other than the TSV file.

Tweet ID            User ID      Class      Tweet
354256195432882177  54516759     0          tweet text
352456944537178112  1267743056   1          tweet text
332479707004170241  273421529    0          tweet text
340660708364677120  135964180    1          tweet text


Task 3: Automatic extraction and normalization of adverse effects in English tweets

This task, organized for the first time in 2019, is an end-to-end task that involves extracting the span of text containing an adverse effect (AE) of a medication from tweets that report an AE, and then mapping the extracted AE to a standard concept ID in the MedDRA vocabulary (preferred terms). The training data includes tweets that report an AE (annotated as “1”) and those that do not (annotated as “0”). For each tweet that reports an AE, the training data contains the span of text containing the AE, the character offsets of that span of text, and the MedDRA ID of the AE. For some of the tweets that do not report an AE, the training data contains the span of text containing an indication (i.e., the reason for using the medication) and the character offsets of that span of text, allowing participants to develop techniques for disambiguating AEs and indications.

Data

  • Training data: 2,376 tweets (1,212 “positive” tweets; 1,155 “negative” tweets)
  • Evaluation data: ~1,000 tweets
  • Evaluation metric: F1-score for the “positive” class (i.e., the correct AE spans and MedDRA IDs for tweets that report AEs)
  • Contact information: Arjun Magge

* Participating teams should submit their results to CodaLab as a ZIP file containing a TSV file in the same format as the training data. The TSV file should not be in a folder in the ZIP file, and the ZIP file should not contain any folders or files other than the TSV file.
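
As one reading of the end-to-end metric, the sketch below scores predictions by strict matching of (tweet ID, span offsets, MedDRA ID) tuples. The strict-matching criterion and the sample offsets and IDs are assumptions; the official scorer may, for example, credit overlapping spans.

def end_to_end_f1(gold, pred):
    """F1 over (tweet_id, start, end, meddra_id) tuples, exact match only."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)   # span offsets and MedDRA ID all agree
    fp = len(pred_set - gold_set)
    fn = len(gold_set - pred_set)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Illustrative tuples with made-up offsets and MedDRA IDs.
gold = [("354256195432882177", 10, 18, "10019211")]
pred = [("354256195432882177", 10, 18, "10019211"),
        ("352456944537178112", 0, 8, "10027599")]
print(end_to_end_f1(gold, pred))  # precision 0.5, recall 1.0 -> F1 = 0.667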


Task 4: Automatic characterization of chatter related to prescription medication abuse in tweets

This new, multi-class classification task involves tweets that mention at least one prescription opioid, benzodiazepine, atypical antipsychotic, central nervous system stimulant, or GABA analogue. Systems must distinguish tweets that report potential abuse/misuse (annotated as “A”) from those that report non-abuse/misuse consumption (annotated as “C”), those that merely mention the medication (annotated as “M”), and those that are unrelated (annotated as “U”).

Data

  • Training data: 13,172 tweets
  • Evaluation data: 3,271 tweets
  • Evaluation metric: F1-score for the “potential abuse/misuse” class
  • Contact information: Abeed Sarker

* Participating teams should submit their results to CodaLab as a ZIP file containing a TSV file in the same format as the training data. The TSV file should not be in a folder in the ZIP file, and the ZIP file should not contain any folders or files other than the TSV file.


Task 5: Automatic classification of tweets reporting a birth defect pregnancy outcome

This new, multi-class classification task involves distinguishing three classes of tweets that mention birth defects: “defect” tweets refer to the user’s child and indicate that he or she has the birth defect mentioned in the tweet (annotated as “1”); “possible defect” tweets are ambiguous about whether someone is the user’s child and/or has the birth defect mentioned in the tweet (annotated as “2”); and “non-defect” tweets merely mention birth defects (annotated as “3”).

Data

  • Training data: 18,397 tweets (953 “defect” tweets; 956 “possible defect” tweets; 16,488 “non-defect” tweets)
  • Evaluation data: 4,602 tweets
  • Evaluation metric: micro-averaged F1-score for the “defect” and “possible defect” classes
  • Contact information: Ari Klein

* Participating teams should submit their results to CodaLab as a ZIP file containing a TSV file in the same format as the training data (shown below). The TSV file should not be in a folder in the ZIP file, and the ZIP file should not contain any folders or files other than the TSV file.

Tweet ID            User ID            Tweet            Class
588529611144896512  33780523           tweet text       1
878999231540932609  720532596179013632 tweet text       2
688663852720992256  357448100          tweet text       3
419619531032895488  416623482          tweet text       3
552360913278631936  2867262207         tweet text       3


FAQ

Q: How will I submit my results?
A: Teams should submit their results to CodaLab as a ZIP file containing a TSV file in the same format as the training data for each task (see the task descriptions above). The TSV file should not be in a folder in the ZIP file, and the ZIP file should not contain any folders or files other than the TSV file.

Q: How many submissions can I make?
A: For each task, three submissions from each team will be accepted. You can participate in one or multiple tasks.

Q: Can I participate in Task 2 only?
A: Yes. You can participate in any number of tasks.

Q: Are there any restrictions on the data and resources that can be used for training the classification system? For example, can we use manually or automatically constructed lexicons? Can we use other data (e.g., tweets, blog posts, medical records), annotated or unlabeled?
A: There are currently no restrictions on data and resources. External resources and data can be used, and all external resources must be explained in the system description paper.

Q: Is there any information on the test data? Will the test data be collected in the same way as the training data? For example, will the same drug names be used to collect tweets?
A: The test data has been collected in the same way as the training data.

Phase Start Dates (all start times are midnight UTC)

Task                                      Practice       Evaluation     Post-Eval
Task1 Medication classification           Jan. 1, 2020   June 1, 2020   June 5, 2020
Task2 ADR classification (English)        Jan. 1, 2020   June 1, 2020   June 5, 2020
Task2 ADR classification (French)         Jan. 1, 2020   June 1, 2020   June 5, 2020
Task2 ADR classification (Russian)        Jan. 1, 2020   June 1, 2020   June 5, 2020
Task3 ADR extraction and normalization    Jan. 1, 2020   June 1, 2020   June 5, 2020
Task4 prescription medication abuse       Jan. 1, 2020   June 1, 2020   June 5, 2020
Task5 birth defect pregnancy outcome      Jan. 1, 2020   June 1, 2020   June 5, 2020

Competition Ends

Never
