Training and development data sets can be downloaded from: http://alt.qcri.org/resources/tanbih/covid19/hackathon/hackathon_clarin_data.zip
Note: the development data set has no labels.
More details about the dataset and the baseline scorer are available at https://github.com/sshaar/CLARIN-Hackathon-2020-Fighting-the-COVID19-Infodemic
There is also a similar task for Arabic; please check here: https://competitions.codalab.org/competitions/26467
The training file consists of 9 columns, as follows:
tweet no: an index that maps to the tweet ID
tweet_text: the original text of the tweet, as downloaded from Twitter
Q1_label to Q7_label: the labels for Questions 1 to 7, one column per question
For some questions, the label is “nan”. When experimenting with a particular question, remove the entries whose label for that question is “nan”.
The class labels are “yes” and “no”, so each question is a binary classification task: the model is trained on binary labels and predicts a binary label for each question.
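The per-question filtering described above can be sketched as follows. This is a minimal illustration, not the organizers' code: it assumes the training file is tab-separated with a header row whose column names match the list above, and the sample rows, `load_rows`, `examples_for_question`, and the yes/no to 1/0 mapping are all hypothetical.

```python
import csv
import io

# Made-up sample rows for illustration only; not taken from the real data.
# Assumption: the file is tab-separated with a header row.
SAMPLE_TSV = (
    "tweet no\ttweet_text\tQ1_label\tQ2_label\n"
    "1\tfirst example tweet\tyes\tno\n"
    "2\tsecond example tweet\tno\tnan\n"
    "3\tthird example tweet\tnan\tnan\n"
)

# Assumed encoding of the two class labels as integers.
LABEL_TO_INT = {"yes": 1, "no": 0}


def load_rows(handle):
    """Read tab-separated rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def examples_for_question(rows, question):
    """Return (text, binary_label) pairs for one question, skipping "nan" labels."""
    column = f"Q{question}_label"
    examples = []
    for row in rows:
        label = row[column].strip().lower()
        if label == "nan":
            continue  # no label for this question, so the entry is dropped
        examples.append((row["tweet_text"], LABEL_TO_INT[label]))
    return examples


rows = load_rows(io.StringIO(SAMPLE_TSV))
q1_examples = examples_for_question(rows, 1)
q2_examples = examples_for_question(rows, 2)
print(len(q1_examples), len(q2_examples))
```

Because the filter is applied per question, the number of usable training examples can differ from one question to the next.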
Classification systems will be evaluated using the average of macro-averaged F1-scores for Questions 1 to 7.
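As a rough illustration of this metric, the sketch below computes a macro-averaged F1 per question and then averages across questions. It is an assumption-laden approximation, not the official scorer from the repository above: it assumes the macro average is taken over the two classes “yes” and “no”, that a class with no true positives gets an F1 of 0, and that “nan” entries have already been removed. All function names are hypothetical.

```python
def f1_for_class(gold, pred, cls):
    """F1 for a single class, treating that class as the positive one."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    if tp == 0:
        return 0.0  # assumed convention when precision or recall is undefined or zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, classes=("yes", "no")):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(gold, pred, c) for c in classes) / len(classes)


def average_macro_f1(gold_by_question, pred_by_question):
    """Mean of the macro-F1 scores over all questions present in the gold labels."""
    scores = [
        macro_f1(gold_by_question[q], pred_by_question[q])
        for q in gold_by_question
    ]
    return sum(scores) / len(scores)


# Made-up labels for two questions, for illustration only.
gold = {"Q1": ["yes", "no", "yes", "no"], "Q2": ["no", "yes"]}
pred = {"Q1": ["yes", "no", "no", "no"], "Q2": ["no", "yes"]}
score = average_macro_f1(gold, pred)
print(round(score, 4))
```

For reported results, the baseline scorer in the linked repository should be treated as authoritative, since its handling of edge cases may differ from this sketch.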
Start: Sept. 20, 2020, midnight
Start: Oct. 14, 2020, midnight
Oct. 15, 2020, 8:26 p.m.