COVID-19 Retweet Prediction Challenge

Organized by trovdimi
Reward $2,810

First phase

Validation Leaderboard
June 15, 2020, midnight UTC


Competition Ends
Sept. 1, 2020, midnight UTC


As a result of the ongoing Coronavirus disease 2019 (COVID-19) pandemic, our daily routines and behavior patterns have changed drastically, not only offline but also online. One example is the shift in reading patterns on Wikipedia and Reddit [1,2]. COVID-19 has also been a hot topic on other social media platforms such as Facebook, Twitter, and YouTube. 

To understand information-spreading mechanisms during the COVID-19 pandemic, this challenge focuses on Twitter. Twitter is an online social network where users can follow each other and share information using short text posts called tweets. The platform offers a function to retweet a tweet, that is, to share it with one's followers without any change. Retweeting is a popular function and has also found its way into other online social networks such as Weibo. Because retweeting amplifies the spread of original content, retweet prediction is a crucial task when studying information-spreading processes. Understanding retweet behavior also has many practical applications, e.g., political audience design [3,4], fake news spreading and tracking [5,6], health promotion [7], and mass emergency management [8]. Modeling retweet behavior has been an active research area and is especially important during times of crisis such as the current COVID-19 pandemic. 

The retweet prediction task in this challenge is based on the TweetsCOV19 dataset, a publicly available dataset containing more than 8 million COVID-19-related tweets spanning the period from October 2019 to April 2020.

The COVID-19 Retweet Prediction Challenge is part of the CIKM 2020 AnalytiCup, and the winners will be invited to present their solutions during the *online* AnalytiCup Workshop in October 2020. 



Timeline

  • 01.07.20 Contest and Phase 1 Begin (Validation Leaderboard opens)
  • 15.08.20 Phase 2 Begins (Testing Leaderboard opens)
  • 31.08.20 Final Submissions & Contest End
  • 01.09.20 Semi-Finalists Announcement (top six teams on the Testing Leaderboard)
  • 01.10.20 Report & Code Due
  • 20.10.20 Winners Announcement


Prizes and Sponsors

The winners of the COVID-19 Retweet Prediction Challenge will receive non-cash prizes worth EUR 2,500 in total, provided by the L3S Research Center, University of Hannover, Germany. The prizes will be distributed among the participants as follows:

  • The 1st place receives a non-cash prize equivalent to EUR 1,200 (~USD 1,350)*
  • The 2nd place receives a non-cash prize equivalent to EUR 800 (~USD 900)*
  • The 3rd place receives a non-cash prize equivalent to EUR 500 (~USD 560)*

*To be eligible for any award, the semi-finalists must submit their code and a solution report (4 pages in ACM format) to the organizers by the stipulated deadline. The submitted code and reports may be inspected to check the validity of the solutions. The reports will eventually be made publicly available on the CIKM conference website.

Prizes will be awarded in the form of vouchers. The L3S Research Center reserves the right not to award some or all of the prizes if the competition criteria are not met.

The complete list of sponsors includes:
  • GESIS – Leibniz Institute for the Social Sciences, Germany
  • Chongqing University of Technology, China
  • Heinrich-Heine-University Düsseldorf, Germany



Organizers

Dimitar Dimitrov, GESIS – Leibniz Institute for the Social Sciences, Germany
Xiaofei Zhu, Chongqing University of Technology, China



  • Please read through all the pages of this competition first for complete information.
  • If you have further questions, please post them on the Forum tab.

We wish you all the best!



We wouldn't be here without the help of the following people:

  • Erdal Baran - helped with preparing the data
  • Zhanwang Peng - helped to test the competition on CodaLab 


References

[1] Gozzi, N., Tizzani, M., Starnini, M., Ciulla, F., Paolotti, D., Panisson, A. and Perra, N., 2020. Collective response to the media coverage of COVID-19 Pandemic on Reddit and Wikipedia. arXiv preprint arXiv:2006.06446.

[2] Ribeiro, M.H., Gligorić, K., Peyrard, M., Lemmerich, F., Strohmaier, M. and West, R., 2020. Sudden Attention Shifts on Wikipedia Following COVID-19 Mobility Restrictions. arXiv preprint arXiv:2005.08505.

[3] Stieglitz, S. and Dang-Xuan, L., 2012, January. Political communication and influence through microblogging--An empirical analysis of sentiment in Twitter messages and retweet behavior. In 2012 45th Hawaii International Conference on System Sciences (pp. 3500-3509). IEEE.

[4] Kim, E., Sung, Y. and Kang, H., 2014. Brand followers’ retweeting behavior on Twitter: How brand relationships influence brand electronic word-of-mouth. Computers in Human Behavior, 37, pp.18-25.

[5] Lumezanu, C., Feamster, N. and Klein, H., 2012, May. #bias: Measuring the tweeting behavior of propagandists. In Sixth International AAAI Conference on Weblogs and Social Media.

[6] Vosoughi, S., Roy, D. and Aral, S., 2018. The spread of true and false news online. Science, 359(6380), pp.1146-1151.

[7] Chung, J.E., 2017. Retweeting in health promotion: Analysis of tweets about Breast Cancer Awareness Month. Computers in Human Behavior, 74, pp.112-119.

[8] Kogan, M., Palen, L. and Anderson, K.M., 2015, February. Think local, retweet global: Retweeting by the geographically-vulnerable during Hurricane Sandy. In Proceedings of the 18th ACM conference on computer supported cooperative work & social computing (pp. 981-993).



In this competition, you are provided with the TweetsCOV19 dataset, a publicly available dataset of more than 8 million COVID-19-related tweets, spanning the period October 2019 to April 2020. For each tweet, the dataset provides metadata and some precalculated features such as sentiment scores and entities.

Given the set of features for a tweet from TweetsCOV19, the task is to predict the number of times it will be retweeted (#retweets).
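As a point of reference, the simplest possible model ignores all features and predicts the same number for every tweet. Assuming the standard MSLE definition (natural log of 1 + count), the best real-valued constant c satisfies log(1 + c) = mean(log(1 + #retweets)) over the training labels. The sketch below is a hypothetical baseline, not the organizers' method; it assumes the training labels are already loaded as a list of non-negative integers, and `constant_baseline` is an illustrative name.

```python
import math
from typing import Sequence


def constant_baseline(train_retweets: Sequence[int]) -> int:
    """Hypothetical feature-free baseline: one integer prediction for all tweets.

    In log space, MSLE reduces to squared error, so the best real-valued
    constant is c = expm1(mean(log1p(y))). It is then rounded to an integer
    because submitted #retweets must be integers.
    """
    if not train_retweets:
        raise ValueError("need at least one training label")
    mean_log = sum(math.log1p(y) for y in train_retweets) / len(train_retweets)
    # Clip at 0 so the prediction is a valid retweet count.
    return max(0, round(math.expm1(mean_log)))
```

Any real model should beat this number on the Validation Leaderboard; if it does not, the feature pipeline is worth checking first.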

We recommend using the data as provided after registration in the Get Data tab of the Participate page. 


Each submission will be evaluated using the Mean Squared Log Error (MSLE); lower scores are better. Please note that the predicted #retweets must be an integer.
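For local validation, the metric can be reproduced in a few lines. The sketch below assumes the standard MSLE definition, mean((log(1 + predicted) - log(1 + actual))^2) with the natural logarithm; the official CodaLab scorer may differ in details. `msle` and `to_integer_predictions` are hypothetical helper names.

```python
import math
from typing import List, Sequence


def msle(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean Squared Log Error over paired actual/predicted retweet counts."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        if actual < 0 or predicted < 0:
            raise ValueError("retweet counts must be non-negative")
        # log1p(x) computes log(1 + x) accurately for small x.
        total += (math.log1p(predicted) - math.log1p(actual)) ** 2
    return total / len(y_true)


def to_integer_predictions(raw: Sequence[float]) -> List[int]:
    """Clip raw model outputs at 0 and round to integers.

    Python's round() uses round-half-to-even, e.g. round(2.5) == 2.
    """
    return [max(0, round(x)) for x in raw]
```

Because errors are measured on log(1 + count), an absolute miss of 10 retweets costs far more on a tweet with 0 retweets than on one with 10,000.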

Violating any competition rule specified below is grounds for disqualification. In the event of any dispute in connection with the Competition, or with the interpretation or implementation of these rules, the decision of the Organizers shall be final.


The competition is open to everyone except those involved in its organization.

One account per participant

You may not sign up to CodaLab with multiple accounts, and therefore you may not submit from multiple accounts.

Team size

There is no limit to the number of team members. The only restriction is that the total number of submissions by all team members must not exceed the maximum allowed in the respective phase of the competition. 

Team mergers

Team mergers are allowed throughout Phase 1 and can be performed by the team leader (go to your account's User Settings and enter the team name and members). To merge, the combined team must have a total submission count less than or equal to the maximum allowed as of the merge date. The maximum allowed is the number of submissions per day multiplied by the number of days the competition has been running. The organizers do not provide any assistance with team mergers.
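The merge rule is simple arithmetic, which the sketch below makes explicit. It assumes the Phase 1 limit of 20 submissions per day stated in the rules and takes the number of days the competition has been running as an input, since how that count is taken on the merge date is not specified here. The function names are hypothetical.

```python
from typing import Sequence

PHASE1_SUBMISSIONS_PER_DAY = 20  # Phase 1 daily limit stated in the rules


def max_allowed_submissions(days_running: int,
                            per_day: int = PHASE1_SUBMISSIONS_PER_DAY) -> int:
    """Maximum allowed as of the merge date: daily limit times days running."""
    if days_running < 0 or per_day < 0:
        raise ValueError("days_running and per_day must be non-negative")
    return per_day * days_running


def can_merge(team_submission_counts: Sequence[int], days_running: int,
              per_day: int = PHASE1_SUBMISSIONS_PER_DAY) -> bool:
    """True if the merged team's combined submissions do not exceed the maximum."""
    return sum(team_submission_counts) <= max_allowed_submissions(days_running, per_day)
```

For example, after 10 days of running, the maximum is 20 × 10 = 200, so teams with 120 and 80 submissions may merge, while teams with 120 and 90 may not.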

Additional data

Participants are free to use any additional datasets that were made publicly available *before* the beginning of the Competition (April 30, 2020).

No private sharing outside teams

Privately sharing code or data outside of teams is not permitted. Sharing code is permitted only if it is made available to all participants on the forums or in a public repository (e.g., GitHub).


You may submit a maximum of 20 entries per day during Phase 1 (Validation Leaderboard) and a maximum of 10 entries per day during Phase 2 (Testing Leaderboard).

At the end of Phase 2, the semi-finalists (the top six teams) must submit their code as well as a report describing their solution (4 pages in ACM format) and make their code publicly available by the stipulated deadline.

The submitted code and reports may be inspected to check the validity of the solutions. The reports will eventually be made publicly available on the CIKM conference website.

Selected teams will also be invited to present their solutions *online* at the CIKM AnalytiCup Workshop in October 2020. To allocate the limited presentation slots, preference will be given to award-winning teams as well as teams whose solutions the organizers deem interesting or remarkable.


We trust that all data, methods, and resources used comply with the ACM Code of Ethics.


Entries will be ranked by their prediction score (MSLE) during Phase 2, and this ranking will determine the semi-finalists (the top six teams), subject to the validity of their solutions. The winners will be the top three teams among the semi-finalists. A tie in the prediction score will be broken in favor of the earlier submission on the final leaderboard.
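The selection procedure amounts to a sort on two keys: MSLE ascending (lower error ranks higher), then submission time ascending to break ties. The sketch below is a hypothetical model of that procedure, not the organizers' code; it assumes one final-leaderboard entry per team and represents the validity check as a simple boolean flag applied before ranking.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class LeaderboardEntry:
    # Hypothetical record: one final-leaderboard entry per team (assumption).
    team: str
    msle: float             # Phase 2 prediction score; lower is better
    submitted_at: datetime  # submission time, used only to break ties
    valid: bool = True      # False if the solution fails the validity check


def rank_entries(entries: List[LeaderboardEntry]) -> List[LeaderboardEntry]:
    """Sort valid entries by MSLE; on equal MSLE the earlier submission wins."""
    valid_entries = [e for e in entries if e.valid]
    return sorted(valid_entries, key=lambda e: (e.msle, e.submitted_at))


def semi_finalists(entries: List[LeaderboardEntry], k: int = 6) -> List[str]:
    """Team names of the top k (six by default) ranked entries."""
    return [e.team for e in rank_entries(entries)[:k]]


def winners(entries: List[LeaderboardEntry]) -> List[str]:
    """The top three teams among the semi-finalists."""
    return semi_finalists(entries)[:3]
```

Sorting on the tuple `(msle, submitted_at)` encodes the tie-break directly: the second key is compared only when the scores are exactly equal.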

Validation Leaderboard

Start: June 15, 2020, midnight

Description: Ongoing model development and evaluation based on validation data. The results are shown on the Validation Leaderboard.

Testing Leaderboard

Start: Aug. 15, 2020, midnight

Description: Final submission based on the testing data. The results are shown on the Testing Leaderboard.

Competition Ends

Sept. 1, 2020, midnight
