I would like to clarify the rules regarding additional data.
> Participants are free to use any additional datasets that have been made publicly available *before* the beginning of the Competition.
If the condition is that the dataset was published before 2020/07/01, its tweets may overlap with the tweets in the test set.
That would be leakage. Is this possibility covered by any rules of this competition?
Doesn't the competition require additional data to be released in advance, as Kaggle does?
Example: https://www.kaggle.com/c/prostate-cancer-grade-assessment/discussion/145026
"Participants are free to use any additional datasets that have been made publicly available *before* the beginning of the Competition" - what does "before the beginning of the competition" mean?
Does it mean we can only use external datasets published up to Sep 30, 2019, which is the first day in the training dataset? Admin, could you confirm this?
Hi,
thanks for your questions! We updated the Terms and Conditions Page of the Challenge.
Best, the organizers.