In the development phase, we only needed to predict one day of data.
So why do we need to predict seven days of results in the test phase (phase B)?
These are two different objectives in one competition: in the development phase we can use the previous day's data to build features, but in the test phase we cannot.
E.g., in the development phase I used yesterday's data to build many features, such as news CTR, users' click counts, and so on. These features are now invalid, because for the next 7 days yesterday's data is not available.
Maybe this change doesn't matter for DL models (NRMS, NAML, ...), but it wipes out the effort of participants who use ML methods.
Agree with YangZhenghong.
Posted by: huailei @ Aug. 22, 2020, 7:46 a.m.
Hi YangZhenghong, we have introduced the dataset construction and split in our dataset description paper. We hope the model is capable of mining long-term user interest rather than capturing short-term dynamics only. Thus, we reserve the logs in the last week as the test set. You may consider using the features extracted from the training/dev set only or designing new features.
Posted by: MIND_Organizer @ Aug. 22, 2020, 10:08 a.m.
Thanks for your answer, dear MIND organizer. I agree that modeling users' long-term interest is valuable for research. But in a competition, many click behaviors depend on short-term data (e.g., a user who mostly reads entertainment may still care about the "Notre Dame de Paris destroyed by fire" news). If we don't use this short-term data, it is difficult to catch breaking news. And to win, it is necessary to design a good method that combines short-term and long-term data/models, because the actual result consists of all of these situations. Have a nice day, and thanks again.
Posted by: YangZhenghong @ Aug. 25, 2020, 3:55 a.m.
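To make the feature issue discussed in this thread concrete, here is a minimal sketch (not the organizers' or any participant's actual code) of a per-news CTR feature computed with a cutoff date. The function name news_ctr_before, the (day, news_id, clicked) log layout, and the dates are all hypothetical. In the development phase the cutoff can sit right before the predicted day; for a 7-day test set whose logs are not available, the cutoff would have to stay fixed at the end of the training/dev logs for all seven days.

```python
from collections import defaultdict
from datetime import date


def news_ctr_before(logs, cutoff):
    """Per-news CTR using only impressions dated strictly before `cutoff`.

    `logs` is an iterable of (day, news_id, clicked) tuples. This layout is
    an assumption made for illustration, not the competition's file format.
    """
    shows = defaultdict(int)
    clicks = defaultdict(int)
    for day, news_id, clicked in logs:
        if day < cutoff:  # never look at the cutoff day or anything later
            shows[news_id] += 1
            clicks[news_id] += int(clicked)
    return {news_id: clicks[news_id] / shows[news_id] for news_id in shows}


# Hypothetical toy log, dates chosen arbitrarily.
logs = [
    (date(2020, 1, 1), "N1", True),
    (date(2020, 1, 1), "N1", False),
    (date(2020, 1, 2), "N1", True),
    (date(2020, 1, 2), "N2", False),
]

# "Yesterday" style feature: predicting Jan 2 with logs up to Jan 1.
ctr_for_jan_2 = news_ctr_before(logs, date(2020, 1, 2))

# Frozen feature: every day of a later test week reuses this one table,
# because no logs from inside the test week can be used.
ctr_frozen = news_ctr_before(logs, date(2020, 1, 3))
```

News that first appears inside the test week has no entry in the frozen table, which is the breaking-news gap raised above; a model would need a fallback (for example, a content-based score) for those items.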