CodaLab -

> Evaluation Test Set

Dear organisers,

We are wondering how many models each team can submit for the final test set, and whether the scores will be immediately visible to everyone. Can different members of the same team submit different models, or will only one model per team be allowed?
Have you planned any measures to prevent overfitting on the final test set? For example, limiting the number of submissions, averaging the results of all submitted models, or not showing the scores immediately?

Kind regards,
Ana and Siri

Posted by: siriwillems @ May 18, 2020, 11:39 a.m.

Hi Ana and Siri,

Each team will be allowed to submit one model to the final test set leaderboard, and their scores will be hidden from everyone (including themselves) until the competition closes. If you make multiple submissions, which we will allow, only your last submission will be counted on the test set leaderboard. The choices we made should prevent anyone from overfitting to the final test set. We will provide more details about the testing phase in an announcement tomorrow.

Aaron

Posted by: OpenKBP @ May 20, 2020, 3:27 p.m.

ok this makes sense. Thanks :)!

Posted by: siriwillems @ May 20, 2020, 3:36 p.m.

If we can't see the results, how can we make sure that the submission format is correct? (Sometimes a submission contains silly mistakes.) I think it may be better to limit the number of submissions but make the results public.

Posted by: LSL000UD @ May 20, 2020, 4:08 p.m.

The last thing we want is for a group who put in a lot of work to lose out because of a silly technicality, so we will reach out to anyone with an invalid submission. The test data is formatted in the same way as the validation data, so if you have a pipeline that works for the validation data you shouldn't have any issues with the new test data.

The purpose of the test set is to "test" everyone's models on an entirely unknown dataset. If you get measurements (e.g., scores from your competitors), then you are getting some aggregate statistics on the test set. We don't want to make any results public until the competition closes, because making anything public may give teams an opportunity to use the test set like a validation set.

Posted by: OpenKBP @ May 20, 2020, 4:21 p.m.

Thanks!

Posted by: LSL000UD @ May 20, 2020, 4:34 p.m.

No problem, and thanks to everyone for raising these concerns! We'll make sure they are all addressed in our announcement tomorrow.

Posted by: OpenKBP @ May 20, 2020, 4:45 p.m.


OpenKBP - 2020 AAPM Grand Challenge Forum
