Now that the competition has only a single agent phase, will there be any changes to the evaluation scheme?
Furthermore, we have another question about the 35% of the score that the previous evaluation scheme allocated to safety, robustness, and novelty. Are there any further details on how you measure these additional aspects, or is it really up to us to define what safety and robustness mean? Also, how exactly do you define novelty? And how are these three aspects going to be translated into 35% of the score?
Finally, a question about the 65% of the score assigned to the performance evaluation: how exactly will the Codalab scores be translated into this component?
Thanks!
Thanks for your question. 65% of the score will be determined by the Codalab score as specified in the 'Rank' column. This score is the total distance covered by the vehicle across the 7 scenarios. An additional 25% of the score will be determined by evaluating distance covered on a number of more challenging scenarios, which will have a tendency to favour safer and more robust policies. These environments are not in the test set used for the Codalab score. Finally, our team will evaluate the novelty and research value of proposed solutions, which will correspond to the remaining 10% of the score.
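To illustrate how the weights combine (a minimal sketch only; it assumes each component is first normalized to the range [0, 1], and the exact normalization procedure is not specified here):

# Sketch of the weighting described above. Assumes each component has
# already been normalized to [0, 1]; the normalization itself is an
# assumption, not something specified in this thread.
def final_score(codalab_norm, challenging_norm, novelty_norm):
    # codalab_norm:     normalized distance covered across the 7 Codalab scenarios
    # challenging_norm: normalized distance covered on the held-out, more challenging scenarios
    # novelty_norm:     normalized novelty / research-value assessment
    return 0.65 * codalab_norm + 0.25 * challenging_norm + 0.10 * novelty_norm

# Example: strong Codalab performance, moderate robustness, average novelty.
# 0.65 * 0.9 + 0.25 * 0.6 + 0.10 * 0.5 = 0.785
print(final_score(0.9, 0.6, 0.5))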
Posted by: HuaweiUK @ Nov. 19, 2019, 10:30 a.m.

Thanks for your reply.
I have some further questions, though. Regarding the scores on the leaderboard: how exactly are you going to transform that metric into 65% of the final score? In general, since the upper bound of the leaderboard score is not clearly defined, how are you going to normalize those scores into the 0-65% range?
Thanks!
Posted by: Team63 @ Nov. 21, 2019, 10:49 a.m.