DriveML Huawei Autonomous Vehicles Challenge Forum


> Local validation drastically different from online results

Good evening, we are hoping you can shed some light on our recent submission results.

We have been training our models on the public data tracks while holding one track out for validation. Our model successfully completes every episode on the held-out track without mishaps, even across different seeds. We have replicated the same experiment holding out different tracks, and our model always succeeds on the held-out track.
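
For concreteness, our validation loop is roughly the sketch below. The track names and `run_episode` are placeholders for our own wrappers around the local SUMO environment, not starter-kit APIs:

```python
# Rough sketch of our leave-one-track-out validation. run_episode is a
# placeholder for our own wrapper around the local SUMO environment;
# here it is stubbed out so the sketch runs on its own.
TRACKS = ["track_a", "track_b", "track_c", "track_d"]  # placeholder names
SEEDS = range(5)

def run_episode(track, seed, agent):
    """Roll out one episode; True means the episode finished with no
    crash or off-road event. Stubbed out for this sketch."""
    return True

for held_out in TRACKS:
    train_tracks = [t for t in TRACKS if t != held_out]
    agent = None  # placeholder: a model trained on train_tracks only
    successes = [run_episode(held_out, s, agent) for s in SEEDS]
    print(held_out, "success rate:", sum(successes) / len(successes))
```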

However, once we submit any of these models, our results are extremely poor: on most tracks we cannot complete a single lap without crashing or going off road. Is there any chance the observations are slightly different in your environment? Are the online SUMO environment parameters (such as the physics) exactly the same as in our local configuration?

If this is not the problem, do you have any suggestions on how to fix this local-to-online discrepancy in validation? We can go into more detail on how we perform validation locally via private email, if necessary.

Thank you in advance,
Team drive++

Posted by: Team162 @ Dec. 12, 2019, 9:07 p.m.

Thanks for your message. The observations and physics are exactly the same for evaluation on CodaLab; the only change is that we evaluate on new maps, with varying numbers of social cars as listed on the leaderboard. The SUMO environment parameters are identical.

Your current performance seems pretty good, as you are currently second on the leaderboard, but I understand that is disappointing given your internal tests. Whilst I can't give anything away, I would say that even with your methodology there is always a danger of overfitting, and it is possible that this is the issue you face. I will also ask my colleagues to take a look at your query to see if there is anything more they can add.

Finally, bear in mind that 25% of the final score will come from further evaluation on new maps, potentially with changed aspects of the environment. You will probably want your model to be robust to that. Good luck!
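
As a generic illustration only (not a description of our evaluation setup), randomising environment aspects during training is one common way to gain that robustness. All parameter names and ranges below are hypothetical:

```python
import random

def sample_env_config(rng):
    # Hypothetical parameters for illustration only; the actual evaluation
    # environment and its settings are not disclosed.
    return {
        "num_social_cars": rng.randint(5, 40),
        "friction": rng.uniform(0.8, 1.2),
        "sensor_noise_std": rng.uniform(0.0, 0.05),
    }

rng = random.Random(0)
for episode in range(3):
    cfg = sample_env_config(rng)
    print("episode", episode, "config:", cfg)
    # train_one_episode(env_factory(cfg), agent)  # plug into your own loop
```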

Posted by: HuaweiUK @ Dec. 13, 2019, 6:29 p.m.

Can you kindly confirm that you're using the latest starter_kit version? (Context: https://competitions.codalab.org/forums/18353/3208/.) Another simple test would be to run the provided RLlib example locally on your machine and also submit it to CodaLab, to confirm that the scores are similar.
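
If it helps, the local half of that test is only a few lines with stock RLlib. The sketch below uses a generic Gym environment as a stand-in, since the starter kit's own example and environment name depend on your version:

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer  # RLlib API as of late 2019

# Stand-in sanity check: train briefly and record the mean episode reward.
# Replace "CartPole-v0" with the environment from the starter kit's example.
ray.init(ignore_reinit_error=True)
trainer = PPOTrainer(env="CartPole-v0", config={"num_workers": 1})
for i in range(5):
    result = trainer.train()
    print(i, result["episode_reward_mean"])
ray.shutdown()
```

If the local reward curve and the CodaLab score for the same submission diverge substantially, that points at a setup issue rather than your model.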

Kindly,
Julian (DriveML Team)

Posted by: HuaweiUK @ Dec. 13, 2019, 6:42 p.m.

Thank you for your replies; that is very helpful. We can confirm we are using the latest version of the starter kit.
After more experiments, we also found an error on our end. This error might account for a portion of the discrepancy we observed, although not all of it.

Have a nice weekend, see you on Wednesday,
Team drive++

Posted by: Team162 @ Dec. 13, 2019, 7:42 p.m.