Multi-Agent RL Training School

Organized by HanShan_RL


Schedule (all times midnight UTC):

  • April 17, 2020: Single Agent phase begins
  • May 31, 2020: Multi Agent phase begins
  • June 18, 2020: Competition ends

CS492 Driving Competition



Self-driving cars have been a hot topic in recent years: autonomous driving is a thriving and booming research area that holds the promise of a new generation in transportation. To further research in autonomous driving and to identify rising stars, our CS492 Reinforcement Learning course is hosting a competition with the support of the HUAWEI Autonomous Driving Platform and its modern simulator, SMARTS.


SMARTS, the first simulator of its kind to model realistic dynamical behaviour, offers scenarios that emulate real-world behaviour at different granularity levels and gets you a step closer to bridging the gap between research and application.

SMARTS now provides:

  • Simulator Core - supports fast and flexible construction of RL environments
  • Algorithm Library - integrates the most comprehensive set of MARL algorithms
  • Multi-Agent Trainer - supports major multi-agent RL training paradigms
  • Policy Zoo - supports the instantiation of social agents from existing policies
  • Scenario Studio - supports flexible scripting of training scenarios in Python
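To make the Simulator Core's role concrete, the sketch below shows the standard RL interaction loop such an environment exposes. `StubDrivingEnv`, its observation fields, and `run_episode` are all hypothetical stand-ins for illustration; they are not the real SMARTS API.

```python
import random


class StubDrivingEnv:
    """Hypothetical stand-in for a SMARTS-style scenario environment (not the real API)."""

    def __init__(self, scenario="1lane", max_steps=100):
        self.scenario = scenario
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial (toy) observation."""
        self.steps = 0
        return {"speed": 0.0, "lane": 0}

    def step(self, action):
        """Advance one tick; reward forward progress, end at the step cap."""
        self.steps += 1
        obs = {"speed": action["throttle"] * 10.0, "lane": 0}
        reward = obs["speed"] * 0.1
        done = self.steps >= self.max_steps
        return obs, reward, done, {}


def run_episode(env, policy):
    """Roll out one episode with the given policy; return the total reward."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total


# A trivial baseline policy: random throttle at every step.
random_policy = lambda obs: {"throttle": random.random()}
```

A trained agent would replace `random_policy` with a learned mapping from observations to actions; the loop itself stays the same.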


For training, your goal is to design an agent that is capable of driving safely and efficiently across a variety of simulated maps. Competitors will train a driverless car from scratch to traverse a multi-lane highway as fast as possible whilst avoiding collisions with social vehicles and other competitors. Success will require the training of intelligent driving behaviours which account for the behaviour of nearby vehicles, such as when executing an overtaking manoeuvre.

For evaluation, we follow a rule similar to Formula One. Given a fixed start position and goal position, we measure the time each player takes to finish the track missions. Evaluation runs across a different set of scenarios, with a different seed and maximum step count. You can find more details on the evaluation page.
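The Formula One analogy can be sketched as a simple timing-based ranking: players are ordered by time to reach the goal, and agents that never finish are charged the full step cap. This is an illustrative rule of our own, not the official scoring formula.

```python
def rank_by_time(finish_times, max_steps):
    """Rank players by time to reach the goal (lower is better).

    finish_times maps player name -> steps taken, or None if the agent
    never reached the goal; non-finishers are charged max_steps.
    (Illustrative rule only, not the competition's official formula.)
    """
    charged = {p: (t if t is not None else max_steps)
               for p, t in finish_times.items()}
    return sorted(charged, key=charged.get)
```

For example, `rank_by_time({"a": 120, "b": None, "c": 90}, 500)` ranks `c` first and the non-finisher `b` last.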


Your challenge consists of two phases covering both single and multi-agent reinforcement learning scenarios.

At the qualification stage, each player must control their agents to finish the track missions as quickly as possible. The evaluation simulation for each agent is isolated, meaning trained agents interact only with predefined social cars. The top 16 players advance to the next stage.

At the tournament stage, players are grouped, and players in the same group compete with each other; the evaluation simulation therefore contains all members of a group. All players in a group are given the same start position and the same goal position, and the ranking is calculated according to the time consumed to finish the track missions. We repeat this group-and-compete process until a final winner emerges.
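The group-and-compete loop above amounts to a simple elimination bracket: split the field into groups, keep each group's winner, and repeat until one player remains. The sketch below is our own illustration of that logic; the official grouping and tie-breaking may differ.

```python
def run_tournament(players, group_size, play_group):
    """Repeatedly split players into groups and keep each group's winner
    until a single player remains.

    play_group maps a list of players to that group's winner.
    (Illustrative bracket logic, not the official tournament procedure.)
    """
    while len(players) > 1:
        groups = [players[i:i + group_size]
                  for i in range(0, len(players), group_size)]
        players = [play_group(g) for g in groups]
    return players[0]
```

With 16 qualifiers and groups of four, this runs two rounds: four group races, then a final among the four winners.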


For any questions, do not hesitate to raise them in the forums or the WeChat group.


When you submit your solution, we will put it through an evaluation similar to your local script, but across a different set of scenarios: 1lane, 1lane_10v, 2lane_sharp_bwd_10v, 3lane_bwd_10v, 3lane_sharp_b_10v, 3lane_sharp_b_25v, 3lane_sharp_b_50v. The _XXv suffix gives the number of social vehicles running in the environment. The evaluation simulation will also run with a different seed. Note that the leaderboard score is also the evaluation score.
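The _XXv naming convention can be read off mechanically. The helper below is a small illustration (our own, not a competition utility); it assumes scenarios without the suffix carry no stated social-vehicle count.

```python
import re


def social_vehicle_count(scenario):
    """Return the social-vehicle count encoded in a scenario name's _XXv
    suffix, or 0 when no suffix is present (assumption for names like '1lane')."""
    m = re.search(r"_(\d+)v$", scenario)
    return int(m.group(1)) if m else 0
```

For example, `social_vehicle_count("3lane_sharp_b_50v")` yields 50, while `social_vehicle_count("1lane")` yields 0.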

In addition, we will compute test scores using private, new, previously unreleased maps. The final score is a weighted total of the evaluation score and the test score.
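A weighted total of this form might look like the sketch below. The 50/50 default split is a placeholder assumption; the official weights are not stated here.

```python
def final_score(eval_score, test_score, eval_weight=0.5):
    """Combine public-evaluation and private-test scores into a final score.

    eval_weight is the fraction assigned to the evaluation score; the
    remainder goes to the private-test score. The 0.5 default is a
    placeholder, not the competition's published weighting.
    """
    return eval_weight * eval_score + (1 - eval_weight) * test_score
```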

Terms and Conditions





Leaderboard:

  # | Username   | Score
  1 | espylapiza | 11742.26
  2 | zukki      | 12381.03