By AI Trends Staff
deepsense.ai and Volkswagen have published research on arXiv showing that an autonomous vehicle trained entirely in simulation can drive in the real world. The team trained its policies using a reinforcement learning algorithm with “mostly synthetic data”, then transferred its neural network into a real car, providing it with over 100 years’ driving experience before the real-world engine had ever started.
“Moving the neural network policy from the simulator to reality was a breakthrough and the heart of the experiment,” explained Piotr Miłoś, deepsense.ai researcher and a professor at the Polish Academy of Sciences in a statement about the work. “We have run multiple training sessions in the simulated environment, but even the most sophisticated simulator delivers different experiences than what a car encounters in the real world. This is known as the sim-to-real gap. Our algorithm has learned to bridge this gap. That we actually rode in a car controlled by a neural network proves that reinforcement learning-powered training is a promising direction for autonomous vehicle research overall.”
The experiment was conducted by a team of researchers including deepsense.ai’s Błażej Osiński, Adam Jakubowski, Piotr Miłoś, Paweł Zięcina and Christopher Galias, Volkswagen researcher Silviu Homoceanu and University of Warsaw professor Henryk Michalewski. The team not only trained a model that controls a car in a simulated environment but also executed test drives in a real car at Volkswagen’s testing facility. The car navigated through real streets and crossroads, performing real-world driving maneuvers.
The team used the CARLA simulator, an open-source simulator for autonomous driving research based on Unreal Engine 4, and tested 10 models with different driving and input variables.
Simulations Offer Advantages
Being able to use simulation offers crucial advantages, the team says. First, it is cheaper. In their combined experiments, the team generated some 100 years of simulated driving experience. Racking up so much experience in a real car isn’t feasible. Training in a simulator can also be done much faster. The agent can experience all manner of danger, from simple rain to deadly extreme weather, accidents to full-blown crashes, and learn to navigate or avoid them. Subjecting an actual driver to such hazards would be prohibitively complicated, time-consuming, and ethically unacceptable. What’s more, extreme scenarios in the real world are relatively uncommon but can be quickly simulated.
“It was exciting to see that our novel approach worked so well. By further exploring this path we can deliver more reliable and flexible models to control autonomous vehicles,” said Błażej Osiński, a data scientist at deepsense.ai. “We tested our technique on cars, but it can be further explored for other applications.”
Reinforcement learning allows a model to shape its own behavior by interacting with the environment and receiving rewards and penalties as it goes. The model’s goal is, ultimately, to maximize the rewards it receives while avoiding penalties. An autonomous car, for example, receives points for driving safely and complying with traffic laws. Simulating every road condition that can occur is impossible, but shaping a set of guidelines like “avoid hitting objects” or “protect passengers from any and all harm” is not. Ensuring the model sticks to the guidelines makes it more reliable, including in the less common situations it will eventually face on the road.
“Using reinforcement learning has reduced the amount of human engineering work,” explained Osiński. “We didn’t have to create driving heuristics and to collect reference drives; instead we only had to define desired outcomes (making progress on a route) and undesired behaviors (crashes, deviating from your line, etc.). Based on these rewards and punishments, RL is able to figure out the rest.”
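To make the idea of defining only desired and undesired outcomes concrete, here is a minimal Python sketch of what such a reward signal might look like. It is an illustration, not the team's code: the names `StepInfo`, `reward` and `episode_return`, along with every weight and penalty value, are assumptions chosen for readability and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class StepInfo:
    """Hypothetical per-step measurements reported by a simulator."""
    progress_m: float      # metres advanced along the route during this step
    lane_offset_m: float   # lateral distance from the intended driving line
    collided: bool         # whether the car hit something during this step


def reward(step: StepInfo,
           progress_weight: float = 1.0,
           offset_weight: float = 0.5,
           crash_penalty: float = 100.0) -> float:
    """Reward route progress; penalise drifting off line and crashing.

    All three weights are illustrative placeholders.
    """
    value = progress_weight * step.progress_m
    value -= offset_weight * abs(step.lane_offset_m)
    if step.collided:
        value -= crash_penalty
    return value


def episode_return(steps: Iterable[StepInfo], gamma: float = 0.99) -> float:
    """Discounted sum of rewards, the quantity an RL agent tries to maximise."""
    total = 0.0
    discount = 1.0
    for step in steps:
        total += discount * reward(step)
        discount *= gamma
    return total
```

In a sketch like this, the training algorithm never sees explicit driving rules. It only observes that trajectories with steady progress and no crashes accumulate a higher return, and it adjusts the policy accordingly.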
There are many next steps for the team, Osiński said. “We can definitely increase the robustness of our system against more conditions and higher driving speeds. We would also like to tackle the situations requiring ‘assertive’ interactions with other drivers, such as changing lanes in heavy traffic.”
Technical details of the research are described in the team’s paper, available on arXiv.