WISEMOVE: A Framework to Investigate
Safe Deep Reinforcement Learning for Autonomous Driving
Jaeyoung Lee, Aravind Balakrishnan, Ashish Gaurav, Krzysztof Czarnecki, Sean Sedwards
University of Waterloo
September 14th, 2019
WISEMOVE? A research platform that mimics our autonomous driving ...
[Diagram: Behaviour Planner and Local Planner (abstracted); labels: high-level decision; reference trajectories; measurements, perceptions, etc.]
(w/o MCTS)
[Diagram: Deep Model for Decision Making and Deep Model for Trajectory Generation; labels: Option (high-level decision); reference trajectories; measurements, perceptions, etc.]
[Figure: road scenario; labels: STOP, stop region, intersection, ego]
Road Scenario
Deep Model for Trajectory Generation Deep Model for Decision Making
Option
KeepLane, Stop, Wait, Follow, ChangeLane
✓ preconditions
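The idea of options guarded by preconditions can be sketched as follows. This is a minimal illustration, not the WiseMove implementation: the `State` fields and every precondition below are assumptions invented for the example; only the five option names come from the slide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Option(Enum):
    KEEP_LANE = "KeepLane"
    STOP = "Stop"
    WAIT = "Wait"
    FOLLOW = "Follow"
    CHANGE_LANE = "ChangeLane"


@dataclass
class State:
    # Hypothetical, simplified state features (not taken from the slides).
    in_stop_region: bool = False
    has_stopped_in_stop_region: bool = False
    vehicle_ahead: bool = False
    adjacent_lane_free: bool = False


# Assumed preconditions, for illustration only.
PRECONDITIONS: Dict[Option, Callable[[State], bool]] = {
    Option.KEEP_LANE: lambda s: True,
    Option.STOP: lambda s: not s.has_stopped_in_stop_region,
    Option.WAIT: lambda s: s.in_stop_region and s.has_stopped_in_stop_region,
    Option.FOLLOW: lambda s: s.vehicle_ahead,
    Option.CHANGE_LANE: lambda s: s.adjacent_lane_free and not s.in_stop_region,
}


def available_options(state: State) -> List[Option]:
    """Return the options whose precondition holds in the given state."""
    return [o for o in Option if PRECONDITIONS[o](state)]
```

For example, `available_options(State(in_stop_region=True, has_stopped_in_stop_region=True))` would offer only `KeepLane` and `Wait` under these assumed preconditions.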
(w/o MCTS)
e.g., in an option ‘Wait’: G((has_stopped_in_stop_region and in_stop_region) U highest_priority)
✓ time-out (e.g., 1 sec.)
✓ speed limit, target lane
Deep Model for Trajectory Generation Deep Model for Decision Making
Runtime Verifier
G((has_stopped_in_stop_region and in_stop_region) U highest_priority)
G(in_stop_region => (in_stop_region U has_stopped_in_stop_region))
✓ Ego reaches the right end of the road,
✓ a traffic rule is violated, or
✓ a collision happens.
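The two temporal-logic properties shown for the runtime verifier can be checked on a recorded trace with a small monitor. The sketch below is an assumption-laden illustration, not the verifier used in WiseMove: it represents each step as a dictionary of propositions and only reports definite violations on a finite prefix (if the trace ends before the "until" is resolved, it reports no violation).

```python
from typing import Callable, Dict, List

Step = Dict[str, bool]            # one observation: proposition name -> truth value
Pred = Callable[[Step], bool]


def until_violated(trace: List[Step], start: int, p: Pred, q: Pred) -> bool:
    """Return True if 'p U q' is definitely violated from index `start`.

    A violation means p fails at some step before q has held. If the trace
    ends first, the verdict is still open and is reported as 'not violated'.
    """
    for step in trace[start:]:
        if q(step):
            return False
        if not p(step):
            return True
    return False


def stop_rule_violated(trace: List[Step]) -> bool:
    """G(in_stop_region => (in_stop_region U has_stopped_in_stop_region))"""
    in_region = lambda s: s["in_stop_region"]
    stopped = lambda s: s["has_stopped_in_stop_region"]
    return any(
        in_region(trace[i]) and until_violated(trace, i, in_region, stopped)
        for i in range(len(trace))
    )


def wait_rule_violated(trace: List[Step]) -> bool:
    """G((has_stopped_in_stop_region and in_stop_region) U highest_priority)"""
    hold = lambda s: s["has_stopped_in_stop_region"] and s["in_stop_region"]
    priority = lambda s: s["highest_priority"]
    return any(until_violated(trace, i, hold, priority) for i in range(len(trace)))
```

In this sketch the second check is meant to be applied only to the portion of a trace during which the ‘Wait’ option is active, which is an assumption based on the slide's "in an option ‘Wait’" note.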
Road Scenario
(w/o MCTS)
✓ preconditions, e.g., in an option ‘Wait’
✓ traffic rules, e.g., in a stop region
Deep Model for Decision Making Deep Model for Trajectory Generation
Input: a state representation
Output: the learnt ‘best’ Option
Option (high-level decision)
Next Option?
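One way to read "the learnt ‘best’ Option" is as an argmax over per-option scores produced by the decision-making model. The sketch below assumes the scores are already available as a plain dictionary and that options failing their preconditions are excluded beforehand; both the function name and this masking step are illustrative assumptions, not details confirmed by the slides.

```python
from typing import Dict, Iterable


def select_option(scores: Dict[str, float], admissible: Iterable[str]) -> str:
    """Pick the highest-scoring option among those that are admissible.

    `scores` stands in for the output of a learnt model (hypothetical here);
    `admissible` is the set of options whose preconditions currently hold.
    """
    candidates = [o for o in admissible if o in scores]
    if not candidates:
        raise ValueError("no admissible option has a score")
    return max(candidates, key=lambda o: scores[o])
```

With scores `{"KeepLane": 0.2, "Stop": 0.9, "Wait": 0.5}` and only `KeepLane` and `Wait` admissible, this returns `"Wait"` even though `Stop` has the highest raw score.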
(w/o MCTS)
Deep Model for Trajectory Generation
Input: a state representation (simplified)
Output: reference trajectories, given an Option
Option (high-level decision)
Deep Model for Decision Making
Next Option? (w/o MCTS)
Deep Model for Trajectory Generation Deep Model for Decision Making
Next Option?
[Timeline figure; labels: time, Follow, Stop, Wait, KeepLane]
reference trajectory “____”
To the road scenario
(w/o MCTS)
Option “ ”
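The loop on this slide (choose the next option, let the trajectory model run it against the road scenario, then choose again) can be sketched as a two-level control loop. Everything below is a hypothetical stand-in: `choose_option`, the per-option `step` functions, and `is_terminal` represent the decision model, the trajectory model plus simulator, and the episode-termination test; the step-count time-out mirrors the time-out mentioned earlier in the deck.

```python
from typing import Callable, Dict, List, Tuple

StepFn = Callable[[int], Tuple[int, bool]]   # state -> (next_state, option_done)


def run_option(step: StepFn, state: int, max_steps: int = 10) -> Tuple[int, int, str]:
    """Run one option until it reports termination or the time-out is reached."""
    for t in range(1, max_steps + 1):
        state, done = step(state)
        if done:
            return state, t, "terminated"
    return state, max_steps, "timeout"


def run_episode(
    choose_option: Callable[[int], str],
    steppers: Dict[str, StepFn],
    state: int,
    is_terminal: Callable[[int], bool],
    max_options: int = 50,
) -> Tuple[int, List[Tuple[str, int, str]]]:
    """Alternate between high-level option choice and low-level execution."""
    trace = []
    for _ in range(max_options):
        if is_terminal(state):
            break
        option = choose_option(state)                       # "Next Option?"
        state, steps, why = run_option(steppers[option], state)
        trace.append((option, steps, why))
    return state, trace
```

Using an integer as the state keeps the sketch runnable; a real state would hold the measurements and perceptions fed to both models.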
Deep Model for Trajectory Generation
reference trajectory “____”
Option “ ”
Mean (std) % success after 100,000 training steps
(averaged over 100 trials of 100 episodes)
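A "mean (std) % success" figure of this kind can be computed by taking the success rate of each trial and then summarising across trials. The sketch below assumes each trial is a list of per-episode booleans and uses the population standard deviation; the slides do not say which standard-deviation estimator was used, so that choice is an assumption.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def success_summary(trials: List[List[bool]]) -> Tuple[float, float]:
    """Return (mean, std) of the per-trial success percentage.

    Each inner list holds one boolean per episode (True = success).
    """
    rates = [100.0 * sum(episodes) / len(episodes) for episodes in trials]
    return mean(rates), pstdev(rates)
```

Swapping `pstdev` for `statistics.stdev` would give the sample standard deviation instead.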
Overall performance (after 200,000 training steps)
(averaged over 1000 episodes)
[Figure: search tree rooted at the current state; branches labelled with options (KeepLane, Follow, Wait, Stop, ChangeLane)]
Traverse until the leaf node, with exploration & exploitation
Simulate! Backpropagate!
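The three steps named on this slide (traverse to a leaf while balancing exploration and exploitation, simulate, backpropagate) can be sketched as a generic Monte Carlo tree search over options. This is not the WiseMove planner: the UCB1 selection rule, the exploration constant, the depth limit, and the `rollout` callback that scores a sequence of options are all assumptions chosen to keep the example small.

```python
import math
import random

OPTIONS = ["KeepLane", "Stop", "Wait", "Follow", "ChangeLane"]


class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}    # option name -> Node
        self.visits = 0
        self.value = 0.0      # sum of returns backed up through this node


def ucb1(parent, child, c=1.4):
    """Mean return (exploitation) plus a visit-count bonus (exploration)."""
    if child.visits == 0:
        return math.inf
    bonus = c * math.sqrt(math.log(parent.visits) / child.visits)
    return child.value / child.visits + bonus


def select(node):
    """Traverse until a leaf node, returning the leaf and the options taken."""
    path = []
    while node.children:
        parent = node
        option, node = max(parent.children.items(),
                           key=lambda kv: ucb1(parent, kv[1]))
        path.append(option)
    return node, path


def mcts(rollout, iterations=200, max_depth=3, rng=None):
    """Return the most-visited first option; rollout(path, rng) -> float."""
    rng = rng or random.Random(0)
    root = Node()
    for _ in range(iterations):
        leaf, path = select(root)
        if len(path) < max_depth:                      # expand the leaf
            leaf.children = {o: Node(leaf) for o in OPTIONS}
            option = rng.choice(OPTIONS)
            leaf, path = leaf.children[option], path + [option]
        ret = rollout(path, rng)                       # simulate
        node = leaf
        while node is not None:                        # backpropagate
            node.visits += 1
            node.value += ret
            node = node.parent
    return max(root.children, key=lambda o: root.children[o].visits)
```

In a toy setting where `rollout` returns 1.0 whenever the first option is `Stop` and 0.0 otherwise, the search concentrates its visits on `Stop`. A real rollout would run the learnt option policies in the simulator from the current state.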
Overall performance
(averaged over 1000 episodes)
Options / Reinforcement Learning / Runtime Verification / Monte Carlo Tree Search (MCTS)
This work is supported by the Japan Science and Technology Agency (JST) ERATO project JPMJER1603: HASUO Metamathematics for Systems Design, and by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant: Model-Based Synthesis and Safety Assurance of Intelligent Controllers for Autonomous Vehicles.