

SLIDE 1

WISEMOVE: A Framework to Investigate Safe Deep Reinforcement Learning for Autonomous Driving

Jaeyoung Lee, Aravind Balakrishnan, Ashish Gaurav, Krzysztof Czarnecki, Sean Sedwards

University of Waterloo

September 14, 2019

SLIDE 2

WISEMOVE?

  • A research platform that mimics our autonomous driving stack.
  • Objective: investigate the safety and performance of motion planners trained using deep reinforcement learning.
  • Features:

✓ Hierarchical Decision Making
✓ Runtime Verification
✓ Reinforcement Learning / Monte Carlo Tree Search (MCTS)

SLIDE 3

Motion Planning Architecture in 100 km Public Drive (2018)

[Diagram: Motion Planner with no learning component. Labels: Behaviour Planner; high-level decision (abstracted); Local Planner; reference trajectories; measurements, perceptions, etc.]

SLIDE 4

WISEMOVE Architecture

[Diagram: Motion Planner (w/o MCTS). Labels: Deep Model for Decision Making; Option (high-level decision); Deep Model for Trajectory Generation; reference trajectories; measurements, perceptions, etc.]

  • Deep models are trained by deep reinforcement learning.

[Diagram: Road Scenario. Labels: STOP (twice), stop region (twice), intersection, ego.]

SLIDE 5

WISEMOVE Architecture

[Diagram: Motion Planner (w/o MCTS) with Deep Model for Decision Making, Option, and Deep Model for Trajectory Generation; Road Scenario with STOP (twice), stop region (twice), intersection, ego.]

Option

  • Five Options: KeepLane, Stop, Wait, Follow, ChangeLane
  • Components:

✓ preconditions, e.g., in an option ‘Wait’,
G((has_stopped_in_stop_region and in_stop_region) U highest_priority)
✓ time-out (e.g., 1 sec.)
✓ speed limit, target lane

Road Scenario

  • Two “two-lane and one-way” roads
  • All-way stop implemented by the stop region
  • 0 to 5 other vehicles
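
An Option with the components listed on this slide could be represented roughly as follows. This is a minimal sketch, not the WiseMove code: the `Option` class, the `State` dictionary, `always`, and `available_options` are hypothetical names, and the temporal precondition for ‘Wait’ is approximated here by a check on the current state only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical state: truth values of the atomic propositions used on the slide.
State = Dict[str, bool]


@dataclass(frozen=True)
class Option:
    name: str
    precondition: Callable[[State], bool]  # must hold for the Option to run
    timeout_s: float = 1.0                 # time-out from the slide (e.g., 1 sec.)

    def timed_out(self, elapsed_s: float) -> bool:
        return elapsed_s >= self.timeout_s


def wait_precondition(s: State) -> bool:
    # State-level approximation (assumption) of
    # G((has_stopped_in_stop_region and in_stop_region) U highest_priority).
    stopped_in_region = s["has_stopped_in_stop_region"] and s["in_stop_region"]
    return stopped_in_region or s["highest_priority"]


def always(_: State) -> bool:
    # Placeholder: the slide does not list preconditions for the other Options.
    return True


OPTIONS: List[Option] = [
    Option("KeepLane", always),
    Option("Stop", always),
    Option("Wait", wait_precondition),
    Option("Follow", always),
    Option("ChangeLane", always),
]


def available_options(s: State) -> List[str]:
    """Names of the Options whose precondition holds in state s."""
    return [o.name for o in OPTIONS if o.precondition(s)]
```

Speed limit and target lane are left out of the sketch; they would be further fields on the same record.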

SLIDE 6

WISEMOVE Architecture

[Diagram: Motion Planner (w/o MCTS) with Deep Model for Decision Making and Deep Model for Trajectory Generation; Road Scenario with STOP (twice), stop region (twice), intersection, ego.]

Runtime Verifier

  • Checks LTL-like strings until violated:

✓ preconditions, e.g., in an option ‘Wait’,
G((has_stopped_in_stop_region and in_stop_region) U highest_priority)
✓ traffic rules, e.g., in a stop region,
G(in_stop_region => (in_stop_region U has_stopped_in_stop_region))

  • An episode ends when:

✓ ego reaches the right end of the road,
✓ a traffic rule is violated, or
✓ a collision happens.
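
To make "checks until violated" concrete, the two formulas on this slide can be monitored over a finite trace of states. The sketch below is an assumption-laden illustration, not the WiseMove verifier: it hard-codes one monitor per formula instead of parsing LTL-like strings, the function names and the `State` dictionary are hypothetical, and it reports only definite violations (a pending "until" obligation at the end of the trace is not counted as one).

```python
from typing import Dict, Optional, Sequence

# Hypothetical state: truth values of the atomic propositions on the slide.
State = Dict[str, bool]


def first_violation_wait_precondition(trace: Sequence[State]) -> Optional[int]:
    """Monitor G((has_stopped_in_stop_region and in_stop_region) U highest_priority).

    For G(p U q), a prefix is definitely bad as soon as some state has
    neither p nor q. Returns that index, or None if no violation is seen.
    """
    for i, s in enumerate(trace):
        p = s["has_stopped_in_stop_region"] and s["in_stop_region"]
        q = s["highest_priority"]
        if not p and not q:
            return i
    return None


def first_violation_stop_rule(trace: Sequence[State]) -> Optional[int]:
    """Monitor G(in_stop_region => (in_stop_region U has_stopped_in_stop_region)).

    Once ego is in the stop region, it must stay there until it has stopped.
    Leaving the region with the obligation still open is a violation.
    """
    pending = False  # True while ego is in the region and has not yet stopped
    for i, s in enumerate(trace):
        if s["has_stopped_in_stop_region"]:
            pending = False
        elif s["in_stop_region"]:
            pending = True
        elif pending:
            return i
    return None
```

In an episode loop, the first non-None result would correspond to the "a traffic rule is violated" end condition.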

SLIDE 7

WISEMOVE Architecture

[Diagram: Motion Planner (w/o MCTS). The Deep Model for Decision Making answers “Next Option?” with an Option (high-level decision), which is passed to the Deep Model for Trajectory Generation.]

Deep Model for Decision Making

Input: a state representation
Output: the learnt ‘best’ Option

  • Choose the ‘best’ Option.
  • Act upon the termination of the current Option.
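
The two bullets above amount to a simple control loop: pick the highest-valued Option, run it, and only decide again once it terminates. The following is a minimal sketch under assumptions: `q_fn` stands in for the trained decision model and `step_fn` for the low-level execution of an Option; both, along with `choose_option` and `run_episode`, are hypothetical names.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

S = TypeVar("S")  # state type, left abstract in this sketch

OPTIONS = ["KeepLane", "Stop", "Wait", "Follow", "ChangeLane"]


def choose_option(q_values: Sequence[float]) -> str:
    """Return the Option with the highest value (the learnt 'best' Option)."""
    best = max(range(len(OPTIONS)), key=lambda i: q_values[i])
    return OPTIONS[best]


def run_episode(
    q_fn: Callable[[S], Sequence[float]],          # stand-in for the decision model
    step_fn: Callable[[S, str], Tuple[S, bool]],   # returns (next state, option terminated?)
    state: S,
    max_steps: int,
) -> List[str]:
    """Run Options in sequence, choosing anew only when the current one terminates."""
    history: List[str] = []
    current: Optional[str] = None
    for _ in range(max_steps):
        if current is None:
            current = choose_option(q_fn(state))
            history.append(current)
        state, terminated = step_fn(state, current)
        if terminated:
            current = None
    return history
```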

SLIDE 8

WISEMOVE Architecture

[Diagram: Motion Planner (w/o MCTS). The Deep Model for Decision Making answers “Next Option?” with an Option (high-level decision), which is passed to the Deep Model for Trajectory Generation.]

Deep Model for Trajectory Generation

Input: a state representation (simplified)
Output: reference trajectories, given an Option

  • A deep model is stored for each Option.
  • Trajectories are generated with a simplified vehicle model.

SLIDE 9

WISEMOVE Architecture

[Diagram: Motion Planner (w/o MCTS). The Deep Model for Decision Making answers “Next Option?” with an Option “____”; the Deep Model for Trajectory Generation sends the reference trajectory “____” to the road scenario. Time axis with Options: Follow, Stop, Wait, KeepLane.]

SLIDE 10

Training & Testing Low-level Deep Models

[Diagram: Option “____”; Deep Model for Trajectory Generation; reference trajectory “____”]

  • Five Deep Models, one for each Option.
  • Each model

✓ outputs continuous control commands generating the trajectories
✓ was trained by reinforcement learning (DDPG) with
    ✓ 20 sec. timeout
    ✓ (additional) preconditions and, if necessary, traffic rules.

SLIDE 11

After 100,000 steps training …

[Figure panels: KeepLane, Stop, Follow, Wait]

SLIDE 12

After 100,000 steps training …

[Figure panels: KeepLane, Stop, Follow, Wait]

[Table: mean (std) % success after 100,000 training steps, averaged over 100 trials of 100 episodes]

SLIDE 13

After 1,000,000 steps training …

[Figure panels: KeepLane, Stop, Follow, Wait]

SLIDE 14

Training & Testing High-level Deep Model

[Diagram: Motion Planner (w/o MCTS). The Deep Model for Decision Making answers “Next Option?” with an Option “____”; the Deep Model for Trajectory Generation sends the reference trajectory “____” to the road scenario. Time axis with Options: Follow, Stop, Wait, KeepLane.]

  • Each low-level deep model is trained a priori for 1,000,000 steps.
  • One deep model, trained by reinforcement learning (DQN), outputs an Option.
  • 1 sec. time-out for each option; 20 sec. time-out for an entire episode.
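
The two time-outs above can be tracked with a pair of counters, one reset at every Option switch and one running for the whole episode. This is only a sketch: the `Timers` class and the 100 ms simulation step are assumptions, not values from the slides; integer milliseconds are used to avoid floating-point drift.

```python
from dataclasses import dataclass


@dataclass
class Timers:
    option_timeout_ms: int = 1_000    # 1 sec. per Option (from the slide)
    episode_timeout_ms: int = 20_000  # 20 sec. per episode (from the slide)
    dt_ms: int = 100                  # hypothetical simulation step
    option_ms: int = 0                # time spent in the current Option
    episode_ms: int = 0               # time spent in the current episode

    def start_option(self) -> None:
        """Reset the per-Option clock when a new Option is chosen."""
        self.option_ms = 0

    def tick(self) -> None:
        """Advance both clocks by one simulation step."""
        self.option_ms += self.dt_ms
        self.episode_ms += self.dt_ms

    def option_timed_out(self) -> bool:
        return self.option_ms >= self.option_timeout_ms

    def episode_timed_out(self) -> bool:
        return self.episode_ms >= self.episode_timeout_ms
```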


SLIDE 15

Training & Testing High-level Deep Model

[Table: Overall performance after 200,000 steps training, averaged over 1000 episodes]

SLIDE 16

With MCTS over Options …

[Diagram: search tree rooted at the current state; each edge is an Option (KeepLane, Stop, Wait, Follow, ChangeLane).]

  • Traverse until the leaf node, with exploration & exploitation.
  • Simulate!
  • Backpropagate!

[Table: Overall performance, averaged over 1000 episodes]
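
The traverse / simulate / backpropagate cycle on this slide can be sketched as a generic MCTS over Options. The code below is an illustration under assumptions, not the WiseMove implementation: it uses UCB1 as the exploration/exploitation rule (the slides do not name one), and `step`, `rollout`, `Node`, and all parameter defaults are hypothetical.

```python
import math
import random
from typing import Dict, Optional

OPTIONS = ["KeepLane", "Stop", "Wait", "Follow", "ChangeLane"]


class Node:
    def __init__(self, parent: Optional["Node"] = None):
        self.parent = parent
        self.children: Dict[str, "Node"] = {}
        self.visits = 0
        self.total_return = 0.0

    def ucb_child(self, c: float) -> str:
        # Exploitation (mean return) plus exploration bonus (UCB1).
        def score(opt: str) -> float:
            ch = self.children[opt]
            return ch.total_return / ch.visits + c * math.sqrt(
                math.log(self.visits) / ch.visits
            )
        return max(self.children, key=score)


def mcts(root_state, step, rollout, n_iter=200, depth=3, c=1.4, seed=0) -> str:
    """step(state, option) -> (next_state, reward); rollout(state, rng) -> return."""
    rng = random.Random(seed)
    root = Node()
    for _ in range(n_iter):
        node, state, ret, d = root, root_state, 0.0, 0
        # 1. Traverse until a leaf (a node that is not fully expanded).
        while d < depth and len(node.children) == len(OPTIONS):
            opt = node.ucb_child(c)
            state, r = step(state, opt)
            ret += r
            node = node.children[opt]
            d += 1
        # 2. Expand one untried Option, unless the depth limit is reached.
        if d < depth:
            opt = rng.choice([o for o in OPTIONS if o not in node.children])
            state, r = step(state, opt)
            ret += r
            node.children[opt] = Node(parent=node)
            node = node.children[opt]
        # 3. Simulate from the reached state.
        ret += rollout(state, rng)
        # 4. Backpropagate the return along the path to the root.
        while node is not None:
            node.visits += 1
            node.total_return += ret
            node = node.parent
    # Recommend the most visited Option at the root.
    return max(root.children, key=lambda o: root.children[o].visits)
```

In a setup like the one in these slides, `step` would plausibly execute an Option's low-level model until it terminates, and `rollout` could be replaced by a learnt value estimate; both are left abstract here.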

SLIDE 17

Concluding Remarks

  • Features: Options / Reinforcement Learning / Runtime Verification / Monte Carlo Tree Search (MCTS)
  • The results are reproducible using the publicly available code at git.uwaterloo.ca/wise-lab/wise-move/
  • Future work:

✓ Comparisons of RL and hand-coded motion planners.
✓ Different scenarios, realistic vehicle dynamics, etc.
✓ Simulation-to-Real

SLIDE 18

Thank you for your attention! Q & A

Acknowledgment

This work is supported by the Japan Science and Technology Agency (JST) ERATO project JPMJER1603: HASUO Metamathematics for Systems Design, and by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant: Model-Based Synthesis and Safety Assurance of Intelligent Controllers for Autonomous Vehicles.