WiseMove

WiseMove? A research platform that mimics our autonomous driving - PowerPoint PPT Presentation

WiseMove: A Framework to Investigate Safe Deep Reinforcement Learning for Autonomous Driving. Jaeyoung Lee, Aravind Balakrishnan, Ashish Gaurav, Krzysztof Czarnecki, Sean Sedwards. University of Waterloo. September 14th, 2019


  1. WiseMove: A Framework to Investigate Safe Deep Reinforcement Learning for Autonomous Driving. Jaeyoung Lee, Aravind Balakrishnan, Ashish Gaurav, Krzysztof Czarnecki, Sean Sedwards. University of Waterloo. September 14th, 2019

  2. WiseMove? ‣ A research platform that mimics our autonomous driving stack. ‣ Objective: investigate the safety and performance of motion planners trained using deep reinforcement learning. ‣ Features: ✓ Hierarchical Decision Making ✓ Runtime Verification ✓ Reinforcement Learning / Monte Carlo Tree Search (MCTS)

  3. Motion Planning Architecture in 100 km Public Drive (2018). [Diagram labels: Motion Planner (no learning component; abstracted), containing a Behaviour Planner (high-level decision) and a Local Planner (reference trajectories); input: measurements, perceptions, etc.]

  4. WiseMove Architecture. Deep models are trained by deep reinforcement learning. [Diagram labels: Motion Planner (w/o MCTS), containing a Deep Model for Decision Making (Option: high-level decision) and a Deep Model for Trajectory Generation (reference trajectories); Road Scenario with ego, stop regions, intersection, and STOP markings; input: measurements, perceptions, etc.]

  5. WiseMove Architecture: Option and Road Scenario. ‣ Five Options: KeepLane, Stop, Wait, Follow, ChangeLane ‣ Components of an Option: ✓ speed limit, target lane ✓ time-out (e.g., 1 sec.) ✓ preconditions, e.g., in the option ‘Wait’, G((has_stopped_in_stop_region and in_stop_region) U highest_priority) ‣ Road Scenario: ✓ Two “two-lane and one-way” roads ✓ All-way stop implemented by the stop region ✓ 0~5 other vehicles [Diagram labels: Motion Planner (w/o MCTS), Deep Model for Decision Making, Deep Model for Trajectory Generation; ego, stop region, intersection, STOP.]

  6. WiseMove Architecture: Runtime Verifier and Road Scenario. ‣ Runtime Verifier: checks LTL-like strings until violated. ✓ preconditions, e.g., in the option ‘Wait’, G((has_stopped_in_stop_region and in_stop_region) U highest_priority) ✓ traffic rules, e.g., in a stop region, G(in_stop_region => (in_stop_region U has_stopped_in_stop_region)) ‣ An episode ends when: ✓ Ego reaches the right end of the road, ✓ a traffic rule is violated, or ✓ a collision happens. [Diagram labels: Motion Planner (w/o MCTS), Deep Model for Decision Making, Deep Model for Trajectory Generation; ego, stop region, intersection, STOP.]
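
The traffic rule quoted on this slide, G(in_stop_region => (in_stop_region U has_stopped_in_stop_region)), can be checked step by step until it is violated. The following is only a minimal sketch of that idea, not WiseMove's verifier: the class `StopRuleMonitor`, the helper `first_violation`, and the trace format (one dictionary of proposition values per time step) are all hypothetical, and the monitor is hand-written for this one rule rather than derived from a general LTL parser.

```python
from typing import Dict, List, Optional

# Hypothetical trace format: truth values of the atomic propositions at one step.
State = Dict[str, bool]


class StopRuleMonitor:
    """Monitors G(in_stop_region => (in_stop_region U has_stopped_in_stop_region))."""

    def __init__(self) -> None:
        self.pending = False   # ego is in the stop region and has not stopped yet
        self.violated = False

    def step(self, state: State) -> bool:
        """Consume one state; return False once the rule has been violated."""
        if self.violated:
            return False
        if state["has_stopped_in_stop_region"]:
            self.pending = False      # the 'until' obligation is discharged
        elif state["in_stop_region"]:
            self.pending = True       # obligation is (still) open
        elif self.pending:
            self.violated = True      # left the stop region without stopping
        return not self.violated


def first_violation(trace: List[State]) -> Optional[int]:
    """Index of the first violating step, or None if the prefix is violation-free."""
    monitor = StopRuleMonitor()
    for i, state in enumerate(trace):
        if not monitor.step(state):
            return i
    return None


def s(in_region: bool, stopped: bool) -> State:
    return {"in_stop_region": in_region, "has_stopped_in_stop_region": stopped}


# Ego enters the stop region, stops, then leaves: no violation.
good_trace = [s(False, False), s(True, False), s(True, True), s(False, True)]
# Ego enters the stop region and drives out without stopping: violation at step 1.
bad_trace = [s(True, False), s(False, False)]

print(first_violation(good_trace), first_violation(bad_trace))
```

In an episode loop, a `False` return from `step` would correspond to the "a traffic rule is violated" termination condition listed on the slide.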

  7. WiseMove Architecture: Deep Model for Decision Making. ‣ Chooses the ‘best’ Option. Input: a state representation. Output: the learnt ‘best’ Option. ‣ Acts upon the termination of the current Option. [Diagram labels: Motion Planner (w/o MCTS); Option (high-level decision); Next Option?; Deep Model for Trajectory Generation.]

  8. WiseMove Architecture: Deep Model for Trajectory Generation. ‣ A deep model is stored for each Option. Input: a state representation (simplified). Output: reference trajectories, given an Option. ‣ Trajectories are generated with a simplified vehicle model. [Diagram labels: Motion Planner (w/o MCTS); Deep Model for Decision Making; Option (high-level decision); Next Option?]

  9. WiseMove Architecture. [Diagram: within the Motion Planner (w/o MCTS), the Deep Model for Decision Making issues Options along a time axis (KeepLane, Follow, Wait, Stop) and is asked "Next Option?"; the Deep Model for Trajectory Generation receives the current Option and sends its reference trajectory to the road scenario.]
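
The control flow described on slides 7 to 9 (pick an Option, run its low-level model until the Option terminates by time-out or failed precondition, then pick the next Option) can be sketched as below. This is an illustrative toy under stated assumptions, not the WiseMove code: the `Option` dataclass, `run_episode`, the integer road position, and the rule-based `choose` function standing in for the learnt decision model are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical toy state: (position along the road, has the ego stopped yet?)
State = Tuple[int, bool]


@dataclass
class Option:
    name: str
    policy: Callable[[State], State]       # stand-in for the per-Option deep model
    precondition: Callable[[State], bool]  # Option terminates when this fails
    timeout_steps: int                     # Option terminates after this many steps


def run_episode(options: Dict[str, Option],
                choose: Callable[[State], str],
                state: State,
                goal: int,
                max_decisions: int) -> Tuple[State, List[str]]:
    """Run Options back to back; a new Option is chosen only when the current one ends."""
    chosen: List[str] = []
    for _ in range(max_decisions):
        if state[0] >= goal:
            break
        option = options[choose(state)]
        chosen.append(option.name)
        for _ in range(option.timeout_steps):
            if not option.precondition(state):
                break
            state = option.policy(state)
    return state, chosen


options = {
    "KeepLane": Option("KeepLane", lambda st: (st[0] + 1, st[1]), lambda st: True, 2),
    "Stop": Option("Stop", lambda st: (st[0], True), lambda st: not st[1], 2),
}


def choose(state: State) -> str:
    """Rule-based stand-in for the high-level model: stop once at position 4."""
    return "Stop" if state[0] == 4 and not state[1] else "KeepLane"


final_state, chosen = run_episode(options, choose, (0, False), goal=8, max_decisions=10)
print(final_state, chosen)
# (8, True) ['KeepLane', 'KeepLane', 'Stop', 'KeepLane', 'KeepLane']
```

The `Stop` Option here ends after one step because its precondition fails once the ego has stopped, while `KeepLane` always runs to its time-out; both are ways an Option can terminate and hand control back to the decision layer.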

  10. Training & Testing Low-level Deep Models ‣ Five Deep Models, one for each Option. ‣ Each model ✓ outputs continuous control commands generating the trajectories ✓ was trained by reinforcement learning (DDPG) with ✓ 20 sec. timeout ✓ (additional) preconditions and, if necessary, traffic rules. [Diagram: an Option is passed to the Deep Model for Trajectory Generation, which outputs a reference trajectory.]

  11. After 100,000 steps training … [Panels: KeepLane, Stop, Follow, Wait]

  12. After 100,000 steps training … mean (std) % success after 100,000 training steps (averaged over 100 trials of 100 episodes). [Table rows: KeepLane, Stop, Follow, Wait; values not preserved in this transcript.]
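
The "mean (std) % success, averaged over 100 trials of 100 episodes" statistic can be computed as sketched below. This is an assumption-laden illustration rather than the authors' evaluation script: `success_stats` and the `run_episode` callback are hypothetical names, the success probability in the usage example is made up, and the slide does not say whether the population or sample standard deviation was reported (the population form is used here).

```python
import random
import statistics
from typing import Callable, Tuple


def success_stats(run_episode: Callable[[random.Random], bool],
                  trials: int = 100,
                  episodes: int = 100,
                  seed: int = 0) -> Tuple[float, float]:
    """Mean and standard deviation, over trials, of the per-trial success percentage."""
    rng = random.Random(seed)
    rates = []
    for _ in range(trials):
        successes = sum(1 for _ in range(episodes) if run_episode(rng))
        rates.append(100.0 * successes / episodes)
    # Assumption: population standard deviation across trials.
    return statistics.mean(rates), statistics.pstdev(rates)


# Stand-in episode that succeeds with a made-up probability of 0.8.
mean_pct, std_pct = success_stats(lambda rng: rng.random() < 0.8)
print(f"{mean_pct:.1f} ({std_pct:.1f})")
```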

  13. After 1,000,000 steps training … [Panels: KeepLane, Stop, Follow, Wait]

  14. Training & Testing High-level Deep Model ‣ Each low-level deep model is trained a priori for 1,000,000 steps. ‣ One deep model, trained by reinforcement learning (DQN), outputs an Option. ‣ 1 sec. time-out for each Option; 20 sec. time-out for an entire episode. [Diagram: within the Motion Planner (w/o MCTS), the Deep Model for Decision Making issues Options along a time axis (KeepLane, Follow, Wait, Stop) and is asked "Next Option?"; the Deep Model for Trajectory Generation sends its reference trajectory to the road scenario.]

  15. Training & Testing High-level Deep Model. Overall performance (after 200,000 steps training; averaged over 1000 episodes)

  16. With MCTS over Options … Steps annotated on the search tree: traverse until the leaf node, with exploration & exploitation; simulate; backpropagate. [Diagram: search tree rooted at the current state, with edges labelled by Options (KeepLane, Stop, Wait, Follow, ChangeLane) and a terminal node ⊥.] Overall performance (averaged over 1000 episodes)
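
The three steps named on this slide (traverse to a leaf with exploration and exploitation, simulate, backpropagate) can be sketched as a small MCTS over Options. This is a generic textbook-style sketch, not WiseMove's planner: the UCB1 selection rule, the deterministic expansion order, the fixed rollout policy, and the names `Node`, `ucb1`, `mcts`, and `toy_step` are all assumptions made for illustration, and the toy model simply rewards driving forward.

```python
import math
from typing import Callable, Dict, Optional, Tuple

OPTIONS = ["KeepLane", "Stop", "Wait", "Follow", "ChangeLane"]
Step = Callable[[int, str], Tuple[int, float]]  # (state, option) -> (next state, reward)


class Node:
    def __init__(self, state: int, parent: Optional["Node"] = None, reward: float = 0.0):
        self.state = state
        self.parent = parent
        self.reward = reward                  # reward received on entering this node
        self.children: Dict[str, "Node"] = {}
        self.visits = 0
        self.value = 0.0                      # sum of returns backed up through this node


def ucb1(parent: Node, child: Node, c: float) -> float:
    """Mean return (exploitation) plus a visit-count bonus (exploration)."""
    return child.value / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)


def mcts(root_state: int, step: Step, rollout_policy: Callable[[int], str],
         iterations: int = 50, rollout_depth: int = 3, c: float = 1.4) -> str:
    root = Node(root_state)
    for _ in range(iterations):
        node, ret = root, 0.0
        # 1. Traverse: descend through fully expanded nodes using UCB1.
        while len(node.children) == len(OPTIONS):
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb1(parent, ch, c))
            ret += node.reward
        # 2. Expand the first untried Option (deterministic order, an assumption).
        option = next(o for o in OPTIONS if o not in node.children)
        next_state, reward = step(node.state, option)
        child = Node(next_state, parent=node, reward=reward)
        node.children[option] = child
        node, ret = child, ret + reward
        # 3. Simulate: roll out with the given default policy.
        state = node.state
        for _ in range(rollout_depth):
            state, reward = step(state, rollout_policy(state))
            ret += reward
        # 4. Backpropagate the return up to the root.
        while node is not None:
            node.visits += 1
            node.value += ret
            node = node.parent
    # Recommend the most visited Option at the root.
    return max(root.children, key=lambda o: root.children[o].visits)


def toy_step(state: int, option: str) -> Tuple[int, float]:
    """Hypothetical model: only KeepLane moves the ego forward and earns reward."""
    return (state + 1, 1.0) if option == "KeepLane" else (state, 0.0)


best = mcts(0, toy_step, rollout_policy=lambda state: "Wait")
print(best)  # KeepLane
```

Recommending the most visited root child, rather than the child with the highest mean return, is a common choice because visit counts are less sensitive to a few lucky rollouts.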

  17. Concluding Remarks ‣ Features: Options / Reinforcement Learning / Runtime Verification / Monte Carlo Tree Search (MCTS) ‣ The results are reproducible using the publicly available code at git.uwaterloo.ca/wise-lab/wise-move/ ‣ Future work: ✓ Comparisons of RL and hand-coded motion planners. ✓ Different scenarios, realistic vehicle dynamics, etc. ✓ Simulation-to-Real

  18. Thank you for your attention! Q & A. Acknowledgment: This work is supported by the Japan Science and Technology Agency (JST) ERATO project JPMJER1603: HASUO Metamathematics for Systems Design, and by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant: Model-Based Synthesis and Safety Assurance of Intelligent Controllers for Autonomous Vehicles.
