  1. Query-Efficient Imitation Learning for End-to-End Simulated Driving
     Jiakai Zhang, Kyunghyun Cho
     New York University

  2. Overview
     Introduction
       • End-to-end learning for self-driving
       • Related work
     Learning method
       • Convolutional neural network
       • Imitation learning using SafeDAgger
     Experiment
       • Setup
       • Results
     Conclusion and future work

  3. Introduction
     End-to-end learning for self-driving
       • Input: sensory data from a front-facing camera
       • Output: control signals (steering, brake)

  4. Introduction
     Related work
       • Supervised learning
           – ALVINN net [Pomerleau 1989]
           – DeepDriving [Chen et al. 2015]
           – End-to-end learning for self-driving cars [Bojarski et al. 2016]
       • Imitation learning
           – DAgger [Ross, Gordon, and Bagnell 2010]
           – SafeDAgger [Zhang and Cho 2017]

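  5. DAgger algorithm
     Initialize: dataset $D_0$, policy $\hat{\pi}_1$
     Iteration $i$:
       • Mixed policy $\pi_i = \beta_i \pi^* + (1 - \beta_i)\hat{\pi}_i$
       • Run $\pi_i$ to collect dataset $D'$, labeled by the reference policy $\pi^*$
       • Aggregate: $D_i = D' \cup D_{i-1}$
       • Train the next policy $\hat{\pi}_{i+1}$ on $D_i$
     Return: the best policy $\hat{\pi}_i$
     Disadvantages:
       • The reference policy must be queried constantly
       • Safety issues: the learned policy may act unsafely in the environment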
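
The DAgger loop above can be sketched in Python on a toy problem. Everything concrete here is an assumption for illustration, not from the slides: a hypothetical one-dimensional lane-keeping world (state = lateral offset), a linear `reference_policy` standing in for $\pi^*$, a least-squares learner, and the schedule $\beta_i = p^{i-1}$. For brevity it returns the last learner instead of selecting the best one, and it counts expert queries to make the first disadvantage concrete.

```python
import random


def reference_policy(offset):
    """Hypothetical expert pi*: steer back toward the lane centre."""
    return -0.5 * offset


def step(offset, steer, rng):
    """Toy dynamics (assumption): steering shifts the offset, plus noise."""
    return offset + steer + rng.gauss(0.0, 0.05)


def fit_linear(dataset):
    """Least-squares fit of steer = w * offset, returned as a policy."""
    num = sum(x * y for x, y in dataset)
    den = sum(x * x for x, _ in dataset)
    w = num / den if den > 0 else 0.0

    def policy(offset):
        return w * offset

    return policy


def dagger(n_iters=5, horizon=50, p=0.5, seed=0):
    rng = random.Random(seed)
    dataset = []                        # D_0: empty aggregate dataset

    def learner(offset):                # initial learned policy pi_hat_1
        return 0.0

    queries = 0
    for i in range(1, n_iters + 1):
        beta = p ** (i - 1)             # assumed schedule: beta_1 = 1, then decays
        new_data = []                   # D' for this iteration
        offset = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            expert_action = reference_policy(offset)   # pi* queried at every state
            queries += 1
            new_data.append((offset, expert_action))
            # Mixed policy pi_i: follow pi* with probability beta, else the learner.
            action = expert_action if rng.random() < beta else learner(offset)
            offset = step(offset, action, rng)
        dataset = new_data + dataset    # D_i = D' ∪ D_{i-1}
        learner = fit_linear(dataset)   # train the next policy on D_i
    return learner, queries


learner, queries = dagger()
print(queries)        # 250: one expert query per visited state
print(learner(1.0))   # the fitted learner reproduces the linear expert
```

Every visited state costs one call to the reference policy, which is exactly the cost SafeDAgger tries to cut.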

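  6. SafeDAgger algorithm
     Initialize: dataset $D_0$, policy $\hat{\pi}_1$, safety classifier $c_1$
     Iteration $i$:
       • Drive with $\pi_i = \beta_i \pi^* + (1 - \beta_i)\hat{\pi}_i$ and collect dataset $D'$
       • Keep only the frames of $D'$ that the safety classifier $c_i$ marks as not safe
       • Aggregate: $D_i = D' \cup D_{i-1}$
       • Train the next policy $\hat{\pi}_{i+1}$ and safety classifier $c_{i+1}$ on $D_i$
     Return: the best policy $\hat{\pi}_i$ and its safety classifier $c_i$
     Advantages:
       • Query-efficient: the reference policy is queried only for frames deemed not safe
       • Safety feature: the safety classifier can hand control to the reference policy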
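
A sketch of one SafeDAgger-style collection pass in the same kind of hypothetical toy world. The `primary` policy, the `classifier`, and its 0.3 threshold are made-up stand-ins. For brevity this sketch drops the $\beta_i$ mixing and lets the reference policy drive whenever the classifier outputs 0; only those frames are labeled and kept in $D'$.

```python
import random


def reference_policy(offset):
    """Hypothetical expert pi*: steer back toward the lane centre."""
    return -0.5 * offset


def collect_safedagger(primary, safety_classifier, horizon, rng):
    """One collection pass in a toy 1-D lane-keeping world.

    safety_classifier(offset) returns 1 if the primary policy is predicted to be
    safe and 0 otherwise. Only states with output 0 are sent to the reference
    policy; those labeled states form D'.
    """
    new_data = []                                 # D'
    offset = rng.uniform(-1.0, 1.0)
    for _ in range(horizon):
        if safety_classifier(offset) == 0:
            action = reference_policy(offset)     # query pi*; it also takes control
            new_data.append((offset, action))
        else:
            action = primary(offset)              # predicted safe: no query needed
        offset = offset + action + rng.gauss(0.0, 0.05)   # toy dynamics
    return new_data


def primary(offset):
    """Hypothetical, slightly imperfect primary policy."""
    return -0.4 * offset


def classifier(offset):
    """Hypothetical safety classifier: not safe when far from the centre."""
    return 0 if abs(offset) > 0.3 else 1


rng = random.Random(0)
previous = []                                     # D_{i-1}
d_prime = collect_safedagger(primary, classifier, horizon=100, rng=rng)
dataset = d_prime + previous                      # D_i = D' ∪ D_{i-1}
print(len(d_prime), "queries out of 100 frames")
```

Compared with the DAgger sketch, the number of expert queries now depends on how often the classifier flags a state, not on the number of frames driven.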

  7. Safety classifier
       • Define the deviation of the primary policy from the reference policy
       • Define the optimal safety classifier in terms of that deviation
     Learning the safety classifier
       • Minimize a binary cross-entropy loss
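
A minimal sketch of these definitions, under stated assumptions: the deviation is taken to be the squared difference between the two policies' control outputs, the target label is 0 (not safe) when that deviation exceeds a hypothetical threshold `tau` and 1 (safe) otherwise, and the classifier's predicted probability of "safe" is scored with mean binary cross-entropy. The control pairs and `tau` value are invented for illustration.

```python
import math


def deviation(primary_action, reference_action):
    """Assumed deviation: squared L2 distance between control vectors."""
    return sum((p - r) ** 2 for p, r in zip(primary_action, reference_action))


def safety_target(primary_action, reference_action, tau):
    """Target label: 1 (safe) if the deviation is at most tau, else 0 (not safe)."""
    return 0 if deviation(primary_action, reference_action) > tau else 1


def binary_cross_entropy(targets, probs, eps=1e-7):
    """Mean binary cross-entropy between 0/1 targets and predicted P(safe)."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)           # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)


# Hypothetical (steering, brake) outputs for two frames and a made-up threshold.
pairs = [((0.10, 0.0), (0.12, 0.0)),    # small deviation
         ((0.50, 0.0), (-0.20, 1.0))]   # large deviation
tau = 0.0025
targets = [safety_target(p, r, tau) for p, r in pairs]   # [1, 0]
loss = binary_cross_entropy(targets, [0.9, 0.2])         # classifier's P(safe)
print(targets, round(loss, 4))
```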

  8. Experiment – Setup
     TORCS: open-source racing game
     [Figure: training tracks and test tracks]

  9. Experiment – Model
     Input image: 3×160×72
     Convolutional layers: 64 filters of 3×3, ×4
     Max pooling: 2×2
     Convolutional layer: 128 filters of 5×5
     Feature map, feeding two heads:
       • Primary policy: 2 fully connected layers → control signals and environment variables
       • Safety classifier: 2 fully connected layers → safety value
     Optimization algorithm: stochastic gradient descent
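
A shape walk-through of the convolutional trunk listed above. Stride and padding are not given on the slide, so this sketch assumes stride 1, either no padding ("valid") or padding of `kernel // 2` ("same"), and a single 2×2 pooling between the 3×3 and 5×5 stages as the slide order suggests. The actual sizes in the trained model may differ.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Length of a convolution output along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2):
    """Length of a non-overlapping max-pooling output (floor division)."""
    return size // window


def feature_map_shape(width=160, height=72, same=False):
    """Trace (channels, width, height) through the assumed convolutional trunk.

    Assumes stride 1 everywhere; same=True pads each conv by kernel // 2.
    """
    dims = [width, height]
    for _ in range(4):                          # four 3x3 conv layers, 64 filters
        dims = [conv_out(d, 3, padding=1 if same else 0) for d in dims]
    dims = [pool_out(d) for d in dims]          # one 2x2 max pooling
    dims = [conv_out(d, 5, padding=2 if same else 0) for d in dims]  # 128 5x5 filters
    return (128, dims[0], dims[1])


valid = feature_map_shape()             # (128, 72, 28)
padded = feature_map_shape(same=True)   # (128, 80, 36)
print(valid, valid[0] * valid[1] * valid[2])   # flattened size if fed to the FC heads
```

The padding choice alone changes the flattened feature size noticeably, which in turn sets the input width of the first fully connected layer in each head.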

  10. Results
     [Figure: example safe frames and unsafe frames]

  11. Results
     Evaluation on test tracks
       1. Mean squared error of steering angle
       2. Damage per lap
       3. Number of laps
       4. Portion of time driven by the reference policy
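
These metrics reduce to simple averages over a driving log. A sketch, assuming a hypothetical per-frame record (`Frame` and its field names are invented here), that steering error is measured against the reference policy's steering, and that total damage and completed laps are reported by the simulator.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One logged frame (hypothetical log format)."""
    predicted_steer: float   # steering output of the primary policy
    reference_steer: float   # steering of the reference policy
    c_safe: int              # safety classifier output: 1 safe, 0 not safe


def steering_mse(frames):
    """Metric 1: mean squared error of the steering angle."""
    return sum((f.predicted_steer - f.reference_steer) ** 2 for f in frames) / len(frames)


def damage_per_lap(total_damage, laps):
    """Metric 2: accumulated damage divided by completed laps (metric 3)."""
    if laps <= 0:
        raise ValueError("damage per lap is undefined when no lap was completed")
    return total_damage / laps


def reference_share(frames):
    """Metric 4: portion of frames driven by the reference policy (c_safe == 0)."""
    return sum(1 for f in frames if f.c_safe == 0) / len(frames)


frames = [Frame(0.1, 0.1, 1), Frame(0.3, 0.1, 0),
          Frame(-0.2, 0.0, 1), Frame(0.0, 0.0, 1)]
print(round(steering_mse(frames), 4))      # 0.02
print(damage_per_lap(1200.0, laps=3))      # 400.0
print(reference_share(frames))             # 0.25
```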

  12. Results – Mean squared error of steering angle
     [Plot: MSE of steering angle vs. number of DAgger iterations; dashed curve with traffic, solid curve without traffic]

  13. Results – Damage per lap
     [Plot: damage per lap vs. number of DAgger iterations; dashed curve with traffic, solid curve without traffic]

  14. Results – Number of laps
     [Plot: average number of laps vs. number of DAgger iterations; dashed curve with traffic, solid curve without traffic]

  15. Results – Portion of time driven by the reference policy
     [Plot: percentage of frames with $c_{\text{safe}} = 0$ vs. number of DAgger iterations; dashed curve with traffic, solid curve without traffic]

  16. Demo

  17. Conclusion
     Proposed the SafeDAgger algorithm
       • Query-efficient
       • Safety feature
     End-to-end simulated driving
       • Trained a convolutional neural network to drive in TORCS with traffic
     Future work
       • Evaluate SafeDAgger in the real world
       • Learn to use temporal information
