DeepTraffic: Driving Fast through Dense Traffic with Deep Reinforcement Learning
Lex Fridman, fridman@mit.edu, GTC 2017, May 11


  1. DeepTraffic: Driving Fast through Dense Traffic with Deep Reinforcement Learning. Lex Fridman, fridman@mit.edu, GTC 2017, May 11

  2. DeepTraffic: Driving Fast through Dense Traffic with Deep Reinforcement Learning. Lex Fridman, fridman@mit.edu, GTC 2017, May 11

  3. Americans spend 8 billion hours stuck in traffic every year.

  4. Goal: Deep Learning for Everyone. Accessible and fun: seconds to start, eternity* to master. http://cars.mit.edu or search for "DeepTraffic". (* estimated time to discover the globally optimal solution)

  5. Goal: Deep Learning for Everyone. To Play / To Win (slide screenshots of the DeepTraffic interface).

  6. Machine Learning from Human and Machine (slide diagram: Memorization vs. Understanding).

  7. http://cars.mit.edu/deeptesla

  8. Naturalistic Driving Data. Teslas instrumented: 18. Hours of data: 6,000+. Distance traveled: 140,000+ miles. Video frames: 2+ billion. Autopilot: ~12%.

  9. Naturalistic Driving Data.

  10. http://cars.mit.edu/deeptesla

  11. • Localization and Mapping: Where am I? • Scene Understanding: Where/who/what/why of everyone else? • Movement Planning: How do I get from A to B? • Driver State: What is the driver up to? • Communication: How do I convey intent to the driver and to the world?

  12. Autonomous Driving: A Hierarchical View. Reference: Paden B., Čáp M., Yong S. Z., Yershov D., Frazzoli E. "A Survey of Motion Planning and Control Techniques for Self-Driving Urban Vehicles." IEEE Transactions on Intelligent Vehicles 1.1 (2016): 33-55.

  13. Applying Deep Reinforcement Learning to Micro-Traffic Simulation. Reference: http://www.traffic-simulation.de

  14. Formulating Driving as a Reinforcement Learning Problem: how do we formalize and learn driving?
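One way to make this concrete is to write the state, actions, and reward down as code. The sketch below is illustrative only: the grid shape, action names, and speed-based reward are assumptions in the spirit of DeepTraffic, not its actual in-browser implementation.

```python
from dataclasses import dataclass
import numpy as np

# Hypothetical action set for a lane-driving micro-traffic agent.
ACTIONS = ["no-op", "accelerate", "decelerate", "lane-left", "lane-right"]

@dataclass
class TrafficState:
    # Occupancy/speed grid of road patches around the ego car, one value per
    # (lane, patch) cell -- a coarse stand-in for the full traffic scene.
    grid: np.ndarray        # shape: (num_lanes, num_patches)
    ego_speed_mph: float

def reward(state: TrafficState, speed_limit_mph: float = 80.0) -> float:
    # Assumed reward: the faster the ego car drives, the better (normalized to [0, 1]).
    return state.ego_speed_mph / speed_limit_mph
```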

  15. Philosophical Motivation for Reinforcement Learning. Takeaway from Supervised Learning: neural networks are great at memorization and not (yet) great at reasoning. Hope for Reinforcement Learning: brute-force propagation of outcomes to knowledge about states and actions. This is a kind of brute-force "reasoning".

  16. (Deep) Reinforcement Learning. Pros: • Cheap: very little human annotation is needed. • Robust: can learn to act under uncertainty. • General: can (seemingly) deal with (huge) raw sensory input. • Promising: our current best framework for achieving "intelligence". Cons: • Constrained by formalism: have to formally define the state space, the action space, the reward, and the simulated environment. • Huge data: have to be able to simulate (in software or hardware) or have a lot of real-world examples.

  17. Agent and Environment. At each step the agent: • executes an action • receives an observation (the new state) • receives a reward. The environment: • receives the action • emits an observation (the new state) • emits a reward. Reference: [80]
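The interaction loop on slide 17 can be written directly as code. The sketch below assumes a Gym-style `reset()`/`step(action)` interface; the `DummyEnv` is a placeholder invented here so the loop is runnable, not a traffic simulator from the slides.

```python
import random

class DummyEnv:
    """Placeholder environment so the loop below is runnable; not a traffic simulator."""
    def reset(self):
        self.t = 0
        return 0.0                                   # trivial scalar state
    def step(self, action):
        self.t += 1
        reward = random.random()                     # stand-in reward signal
        done = self.t >= 10                          # terminal state after 10 steps
        return 0.0, reward, done

def run_episode(env, policy, max_steps=1000):
    """Generic agent-environment interaction loop."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                       # agent executes an action
        state, reward, done = env.step(action)       # environment emits new state and reward
        total_reward += reward
        if done:
            break
    return total_reward

print(run_episode(DummyEnv(), lambda s: "no-op"))
```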

  18. Markov Decision Process. An episode is a sequence of states, actions, and rewards: s_0, a_0, r_1, s_1, a_1, r_2, ..., s_{n-1}, a_{n-1}, r_n, s_n, where s_n is the terminal state. Reference: [84]

  19. Major Components of an RL Agent. An RL agent may include one or more of these components: • Policy: the agent's behavior function • Value function: how good is each state and/or action • Model: the agent's representation of the environment.

  20. Robot in a Room. Actions: UP, DOWN, LEFT, RIGHT. Actions are stochastic: e.g. choosing UP moves up 80% of the time, left 10%, and right 10%. Reward: +1 at [4,3], -1 at [4,2], and -0.04 for each step; the robot begins at the START cell. What's the strategy to achieve maximum reward? What if the actions were deterministic? (A value-iteration sketch of this grid world appears after slide 27 below.)

  21. Is this a solution? • Only if actions are deterministic; not in this case (actions are stochastic). • A solution/policy is a mapping from each state to an action.

  22. Optimal policy (grid diagram with the +1 and -1 terminal cells).

  23. Reward for each step: -2 (resulting policy diagram).

  24. Reward for each step: -0.1 (resulting policy diagram).

  25. Reward for each step: -0.04 (resulting policy diagram).

  26. Reward for each step: -0.01 (resulting policy diagram).

  27. Reward for each step: +0.01 (resulting policy diagram).
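To tie slides 20-27 together, here is a hedged value-iteration sketch of the grid world. The blocked cell at [2,2] and the START at [1,1] are assumptions carried over from the classic textbook version of this example; the slides themselves only state the terminal rewards, the per-step reward, and the 80/10/10 action noise.

```python
# Value iteration for the "Robot in a Room" grid of slides 20-27.
COLS, ROWS = 4, 3
TERMINALS = {(4, 3): +1.0, (4, 2): -1.0}    # reward +1 at [4,3], -1 at [4,2]
BLOCKED = {(2, 2)}                           # assumed obstacle cell (textbook layout)
ACTIONS = {"UP": (0, 1), "DOWN": (0, -1), "LEFT": (-1, 0), "RIGHT": (1, 0)}
# Each action succeeds 80% of the time and slips to a perpendicular direction 10%/10%.
SLIPS = {"UP": ("LEFT", "RIGHT"), "DOWN": ("LEFT", "RIGHT"),
         "LEFT": ("UP", "DOWN"), "RIGHT": ("UP", "DOWN")}

def move(cell, action):
    """Deterministic effect of one direction; bumping into a wall keeps you in place."""
    dc, dr = ACTIONS[action]
    nxt = (cell[0] + dc, cell[1] + dr)
    if nxt in BLOCKED or not (1 <= nxt[0] <= COLS and 1 <= nxt[1] <= ROWS):
        return cell
    return nxt

def expected_value(V, cell, action):
    """Expected next-state value under the 80/10/10 transition noise."""
    s1, s2 = SLIPS[action]
    return 0.8 * V[move(cell, action)] + 0.1 * V[move(cell, s1)] + 0.1 * V[move(cell, s2)]

def value_iteration(step_reward=-0.04, gamma=1.0, iters=200):
    cells = [(c, r) for c in range(1, COLS + 1) for r in range(1, ROWS + 1)
             if (c, r) not in BLOCKED]
    V = {cell: TERMINALS.get(cell, 0.0) for cell in cells}
    for _ in range(iters):
        for cell in cells:
            if cell in TERMINALS:
                continue
            V[cell] = step_reward + gamma * max(expected_value(V, cell, a) for a in ACTIONS)
    policy = {cell: max(ACTIONS, key=lambda a: expected_value(V, cell, a))
              for cell in cells if cell not in TERMINALS}
    return V, policy

# Re-running with step_reward in {-2, -0.1, -0.04, -0.01, +0.01} qualitatively
# reproduces the policy changes shown on slides 22-27.
V, pi = value_iteration(step_reward=-0.04)
print(pi[(1, 1)])   # greedy action at the assumed START cell [1,1]
```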

  28. Value Function. Future reward: R = r_1 + r_2 + r_3 + ... + r_n, and from step t onward R_t = r_t + r_{t+1} + r_{t+2} + ... + r_n. Discounted future reward (since the environment is stochastic): R_t = r_t + γ r_{t+1} + γ^2 r_{t+2} + ... + γ^{n-t} r_n = r_t + γ(r_{t+1} + γ(r_{t+2} + ...)) = r_t + γ R_{t+1}. A good strategy for an agent is to always choose an action that maximizes the (discounted) future reward. Reference: [84]
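The recursion R_t = r_t + γ R_{t+1} from slide 28 can be computed with a single backward pass over an episode's rewards; a minimal sketch:

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute R_t = r_t + gamma * R_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):     # walk the episode backwards
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Example: three steps of reward.
print(discounted_returns([1.0, 0.0, 2.0], gamma=0.9))   # [2.62, 1.8, 2.0]
```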

  29. Q-Learning. State-action value function Q^π(s,a): the expected return when starting in state s, performing action a, and following policy π thereafter. Q-Learning: use any policy to estimate a Q that maximizes future reward, via the update Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') - Q(s,a)], where α is the learning rate, γ the discount factor, s the old state, s' the new state, and r the reward. • Q directly approximates Q* (Bellman optimality equation) • Independent of the policy being followed • Only requirement: keep updating each (s,a) pair.
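The update rule on slide 29 in code; the Q-table is a dict keyed by (state, action), and the state/action names in the example are generic placeholders rather than anything from the slides.

```python
from collections import defaultdict

Q = defaultdict(float)   # Q-table: (state, action) -> estimated discounted return

def q_update(state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next                 # Bellman optimality target
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])

# Example update: moving from state "s1" to "s2" with reward +1.
q_update("s1", "right", 1.0, "s2", actions=["left", "right"])
print(Q[("s1", "right")])   # 0.1 after the first update (all other entries still 0)
```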

  30. Exploration vs Exploitation. • Key ingredient of reinforcement learning. • A deterministic/greedy policy won't explore all actions: we don't know anything about the environment at the beginning, and need to try all actions to find the optimal one. • Maintain exploration by using soft policies instead: π(s,a) > 0 for all (s,a). • ε-greedy policy: with probability 1-ε perform the optimal/greedy action, with probability ε perform a random action. This keeps exploring the environment; slowly move toward the greedy policy by annealing ε → 0.
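A minimal sketch of the ε-greedy rule from slide 30, with a simple multiplicative decay toward the greedy policy; `Q` and the action list are assumed to exist as in the Q-learning sketch above.

```python
import random

def epsilon_greedy(Q, state, actions, epsilon):
    """With probability epsilon take a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)                    # explore
    return max(actions, key=lambda a: Q[(state, a)])     # exploit

# Slowly anneal epsilon toward 0 over training so the policy becomes greedy.
epsilon, decay = 1.0, 0.999
for episode in range(10_000):
    epsilon *= decay
    # ... run the episode, selecting actions with epsilon_greedy(Q, s, actions, epsilon) ...
```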

  31. Q-Learning: Value Iteration. Example Q-table (rows = states, columns = actions):
          A1   A2   A3   A4
      S1  +1   +2   -1    0
      S2  +2    0   +1   -2
      S3  -1   +1    0   -2
      S4  -2    0   +1   +1
      Reference: [84]
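The table on slide 31 maps directly to a greedy policy: for each state (row), pick the action (column) with the highest Q-value. A minimal check using the slide's numbers:

```python
import numpy as np

# Q-values from slide 31: rows S1..S4, columns A1..A4.
Q_table = np.array([[+1, +2, -1,  0],
                    [+2,  0, +1, -2],
                    [-1, +1,  0, -2],
                    [-2,  0, +1, +1]])

greedy_actions = Q_table.argmax(axis=1) + 1   # 1-based action indices
for s, a in enumerate(greedy_actions, start=1):
    print(f"S{s}: take A{a}")
# S1: take A2, S2: take A1, S3: take A2, S4: take A3 (argmax breaks the S4 tie toward A3)
```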
