DeepTraffic: Driving Fast through Dense Traffic with Deep Reinforcement Learning

Lex Fridman (fridman@mit.edu)
GTC 2017, May 11


SLIDE 1

DeepTraffic: Driving Fast through Dense Traffic with Deep Reinforcement Learning

Lex Fridman

SLIDE 2

SLIDE 3

Americans spend 8 billion hours stuck in traffic every year.

SLIDE 4

Goal:

Deep Learning for Everyone

accessible and fun: seconds to start, eternity* to master

http://cars.mit.edu

  • or search for: “DeepTraffic”

* estimated time to discover globally optimal solution

SLIDE 5

To Play: To Win:

Goal:

Deep Learning for Everyone

SLIDE 6

Machine Learning from Human and Machine

Memorization Understanding

SLIDE 7

http://cars.mit.edu/deeptesla

SLIDE 8

Naturalistic Driving Data

  • Teslas instrumented: 18
  • Hours of data: 6,000+
  • Distance traveled: 140,000+ miles
  • Video frames: 2+ billion
  • Autopilot: ~12%

SLIDE 9

Naturalistic Driving Data

SLIDE 10

http://cars.mit.edu/deeptesla

SLIDE 11

  • Localization and Mapping:

Where am I?

  • Scene Understanding:

Where/who/what/why of everyone else?

  • Movement Planning:

How do I get from A to B?

  • Driver State:

What’s the driver up to?

  • Communicate:

How do I convey intent to the driver and to the world?

SLIDE 12

Autonomous Driving: A Hierarchical View

Paden B, Čáp M, Yong SZ, Yershov D, Frazzoli E. "A Survey of Motion Planning and Control Techniques for Self- driving Urban Vehicles." IEEE Transactions on Intelligent Vehicles 1.1 (2016): 33-55.

SLIDE 13

Applying Deep Reinforcement Learning to Micro-Traffic Simulation

Reference: http://www.traffic-simulation.de

SLIDE 14

Formulate Driving as Reinforcement Learning Problem

How to formalize and learn driving?

SLIDE 15

Philosophical Motivation for Reinforcement Learning

Takeaway from Supervised Learning: Neural networks are great at memorization and not (yet) great at reasoning.

Hope for Reinforcement Learning: Brute-force propagation of outcomes to knowledge about states and actions. This is a kind of brute-force “reasoning”.
SLIDE 16

(Deep) Reinforcement Learning

  • Pros:
  • Cheap: Very little human annotation is needed.
  • Robust: Can learn to act under uncertainty.
  • General: Can (seemingly) deal with (huge) raw sensory input.
  • Promising: Our current best framework for achieving “intelligence”.

  • Cons:
  • Constrained by Formalism: Have to formally define the state space, the action space, the reward, and the simulated environment.
  • Huge Data: Have to be able to simulate (in software or hardware) or have a lot of real-world examples.

SLIDE 17

Agent and Environment

  • At each step the agent:
  • Executes action
  • Receives observation (new state)
  • Receives reward
  • The environment:
  • Receives action
  • Emits observation (new state)
  • Emits reward

References: [80]

SLIDE 18

Markov Decision Process

$s_0, a_0, r_1, s_1, a_1, r_2, \ldots, s_{n-1}, a_{n-1}, r_n, s_n$

(state $s_t$, action $a_t$, reward $r_t$; $s_n$ is the terminal state)

References: [84]

SLIDE 19

Major Components of an RL Agent

An RL agent may include one or more of these components:

  • Policy: agent’s behavior function
  • Value function: how good is each state and/or action
  • Model: agent’s representation of the environment

$s_0, a_0, r_1, s_1, a_1, r_2, \ldots, s_{n-1}, a_{n-1}, r_n, s_n$

(state $s_t$, action $a_t$, reward $r_t$; $s_n$ is the terminal state)

SLIDE 20

Robot in a Room

(grid-world figure: agent starts at START; one terminal cell gives +1, another −1)

actions: UP, DOWN, LEFT, RIGHT
taking UP: 80% move UP, 10% move LEFT, 10% move RIGHT

  • reward +1 at [4,3], -1 at [4,2]
  • reward -0.04 for each step
  • what’s the strategy to achieve max reward?
  • what if the actions were deterministic?
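The "max reward" question above can be answered mechanically with value iteration. A minimal Python sketch, assuming the classic 4×3 layout of this grid (wall at [2,2], coordinates [column, row] with START at the bottom-left); all names here are illustrative:

```python
# Value iteration for the 4x3 grid world above.
# Assumed (classic textbook layout, not stated on the slide): a wall at [2,2],
# coordinates are [column, row] with [1,1] = START at the bottom-left.

ACTIONS = {"UP": (0, 1), "DOWN": (0, -1), "LEFT": (-1, 0), "RIGHT": (1, 0)}
# Perpendicular "slip" directions for each intended action (the 10%/10% cases).
SLIPS = {"UP": ("LEFT", "RIGHT"), "DOWN": ("LEFT", "RIGHT"),
         "LEFT": ("UP", "DOWN"), "RIGHT": ("UP", "DOWN")}

TERMINALS = {(4, 3): +1.0, (4, 2): -1.0}
WALL = (2, 2)
STATES = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) != WALL]
STEP_REWARD = -0.04

def move(state, action):
    """Deterministic move; bumping into the wall or the edge stays put."""
    nx, ny = state[0] + ACTIONS[action][0], state[1] + ACTIONS[action][1]
    return (nx, ny) if (nx, ny) in STATES else state

def q_value(V, state, action, gamma=1.0):
    """Expected value of an action: 80% intended, 10% each perpendicular."""
    left, right = SLIPS[action]
    outcomes = [(0.8, move(state, action)),
                (0.1, move(state, left)),
                (0.1, move(state, right))]
    return sum(p * (STEP_REWARD + gamma * V[s2]) for p, s2 in outcomes)

def value_iteration(gamma=1.0, eps=1e-6):
    V = {s: 0.0 for s in STATES}
    for s, r in TERMINALS.items():
        V[s] = r
    while True:
        delta = 0.0
        for s in STATES:
            if s in TERMINALS:
                continue
            best = max(q_value(V, s, a, gamma) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

V = value_iteration()
policy = {s: max(ACTIONS, key=lambda a: q_value(V, s, a))
          for s in STATES if s not in TERMINALS}
# e.g. the cell just left of the +1 terminal should head RIGHT
```

Making the actions deterministic (the second question) amounts to replacing the three weighted outcomes with the single intended move.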
SLIDE 21

Is this a solution?

  • only if actions deterministic
  • not in this case (actions are stochastic)
  • solution/policy
  • mapping from each state to an action
SLIDE 22

Optimal policy

SLIDE 23

Reward for each step: -2

SLIDE 24

Reward for each step: -0.1

SLIDE 25

Reward for each step: -0.04

SLIDE 26

Reward for each step: -0.01

SLIDE 27

Reward for each step: +0.01

SLIDE 28

Value Function

  • Future reward

$R = r_1 + r_2 + r_3 + \cdots + r_n$

$R_t = r_t + r_{t+1} + r_{t+2} + \cdots + r_n$

  • Discounted future reward (environment is stochastic)

$R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots + \gamma^{n-t} r_n = r_t + \gamma (r_{t+1} + \gamma (r_{t+2} + \cdots)) = r_t + \gamma R_{t+1}$

  • A good strategy for an agent would be to always choose an action that maximizes the (discounted) future reward

References: [84]
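The recursion R_t = r_t + γR_{t+1} means all returns of an episode can be computed in one backward pass; a minimal sketch (function name illustrative):

```python
def discounted_returns(rewards, gamma=0.9):
    """Compute R_t = r_t + gamma * R_{t+1} for every step, in one backward pass."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# With gamma = 0.5: R_2 = 4, R_1 = 2 + 0.5*4 = 4, R_0 = 1 + 0.5*4 = 3
print(discounted_returns([1.0, 2.0, 4.0], gamma=0.5))  # [3.0, 4.0, 4.0]
```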

SLIDE 29

Q-Learning

  • State-action value function: Q(s,a)
  • Expected return when starting in s, performing a, and following policy π
  • Q-Learning: Use any policy to estimate Q that maximizes future reward:

$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$

where s is the old state, s' the new state, r the reward, α the learning rate, and γ the discount factor.

  • Q directly approximates Q* (Bellman optimality equation)
  • Independent of the policy being followed
  • Only requirement: keep updating each (s,a) pair
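The Q-learning update is a one-liner in tabular form; a sketch on a made-up two-state toy (the states, actions, and transition are illustrative, not DeepTraffic):

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Toy example: two states, two actions, all Q-values start at 0.
Q = defaultdict(float)
actions = ["stay", "go"]
q_update(Q, "A", "go", 1.0, "B", actions, alpha=0.5, gamma=0.9)
# Q[("A", "go")] = 0 + 0.5 * (1.0 + 0.9*0 - 0) = 0.5
```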

SLIDE 30

Exploration vs Exploitation

  • Key ingredient of Reinforcement Learning
  • Deterministic/greedy policy won’t explore all actions
  • Don’t know anything about the environment at the beginning
  • Need to try all actions to find the optimal one
  • Maintain exploration
  • Use soft policies instead: π(s,a) > 0 (for all s,a)
  • ε-greedy policy
  • With probability 1-ε perform the optimal/greedy action
  • With probability ε perform a random action
  • Will keep exploring the environment
  • Slowly move it towards greedy policy: ε -> 0
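The ε-greedy policy above fits in a few lines; a sketch (function name and the annealing schedule in the comment are illustrative):

```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Annealing epsilon toward 0 slowly shifts from exploration to the greedy
# policy, e.g.: epsilon = max(0.05, 1.0 - step / 10000)
```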
SLIDE 31

Q-Learning: Value Iteration

References: [84]

       A1    A2    A3    A4
S1     +1    +2    −1
S2     +2    +1    −2
S3     −1    +1    −2
S4     −2    +1    +1

SLIDE 32

Q-Learning: Representation Matters

  • In practice, Value Iteration is impractical
  • Very limited states/actions
  • Cannot generalize to unobserved states
  • Think about the Breakout game
  • State: screen pixels
  • Image size: 84 × 84 (resized)
  • Consecutive 4 images
  • Grayscale with 256 gray levels

$256^{84 \times 84 \times 4}$ rows in the Q-table!

References: [83, 84]
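Assuming the standard DQN preprocessing (four stacked 84×84 frames, 256 gray levels), the impracticality of a table is easy to sanity-check with big-integer arithmetic:

```python
# Number of distinct states: 256 gray levels per pixel,
# 84*84 pixels per frame, 4 stacked frames.
n_states = 256 ** (84 * 84 * 4)
print(len(str(n_states)))  # the row count has tens of thousands of decimal digits
```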

SLIDE 33

Philosophical Motivation for Deep Reinforcement Learning

Takeaway from Supervised Learning: Neural networks are great at memorization and not (yet) great at reasoning.

Hope for Reinforcement Learning: Brute-force propagation of outcomes to knowledge about states and actions. This is a kind of brute-force “reasoning”.

Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable learning of the optimal thing to do given a formalized set of actions and states (possibly huge).

SLIDE 34

Deep Q-Learning

Use a function (with parameters) to approximate the Q-function

  • Linear
  • Non-linear: Q-Network

References: [83]

SLIDE 35

Deep Q-Network: Atari

Mnih et al. "Playing Atari with Deep Reinforcement Learning." 2013.

References: [83]

SLIDE 36

Atari Breakout

References: [85]

After 10 Minutes of Training
After 120 Minutes of Training
After 240 Minutes of Training
SLIDE 37

DQN Results in Atari

References: [83]

SLIDE 38

Deep Q-Network: DeepTraffic

SLIDE 39

Deep Q-Network Training

Given a transition < s, a, r, s’ >, the Q-table update rule in the previous algorithm must be replaced with the following:

  1. Do a feedforward pass for the current state s to get predicted Q-values for all actions.
  2. Do a feedforward pass for the next state s’ and calculate the maximum over the network outputs, max_a’ Q(s’, a’).
  3. Set the Q-value target for the taken action to r + γ max_a’ Q(s’, a’) (use the max calculated in step 2).
  4. For all other actions, set the Q-value target to the value originally returned from step 1, making the error 0 for those outputs.
  5. Update the weights using backpropagation.

References: [83]
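The target-construction steps above can be sketched for a single transition; the Q-value lists here are hypothetical stand-ins for the network's feedforward passes:

```python
def dqn_target(q_s, q_s_next, action, reward, gamma=0.95, terminal=False):
    """Build the per-output training target for one transition <s, a, r, s'>.

    q_s:      Q-values from the feedforward pass on s   (step 1)
    q_s_next: Q-values from the feedforward pass on s'  (step 2)
    """
    target = list(q_s)                  # other actions keep their predicted
                                        # values, so their error is 0
    if terminal:
        target[action] = reward
    else:
        target[action] = reward + gamma * max(q_s_next)  # r + gamma*max Q(s',a')
    return target                       # regress the network onto this target

t = dqn_target(q_s=[0.5, 1.0, -0.2], q_s_next=[0.1, 2.0, 0.3],
               action=0, reward=1.0, gamma=0.5)
# only the taken action's target changes: t == [2.0, 1.0, -0.2]
```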

SLIDE 40

Philosophical Motivation for Deep Reinforcement Learning

Takeaway from Supervised Learning: Neural networks are great at memorization and not (yet) great at reasoning.

Hope for Reinforcement Learning: Brute-force propagation of outcomes to knowledge about states and actions. This is a kind of brute-force “reasoning”.

Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable learning of the optimal thing to do given a formalized set of actions and states (possibly huge in size).

SLIDE 41

Driving may need more than SLAM, Perception, and Control

References: (Karaman RRT*)

SLIDE 42

Moravec’s Paradox: The “Easy” Problems are Hard

Soccer is harder than Chess

References: [8, 9]

SLIDE 43

Formulate Driving as a Reinforcement Learning Problem

http://cars.mit.edu/deeptrafficjs

SLIDE 44

The Road, The Car, The Speed

SLIDE 45

The Road, The Car, The Speed

  • Solo motion planning subtasks
  • Longitudinal: speed
  • Lateral: lane choice
  • Vehicular interaction subtasks
  • Longitudinal: car-following
  • Lateral: lane changing
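For the car-following subtask, micro-traffic simulators in the style of the referenced traffic-simulation.de commonly use the Intelligent Driver Model (IDM); a sketch with typical textbook parameters, not DeepTraffic's actual dynamics:

```python
import math

def idm_acceleration(v, v_lead, gap, v0=30.0, T=1.5, a=1.0, b=1.5, s0=2.0):
    """Intelligent Driver Model: acceleration of the following car.

    v: own speed (m/s), v_lead: leader speed (m/s), gap: bumper-to-bumper gap (m).
    v0 desired speed, T time headway, a max acceleration, b comfortable braking,
    s0 minimum gap -- all typical IDM defaults, assumed here for illustration.
    """
    dv = v - v_lead                                   # approach rate
    s_star = s0 + v * T + v * dv / (2 * math.sqrt(a * b))  # desired gap
    return a * (1 - (v / v0) ** 4 - (s_star / max(gap, 0.1)) ** 2)
```

On a free road the car accelerates toward its desired speed; when tailgating a slower leader, the (s*/s)² term dominates and braking is strong.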
SLIDE 46

The Road, The Car, The Speed

SLIDE 47

“Safety System”: Motion and Control are Given

SLIDE 48

Learning the “Behavioral Layer” Task

SLIDE 49

Learning the “Behavioral Layer” Task

SLIDE 50

Action Space

SLIDE 51

Driving / Learning

SLIDE 52

Learning Input

SLIDE 53

Deep RL: Q-Function Learning Parameters

SLIDE 54

Deep RL: Layers

SLIDE 55

Deep RL: Output (Actions)

SLIDE 56

ConvNetJS: Options

SLIDE 57

Formulate Driving as a Reinforcement Learning Problem

http://cars.mit.edu/deeptrafficjs

SLIDE 58

Slides available at http://cars.mit.edu/gtc

SLIDE 59

OpenAI Gym: From JS to TensorFlow

  1. Formulate DeepTraffic as a reinforcement learning task.
  2. Use TensorFlow/Keras/PyTorch to train an agent.
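Step 1 amounts to exposing the simulation through the Gym-style interface (reset/step). A minimal hand-rolled skeleton; the class, lane/speed dynamics, and reward below are hypothetical placeholders, not the real DeepTraffic simulator:

```python
class ToyTrafficEnv:
    """Gym-style interface sketch: reset() -> state, step(a) -> (state, reward, done, info).

    Placeholder dynamics: one car on a 5-lane road,
    actions 0..4 = {noAction, accelerate, decelerate, goLeft, goRight}.
    """
    LANES, MAX_SPEED = 5, 80

    def reset(self):
        self.lane, self.speed, self.t = 2, 60, 0
        return (self.lane, self.speed)

    def step(self, action):
        if action == 1:
            self.speed = min(self.MAX_SPEED, self.speed + 1)
        elif action == 2:
            self.speed = max(0, self.speed - 1)
        elif action == 3:
            self.lane = max(0, self.lane - 1)
        elif action == 4:
            self.lane = min(self.LANES - 1, self.lane + 1)
        self.t += 1
        reward = self.speed / self.MAX_SPEED   # reward shaped around speed
        done = self.t >= 1000                  # fixed-length episode
        return (self.lane, self.speed), reward, done, {}

env = ToyTrafficEnv()
state = env.reset()
state, reward, done, _ = env.step(1)   # accelerate: speed 60 -> 61
```

An agent trained with TensorFlow/Keras/PyTorch (step 2) only ever touches this reset/step surface, which is what makes the simulator swappable.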
SLIDE 60

Formulate DeepTraffic as a Reinforcement Learning Task

SLIDE 61

Adding a Deep Q-Network (with Keras)

Example: https://github.com/matthiasplappert/keras-rl

SLIDE 62

Adding a Deep Q-Network (with Keras)

Example: https://github.com/matthiasplappert/keras-rl

SLIDE 63

http://cars.mit.edu

DeepTraffic

v1.0: In MIT
v1.1: Outside MIT

SLIDE 64

http://cars.mit.edu

DeepTraffic 2.0

1st place: Titan XP
2nd place: GeForce GTX 1080 Ti
3rd place: Jetson TX2

SLIDE 65

http://cars.mit.edu

DeepTraffic

Challenge to GTC Attendees:

  • Create an account on the site and put “GTC” as how you heard about us.
  • Make a neural network that travels 70+ mph.
SLIDE 66

Have fun with Deep RL and DeepTraffic!

SLIDE 67

Have fun with Deep RL and DeepTraffic! But not too much fun...

Slides available at http://cars.mit.edu/gtc