SLIDE 1

Query-Efficient Imitation Learning for End-to-End Simulated Driving

Jiakai Zhang, Kyunghyun Cho New York University

SLIDE 2
Overview

  • Introduction
    • End-to-end learning for self-driving
    • Related work
  • Learning method
    • Convolutional neural network
    • Imitation learning using SafeDAgger
  • Experiment
    • Setup
    • Results
  • Conclusion and future work

SLIDE 3

Introduction

  • End-to-end learning for self-driving
    • Sensory input from a front-facing camera
    • Control signals: steering, brake

SLIDE 4

Introduction

  • Related work
    • Supervised learning
      • ALVINN [Pomerleau 1989]
      • DeepDriving [Chen et al. 2015]
      • End-to-end learning for self-driving cars [Bojarski et al. 2016]
    • Imitation learning
      • DAgger [Ross, Gordon, and Bagnell 2010]
      • SafeDAgger [Zhang and Cho 2017]
SLIDE 5

DAgger algorithm

[Flowchart: initialize dataset E_0 and policy ρ_1; at iteration j, drive with the mixed policy ρ_j = γ_j ρ* + (1 − γ_j) ρ̂_j, where ρ* is the reference policy, ρ̂_j the current learned policy, and γ_j a decaying mixing coefficient; the reference policy labels every visited state to form a new dataset E′; aggregate E_j = E′ ∪ E_{j−1} and retrain; return the best policy ρ_j]

Disadvantages:

  • Queries the reference policy constantly (one label per visited state)
  • Safety issue: the partially trained policy can damage the environment during data collection
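The loop above can be sketched in a few lines of Python. Everything concrete here (the 1-D toy dynamics, the least-squares "training" step, the halving schedule for γ_j) is a hypothetical stand-in, not the paper's setup; the sketch only shows the structure: mix policies, query the reference on every visited state, aggregate, retrain.

```python
import random

def fit_linear(data):
    """Least-squares fit of action = k * state (toy 'training' step)."""
    num = sum(s * a for s, a in data)
    den = sum(s * s for s, _ in data)
    k = num / den if den else 0.0
    return lambda s: k * s

def dagger(expert, num_iters=3, episode_len=50, seed=0):
    """Minimal DAgger loop on a 1-D toy problem (illustrative only)."""
    rng = random.Random(seed)
    dataset = []                        # aggregated dataset E
    policy = lambda s: 0.0              # untrained primary policy rho_1
    for j in range(num_iters):
        gamma = 0.5 ** j                # decaying mixing coefficient gamma_j
        new_data = []
        state = 0.0
        for _ in range(episode_len):
            # Mixed policy rho_j: reference with prob. gamma, else primary.
            action = expert(state) if rng.random() < gamma else policy(state)
            # DAgger queries the reference on EVERY visited state.
            new_data.append((state, expert(state)))
            state = state + action + rng.uniform(-1.0, 1.0)  # toy dynamics
        dataset = dataset + new_data    # E_j = E' U E_{j-1}
        policy = fit_linear(dataset)    # retrain on the aggregate
    return policy

policy = dagger(expert=lambda s: -0.1 * s)
```

Because the labels are exactly −0.1·state, the fit recovers the expert; the point of the sketch is the cost: the reference is queried at every single step, which is what SafeDAgger reduces.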
SLIDE 6

SafeDAgger algorithm

[Flowchart: initialize dataset E_0, policy ρ_1, and safety classifier d_1; at iteration j, the primary policy ρ_j drives while the safety classifier d_j flags, per state, whether ρ_j is safe; the reference policy takes over and labels only the states flagged "not safe", producing dataset E′; aggregate E_j = E′ ∪ E_{j−1}, retrain ρ_j and d_j; return the best policy ρ_j]

Advantages:

  • Query-efficient: the reference policy is queried only in states flagged unsafe
  • Safety feature: the reference takes over before the primary policy fails
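The difference from DAgger is confined to the data-collection rollout, sketched below on the same hypothetical 1-D toy problem (the policies, the safety test, and the dynamics are all illustrative stand-ins): the reference is queried, and takes control, only where the safety classifier says "not safe".

```python
import random

def safedagger_rollout(primary, expert, is_safe, episode_len=100, seed=0):
    """One SafeDAgger data-collection episode (toy 1-D sketch).

    Unlike DAgger, the reference policy is queried only in states the
    safety classifier flags as unsafe -- and it takes over control there.
    """
    rng = random.Random(seed)
    new_data, queries = [], 0
    state = 0.0
    for _ in range(episode_len):
        if is_safe(state):
            action = primary(state)           # primary policy keeps driving
        else:
            action = expert(state)            # reference takes over ...
            new_data.append((state, action))  # ... and labels the state
            queries += 1
        state = state + action + rng.uniform(-0.5, 0.5)
    return new_data, queries

# Toy setup: the primary policy does nothing, the reference steers back
# toward 0, and states with |s| < 1 are deemed safe.
data, queries = safedagger_rollout(
    primary=lambda s: 0.0,
    expert=lambda s: -0.5 * s,
    is_safe=lambda s: abs(s) < 1.0,
)
```

By construction, every labeled state is one the classifier flagged as unsafe, and the query count is strictly below the episode length, whereas DAgger would have issued one query per step.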
SLIDE 7
  • Safety classifier
    • Deviation of the primary policy ρ from the reference policy ρ* on an observation φ: ε(ρ, φ) = ||ρ(φ) − ρ*(φ)||
    • Optimal safety classifier: d*(ρ, φ) = 1 (safe) if ε(ρ, φ) ≤ τ for a chosen threshold τ, and 0 (unsafe) otherwise
    • Learning the safety classifier: minimize a binary cross-entropy loss between the predicted safety value and these labels
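The two ingredients above fit in a few lines. This is a scalar-action sketch, not the paper's implementation: τ = 0.1 is an arbitrary illustrative threshold, and the function names are mine.

```python
import math

def safe_label(primary_action, reference_action, tau=0.1):
    """Optimal safety label: 1 (safe) iff the primary policy's action
    deviates from the reference's by at most tau."""
    return 1.0 if abs(primary_action - reference_action) <= tau else 0.0

def bce_loss(p, y):
    """Binary cross-entropy between the predicted safety probability p
    and the target label y, as minimized when training the classifier."""
    eps = 1e-12  # guard against log(0)
    return -(y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps))
```

For example, safe_label(0.02, 0.0) yields 1.0 (within τ), while safe_label(0.5, 0.0) yields 0.0; the loss then pushes the classifier's predicted probability toward those labels.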
SLIDE 8

Experiment – Setup

  • TORCS – Open source racing game

[Figure: maps of the training tracks and test tracks]

SLIDE 9

Experiment – Model

Primary policy:
  • Input image: 3x160x72
  • Convolutional layer: 64 filters, 3x3
  • Max pooling: 2x2
  • Convolutional layer: 128 filters, 5x5
  • Fully connected layer
  • Outputs: control signals and environment variables (x4, x2 in the diagram)

Safety classifier:
  • Feature map → fully connected layer → safety value (x2)

Optimization algorithm: stochastic gradient descent
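As a sanity check on the listed layer sizes, the feature-map dimensions can be traced through the stack. The slide does not state stride or padding, so unit stride and no padding are assumed here; the helper names are mine.

```python
def conv_out(h, w, k, stride=1, pad=0):
    """Spatial output size of a k x k convolution."""
    return ((h + 2 * pad - k) // stride + 1,
            (w + 2 * pad - k) // stride + 1)

def pool_out(h, w, k):
    """Output size of k x k max pooling with stride k."""
    return h // k, w // k

# Input image 3x160x72: treat the spatial part as 160 x 72.
h, w = 160, 72
h, w = conv_out(h, w, 3)   # 64 filters, 3x3  -> 158 x 70
h, w = pool_out(h, w, 2)   # 2x2 max pooling  -> 79 x 35
h, w = conv_out(h, w, 5)   # 128 filters, 5x5 -> 75 x 31
flat = 128 * h * w         # features entering the fully connected layer
```

Under these assumptions the fully connected layer sees a 128 x 75 x 31 feature map, i.e. 297,600 inputs; the shared feature map is also what the safety-classifier head branches from.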

SLIDE 10

Results

[Figure: example safe frames and unsafe frames]

SLIDE 11
Results

  • Evaluation on the test tracks, using four metrics:
    1. Mean squared error of steering angle
    2. Damage per lap
    3. Number of laps
    4. Portion of time driven by a reference policy
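Two of these metrics are simple enough to pin down in code. A minimal sketch, with function names of my choosing; c_safe = 0 marks frames where the reference policy drives.

```python
def steering_mse(predicted, reference):
    """Metric 1: mean squared error between predicted and reference
    steering angles over a sequence of frames."""
    n = len(predicted)
    return sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n

def reference_driven_fraction(safe_flags):
    """Metric 4: portion of frames with c_safe = 0, i.e. frames where
    the reference policy drives instead of the primary policy."""
    return sum(1 for c in safe_flags if c == 0) / len(safe_flags)
```

A query-efficient learner should drive both numbers down across iterations: lower steering error, and a shrinking fraction of frames handed to the reference.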

SLIDE 12

Results – Mean squared error of steering angle

[Plot: MSE of the steering angle vs. number of DAgger iterations; dashed curves: with traffic, solid curves: without traffic]

SLIDE 13

Results – Damage per lap

[Plot: damage per lap vs. number of DAgger iterations; dashed curves: with traffic, solid curves: without traffic]

SLIDE 14

Results – Number of laps

[Plot: average number of laps vs. number of DAgger iterations; dashed curves: with traffic, solid curves: without traffic]
SLIDE 15

Results – Portion of time driven by a reference policy

[Plot: % of frames with c_safe = 0 vs. number of DAgger iterations; dashed curves: with traffic, solid curves: without traffic]

SLIDE 16

Demo

SLIDE 17

Conclusion

  • Proposed the SafeDAgger algorithm
    • Query-efficient: the reference policy is queried only in unsafe states
    • Built-in safety feature via the safety classifier
  • End-to-end simulated driving
    • Trained a convolutional neural network to drive in TORCS with traffic

Future work

  • Evaluate SafeDAgger in the real world
  • Learn to use temporal information