Studying Autonomous Driving Corner Cases Powered by the Jetson TX Series
Sascha Hornauer, Baladitya Yellapragada



SLIDE 1

Studying Autonomous Driving Corner Cases

Powered by the Jetson TX Series Sascha Hornauer, Baladitya Yellapragada

SLIDE 2

Autonomous Driving

Marcus Erbar, “Vehicle Detection: Deep Learning w/ lane detection”
https://www.youtube.com/watch?v=IHhLpB5MNTQ
https://github.com/merbar/CarND-Vehicle-Detection

SLIDE 3

Autonomous Driving

SLIDE 4

Failures of predominant approaches

“You need to paint the bloody roads here!” - Volvo's North America CEO Lex Kerssemakers to Los Angeles Mayor Eric Garcetti, as the demonstration car failed to drive autonomously at the Los Angeles Auto Show

SLIDE 5

Autonomous Off-Road Driving

SLIDE 6

Our Results

SLIDE 7

Platform description

Model car with Jetson TX1/TX2

SLIDE 8
  • Stage 1: Collection of Over 100 Hours of Driving
SLIDE 9
  • Stage 2: Imitation Learning - Behavioral Cloning of Driving
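
Stage 2 is plain supervised regression from camera frames to the recorded human driving commands. A minimal PyTorch sketch of this behavioral-cloning step, assuming a data loader that yields (frames, commands) batches; SteeringNet, its shapes, and the single-RGB-frame input are illustrative stand-ins, not the network from the talk:

```python
import torch
import torch.nn as nn

class SteeringNet(nn.Module):
    """Tiny CNN mapping one RGB camera frame to [steering, throttle]."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def train_epoch(model, loader, optimizer):
    """One epoch of behavioral cloning: regress onto human commands."""
    loss_fn = nn.MSELoss()
    model.train()
    for frames, commands in loader:  # commands = recorded steering/throttle
        optimizer.zero_grad()
        loss = loss_fn(model(frames), commands)
        loss.backward()
        optimizer.step()
```
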
SLIDE 10

Outdoor Domain Results

SLIDE 11

Domain Change - Indoor Results

SLIDE 12

Follow Behavior and Failure Cases

SLIDE 13

Benefits

  • Complementary Solution


SLIDE 14

Challenges and Further Directions

Learned behavior is not completely understood. Metrics reflect only coarse success; better ones are needed:

$\text{autonomy} = \left(1 - \frac{\text{number of interventions} \cdot 6\ \text{seconds}}{\text{elapsed time (seconds)}}\right) \cdot 100 \quad (1)$
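
Equation (1), from the reference below, is straightforward to evaluate from a test run's log; a small worked sketch:

```python
def autonomy(num_interventions: int, elapsed_seconds: float) -> float:
    """Autonomy metric (1): each human intervention is charged a flat
    6 seconds against the total elapsed driving time."""
    return (1.0 - (num_interventions * 6.0) / elapsed_seconds) * 100.0

# Example: 2 interventions during a 600 s (10 minute) run:
# autonomy(2, 600) == (1 - 12/600) * 100 == 98.0 (percent autonomous)
```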

  • How simple can our network be? How complex must it be?
  • Can we understand our network performance, before deployment/testing?
  • What has our network learned?


(1) Bojarski, M., Del Testa, D., Dworakowski, D., et al. 2016. arXiv:1604.07316


SLIDE 15

Visualizing Ego-motion Semantics Implicitly Learned by a Neural Network for Self-Driving

Baladitya Yellapragada

SLIDE 16

Outline

  • Need for Visualization-Based Experiments
  • Semantic Label Creation for Ego-motion Features
  • Ego-motion Feature Experiments
SLIDE 17

Outline

  • Need for Visualization-Based Experiments
  • Semantic Label Creation for Ego-motion Features
  • Ego-motion Feature Experiments
SLIDE 18

Spatiotemporal Input Novelty

General Network vs. Our Network

SLIDE 19

Insights from Visualization – Activation Maps

Zhou, et al. “Object Detectors Emerge in Deep Scene CNNs”. 2015. ICLR
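
For a network that ends in global average pooling followed by a linear layer, an activation map in the spirit of Zhou et al. is just the last conv layer's feature maps weighted by that linear layer's weights; a minimal sketch (the tensor shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def activation_map(feature_maps, fc_weight, unit):
    """feature_maps: (C, H, W) activations of the last conv layer.
    fc_weight: (num_outputs, C) weights of the final linear layer.
    Returns an (H, W) map of where `unit` responds in the input."""
    cam = torch.einsum('c,chw->hw', fc_weight[unit], feature_maps)
    cam = F.relu(cam)                 # keep only positive evidence
    return cam / (cam.max() + 1e-8)   # normalize to [0, 1] for display
```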

SLIDE 20

Potential Optical Flow Relevance

  • Saunders. “View rotation is used to perceive path curvature from optic flow”. 2010. Journal of Vision.
  • Optical flow asymptotes are cues for path centers (dashed)
SLIDE 21

Insights from Visualization – Gradient Ascent

Zeiler and Fergus. “Visualizing and Understanding Convolutional Networks”. 2013. ArXiv
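
The gradient-ascent idea itself fits in a few lines: start from noise and optimize the input so that one output unit's activation grows. A sketch; the input shape, step count, and absence of the usual regularizers (jitter, blur) are simplifications:

```python
import torch

def visualize_unit(model, unit, shape=(1, 3, 96, 128), steps=200, lr=0.1):
    """Gradient ascent on the input image to maximize one output unit."""
    model.eval()
    img = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([img], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        score = model(img)[0, unit]
        (-score).backward()   # ascend by minimizing the negative score
        opt.step()
    return img.detach()
```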

SLIDE 22

Gradient Ascent with our Z2 Color Network

SLIDE 23

Outline

  • Need for Visualization-Based Experiments
  • Semantic Label Creation for Ego-motion Features
  • Ego-motion Feature Experiments
SLIDE 24

Visual Representation of Network Semantics

Auto vs. Human labeling (figure panels)

Bau, et al. “Network Dissection: Quantifying Interpretability of Deep Visual Representations”. 2017. CVPR.

SLIDE 25

Semantic Representations in Networks

Bau, et al. “Network Dissection: Quantifying Interpretability of Deep Visual Representations”. 2017. CVPR.
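
The core of the Network Dissection score is an intersection-over-union between a unit's thresholded activation map and a human-labeled concept mask. A sketch; the quantile threshold here stands in for the paper's top-activation threshold:

```python
import numpy as np

def unit_concept_iou(activation, concept_mask, quantile=0.995):
    """activation: (H, W) unit response, upsampled to the mask resolution.
    concept_mask: (H, W) boolean mask for one semantic concept."""
    unit_mask = activation > np.quantile(activation, quantile)
    inter = np.logical_and(unit_mask, concept_mask).sum()
    union = np.logical_or(unit_mask, concept_mask).sum()
    return inter / union if union else 0.0
```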

SLIDE 26

Creating Optical Flow Labels per Input Video

  • Consistent True Driving Signals with Tight Standard Deviation

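
One plausible way to reduce a clip to a single optical-flow label is to average a dense flow field over all frame pairs; a sketch using OpenCV's Farneback flow (an assumed pipeline, not necessarily the labeling used here):

```python
import cv2
import numpy as np

def video_flow_label(path):
    """Mean (dx, dy) optical flow over a whole video file."""
    cap = cv2.VideoCapture(path)
    ok, prev = cap.read()
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    flows = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(flow.mean(axis=(0, 1)))  # mean flow per frame pair
        prev = gray
    cap.release()
    return np.mean(flows, axis=0)
```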

SLIDE 27

Filtering for Well-Predicted Videos

  • Reject videos whose driving signals are not predicted well by the network

○ What remains are task-relevant videos, each with optical flow labels across a shared space
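
The filtering step is a simple threshold on prediction error; a sketch in which model_predict, video.frames, video.signals, and the threshold are all illustrative names:

```python
import numpy as np

def filter_videos(videos, model_predict, threshold=0.1):
    """Keep only videos whose recorded driving signals the network
    predicts well; these are the task-relevant videos."""
    kept = []
    for video in videos:
        pred = model_predict(video.frames)           # predicted signals
        err = np.mean(np.abs(pred - video.signals))  # mean absolute error
        if err < threshold:
            kept.append(video)
    return kept
```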

SLIDE 28

Outline

  • Need for Visualization-Based Experiments
  • Semantic Label Creation for Ego-motion Features
  • Ego-motion Feature Experiments
SLIDE 29

Implicit Optical Flow Sensitivity Experiment

Meyer, et al. “Phase-Based Frame Interpolation for Video”. 2015. CVPR.

SLIDE 30

Speed Condition Examples

Third Speed | Normal Speed | Thrice Speed
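
The talk re-times clips with phase-based frame interpolation (Meyer et al., previous slide); a much cruder sketch that merely resamples existing frame indices still shows how the three speed conditions are generated:

```python
import numpy as np

def resample_speed(frames, factor):
    """factor=3.0 -> 'thrice speed', factor=1/3 -> 'third speed',
    factor=1.0 -> the unmodified clip (no frames are synthesized)."""
    idx = np.arange(0.0, len(frames), factor)
    return [frames[int(round(i))] for i in idx if int(round(i)) < len(frames)]
```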

SLIDE 31

Results

SLIDE 32

Controls

  • Temporal Order
  • Stereoscopic Disparity
  • Domain Correlations
SLIDE 33

Compare Time in Other Self-Driving Networks

  • Single Image Networks [1]
  • Recurrent Units [2]
  • Spatiotemporal Convolutions [3]

[1] Mengxi Wu. “Self-driving car in a simulator with a tiny neural network”. 2017. Medium.
[2] Xu, et al. “End-to-end Learning of Driving Models from Large-scale Video Datasets”. 2017. arXiv.
[3] Chi and Mu. “Deep Steering: Learning End-to-End Driving Model from Spatial and Temporal Visual Cues”. 2017. arXiv.

SLIDE 34

Control Condition: Temporal Order

Normal Order | Random Order | Reverse Order
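
Generating the three order conditions from one clip is a one-liner each; the network is then scored on all three and its errors compared (a sketch):

```python
import random

def order_conditions(frames, seed=0):
    """Build normal, random, and reverse temporal orders of one clip."""
    shuffled = list(frames)
    random.Random(seed).shuffle(shuffled)
    return {
        "normal": list(frames),
        "random": shuffled,
        "reverse": list(reversed(frames)),
    }
```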

SLIDE 35

Results with Temporal Controls

Normal Order | Reverse Order | Random Order

SLIDE 36

Control Condition: Stereoscopic Disparity

Normal Stereo | Reverse Stereo | No Stereo
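
The stereo conditions follow the same pattern; this sketch assumes the network's input pairs a left and a right camera frame (the exact input layout is an assumption):

```python
def stereo_conditions(left, right):
    """Build the three stereo inputs from one left/right frame pair."""
    return {
        "normal_stereo": (left, right),
        "reverse_stereo": (right, left),  # cameras swapped
        "no_stereo": (left, left),        # disparity removed
    }
```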

SLIDE 37

Results with Stereo Controls

Normal Stereo | Reverse Stereo | No Stereo

SLIDE 38

Control Conditions: Domain Correlations

Forest vs. Sidewalk

SLIDE 39

Confounding Domain Correlations

Forest w/ Sidewalks

SLIDE 40

Results by Domain (Training Domain on Test Domain)

Sidewalk on Sidewalk | Combined on Combined | Forest on Forest

SLIDE 41

Results by Domain (Training Domain on Test Domain)

Sidewalk on Forest | Combined on Forest | Forest on Forest

SLIDE 42

Conclusions

  • Temporal variability is a consistent predictor of the driving signal (bigger label = faster, smaller label = slower).
  • Even if not explicitly specified, a natural temporal order is preferred over an unnatural one: evidence for potential optical flow sensitivity.
  • Even if not explicitly specified, natural camera differences are preferred over unnatural ones: evidence for potential stereo disparity sensitivity.
  • Optical flow may be implicitly learned, but optical flow salience may not generalize across domains, so more explicit training is better.