Vijay John, Yuquan Xu, Seiichi Mita (Smart Vehicle Research Center); Hossein Tehrani, Tomoyoki Oishi, Masataka Konishi, Hakusho Chin (Advanced Mobility Development); Kazuhisa Ishimaru, Sakiko Nishino (Research Division)
- ADAS and Automated Driving
- World 3D Reconstruction
- 3D Deep Sensor Fusion
- Future Plan
- Conclusion
ADAS Applications are Booming
- Adaptive Cruise Control (ACC)
- Adaptive Front Lights (AFL)
- Driver Monitoring System (DMS)
- Forward Collision Warning (FCW)
- Intelligent Speed Adaptation (ISA)
- Lane Departure Warning (LDW)
- Pedestrian Detection System (PDS)
- Surround-View Cameras (SVC)
- Autonomous Emergency Braking (AEB)
Vehicle Platform Sensors Configuration
Sensor Fusion & Perception (360-deg Scene Understanding)
Path Planning / Behavior Generation
Control
Cameras, Stereo, Laser Sensor, RADAR, Sonar, GPS/IMU
Deep Understanding of Environment
Sensors
<Stereo Vision>
[Figure: left and right camera images of a road scene (road, 3D objects, cars, barriers). Far objects produce a small shift between the two images; close objects produce a big shift. Shift = disparity.]
Calibration
<Process>
Disparity Calculation (Stereo Matching)
Detection
Disparity map
distance = B · f / disparity    (B: stereo baseline, f: focal length)
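The relation above can be sketched as a small helper; the baseline and focal-length values below are illustrative, not taken from the slides:

```python
def disparity_to_distance(disparity_px, baseline_m, focal_px):
    """Stereo triangulation: distance = B * f / disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return baseline_m * focal_px / disparity_px

# Illustrative values: 0.3 m baseline, 800 px focal length.
# A 40 px disparity then corresponds to 0.3 * 800 / 40 = 6 m,
# and a bigger shift (80 px) to a closer object (3 m).
print(disparity_to_distance(40, 0.3, 800))
print(disparity_to_distance(80, 0.3, 800))
```

Note how distance is inversely proportional to disparity, matching the slide: far objects have a small shift, close objects a big shift.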
Each pixel has such a matching-cost curve; together they constitute the Matching Cost Space.
<Cost Space>
[Figure: matching-cost space over image height and disparity; example disparity values 24, 40, 55.]
Finding the true disparity value for every pixel from the Matching Cost Space
< Matching Cost >
[Figure: matching-cost curves (cost vs. disparity, 10-60) for a red pixel in the left and right images. The ground-truth disparity is 55; neighboring pixels show a low matching cost near the same disparity, while the per-pixel minimum of a single curve can be wrong.]
The Viterbi algorithm can find the global optimum.
[Figure: Viterbi nodes over pixels (430-490) and disparities (10-60), with path costs along the edges. Block matching picks a wrong per-pixel minimum; Viterbi finds the optimal path.]
Exploiting the neighbors' matching costs can be translated into a mathematical optimization: a shortest-path problem.
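A minimal 1-D sketch of this shortest-path view (one scanline, NumPy; the cost values and the linear smoothness penalty are made up for illustration, not the slides' exact formulation):

```python
import numpy as np

def viterbi_disparity(cost, penalty=1.0):
    """Exact shortest-path (Viterbi) disparity selection along one scanline.

    cost[i, d] is the matching cost of pixel i at disparity d; a disparity
    jump between neighboring pixels pays penalty * |d_i - d_{i-1}|.
    """
    n_px, n_disp = cost.shape
    disp = np.arange(n_disp)
    trans = penalty * np.abs(disp[:, None] - disp[None, :])  # (prev, cur)
    acc = cost[0].copy()              # best cost of a path ending at (0, d)
    back = np.zeros((n_px, n_disp), dtype=int)
    for i in range(1, n_px):
        cand = acc[:, None] + trans   # candidate costs for each (prev, cur)
        back[i] = cand.argmin(axis=0)
        acc = cand.min(axis=0) + cost[i]
    path = np.empty(n_px, dtype=int)  # backtrack the optimal path
    path[-1] = int(acc.argmin())
    for i in range(n_px - 1, 0, -1):
        path[i - 1] = back[i, path[i]]
    return path

# Made-up cost space: the true disparity is 2 everywhere, but pixel 2 has a
# spurious minimum at disparity 0 (e.g. from repetitive texture).
cost = np.ones((5, 4))
cost[:, 2] = 0.0
cost[2, 2] = 0.4
cost[2, 0] = 0.1

print(np.argmin(cost, axis=1))       # per-pixel block matching: wrong at pixel 2
print(viterbi_disparity(cost, 0.5))  # Viterbi path: smooth and correct
```

The per-pixel minimum jumps to the spurious disparity, while the Viterbi path pays the small extra matching cost to stay smooth, which is exactly the "Block Matching (wrong) vs. Viterbi (optimal)" contrast on the slide.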
[Diagram: costs of the vertical scanlines VSLj (j = 1, ..., 640) are regrouped into diagonal scanlines DSL_RDLUk (k = 1, ..., K) for the right-down-to-left-up and left-up-to-right-down Viterbi directions; the resulting costs are then merged by averaging.]
Huge Networks with Parallel Optimization
< SGBM >
< Proposed Method (Multi-Path Viterbi) >
[Diagram: SGBM runs Viterbi over the cost space in eight directions (←, →, ↑, ↓, ↖, ↘, ↗, ↙) independently and merges the results in one step to output the disparity.]
MPV instead accumulates information step by step: the directional Viterbi passes (horizontal, vertical, diagonal) are merged hierarchically rather than independently.
< MPV >
Multi-Path Viterbi
Conventional method: SGBM
Block Matching
Dense & smooth
Image Size: 1280×960
Calculation Time: 15 ms/frame
GPU: GeForce GTX 1080
Nagoya Urban Road
Tokyo Metropolitan Highway
Electromagnetic Wave Range: Camera, Laser, Radar
[Chart: wavelength axis from 0.1 μm to 10 mm. Laser and visible-light cameras (~0.1-1 μm) and infrared cameras (~10 μm) use short electromagnetic waves; car radar sits near the millimeter range; car sonar uses sound waves. Shorter wavelengths can see detail; longer wavelengths can see far and are robust to fog & dust (~100/cm³), rain (~100/m³), snow, and sunlight.]
Perception (Learning Framework)
Cameras, Stereo, Laser Sensor
Training a learning framework for perception tasks
Sensors → Free Space / Objects
Traditional Learning: Feature Extraction (HOG, DPM, etc.) + Feature Classification (SVM, Random Forest, etc.)
Deep Learning: Feature Extraction + Feature Classification
- Single-sensor-based learning is not robust or descriptive enough
- Challenges
  – Environmental variation (occlusion, illumination variation, etc.)
  – High inter-class and intra-class variability
Perception (Learning Framework)
Single Sensor
Camera, Lidar, Stereo → Labels
There are many vehicle varieties with different orientations.
We have a large variety of on-road objects.
We have many different types of road boundaries, and thus a large variety of free-space boundaries.
Concrete Curb, Guardrail, Wall, Pylon, Divider
Illumination variation as observed by a monocular camera image with appearance features
- Sensor-fusion-based learning with complementary sensors addresses these issues
- Monocular-camera appearance features and depth features are complementary features
Sensor Fusion and Perception (Learning Framework)
Cameras, Stereo, Laser Sensor
Sensor Fusion
Labels
Monocular Camera ⇒ rich appearance information; inexpensive; suffers from illumination variation
Depth Camera ⇒ depth information (3D data); inexpensive stereo-based depth; illumination-invariant thanks to a robust stereo algorithm [1]
[1] Xu et al., Real-time Stereo Disparity Quality Improvement for Challenging Traffic Environments, IV 2018.
Depth information from the stereo camera is robust to illumination variation.
Appearance and depth features are fused within a deep learning framework for environment perception.
Deep learning framework
Appearance (monocular camera): descriptive appearance features. Depth (stereo camera/laser): illumination-invariant depth features. Sensor fusion with complementary features.
3D Environment Perception
Image & Depth
Sensor Fusion: Raw-Data-Level Fusion
[Diagram: (Image + Depth) → Feature Extraction → Free Space Detection / Object Detection]
Sensor Fusion: Feature-Level Fusion
[Diagram: Image → Image Feature Extraction; Depth → Depth Feature Extraction; both → Feature Integration (deep features) → Free Space Detection / Object Detection]
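Feature-level fusion, as drawn above, can be sketched in a few lines of NumPy; the feature-map shapes here are invented for illustration, and the channel-wise concatenation stands in for the network's feature-integration layer:

```python
import numpy as np

# Toy feature maps from two sensor branches (channels-last: H, W, C).
# In feature-level fusion, each branch runs its own feature extractor;
# only the resulting feature maps are fused, not the raw inputs.
h, w = 4, 4
image_features = np.random.rand(h, w, 8)   # from the intensity branch
depth_features = np.random.rand(h, w, 8)   # from the depth branch

# Feature integration by channel-wise concatenation: (H, W, 8 + 8).
fused = np.concatenate([image_features, depth_features], axis=-1)
print(fused.shape)  # (4, 4, 16)
```

The fused map then feeds the shared free-space and object-detection heads, so each head sees both appearance and depth evidence at every spatial location.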
[Network diagram: the intensity input and the depth input pass through separate convolutional encoders (C1_Int, C2_Int and C1_Dep, C2_Dep), whose feature maps are concatenated and decoded through upsampling layers (Upsampling_DC1_Int/Dep, Upsampling_DC2_Int/Dep) with further concatenations and skip connections, producing the free-space output and the object output.]
Skip Connections
- The entire depth-encoder feature maps (m, n, n) are transferred to the free-space and object decoder feature maps (o, n, n) for concatenation: (m+o, n, n).
- The entire intensity-encoder feature maps (m, n, n) are transferred to the free-space and object decoder feature maps (o, n, n) for concatenation: (m+o, n, n).
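The shape bookkeeping of such a skip connection can be checked with a tiny NumPy sketch; the sizes m, o, n and the nearest-neighbour upsampling are illustrative assumptions, not the network's exact layers:

```python
import numpy as np

# Channels-first maps as in the slide: encoder map (m, n, n),
# decoder map at half resolution (o, n/2, n/2).
m, o, n = 8, 4, 16
encoder_map = np.random.rand(m, n, n)
decoder_map = np.random.rand(o, n // 2, n // 2)

# Upsample the decoder map to the encoder resolution (nearest neighbour),
# then concatenate along the channel axis: (m + o, n, n).
upsampled = decoder_map.repeat(2, axis=1).repeat(2, axis=2)
skip_concat = np.concatenate([encoder_map, upsampled], axis=0)
print(skip_concat.shape)  # (12, 16, 16)
```

The concatenation preserves the encoder's fine spatial detail alongside the decoder's coarser semantic features, which is what skip connections are for.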
[Diagram: Depth and Intensity inputs → Depth Feature Extraction / Image Feature Extraction → Feature Integration → Object Detection / Free-Space Detection]
ChiNet
Intensity Image, Disparity Image
Free Space
- Trained with 9000 samples from a Japanese highway dataset
- Manually annotated free space and objects
- Trained on Keras with the Theano backend
- Trained with an Nvidia Titan X GPU
Objects
Free Space / Objects
Implemented on a GeForce Titan X using Keras with the Theano backend
Comparison : “Intensity” vs “Intensity and Depth”
Intensity and Disparity Fusion vs. Intensity Only
[Result annotations: intensity only — wrong boundary, pylon not detected, car detection not accurate, car not detected, false object. Intensity and depth fusion — accurate boundary, pylon detected, car detected (better detection), no false object.]
Evaluation Result
Comparison: "Intensity" vs. "Intensity and Depth"
Some of the Learned Image Features
Depth Intensity Image
- Vehicle Lower Part
- Free Space
- Sky
- Driving Lane
- Edge
- Free Space
Strong Weak
Some of the Learned Depth Features
Depth Intensity Image
- Close Distance Objects
- Close Free Space
- Edges
- Far Distance Objects
- Far Free Space
Strong Weak
Pixel Data After Mean Centering
We have different distributions even after mean centering.
Day Time Day Time Night Time Night Time
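As a minimal illustration of this point (the pixel-intensity values below are invented): mean centering removes the brightness offset between day and night images, but not the difference in contrast, so the two distributions still disagree.

```python
import numpy as np

# Illustrative pixel intensities: day-time images are bright with high
# contrast, night-time images are dark with low contrast.
day = np.array([120.0, 180.0, 200.0, 140.0])
night = np.array([10.0, 30.0, 25.0, 15.0])

day_c = day - day.mean()      # mean centering
night_c = night - night.mean()

print(day_c.mean(), night_c.mean())  # both zero-mean now
print(day_c.std(), night_c.std())    # but the spreads still differ
```

This is why a simple per-image mean subtraction is not enough to make day and night data look alike to the network.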
Robust and Reliable Area
Thermal Camera
Normal Camera
Thermal Camera
Normal Camera
Stability against a variety of light conditions
32 cm
Pedestrian Pedestrian Pedestrian Pedestrian
RGB Camera Thermal Camera
3D Dense Data
RGB Feature Extraction, Depth Feature Extraction, Thermal Feature Extraction
Feature Integration
Free Space Detection, Object Detection
320TOPS
Automated Driving Unit
DRIVE PX PEGASUS
Supports the high-speed processing requirement for Level 5
Processes and integrates laser, stereo, sonar,
far-infrared camera, visible camera,
millimeter-wave radar, and IMU & GNSS & map
- Sensor fusion of appearance and depth features for environment perception
- Increased robustness and perception accuracy
- ChiNet advantages
  – Precise object boundary detection
  – Detection of small objects in the road
  – Detection of far-away objects
- Computational time