Vijay John, Yuquan Xu, Seiichi Mita (Smart Vehicle Research Center); Hossein Tehrani, Tomoyoki Oishi, Masataka Konishi, Hakusho Chin (Advanced Mobility Development); Kazuhisa Ishimaru, Sakiko Nishino (Research Division)
- ADAS and Automated Driving
- World 3D Reconstruction
- 3D Deep Sensor Fusion
- Future Plan
- Conclusion
ADAS Applications are Booming
- Adaptive Cruise Control (ACC)
- Adaptive Front Lights (AFL)
- Driver Monitoring System (DMS)
- Forward Collision Warning (FCW)
- Intelligent Speed Adaptation (ISA)
- Lane Departure Warning (LDW)
- Pedestrian Detection System (PDS)
- Surround-View Cameras (SVC)
- Autonomous Emergency Braking (AEB)
Vehicle Platform Sensors Configuration
Sensor Fusion & Perception (360-deg Scene Understanding)
Path Planning / Behavior Generation
Control
Cameras, Stereo, Laser Sensor, RADAR, Sonar, GPS/IMU
Deep Understanding of Environment
Sensors
<Stereo Vision>
[Figure: left and right camera images of a road scene (road, 3D objects, cars, barriers). Far objects produce a small shift between the two images; close objects produce a big shift. Shift = disparity.]
Calibration
<Process>
Disparity Calculation (Stereo Matching)
Detection
Disparity map
distance = B · f / disparity    (B: stereo baseline, f: focal length)
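The relation above can be sketched as a small helper; the baseline and focal-length values below are illustrative, not taken from the slides:

```python
def disparity_to_distance(disparity_px, baseline_m, focal_px):
    """Stereo triangulation: distance = B * f / disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return baseline_m * focal_px / disparity_px

# Illustrative values: 0.3 m baseline, 800 px focal length.
# A 40 px disparity then corresponds to 0.3 * 800 / 40 = 6 m,
# and a bigger shift (80 px) to a closer object (3 m).
print(disparity_to_distance(40, 0.3, 800))
print(disparity_to_distance(80, 0.3, 800))
```

Note how distance is inversely proportional to disparity, matching the slide: far objects have a small shift, close objects a big shift.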
Each pixel has such a matching-cost curve; together they constitute the Matching Cost Space.
<Cost Space>
[Figure: matching-cost space over image height and disparity; example disparity values 24, 40, 55.]
Finding the true disparity value for every pixel from the Matching Cost Space
< Matching Cost >
[Figure: matching-cost curves (cost vs. disparity, 10-60) for a red pixel in the left and right images. The ground-truth disparity is 55; neighboring pixels show a low matching cost near the same disparity, while the per-pixel minimum of a single curve can be wrong.]
The Viterbi algorithm can find the global optimum.
[Figure: Viterbi nodes over pixels (430-490) and disparities (10-60), with path costs along the edges. Block matching picks a wrong per-pixel minimum; Viterbi finds the optimal path.]
Exploiting the neighbors' matching costs can be translated into a mathematical optimization: a shortest-path problem.
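A minimal 1-D sketch of this shortest-path view (one scanline, NumPy; the cost values and the linear smoothness penalty are made up for illustration, not the slides' exact formulation):

```python
import numpy as np

def viterbi_disparity(cost, penalty=1.0):
    """Exact shortest-path (Viterbi) disparity selection along one scanline.

    cost[i, d] is the matching cost of pixel i at disparity d; a disparity
    jump between neighboring pixels pays penalty * |d_i - d_{i-1}|.
    """
    n_px, n_disp = cost.shape
    disp = np.arange(n_disp)
    trans = penalty * np.abs(disp[:, None] - disp[None, :])  # (prev, cur)
    acc = cost[0].copy()              # best cost of a path ending at (0, d)
    back = np.zeros((n_px, n_disp), dtype=int)
    for i in range(1, n_px):
        cand = acc[:, None] + trans   # candidate costs for each (prev, cur)
        back[i] = cand.argmin(axis=0)
        acc = cand.min(axis=0) + cost[i]
    path = np.empty(n_px, dtype=int)  # backtrack the optimal path
    path[-1] = int(acc.argmin())
    for i in range(n_px - 1, 0, -1):
        path[i - 1] = back[i, path[i]]
    return path

# Made-up cost space: the true disparity is 2 everywhere, but pixel 2 has a
# spurious minimum at disparity 0 (e.g. from repetitive texture).
cost = np.ones((5, 4))
cost[:, 2] = 0.0
cost[2, 2] = 0.4
cost[2, 0] = 0.1

print(np.argmin(cost, axis=1))       # per-pixel block matching: wrong at pixel 2
print(viterbi_disparity(cost, 0.5))  # Viterbi path: smooth and correct
```

The per-pixel minimum jumps to the spurious disparity, while the Viterbi path pays the small extra matching cost to stay smooth, which is exactly the "Block Matching (wrong) vs. Viterbi (optimal)" contrast on the slide.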
[Diagram: costs of the vertical scanlines VSLj (j = 1, ..., 640) are regrouped into diagonal scanlines DSL_RDLUk (k = 1, ..., K) for the right-down-to-left-up and left-up-to-right-down Viterbi directions; the resulting costs are then merged by averaging.]
Huge Networks with Parallel Optimization
< SGBM >
< Proposed Method (Multi-Path Viterbi) >
[Diagram: SGBM runs Viterbi over the cost space in eight directions (←, →, ↑, ↓, ↖, ↘, ↗, ↙) independently and merges the results in one step to output the disparity.]
MPV instead accumulates information step by step: the directional Viterbi passes (horizontal, vertical, diagonal) are merged hierarchically rather than independently.
< MPV >
Multi-Path Viterbi
Conventional method: SGBM
Block Matching
Dense & smooth
Image Size: 1280×960
Calculation Time: 15 ms/frame
GPU: GeForce GTX 1080
Nagoya Urban Road
Tokyo Metropolitan Highway
Electromagnetic Wave Range: Camera, Laser, Radar
[Chart: wavelength axis from 0.1 μm to 10 mm. Laser and visible-light cameras (~0.1-1 μm) and infrared cameras (~10 μm) use short electromagnetic waves; car radar sits near the millimeter range; car sonar uses sound waves. Shorter wavelengths can see detail; longer wavelengths can see far and are robust to fog & dust (~100/cm³), rain (~100/m³), snow, and sunlight.]
Perception (Learning Framework)
Cameras, Stereo, Laser Sensor
Training a learning framework for perception tasks
Sensors → Free Space / Objects
Traditional Learning: Feature Extraction (HOG, DPM, etc.) + Feature Classification (SVM, Random Forest, etc.)
Deep Learning: Feature Extraction + Feature Classification
- Single-sensor-based learning is not robust or descriptive enough
- Challenges
  – Environmental variation (occlusion, illumination variation, etc.)
  – High inter-class and intra-class variability
Perception (Learning Framework)
Single Sensor
Camera, Lidar, Stereo → Labels
There are many vehicle varieties with different orientations.
We have a large variety of on-road objects.
We have many different types of road boundaries, and thus a large variety of free-space boundaries.
Concrete Curb, Guardrail, Wall, Pylon, Divider
Illumination variation as observed by a monocular camera image with appearance features
- Sensor-fusion-based learning with complementary sensors addresses these issues
- Monocular-camera appearance features and depth features are complementary features
Sensor Fusion and Perception (Learning Framework)
Cameras, Stereo, Laser Sensor
Sensor Fusion
Labels
Monocular Camera ⇒ rich appearance information; inexpensive; suffers from illumination variation
Depth Camera ⇒ depth information (3D data); inexpensive stereo-based depth; illumination-invariant thanks to a robust stereo algorithm [1]
[1] Xu et al., Real-time Stereo Disparity Quality Improvement for Challenging Traffic Environments, IV 2018.
Depth information from the stereo camera is robust to illumination variation.
Appearance and depth features are fused within a deep learning framework for environment perception.
Deep learning framework
Appearance (monocular camera): descriptive appearance features. Depth (stereo camera/laser): illumination-invariant depth features. Sensor fusion with complementary features.
3D Environment Perception
Image & Depth
Sensor Fusion: Raw-Data-Level Fusion
[Diagram: (Image + Depth) → Feature Extraction → Free Space Detection / Object Detection]
Sensor Fusion: Feature-Level Fusion
[Diagram: Image → Image Feature Extraction; Depth → Depth Feature Extraction; both → Feature Integration (deep features) → Free Space Detection / Object Detection]
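Feature-level fusion, as drawn above, can be sketched in a few lines of NumPy; the feature-map shapes here are invented for illustration, and the channel-wise concatenation stands in for the network's feature-integration layer:

```python
import numpy as np

# Toy feature maps from two sensor branches (channels-last: H, W, C).
# In feature-level fusion, each branch runs its own feature extractor;
# only the resulting feature maps are fused, not the raw inputs.
h, w = 4, 4
image_features = np.random.rand(h, w, 8)   # from the intensity branch
depth_features = np.random.rand(h, w, 8)   # from the depth branch

# Feature integration by channel-wise concatenation: (H, W, 8 + 8).
fused = np.concatenate([image_features, depth_features], axis=-1)
print(fused.shape)  # (4, 4, 16)
```

The fused map then feeds the shared free-space and object-detection heads, so each head sees both appearance and depth evidence at every spatial location.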
[Network diagram: the intensity input and the depth input pass through separate convolutional encoders (C1_Int, C2_Int and C1_Dep, C2_Dep), whose feature maps are concatenated and decoded through upsampling layers (Upsampling_DC1_Int/Dep, Upsampling_DC2_Int/Dep) with further concatenations and skip connections, producing the free-space output and the object output.]
Skip Connections
- The entire depth-encoder feature maps (m, n, n) are transferred to the free-space and object decoder feature maps (o, n, n) for concatenation: (m+o, n, n).
- The entire intensity-encoder feature maps (m, n, n) are transferred to the free-space and object decoder feature maps (o, n, n) for concatenation: (m+o, n, n).
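The shape bookkeeping of such a skip connection can be checked with a tiny NumPy sketch; the sizes m, o, n and the nearest-neighbour upsampling are illustrative assumptions, not the network's exact layers:

```python
import numpy as np

# Channels-first maps as in the slide: encoder map (m, n, n),
# decoder map at half resolution (o, n/2, n/2).
m, o, n = 8, 4, 16
encoder_map = np.random.rand(m, n, n)
decoder_map = np.random.rand(o, n // 2, n // 2)

# Upsample the decoder map to the encoder resolution (nearest neighbour),
# then concatenate along the channel axis: (m + o, n, n).
upsampled = decoder_map.repeat(2, axis=1).repeat(2, axis=2)
skip_concat = np.concatenate([encoder_map, upsampled], axis=0)
print(skip_concat.shape)  # (12, 16, 16)
```

The concatenation preserves the encoder's fine spatial detail alongside the decoder's coarser semantic features, which is what skip connections are for.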
[Diagram: Depth and Intensity inputs → Depth Feature Extraction / Image Feature Extraction → Feature Integration → Object Detection / Free-Space Detection]
ChiNet
Intensity Image, Disparity Image
Free Space
- Trained with 9000 samples from a Japanese highway dataset
- Manually annotated free space and objects
- Trained on Keras with the Theano backend
- Trained with an Nvidia Titan X GPU
Objects
Free Space / Objects
Implemented on a GeForce Titan X using Keras with the Theano backend
Comparison : “Intensity” vs “Intensity and Depth”
Intensity and Disparity Fusion vs. Intensity Only
[Result annotations: intensity only — wrong boundary, pylon not detected, car detection not accurate, car not detected, false object. Intensity and depth fusion — accurate boundary, pylon detected, car detected (better detection), no false object.]
Evaluation Result
Comparison: "Intensity" vs. "Intensity and Depth"
Some of the Learned Image Features
Depth Intensity Image
- Vehicle Lower Part
- Free Space
- Sky
- Driving Lane
- Edge
- Free Space
Strong Weak
Some of the Learned Depth Features
Depth Intensity Image
- Close Distance Objects
- Close Free Space
- Edges
- Far Distance Objects
- Far Free Space
Strong Weak
Pixel Data After Mean Centering
We have different distributions even after mean centering.
Day Time Day Time Night Time Night Time
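As a minimal illustration of this point (the pixel-intensity values below are invented): mean centering removes the brightness offset between day and night images, but not the difference in contrast, so the two distributions still disagree.

```python
import numpy as np

# Illustrative pixel intensities: day-time images are bright with high
# contrast, night-time images are dark with low contrast.
day = np.array([120.0, 180.0, 200.0, 140.0])
night = np.array([10.0, 30.0, 25.0, 15.0])

day_c = day - day.mean()      # mean centering
night_c = night - night.mean()

print(day_c.mean(), night_c.mean())  # both zero-mean now
print(day_c.std(), night_c.std())    # but the spreads still differ
```

This is why a simple per-image mean subtraction is not enough to make day and night data look alike to the network.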
Robust and Reliable Area
Thermal Camera
Normal Camera
Thermal Camera
Normal Camera
Stability against a variety of light conditions
32 cm
Pedestrian Pedestrian Pedestrian Pedestrian
RGB Camera Thermal Camera
3D Dense Data
RGB Feature Extraction, Depth Feature Extraction, Thermal Feature Extraction
Feature Integration
Free Space Detection, Object Detection
320TOPS
Automated Driving Unit
DRIVE PX PEGASUS
Supports the high-speed processing requirement for Level 5
Processes and integrates laser, stereo, sonar,
far-infrared camera, visible camera,
millimeter-wave radar, and IMU & GNSS & map
- Sensor fusion of appearance and depth features for environment perception
- Increased robustness and perception accuracy
- ChiNet advantages
  – Precise object boundary detection
  – Detection of small objects in the road
  – Detection of far-away objects
- Computational time