SLIDE 1

Single-View Depth Image Estimation

Fangchang Ma, PhD Candidate at MIT (Sertac Karaman Group)

  • homepage: www.mit.edu/~fcma/
  • code: github.com/fangchangma
SLIDE 2

Depth sensing is key to robotics advancement

1979, Multi-view vision and the Stanford Cart

SLIDE 3

Depth sensing is key to robotics advancement

2007, Velodyne LiDAR and the DARPA Urban Challenge

SLIDE 4

Depth sensing is key to robotics advancement

2010, Kinect and aggressive drone maneuvers

SLIDE 5

Impact of depth sensing beyond robotics

Face ID by Apple

SLIDE 6

Existing depth sensors have limited effective spatial resolutions

  • Stereo Cameras
  • Structured-light sensors
  • Time-of-flight sensors (e.g., LiDARs)
SLIDE 7

Existing depth sensors have limited effective spatial resolutions

Stereo: triangulation is accurate only at texture-rich regions

SLIDE 8

Existing depth sensors have limited effective spatial resolutions

Structured-light sensors: short range, high power consumption

SLIDE 9

Existing depth sensors have limited effective spatial resolutions

LiDARs: extremely sparse measurements

SLIDE 10

Single-View Depth Image Estimation

Two sub-problems: depth completion and depth prediction

SLIDE 11

Application 1: Sensor Enhancement

[Examples: Kinect, Velodyne LiDAR]

SLIDE 12

Application 2: Sparse Map Densification

State-of-the-art, real-time SLAM algorithms are mostly (semi-)feature-based, resulting in a sparse map representation.

[Examples: LSD-SLAM, PTAM]


Depth completion can serve as a downstream, post-processing step for sparse SLAM algorithms, creating a dense map representation.

SLIDE 13

Single-View Depth Image Estimation

  • Why is the problem challenging?
  • How to solve the problem?
  • How to train a model without ground truth?
  • How fast can we run on embedded systems?
  • How to obtain performance guarantees with DL?
  • What to do if you “hate” deep learning?

SLIDE 15

Challenges in Depth Completion

  • An ill-posed inverse problem
  • High-dimensional, continuous prediction
SLIDE 16

Challenges in Depth Completion

  • Biased / adversarial sampling
  • Varying number of measurements
SLIDE 17

Challenges in Depth Completion

  • Cross-modality fusion (RGB + Depth)
SLIDE 18

Challenges in Depth Completion

  • Lack of ground truth data (category vs. distance)
SLIDE 19

Single-View Depth Image Estimation

  • Why is the problem challenging?
  • How to solve the problem?
  • How to train a model without ground truth?
  • How fast can we run on embedded systems?
  • How to obtain performance guarantees with DL?
  • What to do if you “hate” deep learning?

SLIDE 20

Sparse-to-Dense: Deep Regression Neural Networks

  • Direct encoding: use 0s to represent missing measurements
  • Early-fusion strategy: concatenate RGB and sparse depth at the input level
  • Network architecture: standard convolutional neural network
  • Train end-to-end using ground-truth depth (see the sketch below)
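To make the encoding concrete, here is a minimal PyTorch sketch of the early-fusion input and one supervised training step. The tiny network, tensor sizes, and sampling rate are illustrative stand-ins, not the paper's architecture.

```python
# Minimal sketch of early fusion and supervised training (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySparseToDense(nn.Module):
    """Stand-in for the encoder-decoder regression network."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),   # 4 = RGB + sparse depth
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),              # dense depth output
        )

    def forward(self, rgb, sparse_depth):
        # Early fusion: concatenate RGB and sparse depth at the input level.
        # Zeros in sparse_depth directly encode "no measurement".
        x = torch.cat([rgb, sparse_depth], dim=1)        # (B, 4, H, W)
        return self.net(x)

model = TinySparseToDense()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

rgb = torch.rand(2, 3, 64, 64)                   # dummy batch
gt_depth = torch.rand(2, 1, 64, 64)              # ground-truth depth
mask = (torch.rand(2, 1, 64, 64) < 0.05).float()
sparse_depth = gt_depth * mask                   # keep ~5% of pixels, zero the rest

loss = F.mse_loss(model(rgb, sparse_depth), gt_depth)   # end-to-end supervision
loss.backward()
opt.step()
```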

SLIDE 21

Results on NYU Dataset

21

  • RGB only: RMS=51cm
  • RGB + 20 measurements: RMS=35cm
  • RGB + 50 measurements: RMS=28cm
  • RGB + 200 measurements: RMS=23cm
SLIDE 22

Scaling of Accuracy vs. Samples

[Plot: REL (mean relative error) vs. number of depth samples, 10^0 to 10^4, comparing RGBd, sparse-depth-only, and RGB-only inputs]

SLIDE 23

Application to Sparse Point Clouds

SLIDE 25

Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image

Fangchang Ma, Sertac Karaman. ICRA 2018.
code: github.com/fangchangma/sparse-to-dense

SLIDE 26

Single-View Depth Image Estimation

  • Why is the problem challenging?
  • How to solve the problem?
  • How to train a model without ground truth?
  • How fast can we run on embedded systems?
  • How to obtain performance guarantees with DL?
  • What to do if you “hate” deep learning?

SLIDE 27

Experiment 1: Supervised Training (Baseline). RMSE = 0.814 m (ranked 1st on KITTI).

[Figure: input point cloud, predicted point cloud, and predicted depth image]

SLIDES 28-35

Self-supervision: enforce temporal photometric consistency

  • Take two nearby frames, real RGB1 and real RGB2.
  • Estimate the relative camera pose from LiDAR and RGB.
  • Inverse-warp RGB2 into frame 1, using both depth and pose, to produce warped RGB2.
  • Penalize photometric differences between real RGB1 and warped RGB2.

SLIDE 36

Supervised training requires ground truth depth labels, which are hard to acquire in practice

Self-supervision: temporal photometric consistency

[Figure: RGB1, warped RGB1, and the resulting photometric error map]
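A hedged PyTorch sketch of this loss, assuming camera intrinsics K and a relative pose (R, t) already estimated from LiDAR and RGB; occlusion and out-of-view handling are omitted, and none of the names here come from the paper's actual implementation.

```python
# Sketch of inverse warping plus an L1 photometric penalty (illustrative only).
import torch
import torch.nn.functional as F

def inverse_warp(rgb2, depth1, K, R, t):
    """Sample frame-2 colors at the pixels where frame-1 points project.

    rgb2: (B,3,H,W), depth1: (B,1,H,W), K: (3,3), R: (B,3,3), t: (B,3,1)
    """
    B, _, H, W = depth1.shape
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)]).reshape(1, 3, -1)  # (1,3,HW)
    pts1 = (torch.linalg.inv(K) @ pix) * depth1.reshape(B, 1, -1)    # 3-D in cam 1
    pts2 = R @ pts1 + t                                              # into cam 2
    uvw = K @ pts2                                                   # project
    uv = uvw[:, :2] / uvw[:, 2:].clamp(min=1e-6)                     # pixel coords
    grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1,                  # to [-1, 1]
                        2 * uv[:, 1] / (H - 1) - 1], dim=-1)
    return F.grid_sample(rgb2, grid.reshape(B, H, W, 2), align_corners=True)

def photometric_loss(rgb1, rgb2, depth1, K, R, t):
    warped_rgb2 = inverse_warp(rgb2, depth1, K, R, t)
    return (rgb1 - warped_rgb2).abs().mean()   # penalize photometric differences
```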

SLIDE 37

Experiment 2: Self-Supervised Training. RMSE = 1.30 m.

[Figure: input point cloud, predicted point cloud, and predicted depth image]

SLIDE 38

Self-supervised Sparse-to-Dense: Self-supervised Depth Completion from LiDAR and Monocular Camera

Fangchang Ma, Guilherme Venturelli Cavalheiro, Sertac Karaman. ICRA 2019.
code: github.com/fangchangma/self-supervised-depth-completion

SLIDE 39

Single-View Depth Image Estimation

  • Why is the problem challenging?
  • How to solve the problem?
  • How to train a model without ground truth?
  • How fast can we run on embedded systems?
  • How to obtain performance guarantees with DL?
  • What to do if you “hate” deep learning?

SLIDE 40
FastDepth

  • An efficient, lightweight encoder-decoder network architecture with a low-latency design incorporating depthwise separable layers and additive skip connections (see the sketch below)
  • Network pruning applied to the whole encoder-decoder network
  • Platform-specific compilation targeting embedded systems
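A short PyTorch sketch of the two named ingredients, a depthwise separable convolution and an additive skip connection; channel sizes are invented for illustration, and this is not the published FastDepth network.

```python
# Depthwise separable convolution + additive skip connection (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv followed by a 1x1 pointwise conv.

    Cost is roughly in_ch*9 + in_ch*out_ch weights, versus in_ch*out_ch*9
    for a standard 3x3 conv: about an 8-9x reduction for wide layers.
    """
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return F.relu(self.pointwise(self.depthwise(x)))

class DecoderStage(nn.Module):
    """Upsample, refine, then *add* the matching encoder feature map.

    Additive skips avoid the channel growth (and extra compute)
    of concatenative skips.
    """
    def __init__(self, ch):
        super().__init__()
        self.conv = DepthwiseSeparableConv(ch, ch)

    def forward(self, x, encoder_feat):
        x = F.interpolate(x, scale_factor=2, mode="nearest")
        return self.conv(x) + encoder_feat
```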

SLIDE 41

FastDepth is the first demonstration of real-time depth estimation on embedded systems

SLIDE 43

Achieved fast runtime through network design, pruning, and hardware-specific compilation

SLIDE 44

FastDepth performs comparably to more complex models while running 65x faster

[Figure: RGB input, ground truth, baseline (ResNet-50 with UpProj, 2.7 fps on TX2 GPU), and this work (178 fps on TX2 GPU)]

SLIDE 45

FastDepth: Fast Monocular Depth Estimation on Embedded Systems

Diana Wofk*, Fangchang Ma*, Tien-Ju Yang, Sertac Karaman, Vivienne Sze. ICRA 2019.
project page: fastdepth.mit.edu
code: https://github.com/dwofk/fast-depth

SLIDE 46

Single-View Depth Image Estimation

  • Why is the problem challenging?
  • How to solve the problem?
  • How to train a model without ground truth?
  • How fast can we run on embedded systems?
  • How to obtain performance guarantees with DL?
  • What to do if you “hate” deep learning?

SLIDE 47

Assumption: image can be modeled by a convolutional generative neural network

SLIDE 48

Sub-sampling Process
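The diagram for this slide did not survive extraction. A standard formalization of the setup used on the following slides, with notation assumed here rather than taken from the slide (G the generative network, z its latent code, M a binary sub-sampling mask, ⊙ the elementwise product):

```latex
x = G(z), \qquad y = M \odot x
```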

SLIDE 49

Rephrasing the depth-completion / image-inpainting problems

Question: can you find x (or, equivalently, z), given only y?

SLIDE 50

Rephrasing the depth-completion / image-inpainting problems

If z is recovered, then we can reconstruct x as G(z) using a single forward pass.

SLIDE 51

The latent code z can be computed efficiently using gradient descent
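A minimal PyTorch sketch of that recovery, assuming a pretrained generator G and measurements y = M ⊙ x; the optimizer choice, initialization, step count, and names are illustrative, not the paper's exact procedure.

```python
# Recover the latent code by gradient descent on the measurement residual.
import torch

def recover_latent(G, M, y, latent_dim=128, steps=500, lr=0.05):
    z = torch.zeros(1, latent_dim, requires_grad=True)   # initial latent code
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((M * G(z) - y) ** 2).sum()   # empirical loss on observed pixels
        loss.backward()
        opt.step()
    return z.detach()

# Once z is recovered, reconstruct the full image in a single forward pass:
# x_hat = G(recover_latent(G, M, y))
```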

SLIDE 52

Main Theorem

For a 2-layer network, the latent code z can be recovered from the undersampled measurements y using gradient descent (with high probability) by minimizing the empirical loss function below.
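The loss itself was lost in extraction; in the notation assumed above (measurements y = M ⊙ G(z*) for some true latent code z*), the empirical loss plausibly takes the form:

```latex
\min_{z}\; f(z) \;=\; \tfrac{1}{2}\,\bigl\| M \odot G(z) - y \bigr\|_2^2
```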

SLIDE 53

Experimental Results

[Figure: undersampled measurements, reconstructed images, and ground truth]
SLIDE 54

Invertibility of Convolutional Generative Networks from Partial Measurements

Fangchang Ma*, Ulas Ayaz*, Sertac Karaman. NeurIPS 2018 (previously known as NIPS).
code: github.com/fangchangma/invert-generative-networks

SLIDE 55

Single-View Depth Image Estimation

  • Why is the problem challenging?
  • How to solve the problem?
  • How to train a model without ground truth?
  • How fast can we run on embedded systems?
  • How to obtain performance guarantees with DL?
  • What to do if you “hate” deep learning?

SLIDE 56

Depth Completion: Linear model with planar assumption

Input: only sparse depth
Output: dense depth

Fangchang Ma, Luca Carlone, Ulas Ayaz, Sertac Karaman. "Sparse Sensing for Resource-Constrained Depth Reconstruction." IROS 2016.
Fangchang Ma, Luca Carlone, Ulas Ayaz, Sertac Karaman. "Sparse Depth Sensing for Resource-Constrained Robots." The International Journal of Robotics Research (IJRR).

SLIDE 57

Depth Completion: Linear model with planar assumption

Planar Assumption: a relatively structured environment can be well approximated by a small number of planar surfaces.
Observation: the 2nd derivative of a planar surface is zero, so the 2nd derivative of a piecewise-planar depth image is sparse.
Implication: the 2nd derivative of a structured environment is approximately sparse (the sparsity of the 2nd derivative is a measure of scene complexity).

SLIDE 58

Depth Completion: Linear model with planar assumption

Planar Assumption: sparse 2nd derivative in a typical depth image.
Goal: find the simplest depth image (the one with the sparsest 2nd derivative) that is aligned with our measurements.
Convex Relaxation (Linear Programming): minimize the L1 norm of the 2nd derivative subject to consistency with the measurements (a sketch follows).
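A small cvxpy sketch of that relaxation on a 1-D signal; the paper's formulation is over 2-D depth images, and the signal, sample count, and solver defaults here are illustrative.

```python
# L1-minimization of the 2nd derivative, subject to agreeing with the samples.
import cvxpy as cp
import numpy as np

n = 100
rng = np.random.default_rng(0)

# Piecewise-linear "depth" profile: two planar segments meeting at a crease.
truth = np.concatenate([np.linspace(2.0, 4.0, 50), np.linspace(4.0, 3.0, 50)])
idx = rng.choice(n, size=10, replace=False)        # 10% sparse measurements
y = truth[idx]

x = cp.Variable(n)
objective = cp.Minimize(cp.norm1(cp.diff(cp.diff(x))))   # sparsest 2nd derivative
constraints = [x[idx] == y]                              # align with measurements
cp.Problem(objective, constraints).solve()

# With samples on both segments, recovery is typically near-exact:
print(np.abs(x.value - truth).max())
```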

SLIDE 59

Depth Completion: Linear model with planar assumption

SLIDE 60

Sparse Depth Sensing for Resource-Constrained Robots

Fangchang Ma*, Ulas Ayaz*, Sertac Karaman. The International Journal of Robotics Research (IJRR).
code: github.com/sparse-depth-sensing/sparse-depth-sensing

SLIDE 61

Single-View Depth Image Estimation

Fangchang Ma
homepage: www.mit.edu/~fcma/
code: github.com/fangchangma

  • Why is the problem challenging?
  • How to solve the problem?
  • How to train a model without ground truth?
  • How fast can we run on embedded systems?
  • How to obtain performance guarantees with DL?
  • What to do if you “hate” deep learning?