Credit: https://xkcd.com/1897/ ROI-10D: Monocular Lifting of


Dec 09, 2022

Credit: https://xkcd.com/1897/
ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape. F Manhardt, W Kehl, A Gaidon
Learning to Fuse Things and Stuff. J Li, A Raventos, A Bhargava, T Tagawa, A Gaidon

SLIDE 1

SLIDE 2

SLIDE 3

SLIDE 4

SLIDE 5

SLIDE 6

Credit: https://xkcd.com/1897/

SLIDE 7

ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape. F Manhardt, W Kehl, A Gaidon. https://arxiv.org/abs/1812.02781
Learning to Fuse Things and Stuff. J Li, A Raventos, A Bhargava, T Tagawa, A Gaidon. https://arxiv.org/abs/1812.01192

SLIDE 8

Credit: Ed Olson, May Mobility

SLIDE 9

SLIDE 10

Image courtesy supervise.ly

SLIDE 11

SLIDE 12


Toyota Safety Sense 2.0 Camera

SLIDE 13

ICRA 2019 [arxiv + video]

SLIDE 14

Easy to acquire
Expensive / Difficult to acquire

SLIDE 15

Easy to acquire

SLIDE 16

SLIDE 17

SLIDE 18

Photometric loss via view-synthesis
Occlusion Regularization
Depth Regularization (edge-aware depth smoothing)
Depth Model Parameters
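Two of the loss terms named on this slide can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the photometric term is assumed to be a plain L1 difference, the smoothness term is assumed to weight disparity gradients by exp(-|image gradient|), and `w_smooth` is a hypothetical weight. The occlusion regularization is left out because the slide does not define it, and `synthesized` stands for the output of a view-synthesis step that is not implemented here.

```python
import numpy as np


def photometric_l1(target, synthesized):
    """Mean absolute difference between the target image and its
    view-synthesized reconstruction (both arrays of the same shape)."""
    return float(np.mean(np.abs(target - synthesized)))


def edge_aware_smoothness(disp, image):
    """Penalize disparity gradients, down-weighted at image edges.

    disp:  (H, W) disparity map
    image: (H, W, C) image aligned with disp
    """
    # Absolute disparity gradients along x (columns) and y (rows).
    d_dx = np.abs(disp[:, 1:] - disp[:, :-1])
    d_dy = np.abs(disp[1:, :] - disp[:-1, :])
    # Image gradients, averaged over channels.
    i_dx = np.mean(np.abs(image[:, 1:] - image[:, :-1]), axis=-1)
    i_dy = np.mean(np.abs(image[1:, :] - image[:-1, :]), axis=-1)
    # Strong image edges shrink the weight, so depth may change there.
    return float(np.mean(d_dx * np.exp(-i_dx)) + np.mean(d_dy * np.exp(-i_dy)))


def total_loss(target, synthesized, disp, w_smooth=0.1):
    """Assumed combination: photometric term plus weighted smoothness.
    w_smooth is a hypothetical value, not taken from the slides."""
    return photometric_l1(target, synthesized) + w_smooth * edge_aware_smoothness(disp, target)
```

With this form, a depth discontinuity that coincides with an image edge costs less than the same discontinuity in a textureless region, which is the intent of edge-aware smoothing.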

SLIDE 19

Resolution Matters for View Synthesis!

SLIDE 20

A. Odena, V. Dumoulin, and C. Olah, “Deconvolution and checkerboard artifacts,” Distill, vol. 1, no. 10, p. e3, 2016.
W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” CVPR 2016
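The rearrangement at the core of the sub-pixel convolution cited here (often called pixel shuffle or depth-to-space) can be sketched in NumPy. This is an illustrative sketch only: it shows the channel-to-space step that follows an ordinary convolution, assumes a channels-first layout `(C*r*r, H, W)`, and uses a hypothetical function name `pixel_shuffle`. It is not the network from the slides.

```python
import numpy as np


def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r).

    Each group of r*r channels supplies the r x r block of output
    pixels for the corresponding low-resolution location.
    """
    c_rr, h, w = x.shape
    if c_rr % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = c_rr // (r * r)
    x = x.reshape(c, r, r, h, w)     # split channels into (c, dy, dx)
    x = x.transpose(0, 3, 1, 4, 2)   # reorder to (c, h, dy, w, dx)
    return x.reshape(c, h * r, w * r)
```

Because every output pixel is produced by exactly one input value, upsampling this way avoids the uneven kernel overlap that the Distill article links to checkerboard artifacts in transposed convolutions.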


SLIDE 21

Modified DispNet Architecture

SLIDE 22

M. Jaderberg, K. Simonyan, A. Zisserman, et al., “Spatial transformer networks,” NIPS 2015
C. Godard, O. Mac Aodha, and G. J. Brostow, “Unsupervised monocular depth estimation with left-right consistency,” CVPR 2017

Left Disparity
Flipped Left Disparity
Priors learned by model due to occluded boundaries in fronto-parallel stereo case
Spatial Transformer Network
Fused left
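The flip-and-fuse idea labeled on this slide can be sketched as follows. This is a minimal sketch under assumptions: `predict_disparity` is a hypothetical callable standing in for the depth network, and the fusion is a plain average, which the slide does not specify. In training the flips would be done with differentiable operations; NumPy is used here only to show the data flow.

```python
import numpy as np


def flip_augmented_disparity(predict_disparity, image):
    """Fuse disparities predicted from an image and its mirror image.

    predict_disparity: hypothetical callable, (H, W, C) -> (H, W)
    image:             (H, W, C) array
    """
    disp = predict_disparity(image)
    # Predict on the horizontally mirrored image...
    disp_flipped = predict_disparity(image[:, ::-1])
    # ...then mirror the prediction back into the original frame.
    disp_unflipped = disp_flipped[:, ::-1]
    # Assumed fusion: simple average of the two aligned estimates.
    return 0.5 * (disp + disp_unflipped)
```

A predictor with a side-dependent bias, such as the boundary priors mentioned on the slide, produces that bias on opposite sides in the two estimates, so averaging reduces it.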


SLIDE 23

Sub-pixel convolutions (SP), Differentiable Flip Augmentation (FA)

SLIDE 24

SLIDE 25

SLIDE 26

SLIDE 27

SLIDE 28

SLIDE 29

ICLR 2019 [arxiv]

SLIDE 30

Gaidon et al, "Virtual worlds as proxy for multiobject tracking analysis.", CVPR'16
Ros et al, "The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes", CVPR'16
de Souza et al, "Procedural Generation of Videos to Train Deep Action Recognition Networks.", CVPR'17

SLIDE 31

SLIDE 32

adversarial loss
privileged regularization
perceptual regularization (self-regularization)
task loss (this is what we care about)
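One common way to combine terms like these is a weighted sum. The sketch below assumes that form; the slide itself does not state how the terms are combined, and the function name and default weights are hypothetical.

```python
def combined_objective(adversarial, task, privileged, perceptual,
                       weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four loss terms named on the slide.

    The weighted-sum form and the default weights are assumptions
    made for illustration; they are not taken from the slides.
    """
    w_adv, w_task, w_priv, w_perc = weights
    return (w_adv * adversarial
            + w_task * task
            + w_priv * privileged
            + w_perc * perceptual)
```

Setting a weight to zero switches off the corresponding term, which is a simple way to check how much each regularizer contributes to the task loss.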

SLIDE 33


SLIDE 34


SLIDE 35


SLIDE 36


SLIDE 37


SLIDE 38

SLIDE 39

SLIDE 40