Credit: https://xkcd.com/1897/ ROI-10D: Monocular Lifting of - - PowerPoint PPT Presentation




SLIDE 1
SLIDE 2
SLIDE 3
SLIDE 4
SLIDE 5
SLIDE 6

Credit: https://xkcd.com/1897/

SLIDE 7

  • ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape. F. Manhardt, W. Kehl, A. Gaidon. https://arxiv.org/abs/1812.02781
  • Learning to Fuse Things and Stuff. J. Li, A. Raventos, A. Bhargava, T. Tagawa, A. Gaidon. https://arxiv.org/abs/1812.01192

SLIDE 8

Credit: Ed Olson, May Mobility

SLIDE 9
SLIDE 10

Image courtesy supervise.ly

SLIDE 11
SLIDE 12

Toyota Safety Sense 2.0 Camera

SLIDE 13

ICRA 2019 [arxiv + video]

SLIDE 14

  • Easy to acquire
  • Expensive / Difficult to acquire

SLIDE 15

Easy to acquire

SLIDE 16
SLIDE 17
SLIDE 18

  • Photometric loss via view-synthesis
  • Occlusion Regularization
  • Depth Regularization (edge-aware depth smoothing)
  • Depth Model Parameters
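The slide names the depth regularizer but does not give its formula. As a minimal sketch, assuming the common edge-aware formulation from Godard et al. (cited on slide 22), depth gradients are penalized less where the image itself has a strong gradient. The function name, the use of plain nested lists, and the single-channel image are assumptions made here for illustration, not details from the talk.

```python
import math


def edge_aware_smoothness(depth, image):
    """Mean of |depth gradient| * exp(-|image gradient|) over x and y neighbours.

    depth, image: 2D nested lists of floats with the same shape
    (single-channel image is an assumption of this sketch).
    """
    h, w = len(depth), len(depth[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:  # horizontal neighbour
                d_grad = abs(depth[y][x + 1] - depth[y][x])
                i_grad = abs(image[y][x + 1] - image[y][x])
                total += d_grad * math.exp(-i_grad)
                count += 1
            if y + 1 < h:  # vertical neighbour
                d_grad = abs(depth[y + 1][x] - depth[y][x])
                i_grad = abs(image[y + 1][x] - image[y][x])
                total += d_grad * math.exp(-i_grad)
                count += 1
    return total / count if count else 0.0
```

A depth discontinuity that coincides with an image edge therefore costs less than the same discontinuity in a textureless region, which is the intended behaviour of an edge-aware term.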

SLIDE 19

Resolution Matters for View Synthesis!

SLIDE 20
  • A. Odena, V. Dumoulin, and C. Olah, “Deconvolution and checkerboard artifacts,” Distill, vol. 1, no. 10, p. e3, 2016.
  • W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” CVPR 2016
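The sub-pixel convolution of Shi et al. ends with a rearrangement step (often called pixel shuffle) that turns r*r low-resolution channels into one channel at r times the resolution. The sketch below shows only that rearrangement, on nested lists to stay dependency-free. The function name and the channel ordering are assumptions of this example; implementations differ on the ordering.

```python
def pixel_shuffle(feat, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Assumed channel ordering: input channel c*r*r + i*r + j supplies the
    output pixel at row offset i and column offset j inside each r x r cell.
    """
    c_in, h, w = len(feat), len(feat[0]), len(feat[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = feat[c * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out


# Four 1x1 channels become one 2x2 channel.
print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2))  # [[[1, 2], [3, 4]]]
```

Because every output pixel is copied from exactly one input value, the upsampling has no overlapping kernel footprints, which is the property usually cited when contrasting it with the checkerboard artifacts discussed by Odena et al.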

SLIDE 21

Modified DispNet Architecture

SLIDE 22
  • M. Jaderberg, K. Simonyan, A. Zisserman, et al., “Spatial transformer networks,” NIPS 2015
  • C. Godard, O. Mac Aodha, and G. J. Brostow, “Unsupervised monocular depth estimation with left-right consistency,” CVPR 2017

  • Left Disparity
  • Flipped Left Disparity
  • Priors learned by model due to occluded boundaries in fronto-parallel stereo case
  • Spatial Transformer Network
  • Fused left
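The labels above describe predicting disparity for the image and for its horizontal flip, then fusing the two so that boundary artifacts on one side are compensated by the other. A minimal sketch follows, assuming fusion by per-pixel mean (the slide does not state the fusion rule). `toy_predict` is a hypothetical stand-in for the depth network, and plain lists replace tensors, so nothing here is actually differentiated; in an autodiff framework the same flip and mean operations would be.

```python
def hflip(img):
    """Flip a 2D nested list left to right."""
    return [row[::-1] for row in img]


def flip_fused_disparity(predict, image):
    """Average the prediction with the flipped-back prediction of the flipped image.

    The per-pixel mean is an assumption of this sketch.
    """
    d = predict(image)
    d_flipped_back = hflip(predict(hflip(image)))
    return [
        [0.5 * (a + b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(d, d_flipped_back)
    ]


def toy_predict(image):
    # Hypothetical stand-in for a depth network: copies intensities but
    # zeroes the left column to mimic a left-border artifact.
    return [[0.0] + [float(v) for v in row[1:]] for row in image]


fused = flip_fused_disparity(toy_predict, [[1, 2, 3], [4, 5, 6]])
print(fused)  # [[0.5, 2.0, 1.5], [2.0, 5.0, 3.0]]
```

In the toy output the interior column is unchanged, while each border column is a blend of the artifact and the clean value from the other pass.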

SLIDE 23

Sub-pixel convolutions (SP), Differentiable Flip Augmentation (FA)

SLIDE 24
SLIDE 25
SLIDE 26
SLIDE 27
SLIDE 28
SLIDE 29

ICLR 2019 [arxiv]

SLIDE 30

  • Gaidon et al., "Virtual worlds as proxy for multi-object tracking analysis," CVPR'16
  • Ros et al., "The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes," CVPR'16
  • de Souza et al., "Procedural Generation of Videos to Train Deep Action Recognition Networks," CVPR'17

SLIDE 31
SLIDE 32

  • adversarial loss
  • privileged regularization
  • perceptual regularization (self-regularization)
  • task loss (this is what we care about)
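The four terms above annotate an objective whose formula did not survive extraction. One plausible reading, assuming the terms are combined as a weighted sum, is sketched below. The weights α, β, γ, δ and the subscript names are hypothetical symbols introduced here, not values or notation from the slide.

```latex
\mathcal{L}_{\text{total}}
  = \alpha\,\mathcal{L}_{\text{adv}}
  + \beta\,\mathcal{L}_{\text{task}}
  + \gamma\,\mathcal{L}_{\text{priv}}
  + \delta\,\mathcal{L}_{\text{perc}}
```

Under this reading, the adversarial term is the only one a discriminator would oppose; the other three act on the generator and task networks, with the task loss being the quantity ultimately evaluated.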

SLIDE 33
SLIDE 34
SLIDE 35
SLIDE 36
SLIDE 37
SLIDE 38
SLIDE 39
SLIDE 40