SLIDE 1

Learning 3D representations, disparity estimation, and structure from motion

Thomas Brox, University of Freiburg, Germany

Research funded by the ERC Starting Grant VideoLearn, the German Research Foundation, and the Deutsche Telekom Stiftung

SLIDE 2

Outline

  • 3D shape and texture from a single image
  • FlowNet: end-to-end optical flow
  • DispNet: end-to-end disparities
  • DeMoN: end-to-end structure from motion

SLIDE 3

Single-view to multi-view

Maxim Tatarchenko, Alexey Dosovitskiy, ECCV 2016

  • Choose the desired output view
  • Analysis part: a canonical 3D representation?
  • Up-convolutional part: new image from an arbitrary view, plus an additional depth map

SLIDE 4

Single-view to multi-view

Results on real and synthetic images

SLIDE 5

Multi-view looks like 3D

SLIDE 6

Reconstructing explicit 3D models

SLIDE 7

Multiview morphing

SLIDE 8

Other interesting work

  • Yang et al. NIPS 2015: recurrent network that incrementally rotates the object
  • Kar et al. CVPR 2015
  • Choy et al. 2016
  • Ours shown for comparison

(Figure columns: input, ground truth, Choy, Kar)

SLIDE 9

Outline

  • 3D shape and texture from a single image
  • FlowNet: end-to-end optical flow
  • DispNet: end-to-end disparities
  • DeMoN: end-to-end structure from motion

SLIDE 10

FlowNet: estimating optical flow with a ConvNet

Dosovitskiy et al. ICCV 2015

  • Can networks learn to find correspondences?
  • A new learning task! (very different from classification, etc.)

SLIDE 11

Can networks learn to find correspondences?

Dosovitskiy et al. ICCV 2015

Help the network with an explicit correlation layer.
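The idea of the correlation layer can be sketched in a few lines: for every position in the first feature map, compare its feature vector against the second feature map at all displacements within a search window. This is a minimal NumPy version assuming (H, W, C) feature maps; the actual FlowNetCorr layer operates on strided GPU feature maps inside the network.

```python
import numpy as np

def correlation(f1, f2, max_disp=2):
    """Correlation layer sketch: for each pixel of f1, take the dot
    product of its feature vector with f2's feature vectors at every
    displacement in [-max_disp, max_disp]^2. f1, f2: (H, W, C) arrays.
    Returns (H, W, (2*max_disp+1)**2) matching costs."""
    H, W, C = f1.shape
    D = 2 * max_disp + 1
    out = np.zeros((H, W, D * D))
    # Zero-pad f2 so every displacement stays in bounds.
    padded = np.pad(f2, ((max_disp, max_disp), (max_disp, max_disp), (0, 0)))
    k = 0
    for dy in range(D):
        for dx in range(D):
            shifted = padded[dy:dy + H, dx:dx + W, :]
            out[:, :, k] = (f1 * shifted).sum(axis=2) / C  # normalized dot product
            k += 1
    return out
```

The zero-displacement channel sits at index `max_disp * D + max_disp`, so for identical inputs it simply contains the per-pixel squared feature norm.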

SLIDE 12

Enough data to train such a network?

  • Getting ground-truth optical flow for realistic videos is hard.
  • Existing datasets are small:

    Dataset      Frames with ground truth
    Middlebury   8
    KITTI        194
    Sintel       1041
    Needed       >10000

SLIDE 13

Realism is overrated: the “flying chairs” dataset

(Figure: image pair and its optical flow)
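A toy version of the idea — illustrative only, not the actual flying-chairs generation pipeline — composites a segmented foreground object over a background and moves the two layers by independent translations, so the ground-truth flow is known exactly by construction.

```python
import numpy as np

def synth_sample(bg, obj, mask, obj_shift=(3, 1), bg_shift=(0, 2)):
    """Toy flying-chairs-style training sample (grayscale sketch):
    bg, obj: (H, W) images; mask: (H, W) bool foreground mask.
    Object and background translate independently by integer (dy, dx);
    the returned flow holds (u, v) = (dx, dy) per pixel."""
    dy_o, dx_o = obj_shift
    dy_b, dx_b = bg_shift
    img1 = np.where(mask, obj, bg)                 # first frame: object over background
    bg2 = np.roll(bg, (dy_b, dx_b), axis=(0, 1))   # move the background layer
    obj2 = np.roll(obj, (dy_o, dx_o), axis=(0, 1)) # move the object layer
    mask2 = np.roll(mask, (dy_o, dx_o), axis=(0, 1))
    img2 = np.where(mask2, obj2, bg2)              # second frame
    flow = np.empty(bg.shape + (2,))
    flow[..., 0] = np.where(mask, dx_o, dx_b)      # u (horizontal) component
    flow[..., 1] = np.where(mask, dy_o, dy_b)      # v (vertical) component
    return img1, img2, flow
```

Random affine transforms instead of pure translations, plus real photographs as backgrounds, get you close to the spirit of the actual dataset.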

SLIDE 14

Synthetic 3D datasets

Mayer et al. CVPR 2016

The Driving, Monkaa, and FlyingThings3D datasets are publicly available.

SLIDE 15

Generalization: it works!

Although the network has only seen flying chairs during training, it predicts good optical flow on other data.

(Figure: input images, ground truth, FlowNetSimple, FlowNetCorr)

SLIDE 16

Optical flow estimation in 18 ms

SLIDE 17

FlowNet 2.0

Eddy Ilg et al. arXiv 2016

Major changes:

  • Improved data and training schedules
  • Stacking of networks with motion compensation
  • A network specialized for small displacements, plus a fusion network
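The motion-compensation step between stacked networks warps the second image toward the first using the flow estimated so far, so each later network only has to refine the residual. A minimal bilinear backward warp for a grayscale image (the real implementation is a differentiable GPU layer; this NumPy sketch just shows the sampling):

```python
import numpy as np

def warp(img2, flow):
    """Backward-warp img2 toward img1: sample img2 at (x + u, y + v)
    with bilinear interpolation. img2: (H, W); flow: (H, W, 2) with
    flow[..., 0] = u (horizontal), flow[..., 1] = v (vertical)."""
    H, W = img2.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    x = np.clip(xs + flow[..., 0], 0, W - 1)      # clamp samples to the image
    y = np.clip(ys + flow[..., 1], 0, H - 1)
    x0 = np.floor(x).astype(int); x1 = np.minimum(x0 + 1, W - 1)
    y0 = np.floor(y).astype(int); y1 = np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0                       # bilinear weights
    return ((1 - wy) * ((1 - wx) * img2[y0, x0] + wx * img2[y0, x1])
            + wy * ((1 - wx) * img2[y1, x0] + wx * img2[y1, x1]))
```

With a perfect flow estimate, the warped second image matches the first, and the stacked network sees a near-zero residual problem.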

SLIDE 18

FlowNet vs. FlowNet 2.0

SLIDE 19

Numbers…

    Method                               Sintel   KITTI   Runtime
    DeepFlow (Weinzaepfel et al. 2013)   7.21     5.8     51940 ms
    FlowFields (Bailer et al. 2015)      5.81     3.5     22810 ms
    PCA Flow (Wulff & Black 2015)        8.65     6.2     140 ms
    FlowNet (Dosovitskiy et al. 2015)    7.52     –       18 ms
    FlowNet 2.0                          5.74     1.8     123 ms

SLIDE 20

DispNet: disparity estimation

Mayer et al. CVPR 2016
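A network like DispNet predicts per-pixel disparity; metric depth then follows from the standard stereo relation Z = f·B/d. A small sketch of that conversion (the focal length and baseline in the test are made-up illustrative values, not numbers from the talk):

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Depth in meters from disparity in pixels: Z = f * B / d,
    where f is the focal length in pixels and B the stereo baseline
    in meters. eps guards against division by zero at invalid pixels."""
    return focal_px * baseline_m / np.maximum(disparity, eps)
```

Because depth is inversely proportional to disparity, small disparity errors on distant objects translate into large depth errors — one reason disparity, not depth, is the natural regression target.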

SLIDE 21

DispNet: disparity estimation

SLIDE 22

Outline

  • 3D shape and texture from a single image
  • FlowNet: end-to-end optical flow
  • DispNet: end-to-end disparities
  • DeMoN: end-to-end structure from motion

SLIDE 23

DeMoN: structure from motion with a network

Benjamin Ummenhofer, Huizhong Zhou, arXiv 2016

Egomotion estimation and depth estimation are mutually dependent.

SLIDE 24

Straightforward idea

Feed the network the image pair directly. The network ignores the second image, so motion parallax is not learned.
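Motion parallax is exactly the cue such a network fails to exploit: for a purely lateral camera translation t_x, a pinhole camera sees horizontal image motion u = f·t_x/Z, so the flow between the two frames directly encodes inverse depth. A minimal sketch of that relation (symbols are the usual pinhole quantities, not taken from the slides):

```python
import numpy as np

def parallax_flow(depth, t_x, focal_px):
    """Horizontal optical flow induced by a lateral camera translation:
    u = f * t_x / Z. Nearer points (small Z) move more in the image,
    which is the depth cue a two-frame network should learn."""
    return focal_px * t_x / np.asarray(depth, dtype=float)
```

A network that ignores the second image throws this signal away and degenerates to single-image depth guessing.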

SLIDE 25

DeMoN architecture

One part of the network estimates optical flow; another estimates depth and egomotion.

SLIDE 26

Iterative refinement

(Figure: input images; ground-truth and estimated optical flow; ground-truth and estimated depth)

SLIDE 27

Outperforms two-frame SfM baselines

SLIDE 28

Two images generalize better than one image

SLIDE 29

Two images generalize better than one image

SLIDE 30

Structure from motion at 7 fps

SLIDE 31

Estimated camera trajectory

Example from the RGB-D SLAM dataset (Sturm et al.). Red: DeMoN. Black: ground truth.
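Trajectory accuracy like that shown in the figure is typically summarized as absolute trajectory error (ATE). A minimal RMSE sketch over camera positions, assuming the two trajectories are already expressed in a common frame (the full metric first aligns them with a rigid transform, which is omitted here):

```python
import numpy as np

def ate_rmse(est, gt):
    """Root-mean-square absolute trajectory error between an estimated
    trajectory and ground truth; est, gt: (N, 3) arrays of per-frame
    camera positions, assumed already aligned in a common frame."""
    err = np.linalg.norm(np.asarray(est) - np.asarray(gt), axis=1)
    return float(np.sqrt((err ** 2).mean()))
```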

SLIDE 32

Deep learning for 3D vision is promising

  • 3D shape and texture from a single image
  • FlowNet: end-to-end optical flow
  • DispNet: end-to-end disparities
  • DeMoN: end-to-end structure from motion