SLIDE 1

Deep Models for 3D Reconstruction

Andreas Geiger

Autonomous Vision Group, MPI for Intelligent Systems, Tübingen
Computer Vision and Geometry Group, ETH Zürich

October 12, 2017


SLIDE 2

3D Reconstruction

[Furukawa & Hernandez: Multi-View Stereo: A Tutorial]

Task:

◮ Given a set of 2D images
◮ Reconstruct 3D shape of object/scene


SLIDES 3-9

3D Reconstruction Pipeline

Input Images → Camera Poses → Dense Correspondences → Depth Maps → Depth Map Fusion → 3D Reconstruction

SLIDE 10

Large 3D Datasets and Repositories

[Newcombe et al., 2011] [Choi et al., 2011] [Dai et al., 2017] [Wu et al., 2015] [Chang et al., 2015] [Chang et al., 2017]


SLIDE 11

Can we learn 3D Reconstruction from Data?

SLIDE 12

OctNet: Learning Deep 3D Representations at High Resolutions

[Riegler, Ulusoy, & Geiger, CVPR 2017]

SLIDE 13

Deep Learning in 2D

[LeCun, 1998]


SLIDES 14-15

Deep Learning in 3D

◮ Existing 3D networks limited to ∼ 32³ voxels

SLIDES 16-18

3D Data is often Sparse

[Geiger et al., 2012] [Li et al., 2016]

Can we exploit sparsity for efficient deep learning?

SLIDES 19-21

Network Activations

Layer 1: 32³, Layer 2: 16³, Layer 3: 8³

Idea:

◮ Partition space adaptively based on sparse input
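The adaptive-partitioning idea can be sketched as a toy recursive octree build over an occupancy grid. This is a simplified illustration of the principle only; OctNet's actual hybrid grid-octree data structure differs in its details:

```python
import numpy as np

def build_octree(occ, x0=0, y0=0, z0=0, size=None):
    """Recursively partition an occupancy grid: homogeneous blocks
    stay single leaves, mixed blocks are split into 8 octants."""
    if size is None:
        size = occ.shape[0]
    block = occ[x0:x0+size, y0:y0+size, z0:z0+size]
    if size == 1 or block.all() or not block.any():
        return [(x0, y0, z0, size)]          # one leaf covers the whole block
    h = size // 2
    leaves = []
    for dx in (0, h):
        for dy in (0, h):
            for dz in (0, h):
                leaves += build_octree(occ, x0+dx, y0+dy, z0+dz, h)
    return leaves

# A sparse 8^3 grid with a single occupied voxel.
occ = np.zeros((8, 8, 8), dtype=bool)
occ[1, 2, 3] = True
leaves = build_octree(occ)
print(len(leaves), "leaves instead of", 8**3, "voxels")  # 22 leaves instead of 512 voxels
```

The sparser the input, the fewer leaves the tree needs, which is exactly what makes memory and computation tractable at high resolutions.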

SLIDES 22-40

Convolution

Convolution kernel (shown sliding over the grid):

0.125 0.250 0.125
0.000 0.000 0.000
0.125 0.250 0.125

◮ Differentiable ⇒ allows for end-to-end learning
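As a reminder of what the sliding-kernel animation computes, here is a minimal dense 2D convolution in NumPy using the slide's kernel. This is purely illustrative; OctNet performs the analogous 3D operation directly on octree cells:

```python
import numpy as np

# The kernel shown on the slides.
K = np.array([[0.125, 0.250, 0.125],
              [0.000, 0.000, 0.000],
              [0.125, 0.250, 0.125]])

def conv2d(image, kernel):
    """Valid-mode 2D convolution (kernel flipped, as in the math definition)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    flipped = kernel[::-1, ::-1]
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * flipped)
    return out

img = np.zeros((5, 5))
img[2, 2] = 1.0                 # a single impulse
print(conv2d(img, K))           # the impulse response reproduces the kernel
```

Because every step is a sum of products, the operation is differentiable, which is what allows end-to-end learning.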

SLIDE 41

Efficient Convolution

This operation can be implemented very efficiently:

◮ 4 different cases
◮ First case requires only 1 evaluation!
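The speed-up in the first case can be illustrated as follows: inside a large octree cell the feature map is constant, so the convolution response is identical for every interior voxel and can be evaluated once as the constant times the kernel sum. A sketch of the principle, not the actual OctNet kernels:

```python
import numpy as np

K = np.array([[0.125, 0.250, 0.125],
              [0.000, 0.000, 0.000],
              [0.125, 0.250, 0.125]])

c = 2.0                                   # constant feature value inside the cell
single_eval = c * K.sum()                 # one evaluation for the whole interior

# Dense reference: convolving a constant patch voxel by voxel.
patch = np.full((3, 3), c)
dense_eval = np.sum(patch * K)
assert np.isclose(single_eval, dense_eval)
print(single_eval)                        # 2.0 (this kernel sums to 1)
```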

SLIDES 42-45

Pooling

◮ Unpooling operation defined similarly
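A dense analogue of the pooling/unpooling pair might look like this (a simplified sketch; OctNet's pooling additionally merges octree leaves):

```python
import numpy as np

def max_pool2(x):
    """2x2x2 max pooling on a cubic grid (side length must be even)."""
    s = x.shape[0] // 2
    return x.reshape(s, 2, s, 2, s, 2).max(axis=(1, 3, 5))

def unpool2(x):
    """Nearest-neighbour unpooling: each value is copied into a 2x2x2 block."""
    return np.repeat(np.repeat(np.repeat(x, 2, 0), 2, 1), 2, 2)

grid = np.arange(4**3, dtype=float).reshape(4, 4, 4)
pooled = max_pool2(grid)        # 4^3 -> 2^3
restored = unpool2(pooled)      # 2^3 -> 4^3
print(pooled.shape, restored.shape)
```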

SLIDE 46

Results: 3D Shape Classification

[Network diagram: Convolution and Pooling layers followed by Fully Connected layers; example input: Airplane]

SLIDE 47

Results: 3D Shape Classification

[Plot: Memory [GB] vs. input resolution (8³ to 256³), OctNet vs. DenseNet]
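The cubic growth behind the dense-network memory curve is easy to verify with a back-of-the-envelope check (not the paper's exact measurement, which also counts channels, layers, and gradients): a single float32 feature channel at 256³ already needs 64 MiB.

```python
# Activation memory for one float32 feature channel grows cubically
# with the input resolution.
for n in (8, 16, 32, 64, 128, 256):
    mib = n**3 * 4 / 2**20          # bytes -> MiB
    print(f"{n}^3 voxels: {mib:10.4f} MiB per channel")
```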

SLIDE 48

Results: 3D Shape Classification

[Plot: Runtime [s] vs. input resolution (8³ to 256³), OctNet vs. DenseNet]

SLIDE 49

Results: 3D Shape Classification

[Plot: Accuracy vs. input resolution (8³ to 256³), OctNet vs. DenseNet]

◮ Input: voxelized meshes from ModelNet

SLIDE 50

Results: 3D Shape Classification

[Plot: Accuracy vs. input resolution (8³ to 256³), OctNet 1 vs. OctNet 2 vs. OctNet 3]

◮ Input: voxelized meshes from ModelNet

SLIDE 51

Results: 3D Shape Classification

SLIDE 52

Results: 3D Semantic Labeling

[Figure: Input and Prediction]

◮ Dataset: RueMonge2014

SLIDE 53

Results: 3D Semantic Labeling

[Encoder-decoder diagram: Convolution and Pooling → skip connections → Unpooling and Conv.]

◮ Decoder octree structure copied from encoder

SLIDE 54

Results: 3D Semantic Labeling

Method                           IoU
[Riemenschneider et al., 2014]   42.3
[Martinovic et al., 2015]        52.2
[Gadde et al., 2016]             54.4
OctNet 64³                       45.6
OctNet 128³                      50.4
OctNet 256³                      59.2

SLIDE 55

OctNetFusion: Learning Depth Fusion from Data

[Riegler, Ulusoy, Bischof & Geiger, 3DV 2017]

SLIDE 56

Volumetric Fusion

d_{i+1}(p) = (w_i(p) d_i(p) + \hat{w}(p) \hat{d}(p)) / (w_i(p) + \hat{w}(p))
w_{i+1}(p) = w_i(p) + \hat{w}(p)

◮ p ∈ R³: voxel location
◮ d: distance, w: weight

[Curless and Levoy, SIGGRAPH 1996]
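Per voxel, the update above is just a running weighted average of signed distances. A minimal NumPy sketch (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def fuse(d_acc, w_acc, d_new, w_new):
    """One volumetric-fusion step [Curless & Levoy 1996]:
    running weighted average of signed distances per voxel."""
    w_next = w_acc + w_new
    d_next = np.where(w_next > 0,
                      (w_acc * d_acc + w_new * d_new) / np.maximum(w_next, 1e-8),
                      d_acc)
    return d_next, w_next

# Two noisy depth observations of the same four voxels.
d = np.zeros(4); w = np.zeros(4)
d, w = fuse(d, w, np.array([0.10, -0.05, 0.20, 0.00]), np.ones(4))
d, w = fuse(d, w, np.array([0.20,  0.05, 0.20, 0.00]), np.ones(4))
print(d)   # per-voxel weighted averages of the two observations
```

Averaging suppresses zero-mean noise, but, as the next slides point out, it needs many redundant views to do so and cannot complete surfaces that were never observed.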

SLIDES 57-58

Volumetric Fusion

◮ Pros:
  ◮ Simple, fast, easy to implement
  ◮ De facto "gold standard" (KinectFusion, Voxel Hashing, ...)
◮ Cons:
  ◮ Requires many redundant views to reduce noise
  ◮ Can't handle outliers / complete missing surfaces

[Figure: Ground Truth vs. Volumetric Fusion]

SLIDES 59-60

TV-L1 Fusion

◮ Pros:
  ◮ Prior on surface area
  ◮ Noise reduction
◮ Cons:
  ◮ Simplistic local prior (penalizes surface area, shrinking bias)
  ◮ Can't complete missing surfaces

[Figure: Ground Truth vs. Volumetric Fusion vs. TV-L1 Fusion]

SLIDES 61-62

Learned Fusion

◮ Pros:
  ◮ Learn noise suppression from data
  ◮ Learn surface completion from data
◮ Cons:
  ◮ Requires large 3D datasets for training
  ◮ How to scale to high resolutions?

[Figure: Ground Truth vs. Volumetric Fusion vs. TV-L1 Fusion vs. OctNetFusion]

SLIDES 63-65

Learning 3D Fusion

[Encoder-decoder diagram: Convolution and Pooling → skip connections → Unpooling and Conv.]

Input Representation:

◮ TSDF
◮ Higher-order statistics

Output Representation:

◮ Occupancy
◮ TSDF

What is the problem?

◮ Octree structure unknown ⇒ needs to be inferred as well!

SLIDE 66

OctNetFusion Architecture

[Architecture diagram: encoder-decoder stages at 64³, 128³, and 256³ with inputs, outputs, and feature residuals ∆64, ∆128, ∆256; the octree structure is estimated per stage]

SLIDE 67

Results: Surface Reconstruction

[Figure: VolFus, TV-L1, Ours, and Ground Truth at 64³, 128³, and 256³]

SLIDE 68

Results: Volumetric Completion

[Figure: [Firman, 2016], Ours, and Ground Truth]

SLIDE 69

Thank you!