Deep Models for 3D Reconstruction
Andreas Geiger
Autonomous Vision Group, MPI for Intelligent Systems, Tübingen
Computer Vision and Geometry Group, ETH Zürich
October 12, 2017
[Furukawa & Hernandez: Multi-View Stereo: A Tutorial]
◮ Given a set of 2D images
◮ Reconstruct 3D shape of object/scene
[Newcombe et al., 2011] [Choi et al., 2011] [Dai et al., 2017] [Wu et al., 2015] [Chang et al., 2015] [Chang et al., 2017]
◮ Existing 3D networks limited to ∼32³ voxels
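To make the resolution limit concrete, the following back-of-the-envelope sketch computes the memory of a single dense feature volume. The channel count (8) and 4-byte floats are illustrative assumptions, not figures from the talk; real networks keep many such volumes per layer and per training sample.

```python
def dense_grid_bytes(resolution: int, channels: int = 1, bytes_per_value: int = 4) -> int:
    """Bytes needed for one dense feature volume of resolution^3 voxels."""
    return resolution ** 3 * channels * bytes_per_value


# Assumed setting: 8 feature channels stored as 32-bit floats.
for r in (32, 64, 128, 256):
    mib = dense_grid_bytes(r, channels=8) / 2 ** 20
    print(f"{r}^3 voxels, 8 channels: {mib:.0f} MiB")
```

Memory grows cubically: doubling the resolution multiplies the cost of every dense activation by eight.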
◮ Partition space adaptively based on sparse input
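The idea of adaptive partitioning can be sketched with a generic recursive octree: a cell is subdivided only if it contains input points, so empty space stays coarse. This is a minimal illustration under that assumption, not the data structure used in the talk; the names `build_octree`, `Voxel`, and `Leaf` are hypothetical.

```python
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]
Leaf = Tuple[Voxel, int, bool]  # (cell origin, edge length in voxels, occupied?)


def build_octree(points: Set[Voxel], origin: Voxel, size: int) -> List[Leaf]:
    """Recursively split a cube, refining only cells that contain input points."""
    ox, oy, oz = origin
    inside = {
        p for p in points
        if ox <= p[0] < ox + size and oy <= p[1] < oy + size and oz <= p[2] < oz + size
    }
    if not inside:
        return [(origin, size, False)]  # empty region stays one coarse cell
    if size == 1:
        return [(origin, 1, True)]      # finest resolution reached
    half = size // 2
    leaves: List[Leaf] = []
    for dx in (0, half):
        for dy in (0, half):
            for dz in (0, half):
                leaves += build_octree(inside, (ox + dx, oy + dy, oz + dz), half)
    return leaves


# Two occupied voxels in an 8^3 volume (size must be a power of two).
points = {(0, 0, 0), (7, 7, 7)}
leaves = build_octree(points, (0, 0, 0), 8)
print(len(leaves), "leaf cells instead of", 8 ** 3, "voxels")
```

The leaves still tile the whole volume, but most of it is covered by a handful of large empty cells.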
[Figure: convolution example with kernel weights 0.125, 0.250, 0.125; 0.000, 0.000, 0.000; 0.125, 0.250, 0.125]
◮ Differentiable ⇒ allows for end-to-end learning
◮ 4 different cases
◮ First case requires only 1 evaluation!
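One plausible reading of the single-evaluation case, assuming it refers to voxels whose entire kernel footprint lies inside one octree cell with a constant feature value: the convolution result is then the same everywhere and equals the cell value times the sum of the kernel weights. The sketch below checks this numerically; the 3×3×3 kernel (one 3×3 slice stacked three times) and the helper `conv_at` are illustrative assumptions.

```python
import itertools


def conv_at(volume, kernel, x, y, z):
    """Naive 3x3x3 correlation at one interior voxel of a nested-list volume."""
    total = 0.0
    for i, j, k in itertools.product(range(3), repeat=3):
        total += kernel[i][j][k] * volume[x + i - 1][y + j - 1][z + k - 1]
    return total


# A cell of 6^3 voxels that all share the same feature value.
n, value = 6, 2.0
volume = [[[value] * n for _ in range(n)] for _ in range(n)]

# Illustrative kernel: the same 3x3 slice repeated along the first axis.
kernel = [[[0.125, 0.25, 0.125], [0.0, 0.0, 0.0], [0.125, 0.25, 0.125]] for _ in range(3)]

# Single evaluation: cell value times the sum of all kernel weights.
shortcut = value * sum(w for plane in kernel for row in plane for w in row)

# Brute force over every voxel whose 3x3x3 neighborhood stays inside the cell.
interior = [
    conv_at(volume, kernel, x, y, z)
    for x, y, z in itertools.product(range(1, n - 1), repeat=3)
]
print(shortcut, all(abs(v - shortcut) < 1e-12 for v in interior))
```

Voxels near a cell boundary see neighboring cells as well, which is presumably where the remaining cases come in.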
◮ Unpooling operation defined similarly
[Figure: network architecture; labels: Convolution and Pooling, Fully Conn.]
◮ Input: voxelized meshes from ModelNet
◮ Dataset: RueMonge2014
[Figure: encoder-decoder architecture; labels: Convolution and Pooling, Skip, Unpooling and Conv.]
◮ Decoder octree structure copied from encoder
◮ p ∈ ℝ³: voxel location
◮ d: distance, w: weight
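The slide's formula did not survive extraction. Assuming it is the standard weighted average used in volumetric fusion, D(p) = Σ_i w_i(p) d_i(p) / Σ_i w_i(p), each voxel can be updated incrementally as new depth maps arrive. The sketch below shows that running update for a single voxel; the function name `fuse` and the sample observations are hypothetical.

```python
def fuse(D, W, d, w):
    """Fold one observation (distance d, weight w) into the running average D with weight W."""
    W_new = W + w
    if W_new == 0:
        return D, W  # nothing observed yet, keep the previous state
    return (W * D + w * d) / W_new, W_new


# Three noisy truncated signed distances for the same voxel p, with weights.
D, W = 0.0, 0.0
for d, w in [(0.10, 1.0), (0.14, 1.0), (0.06, 2.0)]:
    D, W = fuse(D, W, d, w)

print(D, W)
```

Because the update is a plain average, noise only shrinks as more views observe the voxel, and a single outlier shifts the result directly.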
◮ Pros:
  ◮ Simple, fast, easy to implement
  ◮ De facto "gold standard" (KinectFusion, Voxel Hashing, ...)
◮ Cons:
  ◮ Requires many redundant views to reduce noise
  ◮ Cannot handle outliers or complete missing surfaces
◮ Pros:
  ◮ Prior on surface area
  ◮ Noise reduction
◮ Cons:
  ◮ Simplistic local prior (penalizes surface area, shrinking bias)
  ◮ Cannot complete missing surfaces
◮ Pros:
  ◮ Learn noise suppression from data
  ◮ Learn surface completion from data
◮ Cons:
  ◮ Requires large 3D datasets for training
  ◮ How to scale to high resolutions?
[Figure: encoder-decoder architecture; labels: Convolution and Pooling, Skip, Unpooling and Conv.]
◮ TSDF
◮ Higher-order statistics
◮ Occupancy
◮ TSDF
◮ Octree structure unknown ⇒ needs to be inferred as well!
[Figure: input, features, and output at resolutions 256³, 128³, and 64³; label: Octree Structure]