

slide-1
SLIDE 1

Learning 3D Reconstruction in Function Space

Andreas Geiger

Autonomous Vision Group, MPI for Intelligent Systems and University of Tübingen

June 15, 2020

Autonomous Vision Group

University of Tübingen MPI for Intelligent Systems

slide-2
SLIDE 2

Collaborators

Lars Mescheder · Michael Niemeyer · Michael Oechsle · Sebastian Nowozin · Andreas Geiger


slide-3
SLIDE 3

Our goal is to make intelligent systems more autonomous, robust and safe

slide-4
SLIDE 4

Intelligent systems interact with a 3D world

slide-5
SLIDE 5

3D reconstruction is a hard problem

slide-6
SLIDE 6

1963: Blocks World

Larry Roberts: Machine Perception of Three-Dimensional Solids. PhD Thesis, MIT, 1963.

slide-7
SLIDE 7

Traditional 3D Reconstruction Pipeline

Input Images → Camera Poses → Dense Correspondences → Depth Maps → Depth Map Fusion → 3D Reconstruction


slide-8
SLIDE 8

Humans recognize 3D from a single 2D image

slide-9
SLIDE 9
slide-10
SLIDE 10

Can we learn to infer 3D from a 2D image?

slide-11
SLIDE 11

3D Reconstruction from a 2D Image

Input Image → Neural Network → 3D Reconstruction


slide-12
SLIDE 12

What is a good output representation?

slide-13
SLIDE 13

3D Representations

Voxels:
◮ Discretization of 3D space into a grid
◮ Easy to process with neural networks
◮ Cubic memory O(n³) ⇒ limited resolution
◮ Manhattan-world bias
[Maturana et al., IROS 2015]


slide-14
SLIDE 14

3D Representations

Points:
◮ Discretization of the surface into 3D points
◮ Does not model connectivity / topology
◮ Limited number of points
◮ Global shape description
[Fan et al., CVPR 2017]


slide-15
SLIDE 15

3D Representations

Meshes:
◮ Discretization into vertices and faces
◮ Limited number of vertices / granularity
◮ Requires a class-specific template, or
◮ Leads to self-intersections
[Groueix et al., CVPR 2018]


slide-16
SLIDE 16

3D Representations

This work:
◮ Implicit representation ⇒ no discretization
◮ Arbitrary topology & resolution
◮ Low memory footprint
◮ Not restricted to a specific class


slide-17
SLIDE 17

Occupancy Networks

Key Idea:
◮ Do not represent the 3D shape explicitly
◮ Instead, consider the surface implicitly as the decision boundary of a non-linear classifier:

fθ : 3D Location × Condition (e.g., Image) → Occupancy Probability

Concurrent work:
◮ DeepSDF [Park et al., CVPR 2019]
◮ IM-NET [Chen et al., CVPR 2019]

Mescheder, Oechsle, Niemeyer, Nowozin and Geiger: Occupancy Networks: Learning 3D Reconstruction in Function Space. CVPR, 2019.
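The classifier view above can be sketched as a toy coordinate-based network; the layer sizes, random weights, and condition code below are illustrative stand-ins for the trained model:

```python
import numpy as np

# Toy occupancy network: f_theta maps a 3D point p and a condition code z
# (e.g., an image encoding) to an occupancy probability in [0, 1].
# All sizes and weights here are illustrative, not the paper's architecture.
rng = np.random.default_rng(0)

DIM_P, DIM_Z, DIM_H = 3, 8, 32
W1 = rng.standard_normal((DIM_P + DIM_Z, DIM_H)) * 0.1
b1 = np.zeros(DIM_H)
W2 = rng.standard_normal((DIM_H, 1)) * 0.1
b2 = np.zeros(1)

def occupancy(p, z):
    """Occupancy probability f_theta(p, z) for a batch of 3D points p."""
    x = np.concatenate([p, np.broadcast_to(z, (p.shape[0], DIM_Z))], axis=1)
    h = np.maximum(x @ W1 + b1, 0.0)              # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # sigmoid -> [0, 1]

points = rng.standard_normal((5, 3))  # query locations
code = rng.standard_normal(DIM_Z)     # condition encoding (placeholder)
probs = occupancy(points, code)
```

A trained fθ would replace the random weights; the surface is then the level set fθ(p, z) = τ (e.g., τ = 0.5).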

slide-18
SLIDE 18

Training Objective

Occupancy Network fθ and Variational Occupancy Encoder qψ:

L(θ, ψ) = Σ_{j=1}^{K} BCE(fθ(pij, zi), oij) + KL[ qψ(z | (pij, oij)_{j=1:K}) ‖ p0(z) ]

◮ K: number of randomly sampled 3D points (K = 2048)
◮ BCE: binary cross-entropy loss
◮ qψ: encoder network

Mescheder, Oechsle, Niemeyer, Nowozin and Geiger: Occupancy Networks: Learning 3D Reconstruction in Function Space. CVPR, 2019.
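A minimal NumPy sketch of this objective, assuming a diagonal-Gaussian encoder qψ against the standard-normal prior p0(z); the predictions and labels are random placeholders rather than network outputs:

```python
import numpy as np

# Sketch of the training objective: BCE over K sampled points plus a KL
# term for a diagonal-Gaussian encoder q_psi = N(mu, diag(sigma^2))
# against the prior p0(z) = N(0, I). Illustrative, not the authors' code.
def bce(pred, target, eps=1e-7):
    pred = np.clip(pred, eps, 1 - eps)
    return -(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def kl_gaussian(mu, log_sigma2):
    # KL[N(mu, sigma^2) || N(0, 1)], summed over latent dimensions
    return 0.5 * np.sum(np.exp(log_sigma2) + mu**2 - 1.0 - log_sigma2)

K = 2048
rng = np.random.default_rng(0)
occ_pred = rng.uniform(0.01, 0.99, K)            # f_theta(p_ij, z_i)
occ_true = rng.integers(0, 2, K).astype(float)   # labels o_ij in {0, 1}
mu, log_sigma2 = np.zeros(64), np.zeros(64)      # placeholder encoder outputs

loss = bce(occ_pred, occ_true).sum() + kl_gaussian(mu, log_sigma2)
```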

slide-19
SLIDE 19

Results

Mescheder, Oechsle, Niemeyer, Nowozin and Geiger: Occupancy Networks: Learning 3D Reconstruction in Function Space. CVPR, 2019.

slide-20
SLIDE 20

Can we also learn about object appearance?

slide-21
SLIDE 21

Texture Fields

2D Image + 3D Model → Texture Field → Textured 3D Model

Oechsle, Mescheder, Niemeyer, Strauss and Geiger: Texture Fields: Learning Texture Representations in Function Space. ICCV, 2019.
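Conceptually a texture field is another coordinate-based function, mapping a 3D point plus a condition encoding to an RGB color; the single linear layer and the 16-dimensional condition below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

# Toy texture field t_theta: (3D point p, condition code) -> RGB in [0, 1]^3.
# A real model conditions on shape and image encodings via a deep network.
rng = np.random.default_rng(1)
W = rng.standard_normal((3 + 16, 3)) * 0.1  # single linear layer, for brevity

def texture_field(p, cond):
    x = np.concatenate([p, np.broadcast_to(cond, (p.shape[0], 16))], axis=1)
    return 1.0 / (1.0 + np.exp(-(x @ W)))    # sigmoid -> RGB in [0, 1]^3

colors = texture_field(rng.standard_normal((4, 3)), rng.standard_normal(16))
```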

slide-22
SLIDE 22

Texture Fields

[Architecture diagram (conditional model): the 3D shape is processed by a shape encoder and the 2D image by an image encoder; 3D points obtained by sampling and unprojecting a depth map are fed with both encodings to the Texture Field, whose colors are rendered as a point cloud into a predicted image and compared to the true image via a reconstruction loss. A color legend distinguishes the Conditional, GAN, and VAE model variants.]

Oechsle, Mescheder, Niemeyer, Strauss and Geiger: Texture Fields: Learning Texture Representations in Function Space. ICCV, 2019.

slide-23
SLIDE 23

Texture Fields

[Architecture diagram (GAN model): shape encoder, depth-map sampling/unprojection, and Texture Field as in the conditional model, but a GAN discriminator provides an adversarial loss between the predicted and true images.]

Oechsle, Mescheder, Niemeyer, Strauss and Geiger: Texture Fields: Learning Texture Representations in Function Space. ICCV, 2019.

slide-24
SLIDE 24

Texture Fields

[Architecture diagram (VAE model): shape encoder, depth-map sampling/unprojection, and Texture Field as in the conditional model, with a VAE encoder on the true image, a reconstruction loss, and a KL-divergence regularizer on the latent code.]

Oechsle, Mescheder, Niemeyer, Strauss and Geiger: Texture Fields: Learning Texture Representations in Function Space. ICCV, 2019.

slide-25
SLIDE 25

Representation Power

◮ Ground truth vs. Texture Field vs. Voxelization

Oechsle, Mescheder, Niemeyer, Strauss and Geiger: Texture Fields: Learning Texture Representations in Function Space. ICCV, 2019.

slide-26
SLIDE 26

Results

Oechsle, Mescheder, Niemeyer, Strauss and Geiger: Texture Fields: Learning Texture Representations in Function Space. ICCV, 2019.

slide-27
SLIDE 27

What about object motion?

slide-28
SLIDE 28

Occupancy Flow

◮ Extending Occupancy Networks to 4D is hard (curse of dimensionality)
◮ Represent the shape at t = 0 using a 3D Occupancy Network
◮ Represent motion by a temporally and spatially continuous vector field
◮ The relationship between a 3D trajectory s and the velocity v is given by the (differentiable) ODE:

∂s(t)/∂t = v(s(t), t)

Niemeyer, Mescheder, Oechsle and Geiger: Occupancy Flow: 4D Reconstruction by Learning Particle Dynamics. ICCV, 2019.
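The ODE above can be illustrated by advecting points through a known velocity field; the rigid z-axis rotation and plain forward-Euler integrator below are stand-ins for the learned velocity network and the differentiable ODE solver used in the paper:

```python
import numpy as np

# Advect points at t = 0 by integrating ds/dt = v(s, t).
# v here is a hypothetical rigid rotation about the z-axis; the paper uses
# a learned network and a proper ODE solver instead of plain Euler steps.
def v(s, t):
    omega = 1.0  # angular velocity (assumed)
    return np.stack([-omega * s[:, 1], omega * s[:, 0], np.zeros(len(s))],
                    axis=1)

def advect(points, t1, steps=1000):
    s, dt = points.copy(), t1 / steps
    for i in range(steps):
        s = s + dt * v(s, i * dt)   # forward Euler step
    return s

p0 = np.array([[1.0, 0.0, 0.0]])
p1 = advect(p0, np.pi / 2)          # quarter turn about the z-axis
```

Integrating to t = π/2 carries (1, 0, 0) approximately to (0, 1, 0), i.e. correspondences over time fall out of the trajectory itself.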

slide-29
SLIDE 29

Occupancy Flow

Niemeyer, Mescheder, Oechsle and Geiger: Occupancy Flow: 4D Reconstruction by Learning Particle Dynamics. ICCV, 2019.

slide-30
SLIDE 30

Results

◮ No correspondences needed ⇒ implicitly established by our model!

Niemeyer, Mescheder, Oechsle and Geiger: Occupancy Flow: 4D Reconstruction by Learning Particle Dynamics. ICCV, 2019.

slide-31
SLIDE 31

Can we learn implicit representations from images?

slide-32
SLIDE 32

Architecture

[Architecture diagram: the input encoding and each 3D query point are combined by the network to predict the Occupancy Probability.]

Niemeyer, Mescheder, Oechsle and Geiger: Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision. CVPR, 2020.

slide-33
SLIDE 33

Forward Pass

(Rendering)

slide-34
SLIDE 34

Differentiable Volumetric Rendering

Forward Pass:
◮ For all pixels u:
◮ Find the surface point p̂ along the ray w via ray marching and root finding
◮ Evaluate the texture field tθ(p̂) at p̂
◮ Insert the color tθ(p̂) at pixel u

Niemeyer, Mescheder, Oechsle and Geiger: Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision. CVPR, 2020.
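For a single pixel, the forward pass can be sketched as ray marching to bracket the first crossing of fθ = τ, followed by root refinement; the analytic "soft sphere" occupancy below is a hypothetical stand-in for the learned network, and plain bisection replaces the paper's root finder:

```python
import numpy as np

# March along the ray r0 + d*w, detect the first sign change of f(p) - tau,
# then refine the surface depth by bisection.
TAU = 0.5
def f(p):
    # Hypothetical occupancy of a soft unit sphere (f = 0.5 at ||p|| = 1)
    return 1.0 / (1.0 + np.exp(4.0 * (np.linalg.norm(p) - 1.0)))

def find_surface(r0, w, d_max=5.0, n_steps=100, n_bisect=30):
    ds = np.linspace(0.0, d_max, n_steps)
    prev_d, prev_val = ds[0], f(r0 + ds[0] * w) - TAU
    for d in ds[1:]:
        val = f(r0 + d * w) - TAU
        if prev_val < 0 <= val or prev_val >= 0 > val:   # bracket found
            lo, hi = prev_d, d
            for _ in range(n_bisect):                     # bisection refine
                mid = 0.5 * (lo + hi)
                if (f(r0 + mid * w) - TAU) * prev_val > 0:
                    lo = mid   # mid is on the same side as lo
                else:
                    hi = mid
            return r0 + 0.5 * (lo + hi) * w
        prev_d, prev_val = d, val
    return None  # ray missed the surface

p_hat = find_surface(np.array([-3.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]))
```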

slide-35
SLIDE 35

Backward Pass

(Differentiation)

slide-36
SLIDE 36

Differentiable Volumetric Rendering

Backward Pass:
◮ Image observation I, rendered image Î
◮ Loss: L(Î, I) = Σ_u ‖Î_u − I_u‖
◮ Gradient of the loss function:

∂L/∂θ = Σ_u (∂L/∂Î_u) · (∂Î_u/∂θ), with ∂Î_u/∂θ = ∂tθ(p̂)/∂θ + (∂tθ(p̂)/∂p̂) · (∂p̂/∂θ)

◮ Differentiating the surface condition fθ(p̂) = τ yields:

∂p̂/∂θ = −w (∂fθ(p̂)/∂p̂ · w)⁻¹ ∂fθ(p̂)/∂θ

⇒ Analytic solution; no need to store intermediate results

Niemeyer, Mescheder, Oechsle and Geiger: Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision. CVPR, 2020.
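The implicit-gradient formula can be sanity-checked numerically on a toy case where the surface point has a closed form: a sphere of radius θ with fθ(p) = θ − ‖p‖ and level set τ = 0, a hypothetical stand-in for the learned occupancy network:

```python
import numpy as np

# Toy check of dp/dtheta = -w (df/dp . w)^(-1) df/dtheta for
# f_theta(p) = theta - ||p||, along the ray r0 + d*w.
theta = 1.0
r0, w = np.array([-3.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])

def surface_point(theta):
    # First ray/sphere intersection, known in closed form for this toy case
    return r0 + (3.0 - theta) * w

p = surface_point(theta)
df_dp = -p / np.linalg.norm(p)   # spatial gradient of f = theta - ||p||
df_dtheta = 1.0
analytic = -w * (1.0 / (df_dp @ w)) * df_dtheta

# Compare against central finite differences of the closed-form intersection
eps = 1e-6
numeric = (surface_point(theta + eps) - surface_point(theta - eps)) / (2 * eps)
```

The analytic and finite-difference derivatives agree, illustrating why no intermediate results from the ray march need to be stored for backpropagation.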

slide-37
SLIDE 37

Results

Niemeyer, Mescheder, Oechsle and Geiger: Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision. CVPR, 2020.

slide-38
SLIDE 38

Summary

slide-39
SLIDE 39

Summary

Neural Implicit Models:
◮ Effective output representation for shape, appearance, material, motion, etc.
◮ No discretization; arbitrary topology can be modeled
◮ Can be learned efficiently using only 2D supervision
◮ Many applications: reconstruction, view synthesis, segmentation, etc.

Challenges:
◮ Geometry must be extracted in a post-processing step (1-3 s for ONet)
◮ Extension to 4D is not straightforward (curse of dimensionality)
◮ Fully connected architectures and global conditioning lead to over-smooth results
◮ Promising: local features (ConvONet, PIFu), better input encodings (NeRF)


slide-40
SLIDE 40

Thank you!

http://autonomousvision.github.io