Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations (PowerPoint Presentation)

SLIDE 1

Scene Representation Networks:

Continuous 3D-Structure-Aware Neural Scene Representations

Vincent Sitzmann Gordon Wetzstein Michael Zollhöfer

SLIDE 2

Figure: input is a single image with its camera pose and intrinsics; outputs are surface normals and novel views.

SLIDE 3

Observations: {(image, pose & intrinsics), …}

What can we learn about latent 3D scenes from observations? Vision: learn rich representations just by watching video!

Self-supervised Scene Representation Learning

Latent 3D scenes: {…}

SLIDE 4

Observations → Re-rendered Observations

Self-supervised Scene Representation Learning

Image Loss

Model

SLIDE 5

Observations → Re-rendered Observations

Self-supervised Scene Representation Learning

Image Loss

Neural Scene Representation: persistent feature representation of the scene.

SLIDE 6

Observations → Re-rendered Observations

Self-supervised Scene Representation Learning

Image Loss

Neural Scene Representation: persistent feature representation of the scene.

Neural Renderer: renders the scene from different camera perspectives.
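The training signal in this pipeline is an image loss between each observation and its re-rendering from the same camera. A minimal sketch, assuming a per-pixel mean squared error (the slides only say "image loss") and a hypothetical `render(scene, pose)` callable standing in for the neural renderer:

```python
def image_loss(observed, rendered):
    """Assumed loss: mean squared error between two equally sized images (lists of rows)."""
    if len(observed) != len(rendered):
        raise ValueError("images must have the same height")
    total, count = 0.0, 0
    for row_obs, row_ren in zip(observed, rendered):
        if len(row_obs) != len(row_ren):
            raise ValueError("images must have the same width")
        for a, b in zip(row_obs, row_ren):
            total += (a - b) ** 2
            count += 1
    return total / count


def reconstruction_objective(scene, observations, render):
    """Average image loss over all (image, pose) observations of one scene.

    `render(scene, pose)` is a hypothetical stand-in for the neural renderer.
    """
    losses = [image_loss(image, render(scene, pose)) for image, pose in observations]
    return sum(losses) / len(losses)


# Toy usage: the "scene" is a constant brightness, rendered identically from every pose.
def toy_render(scene, pose):
    return [[scene] * 2 for _ in range(2)]


observations = [([[0.5, 0.5], [0.5, 0.5]], "pose_a"), ([[1.0, 1.0], [1.0, 1.0]], "pose_b")]
loss_value = reconstruction_objective(0.5, observations, toy_render)
```

Only posed images enter the objective; no 3D supervision is needed, which is what makes the setup self-supervised.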

SLIDE 7

Observations → Re-rendered Observations

2D baseline: Autoencoder

Image Loss

Conv Encoder → Latent Code + Output Pose → Conv Decoder

SLIDE 8

Observations → Re-rendered Observations

2D baseline: Autoencoder

Image Loss

Latent Code + Output Pose → Conv Decoder

SLIDE 9

The autoencoder doesn't capture the 3D properties of scenes (trained on ~2,500 ShapeNet cars with 50 observations each). We need a 3D inductive bias!

SLIDE 10

Related Work

Criteria: 3D inductive bias / 3D structure; self-supervised with posed images.

2D Generative Models: Goodfellow et al., 2014; Kingma et al., 2013; Kingma et al., 2018; …

Scene Representation Learning: Tatarchenko et al., 2015; Worrall et al., 2017; Eslami et al., 2018; …

3D Computer Vision: Choy et al., 2016; Huang et al., 2018; Park et al., 2018; …

Voxel-based Representations: Sitzmann et al., 2019; Lombardi et al., 2019; Phuoc et al., 2019; …

Memory inefficient: $\mathcal{O}(n^3)$.
Doesn't parameterize scene surfaces smoothly.
Generalization is hard.
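To make the cubic memory cost of voxel grids concrete: a dense grid with $n$ voxels per side and $c$ feature channels stored as 32-bit floats needs $4 c n^3$ bytes. The resolutions and the channel count of 64 below are illustrative assumptions, not numbers from the talk:

```python
def voxel_grid_bytes(n: int, channels: int, bytes_per_value: int = 4) -> int:
    """Memory of a dense n x n x n feature grid (float32 by default)."""
    return n ** 3 * channels * bytes_per_value


# Assumption: 64 feature channels per voxel, float32 storage.
for n in (32, 64, 128, 256):
    mib = voxel_grid_bytes(n, channels=64) / 2 ** 20
    print(f"{n:>3}^3 grid, 64 channels: {mib:8.1f} MiB")
```

Each doubling of the resolution multiplies memory by eight (8 MiB at $32^3$ grows to 4096 MiB at $256^3$), regardless of how much of the grid is empty space.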

SLIDE 11

Observations → Re-rendered Observations

Scene Representation Networks

Image Loss, Neural Scene Representation, Neural Renderer

SLIDE 12

Observations → Re-rendered Observations

Scene Representation Networks

Image Loss, Neural Scene Representation, Neural Renderer

SLIDE 13

Figure: the scene consists of free space and objects.

SLIDE 14

Model scene as function Φ that maps coordinates to features.

Figure: every world coordinate $\mathbf{x} \in \mathbb{R}^3$, whether in free space or on an object, is mapped to a feature vector.

$\Phi: \mathbb{R}^3 \to \mathbb{R}^n$

SLIDE 15

Scene Representation Network parameterizes Φ as MLP.

Figure: coordinates $\mathbf{x} \in \mathbb{R}^3$ in free space and on objects are fed to the network, which outputs a feature vector for each.

$\Phi: \mathbb{R}^3 \to \mathbb{R}^n$

Scene Representation Network
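A minimal sketch of $\Phi: \mathbb{R}^3 \to \mathbb{R}^n$ as a fully connected ReLU network, written in plain Python so it runs without dependencies. The layer widths, the random initialization, and the helper names `init_mlp` and `phi` are assumptions for illustration; the slides only state that $\Phi$ is an MLP:

```python
import math
import random


def init_mlp(sizes, seed=0):
    """Random (weights, bias) pairs for layers sizes[0] -> sizes[1] -> ... -> sizes[-1]."""
    rng = random.Random(seed)
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / fan_in)  # assumption: He-style init for ReLU layers
        weights = [[rng.gauss(0.0, scale) for _ in range(fan_in)] for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers


def phi(layers, x):
    """Evaluate the MLP at a world coordinate x in R^3 and return its feature vector."""
    h = list(x)
    for i, (weights, bias) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, bias)]
        if i < len(layers) - 1:  # ReLU on hidden layers; the output layer stays linear
            h = [max(0.0, v) for v in h]
    return h


# Assumed sizes: Phi: R^3 -> R^8 with two hidden layers of width 32.
scene = init_mlp([3, 32, 32, 8])
feature = phi(scene, (0.1, -0.4, 2.0))
```

The scene is stored entirely in the weights; there is no grid, so any real-valued coordinate can be queried.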

SLIDE 16

Scene Representation Network parameterizes Φ as an MLP, $\Phi: \mathbb{R}^3 \to \mathbb{R}^n$.

Scene Representation Network

Can sample anywhere, at arbitrary resolutions. Parameterizes scene surfaces smoothly. Memory scales with scene complexity.
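Because $\Phi$ takes continuous coordinates, the sampling resolution is a choice made at query time: the number of queries grows with resolution, while the number of stored parameters does not. A sketch with a hypothetical one-hidden-layer $\Phi$ (no biases, sizes assumed) sampled on a planar slice at several resolutions:

```python
import math
import random


def make_phi(hidden=16, out_dim=4, seed=0):
    """Hypothetical one-hidden-layer Phi: R^3 -> R^out_dim; returns (phi, num_params)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)] for _ in range(out_dim)]

    def phi(x):
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return [sum(w * hi for w, hi in zip(row, h)) for row in w2]

    num_params = 3 * hidden + hidden * out_dim  # weights only, no biases (assumption)
    return phi, num_params


def sample_slice(phi, resolution, z=0.0):
    """Query Phi at pixel centers of a resolution x resolution grid over [-1, 1]^2 at depth z."""
    coords = [-1.0 + 2.0 * (i + 0.5) / resolution for i in range(resolution)]
    return [[phi((x, y, z)) for x in coords] for y in coords]


phi, num_params = make_phi()
for res in (16, 32, 64):
    grid = sample_slice(phi, res)
    print(f"{res}x{res} samples from the same {num_params} parameters")
```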

SLIDE 17

Observations → Re-rendered Observations

Scene Representation Networks

Image Loss, Neural Renderer

Neural Scene Representation: $\Phi: \mathbb{R}^3 \to \mathbb{R}^n$

SLIDE 18

Observations → Re-rendered Observations

Scene Representation Networks

Image Loss, Neural Renderer

Neural Scene Representation: $\Phi: \mathbb{R}^3 \to \mathbb{R}^n$

SLIDE 19


Neural Renderer.

Free Space

SLIDE 20

Neural Renderer.

SLIDE 21

Neural Renderer.

SLIDE 22

Neural Renderer Step 1: Intersection Testing.

Figure: a camera ray with several candidate points marked "?": which of them lies on a surface?

Idea: march along the ray until it arrives at a surface.
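Marching needs, for every pixel, a ray that starts at the camera center and points into the scene in world coordinates. A sketch assuming a pinhole camera with intrinsics $f_x, f_y, c_x, c_y$, a camera-to-world rotation given as a 3x3 matrix, and a camera looking along its own $+z$ axis (these conventions and the name `pixel_ray` are assumptions, not taken from the talk):

```python
import math


def pixel_ray(u, v, fx, fy, cx, cy, rotation, translation):
    """Return (origin, unit direction) in world coordinates for pixel (u, v).

    Assumptions: pinhole model, `rotation` is the 3x3 camera-to-world matrix
    (list of rows), `translation` is the camera center in world coordinates,
    and the camera looks along its +z axis.
    """
    # Back-project the pixel to a direction in camera coordinates (at depth 1).
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate that direction into world coordinates and normalize it.
    d_world = [sum(rotation[r][c] * d_cam[c] for c in range(3)) for r in range(3)]
    norm = math.sqrt(sum(c * c for c in d_world))
    return tuple(translation), tuple(c / norm for c in d_world)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
origin, direction = pixel_ray(32, 32, 60.0, 60.0, 32.0, 32.0, identity, (0.0, 0.0, -3.0))
```

Points on the ray are $\mathbf{o} + t\,\mathbf{d}$ for $t \ge 0$; intersection testing looks for the smallest $t$ at which that point lies on a surface.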

SLIDE 23

Neural Renderer Step 1: Intersection Testing.

World coordinates $\mathbf{x}_i$ → Scene Representation $\Phi: \mathbb{R}^3 \to \mathbb{R}^n$ → feature vector $\mathbf{v}_i = \Phi(\mathbf{x}_i)$

SLIDE 24

Neural Renderer Step 1: Intersection Testing.

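World coordinates $\mathbf{x}_i$ → Scene Representation $\Phi: \mathbb{R}^3 \to \mathbb{R}^n$ → feature vector $\mathbf{v}_i$ → Ray Marching LSTM → step length $\delta_{i+1}$

Points along the ray: $\mathbf{x}_0, \ldots, \mathbf{x}_{i+1}$

Feasible step length: distance to the closest scene surface.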
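The marching loop advances each ray by $\mathbf{x}_{i+1} = \mathbf{x}_i + \delta_{i+1}\,\mathbf{d}$, where $\mathbf{d}$ is the unit ray direction and, in an SRN, the step length $\delta_{i+1}$ is predicted by the ray-marching LSTM from the feature vector $\mathbf{v}_i$. The sketch below deliberately swaps the learned LSTM for classic sphere tracing on an analytic signed distance function (a unit sphere), where the step is exactly the "feasible step length" named on the slide, the distance to the closest surface. It illustrates the loop only, not the learned step predictor:

```python
import math


def sphere_sdf(x, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from point x to a sphere (negative inside). Stand-in for the LSTM."""
    return math.dist(x, center) - radius


def march(origin, direction, sdf, max_iters=50, eps=1e-6):
    """Sphere tracing: repeatedly step by the distance to the closest surface.

    Returns (hit_point, depth, iterations), or None if no surface is reached.
    """
    x = list(origin)
    depth = 0.0
    for i in range(max_iters):
        step = sdf(x)  # a step this long cannot jump past any surface
        if step < eps:
            return tuple(x), depth, i
        depth += step
        x = [o + depth * d for o, d in zip(origin, direction)]
    return None


result = march((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), sphere_sdf)
```

Here the ray reaches the sphere surface at $z = -1$ after one step of length 2. An SRN has no analytic distance function; the slide's point is that the LSTM learns step lengths that play this role.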

SLIDE 25

Neural Renderer Step 1: Intersection Testing.

Iteration 0

SLIDE 26

Neural Renderer Step 1: Intersection Testing.

Iteration 1

SLIDE 27

Neural Renderer Step 1: Intersection Testing.

Iteration 2

SLIDE 28

Neural Renderer Step 1: Intersection Testing.

Iteration 3

SLIDE 29

Neural Renderer Step 1: Intersection Testing.

Iteration 4

SLIDE 30

Neural Renderer Step 1: Intersection Testing.

Iteration …

SLIDE 31

Neural Renderer Step 1: Intersection Testing.

SLIDE 32

Neural Renderer Step 2: Color Generation

Scene Representation $\Phi: \mathbb{R}^3 \to \mathbb{R}^n$ → Color MLP
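After intersection testing, the feature vector $\mathbf{v} = \Phi(\mathbf{x})$ at the final point of each ray is mapped to that pixel's color. A minimal per-pixel sketch, assuming a single hidden ReLU layer and a sigmoid that keeps RGB values in $[0, 1]$ (the layer sizes, the sigmoid, and the names `make_color_mlp` / `color_mlp` are illustrative assumptions):

```python
import math
import random


def make_color_mlp(feature_dim, hidden=16, seed=0):
    """Build a hypothetical color MLP: feature vector in R^feature_dim -> RGB."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(feature_dim)) for _ in range(feature_dim)]
          for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)] for _ in range(3)]

    def color_mlp(feature):
        """Map the feature vector at a surface point to an (r, g, b) triple in [0, 1]."""
        h = [max(0.0, sum(w * f for w, f in zip(row, feature))) for row in w1]
        logits = [sum(w * v for w, v in zip(row, h)) for row in w2]
        return tuple(1.0 / (1.0 + math.exp(-z)) for z in logits)  # assumed sigmoid output

    return color_mlp


color_mlp = make_color_mlp(feature_dim=8)
rgb = color_mlp([0.3, -1.2, 0.8, 0.0, 2.1, -0.5, 0.9, 1.4])
```

The same function is applied independently to every pixel's final feature vector, so an image is produced one ray at a time.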

SLIDE 33

Observations → Re-rendered Observations

Can now train end-to-end with posed images only!

Image Loss, Neural Renderer

Neural Scene Representation: $\Phi: \mathbb{R}^3 \to \mathbb{R}^n$

SLIDE 34

Generalizing across a class of scenes

SLIDE 35

Each scene is represented by its own SRN.

Parameters $\phi_1, \phi_2, \phi_3, \phi_4 \in \mathbb{R}^l$

SLIDE 36

Each scene is represented by its own SRN.

The parameters $\phi_j$ live on a $k$-dimensional subspace of $\mathbb{R}^l$, $k < l$.

Parameters $\phi_1, \phi_2, \phi_3, \phi_4 \in \mathbb{R}^l$

SLIDE 37

Each scene is represented by its own SRN.

Represent each scene with a low-dimensional embedding: embeddings $\mathbf{z}_1, \mathbf{z}_2, \mathbf{z}_3, \mathbf{z}_4 \in \mathbb{R}^k$; parameters $\phi_1, \phi_2, \phi_3, \phi_4 \in \mathbb{R}^l$.

SLIDE 38

Parameters $\phi_1, \phi_2, \phi_3, \phi_4 \in \mathbb{R}^l$

Each scene is represented by its own SRN.

Embeddings $\mathbf{z}_1, \mathbf{z}_2, \mathbf{z}_3, \mathbf{z}_4 \in \mathbb{R}^k$

Hypernetwork: $\Psi: \mathbb{R}^k \to \mathbb{R}^l,\ \mathbf{z}_j \mapsto \Psi(\mathbf{z}_j) = \phi_j$
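A minimal sketch of the hypernetwork idea: a $k$-dimensional scene embedding is mapped to the full flat parameter vector of one SRN, which is then unpacked into the weights of $\Phi$. It assumes a linear $\Psi(\mathbf{z}) = A\mathbf{z} + \mathbf{b}$ purely for brevity (its outputs then lie exactly on a $k$-dimensional affine subspace of $\mathbb{R}^l$); the talk does not specify the architecture of $\Psi$, and all sizes and helper names here are hypothetical:

```python
import random

K = 4                # embedding dimension k (assumption)
HIDDEN, OUT = 8, 5   # sizes of the generated Phi: R^3 -> R^OUT (assumption)
L = 3 * HIDDEN + HIDDEN + HIDDEN * OUT + OUT  # parameter count l of one SRN

rng = random.Random(0)
A = [[rng.gauss(0.0, 0.1) for _ in range(K)] for _ in range(L)]
b = [rng.gauss(0.0, 0.1) for _ in range(L)]


def psi(z):
    """Assumed linear hypernetwork: embedding z in R^k -> SRN parameters phi in R^l."""
    return [sum(a * zi for a, zi in zip(row, z)) + bi for row, bi in zip(A, b)]


def phi_from_params(params, x):
    """Evaluate the SRN (3 -> HIDDEN -> OUT, ReLU) encoded by a flat parameter vector."""
    it = iter(params)
    w1 = [[next(it) for _ in range(3)] for _ in range(HIDDEN)]
    b1 = [next(it) for _ in range(HIDDEN)]
    w2 = [[next(it) for _ in range(HIDDEN)] for _ in range(OUT)]
    b2 = [next(it) for _ in range(OUT)]
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(w1, b1)]
    return [sum(w * hi for w, hi in zip(row, h)) + bi for row, bi in zip(w2, b2)]


z_scene = [rng.gauss(0.0, 1.0) for _ in range(K)]  # random here; one embedding per scene in practice
params = psi(z_scene)
feature = phi_from_params(params, (0.0, 0.5, -0.2))
```

Only the $k$ numbers in the embedding differ between scenes; everything shared across the class lives in $\Psi$.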

SLIDE 39

Results

SLIDE 40


Novel View Synthesis – Baseline Comparison

ShapeNet v2: single-shot reconstruction of objects in the held-out test set

Figure columns: Input pose | SRNs (Ours) | Tatarchenko et al. 2015 | Deterministic GQN, adapted (Eslami et al. 2018) | Worrall et al. 2017

Training
§ ShapeNet cars / chairs.
§ 50 observations per object.

Testing
§ Cars / chairs from the unseen test set.
§ Single observation!

SLIDE 41

Novel View Synthesis – SRN Output

ShapeNet v2: single-shot reconstruction of objects in the held-out test set

Input pose

SLIDE 42

Sampling at arbitrary resolutions

Resolutions: 32×32, 64×64, 128×128, 256×256, 512×512

Surface Normals RGB

SLIDE 43

Generalization to unseen camera poses

Camera Roll, Camera Close-up: SRNs

SLIDE 44

Generalization to unseen camera poses

Camera Roll, Camera Close-up

SRNs vs. Tatarchenko et al.: Tatarchenko et al. doesn't reconstruct geometry.

SLIDE 45

Latent code interpolation

Surface Normals RGB

SLIDE 46

Latent code interpolation

Surface Normals RGB
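Latent code interpolation blends two scene embeddings; each blended code would then be passed through the hypernetwork and rendered like any other scene. A sketch assuming plain linear interpolation $\mathbf{z}(\alpha) = (1-\alpha)\,\mathbf{z}_a + \alpha\,\mathbf{z}_b$ with evenly spaced $\alpha \in [0, 1]$ (the helper name `interpolate` is hypothetical, and the talk does not state the interpolation scheme):

```python
def interpolate(z_a, z_b, num_steps):
    """Return num_steps embeddings evenly spaced on the segment from z_a to z_b (assumed linear)."""
    if num_steps < 2:
        raise ValueError("need at least the two endpoints")
    codes = []
    for s in range(num_steps):
        alpha = s / (num_steps - 1)
        codes.append([(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)])
    return codes


path = interpolate([0.0, 2.0, -1.0], [1.0, 0.0, 1.0], num_steps=5)
```

Smooth changes in shape and appearance along such a path suggest that the embedding space captures the structure of the object class.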

SLIDE 47

SRNs can represent room-scale scenes, but they aren't compositional.

Training-set novel view synthesis on GQN rooms (Eslami et al. 2018) with ShapeNet cars, 50 observations. Work in progress: compositional SRNs generalize to unseen numbers of objects!

SLIDE 48

Scene Representation Networks: Continuous 3D-structure-aware Neural Scene Representations

Interpolation Single-shot reconstruction Camera pose extrapolation

Vincent Sitzmann, Gordon Wetzstein, Michael Zollhöfer

Find me at Poster #71! Looking for research positions in scene representation learning.

Vincent Sitzmann: @vincesitzmann, vsitzmann.github.io