Scene Representation Networks:
Continuous 3D-Structure-Aware Neural Scene Representations
Vincent Sitzmann Gordon Wetzstein Michael Zollhöfer
[Figure: single image + camera pose + intrinsics; outputs shown: Novel Views, Surface Normals]
Observations: Image + Pose & Intrinsics
What can we learn about latent 3D scenes from observations? Vision: Learn rich representations just by watching video!
Self-supervised Scene Representation Learning
Latent 3D Scenes
Observations; Re-Rendered Observations; Image Loss
Model:
Neural Scene Representation: Persistent feature representation of scene.
Neural Renderer: Render from different camera perspectives.
2D baseline: Autoencoder
Observations; Re-Rendered Observations; Image Loss
Conv Encoder; Latent Code + Output Pose; Conv Decoder
Doesn’t capture 3D properties of scenes. Trained on ~2500 ShapeNet cars with 50 observations each. Need 3D inductive bias!
Related Work
Tatarchenko et al., 2015 Worrall et al., 2017 Eslami et al., 2018 …
Scene Representation Learning; 3D Computer Vision
Goodfellow et al., 2014 Kingma et al., 2013 Kingma et al., 2018 …
2D Generative Models; 3D inductive bias / 3D structure; Self-supervised with posed images
Choy et al., 2016 Huang et al., 2018 Park et al., 2018 …
Voxel-based Representations
Sitzmann et al., 2019 Lombardi et al., 2019 Phuoc et al., 2019 …
Scene Representation Networks
Observations; Re-Rendered Observations; Image Loss
Neural Scene Representation; Neural Renderer
Model scene as function Φ that maps coordinates to features.
Φ: ℝ^3 → ℝ^n
[Figure: sample points in Free Space and on Objects, each mapped to a feature]
Scene Representation Network parameterizes Φ as MLP.
Φ: ℝ^3 → ℝ^n
Can sample anywhere, at arbitrary resolutions. Parameterizes scene surfaces smoothly. Memory scales with scene complexity.
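A minimal sketch of the idea on this slide, assuming a plain ReLU MLP in NumPy. The layer widths, the feature dimension `FEATURE_DIM`, and the helper names `init_mlp` and `phi` are hypothetical and are not taken from the talk. It only illustrates that one fixed set of parameters can be queried at any set of 3D points, at any sampling density.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random (weight, bias) pairs for an MLP with the given layer sizes."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def phi(params, x):
    """Map world coordinates x of shape (..., 3) to features of shape (..., n)."""
    h = x
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
    return h

rng = np.random.default_rng(0)
FEATURE_DIM = 16  # hypothetical n
params = init_mlp(rng, [3, 64, 64, FEATURE_DIM])

def grid(points_per_axis):
    """Regular grid of 3D points in [-1, 1]^3, shape (r, r, r, 3)."""
    axis = np.linspace(-1.0, 1.0, points_per_axis)
    return np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)

# The same parameters answer queries on a coarse and on a fine grid.
features_coarse = phi(params, grid(4))
features_fine = phi(params, grid(16))
num_parameters = sum(w.size + b.size for w, b in params)
```

The parameter count depends on the network widths only, not on how many points are queried, which is the sense in which memory is decoupled from sampling resolution in this sketch.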
Scene Representation Networks
Observations; Re-Rendered Observations; Image Loss
Neural Scene Representation: Φ: ℝ^3 → ℝ^n
Neural Renderer
Neural Renderer.
Neural Renderer Step 1: Intersection Testing.
Idea: march along ray until arrived at surface.
Neural Renderer Step 1: Intersection Testing.
x_i: world coordinates
v_i = Φ(x_i): feature vector, from Scene Representation Φ: ℝ^3 → ℝ^n
δ_{i+1}: step length, predicted by Ray Marching LSTM
x_0, …, x_{i+1}: points along the ray
Feasible step length: Distance to closest scene surface
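The marching loop can be illustrated with classic sphere tracing, where the step length is exactly the distance to the closest surface. This is a sketch under a loud assumption: the learned Ray Marching LSTM is replaced here by an analytic distance function for a unit sphere, so nothing is learned. The names `sphere_distance` and `march_ray` and the iteration count are hypothetical.

```python
import numpy as np

def sphere_distance(x, center=np.zeros(3), radius=1.0):
    """Distance from point x to the surface of a sphere (stand-in for the LSTM)."""
    return np.linalg.norm(x - center) - radius

def march_ray(origin, direction, distance_fn, num_steps=10):
    """March along a ray with a fixed number of steps.

    Each step advances by the distance to the closest surface, which can
    never overshoot the first intersection along the ray.
    """
    direction = direction / np.linalg.norm(direction)
    depth = 0.0
    for _ in range(num_steps):
        x = origin + depth * direction   # current point x_i
        depth += distance_fn(x)          # step length delta_{i+1}
    return origin + depth * direction, depth

origin = np.array([0.0, 0.0, -3.0])
direction = np.array([0.0, 0.0, 1.0])
hit_point, hit_depth = march_ray(origin, direction, sphere_distance)
```

In the talk's setting the step length comes from a network that sees the feature vector at the current point, so the sketch shows the loop structure and not the learned part.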
Neural Renderer Step 1: Intersection Testing.
[Animation: Iteration 0, 1, 2, 3, 4, …]
Neural Renderer Step 2: Color Generation
Φ: ℝ^3 → ℝ^n
Scene Representation; Color MLP
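A small sketch of the color step, assuming the feature vector at each ray's intersection point is already available. The layer sizes, the sigmoid output range, and the name `color_mlp` are assumptions made for illustration. The random `hit_features` array stands in for Φ evaluated at the intersection points.

```python
import numpy as np

def color_mlp(params, features):
    """Map per-pixel feature vectors (..., n) to RGB values (..., 3)."""
    h = features
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)   # ReLU hidden layers
    w, b = params[-1]
    logits = h @ w + b
    return 1.0 / (1.0 + np.exp(-logits))  # assumed: sigmoid keeps RGB in [0, 1]

rng = np.random.default_rng(0)
sizes = [16, 32, 3]  # hypothetical: 16 features in, 3 color channels out
params = [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

# Stand-in for the features at the intersection point of each pixel's ray.
hit_features = rng.normal(size=(64, 64, 16))
image = color_mlp(params, hit_features)
```

Because the same small network is applied to every pixel independently, the output image size is set only by how many rays were marched.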
Can now train end-to-end with posed images only!
Observations; Re-Rendered Observations; Image Loss
Neural Scene Representation: Φ: ℝ^3 → ℝ^n
Neural Renderer
Generalizing across a class of scenes
Each scene represented by its own SRN.
parameters φ_j ∈ ℝ^l, one per scene
φ_j live on k-dimensional subspace of ℝ^l, k < l.
Represent each scene with low-dimensional embedding
embedding z_j ∈ ℝ^k, one per scene
Hypernetwork Ψ: ℝ^k → ℝ^l, z_j ↦ Ψ(z_j) = φ_j
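The mapping from embedding to SRN parameters can be sketched as follows. This is an illustration under assumptions: Ψ is taken to be a single linear map for brevity, and the tiny layer sizes, `EMBED_DIM`, and the helper names `hypernetwork`, `unflatten`, and `srn` are hypothetical. With a linear Ψ, all generated parameter vectors lie on a k-dimensional affine subspace of ℝ^l by construction.

```python
import numpy as np

LAYER_SIZES = [3, 8, 4]  # hypothetical SRN: 3D point -> 4 features
SHAPES = []
for m, n in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
    SHAPES += [(m, n), (n,)]                       # weight, then bias
NUM_PARAMS = sum(int(np.prod(s)) for s in SHAPES)  # l
EMBED_DIM = 5                                      # k, with k < l

def hypernetwork(psi, z):
    """Psi: map an embedding z in R^k to a flat parameter vector in R^l."""
    w, b = psi
    return z @ w + b  # assumed linear for brevity

def unflatten(phi_flat):
    """Split a flat parameter vector into (weight, bias) pairs."""
    tensors, offset = [], 0
    for shape in SHAPES:
        size = int(np.prod(shape))
        tensors.append(phi_flat[offset:offset + size].reshape(shape))
        offset += size
    return list(zip(tensors[0::2], tensors[1::2]))

def srn(phi_flat, x):
    """Evaluate the scene network with generated parameters at point x."""
    layers = unflatten(phi_flat)
    h = x
    for i, (w, b) in enumerate(layers):
        h = h @ w + b
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)
    return h

rng = np.random.default_rng(0)
psi = (rng.normal(0.0, 0.1, size=(EMBED_DIM, NUM_PARAMS)), np.zeros(NUM_PARAMS))
z_a, z_b = rng.normal(size=EMBED_DIM), rng.normal(size=EMBED_DIM)  # two scenes
phi_a, phi_b = hypernetwork(psi, z_a), hypernetwork(psi, z_b)
x = np.array([0.1, -0.2, 0.3])
features_a, features_b = srn(phi_a, x), srn(phi_b, x)
```

Only `psi` is shared across scenes here. Each scene contributes just its own embedding, which is what makes interpolating between two scenes a matter of interpolating between two embeddings.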
Results
Novel View Synthesis – Baseline Comparison
ShapeNet v2 – single-shot reconstruction of objects in held-out test set
Methods: SRNs (Ours); Tatarchenko et al. 2015; Deterministic GQN, adapted (Eslami et al. 2018); Worrall et al. 2017
Training
• ShapeNet cars / chairs.
• 50 observations per object.
Testing
test set
Input pose
Novel View Synthesis – SRN Output
ShapeNet v2 – single-shot reconstruction of objects in held-out test set
Input pose
Sampling at arbitrary resolutions
32x32 64x64 128x128 512x512 256x256
Surface Normals RGB
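Rendering at a different resolution only changes how many camera rays are cast. The sketch below assumes a pinhole camera looking along +z with a normalized focal length. The function name `pixel_rays` and the parameter `focal` are hypothetical and the camera convention is an assumption, not something stated in the talk.

```python
import numpy as np

def pixel_rays(resolution, focal=1.0):
    """Unit ray directions in camera coordinates for a square image.

    Pixel centers are placed in normalized image coordinates in [-1, 1],
    so the field of view is the same at every resolution.
    """
    coords = (np.arange(resolution) + 0.5) / resolution * 2.0 - 1.0
    u, v = np.meshgrid(coords, coords, indexing="xy")
    dirs = np.stack([u / focal, v / focal, np.ones_like(u)], axis=-1)
    return dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)

# Same camera, different sampling densities.
rays_by_resolution = {r: pixel_rays(r) for r in (32, 64, 128, 256, 512)}
```

Each ray would then be marched and colored independently, so no part of the scene representation needs to change when the output size does.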
Generalization to unseen camera poses
Camera Roll; Camera close-up
SRNs; Tatarchenko et al.
Doesn’t reconstruct geometry
Latent code interpolation
Surface Normals RGB
Can represent room-scale scenes, but aren’t compositional.
Training set novel-view synthesis on GQN rooms (Eslami et al. 2018) with ShapeNet cars, 50 observations. Work-in-progress: Compositional SRNs generalize to unseen numbers of objects!
Scene Representation Networks: Continuous 3D-structure-aware Neural Scene Representations
Interpolation Single-shot reconstruction Camera pose extrapolation
Vincent Sitzmann (@vincesitzmann, vsitzmann.github.io), Gordon Wetzstein, Michael Zollhöfer
Find me at Poster #71! Looking for research positions in scene representation learning.