Make3D: Learning 3D Scene Structure from a Single Still Image

SLIDE 1

Make3D: Learning 3D Scene Structure from a Single Still Image Ashutosh Saxena, Min Sun, and Andrew Ng

Ian Endres CS598 February 5, 2009

SLIDE 2

Overview

Goal: Infer 3D models from monocular cues

◮ Segment into planar patches
◮ Build model from depth maps
◮ Estimate orientation/location of patches
◮ Construct 3D model

SLIDE 3

Properties to Model

◮ Single Image
  Depth from features
  Connectedness
  Coplanarity
◮ Multiple Images
  Depths from triangulation
◮ Objects
  Object A is above object B
  Object orientation

SLIDE 4

Superpixel Model

◮ Superpixel as a plane
◮ Model as a 3D mesh of polygons
◮ Use Felzenszwalb and Huttenlocher's segmenter
◮ Goal: Determine location and orientation of each superpixel

SLIDE 5

Superpixel Parameters

◮ α ∈ R^3
  α̂ = α / |α| is the unit normal of the plane
  1 / |α| is the distance from the camera center to the plane
◮ Thus, q^T α = 1 for any point q ∈ R^3 on the plane
◮ R_i ∈ R^3
  Unit length ray pointing from the camera center to pixel i on the image plane (using a "reasonable guess" of the camera's intrinsic parameters).
◮ d_i = 1 / (R_i^T α) is the distance of point i (having ray R_i) from the camera center if it lies on the plane described by α
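
A minimal sketch of this parameterization, assuming camera coordinates and illustrative helper names (not the authors' code):

```python
import numpy as np

# alpha in R^3 defines the plane {q : q^T alpha = 1}.
def plane_normal_and_distance(alpha):
    """Unit normal alpha/|alpha| and distance 1/|alpha| from the camera center to the plane."""
    norm = np.linalg.norm(alpha)
    return alpha / norm, 1.0 / norm

def depth_along_ray(alpha, R_i):
    """d_i = 1 / (R_i^T alpha): distance to where the unit ray R_i meets the plane."""
    return 1.0 / (R_i @ alpha)

alpha = np.array([0.0, 0.0, 0.2])        # the plane z = 5 in camera coordinates
R_i = np.array([0.0, 0.0, 1.0])          # unit ray along the optical axis
print(plane_normal_and_distance(alpha))  # (array([0., 0., 1.]), 5.0)
print(depth_along_ray(alpha, R_i))       # 5.0
```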

SLIDE 6

Features

◮ Monocular features: x_i ∈ R^524
  Filter responses + shape, computed for each superpixel
  Additional contextual information from neighbors, at 3 scales
  Uses features from the largest superpixel neighbor in each bin (i.e. S1C)
◮ Boundary features: ε_ij ∈ {0, 1}^14
  Segmentations based on 7 different properties at 2 scales
  Properties include color, texture, and edges
  For each segmentation k, if superpixels i, j fall on the same segment, ε_ij(k) = 1, otherwise 0 (sketched below)
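
A minimal sketch of how ε_ij could be assembled, assuming each of the 14 over-segmentations is stored as an array mapping superpixel index to segment id (the representation and names are illustrative, not the paper's):

```python
import numpy as np

def boundary_features(i, j, seg_labels):
    """eps_ij(k) = 1 if superpixels i and j fall on the same segment of segmentation k.

    seg_labels: list of 14 arrays; seg_labels[k][i] is the segment id of superpixel i
    under over-segmentation k.
    """
    return np.array([int(labels[i] == labels[j]) for labels in seg_labels])
```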

SLIDE 7

Models

◮ P(y_ij | ε_ij; ψ) - models the confidence that superpixels i, j belong to the same planar surface (0 for boundary/fold, 1 for planar)
◮ P(α | X, v, y, R; θ) - models the depth and orientation parameters of the superpixels, composed of:
  f1(α_i | X_i, v_i, R_i; θ) - plane parameters as a function of single superpixel i's features
  f2(α_i, α_j | y_ij, R_i, R_j) - plane parameters as a function of edge features between superpixels i, j

◮ P(v_i | x_i; φ_r) - models each pixel's ability to predict the parameters of its associated superpixel
SLIDE 8

Occlusion Boundary and Fold Model

◮ A simple edge detector is not sufficient for detecting 3D discontinuities (consider a shadow)
◮ y_ij ∈ [0, 1], where 0 indicates a boundary/fold and 1 indicates a planar surface
◮ y_ij hand labeled in 50 images
◮ P(y_ij | ε_ij; ψ) = 1 / (1 + exp(−ψ^T ε_ij)), learned using logistic regression
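
A minimal sketch of this classifier using scikit-learn's logistic regression; the data below is a random stand-in for the hand-labeled pairs, not the paper's:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
eps_train = rng.integers(0, 2, size=(1000, 14))  # epsilon_ij in {0,1}^14 (placeholder data)
y_train = rng.integers(0, 2, size=1000)          # 0 = boundary/fold, 1 = same planar surface

clf = LogisticRegression()                       # fits psi in P(y|eps) = sigmoid(psi^T eps)
clf.fit(eps_train, y_train)

# The soft y_ij later weights the pairwise terms f2
eps_ij = np.ones((1, 14))                        # the two superpixels agree in every segmentation
y_ij = clf.predict_proba(eps_ij)[0, 1]           # probability of "same planar surface"
```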

SLIDE 9

Unary Depth Model (f1)

◮ Predict depth d̂ as a function of features x
◮ Penalize using the relative error d̂/d − 1, where d̂ = x^T θ_r. Note: 1/d = R_{i,s_i}^T α_i
◮ f1(α_i | X_i, v_i, R_i; θ) = exp( −Σ_{s_i} v_{i,s_i} | (R_{i,s_i}^T α_i)(x_{i,s_i}^T θ_r) − 1 | )
◮ The r in θ_r indicates one of 11 rows in the image
◮ Parameters are learned from the pseudo log-likelihood of P(α | . . .). Since f2(·) does not depend on θ_r, this gives (sketched below):
  θ_r* = argmin_{θ_r} Σ_i Σ_{s_i} v_{i,s_i} | (1/d_{i,s_i})(x_{i,s_i}^T θ_r) − 1 |
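
A minimal sketch of the unary term and the per-row training objective, with illustrative array shapes (x: S×524 pixel features, R: S×3 unit rays, v: S confidences); this is one reading of the formulas above, not the released code:

```python
import numpy as np

def f1_log_potential(alpha_i, x, R, v, theta_r):
    """log f1 = -sum_s v_s * | (R_s^T alpha_i) * (x_s^T theta_r) - 1 |"""
    pred_depth = x @ theta_r          # d_hat_s = x_s^T theta_r
    inv_depth = R @ alpha_i           # 1/d_s = R_s^T alpha_i
    return -np.sum(v * np.abs(inv_depth * pred_depth - 1.0))

def theta_training_loss(theta_r, x, d, v):
    """sum_s v_s * | (x_s^T theta_r) / d_s - 1 |, minimized over theta_r for each image row."""
    return np.sum(v * np.abs((x @ theta_r) / d - 1.0))
```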

SLIDE 10

Depth Prediction Confidence (v)

◮ Given a model d̂ = x_i^T θ_r for predicting depth, build a model to predict the expected error
◮ Thus, learn |d_i − x_i^T θ_r| / d_i = 1 / (1 + exp(−φ_r^T x_i))
◮ This (ideally) can predict how well a feature predicts the depth of a pixel
◮ Presumably, v = 1 − 1 / (1 + exp(−φ_r^T x_i)), indicating confidence in the prediction ability
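
A minimal sketch of that reading of the confidence model (illustrative names, following the "presumably" above):

```python
import numpy as np

def confidence(x_i, phi_r):
    """v = 1 - sigmoid(phi_r^T x_i): high when the expected relative depth error is low."""
    expected_rel_error = 1.0 / (1.0 + np.exp(-(phi_r @ x_i)))  # models |d - x^T theta_r| / d
    return 1.0 - expected_rel_error
```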

SLIDE 11

Superpixel Interaction Models (f2)

◮ f2(α_i, α_j | y_ij, R_i, R_j) = Π_{(s_i,s_j)∈N} h_{s_i,s_j}(α_i, α_j | y_ij, R_i, R_j)
◮ s_i, s_j are pixels from superpixels i, j respectively, chosen according to the figure, depending on the property to be modeled (i.e. connectivity, planarity, linearity)
◮ h(·) also depends on the property

SLIDE 12

Connectivity and Co-planarity

Neighboring superpixels tend to be connected if there is no occlusion

◮ Uses pairs of neighboring pixels (s_i, s_j) chosen along the boundaries of superpixels i, j
◮ h_{s_i,s_j} = exp( −y_ij |d_{i,s_i} − d_{j,s_j}| / √(d_{i,s_i} d_{j,s_j}) )

Neighboring superpixels tend to belong to the same plane if there is no fold

◮ A pair (s''_i, s''_j) is chosen from the centers of superpixels i, j respectively
◮ h_{s''_j} = exp( −y_ij |(R_{j,s''_j}^T α_i − R_{j,s''_j}^T α_j) d̂_{s''_j}| )
◮ Penalizes the distance between s''_j and its projection onto plane i
◮ h_{s''_i,s''_j} = h_{s''_j}(·) h_{s''_i}(·) (both terms are sketched below)
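
A minimal sketch of the two h(·) terms (illustrative helper names; the point pairs s_i, s_j and s''_i, s''_j are assumed to have been chosen as described above):

```python
import numpy as np

def h_connected(alpha_i, alpha_j, y_ij, R_i_si, R_j_sj):
    """Connectivity: penalize depth mismatch at the boundary pixels s_i, s_j."""
    d_i = 1.0 / (R_i_si @ alpha_i)
    d_j = 1.0 / (R_j_sj @ alpha_j)
    return np.exp(-y_ij * np.abs(d_i - d_j) / np.sqrt(d_i * d_j))

def h_coplanar(alpha_i, alpha_j, y_ij, R_j_sj2, d_hat):
    """Co-planarity: penalize the pixel s_j'' lying at different depths on planes i and j."""
    return np.exp(-y_ij * np.abs((R_j_sj2 @ alpha_i - R_j_sj2 @ alpha_j) * d_hat))
```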

SLIDE 13

Co-linearity

Superpixels lying on a straight line are likely to lie on the same plane

◮ Same penalty as the co-planar term, except superpixels i, j aren't adjacent
◮ Also, y_ij is computed from lines in the image instead of from the occlusion boundary/fold model
SLIDE 14

Inference

◮ α* = argmax_α log P(α | X, v, y, R; θ)
     = argmax_α log (1/Z) Π_i f1(α_i | X_i, v_i, R_i; θ) Π_{i,j} f2(α_i, α_j | y_ij, R_i, R_j)
◮ Each term results in an L1 norm of a linear function of α
◮ Solved via a Newton method with a smooth approximation of the L1 norm (a generic version is sketched below)
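
A generic sketch of this kind of inference, assuming the weighted L1 terms have been stacked into rows of a matrix A and a vector b so the objective is Σ_k |A_k α − b_k|; the sqrt(t² + ε) smoothing and L-BFGS solver here are stand-ins for the paper's exact Newton scheme:

```python
import numpy as np
from scipy.optimize import minimize

def map_plane_parameters(A, b, alpha0, eps=1e-6):
    """Minimize sum_k |A[k] @ alpha - b[k]| over the stacked plane parameters alpha."""
    def objective(alpha):
        r = A @ alpha - b
        return np.sum(np.sqrt(r**2 + eps))       # smooth approximation of the L1 norm

    def gradient(alpha):
        r = A @ alpha - b
        return A.T @ (r / np.sqrt(r**2 + eps))

    return minimize(objective, alpha0, jac=gradient, method="L-BFGS-B").x
```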

SLIDE 15

Experiments

◮ Depth maps from a laser scanner, plus the corresponding images (400 training, 134 test)
◮ Images of urban and natural scenes, taken during the daytime
◮ 588 additional test images from the internet (no depth map)
◮ Evaluation: predict depths, then render the 3D model
  % qualitatively correct
  % major planes correctly identified
  Average log depth error: |log10 d − log10 d̂|
  Relative depth error: |d − d̂| / d
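
A minimal sketch of the two quantitative depth metrics (d: ground-truth depths, d_hat: predictions):

```python
import numpy as np

def log10_error(d, d_hat):
    return np.mean(np.abs(np.log10(d) - np.log10(d_hat)))

def relative_error(d, d_hat):
    return np.mean(np.abs(d - d_hat) / d)
```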

SLIDE 16

Performance

SLIDE 17

Results 1

SLIDE 18

Other Tasks

◮ 3D model from multiple images

Adds an extra term (f3) which penalizes depth discrepancies when 3D correspondences exist between images

◮ Incorporating Object Information

Object A is on top of object B
Object A is connected to object B - such as a person's feet on the ground
Object A has a known orientation - such as people standing upright