Make3D: Learning 3D Scene Structure from a Single Still Image

Make3D: Learning 3D Scene Structure from a Single Still Image. Ashutosh Saxena, Min Sun, and Andrew Ng. Ian Endres, CS598, February 5, 2009.


  1. Make3D: Learning 3D Scene Structure from a Single Still Image. Ashutosh Saxena, Min Sun, and Andrew Ng. Ian Endres, CS598, February 5, 2009

  2. Overview
     Goal: Infer 3D models from monocular cues
     ◮ Segment into planar patches
     ◮ Build model from depth maps
     ◮ Estimate orientation/location of patches
     ◮ Construct 3D model

  3. Properties to Model
     ◮ Single image: depth from features, connectedness, coplanarity
     ◮ Multiple images: depths from triangulation
     ◮ Objects: object A is above B, object orientation

  4. Superpixel Model
     ◮ Superpixel as a plane
     ◮ Model as a 3D mesh of polygons
     ◮ Use Felzenszwalb and Huttenlocher’s segmenter
     ◮ Goal: Determine location and orientation of each superpixel

  5. Superpixel Parameters
     ◮ $\alpha \in \mathbb{R}^3$: plane parameters of a superpixel
     ◮ $\hat{\alpha} = \alpha / |\alpha|$: unit normal of the plane
     ◮ $1 / |\alpha|$: distance from camera center to plane
     ◮ Thus, $q^T \alpha = 1$ for any point $q \in \mathbb{R}^3$ on the plane
     ◮ $R_i \in \mathbb{R}^3$: unit-length ray pointing from camera center to pixel $i$ on the image plane (using a “reasonable guess” of the camera’s intrinsic parameters)
     ◮ $d_i = 1 / (R_i^T \alpha)$ is the distance of point $i$ (having ray $R_i$) from the camera center if it lies on the plane described by $\alpha$
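
The plane parameterization on this slide can be checked numerically. The following is a minimal sketch using only the Python standard library; the plane and ray values are hypothetical, chosen so the arithmetic is easy to follow.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def plane_normal_and_distance(alpha):
    """Return (unit normal alpha/|alpha|, camera-to-plane distance 1/|alpha|)."""
    n = math.sqrt(dot(alpha, alpha))
    return [a / n for a in alpha], 1.0 / n

def depth_along_ray(ray, alpha):
    """Distance d = 1 / (R^T alpha) along unit ray R to the plane q^T alpha = 1."""
    return 1.0 / dot(ray, alpha)

# Hypothetical plane: a wall facing the camera, 4 units away (z = 4).
alpha = [0.0, 0.0, 0.25]
normal, dist = plane_normal_and_distance(alpha)

# Hypothetical unit-length ray tilted away from the optical axis.
ray = [0.6, 0.0, 0.8]
d = depth_along_ray(ray, alpha)
point = [d * r for r in ray]  # 3D point where the ray meets the plane
```

The recovered point satisfies $q^T \alpha = 1$, which is the plane constraint stated on the slide.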

  6. Features
     ◮ Monocular features: $x_i \in \mathbb{R}^{524}$
     ◮ Filter responses + shape computed for each superpixel
     ◮ Additional contextual information from neighbors, at 3 scales
       Uses features from the largest superpixel neighbor in each bin (i.e. S1C)
     ◮ Boundary features: $\epsilon_{ij} \in \{0, 1\}^{14}$
     ◮ Segmentations based on 7 different properties at 2 scales
       Properties include color, texture, and edges
     ◮ For each segmentation $k$, if superpixels $i, j$ fall on the same segment, $\epsilon_{ij}(k) = 1$, otherwise 0
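
As an illustration of the boundary features, here is a minimal sketch that builds $\epsilon_{ij}$ from a set of segmentations. It assumes each superpixel is assigned a single segment id per segmentation (a simplification), and the toy data uses 3 segmentations rather than the 14 on the slide; `seg_labels` and `boundary_features` are hypothetical names.

```python
def boundary_features(seg_labels, i, j):
    """eps_ij(k) = 1 if superpixels i and j share a segment in segmentation k, else 0.

    seg_labels[k][s] is the (assumed) segment id of superpixel s in segmentation k.
    """
    return [1 if labels[i] == labels[j] else 0 for labels in seg_labels]

# Hypothetical toy data: 3 segmentations over 4 superpixels.
seg_labels = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
eps_01 = boundary_features(seg_labels, 0, 1)
eps_12 = boundary_features(seg_labels, 1, 2)
```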

  7. Models
     ◮ $P(y_{ij} \mid \epsilon_{ij}; \psi)$ - models the confidence that superpixels $i, j$ belong to the same planar surface (0 for boundary/fold, 1 for planar)
     ◮ $P(\alpha \mid X, v, y, R; \theta)$ - models depth and orientation parameters of superpixels, composed of:
       ◮ $f_1(\alpha_i \mid X_i, v_i, R_i; \theta)$ - plane parameters as a function of the features of a single superpixel $i$
       ◮ $f_2(\alpha_i, \alpha_j \mid y_{ij}, R_i, R_j)$ - plane parameters as a function of edge features between superpixels $i, j$
     ◮ $P(v_i \mid x_i; \phi_r)$ - models each pixel’s ability to predict the parameters of its associated superpixel

  8. Occlusion Boundary and Fold Model
     ◮ A simple edge detector is not sufficient for detecting 3D discontinuities (consider a shadow)
     ◮ $y_{ij} \in [0, 1]$, where 0 indicates boundary/fold and 1 indicates planar surface
     ◮ $y_{ij}$ hand labeled in 50 images
     ◮ $P(y_{ij} \mid \epsilon_{ij}; \psi) = \frac{1}{1 + \exp(-\psi^T \epsilon_{ij})}$, learned using logistic regression
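
To make the logistic model concrete, the sketch below fits $\psi$ on a tiny made-up data set with plain batch gradient ascent. This is not the authors' training code: the data, the leading constant-1 bias feature, the step size, and the function names are all assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_planar(psi, eps):
    """P(y_ij = 1 | eps_ij; psi) = 1 / (1 + exp(-psi^T eps_ij))."""
    return sigmoid(sum(p * e for p, e in zip(psi, eps)))

def fit_logistic(samples, labels, steps=2000, lr=0.5):
    """Batch gradient ascent on the logistic log-likelihood."""
    psi = [0.0] * len(samples[0])
    for _ in range(steps):
        grad = [0.0] * len(psi)
        for eps, y in zip(samples, labels):
            err = y - p_planar(psi, eps)
            for k, e in enumerate(eps):
                grad[k] += err * e
        psi = [p + lr * g / len(samples) for p, g in zip(psi, grad)]
    return psi

# Hypothetical toy data: a constant 1 (bias, an assumption) followed by
# 3 boundary features; label 1 = planar, 0 = boundary/fold.
samples = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1],
           [1, 0, 0, 1], [1, 0, 0, 0], [1, 0, 1, 0]]
labels = [1, 1, 1, 0, 0, 0]
psi = fit_logistic(samples, labels)
```

After fitting, `p_planar(psi, eps)` is above 0.5 for the pairs labeled planar and below 0.5 for the others in this toy set.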

  9. Unary Depth Model ($f_1$)
     ◮ Predict depth $\hat{d}$ as a function of features $x$
     ◮ Penalize using relative error $\hat{d}/d - 1$, where $\hat{d} = x^T \theta_r$
       Note: $1/d = R_{i,s_i}^T \alpha_i$
       $f_1(\alpha_i \mid X_i, v_i, R_i; \theta) = \exp\left( -\sum_{s_i} v_{i,s_i} \left| R_{i,s_i}^T \alpha_i \, (x_{i,s_i}^T \theta_r) - 1 \right| \right)$
     ◮ The $r$ in $\theta_r$ indicates one of 11 rows in the image
     ◮ Parameters learned from pseudo log-likelihood of $P(\alpha \mid \ldots)$
       Since $f_2(\cdot)$ does not depend on $\theta_r$, this gives:
       $\theta_r^* = \arg\min_{\theta_r} \sum_i \sum_{s_i} v_{i,s_i} \left| \frac{1}{d_{i,s_i}} (x_{i,s_i}^T \theta_r) - 1 \right|$
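
A minimal sketch of evaluating $\log f_1$ for one superpixel follows. The rays, features, weights, and $\theta_r$ are hypothetical values chosen so that the predicted depths are 4 and 5; the function name `log_f1` is an assumption.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def log_f1(alpha, rays, feats, v, theta):
    """log f1 = -sum_s v_s * | (R_s^T alpha) * (x_s^T theta) - 1 |."""
    return -sum(
        v_s * abs(dot(r_s, alpha) * dot(x_s, theta) - 1.0)
        for r_s, x_s, v_s in zip(rays, feats, v)
    )

# Two hypothetical pixels in one superpixel.
rays = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.8]]   # unit-length rays
feats = [[1.0, 0.0], [1.0, 1.0]]            # toy 2-D features (the slides use 524-D)
theta = [4.0, 1.0]                          # predicted depths x^T theta: 4 and 5
v = [1.0, 0.5]                              # per-pixel confidences

consistent = log_f1([0.0, 0.0, 0.25], rays, feats, v, theta)   # plane z = 4
inconsistent = log_f1([0.0, 0.0, 0.2], rays, feats, v, theta)  # plane z = 5
```

The plane at z = 4 agrees with both predicted depths, so its log-potential is 0; the plane at z = 5 incurs a 20% relative error at each pixel and scores lower.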

  10. Depth Prediction Confidence ($v$)
      ◮ Given a model $\hat{d} = x_i^T \theta_r$ for predicting depth, build a model to predict the expected error
      ◮ Thus learn $\frac{|d_i - x_i^T \theta_r|}{d_i} = \frac{1}{1 + \exp(-\phi_r^T x_i)}$
      ◮ This (ideally) can predict how well a feature predicts the depth of a pixel
      ◮ Presumably, $v = 1 - \frac{1}{1 + \exp(-\phi_r^T x_i)}$, indicating confidence of prediction ability
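
The two quantities on this slide can be sketched as follows: the relative-error target that the confidence model is fit to, and the confidence $v$ under the slide's presumed form. The values of `phi` and `x` are hypothetical.

```python
import math

def relative_error(d_true, d_pred):
    """Regression target |d - x^T theta| / d for the confidence model."""
    return abs(d_true - d_pred) / d_true

def confidence(phi, x):
    """v = 1 - 1 / (1 + exp(-phi^T x)), following the slide's presumed form."""
    z = sum(p * xi for p, xi in zip(phi, x))
    return 1.0 - 1.0 / (1.0 + math.exp(-z))

target = relative_error(4.0, 5.0)             # 25% relative error
v_neutral = confidence([0.0, 0.0], [1.0, 2.0])
v_low = confidence([2.0, 1.0], [1.0, 2.0])    # large predicted error, low confidence
```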

  11. Superpixel Interaction Models ($f_2$)
      ◮ $f_2(\alpha_i, \alpha_j \mid y_{ij}, R_i, R_j) = \prod_{\{s_i, s_j\} \in N} h_{s_i, s_j}(\alpha_i, \alpha_j \mid y_{ij}, R_i, R_j)$
      ◮ $s_i, s_j$ are pixels from superpixels $i, j$ respectively, chosen according to the figure depending on the property to be modeled (i.e. connectivity, planarity, linearity)
      ◮ $h(\cdot)$ also depends on the property

  12. Connectivity and Co-planarity
      Neighboring superpixels tend to be connected if there is no occlusion
      ◮ Uses pairs of neighboring pixels $(s_i, s_j)$ chosen along the boundaries of superpixels $i, j$
      ◮ $h_{s_i, s_j} = \exp\left( -y_{ij} \frac{|d_{i,s_i} - d_{j,s_j}|}{\sqrt{d_{i,s_i} \, d_{j,s_j}}} \right)$
      Neighboring superpixels tend to belong to the same plane if there is no fold
      ◮ A pair $(s''_i, s''_j)$ is chosen from the centers of each superpixel $i, j$ respectively
      ◮ $h_{s''_j} = \exp\left( -y_{ij} \left| (R_{j,s''_j}^T \alpha_i - R_{j,s''_j}^T \alpha_j) \, \hat{d} \right| \right)$
      ◮ Penalizes the distance between $s''_j$ and its projection onto plane $i$
      ◮ $h_{s''_i, s''_j} = h_{s''_i}(\cdot) \, h_{s''_j}(\cdot)$
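
Both pairwise terms can be written down directly from the formulas above. The sketch below assumes that $\hat{d}$ is an available depth estimate at each superpixel's center pixel (the slide does not define it) and uses hypothetical planes and rays.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def connectivity_term(alpha_i, alpha_j, ray_i, ray_j, y_ij):
    """exp(-y_ij * |d_i - d_j| / sqrt(d_i * d_j)) for a pair of boundary pixels."""
    d_i = 1.0 / dot(ray_i, alpha_i)
    d_j = 1.0 / dot(ray_j, alpha_j)
    return math.exp(-y_ij * abs(d_i - d_j) / math.sqrt(d_i * d_j))

def coplanarity_term(alpha_i, alpha_j, ray_i, ray_j, y_ij, d_hat_i, d_hat_j):
    """Product of the two half-terms evaluated at the superpixel centers."""
    def half(ray, d_hat):
        # d_hat: assumed depth estimate at the center pixel.
        return math.exp(-y_ij * abs((dot(ray, alpha_i) - dot(ray, alpha_j)) * d_hat))
    return half(ray_i, d_hat_i) * half(ray_j, d_hat_j)

# Hypothetical planes z = 4 and z = 5, viewed along the optical axis.
plane_a, plane_b = [0.0, 0.0, 0.25], [0.0, 0.0, 0.2]
ray = [0.0, 0.0, 1.0]

same = connectivity_term(plane_a, plane_a, ray, ray, 1.0)
gap = connectivity_term(plane_a, plane_b, ray, ray, 1.0)
occluded = connectivity_term(plane_a, plane_b, ray, ray, 0.0)
fold = coplanarity_term(plane_a, plane_b, ray, ray, 1.0, 4.0, 5.0)
```

With $y_{ij} = 0$ (an occlusion boundary or fold) the penalty vanishes, so the two planes are free to differ.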

  13. Co-linearity
      Superpixels lying on a straight line are likely to lie on the same plane
      ◮ Same penalty as the co-planarity term, except superpixels $i, j$ aren’t adjacent
      ◮ Also, $y_{ij}$ is computed from lines in the image instead of the occlusion/fold model

  14. Inference
      ◮ $\alpha^* = \arg\max_\alpha \log P(\alpha \mid X, v, y, R; \theta_r) = \arg\max_\alpha \log \frac{1}{Z} \prod_i f_1(\alpha_i \mid X_i, v_i, R_i; \theta) \prod_{i,j} f_2(\alpha_i, \alpha_j \mid y_{ij}, R_i, R_j)$
      ◮ Each term results in an L1 norm of a linear function of $\alpha$
      ◮ Solved via a Newton method with a smooth approximation of the L1 norm
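
The idea of a Newton method on a smoothed L1 objective can be shown on a one-dimensional toy problem. This is only a sketch, not the solver from the paper: it replaces each $|r|$ with $\sqrt{r^2 + \mu^2}$ (one common smoothing, assumed here), adds a backtracking line search for stability, and optimizes a single scalar instead of the full vector of plane parameters.

```python
import math

def smooth_l1_newton(a, b, w, x0=0.0, mu=1e-3, iters=100, tol=1e-10):
    """Minimize sum_k w_k * sqrt((a_k * x - b_k)^2 + mu^2) with damped Newton steps."""
    def value(x):
        return sum(wk * math.sqrt((ak * x - bk) ** 2 + mu ** 2)
                   for ak, bk, wk in zip(a, b, w))

    x = x0
    for _ in range(iters):
        grad = hess = 0.0
        for ak, bk, wk in zip(a, b, w):
            r = ak * x - bk
            s = math.sqrt(r * r + mu * mu)
            grad += wk * ak * r / s                  # first derivative
            hess += wk * ak * ak * mu * mu / s ** 3  # second derivative (always > 0)
        if abs(grad) < tol:
            break
        step = -grad / hess
        # Backtracking (Armijo) line search: raw Newton steps can overshoot
        # badly where the smoothed objective is nearly linear.
        t, fx = 1.0, value(x)
        while t > 1e-20 and value(x + t * step) > fx + 1e-4 * t * step * grad:
            t *= 0.5
        x += t * step
    return x

# Toy problem: minimize |x - 1| + |x - 2| + |x - 10|; the exact L1 minimizer
# is the median, 2.
x_star = smooth_l1_newton([1.0, 1.0, 1.0], [1.0, 2.0, 10.0], [1.0, 1.0, 1.0])
```

The smoothing parameter `mu` trades accuracy for conditioning: smaller values track the true L1 objective more closely but make the second derivative spikier near each kink.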

  15. Experiments
      ◮ Depth maps from laser scanner, plus corresponding image (400 training, 134 test)
      ◮ Images of urban and natural scenes, taken in daytime
      ◮ 588 additional test images from the internet (no depth map)
      ◮ Evaluation: Predict depths, then render 3D model
      ◮ % qualitatively correct
      ◮ % major planes correctly identified
      ◮ Average depth error in $\log_{10}$: $|\log d - \log \hat{d}|$
      ◮ Relative depth error: $\frac{|d - \hat{d}|}{d}$
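
The two quantitative metrics can be computed as below. This is a minimal sketch that assumes both errors are averaged over all pixels with ground-truth depth; the depth values are made up.

```python
import math

def mean_log10_error(d_true, d_pred):
    """Mean of |log10(d) - log10(d_hat)| over all pixels."""
    return sum(abs(math.log10(t) - math.log10(p))
               for t, p in zip(d_true, d_pred)) / len(d_true)

def mean_relative_error(d_true, d_pred):
    """Mean of |d - d_hat| / d over all pixels."""
    return sum(abs(t - p) / t for t, p in zip(d_true, d_pred)) / len(d_true)

# Hypothetical depths: the first pixel is over-estimated by a factor of 10.
d_true = [10.0, 100.0]
d_pred = [100.0, 100.0]
log_err = mean_log10_error(d_true, d_pred)
rel_err = mean_relative_error(d_true, d_pred)
```

The log-scale error treats over- and under-estimation by the same factor symmetrically, while the relative error does not.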

  16. Performance

  17. Results 1

  18. Other Tasks
      ◮ 3D model from multiple images
        Adds an extra term ($f_3$) which penalizes depth discrepancies when 3D correspondences exist between images
      ◮ Incorporating object information
        Object A is on top of object B
        Object A is connected to object B, such as a person’s feet on the ground
        Object A has a known orientation, such as people standing upright
