Inferring 3D Cues from a Single Image
Wei-Cheng Su
¤ Humans can estimate 3D information from a single image easily. But how about computers?
¤ Possible cues: defocus, texture, shading, perspective, …
¤ Inferring Spatial Layout from a Single Image via Depth-Ordered Grouping
¤ Depth Estimation using Monocular and Stereo Cues
¤ Comparison
[Yu, Zhang, and Malik, Workshop on Perceptual Organization in Computer Vision 2008]
¤ Infer 3D spatial layout from a single 2D image
¤ Based on grouping
¤ Focus on indoor scenes
Edges → Lines → Line groups → Quadrilaterals → Depth-ordered planes
¤ The most time-consuming operation
¤ Canny edge detection
¤ ~5 seconds for a 400×400 image on a 2 GHz CPU
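Canny itself chains smoothing, gradient computation, non-maximum suppression, and hysteresis thresholding. A minimal sketch of the first stage (Sobel gradient magnitude plus a single threshold; the function name and threshold value are illustrative, not from the paper) looks like:

```python
import numpy as np

def sobel_edges(img, thresh=0.5):
    """Simplified edge detector: Sobel gradient magnitude + threshold.
    (A stand-in for the first stages of Canny; NMS and hysteresis omitted.)"""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    H, W = img.shape
    gx = np.zeros((H, W))
    gy = np.zeros((H, W))
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = np.sum(kx * patch)
            gy[i, j] = np.sum(ky * patch)
    mag = np.hypot(gx, gy)
    return mag > thresh * mag.max()

# Toy image: a bright square on a dark background.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
edges = sobel_edges(img)   # True only along the square's boundary
```

The nested Python loops are for clarity only; a real implementation would use a vectorized convolution.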
¤ Link edge pixels into straight line segments
¤ Short lines are discarded
¤ Estimate vanishing points (one for each of the three principal orthogonal directions)
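Two image lines that are parallel in 3D meet at a vanishing point; in homogeneous coordinates both the line through two points and the intersection of two lines are cross products. This is the standard textbook construction, not code from the paper:

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points."""
    return np.cross([*p, 1.0], [*q, 1.0])

def intersect(l1, l2):
    """Intersection of two homogeneous lines; for the images of
    3D-parallel lines, this meeting point is a vanishing point."""
    p = np.cross(l1, l2)
    return p[:2] / p[2]

# Two image lines converging at (5, 5):
l1 = line_through((0.0, 0.0), (1.0, 1.0))    # y = x
l2 = line_through((0.0, 10.0), (1.0, 9.0))   # y = 10 - x
vp = intersect(l1, l2)                        # -> (5, 5)
```

In practice a vanishing point is estimated robustly from many noisy line segments (e.g. by voting or least squares), not from a single pair.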
¤ A⊥ & A∥: measure how likely two lines belong to the same group
¤ R⊥: measures how likely two lines belong to different groups
¤ Pairwise attraction and repulsion in a graph cuts framework
¤ Quadrilaterals are determined by adjacent lines and their intersections
¤ Coplanarity: based on the degree of overlap, A⃞
¤ Rectify the quadrilaterals before measuring the overlap
¤ Relative Depth
¤ The relative depth between two quadrilaterals defines a directional repulsion Rd between them
¤ Pairwise attraction and directional repulsion in a graph cuts framework
⁄ Attraction: A⃞
⁄ Repulsion: Rd
Edges → Lines → Line groups → Quadrilaterals → Depth-ordered planes
¤ Inferring Spatial Layout from a Single Image via Depth-Ordered Grouping
¤ Depth Estimation using Monocular and Stereo Cues
¤ Comparison
¤ Shortcomings of stereo vision
⁄ Fails for textureless regions
⁄ Inaccurate when the distance is large
¤ Monocular cues
⁄ Texture variations and gradients
⁄ Defocus
⁄ Haze
¤ Stereo and monocular cues are complementary
⁄ Stereo: image differences
⁄ Monocular: image content, prior knowledge about the scene
¤ 3-D scanner to collect training data
⁄ Stereo pairs
⁄ Ground-truth depthmaps
¤ Estimate the posterior distribution of the depths given the image features
⁄ P(depths | monocular features, stereo disparities)
¤ Monocular Cues
¤ Stereo Cues
¤ 17 filters are used: 9 Laws' masks, 6 oriented edge filters, and 2 color filters, capturing
⁄ Texture variation
⁄ Texture gradients
⁄ Color
¤ An image is divided into rectangular patches, and a single depth value is estimated for each patch
[Saxena, Schulte, and Ng, IJCAI 2007]
¤ Absolute features
⁄ Sum-squared energy of each filter's outputs over each patch
⁄ To capture global information, 4 neighboring patches are included, at 3 scales
⁄ Feature vector: (1+4)*3*17 = 255 dimensions
¤ Relative features
⁄ 10-bin histogram formed by the filter outputs of the pixels in each patch
¤ Use the sum-of-absolute-differences correlation as the matching metric
¤ Find the disparity
¤ Calculate the depth from the disparity
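A minimal SAD matcher slides a window along the scanline and keeps the shift with the smallest sum of absolute differences; depth then follows from depth = f·B/disparity. The focal length and baseline values below are hypothetical, and real systems add subpixel refinement and left-right consistency checks:

```python
import numpy as np

def sad_disparity(left, right, row, col, win=2, max_disp=8):
    """Best disparity for one pixel by sum-of-absolute-differences matching."""
    ref = left[row - win:row + win + 1, col - win:col + win + 1]
    best, best_d = np.inf, 0
    for d in range(0, max_disp + 1):
        if col - win - d < 0:
            break
        cand = right[row - win:row + win + 1, col - win - d:col + win + 1 - d]
        sad = np.abs(ref - cand).sum()
        if sad < best:
            best, best_d = sad, d
    return best_d

# Synthetic pair: every feature appears 3 px further left in the right image.
rng = np.random.default_rng(0)
left = rng.random((30, 40))
right = np.zeros_like(left)
right[:, :-3] = left[:, 3:]

d = sad_disparity(left, right, row=15, col=20)   # -> 3
f, B = 500.0, 0.1        # hypothetical focal length (px) and baseline (m)
depth = f * B / d        # depth is inversely proportional to disparity
```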
¤ Markov Random Field model
¤ P(d|X), X: monocular features of the patch and stereo disparities
¤ θr: learned by maximizing p(d|X; θr) over the training data
¤ Model σ²_2rs as a linear function of the feature differences |y_ijs| between patches i and j at scale s
⁄ σ²_2rs = u_rs^T |y_ijs|
¤ Model σ²_1r as a linear function of x_i
⁄ σ²_1r = v_r^T x_i
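As a schematic of how these feature-dependent variances enter the model, the Gaussian-MRF energy (negative log-posterior) combines a data term per patch and a smoothness term per neighbor pair. This sketch uses a single scale and a single row class with made-up parameter values; it is not the authors' exact objective:

```python
import numpy as np

def neg_log_posterior(d, X, theta, v, u, neighbors):
    """Schematic Gaussian-MRF energy:
    data term     (d_i - theta^T x_i)^2 / (2*sigma1_i^2),  sigma1_i^2 = v^T x_i
    smooth term   (d_i - d_j)^2 / (2*sigma2_ij^2),         sigma2_ij^2 = u^T |x_i - x_j|
    (v and u are assumed to keep the variances positive)."""
    E = 0.0
    for i, x in enumerate(X):
        s1 = float(v @ x)
        E += (d[i] - theta @ x) ** 2 / (2 * s1)
    for i, j in neighbors:
        y = np.abs(X[i] - X[j])
        s2 = float(u @ y) + 1e-6     # guard against division by zero
        E += (d[i] - d[j]) ** 2 / (2 * s2)
    return E

# Tiny worked example: the data term vanishes, leaving only the smoothness term.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
E = neg_log_posterior(d=np.array([2.0, 3.0]), X=X,
                      theta=np.array([2.0, 3.0]),
                      v=np.array([1.0, 1.0]),
                      u=np.array([1.0, 1.0]),
                      neighbors=[(0, 1)])       # ~ (2-3)^2 / (2*2) = 0.25
```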
¤ The histogram of (di – dj) is close to a Laplacian distribution
¤ Laplacian is more robust to outliers
¤ Gaussian is not able to give depthmaps with sharp discontinuities
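The robustness claim has a one-line illustration: the Gaussian log-likelihood penalizes squared differences, whose minimizer is the mean, while the Laplacian penalizes absolute differences, whose minimizer is the median. A single outlier (e.g. a depth discontinuity at an object boundary) drags the mean but not the median:

```python
import numpy as np

# Four pixels on a wall at 1 m plus one outlier at 100 m.
depths = np.array([1.0, 1.0, 1.0, 1.0, 100.0])

print(depths.mean())      # 20.8 : the L2 (Gaussian) estimate chases the outlier
print(np.median(depths))  # 1.0  : the L1 (Laplacian) estimate stays on the wall
```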
¤ Laser scanner on a panning motor
¤ Stereo cameras
¤ 257 stereo pairs + depthmaps are collected
¤ Scenes
¤ Baseline
¤ Stereo
¤ Stereo (smooth, Lap)
¤ Mono (Gaussian)
¤ Mono (Lap)
¤ Stereo+Mono (Lap)
(Qualitative results: image, ground truth, stereo, mono, and stereo+mono depthmaps)
(Qualitative results: image, ground truth, stereo, mono, and stereo+mono depthmaps)
[http://ai.stanford.edu/~asaxena/learningdepth/others.html]
¤ Inferring Spatial Layout from a Single Image via Depth-Ordered Grouping
¤ Depth Estimation using Monocular and Stereo Cues
¤ Comparison
¤ Inferring Spatial Layout from a Single Image via Depth-Ordered Grouping
⁄ Geometrical
⁄ Learning is not required
⁄ Can be used only for indoor scenes
⁄ Estimates the relative depth between planes
⁄ Objects should be rectangular or quadrilaterals
¤ Depth Estimation using Monocular and Stereo Cues
⁄ Statistical
⁄ Learning is required
⁄ May not generalize well to images very different from the training samples
⁄ Can be used for both indoor and unstructured outdoor environments
⁄ Estimates the absolute depth