Structure from Motion
Computer Vision Jia-Bin Huang, Virginia Tech
Many slides from S. Seitz, N Snavely, and D. Hoiem
Administrative stuffs
HW 3 due 11:55 PM, Oct 17 (Wed)
Submit your alignment results!
HW 2 will be out this
enforce rank 2 using SVD
image
(the intersection of the cameras' baseline with the image plane)
image to a line in the other
by SVD
Assume we have matched points x, x′ with outliers
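$\tilde{\mathbf{x}} = T\mathbf{x}$, $\tilde{\mathbf{x}}' = T'\mathbf{x}'$
$\mathbf{F} = T'^T \tilde{\mathbf{F}} T$
$\det \tilde{\mathbf{F}} = 0$
$\mathbf{x}'^T \mathbf{F} \mathbf{x} = 0$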
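The normalization, rank-2 enforcement, and de-normalization steps above can be sketched in NumPy as follows. This is a minimal illustration, not the course's reference solution: the function names (`normalize_points`, `eight_point`, `project`) and the synthetic two-camera setup are assumptions made for the example, and the normalization (zero centroid, mean distance of sqrt(2)) is one common choice for T.

```python
import numpy as np

def normalize_points(pts):
    """Map N x 2 points to homogeneous points with zero centroid and
    mean distance sqrt(2); returns the points and the 3x3 transform T."""
    centroid = pts.mean(axis=0)
    mean_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    s = np.sqrt(2) / mean_dist
    T = np.array([[s, 0, -s * centroid[0]],
                  [0, s, -s * centroid[1]],
                  [0, 0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return (T @ pts_h.T).T, T

def eight_point(x1, x2):
    """Estimate F with x2^T F x1 = 0 from N >= 8 matches (N x 2 arrays)."""
    x1n, T1 = normalize_points(x1)
    x2n, T2 = normalize_points(x2)
    # One row of the homogeneous system A f = 0 per match.
    A = np.column_stack([
        x2n[:, 0] * x1n[:, 0], x2n[:, 0] * x1n[:, 1], x2n[:, 0],
        x2n[:, 1] * x1n[:, 0], x2n[:, 1] * x1n[:, 1], x2n[:, 1],
        x1n[:, 0], x1n[:, 1], np.ones(len(x1n))])
    _, _, Vt = np.linalg.svd(A)
    F_tilde = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F_tilde)
    S[2] = 0.0
    F_tilde = U @ np.diag(S) @ Vt
    # Undo the normalization: F = T'^T F_tilde T.
    F = T2.T @ F_tilde @ T1
    return F / np.linalg.norm(F)

def project(P, X):
    x = (P @ X.T).T
    return x[:, :2] / x[:, 2:3]

# Synthetic, noise-free test scene (all values are made up for illustration).
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(-1, 1, (20, 2)),
                     rng.uniform(4, 6, 20), np.ones(20)])
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
t = np.array([[-0.5], [0.05], [0.1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
x1, x2 = project(P1, X), project(P2, X)
F = eight_point(x1, x2)
```

With outliers in the matches, this estimator would typically be wrapped in a robust loop such as RANSAC; that part is not shown here.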
http://www.3dcadbrowser.com/download.aspx?3dmodel=40454
Guidi et al. High-accuracy 3D modeling of cultural heritage, 2004
https://www.youtube.com/watch?v=1HhOmF22oYA
https://www.youtube.com/watch?v=bK6vCPcFkfk
Images → Points: Structure from Motion
Points → More points: Multiple View Stereo
Points → Meshes: Model Fitting
Meshes → Models: Texture Mapping
Images → Models: Image-based Modeling
Slide credit: J. Xiao
Example: https://photosynth.net/
C′ → x′ will not exactly intersect
A least squares solution to a system of equations
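$$\mathbf{x} = w\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}, \quad \mathbf{x}' = w'\begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix}, \quad \mathbf{P} = \begin{bmatrix} \mathbf{p}_1^T \\ \mathbf{p}_2^T \\ \mathbf{p}_3^T \end{bmatrix}, \quad \mathbf{P}' = \begin{bmatrix} \mathbf{p}_1'^T \\ \mathbf{p}_2'^T \\ \mathbf{p}_3'^T \end{bmatrix}$$
$$\mathbf{x} = \mathbf{P}\mathbf{X}, \quad \mathbf{x}' = \mathbf{P}'\mathbf{X}, \quad \mathbf{A}\mathbf{X} = \mathbf{0}$$
$$\mathbf{A} = \begin{bmatrix} u\,\mathbf{p}_3^T - \mathbf{p}_1^T \\ v\,\mathbf{p}_3^T - \mathbf{p}_2^T \\ u'\,\mathbf{p}_3'^T - \mathbf{p}_1'^T \\ v'\,\mathbf{p}_3'^T - \mathbf{p}_2'^T \end{bmatrix}$$
Further reading: HZ p. 312-313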
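$$\mathbf{x} = w\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \mathbf{P}\mathbf{X} = \begin{bmatrix} \mathbf{p}_1^T \\ \mathbf{p}_2^T \\ \mathbf{p}_3^T \end{bmatrix}\mathbf{X} = \begin{bmatrix} \mathbf{p}_1^T\mathbf{X} \\ \mathbf{p}_2^T\mathbf{X} \\ \mathbf{p}_3^T\mathbf{X} \end{bmatrix}$$
$$w\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} u\,\mathbf{p}_3^T\mathbf{X} \\ v\,\mathbf{p}_3^T\mathbf{X} \\ \mathbf{p}_3^T\mathbf{X} \end{bmatrix} = \begin{bmatrix} \mathbf{p}_1^T\mathbf{X} \\ \mathbf{p}_2^T\mathbf{X} \\ \mathbf{p}_3^T\mathbf{X} \end{bmatrix}$$
$$u\,\mathbf{p}_3^T\mathbf{X} - \mathbf{p}_1^T\mathbf{X} = (u\,\mathbf{p}_3^T - \mathbf{p}_1^T)\,\mathbf{X} = 0$$
$$v\,\mathbf{p}_3^T\mathbf{X} - \mathbf{p}_2^T\mathbf{X} = (v\,\mathbf{p}_3^T - \mathbf{p}_2^T)\,\mathbf{X} = 0$$
$$u'\,\mathbf{p}_3'^T\mathbf{X} - \mathbf{p}_1'^T\mathbf{X} = (u'\,\mathbf{p}_3'^T - \mathbf{p}_1'^T)\,\mathbf{X} = 0$$
$$v'\,\mathbf{p}_3'^T\mathbf{X} - \mathbf{p}_2'^T\mathbf{X} = (v'\,\mathbf{p}_3'^T - \mathbf{p}_2'^T)\,\mathbf{X} = 0$$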
$\mathbf{x} = \mathbf{P}\mathbf{X}$, $\mathbf{x}' = \mathbf{P}'\mathbf{X}$
matrices
4. X = V(:, end)
Pros and Cons
corresponding images
Code: http://www.robots.ox.ac.uk/~vgg/hzbook/code/vgg_multiview/vgg_X_from_xP_lin.m
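As a rough Python counterpart to the linear triangulation described above (build A from the two views, take the SVD, keep the last right singular vector), here is a small sketch. The function name `triangulate_linear` and the toy cameras are assumptions for illustration; the preconditioning of points and projection matrices is omitted for brevity.

```python
import numpy as np

def triangulate_linear(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.
    P1, P2: 3x4 projection matrices; x1, x2: (u, v) image points."""
    u, v = x1
    up, vp = x2
    A = np.vstack([u * P1[2] - P1[0],
                   v * P1[2] - P1[1],
                   up * P2[2] - P2[0],
                   vp * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                 # same role as X = V(:, end) in MATLAB
    return X / X[3]            # scale so the last coordinate is 1

# Toy example: two identity-intrinsics cameras one unit apart along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0, 1.0])
x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
X_est = triangulate_linear(P1, P2, x1, x2)
```

More views can be handled by stacking two additional rows of A per image.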
Figure source: Robertson and Cipolla (Chpt 13 of Practical Image Processing and Computer Vision)
$$\mathrm{cost}(\mathbf{X}) = \mathrm{dist}(\mathbf{x}, \hat{\mathbf{x}})^2 + \mathrm{dist}(\mathbf{x}', \hat{\mathbf{x}}')^2$$
$$\hat{\mathbf{x}}'^T \mathbf{F}\, \hat{\mathbf{x}} = 0$$
Further reading: HZ p. 318
xij = Pi Xj, i = 1, …, m, j = 1, …, n
points Xj from the mn corresponding 2D points xij
[Figure: point Xj seen by cameras P1, P2, P3 at image points x1j, x2j, x3j]
Slides from Lana Lazebnik
from the known mn corresponding points xij
be recovered up to a 4x4 projective transformation Q:
2mn >= 11m + 3n - 15
(11m: DoF in Pi; 3n: DoF in Xj; 15: up to a 4x4 projective tform Q)
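To make the counting argument concrete, the inequality can be evaluated for small numbers of views. The helper names below are hypothetical and only evaluate the inequality stated above; they say nothing about degenerate configurations.

```python
def enough_measurements(m, n):
    """True if 2mn measurements cover 11m + 3n - 15 unknowns."""
    return 2 * m * n >= 11 * m + 3 * n - 15

def min_points(m):
    """Smallest n satisfying the inequality for m >= 2 views."""
    n = 1
    while not enough_measurements(m, n):
        n += 1
    return n

two_view = min_points(2)    # 4n >= 3n + 7
three_view = min_points(3)  # 6n >= 3n + 18
```

For two views this gives 7 points, and for three views 6 points.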
two images using fundamental matrix
camera using all the known 3D points that are visible in its image - calibration/resectioning
compute new 3D points, re-optimize existing points that are also seen by this camera - triangulation
adjustment
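$$E(\mathbf{P}, \mathbf{X}) = \sum_{i=1}^{m} \sum_{j=1}^{n} D(\mathbf{x}_{ij}, \mathbf{P}_i \mathbf{X}_j)^2$$
[Figure: point Xj, cameras P1, P2, P3, observed points x1j, x2j, x3j and reprojections P1Xj, P2Xj, P3Xj]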
The LevenbergโMarquardt algorithm
The Ceres-Solver from Google
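The quantity that such a solver minimizes is the total squared reprojection error. The sketch below only evaluates that objective in NumPy; it does not implement Levenberg-Marquardt or call Ceres. The function name, array layout, and toy data are assumptions for illustration, and every point is assumed visible in every camera.

```python
import numpy as np

def reprojection_error(Ps, Xs, xs):
    """Sum of squared image distances between observations and projections.
    Ps: (m, 3, 4) cameras, Xs: (n, 4) homogeneous points, xs: (m, n, 2)."""
    proj = np.einsum("mij,nj->mni", Ps, Xs)       # (m, n, 3)
    pred = proj[..., :2] / proj[..., 2:3]          # perspective divide
    return float(np.sum((xs - pred) ** 2))

# Toy data: two cameras, three points, hand-computed exact observations.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
Ps = np.stack([P1, P2])
Xs = np.array([[0.0, 0.0, 4.0, 1.0],
               [1.0, 0.0, 5.0, 1.0],
               [0.5, 0.2, 4.0, 1.0]])
xs = np.array([[[0.0, 0.0], [0.2, 0.0], [0.125, 0.05]],
               [[-0.25, 0.0], [0.0, 0.0], [-0.125, 0.05]]])
E_exact = reprojection_error(Ps, Xs, xs)

xs_noisy = xs.copy()
xs_noisy[1, 2] += np.array([3.0, 4.0])             # one observation off by (3, 4)
E_noisy = reprojection_error(Ps, Xs, xs_noisy)     # 3^2 + 4^2 = 25
```

A bundle adjuster would treat all entries of `Ps` and `Xs` as unknowns and minimize this value with a nonlinear least-squares method.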
parameters directly from uncalibrated images
moving camera has a fixed intrinsic matrix
projective transformation matrix Q such that all camera matrices are in the form Pi = K [Ri | ti]
matrix, such as zero skew
pixel
adding points
similarity transform
subgraphs
Reconstruction of Cornell (Crandall et al. ECCV 2011) (best method with software available; also has good overview of recent methods)
Building Rome in a Day: Agarwal et al. 2009
Structure from motion under orthographic projection
3D Reconstruction of a Rotating Ping-Pong Ball
A factorization method. IJCV, 9(2):137-154, November 1992.
[Figure: 3D point X, image point x, and image axes a1, a2]
homogeneous coordinates
1. We are given corresponding 2D points (x) in several frames
2. We want to estimate the 3D points (X) and the affine parameters of each camera (A)
$$\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix} = \mathbf{A}\mathbf{X} + \mathbf{t}$$
Projection of world origin
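$$\hat{\mathbf{x}}_{ij} = \mathbf{x}_{ij} - \frac{1}{n}\sum_{k=1}^{n} \mathbf{x}_{ik}$$
$$\mathbf{x}_{ij} = \mathbf{A}_i \mathbf{X}_j + \mathbf{t}_i$$
$$\hat{\mathbf{x}}_{ij} = \mathbf{A}_i \mathbf{X}_j + \mathbf{t}_i - \frac{1}{n}\sum_{k=1}^{n} \left(\mathbf{A}_i \mathbf{X}_k + \mathbf{t}_i\right) = \mathbf{A}_i \left(\mathbf{X}_j - \frac{1}{n}\sum_{k=1}^{n} \mathbf{X}_k\right) = \mathbf{A}_i \hat{\mathbf{X}}_j$$
$$\hat{\mathbf{x}}_{ij} = \mathbf{A}_i \hat{\mathbf{X}}_j$$
Here $\hat{\mathbf{x}}_{ij}$ is the 2D normalized point (observed), $\hat{\mathbf{X}}_j$ is the 3D normalized point, and $\mathbf{A}_i$ is the linear (affine) mapping.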
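A quick numerical check of the centering step: after subtracting the per-image centroid, the translation drops out and the centered observations equal the affine map applied to the centered 3D points. The random cameras and points below are made up for the check.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 10
A = rng.normal(size=(m, 2, 3))     # affine camera matrices A_i
t = rng.normal(size=(m, 2))        # translations t_i
X = rng.normal(size=(n, 3))        # 3D points X_j

# Observations x_ij = A_i X_j + t_i, stored as an (m, n, 2) array.
x = np.einsum("mij,nj->mni", A, X) + t[:, None, :]

# Center the 2D points per image and the 3D points over the whole set.
x_hat = x - x.mean(axis=1, keepdims=True)
X_hat = X - X.mean(axis=0)

# The translation should have vanished: x_hat_ij = A_i X_hat_j.
max_diff = np.abs(x_hat - np.einsum("mij,nj->mni", A, X_hat)).max()
```

This only holds when every point is visible in every image, since the centroid is taken over all n points.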
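$$\mathbf{D} = \begin{bmatrix} \hat{\mathbf{x}}_{11} & \hat{\mathbf{x}}_{12} & \cdots & \hat{\mathbf{x}}_{1n} \\ \hat{\mathbf{x}}_{21} & \hat{\mathbf{x}}_{22} & \cdots & \hat{\mathbf{x}}_{2n} \\ \vdots & & \ddots & \vdots \\ \hat{\mathbf{x}}_{m1} & \hat{\mathbf{x}}_{m2} & \cdots & \hat{\mathbf{x}}_{mn} \end{bmatrix} = \begin{bmatrix} \mathbf{A}_1 \\ \mathbf{A}_2 \\ \vdots \\ \mathbf{A}_m \end{bmatrix} \begin{bmatrix} \mathbf{X}_1 & \mathbf{X}_2 & \cdots & \mathbf{X}_n \end{bmatrix}$$
2D Image Points (2m x n) = Camera Parameters (2m x 3) times 3D Points (3 x n)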
Can we recover the camera parameters and 3d points?
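$$\mathbf{D} = \begin{bmatrix} \hat{\mathbf{x}}_{11} & \hat{\mathbf{x}}_{12} & \cdots & \hat{\mathbf{x}}_{1n} \\ \hat{\mathbf{x}}_{21} & \hat{\mathbf{x}}_{22} & \cdots & \hat{\mathbf{x}}_{2n} \\ \vdots & & \ddots & \vdots \\ \hat{\mathbf{x}}_{m1} & \hat{\mathbf{x}}_{m2} & \cdots & \hat{\mathbf{x}}_{mn} \end{bmatrix}$$
Rows: cameras (2m). Columns: points (n).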
Source: M. Hebert
[Figure labels: $\tilde{A}$, $\tilde{X}$]
We get the same D by using any 3×3 matrix C and applying the transformations A → AC, X → C⁻¹X
We have only an affine transformation and we have not enforced any Euclidean constraints (e.g., perpendicular image axes)
Source: M. Hebert
[Figure labels: $\tilde{S}$, $\tilde{A}$, $\tilde{X}$]
unit length
$\mathbf{a}_1 \cdot \mathbf{a}_2 = 0$
$|\mathbf{a}_1|^2 = |\mathbf{a}_2|^2 = 1$
Source: M. Hebert
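$L = CC^T$
$$\tilde{\mathbf{a}}_{i1}^T\, CC^T\, \tilde{\mathbf{a}}_{i1} = 1, \qquad \tilde{\mathbf{a}}_{i2}^T\, CC^T\, \tilde{\mathbf{a}}_{i2} = 1, \qquad \tilde{\mathbf{a}}_{i1}^T\, CC^T\, \tilde{\mathbf{a}}_{i2} = 0$$
where $\tilde{A}_i = \begin{bmatrix} \tilde{\mathbf{a}}_{i1}^T \\ \tilde{\mathbf{a}}_{i2}^T \end{bmatrix}$
Three equations for each image i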
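$$\begin{bmatrix} a & b & c \end{bmatrix} \begin{bmatrix} L_{11} & L_{12} & L_{13} \\ L_{21} & L_{22} & L_{23} \\ L_{31} & L_{32} & L_{33} \end{bmatrix} \begin{bmatrix} d \\ e \\ f \end{bmatrix} = k$$
$$\begin{bmatrix} ad & ae & af & bd & be & bf & cd & ce & cf \end{bmatrix} \begin{bmatrix} L_{11} & L_{12} & L_{13} & L_{21} & L_{22} & L_{23} & L_{31} & L_{32} & L_{33} \end{bmatrix}^T = k$$
reshape([a b c]'*[d e f], [1, 9])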
points in image i
$A = U_3 W_3^{1/2}$ and $S = W_3^{1/2} V_3^T$
Source: M. Hebert
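The factorization and the metric constraints can be put together in a short NumPy sketch. It assumes a centered, noise-free measurement matrix D whose rows 2i and 2i+1 hold the x and y coordinates of camera i; that row layout, the function names, and the synthetic orthographic cameras are assumptions made for this example, not the homework's data format.

```python
import numpy as np

def factorize(D):
    """Rank-3 factorization D ~ A S with A = U3 W3^(1/2), S = W3^(1/2) V3^T."""
    U, W, Vt = np.linalg.svd(D, full_matrices=False)
    root = np.sqrt(W[:3])
    return U[:, :3] * root, root[:, None] * Vt[:3]

def metric_upgrade(A, S):
    """Solve for L = C C^T from the three constraints per image, then
    apply A -> A C and S -> C^-1 S."""
    rows, rhs = [], []
    for i in range(A.shape[0] // 2):
        a1, a2 = A[2 * i], A[2 * i + 1]
        for p, q, k in ((a1, a1, 1.0), (a2, a2, 1.0), (a1, a2, 0.0)):
            rows.append(np.outer(p, q).reshape(9))   # coefficients of L's entries
            rhs.append(k)
    l, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    L = l.reshape(3, 3)
    L = (L + L.T) / 2                    # L should be symmetric
    C = np.linalg.cholesky(L)            # requires L to be positive definite
    return A @ C, np.linalg.solve(C, S)

# Synthetic orthographic data: m cameras, n centered 3D points.
rng = np.random.default_rng(2)
m, n = 6, 30
X_true = rng.normal(size=(3, n))
X_true -= X_true.mean(axis=1, keepdims=True)
blocks = []
for i in range(m):
    Q, _unused = np.linalg.qr(rng.normal(size=(3, 3)))
    blocks.append(Q[:2] @ X_true)        # two orthonormal rows per camera
D = np.vstack(blocks)

A_aff, S_aff = factorize(D)
A, S = metric_upgrade(A_aff, S_aff)
```

With noisy data L may fail to be positive definite, in which case the Cholesky step raises an error; handling that case is left out of this sketch.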
in all views
something like this: One solution:
cameras points
A factorization method. IJCV, 9(2):137-154, November 1992.
Lischinksi and Gruber http://www.cs.huji.ac.il/~csip/sfm.pdf
projective SfM
$$\mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix}$$
Image derivatives: $I_x$, $I_y$
Squares of derivatives: $I_x^2$, $I_y^2$, $I_x I_y$
Gaussian filter $g(\sigma_I)$: $g(I_x^2)$, $g(I_y^2)$, $g(I_x I_y)$
$$har = \det[\mu(\sigma_I, \sigma_D)] - \alpha\,[\mathrm{trace}(\mu(\sigma_I, \sigma_D))]^2 = g(I_x^2)\,g(I_y^2) - [g(I_x I_y)]^2 - \alpha\,[g(I_x^2) + g(I_y^2)]^2$$
$$\det M = \lambda_1 \lambda_2, \qquad \mathrm{trace}\, M = \lambda_1 + \lambda_2$$
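The cornerness formula above can be sketched with NumPy alone. The Gaussian smoothing helper, the parameter defaults (`sigma_d`, `sigma_i`, `alpha`), and the synthetic square image are all assumptions for illustration; a real pipeline would add thresholding and non-maximum suppression.

```python
import numpy as np

def gaussian_kernel(sigma):
    radius = int(3 * sigma + 0.5)
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2 * sigma**2))
    return k / k.sum()

def smooth(img, sigma):
    """Separable Gaussian filter (zero padding at the borders)."""
    k = gaussian_kernel(sigma)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 0, img)
    return np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, out)

def harris_response(img, sigma_d=1.0, sigma_i=2.0, alpha=0.05):
    # 1. Image derivatives at scale sigma_d (np.gradient returns d/dy, d/dx).
    Iy, Ix = np.gradient(smooth(img, sigma_d))
    # 2-3. Squares of derivatives, then Gaussian filter g(sigma_i).
    gxx = smooth(Ix * Ix, sigma_i)
    gyy = smooth(Iy * Iy, sigma_i)
    gxy = smooth(Ix * Iy, sigma_i)
    # 4. har = det(mu) - alpha * trace(mu)^2
    return gxx * gyy - gxy**2 - alpha * (gxx + gyy) ** 2

# Synthetic image: a bright square on a dark background.
img = np.zeros((40, 40))
img[12:28, 12:28] = 1.0
har = harris_response(img)
r, c = np.unravel_index(np.argmax(har), har.shape)
```

The response is positive near the square's corners, negative along its edges, and zero in flat regions.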
a) Initialize (x′, y′) = (x, y), where (x, y) is the original position
b) Compute the displacement (u, v) from the 2nd moment matrix for the feature patch in the first image and It = I(x′, y′, t+1) - I(x, y, t)
c) Shift window by (u, v): x′ = x′ + u; y′ = y′ + v
d) Recalculate It
e) Repeat steps b) to d) until small change
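The iteration above can be sketched for a single feature as follows. This is a translation-only tracker under simplifying assumptions: the linear system solved in step b) is the standard Lucas-Kanade one (2nd moment matrix times (u, v) equals minus the sums of Ix·It and Iy·It), which the slide text does not spell out, and the function names, window size, and synthetic images are made up for the example.

```python
import numpy as np

def bilinear(img, xs, ys):
    """Sample img at float coordinates (xs, ys) with bilinear interpolation."""
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    fx, fy = xs - x0, ys - y0
    return ((1 - fx) * (1 - fy) * img[y0, x0] + fx * (1 - fy) * img[y0, x0 + 1]
            + (1 - fx) * fy * img[y0 + 1, x0] + fx * fy * img[y0 + 1, x0 + 1])

def track_point(I0, I1, x, y, half=7, iters=20, tol=1e-3):
    """Track the patch centred at (x, y) in I0 into I1; returns (x', y')."""
    Iy_full, Ix_full = np.gradient(I0)
    gy, gx = np.mgrid[-half:half + 1, -half:half + 1]
    Ix = bilinear(Ix_full, x + gx, y + gy)
    Iy = bilinear(Iy_full, x + gx, y + gy)
    patch0 = bilinear(I0, x + gx, y + gy)
    # 2nd moment matrix of the feature patch in the first image.
    M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    xp, yp = float(x), float(y)                      # a) initialize
    for _ in range(iters):
        It = bilinear(I1, xp + gx, yp + gy) - patch0  # d) temporal difference
        b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
        u, v = np.linalg.solve(M, b)                  # b) displacement
        xp, yp = xp + u, yp + v                       # c) shift window
        if np.hypot(u, v) < tol:                      # e) stop on small change
            break
    return xp, yp

# Synthetic pair: the second frame is the first shifted by (1.3, -0.8) pixels.
ys, xs = np.mgrid[0:64, 0:64].astype(float)
def scene(x, y):
    return np.sin(0.2 * x) + np.cos(0.15 * y)
I0 = scene(xs, ys)
I1 = scene(xs - 1.3, ys + 0.8)
xp, yp = track_point(I0, I1, 32, 32)
```

The estimate should land close to (33.3, 31.2); the remaining error comes from bilinear interpolation and the finite-difference gradients.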
Tomasi-Kanade factorization Solve for
constraints
Problem: recover F from matches with outliers
load matches.mat
[c1, r1] - 477 x 2
[c2, r2] - 500 x 2
matches - 252 x 2
matches(:,1): matched point in im1
matches(:,2): matched point in im2
Write-up:
x
$\mathbf{x}' = [u \;\; v \;\; 1]$
$\mathbf{l} = F\mathbf{x} = [a \;\; b \;\; c]$
$$d(\mathbf{l}, \mathbf{x}') = \frac{|au + bv + c|}{\sqrt{a^2 + b^2}}$$
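The point-to-epipolar-line distance above is a one-liner once the line l = Fx is formed. In the sketch below the function name and the example F (the skew-symmetric matrix of a pure translation along x, which makes epipolar lines horizontal) are assumptions chosen so the answer is easy to check by hand.

```python
import numpy as np

def epipolar_distance(F, x, x_prime):
    """Distance from x' = (u, v) in image 2 to the epipolar line l = F x
    of x = (x, y) in image 1."""
    a, b, c = F @ np.array([x[0], x[1], 1.0])
    u, v = x_prime
    return abs(a * u + b * v + c) / np.sqrt(a**2 + b**2)

# F for a pure translation along the x axis: epipolar lines are v = const.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
d = epipolar_distance(F, (10.0, 20.0), (50.0, 23.0))   # line v = 20, point at v = 23
```

Here the line is (0, -1, 20), so the distance is 3 pixels. This kind of residual is a natural inlier test when estimating F from matches with outliers.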
Problem: recover motion and structure
load tracks.mat
track_x - [500 x 51]
track_y - [500 x 51]
Use plotSfM(A, S) to display motion and shape
A - [2m x 3] motion matrix
S - [3 x n]
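$L = CC^T$
$$\tilde{\mathbf{a}}_{i1}^T\, CC^T\, \tilde{\mathbf{a}}_{i1} = 1, \qquad \tilde{\mathbf{a}}_{i2}^T\, CC^T\, \tilde{\mathbf{a}}_{i2} = 1, \qquad \tilde{\mathbf{a}}_{i1}^T\, CC^T\, \tilde{\mathbf{a}}_{i2} = 0$$
where $\tilde{A}_i = \begin{bmatrix} \tilde{\mathbf{a}}_{i1}^T \\ \tilde{\mathbf{a}}_{i2}^T \end{bmatrix}$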
Assume Sign = 1.65 m
Question: What are the heights of
Input:
segment with (x1, y1, x2, y2, lineLength)
Output:
X, Y, Z
line segments correspond to the vanishing point.
Try the "un-normalized" 8-point algorithm. Report and compare its accuracy with the normalized version.
frame throughout the sequence.
positions of points that aren't visible in a particular frame.
the same object or scene, compute a representation
Source: Y. Furukawa
view
reference camera input image
input image
[Figure labels: Image 1, Image 2, sweeping plane, scene surface]
Hardware, CVPR 2003
depth map w.r.t. that view using a multi-baseline approach
volume or a mesh (see, e.g., Curless and Levoy 96)
Map 1 Map 2 Merged
camera parameters
Yasutaka Furukawa, Brian Curless, Steven M. Seitz and Richard Szeliski, Towards Internet-scale Multi-view Stereo, CVPR 2010.
Reconstruction," 3DV 2013.
1981
A factorization method." C. Tomasi and T. Kanade, IJCV, 9(2):137-154, November 1992