Structure From Motion
EECS 442, David Fouhey, Fall 2019, University of Michigan
http://web.eecs.umich.edu/~fouhey/teaching/EECS442_F19/
Structure from motion
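Have: 2D points $p_{ij}$ seen in m images
Assume: points generated from n fixed 3D points $X_j$ and cameras $M_i$:
$p_{ij} \equiv M_i X_j$, or equivalently $\lambda p_{ij} = M_i X_j$, $\lambda \neq 0$
(Remember) $M_i \equiv K_i [R_i, t_i]$
Known: $p_{ij}$. Unknown: $M_i$, $X_j$
Want: cameras $M_i$, points $X_j$
Diagram: 3D point $X_j$ projecting to $p_{1j}$, $p_{2j}$, $p_{3j}$ in cameras $M_1$, $M_2$, $M_3$
Diagram credit: S. Lazebnik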
Is SFM always uniquely solvable?
Source: N. Snavely
Structure from motion ambiguities
$p_{ij} \equiv M_i X_j$, with $p_{ij}$ 3x1, $M_i$ 3x4, $X_j$ 4x1
Let's first find one easy ambiguity.
Image: Zoolander, 2001
Can pick any arbitrary scaling factor k and adjust the cameras and points: $M_i X_j = (\frac{1}{k} M_i)(k X_j)$
(Can usually be fixed in practice: just need a number,
Structure from motion ambiguity
Diagram: point Xj seen as p1j, p2j, p3j in cameras M1, M2, M3
Does this diagram change meaning if I use this coordinate system (axes x, y, z)?
Versus this coordinate system (axes z, x, y)?
Coordinate system irrelevant! So global R, t also ambiguous
Structure from motion ambiguities
Not just limited to scale. Given: $p_{ij} \equiv M_i X_j$
Can insert any global transform H: $p_{ij} \equiv M_i X_j = (M_i H^{-1})(H X_j)$
H is a 3D homography / perspective transform / projective transform
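A quick numerical check of this ambiguity may help. The sketch below assumes a toy scene of random cameras and points (all names and values are made up for illustration) and an arbitrary invertible 4x4 matrix H, then compares the homogeneous projections before and after inserting H.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy scene: 3 cameras (3x4) and 5 homogeneous 3D points (4x5).
cameras = [rng.standard_normal((3, 4)) for _ in range(3)]
points = np.vstack([rng.standard_normal((3, 5)), np.ones((1, 5))])

# An arbitrary invertible 4x4 transform (kept close to identity so it is well conditioned).
H = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
H_inv = np.linalg.inv(H)

# (M H^-1)(H X) equals M X, so the images cannot tell the two scenes apart.
same = all(np.allclose(M @ points, (M @ H_inv) @ (H @ points)) for M in cameras)
print(same)
```

Since the homogeneous projections agree, the 2D points obtained after dividing by the third coordinate agree as well.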
Similarity/Affine/Perspective
House image: A. Efros
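Perspective (preserves lines): $H = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$
Affine (+ parallelism): $H = \begin{bmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix}$
Similarity (+ angles): $H = \begin{bmatrix} sR & t \\ 0 & 1 \end{bmatrix}$
3D: same idea, different dimensions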
Projective ambiguity
With no constraints on the camera matrices or the scene, can only reconstruct up to a projective (perspective) ambiguity
Slide credit: S. Lazebnik
Affine ambiguity
If we have constraints in the form of what lines are parallel, can reduce ambiguity to affine ambiguity.
Affine: $H = \begin{bmatrix} A & t \\ 0 & 1 \end{bmatrix}$
Slide credit: S. Lazebnik
Similarity ambiguity
$H = \begin{bmatrix} sR & t \\ 0 & 1 \end{bmatrix}$
Slide credit: S. Lazebnik
If we have orthogonality constraints, get up to similarity transform. Really the best we can do. We get this if we have calibrated cameras.
Affine structure from motion
We'll do the math with affine / weak perspective cameras (math is much easier)
Figure panels: Perspective, Weak Perspective
Recall: orthographic projection
Image World
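Projection along the z direction
$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} \rightarrow \begin{bmatrix} x \\ y \end{bmatrix}$
Orthographic camera: things infinitely far away but you have an amazing camera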
Field of view and focal length
standard wide-angle telephoto
Slide Credit: F. Durand
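Affine Camera
$M = \begin{bmatrix} A_{2D} & t_{2D} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} A_{3D} & t_{3D} \\ 0 & 1 \end{bmatrix}$
Factors, left to right: 3x3 matrix (affine 2D), 3x4 orthographic projection, 4x4 matrix (affine 3D)
Tedious math... $M = \begin{bmatrix} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ 0 & 0 & 0 & 1 \end{bmatrix}$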
Affine Camera
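So what? Who cares? Examine the projection:
$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \equiv \begin{bmatrix} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$
$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$
Projection becomes linear mapping + translation and doesn't involve homogeneous coordinates!
b is projection of origin. Can anyone see why?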
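To make the two claims concrete, here is a minimal sketch with a made-up affine camera (the values of A, b, and X are illustrative assumptions). It checks that the homogeneous projection agrees with the linear map plus translation, and that the 3D origin lands on b.

```python
import numpy as np

# Hypothetical affine camera: A is 2x3, b has 2 entries (values chosen for illustration).
A = np.array([[1.0, 0.2, 0.0],
              [0.0, 0.9, 0.3]])
b = np.array([5.0, -2.0])

# Full 3x4 affine camera matrix with last row [0, 0, 0, 1].
M = np.vstack([np.hstack([A, b[:, None]]), [0.0, 0.0, 0.0, 1.0]])

X = np.array([2.0, -1.0, 4.0])            # a 3D point
p_h = M @ np.append(X, 1.0)               # homogeneous projection; third entry stays 1
p_affine = A @ X + b                      # linear mapping + translation

# Projecting the origin zeroes out the A term, leaving only b.
origin_proj = (M @ np.array([0.0, 0.0, 0.0, 1.0]))[:2]
```

Because the last row of M is [0, 0, 0, 1], the third homogeneous coordinate is always 1, so no division is ever needed.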
Affine structure from motion
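General structure from motion:
$p_{ij} \equiv M_i X_j$ (3x1, 3x4, 4x1)
Assume M is affine camera:
$p_{ij} = A_i X_j + b_i$ (2x1, 2x3, 3x1, 2x1)
mn 2D points (2 values each), m cameras (8 unknowns each), n 3D points (3 unknowns each), up to arbitrary 3D affine (12 DOF)
Need: $2mn \geq 8m + 3n - 12$
(m = 2): $n \geq 4$ (for all m!)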
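The "for all m" remark follows from a short rearrangement of the counting inequality, sketched here under the assumption of at least two cameras:

```latex
2mn \ge 8m + 3n - 12
\;\Longleftrightarrow\; n(2m - 3) \ge 4(2m - 3)
\;\Longleftrightarrow\; n \ge 4 \qquad (m \ge 2,\ \text{so } 2m - 3 > 0)
```

The bound on n does not depend on m, so adding cameras never lowers the minimum number of points below 4.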
One simplifying trick
$p_{ij} = A_i X_j + b_i$
Subtract off the average 2D point:
$\hat{p}_{ij} = p_{ij} - \frac{1}{n} \sum_{k=1}^{n} p_{ik} = A_i X_j + b_i - \frac{1}{n} \sum_{k=1}^{n} (A_i X_k + b_i)$
Gather terms involving $A_i$, push out $b_i$:
$\hat{p}_{ij} = A_i \left( X_j - \frac{1}{n} \sum_{k=1}^{n} X_k \right) + b_i - \frac{1}{n} \sum_{k=1}^{n} b_i$
The two $b_i$ terms cancel. Set origin to mean of 3D points:
$\hat{p}_{ij} = A_i X_j$
Can do this entirely in terms of A!
Affine structure from motion
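First, make data measurement matrix consisting of the centered points $\hat{p}_{ij}$ (columns: n points; rows: m cameras, two rows each):
$D = \begin{bmatrix} \hat{p}_{11} & \cdots & \hat{p}_{1n} \\ \vdots & \ddots & \vdots \\ \hat{p}_{m1} & \cdots & \hat{p}_{mn} \end{bmatrix} = \begin{bmatrix} \hat{u}_{11} & \cdots & \hat{u}_{1n} \\ \hat{v}_{11} & \cdots & \hat{v}_{1n} \\ \vdots & \ddots & \vdots \\ \hat{u}_{m1} & \cdots & \hat{u}_{mn} \\ \hat{v}_{m1} & \cdots & \hat{v}_{mn} \end{bmatrix}$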
C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992.
How big is this matrix?
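Affine structure from motion
Then, write all the equations in one in terms of product of cameras and points:
$D = \begin{bmatrix} \hat{p}_{11} & \cdots & \hat{p}_{1n} \\ \vdots & \ddots & \vdots \\ \hat{p}_{m1} & \cdots & \hat{p}_{mn} \end{bmatrix} = \begin{bmatrix} A_1 \\ \vdots \\ A_m \end{bmatrix} \begin{bmatrix} X_1 & \cdots & X_n \end{bmatrix} = M S$
D is 2m x n, M is 2m x 3, S is 3 x n
What's the rank of D? 3!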
A factorization method. IJCV, 9(2):137-154, November 1992.
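The size and rank of the measurement matrix can be checked on synthetic data. This is a minimal sketch assuming random affine cameras and random 3D points; m, n, and all variable names are illustrative, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 10  # hypothetical numbers of cameras and points

X = rng.standard_normal((3, n))                         # 3D points, one per column
A = [rng.standard_normal((2, 3)) for _ in range(m)]     # linear parts A_i
b = [rng.standard_normal((2, 1)) for _ in range(m)]     # offsets b_i

rows = []
for A_i, b_i in zip(A, b):
    p_i = A_i @ X + b_i                                 # 2 x n observations in camera i
    rows.append(p_i - p_i.mean(axis=1, keepdims=True))  # subtract the average 2D point
D = np.vstack(rows)                                     # measurement matrix, 2m x n

rank = np.linalg.matrix_rank(D)
print(D.shape, rank)
```

Centering removes every b_i, so each 2 x n block equals A_i applied to the mean-centered 3D points, which is why the rank cannot exceed 3.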
Making Matrices Rank Deficient
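Repeat of epipolar geometry class, but important enough to see twice.
Given matrix M, take its SVD:
$M = U \Sigma V^T$, with $U_{m \times m}$, $\Sigma_{m \times n}$, $V^T_{n \times n}$
U, V: rotation matrices. $\Sigma$: diagonal scaling matrix, $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$
$\hat{M} = U \hat{\Sigma} V^T$
Keep only the k biggest $\sigma$; set the rest to zero.
Minimizes $\| M - \hat{M} \|_F$ (sum of squares) subject to rank($\hat{M}$) $\leq$ k
See Eckart-Young-Mirsky theorem if you're interested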
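The truncation recipe is short in NumPy. The sketch below uses a hypothetical helper `best_rank_k` and a random test matrix; it also compares the Frobenius error against the discarded singular values, which is the quantity the theorem says should match.

```python
import numpy as np

def best_rank_k(M, k):
    """Closest rank-<=k matrix to M in Frobenius norm, via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_hat = s.copy()
    s_hat[k:] = 0.0                     # keep only the k biggest singular values
    return (U * s_hat) @ Vt, s          # U * s_hat scales the columns of U

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 5))         # arbitrary example matrix
M_hat, s = best_rank_k(M, k=2)

error = np.linalg.norm(M - M_hat, "fro")
expected_error = np.sqrt(np.sum(s[2:] ** 2))   # energy in the dropped singular values
```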
Affine structure from motion
$D_{2m \times n} = M_{2m \times 3} \, S_{3 \times n}$
We'd like to take the measurements and convert them into M, S
Remake of M. Hebert diagram
Affine structure from motion
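Do SVD (typically you don't make full U, Σ, V):
$D_{2m \times n} = U_{2m \times n} \, \Sigma_{n \times n} \, V^T_{n \times n}$
Truncate to top 3 singular values:
$D_{2m \times n} = U_{3} \, \Sigma_{3} \, V_3^T$, with $U_3$ of size 2m x 3, $\Sigma_3$ of size 3 x 3, $V_3^T$ of size 3 x n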
Remake of M. Hebert diagram
Affine structure from motion
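$D = U_3 \Sigma_3 V_3^T$
Nearly there apart from this annoying $\Sigma_3$. One solution (split $\Sigma_3$ in two):
$D = (U_3 \Sigma_3^{1/2}) (\Sigma_3^{1/2} V_3^T) = M S$, with $M = U_3 \Sigma_3^{1/2}$ and $S = \Sigma_3^{1/2} V_3^T$
But remember that we can put $H H^{-1}$ in the middle: $D = (M H)(H^{-1} S)$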
Remake of M. Hebert diagram
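The factorization steps can be sketched end to end on noise-free synthetic data. The helper name `factorize`, the random cameras, and the least-squares check are assumptions made for illustration; the check estimates a 3x3 matrix H relating the recovered points to the true ones, which is the ambiguity mentioned above.

```python
import numpy as np

def factorize(D):
    """Split a centered 2m x n measurement matrix into M (2m x 3) and S (3 x n)."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    root = np.sqrt(s[:3])               # Sigma_3^(1/2), stored as a vector
    M = U[:, :3] * root                 # U_3 Sigma_3^(1/2)
    S = root[:, None] * Vt[:3]          # Sigma_3^(1/2) V_3^T
    return M, S

rng = np.random.default_rng(3)
m, n = 5, 20
X = rng.standard_normal((3, n))
X -= X.mean(axis=1, keepdims=True)      # origin at the mean of the 3D points
A_true = np.vstack([rng.standard_normal((2, 3)) for _ in range(m)])
D = A_true @ X                          # centered, noise-free measurements

M, S = factorize(D)

# S matches the true points only up to a 3x3 linear map H: solve H X = S by least squares.
H = np.linalg.lstsq(X.T, S.T, rcond=None)[0].T
residual = np.abs(H @ X - S).max()
```

With noisy measurements D would no longer be exactly rank 3, and the truncation would act as the best rank-3 approximation instead of an exact factorization.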
Eliminating the affine ambiguity
Rows $a_1$, $a_2$ of $A_i$ give axes of camera. Can multiply each projection $A_i$ with C to make $A_i C$ whose rows satisfy:
$a_1^T a_2 = 0$, $\|a_1\| = 1$, $\|a_2\| = 1$
Diagram labels: p, X, a1, a2
Gives 3 equations per camera, can set $A_i C$ to new camera, and $C^{-1} S$ to new points.
In general, a recipe for eliminating ambiguities
Remake of M. Hebert diagram
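One common way to find C, sketched here as an assumption rather than as the lecture's exact procedure, is to solve linearly for the symmetric matrix L = C C^T (the three constraints per camera are linear in L) and then take a Cholesky factor. The function name `metric_upgrade` and the synthetic orthographic cameras are hypothetical.

```python
import numpy as np

def metric_upgrade(M):
    """Find C so that each 2x3 block of M @ C has orthonormal rows."""
    def coeffs(u, v):
        # Coefficients of u^T L v in the unknowns (L11, L12, L13, L22, L23, L33).
        return [u[0] * v[0], u[0] * v[1] + u[1] * v[0], u[0] * v[2] + u[2] * v[0],
                u[1] * v[1], u[1] * v[2] + u[2] * v[1], u[2] * v[2]]

    rows, rhs = [], []
    for i in range(0, M.shape[0], 2):
        a1, a2 = M[i], M[i + 1]
        rows += [coeffs(a1, a1), coeffs(a2, a2), coeffs(a1, a2)]
        rhs += [1.0, 1.0, 0.0]          # unit norms and orthogonality
    sol = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]
    L = np.array([[sol[0], sol[1], sol[2]],
                  [sol[1], sol[3], sol[4]],
                  [sol[2], sol[4], sol[5]]])
    return np.linalg.cholesky(L)        # needs L positive definite; may fail with noise

rng = np.random.default_rng(4)
m = 6
# Hypothetical ground truth: orthographic cameras (two rows of random orthogonal matrices).
true_rows = np.vstack([np.linalg.qr(rng.standard_normal((3, 3)))[0][:2] for _ in range(m)])
# Simulate the ambiguity left by factorization with a well-conditioned 3x3 matrix.
ambiguity = np.eye(3) + 0.2 * rng.standard_normal((3, 3))
M = true_rows @ ambiguity

C = metric_upgrade(M)
M_new = M @ C
grams = [M_new[i:i + 2] @ M_new[i:i + 2].T for i in range(0, 2 * m, 2)]
```

Each entry of `grams` should be close to the 2x2 identity, meaning the upgraded camera axes are orthonormal. On real, noisy data the recovered L may not be positive definite, in which case the Cholesky step needs a fallback.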
Reconstruction results
A factorization method, IJCV 1992
Dealing with missing data
So far, assume we can see all points in all views.
In reality, measurement matrix typically looks like this (figure: sparse cameras x points visibility matrix).
Possible solution: find dense blocks, solve in block, fuse.
In general, finding these dense blocks is NP-complete
Figure Credit: S. Lazebnik
But cameras aren't affine!
Want: m cameras Mi, n 3D points Xj
Given: mn 2D points pij
When is this Possible?
Given: mn 2D points pij; each 2D point gives 2 values
Want: m cameras Mi; each 3x4 camera matrix has 11 DOF (why?)
Want: n 3D points Xj; each 3D point has 3 DOF
All up to a 4x4 homography with 15 DOF (why?)
Need: $2mn \geq 11m + 3n - 15$
(m = 2): $n \geq 7$
(m = 3): $n \geq 6$ (doesn't get better after)
(m = 1): $n \leq 4$
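The counting argument is easy to tabulate. This sketch uses two hypothetical helper functions and simply searches for the smallest n that satisfies the inequality for a given number of cameras.

```python
def enough_constraints(m, n):
    """True if 2mn measurements cover the 11m + 3n - 15 unknowns (projective case)."""
    return 2 * m * n >= 11 * m + 3 * n - 15

def min_points(m, max_n=100):
    """Smallest n >= 1 satisfying the count for m cameras, or None if none up to max_n."""
    for n in range(1, max_n + 1):
        if enough_constraints(m, n):
            return n
    return None

for m in (2, 3, 4, 10):
    print(m, min_points(m))   # prints 7, 6, 6, 6 for m = 2, 3, 4, 10
```

For a single camera the inequality flips direction: it holds for n up to 4 and fails from n = 5 on, which matches the (m = 1) line above.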
Two Camera Case
For two cameras, we need 7 points. Hmm. What else (in theory) requires 7 points?
Compute fundamental matrix F and epipole b s.t. $F^T b = 0$. Then:
$M_1 = [I, 0]$, $M_2 = [-[b]_\times F, b]$
Remember: this is up to a projective ambiguity!
Diagram: 3D point X with projections p, p' in cameras M1, M2, and epipole b
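The camera pair can be built from F in a few lines. In this sketch, F is generated from a made-up rotation and translation (an assumption, used only to have a valid rank-2 matrix), and the check is that any 3D point projected by the two cameras satisfies the epipolar constraint p'^T F p = 0.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def cameras_from_F(F):
    """Return M1 = [I, 0], M2 = [-[b]_x F, b], and b with F^T b = 0."""
    U, _, _ = np.linalg.svd(F)
    b = U[:, -1]                        # left null vector of F (smallest singular value)
    M1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    M2 = np.hstack([-skew(b) @ F, b[:, None]])
    return M1, M2, b

rng = np.random.default_rng(5)
R = np.linalg.qr(rng.standard_normal((3, 3)))[0]   # hypothetical orthogonal matrix
t = rng.standard_normal(3)                          # hypothetical translation
F = skew(t) @ R                                     # a valid rank-2 fundamental matrix

M1, M2, b = cameras_from_F(F)

# Any homogeneous 3D point projects to a pair (p, p') with p'^T F p = 0.
X = np.vstack([rng.standard_normal((3, 8)), np.ones((1, 8))])
p, p_prime = M1 @ X, M2 @ X
epipolar = np.abs(np.sum(p_prime * (F @ p), axis=0)).max()
```

The recovered cameras generally differ from the ones that generated F; they agree only up to the projective ambiguity noted on the slide.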
Incremental SFM
Key idea: incrementally add cameras, points
Note: numbers of points aren't to scale.
Diagram: cameras x points table with unknown entries marked "?", filled in step by step for M1, M2, M3 and X1, X2, X3, X4
Cameras M1, M2 = [Ri,ti] with fundamental matrix
Points Xj with triangulation
How could we add another camera?
New camera M3 using visible, known points, using calibration
Now we can see the fourth point in two cameras.
Coordinates of newly visible points (X4) using triangulation
Big problem: don't ever jointly consider all the 3D points and cameras. Leads to final step, called bundle adjustment.
Remake of S. Lazebnik material
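The triangulation step used twice above can be sketched with the standard linear (DLT) method. The function name `triangulate` and the two toy cameras are assumptions for illustration; real pipelines usually refine this linear estimate afterwards.

```python
import numpy as np

def triangulate(cameras, pixels):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    cameras: list of 3x4 matrices M_i; pixels: list of (u, v) observations.
    """
    rows = []
    for M, (u, v) in zip(cameras, pixels):
        rows.append(u * M[2] - M[0])    # from u = (M[0] X) / (M[2] X)
        rows.append(v * M[2] - M[1])    # from v = (M[1] X) / (M[2] X)
    _, _, Vt = np.linalg.svd(np.array(rows))
    X = Vt[-1]                          # null vector = homogeneous 3D point
    return X[:3] / X[3]

# Hypothetical pair of cameras: identity pose and a unit shift along x.
M1 = np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])

pixels = []
for M in (M1, M2):
    p = M @ np.append(X_true, 1.0)
    pixels.append((p[0] / p[2], p[1] / p[2]))

X_est = triangulate([M1, M2], pixels)
```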
Bundle Adjustment
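Diagram: point Xj, observed points p1j, p2j, p3j, and projections M1Xj, M2Xj, M3Xj in cameras M1, M2, M3
Do non-linear minimization over cameras Mi, points Xj to minimize distance between observed points pij and projections MiXj when they're visible.
$\arg\min_{M_i, X_j} \sum_{i=1}^{m} \sum_{j=1}^{n} w_{ij} \, d(M_i X_j, p_{ij})^2$
$w_{ij}$: visibility flag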
Figure Credit: S. Lazebnik
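The objective itself is simple to evaluate; the hard part is minimizing it. Below is a minimal sketch of the cost only, with hypothetical function names and toy data, taking d to be Euclidean distance in the image. It is not an optimizer.

```python
import numpy as np

def project_all(cameras, points):
    """Project n homogeneous points (n, 4) with m cameras (m, 3, 4) to 2D, shape (m, n, 2)."""
    proj = np.einsum("mab,nb->mna", cameras, points)
    return proj[..., :2] / proj[..., 2:3]

def reprojection_error(cameras, points, observations, visible):
    """Sum over i, j of w_ij * d(M_i X_j, p_ij)^2 with 0/1 visibility flags w_ij."""
    sq_dist = np.sum((project_all(cameras, points) - observations) ** 2, axis=-1)
    return float(np.sum(visible * sq_dist))

# Hypothetical toy problem: 2 cameras, 3 points in front of both.
cameras = np.array([
    np.hstack([np.eye(3), [[0.0], [0.0], [0.0]]]),
    np.hstack([np.eye(3), [[-1.0], [0.0], [0.0]]]),
])
points = np.array([[0.0, 0.0, 4.0, 1.0],
                   [1.0, 0.0, 5.0, 1.0],
                   [0.5, -0.5, 2.0, 1.0]])
exact = project_all(cameras, points)
visible = np.ones((2, 3))

noisy = exact.copy()
noisy[1, 2] += [3.0, 4.0]               # shift one observation by a distance of 5

err_exact = reprojection_error(cameras, points, exact, visible)
err_noisy = reprojection_error(cameras, points, noisy, visible)

hidden = visible.copy()
hidden[1, 2] = 0.0                      # mark that observation as not visible
err_hidden = reprojection_error(cameras, points, noisy, hidden)
```

Setting a visibility flag to zero removes that observation's term from the sum, which is how missing measurements are handled in this formulation.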
Devil is in the details
$\arg\min_{M_i, X_j} \sum_{i=1}^{m} \sum_{j=1}^{n} w_{ij} \, d(M_i X_j, p_{ij})^2$
High-level idea: In practice:
experts
Representative SFM pipeline
N. Snavely, S. Seitz, and R. Szeliski. Photo tourism: Exploring photo collections in 3D. SIGGRAPH 2006. http://phototour.cs.washington.edu/
Feature detection
Detect SIFT features
Source: N. Snavely
Feature matching
Match features between each pair of images
Source: N. Snavely
Feature matching
Use RANSAC to estimate fundamental matrix between each pair
Source: N. Snavely
Image connectivity graph
(graph layout produced using the Graphviz toolkit: http://www.graphviz.org/) Source: N. Snavely
In practice
(and preferably, good EXIF data)
point) from EXIF
to initialize model points
in the model
to model
Source: N. Snavely
The devil is in the details
Slide Credit: S. Lazebnik