Structure From Motion EECS 442 David Fouhey Fall 2019, University of Michigan


slide-1
SLIDE 1

Structure From Motion

EECS 442 - David Fouhey, Fall 2019, University of Michigan

http://web.eecs.umich.edu/~fouhey/teaching/EECS442_F19/

slide-2
SLIDE 2

Structure from Motion

slide-3
SLIDE 3

Structure from motion

Have: 2D points $p_{ij}$ seen in $m$ images.

Assume: points generated from $n$ fixed 3D points $X_j$ and cameras $M_i$:

$p_{ij} \equiv M_i X_j$, or $\lambda p_{ij} = M_i X_j$, $\lambda \neq 0$

(Remember: $M_i \equiv K_i [R_i, t_i]$.)

Known: $p_{ij}$. Unknown: $M_i$, $X_j$.

Want: cameras $M_i$, points $X_j$.

[Figure: 3D point $X_j$ observed as $p_{1j}$, $p_{2j}$, $p_{3j}$ by cameras $M_1$, $M_2$, $M_3$.]

Diagram credit: S. Lazebnik

slide-4
SLIDE 4

Is SFM always uniquely solvable?

Source: N. Snavely

  • Necker cube
slide-5
SLIDE 5

Structure from motion ambiguities

$p_{ij} \equiv M_i X_j$, where $p_{ij}$ is 3x1, $M_i$ is 3x4, and $X_j$ is 4x1.

Let's first find one easy ambiguity.

slide-6
SLIDE 6

Zoolander, 2001

slide-7
SLIDE 7

Structure from motion ambiguities

$p_{ij} \equiv M_i X_j$

Let's first find one easy ambiguity. Can pick any arbitrary scaling factor $k$ and adjust the cameras and points:

$p_{ij} \equiv M_i k^{-1} k X_j$

(Can usually be fixed in practice: just need a number, obtainable from heights of known objects or an IMU.)
slide-8
SLIDE 8

Structure from motion ambiguity

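[Figure: 3D point $X_j$ observed as $p_{1j}$, $p_{2j}$, $p_{3j}$ by cameras $M_1$, $M_2$, $M_3$.]

Does this diagram change meaning if I use this coordinate system (axes x, y, z)? Versus this coordinate system (axes z, x, y)?

Coordinate system irrelevant! So a global R, t is also ambiguous.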

slide-9
SLIDE 9

Structure from motion ambiguities

Not just limited to scale. Given:

$p_{ij} \equiv M_i X_j$

Can insert any global transform $H$:

$p_{ij} \equiv M_i X_j = M_i H^{-1} H X_j$

H is a 3D homography / perspective transform / projective transform
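A quick numerical check of this ambiguity, as a minimal sketch. The camera `M`, point `X`, and transform `H` are made-up values (not from the slides), and `project` is a hypothetical helper that divides out the homogeneous scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pinhole camera (focal length 800, principal point (320, 240)).
M = np.array([[800.0, 0.0, 320.0, 0.0],
              [0.0, 800.0, 240.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
X = np.array([0.5, -0.2, 4.0, 1.0])  # hypothetical homogeneous 3D point

# Arbitrary 4x4 transform; kept close to the identity so it is invertible in practice.
H = np.eye(4) + 0.1 * rng.normal(size=(4, 4))

def project(M, X):
    """Apply a 3x4 camera to a homogeneous point and divide out the scale."""
    p = M @ X
    return p[:2] / p[2]

p_original = project(M, X)                            # uses M and X
p_transformed = project(M @ np.linalg.inv(H), H @ X)  # uses M H^-1 and H X
same_image_point = np.allclose(p_original, p_transformed)
```

Both camera/point pairs produce the same 2D point, so the image alone cannot tell them apart.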

slide-10
SLIDE 10

Similarity/Affine/Perspective

House image: A. Efros

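Transform      Given            2D matrix
Perspective    Lines            $\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$
Affine         + Parallelism    $\begin{bmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix}$
Similarity     + Angles         $\begin{bmatrix} sR & t \\ 0 & 1 \end{bmatrix}$

3D: same idea, different dimensions.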

slide-11
SLIDE 11

Projective ambiguity

With no constraints on camera matrices and scene, can only reconstruct up to a perspective ambiguity:

$p_{ij} \equiv M_i X_j = M_i H^{-1} H X_j$, where $H$ is a 3D projective transform.

Slide credit: S. Lazebnik

slide-12
SLIDE 12

Projective ambiguity

Slide credit: S. Lazebnik

slide-13
SLIDE 13

Affine ambiguity

If we have constraints in the form of which lines are parallel, can reduce the ambiguity to an affine ambiguity:

$p_{ij} \equiv M_i X_j = M_i H^{-1} H X_j$, with affine $H = \begin{bmatrix} A & t \\ 0 & 1 \end{bmatrix}$

Slide credit: S. Lazebnik

slide-14
SLIDE 14

Affine ambiguity

Slide credit: S. Lazebnik

slide-15
SLIDE 15

Similarity ambiguity

If we have orthogonality constraints, get up to a similarity transform. Really the best we can do. We get this if we have calibrated cameras.

$p_{ij} \equiv M_i X_j = M_i H^{-1} H X_j$, with $H = \begin{bmatrix} sR & t \\ 0 & 1 \end{bmatrix}$

Slide credit: S. Lazebnik

slide-16
SLIDE 16

Similarity ambiguity

Slide credit: S. Lazebnik

slide-17
SLIDE 17

Affine structure from motion

We'll do the math with affine / weak perspective cameras (math is much easier).

[Figure panels: Perspective, Weak Perspective]

slide-18
SLIDE 18

Recall: orthographic projection

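[Figure: image plane and world]

Projection along the z direction:

$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} \rightarrow \begin{bmatrix} x \\ y \end{bmatrix}$

Orthographic camera: things are infinitely far away, but you have an amazing camera.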

slide-19
SLIDE 19

Field of view and focal length

standard wide-angle telephoto

Slide Credit: F. Durand

slide-20
SLIDE 20

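Affine Camera

$M = \begin{bmatrix} A_{2D} & t_{2D} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} A_{3D} & t_{3D} \\ 0 & 1 \end{bmatrix}$

The three factors are a 3x3 matrix (2D affine), a 3x4 matrix (orthographic projection), and a 4x4 matrix (3D affine).

Tedious math...

$M = \begin{bmatrix} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ 0 & 0 & 0 & 1 \end{bmatrix}$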

slide-21
SLIDE 21

Affine Camera

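So what? Who cares? Examine the projection:

$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \equiv \begin{bmatrix} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$

$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$

Projection becomes a linear mapping + translation and doesn't involve homogeneous coordinates! b is the projection of the origin. Can anyone see why?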
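The two claims above (linear map plus translation, and b as the image of the origin) can be checked numerically. This is a minimal sketch with made-up values for `A`, `b`, and `X`; `project_homogeneous` is a hypothetical helper, not something from the lecture.

```python
import numpy as np

# Hypothetical affine camera: a 2x3 linear part A and a 2-vector b.
A = np.array([[2.0, 0.0, 0.5],
              [0.0, 2.0, -0.5]])
b = np.array([10.0, 20.0])

# The same camera as a 3x4 matrix whose last row is [0, 0, 0, 1].
M = np.vstack([np.hstack([A, b[:, None]]), [0.0, 0.0, 0.0, 1.0]])

def project_homogeneous(M, X):
    """Project via the 3x4 matrix; for an affine camera the third coordinate is 1."""
    p = M @ np.append(X, 1.0)
    return p[:2] / p[2]

X = np.array([1.0, 2.0, 3.0])
p_matrix = project_homogeneous(M, X)            # via homogeneous coordinates
p_linear = A @ X + b                            # linear mapping + translation
p_origin = project_homogeneous(M, np.zeros(3))  # image of the 3D origin
```

Plugging in X = 0 leaves only the translation term, which is why `p_origin` equals `b`.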

slide-22
SLIDE 22

Affine structure from motion

General structure from motion:

$p_{ij} \equiv M_i X_j$ (sizes: 3x1, 3x4, 4x1)

Assume M is an affine camera:

$p_{ij} = A_i X_j + b_i$ (sizes: 2x1, 2x3, 3x1, 2x1)

Have mn 2D points, m cameras, and n 3D points, up to an arbitrary 3D affine transform (12 DOF).

Need: $2mn \geq 8m + 3n - 12$. For m = 2: $n \geq 4$ (and the same holds for all m!).
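One way to see why the bound on n does not depend on m is to rearrange the counting inequality. The counts are read off the sizes above: 8 unknowns per affine camera (a 2x3 matrix plus a 2-vector) and 3 per point. The rearrangement assumes at least two cameras.

```latex
\begin{align*}
2mn &\geq 8m + 3n - 12 \\
n(2m - 3) &\geq 4(2m - 3) \\
n &\geq 4 \qquad (m \geq 2,\ \text{so } 2m - 3 > 0)
\end{align*}
```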

slide-23
SLIDE 23

One simplifying trick

$p_{ij} = A_i X_j + b_i$

Subtract off the average 2D point:

$\hat{p}_{ij} = p_{ij} - \frac{1}{n} \sum_{k=1}^{n} p_{ik} = A_i X_j + b_i - \frac{1}{n} \sum_{k=1}^{n} (A_i X_k + b_i)$

Gather terms involving $A_i$, push out $b_i$:

$\hat{p}_{ij} = A_i \left( X_j - \frac{1}{n} \sum_{k=1}^{n} X_k \right) + b_i - \frac{1}{n} \sum_{k=1}^{n} b_i$

Set origin to mean of 3D points:

$\hat{p}_{ij} = A_i X_j$

Can do this entirely in terms of A!
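The centering trick can be verified on synthetic data. This is a minimal sketch: the cameras, offsets, and points are random made-up values, and the array layout (`p[i, :, j]` for camera i and point j) is an assumption chosen for convenience.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 6                                # hypothetical numbers of cameras and points

A = rng.normal(size=(m, 2, 3))             # A_i: one 2x3 matrix per camera
b = rng.normal(size=(m, 2))                # b_i: one 2-vector per camera
X = rng.normal(size=(3, n))                # X_j: 3D points as columns

p = A @ X + b[:, :, None]                  # p[i, :, j] = A_i X_j + b_i
p_hat = p - p.mean(axis=2, keepdims=True)  # subtract each camera's average 2D point

X_centered = X - X.mean(axis=1, keepdims=True)  # origin at the mean 3D point
prediction = A @ X_centered                     # A_i X_j, with no b_i term
max_gap = np.abs(p_hat - prediction).max()
```

`max_gap` is zero up to floating-point error: after centering, the offsets drop out entirely.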

slide-24
SLIDE 24

Affine structure from motion

First, make data measurement matrix consisting of all the points stacked together:

$\begin{bmatrix} \hat{p}_{11} & \cdots & \hat{p}_{1n} \\ \vdots & \ddots & \vdots \\ \hat{p}_{m1} & \cdots & \hat{p}_{mn} \end{bmatrix} = \begin{bmatrix} \hat{u}_{11} & \cdots & \hat{u}_{1n} \\ \hat{v}_{11} & \cdots & \hat{v}_{1n} \\ \vdots & \ddots & \vdots \\ \hat{u}_{m1} & \cdots & \hat{u}_{mn} \\ \hat{v}_{m1} & \cdots & \hat{v}_{mn} \end{bmatrix}$

Rows correspond to the m cameras, columns to the n points.

C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992.

How big is this matrix?

slide-25
SLIDE 25

Affine structure from motion

Then, write all the equations in one, in terms of a product of cameras and points:

$D = \begin{bmatrix} \hat{p}_{11} & \cdots & \hat{p}_{1n} \\ \vdots & \ddots & \vdots \\ \hat{p}_{m1} & \cdots & \hat{p}_{mn} \end{bmatrix} = \begin{bmatrix} A_1 \\ \vdots \\ A_m \end{bmatrix} \begin{bmatrix} X_1 & \cdots & X_n \end{bmatrix}$

D is 2m x n; the stacked cameras form M, which is 2m x 3; the points form S, which is 3 x n.

What's the rank of D? 3!

C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992.

slide-26
SLIDE 26

Making Matrices Rank Deficient

Repeat of epipolar geometry class, but important enough to see twice. Given matrix M:

$M \rightarrow U \Sigma V^T$, with $U_{m \times m}$, $\Sigma_{m \times n}$, $V_{n \times n}$

U and V are rotation matrices; $\Sigma$ is a diagonal scaling matrix:

$\Sigma = \begin{bmatrix} \sigma_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_m \end{bmatrix}$

$\hat{M} \leftarrow U \hat{\Sigma} V^T$

Keep only the k biggest $\sigma$; set the others to 0. This minimizes $\| M - \hat{M} \|_F$ (sum of squares) subject to rank($\hat{M}$) $\leq k$.

See the Eckart-Young-Mirsky theorem if you're interested.
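The truncation step can be sketched with NumPy's SVD. The function name `best_rank_k` and the random test matrix are illustrative assumptions; the check at the end uses the standard fact that the Frobenius error equals the root sum of squares of the discarded singular values.

```python
import numpy as np

def best_rank_k(M, k):
    """Closest matrix to M (in Frobenius norm) with rank at most k."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_hat = s.copy()
    s_hat[k:] = 0.0              # keep only the k biggest singular values
    return (U * s_hat) @ Vt      # same as U @ diag(s_hat) @ Vt

rng = np.random.default_rng(0)
M = rng.normal(size=(6, 5))      # made-up full-rank matrix
M_hat = best_rank_k(M, 2)

error = np.linalg.norm(M - M_hat)                   # Frobenius norm by default
discarded = np.linalg.svd(M, compute_uv=False)[2:]  # singular values set to 0
```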
slide-27
SLIDE 27

Affine structure from motion

$D = M \times S$, where D is 2m x n, M is 2m x 3, and S is 3 x n.

We'd like to take the measurements and convert them into M, S.

Remake of M. Hebert diagram

slide-28
SLIDE 28

Affine structure from motion

Do SVD (typically you don't make the full U, Σ, V):

$D = U \Sigma V^T$, where D is 2m x n, U is 2m x n, Σ is n x n, and $V^T$ is n x n.

Truncate to top 3 singular values:

$D = U_3 \Sigma_3 V_3^T$

Remake of M. Hebert diagram

slide-29
SLIDE 29

Affine structure from motion

$D = U_3 \Sigma_3 V_3^T$

Nearly there apart from this annoying $\Sigma_3$. One solution (split $\Sigma_3$ in two):

$D = U_3 \Sigma_3^{1/2} \Sigma_3^{1/2} V_3^T$, with $M = U_3 \Sigma_3^{1/2}$ and $S = \Sigma_3^{1/2} V_3^T$, so that $D = M \times S$.

But remember that we can put $H H^{-1}$ in the middle.

Remake of M. Hebert diagram
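The SVD-based factorization can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: `affine_factorization` is a hypothetical name, the measurement matrix is synthetic and noise-free, and the square-root split of the singular values is the "one solution" option described above.

```python
import numpy as np

def affine_factorization(D):
    """Factor a centered 2m x n measurement matrix into M (2m x 3) and S (3 x n)."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    sqrt_s3 = np.sqrt(s[:3])           # split the top-3 singular values in two
    M = U[:, :3] * sqrt_s3             # U_3 Sigma_3^(1/2)
    S = sqrt_s3[:, None] * Vt[:3]      # Sigma_3^(1/2) V_3^T
    return M, S

# Synthetic, noise-free data: m stacked 2x3 cameras and n centered 3D points.
rng = np.random.default_rng(0)
m, n = 4, 10
A_true = rng.normal(size=(2 * m, 3))
X_true = rng.normal(size=(3, n))
X_true -= X_true.mean(axis=1, keepdims=True)
D = A_true @ X_true

M, S = affine_factorization(D)

# M matches the true cameras only up to a 3x3 transform C (the affine ambiguity).
C = np.linalg.lstsq(M, A_true, rcond=None)[0]
```

The recovered `M` and `S` reproduce `D`, but `M` differs from `A_true` by the 3x3 matrix `C`, which is the ambiguity mentioned at the end of the slide.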

slide-30
SLIDE 30

Eliminating the affine ambiguity

Rows $a_i$ of $A_i$ give the axes of the camera. Can multiply each projection $A_i$ with C to make $A_i C$ whose rows $a_1$, $a_2$ satisfy:

$a_1^T a_2 = 0$, $\|a_1\| = 1$, $\|a_2\| = 1$

[Figure: 3D point X, image point p, and camera axes $a_1$, $a_2$.]

Gives 3 equations per camera; can set $A_i C$ to the new camera, and $C^{-1} S$ to the new points. In general, a recipe for eliminating ambiguities.

Remake of M. Hebert diagram
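One common way to solve these constraints, sketched here as an assumption rather than the lecture's exact procedure, is to note that they are linear in the symmetric matrix L = C C^T, solve for L by least squares, and recover C with a Cholesky factorization. The helper names and the synthetic distortion `T` are hypothetical.

```python
import numpy as np

def sym_row(u, v):
    """Coefficients of u^T L v in the 6 unique entries of a symmetric 3x3 L."""
    return np.array([u[0] * v[0],
                     u[0] * v[1] + u[1] * v[0],
                     u[0] * v[2] + u[2] * v[0],
                     u[1] * v[1],
                     u[1] * v[2] + u[2] * v[1],
                     u[2] * v[2]])

def metric_upgrade(M):
    """Find C so each camera's two rows in M @ C are orthogonal unit vectors."""
    rows, rhs = [], []
    for i in range(M.shape[0] // 2):
        a1, a2 = M[2 * i], M[2 * i + 1]
        rows += [sym_row(a1, a2), sym_row(a1, a1), sym_row(a2, a2)]
        rhs += [0.0, 1.0, 1.0]          # the 3 equations per camera
    l = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])
    return np.linalg.cholesky(L)        # C with C C^T = L; raises if L is not positive definite

# Synthetic check: orthonormal camera rows, distorted by a made-up 3x3 matrix T.
rng = np.random.default_rng(0)
true_rows = [np.linalg.qr(rng.normal(size=(3, 3)))[0][:2] for _ in range(4)]
T = np.array([[2.0, 0.5, 0.0],
              [0.0, 1.0, 0.3],
              [0.1, 0.0, 1.5]])
M_affine = np.vstack(true_rows) @ np.linalg.inv(T)

C = metric_upgrade(M_affine)
M_metric = M_affine @ C
```

With noisy data L may fail to be positive definite, in which case the Cholesky step raises an error; handling that case is outside this sketch.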

slide-31
SLIDE 31

Reconstruction results

C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 1992.

slide-32
SLIDE 32

Dealing with missing data

So far, we assumed we can see all points in all views. In reality, the measurement matrix typically looks like this:

[Figure: sparse measurement matrix, with cameras along one axis and points along the other.]

Possible solution: find dense blocks, solve in block, fuse. In general, finding these dense blocks is NP-complete.

Figure Credit: S. Lazebnik

slide-33
SLIDE 33

But cameras aren't affine!

$p_{ij} \equiv M_i X_j = M_i H^{-1} H X_j$

Want: m cameras $M_i$, n 3D points $X_j$
Given: mn 2D points $p_{ij}$

slide-34
SLIDE 34

When is this Possible?

$p_{ij} \equiv M_i X_j = M_i H^{-1} H X_j$

Want: m cameras $M_i$, n 3D points $X_j$
Given: mn 2D points $p_{ij}$

Degrees of freedom: 2D point (2), 3x4 camera matrix (11, why?), 3D point (3), 4x4 homography (15, why?).

Need $2mn \geq 11m + 3n - 15$:
(m = 2): n ≥ 7
(m = 3): n ≥ 6 (doesn't get better after)
(m = 1): n ≤ 4
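The counting argument is easy to tabulate. The sketch below rearranges the inequality to n(2m - 3) ≥ (camera DOF)·m - (ambiguity DOF), which is valid for m ≥ 2; the function name `min_points` and its parameters are illustrative. The same function covers the affine case with 8 DOF per camera and a 12 DOF ambiguity.

```python
def min_points(m, cam_dof=11, ambiguity_dof=15):
    """Smallest n with 2*m*n >= cam_dof*m + 3*n - ambiguity_dof, for m >= 2."""
    if m < 2:
        raise ValueError("the rearranged bound assumes at least two cameras")
    needed = cam_dof * m - ambiguity_dof   # right-hand side after moving 3n over
    per_point = 2 * m - 3                  # net equations gained per extra point
    return -(-needed // per_point)         # integer ceiling division

projective = {m: min_points(m) for m in (2, 3, 4, 10)}
# {2: 7, 3: 6, 4: 6, 10: 6}

affine = {m: min_points(m, cam_dof=8, ambiguity_dof=12) for m in (2, 3, 5)}
# {2: 4, 3: 4, 5: 4}
```

The projective bound levels off at 6 because (11m - 15) / (2m - 3) approaches 5.5 from above as m grows.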

slide-35
SLIDE 35

Two Camera Case

For two cameras, we need 7 points. Hmm. What else (in theory) requires 7 points?

Compute fundamental matrix F and epipole b s.t. $F^T b = 0$. Then:

$M_1 = [I, 0]$, $M_2 = [-[b]_\times F, b]$

Remember: this is up to a projective ambiguity!

[Figure: 3D point X projecting to p and p' in cameras $M_1$ and $M_2$, with epipole b.]
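The construction of a camera pair from F can be sketched as follows. The function names are hypothetical, the fundamental matrix is a random rank-2 matrix rather than one estimated from images, and the convention assumed is p'^T F p = 0 with p in the first image. The epipole b is taken as the left singular vector of F for its zero singular value, which satisfies F^T b = 0.

```python
import numpy as np

def skew(b):
    """Cross-product matrix [b]_x, so that skew(b) @ v == np.cross(b, v)."""
    return np.array([[0.0, -b[2], b[1]],
                     [b[2], 0.0, -b[0]],
                     [-b[1], b[0], 0.0]])

def cameras_from_fundamental(F):
    """Return M1 = [I, 0] and M2 = [-[b]_x F, b] with F^T b = 0."""
    U, _, _ = np.linalg.svd(F)
    b = U[:, -1]                                   # left null vector of F
    M1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    M2 = np.hstack([-skew(b) @ F, b[:, None]])
    return M1, M2

# Made-up rank-2 matrix standing in for an estimated fundamental matrix.
rng = np.random.default_rng(0)
U, s, Vt = np.linalg.svd(rng.normal(size=(3, 3)))
F = U @ np.diag([s[0], s[1], 0.0]) @ Vt

M1, M2 = cameras_from_fundamental(F)

# Any 3D point projected by the pair should satisfy the epipolar constraint.
X = rng.normal(size=(5, 4))                        # random homogeneous 3D points
residuals = np.array([(M2 @ x) @ F @ (M1 @ x) for x in X])
```

The residuals vanish up to floating-point error, so the pair is consistent with F; it is still only one member of the projectively equivalent family.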

slide-36
SLIDE 36

Incremental SFM

Key idea: incrementally add cameras, points

[Figure: cameras-versus-points grid with unknown entries marked "?", showing cameras M1 and M2. Note: numbers of points aren't to scale.]

Remake of S. Lazebnik material

slide-37
SLIDE 37

Incremental SFM

Key idea: incrementally add cameras, points

  1. Initialize motion $M_i = [R_i, t_i]$ with fundamental matrix

[Figure: cameras-versus-points grid with unknown entries marked "?", showing cameras M1 and M2. Note: numbers of points aren't to scale.]

Remake of S. Lazebnik material

slide-38
SLIDE 38

Incremental SFM

Key idea: incrementally add cameras, points

  1. Initialize motion $M_i = [R_i, t_i]$ with fundamental matrix
  2. Initialize structure $X_j$ with triangulation

How could we add another camera?

[Figure: cameras-versus-points grid with unknown entries marked "?", showing cameras M1, M2 and points X1, X2, X3. Note: numbers of points aren't to scale.]

Remake of S. Lazebnik material

slide-39
SLIDE 39

Incremental SFM

Key idea: incrementally add cameras, points

  1. Solve for camera matrix from visible, known points, using calibration

[Figure: cameras-versus-points grid with unknown entries marked "?", showing cameras M1, M2, the new camera M3, and points X1, X2, X3. Note: numbers of points aren't to scale.]

Remake of S. Lazebnik material

slide-40
SLIDE 40

Incremental SFM

Key idea: incrementally add cameras, points

  1. Solve for camera matrix from visible, known points, using calibration

Now we can see the fourth point in two cameras.

[Figure: cameras-versus-points grid with unknown entries marked "?", showing cameras M1, M2, the new camera M3, and points X1, X2, X3. Note: numbers of points aren't to scale.]

Remake of S. Lazebnik material

slide-41
SLIDE 41

Incremental SFM

Key idea: incrementally add cameras, points

  1. Solve for camera matrix from visible, known points, using calibration
  2. Solve for 3D coordinates of newly visible points using triangulation

[Figure: cameras-versus-points grid with unknown entries marked "?", showing cameras M1, M2, M3 and points X1, X2, X3, and the new point X4. Note: numbers of points aren't to scale.]

Remake of S. Lazebnik material
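Both steps of the incremental loop have simple linear (DLT) versions, sketched below. This is an assumption about how one might implement them, not the lecture's code: `triangulate` recovers one 3D point from known cameras, `resect` recovers a 3x4 camera (up to scale) from at least six known 3D points, and the cameras and points in the demo are made up and noise-free.

```python
import numpy as np

def project(M, X):
    """Project a 3D point with a 3x4 camera and divide out the scale."""
    p = M @ np.append(X, 1.0)
    return p[:2] / p[2]

def triangulate(cameras, points_2d):
    """Linear triangulation of one 3D point seen in two or more cameras."""
    rows = []
    for M, (u, v) in zip(cameras, points_2d):
        rows.append(u * M[2] - M[0])     # from u = (M[0] X) / (M[2] X)
        rows.append(v * M[2] - M[1])     # from v = (M[1] X) / (M[2] X)
    X = np.linalg.svd(np.array(rows))[2][-1]   # null vector of the system
    return X[:3] / X[3]

def resect(points_3d, points_2d):
    """Linear estimate of a 3x4 camera (up to scale) from known 3D points."""
    rows = []
    for X, (u, v) in zip(points_3d, points_2d):
        Xh = np.append(X, 1.0)
        rows.append(np.concatenate([Xh, np.zeros(4), -u * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -v * Xh]))
    return np.linalg.svd(np.array(rows))[2][-1].reshape(3, 4)

# Step 2 of the loop: triangulate a point seen by two known cameras.
M1 = np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.3, -0.2, 5.0])
X_est = triangulate([M1, M2], [project(M1, X_true), project(M2, X_true)])

# Step 1 of the loop: recover a new camera from already-known 3D points.
rng = np.random.default_rng(0)
known_points = rng.uniform(-1.0, 1.0, size=(8, 3)) + [0.0, 0.0, 5.0]
M3_true = np.hstack([np.eye(3), np.array([[0.5], [-0.5], [1.0]])])
M3_est = resect(known_points, [project(M3_true, X) for X in known_points])
```

With real, noisy matches these linear estimates are usually treated as initial guesses to be refined, rather than final answers.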

slide-42
SLIDE 42

Incremental SFM

Key idea: incrementally add cameras, points

Big problem: we never jointly consider all the 3D points and cameras. This leads to the final step, called bundle adjustment.

[Figure: cameras-versus-points grid with unknown entries marked "?", showing cameras M1, M2, M3 and points X1, X2, X3, X4. Note: numbers of points aren't to scale.]

Remake of S. Lazebnik material

slide-43
SLIDE 43

Bundle Adjustment

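[Figure: 3D point $X_j$ with observations $p_{1j}$, $p_{2j}$, $p_{3j}$ and projections $M_1 X_j$, $M_2 X_j$, $M_3 X_j$ in cameras $M_1$, $M_2$, $M_3$.]

Do non-linear minimization over cameras $M_i$, points $X_j$ to minimize distance between observed points $p_{ij}$ and projections $M_i X_j$ when they're visible:

$\arg\min_{M_i, X_j} \sum_{i=1}^{m} \sum_{j=1}^{n} w_{ij} \, d(M_i X_j, p_{ij})^2$

where $w_{ij}$ is the visibility flag.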

Figure Credit: S. Lazebnik
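The objective itself is short to write down. The sketch below only evaluates the cost for given cameras and points; the array shapes, the function name `reprojection_cost`, and the toy data are assumptions, and the minimization (which would be done by a non-linear least-squares solver) is not shown.

```python
import numpy as np

def project_all(cameras, points):
    """Project n points (n, 3) with m cameras (m, 3, 4); returns (m, n, 2)."""
    Xh = np.hstack([points, np.ones((len(points), 1))])   # (n, 4) homogeneous
    proj = np.einsum('ikl,jl->ijk', cameras, Xh)          # (m, n, 3): M_i X_j
    return proj[..., :2] / proj[..., 2:3]                 # divide out the scale

def reprojection_cost(cameras, points, observations, visible):
    """Sum over i, j of w_ij * d(M_i X_j, p_ij)^2, with w_ij in `visible`."""
    diff = project_all(cameras, points) - observations
    sq_dist = np.sum(diff ** 2, axis=-1)                  # (m, n) squared distances
    return float(np.sum(visible * sq_dist))

# Toy problem: two cameras, three points, perfect observations.
M1 = np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
cameras = np.stack([M1, M2])
points = np.array([[0.0, 0.0, 4.0], [1.0, -1.0, 5.0], [-0.5, 0.5, 6.0]])
observations = project_all(cameras, points)
visible = np.array([[1, 1, 1],
                    [1, 1, 0]])          # the second camera does not see the third point

cost_perfect = reprojection_cost(cameras, points, observations, visible)

noisy = observations.copy()
noisy[0, 0] += [3.0, 4.0]                # visible observation moved by distance 5
noisy[1, 2] += [100.0, 100.0]            # invisible observation: weight 0, ignored
cost_noisy = reprojection_cost(cameras, points, noisy, visible)
```

Only the visible observation contributes, so `cost_noisy` is 25 (the squared distance 3² + 4²) up to rounding.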

slide-44
SLIDE 44

Devil is in the details

High-level idea:

$\arg\min_{M_i, X_j} \sum_{i=1}^{m} \sum_{j=1}^{n} w_{ij} \, d(M_i X_j, p_{ij})^2$

In practice:

  • Have to initialize reasonably well
  • Should minimize over K, R, t directly
  • Problem is very sparse: $w_{ij}$ almost always zero
  • Need to integrate uncertainty information
  • Probably want to use a system written by experts

slide-45
SLIDE 45

Representative SFM pipeline

N. Snavely, S. Seitz, and R. Szeliski. Photo tourism: Exploring photo collections in 3D. SIGGRAPH 2006. http://phototour.cs.washington.edu/

slide-46
SLIDE 46

Feature detection

Detect SIFT features

Source: N. Snavely

slide-47
SLIDE 47

Feature detection

Detect SIFT features

Source: N. Snavely

slide-48
SLIDE 48

Feature matching

Match features between each pair of images

Source: N. Snavely

slide-49
SLIDE 49

Feature matching

Use RANSAC to estimate fundamental matrix between each pair

Source: N. Snavely
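A bare-bones version of this step might look like the sketch below. It assumes the convention x2^T F x1 = 0, uses an unnormalized eight-point solver for brevity (practical systems normalize coordinates first), and runs on synthetic matches with made-up cameras; all function names, the threshold, and the iteration count are illustrative choices.

```python
import numpy as np

def eight_point(x1, x2):
    """Linear estimate of F (x2^T F x1 = 0) from 8 or more matches (N x 2 arrays)."""
    u1, v1, u2, v2 = x1[:, 0], x1[:, 1], x2[:, 0], x2[:, 1]
    A = np.stack([u2 * u1, u2 * v1, u2, v2 * u1, v2 * v1, v2,
                  u1, v1, np.ones(len(x1))], axis=1)
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, s, Vt = np.linalg.svd(F)
    return U @ np.diag([s[0], s[1], 0.0]) @ Vt      # enforce rank 2

def epipolar_distance(F, x1, x2):
    """Distance from each x2 to the epipolar line F x1 in the second image."""
    h1 = np.hstack([x1, np.ones((len(x1), 1))])
    h2 = np.hstack([x2, np.ones((len(x2), 1))])
    lines = h1 @ F.T                                # row k is F @ h1[k]
    return np.abs(np.sum(lines * h2, axis=1)) / np.hypot(lines[:, 0], lines[:, 1])

def ransac_fundamental(x1, x2, iters=500, thresh=1.0, seed=0):
    """Fit F to random 8-match samples; keep the model with the most inliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(x1), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(x1), size=8, replace=False)
        F = eight_point(x1[idx], x2[idx])
        inliers = epipolar_distance(F, x1, x2) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return eight_point(x1[best], x2[best]), best    # refit on all inliers

# Synthetic matches: 40 correct ones followed by 10 random outliers.
rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(-2, 2, 50), rng.uniform(-2, 2, 50),
                     rng.uniform(4, 8, 50), np.ones(50)])
c, s = np.cos(0.1), np.sin(0.1)
M1 = np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = np.hstack([np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]]),
                np.array([[-1.0], [0.1], [0.2]])])
x1 = (X @ M1.T)[:, :2] / (X @ M1.T)[:, 2:3]
x2 = (X @ M2.T)[:, :2] / (X @ M2.T)[:, 2:3]
x2[40:] = rng.uniform(-0.5, 0.5, size=(10, 2))      # corrupt the last 10 matches

F_est, inlier_mask = ransac_fundamental(x1, x2, thresh=1e-3)
```

The threshold is in the same units as the image coordinates (1e-3 here because the synthetic coordinates are normalized; roughly a pixel would be typical for pixel coordinates).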

slide-50
SLIDE 50

Feature matching

Use RANSAC to estimate fundamental matrix between each pair

Image source

slide-51
SLIDE 51

Feature matching

Use RANSAC to estimate fundamental matrix between each pair

Source: N. Snavely

slide-52
SLIDE 52

Image connectivity graph

(graph layout produced using the Graphviz toolkit: http://www.graphviz.org/) Source: N. Snavely

slide-53
SLIDE 53

In practice

  • Pick a pair of images with lots of inliers (and preferably, good EXIF data)
  • Initialize intrinsic parameters (focal length, principal point) from EXIF
  • Estimate extrinsic parameters (R and t)
  • Use triangulation to initialize model points
  • While remaining images exist
      • Find an image with many feature matches with images in the model
      • Run RANSAC on feature matches to register new image to model
      • Triangulate new points
      • Perform bundle adjustment to re-optimize everything

Source: N. Snavely

slide-54
SLIDE 54

The devil is in the details

  • Degenerate configurations (homographies)
  • Eliminating outliers
  • Repetition and symmetry

Slide Credit: S. Lazebnik

slide-55
SLIDE 55

The devil is in the details

  • Degenerate configurations (homographies)
  • Eliminating outliers
  • Repetition and symmetry
  • Multiple connected components

Slide Credit: S. Lazebnik