slide-1
SLIDE 1

Structure from Motion

slide-2
SLIDE 2

Structure from Motion

  • For now, static scene and moving camera
    – Equivalently, rigidly moving scene and static camera
  • Limiting case of stereo with many cameras
  • Limiting case of multiview camera calibration with unknown target
  • Given n points and N camera positions, have 2nN equations and 3n+6N unknowns
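The counting in the last bullet can be tabulated with a short sketch. The function name `sfm_counts` is hypothetical, and the count follows the slide literally (two image coordinates per observation, three coordinates per point, six pose parameters per camera), without removing any global ambiguity.

```python
def sfm_counts(n_points, n_cameras):
    """Return (equations, unknowns) using the counting on this slide."""
    equations = 2 * n_points * n_cameras      # one (x, y) observation per point per camera
    unknowns = 3 * n_points + 6 * n_cameras   # 3D position per point, 6-DOF pose per camera
    return equations, unknowns


for n, N in [(4, 2), (6, 3), (20, 5)]:
    eqs, unk = sfm_counts(n, N)
    print(f"n={n:2d}, N={N}: {eqs} equations, {unk} unknowns")
```

With enough points and views the equations outnumber the unknowns, which is what makes a least-squares style solution possible.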

slide-3
SLIDE 3

Approaches

  • Obtaining point correspondences
    – Optical flow
    – Stereo methods: correlation, feature matching
  • Solving for points and camera motion
    – Nonlinear minimization (bundle adjustment)
    – Various approximations…

slide-4
SLIDE 4

Orthographic Approximation

  • Simplest SFM case: camera approximated by orthographic projection

[Figure: perspective projection vs. orthographic projection]
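To make the difference between the two camera models concrete, here is a minimal sketch. The function names and the focal length parameter `f` are assumptions for illustration; points are given in camera coordinates with Z along the viewing direction.

```python
import numpy as np


def project_perspective(points, f=1.0):
    """Pinhole projection: (X, Y, Z) -> f * (X / Z, Y / Z)."""
    points = np.asarray(points, dtype=float)
    return f * points[:, :2] / points[:, 2:3]


def project_orthographic(points):
    """Orthographic projection: keep (X, Y) and drop Z."""
    points = np.asarray(points, dtype=float)
    return points[:, :2].copy()


# Two points that differ only in depth.
pts = np.array([[1.0, 1.0, 2.0],
                [1.0, 1.0, 4.0]])
persp = project_perspective(pts)     # image position depends on depth
ortho = project_orthographic(pts)    # both points land on the same image position
print(persp)
print(ortho)
```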

slide-5
SLIDE 5

Weak Perspective

  • The orthographic assumption is often a good approximation for a telephoto lens (distant scene, narrow field of view)

[Figure: weak perspective]
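A small sketch of why this works, assuming weak perspective is modeled as orthographic projection followed by a single scale f / (mean depth). The function names and the sample points are illustrative only.

```python
import numpy as np


def project_perspective(points, f=1.0):
    """Pinhole projection: each point is divided by its own depth."""
    points = np.asarray(points, dtype=float)
    return f * points[:, :2] / points[:, 2:3]


def project_weak_perspective(points, f=1.0):
    """Weak perspective: every point is divided by the same mean depth."""
    points = np.asarray(points, dtype=float)
    z_mean = points[:, 2].mean()
    return f * points[:, :2] / z_mean


# Scene whose depth variation (about 1) is small compared with its distance (about 100).
pts = np.array([[1.0, 0.5, 99.0],
                [-1.0, 2.0, 100.0],
                [0.5, -1.0, 101.0]])
err = np.abs(project_weak_perspective(pts) - project_perspective(pts)).max()
print(err)  # small when depth variation is small relative to distance
```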

slide-6
SLIDE 6

Consequences of Orthographic Projection

  • Translation perpendicular to image plane cannot be recovered
  • Scene can be recovered up to scale (if weak perspective)

slide-7
SLIDE 7

Orthographic Structure from Motion

  • Method due to Tomasi & Kanade, 1992
  • Assume n points in 3D space $p_1 .. p_n$
  • Observed at N points in time at image coordinates $(x_{ij}, y_{ij})$, i = 1..N, j = 1..n
    – Feature tracking, optical flow, etc.
    – All points visible in all frames

slide-8
SLIDE 8

Orthographic Structure from Motion

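  • Write down matrix of data

$$
D = \begin{bmatrix}
x_{11} & \cdots & x_{1n} \\
\vdots & \ddots & \vdots \\
x_{N1} & \cdots & x_{Nn} \\
y_{11} & \cdots & y_{1n} \\
\vdots & \ddots & \vdots \\
y_{N1} & \cdots & y_{Nn}
\end{bmatrix}
$$

Columns index points (left to right); rows index frames (top to bottom), first for the x coordinates and then for the y coordinates.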
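As a sketch of this layout, assume the tracked features arrive as an array `tracks` of shape (N, n, 2) holding (x, y) per frame and point. That input format and the function name `measurement_matrix` are assumptions, not part of the original method description.

```python
import numpy as np


def measurement_matrix(tracks):
    """Stack tracks of shape (N, n, 2) into the 2N x n matrix D.

    Rows 0..N-1 hold the x coordinates (one row per frame),
    rows N..2N-1 hold the y coordinates.
    """
    tracks = np.asarray(tracks, dtype=float)
    return np.vstack([tracks[:, :, 0], tracks[:, :, 1]])


# Toy input: N = 2 frames, n = 3 points.
tracks = np.arange(2 * 3 * 2, dtype=float).reshape(2, 3, 2)
D = measurement_matrix(tracks)
print(D.shape)
print(D)
```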

slide-9
SLIDE 9

Orthographic Structure from Motion

  • Step 1: find translation
  • Translation parallel to viewing direction cannot be obtained
  • Translation perpendicular to viewing direction equals motion of average position of all points

slide-10
SLIDE 10

Orthographic Structure from Motion

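  • After finding translation, subtract it out (i.e., subtract average of each row)

$$
\tilde D = \begin{bmatrix}
x_{11} - \bar x_1 & \cdots & x_{1n} - \bar x_1 \\
\vdots & \ddots & \vdots \\
x_{N1} - \bar x_N & \cdots & x_{Nn} - \bar x_N \\
y_{11} - \bar y_1 & \cdots & y_{1n} - \bar y_1 \\
\vdots & \ddots & \vdots \\
y_{N1} - \bar y_N & \cdots & y_{Nn} - \bar y_N
\end{bmatrix}
$$

where $\bar x_i$ and $\bar y_i$ are the averages over all points in frame i.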
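The row-centering step is short in NumPy. This is a minimal sketch; the helper name `center_rows` is hypothetical.

```python
import numpy as np


def center_rows(D):
    """Subtract each row's mean; return the centered matrix and the row means.

    The row means are the per-frame image-plane translations (x block, then y block).
    """
    D = np.asarray(D, dtype=float)
    t = D.mean(axis=1, keepdims=True)
    return D - t, t.ravel()


D = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 8.0]])
D_tilde, t = center_rows(D)
print(t)
print(D_tilde)
```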

slide-11
SLIDE 11

Orthographic Structure from Motion

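  • Step 2: try to find rotation
  • Rotation at each frame defines local coordinate axes $\hat i$, $\hat j$, and $\hat k$
  • Then

$$
\tilde x_{ij} = \hat i_i \cdot \tilde p_j, \qquad \tilde y_{ij} = \hat j_i \cdot \tilde p_j
$$

where tildes denote quantities with the average subtracted.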

slide-12
SLIDE 12

Orthographic Structure from Motion

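  • So, can write $\tilde D = RS$, where R is a “rotation” matrix and S is a “shape” matrix

$$
R = \begin{bmatrix}
\hat i_1^T \\ \vdots \\ \hat i_N^T \\ \hat j_1^T \\ \vdots \\ \hat j_N^T
\end{bmatrix},
\qquad
S = \begin{bmatrix} \tilde p_1 & \cdots & \tilde p_n \end{bmatrix}
$$

$$
\tilde D = \begin{bmatrix}
x_{11} - \bar x_1 & \cdots & x_{1n} - \bar x_1 \\
\vdots & \ddots & \vdots \\
x_{N1} - \bar x_N & \cdots & x_{Nn} - \bar x_N \\
y_{11} - \bar y_1 & \cdots & y_{1n} - \bar y_1 \\
\vdots & \ddots & \vdots \\
y_{N1} - \bar y_N & \cdots & y_{Nn} - \bar y_N
\end{bmatrix}
$$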

slide-13
SLIDE 13

Orthographic Structure from Motion

  • Goal is to factor $\tilde D$
  • Before we do, observe that rank($\tilde D$) should be 3 (in ideal case with no noise)
  • Proof:
    – Rank of R (2N×3) is 3 unless no rotation
    – Rank of S (3×n) is 3 iff have noncoplanar points
    – Product of a 2N×3 matrix of rank 3 and a 3×n matrix of rank 3 has rank 3
  • With noise, rank($\tilde D$) might be > 3
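The rank claim can be checked numerically on synthetic, noise-free data. Everything in this sketch is an assumption made for the test: random 3D points, random orthonormal camera axes obtained from a QR decomposition, and random image-plane translations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 12, 5
P = rng.normal(size=(3, n))            # random (almost surely noncoplanar) 3D points

rows_x, rows_y = [], []
for _ in range(N):
    # Rows of Rot form an orthonormal triple playing the role of (i, j, k).
    Rot, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    t = rng.normal(size=2)             # image-plane translation for this frame
    rows_x.append(Rot[0] @ P + t[0])   # x_ij = i_i . p_j + t_x
    rows_y.append(Rot[1] @ P + t[1])   # y_ij = j_i . p_j + t_y

D = np.vstack(rows_x + rows_y)                    # 2N x n measurement matrix
D_tilde = D - D.mean(axis=1, keepdims=True)       # subtract the translation

s = np.linalg.svd(D_tilde, compute_uv=False)
print(np.round(s, 6))   # only the first three singular values should be non-negligible
```

Adding noise to `D` before centering makes the remaining singular values small but nonzero, which is the "rank > 3" case in the last bullet.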

slide-14
SLIDE 14

SVD

  • Goal is to factor $\tilde D$ into R and S
  • Apply SVD: $\tilde D = U W V^T$
  • But $\tilde D$ should have rank 3 ⇒ all but 3 of the $w_i$ should be 0
  • Extract the top 3 $w_i$, together with the corresponding columns of U and V

slide-15
SLIDE 15

Factoring for Orthographic Structure from Motion

  • After extracting columns, $U_3$ has dimensions 2N×3 (just what we wanted for R)
  • $W_3 V_3^T$ has dimensions 3×n (just what we wanted for S)
  • So, let $R^* = U_3$, $S^* = W_3 V_3^T$
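A minimal sketch of this truncated-SVD factorization with NumPy. The function name `factor_rank3` and the synthetic rank-3 test matrix are assumptions; with real, noisy data the product only approximates the centered measurement matrix.

```python
import numpy as np


def factor_rank3(D_tilde):
    """Factor a 2N x n matrix as R_star (2N x 3) times S_star (3 x n) via SVD."""
    U, w, Vt = np.linalg.svd(D_tilde, full_matrices=False)
    R_star = U[:, :3]                        # first 3 columns of U
    S_star = np.diag(w[:3]) @ Vt[:3, :]      # top 3 singular values times first 3 rows of V^T
    return R_star, S_star


rng = np.random.default_rng(0)
D_tilde = rng.normal(size=(6, 3)) @ rng.normal(size=(3, 8))   # exactly rank 3 by construction
R_star, S_star = factor_rank3(D_tilde)
print(R_star.shape, S_star.shape)
print(np.abs(R_star @ S_star - D_tilde).max())
```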

slide-16
SLIDE 16

Affine Structure from Motion

  • The rows $\hat i_i^*$ and $\hat j_i^*$ of R* are not, in general, unit length and perpendicular
  • We have found motion (and therefore shape) up to an affine transformation
  • This is the best we could do if we didn’t assume orthographic camera

slide-17
SLIDE 17

Ensuring Orthogonality

  • Since $\tilde D$ can be factored as R* S*, it can also be factored as $(R^* Q)(Q^{-1} S^*)$, for any invertible 3×3 matrix Q
  • So, search for Q such that R = R* Q has the properties we want

slide-18
SLIDE 18

Ensuring Orthogonality

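  • Want

$$
(\hat i_i^{*T} Q) \cdot (\hat i_i^{*T} Q) = 1
$$

or, written out for all the conditions,

$$
\hat i_i^{*T} Q Q^T \hat i_i^* = 1, \qquad
\hat j_i^{*T} Q Q^T \hat j_i^* = 1, \qquad
\hat i_i^{*T} Q Q^T \hat j_i^* = 0
$$

  • Let $T = QQ^T$
  • Equations for elements of T: solve by least squares
  • Ambiguity: add constraints

$$
Q^T \hat i_1^* = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \qquad
Q^T \hat j_1^* = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}
$$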
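The constraints are linear in the six distinct entries of the symmetric matrix T, so they can be stacked and solved by least squares. The sketch below is one way to set this up; the helper names and the synthetic check (a known `Q_true` distorting orthonormal axes) are assumptions for illustration. It does not apply the extra first-frame constraints, which only fix the overall rotation.

```python
import numpy as np


def _quad_row(a, b):
    """Coefficients of a^T T b in terms of (T00, T01, T02, T11, T12, T22)."""
    return np.array([a[0] * b[0],
                     a[0] * b[1] + a[1] * b[0],
                     a[0] * b[2] + a[2] * b[0],
                     a[1] * b[1],
                     a[1] * b[2] + a[2] * b[1],
                     a[2] * b[2]])


def solve_metric_T(R_star):
    """Least-squares estimate of the symmetric T = Q Q^T from R_star (2N x 3)."""
    N = R_star.shape[0] // 2
    A, b = [], []
    for i_row, j_row in zip(R_star[:N], R_star[N:]):
        A += [_quad_row(i_row, i_row), _quad_row(j_row, j_row), _quad_row(i_row, j_row)]
        b += [1.0, 1.0, 0.0]          # unit length, unit length, perpendicular
    t, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.array([[t[0], t[1], t[2]],
                     [t[1], t[3], t[4]],
                     [t[2], t[4], t[5]]])


# Synthetic check: distort orthonormal axes by a known Q_true, then recover T.
rng = np.random.default_rng(1)
I_true, J_true = [], []
for _ in range(6):
    Rot, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    I_true.append(Rot[0])
    J_true.append(Rot[1])
R_true = np.vstack(I_true + J_true)
Q_true = np.array([[2.0, 0.3, 0.0],
                   [0.1, 1.0, 0.2],
                   [0.0, 0.4, 1.5]])
R_star = R_true @ np.linalg.inv(Q_true)    # so that R_star @ Q_true == R_true
T = solve_metric_T(R_star)
print(np.abs(T - Q_true @ Q_true.T).max())
```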

slide-19
SLIDE 19

Ensuring Orthogonality

  • Have found $T = QQ^T$
  • Find Q by taking “square root” of T
    – Cholesky decomposition if T is positive definite
    – General algorithms (e.g. sqrtm in Matlab)
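A sketch of the square-root step in NumPy. `np.linalg.cholesky` handles the positive definite case; the eigenvalue-clamping fallback for other cases is an assumption of this sketch, not something the slides prescribe, and the function name `sqrt_factor` is hypothetical.

```python
import numpy as np


def sqrt_factor(T):
    """Return Q such that Q @ Q.T approximates the symmetric matrix T."""
    T = 0.5 * (T + T.T)                      # enforce exact symmetry
    try:
        return np.linalg.cholesky(T)         # lower-triangular Q; needs T positive definite
    except np.linalg.LinAlgError:
        # Fallback (assumption): clamp negative eigenvalues to zero.
        w, V = np.linalg.eigh(T)
        w = np.clip(w, 0.0, None)
        return V @ np.diag(np.sqrt(w))


T_pd = np.array([[4.0, 2.0, 0.0],
                 [2.0, 3.0, 1.0],
                 [0.0, 1.0, 2.0]])
Q = sqrt_factor(T_pd)
print(np.abs(Q @ Q.T - T_pd).max())

T_bad = np.diag([1.0, 1.0, -1e-3])           # slightly indefinite, as noise can cause
Q_bad = sqrt_factor(T_bad)
print(np.round(Q_bad @ Q_bad.T, 6))
```

Any Q found this way is determined only up to a rotation, which is the ambiguity the extra constraints on the first frame are meant to remove.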

slide-20
SLIDE 20

Orthogonal Structure from Motion

  • Let’s recap:
    – Write down matrix of observations
    – Find translation from avg. position
    – Subtract translation
    – Factor matrix using SVD
    – Write down equations for orthogonalization
    – Solve using least squares, square root
  • At end, get matrix $R = R^* Q$ of camera orientations and matrix $S = Q^{-1} S^*$ of 3D points

slide-21
SLIDE 21

Results

  • Image sequence

[Tomasi & Kanade]

slide-22
SLIDE 22

Results

  • Tracked features

[Tomasi & Kanade]

slide-23
SLIDE 23

Results

  • Reconstructed shape

[Tomasi & Kanade]

Front view Top view

slide-24
SLIDE 24

Orthographic → Perspective

  • With orthographic or “weak perspective” can’t recover all information
  • With full perspective, can recover more information (translation along optical axis)
  • Result: can recover geometry and full motion up to global scale factor

slide-25
SLIDE 25

Perspective SFM Methods

  • Bundle adjustment (full nonlinear minimization)
  • Methods based on factorization
  • Methods based on fundamental matrices
  • Methods based on vanishing points
slide-26
SLIDE 26

Motion Field for Camera Motion

  • Translation:
  • Motion field lines converge (possibly at ∞)
slide-27
SLIDE 27

Motion Field for Camera Motion

  • Rotation:
  • Motion field lines do not converge
slide-28
SLIDE 28

Motion Field for Camera Motion

  • Combined rotation and translation: motion field lines have component that converges, and component that does not
  • Algorithms can look for vanishing point, then determine component of motion around this point

  • “Focus of expansion / contraction”
  • “Instantaneous epipole”
slide-29
SLIDE 29

Finding Instantaneous Epipole

  • Observation: motion field due to translation depends on depth of points
  • Motion field due to rotation does not
  • Idea: compute difference between the motion of a point and the motion of its neighbors
  • Differences point towards instantaneous epipole
slide-30
SLIDE 30

SVD (Again!)

  • Want to fit direction to all ∆v (differences in optical flow) within some neighborhood
  • PCA on matrix of ∆v
  • Equivalently, take eigenvector of $A = \sum (\Delta v)(\Delta v)^T$ corresponding to largest eigenvalue
  • Gives direction of parallax $l_i$ in that patch, together with estimate of reliability
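The eigenvector computation for one patch can be sketched as follows. The input format (an (m, 2) array of flow differences) and the function name are assumptions, and so is the reliability score: the slides do not define one, so this sketch uses 1 minus the ratio of the smaller to the larger eigenvalue.

```python
import numpy as np


def parallax_direction(dv):
    """Dominant direction of the flow differences dv (shape (m, 2)) in one patch."""
    dv = np.asarray(dv, dtype=float)
    A = dv.T @ dv                      # A = sum over the patch of (dv)(dv)^T, a 2x2 matrix
    w, V = np.linalg.eigh(A)           # eigenvalues in ascending order
    direction = V[:, -1]               # unit eigenvector for the largest eigenvalue
    # Reliability (assumption): near 1 when the differences line up, near 0 when isotropic.
    reliability = 1.0 - w[0] / w[1] if w[1] > 0 else 0.0
    return direction, reliability


dv = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0]])   # all along the (1, 1) direction
direction, reliability = parallax_direction(dv)
print(direction, reliability)
```

The sign of the returned direction is arbitrary, since an eigenvector is only defined up to sign.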

slide-31
SLIDE 31

SFM Algorithm

  • Compute optical flow
  • Find vanishing point (least squares solution)
  • Find direction of translation from epipole
  • Find perpendicular component of motion
  • Find velocity, axis of rotation
  • Find depths of points (up to global scale)
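For the "find vanishing point (least squares solution)" step, one possible formulation, assumed here rather than taken from the slides, treats each patch as a line through its image position $q_i$ with parallax direction $l_i$ and finds the point minimizing the weighted sum of squared perpendicular distances to all lines. The function name and the optional `weights` argument (for example, the per-patch reliability) are hypothetical.

```python
import numpy as np


def intersect_lines_lstsq(points, directions, weights=None):
    """Least-squares intersection of 2D lines given by a point and a direction each."""
    points = np.asarray(points, dtype=float)
    directions = np.asarray(directions, dtype=float)
    directions = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    if weights is None:
        weights = np.ones(len(points))
    M = np.zeros((2, 2))
    b = np.zeros(2)
    for q, l, w in zip(points, directions, weights):
        P = np.eye(2) - np.outer(l, l)    # projects onto the normal of this line
        M += w * P
        b += w * (P @ q)
    # Normal equations of sum_i w_i * ||P_i (e - q_i)||^2; singular if all lines are parallel.
    return np.linalg.solve(M, b)


# Three lines constructed to pass through (3, -2).
pts = np.array([[5.0, -2.0], [3.0, -3.0], [3.5, -1.5]])
dirs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
e = intersect_lines_lstsq(pts, dirs)
print(e)
```

When the motion field lines converge at infinity the system becomes singular, so a practical version would need to handle that case separately.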