

SLIDE 1

Lecture 12: Multi-view geometry / Stereo III

Tuesday, Oct 23

CS 378/395T

  • Prof. Kristen Grauman

Outline

  • Last lecture:

– stereo reconstruction with calibrated cameras
– non-geometric correspondence constraints

  • Homogeneous coordinates, projection matrices
  • Camera calibration
  • Weak calibration/self-calibration

– Fundamental matrix
– 8-point algorithm

Review: stereo with calibrated cameras

Camera-centered coordinate systems are related by known rotation R and translation T.

The vector p’ in the second coordinate system has coordinates Rp’ in the first one.

Review: the essential matrix

Coplanarity constraint:

p^T · [T × (R p’)] = 0

Let E = [T_x] R, where [T_x] is the skew-symmetric matrix such that [T_x] v = T × v. Then:

p^T E p’ = 0

E is the essential matrix, which relates corresponding image points in both cameras, given the rotation and translation between their coordinate systems.
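The essential matrix E = [T_x] R can be built directly in numpy as a sanity check. This is a minimal sketch (the helper names `skew` and `essential_matrix` are mine, not from the lecture):

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def essential_matrix(R, T):
    """E = [T]_x R, relating corresponding camera-frame points p, p'."""
    return skew(T) @ R
```

For a scene point P seen from the first camera as p = P, and from the second camera (whose origin sits at T, with axes rotated by R) as p’ = R^T (P − T), the constraint p^T E p’ = 0 holds by construction.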

Review: stereo with calibrated cameras

  • Image pair
  • Detect some features
  • Compute E from R and T
  • Match features using the epipolar and other constraints
  • Triangulate for 3d structure

Review: disparity/depth maps

For image I(x,y) and image I´(x´,y´), the disparity map D(x,y) satisfies:

(x´, y´) = (x + D(x,y), y)
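The disparity relation above suggests the classic brute-force search: for each pixel in I, compare a small window against windows shifted along the same scanline in I´ and keep the shift with the smallest SSD. A minimal numpy sketch, with illustrative window size and disparity range (function name and parameters are mine, not from the lecture):

```python
import numpy as np

def disparity_scanline(I, Ip, max_disp=16, win=2):
    """Brute-force SSD block matching along horizontal scanlines.
    D[y, x] is the disparity such that (x', y') = (x + D[y, x], y)."""
    h, w = I.shape
    D = np.zeros((h, w), dtype=int)
    for y in range(win, h - win):
        for x in range(win, w - win):
            patch = I[y - win:y + win + 1, x - win:x + win + 1]
            best, best_d = np.inf, 0
            for d in range(max_disp + 1):
                xp = x + d
                if xp + win >= w:
                    break  # candidate window would fall off the image
                cand = Ip[y - win:y + win + 1, xp - win:xp + win + 1]
                ssd = np.sum((patch - cand) ** 2)
                if ssd < best:
                    best, best_d = ssd, d
            D[y, x] = best_d
    return D
```

Real stereo pipelines add the epipolar rectification and non-epipolar constraints discussed above; this sketch shows only the raw similarity search.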

SLIDE 2

Review: correspondence problem

Multiple match hypotheses satisfy epipolar constraint, but which is correct?

Figure from Gee & Cipolla 1999

  • To find matches in the image pair, assume:
    – Most scene points are visible from both views
    – Image regions for the matches are similar in appearance
  • Dense or sparse matches
  • Additional (non-epipolar) constraints:
    – Similarity
    – Uniqueness
    – Ordering
    – Figural continuity
    – Disparity gradient


Review: correspondence error sources

  • Low-contrast / textureless image regions
  • Occlusions
  • Camera calibration errors
  • Poor image resolution
  • Violations of brightness constancy (specular reflections)
  • Large motions

Homogeneous coordinates

  • Extend Euclidean space: add an extra coordinate
  • Points are represented by equivalence classes
  • Why? This will allow us to write the process of perspective projection as a matrix product

Mapping to homogeneous coordinates:
2d: (x, y)’ → (x, y, 1)’   3d: (x, y, z)’ → (x, y, z, 1)’

Mapping back from homogeneous coordinates:
2d: (x, y, w)’ → (x/w, y/w)’   3d: (x, y, z, w)’ → (x/w, y/w, z/w)’

Homogeneous coordinates

  • Equivalence relation: (x, y, z, w) is the same as (kx, ky, kz, kw) for any k ≠ 0
  • Homogeneous coordinates are only defined up to a scale
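These mappings are one-liners in numpy; a small sketch (the helper names are mine):

```python
import numpy as np

def to_homog(p):
    """(x, y) -> (x, y, 1); also works for 3d points."""
    return np.append(np.asarray(p, dtype=float), 1.0)

def from_homog(p):
    """(x, y, w) -> (x/w, y/w): divide out the extra coordinate."""
    return np.asarray(p[:-1], dtype=float) / p[-1]
```

Note that scaling a homogeneous vector by any nonzero k maps back to the same Euclidean point, which is exactly the equivalence-class idea above.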

Perspective projection equations

[Figure: pinhole camera geometry — scene point, image coordinates, camera frame, image plane, optical axis, focal length]

SLIDE 3

Projection matrix for perspective projection

From the pinhole camera model: x = fX/Z, y = fY/Z. The same thing, written in terms of homogeneous coordinates:

(x_1, x_2, x_3)’ = [ f  0  0  0 ] (X, Y, Z, 1)’
                   [ 0  f  0  0 ]
                   [ 0  0  1  0 ]

Projection matrix for orthographic projection

Here x = X, y = Y:

(x_1, x_2, x_3)’ = [ 1  0  0  0 ] (X, Y, Z, 1)’
                   [ 0  1  0  0 ]
                   [ 0  0  0  1 ]
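Both projection matrices can be written down and applied directly; a short numpy sketch with an illustrative focal length (all names are mine):

```python
import numpy as np

f = 2.0  # illustrative focal length

# Perspective projection: x = fX/Z, y = fY/Z
M_persp = np.array([[f, 0.0, 0.0, 0.0],
                    [0.0, f, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0]])

# Orthographic projection: x = X, y = Y (depth Z is dropped)
M_ortho = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 1.0]])

def project(M, P):
    """Apply a projection matrix to a 3d point (made homogeneous),
    then divide by the third homogeneous coordinate x_3."""
    x = M @ np.append(P, 1.0)
    return x[:2] / x[2]
```

The division by x_3 is the mapping back from homogeneous coordinates; for M_persp it reproduces x = fX/Z exactly.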

Camera parameters

  • Extrinsic: location and orientation of camera frame with respect to reference frame
  • Intrinsic: how to map pixel coordinates to image plane coordinates

[Figure: camera 1 frame and reference frame]

Rigid transformations

Combinations of rotations and translations:

  • Translation: add values to coordinates
  • Rotation: matrix multiplication

Rotation about coordinate axes in 3d

Express a 3d rotation as a series of rotations around the coordinate axes by angles α, β, γ.

The overall rotation is the product of these elementary rotations:

R = R_x R_y R_z
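The elementary rotations and their product can be sketched as follows (helper names are mine):

```python
import numpy as np

def Rx(a):
    """Rotation by angle a about the x axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def Ry(b):
    """Rotation by angle b about the y axis."""
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def Rz(g):
    """Rotation by angle g about the z axis."""
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rotation(alpha, beta, gamma):
    """Overall rotation as the product of the elementary rotations."""
    return Rx(alpha) @ Ry(beta) @ Rz(gamma)
```

Any product of these is orthonormal with determinant 1, the defining properties of a rotation matrix.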

SLIDE 4

Extrinsic camera parameters

P^c = R (P^w − T)

where P^w is the point in the world reference frame and P^c = (X, Y, Z)’ is the same point in the camera reference frame.


Intrinsic camera parameters

  • Ignoring any geometric distortions from optics, we can describe them by:

x = −(x_im − o_x) s_x
y = −(y_im − o_y) s_y

where (x, y) are the coordinates of the projected point in the camera reference frame, (x_im, y_im) are the coordinates of the image point in pixel units, (o_x, o_y) are the coordinates of the image center in pixel units, and (s_x, s_y) give the effective size of a pixel (mm).

Camera parameters

  • We know the projection in terms of the camera reference frame
  • Substituting the previous eqns describing the intrinsic and extrinsic parameters, we can relate pixel coordinates to world points:

x_im − o_x = −(f / s_x) · R_1^T (P_w − T) / R_3^T (P_w − T)
y_im − o_y = −(f / s_y) · R_2^T (P_w − T) / R_3^T (P_w − T)

R_i = row i of the rotation matrix R

Linear version of perspective projection equations

  • This can be rewritten as a matrix product using homogeneous coordinates:

(x_1, x_2, x_3)’ = M_int M_ext (X_w, Y_w, Z_w, 1)’

where (x_1, x_2, x_3)’ is the point in camera coordinates (homogeneous), and

M_int = [ −f/s_x    0      o_x ]
        [    0    −f/s_y   o_y ]
        [    0       0      1  ]

M_ext = [ r_11  r_12  r_13  −R_1^T T ]
        [ r_21  r_22  r_23  −R_2^T T ]
        [ r_31  r_32  r_33  −R_3^T T ]

x_im = x_1 / x_3   y_im = x_2 / x_3
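Assembling M_int and M_ext as given above and projecting a world point takes only a few lines of numpy; a sketch with hypothetical helper names (the minus signs in M_int follow the slide's sign convention for the image axes):

```python
import numpy as np

def intrinsic(f, sx, sy, ox, oy):
    """M_int: camera-frame coordinates -> pixel coordinates."""
    return np.array([[-f / sx, 0.0, ox],
                     [0.0, -f / sy, oy],
                     [0.0, 0.0, 1.0]])

def extrinsic(R, T):
    """M_ext = [R | -R T]: world point (homogeneous) -> camera frame."""
    return np.hstack([R, (-R @ T).reshape(3, 1)])

def project_to_pixels(M, Pw):
    """Apply M = M_int M_ext, then divide by the third homogeneous coordinate."""
    x1, x2, x3 = M @ np.append(Pw, 1.0)
    return np.array([x1 / x3, x2 / x3])
```

With R = I and T = 0 the pixel coordinates reduce to x_im = o_x − (f/s_x) X/Z, matching the intrinsic-parameter equations above.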

Calibrating a camera

Main idea: compute the intrinsic and extrinsic parameters using observed camera data.

  • Place a “calibration object” with known geometry in the scene
  • Get correspondences
  • Solve for the mapping from scene to image: estimate M = M_int M_ext

SLIDE 5

Linear version of perspective projection equations

(x_1, x_2, x_3)’ = M_int M_ext (X_w, Y_w, Z_w, 1)’ = M P_w   (P_w in homogeneous coordinates)

The product M = M_int M_ext is a single projection matrix encoding both the extrinsic and intrinsic parameters. Let M_i be row i of matrix M. Then:

x_im = (M_1 · P_w) / (M_3 · P_w)
y_im = (M_2 · P_w) / (M_3 · P_w)

Estimating the projection matrix

x_im = (M_1 · P_w) / (M_3 · P_w)
y_im = (M_2 · P_w) / (M_3 · P_w)

Rearranging:

(M_1 − x_im M_3) · P_w = 0
(M_2 − y_im M_3) · P_w = 0

Estimating the projection matrix

For a given feature point:

(M_1 − x_im M_3) · P_w = 0
(M_2 − y_im M_3) · P_w = 0

In matrix form, stacking the rows of matrix M into a single unknown vector:

[ P_w^T   0^T    −x_im P_w^T ]   [ M_1^T ]
[ 0^T    P_w^T   −y_im P_w^T ] · [ M_2^T ] = 0
                                 [ M_3^T ]

Estimating the projection matrix

Expanding this to see the elements:

[ X_w  Y_w  Z_w  1   0    0    0    0   −x_im X_w  −x_im Y_w  −x_im Z_w  −x_im ]   [ M_1^T ]
[ 0    0    0    0   X_w  Y_w  Z_w  1   −y_im X_w  −y_im Y_w  −y_im Z_w  −y_im ] · [ M_2^T ] = 0
                                                                                   [ M_3^T ]

Estimating the projection matrix

This is true for every feature point, so we can stack up n observed image features and their associated 3d points in a single equation P m = 0, where point i contributes the row pair

[ X_w^(i)  Y_w^(i)  Z_w^(i)  1  0  0  0  0  −x_im^(i) X_w^(i)  −x_im^(i) Y_w^(i)  −x_im^(i) Z_w^(i)  −x_im^(i) ]
[ 0  0  0  0  X_w^(i)  Y_w^(i)  Z_w^(i)  1  −y_im^(i) X_w^(i)  −y_im^(i) Y_w^(i)  −y_im^(i) Z_w^(i)  −y_im^(i) ]

and m stacks the twelve entries m_ij of M. Solve for the m_ij’s (the calibration information) with least squares. [F&P Section 3.1]
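The stacked system P m = 0 is commonly solved by taking the singular vector of P associated with the smallest singular value, which is the least-squares solution under the constraint ||m|| = 1. The slide just says “least squares,” so the SVD route below is one standard choice, not necessarily the exact method used in class (function name is mine):

```python
import numpy as np

def calibrate(Pw, pix):
    """Estimate the 3x4 projection matrix M from n >= 6 world/image point
    pairs by solving P m = 0: m is the right singular vector of P with the
    smallest singular value, reshaped into M."""
    rows = []
    for (X, Y, Z), (x, y) in zip(Pw, pix):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -x * X, -x * Y, -x * Z, -x])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -y * X, -y * Y, -y * Z, -y])
    P = np.array(rows, dtype=float)
    _, _, Vt = np.linalg.svd(P)
    m = Vt[-1]  # unit vector minimizing ||P m||
    return m.reshape(3, 4)
```

The recovered M is only defined up to scale, which is fine: projecting with k·M gives the same pixel coordinates after the homogeneous division.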

Summary: camera calibration

  • Associate image points with scene points on an object with known geometry
  • Use these together with the perspective projection relationship to estimate the projection matrix
  • (Can also solve for the explicit parameters themselves)

SLIDE 6

When would we calibrate this way?

  • Makes sense when the geometry of the system is not going to change over time
  • …When would it change?

Self-calibration

  • Want to estimate world geometry without requiring calibrated cameras
    – Archival videos
    – Photos from multiple unrelated users
    – Dynamic camera systems
  • We can still reconstruct 3d structure, up to certain ambiguities, if we can find correspondences between points…

Uncalibrated case

p = M_int M_ext P_w

so each camera’s pixel coordinates are related to its camera coordinates through its internal calibration matrix:

p̂_left = M_int,left^-1 p_left
p̂_right = M_int,right^-1 p_right

(p̂: camera coordinates; M_int: internal calibration matrices; p: image pixel coordinates)

Uncalibrated case: fundamental matrix

From before, the essential matrix relates camera coordinates:

p̂_right^T E p̂_left = 0

Substituting p̂ = M^-1 p (subscript “int” dropped; these are still the internal parameter matrices):

(M_right^-1 p_right)^T E (M_left^-1 p_left) = 0
p_right^T (M_right^-T E M_left^-1) p_left = 0
p_right^T F p_left = 0

where F = M_right^-T E M_left^-1 is the fundamental matrix.

) ( ) (

1 int ,

left left

left

p M p

=

) ( 1 int ,

) (

right right

right

p M p

=

Fundamental matrix

  • Relates pixel coordinates in the two views
  • More general form than the essential matrix: we remove the need to know the intrinsic parameters
  • If we estimate the fundamental matrix from correspondences in pixel coordinates, we can reconstruct the epipolar geometry without intrinsic or extrinsic parameters

Computing F from correspondences

  • Cameras are uncalibrated: we don’t know E or the left or right M_int matrices
  • Estimate F from 8+ point correspondences:

p_right^T F p_left = 0,  where F = M_right^-T E M_left^-1

SLIDE 7

Computing F from correspondences

Each point correspondence p_right^T F p_left = 0 generates one constraint on F. Collect n of these constraints. If n = 8, invert and solve for F; if n > 8, compute the least squares solution.
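A minimal sketch of this linear 8-point estimate: each correspondence contributes one row of a matrix A built from products of the two points' coordinates, the null vector of A gives the nine entries of F, and the rank-2 property of F is enforced afterwards by zeroing the smallest singular value. (The coordinate-normalization step of the practical 8-point algorithm is omitted for brevity; the function name is mine.)

```python
import numpy as np

def eight_point(pl, pr):
    """Linear 8-point estimate of F from n >= 8 correspondences
    (pl, pr: n x 2 arrays of left/right pixel coordinates), such that
    p_right^T F p_left = 0 for each pair."""
    A = []
    for (xl, yl), (xr, yr) in zip(pl, pr):
        # One row per correspondence: F's entries in row-major order
        A.append([xr * xl, xr * yl, xr, yr * xl, yr * yl, yr, xl, yl, 1.0])
    _, _, Vt = np.linalg.svd(np.array(A))
    F = Vt[-1].reshape(3, 3)  # null vector of A, reshaped
    # Enforce rank 2 (det F = 0) by zeroing the smallest singular value
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt
```

On exact synthetic correspondences the recovered F satisfies the epipolar constraint to numerical precision; with real, noisy matches one would normalize the coordinates first and wrap the estimate in a robust loop.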

Robust computation

  • Find corners
  • Unguided matching: local search, cross-correlation to get some seed matches
  • Compute F and epipolar geometry: find the F that is consistent with many of the seed matches
  • Now guide matching: use F to restrict the search to epipolar lines

RANSAC application: robust computation

[Figure: interest points (Harris corners) in left and right images, about 500 pts/image at 640x480 resolution; putative correspondences (268) (best match, SSD < 20); outliers (117) (t = 1.25 pixel; 43 iterations); inliers (151); final inliers (262). Hartley & Zisserman p. 126; figure by Gee and Cipolla, 1999]
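The estimate-then-count loop above is an instance of RANSAC: repeatedly fit a model to a minimal random sample, count how many matches are consistent with it, keep the model with the most inliers, and refit on all inliers. A generic sketch, shown on an abstract fitting problem rather than F estimation specifically (names and parameters are illustrative):

```python
import numpy as np

def ransac(data, fit, residual, n_min, thresh, iters=100, seed=0):
    """Generic RANSAC loop: fit a model to minimal random samples, keep
    the model with the most inliers, then refit on all inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(data), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(data), size=n_min, replace=False)
        model = fit(data[sample])
        inliers = residual(model, data) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit(data[best_inliers]), best_inliers
```

For the fundamental matrix, `fit` would be the 8-point estimate on a minimal sample of 8 correspondences and `residual` a point-to-epipolar-line distance; the sketch below is tested on simple line fitting to keep it self-contained.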

Need for multi-view geometry and 3d reconstruction

Applications include:

  • 3d tracking
  • Depth-based grouping
  • Image rendering and generating interpolated or “virtual” viewpoints
  • Interactive video

Z-keying for virtual reality

  • Merge synthetic and real images given depth maps

Kanade et al., CMU, 1995

SLIDE 8

Z-keying for virtual reality

http://www.cs.cmu.edu/afs/cs/project/stereo-machine/www/z-key.html

Virtualized Reality™

http://www.cs.cmu.edu/~virtualized-reality/3manbball_new.html

Kanade et al., CMU: capture 3d shape from multiple views and texture from the images, then use them to generate new views on demand.

Virtual viewpoint video

  • C. Zitnick et al., High-quality video view interpolation using a layered representation, SIGGRAPH 2004.

http://research.microsoft.com/IVM/VVV/

  • Noah Snavely, Steven M. Seitz, Richard Szeliski, “Photo tourism: Exploring photo collections in 3D,” ACM Transactions on Graphics (SIGGRAPH Proceedings), 25(3), 2006, 835–846. http://phototour.cs.washington.edu/, http://labs.live.com/photosynth/

SLIDE 9

Coming up

  • Tuesday: Local invariant features
    – Read Lowe paper on SIFT
  • Problem set 3 out next Tuesday, due 11/13
  • Graduate students: remember paper reviews and extensions, due 12/6