Geometry and Structure from Motion, Computer Vision, Fall 2018



slide-1
SLIDE 1

Geometry and Structure from Motion

Computer Vision Fall 2018 Columbia University

slide-2
SLIDE 2

Stereo

epipolar lines

(x1, y1) (x2, y1)

x2 - x1 = the disparity of pixel (x1, y1)

Two images captured by a purely horizontal translating camera (rectified stereo pair)

Slide credit: Noah Snavely

slide-3
SLIDE 3

Results with window search

(Figure panels: window-based matching with the best window size; ground truth)

Slide credit: Noah Snavely

slide-4
SLIDE 4

Stereo as energy minimization

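(Figure: slice of the disparity space image (DSI) at scanline y = 141, with axes x and d)

Simple pixel / window matching: choose the minimum of each column in the DSI independently:

$d(x, y) = \arg\min_{d'} C(x, y, d')$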

Slide credit: Noah Snavely
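The winner-take-all rule can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's implementation: it assumes a per-pixel squared difference as the match cost (no window), and it assumes the convention that a left-image pixel at column x matches the right-image pixel at column x - d. The function names are hypothetical.

```python
import numpy as np

def disparity_space_image(left, right, max_disp):
    """Return C[d, y, x] = squared difference between left(x, y) and right(x - d, y)."""
    h, w = left.shape
    dsi = np.full((max_disp + 1, h, w), np.inf)  # columns with no valid match stay at inf
    for d in range(max_disp + 1):
        diff = left[:, d:] - right[:, : w - d]
        dsi[d, :, d:] = diff ** 2
    return dsi

def winner_take_all(dsi):
    """Pick, independently for each pixel, the disparity with minimum cost."""
    return np.argmin(dsi, axis=0)

# Toy rectified pair: the right image is the left image shifted by 2 pixels.
rng = np.random.default_rng(0)
left = rng.random((4, 16))
true_disp = 2
right = np.roll(left, -true_disp, axis=1)
disp = winner_take_all(disparity_space_image(left, right, max_disp=4))
```

On this noise-free toy pair every pixel with a valid match recovers the true shift. On real images the per-pixel minimum is noisy, which motivates the smoothness term introduced next.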

slide-5
SLIDE 5

Stereo as energy minimization

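(Figure: slice of the disparity space image (DSI) at scanline y = 141, with axes x and d)

  • Finds a “smooth”, low-cost path through the DSI from left to right
  • $E(d) = E_d(d) + \lambda E_s(d)$, where $E_d$ is the match cost and $E_s$ is the smoothness cost

Slide credit: Noah Snavely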

slide-6
SLIDE 6

Dynamic Programming
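For a single scanline, a low-cost smooth path through the DSI can be found with a Viterbi-style dynamic program. The sketch below assumes the smoothness cost is an absolute difference between neighbouring disparities weighted by `lam`; that choice, and the function name, are illustrative assumptions rather than something the slides specify.

```python
import numpy as np

def scanline_dp(cost, lam):
    """Minimize sum_x cost[d_x, x] + lam * sum_x |d_x - d_{x-1}| for one scanline.

    cost has shape (num_disparities, width).
    """
    n_d, w = cost.shape
    d = np.arange(n_d)
    pairwise = lam * np.abs(d[:, None] - d[None, :])   # pairwise[prev, cur]
    best = np.empty((n_d, w))                          # best[d, x]: cheapest path ending at (d, x)
    back = np.zeros((n_d, w), dtype=int)               # back[d, x]: best predecessor disparity
    best[:, 0] = cost[:, 0]
    for x in range(1, w):
        total = best[:, x - 1][:, None] + pairwise     # total[prev, cur]
        back[:, x] = np.argmin(total, axis=0)
        best[:, x] = total[back[:, x], d] + cost[:, x]
    path = np.empty(w, dtype=int)
    path[-1] = np.argmin(best[:, -1])
    for x in range(w - 1, 0, -1):                      # backtrack from right to left
        path[x - 1] = back[path[x], x]
    return path

# Disparity 1 is correct everywhere, but column 2 has a spurious minimum at d = 2.
cost = np.ones((3, 5))
cost[1, :] = 0.0
cost[1, 2] = 0.3
cost[2, 2] = 0.0
independent = np.argmin(cost, axis=0)   # per-column minimum
smooth = scanline_dp(cost, lam=0.5)     # path with smoothness penalty
```

The per-column minimum jumps to disparity 2 at the noisy column, while the dynamic program keeps disparity 1 because two jumps cost more than the small extra match cost.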

slide-7
SLIDE 7

General case, with calibrated cameras

  • The two cameras need not have parallel optical axes.
slide-8
SLIDE 8

Stereo correspondence constraints

If we see a point in camera 1, are there any constraints on where we will find it in camera 2?

(Figure: Camera 1 with center O and image point p; Camera 2 with center O’ and unknown match p’)

Slide credit: Antonio Torralba

slide-9
SLIDE 9

Epipolar constraint

(Figure: camera centers O and O’, image point p, and its unknown match p’)

Slide credit: Antonio Torralba

slide-10
SLIDE 10

Some terminology

(Figure: camera centers O and O’, image point p, and its unknown match p’)

Slide credit: Antonio Torralba

slide-11
SLIDE 11

Some terminology

(Figure: camera centers O and O’, image points p and p’, with the baseline marked)

  • Baseline: the line connecting the two camera centers
  • Epipole: point of intersection of baseline with the image plane

Slide credit: Antonio Torralba

slide-12
SLIDE 12

Some terminology

(Figure: camera centers O and O’, image points p and p’, with the baseline and both epipoles marked)

  • Baseline: the line connecting the two camera centers
  • Epipole: point of intersection of baseline with the image plane

Slide credit: Antonio Torralba

slide-13
SLIDE 13

Some terminology

(Figure: camera centers O and O’, image points p and p’, with the epipolar plane marked)

  • Baseline: the line connecting the two camera centers
  • Epipolar plane: the plane that contains the two camera centers and a 3D point in the world
  • Epipole: point of intersection of baseline with the image plane

Slide credit: Antonio Torralba

slide-14
SLIDE 14

Some terminology

(Figure: camera centers O and O’, image points p and p’, with the epipolar line marked in each image)

  • Baseline: the line connecting the two camera centers
  • Epipolar plane: the plane that contains the two camera centers and a 3D point in the world
  • Epipolar line: intersection of the epipolar plane with each image plane
  • Epipole: point of intersection of baseline with the image plane

Slide credit: Antonio Torralba

slide-15
SLIDE 15

Epipolar constraint

(Figure: camera centers O and O’, image point p, and the epipolar line in the second image on which p’ must lie)

  • We can search for matches along epipolar lines
  • All epipolar lines intersect at the epipoles

Slide credit: Antonio Torralba

slide-16
SLIDE 16

The essential matrix

(Figure: camera centers O and O’ with image points p and p’)

$p^T E p' = 0$

  • E: essential matrix
  • p, p’: image points in homogeneous coordinates

If we observe a point in one image, its position in the other image is constrained to lie on the line defined by the equation above.

Slide credit: Antonio Torralba
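As a small worked example of the constraint $p^T E p' = 0$: for a fixed p, the points p’ that satisfy it form the line with coefficients $E^T p$. The sketch below assumes a pure unit translation along x with no rotation (a rectified pair), for which E is the cross-product matrix of the translation; this setup and the function name are assumptions chosen for illustration.

```python
import numpy as np

# Assumed setup (not from the slides): identity rotation and a unit
# translation along x, so E is the cross-product matrix of (1, 0, 0).
E = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

def epipolar_line_in_image2(E, p):
    """Line l' with l' . p' = 0 for every p' satisfying p^T E p' = 0."""
    return E.T @ p

p = np.array([3.0, 2.0, 1.0])           # homogeneous point in image 1
line = epipolar_line_in_image2(E, p)    # (a, b, c) of the line a x + b y + c = 0
p_prime = np.array([7.0, 2.0, 1.0])     # same row as p, different column
residual = p @ E @ p_prime
```

Here `line` is (0, 1, -2), the horizontal line y = 2, so the match of p can only lie on the same image row. This is the rectified stereo situation from the start of the lecture.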

slide-17
SLIDE 17

Epipolar Examples

Source: S. Lazebnik

slide-18
SLIDE 18

Where do they come from?

Source: S. Lazebnik

slide-19
SLIDE 19

Fundamental matrix – calibrated case

  • $K_1$: intrinsics of camera 1
  • $K_2$: intrinsics of camera 2
  • $R$: rotation of image 2 w.r.t. camera 1
  • $\tilde{p} = K_1^{-1} p$: ray through p in camera 1’s (and world) coordinate system
  • $\tilde{q} = K_2^{-1} q$: ray through q in camera 2’s coordinate system

slide-20
SLIDE 20

Fundamental matrix – calibrated case

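  • $\tilde{p}$, $R^T \tilde{q}$, and $t$ are coplanar, where $\tilde{p}$ and $\tilde{q}$ are the rays through p and q and $t$ is the translation between the camera centers
  • epipolar plane can be represented as $t \times \tilde{p}$ (its normal), so $(R^T \tilde{q})^T (t \times \tilde{p}) = 0$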
slide-21
SLIDE 21

Fundamental matrix – calibrated case

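  • One more substitution:

– Cross product with t can be represented as a 3x3 matrix: $t \times \tilde{p} = [t]_\times \tilde{p}$, where

$[t]_\times = \begin{bmatrix} 0 & -t_z & t_y \\ t_z & 0 & -t_x \\ -t_y & t_x & 0 \end{bmatrix}$

– The coplanarity constraint then reads $\tilde{q}^T R [t]_\times \tilde{p} = 0$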
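The cross-product matrix is easy to check numerically. A minimal sketch follows; the helper name `skew` is a hypothetical choice.

```python
import numpy as np

def skew(t):
    """Return the 3x3 matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

t = np.array([1.0, 2.0, 3.0])
v = np.array([-4.0, 0.5, 2.0])
as_matrix = skew(t) @ v      # cross product written as a matrix-vector product
as_cross = np.cross(t, v)    # the same cross product computed directly
```

The matrix is skew-symmetric and maps t itself to zero, which is why any matrix built from it has rank at most 2.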

slide-22
SLIDE 22

Fundamental matrix – calibrated case

slide-23
SLIDE 23

Fundamental matrix – calibrated case

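$\tilde{q}^T E \tilde{p} = 0$, with $E = R [t]_\times$ the Essential matrix

  • $\tilde{p} = K_1^{-1} p$: ray through p in camera 1’s (and world) coordinate system
  • $\tilde{q} = K_2^{-1} q$: ray through q in camera 2’s coordinate system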

slide-24
SLIDE 24

Fundamental matrix – uncalibrated case

$q^T F p = 0$, with $F = K_2^{-T} R [t]_\times K_1^{-1}$ the Fundamental matrix

  • $K_1$: intrinsics of camera 1
  • $K_2$: intrinsics of camera 2
  • $R$: rotation of image 2 w.r.t. camera 1
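The construction can be verified on synthetic data. The sketch below assumes the convention that a point X in camera 1 (world) coordinates has camera 2 coordinates R (X - t), with t the center of camera 2; under that assumption the essential matrix is $R [t]_\times$ and the fundamental matrix is $K_2^{-T} R [t]_\times K_1^{-1}$. The numeric intrinsics, rotation, and translation are made-up values.

```python
import numpy as np

def skew(t):
    """Cross-product matrix: skew(t) @ v == np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]])

def rotation_y(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Assumed convention: camera 2 coordinates are R @ (X - t).
R = rotation_y(0.1)
t = np.array([1.0, 0.2, 0.1])
K1 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[750.0, 0.0, 300.0], [0.0, 750.0, 250.0], [0.0, 0.0, 1.0]])

E = R @ skew(t)                                    # essential matrix
F = np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)    # fundamental matrix

X = np.array([0.5, -0.3, 6.0])      # a 3D point in front of both cameras
p = K1 @ X                          # homogeneous pixel in image 1
q = K2 @ (R @ (X - t))              # homogeneous pixel in image 2
residual = (q / q[2]) @ F @ (p / p[2])
```

The residual vanishes up to rounding, so pixel coordinates alone satisfy the epipolar constraint once the intrinsics are folded into F.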

slide-25
SLIDE 25

Properties of the Fundamental Matrix

  • $F p$ is the epipolar line associated with $p$
  • $F^T q$ is the epipolar line associated with $q$
  • $F e_1 = 0$ and $F^T e_2 = 0$
  • $F$ is rank 2
  • How many parameters does F have?
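These properties can be checked numerically by taking the epipoles as the null vectors of F and of its transpose. The sketch below assumes identity intrinsics (so F equals the essential matrix $R [t]_\times$) and made-up values for R and t; `null_vector` is a hypothetical helper.

```python
import numpy as np

def null_vector(M):
    """Unit vector n minimizing ||M n||: the last right singular vector of M."""
    return np.linalg.svd(M)[2][-1]

def skew(t):
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]])

c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t = np.array([2.0, 0.5, 1.0])
F = R @ skew(t)                    # assumes K1 = K2 = I, so F equals E here

e1 = null_vector(F)                # epipole in image 1: F e1 = 0
e2 = null_vector(F.T)              # epipole in image 2: F^T e2 = 0
singular_values = np.linalg.svd(F, compute_uv=False)

p = np.array([0.3, -0.2, 1.0])     # any point in image 1
line2 = F @ p                      # its epipolar line in image 2
```

The third singular value is zero (rank 2), and `line2 @ e2` vanishes for any p, which is the statement that all epipolar lines pass through the epipole.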


slide-26
SLIDE 26

Rectified case

slide-27
SLIDE 27

Stereo image rectification

  • reproject image planes onto a common plane parallel to the line between optical centers
  • pixel motion is horizontal after this transformation
  • two homographies (3x3 transform), one for each input image reprojection

C. Loop and Z. Zhang. Computing Rectifying Homographies for Stereo Vision. IEEE Conf. Computer Vision and Pattern Recognition, 1999.
slide-28
SLIDE 28

(Figure: original stereo pair; after rectification)

slide-29
SLIDE 29
slide-30
SLIDE 30

Estimating F

  • If we don’t know K1, K2, R, or t, can we estimate F for two images?
  • Yes, given enough correspondences
slide-31
SLIDE 31

Estimating F – 8-point algorithm

  • The fundamental matrix F is defined by $x'^T F x = 0$ for any pair of matches x and x’ in two images.
  • Let $x = (u, v, 1)^T$ and $x' = (u', v', 1)^T$, and

$F = \begin{bmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{bmatrix}$

each match gives a linear equation

$uu' f_{11} + vu' f_{12} + u' f_{13} + uv' f_{21} + vv' f_{22} + v' f_{23} + u f_{31} + v f_{32} + f_{33} = 0$

slide-32
SLIDE 32

8-point algorithm

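$\begin{bmatrix} u_1 u_1' & v_1 u_1' & u_1' & u_1 v_1' & v_1 v_1' & v_1' & u_1 & v_1 & 1 \\ u_2 u_2' & v_2 u_2' & u_2' & u_2 v_2' & v_2 v_2' & v_2' & u_2 & v_2 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ u_n u_n' & v_n u_n' & u_n' & u_n v_n' & v_n v_n' & v_n' & u_n & v_n & 1 \end{bmatrix} \begin{bmatrix} f_{11} \\ f_{12} \\ f_{13} \\ f_{21} \\ f_{22} \\ f_{23} \\ f_{31} \\ f_{32} \\ f_{33} \end{bmatrix} = 0$

  • Like with homographies, we want to solve the linear system $Af = 0$.
  • But this has the trivial solution $f = 0$.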

slide-33
SLIDE 33

8-point algorithm

Each match i contributes the row $(u_i u_i', v_i u_i', u_i', u_i v_i', v_i v_i', v_i', u_i, v_i, 1)$ of the data matrix A, and $f = (f_{11}, f_{12}, f_{13}, f_{21}, f_{22}, f_{23}, f_{31}, f_{32}, f_{33})^T$.

  • Like with homographies, we want to solve the linear system $Af = 0$.
  • The solution f is the eigenvector of $A^T A$ corresponding to its zero eigenvalue (with noisy data, its smallest eigenvalue).
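The linear step can be sketched with an SVD, since the right singular vector of A with the smallest singular value is the eigenvector of $A^T A$ with the smallest eigenvalue. The function name, the column ordering (matching $x'^T F x = 0$ with f stored row by row), and the synthetic scene are assumptions made for this illustration.

```python
import numpy as np

def eight_point_linear(x, x_prime):
    """Estimate F (up to scale) from n >= 8 matches satisfying x'^T F x = 0.

    x and x_prime are (n, 2) arrays holding (u, v) and (u', v').
    """
    u, v = x[:, 0], x[:, 1]
    up, vp = x_prime[:, 0], x_prime[:, 1]
    A = np.column_stack([u * up, v * up, up,
                         u * vp, v * vp, vp,
                         u, v, np.ones(len(u))])
    _, _, vt = np.linalg.svd(A)
    f = vt[-1]                       # unit vector minimizing ||A f||
    return f.reshape(3, 3)

# Synthetic noise-free matches: 12 points seen by two cameras (assumed scene).
rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(-1, 1, 12), rng.uniform(-1, 1, 12),
                     rng.uniform(4, 8, 12)])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([1.0, 0.2, 0.0])
X2 = (X - t) @ R.T                   # camera 2 coordinates, assuming R @ (X - t)
x = X[:, :2] / X[:, 2:]
x_prime = X2[:, :2] / X2[:, 2:]

F = eight_point_linear(x, x_prime)
h = np.column_stack([x, np.ones(12)])
hp = np.column_stack([x_prime, np.ones(12)])
residuals = np.einsum("ni,ij,nj->n", hp, F, h)   # x'^T F x for every match
```

Taking the unit-norm singular vector avoids the trivial solution f = 0 without fixing any particular entry of F.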

slide-34
SLIDE 34

8-point algorithm – Problem?

  • F should have rank 2
  • To enforce that F is of rank 2, F is replaced by F’ that minimizes $\|F - F'\|$ subject to the rank constraint.
  • This is achieved by SVD. Let $F = U \Sigma V^T$, where $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \sigma_3)$, and let $\Sigma' = \mathrm{diag}(\sigma_1, \sigma_2, 0)$; then $F' = U \Sigma' V^T$ is the solution.
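The rank-2 projection is a few lines with NumPy's SVD. In this sketch the example matrix is arbitrary and the function name is hypothetical; the distance is measured in the Frobenius norm.

```python
import numpy as np

def enforce_rank2(F):
    """Closest rank-2 matrix to F in the Frobenius norm (zero the smallest singular value)."""
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0
    return U @ np.diag(s) @ Vt

F = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])     # an arbitrary full-rank example
F2 = enforce_rank2(F)
sigma = np.linalg.svd(F, compute_uv=False)
sigma2 = np.linalg.svd(F2, compute_uv=False)
distance = np.linalg.norm(F - F2)    # equals the discarded singular value sigma[2]
```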

slide-35
SLIDE 35

  • Problem with 8-point algorithm

Each row of the data matrix A is $(u u', v u', u', u v', v v', v', u, v, 1)$. With pixel coordinates of order 100, the product columns are ~10000, the single-coordinate columns are ~100, and the last column is 1.

Orders of magnitude difference between the columns of the data matrix: least-squares yields poor results.

slide-36
SLIDE 36

Normalized 8-point algorithm

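Transform image to ~[-1,1]x[-1,1]: for a 700 x 500 image, the corners (0,0), (700,0), (0,500), (700,500) map to (-1,-1), (1,-1), (-1,1), (1,1), and the image center maps to (0,0).

$T = \begin{bmatrix} 2/700 & 0 & -1 \\ 0 & 2/500 & -1 \\ 0 & 0 & 1 \end{bmatrix}$

normalized least squares yields good results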

slide-37
SLIDE 37

Normalized 8-point algorithm

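  • 1. Transform input by $\hat{x}_i = T x_i$, $\hat{x}'_i = T' x'_i$
  • 2. Call 8-point on $\hat{x}_i, \hat{x}'_i$ to obtain $\hat{F}$
  • 3. $F = T'^T \hat{F} T$

Step 3 follows from $\hat{x}'^T \hat{F} \hat{x} = 0$: substituting the transforms gives $x'^T T'^T \hat{F} T x = 0$, so $F = T'^T \hat{F} T$.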
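Steps 1 and 3 can be sketched as follows, assuming the image-size-based normalization to [-1, 1] x [-1, 1] for a 700 x 500 image. The function names are hypothetical, and `F_hat` here is just a placeholder matrix standing in for the output of an 8-point solver run on the normalized points.

```python
import numpy as np

def normalizing_transform(width, height):
    """Map pixel coordinates in [0, width] x [0, height] to [-1, 1] x [-1, 1]."""
    return np.array([[2.0 / width, 0.0, -1.0],
                     [0.0, 2.0 / height, -1.0],
                     [0.0, 0.0, 1.0]])

def denormalize(F_hat, T, T_prime):
    """Undo the normalization so that x'^T F x equals x_hat'^T F_hat x_hat."""
    return T_prime.T @ F_hat @ T

T = normalizing_transform(700, 500)           # same size assumed for both images
x = np.array([350.0, 100.0, 1.0])             # pixel in image 1
x_prime = np.array([10.0, 400.0, 1.0])        # pixel in image 2
F_hat = np.arange(9.0).reshape(3, 3)          # placeholder for the solver output
F = denormalize(F_hat, T, T)

in_pixels = x_prime @ F @ x
in_normalized = (T @ x_prime) @ F_hat @ (T @ x)
```

The two expressions agree, so a matrix estimated in normalized coordinates can be applied directly to pixel coordinates after step 3.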

slide-38
SLIDE 38

What about more than two views?

  • The geometry of three views is described by a 3 x 3 x 3 tensor called the trifocal tensor
  • The geometry of four views is described by a 3 x 3 x 3 x 3 tensor called the quadrifocal tensor
  • After this it starts to get complicated…
slide-39
SLIDE 39

Structure from motion

  • Given many images, how can we
    a) figure out where they were all taken from?
    b) build a 3D model of the scene?

This is (roughly) the structure from motion problem

slide-40
SLIDE 40

Structure from motion

  • Input: images with points in correspondence $p_{i,j} = (u_{i,j}, v_{i,j})$
  • Output
    – structure: 3D location $x_i$ for each point $p_i$
    – motion: camera parameters $R_j$, $t_j$, possibly $K_j$
  • Objective function: minimize reprojection error

(Figure: reconstruction, side and top views)

slide-41
SLIDE 41

Also doable from video

slide-42
SLIDE 42

What we’ve seen so far…

  • 2D transformations between images

– Translations, affine transformations, homographies…

  • Fundamental matrices

– Still represent relationships between 2D images

  • What’s new: Explicitly representing 3D geometry of cameras and points

slide-43
SLIDE 43

Input

slide-44
SLIDE 44

Camera calibration and triangulation

  • Suppose we know 3D points
    – And have matches between these points and an image
    – How can we compute the camera parameters?
  • Suppose we have known camera parameters, each of which observes a point
    – How can we compute the 3D location of that point?
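The second question (triangulation) has a simple linear answer when the 3x4 projection matrices are known. The sketch below uses the standard direct linear transform (DLT); the slides do not say which method is intended, so treat the method, the function names, and the two example cameras as assumptions.

```python
import numpy as np

def triangulate(projections, pixels):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    projections: list of 3x4 camera matrices; pixels: list of (u, v) observations.
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        rows.append(u * P[2] - P[0])      # u * (row 3 . X) = row 1 . X
        rows.append(v * P[2] - P[1])      # v * (row 3 . X) = row 2 . X
    _, _, vt = np.linalg.svd(np.array(rows))
    X = vt[-1]                            # homogeneous solution
    return X[:3] / X[3]

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])   # shifted camera
X_true = np.array([0.5, -0.2, 5.0])

X_est = triangulate([P1, P2], [project(P1, X_true), project(P2, X_true)])
```

With noise-free observations the point is recovered exactly up to rounding. With noisy observations this linear estimate is typically used only as a starting point for minimizing reprojection error.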

slide-45
SLIDE 45

Structure from motion

  • SfM solves both of these problems at once
  • A kind of chicken-and-egg problem

– (but solvable)

slide-46
SLIDE 46

Feature detection

Detect features using SIFT [Lowe, IJCV 2004]

slide-47
SLIDE 47

Feature matching

Match features between each pair of images

slide-48
SLIDE 48

Feature matching

Refine matching using RANSAC to estimate fundamental matrix between each pair
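A minimal RANSAC loop for this step might look as follows. It is a sketch under several assumptions: the model is fitted with the plain linear 8-point method plus rank-2 enforcement, the inlier test uses the algebraic residual |x'^T F x| rather than a geometric distance, and the threshold, iteration count, and synthetic scene are arbitrary illustrative values.

```python
import numpy as np

def fit_fundamental(x, xp):
    """Linear 8-point fit with the rank-2 constraint; x, xp are (n, 2) arrays."""
    u, v, up, vp = x[:, 0], x[:, 1], xp[:, 0], xp[:, 1]
    A = np.column_stack([u * up, v * up, up, u * vp, v * vp, vp,
                         u, v, np.ones(len(u))])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0
    return U @ np.diag(s) @ Vt

def algebraic_error(F, x, xp):
    """|x'^T F x| per match (a simple stand-in for a geometric distance)."""
    h = np.column_stack([x, np.ones(len(x))])
    hp = np.column_stack([xp, np.ones(len(xp))])
    return np.abs(np.einsum("ni,ij,nj->n", hp, F, h))

def ransac_fundamental(x, xp, threshold, iterations, rng):
    best_mask = np.zeros(len(x), dtype=bool)
    for _ in range(iterations):
        sample = rng.choice(len(x), size=8, replace=False)   # minimal sample
        F = fit_fundamental(x[sample], xp[sample])
        mask = algebraic_error(F, x, xp) < threshold
        if mask.sum() > best_mask.sum():                     # keep the largest consensus set
            best_mask = mask
    if best_mask.sum() < 8:
        raise ValueError("no model with at least 8 inliers was found")
    return fit_fundamental(x[best_mask], xp[best_mask]), best_mask

# Synthetic matches: 30 correct (noise-free) and 10 corrupted (assumed scene).
rng = np.random.default_rng(0)
n_in, n_out = 30, 10
n = n_in + n_out
X = np.column_stack([rng.uniform(-1, 1, n), rng.uniform(-1, 1, n),
                     rng.uniform(4, 8, n)])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([1.0, 0.2, 0.0])
X2 = (X - t) @ R.T
x = X[:, :2] / X[:, 2:]
xp = X2[:, :2] / X2[:, 2:]
xp[n_in:] = rng.uniform(-0.5, 0.5, (n_out, 2))   # corrupt the last 10 matches
F, inliers = ransac_fundamental(x, xp, threshold=1e-6, iterations=300, rng=rng)
```

The returned mask separates matches consistent with a single epipolar geometry from the corrupted ones; only the consistent matches are then passed on to later stages.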

slide-49
SLIDE 49

Image connectivity graph

(graph layout produced using the Graphviz toolkit: http://www.graphviz.org/)

slide-50
SLIDE 50

Correspondence estimation

  • Link up pairwise matches to form connected components of matches across several images

(Figure: Image 1, Image 2, Image 3, Image 4)
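Linking pairwise matches into connected components is a natural fit for a union-find structure. In the sketch below each observation is an (image index, feature index) pair; that representation and the function name are assumptions for illustration.

```python
def build_tracks(pairwise_matches):
    """Group (image, feature) observations into tracks via union-find.

    pairwise_matches is an iterable of ((img_a, feat_a), (img_b, feat_b)).
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]   # path halving
            node = parent[node]
        return node

    for a, b in pairwise_matches:
        parent[find(a)] = find(b)                 # merge the two components

    tracks = {}
    for node in list(parent):
        tracks.setdefault(find(node), set()).add(node)
    return sorted(sorted(track) for track in tracks.values())

matches = [((1, 0), (2, 5)),    # feature 0 in image 1 matches feature 5 in image 2
           ((2, 5), (3, 2)),
           ((3, 2), (4, 7)),
           ((1, 3), (2, 1))]
tracks = build_tracks(matches)
```

Here the first three matches chain into one track spanning four images, and the last match forms a separate two-image track. A practical pipeline would usually also discard tracks that contain two different features from the same image.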

slide-51
SLIDE 51
slide-52
SLIDE 52

Structure from motion

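  • Minimize sum of squared reprojection errors:

$g(X, R, T) = \sum_{i} \sum_{j} w_{ij} \left\| P(x_i, R_j, t_j) - \begin{bmatrix} u_{i,j} \\ v_{i,j} \end{bmatrix} \right\|^2$

where $P(x_i, R_j, t_j)$ is the predicted image location, $(u_{i,j}, v_{i,j})$ is the observed image location, and $w_{ij}$ is an indicator variable: is point i visible in image j?

  • Minimizing this function is called bundle adjustment
    – Optimized using non-linear least squares, e.g. Levenberg-Marquardt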
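The objective itself is straightforward to evaluate. The sketch below computes the weighted sum of squared reprojection errors for given points and cameras; it assumes a pinhole model with camera coordinates R X + t and known intrinsics K, and it only evaluates the cost (the non-linear optimization is not shown). All names and numeric values are illustrative.

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of 3D points X (n, 3), assuming x_cam = R X + t."""
    cam = X @ R.T + t
    pix = cam @ K.T
    return pix[:, :2] / pix[:, 2:]

def reprojection_error(points, cameras, observations, visible):
    """Sum over i, j of w_ij * ||P(x_i, R_j, t_j) - (u_ij, v_ij)||^2.

    observations has shape (n_points, n_cameras, 2); visible holds the
    0/1 indicator w_ij with shape (n_points, n_cameras).
    """
    total = 0.0
    for j, (K, R, t) in enumerate(cameras):
        diff = project(K, R, t, points) - observations[:, j]
        total += np.sum(visible[:, j] * np.sum(diff ** 2, axis=1))
    return total

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
cameras = [(K, np.eye(3), np.zeros(3)),
           (K, np.eye(3), np.array([-1.0, 0.0, 0.0]))]
points = np.array([[0.0, 0.0, 5.0], [1.0, -1.0, 6.0], [-0.5, 0.5, 4.0]])
observations = np.stack([project(*cam, points) for cam in cameras], axis=1)
visible = np.ones((3, 2))
perfect = reprojection_error(points, cameras, observations, visible)

observations[0, 1] += [3.0, 4.0]     # corrupt one observation by (3, 4) pixels
corrupted = reprojection_error(points, cameras, observations, visible)
visible[0, 1] = 0.0                  # mark that observation as not visible
masked = reprojection_error(points, cameras, observations, visible)
```

A (3, 4) pixel offset on a single observation contributes 3² + 4² = 25 to the cost, and setting its indicator to zero removes that contribution again.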

slide-53
SLIDE 53

Is SfM always uniquely solvable?

slide-54
SLIDE 54

Is SfM always uniquely solvable?

  • No…
slide-55
SLIDE 55
slide-56
SLIDE 56
slide-57
SLIDE 57
slide-58
SLIDE 58
slide-59
SLIDE 59
slide-60
SLIDE 60

Building Rome in a Day Sameer Agarwal, Noah Snavely, Ian Simon, Steven M. Seitz, Richard Szeliski

slide-61
SLIDE 61


slide-62
SLIDE 62


slide-63
SLIDE 63
slide-64
SLIDE 64