C18 Computer Vision Lecture 8 Recovering 3D from two images II – PowerPoint PPT Presentation



SLIDE 1

Victor Adrian Prisacariu

http://www.robots.ox.ac.uk/~victor

C18 Computer Vision

Lecture 8

Recovering 3D from two images II: stereo correspondences, triangulation.

SLIDE 2

Computer Vision: This time…

  • 5. Imaging geometry, camera calibration.
  • 6. Salient feature detection and description.
  • 7. Recovering 3D from two images I: epipolar geometry.
  • 8. Recovering 3D from two images II: stereo correspondences, triangulation, neural nets.

1. The Correspondence Problem. 2. Triangulation for Depth Recovery. 3. 1- views? 4. 2+ views?

SLIDE 3

Reconstruction from two views

  • We have seen that because the projection onto a second camera of the back-projected ray from the first is a straight line, the search for potential matches lies (mostly) across epipolar lines.
  • Search occurs (mostly ☺) in 1D, which is less demanding than a 2D search throughout the image.
  • We have seen how knowledge of the camera matrices allows us to recover the fundamental matrix, which defines this epipolar geometry.

SLIDE 4

What next?

  • The key tasks now are:
  • 1. Determining which points in the images are from the same scene location - the correspondence problem.

– We will discuss how to obtain sparse and dense correspondences.

  • 2. Determining the 3D structure by back-projecting rays, or triangulation.

– Although triangulation seems straightforward, it turns out to be a non-trivial estimation problem to which there are several approaches.

SLIDE 5

8.1 Establishing Correspondence

  • With knowledge of the fundamental matrix, we are able to reduce the search for correspondence down to a search along the epipolar line 𝐦′ = 𝑮𝐲.
  • Correspondence algorithms can be either:

– Sparse: correspondence is sought for individual corner or edge features; or
– Dense: correspondences are sought for all the pixels in an image.

  • We will use both methods.
SLIDE 6

Sparse Correspondences

  • Aim to match the points we found in lecture 2 (corners and/or SIFTs).
  • Outline:
  • 1. Extract image points – same as in L2.
  • 2. Obtain initial corner matches using local descriptors.
  • 3. Remove outliers and estimate the fundamental matrix 𝑮 using RANSAC.
  • 4. Obtain further corner matches using 𝑮.
SLIDE 7

Why salient points in particular?

  • They make the algorithm faster, more efficient and sometimes more robust.
  • Salient points are:

– relatively sparse;
– reasonably cheap to compute;
– well-localized;
– appear quite robustly from frame to frame.

SLIDE 8

Initial Matching

  • Extract salient points (SIFT, Harris, etc.) in both images (feature detection).
  • For each corner 𝐲 in C, make a list of potential matches 𝐲′ in a region in C′ around 𝐲 (heuristic).
  • Rank the matches by comparing the regions around the corners using e.g. the L2 norm of the SIFT descriptor or cross-correlation (see later).
  • Process them to reconcile forward-backward inconsistencies.
  • The idea here is not to do too much work, just enough to get some good matches.
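The ranking-plus-reconciliation idea can be sketched in a few lines of numpy. The function name `mutual_matches` and the toy descriptors in the usage are invented for this sketch, not from the lecture:

```python
import numpy as np

def mutual_matches(desc1, desc2):
    """Rank candidate matches by L2 descriptor distance, then keep only
    forward-backward consistent pairs: i's best match j must also pick i."""
    # All pairwise squared L2 distances between descriptors.
    d = ((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(axis=2)
    fwd = d.argmin(axis=1)  # best match in C' for each corner in C
    bwd = d.argmin(axis=0)  # best match in C for each corner in C'
    # Keep i -> j only if the backward pass maps j back to i.
    return [(i, j) for i, j in enumerate(fwd) if bwd[j] == i]
```

In practice the pairwise distances would only be computed for 𝐲′ within the search region around 𝐲; the brute-force matrix above keeps the sketch short.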

SLIDE 9

Initial Matching (example using Harris Corners)

SLIDE 10

Initial Matching

Matches — some good matches, some mismatches. Can still compute 𝑮 with around 50% mismatches. How?

SLIDE 11

RANSAC – RANdom SAmple Consensus

  • Suppose you tried to fit a straight line to data containing outliers: points which are not properly described by the assumed probability distribution.
  • The usual methods of least squares are hopelessly corrupted.
  • Need to detect outliers and exclude them.
  • Use estimation based on robust statistics. RANSAC was the first, devised by vision researchers Fischler & Bolles (1981).

SLIDE 12

RANSAC algorithm for lines

  • 1. For many repeated trials:

1. Select a random sample of two points.
2. Fit a line through them.
3. Count how many other points are within a threshold distance of the line (inliers).

  • 2. Select the line with the largest number of inliers.
  • 3. Refine the line by fitting it to all the inliers (using least squares).

Remarks:

  • Sample a minimal set of points for your problem (2 for lines).
  • Repeat such that there is a high chance that at least one minimal set contains only inliers (see tutorial sheet).
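The loop above, as a minimal numpy sketch. The function name, trial count and threshold are illustrative choices, not from the lecture:

```python
import numpy as np

def ransac_line(pts, n_trials=200, thresh=0.5, seed=0):
    """RANSAC for lines: fit a line to a minimal sample of two points,
    keep the hypothesis with the most inliers, then refine on all
    inliers with a (total) least-squares fit."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(pts), dtype=bool)
    for _ in range(n_trials):
        p, q = pts[rng.choice(len(pts), 2, replace=False)]
        n = np.array([q[1] - p[1], p[0] - q[0]])    # normal of the line through p, q
        if np.linalg.norm(n) < 1e-12:
            continue                                # degenerate sample
        n = n / np.linalg.norm(n)
        inliers = np.abs(pts @ n - n @ p) < thresh  # perpendicular distance test
        if inliers.sum() > best.sum():
            best = inliers
    # Refinement: line through the inlier centroid, normal from the SVD.
    centroid = pts[best].mean(axis=0)
    n = np.linalg.svd(pts[best] - centroid)[2][-1]
    return n, -n @ centroid, best                   # line n . x + c = 0
```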

SLIDE 13

RANSAC algorithm for lines

SLIDE 14

RANSAC algorithm for lines

SLIDE 15

RANSAC algorithm for 𝑮

  • 1. For many repeated trials:

1. Select a random sample of seven points.
2. Compute 𝑮 using the 7 point method.
3. Count how many other correspondences are within threshold distance of the epipolar lines (inliers).

  • 2. Select the 𝑮 with the largest number of inliers.
  • 3. Refine 𝑮 by fitting it to all the inliers (using the SVD method).
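The inlier-counting step might look like this, using the symmetric point-to-epipolar-line distance. The helper name is invented, and the fundamental matrix used in the check below is the idealised one for parallel cameras:

```python
import numpy as np

def epipolar_inliers(F, y1, y2, thresh=1.0):
    """Flag correspondences lying within `thresh` pixels of their
    epipolar lines in both images. y1, y2: (N, 2) matched points."""
    h1 = np.hstack([y1, np.ones((len(y1), 1))])  # homogeneous coordinates
    h2 = np.hstack([y2, np.ones((len(y2), 1))])
    l2 = h1 @ F.T                       # epipolar line in image 2 for each y1
    l1 = h2 @ F                         # epipolar line in image 1 for each y2
    num = np.abs(np.sum(h2 * l2, axis=1))         # |y2^T F y1|
    d2 = num / np.hypot(l2[:, 0], l2[:, 1])       # point-to-line distance, image 2
    d1 = num / np.hypot(l1[:, 0], l1[:, 1])       # point-to-line distance, image 1
    return np.maximum(d1, d2) < thresh
```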

SLIDE 16

RANSAC algorithm for 𝑰

  • 1. For many repeated trials:

1. Select a random sample of four points.
2. Compute 𝑰 as in B14.
3. Count how many other correspondences are within threshold distance of the mapped points (inliers).

  • 2. Select the 𝑰 with the largest number of inliers.
  • 3. Refine 𝑰 by fitting it to all the inliers, optimizing the reprojection error:

min_𝑰 Σ_{(𝐲,𝐲′)∈inliers} 𝑒²(𝐲′, 𝑰𝐲) + 𝑒²(𝑰⁻¹𝐲′, 𝐲)
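The cost being optimised can be evaluated as follows. This is a sketch; `symmetric_transfer_error` and `warp` are invented names, with 𝑰 written as `H`:

```python
import numpy as np

def symmetric_transfer_error(H, y, y_prime):
    """Sum over correspondences of e^2(y', Hy) + e^2(H^-1 y', y).
    y, y_prime: (N, 2) arrays of matched points."""
    def warp(M, pts):
        h = np.hstack([pts, np.ones((len(pts), 1))]) @ M.T
        return h[:, :2] / h[:, 2:3]                 # dehomogenise
    return (((warp(H, y) - y_prime) ** 2).sum()
            + ((warp(np.linalg.inv(H), y_prime) - y) ** 2).sum())
```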

SLIDE 17

RANSAC Song

SLIDE 18

Correspondences consistent with epipolar geometry

SLIDE 19

A Virtual Compass

SLIDE 20

Fyuse

SLIDE 21

Dense Correspondences

  • Aim to match all points between the two views.
  • Useful for e.g. 3D reconstruction, augmented reality, etc.
  • First we will explore how to adjust the epipolar geometry for convenience, a process called image rectification.

SLIDE 22

Adjusting epipolar geometry: Rectification

New optical axes are chosen to be coplanar and perpendicular to the baseline, and the new image planes are set with the same intrinsic matrix 𝑳rect.

SLIDE 23

Rectification

  • The actual and rectified frames differ by a rotation 𝑺 only. Obviously this must be a rotation about the optic centre.
  • The scene point 𝐘 is projected to 𝐲 in the actual image, and 𝐲rect in the rectified image.
  • What is the relationship between them?

SLIDE 24

Rectification

  • Allow 𝐘 to be defined in the actual camera frame. The projections are: 𝐲 = 𝑳 [𝑱 | 𝟎] 𝐘 = 𝑳𝐘3×1 and 𝐲rect = 𝑳rect [𝑺 | 𝟎] 𝐘 = 𝑳rect𝑺𝐘3×1.
  • Eliminate 𝐘3×1 using 𝐘3×1 = 𝑳⁻¹𝐲 (so 𝐘's depth is actually irrelevant): 𝐲rect = 𝑳rect𝑺𝑳⁻¹𝐲.
  • The 3 × 3 transformation is a homography 𝑰 = 𝑳rect𝑺𝑳⁻¹ - a plane-to-plane mapping through the projection centre.
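As a numerical check of 𝐲rect = 𝑳rect𝑺𝑳⁻¹𝐲 (with `K` for 𝑳 and `R` for 𝑺; the intrinsics and the 5° rotation are made-up values, not from the lecture):

```python
import numpy as np

K = np.array([[500., 0., 320.],
              [0., 500., 240.],
              [0.,   0.,   1.]])
K_rect = K                                   # same intrinsics for the rectified view
t = np.deg2rad(5.0)
R = np.array([[np.cos(t), -np.sin(t), 0.],   # rotation about the optic centre
              [np.sin(t),  np.cos(t), 0.],
              [0.,         0.,        1.]])

H = K_rect @ R @ np.linalg.inv(K)            # the 3x3 plane-to-plane homography

def project(K, X):
    y = K @ X
    return y / y[2]                          # dehomogenise

# Scene points at different depths along one back-projected ray share a
# pixel, and hence a rectified pixel: Y's depth really is irrelevant to H.
X1 = np.array([0.2, 0.1, 2.0])
y1 = project(K, X1)
y2 = project(K, 3.0 * X1)
y_rect = H @ y1
y_rect = y_rect / y_rect[2]
```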

SLIDE 25

A correlation-based approach

  • The basic assumption here is that the image bands around two corresponding epipolar lines are similar.
  • The example, for parallel cameras, shows that this is approximately the case, though there are regions of noise, ambiguity, and occlusion.
  • In Lecture 2, we saw that auto-correlation provides a method of quantifying the self-similarity of image patches. Here we use cross-correlation.

SLIDE 26

Zero-normalized cross-correlation

  • Images are from different cameras with potentially different global gains and offsets.
  • Model this by 𝐽′ = 𝛽𝐽 + 𝛾 and use zero-normalized cross-correlation.
  • Step 1: Set the source patch 𝐵 around 𝐲.
  • Step 2: Subtract the patch mean: 𝐵𝑗𝑘 ← 𝐵𝑗𝑘 − 𝜈𝐵.
  • Step 3: For each 𝐲′ on the epipolar line:

– Generate patch 𝐶 and subtract its mean: 𝐶𝑗𝑘 ← 𝐶𝑗𝑘 − 𝜈𝐶.
– Compute NCC(𝐲, 𝐲′) = Σ𝑗Σ𝑘 𝐵𝑗𝑘𝐶𝑗𝑘 / ( √(Σ𝑗Σ𝑘 𝐵𝑗𝑘²) √(Σ𝑗Σ𝑘 𝐶𝑗𝑘²) )

  • Useful analogy: imagine the regions as vectors 𝐵 → 𝐛 and 𝐶 → 𝐜. Then NCC ≡ 𝐛⋅𝐜 / (‖𝐛‖‖𝐜‖), the scalar product of two unit vectors, so −1 ≤ NCC ≤ 1.
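The three steps reduce to a short function. This is a sketch; the small `eps` guarding against flat (zero-variance) patches is an addition not on the slide:

```python
import numpy as np

def zncc(B, C, eps=1e-12):
    """Zero-normalized cross-correlation of two equal-size patches.
    Invariant to the gain/offset model J' = beta * J + gamma."""
    b = (B - B.mean()).ravel()   # Step 2: subtract the patch mean
    c = (C - C.mean()).ravel()
    return (b @ c) / (np.linalg.norm(b) * np.linalg.norm(c) + eps)
```

A patch compared against a gain-and-offset copy of itself scores 1; against a contrast-inverted copy, −1.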

SLIDE 27

Examples

SLIDE 28

Examples

Why is cross-correlation a poor measure in Example 2?

  • The neighbourhood region does not have a distinctive spatial intensity pattern. Weak auto-correlation → ambiguous cross-correlation.
  • Foreshortening effects (perspective distortion). Correlation assumes minimal appearance change, which favours fronto-parallel surfaces.

SLIDE 29

Outline of a dense correspondence algorithm

Algorithm:
1. Rectify images in C and C′.
2. For each pixel 𝐲 in image C:
  • Compute NCC for each pixel 𝐲′ along epipolar line 𝐦′ in image C′.
  • Choose the match 𝐲′ with the highest NCC.

Parameters to adjust:
  • size of neighbourhood patches;
  • limit on maximum disparity (𝐲′ − 𝐲).

Constraints to apply:
  • uniqueness of match;
  • match ordering;
  • smoothness of disparity field (disparity gradient limit);
  • figural continuity.

Limitations:
  • scene must be textured, and
  • largely fronto-parallel (related to the disparity gradient limit).
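Step 2 of the algorithm, matching one pixel along its rectified scanline, can be sketched as below. The patch half-size and disparity limit are the two parameters to adjust; the function name and default values are illustrative:

```python
import numpy as np

def scanline_match(left, right, row, x, half=2, max_disp=20):
    """Best match on the same row of a rectified pair: return the
    disparity d maximising zero-normalized cross-correlation between
    the patch at (row, x) in `left` and (row, x - d) in `right`."""
    def patch(img, r, c):
        return img[r - half:r + half + 1, c - half:c + half + 1].astype(float)

    def zncc(B, C):
        b, c = (B - B.mean()).ravel(), (C - C.mean()).ravel()
        denom = np.linalg.norm(b) * np.linalg.norm(c)
        return (b @ c) / denom if denom > 0 else -1.0

    B = patch(left, row, x)
    scores = []
    for d in range(0, max_disp + 1):       # limit on maximum disparity
        if x - d - half < 0:
            break                          # candidate patch would leave the image
        scores.append((zncc(B, patch(right, row, x - d)), d))
    return max(scores)[1]                  # disparity of the best NCC
```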

SLIDE 30

Example: Left and Right Images

SLIDE 31

Results: Left Image and 3D Range Map

SLIDE 32

Example: Full Sequence

SLIDE 33

The “Other Constraints” on correspondence

  • Uniqueness of match: Promote 1-to-1 matches, but there can be 0-to-1 and 1-to-0 matches because of occlusion.
  • Ordering: On a continuous opaque surface it is not possible to change the match order.
  • Disparity smoothness: This is stronger than the previous constraint and favours smooth surfaces with no sudden changes in depth.
  • Figural continuity: The disparity field along an epipolar line should not be much different from that along a neighbouring epipolar line.
  • There are lots of ways to impose these, e.g. by discrete optimisation or dynamic programming.

SLIDE 34

8.2 Triangulation: bursting into 3D

We have:

  • exact projection matrices 𝑸 and 𝑸′;
  • potentially noisy correspondences 𝐲 ↔ 𝐲′.

We want the 3D point 𝐘.

There are many ways; we explore only two here.

SLIDE 35

Approach 1: Linear Triangulation

  • We use the equations 𝐲 = 𝑸𝐘 and 𝐲′ = 𝑸′𝐘 to solve for 𝐘.
  • We first write:

𝑸 = [𝑞11 𝑞12 𝑞13 𝑞14; 𝑞21 𝑞22 𝑞23 𝑞24; 𝑞31 𝑞32 𝑞33 𝑞34] = [𝐪1T; 𝐪2T; 𝐪3T]

  • Eliminate the unknown scale in 𝜇𝐲 = 𝑸𝐘 by forming the cross product 𝐲 × 𝑸𝐘 = 𝟎. The three components are:

𝑦(𝐪3T𝐘) − 𝐪1T𝐘 = 0
𝑧(𝐪3T𝐘) − 𝐪2T𝐘 = 0
𝑦(𝐪2T𝐘) − 𝑧(𝐪1T𝐘) = 0

  • Rearrange just the first two as:

[𝑦𝐪3T − 𝐪1T; 𝑧𝐪3T − 𝐪2T] 𝐘4×1 = 𝟎2×1

SLIDE 36

Linear Triangulation

  • For the second camera we can write the same:

[𝑦′𝐪′3T − 𝐪′1T; 𝑧′𝐪′3T − 𝐪′2T] 𝐘4×1 = 𝟎2×1

  • Put all in a joint system to obtain 𝑩4×4𝐘4×1 = 𝟎4×1, where:

𝑩4×4𝐘4×1 = [𝑦𝐪3T − 𝐪1T; 𝑧𝐪3T − 𝐪2T; 𝑦′𝐪′3T − 𝐪′1T; 𝑧′𝐪′3T − 𝐪′2T] 𝐘4×1 = 𝟎4×1

  • Solve as a null space problem, or by least squares if we have more than two views – same math as in the 𝑮 matrix case.
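Solving 𝑩𝐘 = 𝟎 as a null-space problem via the SVD can be sketched as follows (`triangulate` is an invented name, with `P1`, `P2` for 𝑸, 𝑸′):

```python
import numpy as np

def triangulate(P1, P2, y1, y2):
    """Linear triangulation: stack the two rearranged rows per camera
    into the 4x4 system A Y = 0 and take the null-space direction."""
    A = np.vstack([y1[0] * P1[2] - P1[0],
                   y1[1] * P1[2] - P1[1],
                   y2[0] * P2[2] - P2[0],
                   y2[1] * P2[2] - P2[1]])
    Y = np.linalg.svd(A)[2][-1]        # right singular vector of smallest sigma
    return Y[:3] / Y[3]                # dehomogenise to a 3D point
```

With noisy correspondences the smallest singular value is no longer zero, and the same vector gives the least-squares solution.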

SLIDE 37

Approach 2: Minimizing a geometrical/statistical error

  • Project the estimated scene position: 𝐲̂ = 𝑸𝐘̂ and 𝐲̂′ = 𝑸′𝐘̂.
  • Measure the Euclidean displacements 𝑒(𝐲, 𝐲̂) and 𝑒(𝐲′, 𝐲̂′).
  • Compute the cost 𝐷(𝐘̂) = 𝑒²(𝐲, 𝐲̂) + 𝑒²(𝐲′, 𝐲̂′).
  • Adjust 𝐘̂ to minimise the cost.
  • If the measurement noise is zero-mean Gaussian, 𝑁(0, 𝜏²), then this is the Maximum Likelihood Estimator of 𝐘.
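A sketch of the cost minimisation, using Gauss-Newton with a finite-difference Jacobian (the optimiser choice is mine, not the lecture's; names and the toy camera setup in the check are illustrative):

```python
import numpy as np

def refine_point(P1, P2, y1, y2, Y0, n_iter=15):
    """Adjust Y to minimise D(Y) = e^2(y, y_hat) + e^2(y', y_hat').
    Y0 would normally come from the linear method of the previous slide."""
    def proj(P, Y):
        p = P @ np.append(Y, 1.0)
        return p[:2] / p[2]                 # projected pixel y_hat

    def residuals(Y):                       # stacked reprojection displacements
        return np.concatenate([proj(P1, Y) - y1, proj(P2, Y) - y2])

    Y = np.array(Y0, dtype=float)
    for _ in range(n_iter):
        r = residuals(Y)
        J = np.empty((4, 3))
        for j in range(3):                  # numeric Jacobian, column by column
            dY = np.zeros(3)
            dY[j] = 1e-6
            J[:, j] = (residuals(Y + dY) - r) / 1e-6
        Y = Y - np.linalg.solve(J.T @ J + 1e-12 * np.eye(3), J.T @ r)
    return Y
```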

SLIDE 38

8.3 1- views? Depth from Neural Net

Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture. David Eigen, Rob Fergus, ICCV 2015.

SLIDE 39

8.4 2+ views?: Structure-from-Motion

SLIDE 40

Semantic Large Scale Dense SLAM

SLIDE 41

Summary of Lecture 8

The Correspondence Problem:

  • Sparse Correspondences.
  • RANSAC.
  • Rectification.
  • Zero Normalized cross correlation.
  • Outline of a dense stereo algorithm.

Triangulation for Depth Recovery.
Others: Depth from Neural Nets, Structure-from-Motion.