1
6.869 Advances in Computer Vision

  • Prof. Bill Freeman

March 1, 2005

3

Local Features

Matching points across images important for:

  • object identification (instance recognition)
  • object (class) recognition

  • pose estimation
  • stereo (3-d shape)
  • motion estimation
  • stitching photographs together into a mosaic
  • etc.

4

Today

Interesting points, correspondence. Scale and rotation invariant descriptors [Lowe]

5

Correspondence using window matching

Individual points are highly ambiguous; matching small regions of the image around each point yields more unique matches.

6

Correspondence using window matching

(Figure: matching a window along the left and right scanlines; error plotted as a function of disparity.)

Criterion function:


7

Sum of Squared (Pixel) Differences

$w_L$ and $w_R$ are corresponding windows of $m \times m$ pixels in the left and right images.

We define the window function:

$$W_m(x, y) = \{(u, v) \mid x - \tfrac{m}{2} \le u \le x + \tfrac{m}{2},\ y - \tfrac{m}{2} \le v \le y + \tfrac{m}{2}\}$$

The SSD cost measures the intensity difference as a function of disparity:

$$C_{\mathrm{SSD}}(x_L, y_L, d) = \sum_{(u,v) \in W_m(x_L, y_L)} [I_L(u, v) - I_R(u - d, v)]^2$$

8

Image Normalization

  • Even when the cameras are identical models, there can be differences in gain and sensitivity.
  • The cameras do not see exactly the same surfaces, so their overall light levels can differ.
  • For these reasons and more, it is a good idea to normalize the pixels in each window:

$$\bar{I} = \frac{1}{|W_m(x, y)|} \sum_{(u,v) \in W_m(x, y)} I(u, v) \qquad \text{(average pixel)}$$

$$\|I\|_{W_m(x, y)} = \sqrt{\sum_{(u,v) \in W_m(x, y)} [I(u, v)]^2} \qquad \text{(window magnitude)}$$

$$\hat{I}(u, v) = \frac{I(u, v) - \bar{I}}{\|I - \bar{I}\|_{W_m(x, y)}} \qquad \text{(normalized pixel)}$$

9

Images as Vectors

(Figure: left and right $m \times m$ windows $w_L$ and $w_R$, unwrapped row by row.)

"Unwrap" each image window to form a vector, using raster scan order.

Each window is a vector in an $m^2$-dimensional vector space. Normalization makes them unit length.

10

Image windows as vectors

11

Possible metrics

Compare $w_L$ with $w_R(d)$: Distance? Angle?

12

Image Metrics

Compare $w_L$ with $w_R(d)$.

(Normalized) Sum of Squared Differences:

$$C_{\mathrm{SSD}}(d) = \sum_{(u,v) \in W_m(x, y)} [\hat{I}_L(u, v) - \hat{I}_R(u - d, v)]^2 = \|w_L - w_R(d)\|^2$$

Normalized Correlation:

$$C_{\mathrm{NC}}(d) = \sum_{(u,v) \in W_m(x, y)} \hat{I}_L(u, v)\, \hat{I}_R(u - d, v) = w_L \cdot w_R(d) = \cos\theta$$

The best disparity under either metric:

$$d^* = \arg\min_d \|w_L - w_R(d)\|^2 = \arg\max_d\, w_L \cdot w_R(d)$$
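The normalization and both metrics can be sketched in a few lines of NumPy; `normalize_window` and `best_disparity` are illustrative names, not code from the course:

```python
import numpy as np

def normalize_window(w):
    """Subtract the window mean and scale to unit length (the I-hat
    normalization from the Image Normalization slide)."""
    w = w.astype(float) - w.mean()
    n = np.linalg.norm(w)
    return w / n if n > 0 else w

def best_disparity(left, right, x, y, m, max_d):
    """Pick the disparity that maximizes normalized correlation.

    For unit-length windows, ||wL - wR(d)||^2 = 2 - 2 wL . wR(d),
    so the SSD minimum and the correlation maximum coincide.
    """
    h = m // 2
    wL = normalize_window(left[y - h:y + h + 1, x - h:x + h + 1]).ravel()
    scores = []
    for d in range(max_d + 1):
        win = right[y - h:y + h + 1, x - d - h:x - d + h + 1]
        scores.append(wL @ normalize_window(win).ravel())  # cos(theta)
    return int(np.argmax(scores))
```

Because the windows are normalized to unit length, maximizing correlation and minimizing SSD return the same disparity, which is why the slide treats them interchangeably.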


13

Local Features

Not all points are equally good for matching…

14

Aperture Problem and Normal Flow

(Slides 14-21: image-only sequence illustrating the aperture problem.)

22

(Review) Differential approach: Optical flow constraint equation

$$I(x + u\,\delta t,\ y + v\,\delta t,\ t + \delta t) = I(x, y, t)$$

Brightness should stay constant as you track motion.

$$I(x, y, t) + u\,\delta t\, I_x + v\,\delta t\, I_y + \delta t\, I_t = I(x, y, t)$$

1st order Taylor series, valid for small $\delta t$.

Constraint equation:

$$u I_x + v I_y + I_t = 0$$

"BCCE" - Brightness Change Constraint Equation

23

Aperture Problem and Normal Flow

The gradient constraint:

$$u I_x + v I_y + I_t = \nabla I \cdot \vec{U} + I_t = 0$$

defines a line in $(u, v)$ space.

Normal Flow:

$$\vec{u}_n = -\frac{I_t}{\|\nabla I\|}\, \frac{\nabla I}{\|\nabla I\|}$$
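A minimal NumPy sketch of per-pixel normal flow from two frames (the function name and the `eps` guard against flat regions are my additions):

```python
import numpy as np

def normal_flow(I1, I2, eps=1e-8):
    """Per-pixel normal flow: u_n = -(I_t / |grad I|) * (grad I / |grad I|).

    Only the motion component along the image gradient is recoverable
    from a single BCCE constraint (the aperture problem); eps guards
    against division by zero where the gradient vanishes.
    """
    Iy, Ix = np.gradient(I1.astype(float))    # spatial gradients
    It = I2.astype(float) - I1.astype(float)  # temporal difference
    mag2 = Ix * Ix + Iy * Iy
    return -It * Ix / (mag2 + eps), -It * Iy / (mag2 + eps)
```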

24

Combining Local Constraints

Each measurement defines a line in $(u, v)$ space:

$$\nabla I_1 \cdot \vec{U} = -I_{t_1}, \qquad \nabla I_2 \cdot \vec{U} = -I_{t_2}, \qquad \nabla I_3 \cdot \vec{U} = -I_{t_3}, \quad \text{etc.}$$


25

Lucas-Kanade: Integrate gradients over a Patch

Assume a single velocity for all pixels within an image patch:

$$E(u, v) = \sum_{(x, y) \in \Omega} [I_x(x, y)\, u + I_y(x, y)\, v + I_t]^2$$

Solve with:

$$\begin{bmatrix} \sum I_x I_x & \sum I_x I_y \\ \sum I_x I_y & \sum I_y I_y \end{bmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = -\begin{pmatrix} \sum I_x I_t \\ \sum I_y I_t \end{pmatrix}$$

or, in vector form,

$$\left( \sum \nabla I\, \nabla I^T \right) \vec{U} = -\sum \nabla I\, I_t$$

On the LHS: sum of the 2x2 outer product tensor of the gradient vector
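The patch equation above can be sketched in NumPy (an illustrative implementation, assuming brightness constancy and small motion):

```python
import numpy as np

def lucas_kanade_patch(I1, I2):
    """Single-velocity estimate for one patch, per the slide's system:
    (sum grad I grad I^T) U = - sum grad I * I_t.

    I1, I2: grayscale patches at times t and t+1. Returns (u, v).
    """
    Iy, Ix = np.gradient(I1.astype(float))    # spatial gradients
    It = I2.astype(float) - I1.astype(float)  # temporal difference
    # 2x2 sum of gradient outer products (the matrix A)
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    return np.linalg.solve(A, b)  # fails if A is singular (see next slide)
```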

26

Local Patch Analysis

27

Selecting Good Features

  • What’s a “good feature”?

    – Satisfies brightness constancy
    – Has sufficient texture variation
    – Does not have too much texture variation
    – Corresponds to a “real” surface patch
    – Does not deform too much over time

28

Good Features to Track

$$\underbrace{\begin{bmatrix} \sum I_x I_x & \sum I_x I_y \\ \sum I_x I_y & \sum I_y I_y \end{bmatrix}}_{A} \begin{pmatrix} u \\ v \end{pmatrix} = \underbrace{-\begin{pmatrix} \sum I_x I_t \\ \sum I_y I_t \end{pmatrix}}_{\vec{b}} \qquad \Leftrightarrow \qquad A\vec{u} = \vec{b}$$

When is This Solvable?

  • A should be invertible
  • A should not be too small due to noise
    – eigenvalues λ1 and λ2 of A should not be too small
  • A should be well-conditioned
    – λ1/λ2 should not be too large (λ1 = larger eigenvalue)

Both conditions are satisfied when min(λ1, λ2) > c.

29

Harris detector

$$A = \begin{bmatrix} \sum_{(x_k, y_k) \in W} I_x(x_k, y_k)^2 & \sum_{(x_k, y_k) \in W} I_x(x_k, y_k)\, I_y(x_k, y_k) \\ \sum_{(x_k, y_k) \in W} I_x(x_k, y_k)\, I_y(x_k, y_k) & \sum_{(x_k, y_k) \in W} I_y(x_k, y_k)^2 \end{bmatrix}$$

  • Auto-correlation matrix

    – captures the structure of the local neighborhood
    – measure based on eigenvalues of this matrix

  • 2 strong eigenvalues => interest point
  • 1 strong eigenvalue => contour
  • 0 strong eigenvalues => uniform region

  • Interest point detection
    – threshold on the eigenvalues
    – local maximum for localization
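A NumPy-only sketch of this pipeline; note it scores corners with the common det(A) - k*trace(A)^2 response rather than thresholding min(λ1, λ2) directly, and all names and parameter values are illustrative:

```python
import numpy as np

def window_sum(a, r=2):
    """Sum of a over a (2r+1)x(2r+1) window around each pixel (the sum
    over W in the auto-correlation matrix; wrap-around borders for brevity)."""
    out = np.zeros_like(a)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
    return out

def harris_response(image, k=0.04):
    """Corner response det(A) - k*trace(A)^2 from the entries of the
    auto-correlation matrix A: large only when both eigenvalues are
    large, without computing them explicitly."""
    Iy, Ix = np.gradient(image.astype(float))
    Sxx, Sxy, Syy = window_sum(Ix * Ix), window_sum(Ix * Iy), window_sum(Iy * Iy)
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2

def detect_corners(image, rel_thresh=0.1, r=2):
    """Interest point detection: threshold on the response, then keep
    local maxima for localization. Returns (row, col) coordinates."""
    R = harris_response(image)
    local_max = np.ones_like(R, dtype=bool)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            local_max &= R >= np.roll(np.roll(R, dy, axis=0), dx, axis=1)
    return np.argwhere((R > rel_thresh * R.max()) & local_max)
```

The det/trace score is equivalent in spirit to the eigenvalue test on the slide: both eigenvalues must be large for the determinant to dominate the squared trace.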

30

Selecting Good Features

λ1 and λ2 are large


31

Selecting Good Features

large λ1, small λ2

32

Selecting Good Features

small λ1, small λ2

33

Today

Interesting points, correspondence. Scale and rotation invariant descriptors [Lowe]

CVPR 2003 Tutorial Recognition and Matching Based on Local Invariant Features

David Lowe Computer Science Department University of British Columbia

35

Invariant Local Features

  • Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters

SIFT Features

Advantages of invariant local features

  • Locality: features are local, so robust to occlusion and clutter (no prior segmentation)
  • Distinctiveness: individual features can be matched to a large database of objects
  • Quantity: many features can be generated for even small objects
  • Efficiency: close to real-time performance
  • Extensibility: can easily be extended to a wide range of differing feature types, with each adding robustness


37

Scale invariance

Requires a method to repeatably select points in location and scale:

  • The only reasonable scale-space kernel is a Gaussian (Koenderink, 1984; Lindeberg, 1994)
  • An efficient choice is to detect peaks in the difference of Gaussian pyramid (Burt & Adelson, 1983; Crowley & Parker, 1984 – but examining more scales)
  • Difference-of-Gaussian with constant ratio of scales is a close approximation to Lindeberg’s scale-normalized Laplacian (can be shown from the heat diffusion equation)

(Figure: repeatedly blur and subtract to form the difference-of-Gaussian pyramid.)
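One octave of the blur-and-subtract pyramid can be sketched as follows (NumPy only; sigma0 = 1.6 and s = 3 scales per octave follow Lowe's published choices, the rest is illustrative):

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur in plain NumPy (wrap-around borders,
    kernel truncated at 3 sigma; fine for a sketch)."""
    r = int(3 * sigma) + 1
    x = np.arange(-r, r + 1)
    g = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    g /= g.sum()
    out = img.astype(float)
    for axis in (0, 1):  # convolve rows, then columns
        acc = np.zeros_like(out)
        for w, d in zip(g, x):
            acc += w * np.roll(out, int(d), axis=axis)
        out = acc
    return out

def dog_octave(image, sigma0=1.6, s=3, levels=6):
    """One octave of a difference-of-Gaussian pyramid: blur with a
    constant ratio k = 2^(1/s) between successive scales, then
    subtract adjacent levels."""
    k = 2.0 ** (1.0 / s)
    blurred = [gaussian_blur(image, sigma0 * k ** i) for i in range(levels)]
    return [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]
```

The constant ratio k between scales is what makes the DoG approximate the scale-normalized Laplacian mentioned above.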

38

Scale space processed one octave at a time

39

Key point localization

  • Detect maxima and minima of difference-of-Gaussian in scale space
  • Fit a quadratic to surrounding values for sub-pixel and sub-scale interpolation (Brown & Lowe, 2002)

  • Taylor expansion around the point:

$$D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^T \mathbf{x} + \frac{1}{2} \mathbf{x}^T \frac{\partial^2 D}{\partial \mathbf{x}^2} \mathbf{x}$$

  • Offset of extremum (use finite differences for derivatives):

$$\hat{\mathbf{x}} = -\left( \frac{\partial^2 D}{\partial \mathbf{x}^2} \right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$$
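The offset formula can be sketched for a 3x3x3 DoG neighborhood (a hypothetical helper, using the finite differences the slide mentions):

```python
import numpy as np

def subpixel_offset(D):
    """Sub-pixel/sub-scale offset of a DoG extremum.

    D: 3x3x3 neighborhood of DoG values around a detected extremum
    (axes: scale, y, x). Fits the quadratic via finite differences and
    returns x_hat = -(d2D/dx2)^{-1} (dD/dx) in (scale, y, x) order.
    """
    def at(ds, dy, dx):
        return D[1 + ds, 1 + dy, 1 + dx]

    # Gradient by central differences
    g = np.array([(at(1, 0, 0) - at(-1, 0, 0)) / 2.0,
                  (at(0, 1, 0) - at(0, -1, 0)) / 2.0,
                  (at(0, 0, 1) - at(0, 0, -1)) / 2.0])

    # Hessian by second differences
    dss = at(1, 0, 0) - 2 * at(0, 0, 0) + at(-1, 0, 0)
    dyy = at(0, 1, 0) - 2 * at(0, 0, 0) + at(0, -1, 0)
    dxx = at(0, 0, 1) - 2 * at(0, 0, 0) + at(0, 0, -1)
    dsy = (at(1, 1, 0) - at(1, -1, 0) - at(-1, 1, 0) + at(-1, -1, 0)) / 4.0
    dsx = (at(1, 0, 1) - at(1, 0, -1) - at(-1, 0, 1) + at(-1, 0, -1)) / 4.0
    dyx = (at(0, 1, 1) - at(0, 1, -1) - at(0, -1, 1) + at(0, -1, -1)) / 4.0
    H = np.array([[dss, dsy, dsx],
                  [dsy, dyy, dyx],
                  [dsx, dyx, dxx]])

    return -np.linalg.solve(H, g)
```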


40

Select canonical orientation

  • Create histogram of local gradient directions computed at selected scale
  • Assign canonical orientation at peak of smoothed histogram

  • Each key specifies stable 2D coordinates (x, y, scale, orientation)
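A sketch of the orientation step (36 bins and the three-tap circular smoothing are illustrative choices, not the course's exact parameters):

```python
import numpy as np

def canonical_orientation(patch, nbins=36):
    """Histogram of local gradient directions, weighted by gradient
    magnitude; the canonical orientation is the peak of the circularly
    smoothed histogram (returned as the peak bin's center, in radians)."""
    Iy, Ix = np.gradient(patch.astype(float))
    angle = np.arctan2(Iy, Ix)                       # in [-pi, pi]
    mag = np.hypot(Ix, Iy)
    b = ((angle + np.pi) / (2 * np.pi) * nbins).astype(int) % nbins
    hist = np.bincount(b.ravel(), weights=mag.ravel(), minlength=nbins)
    # Circular smoothing with a small box filter
    smooth = (np.roll(hist, 1) + hist + np.roll(hist, -1)) / 3.0
    peak = int(np.argmax(smooth))
    return (peak + 0.5) * 2 * np.pi / nbins - np.pi
```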

41

Example of keypoint detection

Threshold on value at DoG peak and on ratio of principal curvatures (Harris approach):

(a) 233x189 image
(b) 832 DoG extrema
(c) 729 left after peak value threshold
(d) 536 left after testing ratio of principal curvatures

42

SIFT vector formation

  • Thresholded image gradients are sampled over a 16x16 array of locations in scale space
  • Create an array of orientation histograms
  • 8 orientations x 4x4 histogram array = 128 dimensions
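The 4x4-cell, 8-bin layout can be sketched as follows (a simplified stand-in for the real SIFT descriptor: it omits the Gaussian weighting, trilinear interpolation, and clipping steps):

```python
import numpy as np

def sift_like_descriptor(patch):
    """128-D descriptor from a 16x16 patch: a 4x4 grid of cells, each an
    8-bin orientation histogram weighted by gradient magnitude
    (8 orientations x 4x4 cells = 128 dimensions)."""
    assert patch.shape == (16, 16)
    Iy, Ix = np.gradient(patch.astype(float))
    ang = np.arctan2(Iy, Ix)
    mag = np.hypot(Ix, Iy)
    bins = ((ang + np.pi) / (2 * np.pi) * 8).astype(int) % 8
    desc = np.zeros((4, 4, 8))
    for cy in range(4):
        for cx in range(4):
            sl = np.s_[4 * cy:4 * cy + 4, 4 * cx:4 * cx + 4]
            desc[cy, cx] = np.bincount(bins[sl].ravel(),
                                       weights=mag[sl].ravel(), minlength=8)
    v = desc.ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v  # unit length, like the slide's I-hat
```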

43

Feature stability to noise

  • Match features after random change in image scale & orientation, with differing levels of image noise
  • Find nearest neighbor in database of 30,000 features

44

Feature stability to affine change

  • Match features after random change in image scale & orientation, with 2% image noise, and affine distortion
  • Find nearest neighbor in database of 30,000 features

45

Distinctiveness of features

  • Vary size of database of features, with 30 degree affine change, 2% image noise
  • Measure % correct for single nearest neighbor match

48

A good SIFT features tutorial

http://www.cs.toronto.edu/~jepson/csc2503/tutSIFT04.pdf By Estrada, Jepson, and Fleet.


49

An application of SIFT features in my own research…

50

The couch potato project:

Learning from looking at images.

Bill Freeman, MIT
Joint work with: Josef Sivic, Andrew Zisserman (Oxford); Bryan Russell (MIT), Alyosha Efros (CMU).
December 18, 2004

51

What can you learn about object categories by simply looking at images?

52

Labelled training databases

Labelling object classes in images is tedious, and can introduce biases.

53

(Diagram: discover topics, find words, form histograms.)

54


55

SIFT (scale invariant feature transforms)

David Lowe, IJCV 2004

56

Visual words

  • Vector quantize SIFT descriptors to a vocabulary of 2237 “visual words”.
  • Heuristic design of descriptors makes these words somewhat invariant to:
    – Lighting
    – 2-d Orientation
    – 3-d Viewpoint
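Vector quantization into visual words can be sketched with plain k-means (the slide does not state which clustering method was actually used; the function names are illustrative):

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Vector-quantize descriptors into k 'visual words' with plain
    k-means (the slides used k = 2237 on SIFT descriptors)."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)].copy()
    for _ in range(iters):
        # Assign each descriptor to its nearest word ...
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # ... then move each word to the mean of its descriptors
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def quantize(descriptors, centers):
    """Map each descriptor to the index of its nearest visual word."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)
```

After quantization, each image becomes a histogram of visual-word counts, which is the observation matrix shown a few slides later.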

57

Examples of visual words

58

More visual words

59

Polysemy: the same word with different meanings

60

Experiment E


61

Observation matrix – experiment E

(Matrix of visual word # vs. frame #; 13.8% non-zero entries.)

62

Binarized observation matrix – experiment E

(Binarized matrix of visual word # vs. frame #; 13.8% non-zero entries.)

64

Example segmentations

(Figure: example segmentations for Faces, Motorbikes, Airplanes, Cars, and Background I-III: original images, segmentations, and all detected visual words.)


(Slides 67-72: further examples for Faces, Motorbikes, Airplanes, Cars, and Background I-III.)


(Slides 73-74: further examples for Faces, Motorbikes, Airplanes, Cars, and Background I-III.)