1
6.869 Advances in Computer Vision

  • Prof. Bill Freeman

March 1, 2005

3

Local Features

Matching points across images important for:

  • object identification (instance recognition)
  • object (class) recognition

  • pose estimation
  • stereo (3-d shape)
  • motion estimation
  • stitching photographs together into a mosaic
  • etc.

4

Today

Interesting points, correspondence. Scale and rotation invariant descriptors [Lowe]

5

Correspondence using window matching

Individual points are highly ambiguous; matching small regions of the image around each point yields more unique matches.

6

Correspondence using window matching

(Figure: matching a window along the left and right scanlines; error plotted as a function of disparity.)

Criterion function:


7

Sum of Squared (Pixel) Differences

$w_L$ and $w_R$ are corresponding windows of $m \times m$ pixels in the left and right images.

We define the window function:

$$W_m(x, y) = \{(u, v) \mid x - \tfrac{m}{2} \le u \le x + \tfrac{m}{2},\ y - \tfrac{m}{2} \le v \le y + \tfrac{m}{2}\}$$

The SSD cost measures the intensity difference as a function of disparity:

$$C_{\mathrm{SSD}}(x_L, y_L, d) = \sum_{(u,v) \in W_m(x_L, y_L)} [I_L(u, v) - I_R(u - d, v)]^2$$

8

Image Normalization

  • Even when the cameras are identical models, there can be differences in gain and sensitivity.
  • The cameras do not see exactly the same surfaces, so their overall light levels can differ.
  • For these reasons and more, it is a good idea to normalize the pixels in each window:

$$\bar{I} = \frac{1}{|W_m(x, y)|} \sum_{(u,v) \in W_m(x, y)} I(u, v) \qquad \text{(average pixel)}$$

$$\|I\|_{W_m(x, y)} = \sqrt{\sum_{(u,v) \in W_m(x, y)} [I(u, v)]^2} \qquad \text{(window magnitude)}$$

$$\hat{I}(u, v) = \frac{I(u, v) - \bar{I}}{\|I - \bar{I}\|_{W_m(x, y)}} \qquad \text{(normalized pixel)}$$

9

Images as Vectors

(Figure: left and right $m \times m$ windows $w_L$ and $w_R$, unwrapped row by row.)

"Unwrap" each image window to form a vector, using raster scan order.

Each window is a vector in an $m^2$-dimensional vector space. Normalization makes them unit length.

10

Image windows as vectors

11

Possible metrics

Compare $w_L$ with $w_R(d)$: Distance? Angle?

12

Image Metrics

Compare $w_L$ with $w_R(d)$.

(Normalized) Sum of Squared Differences:

$$C_{\mathrm{SSD}}(d) = \sum_{(u,v) \in W_m(x, y)} [\hat{I}_L(u, v) - \hat{I}_R(u - d, v)]^2 = \|w_L - w_R(d)\|^2$$

Normalized Correlation:

$$C_{\mathrm{NC}}(d) = \sum_{(u,v) \in W_m(x, y)} \hat{I}_L(u, v)\, \hat{I}_R(u - d, v) = w_L \cdot w_R(d) = \cos\theta$$

The best disparity under either metric:

$$d^* = \arg\min_d \|w_L - w_R(d)\|^2 = \arg\max_d\, w_L \cdot w_R(d)$$
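The normalization and both metrics can be sketched in a few lines of NumPy; `normalize_window` and `best_disparity` are illustrative names, not code from the course:

```python
import numpy as np

def normalize_window(w):
    """Subtract the window mean and scale to unit length (the I-hat
    normalization from the Image Normalization slide)."""
    w = w.astype(float) - w.mean()
    n = np.linalg.norm(w)
    return w / n if n > 0 else w

def best_disparity(left, right, x, y, m, max_d):
    """Pick the disparity that maximizes normalized correlation.

    For unit-length windows, ||wL - wR(d)||^2 = 2 - 2 wL . wR(d),
    so the SSD minimum and the correlation maximum coincide.
    """
    h = m // 2
    wL = normalize_window(left[y - h:y + h + 1, x - h:x + h + 1]).ravel()
    scores = []
    for d in range(max_d + 1):
        win = right[y - h:y + h + 1, x - d - h:x - d + h + 1]
        scores.append(wL @ normalize_window(win).ravel())  # cos(theta)
    return int(np.argmax(scores))
```

Because the windows are normalized to unit length, maximizing correlation and minimizing SSD return the same disparity, which is why the slide treats them interchangeably.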


13

Local Features

Not all points are equally good for matching…

14

Aperture Problem and Normal Flow

(Slides 14-21: image-only sequence illustrating the aperture problem.)

22

(Review) Differential approach: Optical flow constraint equation

$$I(x + u\,\delta t,\ y + v\,\delta t,\ t + \delta t) = I(x, y, t)$$

Brightness should stay constant as you track motion.

$$I(x, y, t) + u\,\delta t\, I_x + v\,\delta t\, I_y + \delta t\, I_t = I(x, y, t)$$

1st order Taylor series, valid for small $\delta t$.

Constraint equation:

$$u I_x + v I_y + I_t = 0$$

"BCCE" - Brightness Change Constraint Equation

23

Aperture Problem and Normal Flow

The gradient constraint:

$$u I_x + v I_y + I_t = \nabla I \cdot \vec{U} + I_t = 0$$

defines a line in $(u, v)$ space.

Normal Flow:

$$\vec{u}_n = -\frac{I_t}{\|\nabla I\|}\, \frac{\nabla I}{\|\nabla I\|}$$
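A minimal NumPy sketch of per-pixel normal flow from two frames (the function name and the `eps` guard against flat regions are my additions):

```python
import numpy as np

def normal_flow(I1, I2, eps=1e-8):
    """Per-pixel normal flow: u_n = -(I_t / |grad I|) * (grad I / |grad I|).

    Only the motion component along the image gradient is recoverable
    from a single BCCE constraint (the aperture problem); eps guards
    against division by zero where the gradient vanishes.
    """
    Iy, Ix = np.gradient(I1.astype(float))    # spatial gradients
    It = I2.astype(float) - I1.astype(float)  # temporal difference
    mag2 = Ix * Ix + Iy * Iy
    return -It * Ix / (mag2 + eps), -It * Iy / (mag2 + eps)
```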

24

Combining Local Constraints

Each measurement defines a line in $(u, v)$ space:

$$\nabla I_1 \cdot \vec{U} = -I_{t_1}, \qquad \nabla I_2 \cdot \vec{U} = -I_{t_2}, \qquad \nabla I_3 \cdot \vec{U} = -I_{t_3}, \quad \text{etc.}$$


25

Lucas-Kanade: Integrate gradients over a Patch

Assume a single velocity for all pixels within an image patch:

$$E(u, v) = \sum_{(x, y) \in \Omega} [I_x(x, y)\, u + I_y(x, y)\, v + I_t]^2$$

Solve with:

$$\begin{bmatrix} \sum I_x I_x & \sum I_x I_y \\ \sum I_x I_y & \sum I_y I_y \end{bmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = -\begin{pmatrix} \sum I_x I_t \\ \sum I_y I_t \end{pmatrix}$$

or, in vector form,

$$\left( \sum \nabla I\, \nabla I^T \right) \vec{U} = -\sum \nabla I\, I_t$$

On the LHS: sum of the 2x2 outer product tensor of the gradient vector
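The patch equation above can be sketched in NumPy (an illustrative implementation, assuming brightness constancy and small motion):

```python
import numpy as np

def lucas_kanade_patch(I1, I2):
    """Single-velocity estimate for one patch, per the slide's system:
    (sum grad I grad I^T) U = - sum grad I * I_t.

    I1, I2: grayscale patches at times t and t+1. Returns (u, v).
    """
    Iy, Ix = np.gradient(I1.astype(float))    # spatial gradients
    It = I2.astype(float) - I1.astype(float)  # temporal difference
    # 2x2 sum of gradient outer products (the matrix A)
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    return np.linalg.solve(A, b)  # fails if A is singular (see next slide)
```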

26

Local Patch Analysis

27

Selecting Good Features

  • What’s a “good feature”?

    – Satisfies brightness constancy
    – Has sufficient texture variation
    – Does not have too much texture variation
    – Corresponds to a “real” surface patch
    – Does not deform too much over time

28

Good Features to Track

$$\underbrace{\begin{bmatrix} \sum I_x I_x & \sum I_x I_y \\ \sum I_x I_y & \sum I_y I_y \end{bmatrix}}_{A} \begin{pmatrix} u \\ v \end{pmatrix} = \underbrace{-\begin{pmatrix} \sum I_x I_t \\ \sum I_y I_t \end{pmatrix}}_{\vec{b}} \qquad \Leftrightarrow \qquad A\vec{u} = \vec{b}$$

When is This Solvable?

  • A should be invertible
  • A should not be too small due to noise
    – eigenvalues λ1 and λ2 of A should not be too small
  • A should be well-conditioned
    – λ1/λ2 should not be too large (λ1 = larger eigenvalue)

Both conditions are satisfied when min(λ1, λ2) > c.

29

Harris detector

$$A = \begin{bmatrix} \sum_{(x_k, y_k) \in W} I_x(x_k, y_k)^2 & \sum_{(x_k, y_k) \in W} I_x(x_k, y_k)\, I_y(x_k, y_k) \\ \sum_{(x_k, y_k) \in W} I_x(x_k, y_k)\, I_y(x_k, y_k) & \sum_{(x_k, y_k) \in W} I_y(x_k, y_k)^2 \end{bmatrix}$$

  • Auto-correlation matrix

    – captures the structure of the local neighborhood
    – measure based on eigenvalues of this matrix

  • 2 strong eigenvalues => interest point
  • 1 strong eigenvalue => contour
  • 0 strong eigenvalues => uniform region

  • Interest point detection
    – threshold on the eigenvalues
    – local maximum for localization
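A NumPy-only sketch of this pipeline; note it scores corners with the common det(A) - k*trace(A)^2 response rather than thresholding min(λ1, λ2) directly, and all names and parameter values are illustrative:

```python
import numpy as np

def window_sum(a, r=2):
    """Sum of a over a (2r+1)x(2r+1) window around each pixel (the sum
    over W in the auto-correlation matrix; wrap-around borders for brevity)."""
    out = np.zeros_like(a)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
    return out

def harris_response(image, k=0.04):
    """Corner response det(A) - k*trace(A)^2 from the entries of the
    auto-correlation matrix A: large only when both eigenvalues are
    large, without computing them explicitly."""
    Iy, Ix = np.gradient(image.astype(float))
    Sxx, Sxy, Syy = window_sum(Ix * Ix), window_sum(Ix * Iy), window_sum(Iy * Iy)
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2

def detect_corners(image, rel_thresh=0.1, r=2):
    """Interest point detection: threshold on the response, then keep
    local maxima for localization. Returns (row, col) coordinates."""
    R = harris_response(image)
    local_max = np.ones_like(R, dtype=bool)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            local_max &= R >= np.roll(np.roll(R, dy, axis=0), dx, axis=1)
    return np.argwhere((R > rel_thresh * R.max()) & local_max)
```

The det/trace score is equivalent in spirit to the eigenvalue test on the slide: both eigenvalues must be large for the determinant to dominate the squared trace.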

30

Selecting Good Features

λ1 and λ2 are large


31

Selecting Good Features

large λ1, small λ2

32

Selecting Good Features

small λ1, small λ2

33

Today

Interesting points, correspondence. Scale and rotation invariant descriptors [Lowe]

CVPR 2003 Tutorial Recognition and Matching Based on Local Invariant Features

David Lowe Computer Science Department University of British Columbia

35

Invariant Local Features

  • Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters

SIFT Features

Advantages of invariant local features

  • Locality: features are local, so robust to occlusion and clutter (no prior segmentation)
  • Distinctiveness: individual features can be matched to a large database of objects
  • Quantity: many features can be generated for even small objects
  • Efficiency: close to real-time performance
  • Extensibility: can easily be extended to a wide range of differing feature types, with each adding robustness


37

Scale invariance

Requires a method to repeatably select points in location and scale:

  • The only reasonable scale-space kernel is a Gaussian (Koenderink, 1984; Lindeberg, 1994)
  • An efficient choice is to detect peaks in the difference of Gaussian pyramid (Burt & Adelson, 1983; Crowley & Parker, 1984 – but examining more scales)
  • Difference-of-Gaussian with constant ratio of scales is a close approximation to Lindeberg’s scale-normalized Laplacian (can be shown from the heat diffusion equation)

(Figure: repeatedly blur and subtract to form the difference-of-Gaussian pyramid.)
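One octave of the blur-and-subtract pyramid can be sketched as follows (NumPy only; sigma0 = 1.6 and s = 3 scales per octave follow Lowe's published choices, the rest is illustrative):

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur in plain NumPy (wrap-around borders,
    kernel truncated at 3 sigma; fine for a sketch)."""
    r = int(3 * sigma) + 1
    x = np.arange(-r, r + 1)
    g = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    g /= g.sum()
    out = img.astype(float)
    for axis in (0, 1):  # convolve rows, then columns
        acc = np.zeros_like(out)
        for w, d in zip(g, x):
            acc += w * np.roll(out, int(d), axis=axis)
        out = acc
    return out

def dog_octave(image, sigma0=1.6, s=3, levels=6):
    """One octave of a difference-of-Gaussian pyramid: blur with a
    constant ratio k = 2^(1/s) between successive scales, then
    subtract adjacent levels."""
    k = 2.0 ** (1.0 / s)
    blurred = [gaussian_blur(image, sigma0 * k ** i) for i in range(levels)]
    return [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]
```

The constant ratio k between scales is what makes the DoG approximate the scale-normalized Laplacian mentioned above.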

38

Scale space processed one octave at a time

39

Key point localization

  • Detect maxima and minima of difference-of-Gaussian in scale space
  • Fit a quadratic to surrounding values for sub-pixel and sub-scale interpolation (Brown & Lowe, 2002)

  • Taylor expansion around the point:

$$D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^T \mathbf{x} + \frac{1}{2} \mathbf{x}^T \frac{\partial^2 D}{\partial \mathbf{x}^2} \mathbf{x}$$

  • Offset of extremum (use finite differences for derivatives):

$$\hat{\mathbf{x}} = -\left( \frac{\partial^2 D}{\partial \mathbf{x}^2} \right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$$
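The offset formula can be sketched for a 3x3x3 DoG neighborhood (a hypothetical helper, using the finite differences the slide mentions):

```python
import numpy as np

def subpixel_offset(D):
    """Sub-pixel/sub-scale offset of a DoG extremum.

    D: 3x3x3 neighborhood of DoG values around a detected extremum
    (axes: scale, y, x). Fits the quadratic via finite differences and
    returns x_hat = -(d2D/dx2)^{-1} (dD/dx) in (scale, y, x) order.
    """
    def at(ds, dy, dx):
        return D[1 + ds, 1 + dy, 1 + dx]

    # Gradient by central differences
    g = np.array([(at(1, 0, 0) - at(-1, 0, 0)) / 2.0,
                  (at(0, 1, 0) - at(0, -1, 0)) / 2.0,
                  (at(0, 0, 1) - at(0, 0, -1)) / 2.0])

    # Hessian by second differences
    dss = at(1, 0, 0) - 2 * at(0, 0, 0) + at(-1, 0, 0)
    dyy = at(0, 1, 0) - 2 * at(0, 0, 0) + at(0, -1, 0)
    dxx = at(0, 0, 1) - 2 * at(0, 0, 0) + at(0, 0, -1)
    dsy = (at(1, 1, 0) - at(1, -1, 0) - at(-1, 1, 0) + at(-1, -1, 0)) / 4.0
    dsx = (at(1, 0, 1) - at(1, 0, -1) - at(-1, 0, 1) + at(-1, 0, -1)) / 4.0
    dyx = (at(0, 1, 1) - at(0, 1, -1) - at(0, -1, 1) + at(0, -1, -1)) / 4.0
    H = np.array([[dss, dsy, dsx],
                  [dsy, dyy, dyx],
                  [dsx, dyx, dxx]])

    return -np.linalg.solve(H, g)
```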


40

Select canonical orientation

  • Create histogram of local gradient directions computed at selected scale
  • Assign canonical orientation at peak of smoothed histogram

  • Each key specifies stable 2D coordinates (x, y, scale, orientation)
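A sketch of the orientation step (36 bins and the three-tap circular smoothing are illustrative choices, not the course's exact parameters):

```python
import numpy as np

def canonical_orientation(patch, nbins=36):
    """Histogram of local gradient directions, weighted by gradient
    magnitude; the canonical orientation is the peak of the circularly
    smoothed histogram (returned as the peak bin's center, in radians)."""
    Iy, Ix = np.gradient(patch.astype(float))
    angle = np.arctan2(Iy, Ix)                       # in [-pi, pi]
    mag = np.hypot(Ix, Iy)
    b = ((angle + np.pi) / (2 * np.pi) * nbins).astype(int) % nbins
    hist = np.bincount(b.ravel(), weights=mag.ravel(), minlength=nbins)
    # Circular smoothing with a small box filter
    smooth = (np.roll(hist, 1) + hist + np.roll(hist, -1)) / 3.0
    peak = int(np.argmax(smooth))
    return (peak + 0.5) * 2 * np.pi / nbins - np.pi
```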

41

Example of keypoint detection

Threshold on value at DoG peak and on ratio of principal curvatures (Harris approach):

(a) 233x189 image
(b) 832 DoG extrema
(c) 729 left after peak value threshold
(d) 536 left after testing ratio of principal curvatures

42

SIFT vector formation

  • Thresholded image gradients are sampled over a 16x16 array of locations in scale space
  • Create an array of orientation histograms
  • 8 orientations x 4x4 histogram array = 128 dimensions
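The 4x4-cell, 8-bin layout can be sketched as follows (a simplified stand-in for the real SIFT descriptor: it omits the Gaussian weighting, trilinear interpolation, and clipping steps):

```python
import numpy as np

def sift_like_descriptor(patch):
    """128-D descriptor from a 16x16 patch: a 4x4 grid of cells, each an
    8-bin orientation histogram weighted by gradient magnitude
    (8 orientations x 4x4 cells = 128 dimensions)."""
    assert patch.shape == (16, 16)
    Iy, Ix = np.gradient(patch.astype(float))
    ang = np.arctan2(Iy, Ix)
    mag = np.hypot(Ix, Iy)
    bins = ((ang + np.pi) / (2 * np.pi) * 8).astype(int) % 8
    desc = np.zeros((4, 4, 8))
    for cy in range(4):
        for cx in range(4):
            sl = np.s_[4 * cy:4 * cy + 4, 4 * cx:4 * cx + 4]
            desc[cy, cx] = np.bincount(bins[sl].ravel(),
                                       weights=mag[sl].ravel(), minlength=8)
    v = desc.ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v  # unit length, like the slide's I-hat
```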

43

Feature stability to noise

  • Match features after random change in image scale & orientation, with differing levels of image noise
  • Find nearest neighbor in database of 30,000 features

44

Feature stability to affine change

  • Match features after random change in image scale & orientation, with 2% image noise, and affine distortion
  • Find nearest neighbor in database of 30,000 features

45

Distinctiveness of features

  • Vary size of database of features, with 30 degree affine change, 2% image noise
  • Measure % correct for single nearest neighbor match

48

A good SIFT features tutorial

http://www.cs.toronto.edu/~jepson/csc2503/tutSIFT04.pdf By Estrada, Jepson, and Fleet.


49

An application of SIFT features in my own research…

50

The couch potato project:

Learning from looking at images.

Bill Freeman, MIT
Joint work with: Josef Sivic, Andrew Zisserman (Oxford); Bryan Russell (MIT), Alyosha Efros (CMU).
December 18, 2004

51

What can you learn about object categories by simply looking at images?

52

Labelled training databases

Labelling object classes in images is tedious, and can introduce biases.

53

(Diagram: discover topics, find words, form histograms.)

54


55

SIFT (scale invariant feature transforms)

David Lowe, IJCV 2004

56

Visual words

  • Vector quantize SIFT descriptors to a vocabulary of 2237 “visual words”.
  • Heuristic design of descriptors makes these words somewhat invariant to:
    – Lighting
    – 2-d Orientation
    – 3-d Viewpoint
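Vector quantization into visual words can be sketched with plain k-means (the slide does not state which clustering method was actually used; the function names are illustrative):

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Vector-quantize descriptors into k 'visual words' with plain
    k-means (the slides used k = 2237 on SIFT descriptors)."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)].copy()
    for _ in range(iters):
        # Assign each descriptor to its nearest word ...
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # ... then move each word to the mean of its descriptors
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def quantize(descriptors, centers):
    """Map each descriptor to the index of its nearest visual word."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)
```

After quantization, each image becomes a histogram of visual-word counts, which is the observation matrix shown a few slides later.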

57

Examples of visual words

58

More visual words

59

Polysemy: the same word with different meanings

60

Experiment E


61

Observation matrix – experiment E

(Matrix of visual word # vs. frame #; 13.8% non-zero entries.)

62

Binarized observation matrix – experiment E

(Binarized matrix of visual word # vs. frame #; 13.8% non-zero entries.)

64

Example segmentations

(Figure: example segmentations for Faces, Motorbikes, Airplanes, Cars, and Background I-III: original images, segmentations, and all detected visual words.)


(Slides 67-72: further examples for Faces, Motorbikes, Airplanes, Cars, and Background I-III.)


(Slides 73-74: further examples for Faces, Motorbikes, Airplanes, Cars, and Background I-III.)