Feature Descriptors Computer Vision Fall 2018 Columbia University - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Feature Descriptors

Computer Vision Fall 2018 Columbia University

slide-2
SLIDE 2

Tali Dekel

Tuesday, October 2, 11am, CEPSR 620 http://people.csail.mit.edu/talidekel

slide-3
SLIDE 3

Seam Carving

slide-4
SLIDE 4

Content-aware resizing Traditional resizing

Seam carving: main idea

[Avidan & Shamir, SIGGRAPH 2007]

slide-5
SLIDE 5

Seam Carving

slide-6
SLIDE 6

[Avidan & Shamir, SIGGRAPH 2007]

Seam carving: main idea

slide-7
SLIDE 7

Seam Carving

slide-8
SLIDE 8

Seam carving: algorithm

Let a vertical seam $s$ consist of $h$ positions that form an 8-connected path. Let the cost of a seam be:

$$\mathrm{Cost}(s) = \sum_{i=1}^{h} \mathrm{Energy}(f(s_i))$$

Optimal seam minimizes this cost:

$$s^* = \arg\min_{s} \mathrm{Cost}(s)$$

Compute it efficiently with dynamic programming.

[Figure: a vertical seam through positions s1, s2, s3, s4, s5; Energy(f) is the gradient magnitude of the image f.]

Slide credit: Kristen Grauman

slide-9
SLIDE 9

How to identify the minimum cost seam?

  • First, consider a greedy approach:

1 3 0
2 8 9
5 2 6

Energy matrix (gradient magnitude)

Slide credit: Kristen Grauman

slide-10
SLIDE 10

Seam carving: algorithm

  • Compute the cumulative minimum energy for all possible connected seams at each entry (i,j):

$$M(i,j) = \mathrm{Energy}(i,j) + \min\big(M(i-1,j-1),\ M(i-1,j),\ M(i-1,j+1)\big)$$

  • Then, min value in last row of M indicates end of the minimal connected vertical seam.
  • Backtrack up from there, selecting min of 3 above in M.

[Figure: Energy matrix (gradient magnitude) and M matrix, the cumulative min energy (for vertical seams); entry (i, j) in row i depends on columns j-1, j, j+1 of row i-1.]

Slide credit: Kristen Grauman
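
The recurrence and the backtracking step above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the course's reference code: the function names `cumulative_energy`, `min_vertical_seam`, and `remove_seam` are hypothetical, the energy matrix is assumed to be given as a list of rows, and ties are broken toward the leftmost column.

```python
def cumulative_energy(energy):
    """Fill M(i, j) = Energy(i, j) + min of the (up to) three entries above."""
    h, w = len(energy), len(energy[0])
    M = [list(energy[0])]  # the first row has nothing above it
    for i in range(1, h):
        row = []
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 1, w - 1)
            row.append(energy[i][j] + min(M[i - 1][lo:hi + 1]))
        M.append(row)
    return M


def min_vertical_seam(energy):
    """Return (seam, cost), where seam[i] is the seam's column in row i."""
    M = cumulative_energy(energy)
    h, w = len(M), len(M[0])
    j = min(range(w), key=lambda c: M[-1][c])  # end of the cheapest seam
    seam = [j]
    for i in range(h - 2, -1, -1):  # backtrack: min of the 3 above in M
        lo, hi = max(j - 1, 0), min(j + 1, w - 1)
        j = min(range(lo, hi + 1), key=lambda c: M[i][c])
        seam.append(j)
    seam.reverse()
    return seam, M[-1][seam[-1]]


def remove_seam(image, seam):
    """Delete one pixel per row, shrinking the width by one."""
    return [row[:j] + row[j + 1:] for row, j in zip(image, seam)]


# Small illustrative energy matrix (values chosen for the example).
energy = [[1, 3, 0],
          [2, 8, 9],
          [5, 2, 6]]
M = cumulative_energy(energy)
seam, cost = min_vertical_seam(energy)
narrower = remove_seam(energy, seam)
```

Repeating `min_vertical_seam` and `remove_seam` on the image (recomputing the energy each time) narrows it one column per iteration; horizontal seams follow by transposing the image first.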

slide-11
SLIDE 11

Example

Energy matrix (gradient magnitude):

1 3 0
2 8 9
5 2 6

M matrix (for vertical seams):

1 3 0
3 8 9
8 5 14

$$M(i,j) = \mathrm{Energy}(i,j) + \min\big(M(i-1,j-1),\ M(i-1,j),\ M(i-1,j+1)\big)$$

Slide credit: Kristen Grauman


slide-13
SLIDE 13

Real image example

Original Image Energy Map

Blue = low energy Red = high energy

Slide credit: Kristen Grauman

slide-14
SLIDE 14

Seam Carving

slide-15
SLIDE 15

Original Resized

Why did it fail?

slide-16
SLIDE 16

Original Resized

Why did it fail?

slide-17
SLIDE 17

Feature Descriptors

slide-18
SLIDE 18

Core visual understanding task: finding correspondences between images

Source: Deva Ramanan

slide-19
SLIDE 19

Example: image matching of landmarks

Correspondence + geometry estimation

Source: Deva Ramanan

slide-20
SLIDE 20

Object recognition by matching

Sparse correspondence Dense correspondence

Source: Deva Ramanan

slide-21
SLIDE 21

Example: license plate recognition

Source: Deva Ramanan

slide-22
SLIDE 22

Example: product recognition

Source: Deva Ramanan

slide-23
SLIDE 23

Motivation

Which of these patches are easier to match? Why? How can we mathematically operationalize this?

Source: Deva Ramanan

slide-24
SLIDE 24

Corner Detector: Basic Idea

“flat” region: no change in any direction
“edge”: no change along the edge direction
“corner”: significant change in all directions

Defn: points are “matchable” if small shifts always produce a large SSD error

Source: Deva Ramanan

slide-25
SLIDE 25

The math

Defn: points are “matchable” if small shifts always produce a large SSD error

$$\mathrm{cornerness}(x_0, y_0) = \min_{u,v} E_{x_0,y_0}(u, v)$$

where

$$E_{x_0,y_0}(u, v) = \sum_{(x,y)\in W(x_0,y_0)} [I(x + u, y + v) - I(x, y)]^2$$

and $W(x_0,y_0)$ is the window W around $(x_0, y_0)$.

Why can’t this be right?

Source: Deva Ramanan


slide-27
SLIDE 27

The math

Defn: points are “matchable” if small shifts always produce a large SSD error

$$\mathrm{cornerness}(x_0, y_0) = \min_{u^2 + v^2 = 1} E_{x_0,y_0}(u, v)$$

where

$$E_{x_0,y_0}(u, v) = \sum_{(x,y)\in W(x_0,y_0)} [I(x + u, y + v) - I(x, y)]^2$$

and $W(x_0,y_0)$ is the window W around $(x_0, y_0)$.

Source: Deva Ramanan

slide-28
SLIDE 28

Background: Taylor series expansion

$$f(x + u) = f(x) + \frac{\partial f(x)}{\partial x} u + \frac{1}{2} \frac{\partial^2 f(x)}{\partial x^2} u^2 + \text{Higher Order Terms}$$

[Figures: approximation of $f(x) = e^x$ at $x = 0$; $\log(x + 1)$]

Why are low-order expansions reasonable? Underlying smoothness of real-world signals

Source: Deva Ramanan

slide-29
SLIDE 29

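Multivariate Taylor series

$$I(x + u, y + v) = I(x, y) + \begin{bmatrix} \frac{\partial I(x,y)}{\partial x} & \frac{\partial I(x,y)}{\partial y} \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} + \frac{1}{2} \begin{bmatrix} u & v \end{bmatrix} \begin{bmatrix} \frac{\partial^2 I(x,y)}{\partial x^2} & \frac{\partial^2 I(x,y)}{\partial x \partial y} \\ \frac{\partial^2 I(x,y)}{\partial x \partial y} & \frac{\partial^2 I(x,y)}{\partial y^2} \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} + \text{Higher Order Terms}$$

The row vector of first derivatives is the gradient; the 2x2 matrix of second derivatives is the Hessian.

$$I(x + u, y + v) \approx I + I_x u + I_y v, \quad \text{where } I_x = \frac{\partial I(x, y)}{\partial x}$$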

Source: Deva Ramanan

slide-30
SLIDE 30

Feature detection: the math

Consider shifting the window W by (u,v)

  • how do the pixels in W change?
  • compare each pixel before and after by summing up the squared differences
  • this defines an “error” of E(u,v):

$$E(u, v) = \sum_{(x,y)\in W} [I(x + u, y + v) - I(x, y)]^2$$

$$\approx \sum_{(x,y)\in W} [I + I_x u + I_y v - I]^2 = \sum_{(x,y)\in W} [I_x^2 u^2 + I_y^2 v^2 + 2 I_x I_y u v]$$

$$= \begin{bmatrix} u & v \end{bmatrix} A \begin{bmatrix} u \\ v \end{bmatrix}, \qquad A = \sum_{(x,y)\in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_y I_x & I_y^2 \end{bmatrix}$$

Source: Deva Ramanan
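
As a numerical sanity check of the derivation above, the sketch below builds A for one window and compares the exact SSD error with the quadratic form. Everything here is an illustrative assumption: the helper names are hypothetical, derivatives use central differences, the window is a square of radius `r`, and the test image is a synthetic linear ramp (for which the first-order expansion is exact, so the two values agree).

```python
def gradients(img):
    """Central-difference I_x, I_y on interior pixels (borders left at 0)."""
    h, w = len(img), len(img[0])
    Ix = [[0.0] * w for _ in range(h)]
    Iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            Ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            Iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return Ix, Iy


def second_moment(Ix, Iy, x0, y0, r):
    """Entries (a, b, c) of A = [[a, b], [b, c]] summed over the window W(x0, y0)."""
    a = b = c = 0.0
    for y in range(y0 - r, y0 + r + 1):
        for x in range(x0 - r, x0 + r + 1):
            a += Ix[y][x] ** 2
            b += Ix[y][x] * Iy[y][x]
            c += Iy[y][x] ** 2
    return a, b, c


def ssd_error(img, x0, y0, r, u, v):
    """Exact E(u, v) for an integer shift (u, v) of the window."""
    return sum((img[y + v][x + u] - img[y][x]) ** 2
               for y in range(y0 - r, y0 + r + 1)
               for x in range(x0 - r, x0 + r + 1))


def quadratic_error(A, u, v):
    """[u v] A [u v]^T for A given as (a, b, c)."""
    a, b, c = A
    return a * u * u + 2 * b * u * v + c * v * v


# Linear ramp I(x, y) = 2x + 3y, so I_x = 2 and I_y = 3 everywhere.
img = [[2.0 * x + 3.0 * y for x in range(9)] for y in range(9)]
Ix, Iy = gradients(img)
A = second_moment(Ix, Iy, x0=4, y0=4, r=1)
exact = ssd_error(img, 4, 4, 1, u=1, v=1)
approx = quadratic_error(A, 1, 1)
```

On real images the two values differ by the higher-order terms, which is why the approximation is only trusted for small shifts.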
slide-31
SLIDE 31

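Interpreting the second moment matrix

The surface E(u,v) is locally approximated by a quadratic form. Let’s try to understand its shape.

$$E(u, v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}$$

$$M = \sum_{x,y} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$

Here M is the matrix A from the previous slide, with the sum over $(x,y)\in W$ written using window weights $w(x, y)$.

James Hays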

slide-32
SLIDE 32

Interpreting the second moment matrix

Consider a horizontal “slice” of E(u, v):

$$\begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix} = \text{const}$$

This is the equation of an ellipse.

James Hays

slide-33
SLIDE 33

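Interpreting the second moment matrix

Consider a horizontal “slice” of E(u, v):

$$\begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix} = \text{const}$$

This is the equation of an ellipse.

Diagonalization of M:

$$M = R^{-1} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} R$$

  • The axis lengths of the ellipse are determined by the eigenvalues, and the orientation is determined by a rotation matrix $R$.

[Figure: ellipse with semi-axis $(\lambda_{\max})^{-1/2}$ along the direction of the fastest change and semi-axis $(\lambda_{\min})^{-1/2}$ along the direction of the slowest change.]

James Hays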

slide-34
SLIDE 34

Classification of image points using eigenvalues of M

  • “Corner”: λ1 and λ2 are large, λ1 ~ λ2; E increases in all directions
  • “Edge”: λ1 >> λ2
  • “Edge”: λ2 >> λ1
  • “Flat” region: λ1 and λ2 are small; E is almost constant in all directions

[Figure: the four regions drawn in the (λ1, λ2) plane.]

Source: Deva Ramanan

slide-35
SLIDE 35

Back to corner(ness)

Defn: points are “matchable” if small shifts always produce a large SSD error

$$\mathrm{Corner}(x_0, y_0) = \min_{u^2 + v^2 = 1} E(u, v)$$

where

$$E(u, v) = \begin{bmatrix} u & v \end{bmatrix} A \begin{bmatrix} u \\ v \end{bmatrix}, \qquad A = \sum_{(x,y)\in W(x_0,y_0)} \begin{bmatrix} I_x^2 & I_x I_y \\ I_y I_x & I_y^2 \end{bmatrix}$$

  • Solution is given by minimum eigenvalue

Implies $(x_0, y_0)$ is a good corner if minimum eigenvalue is large

(or alternatively, if both eigenvalues of ‘A’ are large)

Source: Deva Ramanan
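
The claim that the constrained minimum equals the smallest eigenvalue can be checked numerically. The sketch below is only an illustration: the function names and the example matrix are hypothetical, and the 2x2 symmetric matrix A = [[a, b], [b, c]] is passed as three numbers. It compares a brute-force search over the unit circle with the closed-form eigenvalues.

```python
import math


def eigenvalues_2x2(a, b, c):
    """Eigenvalues (lam_min, lam_max) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius, mean + radius


def min_on_unit_circle(a, b, c, steps=3600):
    """Brute-force min of E(u, v) = a u^2 + 2 b u v + c v^2 over u^2 + v^2 = 1."""
    best = float("inf")
    for k in range(steps):
        t = 2.0 * math.pi * k / steps
        u, v = math.cos(t), math.sin(t)
        best = min(best, a * u * u + 2 * b * u * v + c * v * v)
    return best


# Example matrix with trace 7 and determinant 6, i.e. eigenvalues 1 and 6.
lam_min, lam_max = eigenvalues_2x2(5.0, 2.0, 2.0)
brute = min_on_unit_circle(5.0, 2.0, 2.0)
```

The sampled minimum approaches `lam_min` as `steps` grows; the closed form avoids the search entirely.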

slide-36
SLIDE 36

Efficient computation

Computing eigenvalues (and eigenvectors) is expensive. Turns out that it’s easy to compute their sum (trace) and product (determinant):

– Det(A) = λmin λmax
– Trace(A) = λmin + λmax (trace = sum of diagonal entries)

$$R = \frac{4\,\mathrm{Det}(A)}{\mathrm{Trace}(A)^2}$$

(depends only on the ratio of the eigenvalues and is 1 if they are equal)

$$R = \mathrm{Det}(A) - \alpha\,\mathrm{Trace}(A)^2$$

(also favors large eigenvalues)

Source: Deva Ramanan
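
Both scores can be computed from the entries of A alone. The following is a small sketch, with the hypothetical name `corner_scores` and an assumed default of `alpha = 0.05` (the slide does not give a value; small constants of this order are a common choice).

```python
def corner_scores(a, b, c, alpha=0.05):
    """Both scores for A = [[a, b], [b, c]], without computing eigenvalues."""
    det = a * c - b * b          # equals lam_min * lam_max
    trace = a + c                # equals lam_min + lam_max
    ratio_score = 4.0 * det / (trace * trace) if trace else 0.0
    harris_score = det - alpha * trace * trace
    return ratio_score, harris_score


corner = corner_scores(10.0, 0.0, 10.0)   # two large, equal eigenvalues
edge = corner_scores(10.0, 0.0, 0.0)      # one large eigenvalue, one zero
flat = corner_scores(0.1, 0.0, 0.1)       # two small, equal eigenvalues
```

The ratio score is close to 1 for both the corner and the flat example, so it cannot separate them on its own; the second score is large only when both eigenvalues are large, and it is negative for the edge.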

slide-37
SLIDE 37

Harris Corner Detector [Harris88]

We want to compute M at each pixel.

  • 0. Input image $I$
  • 1. Compute image derivatives $I_x$, $I_y$ (optionally, blur first).
  • 2. Compute M components as squares of derivatives: $I_x^2$, $I_y^2$, $I_{xy} = I_x \circ I_y$
  • 3. Gaussian filter g() with width σ: $g(I_x^2)$, $g(I_y^2)$, $g(I_x \circ I_y)$
  • 4. Compute cornerness $C$ (shown as the response map $R$):

$$C = \det(M) - \alpha\,\mathrm{trace}(M)^2 = g(I_x^2) \circ g(I_y^2) - g(I_x \circ I_y)^2 - \alpha\,[g(I_x^2) + g(I_y^2)]^2$$

  • 5. Threshold on $C$ to pick high cornerness
  • 6. Non-maxima suppression to pick peaks.

James Hays
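
Steps 0 to 6 above can be put together in a short dependency-free sketch. This is an illustration under assumptions, not the detector from [Harris88] as published: `harris_corners` is a hypothetical name, derivatives are central differences, a 3x3 binomial kernel stands in for the Gaussian g(), and `alpha` and `threshold` are assumed values.

```python
def harris_corners(img, alpha=0.05, threshold=1e-3):
    """Return (C, corners) for a grayscale image given as a list of rows."""
    h, w = len(img), len(img[0])

    def zeros():
        return [[0.0] * w for _ in range(h)]

    # 1. Image derivatives by central differences (borders stay 0).
    Ix, Iy = zeros(), zeros()
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            Ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            Iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0

    # 2. Components of M as products of derivatives.
    Ixx = [[a * a for a in row] for row in Ix]
    Iyy = [[b * b for b in row] for row in Iy]
    Ixy = [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(Ix, Iy)]

    # 3. Smooth each component; a 3x3 binomial kernel approximates g().
    k = [1.0, 2.0, 1.0]

    def smooth(src):
        out = zeros()
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                out[y][x] = sum(k[dy + 1] * k[dx + 1] * src[y + dy][x + dx]
                                for dy in (-1, 0, 1)
                                for dx in (-1, 0, 1)) / 16.0
        return out

    gxx, gyy, gxy = smooth(Ixx), smooth(Iyy), smooth(Ixy)

    # 4. Cornerness C = det(M) - alpha * trace(M)^2 at every pixel.
    C = zeros()
    for y in range(h):
        for x in range(w):
            det = gxx[y][x] * gyy[y][x] - gxy[y][x] ** 2
            trace = gxx[y][x] + gyy[y][x]
            C[y][x] = det - alpha * trace * trace

    # 5 and 6. Threshold, then keep strict maxima of each 3x3 neighborhood.
    corners = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if C[y][x] > threshold and all(
                    C[y][x] > C[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1) if dy or dx):
                corners.append((y, x))
    return C, corners


# Synthetic test image: a bright 8x8 square on a dark 16x16 background.
img = [[1.0 if 4 <= y <= 11 and 4 <= x <= 11 else 0.0 for x in range(16)]
       for y in range(16)]
C, corners = harris_corners(img)
```

On this synthetic square the response is positive near the four corners, negative along the straight edges, and zero in flat regions, so only the corner pixels survive thresholding and non-maxima suppression.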

slide-38
SLIDE 38

Harris Detector: Steps

Source: Deva Ramanan

slide-39
SLIDE 39

Harris Detector: Steps

Compute corner response $C$

Source: Deva Ramanan

slide-40
SLIDE 40

Harris Detector: Steps

Find points with large corner response: $C$ > threshold

Source: Deva Ramanan

slide-41
SLIDE 41

Harris Detector: Steps

Take only the points of local maxima of $C$

Source: Deva Ramanan

slide-42
SLIDE 42

Harris Detector: Steps

Source: Deva Ramanan

slide-43
SLIDE 43

Scale and rotation invariance

Will interest point detector still fire on rotated & scaled images?

Source: Deva Ramanan

slide-44
SLIDE 44

Rotation invariance (?)

Are eigenvectors stable under rotations? No
Are eigenvalues stable under rotations? Yes

Source: Deva Ramanan

slide-45
SLIDE 45

Image rotation

Second moment ellipse rotates but its shape (i.e., eigenvalues) remains the same. Corner location is covariant w.r.t. rotation

James Hays
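
A small numerical check of this statement, as a sketch only: rotating the image rotates every gradient, so the second moment matrix becomes R A Rᵀ. The function names and the example matrix below are hypothetical; A = [[a, b], [b, c]] is passed as three numbers.

```python
import math


def eigenvalues_2x2(a, b, c):
    """Eigenvalues (lam_min, lam_max) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius, mean + radius


def principal_axis_angle(a, b, c):
    """Angle (radians) of the eigenvector belonging to the larger eigenvalue."""
    return 0.5 * math.atan2(2.0 * b, a - c)


def rotate_second_moment(a, b, c, theta):
    """Entries of R A R^T, where R rotates gradients by theta."""
    ct, st = math.cos(theta), math.sin(theta)
    # A gradient (gx, gy) maps to (ct*gx - st*gy, st*gx + ct*gy); summing the
    # outer products of the rotated gradients gives the entries below.
    a2 = ct * ct * a - 2 * ct * st * b + st * st * c
    b2 = ct * st * (a - c) + (ct * ct - st * st) * b
    c2 = st * st * a + 2 * ct * st * b + ct * ct * c
    return a2, b2, c2


theta = math.radians(30.0)
before = eigenvalues_2x2(6.0, 0.0, 1.0)
rotated = rotate_second_moment(6.0, 0.0, 1.0, theta)
after = eigenvalues_2x2(*rotated)
axis_before = principal_axis_angle(6.0, 0.0, 1.0)
axis_after = principal_axis_angle(*rotated)
```

The eigenvalues (the ellipse shape) are unchanged, while the principal axis turns by exactly the rotation angle.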

slide-46
SLIDE 46

Scale invariance?

Are eigenvectors stable under scalings? Yes
Are eigenvalues stable under scalings? No

Source: Deva Ramanan

slide-47
SLIDE 47

Scaling

[Figure: a curve that is detected as a corner at the original scale; after magnification, all points along it will be classified as edges.]

Corner location is not covariant to scaling!

James Hays

slide-48
SLIDE 48

Automatic Scale Selection

  • How to find patch sizes at which f response is equal?

$$f(I_{i_1 \ldots i_m}(x, \sigma)) = f(I_{i_1 \ldots i_m}(x', \sigma'))$$

What is a good f ?

  • K. Grauman, B. Leibe

slide-49
SLIDE 49

Automatic Scale Selection

  • Function responses for increasing scale (scale signature)

[Figure: response of some function f, $f(I_{i_1 \ldots i_m}(x, \sigma))$, plotted against scale for a point in each of two images.]

  • K. Grauman, B. Leibe

slide-50
SLIDE 50

What Is A Useful Signature Function f ?

  • “Blob” detector is common for corners

– Laplacian (2nd derivative) of Gaussian (LoG)

[Figure: image blob size, scale space, function response]

  • K. Grauman, B. Leibe

slide-51
SLIDE 51

Find local maxima in position-scale space

[Figure: filter responses stacked over increasing scales; find maxima to obtain a list of (x, y, s).]

  • K. Grauman, B. Leibe

slide-52
SLIDE 52

Alternative approach

Approximate LoG with Difference-of-Gaussian (DoG).

  • 1. Blur image with σ Gaussian kernel
  • 2. Blur image with kσ Gaussian kernel
  • 3. Subtract 2. from 1.

  • K. Grauman, B. Leibe
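
The three steps can be sketched without external libraries. The helper names, the kernel truncation at 3σ, the edge replication at the borders, and the default `k = 1.6` are all assumptions made for this illustration; the slide fixes none of them.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian, truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    kern = [math.exp(-(i * i) / (2.0 * sigma * sigma))
            for i in range(-radius, radius + 1)]
    total = sum(kern)
    return [v / total for v in kern]


def blur(img, sigma):
    """Separable Gaussian blur with edge replication."""
    kern = gaussian_kernel(sigma)
    r = len(kern) // 2
    h, w = len(img), len(img[0])
    # Horizontal pass, then vertical pass.
    tmp = [[sum(kern[i + r] * row[min(max(x + i, 0), w - 1)]
                for i in range(-r, r + 1))
            for x in range(w)] for row in img]
    return [[sum(kern[i + r] * tmp[min(max(y + i, 0), h - 1)][x]
                 for i in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


def difference_of_gaussians(img, sigma, k=1.6):
    """Blur with sigma, blur with k*sigma, then subtract the second from the first."""
    narrow = blur(img, sigma)
    wide = blur(img, k * sigma)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(narrow, wide)]


# A single bright pixel acts as a tiny blob in the middle of a dark image.
spot = [[0.0] * 21 for _ in range(21)]
spot[10][10] = 1.0
dog = difference_of_gaussians(spot, sigma=1.0)
```

With this sign convention a bright blob gives a positive peak at its centre, and a constant image gives a response of zero everywhere.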
slide-53
SLIDE 53

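Find local maxima in position-scale space of DoG

[Figure: input image blurred at a stack of scales (levels labeled k, 2k, …); neighboring levels are subtracted to give DoG layers; find maxima to obtain a list of (x, y, s).]

  • K. Grauman, B. Leibe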

slide-54
SLIDE 54

Results: Difference-of-Gaussian

  • Larger circles = larger scale
  • Descriptors with maximal scale response
  • K. Grauman, B. Leibe
slide-55
SLIDE 55

Core visual understanding task: finding correspondences between images

Source: Deva Ramanan

slide-56
SLIDE 56

Distinctive Image Features from Scale-Invariant Keypoints

David G. Lowe Computer Science Department University of British Columbia Vancouver, B.C., Canada lowe@cs.ubc.ca January 5, 2004

IJCV 04

48,547 citations!

SIFT

Scale Invariant Feature Transform

slide-57
SLIDE 57

Distinctive Image Features from Scale-Invariant Keypoints

David G. Lowe Computer Science Department University of British Columbia Vancouver, B.C., Canada lowe@cs.ubc.ca January 5, 2004

IJCV 04

48,547 citations!

SIFT

Scale Invariant Feature Transform 48,563 citations!

slide-58
SLIDE 58

Coordinate frames

Represent each patch in a canonical scale and orientation (or general affine coordinate frame)

Source: Deva Ramanan

slide-59
SLIDE 59

Find dominant orientation

Compute gradients for all pixels in patch. Histogram (bin) gradients by orientation

Source: Deva Ramanan
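
A minimal sketch of this step follows. The name `dominant_orientation`, the central-difference gradients, the 36 bins of 10 degrees, and the weighting of each vote by gradient magnitude are assumptions for the illustration; the slide only states that gradients are binned by orientation.

```python
import math


def dominant_orientation(patch, n_bins=36):
    """Histogram gradient orientations; return (hist, peak bin centre in degrees)."""
    h, w = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    bin_width = 360.0 / n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue  # flat pixels cast no vote
            angle = math.degrees(math.atan2(gy, gx)) % 360.0
            hist[int(angle // bin_width) % n_bins] += mag
    peak = max(range(n_bins), key=lambda b: hist[b])
    return hist, (peak + 0.5) * bin_width


# Intensity ramps whose gradients point along 45 and 135 degrees.
ramp_a = [[x + y for x in range(8)] for y in range(8)]
ramp_b = [[y - x for x in range(8)] for y in range(8)]
_, angle_a = dominant_orientation(ramp_a)
_, angle_b = dominant_orientation(ramp_b)
```

Rotating the patch so that the returned angle maps to a fixed direction gives the canonical orientation used before computing a descriptor.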

slide-60
SLIDE 60

Appearance descriptors

Represent each patch in a canonical scale and orientation (or general affine coordinate frame)

Source: Deva Ramanan

slide-61
SLIDE 61

Computing the SIFT Descriptor

Histograms of gradient directions over spatial regions


Source: Deva Ramanan

slide-62
SLIDE 62

Post-processing

  • 1. Rescale 128-dim vector to have unit norm: $x = x / \|x\|,\ x \in \mathbb{R}^{128}$ (“invariant to linear scalings of intensity”)
  • 2. Clip high values: $x := \min(x, 0.2)$, then $x := x / \|x\|$ (approximate binarization allows for flat patches with small gradients to remain stable)

Source: Deva Ramanan
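
The two post-processing steps translate directly into code. This is a sketch with hypothetical function names; the clip value 0.2 comes from the slide, and the example vector is made up.

```python
import math


def l2_normalize(x):
    """Scale x to unit Euclidean norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else list(x)


def sift_postprocess(x, clip=0.2):
    x = l2_normalize(x)               # 1. rescale to unit norm
    x = [min(v, clip) for v in x]     # 2. clip high values ...
    return l2_normalize(x)            #    ... and renormalize


# Made-up 128-dim descriptor with one dominant entry.
raw = [10.0, 1.0, 1.0, 1.0] + [0.0] * 124
desc = sift_postprocess(raw)
desc_brighter = sift_postprocess([3.0 * v for v in raw])
```

Multiplying the raw descriptor by a constant leaves the result unchanged, and the clipping step reduces the weight of the single dominant entry relative to the others.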

slide-63
SLIDE 63

Evaluation

Historic problem in computer vision: “wide-baseline matching”

Source: Deva Ramanan

slide-64
SLIDE 64

SIFT

[Figure: correct nearest descriptor (%) versus width n of descriptor (angle 50 deg, noise 4%), with curves for 16, 8, and 4 orientations.]

This graph shows the percent of keypoints giving the correct match to a data

What made this work? Exhaustive evaluation of hyper-parameters on annotated dataset

Source: Deva Ramanan

slide-65
SLIDE 65

Properties of SIFT

Extraordinarily robust matching technique

  • Can handle changes in viewpoint

– Up to about 60 degree out of plane rotation

  • Can handle significant changes in illumination

– Sometimes even day vs. night (below)

  • Fast and efficient—can run in real time
  • Lots of code available

– http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT

slide-66
SLIDE 66

Dense sampling

  • So far: Descriptors of patches centered at sparse interest points
  • But we can use the descriptors at any point
  • Common case:

– Regularly sampled grid of points
– Dense SIFT (or LBP, or…)

[Figure: 128-dim SIFT feature; visual words from clusters in 128-dim space]

Source: Deva Ramanan

slide-67
SLIDE 67

HOG

Compute SIFT descriptors on a grid equal to size of individual “cell”.

In practice, re-optimize hyper-parameters (2x2 grid of cells, with each cell of 8x8 pixels).

Source: Deva Ramanan

slide-68
SLIDE 68

Common visualization

  • Source: Deva Ramanan
slide-69
SLIDE 69

Alternative global descriptor: Gist

  • 1. Compute frequency energy (magnitude) at each spatial (x,y) location with Gabor filters
  • 2. Average energy over 4x4 spatial grids

8 orientations x 4 scales x 16 spatial bins = 512 dimensions

Oliva and Torralba, 2001

Source: Deva Ramanan

slide-70
SLIDE 70

Chair

slide-71
SLIDE 71

Car

slide-72
SLIDE 72

Aeroplane

slide-73
SLIDE 73

Car

slide-74
SLIDE 74

Car

slide-75
SLIDE 75

Image

What information does HOG have?

HOG

slide-76
SLIDE 76

Image

What information does HOG have?

HOG Nearest Neighbors

slide-77
SLIDE 77

Image

What information does HOG have?

HOG Nearest Neighbors

slide-78
SLIDE 78

Image

What information does HOG have?

HOG Nearest Neighbors

slide-79
SLIDE 79

Image

What information does HOG have?

HOG Nearest Neighbors

slide-80
SLIDE 80

What information is lost?

slide-81
SLIDE 81

What information is lost?

$$\min_{x \in \mathbb{R}^d} \|\phi(x) - y\|_2^2$$

slide-82
SLIDE 82

Human Vision vs HOG Vision

slide-83
SLIDE 83

The HOGgles Challenge

Clap your hands when you see a person

slide-84
SLIDE 84
slide-85
SLIDE 85
slide-86
SLIDE 86
slide-87
SLIDE 87
slide-88
SLIDE 88
slide-89
SLIDE 89
slide-90
SLIDE 90
slide-91
SLIDE 91
slide-92
SLIDE 92
slide-93
SLIDE 93
slide-94
SLIDE 94

The HOGgles Challenge

slide-95
SLIDE 95

Chair Detections

slide-96
SLIDE 96

Chair Detections

slide-97
SLIDE 97

Car Detections

slide-98
SLIDE 98

Car Detections

slide-99
SLIDE 99
  • Car

Why did the detector fail?