Feature Descriptors
Computer Vision Fall 2018 Columbia University
Tali Dekel
Tuesday, October 2, 11am, CEPSR 620 http://people.csail.mit.edu/talidekel
Seam Carving
Seam carving: main idea
Content-aware resizing vs. traditional resizing
[Avidan & Shamir, SIGGRAPH 2007]
Let a vertical seam s consist of h positions that form an 8-connected path.
Let the cost of a seam be:
$$\mathrm{Cost}(s) = \sum_{i=1}^{h} \mathrm{Energy}(f(s_i))$$
Optimal seam minimizes this cost:
$$s^* = \arg\min_{s} \mathrm{Cost}(s)$$
Compute it efficiently with dynamic programming.
Energy(f) is the gradient magnitude of the image f.
[Figure: a vertical seam through positions s1, s2, s3, s4, s5]
Slide credit: Kristen Grauman
Energy matrix (gradient magnitude)
Compute the cumulative minimum energy over all connected seams at each entry (i,j):
$$M(i,j) = \mathrm{Energy}(i,j) + \min\big(M(i-1,j-1),\; M(i-1,j),\; M(i-1,j+1)\big)$$
M matrix: cumulative min energy (for vertical seams). The minimum value in the last row of M marks the end of the minimal connected vertical seam.
[Figure: Energy matrix (gradient magnitude) next to the M matrix (for vertical seams); entry (i,j) in row i depends on columns j-1, j, j+1 of row i-1]
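The recurrence for M can be sketched in a few lines of plain Python. This is a minimal illustration, not the lecture's code: the function name `find_vertical_seam` and the toy energy matrix are assumptions made for the example, and the backtracking step (walking up from the smallest entry of the last row) is the usual way to recover the seam from M.

```python
def find_vertical_seam(energy):
    """Return (seam, cost), where seam[i] is the column chosen in row i."""
    h, w = len(energy), len(energy[0])
    # M[i][j] = minimum cumulative energy of any connected seam ending at (i, j).
    M = [row[:] for row in energy]
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 1, w - 1)
            M[i][j] = energy[i][j] + min(M[i - 1][lo:hi + 1])
    # The smallest entry in the last row ends the optimal seam.
    j = min(range(w), key=lambda c: M[h - 1][c])
    cost = M[h - 1][j]
    seam = [j]
    # Backtrack upward, picking the cheapest of the (up to) three parents.
    for i in range(h - 1, 0, -1):
        lo, hi = max(j - 1, 0), min(j + 1, w - 1)
        j = min(range(lo, hi + 1), key=lambda c: M[i - 1][c])
        seam.append(j)
    seam.reverse()
    return seam, cost


# Toy energy matrix (hypothetical values, 3 rows x 4 columns).
energy = [
    [1, 4, 3, 5],
    [3, 2, 5, 2],
    [5, 2, 4, 2],
]
seam, cost = find_vertical_seam(energy)
print(seam, cost)
```

Removing the returned column from each row shrinks the image width by one pixel; repeating the process gives content-aware resizing.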
[Figure: original image and its energy map; blue = low energy, red = high energy]
[Figure: original vs. resized images]
Source: Deva Ramanan
Correspondence + geometry estimation
Sparse correspondence vs. dense correspondence
Which of these patches are easier to match? Why? How can we mathematically operationalize this?
Corner Detector: Basic Idea
“flat” region: no change in any direction
“edge”: no change along the edge direction
“corner”: significant change in all directions
Defn: points are “matchable” if small shifts always produce a large SSD error
$$E_{x_0,y_0}(u,v) = \sum_{(x,y) \in W(x_0,y_0)} [I(x+u,\, y+v) - I(x,y)]^2$$
where W(x_0, y_0) is the window of pixels around (x_0, y_0)
$$\mathrm{cornerness}(x_0,y_0) = \min_{u,v} E_{x_0,y_0}(u,v)$$
Why can’t this be right? The minimum is always 0, attained at the zero shift (u, v) = (0, 0). Restrict to unit-length shifts instead:
$$\mathrm{cornerness}(x_0,y_0) = \min_{u^2+v^2=1} E_{x_0,y_0}(u,v)$$
$$f(x+u) = f(x) + \frac{\partial f(x)}{\partial x}\,u + \frac{1}{2}\,\frac{\partial^2 f(x)}{\partial x^2}\,u^2 + \text{Higher Order Terms}$$
[Figure: approximation of f(x) = e^x at x = 0; a second panel shows log(x + 1)]
Why are low-order expansions reasonable? Underlying smoothness of real-world signals
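To see how quickly low-order expansions become accurate for small shifts, the short sketch below compares first- and second-order Taylor polynomials of e^x around x = 0 with the true value. The helper name `taylor_exp` and the chosen shifts are illustrative assumptions.

```python
import math


def taylor_exp(u, order):
    """Taylor polynomial of e^x around x = 0, truncated after the u**order term."""
    return sum(u ** k / math.factorial(k) for k in range(order + 1))


for u in (0.1, 0.5, 1.0):
    err1 = abs(math.exp(u) - taylor_exp(u, 1))
    err2 = abs(math.exp(u) - taylor_exp(u, 2))
    print(f"u={u}: first-order error {err1:.5f}, second-order error {err2:.5f}")
```

The error shrinks rapidly as the shift u gets smaller, which is why a first-order expansion is reasonable for small window shifts.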
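$$I(x+u,\, y+v) = I(x,y) + \begin{bmatrix} \frac{\partial I(x,y)}{\partial x} & \frac{\partial I(x,y)}{\partial y} \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} + \frac{1}{2} \begin{bmatrix} u & v \end{bmatrix} \begin{bmatrix} \frac{\partial^2 I(x,y)}{\partial x^2} & \frac{\partial^2 I(x,y)}{\partial x \partial y} \\ \frac{\partial^2 I(x,y)}{\partial x \partial y} & \frac{\partial^2 I(x,y)}{\partial y^2} \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} + \text{Higher Order Terms}$$
The row vector is the gradient; the 2x2 matrix is the Hessian.
$$I(x+u,\, y+v) \approx I + I_x u + I_y v$$
where
$$I_x = \frac{\partial I(x,y)}{\partial x}, \qquad I_y = \frac{\partial I(x,y)}{\partial y}$$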
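Consider shifting the window W by (u,v) and summing up the squared differences:
$$E(u,v) = \sum_{(x,y) \in W} [I(x+u,\, y+v) - I(x,y)]^2$$
$$\approx \sum_{(x,y) \in W} [I + I_x u + I_y v - I]^2 = \sum_{(x,y) \in W} [I_x^2 u^2 + I_y^2 v^2 + 2 I_x I_y u v]$$
$$= \begin{bmatrix} u & v \end{bmatrix} A \begin{bmatrix} u \\ v \end{bmatrix}, \qquad A = \sum_{(x,y) \in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_y I_x & I_y^2 \end{bmatrix}$$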
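The quadratic approximation can be checked numerically. The sketch below is an illustration under stated assumptions: the "image" is a hypothetical smooth function (so sub-pixel shifts are well defined), derivatives are taken by central differences, and W is a 3x3 window around the origin. It accumulates the three distinct entries of A and compares the exact SSD with the quadratic form.

```python
def image(x, y):
    # Hypothetical smooth "image" so that sub-pixel shifts are well defined.
    return x * x + 3.0 * y


def gradient(x, y, eps=1e-4):
    # Central differences approximate I_x and I_y.
    ix = (image(x + eps, y) - image(x - eps, y)) / (2 * eps)
    iy = (image(x, y + eps) - image(x, y - eps)) / (2 * eps)
    return ix, iy


# 3x3 window W around the origin.
window = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]


def second_moment_matrix(window):
    """Return (a11, a12, a22), the distinct entries of the symmetric matrix A."""
    a11 = a12 = a22 = 0.0
    for x, y in window:
        ix, iy = gradient(x, y)
        a11 += ix * ix
        a12 += ix * iy
        a22 += iy * iy
    return a11, a12, a22


def ssd(u, v):
    """Exact sum of squared differences E(u, v) over the window."""
    return sum((image(x + u, y + v) - image(x, y)) ** 2 for x, y in window)


def quadratic_form(u, v, A):
    a11, a12, a22 = A
    return a11 * u * u + 2 * a12 * u * v + a22 * v * v


A = second_moment_matrix(window)
u, v = 0.1, 0.05
exact, approx = ssd(u, v), quadratic_form(u, v, A)
print(A, exact, approx)
```

For this toy image the two values agree to within a few percent at (u, v) = (0.1, 0.05), and the agreement tightens as the shift shrinks.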
The surface E(u,v) is locally approximated by a quadratic form. Let’s try to understand its shape.
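$$M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$
M is the same second-moment matrix as A, written with explicit window weights w(x,y).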
James Hays
Consider a horizontal “slice” of E(u, v):
$$\begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix} = \text{const}$$
This is the equation of an ellipse.
Diagonalization of M:
$$M = R^{-1} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} R$$
The axis lengths of the ellipse are determined by the eigenvalues, and the orientation is determined by a rotation matrix R.
[Figure: ellipse with semi-axis (λ_max)^{-1/2} along the direction of the fastest change and semi-axis (λ_min)^{-1/2} along the direction of the slowest change]
Classification of image points using eigenvalues of M:
“Corner”: λ1 and λ2 are large, λ1 ~ λ2; E increases in all directions
“Edge”: λ1 >> λ2 or λ2 >> λ1
“Flat” region: λ1 and λ2 are small; E is almost constant in all directions
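For a symmetric 2x2 matrix the eigenvalues have a closed form, so this classification is easy to sketch. The thresholds `small` and `ratio` below are illustrative assumptions, not values from the slides, and the function names are hypothetical.

```python
import math


def eigenvalues_2x2(a, b, c):
    """Eigenvalues (lam_max, lam_min) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean + radius, mean - radius


def classify(a, b, c, small=1.0, ratio=10.0):
    # `small` and `ratio` are illustrative thresholds, not values from the slides.
    lam_max, lam_min = eigenvalues_2x2(a, b, c)
    if lam_max < small:
        return "flat"    # both eigenvalues are small
    if lam_max > ratio * lam_min:
        return "edge"    # one eigenvalue dominates the other
    return "corner"      # both eigenvalues are large and comparable


print(classify(0.1, 0.0, 0.2))
print(classify(50.0, 0.0, 0.5))
print(classify(40.0, 5.0, 45.0))
```

In practice the thresholds would be tuned to the image scale and noise level.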
$$\mathrm{Corner}(x_0,y_0) = \min_{u^2+v^2=1} E(u,v)$$
where
$$E(u,v) = \begin{bmatrix} u & v \end{bmatrix} A \begin{bmatrix} u \\ v \end{bmatrix}, \qquad A = \sum_{(x,y) \in W(x_0,y_0)} \begin{bmatrix} I_x^2 & I_x I_y \\ I_y I_x & I_y^2 \end{bmatrix}$$
Implies (x_0, y_0) is a good corner if the minimum eigenvalue of A is large
(or alternatively, if both eigenvalues of A are large)
Computing eigenvalues (and eigenvectors) is expensive
Turns out that it’s easy to compute their sum (trace) and product (determinant)
– Det(A) = λ_min λ_max
– Trace(A) = λ_min + λ_max (trace = sum of diagonal entries)
$$R = \frac{4\,\mathrm{Det}(A)}{\mathrm{Trace}(A)^2}$$
(depends only on the ratio of the eigenvalues and is 1 if they are equal)
$$R = \mathrm{Det}(A) - \alpha\,\mathrm{Trace}(A)^2$$
(also favors large eigenvalues)
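Both scores need only the determinant and trace of A, as the following sketch shows. The function name `harris_scores` and the default `alpha=0.05` are assumptions for illustration; the slides do not fix a value for α.

```python
def harris_scores(a, b, c, alpha=0.05):
    """Scores for the symmetric matrix A = [[a, b], [b, c]].

    Returns (ratio_score, harris_score) without any eigen-decomposition.
    `alpha` is an illustrative default, not a value taken from the slides.
    """
    det = a * c - b * b
    trace = a + c
    ratio_score = 4.0 * det / (trace * trace) if trace else 0.0
    harris_score = det - alpha * trace * trace
    return ratio_score, harris_score


print(harris_scores(3.0, 0.0, 3.0))    # equal eigenvalues
print(harris_scores(40.0, 5.0, 45.0))  # corner-like matrix
print(harris_scores(50.0, 0.0, 0.5))   # edge-like matrix
```

With this choice of α, the corner-like matrix gets a positive second score and the edge-like matrix a negative one.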
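We want to compute M at each pixel. Its components are built from the image derivatives I_x and I_y as squares of derivatives: (I_x^2), (I_y^2), and (I_x ∘ I_y), each accumulated over the window.
[Figure: image I, derivative images I_x and I_y, and the product images I_x^2, I_y^2, I_x ∘ I_y]
$$C = \det(M) - \alpha\,\mathrm{trace}(M)^2 = g(I_x^2) \circ g(I_y^2) - [g(I_x \circ I_y)]^2 - \alpha\,[g(I_x^2) + g(I_y^2)]^2$$
Here ∘ is the element-wise product and g(·) denotes the weighted sum over the window that forms each entry of M. Without the window sum, the determinant term would be identically zero.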
Compute corner response C
Find points with large corner response: C > threshold
Take only the points of local maxima of C
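The last two steps (thresholding and keeping only local maxima) can be sketched on a small response map. This is a minimal illustration: the function name `local_maxima`, the 8-neighbour comparison, and the toy values are assumptions, and border pixels are simply skipped.

```python
def local_maxima(response, threshold):
    """Return (row, col) positions whose response exceeds `threshold`
    and is strictly greater than all 8 neighbours (borders are skipped)."""
    h, w = len(response), len(response[0])
    corners = []
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            value = response[i][j]
            if value <= threshold:
                continue
            neighbours = [response[i + di][j + dj]
                          for di in (-1, 0, 1) for dj in (-1, 0, 1)
                          if (di, dj) != (0, 0)]
            if value > max(neighbours):
                corners.append((i, j))
    return corners


# Toy corner-response map (hypothetical values).
response_map = [
    [0, 0, 0, 0, 0],
    [0, 5, 6, 1, 0],
    [0, 4, 9, 2, 0],
    [0, 1, 2, 3, 0],
    [0, 0, 0, 0, 0],
]
print(local_maxima(response_map, threshold=2))
```

Several pixels pass the threshold here, but only the peak survives the local-maximum test.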
Will interest point detector still fire on rotated & scaled images?
Are eigenvectors stable under rotations? No
Are eigenvalues stable under rotations? Yes
Second moment ellipse rotates but its shape (i.e., eigenvalues) remains the same.
Corner location is covariant w.r.t. rotation
Are eigenvectors stable under scalings? Yes
Are eigenvalues stable under scalings? No
[Figure: a corner at one scale; after magnification, all points will be classified as edges]
Corner location is not covariant to scaling!
$$f(I_{i_1 \dots i_m}(x, \sigma)) = f(I'_{i_1 \dots i_m}(x', \sigma'))$$
What is a good f?
[Figure: response function $f(I_{i_1 \dots i_m}(x, \sigma))$ plotted against scale]
– Laplacian (2nd derivative) of Gaussian (LoG)
[Figure: function response across scale space for different image blob sizes; find maxima to produce a list of (x, y, s)]
Approximate LoG with Difference-of-Gaussian (DoG).
[Figure: DoG responses computed from the input image at successive scales; find maxima to produce a list of (x, y, s)]
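The DoG approximation rests on the relation G(kσ) − G(σ) ≈ (k − 1) σ² ∇²G for k close to 1. The sketch below checks this numerically for an isotropic 2-D Gaussian using analytic formulas; the function names and the values of σ and k are illustrative assumptions.

```python
import math


def gaussian(r, sigma):
    """Isotropic 2-D Gaussian evaluated at radius r."""
    return math.exp(-r * r / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)


def laplacian_of_gaussian(r, sigma):
    """Analytic Laplacian of the 2-D Gaussian at radius r."""
    return gaussian(r, sigma) * (r * r / sigma ** 4 - 2 / sigma ** 2)


def difference_of_gaussians(r, sigma, k):
    return gaussian(r, k * sigma) - gaussian(r, sigma)


sigma, k = 1.0, 1.01
for r in (0.0, 1.0):
    dog = difference_of_gaussians(r, sigma, k)
    scaled_log = (k - 1) * sigma ** 2 * laplacian_of_gaussian(r, sigma)
    print(f"r={r}: DoG={dog:.6f}, (k-1)*sigma^2*LoG={scaled_log:.6f}")
```

Because the DoG only needs differences of already-blurred images, it avoids computing second derivatives at every scale.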
Distinctive Image Features from Scale-Invariant Keypoints
David G. Lowe Computer Science Department University of British Columbia Vancouver, B.C., Canada lowe@cs.ubc.ca January 5, 2004
IJCV 04
48,547 citations!
Scale Invariant Feature Transform
Represent each patch in a canonical scale and orientation (or general affine coordinate frame)
Compute gradients for all pixels in patch. Histogram (bin) gradients by orientation
[Figure: orientation histogram with bins spanning 0 to 2π]
Histograms of gradient directions over spatial regions
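Binning gradients by orientation can be sketched as follows. This is a simplified illustration, not the full SIFT descriptor: the function name, the choice of 8 bins, and weighting each vote by gradient magnitude are assumptions made for the example, and the spatial subdivision into regions is omitted.

```python
import math


def orientation_histogram(gradients, bins=8):
    """Bin (gx, gy) gradients by orientation over [0, 2*pi), weighted by magnitude."""
    hist = [0.0] * bins
    bin_width = 2 * math.pi / bins
    for gx, gy in gradients:
        magnitude = math.hypot(gx, gy)
        angle = math.atan2(gy, gx) % (2 * math.pi)
        # The final modulo guards against angle rounding up to exactly 2*pi.
        hist[int(angle / bin_width) % bins] += magnitude
    return hist


# Hypothetical gradients for the pixels of one small patch.
grads = [(1.0, 0.1), (0.1, 2.0), (-1.0, -0.1), (3.0, 0.1)]
hist = orientation_histogram(grads)
print(hist)
```

A descriptor over spatial regions would compute one such histogram per region and concatenate them.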
Post-processing
$$x = \frac{x}{\|x\|}, \quad x \in \mathbb{R}^{128}$$
“invariant to linear scalings of intensity”
$$x := \min(x, 0.2), \qquad x := \frac{x}{\|x\|}$$
approximate binarization allows flat patches with small gradients to remain stable
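The post-processing steps (normalize, clip at 0.2, renormalize) are short enough to sketch directly. The function names and the toy descriptor are assumptions for illustration; the zero-norm guard is an added safety check not discussed on the slide.

```python
import math


def normalize(x):
    """Scale x to unit Euclidean norm (returned unchanged if the norm is zero)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else list(x)


def sift_postprocess(x, clip=0.2):
    x = normalize(x)                   # invariant to linear scalings of intensity
    x = [min(v, clip) for v in x]      # damp a few dominant gradient bins
    return normalize(x)                # restore unit norm after clipping


# Hypothetical 128-dim descriptor with one dominant entry.
descriptor = [10.0, 1.0, 1.0, 1.0] + [0.5] * 124
out = sift_postprocess(descriptor)
print(out[:4])
```

Scaling the input descriptor by any positive constant leaves the output unchanged, which is the invariance the first normalization provides.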
Historic problem in computer vision: “wide-baseline matching”
[Figure: correct nearest descriptor (%) vs. width n of descriptor (angle 50 deg, noise 4%), with curves for 4, 8, and 16 orientations]
This graph shows the percent of keypoints giving the correct match to a data
What made this work? Exhaustive evaluation of hyper-parameters on annotated dataset
Extraordinarily robust matching technique
– Up to about 60 degree out of plane rotation
– Sometimes even day vs. night (below)
– http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT
interest points
– Regularly sampled grid of points
– Dense SIFT (or LBP, or…)
128-dim SIFT feature
Visual words from clusters in 128-dim space
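Once cluster centers exist, turning descriptors into visual words is a nearest-center lookup. The sketch below assumes the centers are already given (for example from k-means, which is not shown), uses toy 2-D vectors in place of 128-dim SIFT features, and builds a word-count histogram as one common way to use the words; all names are hypothetical.

```python
def nearest_word(descriptor, centers):
    """Index of the cluster center closest to `descriptor` (squared Euclidean distance)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(descriptor, center))
    return min(range(len(centers)), key=lambda i: sq_dist(centers[i]))


def bag_of_words(descriptors, centers):
    """Count how many descriptors fall on each visual word."""
    hist = [0] * len(centers)
    for d in descriptors:
        hist[nearest_word(d, centers)] += 1
    return hist


# Toy 2-D "descriptors" stand in for 128-dim SIFT vectors.
centers = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 1.1), (0.2, 0.1), (0.1, 0.8)]
print(bag_of_words(descriptors, centers))
```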
Compute SIFT descriptors on a grid equal to size of individual “cell”
In practice, re-optimize hyper-parameters (2x2 grid of cells, with each cell of 8x8 pixels)
8 orientations x 4 scales x 16 spatial bins = 512 dimensions
Oliva and Torralba, 2001
1. Compute frequency energy (magnitude) at each spatial (x,y) location with Gabor filters
[Figure: images, their HOG descriptors, and HOG nearest neighbors]