Instance-level recognition: Local invariant features
Cordelia Schmid INRIA, Grenoble
Overview
– Introduction to local features
– Harris interest points + SSD, ZNCC, SIFT
– Scale & affine invariant interest point detectors
Local features
local descriptor
– Several / many local descriptors per image
– Robust to occlusion/clutter + no object segmentation required
– Photometric: distinctive
– Invariant: to image transformations + illumination changes
Local features: interest points
Local features: Contours/segments
Local features: segmentation
Application: Matching
Find corresponding locations in the image
Illustration – Matching
Interest points extracted with Harris detector (~ 500 points)
Illustration – Matching
Interest points matched based on cross-correlation (188 pairs)
Global constraints
Global constraint - Robust estimation of the fundamental matrix
Illustration – Matching
99 inliers, 89 outliers
Application: Panorama stitching
Images courtesy of A. Zisserman.
Application: Instance-level recognition
Search for particular objects and scenes in large databases
geometric and photometric transformations
Instance-level recognition: Approach
Finding the object despite possibly large changes in scale, viewpoint, lighting and partial occlusion
Difficulties
Viewpoint, scale, lighting, occlusion
Difficulties
– Flickr has 2 billion photographs, more than 1 million added daily
– Facebook has 15 billion images (~27 million added daily)
– Large personal collections
– Video collections, e.g., YouTube
Applications
find relevant information on the web [Pixee – Milpix]
Applications
Search in 200h of video
Query video
– Recognize docking station
– Place recognition
– Loop closure in SLAM
Applications
Local features - history
Tuytelaars&VanGool’00, Mikolajczyk&Schmid’02, Matas et al.’02]
Perona’05, Lazebnik et al.’06]
Opelt et al.’06, Ferrari et al.’06, Leordeanu et al.’07]
Local features
1) Extraction of local features
– Contours/segments
– Interest points & regions
– Regions by segmentation
– Dense features, points on a regular grid
2) Description of local features
– Dependent on the feature type
– Contours/segments → angles, length ratios
– Interest points → greylevels, gradient histograms
– Regions (segmentation) → texture + color distributions
Line matching
– Zero crossing of Laplacian – Local maxima of gradients
– Midpoint, length, orientation, angle between pairs etc.
Experimental results – line segments
images 600 x 600
Experimental results – line segments
248 / 212 line segments extracted
Experimental results – line segments
89 matched line segments - 100% correct
Experimental results – line segments
3D reconstruction
Problems of line segments
– Line segments broken into parts – Missing parts
– 1D information – Similar for many segments
– Pairs and triplets of segments – Interest points
Overview
Harris detector [Harris & Stephens’88]
Based on the idea of auto-correlation
Important difference in all directions => interest point
Harris detector
Auto-correlation function for a point (x, y) and a shift (Δx, Δy), summed over a window W around the point:
$$A(x,y) = \sum_{(x_k,y_k)\in W(x,y)} \big(I(x_k,y_k) - I(x_k+\Delta x,\, y_k+\Delta y)\big)^2$$
Harris detector
The auto-correlation function A(x, y) for a point (x, y) and a shift (Δx, Δy) is
– small in all directions → uniform region
– large in one direction → contour
– large in all directions → interest point
Harris detector
Discrete shifts are avoided by using the auto-correlation matrix
with first order approximation
$$I(x_k+\Delta x,\, y_k+\Delta y) = I(x_k,y_k) + \begin{pmatrix} I_x(x_k,y_k) & I_y(x_k,y_k) \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}$$
$$A(x,y) = \sum_{(x_k,y_k)\in W} \big(I(x_k,y_k) - I(x_k+\Delta x,\, y_k+\Delta y)\big)^2 = \sum_{(x_k,y_k)\in W} \left( \begin{pmatrix} I_x(x_k,y_k) & I_y(x_k,y_k) \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix} \right)^2$$
Harris detector
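$$A(x,y) = \begin{pmatrix} \Delta x & \Delta y \end{pmatrix} \begin{pmatrix} \sum_{W} (I_x(x_k,y_k))^2 & \sum_{W} I_x(x_k,y_k)\, I_y(x_k,y_k) \\ \sum_{W} I_x(x_k,y_k)\, I_y(x_k,y_k) & \sum_{W} (I_y(x_k,y_k))^2 \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}$$
Auto-correlation matrix: the sum can be smoothed with a Gaussian
$$A(x,y) = \begin{pmatrix} \Delta x & \Delta y \end{pmatrix} \left( G \otimes \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix} \right) \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}$$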
Harris detector
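Auto-correlation matrix
$$A(x,y) = G \otimes \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}$$
– captures the structure of the local neighborhood
– measure based on eigenvalues of this matrix
  – both eigenvalues large => interest point
  – one eigenvalue large => contour
  – both eigenvalues small => uniform region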
Interpreting the eigenvalues
Classification of image points using eigenvalues of the autocorrelation matrix:
– “Corner”: λ1 and λ2 are large, λ1 ~ λ2
– “Edge”: λ1 >> λ2 or λ2 >> λ1
– “Flat” region: λ1 and λ2 are small
Corner response function
$$R = \det(A) - \alpha\, \mathrm{trace}(A)^2 = \lambda_1 \lambda_2 - \alpha (\lambda_1 + \lambda_2)^2$$
α: constant (0.04 to 0.06)
– “Corner”: R > 0
– “Edge”: R < 0
– “Flat” region: |R| small
Harris detector
$$f = \det(A) - k\, (\mathrm{trace}(A))^2 = \lambda_1 \lambda_2 - k (\lambda_1 + \lambda_2)^2$$
Reduces the effect of a strong contour
– Threshold (absolute, relative, number of corners)
– Local maxima
$$f > \mathrm{thresh} \;\wedge\; \forall (x',y') \in 8\text{-neighbourhood}: \; f(x,y) \ge f(x',y')$$
Harris Detector: Steps
Harris Detector: Steps
Compute corner response R
Harris Detector: Steps
Find points with large corner response: R>threshold
Harris Detector: Steps
Take only the points of local maxima of R
Harris Detector: Steps
Harris detector: Summary of steps
1. Compute Gaussian derivatives at each pixel
2. Compute second moment matrix A in a Gaussian window around each pixel
3. Compute corner response function R
4. Threshold R
5. Find local maxima of response function (non-maximum suppression)
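The five steps above can be sketched in plain Python. This is a minimal illustration under stated simplifications, not the detector from the slides: central differences stand in for Gaussian derivatives, a 3x3 box window stands in for the Gaussian window, and the function names and test image are hypothetical.

```python
def harris_response(img, alpha=0.04, half_window=1):
    """Corner response R = det(A) - alpha * trace(A)^2 for each pixel.

    Assumption: central differences replace Gaussian derivatives and a
    box window replaces the Gaussian window.
    """
    h, w = len(img), len(img[0])
    # Step 1: image derivatives (left at zero on the image border).
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    # Steps 2 and 3: second moment matrix A over the window, then R.
    m = half_window
    resp = [[0.0] * w for _ in range(h)]
    for y in range(m, h - m):
        for x in range(m, w - m):
            sxx = sxy = syy = 0.0
            for v in range(y - m, y + m + 1):
                for u in range(x - m, x + m + 1):
                    sxx += ix[v][u] * ix[v][u]
                    sxy += ix[v][u] * iy[v][u]
                    syy += iy[v][u] * iy[v][u]
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            resp[y][x] = det - alpha * trace * trace
    return resp


def harris_points(img, thresh, alpha=0.04):
    """Steps 4 and 5: threshold R, then keep 8-neighbourhood local maxima."""
    resp = harris_response(img, alpha)
    h, w = len(resp), len(resp[0])
    points = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            r = resp[y][x]
            if r <= thresh:
                continue
            neighbours = [resp[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dx, dy) != (0, 0)]
            if all(r >= n for n in neighbours):
                points.append((x, y))
    return points


# Hypothetical test image: a bright quadrant whose corner sits at (6, 6).
corner_image = [[1.0 if (x >= 6 and y >= 6) else 0.0 for x in range(12)]
                for y in range(12)]
response = harris_response(corner_image)
detected = harris_points(corner_image, thresh=0.5)
```

On this synthetic image the response is positive at the corner, negative along the straight edge, and zero in the flat region, which matches the sign convention of the corner response function.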
Harris - invariance to transformations
– translation
– rotation
– similarity (rotation + scale change)
– affine (valid for local planar objects)
– Affine intensity changes (I → a I + b)
Harris Detector: Invariance Properties
Ellipse rotates but its shape (i.e. eigenvalues) remains the same
Corner response R is invariant to image rotation
Harris Detector: Invariance Properties
Only derivatives are used => invariance to intensity shift I → I + b
Intensity scale: I → a I
[Figure: two plots of R against x (image coordinate), with the threshold marked]
Partially invariant to affine intensity change, dependent on type of threshold
Harris Detector: Invariance Properties
Not invariant to scaling
[Figure: a structure labelled “Corner” at one scale; at another scale, all points will be classified as edges]
Comparison of patches - SSD
Comparison of the intensities in the neighborhood of two interest points, (x1, y1) in image 1 and (x2, y2) in image 2
SSD: sum of squared differences
$$\frac{1}{(2N+1)^2} \sum_{i=-N}^{N} \sum_{j=-N}^{N} \big( I_1(x_1+i,\, y_1+j) - I_2(x_2+i,\, y_2+j) \big)^2$$
Small difference values => similar patches
Comparison of patches
SSD:
$$\frac{1}{(2N+1)^2} \sum_{i=-N}^{N} \sum_{j=-N}^{N} \big( I_1(x_1+i,\, y_1+j) - I_2(x_2+i,\, y_2+j) \big)^2$$
Invariance to photometric transformations?
Intensity changes (I → I + b)
=> Normalizing with the mean of each patch
$$\frac{1}{(2N+1)^2} \sum_{i=-N}^{N} \sum_{j=-N}^{N} \big( (I_1(x_1+i,\, y_1+j) - m_1) - (I_2(x_2+i,\, y_2+j) - m_2) \big)^2$$
Intensity changes (I → aI + b)
=> Normalizing with the mean and standard deviation of each patch
$$\frac{1}{(2N+1)^2} \sum_{i=-N}^{N} \sum_{j=-N}^{N} \left( \frac{I_1(x_1+i,\, y_1+j) - m_1}{\sigma_1} - \frac{I_2(x_2+i,\, y_2+j) - m_2}{\sigma_2} \right)^2$$
Cross-correlation ZNCC
Zero normalized SSD:
$$\frac{1}{(2N+1)^2} \sum_{i=-N}^{N} \sum_{j=-N}^{N} \left( \frac{I_1(x_1+i,\, y_1+j) - m_1}{\sigma_1} - \frac{I_2(x_2+i,\, y_2+j) - m_2}{\sigma_2} \right)^2$$
ZNCC: zero normalized cross correlation
$$\frac{1}{(2N+1)^2} \sum_{i=-N}^{N} \sum_{j=-N}^{N} \left( \frac{I_1(x_1+i,\, y_1+j) - m_1}{\sigma_1} \right) \cdot \left( \frac{I_2(x_2+i,\, y_2+j) - m_2}{\sigma_2} \right)$$
ZNCC values lie between -1 and 1, and equal 1 for identical patches
In practice, threshold around 0.5
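The patch comparison measures above can be written out directly. The sketch below is a plain-Python illustration on small hypothetical patches; the function names are not from the slides, and the patches are assumed to be equally sized lists of rows already cut out around the two interest points.

```python
import math


def _flatten(patch):
    """Turn a list of rows into a flat list of floats."""
    return [float(v) for row in patch for v in row]


def _mean_std(values):
    """Mean and (population) standard deviation of a list of values."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


def ssd(patch1, patch2):
    """Sum of squared differences, divided by the number of pixels."""
    a, b = _flatten(patch1), _flatten(patch2)
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def zncc(patch1, patch2):
    """Zero normalized cross correlation, in [-1, 1]."""
    a, b = _flatten(patch1), _flatten(patch2)
    m1, s1 = _mean_std(a)
    m2, s2 = _mean_std(b)
    if s1 == 0.0 or s2 == 0.0:
        raise ValueError("ZNCC is undefined for a constant patch")
    return sum((u - m1) / s1 * (v - m2) / s2 for u, v in zip(a, b)) / len(a)


# Hypothetical 3x3 patch and an affinely re-lit copy (I -> 2 I + 5).
patch = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
relit = [[2 * v + 5 for v in row] for row in patch]
inverted = [[-v for v in row] for row in patch]

score_same = zncc(patch, relit)        # close to 1 despite the intensity change
score_inverted = zncc(patch, inverted)  # close to -1
plain_ssd = ssd(patch, relit)           # large: plain SSD is not invariant
```

The re-lit copy scores close to 1 under ZNCC while its plain SSD is far from zero, which is the reason for normalizing with the mean and standard deviation.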
Local descriptors
Greyvalue derivatives: Image gradient
intensity
– how does this relate to the direction of the edge?
Differentiation and convolution
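$$\frac{\partial f}{\partial x} = \lim_{\varepsilon \to 0} \left( \frac{f(x+\varepsilon,\, y)}{\varepsilon} - \frac{f(x,\, y)}{\varepsilon} \right)$$
$$\frac{\partial f}{\partial x} \approx \frac{f(x_{n+1},\, y) - f(x_n,\, y)}{\Delta x}$$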
Finite difference filters
Effects of noise
– Plotting intensity as a function of position gives a signal
Solution: smooth first
[Figure: signal f, kernel g, smoothed signal f ∗ g, and its derivative d/dx (f ∗ g)]
Derivative theorem of convolution
By associativity:
$$\frac{d}{dx}(f * g) = f * \frac{d}{dx} g$$
[Figure: signal f, derivative kernel d/dx g, and the result f ∗ d/dx g]
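The derivative theorem can be checked numerically in one dimension. The sketch below is an illustration only: the signal values, the kernel width and the helper names are hypothetical, and the derivative is approximated by the finite difference filter [1, -1].

```python
import math


def convolve(a, b):
    """Full discrete 1D convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def gaussian_kernel(sigma, radius):
    """Sampled Gaussian, normalized to sum to one."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical noisy step edge.
f = [0.0, 0.1, -0.1, 0.05, 0.0, 1.0, 0.9, 1.1, 1.0, 0.95]
g = gaussian_kernel(sigma=1.0, radius=3)
d = [1.0, -1.0]  # finite difference filter

smooth_then_differentiate = convolve(convolve(f, g), d)   # d/dx (f * g)
derivative_of_gaussian = convolve(g, d)                   # d/dx g
single_pass = convolve(f, derivative_of_gaussian)         # f * d/dx g

max_gap = max(abs(u - v)
              for u, v in zip(smooth_then_differentiate, single_pass))
```

Both routes give the same result up to rounding, so smoothing and differentiation can be folded into one convolution with a derivative-of-Gaussian kernel.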
Local descriptors
– Convolution with Gaussian derivatives
$$\mathbf{v}(x,y) = \begin{pmatrix} I(x,y) * G(\sigma) \\ I(x,y) * G_x(\sigma) \\ I(x,y) * G_y(\sigma) \\ I(x,y) * G_{xx}(\sigma) \\ I(x,y) * G_{xy}(\sigma) \\ I(x,y) * G_{yy}(\sigma) \end{pmatrix}$$
$$G(x,y,\sigma) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{x^2+y^2}{2\sigma^2} \right)$$
$$I(x,y) * G(\sigma) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} G(x',y',\sigma)\, I(x-x',\, y-y')\, dx'\, dy'$$
Local descriptors
Notation for greyvalue derivatives [Koenderink’87]
$$\mathbf{v}(x,y) = \begin{pmatrix} I(x,y) * G(\sigma) \\ I(x,y) * G_x(\sigma) \\ I(x,y) * G_y(\sigma) \\ I(x,y) * G_{xx}(\sigma) \\ I(x,y) * G_{xy}(\sigma) \\ I(x,y) * G_{yy}(\sigma) \end{pmatrix} = \begin{pmatrix} L(x,y) \\ L_x(x,y) \\ L_y(x,y) \\ L_{xx}(x,y) \\ L_{xy}(x,y) \\ L_{yy}(x,y) \end{pmatrix}$$
Invariance?
Local descriptors – rotation invariance
Invariance to image rotation : differential invariants [Koen87]
$$\begin{pmatrix} L \\ L_x L_x + L_y L_y \\ L_{xx} L_x L_x + 2 L_{xy} L_x L_y + L_{yy} L_y L_y \\ L_{xx} + L_{yy} \\ L_{xx} L_{xx} + 2 L_{xy} L_{xy} + L_{yy} L_{yy} \end{pmatrix}$$
Second component: gradient magnitude (squared)
Fourth component: Laplacian
Laplacian of Gaussian (LOG)
$$LOG = G_{xx}(\sigma) + G_{yy}(\sigma)$$
SIFT descriptor [Lowe’99]
– 8 orientations of the gradient
– 4x4 spatial grid
– Dimension 128
– soft-assignment to spatial bins
– normalization of the descriptor to norm one
– comparison with Euclidean distance
[Figure: image patch → gradient → 3D histogram (x, y, orientation)]
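A much-simplified sketch of such a descriptor follows. It keeps the 4x4 spatial grid, the 8 gradient orientations and the normalization to norm one, but it uses hard assignment to bins instead of the soft-assignment listed above, and it omits the other refinements of the published method. The function name and test patch are hypothetical.

```python
import math


def sift_like_descriptor(patch):
    """128-dimensional gradient histogram: 4x4 cells times 8 orientations.

    Assumptions: square patch whose side is divisible by 4, border
    gradients computed with clamped indices, hard bin assignment.
    """
    n = len(patch)
    cell = n // 4
    hist = [0.0] * (4 * 4 * 8)
    for y in range(n):
        for x in range(n):
            gx = patch[y][min(x + 1, n - 1)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, n - 1)][x] - patch[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            angle = math.atan2(gy, gx) % (2.0 * math.pi)
            orientation = int(angle / (2.0 * math.pi) * 8) % 8
            cell_index = (y // cell) * 4 + (x // cell)
            hist[cell_index * 8 + orientation] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0.0 else hist


def euclidean_distance(d1, d2):
    """Descriptors are compared with the Euclidean distance."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(d1, d2)))


# Hypothetical 16x16 patch and an affinely re-lit copy (I -> 2 I + 5).
patch = [[math.sin(0.4 * x) + 0.1 * y * y for x in range(16)]
         for y in range(16)]
relit = [[2.0 * v + 5.0 for v in row] for row in patch]

descriptor = sift_like_descriptor(patch)
distance_after_relighting = euclidean_distance(
    descriptor, sift_like_descriptor(relit))
```

Because the histogram is built from gradients and then normalized, the offset and the gain of an affine intensity change both drop out, so the two descriptors coincide up to rounding.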
Local descriptors - rotation invariance
– extract gradient orientation
– histogram over gradient orientation
– peak in this histogram
[Figure: histogram of gradient orientation over the range 0 to 2π]
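The three steps above can be sketched as follows. The choice of 36 bins and the magnitude weighting are assumptions made for this illustration, and the function name is hypothetical.

```python
import math


def dominant_orientation(patch, bins=36):
    """Return the centre angle (radians) of the strongest orientation bin."""
    hist = [0.0] * bins
    height, width = len(patch), len(patch[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Step 1: gradient orientation from central differences.
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            angle = math.atan2(gy, gx) % (2.0 * math.pi)
            # Step 2: histogram over orientation (weighted by magnitude).
            hist[int(angle / (2.0 * math.pi) * bins) % bins] += magnitude
    # Step 3: peak of the histogram.
    peak = max(range(bins), key=hist.__getitem__)
    return (peak + 0.5) * 2.0 * math.pi / bins


# Hypothetical patch: an intensity ramp increasing along the diagonal.
ramp = [[float(x + y) for x in range(8)] for y in range(8)]
theta = dominant_orientation(ramp)
```

For the diagonal ramp every gradient points at 45 degrees, so the peak bin is centred on π/4; rotating the patch by minus the returned angle would bring it to a canonical orientation before description.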
Local descriptors – illumination change
in case of an affine transformation
$$I_1(\mathbf{x}) = a\, I_2(\mathbf{x}) + b$$
$$\frac{L_{xx} + L_{yy}}{\sqrt{L_x L_x + L_y L_y}}$$
The offset b vanishes under differentiation, and for a > 0 the factor a cancels in the ratio.
Invariance to scale changes
– In case of a convolution with Gaussian derivatives defined by
$$I(x,y) * G(\sigma) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} G(x',y',\sigma)\, I(x-x',\, y-y')\, dx'\, dy'$$
$$G(x,y,\sigma) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{x^2+y^2}{2\sigma^2} \right)$$
Overview
Scale invariance - motivation
Harris detector + scale changes
Repeatability rate
$$R(\varepsilon) = \frac{\left| \{ (a_i, b_i) \mid \mathrm{dist}(H(a_i),\, b_i) < \varepsilon \} \right|}{\max(|a_i|,\, |b_i|)}$$
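A minimal sketch of the repeatability rate follows. It assumes that H is given as a function mapping a point of the first image into the second, that a_i and b_i are the points detected in each image, and that a point counts as repeated when at least one detection in the second image lies within ε of its mapped position. The function name and the sample points are hypothetical.

```python
import math


def repeatability(points_a, points_b, transform, eps):
    """Fraction of repeated points, relative to the larger point set."""
    if not points_a or not points_b:
        return 0.0
    repeated = 0
    for a in points_a:
        mapped = transform(a)
        if any(math.dist(mapped, b) < eps for b in points_b):
            repeated += 1
    return repeated / max(len(points_a), len(points_b))


def scale_by_two(point):
    """Hypothetical ground-truth mapping: a pure scale change of 2."""
    return (2.0 * point[0], 2.0 * point[1])


detections_1 = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (5.0, 5.0)]
detections_2 = [(0.0, 0.0), (2.1, 2.0), (4.0, 0.2), (20.0, 20.0), (30.0, 30.0)]
rate = repeatability(detections_1, detections_2, scale_by_two, eps=0.5)
```

Here three of the four points from the first image are found again, and the larger set has five points, so the rate is 3/5.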
Scale adaptation
Scale change between two images
$$I_1\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = I_2\begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = I_2\begin{pmatrix} s x_1 \\ s y_1 \end{pmatrix}$$
Scale adapted derivative calculation
$$I_1\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} \otimes G_{i_1 \ldots i_n}(\sigma) = s^n\, I_2\begin{pmatrix} x_2 \\ y_2 \end{pmatrix} \otimes G_{i_1 \ldots i_n}(s\sigma)$$
Scale adaptation
$$G(\tilde{\sigma}) \otimes \begin{pmatrix} L_x^2(\sigma) & L_x L_y(\sigma) \\ L_x L_y(\sigma) & L_y^2(\sigma) \end{pmatrix}$$
where $L_i(\sigma)$ are the derivatives with Gaussian convolution
Scale adapted auto-correlation matrix
$$s^2\, G(s\tilde{\sigma}) \otimes \begin{pmatrix} L_x^2(s\sigma) & L_x L_y(s\sigma) \\ L_x L_y(s\sigma) & L_y^2(s\sigma) \end{pmatrix}$$
Harris detector – adaptation to scale
$$R(\varepsilon) = \{ (a_i, b_i) \mid \mathrm{dist}(H(a_i),\, b_i) < \varepsilon \}$$
Multi-scale matching algorithm
s = 1, s = 3, s = 5
s = 1: 8 matches
Robust estimation of a global affine transformation
s = 1: 3 matches
s = 3: 4 matches
s = 5: 16 matches
highest number of matches → correct scale
Matching results
Scale change of 5.7
Matching results
100% correct matches (13 matches)
Scale selection
convolving it with Laplacians at several scales and looking for the maximum response
increases:
Why does this happen?
[Figure: Laplacian response of the original signal (radius=8) for increasing σ]
Scale normalization
step edge decreases as σ increases
$$\frac{1}{\sigma \sqrt{2\pi}}$$
multiply Gaussian derivative by σ
multiplied by σ²
Effect of scale normalization
[Figure: original signal; unnormalized Laplacian response; scale-normalized Laplacian response, with its maximum marked]
Blob detection in 2D
blob detection in 2D
$$\nabla^2 g = \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2}$$
Scale-normalized:
$$\nabla^2_{\mathrm{norm}}\, g = \sigma^2 \left( \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2} \right)$$
Scale selection
$$(x^2 + y^2 - 2\sigma^2)\, e^{-(x^2+y^2)/2\sigma^2}$$
(up to scale)
maximum at
$$\sigma = r / \sqrt{2}$$
[Figure: image with a blob of radius r; Laplacian response plotted against scale (σ), with its peak at r/√2]
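The location of this maximum can be checked numerically. The sketch below assumes a binary disc of radius r (value 1 inside, 0 outside) and evaluates the scale-normalized Laplacian of Gaussian response at the disc centre by integrating over the disc in polar coordinates; the function name, the disc radius and the scale grid are hypothetical choices.

```python
import math


def normalized_laplacian_at_centre(r, sigma, steps=2000):
    """sigma^2 times the Laplacian of Gaussian, integrated over a disc.

    Uses the midpoint rule on rings of radius rho and width d_rho.
    """
    d_rho = r / steps
    total = 0.0
    for k in range(steps):
        rho = (k + 0.5) * d_rho
        log_value = ((rho * rho - 2.0 * sigma * sigma)
                     / (2.0 * math.pi * sigma ** 6)
                     * math.exp(-rho * rho / (2.0 * sigma * sigma)))
        total += log_value * 2.0 * math.pi * rho * d_rho
    return sigma * sigma * total


radius = 8.0
scales = [0.1 * k for k in range(10, 201)]  # sigma from 1.0 to 20.0
best_sigma = max(
    scales, key=lambda s: abs(normalized_laplacian_at_centre(radius, s)))
expected_sigma = radius / math.sqrt(2.0)
```

The strongest absolute response on this grid falls next to r/√2 (about 5.66 for r = 8), in agreement with the stated maximum.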
Characteristic scale
produces peak of Laplacian response
characteristic scale
International Journal of Computer Vision 30 (2): pp 77--116.
Scale selection
several scales
e.g. Laplacian
$$| s^2 (L_{xx} + L_{yy}) |$$
[Figure: for each of two images, the response plotted against scale, with its peak at the characteristic scale s*]
$$s \cdot s_1^* = s_2^*$$
Scale-invariant detectors
Harris-Laplace Laplacian
Harris-Laplace
multi-scale Harris points → selection of points at maximum of Laplacian → invariant points + associated regions [Mikolajczyk & Schmid’01]
Matching results
213 / 190 detected interest points
Matching results
58 points are initially matched
Matching results
32 points are matched after verification – all correct
LOG detector
Convolve image with scale-normalized Laplacian at several scales
$$LOG = s^2 \big( G_{xx}(\sigma) + G_{yy}(\sigma) \big)$$
Detection of maxima and minima
Hessian detector
Hessian matrix
$$H(\mathbf{x}) = \begin{pmatrix} L_{xx} & L_{xy} \\ L_{xy} & L_{yy} \end{pmatrix}$$
Determinant of Hessian matrix
$$DET = L_{xx} L_{yy} - L_{xy}^2$$
Penalizes/eliminates long structures with small derivative in a single direction
Efficient implementation
Laplacian
$$DOG = G(k\sigma) - G(\sigma)$$
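The link between the difference of Gaussians and the Laplacian can be checked numerically. Since the Gaussian satisfies ∂G/∂σ = σ∇²G, a first-order expansion gives G(kσ) − G(σ) ≈ (k − 1)σ²∇²G for k close to 1. The sketch below compares both sides at one point; the function names, the sample point and the value of k are hypothetical.

```python
import math


def gaussian(x, y, sigma):
    """2D Gaussian G(x, y, sigma)."""
    return (math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            / (2.0 * math.pi * sigma * sigma))


def laplacian_of_gaussian(x, y, sigma):
    """G_xx + G_yy in closed form."""
    rho2 = x * x + y * y
    return (rho2 - 2.0 * sigma * sigma) / sigma ** 4 * gaussian(x, y, sigma)


def difference_of_gaussians(x, y, sigma, k):
    """DOG = G(k sigma) - G(sigma)."""
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)


sigma, k = 2.0, 1.01
x, y = 1.0, 0.5
dog_value = difference_of_gaussians(x, y, sigma, k)
scaled_log = (k - 1.0) * sigma * sigma * laplacian_of_gaussian(x, y, sigma)
relative_gap = abs(dog_value - scaled_log) / abs(scaled_log)
```

For this k the two values agree to within a few percent, and the gap shrinks as k approaches 1; the DOG is cheaper to compute because it only needs differences of already smoothed images.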
DOG detector
time
David G. Lowe. "Distinctive image features from scale-invariant keypoints." IJCV 60 (2).