Victor Adrian Prisacariu
http://www.robots.ox.ac.uk/~victor
C18 Computer Vision
Lecture 6
Salient feature detection: points, edges and SIFTs
Computer Vision: this time
5. Imaging geometry, camera calibration.
6. Salient feature detection and description.
Coming up: … geometry; … correspondences, triangulation, neural nets.

This lecture:
1. Cameras as photometric devices (just a note).
2. Image convolution (in the context of image derivatives).
3. Edge detection.
4. Corner detection.
5. SIFT + LIFT.
6.1 Cameras as photometric devices
So far cameras have been treated as geometric devices, with background on the rectilinear propagation of light. But the measured image irradiance also depends on:
– The nature of the scene surface (reflecting, absorbing).
– The relative orientations of the surface, light source and cameras.
– The power and spectral properties of the source.
– The spectral properties of the imaging system.
Under reasonable assumptions, image irradiance is proportional to scene radiance.
Exploiting photometry fully requires detailed models of the reflectance properties of the scene and of the imaging process itself; such a model, for example, allowed this image to be de-fogged.
More often, computer vision proceeds by considering aspects of scene geometry that give rise to step changes in, or are invariant to, image irradiance.
Step changes in image irradiance arise:
– due to changes in illumination: natural (e.g. shadows) or deliberately introduced via artificial illumination.
– due to sudden changes in surface orientation: these arise at the intersection of two surfaces, so represent geometrical entities fixed on the object.
– due to changes in surface albedo: reflectance properties are scaled by a changing albedo arising from surface markings; also fixed to the object.
We are interested in spatial changes in image irradiance because:
– They are likely to be tied to scene geometry;
– They are likely to be salient (have high information content).
Salient feature detection
Feature detection segments the image into areas that, locally, have:
– 1D structure → edge detectors;
– 2D structure → corner detectors.
Detection is performed with local operators, without reference to higher geometrical entities or objects (this is changing nowadays …).
A desirable property is equivariance: the feature map of a rotated image is identical to the rotated feature map of the original image.
Two classical strategies for edge detection:
– Search for maxima in the gradient magnitude
  |∇I| = √( (∂I/∂x)² + (∂I/∂y)² )  – first order, but non-linear.
– Search for zeros in the Laplacian
  ∇²I = ∂²I/∂x² + ∂²I/∂y²  – linear, but 2nd order.
The gradient magnitude is attractive because it is first order in the derivatives. Differentiation enhances noise, and the 2nd derivatives in the Laplacian operator introduce even more.
The Laplacian is attractive because it is linear, so it can be implemented by a succession of fast linear operations with a pixelated image.
Both approaches have been used. Either way, we must decide:
– How to compute the gradients, and
– How to suppress noise (so that insignificant variations in pixel intensity are not flagged as edges).
6.2 Image convolution
Recall the convolution integral in the time domain between an input signal i(t) and impulse response g(t):
o(t) = i(t) ∗ g(t) = ∫₋∞⁺∞ i(t−τ) g(τ) dτ = ∫₋∞⁺∞ i(τ) g(t−τ) dτ
Convolution commutes: i(t) ∗ g(t) = g(t) ∗ i(t). It also associates.
The 2D spatial convolution integral is:
o(x, y) = i(x, y) ∗ g(x, y) = ∫₋∞⁺∞ ∫₋∞⁺∞ i(x−a, y−b) g(a, b) da db
g is variously called the point spread function, the convolution mask or the convolution kernel.
The 2D discrete convolution is:
O(x, y) = I(x, y) ∗ g(x, y) = ∑ᵢ ∑ⱼ I(x−i, y−j) g(i, j)
for x, y ranging over the image width and height respectively, and i, j ensuring access is made to any and all non-zero entries in g.
Defining the reflected kernel ḡ(i, j) = g(−i, −j):
O(x, y) = ∑∑ I(x−i, y−j) g(i, j) = ∑∑ I(x+i, y+j) g(−i, −j) = ∑∑ I(x+i, y+j) ḡ(i, j)
This looks more like the expression for cross-correlation but, confusingly, it is still called convolution.
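The discrete convolution above can be sketched directly in NumPy. This is a minimal, deliberately slow reference implementation (the function name and the 'valid'-only border handling are illustrative choices, not from the lecture):

```python
import numpy as np

def conv2d(image, kernel):
    """Direct discrete 2D convolution, O(x, y) = sum_ij I(x-i, y-j) g(i, j).

    Returns 'valid' output only: positions where the reflected kernel
    fits entirely inside the image.
    """
    kh, kw = kernel.shape
    # Reflecting the kernel turns the convolution sum into the
    # sliding-window cross-correlation form derived above.
    flipped = kernel[::-1, ::-1]
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(image[r:r+kh, c:c+kw] * flipped)
    return out
```

Real implementations vectorise this or use the FFT; the double loop is only meant to mirror the summation formula.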
Computing partial derivatives using convolution
Approximate the x-derivative with the central finite difference:
∂I/∂x ≈ ½[ I(x+1, y) − I(x, y) + I(x, y) − I(x−1, y) ] = ½[ I(x+1, y) − I(x−1, y) ]
As a convolution this uses the kernel
g(−1) = +½, g(0) = 0, g(1) = −½:
D(x, y) = I(x, y) ∗ g(x) = ∑ᵢ₌₋₁¹ I(x−i, y) g(i)
Note that the kernel entries are reversed from what one might naively expect from the expression.
Computing partial derivatives using convolution
Equivalently, with the reflected kernel:
∂I/∂x ≈ ½[ I(x+1, y) − I(x−1, y) ]
ḡ(−1) = −½, ḡ(0) = 0, ḡ(1) = +½
D(x, y) = I(x, y) ∗ ḡ(x) = ∑ᵢ₌₋₁¹ I(x+i, y) ḡ(i)
The reflected kernel is laid over the image, each entry multiplied by the underlying pixel, and the sum of the products completed and summed over the mask.
[Figure: I(x, y) and the resulting x-gradient “image” and y-gradient “image”.]
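A minimal NumPy sketch of the gradient “images” via the central difference (zeroed borders are an implementation choice for brevity, not part of the lecture):

```python
import numpy as np

def gradient_images(I):
    """x- and y-gradient 'images' using the central-difference kernel
    [-1/2, 0, +1/2] (reflected form), with x = column index and
    y = row index. Border pixels are left at zero."""
    gx = np.zeros_like(I, dtype=float)
    gy = np.zeros_like(I, dtype=float)
    # dI/dx ~ (I(x+1, y) - I(x-1, y)) / 2
    gx[:, 1:-1] = 0.5 * (I[:, 2:] - I[:, :-2])
    # dI/dy ~ (I(x, y+1) - I(x, y-1)) / 2
    gy[1:-1, :] = 0.5 * (I[2:, :] - I[:-2, :])
    return gx, gy
```

On a horizontal intensity ramp the x-gradient is constant and the y-gradient is zero, as expected.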
The 2D Gaussian is “separable”, in that it can be broken up into two separate 1D convolutions in x and y:
O = g₂D ∗ I = g(y) ∗ ( g(x) ∗ I )
Separation reduces the per-pixel cost from k² to 2k multiplications for a k×k mask, but two passes over the image are required, so for a small mask it might be cheaper to use it directly.
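The separability claim can be checked numerically: blurring with two 1D Gaussian passes reproduces the 2D mask (the outer product of the 1D kernels). A sketch, with illustrative function names and zero-padded 'same' borders:

```python
import numpy as np

def gauss1d(sigma, radius):
    """Sampled 1D Gaussian, normalized to sum to 1."""
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

def blur_separable(I, sigma, radius=3):
    """Two 1D convolutions: along rows (x), then along columns (y)."""
    g = gauss1d(sigma, radius)
    rows = np.apply_along_axis(lambda r: np.convolve(r, g, mode='same'), 1, I)
    return np.apply_along_axis(lambda c: np.convolve(c, g, mode='same'), 0, rows)
```

Applied to an impulse (delta) image, the result is exactly the 2D mask np.outer(g, g), confirming the factorisation.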
Example result – Laplacian (non-directional)
[Figure: the actual image used is grey-level, not colour; its Laplacian is shown.]
Differentiation enhances noise – the edge appears clear enough in the images, but less so in the gradient map.
We would like a low-pass filter to suppress noise edges outside the signal edge band, but the ideal low-pass response corresponds to an Infinite Impulse Response filter in the spatial domain – not doable.
Compromise in space and spatial-frequency: consider the filter and its spectrum as a Fourier transform pair g(x) ↔ G(ω), and minimize the product XΩ, where:
X² = ∫ (x−x̄)² g²(x) dx / ∫ g²(x) dx,  with x̄ = ∫ x g²(x) dx / ∫ g²(x) dx
Ω² = ∫ (ω−ω̄)² G²(ω) dω / ∫ G²(ω) dω,  with ω̄ = ∫ ω G²(ω) dω / ∫ G²(ω) dω
The product is minimized when g(x) is a Gaussian function:
g(x) = (1 / √(2π)σ) exp( −x² / (2σ²) )
6.3 Edge Detection: Simple Approach: Sobel
The Sobel masks combine a central difference with smoothing in the orthogonal direction:
gₓ = [ −1  0 +1 ]        g_y = [ −1 −2 −1 ]
     [ −2  0 +2 ]              [  0  0  0 ]
     [ −1  0 +1 ]              [ +1 +2 +1 ]
edge map = √( (I ∗ gₓ)² + (I ∗ g_y)² )
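A minimal NumPy sketch of the Sobel edge map ('valid' interior only; function name illustrative):

```python
import numpy as np

def sobel_edge_map(I):
    """Edge map sqrt((I*gx)^2 + (I*gy)^2) with the 3x3 Sobel masks."""
    gx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    gy = gx.T
    h, w = I.shape
    out = np.zeros((h - 2, w - 2))
    for r in range(h - 2):
        for c in range(w - 2):
            patch = I[r:r+3, c:c+3]
            # The Sobel masks are antisymmetric, so the convolution
            # reflection only flips the sign, which squaring removes.
            ex = np.sum(patch * gx)
            ey = np.sum(patch * gy)
            out[r, c] = np.hypot(ex, ey)
    return out
```

A vertical intensity step produces a strong response along the step and zero response on flat regions.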
A better detector (Canny) refines this:
– Non-maximal suppression: a pixel is kept only if its gradient magnitude is a local maximum along the gradient direction.
– Each surviving edgel records its gradient magnitude and orientation.
– Hysteresis thresholding is applied by running along linked strings of edgels.
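The non-maximal suppression step can be sketched as below (a simplification with quantised gradient directions; hysteresis and linking are omitted, and the function name is illustrative):

```python
import numpy as np

def nonmax_suppress(mag, gx, gy):
    """Keep a pixel only if its gradient magnitude is a local maximum
    along the (quantized) gradient direction."""
    h, w = mag.shape
    out = np.zeros_like(mag)
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            angle = np.arctan2(gy[r, c], gx[r, c])
            # Quantize the gradient direction to one of 4 neighbour axes.
            d = int(round(angle / (np.pi / 4))) % 4
            dr, dc = [(0, 1), (1, 1), (1, 0), (1, -1)][d]
            if mag[r, c] >= mag[r+dr, c+dc] and mag[r, c] >= mag[r-dr, c-dc]:
                out[r, c] = mag[r, c]
    return out
```

On a blurred vertical edge (a 3-pixel-wide ridge of gradient magnitude) only the central column survives, thinning the edge to one pixel.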
Problems using 1D image structure for geometry
+ The edge map is sparse but interpretable; much of the salient information is retained.
+ When the camera motion is known, matching is a 1D problem, for which edges are very well suited.
− Matching is much harder when the camera motion is unknown: only the motion component normal to the edge can be recovered, known as the aperture problem.
− Position along an edgel string is largely uncertain. (Indeed only line orientation is useful for detailed geometrical work.)
This motivates the locational unambiguity of 2D image features or “corners”.
6.4 Corner detection
Core idea: we find a corner if shifting a window in any direction gives a large change in intensity.
– Flat region: no change when shifting the window in any direction.
– Edge region: no change when shifting the window along the edge direction.
– Corner region: significant change when shifting the window in any direction.
Suppose that we are interested in correlating a (2n+1)² pixel patch at (x, y) in I with a similar patch displaced from it by (u, v). We would write the correlation between patches as:
C_uv(x, y) = ∑ᵢ₌₋ₙⁿ ∑ⱼ₌₋ₙⁿ I(x+i, y+j) I(x+u+i, y+v+j)
As we keep (x, y) fixed, but change (u, v), we build up the auto-correlation surface around (x, y).
[Figure: auto-correlation surfaces for “nothing special”, a straight edge, and a plain corner.]
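The auto-correlation surface C_uv(x, y) can be sketched as below (the (row, col) = (y, x) indexing and function name are implementation choices):

```python
import numpy as np

def autocorrelation_surface(I, x, y, n, shifts):
    """C_uv(x, y) = sum_{i,j=-n..n} I(x+i, y+j) I(x+u+i, y+v+j),
    evaluated for each shift (u, v) in `shifts`."""
    surf = {}
    base = I[y-n:y+n+1, x-n:x+n+1]          # fixed patch at (x, y)
    for (u, v) in shifts:
        shifted = I[y+v-n:y+v+n+1, x+u-n:x+u+n+1]
        surf[(u, v)] = float(np.sum(base * shifted))
    return surf
```

On a flat image the surface is the same for every shift; near a corner it falls off in every direction.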
In practice, the sum of squared differences is cheaper than C_uv and gives comparable quality results:
E_uv(x, y) = ∑ᵢ₌₋ₙ⁺ⁿ ∑ⱼ₌₋ₙ⁺ⁿ [ I(x+u+i, y+v+j) − I(x+i, y+j) ]²
Expanding I in a first-order Taylor series gives the quadratic form:
E_uv ≈ [u v] C [u v]ᵀ,  where the sum is taken over the window:
C = ∑ₓ,ᵧ [ (∂I/∂x)²          (∂I/∂x)(∂I/∂y) ]
        [ (∂I/∂x)(∂I/∂y)    (∂I/∂y)²       ]
Denote the eigenvalues of C as λ₁ and λ₂. The eigenvectors give the directions of slowest and fastest change of the auto-correlation surface; the corresponding ellipse axes have lengths (λ_max)^(−1/2) and (λ_min)^(−1/2).
– Both λ₁ and λ₂ ≈ 0 – the auto-correlation is small in all directions: the image must be flat.
– λ₁ ≫ 0, λ₂ ≈ 0 – the auto-correlation is high just in one direction, so we found a 1D edge.
– λ₁ ≫ 0, λ₂ ≫ 0 – the auto-correlation is high in all directions, so we found a 2D corner.
Corner response functions due to Harris and Stephens:
T_Harris = λ₁λ₂ / (λ₁ + λ₂)        T_HS = λ₁λ₂ − α(λ₁ + λ₂)²
where α scales the response of edges, sometimes called the “edge-phobia” ☺.
The detector in full:
1. Filter the image with a Gaussian to reduce noise.
2. Compute the magnitude of the x and y gradients at each pixel.
3. Construct C in a window around each pixel.
4. Solve for the response (using det C = λ₁λ₂ and trace C = λ₁ + λ₂).
5. Threshold the response to check if corner.
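A minimal sketch of the Harris–Stephens response T_HS = det(C) − α·trace(C)², with central-difference gradients and a 3×3 window (the Gaussian pre-smoothing and thresholding/NMS steps are omitted; names are illustrative):

```python
import numpy as np

def harris_response(I, alpha=0.04):
    """Per-pixel response det(C) - alpha * trace(C)^2, with C summed
    over a 3x3 window. Borders are left at zero."""
    Ix = np.zeros_like(I, dtype=float)
    Iy = np.zeros_like(I, dtype=float)
    Ix[:, 1:-1] = 0.5 * (I[:, 2:] - I[:, :-2])
    Iy[1:-1, :] = 0.5 * (I[2:, :] - I[:-2, :])
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy
    h, w = I.shape
    R = np.zeros((h, w))
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            a = Ixx[r-1:r+2, c-1:c+2].sum()   # sum of Ix^2 over window
            b = Ixy[r-1:r+2, c-1:c+2].sum()   # sum of Ix*Iy over window
            d = Iyy[r-1:r+2, c-1:c+2].sum()   # sum of Iy^2 over window
            R[r, c] = (a * d - b * b) - alpha * (a + d) ** 2
    return R
```

On a bright-quadrant test image the response is large and positive at the corner, negative along the edges (the edge-phobia term), and zero on flat regions.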
6.5 SIFT + LIFT
SIFT is a point feature detector and descriptor.
Detector:
– Equivariant to image translations, scalings and rotations.
– Robust.
– Efficient.
– Better than Harris, FAST, etc.
Descriptor:
– Robust.
– Compact.
Applications:
– The detector & descriptor are used in: wide-baseline matching, image retrieval, mobile robot localization, panorama stitching, …
– The descriptor is (was) used in: recognition of object categories, face recognition, image segmentation …
Co-variant feature detection means that the same “physical features” are extracted regardless of image transformations. Intuition: the extracted features track (“co-vary”) with the image transformation.
SIFT achieves co-variant detection by selecting blob-like structures. The implicit assumption is that blobs will still look like blobs after the image is transformed.
Blobs are detected as the local maxima in the response of a blob-like image filter – the Laplacian of Gaussian (LoG).
LoG(x, y, σ) = σ²∇²G(x, y, σ) ∗ I(x, y),  with  G(x, y, σ) = (1 / 2πσ²) exp( −(x²+y²) / (2σ²) )
[Figure: the scale-normalized filter σ²∇²G(x, y, σ), and the image convolved with G(·,·, σ).]
Blobs exist at many sizes, so we search at all scales σ. Scales are sampled as σ = σ₀ 2^(o + s/S), where o = octave and s = octave subdivision (here S = 3 subdivisions per octave):
o = 0:  σ₀2^(−1/3), σ₀2^0, σ₀2^(1/3), σ₀2^(2/3), σ₀2^(3/3)
o = 1:  σ₀2^(1−1/3), σ₀2^1, σ₀2^(1+1/3), σ₀2^(1+2/3), σ₀2^(1+3/3)
In practice, SIFT computes a Gaussian scale space instead of the LoG scale space:
LoG(x, y, σ) = σ²∇²G(x, y, σ) ∗ I(x, y)  →  L(x, y, σ) = G(x, y, σ) ∗ I(x, y)
The LoG is approximated by the difference of consecutive levels of the Gaussian scale space (the Difference of Gaussians, DoG):
DoG(x, y, σ) = [ G(x, y, kσ) − G(x, y, σ) ] ∗ I(x, y) = L(x, y, kσ) − L(x, y, σ)
where k > 1 is a small multiplicative factor, so that
LoG(x, y, σ) ≈ (1 / (k−1)) [ L(x, y, kσ) − L(x, y, σ) ]
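The DoG ≈ (k−1)·σ²∇²G approximation can be verified numerically by comparing a sampled difference of Gaussians against the closed-form scale-normalized LoG, σ²∇²G = G·(r² − 2σ²)/σ² (grid size and the values of σ, k are illustrative):

```python
import numpy as np

def gauss2d(sigma, radius=20):
    """Sampled 2D Gaussian G(x, y, sigma) on a (2*radius+1)^2 grid."""
    x = np.arange(-radius, radius + 1, dtype=float)
    X, Y = np.meshgrid(x, x)
    return np.exp(-(X**2 + Y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def norm_log2d(sigma, radius=20):
    """Scale-normalized LoG in closed form: G * (r^2 - 2 sigma^2) / sigma^2."""
    x = np.arange(-radius, radius + 1, dtype=float)
    X, Y = np.meshgrid(x, x)
    r2 = X**2 + Y**2
    G = np.exp(-r2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return G * (r2 - 2 * sigma**2) / sigma**2

sigma, k = 2.0, 1.05
dog = gauss2d(k * sigma) - gauss2d(sigma)    # difference of Gaussians
approx = (k - 1) * norm_log2d(sigma)         # (k-1) * sigma^2 * lap(G)
rel_err = np.abs(dog - approx).max() / np.abs(approx).max()
```

For k close to 1 the relative error is a few percent, and it shrinks as k → 1, consistent with the first-order argument above.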
After each octave, the sampling resolution can be halved to avoid storing and processing redundant data: the image is downsampled by a factor of 2 before computing the next octave's scale levels.
Blobs are detected as the local maxima and minima of the LoG scale space LoG(x, y, σ) = σ²∇²G(x, y, σ) ∗ I(x, y): for each sample (x, y, σ), check the 3 × 3 × 3 neighbours in space and scale. If the sample is larger (or smaller) than all the neighbours, mark (x, y, σ) as a blob.
The σ² scale normalization makes the different scales comparable, so that computing local maxima in scale is meaningful.
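The 3 × 3 × 3 extrema check can be sketched as follows (the S[s, y, x] layout and the strict-uniqueness tie-breaking are implementation choices):

```python
import numpy as np

def scale_space_extrema(S):
    """Find local maxima/minima of a scale-space volume S[s, y, x]
    by comparing each interior sample with its 26 neighbours
    across space and scale."""
    peaks = []
    ns, h, w = S.shape
    for s in range(1, ns - 1):
        for r in range(1, h - 1):
            for c in range(1, w - 1):
                cube = S[s-1:s+2, r-1:r+2, c-1:c+2]
                v = S[s, r, c]
                # Require a strict extremum: unique max (or min) in the cube.
                if v == cube.max() and (cube == v).sum() == 1:
                    peaks.append((s, r, c, 'max'))
                elif v == cube.min() and (cube == v).sum() == 1:
                    peaks.append((s, r, c, 'min'))
    return peaks
```

A single impulse in the middle scale level is detected exactly once, at its true location and scale.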
At each selected keypoint, an orientation is assigned from the peak of a histogram of local gradient orientations. Each keypoint is then characterised by: x, y, scale, orientation.
[Figure: keypoint localization with orientation, after some thresholds.]
Each keypoint now has a position, a scale and an orientation. We next describe the region around the keypoint in a way that is distinctive and invariant to rotation and scale:
– Step 1: rotate the window to a standard orientation.
– Step 2: scale the window size based on the scale at which the point was found.
The descriptor is computed on the normalized region around the keypoint, as a vector of gradient-orientation histogram values (128-dimensional in standard SIFT).
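The orientation-assignment idea can be sketched with a magnitude-weighted gradient-orientation histogram (a simplification: no Gaussian weighting, no parabolic peak interpolation, names illustrative):

```python
import numpy as np

def dominant_orientation(patch, nbins=36):
    """Histogram of gradient orientations over a patch, weighted by
    gradient magnitude; the peak bin gives the keypoint orientation
    (returned in radians, as the bin centre)."""
    gx = np.zeros_like(patch, dtype=float)
    gy = np.zeros_like(patch, dtype=float)
    gx[:, 1:-1] = 0.5 * (patch[:, 2:] - patch[:, :-2])
    gy[1:-1, :] = 0.5 * (patch[2:, :] - patch[:-2, :])
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)                      # in (-pi, pi]
    hist = np.zeros(nbins)
    bins = ((ang + np.pi) / (2 * np.pi) * nbins).astype(int) % nbins
    for b, m in zip(bins.ravel(), mag.ravel()):
        hist[b] += m
    peak = hist.argmax()
    return (peak + 0.5) * 2 * np.pi / nbins - np.pi
```

An intensity ramp in x yields an orientation near 0 rad; the transposed ramp yields one near π/2, to within half a bin width.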
SIFT summary:
– For each image scale: compute the DoG response.
– Find maxima across the DoG + image scales.
– For each detected location: normalize w.r.t. scale and orientation, and compute the descriptor over the resulting patch.
LIFT replaces each SIFT stage with a neural network:
– A detector (DET above).
– An orientation estimator (ORI above).
– A descriptor (DESC above).
The entire process is now end-to-end differentiable. Correspondences between training patches are used to train the descriptor.
Descriptor: trained to minimise the distance between descriptors for corresponding patches, and maximise it for non-corresponding patches (see next lectures).
Orientation estimator: trained to minimise the distance between descriptors for different views of the same 3D point, using a pipeline similar to the one above (see next lectures), but with LIFT descriptors.
Detector: produces a score map; the keypoint position is obtained as the centre of mass given the weights from a softmax over the score map.
The patch around each keypoint is passed through the orientation estimator, rotated by the estimated patch orientation, and finally passed through the descriptor network to produce the descriptor vector.
LIFT: Learned Invariant Feature Transform
Kwang Moo Yi, Eduard Trulls, Vincent Lepetit, Pascal Fua https://arxiv.org/abs/1603.09114
Recap:
– Image convolution and correlation.
– Sobel and Canny edge detectors.