

SLIDE 1

C18 Computer Vision

Lecture 6

Salient feature detection: points, edges and SIFTs

Victor Adrian Prisacariu

http://www.robots.ox.ac.uk/~victor

SLIDE 2

Computer Vision: This time…

  • 5. Imaging geometry, camera calibration.
  • 6. Salient feature detection and description.
    1. Cameras as photometric devices (just a note).
    2. Image convolution (in the context of image derivatives).
    3. Edge detection.
    4. Corner detection.
    5. SIFT + LIFT.
  • 7. Recovering 3D from two images I: epipolar geometry.
  • 8. Recovering 3D from two images II: stereo correspondences, triangulation, neural nets.

SLIDE 3

Feature points are useful …

SLIDE 4

6.1 Cameras as Photometric Devices

  • In Lecture 5, we considered the camera as a geometric abstraction grounded on the rectilinear propagation of light.
  • But cameras are also photometric devices.
  • It is important to consider the way image formation depends on:
    – The nature of the scene surface (reflecting, absorbing).
    – The relative orientations of the surface, light source and cameras.
    – The power and spectral properties of the source.
    – The spectral properties of the imaging system.
  • The important overall outcome (e.g. Forsyth & Ponce, p62) is that image irradiance is proportional to scene radiance.

  • A relief! This means the image really can tell us about the scene.
SLIDE 5

6.1 Cameras as Photometric Devices

  • But the study of photometry (often called physics-based vision) requires detailed models of the reflectance properties of the scene and of the imaging process itself.
  • E.g. understanding (or learning) how light scatters on water droplets allowed this image to be de-fogged.
  • Can we avoid such detail when aiming to get geometry? … Yes – by considering aspects of scene geometry that are step changes in, or invariant to, image irradiance.

SLIDE 6

Step irradiance changes are due to …

  • Changes in scene radiance:
    – Natural (e.g. shadows) or deliberately introduced via artificial illumination.
  • Changes in scene reflectance at sudden changes in surface orientation:
    – These arise at the intersection of two surfaces, so they represent geometrical entities fixed on the object.
  • Changes in reflectance properties due to changes in surface albedo:
    – Reflectance properties are scaled by a changing albedo arising from surface markings. Also fixed to the object.
SLIDE 7

6.2 Feature detection

  • We are looking for step spatial changes in image irradiance because:
    – They are likely to be tied to scene geometry;
    – They are likely to be salient (have high information content).
  • A simple classification of changes in image irradiance I(x, y) is into areas that, locally, have:
    – 1D structure → edge detectors;
    – 2D structure → corner detectors.

SLIDE 8

Image operations for Feature Detection

  • Feature detection is often a local operation, working without knowledge of higher geometrical entities or objects (this is changing nowadays …).
  • We use pixel values I(x, y) and derivatives ∂I/∂x and ∂I/∂y.
  • It is useful to have a non-directional combination of these, so that the feature map of a rotated image is identical to the rotated feature map of the original image.
  • Considering edge detection, two possibilities are (both are illustrated in the sketch below):
    – Search for maxima in the gradient magnitude
      $\sqrt{\left(\frac{\partial I}{\partial x}\right)^2 + \left(\frac{\partial I}{\partial y}\right)^2}$
      (1st order, but non-linear).
    – Search for zeros in the Laplacian
      $\nabla^2 I = \frac{\partial^2 I}{\partial x^2} + \frac{\partial^2 I}{\partial y^2}$
      (linear, but 2nd order).
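As a quick illustration of the two options, here is a minimal NumPy/SciPy sketch; the random stand-in image, and the choice of np.gradient and scipy.ndimage.laplace as the derivative operators, are illustrative assumptions, not the lecture's prescription:

```python
import numpy as np
from scipy import ndimage

image = np.random.rand(128, 128)      # stand-in grey-level image
gy, gx = np.gradient(image)           # central-difference derivatives (rows, then columns)
grad_mag = np.sqrt(gx**2 + gy**2)     # 1st order, but non-linear
laplacian = ndimage.laplace(image)    # linear, but 2nd order
```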

SLIDE 9

Which to choose?

  • The gradient magnitude is attractive because it is first order in the derivatives. Differentiation enhances noise, and the 2nd derivatives in the Laplacian operator introduce even more.
  • The Laplacian is attractive because it is linear, which means it can be implemented by a succession of fast linear operations -- effectively matrix operations, as we are dealing with a pixelated image.
  • Both approaches have been used.
  • For both approaches we need to consider:
    – How to compute the gradients, and
    – How to suppress noise (so that insignificant variations in pixel intensity are not flagged as edges).

SLIDE 10

Preamble: Spatial Convolution

  • You are familiar with the 1D convolution integral in the time domain between an input signal i(t) and impulse response function h(t):
    $p(t) = i(t) * h(t) = \int_{-\infty}^{+\infty} i(t - \tau)\, h(\tau)\, d\tau = \int_{-\infty}^{+\infty} i(\tau)\, h(t - \tau)\, d\tau$
  • The second equality reminds us that convolution commutes: i(t) * h(t) = h(t) * i(t). It also associates.
  • In the frequency domain we would write P(ω) = I(ω) H(ω).
  • Now in the continuous 2D domain, the spatial convolution integral is:
    $p(x, y) = i(x, y) * h(x, y) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} i(x - a,\, y - b)\, h(a, b)\, da\, db$
  • In the spatial domain you'll often see h(x, y) referred to as the point spread function, the convolution mask or the convolution kernel.

SLIDE 11

Discrete Spatial Convolution

  • For pixelated images I(x, y), we need a discrete convolution:
    $P(x, y) = I(x, y) * h(x, y) = \sum_{i} \sum_{j} I(x - i,\, y - j)\, h(i, j)$
    for x, y ranging over the image width and height respectively, and i, j ensuring access is made to any and all non-zero entries in h.
  • Many authors rewrite the convolution by replacing h(i, j) with $\bar{h}(-i, -j)$:
    $P(x, y) = \sum\sum I(x - i, y - j)\, h(i, j) = \sum\sum I(x + i, y + j)\, h(-i, -j) = \sum\sum I(x + i, y + j)\, \bar{h}(i, j)$
    This looks more like the expression for cross-correlation but, confusingly, it is still called convolution. (A direct implementation is sketched below.)
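A direct (unoptimised) implementation of the discrete convolution above might look as follows; zero padding at the borders is an assumption of this sketch, since the slides do not specify border handling:

```python
import numpy as np

def conv2d(image, kernel):
    """Direct discrete 2D convolution, P(x, y) = sum_ij I(x-i, y-j) h(i, j).

    Zero-pads the borders; kernel dimensions are assumed odd.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)))
    out = np.zeros_like(image, dtype=float)
    # Flipping the kernel turns the correlation-style sliding window
    # into a true convolution: exactly the h-bar rewrite above.
    flipped = kernel[::-1, ::-1]
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * flipped)
    return out
```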

SLIDE 12

Computing partial derivatives using convolution

  • We can approximate ∂I/∂x at image pixel (x, y) using a central finite difference:
    $\frac{\partial I}{\partial x} \approx \frac{1}{2}\left[(I(x+1, y) - I(x, y)) + (I(x, y) - I(x-1, y))\right] = \frac{1}{2}\left[I(x+1, y) - I(x-1, y)\right]$
  • Writing this as a “proper” convolution would set:
    $h(-1) = +\tfrac{1}{2}, \quad h(0) = 0, \quad h(1) = -\tfrac{1}{2}$
    $E(x, y) = I(x, y) * h(x) = \sum_{i=-1}^{1} I(x - i, y)\, h(i)$
  • Notice how the “proper” mask is reversed from what we might naively expect from the expression.

SLIDE 13

Computing partial derivatives using convolution

  • Now, as ever:
    $\frac{\partial I}{\partial x} \approx \frac{1}{2}\left[I(x+1, y) - I(x-1, y)\right]$
  • Writing this as a “sort of correlation”:
    $\bar{h}(-1) = -\tfrac{1}{2}, \quad \bar{h}(0) = 0, \quad \bar{h}(1) = +\tfrac{1}{2}$
    $E(x, y) = I(x, y) * \bar{h}(x) = \sum_{i=-1}^{1} I(x + i, y)\, \bar{h}(i)$
  • Note how we can just lay this mask directly on the pixels to be multiplied and summed … (see the sketch below).
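For instance, the correlation-style derivative mask can be applied with scipy.ndimage.correlate1d; treating x as the column axis is an assumption of this sketch:

```python
import numpy as np
from scipy import ndimage

# Correlation-style central-difference mask h-bar = (-1/2, 0, +1/2),
# laid directly on the pixels to be multiplied and summed.
h_bar = np.array([-0.5, 0.0, 0.5])

image = np.random.rand(64, 64)                      # stand-in image
dI_dx = ndimage.correlate1d(image, h_bar, axis=1)   # x = columns
dI_dy = ndimage.correlate1d(image, h_bar, axis=0)   # y = rows
```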

SLIDE 14

Example Results

[Figure: the input image I(x, y) alongside its x-gradient “image” and y-gradient “image”.]

SLIDE 15

In 2 dimensions:

  • As before, one imagines the flipped “correlation-like” mask centered on the pixel, and the sum of the products completed.
  • Often a 2D mask is “separable”, in that it can be broken up into two separate 1D convolutions in x and y:
    $P = h_{2D} * I = h_y * (h_x * I)$
  • The computational complexity is lower, but intermediate storage is required, so for a small mask it might be cheaper to use it directly. (A separable-filtering sketch follows below.)
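A sketch of separable filtering with a Gaussian, a kernel that factors exactly into two 1D passes; the truncation radius of three standard deviations is an arbitrary choice:

```python
import numpy as np
from scipy import ndimage

def gaussian_kernel_1d(sigma, radius):
    """Sampled 1D Gaussian, normalised to sum to 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

sigma = 2.0
k = gaussian_kernel_1d(sigma, radius=int(3 * sigma))

image = np.random.rand(128, 128)  # stand-in image
# Two 1D passes (rows, then columns) replace one full 2D convolution:
# cost per pixel drops from O(len(k)**2) to O(2 * len(k)).
blurred = ndimage.convolve1d(ndimage.convolve1d(image, k, axis=0), k, axis=1)
```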

SLIDE 16

Example result – Laplacian (non-directional)

[Figure: Laplacian response. The actual image used is grey-level, not colour.]

SLIDE 17

Preamble: Noise and Smoothing

  • Differentiation enhances noise – the edge appears clear enough in images, but less so in the gradient map.
  • If we know the noise spectrum, we might find an optimal brick-wall filter h(x, y) ↔ H(ω) to suppress noise edges outside the signal edge band.
  • But a sharp cut-off in spatial frequency requires a wide spatial h(x, y) – an Infinite Impulse Response filter – not doable.
  • Can we compromise spread in space and spatial frequency in some optimal way?
SLIDE 18

Compromise in space and spatial-frequency

  • Suppose the impulse response function is h(x), and h ⇌ H is a Fourier transform pair.
  • Define the spreads in space and spatial-frequency as X and Ω, where:
    $X^2 = \frac{\int (x - \bar{x})^2\, h^2(x)\, dx}{\int h^2(x)\, dx}$ with $\bar{x} = \frac{\int x\, h^2(x)\, dx}{\int h^2(x)\, dx}$
    $\Omega^2 = \frac{\int (\omega - \bar{\omega})^2\, H^2(\omega)\, d\omega}{\int H^2(\omega)\, d\omega}$ with $\bar{\omega} = \frac{\int \omega\, H^2(\omega)\, d\omega}{\int H^2(\omega)\, d\omega}$
  • Now vary h to minimize the product of the spreads U = XΩ.
  • An uncertainty principle indicates that $U_{min} = 1/2$ when:
    $h(x) = \text{a Gaussian function} = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^2}{2\sigma^2}\right)$

SLIDE 19

6.3 Edge Detection: Simple Approach: Sobel

  • Convolve with kernels:
    $h_x = \begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix} \quad\text{and}\quad h_y = \begin{pmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{pmatrix}$
  • Compute magnitudes:
    $\text{edge map} = \sqrt{(I * h_x)^2 + (I * h_y)^2}$
  • (Optionally) smooth. (See the sketch below.)
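A minimal sketch of the Sobel edge map using the kernels above; the random image is a stand-in, and scipy.ndimage's default reflective border handling is an assumption:

```python
import numpy as np
from scipy import ndimage

# Standard 3x3 Sobel kernels.
h_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
h_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)

image = np.random.rand(128, 128)   # stand-in grey-level image
gx = ndimage.convolve(image, h_x)  # response to horizontal intensity changes
gy = ndimage.convolve(image, h_y)  # response to vertical intensity changes
edge_map = np.hypot(gx, gy)        # gradient magnitude at every pixel
```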
SLIDE 20

6.3 Edge Detection: Fancy: Canny

  • Step 1: Gaussian smooth the image: S = G_x * G_y * I.
  • Step 2: Compute gradients ∂S/∂x and ∂S/∂y.
  • Step 3: Non-maximal suppression, by searching for the maximum along the gradient direction.
  • Step 4: Store the edge’s sub-pixel position (x, y), gradient magnitude, and orientation.
  • Step 5: Link edges into strings using orientation.
  • Step 6: Perform thresholding with hysteresis, by running along the linked strings.
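In practice one rarely re-implements these steps; OpenCV's cv2.Canny bundles smoothing, gradients, non-maximal suppression and hysteresis (it omits the sub-pixel positions and edge strings of Steps 4-5). The thresholds below are arbitrary example values:

```python
import cv2
import numpy as np

image = (np.random.rand(128, 128) * 255).astype(np.uint8)  # stand-in grey-level image
edges = cv2.Canny(image, 100, 200)  # low/high hysteresis thresholds
```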

SLIDE 21

6.3 Edge Detection: Fancy: Canny

SLIDE 22

Problems using 1D image structure for Geometry

  • Computing edges makes the feature map sparse but interpretable. Much of the salient information is retained.
  • If the camera motion is known, feature matching is a 1D problem, for which edges are very well suited.
  • However, matching is much harder when the camera motion is unknown: this is known as the aperture problem.
  • End points are unstable, hence line matching is largely uncertain. (Indeed, only line orientation is useful for detailed geometrical work.)
  • In general, matching requires the unambiguity of 2D image features or “corners”.
SLIDE 23

6.4 Corner Detection: Harris

Core idea: we find a corner if shifting a window in any direction gives a large change in intensity.

  • Flat region: no change when shifting the window in any direction.
  • Edge region: no change when shifting the window along the edge direction.
  • Corner region: significant change when shifting the window in any direction.

SLIDE 24

Auto-correlation

Suppose that we are interested in correlating a $(2n+1)^2$ pixel patch at (x, y) in I with a similar patch displaced from it by (u, v). We would write the correlation between the patches as:

$C_{uv}(x, y) = \sum_{i=-n}^{n} \sum_{j=-n}^{n} I(x + i,\, y + j)\, I(x + u + i,\, y + v + j)$

As we keep (x, y) fixed, but change (u, v), we build up the auto-correlation surface around (x, y).

SLIDE 25

Auto-correlation

[Figure: auto-correlation surfaces for three patches: nothing special, a straight edge, a plain corner.]

SLIDE 26

Sum of Squared Differences

  • There is an expression which can be computed more cheaply than $C_{uv}$ and gives results of comparable quality.
  • This is the sum of squared differences:
    $E_{uv}(x, y) = \sum_{i=-n}^{+n} \sum_{j=-n}^{+n} \left[I(x + u + i,\, y + v + j) - I(x + i,\, y + j)\right]^2$
    (A direct implementation is sketched below.)
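A direct sketch of building the SSD surface around one pixel; the patch radius n and shift range are arbitrary example values, and (x, y) is assumed to lie far enough from the image border:

```python
import numpy as np

def ssd_surface(I, x, y, n=3, max_shift=5):
    """Build E_uv(x, y) for displacements (u, v) in [-max_shift, max_shift]."""
    I = I.astype(float)
    ref = I[y - n:y + n + 1, x - n:x + n + 1]   # the (2n+1)^2 patch at (x, y)
    size = 2 * max_shift + 1
    E = np.zeros((size, size))
    for a in range(size):
        for b in range(size):
            v, u = a - max_shift, b - max_shift  # row shift v, column shift u
            patch = I[y + v - n:y + v + n + 1, x + u - n:x + u + n + 1]
            E[a, b] = np.sum((patch - ref) ** 2)
    return E
```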

SLIDE 27

Harris Corner Detector

  • Use a 1st order Taylor expansion of $E_{uv}(x, y)$ and write it in matrix form:
    $E_{uv} \approx \begin{pmatrix} u & v \end{pmatrix} C \begin{pmatrix} u \\ v \end{pmatrix}$ where $C = \sum_{x,y} \begin{pmatrix} \left(\frac{\partial I}{\partial x}\right)^2 & \frac{\partial I}{\partial x}\frac{\partial I}{\partial y} \\ \frac{\partial I}{\partial x}\frac{\partial I}{\partial y} & \left(\frac{\partial I}{\partial y}\right)^2 \end{pmatrix}$
  • We can compute the eigenvalues of C as $\lambda_1$ and $\lambda_2$.

[Figure: the ellipse of constant $E_{uv}$; its axes point along the directions of the fastest and slowest change, with lengths proportional to $(\lambda_{max})^{-1/2}$ and $(\lambda_{min})^{-1/2}$.]

SLIDE 28

Harris Corner Detector

  • We can therefore classify the image structure:
    – Both $\lambda_1$ and $\lambda_2 \approx 0$: the auto-correlation is small in all directions; the image must be flat.
    – $\lambda_1 \gg 0$, $\lambda_2 \approx 0$: the auto-correlation is high in just one direction, so we have found a 1D edge.
    – $\lambda_1 \gg 0$, $\lambda_2 \gg 0$: the auto-correlation is high in all directions, so we have found a 2D corner.
  • Harris’ original “interest” score was later modified by Harris and Stephens:
    $S_{Harris} = \frac{\lambda_1 \lambda_2}{\lambda_1 + \lambda_2} \qquad S_{HS} = \lambda_1 \lambda_2 - \alpha\, (\lambda_1 + \lambda_2)^2$
  • $\alpha$ is a positive constant ($0 \le \alpha \le 1$) which decreases the response to edges, sometimes called the “edge-phobia” ☺.

SLIDE 29

Harris Corners Summary

1. Filter the image with a Gaussian to reduce noise.
2. Compute the magnitude of the x and y gradients at each pixel.
3. Construct C in a window around each pixel.
4. Solve for the response (using $\det C = \lambda_1 \lambda_2$ and $\operatorname{trace} C = \lambda_1 + \lambda_2$).
5. Evaluate the response to check whether it is a corner. (A sketch of the whole pipeline follows below.)
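A compact sketch of the pipeline; using Sobel kernels for the gradients, a Gaussian window for the sum, and the values sigma = 1.0 and alpha = 0.05 are illustrative assumptions, not prescribed by the slides:

```python
import numpy as np
from scipy import ndimage

def harris_response(image, sigma=1.0, alpha=0.05):
    """Harris-Stephens response det(C) - alpha * trace(C)^2 at every pixel."""
    Ix = ndimage.sobel(image.astype(float), axis=1)  # dI/dx
    Iy = ndimage.sobel(image.astype(float), axis=0)  # dI/dy
    # Entries of C, summed over a Gaussian window around each pixel.
    Sxx = ndimage.gaussian_filter(Ix * Ix, sigma)
    Syy = ndimage.gaussian_filter(Iy * Iy, sigma)
    Sxy = ndimage.gaussian_filter(Ix * Iy, sigma)
    det_C = Sxx * Syy - Sxy**2        # lambda1 * lambda2
    trace_C = Sxx + Syy               # lambda1 + lambda2
    return det_C - alpha * trace_C**2

# Corners are pixels whose response exceeds a chosen threshold.
```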

SLIDE 30

Scale Invariant Feature Transform (SIFT)

SIFT is a point feature detector and descriptor.

  • Local feature detector:
    – Equivariant to image translations, scalings and rotations.
    – Robust and efficient.
    – Better than Harris, FAST, etc.
  • Local feature descriptor:
    – Robust to detector inaccuracies, variable illumination, small object deformations, etc.
    – Compact (128 bytes, can be made smaller).
  • SIFT has numerous applications:
    – The detector & descriptor are used in: wide-baseline matching, image retrieval, mobile robot localization, panorama stitching, …
    – The descriptor is (was) used in: recognition of object categories, face recognition, image segmentation, …

SLIDE 31

Co-variant detection with SIFT

Co-variant feature detection means that the same “physical features” are extracted regardless of the viewpoint-induced image transformations. Intuition: the extracted features track (“co-vary”) with the image transformation. SIFT achieves co-variant detection by selecting blob-like structures. The implicit assumption is that blobs will still look like blobs after the image is transformed.

SLIDE 32

Blob detection

  • SIFT detects blobs as local maxima in the response of a blob-like image filter: the Laplacian of Gaussian (LoG).
  • The LoG filter is convolved with the image to obtain the blob response:
    $LoG(x, y, \sigma) = \sigma^2 \nabla^2 G(x, y, \sigma) * I(x, y)$ with $G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$

[Figure: the filter $\sigma^2 \nabla^2 G(\cdot, \cdot, \sigma)$ and its convolution with the image.]

SLIDE 33

Multi-scale blob response

  • Convolution searches for blobs at all locations (x, y). We must also search at all scales σ.
  • Repeat for $\sigma(o, s) = \sigma_0\, 2^{o + s/S}$, where o = octave and s = octave subdivision (of S subdivisions per octave). (A sketch of the schedule follows below.)

[Figure: scale levels $\sigma_0 2^{-1/3}, \sigma_0 2^{0}, \sigma_0 2^{1/3}, \sigma_0 2^{2/3}, \sigma_0 2^{3/3}$ for octave o = 0, overlapping with $\sigma_0 2^{1-1/3}, \sigma_0 2^{1}, \sigma_0 2^{1+1/3}, \sigma_0 2^{1+2/3}, \sigma_0 2^{1+3/3}$ for octave o = 1.]
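The schedule itself is one line of Python; $\sigma_0 = 1.6$ and S = 3 are the commonly quoted defaults from Lowe's paper, and four octaves is an arbitrary example choice:

```python
sigma0, S = 1.6, 3   # base scale and subdivisions per octave
scales = [sigma0 * 2 ** (o + s / S) for o in range(4) for s in range(S)]
```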

SLIDE 34

Gaussian vs Laplacian scale spaces

  • For efficiency, it is often better to start from the Gaussian scale space instead of the LoG scale space:
    $LoG(x, y, \sigma) = \sigma^2 \nabla^2 G(x, y, \sigma) * I(x, y) \;\longrightarrow\; L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$
  • We then define a Difference of Gaussians (DoG) space as the difference of consecutive levels of the Gaussian scale space:
    $DoG(x, y, \sigma) = \left[G(x, y, k\sigma) - G(x, y, \sigma)\right] * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$
    where k > 1 is a small multiplicative factor.
  • The LoG scale space can then be approximated by:
    $LoG(x, y, \sigma) \approx \frac{1}{k - 1}\left[L(x, y, k\sigma) - L(x, y, \sigma)\right]$

SLIDE 35

Gaussian scale space

SLIDE 36

Scale space pyramid

After each octave, the sampling resolution can be halved to avoid storing and processing redundant data:

SLIDE 37

Scale space pyramid computation

  • Blur (convolve with a Gaussian kernel) to go to the next scale level.
  • Down-sample by half to go to the next scale octave.
  • Subtract pairs to go from the Gaussian scale space to the DoG. (A sketch follows below.)
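A minimal sketch of this computation. The S + 3 levels per octave follow Lowe's convention; a production implementation would blur incrementally between levels and track the base image's existing blur, rather than blurring from the octave base each time as done here:

```python
import numpy as np
from scipy import ndimage

def gaussian_and_dog_pyramids(image, sigma0=1.6, S=3, octaves=4):
    """Build Gaussian and DoG pyramids, one list of levels per octave."""
    k = 2.0 ** (1.0 / S)
    gaussians, dogs = [], []
    base = image.astype(float)
    for _ in range(octaves):
        levels = [ndimage.gaussian_filter(base, sigma0 * k**s) for s in range(S + 3)]
        gaussians.append(levels)
        # Subtract consecutive Gaussian levels to get the DoG levels.
        dogs.append([b - a for a, b in zip(levels, levels[1:])])
        base = levels[S][::2, ::2]  # level at 2*sigma0, down-sampled by half
    return gaussians, dogs
```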
SLIDE 38

Multi-scale blob localisation

  • So far we have computed the (approximate) LoG scale space:
    $LoG(x, y, \sigma) = \sigma^2 \nabla^2 G(x, y, \sigma) * I(x, y)$
  • Blobs = local maxima and minima of the LoG scale space.
  • Computation: for each value LoG(x, y, σ), check the 3 × 3 × 3 neighbours in space and scale.
  • If the value is larger (or smaller) than all the neighbours, mark (x, y, σ) as a blob. (A sketch follows below.)
  • The σ² factor in the LoG definition makes the different scales comparable, so that computing local maxima across scale is meaningful.
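A direct sketch of the 3 × 3 × 3 test on one octave's DoG stack (array shape: levels × height × width); requiring a strict, unique extremum is an implementation choice of this sketch:

```python
import numpy as np

def scale_space_extrema(dog):
    """Return (level, row, col) of 3x3x3 local maxima/minima in a DoG stack."""
    blobs = []
    for l in range(1, dog.shape[0] - 1):
        for r in range(1, dog.shape[1] - 1):
            for c in range(1, dog.shape[2] - 1):
                cube = dog[l - 1:l + 2, r - 1:r + 2, c - 1:c + 2]
                v = dog[l, r, c]
                # A blob only if v is the unique max or min of the 27 values.
                if (v == cube.max() or v == cube.min()) and (cube == v).sum() == 1:
                    blobs.append((l, r, c))
    return blobs
```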

SLIDE 39

Orientation assignment

  • Create a histogram of local gradient directions, computed at the selected scale.
  • Assign the canonical orientation at the peak of the smoothed histogram.
  • Each keypoint now specifies stable coordinates: x, y, scale, orientation.

SLIDE 40

Keypoint localization with orientation, after some thresholds

SLIDE 41

Keypoint Feature Descriptor

  • Now each keypoint has a location, a scale and an orientation.
  • Next we compute a descriptor for the local region that is distinctive and invariant to rotation and scale:
    – Step 1: Rotate the window to the standard orientation.
    – Step 2: Scale the window size based on the scale at which the point was found.

SLIDE 42

Keypoint Feature Descriptor

  • Compute the gradient magnitude and orientation at each location in the normalized region around the keypoint.
  • Weight them by a Gaussian window overlaid on the circle.
  • Create an orientation histogram over the 4 × 4 subregions of the window.
  • 4 × 4 descriptors over a 16 × 16 sample array are used in practice.
  • 4 × 4 × 8 directions → a vector of 128 values.

SLIDE 43

SIFT Summary

  • Detection:
    – For each image scale:
      • Build a Gaussian blurred image for each scale of the kernel.
      • Subtract consecutive scales to get the DoG images.
    – Find maxima across the DoG + image scales.
  • Descriptor:
    – For each detected location:
      • Get a normalized patch: rotate to the canonical orientation and normalize w.r.t. scale.
      • Compute a histogram-of-oriented-gradients descriptor for each patch.

(An off-the-shelf usage sketch follows below.)
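For reference, OpenCV (4.4+) ships this whole pipeline; a usage sketch, with a random stand-in image (a real image would yield meaningful keypoints):

```python
import cv2
import numpy as np

image = (np.random.rand(256, 256) * 255).astype(np.uint8)  # stand-in grey-level image
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(image, None)
# Each keypoint carries (x, y), size (scale) and angle (orientation);
# descriptors is an N x 128 array, one row per keypoint.
```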

SLIDE 44

LIFT: Learned Invariant Feature Transform

  • A learned counterpart to SIFT, consisting of:
    – A detector (DET above).
    – An orientation estimator (ORI above).
    – A descriptor (DESC above).
  • The stages effectively mirror SIFT, except that the entire process is now end-to-end differentiable.
  • It is interchangeable with SIFT (and others).
SLIDE 45

LIFT: Training

  • P1 and P2: different views of the same 3D point, used to train the descriptor.
  • P3: a different 3D point, also used to train the descriptor.
  • P4: contains no features, used to train the detector.
SLIDE 46

LIFT: Training

Descriptor:

  • Loss: minimise the distance between descriptors for corresponding matches, maximise it for non-corresponding patches.
  • Trained using points from a standard geometric pipeline (see the next lectures).

SLIDE 47

LIFT: Training

Orientation estimator:

  • Loss: the generated orientation minimises the distance between descriptors for different views of the same 3D point.
  • Trained using detections from a standard geometric pipeline (see the next lectures), but LIFT descriptors.

SLIDE 48

LIFT: Training

Detector:

  • Classifies each pixel in a patch as a feature point or not.
  • The actual location is extracted using a softargmax, which computes the centre of mass given the weights from a softmax. (A sketch follows below.)
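A minimal sketch of a 2D softargmax; the sharpness parameter beta is an arbitrary example value, not taken from the LIFT paper:

```python
import numpy as np

def softargmax_2d(score_map, beta=10.0):
    """Differentiable 'argmax': softmax weights -> centre of mass of coordinates."""
    h, w = score_map.shape
    weights = np.exp(beta * (score_map - score_map.max()))  # stabilised softmax
    weights /= weights.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    # The weighted centre of mass gives a sub-pixel (x, y) location.
    return float((weights * xs).sum()), float((weights * ys).sum())
```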

SLIDE 49

LIFT: Forward

  • Given an input image, the detector produces a score map.
  • Standard NMS (as in SIFT) is used to select only the keypoints.
  • A patch of size 64 × 64 is extracted around each feature point and is passed through the orientation estimator.
  • The orientation estimator predicts a patch orientation.
  • The patch is rotated by the estimated orientation and passed through the descriptor to produce the descriptor vector.

SLIDE 50

LIFT: Results

LIFT: Learned Invariant Feature Transform
Kwang Moo Yi, Eduard Trulls, Vincent Lepetit, Pascal Fua
https://arxiv.org/abs/1603.09114

SLIDE 51

Summary of Lecture 6

  • As enabling techniques, we have described convolution and correlation.
  • Described how to smooth imagery and detect gradients.
  • Described how to detect edges in an image and glossed over the Sobel and Canny edge detectors.
  • Described how to recover corners in the image and glossed over the Harris corner detector.
  • Described the SIFT and LIFT detectors + descriptors.