Theoretical Bounds on Image Search
16-423 - Designing Computer Vision Apps
Instructor - Simon Lucey
Today
Exhaustive Search & Sampling
Motivation for Descriptors
Overcoming the Curse of Dimensionality in Search
3 Biggest Problems in Computer Vision?
(Prof. Takeo Kanade - Robotics Institute - CMU.)
What is Registration?
Given a “Source” image and a “Template”, our goal is to find the warp parameter vector $\mathbf{p}$ such that $\mathbf{x}' = W(\mathbf{x};\mathbf{p})$, where:
$W(\mathbf{x};\mathbf{p})$ = warping function
$\mathbf{p}$ = parameter vector describing the warp
$\mathbf{x} = [x, y]^T$ = coordinate in the template
$\mathbf{x}' = [x', y']^T$ = corresponding coordinate in the source
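Warp Functions
Translation:
$$W(\mathbf{x};\mathbf{p}) = \mathbf{x} + \mathbf{p}, \quad \mathbf{p} = [p_1, p_2]^T$$
Affine (e.g. translation, aspect, rotation):
$$W(\mathbf{x};\mathbf{p}) = M \begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix}, \quad M = \begin{bmatrix} 1 - p_1 & p_2 & p_3 \\ p_4 & 1 - p_5 & p_6 \end{bmatrix}$$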
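A minimal NumPy sketch of the two warp functions applied to an (N, 2) array of template coordinates. The helper names `translate`, `affine_matrix` and `affine` are hypothetical, and `affine_matrix` follows the layout of M as it appears above (diagonal entries 1 − p1 and 1 − p5), so p = 0 is the identity warp.

```python
import numpy as np

def translate(x, p):
    """Translation warp W(x; p) = x + p for an (N, 2) array of [x, y] points."""
    return x + np.asarray(p, dtype=float)

def affine_matrix(p):
    """2x3 matrix M from p = [p1, ..., p6], laid out as on the slide (identity at p = 0)."""
    p1, p2, p3, p4, p5, p6 = p
    return np.array([[1.0 - p1, p2, p3],
                     [p4, 1.0 - p5, p6]])

def affine(x, p):
    """Affine warp W(x; p) = M [x; 1] applied to an (N, 2) array of points."""
    x_h = np.hstack([x, np.ones((x.shape[0], 1))])   # homogeneous coords, (N, 3)
    return x_h @ affine_matrix(p).T                    # back to (N, 2)

x = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, -1.0]])
# p3 and p6 act as a pure translation; p1 = p5 = 1, p2 = -1, p4 = 1 gives a 90 degree rotation.
print(affine(x, [0, 0, 2.0, 0, 0, -1.0]))
print(affine(x, [1.0, -1.0, 0, 1.0, 1.0, 0]))
```

Because the translation warp is the special case p1 = p2 = p4 = p5 = 0 of the affine warp, the same search code can be written once against `affine` and restricted to fewer parameters.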
Naive Approach
Given a “Template image” and the source “Images at various warps” p, convert each warped image into a vector (“Vectors of pixel values at each warp position”):
[255,134,45,.......,34,12,124,67]
[123,244,12,.......,134,122,24,02]
[67,13,245,.......,112,51,92,181]
[65,09,67,.......,78,66,76,215]
...........
You might suggest applying a “matching function” D( ) to each vector and thresholding the result against Th to separate matches from the background.
We refer to this as Exhaustive Search!!!
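The naive approach can be sketched as a brute-force loop over integer translations, using SSD as the matching function D(p). This is an illustration under simplifying assumptions (translation-only warps, grayscale NumPy arrays, no sub-pixel sampling); `exhaustive_search` is a hypothetical name.

```python
import numpy as np

def exhaustive_search(source, template):
    """Try every integer translation; return the best (dy, dx) and the map D(p)."""
    H, W = source.shape
    h, w = template.shape
    D = np.empty((H - h + 1, W - w + 1))
    for dy in range(H - h + 1):
        for dx in range(W - w + 1):
            patch = source[dy:dy + h, dx:dx + w]          # I(p) at warp p = (dy, dx)
            D[dy, dx] = np.sum((patch - template) ** 2)    # SSD matching function
    best = np.unravel_index(np.argmin(D), D.shape)
    return (int(best[0]), int(best[1])), D

rng = np.random.default_rng(0)
source = rng.random((40, 50))
template = source[12:20, 30:38].copy()     # cut the template out at p = (12, 30)
p_best, D = exhaustive_search(source, template)
print(p_best, D.min())
```

The cost is one full template comparison per candidate warp, which is what makes the sampling density of p so important.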
Sampling?
The “Possible Source Warps” $\mathbf{p} = \{p_1, p_2\}$ form a continuous space, so in practice exhaustive search must sample it, e.g. on a grid with spacing $\Delta p_1$ and $\Delta p_2$.
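Measures of Similarity
Comparing the “Model” $T$ with the “Source Image” $I$, each template coordinate $\mathbf{x}_i$ (e.g. $\mathbf{x}_1$) maps to $W(\mathbf{x}_i;\mathbf{p})$ in the source, and the sum of squared differences (SSD) over all $M$ template pixels is
$$D(\mathbf{p}) = \sum_{i=1}^{M} \| I(W(\mathbf{x}_i;\mathbf{p})) - T(\mathbf{x}_i) \|_2^2$$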
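Measures of Similarity
“Vector Form”: stacking the source pixels sampled at warp $\mathbf{p}$ into $I(\mathbf{p})$ and the template pixels into $T(\mathbf{0})$ gives
$$D(\mathbf{p}) = \|I(\mathbf{p}) - T(\mathbf{0})\|_2^2$$
Expanding, $D(\mathbf{p}) = \|I(\mathbf{p})\|_2^2 - 2\,I(\mathbf{p})^T T(\mathbf{0}) + \|T(\mathbf{0})\|_2^2$. The last term does not depend on $\mathbf{p}$, so if $\|I(\mathbf{p})\|_2^2$ is also (approximately) constant, minimizing the SSD is equivalent to minimizing the negative correlation
$$D(\mathbf{p}) = -I(\mathbf{p})^T T(\mathbf{0})$$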
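A quick numerical check of this equivalence, assuming every I(p) has been normalized to unit length. The candidate vectors are random stand-ins for I(p), not real image data.

```python
import numpy as np

rng = np.random.default_rng(1)
T0 = rng.standard_normal(64)                   # template vector T(0)
I = rng.standard_normal((100, 64))             # stand-ins for I(p), one row per warp
I[37] = T0 + 0.1 * rng.standard_normal(64)     # make warp 37 a near-match

# Assumption behind the equivalence: ||I(p)|| is the same for every p.
I /= np.linalg.norm(I, axis=1, keepdims=True)
T0 /= np.linalg.norm(T0)

ssd = np.sum((I - T0) ** 2, axis=1)            # ||I(p) - T(0)||_2^2
neg_corr = -I @ T0                             # -I(p)^T T(0)

# With unit norms: ssd = 1 + 1 - 2 I(p)^T T(0) = 2 + 2 * neg_corr
print(np.allclose(ssd, 2 + 2 * neg_corr), np.argmin(ssd), np.argmin(neg_corr))
```

The two scores differ only by an affine map, so they rank every warp identically; without the normalization, bright regions of the source can win the correlation score even when the SSD is poor.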
Measures of Similarity
For translation, evaluating $-I(\mathbf{p})^T T(\mathbf{0})$ for every placement of the “Template” over the “Source Image” can be done efficiently using 2D convolutions.
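One way to evaluate the correlation for every translation at once is through the FFT, which is equivalent to convolving with the flipped template. This is a sketch (the function name `correlation_map` is hypothetical); it computes circular correlation, so only the “valid” block where the template fits entirely inside the source is kept.

```python
import numpy as np

def correlation_map(source, template):
    """C[dy, dx] = sum_x source[x + (dy, dx)] * template[x] for every translation.

    Uses the FFT identity: circular cross-correlation <-> conj(F_T) * F_I.
    Only the 'valid' block (template fully inside the source) is returned,
    so wrap-around never contaminates the result.
    """
    H, W = source.shape
    h, w = template.shape
    T_pad = np.zeros((H, W))
    T_pad[:h, :w] = template                       # zero-pad template to image size
    C = np.fft.ifft2(np.fft.fft2(source) * np.conj(np.fft.fft2(T_pad))).real
    return C[:H - h + 1, :W - w + 1]

rng = np.random.default_rng(0)
source = rng.random((30, 40))
template = rng.random((5, 7))
D = -correlation_map(source, template)             # D(p) = -I(p)^T T(0) for all p
print(D.shape, np.unravel_index(np.argmin(D), D.shape))
```

The FFT route costs O(HW log HW) regardless of template size, compared with O(HWhw) for the explicit double loop.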
Sampling
If you sample densely enough (at the Nyquist rate) you can perfectly reconstruct the original data. For image search, the required sampling density is dependent on the “centre frequency” of the salient edges.
Nyquist Rate: the signal must be sampled at more than twice its highest frequency!
Under Sampling = “Aliasing”
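A small 1D illustration of aliasing, assuming a pure 9 Hz cosine: sampled at 10 Hz (below the 18 Hz Nyquist rate) its samples are indistinguishable from a 1 Hz cosine, while at 40 Hz the true frequency is recovered. `dominant_frequency` is a hypothetical helper that reads off the largest non-DC peak of the FFT.

```python
import numpy as np

def dominant_frequency(signal, fs):
    """Frequency (Hz) of the largest non-DC peak in the signal's spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return freqs[1 + np.argmax(spectrum[1:])]      # skip the DC bin

f_true, duration = 9.0, 2.0                        # a 9 Hz cosine observed for 2 s
for fs in (10.0, 40.0):                            # below / above 2 * 9 = 18 Hz
    t = np.arange(int(duration * fs)) / fs
    x = np.cos(2 * np.pi * f_true * t)
    print(f"fs = {fs:.0f} Hz -> apparent frequency {dominant_frequency(x, fs):.1f} Hz")
```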
Image = Summation of Oriented Edges
An image can be represented as a linear summation of oriented edges.
“Useful as edges capture ONLY relative local change in intensity....”
Sensitivity to Shift
[Figure: a 1D “model” $T(\mathbf{0})$ and “image” $I(\mathbf{p})$, plotted as pixel intensity vs. pixel coordinates, with the matching function $D(\mathbf{p})$ plotted against the warp $\mathbf{p}$. The sampling intervals $\Delta\mathbf{p}_h$ and $\Delta\mathbf{p}_l$ correspond to high- and low-frequency content respectively.]
[Figure: the grid of “Possible Source Warps” $\mathbf{p} = \{p_1, p_2\}$, sampled at spacing $\Delta\mathbf{p}_l$ and, more densely, at $\Delta\mathbf{p}_h$.]
The higher the frequency of the signal, the narrower the basin of $D(\mathbf{p})$ and the finer the warp space must be sampled: the sample spacing is proportional to the wavelength $\lambda$ of the salient edges, $\lambda \propto \Delta\mathbf{p}$.
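Since the sample spacing must shrink with the wavelength, the size of an exhaustive grid grows quickly. The sketch below counts grid points under the simplifying (hypothetical) assumption that all d warp parameters share one search range and one step ∆p; `grid_samples` is an illustrative name.

```python
import math

def grid_samples(search_range, step, d):
    """Warps on a regular grid over [-search_range, search_range]^d with spacing `step`."""
    per_dim = math.floor(2 * search_range / step) + 1   # samples along one axis
    return per_dim ** d

# Hypothetical numbers: +/-16 px search range; a fine step for high-frequency
# (short-wavelength) content and a coarse step for low-frequency content.
for d in (2, 6):                 # e.g. translation (d = 2) vs. affine (d = 6)
    for step in (1.0, 4.0):
        print(f"d = {d}, step = {step}: {grid_samples(16, step, d):,} warps")
```

Halving ∆p multiplies the count by roughly 2^d, which is the curse of dimensionality that the rest of the lecture tries to overcome.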
Beyond Translation
[Figures: “Motion Field for Translation”, “Motion Field for Scale”, “Motion Field for Rotation”.]
Today
Motivation for Descriptors
Primary Visual Cortex
Spatial Sensitivity (Kingdom, Field, Olmos, 2007)
Hierarchical Learning
[Figure: the Ventral Visual Stream, V1 → V2/V4 → IT, shown as a hierarchy of Simple cells, Complex cells and View-tuned cells. Illustration: Bob Crimi.]
Handling Geometric Distortion
The effectiveness of SSD will degrade with significant viewpoint and/or illumination change. Consider a “1D Patch” and a “Distorted 1D Patch” (Source: A. C. Berg). There are two options:
Option 1: “align” the patches, then “match”.
Option 2: “blur” the patches, then “match”.
Option 2 is attractive because of its low computational cost!
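Consider a signal $\mathbf{x}$ and a transformed copy $P\mathbf{x}$, where $P$ belongs to a group $G$ of transformations such as “Similarity”, “Rotation”, “Translation” or “Circular Shift”. Comparing $\mathbf{x}$ with $P\mathbf{x}$ directly gives a difference that is “Not Always Zero”; we want a descriptor $\phi\{\cdot\}$ for which the difference is “Always Zero”. A single transformation, $\phi\{\mathbf{x}\} = P\mathbf{x}$, does not achieve this, but averaging over the whole group does:
$$\phi\{\mathbf{x}\} = \frac{1}{|G|}\sum_{P \in G} P\mathbf{x}$$
For any $Q \in G$, $\phi\{Q\mathbf{x}\} = \frac{1}{|G|}\sum_{P \in G} PQ\mathbf{x} = \phi\{\mathbf{x}\}$, because $PQ$ runs over every element of $G$ as $P$ does. The normalization is optional, $\phi\{\mathbf{x}\} = \sum_{P \in G} P\mathbf{x}$, but either way this “throws away information”.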
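A sketch of group averaging for the “Circular Shift” group, where each P is a cyclic permutation matrix. It shows both halves of the argument: the descriptor is invariant, and for this group it collapses every signal to its mean, which is exactly how it “throws away information”. The helper names are hypothetical.

```python
import numpy as np

def circular_shift_group(n):
    """The group G of all n cyclic shifts, as n x n permutation matrices."""
    return [np.roll(np.eye(n), k, axis=0) for k in range(n)]   # P_k x == np.roll(x, k)

def group_average(x, G):
    """phi{x} = (1/|G|) * sum_{P in G} P x."""
    return sum(P @ x for P in G) / len(G)

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
G = circular_shift_group(len(x))
phi_x = group_average(x, G)
phi_shifted = group_average(np.roll(x, 2), G)     # phi{Px} for a shift P in G
print(phi_x, np.allclose(phi_x, phi_shifted))
```

Any other signal with the same mean, e.g. a constant vector, receives the same descriptor, so invariance here comes at the cost of all discriminative detail.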
What About Blurring?
Convolving a “Blur Kernel” with an “Edge Filter” produces an “Edge Blur Filter” (shown “Power Normalized”):
$$\text{Edge Blur Filter} = \text{Blur Kernel} \ast \text{Edge Filter}$$
However, because the blur is a low-pass filter, this lowers the filter's centre frequency (not what we want): the “Blurred Edge Wavelength” is longer than the “High Frequency Edge Wavelength”.
Sparseness and Positiveness
Blurring works best on a representation that is sparse and positive, but raw edge-filter responses $r$ are signed, so blurring lets their positive and negative lobes cancel. “Rectification” can remedy this problem with little loss in performance: squaring the response, $r \cdot r$, “Non-Linearly sets Centre Frequency to Zero”, so the energy survives the subsequent blur.
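A 1D sketch of why rectification helps, assuming a signed, high-frequency edge response r = cos(2πx/8) and a Gaussian blur much wider than its wavelength. Blurring r directly averages it to almost nothing; blurring r · r = (1 + cos(4πx/8))/2 keeps its DC term of 0.5. `gaussian_kernel` is an illustrative helper.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalised 1D Gaussian kernel, truncated at 4 sigma."""
    radius = int(np.ceil(4 * sigma))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()

x = np.arange(400)
r = np.cos(2 * np.pi * x / 8)           # signed edge response, wavelength 8 px
k = gaussian_kernel(sigma=6.0)          # blur much wider than the wavelength

blur_raw = np.convolve(r, k, mode="valid")       # positive and negative lobes cancel
blur_rect = np.convolve(r * r, k, mode="valid")  # r*r = (1 + cos(4*pi*x/8)) / 2
print(np.abs(blur_raw).max(), blur_rect.mean())  # ~0 versus ~0.5
```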
Sensitivity to Shift
[Figures: edge energy (rectified edge) vs. pixel coordinates, and the matching function $D(\mathbf{p})$ vs. warp $\mathbf{p}$, for three cases: “No Blurring”, “Gaussian Blur” and “Histogram Blur”.]
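Sparseness and Positiveness
Matching now compares descriptors $\phi\{I(\mathbf{p})\}$ instead of raw pixels $I(\mathbf{p})$, where $\phi\{\}$ = image descriptor function. One cost is that the representation is F times larger (where F is the number of filters employed).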
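A toy version of φ{·}, assuming F = 4 finite-difference edge filters (horizontal, vertical and the two diagonals), squaring as the rectifier and a box blur, all with circular (wrap-around) boundaries for brevity. It is not SIFT or any specific published descriptor; it only illustrates the filter → rectify → blur pipeline and the F-times-larger output. `OFFSETS`, `box_blur` and `descriptor` are hypothetical names.

```python
import numpy as np

# Hypothetical filter bank: finite differences I(x + offset) - I(x) along four
# directions (horizontal, vertical and the two diagonals), so F = 4.
OFFSETS = [(0, 1), (1, 0), (1, 1), (1, -1)]

def box_blur(a, radius):
    """Circular (wrap-around) box blur over a (2*radius + 1)^2 window."""
    out = np.zeros(a.shape, dtype=float)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            out += np.roll(a, (dy, dx), axis=(0, 1))
    return out / (2 * radius + 1) ** 2

def descriptor(image, radius=2):
    """phi{I}: edge filtering -> rectification (r * r) -> blur, per filter.

    Returns an (F, H, W) array, F times larger than the (H, W) input image.
    """
    channels = []
    for dy, dx in OFFSETS:
        r = np.roll(image, (-dy, -dx), axis=(0, 1)) - image   # signed edge response
        channels.append(box_blur(r * r, radius))               # rectify, then blur
    return np.stack(channels)

img = np.random.default_rng(0).random((32, 48))
phi = descriptor(img)
print(img.shape, phi.shape)
```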
Reminder - SIFT Descriptor
Relationship to Deep Learning
Sensitivity of VGG to Geometric Variation
[Figure: SNR (dB) of each layer's response (conv1, conv2, conv3, conv4, conv5, fc-6, fc-7) under geometric variation of the input.]
Architecture: image patch 3@(224x224) → conv1 64@(54x54) → conv2 256@(27x27) → conv3 384@(13x13) → conv4 384@(13x13) → conv5 256@(13x13) → fc-6 (4096) → fc-7 (4096)
Today
Overcoming the Curse of Dimensionality in Search
Exhaustive Search
Over the “Possible Source Warps” $\mathbf{p} = \{p_1, p_2\}$, exhaustive search solves
$$\arg\min_{\mathbf{p}} \|I(\mathbf{p}) - T(\mathbf{0})\|_2^2$$
where $d = \dim(\mathbf{p})$. In the illustration the true warp is $\mathbf{p} = \mathbf{0}$ and every other sample has $\mathbf{p} \neq \mathbf{0}$. Sampling the warp space with steps $\Delta\mathbf{p}$ of size $\|\Delta\mathbf{p}\|_2 = \epsilon$ requires $O((1/\epsilon)^d)$ samples, exponential in $d$.
Can we do better?
Number of samples needed to reach accuracy $\epsilon$:
Exhaustive search: $O((1/\epsilon)^d)$
Tian & Narasimhan 2012: $O(C^d \log 1/\epsilon)$
Tian & Narasimhan [ICCV 2013]: $O(C_1^d + C_2 \log 1/\epsilon)$
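To get a feel for these bounds, the sketch below evaluates each expression with hypothetical constants (C = C1 = C2 = 4, d = 6). Big-O hides the true constants, so the absolute numbers are meaningless; what matters is how each count grows as ε shrinks. The exhaustive count is multiplied by 10^d for every tenfold gain in accuracy, while the other two only grow additively (by C^d ln 10 and C2 ln 10 respectively). The function names are illustrative.

```python
import math

def growth_exhaustive(eps, d):
    return (1.0 / eps) ** d                      # O((1/eps)^d)

def growth_tn2012(eps, d, C):
    return C ** d * math.log(1.0 / eps)          # O(C^d log 1/eps)

def growth_tn2013(eps, d, C1, C2):
    return C1 ** d + C2 * math.log(1.0 / eps)    # O(C1^d + C2 log 1/eps)

d, C = 6, 4.0                                    # hypothetical constants
for eps in (1e-2, 1e-3, 1e-4):
    print(f"eps={eps:g}: exhaustive {growth_exhaustive(eps, d):.2e}, "
          f"2012 {growth_tn2012(eps, d, C):.0f}, 2013 {growth_tn2013(eps, d, C, C):.0f}")
```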
Data Driven Descent (Tian & Narasimhan 2012)
Offline: warp the template $T(\mathbf{0})$ by a set of known perturbations to build the training set $\{T(\Delta\mathbf{p}_k)\}_{k=1}^{K}$.
Online: for the current source patch $I(\mathbf{p})$, find the nearest training image $T(\Delta\mathbf{p}_i)$ and update the warp by “inverse composition”,
$$\mathbf{p} \leftarrow \mathbf{p} \circ \Delta\mathbf{p}_i^{-1}$$
repeating until convergence.
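A 1D, translation-only sketch of the data-driven descent loop, assuming a smooth Gaussian template and a coarse-to-fine schedule in which the range of training perturbations halves at each level. For translation, inverse composition reduces to p ← p − ∆p_i. The schedule, template and function names are illustrative assumptions, not Tian & Narasimhan's exact algorithm.

```python
import numpy as np

xs = np.arange(-30.0, 31.0)            # template coordinates x_i (1D)

def template_fn(x):
    """Hypothetical smooth 1D template: a Gaussian bump of width 4 px."""
    return np.exp(-0.5 * (x / 4.0) ** 2)

def T(delta):
    """T(delta): template warped by a translation delta, sampled at xs."""
    return template_fn(xs + delta)

def data_driven_descent(I_fn, p0, radius, K=9, levels=6, shrink=0.5):
    """Nearest-neighbour warp updates, coarse to fine.

    I_fn(x) samples the source image at coordinates x. At each level the
    training set {T(dp_k)} covers [-radius, radius]; in practice it would
    be precomputed offline.
    """
    p = p0
    for _ in range(levels):
        deltas = np.linspace(-radius, radius, K)           # perturbations dp_k
        train = np.stack([T(d) for d in deltas])           # {T(dp_k)}, shape (K, N)
        obs = I_fn(xs + p)                                 # I(p) = I(W(x; p))
        i = np.argmin(np.sum((train - obs) ** 2, axis=1))  # nearest T(dp_i)
        p = p - deltas[i]                                  # p <- p o dp_i^{-1}
        radius *= shrink                                   # finer samples next level
    return p

# Usage: the source is the template shifted by p_true, so I(p) = T(p - p_true).
p_true = 7.3
p_est = data_driven_descent(lambda x: template_fn(x - p_true), p0=0.0, radius=10.0)
print(p_est)
```

Each level uses only K samples yet shrinks the residual error by a constant factor, so the total number of samples grows with log(1/ε) rather than (1/ε)^d, which is the intuition behind the bounds above.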