IMAGE REPRESENTATION
Xinyi Fan
COS598c Spring2014
Monday, April 7, 14
APPROACHES
Bag of Words
Spatial Pyramid Matching
Descriptor Encoding
BAG OF WORDS
(Adapted from slides by Fei-Fei Li)

Pipeline:
1. Feature detection & representation
2. Codewords dictionary formation
3. Image representation
4. Category models / classifiers, producing the category decision
Feature detection: sample local patches on a dense regular grid, at points of interest, or at random locations, and compute a descriptor for each patch.

Codewords dictionary formation: vector quantization. Cluster the patch descriptors; each cluster center becomes a codeword.

Image representation: a histogram of codeword (feature) frequencies over the image's patches.

Using the representation:
- Use the BoW histogram as a feature vector for a standard classifier
- Cluster BoW vectors over an image collection
- Use BoW to build hierarchical models
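As a concrete sketch of the quantize-and-count steps (not from the slides; the codebook is assumed to be already learned, e.g. by k-means):

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Bag of words: assign each patch descriptor to its nearest codeword,
    then count codeword frequencies over the image."""
    # squared Euclidean distance from every descriptor to every codeword
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                       # vector quantization
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                        # normalized frequency histogram
```

The normalized histogram is what gets fed to the classifier in the last pipeline stage.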
Open issues in the pipeline:
- How to sample patches: dense uniform grid, interest points, random...
- How to build the dictionary: supervised/unsupervised, dictionary size...
- Which classifier: SVM, pyramid matching...
PYRAMID MATCH KERNEL
(Adapted from slides by Grauman and Darrell)

Goal: matching between sets of features,
  X = {x_1, ..., x_m}, x_i ∈ R^d
  Y = {y_1, ..., y_n}, y_i ∈ R^d

Histogram pyramid: level ℓ has bins of size 2^ℓ. For d = 1 and L = 3, the pyramid is H_0(X), H_1(X), H_2(X), H_3(X), and
  Ψ(X) = [H_0(X), ..., H_L(X)]
Histogram intersection:
  I(H(X), H(Y)) = Σ_{j=1}^{r} min(H(X)_j, H(Y)_j)
(In the slide's example with point sets X and Y, I(H(X), H(Y)) = 4.)
New matches at level ℓ:
  N_ℓ = I(H_ℓ(X), H_ℓ(Y)) − I(H_{ℓ−1}(X), H_{ℓ−1}(Y))
The difference in histogram intersections across levels (matches at the current level minus matches at the previous level) counts the number of new pairs matched.
Pyramid match kernel over histogram pyramids:
  K_Δ(Ψ(X), Ψ(Y)) = Σ_{ℓ=0}^{L} (1/2^ℓ) [I(H_ℓ(X), H_ℓ(Y)) − I(H_{ℓ−1}(X), H_{ℓ−1}(Y))]
Each term counts the number of newly matched pairs at level ℓ, and the weight 1/2^ℓ is a measure of the difficulty of a match at level ℓ.
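A minimal sketch of the kernel in the slide's 1-D setting (helper names are mine; points are assumed to lie in [0, 2^L), and bins at level ℓ have size 2^ℓ):

```python
import numpy as np

def histogram_pyramid(points, L):
    """Histogram pyramid for 1-D points in [0, 2**L): level l has bins of size 2**l."""
    return [np.histogram(points, bins=np.arange(0, 2 ** L + 2 ** l, 2 ** l))[0]
            for l in range(L + 1)]

def pyramid_match(HX, HY):
    """K_Delta = sum_l (1/2**l) * (I_l - I_{l-1}), with I_{-1} = 0."""
    k, prev = 0.0, 0.0
    for l, (hx, hy) in enumerate(zip(HX, HY)):
        i = np.minimum(hx, hy).sum()        # histogram intersection at level l
        k += (i - prev) / 2 ** l            # newly matched pairs, weighted by 1/2**l
        prev = i
    return k
```

The same idea extends to d dimensions by histogramming on d-dimensional grids.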
Results (Grauman and Darrell): on 100 sets of 2-D points with cardinalities varying between 5 and 100, the pyramid match output closely tracks the optimal partial matching (compared with the approximation of [Indyk & Thaper]), with trials sorted by optimal distance.

Classification: plug the kernel into an SVM; train on labeled training examples, classify by kernel values against support vectors.
Spatial information: BoW removes spatial layout. This buys invariance to translation and deformation, but sacrifices discriminative power.

SPATIAL PYRAMID MATCHING
(Image credit: Lazebnik et al)
Spatial pyramid matching runs the pyramid match in image space, once per codeword channel: quantize features into M channels; within each channel m = 1, ..., M, only features of the same type are matched to each other, using histograms over image coordinates. The image kernel sums the per-channel pyramid kernels:
  K(X, Y) = Σ_{m=1}^{M} κ^L(X_m, Y_m)
(Image credit: Lazebnik et al 2006)
(Adapted from slides by Grauman and Darrell)

Per-channel pyramid kernel:
  κ^L(X_m, Y_m) = K_Δ(Ψ(X_m), Ψ(Y_m)) = Σ_{ℓ=0}^{L} (1/2^ℓ) [I(H_ℓ(X_m), H_ℓ(Y_m)) − I(H_{ℓ−1}(X_m), H_{ℓ−1}(Y_m))]
The final kernel is the sum of the separate channel kernels:
  K(X, Y) = Σ_{m=1}^{M} κ^L(X_m, Y_m)
DESCRIPTOR ENCODING

In the same pipeline, descriptor encoding changes how "codewords dictionary formation" and "image representation" are done: sparse coding with spatial pyramid matching (ScSPM).
(Image credit: Yang et al 2009)
Start from k-means, viewed as an optimization over the codebook V = [v_1, ..., v_K]^T:
  min_V Σ_{m=1}^{M} min_{k=1,...,K} ||x_m − v_k||^2

Equivalently, vector quantization (VQ) in matrix form, with assignments U = [u_1, ..., u_M]^T:
  min_{V,U} Σ_{m=1}^{M} ||x_m − u_m V||^2   s.t. Card(u_m) = 1, |u_m| = 1, u_m ≥ 0, ∀m
(each u_m has exactly one nonzero entry, equal to 1: a hard assignment to one codeword)

Sparse coding (SC) relaxes the cardinality constraint into an L1 penalty:
  min_{V,U} Σ_{m=1}^{M} ||x_m − u_m V||^2 + λ|u_m|   s.t. ||v_k|| ≤ 1, ∀k
(the norm constraint on the codewords prevents shrinking the penalty by rescaling V)

Why L1? min_u ||x − uV||_2^2 + λ||u||_1 produces sparse codes, while the L2 penalty min_u ||x − uV||_2^2 + λ||u||_2^2 spreads weight over many codewords.

Implementation: feature-sign search algorithm [Lee et al 2006]
http://ai.stanford.edu/~hllee/softwares/nips06-sparsecoding.htm
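Feature-sign search is the solver the slide points to; as a simpler illustration of the same L1 objective, here is an ISTA (iterative soft-thresholding) sketch for coding one descriptor against a fixed codebook:

```python
import numpy as np

def ista_sparse_code(x, V, lam=0.1, n_iter=200):
    """Sparse-code x over codebook V (rows are codewords) via ISTA:
    minimize ||x - u @ V||^2 + lam * ||u||_1."""
    L = 2 * np.linalg.norm(V @ V.T, 2) + 1e-8   # Lipschitz constant of the gradient
    u = np.zeros(V.shape[0])
    for _ in range(n_iter):
        grad = 2 * (u @ V - x) @ V.T            # gradient of the quadratic term
        z = u - grad / L                        # gradient step
        u = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return u
```

ISTA converges more slowly than feature-sign search but makes the mechanics of L1 sparsity (the soft-threshold zeroing out small coefficients) easy to see.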
(Image credit: Yang et al 2009)

Pooling: given the codes U = [u_1, ..., u_M]^T in a pyramid cell, define z = F(U) by max pooling:
  z_j = max{|u_{1j}|, |u_{2j}|, ..., |u_{Mj}|}
The pooled vectors z^l(s, t) from the cells (s, t) of each pyramid level l are compared with a linear kernel:
  κ(z_i, z_j) = z_i^T z_j = Σ_{l=0}^{2} Σ_{s=1}^{2^l} Σ_{t=1}^{2^l} ⟨z^l_i(s, t), z^l_j(s, t)⟩
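The max-pooling step itself is a one-liner:

```python
import numpy as np

def max_pool(U):
    """ScSPM max pooling over a pyramid cell: z_j = max_m |u_mj|,
    where U is the (M, K) matrix of sparse codes for the cell's M descriptors."""
    return np.abs(U).max(axis=0)
```

Concatenating `max_pool` outputs over all pyramid cells gives the image vector that the linear kernel compares.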
LOCALITY-CONSTRAINED LINEAR CODING (LLC)
(Image credit: Wang et al 2010)

SC:
  min_{V,U} Σ_{m=1}^{M} ||x_m − u_m V||^2 + λ|u_m|   s.t. ||v_k|| ≤ 1, ∀k
LLC replaces the sparsity penalty with a locality penalty (⊙ is the element-wise product):
  min_{V,U} Σ_{m=1}^{M} ||x_m − u_m V||^2 + λ||d_m ⊙ u_m||^2   s.t. 1^T u_m = 1, ∀m

Locality adaptor:
  d_m = exp(dist(x_m, V) / σ)
where dist(x_m, V) = [dist(x_m, v_1), ..., dist(x_m, v_K)]^T, dist(x_m, v_k) is the Euclidean distance between x_m and v_k, and σ adjusts the decay speed.

For a fixed codebook V = {v_k}, LLC has an analytical solution:
  ũ_m = [(V − 1 x_m^T)(V − 1 x_m^T)^T + λ diag(d_m)] \ 1
  u_m = ũ_m / (1^T ũ_m)
(Image credit: Wang et al 2010)
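The analytical solution above translates almost line-for-line into code (a sketch following Wang et al's notation, rows of V as codewords):

```python
import numpy as np

def llc_code(x, V, lam=1e-4, sigma=1.0):
    """Analytical LLC code for descriptor x over codebook V (rows = codewords):
    solve (C + lam * diag(d)) u~ = 1, then normalize so the code sums to 1."""
    diff = V - x                            # rows v_k - x, i.e. V - 1 x^T
    C = diff @ diff.T                       # K x K data covariance
    d = np.exp(np.linalg.norm(diff, axis=1) / sigma)   # locality adaptor
    u_tilde = np.linalg.solve(C + lam * np.diag(d), np.ones(len(V)))
    return u_tilde / u_tilde.sum()          # enforce 1^T u = 1
```

Because the solve is closed-form, LLC coding is much faster than running an iterative L1 solver per descriptor.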
Approximated LLC: select local bases for each descriptor to form a local coordinate system; the K nearest neighbors of x_m form the local basis V_m. Then solve the smaller problem
  min_{Ũ} Σ_{m=1}^{M} ||x_m − ũ_m V_m||^2   s.t. 1^T ũ_m = 1, ∀m
The locality term is dropped because the neighbor selection already enforces locality.
FISHER VECTOR

Setup: X = {x_1, ..., x_T} is a sample of T observations x_t ∈ 𝒳; u_λ is the pdf with parameters λ = [λ_1, ..., λ_M]^T ∈ R^M.

Score function:
  G^X_λ = ∇_λ log u_λ(X)
Similarity measurement, the Fisher kernel [Jaakkola and Haussler, 1998]:
  K_FK(X, Y) = (G^X_λ)^T F_λ^{−1} G^Y_λ
Fisher information matrix:
  F_λ = E_{x∼u_λ}[G^X_λ (G^X_λ)^T],   F_λ^{−1} = L_λ^T L_λ
With the factor L_λ, the Fisher kernel can be re-written as a plain inner product:
  K_FK(X, Y) = (G^X_λ)^T G^Y_λ,   where now G^X_λ = L_λ ∇_λ log u_λ(X)
This normalized gradient G^X_λ is the Fisher vector of X.
For images, u_λ is a Gaussian mixture model (GMM) over local descriptors:
  G^X_λ = Σ_{t=1}^{T} L_λ ∇_λ log u_λ(x_t)
  u_λ(x) = Σ_{k=1}^{K} w_k u_k(x),   u_k(x) = (2π)^{−D/2} |Σ_k|^{−1/2} exp(−(1/2)(x − µ_k)^T Σ_k^{−1} (x − µ_k))
  λ = {w_k, µ_k, Σ_k : k = 1, ..., K}
The EM algorithm is used to estimate the parameters.
Gradients (diagonal covariances, σ_k^2 = diag(Σ_k)):
  ∇_{α_k} log u_λ(x_t) = γ_t(k) − w_k
  ∇_{µ_k} log u_λ(x_t) = γ_t(k) (x_t − µ_k) / σ_k^2
  ∇_{σ_k} log u_λ(x_t) = γ_t(k) [(x_t − µ_k)^2 / σ_k^3 − 1/σ_k]
Posterior probability (soft assignment of x_t to Gaussian k):
  γ_t(k) = w_k u_k(x_t) / Σ_{j=1}^{K} w_j u_j(x_t)
Here the mixture weights are re-parameterized with a softmax, w_k = exp(α_k) / Σ_{j=1}^{K} exp(α_j), so that they stay positive and sum to one without explicit constraints.
Assuming (almost) hard assignments, the Fisher information matrix is approximately diagonal, so L_λ reduces to a coordinate-wise normalization of the gradient vectors.

Normalized gradients:
  G^X_{α_k} = (1/√w_k) Σ_{t=1}^{T} (γ_t(k) − w_k)
  G^X_{µ_k} = (1/√w_k) Σ_{t=1}^{T} γ_t(k) (x_t − µ_k) / σ_k
  G^X_{σ_k} = (1/√(2 w_k)) Σ_{t=1}^{T} γ_t(k) [(x_t − µ_k)^2 / σ_k^2 − 1]
Concatenating these vectors over k = 1, ..., K gives a Fisher vector of dimension (2D + 1)K, where D is the descriptor dimension.

Finally, normalize by the sample size:
  G^X_λ ← (1/T) G^X_λ,   T = number of patches
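Putting the pieces together, a sketch of the (pre power/L2 normalization) Fisher vector for a diagonal-covariance GMM; the GMM parameters are assumed given, e.g. from EM:

```python
import numpy as np

def fisher_vector(X, w, mu, sigma):
    """Fisher vector sketch. X: (T, D) descriptors; w: (K,) mixture weights;
    mu: (K, D) means; sigma: (K, D) per-dimension std devs (diagonal covariances)."""
    T, D = X.shape
    K = len(w)
    # log N(x_t; mu_k, diag(sigma_k^2)) for every t, k
    log_p = np.stack([-0.5 * np.sum(((X - mu[k]) / sigma[k]) ** 2
                                    + np.log(2 * np.pi * sigma[k] ** 2), axis=1)
                      for k in range(K)], axis=1)
    # posterior gamma_t(k) = w_k u_k(x_t) / sum_j w_j u_j(x_t), computed stably
    log_w = np.log(w) + log_p
    gamma = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    gamma /= gamma.sum(axis=1, keepdims=True)
    parts = []
    for k in range(K):
        g = gamma[:, k:k + 1]
        parts.append(np.array([(gamma[:, k] - w[k]).sum()]) / np.sqrt(w[k]))
        parts.append((g * (X - mu[k]) / sigma[k]).sum(0) / np.sqrt(w[k]))
        parts.append((g * (((X - mu[k]) / sigma[k]) ** 2 - 1)).sum(0) / np.sqrt(2 * w[k]))
    return np.concatenate(parts) / T        # dimension (2D + 1) K
```

The softmax re-parameterization of the weights is implicit here: the α-gradient is evaluated directly as γ_t(k) − w_k.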
To make the representation work with a linear classifier, the improved Fisher vector (Sanchez et al 2013) additionally applies power normalization (a signed square root per coordinate) followed by L2 normalization.
(Image credit: Sanchez et al 2013)
Further reading: [Sanchez et al 2013] for the Fisher vector in depth; [Krapac et al 2011, Sanchez et al 2012] for modeling spatial layout with Fisher vectors; [Simonyan et al 2013] for deep Fisher networks.
(Image credit: Simonyan et al 2013)
More details: Deep Fisher Networks for Large-Scale Image Classification
http://www.robots.ox.ac.uk/~vgg/publications/2013/Simonyan13b/simonyan13b.pdf
Benchmark comparison of these encodings: results from Chatfield et al 2011.
(Image credit: Sanchez et al 2013)
COMPACT BINARY CODES

Motivation: fast retrieval in large image databases.

Problem definition: given a database of images {x_i} and a distance function D(i, j), seek a binary feature vector y_i = f(x_i) that preserves the nearest neighbor relationships using a Hamming distance.

What makes a good code: similar images map to nearby codes, and the code is short and cheap to compute for a novel input.

Approaches: locality-sensitive hashing, boosting, restricted Boltzmann machines, spectral hashing.
(Adapted from slides by Weiss et al)

Locality-sensitive hashing: each bit thresholds one projection of the Gist descriptor (bit values 0/1, e.g. code 101); no learning involved.

Boosting: learn the threshold and dimension for each bit (each bit is a weak classifier).
Restricted Boltzmann machines: a single RBM layer connects visible units and hidden units with symmetric weights W; units are binary and stochastic. Stacking layers gives a deep encoder:
  Input: Gist vector (512 dimensions), linear units at the first layer
  Layer 1: 512 → 512 (weights w1)
  Layer 2: 512 → 256 (weights w2)
  Layer 3: 256 → N (weights w3)
  Output: binary code (N-dimensional)
(Adapted from slides by Weiss et al)
Semantic hashing: use the binary code as an address. A semantic hash function maps the query image to a query address; semantically similar images in the database fall at nearby addresses, so retrieval is a lookup in the neighborhood of the query address. This is quite different from a conventional randomizing hash. Spectral hashing plays the same role: a non-linear dimensionality reduction from real-valued vectors to binary codes.
(Adapted from slides by Weiss et al)
Spectral hashing algorithm:
Input: data {x_i} of dimensionality d; desired number of bits k.
1. Compute the principal components of the data (PCA).
2. Along each principal direction, compute the analytical 1-D Laplacian eigenfunctions and their eigenvalues, assuming a uniform distribution on that direction.
3. Keep the k eigenfunctions with the smallest eigenvalues (e.g. k = 3).
4. Threshold the selected eigenfunctions at zero to produce the binary code.
(Adapted from slides by Weiss et al)

More details: Spectral Hashing, Yair Weiss, Antonio Torralba, Rob Fergus
http://people.csail.mit.edu/torralba/publications/spectralhashing.pdf
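A simplified sketch of the fit/encode steps (the uniform-distribution assumption is baked in; function names and the eigenvalue proxy are mine, not the authors' implementation):

```python
import numpy as np

def spectral_hash_fit(X, n_bits):
    """PCA the data, then pick the n_bits 1-D sinusoidal eigenfunctions
    with the smallest eigenvalues (uniform assumption per direction)."""
    mean = X.mean(0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Xc @ Vt.T                               # PCA projections
    lo, hi = P.min(0), P.max(0)
    # eigenvalue of sinusoidal mode m on direction i grows like (m / range_i)^2
    modes = sorted(((m / (hi[i] - lo[i])) ** 2, i, m)
                   for i in range(P.shape[1]) for m in range(1, n_bits + 1))
    return mean, Vt, lo, hi, modes[:n_bits]

def spectral_hash_encode(x, model):
    """Binary code: threshold each chosen eigenfunction sin(pi/2 + m*pi*t) at zero."""
    mean, Vt, lo, hi, chosen = model
    p = (x - mean) @ Vt.T
    t = (p - lo) / (hi - lo)                    # rescale projection to [0, 1]
    return np.array([int(np.sin(np.pi / 2 + m * np.pi * t[i]) > 0)
                     for _, i, m in chosen])
```

Directions with large spread get the low-frequency (small-eigenvalue) modes first, so the bits concentrate on the dimensions where the data actually varies.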
Image representation