IMAGE REPRESENTATION

SLIDE 1

IMAGE REPRESENTATION

Xinyi Fan

COS598c, Spring 2014

SLIDE 3

APPROACHES

  • Bag of Words
  • Spatial Pyramid Matching
  • Descriptor Encoding

SLIDE 5

Object

SLIDE 6

Object → feature words

SLIDE 7

Object → bag of feature words

Adapted from slides by Fei-Fei Li

SLIDE 8

BAG OF WORDS

  • Definition: independent, orderless features

SLIDE 10

BAG OF WORDS

  • Definition: independent, orderless features
  • Histogram representation

Adapted from slides by Fei-Fei Li

SLIDE 12

Adapted from slides by Fei-Fei Li

Pipeline: feature detection & representation → codewords dictionary formation → image representation → category models/classifiers → category decision

(The slides step through this pipeline, labeling its stages Representation, Learning, and Recognition.)

SLIDE 16

Feature Detection & Representation

Sampling strategies: dense regular grids, points of interest, random

SLIDE 19

Codewords Dictionary Formation

Vector quantization (typically k-means clustering of the local descriptors)
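As a concrete illustration, here is a minimal sketch of codebook formation, assuming local descriptors (e.g., 128-dimensional SIFT) have already been extracted; scikit-learn's KMeans stands in for the vector quantizer:

    import numpy as np
    from sklearn.cluster import KMeans

    def build_codebook(descriptors, num_codewords=200, seed=0):
        """Cluster local descriptors (N x d) into num_codewords centroids."""
        kmeans = KMeans(n_clusters=num_codewords, random_state=seed, n_init=10)
        kmeans.fit(descriptors)
        return kmeans.cluster_centers_  # codebook V: num_codewords x d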

SLIDE 22

Image Representation

(Figure: histogram of feature frequency over the codewords; the image becomes a vector of codeword counts.)

SLIDE 23

Using BoW Representation

Use BoW as a feature vector for a standard classifier (see the sketch after this list)

  • Naive Bayes
  • SVM

Cluster BoW vectors over image collection

  • Category classification (supervised)
  • Object discovery (unsupervised)

Use BoW to build hierarchical models

  • Decompose scene/object
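A hedged sketch of the first use pattern (BoW histogram + SVM), assuming a codebook has already been learned as above; the helper names are illustrative, not from the original slides:

    import numpy as np
    from sklearn.svm import LinearSVC

    def bow_histogram(descriptors, codebook):
        """Quantize each descriptor to its nearest codeword and count."""
        d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        words = d2.argmin(axis=1)
        hist = np.bincount(words, minlength=len(codebook)).astype(float)
        return hist / hist.sum()  # normalize away image size

    # hists: list of per-image histograms; labels: category ids
    # clf = LinearSVC().fit(np.stack(hists), labels)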

SLIDE 24

BoW Summary

Issues

  • Sampling strategy: dense uniform, interest points, random...
  • Codebook learning: supervised/unsupervised, size...
  • Similarity measurement: SVM, pyramid matching
  • Spatial information
  • Scalability

SLIDE 27

Pyramid Matching

  • Optimal partial matching between sets of features

X = \{x_1, \dots, x_m\},\ x_i \in \mathbb{R}^d
Y = \{y_1, \dots, y_n\},\ y_i \in \mathbb{R}^d

Adapted from slides by Grauman and Darrell

SLIDE 31

Feature Extraction

X = \{x_1, \dots, x_m\},\ x_i \in \mathbb{R}^d

Histogram pyramid: level \ell has bins of size 2^\ell

(Figure: H_0(X), H_1(X), H_2(X), H_3(X) for d = 1, L = 3.)

\Psi(X) = [H_0(X), \dots, H_L(X)]

SLIDE 32

Counting Matches

Adapted from slides by Grauman and Darrell

Histogram intersection:

I(H(X), H(Y)) = \sum_{j=1}^{r} \min\big(H(X)_j,\ H(Y)_j\big)

(Figure: example histograms H(X) and H(Y) with I(H(X), H(Y)) = 4.)
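In code, histogram intersection is a one-liner (a sketch; it assumes the two histograms share the same binning):

    import numpy as np

    def histogram_intersection(hx, hy):
        """I(H(X), H(Y)) = sum_j min(H(X)_j, H(Y)_j)."""
        return np.minimum(hx, hy).sum()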

SLIDE 33

Counting New Matches

Adapted from slides by Grauman and Darrell

Histogram intersection: I(H(X), H(Y)) = \sum_{j=1}^{r} \min\big(H(X)_j,\ H(Y)_j\big)

N_\ell = I(H_\ell(X), H_\ell(Y)) - I(H_{\ell-1}(X), H_{\ell-1}(Y))

(matches at the current level minus matches at the previous level)

The difference in histogram intersections across levels counts the number of newly matched pairs.

SLIDE 34

Pyramid Match Kernel

Adapted from slides by Grauman and Darrell

K_\Delta(\Psi(X), \Psi(Y)) = \sum_{\ell=0}^{L} \frac{1}{2^\ell} \big( I(H_\ell(X), H_\ell(Y)) - I(H_{\ell-1}(X), H_{\ell-1}(Y)) \big)

\Psi(X), \Psi(Y) are histogram pyramids; the intersection difference counts the newly matched pairs at level \ell, and the weight 1/2^\ell measures the difficulty of a match at level \ell.
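A sketch of the kernel over precomputed histogram pyramids, following the formula above with I(H_{-1}, H_{-1}) = 0:

    import numpy as np

    def pyramid_match_kernel(pyr_x, pyr_y):
        """pyr_x, pyr_y: lists of per-level histograms, level 0 = finest."""
        k, prev = 0.0, 0.0
        for level, (hx, hy) in enumerate(zip(pyr_x, pyr_y)):
            inter = np.minimum(hx, hy).sum()    # I(H_l(X), H_l(Y))
            k += (inter - prev) / (2 ** level)  # new matches weighted by 1/2^l
            prev = inter
        return k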

SLIDE 35

Approximation of Optimal Partial Matching

Adapted from slides by Grauman and Darrell

(Figure: matching output vs. trial number, sorted by optimal distance, for 100 sets of 2D points with cardinalities varying between 5 and 100; the pyramid match closely approximates the optimal partial matching [Indyk & Thaper].)

SLIDE 36

Building a Classifier

Adapted from slides by Grauman and Darrell

  • Train an SVM by computing kernel values between all labeled training examples
  • Classify novel examples by computing kernel values against the support vectors
  • Use one-versus-all for multi-class classification

SLIDE 37

BoW Summary

Issues

  • Sampling strategy: dense uniform, interest points, random...
  • Codebook learning: supervised/unsupervised, size...
  • Similarity measurement: SVM, pyramid matching
  • Spatial information
  • Scalability

SLIDE 38

BoW Summary

Spatial information: BoW discards the spatial layout of features.

  + Increases invariance to scale/translation/deformation
  − Sacrifices discriminative power

SLIDE 39

APPROACHES

  • Bag of Words
  • Spatial Pyramid Matching
  • Descriptor Encoding

SLIDE 40

Spatial Pyramid Matching

Image credit: Lazebnik et al 2006

SLIDE 45

Feature detection & representation → codewords dictionary formation → image representation → category models/classifiers → category decision

Spatial Pyramid Matching enters this pipeline at the image representation / matching stage.

SLIDE 46

Spatial Pyramid Matching

  • Quantize feature vectors into M discrete types
  • Perform pyramid matching in the 2D image space for each channel m = 1, ..., M
  • Assume features of the same type m can be matched to each other

SLIDE 47

Spatial Pyramid Matching

\sum_{m=1}^{M} \kappa^L(X_m, Y_m)

Image credit: Lazebnik et al 2006

SLIDE 49

Pyramid Match Kernel

Adapted from slides by Grauman and Darrell

\kappa^L(X, Y) = K_\Delta(\Psi(X), \Psi(Y)) = \sum_{\ell=0}^{L} \frac{1}{2^\ell} \big( I(H_\ell(X), H_\ell(Y)) - I(H_{\ell-1}(X), H_{\ell-1}(Y)) \big)

The final kernel is the sum of the separate channel kernels:

K^L(X, Y) = \sum_{m=1}^{M} \kappa^L(X_m, Y_m)
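A sketch of that final kernel, reusing pyramid_match_kernel from the Pyramid Match Kernel sketch earlier; each channel holds the spatial histogram pyramid for one feature type m:

    def spm_kernel(channels_x, channels_y):
        """Sum the 2D pyramid match kernel over the M feature channels."""
        return sum(pyramid_match_kernel(px, py)
                   for px, py in zip(channels_x, channels_y))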

SLIDE 50

APPROACHES

  • Bag of Words
  • Spatial Pyramid Matching
  • Descriptor Encoding
    • Linear SPM using Sparse Coding
    • Locality-constrained Linear Coding
    • Fisher Vector

SLIDE 51

Feature detection & representation → codewords dictionary formation → image representation → category models/classifiers → category decision

(Spatial Pyramid Matching)

Other ways of encoding the local feature descriptors can give better performance.

SLIDE 53

APPROACHES

  • Bag of Words
  • Spatial Pyramid Matching
  • Descriptor Encoding
    • Linear SPM using Sparse Coding
    • Locality-constrained Linear Coding
    • Fisher Vector

SLIDE 54

Linear SPM using Sparse Coding

Image credit: Yang et al 2009

SLIDE 56

Encoding SIFT: From VQ to SC

\min_{V} \sum_{m=1}^{M} \min_{k=1,\dots,K} \|x_m - v_k\|^2

codebook: V = [v_1, \dots, v_K]^\top

SLIDE 57

Encoding SIFT: From VQ to SC

VQ:

\min_{V} \sum_{m=1}^{M} \min_{k=1,\dots,K} \|x_m - v_k\|^2

Equivalently, in matrix form:

\min_{V,U} \sum_{m=1}^{M} \|x_m - u_m V\|^2 \quad \text{s.t.}\ \mathrm{Card}(u_m) = 1,\ |u_m| = 1,\ u_m \succeq 0,\ \forall m,\qquad U = [u_1, \dots, u_M]^\top

SLIDE 58

Encoding SIFT: From VQ to SC

VQ:

\min_{V,U} \sum_{m=1}^{M} \|x_m - u_m V\|^2 \quad \text{s.t.}\ \mathrm{Card}(u_m) = 1,\ |u_m| = 1,\ u_m \succeq 0,\ \forall m

SC relaxes the cardinality constraint into an L1 sparsity penalty:

\min_{V,U} \sum_{m=1}^{M} \|x_m - u_m V\|^2 + \lambda|u_m| \quad \text{s.t.}\ \|v_k\| \le 1,\ \forall k

SLIDE 59

Why L1 encourages sparsity

\min_u \|x - uV\|_2^2 + \lambda\|u\|_1 \qquad \text{vs.} \qquad \min_u \|x - uV\|_2^2 + \lambda\|u\|_2^2

The L1 ball has corners on the coordinate axes, so the penalized optimum tends to have many coordinates exactly zero; the L2 penalty merely shrinks all coordinates toward zero.

SLIDE 61

Implementation: feature-sign search algorithm [Lee et al 2006] http://ai.stanford.edu/~hllee/softwares/nips06-sparsecoding.htm
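The linked feature-sign solver is MATLAB; as a rough stand-in for the same L1 objective, here is a hedged sketch using scikit-learn's dictionary learning (coordinate-descent lasso rather than feature-sign search):

    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    X = np.random.randn(1000, 128)           # placeholder SIFT descriptors (M x d)

    # minimizes ||x_m - u_m V||^2 + alpha * |u_m|_1  with  ||v_k|| <= 1
    dl = DictionaryLearning(n_components=256, alpha=1.0,
                            transform_algorithm='lasso_cd', max_iter=20)
    U = dl.fit_transform(X)                   # sparse codes, M x 256
    V = dl.components_                        # learned codebook, 256 x 128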

SLIDE 62

Algorithm Architecture

Image credit: Yang et al 2009

SLIDE 63

Linear SPM

U = [u_1, \dots, u_M]^\top,\quad z = F(U)

Define F as max pooling: z_j = \max\{|u_{1j}|, |u_{2j}|, \dots, |u_{Mj}|\}

With max pooling the SPM kernel becomes linear:

\kappa(z_i, z_j) = z_i^\top z_j = \sum_{l=0}^{2} \sum_{s=1}^{2^l} \sum_{t=1}^{2^l} \big\langle z_i^l(s, t),\ z_j^l(s, t) \big\rangle

Image credit: Yang et al 2009
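A sketch of max pooling over a three-level spatial pyramid (1x1, 2x2, 4x4 grids, matching l = 0..2 above); xy is assumed to hold descriptor positions normalized to [0, 1):

    import numpy as np

    def linear_spm_feature(codes, xy, grids=(1, 2, 4)):
        """codes: M x K sparse codes; returns the pooled, concatenated z."""
        parts = []
        for g in grids:
            cell = np.floor(xy * g).astype(int).clip(0, g - 1)
            idx = cell[:, 0] * g + cell[:, 1]     # flat cell index per descriptor
            for c in range(g * g):
                in_cell = codes[idx == c]
                parts.append(np.abs(in_cell).max(axis=0) if len(in_cell)
                             else np.zeros(codes.shape[1]))
        return np.concatenate(parts)              # feed to a linear SVM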

SLIDE 64

APPROACHES

  • Bag of Words
  • Spatial Pyramid Matching
  • Descriptor Encoding
    • Linear SPM using Sparse Coding
    • Locality-constrained Linear Coding
    • Fisher Vector

SLIDE 65

Locality-constrained Linear Coding

Image credit: Wang et al 2010

SLIDE 68

Locality-constrained Linear Coding

SC:

\min_{V,U} \sum_{m=1}^{M} \|x_m - u_m V\|^2 + \lambda|u_m| \quad \text{s.t.}\ \|v_k\| \le 1,\ \forall k

LLC replaces the L1 penalty with a weighted L2 locality penalty:

\min_{V,U} \sum_{m=1}^{M} \|x_m - u_m V\|^2 + \lambda\|d_m \odot u_m\|^2 \quad \text{s.t.}\ \mathbf{1}^\top u_m = 1,\ \forall m

locality adaptor: d_m = \exp\left(\frac{\mathrm{dist}(x_m, V)}{\sigma}\right), where \mathrm{dist}(x_m, V) = [\mathrm{dist}(x_m, v_1), \dots, \mathrm{dist}(x_m, v_K)]^\top, \mathrm{dist}(x_m, v_k) is the Euclidean distance between x_m and v_k, and \sigma adjusts the decay speed.

SLIDE 70

Properties of LLC

  • Better reconstruction
  • Local smooth sparsity
  • Analytical solution

(Figure: the same codebook V = {v_k} under VQ, SC, and LLC coding.)

\tilde{u}_m = \big[ (V - \mathbf{1}x_m^\top)(V - \mathbf{1}x_m^\top)^\top + \lambda\,\mathrm{diag}(d) \big]^{-1} \mathbf{1}

u_m = \tilde{u}_m / (\mathbf{1}^\top \tilde{u}_m)

Image credit: Wang et al 2010
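A direct transcription of the analytical solution for one descriptor (a sketch following the slide's formulas; the matrix inverse is written as a linear solve):

    import numpy as np

    def llc_code(x, V, lam=1e-4, sigma=1.0):
        """x: descriptor (d,); V: codebook (K x d). Returns u_m (K,)."""
        dist = np.linalg.norm(V - x, axis=1)          # Euclidean distances
        d = np.exp(dist / sigma)                      # locality adaptor
        C = (V - x) @ (V - x).T                       # (V - 1 x^T)(V - 1 x^T)^T
        u = np.linalg.solve(C + lam * np.diag(d), np.ones(len(V)))
        return u / u.sum()                            # enforce 1^T u = 1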

SLIDE 73

Approximated LLC for Fast Encoding

LLC:

\min_{V,U} \sum_{m=1}^{M} \|x_m - u_m V\|^2 + \lambda\|d_m \odot u_m\|^2 \quad \text{s.t.}\ \mathbf{1}^\top u_m = 1,\ \forall m

Select local bases for each descriptor to form a local coordinate system: the K nearest neighbors of x_m form the local basis V_m.

\min_{\tilde{U}} \sum_{m=1}^{M} \|x_m - \tilde{u}_m V_m\|^2 \quad \text{s.t.}\ \mathbf{1}^\top \tilde{u}_m = 1,\ \forall m
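A sketch of the approximated encoder: keep only the few nearest codewords, solve the small constrained least-squares problem there, and scatter the result back into a sparse code (names and the tiny regularizer are illustrative):

    import numpy as np

    def approx_llc_code(x, V, knn=5, eps=1e-8):
        dist = np.linalg.norm(V - x, axis=1)
        nn = np.argsort(dist)[:knn]                  # local basis V_m
        Vm = V[nn]
        C = (Vm - x) @ (Vm - x).T
        C += eps * np.trace(C) * np.eye(knn)         # regularize for stability
        w = np.linalg.solve(C, np.ones(knn))
        w /= w.sum()                                 # enforce 1^T u = 1
        code = np.zeros(len(V))
        code[nn] = w                                 # sparse code over full codebook
        return code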

SLIDE 74

APPROACHES

  • Bag of Words
  • Spatial Pyramid Matching
  • Descriptor Encoding
    • Linear SPM using Sparse Coding
    • Locality-constrained Linear Coding
    • Fisher Vector

SLIDE 75

Fisher Kernel

X = \{x_1, \dots, x_T\}: a sample of T observations x_t \in \mathcal{X}

u_\lambda: the pdf with parameters \lambda = [\lambda_1, \dots, \lambda_M]^\top \in \mathbb{R}^M

score function: G_\lambda^X = \nabla_\lambda \log u_\lambda(X)

similarity measurement [Jaakkola and Haussler, 1998]:

K_{FK}(X, Y) = G_\lambda^{X\top} F_\lambda^{-1} G_\lambda^Y

SLIDE 77

Fisher Kernel

similarity measurement: K_{FK}(X, Y) = G_\lambda^{X\top} F_\lambda^{-1} G_\lambda^Y

Fisher Information Matrix: F_\lambda = E_{x \sim u_\lambda}\big[ G_\lambda^X G_\lambda^{X\top} \big], with F_\lambda^{-1} = L_\lambda^\top L_\lambda

Fisher Kernel rewritten as: K_{FK}(X, Y) = \mathcal{G}_\lambda^{X\top} \mathcal{G}_\lambda^Y

where \mathcal{G}_\lambda^X = L_\lambda \nabla_\lambda \log u_\lambda(X) is the Fisher Vector.

SLIDE 79

Fisher Vector on Images

  • Fisher Vector: \mathcal{G}_\lambda^X = \sum_{t=1}^{T} L_\lambda \nabla_\lambda \log u_\lambda(x_t)

  • GMM: u_\lambda(x) = \sum_{k=1}^{K} w_k u_k(x), where u_k(x) = \frac{1}{(2\pi)^{D/2} |\Sigma_k|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) \right\}

\lambda = \{w_k, \mu_k, \Sigma_k,\ k = 1, \dots, K\}; the EM algorithm is used to estimate the parameters.

SLIDE 80

Soft Assignment

Fisher Vector: \mathcal{G}_\lambda^X = \sum_{t=1}^{T} L_\lambda \nabla_\lambda \log u_\lambda(x_t)

Gradients:

\nabla_{\alpha_k} \log u_\lambda(x_t) = \gamma_t(k) - w_k

\nabla_{\mu_k} \log u_\lambda(x_t) = \gamma_t(k) \left( \frac{x_t - \mu_k}{\sigma_k^2} \right)

\nabla_{\sigma_k} \log u_\lambda(x_t) = \gamma_t(k) \left( \frac{(x_t - \mu_k)^2}{\sigma_k^3} - \frac{1}{\sigma_k} \right)

Posterior probability (soft assignment): \gamma_t(k) = \frac{w_k u_k(x_t)}{\sum_{j=1}^{K} w_j u_j(x_t)}

Mixture weights parameterized by a softmax: w_k = \frac{\exp(\alpha_k)}{\sum_{j=1}^{K} \exp(\alpha_j)}

The soft assignments \gamma_t(k) generalize the hard codeword assignments of BoW.

SLIDE 85

Soft Assignment

Fisher Vector: \mathcal{G}_\lambda^X = \sum_{t=1}^{T} L_\lambda \nabla_\lambda \log u_\lambda(x_t)

Fisher Information Matrix: F_\lambda = E_{x \sim u_\lambda}\big[ G_\lambda^X G_\lambda^{X\top} \big], with F_\lambda^{-1} = L_\lambda^\top L_\lambda

Assuming an almost-hard assignment, the FIM is diagonal, so L_\lambda reduces to a coordinate-wise normalization of the gradient vectors.

SLIDE 86

Fisher Vector

Assuming an almost-hard assignment, the FIM is diagonal and yields a coordinate-wise normalization of the gradient vectors.

Normalized gradients:

\mathcal{G}_{\alpha_k}^X = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \big( \gamma_t(k) - w_k \big)

\mathcal{G}_{\mu_k}^X = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k) \left( \frac{x_t - \mu_k}{\sigma_k} \right)

\mathcal{G}_{\sigma_k}^X = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k) \frac{1}{\sqrt{2}} \left( \frac{(x_t - \mu_k)^2}{\sigma_k^2} - 1 \right)

  • Concatenate the gradient vectors: dimension = (2D + 1)K

Normalize by the sample size: \mathcal{G}_\lambda^X \leftarrow \frac{1}{T} \mathcal{G}_\lambda^X, where T is the number of patches.
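Putting the pieces together, a compact sketch of the normalized FV for one image, assuming a diagonal-covariance GMM (w, mu, sigma) has already been fit with EM:

    import numpy as np

    def fisher_vector(X, w, mu, sigma):
        """X: T x D descriptors; w: (K,); mu, sigma: K x D. Returns (2D+1)K vector."""
        log_p = (-0.5 * ((((X[:, None, :] - mu) / sigma) ** 2)
                         + np.log(2 * np.pi * sigma ** 2)).sum(axis=2)
                 + np.log(w))                              # T x K, in log space
        log_p -= log_p.max(axis=1, keepdims=True)
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)          # posteriors gamma_t(k)
        sw = np.sqrt(w)
        g_alpha = (gamma - w).sum(axis=0) / sw
        diff = (X[:, None, :] - mu) / sigma                # T x K x D
        g_mu = (gamma[:, :, None] * diff).sum(axis=0) / sw[:, None]
        g_sigma = ((gamma[:, :, None] * (diff ** 2 - 1)).sum(axis=0)
                   / (np.sqrt(2) * sw[:, None]))
        return np.concatenate([g_alpha, g_mu.ravel(), g_sigma.ravel()]) / len(X)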

SLIDE 88

(Figure: further normalization of the Fisher Vector to make it work with a linear classifier.)

Image credit: Sanchez et al 2013

SLIDE 89

Extensions of FV

  • Spatial Pyramid [Sanchez et al 2013]
  • Deep Fisher Networks [Simonyan et al 2013]
  • Other methods that account for scene geometry in the FV framework [Krapac et al 2011; Sanchez et al 2012]

SLIDE 92

Deep Fisher Networks

Image credit: Simonyan et al 2013

SLIDE 94

Single Fisher Layer

Image credit: Simonyan et al 2013

more details: Deep Fisher Networks for Large-Scale Image Classification http://www.robots.ox.ac.uk/~vgg/publications/2013/Simonyan13b/simonyan13b.pdf

SLIDE 95

Evaluations on PASCAL VOC 2007

Results from: Chatfield et al 2011

SLIDE 97

Evaluations on SUN 397

Image credit: Sanchez et al 2013

SLIDE 98

BoW Summary

Issues

  • Sampling strategy: dense uniform, interest points, random...
  • Codebook learning: supervised/unsupervised, size...
  • Similarity measurement: SVM, pyramid matching
  • Spatial information
  • Scalability

SLIDE 99

From Vectors to Codes

Given a global image representation, we want to learn compact binary codes for image retrieval on large datasets.

SLIDE 100

From Vectors to Codes

Motivation

  • Tractable memory usage
  • Constant lookup time
  • Similarity preserved by Hamming distance

SLIDE 101

From Vectors to Codes

Problem definition: given a database of images \{x_i\} and a distance function D(i, j), seek a binary feature vector y_i = f(x_i) that preserves the nearest-neighbor relationships under the Hamming distance.
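Under that definition, retrieval reduces to nearest-neighbor search in Hamming space; a brute-force sketch:

    import numpy as np

    def hamming_nn(query_code, db_codes):
        """query_code: (n_bits,) bool; db_codes: (N, n_bits) bool."""
        dists = (db_codes != query_code).sum(axis=1)   # Hamming distances
        return int(dists.argmin())                     # index of nearest code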

SLIDE 102

From Vectors to Codes

What makes a good code?

  • Easily computed for a novel input
  • Requires a small number of bits to code the full dataset
  • Maps similar items to similar binary codewords

SLIDE 103

From Vectors to Codes

Approaches

  • Locality Sensitive Hashing (LSH) [Andoni and Indyk]
  • Boosting [Torralba et al 2008]
  • Restricted Boltzmann Machines (RBM) [Torralba et al 2008]
  • Spectral Hashing [Weiss et al 2008]
  • Multidimensional Spectral Hashing [Weiss et al 2012]

SLIDE 104

Locality Sensitive Hashing

0" 1" 0" 1" 0" 1"

101"

No"learning"involved" Gist"descriptor"

  • Take random projections of data
  • Quantize each projection with few bits

Adapted from slides by Weiss et al
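A sketch of this scheme with one bit per projection (the sign of a random projection; the mean-centering is a small practical touch, not from the slide):

    import numpy as np

    def lsh_codes(X, n_bits=32, seed=0):
        """X: N x d descriptors (e.g., Gist). Returns N x n_bits boolean codes."""
        rng = np.random.default_rng(seed)
        R = rng.standard_normal((X.shape[1], n_bits))  # random directions
        return (X - X.mean(axis=0)) @ R > 0            # no learning involved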

SLIDE 105

Boosting

  • Positive examples are pairs of similar images
  • Negative examples are pairs of unrelated images

Learn a threshold & dimension for each bit (weak classifier).

Adapted from slides by Weiss et al

SLIDE 106

Restricted Boltzmann Machines

(Figure: a single RBM layer: visible units connected to hidden units through symmetric weights W; units are binary & stochastic.)

Adapted from slides by Weiss et al

SLIDE 107

Restricted Boltzmann Machines

(Figure: a stack of RBMs. Input: Gist vector, 512 dimensions. Layer 1: 512 → 512, weights w1, linear units at the first layer. Layer 2: 512 → 256, weights w2. Layer 3: 256 → N, weights w3. Output: N-dimensional binary code.)

Adapted from slides by Weiss et al

SLIDE 108

Address&Space&

Seman-cally&& similar&& images&

Query&address&

Seman-c&& Hash& Func-on&

Query&& Image&

Binary&& code&

Images&in&database&

Quite&different& to&a&(conven-onal)& randomizing&hash&

Retrieval Algorithm: Semantic Hashing

Adapted from slides by Weiss et al

SLIDE 109

Spectral Hashing

Address&Space&

Seman-cally&& similar&& images&

Query&address& Non6linear& dimensionality& reduc-on&

Query&& Image&

Binary&& code&

Images&in&database&

Quite&different& to&a&(conven-onal)& randomizing&hash&

Spectral& Hash&

Real6valued& vectors&

Adapted from slides by Weiss et al

SLIDE 110

The Algorithm

Input: data {x_i} of dimensionality d; desired number of bits k

Adapted from slides by Weiss et al

SLIDE 111

The Algorithm

  • Fit a multi-dimensional rectangle
    • Run PCA to align the axes
    • Bound a uniform distribution

Adapted from slides by Weiss et al

SLIDE 114

The Algorithm

  • Calculate eigenfunctions (for a uniform distribution on an interval, these are sinusoids of increasing frequency)

Adapted from slides by Weiss et al

SLIDE 119

The Algorithm

  • Pick the k eigenfunctions with the smallest eigenvalues

(Figure: sorted eigenvalues; e.g. k = 3.)

Adapted from slides by Weiss et al

SLIDE 121

The Algorithm

  • Threshold the chosen eigenfunctions at zero to obtain the binary code

more details: Spectral Hashing, Yair Weiss, Antonio Torralba, Rob Fergus http://people.csail.mit.edu/torralba/publications/spectralhashing.pdf
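A compact sketch of those steps (PCA alignment, bounding a uniform distribution, analytic sinusoid eigenfunctions, smallest-eigenvalue selection, thresholding at zero); the reference implementation linked above handles details this omits:

    import numpy as np

    def spectral_hash_codes(X, n_bits=8):
        Xc = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)    # PCA to align axes
        proj = Xc @ Vt.T
        lo, hi = proj.min(axis=0), proj.max(axis=0)          # bound uniform dist.
        # candidate eigenfunctions (axis i, frequency f); eigenvalue ~ (f/(hi-lo))^2
        cands = sorted(((f * np.pi / (hi[i] - lo[i])) ** 2, i, f)
                       for i in range(proj.shape[1])
                       for f in range(1, n_bits + 1))
        codes = np.empty((len(X), n_bits), dtype=bool)
        for b, (_, i, f) in enumerate(cands[:n_bits]):
            phase = np.pi / 2 + f * np.pi * (proj[:, i] - lo[i]) / (hi[i] - lo[i])
            codes[:, b] = np.sin(phase) > 0                  # threshold at zero
        return codes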

SLIDE 122

Summary

Image representation

  • Bag of Words
  • Spatial Pyramid Matching
  • Descriptor Encoding
    • Sparse Coding
    • Locality-constrained Linear Coding
    • Fisher Vector
  • Binary Code

SLIDE 123

Thank you
