SLIDE 1

Sparse similarity-preserving hashing

Jonathan Masci, Alex M. Bronstein, Michael M. Bronstein, Pablo Sprechmann, Guillermo Sapiro

The Swiss AI Lab IDSIA, University of Lugano, Switzerland; Tel Aviv University; Duke University

ICLR 2014

1 / 30

SLIDE 2

Visual world in numbers

[Infographic: statistics on the amount of visual data in the world]

Source: Mashed, Tech Radar, YouTube, 2012; Google 2010; Instagram 2013.

2 / 30

SLIDE 3

Similarity-sensitive hashing

[Figure: similar ($s_X = 1$) and dissimilar ($s_X = 0$) pairs in the feature space $X \subseteq \mathbb{R}^n$]

3 / 30

SLIDE 4

Similarity-sensitive hashing

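[Figure: the hash function h maps the feature space $X \subseteq \mathbb{R}^n$, containing similar ($s_X = 1$) and dissimilar ($s_X = 0$) pairs, to the Hamming space $\mathbb{H}^m = \{0,1\}^m$; example codes 1101, 1100, 1001, 1011, 0000, 1111 and a Hamming ball of radius $r = 1$]

h can be any parametric function, e.g., $h(x) = \mathrm{sign}(Ax + b)$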

4 / 30

SLIDE 5

Similarity-sensitive hashing

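[Figure: the hash function h maps the feature space $X \subseteq \mathbb{R}^n$, containing similar ($s_X = 1$) and dissimilar ($s_X = 0$) pairs, to the Hamming space $\mathbb{H}^m = \{0,1\}^m$; example codes 1101, 1100, 1001, 1011, 0000, 1111 and a Hamming ball of radius $r = 1$]

h can be any parametric function, e.g., $h(x) = \mathrm{sign}(Ax + b)$

Hamming ball of radius r: all items with $d_H \le r$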

5 / 30

SLIDE 6

Similarity-sensitive hashing

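[Figure: the hash function h maps the feature space $X \subseteq \mathbb{R}^n$, containing similar ($s_X = 1$) and dissimilar ($s_X = 0$) pairs, to the Hamming space $\mathbb{H}^m = \{0,1\}^m$; example codes 1101, 1100, 1001, 1011, 0000, 1111 and a Hamming ball of radius $r = 1$]

h can be any parametric function, e.g., $h(x) = \mathrm{sign}(Ax + b)$

Hamming ball of radius r: all items with $d_H \le r$

$r = 0$: most efficient search, via a lookup table (LUT)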
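To make the hashing pipeline on these slides concrete, here is a minimal NumPy sketch. It assumes a linear hash $h(x) = \mathrm{sign}(Ax + b)$ with random, untrained A and b, and stores the ±1 outputs as {0, 1} bits. The function names (`linear_hash`, `hamming`, `build_lut`, `search_r0`, `hamming_ball`) are illustrative, not taken from the talk. Radius-0 search is a single dictionary lookup; the brute-force `hamming_ball` scan is included only as a reference for larger radii.

```python
import numpy as np

def linear_hash(X, A, b):
    """Hash rows of X (N x n) to m-bit codes with h(x) = sign(Ax + b).

    sign() yields +/-1; bit 1 is stored where Ax + b > 0 and bit 0 otherwise,
    so the codes live in the Hamming space {0, 1}^m.
    """
    return (X @ A.T + b > 0).astype(np.uint8)

def hamming(c1, c2):
    """Hamming distance d_H between two binary codes."""
    return int(np.count_nonzero(c1 != c2))

def build_lut(codes):
    """Radius-0 index: a lookup table from each code to the items that have it."""
    lut = {}
    for i, c in enumerate(codes):
        lut.setdefault(c.tobytes(), []).append(i)
    return lut

def search_r0(lut, query_code):
    """Items colliding exactly with the query (d_H = 0): one dictionary lookup."""
    return lut.get(query_code.tobytes(), [])

def hamming_ball(codes, query_code, r):
    """Brute-force reference: indices of all items with d_H <= r."""
    d = np.count_nonzero(codes != query_code, axis=1)
    return np.flatnonzero(d <= r)

# Usage with random, untrained parameters (n = 16 features, m = 8 bits)
rng = np.random.default_rng(0)
A, b = rng.standard_normal((8, 16)), rng.standard_normal(8)
X = rng.standard_normal((100, 16))
codes = linear_hash(X, A, b)
lut = build_lut(codes)
print(search_r0(lut, codes[0]), hamming_ball(codes, codes[0], 1))
```

The LUT only answers exact collisions; anything beyond $r = 0$ needs either probing neighboring codes or a scan like `hamming_ball`.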

6 / 30

SLIDE 7

Motivation

Typical hash behavior for different code lengths m and Hamming radii r

[Plot, two panels, code length m from 50 to 300: (left) speed in seconds per query for Hamming radius 1 to 4 and for brute-force search; (right) mean number of neighbors per query for Hamming radius 1 to 4]

Search complexity for Hamming radius 0 (collisions): O(m)

Plot: Grauman, Fergus 2013

7 / 30

SLIDE 8

Motivation

Typical hash behavior for different code lengths m and Hamming radii r

[Plot, two panels, code length m from 50 to 300: (left) speed in seconds per query for Hamming radius 1 to 4 and for brute-force search; (right) mean number of neighbors per query for Hamming radius 1 to 4]

Search complexity for Hamming radius 0 (collisions): O(m)

Search complexity for Hamming radius r: $O\left(\binom{m}{r}\right)$

Plot: Grauman, Fergus 2013
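The $\binom{m}{r}$ term comes from the number of codes a radius-r lookup has to probe: every code within Hamming distance r of the query, $\sum_{k=0}^{r}\binom{m}{k}$ in total. The small Python sketch below counts and enumerates those probes; the helper names `ball_size` and `enumerate_ball` are illustrative.

```python
from itertools import combinations
from math import comb

def ball_size(m, r):
    """Number of m-bit codes within Hamming distance r of a fixed code,
    i.e. how many LUT probes a radius-r lookup needs."""
    return sum(comb(m, k) for k in range(r + 1))

def enumerate_ball(code, m, r):
    """Yield every m-bit code (as an int) within Hamming distance r of `code`."""
    for k in range(r + 1):
        for positions in combinations(range(m), k):
            probe = code
            for p in positions:
                probe ^= 1 << p  # flip bit p
            yield probe

for r in range(4):
    print(r, ball_size(48, r))
```

For m = 48 this gives 1, 49, 1,177 and 18,473 probes for r = 0, 1, 2, 3, which illustrates why LUT-based search is attractive only at small radii.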

8 / 30

SLIDE 9

SparseHash

$$L = s(x,x')\,\|h(x) - h(x')\|_1 + \lambda\,(1 - s(x,x'))\,\max\{0,\; M - \|h(x) - h(x')\|_1\} + \alpha\,(\|h(x)\|_1 + \|h(x')\|_1)$$

Loss function promoting sparsity of the codes
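A direct NumPy transcription of this loss for a single pair, as a sketch: `hx` and `hxp` stand for $h(x)$ and $h(x')$, `s` is the binary similarity $s(x,x')$, and the default values of `lam` ($\lambda$), `M` and `alpha` ($\alpha$) are placeholders, not the values used by the authors. The first term pulls similar pairs together, the hinge term pushes dissimilar pairs at least M apart, and the $\alpha$ term is an $\ell_1$ penalty that encourages sparse codes.

```python
import numpy as np

def sparsehash_loss(hx, hxp, s, lam=1.0, M=1.0, alpha=0.1):
    """Pair loss from the slide (placeholder hyperparameters).

    hx, hxp: code vectors h(x) and h(x'); s: 1 if the pair is similar, else 0.
    """
    d = np.abs(hx - hxp).sum()                     # ||h(x) - h(x')||_1
    attract = s * d                                # pulls similar pairs together
    repel = lam * (1 - s) * max(0.0, M - d)        # pushes dissimilar pairs >= M apart
    sparsity = alpha * (np.abs(hx).sum() + np.abs(hxp).sum())  # l1 penalty on both codes
    return float(attract + repel + sparsity)

a, b = np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0, 0.0])
print(sparsehash_loss(a, b, s=1), sparsehash_loss(a, b, s=0, M=3.0))
```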

9 / 30

SLIDE 10

SparseHash

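$$L = s(x,x')\,\|h(x) - h(x')\|_1 + \lambda\,(1 - s(x,x'))\,\max\{0,\; M - \|h(x) - h(x')\|_1\} + \alpha\,(\|h(x)\|_1 + \|h(x')\|_1)$$

Loss function promoting sparsity of the codes

Siamese architecture

[Figure: two copies of h with shared parameters map a pair from the feature space $X \subseteq \mathbb{R}^n$, labeled similar ($s = 1$) or dissimilar ($s = 0$), into the loss L]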

10 / 30

SLIDE 11

SparseHash

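$$L = s(x,x')\,\|h(x) - h(x')\|_1 + \lambda\,(1 - s(x,x'))\,\max\{0,\; M - \|h(x) - h(x')\|_1\} + \alpha\,(\|h(x)\|_1 + \|h(x')\|_1)$$

Loss function promoting sparsity of the codes

Siamese architecture

[Diagram: a siamese pair of networks h on the feature space $(\mathbb{R}^n, L_2)$, with pairs labeled $s = 1$ / $s = 0$ feeding the loss L. Each h is an ISTA network followed by a tanh binarizer that outputs the code $\xi(x)$. The input enters as $b_{\mathrm{in}} = Wx$, and each ISTA layer computes $z_{\mathrm{out}} = \sigma_\tau(b_{\mathrm{in}})$ and $b_{\mathrm{out}} = b_{\mathrm{in}} + S\,(z_{\mathrm{out}} - z_{\mathrm{in}})$, where $\sigma_\tau$ is a shrinkage with threshold $\tau$]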
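A forward-pass sketch of the ISTA network in the diagram, in NumPy, under stated assumptions: $\sigma_\tau$ is taken to be the standard soft-threshold, $z_{\mathrm{in}}$ of the first layer is zero, and W, S, $\tau$ and the number of layers are random or hand-picked here rather than learned. The names `soft_threshold` and `ista_net` are hypothetical.

```python
import numpy as np

def soft_threshold(b, tau):
    """sigma_tau, assumed here to be the soft-threshold sign(b) * max(|b| - tau, 0)."""
    return np.sign(b) * np.maximum(np.abs(b) - tau, 0.0)

def ista_net(x, W, S, tau, n_layers=3):
    """Unrolled ISTA layers followed by a tanh binarizer.

    W, S, tau and n_layers are illustrative values, not trained parameters.
    """
    b = W @ x               # b_in of the first layer
    z = np.zeros_like(b)    # z_in of the first layer (assumed zero)
    for _ in range(n_layers):
        z_out = soft_threshold(b, tau)   # z_out = sigma_tau(b_in)
        b = b + S @ (z_out - z)          # b_out = b_in + S (z_out - z_in)
        z = z_out
    return np.tanh(z)       # xi(x): smooth code in (-1, 1), exactly 0 where z is 0

rng = np.random.default_rng(1)
x = rng.standard_normal(16)
W = rng.standard_normal((8, 16))
S = 0.1 * rng.standard_normal((8, 8))
xi = ista_net(x, W, S, tau=2.0)
print(np.round(xi, 3), "nonzeros:", np.count_nonzero(xi))
```

Because $b_{\mathrm{out}} = Wx + S\,z_{\mathrm{out}}$ holds after every layer, each layer performs one ISTA iteration $z \leftarrow \sigma_\tau(Wx + Sz)$; the incremental form avoids recomputing $Wx$.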

Hashing: Masci, Migliore, Bronstein, Schmidhuber 2011; Siamese: Bromley et al. 1993; Hadsell et al. 2006; ISTA net: Gregor et al. 2010, Sprechmann et al. 2012

11 / 30

SLIDE 12

Effect of sparsity on the codes

Table: Total number of unique codes for the entire CIFAR10 dataset and average number of results retrieved by Hamming-radius search for r = 0, 1, 2. Hashes of length 48.

Method   Unique codes   Avg. # of r-neighbors
                        r = 0     r = 1      r = 2
KSH      57368          3.95      12.38      27.21
AGH2     55863          1.42      2.33       4.62
SSH      59733          1.01      1.12       1.88
DH       59999          1.00      1.00       1.00
NN       54259          4.83      20.12      56.70
Sparse   9828           798.47    2034.73    3249.86

Methods: Shakhnarovich 2005; Liu et al. 2011; Liu et al. 2012; Masci, Bronstein², Schmidhuber 2012. Data: Torralba et al. 2008; Krizhevsky 2009
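The two quantities in the table can be computed as in this NumPy sketch. Assumptions: codes are given as an (N, m) array of 0/1 bits, and every item serves as a query against the whole set, itself included (this reading is consistent with the DH row, where almost all codes are unique and the r = 0 average is 1.00, but it is an inference). The function name `code_statistics` is hypothetical, and the all-pairs distance matrix is only practical for small N.

```python
import numpy as np

def code_statistics(codes, radii=(0, 1, 2)):
    """codes: (N, m) array of 0/1 bits.

    Returns the number of unique codes and, for each radius r, the average number
    of items within Hamming distance r of each item's code (the item itself included).
    """
    n_unique = len(np.unique(codes, axis=0))
    # All-pairs Hamming distances, N x N: fine for a sketch, too large for N = 60000
    d = (codes[:, None, :] != codes[None, :, :]).sum(axis=2)
    avg = {r: float((d <= r).sum(axis=1).mean()) for r in radii}
    return n_unique, avg

codes = np.array([[0, 0, 1], [0, 0, 1], [0, 1, 1], [1, 1, 0]], dtype=np.uint8)
print(code_statistics(codes))
```

Fewer unique codes (as for the sparse hash) means larger buckets, hence many more r-neighbors per query at the same radius.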

12 / 30

SLIDE 13

Precision-Recall on CIFAR10

[Plot: precision vs. recall on CIFAR10 for SparseHash, KSH, DiffHash, AGH2, SSH and NNhash, logarithmic axes]

r = m = 48 (full code length)
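A sketch of how one precision/recall point at a given Hamming radius can be computed. It assumes that class labels define relevance (same label means relevant), that counts are pooled over all queries, and that precision is 0 when nothing is retrieved; the exact evaluation protocol is not stated on these slides. The name `pr_at_radius` is hypothetical.

```python
import numpy as np

def pr_at_radius(q_codes, q_labels, db_codes, db_labels, r):
    """Precision and recall of Hamming-radius-r retrieval, pooled over all queries.

    An item is retrieved if d_H(query, item) <= r and relevant if it shares the query's label.
    """
    d = (q_codes[:, None, :] != db_codes[None, :, :]).sum(axis=2)  # (Q, N) distances
    retrieved = d <= r
    relevant = q_labels[:, None] == db_labels[None, :]
    tp = int(np.logical_and(retrieved, relevant).sum())
    n_ret, n_rel = int(retrieved.sum()), int(relevant.sum())
    precision = tp / n_ret if n_ret else 0.0  # assumed convention when nothing is retrieved
    recall = tp / n_rel if n_rel else 0.0
    return precision, recall

db_codes = np.array([[0, 0], [0, 1], [1, 1]], dtype=np.uint8)
db_labels = np.array([0, 0, 1])
q_codes, q_labels = np.array([[0, 0]], dtype=np.uint8), np.array([0])
for r in range(3):
    print(r, pr_at_radius(q_codes, q_labels, db_codes, db_labels, r))
```

Sweeping r from 0 to m yields one (precision, recall) point per radius; at r = m every database item is retrieved, so recall reaches 1.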

13 / 30

SLIDE 14

Precision-Recall on CIFAR10

[Plot: precision vs. recall on CIFAR10 for SparseHash, KSH, DiffHash, AGH2, SSH and NNhash, logarithmic axes]

r = 2

14 / 30

SLIDE 15

Precision-Recall on CIFAR10

[Plot: precision vs. recall on CIFAR10 for SparseHash, KSH, DiffHash, AGH2, SSH and NNhash, logarithmic axes]

r = 0

15 / 30

SLIDE 16

Recall vs r on CIFAR10

[Plot: recall vs. Hamming radius on CIFAR10 for SparseHash, KSH, DiffHash, AGH2, SSH and NNhash, logarithmic recall axis]

m = 48

16 / 30

SLIDE 17

Recall vs r on CIFAR10

[Plot: recall vs. Hamming radius on CIFAR10 for SparseHash, KSH, DiffHash, AGH2, SSH and NNhash, logarithmic recall axis]

m = 128

17 / 30

SLIDE 18

Time vs Precision/Recall

[Plot: recall, precision and query time for SparseHash, KSH, AGH2, SSH and NNhash at Hamming radius r = 0, 1, 2]

18 / 30

SLIDE 19

Retrieval examples

CIFAR10 Top-10 nearest neighbors

19 / 30

SLIDE 20

Multimodal data

20 / 30

SLIDE 21

Multimodal data

Create a mutually comparable representation

21 / 30

SLIDE 22

Multimodal SparseHash

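[Figure: h maps the image modality $X \subseteq \mathbb{R}^n$ and g maps the text modality $Y \subseteq \mathbb{R}^{n'}$ (tags such as notredame, paris, NYC, coliseum, rome, france, italy) into the common Hamming space $\mathbb{H}^m = \{0,1\}^m$ (example codes 1101, 1001, 1111); pairs carry intra-modal labels $s_X$, $s_Y$ and an inter-modal label $s_{XY}$, each 0 or 1]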

22 / 30

SLIDE 23

Multimodal SparseHash

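[Figure: h maps the image modality $X \subseteq \mathbb{R}^n$ and g maps the text modality $Y \subseteq \mathbb{R}^{n'}$ (tags such as notredame, paris, NYC, coliseum, rome, france, italy) into the common Hamming space $\mathbb{H}^m = \{0,1\}^m$ (example codes 1101, 1001, 1111); pairs carry intra-modal labels $s_X$, $s_Y$ and an inter-modal label $s_{XY}$, each 0 or 1]

Intra- and inter-modal binary similarities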

23 / 30

SLIDE 24

Multimodal SparseHash

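[Figure: h maps the image modality $X \subseteq \mathbb{R}^n$ and g maps the text modality $Y \subseteq \mathbb{R}^{n'}$ (tags such as notredame, paris, NYC, coliseum, rome, france, italy) into the common Hamming space $\mathbb{H}^m = \{0,1\}^m$ (example codes 1101, 1001, 1111); pairs carry intra-modal labels $s_X$, $s_Y$ and an inter-modal label $s_{XY}$, each 0 or 1]

Intra- and inter-modal binary similarities

Two coupled siamese ISTA networks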

24 / 30

SLIDE 25

Multimodal SparseHash

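[Figure: h maps the image modality $X \subseteq \mathbb{R}^n$ and g maps the text modality $Y \subseteq \mathbb{R}^{n'}$ (tags such as notredame, paris, NYC, coliseum, rome, france, italy) into the common Hamming space $\mathbb{H}^m = \{0,1\}^m$ (example codes 1101, 1001, 1111); pairs carry intra-modal labels $s_X$, $s_Y$ and an inter-modal label $s_{XY}$, each 0 or 1]

Intra- and inter-modal binary similarities

Two coupled siamese ISTA networks

Embeddings h and g are learned jointly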
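One way to write the coupled objective, as a hedged sketch: it assumes that the intra-modal terms (h on image pairs, g on text pairs) and the inter-modal term (h(x) against g(y)) each reuse the single-modality SparseHash pair loss from slide 9, combined with a hypothetical cross-modal weight `mu`. The exact weighting used by the authors is not given on these slides, and all names and hyperparameter values are illustrative.

```python
import numpy as np

def pair_loss(a, b, s, lam=1.0, M=1.0, alpha=0.1):
    """Single-modality SparseHash pair loss (slide 9), placeholder hyperparameters."""
    d = np.abs(a - b).sum()
    return float(s * d + lam * (1 - s) * max(0.0, M - d)
                 + alpha * (np.abs(a).sum() + np.abs(b).sum()))

def multimodal_loss(image_pair, text_pair, cross_pair, mu=1.0):
    """Hypothetical coupled objective.

    image_pair = (h(x), h(x'), s_X), text_pair = (g(y), g(y'), s_Y),
    cross_pair = (h(x), g(y), s_XY); mu is an assumed cross-modal weight.
    """
    intra = pair_loss(*image_pair) + pair_loss(*text_pair)  # within each modality
    inter = pair_loss(*cross_pair)                          # between modalities
    return intra + mu * inter

rng = np.random.default_rng(2)
hx, hxp, gy, gyp = (np.tanh(rng.standard_normal(8)) for _ in range(4))
print(multimodal_loss((hx, hxp, 1), (gy, gyp, 0), (hx, gy, 1)))
```

Minimizing the sum trains h and g together, so the cross-modal term is what places image codes and text codes in a mutually comparable Hamming space.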

25 / 30

SLIDE 26

Text-based image retrieval on NUS dataset

[Figure: images retrieved by CM-SSH and by SparseHash for text queries (tags shown: people, portrait, art)]

Data: Chua et al. 2009

26 / 30

SLIDE 27

Text-based image retrieval on NUS dataset

[Figure: images retrieved by CM-SSH and by SparseHash for text queries (tags shown: flower, art)]

Data: Chua et al. 2009

27 / 30

SLIDE 28

Image annotation on NUS dataset

[Figure: query images with the tags assigned by CM-SSH and by MM-SparseHash]

nature, sky, water, landscape, sunset, light, white, trees, color, reflection, black, animal, tree, sun, orange, winter, snow, beautiful, river, wildlife, photography, lake, bird, dark, forest, birds, ice, reflections, wood, flying, evening, outdoors, photographer, dusk

nature, sky, water, clouds, green, explore, sunset, people, sea, art, beach, ocean, asia, sand, rocks, airplane, aircraft, boats, flying, plane, rural, waves, flight, aviation, breathtaking, bush, thailand, vivid, twilight, glow, cliff, landscapes, airplanes

sunset, tree, orange, old, abandoned, car, autumn, road, forest, fall, truck, rust, colourful, woods, antique, vehicle, halloween

clouds, sunset, sea, beach, sun, ocean, summer, sand, rocks, evening, holiday, peace, happy, dunes

england, italy, island, ship, italia, hawaii, interesting, cow, islands, elephants, maui

nature, sky, blue, water, clouds, red, sea, yellow, beach, california, winter, ocean, building, old, sand, sunrise, spain, cloud, wall, coast, sepia, stone, eaves, mist, perspective, fence, school, fly, oregon, jump, monument, perfect, surf, alley

28 / 30

SLIDE 29

Conclusions

Sparsity improves recall without compromising precision, by restricting the number of degrees of freedom of the codes

High recall at small radii allows fast retrieval using lookup tables (LUTs)

SparseHash scales well to large databases

By coupling several SparseHash nets, multimodal embeddings can be learned

29 / 30

SLIDE 30

Thank you!

30 / 30