SLIDE 1

Simultaneous Feature Learning and Hash Coding with Deep Neural Networks

Presenter: MinKu Kang

SLIDE 2

Why did I choose this paper?

  • Efficient image retrieval via binary encoding of images:
    • efficient bitwise operations (see the sketch after this list)
    • space-efficient storage
  • Many of the techniques can also readily be used in other research fields.
  • An advanced neural-network (shared-network) structure is used, from which I can learn a lot.
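As a quick illustration of the bitwise-operation point: comparing two binary codes is just an XOR plus a bit count. A minimal sketch (the 48-bit code values below are arbitrary examples, not from the paper):

    # Hamming distance between two 48-bit hash codes via bitwise operations.
    def hamming(code_a: int, code_b: int) -> int:
        return bin(code_a ^ code_b).count("1")  # XOR, then count differing bits

    query = 0xA5F13C07BEEF                      # 48-bit code of a query image (made up)
    database = [0xA5F13C07BEED, 0x0123456789AB, 0xFFFF00001234]
    # Rank database codes by Hamming distance to the query.
    ranked = sorted(database, key=lambda c: hamming(query, c))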

SLIDE 3

Background - Similarity-Preserving Hashing

SLIDE 4

Related Work

Two-Stage Framework

Stage 1: learn binary hash codes → Stage 2: learn binary hashing functions

SLIDE 5

Decomposing Similarity Matrix

SLIDE 6

Learning Hash Functions

The binary codes learned in stage 1 are used as ground truth for training the hash functions (see the sketch below).
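A minimal numpy sketch of the two-stage idea. Stage 1 below uses a spectral relaxation (top-q eigenvectors of the similarity matrix, thresholded at zero) as an illustrative stand-in for the paper's coordinate-descent optimization; stage 2 then fits one hash function per bit with the learned codes as targets:

    import numpy as np

    def stage1_codes(S, q):
        """Stage 1: approximate S ~ (1/q) * H @ H.T with H in {-1,+1}^(n x q)."""
        vals, vecs = np.linalg.eigh(S)                     # eigenvalues in ascending order
        top = vecs[:, -q:] * np.sqrt(np.maximum(vals[-q:], 0.0))
        return np.where(top >= 0, 1, -1)                   # n x q matrix of target bits

    # Stage 2: with H[:, k] as ground-truth labels, train one hash function per bit,
    # e.g. a binary classifier f_k(x) that predicts bit k from image features x.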

SLIDE 7

Two-Stage Framework

Analytical

SLIDE 8

Optimization

Cost Function
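For reference, the stage-1 cost function from the CNNH paper (Xia et al., cited in the references) has the following form, where S is the n × n pairwise similarity matrix, H ∈ {−1, 1}^{n×q} stacks the n q-bit codes, and ‖·‖_F is the Frobenius norm:

    \min_{H \in \{-1,1\}^{n \times q}} \left\| S - \frac{1}{q} H H^{\top} \right\|_F^2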

SLIDE 9

Augmenting Output Layer with Binary Class Labels

SLIDE 10

Datasets for Experiments

SLIDE 11

Results on CIFAR-10

For 48-bit hash codes

SLIDE 12

Results on NUS-WIDE

For 48-bit hash codes

[Chart annotation: as much as two orders of magnitude]

SLIDE 13

Related Work – Metric Learning Based Hashing

Haomiao Liu et al., Deep Supervised Hashing for Fast Image Retrieval, CVPR 2016

SLIDE 14

Similarity-Preserving Loss Function

Loss for similar pairs (pulls their codes together) and loss for dissimilar pairs (pushes their codes apart, up to a margin).

SLIDE 15

Relaxation of the Loss Function

  • Typically, the output layer is replaced by a sigmoid activation layer to obtain binary-like outputs.
  • The authors did not use a sigmoid layer because it slows down convergence.
  • Instead, a regularizer encourages the output values toward the vicinity of −1 and +1 (see the sketch after this list).
  • This is a weaker constraint than a [0, 1] sigmoid layer, but it shows better performance.
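A minimal PyTorch sketch of this pairwise loss with the ±1 regularizer. The margin and the weight alpha are hyperparameters, and the label convention used here (y = 1 for similar pairs) is an assumption for illustration:

    import torch

    def dsh_loss(b1, b2, y, margin=4.0, alpha=0.01):
        """Pairwise similarity-preserving loss with a +/-1 regularizer.

        b1, b2: (batch, bits) real-valued outputs; y: (batch,) 1.0 = similar, 0.0 = dissimilar.
        """
        d2 = (b1 - b2).pow(2).sum(dim=1)                       # squared Euclidean distance
        sim_term = y * d2                                      # pull similar pairs together
        dis_term = (1 - y) * torch.clamp(margin - d2, min=0)   # push dissimilar pairs apart
        # Regularizer: push |outputs| toward 1, i.e., values toward -1 or +1.
        reg = (b1.abs() - 1).abs().sum(dim=1) + (b2.abs() - 1).abs().sum(dim=1)
        return (0.5 * sim_term + 0.5 * dis_term + alpha * reg).mean()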

SLIDE 16

Effect of Regularizer

[Figure: histograms of output-layer values (count vs. output value); with the regularizer the distribution is more peaked, i.e., more binary-like]

Results with sigmoid output layer

SLIDE 17

Effect of Regularizer

Retrieval performance (mAP) of models under different parameter settings

SLIDE 18

Results on CIFAR-10

SLIDE 19


Background - Metric Learning

Siamese Network: models the similarity of x1 and x2 (pairwise).
Triplet Network: models "x is more similar to x+ than to x−" (a higher-order relationship).

SLIDE 20

Triplet Loss based Network


Hanjiang Lai et al., Simultaneous Feature Learning and Hash Coding with Deep Neural Networks, CVPR 2015

SLIDE 21

Pairwise versus Triplet Ranking

[Figure: pairwise similarity labels image pairs as similar or dissimilar; triplet ranking labels a query image I with a more-similar image I+ and a less-similar image I−]

Image I is more similar to image I+ than to image I−.

SLIDE 22

Training Architecture

[Figure: three weight-shared CNNs take I, I+, I− and output G(I), G(I+), G(I−)]

The weights of the three CNNs are shared. Their outputs feed the triplet ranking loss. A sigmoid activation layer restricts the output values to the range [0, 1].

SLIDE 23

Triplet Ranking Loss

  • ℓ(G(I), G(I+), G(I−)) = max(0, 1 − (‖G(I) − G(I−)‖² − ‖G(I) − G(I+)‖²))
  • The loss is zero only when I is closer to I+ than to I− by at least the margin; the term "1" is that margin.
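A minimal PyTorch sketch of this ranking loss (squared Euclidean distances stand in for Hamming distance, as in the paper's relaxation; the margin is fixed at 1):

    import torch

    def triplet_ranking_loss(g, g_pos, g_neg, margin=1.0):
        """max(0, margin + ||g - g_pos||^2 - ||g - g_neg||^2), averaged over the batch."""
        d_pos = (g - g_pos).pow(2).sum(dim=1)   # distance to the more-similar image
        d_neg = (g - g_neg).pow(2).sum(dim=1)   # distance to the less-similar image
        return torch.clamp(margin + d_pos - d_neg, min=0).mean()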

SLIDE 24

Training Architecture

[Figure: the same three weight-shared CNNs map I, I+, I− to G(I), G(I+), G(I−) and feed the triplet ranking loss]

Consider a single shared weight W_ij. Each training step updates it as

    W_ij ← W_ij + ΔW_ij

where ΔW_ij is computed from the gradient ∂ℓ(G(I), G(I+), G(I−)) / ∂W_ij.

SLIDE 25

Weight Update

By the chain rule, the gradient with respect to a shared weight W_ij collects one term from each of the three networks:

    ∂ℓ(G(I), G(I+), G(I−)) / ∂W_ij
        = ∂ℓ/∂G(I)  · ∂G(I)/∂(…)  · ∂(…)/∂W_ij
        + ∂ℓ/∂G(I+) · ∂G(I+)/∂(…) · ∂(…)/∂W_ij
        + ∂ℓ/∂G(I−) · ∂G(I−)/∂(…) · ∂(…)/∂W_ij

Updating W_ij requires values from all three networks! The whole gradient can be written as an analytically differentiated function of those values:

    ∂ℓ(G(I), G(I+), G(I−)) / ∂W_ij = f(NN, NN+, NN−)

SLIDE 26

Weight Update

Updating W_ij requires values from all three networks!

    ∂ℓ(G(I), G(I+), G(I−)) / ∂W_ij = f(NN, NN+, NN−)

  • We need three forward propagations for each training triplet.
  • We need to keep three weight-shared copies of the network in memory.
  • Weight updates are computationally expensive compared to a typical single-network structure.
  • The number of possible triplets in a training set is very large.
  • There is a dedicated paper (*) that improves training time by about two orders of magnitude. (A sketch of the shared-weight update follows below.)

* Bohan et al., Fast Training of Triplet-based Deep Binary Embedding Networks, CVPR 2016.
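A minimal PyTorch sketch of the shared-weight update, using a made-up stand-in network (embed_net) for the shared CNN. Autograd sums the three chain-rule contributions into the single set of shared weights automatically:

    import torch
    import torch.nn as nn

    # Hypothetical embedding network standing in for the shared CNN.
    embed_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 48), nn.Sigmoid())
    optimizer = torch.optim.SGD(embed_net.parameters(), lr=0.01)

    def train_step(img, img_pos, img_neg, margin=1.0):
        # Three forward passes through the SAME network (weights are shared).
        g, g_pos, g_neg = embed_net(img), embed_net(img_pos), embed_net(img_neg)
        d_pos = (g - g_pos).pow(2).sum(dim=1)
        d_neg = (g - g_neg).pow(2).sum(dim=1)
        loss = torch.clamp(margin + d_pos - d_neg, min=0).mean()
        optimizer.zero_grad()
        loss.backward()   # one backward pass accumulates all three terms into each W_ij
        optimizer.step()
        return loss.item()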

SLIDE 27

Divide-and-Encode Module

Enforces the independence property:

  • Each hash bit is generated from a separate slice of the features,
  • so the output hash codes may be less redundant with respect to each other.
  • There is no mathematical proof of this. (A sketch of the module follows below.)
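A minimal PyTorch sketch of a divide-and-encode head. The layer sizes are illustrative assumptions, not the paper's exact configuration:

    import torch
    import torch.nn as nn

    class DivideAndEncode(nn.Module):
        """Splits the feature vector into q slices; each slice yields one hash bit."""
        def __init__(self, feat_dim=50, n_bits=10):
            super().__init__()
            assert feat_dim % n_bits == 0
            self.slice_dim = feat_dim // n_bits
            # One tiny projection per slice: slice -> scalar in (0, 1).
            self.encoders = nn.ModuleList(
                [nn.Linear(self.slice_dim, 1) for _ in range(n_bits)]
            )

        def forward(self, feats):
            slices = feats.split(self.slice_dim, dim=1)           # q disjoint slices
            bits = [torch.sigmoid(enc(s)) for enc, s in zip(self.encoders, slices)]
            return torch.cat(bits, dim=1)                         # (batch, n_bits) in (0, 1)

    # At test time, quantize with a 0.5 threshold to get the binary code:
    # codes = (DivideAndEncode()(features) > 0.5).int()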

SLIDE 28

Overall Structure

[Figure: pipeline: input image → CNN → divide-and-encode module → quantization]

At test time, a single trained network is used.

SLIDE 29

Results on SVHN

SLIDE 30

Results on CIFAR-10

SLIDE 31

Results on NUS-WIDE

SLIDE 32

Divide-and-Encode versus Fully-Connected-Encode

SLIDE 33

DSH (pairwise) versus DNNH (triplet)

  • In the DSH paper, the authors state that they implemented DNNH (the triplet method) themselves for comparison with DSH (the pairwise method).
  • They report that the divide-and-encode structure largely degraded retrieval mAP on CIFAR-10.
  • Training inefficiencies of the triplet network may have resulted in its inferior performance.

SLIDE 34

Conclusion

  • While a triplet network can learn higher-order relationships between training samples, it suffers from training inefficiencies.
  • In practice, the pairwise metric-learning-based method shows better performance.
  • Efficient sampling strategies for triplets are needed.
  • Solving the training inefficiencies of triplet networks could be the key to better results.
  • An end-to-end architecture is preferred.

SLIDE 35

References & Acknowledgement

  • Rongkai Xia et al., Supervised Hashing for Image Retrieval via Image Representation Learning, AAAI 2014
  • Haomiao Liu et al., Deep Supervised Hashing for Fast Image Retrieval, CVPR 2016
  • Hanjiang Lai et al., Simultaneous Feature Learning and Hash Coding with Deep Neural Networks, CVPR 2015

SLIDE 36

Quiz

  • 1. What is the advantage of a triplet network over a pairwise network?
    a) fast training speed
    b) low complexity of the architecture
    c) capturing high-order relationships between training samples

  • 2. Why did the authors design the divide-and-encode module?
    a) to enhance the training speed
    b) to enforce the independence property between hash functions
    c) to lower the complexity of the problem