Simultaneous Feature Learning and Hash Coding with Deep Neural Networks
Presenter: MinKu Kang
Why did I choose this paper? It targets efficient image retrieval via binary encoding of images: distances reduce to efficient bitwise operations, and the codes are space-efficient.
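To make the bitwise-efficiency point concrete, here is a minimal Python sketch (my own illustration, not from the paper; the 48-bit codes are made-up values). Hamming distance between packed binary codes is just XOR plus popcount.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed binary codes."""
    return bin(a ^ b).count("1")

database = [0x0F0F0F0F0F0F, 0xFFFF0000FFFF, 0x123456789ABC]  # made-up 48-bit codes
query = 0x0F0F0F0F0F0E

# Rank database entries by Hamming distance to the query (nearest first).
ranked = sorted(range(len(database)), key=lambda i: hamming(query, database[i]))
print(ranked)
```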
Learn binary hash codes. Learn binary hashing functions.
Used as ground truth
Analytical
Cost Function
For 48-bit hash codes
For 48-bit hash codes
as much as two orders of magnitude
Liu et al., Deep Supervised Hashing for Fast Image Retrieval, CVPR 2016
Loss for similar pairs; loss for dissimilar pairs.
Typically, this layer is replaced by a sigmoid activation layer to produce binary-like outputs. The authors did not use a sigmoid layer, however, because it slows down convergence. Instead, a regularizer encourages the output values to lie in the vicinity of the range (−1, 1). This is a weaker constraint than a [0, 1] sigmoid layer, but it shows better performance.
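For reference, the DSH objective has the following form (reconstructed from memory of the Liu et al. paper; notation mine, with y = 0 for similar pairs, y = 1 for dissimilar pairs, margin m, and regularization weight α):

$$\ell(b_1, b_2, y) = \tfrac{1}{2}(1 - y)\,\lVert b_1 - b_2 \rVert_2^2 + \tfrac{1}{2}\,y\,\max\bigl(m - \lVert b_1 - b_2 \rVert_2^2,\ 0\bigr) + \alpha \bigl(\lVert\, |b_1| - \mathbf{1} \,\rVert_1 + \lVert\, |b_2| - \mathbf{1} \,\rVert_1\bigr)$$

The first term pulls similar pairs together, the second pushes dissimilar pairs apart up to the margin, and the third is the regularizer that draws each output toward ±1 in place of a sigmoid.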
Histogram of output values in the output layer: more peaked means more binary-like (x-axis: output value; y-axis: number of values).
Results with sigmoid output layer
Retrieval performance (mAP) of models under different parameter settings
Siamese network: models the pairwise similarity of x1 and x2. Triplet network: models the relation that x is more similar to x+ than to x− (a higher-order relationship).
Lai et al., Simultaneous Feature Learning and Hash Coding with Deep Neural Networks, CVPR 2015
Pairwise similarity: similar vs. dissimilar. Triplet ranking: more-similar vs. less-similar with respect to a query.
Triplet (I, I+, I−): image I is more similar to image I+ than to image I−.
Figure: three CNNs with shared weights map the triplet I, I+, I− to outputs F(I), F(I+), F(I−).
Weights are shared across the three CNNs. A sigmoid activation layer restricts the output values to the range [0, 1]. The outputs feed the triplet ranking loss.
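The triplet ranking loss takes the standard margin form (written here with a generic margin m; the paper fixes a specific constant):

$$\ell\bigl(F(I), F(I^+), F(I^-)\bigr) = \max\Bigl(0,\ m + \lVert F(I) - F(I^+) \rVert_2^2 - \lVert F(I) - F(I^-) \rVert_2^2\Bigr)$$

The loss is zero once the query output is closer to F(I+) than to F(I−) by at least the margin.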
The loss term $\ell\bigl(F(I), F(I^+), F(I^-)\bigr)$
Figure (repeated): three shared-weight CNNs map I, I+, I− to F(I), F(I+), F(I−).
Weights are shared across the three CNNs; the outputs feed the triplet ranking loss.
Each CNN copy carries the same weight $W_{ij}$. Gradient-descent update: $W_{ij} \leftarrow W_{ij} + \alpha\,\Delta W_{ij}$, where $\Delta W_{ij} = -\partial \ell / \partial W_{ij}$ and $\alpha$ is the learning rate.
$$\frac{\partial \ell\bigl(F(I), F(I^+), F(I^-)\bigr)}{\partial W_{ij}} = \frac{\partial \ell}{\partial F(I)} \cdot \frac{\partial F(I)}{\partial (\cdots)} \cdot \frac{\partial (\cdots)}{\partial W_{ij}} + \frac{\partial \ell}{\partial F(I^+)} \cdot \frac{\partial F(I^+)}{\partial (\cdots)} \cdot \frac{\partial (\cdots)}{\partial W_{ij}} + \frac{\partial \ell}{\partial F(I^-)} \cdot \frac{\partial F(I^-)}{\partial (\cdots)} \cdot \frac{\partial (\cdots)}{\partial W_{ij}}$$
Computing $\partial \ell / \partial W_{ij}$ therefore requires $\partial \ell\bigl(F(I), F(I^+), F(I^-)\bigr) / \partial F(\cdot)$, which has the closed form $f\bigl(F(I), F(I^+), F(I^-)\bigr)$: it is differentiated analytically.
Computing $\partial \ell / \partial W_{ij}$ requires $\partial \ell\bigl(F(I), F(I^+), F(I^-)\bigr) / \partial F(\cdot) = f\bigl(F(I), F(I^+), F(I^-)\bigr)$.
* Zhuang et al., Fast Training of Triplet-based Deep Binary Embedding Networks, CVPR 2016.
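To make the analytic gradient concrete, here is a NumPy sketch (my own illustration, not the authors' code) of the closed-form gradients, assuming the squared-distance triplet loss sketched earlier:

```python
import numpy as np

def triplet_loss_grads(f, f_pos, f_neg, margin=0.5):
    """Closed-form gradients of
    l = max(0, margin + ||f - f_pos||^2 - ||f - f_neg||^2)
    with respect to the three network outputs."""
    slack = margin + np.sum((f - f_pos) ** 2) - np.sum((f - f_neg) ** 2)
    if slack <= 0:  # hinge inactive: all three gradients vanish
        zero = np.zeros_like(f)
        return zero, zero.copy(), zero.copy()
    return (
        2.0 * (f_neg - f_pos),   # dl/dF(I)
        -2.0 * (f - f_pos),      # dl/dF(I+)
        2.0 * (f - f_neg),       # dl/dF(I-)
    )

# Toy usage with made-up 4-dimensional outputs.
g, g_pos, g_neg = triplet_loss_grads(
    np.array([0.2, 0.8, 0.5, 0.1]),
    np.array([0.3, 0.7, 0.4, 0.2]),
    np.array([0.9, 0.1, 0.6, 0.8]),
)
print(g, g_pos, g_neg)
```

Backpropagating these three vectors through the shared CNN yields $\partial \ell / \partial W_{ij}$ with no numerical differentiation.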
Enforces the independence property: each hash bit is computed from its own slice of the feature vector, reducing redundancy among the bits of the hash codes.
Pipeline: input image → CNN → divide-and-encode module → quantization.
At test time, a single trained network is used.
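A minimal NumPy sketch of the divide-and-encode idea (my reconstruction; the sizes are hypothetical): the intermediate feature is split into disjoint slices, each slice is projected to one sigmoid unit, and thresholding yields one bit per slice.

```python
import numpy as np

rng = np.random.default_rng(0)

q, s = 48, 16                              # 48 bits, 16 features per slice (hypothetical)
feature = rng.standard_normal(q * s)       # stand-in for the intermediate CNN feature
W = 0.1 * rng.standard_normal((q, s))      # one small projection per slice
b = np.zeros(q)

slices = feature.reshape(q, s)             # divide: q disjoint slices
logits = np.einsum("qs,qs->q", W, slices) + b  # encode: one unit per slice
probs = 1.0 / (1.0 + np.exp(-logits))      # sigmoid restricts outputs to (0, 1)
bits = (probs > 0.5).astype(np.uint8)      # quantize: threshold at 0.5 at test time
print(bits)
```

Because each bit sees only its own slice of the feature, the bits are encouraged to be independent, matching the property noted above.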
Triplet (DNNH) vs. pairwise (DSH). In the DSH paper, the authors state that they implemented DNNH themselves, and that the divide-and-encode structure largely degraded retrieval mAP on CIFAR-10. Inefficiencies in training the triplet network may also have contributed to the inferior performance.