SLIDE 1

Discrete Hashing

Fast, scalable retrieval and classification

Fumin Shen

Center for Future Media, University of Electronic Science and Technology of China

SLIDE 2

Outline

  • Introduction to Hashing
  • Discrete optimization for Hashing
  • Applications of Discrete Hashing
  • Classification by Hamming Retrieval
SLIDE 3

Background: Hashing

Extremely fast!

Hamming distance
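The Hamming distance between two codes is the number of bit positions in which they differ; with packed codes it reduces to an XOR followed by a popcount. A minimal sketch (illustrative, not from the slides):

```python
# Hamming distance between two binary codes packed into Python integers.
def hamming_distance(a: int, b: int) -> int:
    # XOR leaves a 1 exactly where the codes differ; counting set bits gives the distance.
    return bin(a ^ b).count("1")

print(hamming_distance(0b101, 0b111))  # -> 1 (the codes differ in one bit)
```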

SLIDE 4
  • Locality-Sensitive Hashing (LSH): [Gionis, Indyk, and Motwani 1999], [Datar et al. 2004], etc.

Illustration: a random hash function maps each data vector and the query to a short binary code (e.g., 101).
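A minimal sketch of the random-hyperplane LSH family illustrated above (hypothetical code; the function name is ours): each bit is the sign of a projection onto a random direction, so nearby vectors tend to collide on the same code.

```python
import numpy as np

def random_hyperplane_lsh(dim: int, n_bits: int, seed: int = 0):
    """Build a random LSH function mapping a vector in R^dim to an n_bits binary code."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((n_bits, dim))        # one random hyperplane per bit
    return lambda x: (R @ x > 0).astype(np.uint8)

h = random_hyperplane_lsh(dim=128, n_bits=3)
print(h(np.ones(128)))                            # e.g. [1 0 1]: the 3-bit code of the input
```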

Background: Hashing

Recent trend: learning to hash

Learned from data

SLIDE 5

Background: Hashing

The main application: Approximate Nearest Neighbor Search (ANNS)

Illustration: a query image retrieves visually relevant results from an image database.

SLIDE 6

LSH

  • LSH has many variants, covering ℓp distances, cosine similarity, Gaussian kernels, and general kernels (KLSH), etc.
  • Good: sublinear search time (query cost O(n^ρ) with ρ < 1).
  • Bad: needs long hash codes and hundreds of hash tables (large memory footprint).

SLIDE 7

Learning based Hashing

SLIDE 8

Unsupervised Hashing

Learning binary codes that preserve data similarities

  • PCAH: generates the projection matrix W by principal component analysis (PCA)
  • SH: (Weiss et al., 2008) introduces unsupervised graph hashing
  • ITQ: (Gong and Lazebnik, 2011) learns an orthogonal rotation matrix to refine the initial PCA projection
  • AGH: (Liu et al., 2011) solves SH with anchor graphs
  • IMH: (Shen et al., 2013) generates binary codes from general data manifolds
  • DGH: (Liu et al., 2014) solves SH by discrete optimization
  • AIBC: (Shen et al., 2015) asymmetric hashing
SLIDE 9

Supervised Hashing

Learning binary codes supervised by pointwise or pairwise/ranking labels

  • SSH: (Wang et al., 2010) exploits both labeled and unlabeled data for hashing
  • MLH: (Norouzi and Fleet, 2011) based on structural SVMs
  • KSH: (Liu et al., 2012) kernel-based supervised hashing
  • FastH: (Lin et al., 2014) solves hashing by graph cuts
  • SDH: (Shen et al., 2015) generates binary codes by discrete optimization
  • COSDISH: (Kang et al., 2016) column-sampling-based discrete supervised hashing
  • DSeRH: (Liu et al., 2017) deep ranking hashing
SLIDE 10

Deep learning based Hashing

  • Lots of supervised methods:
      • DAPH (Shen et al., MM’17)
      • DSeRH (Liu et al., CVPR’17)
      • DPSH (Li et al., IJCAI’16)
      • VDSH (Zhang et al., CVPR’16)
      • DSH (Liu et al., CVPR’16)
      • CNNH (Xia et al., AAAI’15)
  • Very few unsupervised ones:
      • DH (Liong et al., CVPR’15)
      • Deepbit (Lin et al., CVPR’16)
      • UH-BDNN (Do et al., ECCV’16)
SLIDE 11

Deep vs. Shallow

Deep learning boosts supervised hashing; there is still a long way to go for unsupervised deep hashing.

Method:   ITQ     IMH     CNN+ITQ   DH      UH-BDNN
MAP:      17.76   18.38   0.255     16.62   18.35

SLIDE 12

Manifold learning vs. Hashing

Optimal hash codes

Spectral Hashing

  • Very similar formulation
  • Key difference: discrete constraint
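For reference, a standard way to write the Spectral Hashing problem (Weiss et al., 2008), with B the code matrix, L the graph Laplacian of the data similarity graph, n points and r bits:

```latex
\min_{B}\; \operatorname{tr}\!\left(B^{\top} L B\right)
\quad \text{s.t.}\quad
B \in \{-1,1\}^{n \times r},\;\;
B^{\top}\mathbf{1} = 0,\;\;
B^{\top} B = n I .
```

Dropping the constraint B ∈ {−1,1}^{n×r} gives the spectral relaxation solved by the eigenvectors of L, which is exactly the manifold-learning (Laplacian eigenmap) problem; keeping it is what makes hashing hard.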
SLIDE 13

The hashing problem

  • A mixed-integer program; NP-hard in general
  • Difficult to optimize due to the discrete variables
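In its generic form, learning-to-hash is an optimization over the code matrix with a binary constraint (notation assumed here, with an arbitrary loss):

```latex
\min_{B}\; \mathcal{L}(B)
\quad \text{s.t.}\quad B \in \{-1, 1\}^{n \times r}
```

and the 2^{nr} discrete feasible points are what make the problem hard.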
SLIDE 14

Solution in literature

  • Step 1: Relaxation – discard the discrete constraints
      • Mimic the sign function by a continuous sigmoid
      • Hard to achieve a good (local) optimum
  • Step 2: Rounding – thresholding after learning
      • Quantization techniques: ITQ (Gong and Lazebnik, 2011)
      • Quantization distortion increases with longer hash codes
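A minimal sketch of this relax-then-round recipe (illustrative only; here step 1 is a PCA projection and step 2 is sign thresholding):

```python
import numpy as np

def relax_and_round(X: np.ndarray, n_bits: int) -> np.ndarray:
    """Step 1: solve a relaxed (continuous) problem, here a PCA projection.
    Step 2: round to binary codes by thresholding with the sign."""
    Xc = X - X.mean(axis=0)                        # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Y = Xc @ Vt[:n_bits].T                         # continuous relaxed embedding
    return np.where(Y >= 0, 1, -1)                 # rounding step: distortion grows with code length

codes = relax_and_round(np.random.randn(1000, 64), n_bits=32)
```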
SLIDE 15

Our solution

SLIDE 16

(I) Supervised Discrete Hashing

  • F. Shen, C. Shen, W. Liu, H. T. Shen, “Supervised Discrete Hashing”, CVPR’15.

Formulation: joint learning of the binary codes B, the feature representation F(x), and the linear classifier W, where Y is the ground-truth label matrix.
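A sketch of this joint objective, reconstructed from the formulation in the cited CVPR’15 paper (not copied verbatim from the slide):

```latex
\min_{B, W, F}\; \|Y - B W\|_F^{2} + \lambda \|W\|_F^{2} + \nu \|B - F(X)\|_F^{2}
\quad \text{s.t.}\quad B \in \{-1, 1\}^{n \times r}
```

where each row of B is the code of one training sample and F(X) is the (nonlinear) embedding being fit to the codes.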

Algorithm: Alternating minimization until convergence

  • solve the W-subproblem (multi-class classification);
  • solve the F-subproblem (feature learning);
  • solve the B-subproblem (hash learning) – the key problem
SLIDE 17

(I) Supervised Discrete Hashing

The key binary code optimization problem: the B-subproblem

Algorithm: Discrete Cyclic Coordinate descent (DCC) – learn the code bit by bit

Optimal, closed-form solution in each iteration!
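A sketch of the per-bit update behind DCC, with notation reconstructed from the cited paper rather than the slide: write Q = Y W^T + ν F(X), let z be the column of B for the bit being updated, q the corresponding column of Q, w the corresponding row of W (as a column vector), and B', W' the remaining columns of B and rows of W. Then

```latex
z = \operatorname{sgn}\!\left( q - B' W' w \right)
```

so each bit vector has a closed-form optimum given all the other bits, and the bits are cycled until they stop changing.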

SLIDE 18

(I) Supervised Discrete Hashing

Discrete optimization vs. relaxed optimization on the CIFAR-10 dataset

Results

Discrete optimization is important for hashing!

SLIDE 19

(I) Supervised Discrete Hashing

  • SDH supports other losses such as the hinge loss. Then the B-subproblem still has a closed-form update, while the W-subproblem is a multi-class SVM.
  • SDH scales linearly with the number of labeled examples, so it can incorporate massive labeled data into training.

SLIDE 20

Binary optimization

How to solve the general binary code learning problem?

  • Design a new algorithm for every different loss?
  • The loss can be too complex to design a feasible discrete optimization algorithm.
SLIDE 21

(II) Discrete Proximal Linearized Minimization

Motivation: minimize an equivalent smooth + non-smooth loss
Algorithm: Discrete Proximal Linearized Minimization
Each iteration has a closed-form, optimal solution!

  • F. Shen, X. Zhou, Y. Yang, J. Song, H. T. Shen and D. Tao, “A Fast Optimization Method for General Binary Code Learning”, IEEE Transactions on Image Processing (TIP), 2016.
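A minimal sketch of a proximal linearized iteration over binary codes in the spirit of this method (illustrative; the exact step rule and stopping test are assumed, not taken from the paper): linearize the smooth part of the loss at the current codes, take a gradient step, and apply the proximal map of the binary constraint, which is an elementwise sign.

```python
import numpy as np

def dplm_sketch(grad_f, B0: np.ndarray, step: float = 0.5, n_iter: int = 50) -> np.ndarray:
    """Approximately minimize a smooth loss f over codes B in {-1,+1}^(n x r)
    via proximal linearized iterations: B <- sign(B - step * grad_f(B))."""
    B = np.where(B0 >= 0, 1.0, -1.0)              # start from a feasible binary point
    for _ in range(n_iter):
        G = B - step * grad_f(B)                  # gradient step on the linearized smooth loss
        B_new = np.where(G >= 0, 1.0, -1.0)       # proximal map onto {-1,+1}: elementwise sign (ties -> +1)
        if np.array_equal(B_new, B):              # stop once the codes no longer change
            break
        B = B_new
    return B

# Toy usage: fit codes to a real-valued target Z under the loss ||B - Z||_F^2.
Z = np.random.randn(5, 8)
B = dplm_sketch(lambda B: 2.0 * (B - Z), np.zeros_like(Z))
```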

SLIDE 22
  • Theoretical: guaranteed to converge!
  • Practical:
      • Very fast, even faster than DCC in SDH
      • Successfully applied to supervised and unsupervised hashing

(II) Discrete Proximal Linearized Minimization

SLIDE 23

SLIDE 24

(III) Asymmetric Inner-product Binary Coding

Hashing for Maximum Inner Product Search (MIPS):

Retrieve the datum having the largest inner product with query q from database A

Algorithm: Inner product fitting by asymmetric hash functions

  • F. Shen, W. Liu, S. Zhang, Y. Yang, and H. T. Shen, “Learning Binary Codes for Maximum Inner Product Search”, ICCV 2015

The hard problem is decomposed into two sub-problems, each solved by DCC as in SDH; the fitting target is the matrix of inner products between the database items and the queries.
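A sketch of the inner-product-fitting idea with asymmetric hash functions (the notation below is assumed, not taken from the slide): learn two different binary mappings, g for queries and h for database items, so that their binary inner product approximates the real-valued one,

```latex
\min_{g,\,h}\; \sum_{i,j} \left( q_j^{\top} a_i \;-\; g(q_j)^{\top} h(a_i) \right)^{2}
\quad \text{s.t.}\quad g(\cdot),\, h(\cdot) \in \{-1, 1\}^{r}
```

after which MIPS over the database reduces to maximizing a binary inner product that can be evaluated with bitwise operations.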

SLIDE 25

Results: unsupervised hashing

Asymmetric Inner-product Binary Coding (AIBC)

SLIDE 26

(IV) Discrete Collaborative Filtering

Collaborative Filtering
Our proposal: Discrete Collaborative Filtering

  • H. Zhang, F. Shen, L. Liu, W. Liu, X. He, H. Luan and T.‐S. Chua, “Discrete Collaborative Filtering”, SIGIR 2016.

Best Paper Award Honorable Mention

SLIDE 27

(IV) Discrete Collaborative Filtering

SLIDE 28

(V) Classification by Hamming Retrieval

Motivation:
  • Very little learn-to-hash work targets classification!
  • Existing classification methods treat hash codes as real-valued features
  • Goal: boost even linear classification by hashing

SLIDE 29

Idea: Classify binary data with binary weights

Replace floating-point multiplications with XNOR operations

(V) Classification by Hamming Retrieval

  • F. Shen, Y. Mu, Y. Yang, W. Liu, L. Liu, J. Song, H. T. Shen, “Classification by Retrieval: Binarizing Data and Classifier”, SIGIR 2017. Best Paper Award Honorable Mention

SLIDE 30

Framework

Classifying an image reduces to retrieving its nearest class codes in the Hamming space.
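A minimal sketch of this retrieval-style prediction step (hypothetical code, not the paper's implementation): binarize the sample and return the class whose binary weight vector, i.e. its class code, is closest in Hamming distance; with packed codes this is an XNOR plus popcount, so no floating-point multiplications are needed at test time.

```python
import numpy as np

def predict_by_hamming(b: np.ndarray, class_codes: np.ndarray) -> int:
    """b: binary code of a sample in {-1,+1}^r.
    class_codes: (num_classes, r) binary class weight vectors in {-1,+1}.
    Returns the class whose code is nearest to b in Hamming distance."""
    hamming = np.sum(class_codes != b, axis=1)   # number of differing bits per class
    return int(np.argmin(hamming))

# Toy example with 3 classes and 4-bit codes.
class_codes = np.array([[ 1,  1, -1, -1],
                        [-1,  1,  1, -1],
                        [ 1, -1,  1,  1]])
print(predict_by_hamming(np.array([1, 1, -1, -1]), class_codes))  # -> 0 (exact match with class 0)
```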

(V) Classification by Hamming Retrieval

SLIDE 31

Formulation: joint learning of binary codes and binary weights

  • The loss can be any proper empirical loss. We particularly study the exponential loss and the linear loss.

Inter-class margin

(V) Classification by Hamming Retrieval

SLIDE 32
  • W-subproblem: a Binary Quadratic Program (BQP), solved bit by bit with a sequential bit-flipping algorithm (locally optimal)
  • B-subproblem: solved bit by bit; solution given for the exponential loss
  • P-subproblem
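A minimal sketch of a sequential bit-flipping local search for a small BQP of the form min_w w^T A w + b^T w over w in {-1,+1}^n (a generic heuristic, assumed here rather than taken from the paper):

```python
import numpy as np

def bit_flip_bqp(A: np.ndarray, b: np.ndarray, w0: np.ndarray, max_passes: int = 100) -> np.ndarray:
    """Locally minimize f(w) = w^T A w + b^T w over w in {-1,+1}^n by
    sequentially flipping single bits whenever a flip decreases the objective."""
    w = w0.astype(float).copy()
    f = lambda v: float(v @ A @ v + b @ v)
    for _ in range(max_passes):
        improved = False
        for i in range(len(w)):
            before = f(w)
            w[i] = -w[i]            # tentatively flip bit i
            if f(w) < before:
                improved = True     # keep the flip: the objective decreased
            else:
                w[i] = -w[i]        # revert: the flip did not help
        if not improved:
            break                    # no single flip improves: locally optimal
    return w

# Toy usage on a random 8-bit BQP.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8)); A = (A + A.T) / 2
w = bit_flip_bqp(A, rng.standard_normal(8), np.ones(8))
```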

(V) Classification by Hamming Retrieval

SLIDE 33

Results:

LibLinear vs. our method on SUN 397

(V) Classification by Hamming Retrieval

SLIDE 34

Results:

Comparison in accuracy (%), training and testing time (seconds).

(V) Classification by Hamming Retrieval

SLIDE 35

Results:

Accuracy (%) with increasing binary code length

(V) Classification by Hamming Retrieval

SLIDE 36
Conclusions:
  • Convert linear classification to Hamming retrieval
  • Binarize both data and classifier in a joint problem
  • Support many empirical loss functions
  • Significant reduction in storage, training, and testing computation

(V) Classification by Hamming Retrieval

SLIDE 37

(VI) Deep Sketch Hashing

Sketch-based image retrieval

Existing methods:

  • Hand-crafted feature engineering (e.g., SIFT, HOG, HELO[1], LKS[2])
  • Deep learning based feature extraction

Li Liu, Fumin Shen, Yuming Shen, Xianglong Liu, Ling Shao, “Deep Sketch Hashing: Fast Free-hand Sketch-Based Image Retrieval”, CVPR 2017

SLIDE 38

Framework of DSH

We integrate a convolutional neural network and discrete binary code learning into a unified framework.

(VI) Deep Sketch Hashing

Li Liu, Fumin Shen, Yuming Shen, Xianglong Liu, Ling Shao, “Deep Sketch Hashing: Fast Free-hand Sketch-Based Image Retrieval”, CVPR 2017

SLIDE 39

Objective Formulation of DSH

The objective is non-convex and non-smooth.

(VI) Deep Sketch Hashing

SLIDE 40

Alternating Optimization

(VI) Deep Sketch Hashing

SLIDE 41

Comparison with previous SBIR methods

(VI) Deep Sketch Hashing

SLIDE 42

Comparison with cross-modality methods

Experimental results of DSH

(VI) Deep Sketch Hashing

SLIDE 43

Successful Cases of DSH:

(VI) Deep Sketch Hashing

SLIDE 44

(VI) Hashing for Partial Action Recognition

Motivation:

  • Most action recognition approaches analyze after-the-fact actions. However, capturing complete actions is often difficult due to occlusions, interruptions, etc.
  • Partial action recognition (PAR) has a wide range of applications in intelligent surveillance, smart homes, retrieval systems, etc.

Illustration: traditional action recognition vs. action prediction vs. partial action recognition (ours)

SLIDE 45

The flowchart of Partial Reconstructive Binary Coding (PRBC): preserving similarity, feature reconstruction, learning the coding matrix

Objective: optimized by discrete alternating optimization

(VI) Hashing for Partial Action Recognition

SLIDE 46
  • Quantitative results on three tasks:

1) Action prediction 2) Partial action retrieval 3) Partial action recognition

  • J. Qin, L. Liu, L. Shao, B. Ni, C. Chen, F. Shen and Y. Wang, “Binary Coding for Partial Action Analysis with Limited Observation Ratios”, CVPR 2017.

(VI) Hashing for Partial Action Recognition

SLIDE 47

Our work on discrete hashing

  • Deep Asymmetric Pairwise Hashing (ACM MM’17)
  • DSeRH for deep ranking hashing (CVPR’17)
  • Asymmetric Binary Coding (TMM 2016)
  • Discrete Cross-modal Hashing (TIP 2016)
  • ZSECOC for Action Recognition (CVPR’17)
  • Compressed K-means (AAAI’17)
  • Discrete Spectral Clustering (IJCAI’16)
  • Attribute Hashing (ICME’17) Best Paper Award – Platinum Award
  • Zero-shot Hashing (MM’16)
  • AIBC for Medical Image Retrieval (ISBI’16)
SLIDE 48

Thank you! Any questions?