
SLIDE 1

Supervised Hierarchical Cross-Modal Hashing

Changchang Sun†, Xuemeng Song†, Fuli Feng‡, Wayne Xin Zhao$, Hao Zhang*, Liqiang Nie†

†School of Computer Science and Technology, Shandong University
‡School of Computing, National University of Singapore
$School of Information, Renmin University of China
*Mercari, Inc., Japan

1

SLIDE 2

Background
• Unprecedented growth of multimedia data on the Internet.
• Application: cross-modal retrieval.
• Solution: supervised cross-modal hashing.

Figure: images and their text descriptions (e.g., "UNIQLO Women Cotton Mini Skirt", "Chloé Frayed High-rise Wide-leg Jeans", "Chicwish Endless Blooming Rose Maxi Skirt") mapped into a shared Hamming space under labels such as mini-skirt, long skirt, and wide-leg jeans.

2

SLIDE 3

Related Work
• Define the cross-modal similarity matrix.

Qingyuan Jiang and Wujun Li. Deep Cross-Modal Hashing. In CVPR, 2017.

3

SLIDE 4

Related Work
• Learn semantic information from multiple labels.

Chao Li, Cheng Deng, Ning Li, Wei Liu, Xinbo Gao, and Dacheng Tao. Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval. In CVPR, 2018.

4

SLIDE 5

Motivation
• Explore the rich semantic information conveyed by the label hierarchy.

Figure 1: Illustration of the label hierarchy. Two items can be dissimilar at the finest-grained layer yet similar at a less fine-grained layer.

5

SLIDE 6

Challenges
• How to employ the label hierarchy to guide cross-modal hashing and preserve the underlying correlations from the original space to the Hamming space.

Figure: points A, B, and C mapped from the original space to the Hamming space.

6

SLIDE 7

Challenges
• How to enhance the hierarchical discriminative power of hash codes.

Figure: hash codes should distinguish both coarse categories (skirt vs. jeans) and fine-grained categories (mini-skirt vs. wide-leg jeans).

7

SLIDE 8

Challenges
• The lack of a benchmark dataset whose data points involve multiple modalities and are hierarchically labeled.

Table 1: Hierarchical labels of the benchmark dataset CIFAR-100 (unimodal data points).

Super-class | Class
Flowers     | Rose, Sunflower, Lily...
Fish        | Goldfish, Shark, Dolphin...
Insect      | Bee, Butterfly, Caterpillar...
Fruit       | Apple, Peach, Pear...
...         | ...

8

SLIDE 9

Framework

Figure 2: Illustration of the proposed scheme, HiCHNet (image branch built on VGG-F; representations combined by concatenation).

9

SLIDE 10

Framework
• Regularized Cross-modal Hashing
  - Layer-wise Hash Representation

With $K$ fully connected networks ($K$ layers, one per layer of the label hierarchy), the image and text representations $\tilde{v}_i$ and $\tilde{t}_j$ are mapped to layer-wise hash representations:

$h_{v_i}^{k} = s\big(W_v^{k}\, g(\tilde{v}_i)\big), \quad k = 1, \dots, K$
$h_{t_j}^{k} = s\big(W_t^{k}\, g(\tilde{t}_j)\big), \quad k = 1, \dots, K$

and binarized as:

$b_{v_i}^{k} = \mathrm{sign}(h_{v_i}^{k}), \quad k = 1, \dots, K$
$b_{t_j}^{k} = \mathrm{sign}(h_{t_j}^{k}), \quad k = 1, \dots, K$

where $h_{v_i}^{k}$ ($h_{t_j}^{k}$) is the layer-wise hash representation and $b_{v_i}^{k}$ ($b_{t_j}^{k}$) are the layer-wise binary hash codes.

10
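As a minimal NumPy sketch of the layer-wise hashing step above (illustrative only: tanh standing in for the squashing function s(·) and random matrices standing in for the learned fully connected networks are assumptions):

```python
import numpy as np

def layerwise_hash(feature, weights):
    """One FC network per hierarchy layer maps a modality feature to a
    continuous hash representation h^k; sign(h^k) gives the binary code b^k."""
    reps, codes = [], []
    for W in weights:
        h = np.tanh(W @ feature)      # h^k = s(W^k g(feature)); tanh is a stand-in
        b = np.where(h >= 0, 1, -1)   # b^k = sign(h^k)
        reps.append(h)
        codes.append(b)
    return reps, codes

rng = np.random.default_rng(0)
K, feat_dim, code_len = 2, 8, 16      # two hierarchy layers, 16-bit codes
Ws = [rng.standard_normal((code_len, feat_dim)) for _ in range(K)]
v = rng.standard_normal(feat_dim)     # a hypothetical image feature
reps, codes = layerwise_hash(v, Ws)
```

The same mapping is applied to the text branch with its own weights.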

SLIDE 11

Framework
• Regularized Cross-modal Hashing
  - Layer-wise Semantic Similarity Preserving

Let $\Theta_{ij}^{k} = \frac{1}{2} (h_{v_i}^{k})^{T} h_{t_j}^{k}$. The objective function (negative log likelihood) is

$\mathcal{L}_1 = -\sum_{k=1}^{K} \lambda_k \sum_{i,j=1}^{N} \big( S_{ij}^{k} \Theta_{ij}^{k} - \log(1 + e^{\Theta_{ij}^{k}}) \big)$

where $\lambda_k$ is the layer confidence and $S_{ij}^{k}$ is the ground-truth semantic similarity: $S_{ij}^{k} = 1$ if instances $i$ and $j$ have the same label at the $k$-th layer, and $S_{ij}^{k} = 0$ if they have different labels at the $k$-th layer.

11
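A rough NumPy sketch of this layer-wise similarity-preserving loss (the helper name and the `lam` weights are hypothetical; `logaddexp` is used for numerical stability):

```python
import numpy as np

def similarity_loss(Hv, Ht, S, lam):
    """Layer-wise semantic-similarity loss (negative log likelihood).
    Hv, Ht: lists of K (N, code_len) arrays; S: list of K (N, N) 0/1
    similarity matrices; lam: K layer-confidence weights."""
    loss = 0.0
    for k in range(len(Hv)):
        theta = 0.5 * Hv[k] @ Ht[k].T                # Theta_ij^k
        # S*theta - log(1 + e^theta), computed stably via logaddexp
        loss -= lam[k] * float(np.sum(S[k] * theta - np.logaddexp(0.0, theta)))
    return loss

# Aligned image/text codes for a similar pair should yield a lower loss
Hv = [np.array([[1.0, 1.0]])]
Ht_match = [np.array([[1.0, 1.0]])]
Ht_clash = [np.array([[-1.0, -1.0]])]
S = [np.array([[1.0]])]
```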

SLIDE 12

Framework
• Regularized Cross-modal Hashing
  - Binarization Difference Penalizing

$B_v^{k} = \mathrm{sgn}(H_v^{k}), \quad B_t^{k} = \mathrm{sgn}(H_t^{k})$

$\mathcal{L}_2 = \sum_{k=1}^{K} \big( \|B_v^{k} - H_v^{k}\|_F^{2} + \|B_t^{k} - H_t^{k}\|_F^{2} + \|H_v^{k} a\|_2^{2} + \|H_t^{k} a\|_2^{2} \big)$

The first two terms penalize the binarization difference; the last two, with $a = (1, 1, \dots, 1)^{T}$, act as an information-maximization regularization. Together they derive the optimal continuous surrogates of the hash codes.

12
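A minimal NumPy sketch of this penalty for a single hierarchy layer (the (code_len, N) layout of H and the equal weighting of the two terms are assumptions):

```python
import numpy as np

def binarization_loss(Hv, Ht):
    """Binarization-difference penalty plus the bit-balance
    (information-maximization) term for one hierarchy layer.
    Hv, Ht: (code_len, N) continuous hash representations."""
    Bv, Bt = np.sign(Hv), np.sign(Ht)
    diff = np.sum((Bv - Hv) ** 2) + np.sum((Bt - Ht) ** 2)    # ||B - H||_F^2
    a = np.ones(Hv.shape[1])                                   # a = (1, ..., 1)^T
    balance = np.sum((Hv @ a) ** 2) + np.sum((Ht @ a) ** 2)    # ||H a||_2^2
    return float(diff + balance)

# Perfectly binary, bit-balanced representations incur zero penalty
H = np.array([[1.0, -1.0], [-1.0, 1.0]])
```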

SLIDE 13

Framework
• Hierarchical Discriminative Learning

Layer-wise label predictions:

$p_{v_i}^{k} = \mathrm{softmax}(U_v^{k} h_{v_i}^{k} + q_v^{k}), \quad k = 1, \dots, K$
$p_{t_j}^{k} = \mathrm{softmax}(U_t^{k} h_{t_j}^{k} + g_t^{k}), \quad k = 1, \dots, K$

Objective function (negative log likelihood):

$\mathcal{L}_3 = -\sum_{k=1}^{K} \lambda_k \sum_{i=1}^{N} \big( (y_i^{k})^{T} \log(p_{v_i}^{k}) + (y_i^{k})^{T} \log(p_{t_i}^{k}) \big)$

where $\lambda_k$ is the layer confidence and $y_i^{k}$ is the ground-truth label at the $k$-th layer.

13
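A small NumPy sketch of the per-layer classification term for one instance (helper names and the plain-softmax formulation are assumptions):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)                  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum()

def discriminative_loss(h_layers, U_layers, q_layers, y_layers, lam):
    """Each layer-wise hash representation h^k must predict the layer-k
    label: loss = -sum_k lam_k * y^k . log softmax(U^k h^k + q^k)."""
    loss = 0.0
    for k, h in enumerate(h_layers):
        p = softmax(U_layers[k] @ h + q_layers[k])
        loss -= lam[k] * float(y_layers[k] @ np.log(p))
    return loss

# One layer, two classes: logits that match the label give a small loss
h = [np.array([1.0, 0.0])]
U = [np.eye(2) * 5.0]
q = [np.zeros(2)]
good = discriminative_loss(h, U, q, [np.array([1.0, 0.0])], [1.0])
bad = discriminative_loss(h, U, q, [np.array([0.0, 1.0])], [1.0])
```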

SLIDE 14

Framework
• Final Objective Function

$\min \ (1 - \gamma)\, \mathcal{L}_r + \gamma\, \mathcal{L}_h$

minimized over the layer-wise binary codes $B_v^{k}$, $B_t^{k}$ and the network parameters, where $\mathcal{L}_r$ is the regularized cross-modal hashing loss, $\mathcal{L}_h$ is the hierarchical discriminative learning loss, and $\gamma$ is a non-negative tradeoff parameter.

14
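The combination of the two loss components can be sketched as follows (the symbol `gamma` and the convex-combination form are assumptions; the paper's exact weighting may differ):

```python
def total_loss(loss_reg, loss_hier, gamma):
    """Tradeoff between the regularized cross-modal hashing loss and the
    hierarchical discriminative learning loss; gamma must be non-negative."""
    assert gamma >= 0.0
    return (1.0 - gamma) * loss_reg + gamma * loss_hier
```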

SLIDE 15

Experiment
• Dataset
  - Two datasets: FashionVC (public) and Ssense (created by ourselves).
  - Ssense: collected from the online fashion platform Ssense (2018.12.14-2018.12.16).
  - Raw data: 25,974 image-text instances with hierarchical labels.
  - Preprocessing: removed noisy instances that involve multiple items; filtered out categories with fewer than 70 instances.

Figure: examples of noisy instances.

15

SLIDE 16

Experiment
• Dataset
  - Two datasets: FashionVC (public) and Ssense (created by ourselves).

Table 1: Statistics of our datasets.

16

SLIDE 17

Experiment
• Dataset
  - FashionVC label hierarchy: 35 categories organized in two layers.

17
SLIDE 18

Experiment
• Dataset
  - Ssense label hierarchy: 32 categories organized in two layers.

18
SLIDE 19

Experiment
• Experiment Setting
  - Tasks: Image-to-Text and Text-to-Image retrieval.
  - Protocol: Mean Average Precision (MAP).
  - Baselines: shallow learning (CCA, SCM-Or, SCM-Se, DCH); deep learning (CDQ, SSAH, DCMH).
  - Image features: 500-D SIFT features and 4096-D deep features.

19
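Since MAP is the evaluation protocol, here is a hedged sketch of how it is typically computed (a hypothetical helper, not code from the paper):

```python
import numpy as np

def mean_average_precision(relevance_lists):
    """MAP over queries. Each element is the 0/1 relevance of the ranked
    retrieval list returned for one query."""
    aps = []
    for rel in relevance_lists:
        rel = np.asarray(rel, dtype=float)
        if rel.sum() == 0:
            aps.append(0.0)            # no relevant item retrieved
            continue
        hits = np.flatnonzero(rel)     # 0-based ranks of relevant items
        precision_at_hits = np.cumsum(rel)[hits] / (hits + 1)
        aps.append(precision_at_hits.mean())
    return float(np.mean(aps))
```

For hashing methods, the ranked list is obtained by sorting the retrieval set by Hamming distance to the query code.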

SLIDE 20

Experiment
• On Model Comparison

Table 2: The MAP scores of different methods on two datasets. The shallow learning baselines use the SIFT features.

Table 3: The MAP scores of different methods on two datasets. The shallow learning baselines use the VGG-F features.

20

SLIDE 21

Experiment
• On Label Hierarchy

Figure 3: HiCHNet-flat: one derivative of our HiCHNet model.

21

SLIDE 22

Experiment
• On Label Hierarchy

Figure 4: Performance of HiCHNet and HiCHNet-flat on FashionVC.

22

SLIDE 23

Experiment
• On Case Study 1
  - Retrieve from the whole retrieval set.

Figure 5: Illustration of ranking results from the whole retrieval set. The irrelevant images are highlighted in red boxes.

23
SLIDE 24

Experiment
• On Case Study 2
  - Retrieve from a constrained subset of 10 images of different categories.

Figure 6: Illustration of ranking results from the constrained retrieval set.

24
SLIDE 25

Conclusion
• We first validate the benefits of utilizing the category hierarchy in cross-modal hashing.
• We propose a novel supervised hierarchical cross-modal hashing framework.
• We build a large-scale benchmark dataset from the global fashion platform Ssense.
• Extensive experiments demonstrate the superiority of HiCHNet over the state-of-the-art methods.

25

SLIDE 26

Thanks Q&A

26

Thanks for the travel grant from SIGIR.

Email: sunchangchang123@gmail.com

SLIDE 27

Back Up

27

SLIDE 28

Experiment
• On Category Analysis

Figure 7: Performance of HiCHNet and DCMH on different categories of FashionVC and Ssense in the task of "Text→Image".

28

SLIDE 29

Experiment
• On Component Analysis

Figure 8: Sensitivity analysis of the hyper-parameters.

29