


  1. Supervised Hierarchical Cross-Modal Hashing
     Changchang Sun†, Xuemeng Song†, Fuli Feng‡, Wayne Xin Zhao$, Hao Zhang*, Liqiang Nie†
     † School of Computer Science and Technology, Shandong University
     ‡ School of Computing, National University of Singapore
     $ School of Information, Renmin University of China
     * Mercari, Inc., Japan

  2. Background
     - Unprecedented growth of multimedia data on the Internet.
     - Application: cross-modal retrieval.
     - Solution: supervised cross-modal hashing.
     Figure: labeled image-text pairs mapped into a Hamming space. Example labels and texts:
     • Mini-skirt: UNIQLO Women Cotton Mini Skirt.
     • Long Skirt: Chicwish Endless Blooming Rose Max Skirt.
     • Wide-leg Jeans: Chloé Frayed High-rise Wide-leg Jeans.
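As a rough illustration of why hashing helps retrieval, the sketch below ranks image codes by Hamming distance to a text query code. The 4-bit codes and the item names are made up for illustration and do not come from the slides; real codes would be produced by a trained hashing model.

```python
def hamming_distance(a, b):
    """Count the positions where two equal-length +1/-1 codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def rank_by_hamming(query_code, database):
    """Return database keys sorted by Hamming distance to the query code."""
    return sorted(database, key=lambda name: hamming_distance(query_code, database[name]))


# Hypothetical 4-bit codes (assumption: values invented for this example).
text_query = [1, -1, 1, 1]
image_codes = {
    "mini_skirt": [1, -1, 1, -1],
    "long_skirt": [1, -1, -1, -1],
    "wide_leg_jeans": [-1, 1, -1, -1],
}
ranking = rank_by_hamming(text_query, image_codes)
```

Because the codes are binary, each comparison only counts differing bits, which is what makes retrieval in the Hamming space cheap.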

  3. Related Work
     - Define a cross-modal similarity matrix.
       Qingyuan Jiang and Wujun Li. Deep Cross-Modal Hashing. In CVPR, 2017.

  4. Related Work
     - Learn semantic information from multiple labels.
       Chao Li, Cheng Deng, Ning Li, Wei Liu, Xinbo Gao, and Dacheng Tao. Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval. In CVPR, 2018.

  5. Motivation
     - Explore the rich semantic information conveyed by the label hierarchy.
     - Finest-grained layer: $I_1$ and $I_3$ are dissimilar.
     - Coarser-grained layer: $I_1$ and $I_3$ are similar.
     Figure 1: Illustration of the label hierarchy.
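The idea that two items can be dissimilar at the finest layer yet similar at a coarser one can be sketched as a per-layer similarity matrix built from hierarchical labels. The function name, the label tuples, and the coarse-to-fine ordering are assumptions made for this example.

```python
def layerwise_similarity(labels, num_layers):
    """Build S where S[k][i][j] = 1 if items i and j share a label at layer k.

    labels[i] is a tuple of num_layers labels ordered coarse to fine
    (assumption: this ordering is chosen for the example).
    """
    n = len(labels)
    return [
        [[1 if labels[i][k] == labels[j][k] else 0 for j in range(n)] for i in range(n)]
        for k in range(num_layers)
    ]


# Hypothetical items I_1, I_2, I_3 with (super-class, class) labels.
labels = [
    ("Skirt", "Mini-skirt"),
    ("Jeans", "Wide-leg Jeans"),
    ("Skirt", "Long Skirt"),
]
S = layerwise_similarity(labels, num_layers=2)
coarse_I1_I3 = S[0][0][2]  # both are skirts
fine_I1_I3 = S[1][0][2]    # mini-skirt vs. long skirt
```

A flat labeling would only see the finest layer and treat the two skirts as unrelated as a skirt and a pair of jeans.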

  6. Challenges
     - How to employ the label hierarchy to guide cross-modal hashing and preserve the underlying correlations when mapping from the original space to the Hamming space.
     Figure: points A, B, and C mapped from the original space to the Hamming space.

  7. Challenges
     - How to enhance the hierarchical discriminative power of hash codes.
     Figure: hash codes for the label paths Skirt, Mini-Skirt and Jeans, Wide-leg Jeans.

  8. Challenges
     - The lack of a benchmark dataset whose data points involve multiple modalities and are hierarchically labeled. CIFAR-100, for example, is hierarchically labeled, but its data points are unimodal.

     Super-class    Class
     Flowers        Rose, Sunflower, Lily...
     Fish           Goldfish, Shark, Dolphin...
     Insect         Bee, Butterfly, Caterpillar...
     Fruit          Apple, Peach, Pear...
     ...            ...

     Table 1: Hierarchical labels of the benchmark dataset CIFAR-100.

  9. Framework
     Figure 2: Illustration of the proposed scheme, HiCHNet. (Labels visible in the figure: VGG-F, Concatenation.)

  10. Framework
     - Regularized Cross-modal Hashing
       - Layer-wise Hash Representation (K fully connected networks, one for each of the K layers)
         $h^k_{v_i} = s(W^k_v \tilde{v}_i + g^k_v), \quad k = 1, \dots, K$
         $h^k_{t_j} = s(W^k_t \tilde{t}_j + g^k_t), \quad k = 1, \dots, K$
         $b^k_{v_i} = \mathrm{sign}(h^k_{v_i}), \quad k = 1, \dots, K$
         $b^k_{t_j} = \mathrm{sign}(h^k_{t_j}), \quad k = 1, \dots, K$
         $h^k_{v_i}$ ($h^k_{t_j}$): layer-wise hash representation
         $b^k_{v_i}$ ($b^k_{t_j}$): layer-wise binary hash codes
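The layer-wise hashing step can be sketched as one fully connected map per hierarchy layer followed by a sign function. This is a minimal sketch under stated assumptions: `tanh` stands in for the activation `s`, which the slide does not name, `sign(0)` is taken as +1, and all weights and inputs are invented.

```python
import math


def hash_representation(W, g, x):
    """Compute h = s(W x + g), with tanh standing in for the unspecified s."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bias)
            for row, bias in zip(W, g)]


def binarize(h):
    """Compute sign(h) element-wise, using the convention sign(0) = +1."""
    return [1 if value >= 0 else -1 for value in h]


def layerwise_codes(weights, biases, x):
    """Return one binary code per hierarchy layer for the feature vector x."""
    return [binarize(hash_representation(W, g, x)) for W, g in zip(weights, biases)]


# Hypothetical 2-D feature and two layers with 2-bit and 3-bit codes.
x = [1.0, -2.0]
weights = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [-1.0, 1.0], [0.5, 0.0]],
]
biases = [[0.0, 0.0], [0.0, 0.0, -1.0]]
codes = layerwise_codes(weights, biases, x)
```

Each layer gets its own weights, so the code length can differ between the coarse and the fine layer, as in the toy setup above.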

  11. Framework
     - Regularized Cross-modal Hashing
       - Layer-wise Semantic Similarity Preserving
         • Ground truth: $S^k_{ij} = 1$ for the same label at the k-th layer; $S^k_{ij} = 0$ for different labels at the k-th layer.
         • Objective function (negative log likelihood):
           $\Gamma_1 = -\sum_{k=1}^{K} \mu_k \sum_{i,j=1}^{N} \left( S^k_{ij} \phi^k_{ij} - \log(1 + e^{\phi^k_{ij}}) \right)$
           where $\mu_k$ is the layer confidence and $\phi^k_{ij} = \frac{1}{2} (h^k_{v_i})^T h^k_{t_j}$ is the semantic similarity.
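A negative log-likelihood of this kind can be sketched as follows: half the inner product of an image and a text representation acts as a logit, the ground-truth similarity acts as the target, and each layer's contribution is scaled by its confidence. The function and argument names are assumptions for this example, as are the toy inputs.

```python
import math


def softplus(x):
    """Numerically stable log(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def similarity_preserving_loss(H_v, H_t, S, confidence):
    """Sum over layers k and pairs (i, j) of -w_k * (S * phi - log(1 + e^phi)).

    H_v[k][i] and H_t[k][j] are the layer-k representations of image i and
    text j, S[k][i][j] is 1 for the same label at layer k (else 0), and
    confidence[k] is the layer weight.
    """
    total = 0.0
    for k, weight in enumerate(confidence):
        for i, h_v in enumerate(H_v[k]):
            for j, h_t in enumerate(H_t[k]):
                phi = 0.5 * sum(a * b for a, b in zip(h_v, h_t))
                total -= weight * (S[k][i][j] * phi - softplus(phi))
    return total


# One layer, one image-text pair with identical toy representations.
aligned = [[[1.0, 1.0]]]
loss_if_similar = similarity_preserving_loss(aligned, aligned, [[[1]]], [1.0])
loss_if_dissimilar = similarity_preserving_loss(aligned, aligned, [[[0]]], [1.0])
```

Representations that agree are cheap when the pair is labeled similar and expensive when it is labeled dissimilar, which is the behavior the loss is meant to encourage.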

  12. Framework
     - Regularized Cross-modal Hashing
       - Binarization Difference Penalizing: to derive the optimal continuous surrogates of the hash codes.
         $B^k_v = \mathrm{sgn}(H^k_v), \quad B^k_t = \mathrm{sgn}(H^k_t), \quad a = [1, 1, \dots, 1]^T$
         $\Gamma_2 = \sum_{k=1}^{K} \left[ \left( \|B^k_v - H^k_v\|_F^2 + \|B^k_t - H^k_t\|_F^2 \right) + \left( \|H^k_v a\|_2^2 + \|H^k_t a\|_2^2 \right) \right]$
         The first group is the binarization difference regularization; the second is the information maximization term.
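The two regularizers can be sketched for a single representation matrix: a quantization penalty that measures how far the continuous values are from their signs, and a balance penalty that sums each bit over all samples. The matrix layout (rows are bits, columns are samples), the unweighted sum, and the function names are assumptions for this example; any weighting coefficients used in the original work are not shown on the slide and are omitted here.

```python
def sgn(x):
    """Sign with the convention sgn(0) = +1."""
    return 1 if x >= 0 else -1


def quantization_and_balance(H):
    """Return (||sgn(H) - H||_F^2, ||H a||_2^2) with a the all-ones vector.

    H is a list of rows; each row holds one bit across all samples
    (assumption: this layout is chosen for the example).
    """
    quantization = sum((sgn(x) - x) ** 2 for row in H for x in row)
    balance = sum(sum(row) ** 2 for row in H)
    return quantization, balance


def regularizer(H_v_layers, H_t_layers):
    """Unweighted sum of both penalties over all layers and both modalities."""
    total = 0.0
    for H_v, H_t in zip(H_v_layers, H_t_layers):
        for H in (H_v, H_t):
            quantization, balance = quantization_and_balance(H)
            total += quantization + balance
    return total


# Toy matrix: bit 0 is balanced across the two samples, bit 1 is not.
H = [[0.5, -0.5], [1.0, 1.0]]
quantization, balance = quantization_and_balance(H)
```

The balance term is zero only when every bit is +1 on about half of the samples, which keeps each bit informative.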

  13. Framework
     - Hierarchical Discriminative Learning
       $p^k_{v_i} = \mathrm{softmax}(U^k_v h^k_{v_i} + q^k_v), \quad k = 1, \dots, K$
       $p^k_{t_j} = \mathrm{softmax}(U^k_t h^k_{t_j} + q^k_t), \quad k = 1, \dots, K$
       • Objective function (negative log likelihood):
         $\Gamma_h = -\sum_{k=1}^{K} \rho_k \sum_{i=1}^{N} \left\{ (y^k_i)^T \log(p^k_{v_i}) + (y^k_i)^T \log(p^k_{t_i}) \right\}$
         where $\rho_k$ is the layer confidence and $y^k_i$ is the ground-truth label at the k-th layer.
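The per-layer classification objective can be sketched as a softmax classifier on each layer's hash representation, with a cross-entropy term for the image side and one for the text side. The function names, the one-hot label encoding, and the toy values are assumptions for this example.

```python
import math


def softmax(z):
    """Softmax with the usual max-shift for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def predict(U, q, h):
    """Class probabilities softmax(U h + q) for one hash representation h."""
    return softmax([sum(u * x for u, x in zip(row, h)) + b for row, b in zip(U, q)])


def hierarchical_loss(P_v, P_t, Y, confidence):
    """Weighted cross-entropy over layers, samples, and both modalities.

    P_v[k][i] and P_t[k][i] are predicted distributions for sample i at
    layer k, Y[k][i] is the one-hot label, confidence[k] the layer weight.
    """
    total = 0.0
    for k, weight in enumerate(confidence):
        for p_v, p_t, y in zip(P_v[k], P_t[k], Y[k]):
            for y_c, pv_c, pt_c in zip(y, p_v, p_t):
                if y_c:
                    total -= weight * y_c * (math.log(pv_c) + math.log(pt_c))
    return total


# An untrained (all-zero) classifier yields a uniform distribution.
p = predict([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [0.3, -0.7])
loss = hierarchical_loss([[p]], [[p]], [[[1, 0]]], [1.0])
```

Because every layer has its own classifier, the number of classes can differ between the super-class layer and the class layer.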

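  14. Framework
     - Final Objective Function
       $\min_{B^k, \Theta_v, \Theta_t} \Gamma = \eta \, \Gamma_r + (1 - \eta) \, \Gamma_h$
       where $\eta$ is the non-negative tradeoff parameter, $\Gamma_r$ is the regularized cross-modal hashing objective, and $\Gamma_h$ is the hierarchical discriminative learning objective.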

  15. Experiment
     - Dataset
       • Two datasets: FashionVC (public) and Ssense (created by ourselves).
       • Ssense: collected from the online fashion platform Ssense (2018.12.14 to 2018.12.16).
       • Raw data: 25,974 image-text instances with hierarchical labels.
       • Preprocessing: removed the noisy instances that involve multiple items; filtered out the categories with fewer than 70 instances.
     Figure: examples of noisy instances.
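The category filtering step can be sketched with a simple count-and-keep pass. The instance format (an item identifier paired with a category name) and the function name are assumptions for this example; only the threshold of 70 comes from the slide.

```python
from collections import Counter


def filter_rare_categories(instances, min_count=70):
    """Keep instances whose category has at least min_count instances.

    instances is a list of (item_id, category) pairs (assumption: a
    hypothetical format chosen for this example).
    """
    counts = Counter(category for _, category in instances)
    return [(item, category) for item, category in instances
            if counts[category] >= min_count]


# Toy data: 70 instances of one category and 3 of another.
toy = [(i, "Mini-skirt") for i in range(70)] + [(100 + i, "Cape") for i in range(3)]
kept = filter_rare_categories(toy)
```

Counting first and filtering second keeps the pass order-independent, so shuffling the raw data does not change which categories survive.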

  16. Experiment
     - Dataset
       • Two datasets: FashionVC (public) and Ssense (created by ourselves).
     Table 1: Statistics of our datasets.

  17. Experiment
     - Dataset
       • FashionVC label hierarchy: 35 categories with two layers.

  18. Experiment
     - Dataset
       • Ssense label hierarchy: 32 categories with two layers.

  19. Experiment
     - Experiment Setting
       • Task: Image to Text, Text to Image
       • Protocol: Mean Average Precision (MAP)
       • Baselines:
         Shallow Learning: CCA, SCM-Or, SCM-Se, DCH
         Deep Learning: CDQ, SSAH, DCMH
       • 500-D SIFT features and 4096-D deep features
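Mean Average Precision over ranked retrieval lists can be sketched as below. This uses the common definition that averages precision at each relevant rank over the relevant items that were retrieved; the slides do not state which MAP variant or cutoff was used, so treat that choice as an assumption.

```python
def average_precision(relevance):
    """Average precision for one ranked list of 0/1 relevance flags."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the per-query average precision values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Two hypothetical queries with their ranked relevance flags.
rankings = [[1, 0, 1], [0, 1]]
map_score = mean_average_precision(rankings)
```

For the first query the relevant items sit at ranks 1 and 3, and for the second at rank 2, so a better ordering of the same items would raise the score.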

  20. Experiment
     - On Model Comparison
       Table 2: The MAP scores of different methods on two datasets. The shallow learning baselines use the SIFT features.
       Table 3: The MAP scores of different methods on two datasets. The shallow learning baselines use the VGG-F features.

  21. Experiment
     - On Label Hierarchy
       Figure 3: HiCHNet-flat: one derivative of our HiCHNet model.

  22. Experiment
     - On Label Hierarchy
       Figure 4: Performance of HiCHNet and HiCHNet-flat on FashionVC.

  23. Experiment
     - On Case Study 1
       • Retrieve from the whole retrieval set.
       Figure 5: Illustration of ranking results from the whole retrieval set. The irrelevant images are highlighted in red boxes.

  24. Experiment
     - On Case Study 2
       • Retrieve from the constrained subset of 10 images of different categories.
       Figure 6: Illustration of ranking results from the constrained retrieval set.

  25. Conclusion
     - We first validate the benefits of utilizing the category hierarchy in cross-modal hashing.
     - We propose a novel supervised hierarchical cross-modal hashing framework.
     - We build a large-scale benchmark dataset from the global fashion platform Ssense. Extensive experiments demonstrate the superiority of HiCHNet over the state-of-the-art methods.

  26. Thanks
     Q&A
     Thanks for the travel grant from SIGIR.
     Email: sunchangchang123@gmail.com

  27. Back Up

  28. Experiment
     - On Category Analysis
       Figure 7: Performance of HiCHNet and DCMH on different categories of FashionVC and Ssense in the task of “Text→Image”.

  29. Experiment
     - On Component Analysis
       Figure 8: Sensitivity analysis of the hyper-parameters.
