Composite Correlation Qantization for Efficient Multimodal Retrieval - PowerPoint PPT Presentation

Composite Correlation Qantization for Efficient Multimodal Retrieval Mingsheng Long 1 , Yue Cao 1 , Jianmin Wang 1 , and Philip S. Yu 12 1 School of Sofware Tsinghua University 2 Department of Computer Science University of Illinois, Chicago ACM Conference on Research and Development in Information Retrieval, SIGIR 2016 M. Long et al. (Tsinghua University) Composite Correlation Qantization ACM SIGIR 2016 1 / 28

Outline Introduction 1 Problem Effectiveness and Efficiency Previous Work Composite Correlation Qantization 2 Multimodal Correlation Composite Qantization Optimization Framework Evaluation 3 Results Discussion Summary 4 M. Long et al. (Tsinghua University) Composite Correlation Qantization ACM SIGIR 2016 2 / 28

Introduction Problem Multimodal Understanding How to utilize multimodal data to understand our real world? Isomorphic space: integration, fusion, correlation, transfer, ... M. Long et al. (Tsinghua University) Composite Correlation Qantization ACM SIGIR 2016 3 / 28

Introduction Problem Multimodal Retrieval Nearest Neighbor (NN) similarity retrieval across modalities Database: X img = { x img , . . . , x img N } and Qery: q txt 1 Cross-modal NN: NN ( q txt ) = min x img ∈X img d x img , q txt � � Top 16 Returned Images Top 16 Returned Tags Image Query Tags Query [‘sky sun’] [‘lake’] Precision: 0.625 Precision: 0.625 (a) I → T (Image Qery on Text DB) (b) T → I (Text Qery on Image DB) Figure: Cross-modal retrieval: similarity retrieval across media modalities. M. Long et al. (Tsinghua University) Composite Correlation Qantization ACM SIGIR 2016 4 / 28

Introduction Effectiveness and Efficiency Multimodal Embedding Multimodal embedding reduces cross-modal heterogeneity gap N d ( z img , z txt � i ) → more flexible Coupling: min i i = 1 Fusion: z i = f ( z img , z txt i ) → tighter relationship i Multimodal Coupling Embedding — Image Mapping 011 001 “A Tabby cat is leaning Multimodal on a wooden table, with Embedding one paw on a laser + Text mouse and the other on 001 Mapping a black laptop” Fusion 011 M. Long et al. (Tsinghua University) Composite Correlation Qantization ACM SIGIR 2016 5 / 28

Introduction Effectiveness and Efficiency Indexing and Hashing Approximate Nearest Neighbor (ANN) Search Exact Nearest Neighbor Search: linear scan O ( NP ) Efficient, acceptable accuracy, practical solutions Reduce the number of distance computations: O ( N ′ P ) , N ′ ≪ N Indexing: tree, neighborhood graph, inverted index, ... Reduce the cost of each distance computation: O ( NP ′ ) , P ′ ≪ P Hashing: Locality-Sensitive Hashing, Spectral Hashing, ... Produce a few distinct distances (curse of dimensionality) Limited ability and flexibility of distance approximation Qantization: Vector Qantization (VQ), Iterative Qantization (ITQ), Product Qantization (PQ), Composite Qantization (CQ) K-means: Impossible for medium and long codes (large K ) M. Long et al. (Tsinghua University) Composite Correlation Qantization ACM SIGIR 2016 6 / 28

Introduction Previous Work Multimodal Hashing 512-dim 128-bits floats two- 20GB 1M images 160M stage Previous work: separate pipeline for Multimodal Embedding and Binary Encoding → large information loss, unbalanced encoding M. Long et al. (Tsinghua University) Composite Correlation Qantization ACM SIGIR 2016 7 / 28

Composite Correlation Qantization Problem Definition Definition (Composite Correlation Qantization, CCQ) n = 1 ∈ R P 1 and a text set { x 2 n } N 1 n } N 2 Given an image set { x 1 n = 1 ∈ R P 2 , learn two correlation mappings f 1 : R P 1 �→ R D and f 2 : R P 2 �→ R D that transform images and texts into a D -dimensional isomorphic latent space, and jointly learn two composite quantizers q 1 : R D �→ { 0 , 1 } H and q 2 : R D �→ { 0 , 1 } H that quantize latent embeddings into compact H -bits binary codes. M. Long et al. (Tsinghua University) Composite Correlation Qantization ACM SIGIR 2016 8 / 28

Composite Correlation Qantization Overview A Latent Semantic Analysis (LSA) optimization framework n , where R v is correlation-maximal mapping, C v is x v n ≈ R v C v b v similarity-preserving codebook, b v n is compact binary code Multimodal Embedding: Correlation Mapping & Code Fusion Composite Qantization: Isomorphic Space (shared codebook) A “simple and reliable” approach to efficient multimodal retrieval Composite Image 4 Quantization 111 Mapping 000 001 010 011 100 101 110 111 4 Hash + Multimodal 011 100 Code 110 2 2 101 Embedding “A Tabby cat is leaning 000 001 010 011 100 101 110 111 001 on a wooden table, with 2 one paw on a laser 010 8 2 Text mouse and the other on Isomorphic 8 a black laptop” Mapping Codebook 000 001 010 011 100 101 110 111 M. Long et al. (Tsinghua University) Composite Correlation Qantization ACM SIGIR 2016 9 / 28

Composite Correlation Qantization Multimodal Correlation Multimodal Correlation Paired data matrices: X 1 = [ x 1 N ] , X 2 = [ x 2 1 , . . . , x 1 1 , . . . , x 2 N ] Fusion representation matrix: Z = [ z 1 , . . . , z N ] Transformation matrices: R 1 , R 2 , which transform X into Z 2 2 � R 1T X 1 − Z � � � R 2T X 2 − Z � � F + λ 2 R 1 , R 2 , Z λ 1 min (1) � � � � � � F Composite Image 4 Quantization X 1 111 Mapping 000 001 010 011 100 101 110 111 R 1 4 Hash + Multimodal 011 100 2 2 Code 110 101 Embedding “A Tabby cat is leaning Z R 2 000 001 010 011 100 101 110 111 001 on a wooden table, with 2 X 2 one paw on a laser 010 8 2 Text mouse and the other on Isomorphic 8 a black laptop” Mapping Codebook 000 001 010 011 100 101 110 111 M. Long et al. (Tsinghua University) Composite Correlation Qantization ACM SIGIR 2016 10 / 28

Composite Correlation Qantization Multimodal Correlation Multimodal Correlation This problem is ill-posed, which cannot be solved successfully 2 2 � � � � � R 1T X 1 − Z � R 2T X 2 − Z R 1 , R 2 , Z λ 1 min F + λ 2 (2) � � � � � � F Z = λ 1 R 1T X 1 + λ 2 R 2T X 2 (3) λ 1 + λ 2 X 1 X 1T � − 1 X 2 X 2T � − 1 R 1 = � R 2 = � X 1 Z T X 2 Z T (4) Composite Image 4 Quantization X 1 111 Mapping 000 001 010 011 100 101 110 111 R 1 4 Hash + Multimodal 011 100 2 2 Code 110 101 Embedding “A Tabby cat is leaning Z R 2 000 001 010 011 100 101 110 111 001 on a wooden table, with 2 X 2 one paw on a laser 010 8 2 Text mouse and the other on Isomorphic 8 a black laptop” Mapping Codebook 000 001 010 011 100 101 110 111 M. Long et al. (Tsinghua University) Composite Correlation Qantization ACM SIGIR 2016 11 / 28

Composite Correlation Qantization Multimodal Correlation Multimodal Correlation Add the covariance maximization with orthogonal constraints �� 2 2 � � T X 1 � � R 1T X 1 − Z � R 1 F + R 1 , R 2 , Z λ 1 min � � � � ⊥ � � F (5) �� 2 2 � R 2T X 2 − Z � � T X 2 � � R 2 + λ 2 F + � � � � ⊥ � � F � X 1 − R 1 Z � X 2 − R 2 Z � 2 � 2 � � � � R 1 , R 2 , Z λ 1 min F + λ 2 (6) F Composite Image 4 Quantization X 1 111 Mapping 000 001 010 011 100 101 110 111 R 1 4 Hash + Multimodal 011 100 Code 110 2 2 101 Embedding “A Tabby cat is leaning R 2 Z 000 001 010 011 100 101 110 111 001 on a wooden table, with 2 X 2 one paw on a laser 010 8 2 Text mouse and the other on Isomorphic 8 a black laptop” Mapping Codebook 000 001 010 011 100 101 110 111 M. Long et al. (Tsinghua University) Composite Correlation Qantization ACM SIGIR 2016 12 / 28

Composite Correlation Qantization Composite Qantization Composite Qantization Learn M codebooks: C = [ C 1 , . . . , C M ] , each codebook has K codewords C m = [ c m 1 , . . . , c mK ] (cluster centroids of K-means) Each z i is approximated by the addtion of M codewords One per codebook, each selected by the binary assignment b mi Code representation: i 1 i 2 . . . i M , where i m = nz ( b mi ) Code length: M log 2 K (1-of- K encoding) z ≈ ˆ z = C 1 b 1 + C 2 b 2 + . . . + C M b M (7) = c 1 i 1 + c 2 i 2 + . . . + c Mi M C 1 = [ c 11 , . . . , c 1 K ] C 2 = [ c 21 , . . . , c 2 K ] . . . C M = [ c M 1 , . . . , c MK ] M. Long et al. (Tsinghua University) Composite Correlation Qantization ACM SIGIR 2016 13 / 28

Composite Correlation Qantization Composite Qantization Composite Qantization Learn M codebooks: C = [ C 1 , . . . , C M ] , each codebook has K codewords C m = [ c m 1 , . . . , c mK ] (cluster centroids of K-means) Binary code matrices: B = [ B 1 ; . . . ; B M ] , B m = [ b m 1 ; . . . ; b mN ] Control binary codes quality by quantization error minimization N 2 2 � M � M � � � � � min � Z − m = 1 C m B m F = � z i − m = 1 C m b mi (8) � � � � � � Z , C , B 2 i = 1 Isomorphic Codebook Composite Image 4 Quantization 111 Mapping 000 001 010 011 100 101 110 111 4 Hash + Multimodal B 011 100 Code 110 2 2 101 Embedding “A Tabby cat is leaning Z 000 001 010 011 100 101 110 111 001 on a wooden table, with 2 one paw on a laser 010 8 2 Text mouse and the other on [ ] 8 a black laptop” Mapping C = C 1 , … , C M 000 001 010 011 100 101 110 111 M. Long et al. (Tsinghua University) Composite Correlation Qantization ACM SIGIR 2016 14 / 28

Composite Correlation Qantization for Efficient Multimodal Retrieval - PowerPoint PPT Presentation

Composite Correlation Qantization for Efficient Multimodal Retrieval Mingsheng Long 1 , Yue Cao 1 , Jianmin Wang 1 , and Philip S. Yu 12 1 School of Sofware Tsinghua University 2 Department of Computer Science University of Illinois, Chicago ACM

Correlation Course Title Correlation Correlation coe ffi cient between -1 and 1 Sign

Theory of correlation transfer and correlation structure in recurrent networks Ruben Moreno-Bote

Business Statistics CONTENTS The correlation coefficient The rank correlation coefficient

COMPOSITE OF PLAGE AREAS OVER COMPOSITE OF PLAGE AREAS OVER COMPOSITE OF PLAGE AREAS OVER

Local Correlation with Local Vol and Stochastic Vol : Towards Correlation dynamics ? Pascal

1 Outline Introduction to WMSNs Spatial correlation for visual information in WMSNs

Correlation Quantitative A Aptitude & & Business S Statistics Correlation

The Chain Rule Given a composite function: The Chain Rule Given a composite function: h ( x ) =

Plan Composite Likelihood Methods What are composite likelihoods? David Firth Where are

Composite Trust Composite Trust Composite Trust A formal derivation of conjunction A formal

UNDERWATER EXPLOSIVE LOADING OF E-GLASS / VINYL ESTER COMPOSITE PLATES: CORRELATION OF

TOUGHNESS DETERMINATION IN COMPOSITE TOUGHNESS DETERMINATION IN COMPOSITE MULTIMATERIAL CLOSED

LPG Composite Cylinder 11/18/2013, Beck 1 CONFIDENTIAL LPG COMPOSITE CYLINDER 11/18/2013, Beck

Masked Correlation Filters for Partially Occluded Face Recognition Eric He ICASSP 2016

Remarks on the Data Complexity of Zero-Correlation Linear Attacks C eline Blondeau Aalto

Correlation scales of chorus emissions determined from multi-point THEMIS observations Vitalii

Multimodal Machine Learning Main Goal Define a common taxonomy for multimodal machine learning

12/17/2019 Department of Veterinary and Animal Sciences Hierarchical Markov decision processes

Multi-level Models for Classroom Dynamics Christopher DuBois Padhraic Smyth, UC Irvine Carter

Multilevel Models Session 2: Random intercept models Outline Two level random intercept

BigARTM: Open Source Library for Regularized Multimodal Topic Modeling of Large Collections

Fusical : Multimodal Fusion for Video Sentiment Boyang Tom Jin Leila Abdelrahman Cong Kevin Chen

Multimodal Corpus for Integrated language and action Rishabh Nigam 10598 Cognitive Sciences

Production in a Multimodal Corpus: How Speakers Communicate Complex Actions LREC 2008 Carlos

Composite Correlation Qantization for Efficient Multimodal Retrieval - PowerPoint PPT Presentation

Composite Correlation Qantization for Efficient Multimodal Retrieval Mingsheng Long 1 , Yue Cao 1 , Jianmin Wang 1 , and Philip S. Yu 12 1 School of Sofware Tsinghua University 2 Department of Computer Science University of Illinois, Chicago ACM

Correlation Course Title Correlation Correlation coe ffi cient between -1 and 1 Sign

Theory of correlation transfer and correlation structure in recurrent networks Ruben Moreno-Bote

Business Statistics CONTENTS The correlation coefficient The rank correlation coefficient

COMPOSITE OF PLAGE AREAS OVER COMPOSITE OF PLAGE AREAS OVER COMPOSITE OF PLAGE AREAS OVER

Local Correlation with Local Vol and Stochastic Vol : Towards Correlation dynamics ? Pascal

1 Outline Introduction to WMSNs Spatial correlation for visual information in WMSNs

Correlation Quantitative A Aptitude &amp; &amp; Business S Statistics Correlation

The Chain Rule Given a composite function: The Chain Rule Given a composite function: h ( x ) =

Plan Composite Likelihood Methods What are composite likelihoods? David Firth Where are

Composite Trust Composite Trust Composite Trust A formal derivation of conjunction A formal

UNDERWATER EXPLOSIVE LOADING OF E-GLASS / VINYL ESTER COMPOSITE PLATES: CORRELATION OF

TOUGHNESS DETERMINATION IN COMPOSITE TOUGHNESS DETERMINATION IN COMPOSITE MULTIMATERIAL CLOSED

LPG Composite Cylinder 11/18/2013, Beck 1 CONFIDENTIAL LPG COMPOSITE CYLINDER 11/18/2013, Beck

Masked Correlation Filters for Partially Occluded Face Recognition Eric He ICASSP 2016

Remarks on the Data Complexity of Zero-Correlation Linear Attacks C eline Blondeau Aalto

Correlation scales of chorus emissions determined from multi-point THEMIS observations Vitalii

Multimodal Machine Learning Main Goal Define a common taxonomy for multimodal machine learning

12/17/2019 Department of Veterinary and Animal Sciences Hierarchical Markov decision processes

Multi-level Models for Classroom Dynamics Christopher DuBois Padhraic Smyth, UC Irvine Carter

Multilevel Models Session 2: Random intercept models Outline Two level random intercept

BigARTM: Open Source Library for Regularized Multimodal Topic Modeling of Large Collections

Fusical : Multimodal Fusion for Video Sentiment Boyang Tom Jin Leila Abdelrahman Cong Kevin Chen

Multimodal Corpus for Integrated language and action Rishabh Nigam 10598 Cognitive Sciences

Production in a Multimodal Corpus: How Speakers Communicate Complex Actions LREC 2008 Carlos

Correlation Quantitative A Aptitude & & Business S Statistics Correlation