
Extensions to Self-Taught Hashing: Kernelisation and Supervision



  1. Extensions to Self-Taught Hashing: Kernelisation and Supervision. Dell Zhang, Jun Wang, Deng Cai, Jinsong Lu. Birkbeck, University of London. dell.z@ieee.org. The SIGIR 2010 Workshop on Feature Generation and Selection for Information Retrieval (FGSIR), 23 July 2010, Geneva, Switzerland.

  2. Outline: 1. Problem; 2. Related Work; 3. Review of STH; 4. Extensions to STH; 5. Conclusion.

  3. Problem. Similarity search (a.k.a. nearest-neighbour search): given a query document, find its most similar documents in a large document collection. It underlies many Information Retrieval tasks (near-duplicate detection, plagiarism analysis, collaborative filtering, caching, content-based multimedia retrieval, etc.) and the k-Nearest-Neighbours (kNN) algorithm used for text categorisation, scene completion/recognition, etc. “The unreasonable effectiveness of data”: if a map could include every possible detail of the land, how big would it be?

  4. Problem. A promising way to accelerate similarity search is Semantic Hashing: design compact binary codes for a large number of documents so that semantically similar documents are mapped to similar codes (within a short Hamming distance). Each bit can be regarded as a binary feature, so the goal is to generate a few of the most informative binary features to represent the documents. Similarity search can then be done extremely fast by just checking a few nearby codes (memory addresses); for example, for the query code 0000 one checks 0000 itself plus its Hamming-distance-1 neighbours 1000, 0100, 0010 and 0001.
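For concreteness, here is a minimal sketch (not from the slides) of that neighbour enumeration, assuming codes are stored as Python integers of l bits:

```python
from itertools import combinations

def hamming_ball(code: int, n_bits: int, radius: int = 1):
    """Yield every n_bits-bit code within the given Hamming radius of `code`,
    starting with `code` itself."""
    yield code
    for r in range(1, radius + 1):
        for bits in combinations(range(n_bits), r):
            flipped = code
            for b in bits:
                flipped ^= 1 << b  # flip bit b
            yield flipped

# The query code 0000 and its Hamming-distance-1 neighbours:
print([format(c, "04b") for c in hamming_ball(0b0000, n_bits=4, radius=1)])
# ['0000', '0001', '0010', '0100', '1000']
```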

  5. Problem (figure slide; no caption available).

  6. Problem (figure slide; no caption available).

  7. Outline: 1. Problem; 2. Related Work; 3. Review of STH; 4. Extensions to STH; 5. Conclusion.

  8. Related Work. Fast (exact) similarity search in a low-dimensional space: space-partitioning indexes (KD-tree, etc.) and data-partitioning indexes (R-tree, etc.).

  9. Related Work. Figure: An example of a KD-tree (by Andrew Moore).

  10. Related Work. Fast (approximate) similarity search in a high-dimensional space. Data-oblivious hashing: Locality-Sensitive Hashing (LSH). Data-aware hashing: binarised Latent Semantic Indexing (LSI), Laplacian Co-Hashing (LCH), stacked Restricted Boltzmann Machines (RBM), the boosting-based Similarity Sensitive Coding (SSC) and Forgiving Hashing (FgH), and Spectral Hashing (SpH), the state of the art, which makes the restrictive assumption that the data are uniformly distributed in a hyper-rectangle.
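For reference, a minimal sketch (added here, not part of the slides) of the data-oblivious baseline, random-hyperplane LSH: the hash bits come from fixed random projections chosen without ever looking at the data.

```python
import numpy as np

def random_hyperplane_lsh(X: np.ndarray, n_bits: int, seed: int = 0) -> np.ndarray:
    """Data-oblivious hashing: one random hyperplane per bit.
    X is an (n_docs, n_features) matrix; returns an (n_docs, n_bits) array of 0/1 bits."""
    rng = np.random.default_rng(seed)
    hyperplanes = rng.standard_normal((X.shape[1], n_bits))  # drawn independently of X
    return (X @ hyperplanes >= 0).astype(np.uint8)

# Example: hash 5 random "documents" into 8-bit codes.
print(random_hyperplane_lsh(np.random.rand(5, 100), n_bits=8))
```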

  11. Related Work. Table: Typical techniques for accelerating similarity search.
      - low-dimensional space, exact similarity search, data-aware: KD-tree, R-tree
      - high-dimensional space, approximate similarity search, data-oblivious: LSH
      - high-dimensional space, approximate similarity search, data-aware: LSI, LCH, RBM, SSC, FgH, SpH, STH

  12. Outline: 1. Problem; 2. Related Work; 3. Review of STH; 4. Extensions to STH; 5. Conclusion.

  13. Review of STH. Input: X = {x_i}_{i=1}^n ⊂ R^m. Output: a hash function f(x) ∈ {−1, +1}^l, where −1 = bit off, +1 = bit on, and l ≪ m.

  14. Review of STH. Figure: The proposed STH approach to semantic hashing.

  15. Review of STH. Stage 1: Learning of Binary Codes. Let y_i ∈ {−1, +1}^l represent the binary code for document vector x_i (−1 = bit off, +1 = bit on), and let Y = [y_1, ..., y_n]^T.

  16. Review of STH. Criterion 1a: Similarity Preserving. We focus on the local structure of the data. Let N_k(x) denote the set of k nearest neighbours of document x. The local similarity matrix W, i.e. the adjacency matrix of the k-nearest-neighbours graph, is symmetric and sparse. With cosine weights, W_ij = (x_i^T x_j) / (‖x_i‖ ‖x_j‖) if x_i ∈ N_k(x_j) or x_j ∈ N_k(x_i), and W_ij = 0 otherwise. With heat-kernel weights, W_ij = exp(−‖x_i − x_j‖² / (2σ²)) under the same neighbourhood condition, and 0 otherwise.
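A minimal construction sketch (illustrative, not the authors' code), assuming the documents are rows of a dense NumPy array of TF-IDF-style vectors; it builds the symmetric cosine-weighted k-NN adjacency matrix W described above:

```python
import numpy as np

def knn_similarity_matrix(X: np.ndarray, k: int) -> np.ndarray:
    """Cosine-weighted adjacency matrix of the k-nearest-neighbours graph:
    W[i, j] = cos(x_i, x_j) if either document is among the other's k nearest
    neighbours, and 0 otherwise. The result is symmetric."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # row-normalise
    cos = Xn @ Xn.T                                    # all pairwise cosine similarities
    np.fill_diagonal(cos, -np.inf)                     # a document is not its own neighbour
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(cos[i])[-k:]                 # the k nearest neighbours of x_i
        W[i, nbrs] = cos[i, nbrs]
    return np.maximum(W, W.T)                          # the "or" condition symmetrises W
```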

  17. Review of STH. Figure: The local structure of data in a high-dimensional space.

  18. Review of STH. Figure: Manifold analysis: exploiting the local structure of data.

  19. Review of STH. Criterion 1a: Similarity Preserving. The Hamming distance between two codes y_i and y_j is ‖y_i − y_j‖² / 4. We minimise the weighted total Hamming distance Σ_{i=1}^n Σ_{j=1}^n W_ij ‖y_i − y_j‖² / 4, as it incurs a heavy penalty if two similar documents are mapped far apart. Penalising the squared error of distances instead would lead to a non-convex optimisation problem.

  20. Review of STH. Spectral Methods for Manifold Analysis: Minimising Cut-Size. For single-bit codes f = (y_1, ..., y_n)^T, S = Σ_{i=1}^n Σ_{j=1}^n W_ij (y_i − y_j)² / 4 = (1/4) f^T L f, where L = D − W is the graph Laplacian and D = diag(k_1, ..., k_n) with k_i = Σ_j W_ij.
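A tiny numerical illustration (added here) of why minimising this quadratic form means cutting few, lightly weighted edges: on a toy graph, an assignment that only cuts the weak edge scores far lower than one that cuts the strong edges.

```python
import numpy as np

def graph_laplacian(W: np.ndarray) -> np.ndarray:
    """L = D - W, where D = diag(k_1, ..., k_n) and k_i = sum_j W_ij."""
    return np.diag(W.sum(axis=1)) - W

# Toy graph: two tight pairs {0, 1} and {2, 3} joined by one weak edge (1, 2).
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.1, 0.0],
              [0.0, 0.1, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
L = graph_laplacian(W)

good = np.array([-1.0, -1.0, 1.0, 1.0])  # cuts only the weak edge
bad = np.array([-1.0, 1.0, -1.0, 1.0])   # cuts both strong edges
print(good @ L @ good, bad @ L @ bad)    # 0.4 vs. 8.4
```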

  21. Review of STH. Spectral Methods for Manifold Analysis: Minimising Cut-Size. Figure: Spectral graph partitioning through Normalised Cut.

  22. Review of STH. Spectral Methods for Manifold Analysis: Minimising Cut-Size. Real relaxation: requiring y_i ∈ {−1, +1} makes the problem NP-hard, so we substitute real-valued ỹ_i ∈ R for y_i. L is positive semi-definite, with eigenvalues 0 = λ_1 = ... = λ_z < λ_{z+1} ≤ ... ≤ λ_n and eigenvectors u_1, ..., u_z, u_{z+1}, ..., u_n. The optimal non-trivial division is f = u_{z+1}: the number of edges across clusters is small.

  23. Review of STH. Spectral Methods for Manifold Analysis: Minimising Cut-Size. For l-bit codes Y = [y_1, ..., y_n]^T, S = Σ_{i=1}^n Σ_{j=1}^n W_ij ‖y_i − y_j‖² / 4 = (1/4) Tr(Y^T L Y). Let Ỹ be the real relaxation of Y.

  24. Review of STH. Spectral Methods for Manifold Analysis: Minimising Cut-Size. Laplacian Eigenmap (LapEig): arg min_{Ỹ} Tr(Ỹ^T L Ỹ) subject to Ỹ^T D Ỹ = I and Ỹ^T D 1 = 0. This leads to the generalised eigenvalue problem L v = λ D v (1), and Ỹ = [v_1, ..., v_l] is formed from the eigenvectors with the smallest non-trivial eigenvalues.
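A minimal sketch (illustrative, not the authors' implementation) of this embedding step using SciPy's sparse eigensolver, assuming W is the k-NN similarity matrix built earlier:

```python
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import eigsh

def laplacian_eigenmap(W: np.ndarray, n_bits: int) -> np.ndarray:
    """Solve the generalised eigenvalue problem L v = lambda D v and return the
    eigenvectors with the smallest non-trivial eigenvalues, one column per bit."""
    W = csr_matrix(W)
    d = np.asarray(W.sum(axis=1)).ravel()
    D = diags(d)
    L = D - W
    # Shift-invert around a small negative sigma to get the smallest eigenvalues
    # without factorising the (singular) Laplacian itself; ask for one extra pair
    # so the trivial constant eigenvector (lambda = 0) can be dropped.
    vals, vecs = eigsh(L, k=n_bits + 1, M=D, sigma=-1e-6, which="LM")
    order = np.argsort(vals)
    return vecs[:, order[1:n_bits + 1]]
```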

  25. Review of STH. Criterion 1b: Entropy Maximising. Best utilisation of the hash table = maximum entropy of the codes = uniform distribution of the codes (each code has equal probability). So the p-th bit should be on for half of the corpus and off for the other half: y_i^(p) = +1 if ỹ_i^(p) ≥ median(v_p), and −1 otherwise. The bits at different positions are almost mutually uncorrelated, because the eigenvectors given by LapEig are orthogonal to each other.
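A one-line thresholding sketch (illustrative), assuming Y_real is the (n_docs, l) real-valued embedding produced by the LapEig step above:

```python
import numpy as np

def binarise_by_median(Y_real: np.ndarray) -> np.ndarray:
    """Threshold each column (bit position) at its median, so roughly half of the
    documents get +1 and half get -1 on every bit."""
    return np.where(Y_real >= np.median(Y_real, axis=0), 1, -1)
```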

  26. Review of STH. Stage 2: Learning of the Hash Function. How do we get codes for new documents previously unseen? This is the out-of-sample extension problem. Existing options: the Nystrom method (high computational complexity), linear approximation (e.g., LPI), and eigenfunction approximation (e.g., SpH), which makes a restrictive assumption about the data distribution.

  27. Review of STH. Stage 2: Learning of the Hash Function. We reduce it to a supervised learning problem: think of each bit y_i^(p) ∈ {+1, −1} in the binary code for document x_i as a binary class label (class “on” or class “off”) for that document. Train a binary classifier y^(p) = f^(p)(x) on the given corpus, which has already been “labelled” by the first stage. We can then use the learned binary classifiers f^(1), ..., f^(l) to predict the l-bit binary code y^(1), ..., y^(l) for any query document x.

  28. Review of STH. Kernel Methods for Pseudo-Supervised Learning: Support Vector Machine (SVM). y^(p) = f^(p)(x) = sgn(w^T x), with w obtained from arg min_{w, ξ_i ≥ 0} (1/2) w^T w + (C/n) Σ_{i=1}^n ξ_i subject to y_i^(p) w^T x_i ≥ 1 − ξ_i for all i (2). Large-margin classification gives good generalisation; linear/non-linear kernels give linear/non-linear mappings; convex optimisation gives the global optimum.
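A minimal sketch (illustrative, not the authors' implementation) of this stage with scikit-learn's LinearSVC, assuming X is the (n_docs, n_features) training matrix and Y_bits the (n_docs, l) matrix of ±1 codes produced by Stage 1:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_hash_function(X: np.ndarray, Y_bits: np.ndarray, C: float = 1.0):
    """Train one linear SVM per bit position, treating that bit (+1/-1) as a class label."""
    return [LinearSVC(C=C).fit(X, Y_bits[:, p]) for p in range(Y_bits.shape[1])]

def hash_new_documents(classifiers, X_new: np.ndarray) -> np.ndarray:
    """Predict the l-bit binary code of previously unseen documents."""
    return np.column_stack([clf.predict(X_new) for clf in classifiers])
```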

  29. Review of STH. Self-Taught Hashing (STH): The Learning Process. (1) Unsupervised learning of binary codes: construct the k-nearest-neighbours graph for the given corpus; embed the documents in an l-dimensional space through LapEig (1) to get an l-dimensional real-valued vector for each document; obtain an l-bit binary code for each document by thresholding those vectors at their median point, and take each bit as a binary class label for that document. (2) Supervised learning of the hash function: train l SVM classifiers (2) on the corpus that has been “labelled” as above.
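Putting the illustrative sketches above together (all function names, data matrices and parameter values here are hypothetical, not from the original STH code), an end-to-end run might look like this:

```python
import numpy as np

# Stage 1: unsupervised learning of binary codes.
W = knn_similarity_matrix(X_train, k=25)            # k-NN graph with cosine weights
Y_real = laplacian_eigenmap(W, n_bits=32)           # LapEig embedding, cf. eq. (1)
Y_bits = binarise_by_median(Y_real)                 # median thresholding

# Stage 2: supervised learning of the hash function.
classifiers = train_hash_function(X_train, Y_bits)  # one SVM per bit, cf. eq. (2)

# Query time: hash the query and rank the corpus by Hamming distance.
q_bits = hash_new_documents(classifiers, X_query)
hamming = np.count_nonzero(Y_bits != q_bits, axis=1)
top10 = np.argsort(hamming)[:10]                    # indices of the 10 nearest documents
```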
