

SLIDE 1

Extensions to Self-Taught Hashing: Kernelisation and Supervision

Dell Zhang, Jun Wang, Deng Cai, Jinsong Lu

Birkbeck, University of London dell.z@ieee.org The SIGIR 2010 Workshop on Feature Generation and Selection for Information Retrieval (FGSIR) 23 July 2010, Geneva, Switzerland

SLIDE 2

Outline

1. Problem
2. Related Work
3. Review of STH
4. Extensions to STH
5. Conclusion

SLIDE 3

Problem

Similarity Search (aka Nearest Neighbour Search) — given a query document, find its most similar documents in a large document collection.

Information Retrieval tasks: near-duplicate detection, plagiarism analysis, collaborative filtering, caching, content-based multimedia retrieval, etc.

k-Nearest-Neighbours (kNN) algorithm: text categorisation, scene completion/recognition, etc.

“The unreasonable effectiveness of data”

If a map could include every possible detail of the land, how big would it be?

SLIDE 4

Problem

A promising way to accelerate similarity search is Semantic Hashing: design compact binary codes for a large number of documents so that semantically similar documents are mapped to similar codes (within a short Hamming distance).

Each bit can be regarded as a binary feature, so the task is to generate a few highly informative binary features to represent the documents.

Then similarity search can be done extremely fast by just checking a few nearby codes (memory addresses).

For example, 0000 ⇒ 0000, 1000, 0100, 0010, 0001.
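As an illustration (a minimal sketch, not from the slides), probing all codes within Hamming distance 1 of a query code in Python:

    def hamming_ball_radius_1(code, n_bits):
        """Yield the query code itself, then every code one bit-flip away."""
        yield code
        for p in range(n_bits):
            yield code ^ (1 << p)

    # 0000 -> 0000, 0001, 0010, 0100, 1000 (the example above, reordered)
    print([format(c, "04b") for c in hamming_ball_radius_1(0b0000, 4)])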

SLIDE 5

Problem

SLIDE 6

Problem

SLIDE 7

Outline

1. Problem
2. Related Work
3. Review of STH
4. Extensions to STH
5. Conclusion

SLIDE 8

Related Work

Fast (Exact) Similarity Search in a Low-Dimensional Space

Space-Partitioning Index: KD-tree, etc.

Data-Partitioning Index: R-tree, etc.

SLIDE 9

Related Work

Figure: An example of KD-tree (by Andrew Moore).

SLIDE 10

Related Work

Fast (Approximate) Similarity Search in a High-Dimensional Space

Data-Oblivious Hashing: Locality-Sensitive Hashing (LSH)

Data-Aware Hashing: binarised Latent Semantic Indexing (LSI) and Laplacian Co-Hashing (LCH); stacked Restricted Boltzmann Machine (RBM); boosting-based Similarity Sensitive Coding (SSC) and Forgiving Hashing (FgH); Spectral Hashing (SpH) — the state of the art

SpH's restrictive assumption: the data are uniformly distributed in a hyper-rectangle

SLIDE 11

Related Work

Table: Typical techniques for accelerating similarity search.

                      exact similarity search     approximate similarity search
                      (low-dimensional space)     (high-dimensional space)
    data-oblivious                                LSH
    data-aware        KD-tree, R-tree             LSI, LCH, RBM, SSC, FgH, SpH, STH

SLIDE 12

Outline

1. Problem
2. Related Work
3. Review of STH
4. Extensions to STH
5. Conclusion

SLIDE 13

Review of STH

Input: $X = \{x_i\}_{i=1}^{n} \subset \mathbb{R}^m$

Output: a hash function $f(x) \in \{-1, +1\}^l$, where $-1$ = bit off, $+1$ = bit on, and $l \ll m$

SLIDE 14

Review of STH

Figure: The proposed STH approach to semantic hashing.

SLIDE 15

Review of STH

Stage 1: Learning of Binary Codes

Let $y_i \in \{-1, +1\}^l$ represent the binary code for document vector $x_i$ ($-1$ = bit off; $+1$ = bit on), and let $Y = [y_1, \ldots, y_n]^T$.

SLIDE 16

Review of STH

Criterion 1a: Similarity Preserving

We focus on the local structure of data. Let $N_k(x)$ denote the set of k-nearest-neighbours of document $x$. The local similarity matrix $W$ (i.e., the adjacency matrix of the k-nearest-neighbours graph) is symmetric and sparse, with cosine weights

$$W_{ij} = \begin{cases} \dfrac{x_i^T x_j}{\|x_i\| \, \|x_j\|} & \text{if } x_i \in N_k(x_j) \text{ or } x_j \in N_k(x_i) \\ 0 & \text{otherwise} \end{cases}$$

or heat-kernel weights

$$W_{ij} = \begin{cases} \exp\left(-\dfrac{\|x_i - x_j\|^2}{2\sigma^2}\right) & \text{if } x_i \in N_k(x_j) \text{ or } x_j \in N_k(x_i) \\ 0 & \text{otherwise} \end{cases}$$
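A minimal sketch of building such a $W$ with cosine weights, assuming dense, non-zero TF-IDF-style row vectors in X (the names are illustrative, not from the paper; a real corpus would use a sparse matrix):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def knn_similarity_matrix(X, k):
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalise rows
        nn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(Xn)
        _, idx = nn.kneighbors(Xn)             # idx[:, 0] is the point itself
        n = X.shape[0]
        W = np.zeros((n, n))
        for i in range(n):
            for j in idx[i, 1:]:               # the k nearest neighbours of x_i
                w = float(Xn[i] @ Xn[j])       # cosine similarity
                W[i, j] = W[j, i] = w          # the 'or' rule keeps W symmetric
        return W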
SLIDE 17

Review of STH

Figure: The local structure of data in a high-dimensional space.

SLIDE 18

Review of STH

Figure: Manifold analysis: exploiting the local structure of data.

SLIDE 19

Review of STH

Criterion 1a: Similarity Preserving

The Hamming distance between two codes $y_i$ and $y_j$ is $\frac{1}{4}\|y_i - y_j\|^2$. We minimise the weighted total Hamming distance

$$\sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij} \, \frac{\|y_i - y_j\|^2}{4}$$

as it incurs a heavy penalty if two similar documents are mapped far apart. (Using the squared error of distances instead would lead to a non-convex optimisation problem.)

SLIDE 20

Review of STH

Spectral Methods for Manifold Analysis — Minimising Cut-Size

For single-bit codes $f = (y_1, \ldots, y_n)^T$:

$$S = \sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij} \, \frac{(y_i - y_j)^2}{4} = \frac{1}{4} f^T L f$$

with the Laplacian matrix $L = D - W$, where $D = \mathrm{diag}(k_1, \ldots, k_n)$ and $k_i = \sum_j W_{ij}$.
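A small sketch (names are illustrative, not from the slides) of building the Laplacian from $W$ and evaluating the cut-size objective:

    import numpy as np

    def laplacian(W):
        D = np.diag(W.sum(axis=1))    # degree matrix, k_i = sum_j W_ij
        return D - W

    rng = np.random.default_rng(0)
    W = rng.random((6, 6)); W = (W + W.T) / 2   # toy symmetric weights
    np.fill_diagonal(W, 0)
    f = rng.choice([-1.0, 1.0], size=6)         # a single-bit code
    print(f @ laplacian(W) @ f)   # small when similar nodes get the same bit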

SLIDE 21

Review of STH

Spectral Methods for Manifold Analysis — Minimising Cut-Size

Figure: Spectral graph partitioning through Normalised Cut.

SLIDE 22

Review of STH

Spectral Methods for Manifold Analysis — Minimising Cut-Size

Real relaxation: requiring $y_i \in \{-1, +1\}$ makes the problem NP-hard, so we substitute $\tilde{y}_i \in \mathbb{R}$ for $y_i$.

$L$ is positive semi-definite, with eigenvalues $0 = \lambda_1 = \ldots = \lambda_z < \lambda_{z+1} \leq \ldots \leq \lambda_n$ and eigenvectors $u_1, \ldots, u_z, u_{z+1}, \ldots, u_n$.

Optimal non-trivial division: $f = u_{z+1}$, for which the number of edges across clusters is small.

SLIDE 23

Review of STH

Spectral Methods for Manifold Analysis — Minimising Cut-Size

For l-bit codes $Y = [y_1, \ldots, y_n]^T$:

$$S = \sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij} \, \frac{\|y_i - y_j\|^2}{4} = \frac{1}{4} \mathrm{Tr}(Y^T L Y)$$

Let $\tilde{Y}$ be the real relaxation of $Y$.

SLIDE 24

Review of STH

Spectral Methods for Manifold Analysis — Minimising Cut-Size

Laplacian Eigenmap (LapEig):

$$\arg\min_{\tilde{Y}} \ \mathrm{Tr}(\tilde{Y}^T L \tilde{Y}) \quad \text{subject to} \quad \tilde{Y}^T D \tilde{Y} = I, \quad \tilde{Y}^T D \mathbf{1} = 0$$

whose solution is given by the generalised eigenvalue problem

$$L v = \lambda D v \qquad (1)$$

with $\tilde{Y} = [v_1, \ldots, v_l]$.
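A minimal LapEig sketch, assuming a dense W whose degrees are all positive (so D is positive definite); scipy solves the generalised problem (1) directly:

    import numpy as np
    from scipy.linalg import eigh

    def laplacian_eigenmap(W, l):
        D = np.diag(W.sum(axis=1))
        L = D - W
        vals, vecs = eigh(L, D)        # generalised symmetric eigenproblem (1)
        return vecs[:, 1:l + 1]        # Y~ = [v_1, ..., v_l]; trivial v_0 dropped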

SLIDE 25

Review of STH

Criterion 1b: Entropy Maximising

Best utilisation of the hash table = maximum entropy of the codes = uniform distribution of the codes (each code has equal probability). The p-th bit should be on for half of the corpus and off for the other half:

$$y_i^{(p)} = \begin{cases} +1 & \text{if } \tilde{y}_i^{(p)} \geq \mathrm{median}(v_p) \\ -1 & \text{otherwise} \end{cases}$$

The bits at different positions are almost mutually uncorrelated, as the eigenvectors given by LapEig are orthogonal to each other.
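Median thresholding as a two-line sketch (Y_real is an assumed name for the n-by-l LapEig embedding from the previous slide):

    import numpy as np

    def binarise(Y_real):
        medians = np.median(Y_real, axis=0)        # per-bit threshold
        return np.where(Y_real >= medians, 1, -1)  # each bit on for ~half the corpus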

SLIDE 26

Review of STH

Stage 2: Learning of Hash Function

How to get the codes for previously unseen documents? — Out-of-Sample Extension

High computational complexity: the Nystrom method, linear approximation (e.g., LPI)

Restrictive assumption about data distribution: eigenfunction approximation (e.g., SpH)

SLIDE 27

Review of STH

Stage 2: Learning of Hash Function

We reduce it to a supervised learning problem:

Think of each bit $y_i^{(p)} \in \{+1, -1\}$ in the binary code for document $x_i$ as a binary class label (class-"on" or class-"off") for that document.

Train a binary classifier $y^{(p)} = f^{(p)}(x)$ on the given corpus that has already been "labelled" by the 1st stage.

Then we can use the learned binary classifiers $f^{(1)}, \ldots, f^{(l)}$ to predict the l-bit binary code $y^{(1)}, \ldots, y^{(l)}$ for any query document $x$.
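A sketch of this stage, assuming X (n-by-m feature matrix) and the code matrix Y in {-1,+1} of shape n-by-l from stage 1; one linear SVM per bit:

    from sklearn.svm import LinearSVC

    def train_hash_function(X, Y):
        # classifier p learns to predict bit p of the code
        return [LinearSVC(C=1.0).fit(X, Y[:, p]) for p in range(Y.shape[1])]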

SLIDE 28

Review of STH

Kernel Methods for Pseudo-Supervised Learning — Support Vector Machine (SVM)

$$y^{(p)} = f^{(p)}(x) = \mathrm{sgn}(w^T x)$$

$$\arg\min_{w,\,\xi_i \geq 0} \ \frac{1}{2} w^T w + \frac{C}{n} \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad \forall_{i=1}^{n}: \ y_i^{(p)} w^T x_i \geq 1 - \xi_i \qquad (2)$$

large-margin classification → good generalisation
linear/non-linear kernels → linear/non-linear mapping
convex optimisation → global optimum

SLIDE 29

Review of STH

Self-Taught Hashing (STH): The Learning Process

1. Unsupervised Learning of Binary Codes
   - Construct the k-nearest-neighbours graph for the given corpus
   - Embed the documents in an l-dimensional space through LapEig (1) to get an l-dimensional real-valued vector for each document
   - Obtain an l-bit binary code for each document by thresholding the above vectors at their median point, and then take each bit as a binary class label for that document

2. Supervised Learning of Hash Function
   - Train l SVM classifiers (2) on the given corpus that has been "labelled" as above

SLIDE 30

Review of STH

Self-Taught Hashing (STH): The Prediction Process

1. Classify the query document using the l learned classifiers
2. Assemble the output l binary labels into an l-bit binary code
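As a sketch, continuing the training snippet above ('classifiers' is its return value):

    import numpy as np

    def hash_query(classifiers, x):
        # run the query through the l classifiers, assemble the l labels
        return np.array([int(clf.predict(x.reshape(1, -1))[0])
                         for clf in classifiers])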

SLIDE 31

Outline

1. Problem
2. Related Work
3. Review of STH
4. Extensions to STH
5. Conclusion

SLIDE 32

Extension I: Kernelisation

In the second stage of STH, we rewrite the SVM quadratic optimisation problem (2) into its dual form (a maximisation):

$$\arg\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} y_i^{(p)} y_j^{(p)} \alpha_i \alpha_j x_i^T x_j \qquad (3)$$

$$\text{subject to} \quad 0 \leq \alpha_i \leq C, \ i = 1, \ldots, n, \quad \sum_{i=1}^{n} \alpha_i y_i^{(p)} = 0$$

and replace the inner product between $x_i$ and $x_j$ by a nonlinear kernel such as the Gaussian kernel:

$$K(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right) \qquad (4)$$
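In practice the kernelised second stage amounts to swapping the linear SVM for one with an RBF kernel; a scikit-learn sketch (not the authors' code), where gamma = 1/(2*sigma^2) in the notation of (4):

    from sklearn.svm import SVC

    def train_kernel_hash_function(X, Y, sigma=1.0, C=1.0):
        gamma = 1.0 / (2.0 * sigma ** 2)   # K(x, x') = exp(-gamma * ||x - x'||^2)
        return [SVC(kernel="rbf", gamma=gamma, C=C).fit(X, Y[:, p])
                for p in range(Y.shape[1])]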
SLIDE 33

Extension I: Kernelisation

Then the p-th bit (i.e., binary feature) of the binary code for a query document $x$ is given by

$$f^{(p)}(x) = \mathrm{sgn}\left(\sum_{i=1}^{n} \alpha_i y_i^{(p)} K(x, x_i)\right) \qquad (5)$$

which is a nonlinear mapping.

SLIDE 34

Extension I: Kernelisation

For example, using 16-bit binary codes:

linear hashing: $2l = 2 \times 16 = 32$ sectors
nonlinear hashing: $2^l = 2^{16} = 65536$ pieces

SLIDE 35

Extension I: Kernelisation

Figure: The 16-bit hash function for the pie dataset using SpH.

SLIDE 36

Extension I: Kernelisation

Figure: The 16-bit hash function for the pie dataset using STH.

SLIDE 37

Extension I: Kernelisation

Figure: The 16-bit hash function for the two-moon dataset using SpH.

SLIDE 38

Extension I: Kernelisation

Figure: The 16-bit hash function for the two-moon dataset using STH.

SLIDE 39

Extension II: Supervision

In the first stage of STH, we make use of the class label information in constructing the k-nearest-neighbours graph for LapEig: a training document x's k-nearest-neighbourhood Nk(x) only contains the k documents in the same class as x that are most similar to x. Let STHs denote this supervised version of STH, to distinguish it from the standard unsupervised version of STH.
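A sketch of the supervised neighbourhood selection (illustrative names; Xn is assumed to hold unit-normalised row vectors, labels the integer class labels):

    import numpy as np

    def supervised_knn_indices(Xn, labels, k):
        sims = Xn @ Xn.T                     # cosine similarities on unit rows
        np.fill_diagonal(sims, -np.inf)      # exclude self-matches
        nbrs = []
        for i in range(len(labels)):
            same = np.flatnonzero(labels == labels[i])   # same-class candidates
            order = same[np.argsort(-sims[i, same])]     # most similar first
            nbrs.append(order[:k])           # assumes each class has > k members
        return nbrs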

SLIDE 40

Extension II: Supervision

Why not use SVMs directly? kNN still has its advantages over SVMs in some aspects. For example, if there are 1000 classes:

the multi-class SVM approach may need 1000 binary SVM classifiers under the one-vs-rest ensemble scheme
the kNN (on top of STH) approach using 16-bit binary codes would only require 16 binary SVM classifiers
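For illustration, kNN over l-bit STH codes needs no per-class model at query time; a toy majority-vote sketch (names are illustrative, not from the slides):

    import numpy as np

    def knn_predict(codes, labels, q, k):
        d = np.count_nonzero(codes != q, axis=1)       # Hamming distances
        votes = labels[np.argsort(d)[:k]]              # k nearest codes
        vals, counts = np.unique(votes, return_counts=True)
        return vals[np.argmax(counts)]                 # majority vote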

SLIDE 41

Extension II: Supervision

Text Datasets

Reuters21578: top 10 categories, 7285 documents; ModApte split: 5228 (72%) training, 2057 (28%) testing

20Newsgroups: all 20 categories, 18846 documents; 'bydate' split: 11314 (60%) training, 7532 (40%) testing

TDT2 (NIST Topic Detection and Tracking): top 30 categories, 9394 documents; random split (x10): 5597 (60%) training, 3797 (40%) testing

SLIDE 42

Extension II: Supervision

[Three precision-recall plots (precision vs. recall, both 0.2 to 1) comparing LSI, LCH, SpH, STH, and STHs on (a) Reuters21578, (b) 20Newsgroups, (c) TDT2.]

Figure: The precision-recall curve for retrieving same-topic documents.

SLIDE 43

Extension II: Supervision

[Three plots of accuracy (0.2 to 1) against code length (10 to 60 bits) for LSI, LCH, SpH, STH, and STHs on (a) Reuters21578, (b) 20Newsgroups, (c) TDT2.]

Figure: The accuracy of approximate kNN classification (via hashing).

SLIDE 44

Outline

1. Problem
2. Related Work
3. Review of STH
4. Extensions to STH
5. Conclusion

SLIDE 45

Conclusion

Major Contribution: Self-Taught Hashing

Unsupervised Learning + Supervised Learning
Spectral Method + Kernel Method

Extensions (in the FGSIR Workshop on 23 Jul 2010)

Kernelisation
Supervision

Future Work

Implementation using MapReduce
Applications in Multimedia IR

SLIDE 46

Question Time

Thanks! 8-)
