BEYOND PROJECT AND SIGN FOR COSINE ESTIMATION WITH BINARY CODES - - PowerPoint PPT Presentation



SLIDE 1

BEYOND “PROJECT AND SIGN” FOR COSINE ESTIMATION WITH BINARY CODES

Raghavendran Balu, Teddy Furon and Hervé Jégou INRIA, Rennes

SLIDE 2

Problem statement: Nearest Neighbors search

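– Finding the closest vector(s) from a database for a given query
– In this paper: similarity is measured by the cosine, so for unit-norm vectors the closest vector in Euclidean distance is also the one with the largest inner product
– Problem: the cost of exhaustive search grows linearly with the database size n

[Figure: a query x is submitted to a search engine, which returns its nearest neighbors among the database vectors $y_1, \dots, y_n$]

$$\mathrm{NN}(x) = \arg\min_{1 \le i \le n} \|x - y_i\| = \arg\max_{1 \le i \le n} x^\top y_i$$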

SLIDE 3

2 approaches to Nearest Neighbor Search

– Space partitioning

  • The search is no longer exhaustive
  • Example: indexing technique involving several hash functions

– Approximate distance

  • Faster to compute but exhaustive
  • In this paper: we use a Hamming embedding
SLIDE 4

Hamming embedding

  • Design a mapping function from real vectors to binary codes
  • Objective

– neighborhood in Hamming space reflects true neighborhood

  • Advantages

– compact descriptor
– fast distance computation

[Figure: the eight 3-bit binary codes 000, 001, 010, 011, 100, 101, 110, 111]
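To make the "fast distance computation" advantage concrete, here is a minimal sketch that stores each code as a Python integer and counts differing bits with an XOR. The function name and the choice of query are illustrative assumptions.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two codes stored as integers."""
    # XOR leaves a 1 exactly where the two codes disagree; count those ones.
    return bin(a ^ b).count("1")

# The eight 3-bit codes listed on the slide.
codes = [0b000, 0b001, 0b110, 0b111, 0b011, 0b010, 0b101, 0b100]
query = 0b000

# Rank the codes by Hamming distance to the query (ties keep list order).
ranked = sorted(codes, key=lambda c: hamming_distance(query, c))
```

On hardware, the same XOR followed by a population count takes a handful of instructions per machine word, which is why comparing binary codes is much cheaper than comparing real vectors.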

SLIDE 5
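Locality Sensitive Hashing (LSH) [Charikar 02]

  • Initialization: randomly draw L directions $w_1, \dots, w_L$
  • For a given vector $x$, compute a bit for each direction, as
  • 1. Project: $w_j^\top x$
  • 2. And sign: $b_j(x) = \mathrm{sign}(w_j^\top x)$
  • Properties

– For two vectors $x$ and $y$ separated by an angle $\theta$
– The Hamming distance $d_h$ is related in expectation to the angle as $\mathbb{E}[d_h(b(x), b(y))] = L\,\theta/\pi$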
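The sketch below illustrates 'project and sign' and the resulting angle estimate. It relies on the standard property of random hyperplane hashing that two vectors at angle θ receive different bits with probability θ/π, so the Hamming distance between L-bit codes is about Lθ/π. The Gaussian directions, the function names and the values of D and L are assumptions made for this example.

```python
import math
import random

def random_directions(L, D, rng):
    """Draw L isotropic Gaussian directions in R^D."""
    return [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(L)]

def project_and_sign(W, x):
    """One bit per direction: 1 if the projection is non-negative, else 0."""
    return [1 if sum(a * b for a, b in zip(w, x)) >= 0 else 0 for w in W]

def estimate_angle(bx, by):
    """Invert E[d_h] = L * theta / pi to turn a Hamming distance into an angle."""
    d_h = sum(u != v for u, v in zip(bx, by))
    return math.pi * d_h / len(bx)

rng = random.Random(0)
D, L = 8, 4096
W = random_directions(L, D, rng)

x = [1.0] + [0.0] * (D - 1)
y = [1.0, 1.0] + [0.0] * (D - 2)      # 45 degrees (pi/4) away from x
bx, by = project_and_sign(W, x), project_and_sign(W, y)
theta_hat = estimate_angle(bx, by)    # should land near pi/4
```

The estimate is noisy for short codes because each bit is a coin flip with bias θ/π. Its standard deviation shrinks like 1/√L.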

SLIDE 6

Our approach

  • Synthesis point of view

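– Reconstructed vector $\hat{x}$, synthesized from the binary code
– If $\hat{x}$ is ‘close’ to $x$ on the sphere, then its cosine with any other vector is close to that of $x$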

  • Minimizing the quantization error

– If L < D and the projection directions are orthogonal, ‘project and sign’ is optimal
– If L > D, it is a combinatorial problem

  • Not tractable for large D
SLIDE 7

Reconstruction point of view

  • ‘Project and sign’ with a frame W
  • ‘Project and sign’ with a tight frame W
  • Our algorithm qoLSH

– quantization optimized LSH

  • ‘AntiSparse’ [Jégou 11]

– Too slow for large D

  • Optimal

– Intractable for large D

[Diagram axis labels: optimality, simplicity]

SLIDE 8

qoLSH algorithm

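  • Parameter: randomly draw a tight frame $W$ (a $D \times L$ matrix)
  • Initialization: input $x$

– ‘project and sign’: $b^{(0)} = \mathrm{sign}(W^\top x)$

  • Iteration k + 1

– For any j

  • Flip j-th bit: $b_j^{(k)} \to -b_j^{(k)}$
  • Measure cosine: $x^\top W b \,/\, (\|x\|\,\|W b\|)$ for the flipped code $b$

– Keep best flip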
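The steps above can be sketched as a greedy bit-flipping loop. This is an illustration under stated assumptions, not the authors' released code: bits are taken in {-1, +1}, the reconstruction direction is the sum of frame vectors weighted by the bits, the tight frame is built by Gram-Schmidt on random Gaussian rows, and the loop stops when no single flip improves the cosine or after a hypothetical `max_iter` iterations.

```python
import math
import random

def tight_frame(D, L, rng):
    """Return L frame vectors in R^D whose D x L matrix W satisfies W W^T = I."""
    rows = []  # D orthonormal rows of length L, built by Gram-Schmidt
    for _ in range(D):
        v = [rng.gauss(0.0, 1.0) for _ in range(L)]
        for r in rows:  # remove the components along the rows already built
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return [list(col) for col in zip(*rows)]  # transpose: one vector per bit

def project_and_sign(W, x):
    """b_j = sign(w_j^T x), with values in {-1, +1}."""
    return [1 if sum(a * b for a, b in zip(w, x)) >= 0 else -1 for w in W]

def synthesize(W, b):
    """Unnormalized reconstruction: sum_j b_j * w_j."""
    return [sum(s * w[i] for s, w in zip(b, W)) for i in range(len(W[0]))]

def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def qolsh_encode(W, x, max_iter=50):
    """Start from project-and-sign, then repeatedly keep the best single flip."""
    b = project_and_sign(W, x)
    best = cosine(x, synthesize(W, b))
    for _ in range(max_iter):
        best_j = None
        for j in range(len(b)):
            b[j] = -b[j]                     # try flipping bit j
            c = cosine(x, synthesize(W, b))
            b[j] = -b[j]                     # undo the trial flip
            if c > best + 1e-12:
                best, best_j = c, j
        if best_j is None:                   # no flip improves the cosine
            break
        b[best_j] = -b[best_j]               # commit the best flip
    return b, best

rng = random.Random(1)
D, L = 8, 32
W = tight_frame(D, L, rng)
x = [rng.gauss(0.0, 1.0) for _ in range(D)]
cos_0 = cosine(x, synthesize(W, project_and_sign(W, x)))
b, cos_q = qolsh_encode(W, x)
```

By construction the final cosine `cos_q` is never lower than the 'project and sign' starting point `cos_0`. Recomputing the reconstruction for every trial flip keeps the sketch short; an incremental update of the sum would be cheaper.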

SLIDE 9

Estimated angle vs True angle

[Figure: estimated angle plotted against true angle, both axes from 0 to 3.14 rad, for LSH and qoLSH. Synthetic data, D = 8, L = 64]

SLIDE 10

Angle estimation error analysis

[Figure: two panels, BIAS and STANDARD DEVIATION of the angle estimate. Synthetic data, D = 128, L = 256]

qoLSH reduces estimation bias and variance compared to LSH

SLIDE 11

Application to Nearest Neighbor Search

[Diagram: search pipeline with the labels query, qoLSH, database, symmetric similarity, candidates, reconstruction, asymmetric similarity, candidates, re-ranking, max-heap]
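One plausible reading of the labels above is a two-stage search: a symmetric comparison between binary codes produces a short list, then an asymmetric comparison between the uncompressed query and the reconstructed candidates re-ranks it. The sketch below follows that reading as an assumption. For brevity it encodes with plain 'project and sign' rather than qoLSH, and every name, parameter and toy vector in it is hypothetical.

```python
import heapq
import math
import random

def sign_code(W, x):
    """Project-and-sign code in {-1, +1}^L; W is a list of L directions in R^D."""
    return [1 if sum(a * b for a, b in zip(w, x)) >= 0 else -1 for w in W]

def reconstruct(W, code):
    """Unnormalized reconstruction: sum_j code_j * w_j."""
    return [sum(c * w[i] for c, w in zip(code, W)) for i in range(len(W[0]))]

def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def search(W, db_codes, query, shortlist, top):
    q_code = sign_code(W, query)
    # Stage 1 (symmetric): code against code. The agreement score equals
    # L - 2 * (Hamming distance), so a larger score means a closer code.
    sym = [(sum(a * b for a, b in zip(q_code, c)), i)
           for i, c in enumerate(db_codes)]
    candidates = heapq.nlargest(shortlist, sym)
    # Stage 2 (asymmetric): uncompressed query against reconstructed candidates.
    asym = [(cosine(query, reconstruct(W, db_codes[i])), i)
            for _, i in candidates]
    return [i for _, i in heapq.nlargest(top, asym)]

rng = random.Random(0)
D, L = 8, 64
W = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(L)]
e1 = [1.0] + [0.0] * (D - 1)
e2 = [0.0, 1.0] + [0.0] * (D - 2)
db = [e1, [-a for a in e1], e2]          # toy database of three vectors
db_codes = [sign_code(W, y) for y in db]
result = search(W, db_codes, e1, shortlist=2, top=1)
```

Only the binary codes of the database are stored. The query keeps its full precision in the second stage, which is what makes the comparison asymmetric.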

SLIDE 12

Experimental details

  • Dataset

– Synthetic (n = 1 million, D = 8)
– SIFT (n = 1 million, D = 128)
– http://corpus-texmex.irisa.fr

  • Algorithms

– LSH with or without tight frame
– qoLSH
– anti-sparse
– quantization optimal (if tractable)

  • Performance measurement

– 1-Recall@R: probability that the true nearest neighbor belongs to a short list of R candidates
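The performance measure can be computed as the fraction of queries whose true nearest neighbor appears among the first R returned candidates. The sketch below shows this on made-up ranked lists. The function name and the data are illustrative only.

```python
def recall_at_r(ranked_lists, true_nn, R):
    """Fraction of queries whose true nearest neighbor is in the top-R list."""
    hits = sum(1 for ranked, nn in zip(ranked_lists, true_nn)
               if nn in ranked[:R])
    return hits / len(true_nn)

# Hypothetical results for three queries: candidate ids, best first.
ranked_lists = [[3, 7, 1], [5, 2, 9], [4, 0, 8]]
true_nn = [7, 5, 6]   # ground-truth nearest neighbor id of each query

r1 = recall_at_r(ranked_lists, true_nn, 1)   # only the second query hits
r2 = recall_at_r(ranked_lists, true_nn, 2)   # the first query now hits too
```

Recall can only grow with R, since a longer short list contains every shorter one.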
SLIDE 13

Recall on synthetic data (n = 1M, D = 8)

SLIDE 14

Recall on real SIFT data (n = 1M, D = 128)

SLIDE 15

Conclusion

  • Hamming embedding dedicated to cosine similarity estimation
  • L<D

– ‘Project and sign’ is optimal with orthogonal random projection

  • L>D

– Tight frame is a good choice
– ‘Project and sign’ is suboptimal
– Our reconstruction based approach

  • decreases quantization error
  • improves cosine similarity estimation
  • improves quality of approximate NN search
  • strikes a good trade-off between quality and complexity

http://people.rennes.inria.fr/Raghavendran.Balu/code/qolsh.zip

Package Online!

SLIDE 16

QUESTIONS?

Thank You!

SLIDE 18

LSH suboptimality when L > D

  • When L > D, the L projection directions cannot be orthogonal

– Entropy H(B) < L bits

  • Example
  • LSH (suboptimal):
  • Optimal
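A small numerical illustration of the entropy argument, using values chosen for this example rather than taken from the slide: with D = 2 and L = 3 directions in the plane, 'project and sign' can only ever produce 6 of the 8 possible 3-bit codes, so the code carries at most log2(6) ≈ 2.58 bits instead of 3.

```python
import math

# Three directions in the plane (D = 2, L = 3), 60 degrees apart.
# These angles are hypothetical example values.
angles_deg = [0.0, 60.0, 120.0]
W = [(math.cos(math.radians(a)), math.sin(math.radians(a)))
     for a in angles_deg]

def sign_code(x):
    """3-bit 'project and sign' code of a 2-D vector x."""
    return tuple(1 if w[0] * x[0] + w[1] * x[1] >= 0 else 0 for w in W)

# Sweep the unit circle at half-degree offsets so that no sample sits
# exactly on a sign boundary (boundaries fall at whole multiples of 30).
codes = set()
for k in range(360):
    t = math.radians(k + 0.5)
    codes.add(sign_code((math.cos(t), math.sin(t))))

reachable = len(codes)                    # number of codes actually produced
max_entropy_bits = math.log2(reachable)   # upper bound on H(B)
```

The three lines through the origin cut the plane into six sectors, one per reachable code. The two alternating patterns 101 and 010 never occur.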