COMP9313: Big Data Management
High Dimensional Similarity Search
Similarity Search
- Problem definition:
  - Given a query q and a dataset D, find o ∈ D such that o is similar to q
- Two types of similarity search
  - Range search:
    - dist(o, q) ≤ τ
  - Nearest neighbor (NN) search:
    - dist(o*, q) ≤ dist(o, q), ∀o ∈ D
    - Top-k version
- The distance/similarity function varies
  - Euclidean, Jaccard, inner product, …
- A classic problem, with mature solutions (see the baseline sketch below)
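As a baseline, here is a minimal linear-scan sketch of both query types, assuming Euclidean distance and a small in-memory dataset (dist, D, q, and the sample points are illustrative names, not from the slides):

    import math

    def dist(o, q):
        # Euclidean distance between two equal-length vectors
        return math.sqrt(sum((oi - qi) ** 2 for oi, qi in zip(o, q)))

    def range_search(D, q, tau):
        return [o for o in D if dist(o, q) <= tau]

    def nn_search(D, q):
        return min(D, key=lambda o: dist(o, q))

    D = [(1.0, 2.0), (4.0, 0.5), (3.0, 3.0)]
    q = (3.0, 2.5)
    print(range_search(D, q, 2.0))  # -> [(3.0, 3.0)]
    print(nn_search(D, q))          # -> (3.0, 3.0)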
High Dimensional Similarity Search
- Applications and relationship to Big Data
- Almost every object can be (and has been) represented as a high dimensional vector
  - Words, documents
  - Images, audio, video
  - …
- Similarity search is a fundamental operation in information retrieval
  - E.g., the Google search engine, face recognition systems, …
- High Dimension makes a huge difference!
- Traditional solutions are no longer feasible
- This lecture is about why and how
- We focus on high dimensional vectors in Euclidean space
Similarity Search in Low Dimensional Space
Similarity Search in One Dimensional Space
- Just numbers: use binary search, binary search trees, B+ trees, …
- The essential idea behind them: objects can be sorted
Similarity Search in Two Dimensional Space
- Why does binary search no longer work?
  - There is no total order!
- Voronoi diagram
(Voronoi diagrams under Euclidean distance and under Manhattan distance.)
- Partition based algorithms
  - Partition the data into “cells”
  - Nearest neighbors are in the same cell as the query, or in adjacent cells
- How many “cells” would we have to probe in 3-dimensional space?
Similarity Search in Metric Space
- Triangle inequality
  - dist(x, q) ≤ dist(x, z) + dist(z, q)
- Orchard’s Algorithm (see the sketch below)
  - for each x ∈ D, create a list of all other points in increasing order of distance to x
  - given query q, randomly pick a point x as the initial candidate (i.e., pivot p), and compute dist(p, q)
  - walk along the list of p and compute the distances to q; if some z closer to q than p is found, use z as the new pivot (i.e., p ← z)
  - repeat the procedure, and stop when
    - dist(p, z) > 2 · dist(p, q)
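A sketch of Orchard's algorithm under these definitions, assuming the dist() helper from the earlier linear-scan example (the precomputed lists also make the method's quadratic space cost visible):

    import random

    def build_lists(D):
        # For each x, all other points sorted by increasing distance to x
        return {x: sorted((y for y in D if y != x), key=lambda y: dist(x, y))
                for x in D}

    def orchard_nn(D, lists, q):
        p = random.choice(D)                 # initial pivot
        d_pq = dist(p, q)
        improved = True
        while improved:
            improved = False
            for z in lists[p]:               # walk p's list, nearest first
                if dist(p, z) > 2 * d_pq:    # stopping rule from above
                    break
                if dist(z, q) < d_pq:        # z is closer to q: new pivot
                    p, d_pq = z, dist(z, q)
                    improved = True
                    break
        return p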
- Why can Orchard’s Algorithm stop when dist(p, z) > 2 · dist(p, q)?
  - 2 · dist(p, q) < dist(p, z) and dist(p, z) ≤ dist(p, q) + dist(z, q)
  - ⟹ 2 · dist(p, q) < dist(p, q) + dist(z, q)
  - ⟹ dist(p, q) < dist(z, q)
- Since the list of p is in increasing order of distance to p, dist(p, z) > 2 · dist(p, q) holds for all the remaining z’s as well.
None of the Above Works in High Dimensional Space!
Curse of Dimensionality
- Refers to various phenomena that arise in high dimensional spaces but do not occur in low dimensional settings
- Triangle inequality
  - Its pruning power drops heavily
- What is the volume of a high dimensional “ring” (i.e., hyperspherical shell)?
  - Since ball volume scales as r^d, Vring(width = 1) / Vball(r = 10) = 1 − (9/10)^d
  - d = 2: 19% of the volume
  - d = 100: 99.997% of the volume
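A quick numeric check of this ratio (the chosen dimensions are illustrative):

    for d in (2, 3, 10, 100):
        print(d, 1 - 0.9 ** d)
    # 2 -> 0.19, 3 -> 0.271, 10 -> 0.651..., 100 -> 0.99997...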
Approximate Nearest Neighbor Search in High Dimensional Space
- There is no sub-linear solution to find the exact result of a nearest neighbor query
- So we relax the condition
  - approximate nearest neighbor search (ANNS)
  - the returned points are allowed to be not the NN of the query
- Success: the query returns the true NN
  - use the success rate (i.e., the percentage of successful queries) to evaluate a method
  - hard to bound the success rate
c-approximate NN Search
- Success: returns o such that
  - dist(o, q) ≤ c · dist(o*, q)
- Then we can bound the success probability
  - usually denoted as 1 − δ
- Solution: Locality Sensitive Hashing (LSH)
Locality Sensitive Hashing
- Hash functions
  - Index: map data/objects to values (e.g., hash keys)
    - Same data ⟹ same hash key (with 100% probability)
    - Different data ⟹ different hash keys (with high probability)
  - Retrieval: easy to retrieve identical objects (as they have the same hash key)
  - Applications: hash maps, hash joins
  - Low cost
    - Space: O(n)
    - Time: O(1)
- Why can’t it be used for nearest neighbor search?
  - Even a minor difference leads to totally different hash keys
- Index: make the hash functions error tolerant
  - Similar data ⟹ same hash key (with high probability)
  - Dissimilar data ⟹ different hash keys (with high probability)
- Retrieval:
  - Compute the hash key for the query
  - Obtain all the data with the same key as the query (i.e., the candidates)
  - Find the nearest candidate to the query
- Cost:
  - Space: O(n)
  - Time: O(1) + O(|cand|)
- This is not yet the real Locality Sensitive Hashing!
  - We still have several unsolved issues…
LSH Functions
- Formal definition:
  - Given points o1, o2, distances r1 < r2, and probabilities p1 > p2
  - An LSH function h(·) should satisfy
    - Pr[h(o1) = h(o2)] ≥ p1, if dist(o1, o2) ≤ r1
    - Pr[h(o1) = h(o2)] ≤ p2, if dist(o1, o2) > r2
- What is h(·) for a given distance/similarity function?
  - Jaccard similarity
  - Angular distance
  - Euclidean distance
MinHash – LSH Function for Jaccard Similarity
- Each data object is a set
  - Jaccard(o1, o2) = |o1 ∩ o2| / |o1 ∪ o2|
- Randomly generate a global order for all the elements in C = ∪i oi
- Let h(o) be the minimal member of o with respect to the global order
  - For example, with o = {b, d, h, m, t} and the inverse alphabetical order, the re-ordered o is {t, m, h, d, b}, hence h(o) = t
- Now we compute Pr[h(o1) = h(o2)]
  - Every element e ∈ o1 ∪ o2 has an equal chance to be the first element among o1 ∪ o2 after re-ordering
  - That first element e satisfies e ∈ o1 ∩ o2 if and only if h(o1) = h(o2)
  - Conversely, e ∉ o1 ∩ o2 if and only if h(o1) ≠ h(o2)
- Pr[h(o1) = h(o2)] = |o1 ∩ o2| / |o1 ∪ o2| = Jaccard(o1, o2)
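A minimal MinHash sketch: each hash function is one random global order over the universe (the element names and the 1000-function estimate are illustrative):

    import random

    def make_minhash(universe, seed):
        rng = random.Random(seed)
        order = {e: rng.random() for e in universe}   # a random global order
        return lambda S: min(S, key=order.get)        # minimal member of S

    S1, S2 = {"a", "b", "c", "d"}, {"b", "c", "d", "e"}
    hs = [make_minhash(S1 | S2, seed) for seed in range(1000)]
    print(sum(h(S1) == h(S2) for h in hs) / len(hs))  # ~ Jaccard = 3/5 = 0.6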
SimHash – LSH Function for Angular Distance
- Each data object is a d dimensional vector
  - θ(x, y) is the angle between x and y
- Randomly generate a normal vector a, where each ai ~ N(0, 1)
- Let h(x; a) = sgn(aᵀx)
  - sgn(v) = 1 if v ≥ 0, and −1 if v < 0
  - i.e., which side of a’s corresponding hyperplane x lies on
- Now we compute Pr[h(o1) = h(o2)]
  - h(o1) ≠ h(o2) iff o1 and o2 lie on different sides of the hyperplane with a as its normal vector
  - Pr[h(o1) = h(o2)] = 1 − θ/π
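A minimal SimHash sketch with an empirical check of the collision probability (the two vectors and the 2000-function estimate are illustrative):

    import math, random

    def make_simhash(d, seed):
        rng = random.Random(seed)
        a = [rng.gauss(0, 1) for _ in range(d)]       # random normal vector
        return lambda x: 1 if sum(ai * xi for ai, xi in zip(a, x)) >= 0 else -1

    o1, o2 = (1.0, 0.0), (1.0, 1.0)                   # angle theta = pi/4
    hs = [make_simhash(2, seed) for seed in range(2000)]
    print(sum(h(o1) == h(o2) for h in hs) / len(hs),  # empirical
          1 - (math.pi / 4) / math.pi)                # theory: 0.75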
p-stable LSH – LSH Function for Euclidean Distance
- Each data object is a d dimensional vector
  - dist(x, y) = sqrt(Σi (xi − yi)²)
- Randomly generate a normal vector a, where each ai ~ N(0, 1)
  - The normal distribution is 2-stable, i.e., if ai ~ N(0, 1), then Σi ai·xi ~ N(0, ‖x‖₂²)
- Let h(x; a, b) = ⌊(aᵀx + b) / w⌋, where b ~ U(0, w) and w is a user-specified parameter
- Pr[h(o1; a, b) = h(o2; a, b)] = ∫₀ʷ (1/s) · f(u/s) · (1 − u/w) du, where s = dist(o1, o2)
  - f(·) is the pdf of the absolute value of a standard normal variable
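A minimal p-stable hash sketch following this definition (the dimension, bucket width, and test points are illustrative):

    import math, random

    def make_pstable(d, w, seed):
        rng = random.Random(seed)
        a = [rng.gauss(0, 1) for _ in range(d)]       # a_i ~ N(0, 1)
        b = rng.uniform(0, w)                         # b ~ U(0, w)
        return lambda x: math.floor(
            (sum(ai * xi for ai, xi in zip(a, x)) + b) / w)

    h = make_pstable(d=4, w=4.0, seed=0)
    print(h((1.0, 2.0, 0.0, 1.0)), h((1.1, 2.0, 0.1, 1.0)))
    # close points usually land in the same bucket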
- Intuition of p-stable LSH
  - Similar points have a higher chance of being hashed together
Pr[h(x) = h(y)] for Different Hash Functions
(Plots of the collision probability as a function of similarity/distance for MinHash, SimHash, and p-stable LSH.)
Problem of a Single Hash Function
- It is hard to distinguish two pairs whose distances are close to each other
  - Pr[h(o1) = h(o2)] ≥ p1, if dist(o1, o2) ≤ r1
  - Pr[h(o1) = h(o2)] ≤ p2, if dist(o1, o2) > r2
- We also want to control where the drastic change of the collision probability happens…
  - Close to dist(o*, q)
  - Or within a given range
AND-OR Composition
- Recall that for a single hash function we have
  - Pr[h(o1) = h(o2)] = f(dist(o1, o2)), denoted as p_{o1,o2} (p for short)
- Now we consider two scenarios (see the sketch below):
- Combine k hashes together, using an AND construction
  - One must match on all k hashes
  - Pr[H_AND(o1) = H_AND(o2)] = p^k
- Combine l hashes together, using an OR construction
  - One needs to match at least one of the l hashes
  - Pr[H_OR(o1) = H_OR(o2)] = 1 − (1 − p)^l
  - They fail to match only when all l hashes fail to match
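The composition curves as one-liners; with k = 5 and l = 5 the combined form reproduces the table in "The Effectiveness of LSH" below:

    def p_and(p, k):                # match all k hashes
        return p ** k

    def p_or(p, l):                 # match at least one of l hashes
        return 1 - (1 - p) ** l

    def p_and_or(p, k, l):          # l super-hashes of k hashes each
        return 1 - (1 - p ** k) ** l

    for p in (0.2, 0.4, 0.6, 0.7, 0.8, 0.9):
        print(p, round(p_and_or(p, k=5, l=5), 3))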
- Example with MinHash, k = 5, l = 5
(Plots of Pr[H_AND(o1) = H_AND(o2)], Pr[H_OR(o1) = H_OR(o2)], and Pr[h(o1) = h(o2)] as functions of the similarity.)
AND-OR Composition in LSH
- Let h_{i,j} be LSH functions, where i ∈ {1, 2, …, l}, j ∈ {1, 2, …, k}
- Let H_i(o) = [h_{i,1}(o), h_{i,2}(o), …, h_{i,k}(o)]
  - a super-hash
  - H_i(o1) = H_i(o2) ⟺ ∀j ∈ {1, 2, …, k}, h_{i,j}(o1) = h_{i,j}(o2)
- Consider a query q and any data point o; o is a nearest neighbor candidate of q if
  - ∃i ∈ {1, 2, …, l}, H_i(o) = H_i(q)
- The probability that o is a nearest neighbor candidate of q is
  - 1 − (1 − p_{o,q}^k)^l
The Effectiveness of LSH
- How 1 − (1 − p_{o,q}^k)^l changes with p_{o,q} (k = 5, l = 5):
- E.g., we are expected to retrieve 98.8% of the data with Jaccard > 0.9
p_{o,q}    1 − (1 − p_{o,q}^5)^5
0.2        0.002
0.4        0.050
0.6        0.333
0.7        0.601
0.8        0.863
0.9        0.988
False Positives and False Negatives
- False positive:
  - returned data with dist(o, q) > r2
- False negative:
  - data with dist(o, q) < r1 that is not returned
- Both can be controlled by carefully choosing k and l
  - It’s a trade-off between space/time and accuracy
The Framework of NNS using LSH
- Pre-processing
  - Generate the LSH functions
    - MinHash: random permutations
    - SimHash: random normal vectors
    - p-stable: random normal vectors and random uniform values
- Index
  - Compute H_i(o) for each data object o, i ∈ {1, …, l}
  - Index o using H_i(o) as the key in the i-th hash table
- Query
  - Compute H_i(q) for the query q, i ∈ {1, …, l}
  - Generate the candidate set {o | ∃i ∈ {1, …, l}, H_i(o) = H_i(q)}
  - Compute the actual distance for all the candidates and return the nearest one to the query
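A toy end-to-end index following this framework. LSHIndex is a hypothetical name; the sketch reuses make_pstable() and dist() from the earlier examples, so it assumes Euclidean distance and tuple-valued points:

    from collections import defaultdict

    class LSHIndex:
        def __init__(self, d, k, l, w=4.0):
            self.fns = [[make_pstable(d, w, seed=i * k + j) for j in range(k)]
                        for i in range(l)]
            self.tables = [defaultdict(list) for _ in range(l)]

        def super_hash(self, i, x):
            return tuple(h(x) for h in self.fns[i])   # H_i(x)

        def insert(self, o):
            for i, table in enumerate(self.tables):
                table[self.super_hash(i, o)].append(o)

        def query(self, q):
            cand = {o for i, table in enumerate(self.tables)
                    for o in table[self.super_hash(i, q)]}
            return min(cand, key=lambda o: dist(o, q), default=None)

    # Usage with the earlier toy data:
    #   idx = LSHIndex(d=2, k=4, l=8)
    #   for o in D: idx.insert(o)
    #   print(idx.query(q))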
The Drawback of LSH
- Concatenating k hashes is too “strong”
  - h_{i,j}(o) ≠ h_{i,j}(q) for any single j ⟹ H_i(o) ≠ H_i(q)
- Not adaptive to the distribution of the distances
  - What if there are not enough candidates?
- Need to tune w (or build indexes with different w’s) to handle different cases
Multi-Probe LSH
- Observation:
  - If q’s nearest neighbor does not fall into q’s hash bucket, then most likely it falls into a bucket adjacent to q’s
  - Why? Σi ai·qi − Σi ai·oi ~ N(0, ‖q − o‖₂²)
- Idea:
  - Look not only at the hash bucket that q falls into, but also at those adjacent to it
- Problem:
  - How many such buckets? 2k
  - And they are not equally important!
- Consider the case when k = 2:
  - Note that H_i(q) = (h_{i,1}(q), h_{i,2}(q))
- The ideal probe order would be:
  - (h_{i,1}(q), h_{i,2}(q)): 0.315
  - (h_{i,1}(q), h_{i,2}(q) − 1): 0.284
  - (h_{i,1}(q) + 1, h_{i,2}(q)): 0.150
  - (h_{i,1}(q) − 1, h_{i,2}(q)): 0.035
  - (h_{i,1}(q), h_{i,2}(q) + 1): 0.019
- We don’t have to compute the integration: use the offset between q’s raw projection (aᵀq + b) and the bucket boundaries instead.
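A simplified sketch of that idea: rank the ±1 perturbations of one super-hash by how close q's raw projection lies to each bucket boundary (a smaller gap means a more promising probe). This is a single-step heuristic, not the full query-directed probing of the Multi-Probe LSH paper:

    import math

    def probe_order(projections, w):
        # projections: the raw values a_j . q + b_j, j = 1..k, for one table
        probes = []
        for j, f in enumerate(projections):
            frac = f / w - math.floor(f / w)   # position inside the bucket
            probes.append((1 - frac, j, +1))   # gap to the upper boundary
            probes.append((frac, j, -1))       # gap to the lower boundary
        return [(j, step) for gap, j, step in sorted(probes)]

    print(probe_order([5.7, 9.1], w=4.0))
    # -> [(1, -1), (0, -1), (0, 1), (1, 1)]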
- Pros:
  - Requires a smaller l
    - Because the hash tables are used more efficiently
  - More robust against unlucky points
- Cons:
  - Loses the theoretical guarantee on the results
  - Not parallel-friendly
Collision Counting LSH (C2LSH)
- C2LSH (SIGMOD’12 paper)
  - Which one is closer to q?
  - We will omit the theoretical parts, which leads to a slightly different version from the paper
    - But the essential ideas are the same
  - Project 1 is to implement C2LSH using PySpark!
(Figure: the hash values of q and of two points o1, o2 under the m hash functions, used to compare their collision counts.)
Counting the Collisions
- Collision: a match on a single hash function
- Use the number of collisions to determine the candidates
  - Matching one of the super-hashes of q ⟹ colliding with q on at least αm hash values
- Recall that in LSH, the probability that o with dist(o, q) ≤ r1 is a nearest neighbor candidate of q is 1 − (1 − p1^k)^l
- Now we compute the corresponding probability with collision counting…
The Collision Probability
- For every o with dist(o, q) ≤ r1, we have
  - Pr[#collision(o) ≥ αm] = Σ_{i=αm}^{m} C(m, i) · p^i · (1 − p)^{m−i}
  - where p = Pr[h_j(o) = h_j(q)] ≥ p1
- We define m Bernoulli random variables X_i, 1 ≤ i ≤ m
  - Let X_i equal 1 if o does not collide with q on h_i
    - i.e., Pr[X_i = 1] = 1 − p
  - Let X_i equal 0 if o collides with q on h_i
    - i.e., Pr[X_i = 0] = p
  - Hence E[X_i] = 1 − p
  - Thus E[X̄] = 1 − p, where X̄ = (Σ_{i=1}^{m} X_i) / m
- Let t = p − α > 0; we have:
  - Pr[X̄ − E[X̄] ≥ t] = Pr[(Σ_{i=1}^{m} X_i)/m − (1 − p) ≥ t] = Pr[Σ X_i ≥ (1 − α)m]
- From Hoeffding’s inequality, we have
  - Pr[X̄ − E[X̄] ≥ t] = Pr[Σ X_i ≥ (1 − α)m] ≤ exp(−2(p − α)²m² / Σ_{i=1}^{m}(1 − 0)²) = exp(−2(p − α)²m) ≤ exp(−2(p1 − α)²m)
- Since the event “#collision(o) ≥ αm” is equivalent to the event “o misses the collision with q fewer than (1 − α)m times”,
  - Pr[#collision(o) ≥ αm] = Pr[Σ X_i < (1 − α)m] ≥ 1 − exp(−2(p1 − α)²m)
- Now you can compute the case for o with dist(o, q) > r2 in a similar way…
- Then we can set α accordingly to control false positives and false negatives
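A numeric check of this lower bound on the candidate probability; p1 and α here are illustrative values, chosen only to show how quickly the guarantee improves with m:

    import math

    p1, alpha = 0.7, 0.6
    for m in (50, 100, 200, 500):
        print(m, 1 - math.exp(-2 * (p1 - alpha) ** 2 * m))
    # 50 -> 0.632, 100 -> 0.865, 200 -> 0.982, 500 -> 0.99995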
Virtual Rehashing
- When we are not getting enough candidates…
  - E.g., # of candidates < k for a top-k query
- Observation:
  - A close point o usually falls into a hash bucket adjacent to q’s if it does not collide with q
  - Why? Σi ai·qi − Σi ai·oi ~ N(0, ‖q − o‖₂²)
- Idea:
  - Include the adjacent hash buckets into consideration
  - So you don’t need to re-hash the data…
- At first, only count h(o) = h(q) as a collision
- If there are not enough candidates, also count h(o) = h(q) ± 1
- Then h(o) = h(q) ± 2, and so on…
(Illustration: the collision condition is relaxed step by step, from h(o) = h(q) to |h(o) − h(q)| ≤ 1, then |h(o) − h(q)| ≤ 2, …)
The Framework of NNS using C2LSH
- Pre-processing
  - Generate the LSH functions
    - Random normal vectors and random uniform values
- Index
  - Compute and store h_i(o) for each data object o, i ∈ {1, …, m}
- Query
  - Compute h_i(q) for the query q, i ∈ {1, …, m}
  - Take each o that shares at least αm hashes with q as a candidate
  - Relax the collision condition (i.e., virtual rehashing) and repeat the above step until we get enough candidates
Pseudocode of Candidate Generation in C2LSH
candGen(data_hashes, query_hashes, αm, βn):
    offset ← 0
    cand ← ∅
    while true:
        for each (id, hashes) in data_hashes:
            if count(hashes, query_hashes, offset) ≥ αm:
                cand ← cand ∪ {id}
        if |cand| < βn:
            offset ← offset + 1
        else:
            break
    return cand
count(hashes_1, hashes_2, offset):
    counter ← 0
    for each (hash_1, hash_2) in zip(hashes_1, hashes_2):
        if |hash_1 − hash_2| ≤ offset:
            counter ← counter + 1
    return counter
Project 1
- The spec has been released; deadline: 18 Jul 2020
  - Late penalty: 10% on day 1 and 30% on each subsequent day
- Implement a light version of C2LSH (i.e., the one we introduced in the lecture)
- Start working ASAP
- Evaluation: correctness and efficiency
- Must use PySpark; some Python modules and PySpark functions are banned
  - E.g., numpy, pandas, collect(), take(), …
  - Use transformations!
- There will be a bonus part (max 20 points) to encourage efficient implementations
  - Details in the spec
- Make sure you have valid output
- Make your own test cases; a real dataset would be more desirable
  - The toy example in the spec is a real “toy” (e.g., for babies…)
- We won’t accept excuses like “it works on my own computer”
- Don’t violate the Student Conduct!!!
Product Quantization and K-Means Clustering
Recall: NNS in High Dimensional Euclidean Space
- Naïve (but exact) solution:
  - Linear scan: compute dist(o, q) for all o ∈ D
    - dist(o, q) = sqrt(Σ_{i=1}^{d} (oi − qi)²)
    - O(nd) time: n × (d subtractions + (d − 1) additions + d multiplications)
  - Storage is also costly: O(nd)
    - Could be problematic in DBMSs and distributed systems
- This motivates the idea of compression
Vector Quantization
- Idea: a compressed representation of the vectors
- Each vector o is represented by a representative
  - Denoted as c(o)
  - We will discuss how to get the representatives later
  - We control the total number of representatives for the dataset (denoted as k)
  - One representative represents multiple vectors
- Instead of storing o, we store its representative id
  - d floats ⟹ 1 integer
- Instead of computing dist(o, q), we compute dist(c(o), q)
  - We only need k distance computations!
How to Generate Representatives
- Assigning representatives is essentially a partitioning problem
  - Construct a “good” partition of a database of n objects into a set of k clusters
- How to measure the “goodness” of a given partitioning scheme?
  - Cost of a cluster
    - Cost(Ci) = Σ_{oj ∈ Ci} ‖oj − center(Ci)‖₂²
  - Cost of k clusters: the sum of the Cost(Ci)
Partitioning Problem: Basic Concept
- It’s an optimization problem!
- Global optimum:
  - NP-hard (for a wide range of cost functions)
  - Requires exhaustively enumerating all S(n, k) partitions
    - Stirling numbers of the second kind
    - S(n, k) ~ kⁿ / k! as n → ∞
- Heuristic methods:
  - k-means
  - Many variants
The k-Means Clustering Method
- Given k, the k-means algorithm is implemented in four steps (see the sketch below):
  1. Partition the objects into k nonempty subsets (randomly)
  2. Compute the seed points as the centroids of the clusters of the current partitioning (the centroid is the center, i.e., mean point, of the cluster)
  3. Assign each object to the cluster with the nearest seed point
  4. Go back to Step 2; stop when the assignment does not change
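A compact plain-Python sketch of these four steps, assuming the dist() helper from earlier and tuple-valued points (the project itself would use PySpark, not this single-machine loop):

    import random

    def kmeans(points, k, iters=100, seed=0):
        rng = random.Random(seed)
        assign = [rng.randrange(k) for _ in points]          # step 1: random partition
        for _ in range(iters):
            cents = []
            for j in range(k):                               # step 2: centroids
                members = ([p for p, a in zip(points, assign) if a == j]
                           or [rng.choice(points)])          # re-seed an empty cluster
                cents.append(tuple(sum(c) / len(members) for c in zip(*members)))
            new = [min(range(k), key=lambda j: dist(p, cents[j]))
                   for p in points]                          # step 3: reassign
            if new == assign:                                # step 4: converged
                break
            assign = new
        return cents, assign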
An Example of k-Means Clustering
(K = 2. The initial data set → arbitrarily partition the objects into k groups → update the cluster centroids → reassign each object to its nearest centroid → update the centroids again → loop until no reassignment is needed.)
Vector Quantization
- Encode the vectors
  - Generate a codebook W = {c1, …, ck} via k-means
  - Assign o to its nearest codeword in W
    - I.e., c(o) = ci, i ∈ {1, …, k}, such that dist(o, ci) ≤ dist(o, cj) for all j
  - Represent each vector o by (the id of) its assigned codeword
- Assume d = 256 and k = 2¹⁶
- Before: 4 bytes × 256 = 1024 bytes for each vector
- Now:
  - data: 16 bits = 2 bytes per vector
  - codebook: 4 × 256 × 2¹⁶ bytes (64 MB, shared by all vectors)
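A back-of-the-envelope check of these storage numbers (n, the dataset size, is an assumed value for illustration):

    d, k, n = 256, 2 ** 16, 1_000_000
    raw      = n * d * 4                # 4-byte floats: ~1.02 GB
    codes    = n * 2                    # 16-bit ids:    ~2 MB
    codebook = k * d * 4                # paid once:      64 MB
    print(raw, codes + codebook)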
Vector Quantization – Query Processing
- Given a query q, how do we find a point close to q?
- Algorithm:
  - Compute c(q), the codeword nearest to q
  - Candidate set C = all data vectors associated with c(q)
  - Verification: compute the distance between q and each oi ∈ C
    - Requires loading the vectors in C
- Any problem/improvement?
- Inverted index: a hash table that maps each codeword ci to the list of data vectors oj associated with it
Limitations of VQ
- To achieve better accuracy, a fine-grained quantizer with a large k is needed
- A large k:
  - Costly to run k-means
  - Computing c(q) is expensive: O(dk)
  - May need to look beyond the cell of c(q)
- Solution:
  - Product Quantization
Product Quantization
- Idea
  - Partition the dimensions into m partitions
  - Accordingly, a vector ⟹ m subvectors
  - Use a separate VQ with k codewords for each chunk
- Example:
  - An 8-dim vector decomposed into m = 2 subvectors
  - Each codebook has k = 4 codewords (i.e., the ci,j)
- Total space in bits:
  - Data: n · m · log₂(k)
  - Codebook: m · k · (d/m) · 32 = k · d · 32
Example of PQ
(Figure: each 8-dim vector is split into two 4-dim subvectors; each subvector is replaced by the 2-bit id of its nearest codeword ci,j in the corresponding codebook.)
Distance Estimation
- Euclidean distance between a query point q and a data point encoded as t
  - Restore the virtual joint center p by looking up each partition of t in the corresponding codebook
  - d²(q, p) = Σ_{j=1}^{d} (qj − pj)²
- Known as Asymmetric Distance Computation (ADC)
  - d²(q, t) = Σ_{i=1}^{m} ‖q(i) − c_{i,t(i)}‖², where q(i) is the i-th subvector of q
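A sketch of PQ encoding and ADC under these definitions, reusing the dist() helper from earlier (split, pq_encode, and adc are illustrative names, and m is assumed to divide d). In practice, the m × k sub-distances would be precomputed once per query as lookup tables rather than recomputed per point:

    def split(x, m):
        step = len(x) // m                       # assume m divides d
        return [x[i * step:(i + 1) * step] for i in range(m)]

    def pq_encode(o, codebooks):                 # codebooks[i]: list of codewords
        return tuple(min(range(len(cb)), key=lambda j: dist(sub, cb[j]))
                     for sub, cb in zip(split(o, len(codebooks)), codebooks))

    def adc(q, code, codebooks):                 # d^2(q, t) from above
        return sum(dist(sub, codebooks[i][code[i]]) ** 2
                   for i, sub in enumerate(split(q, len(codebooks))))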
Query Processing
- Compute the ADC for every point in the database
  - How?
- Candidates = the points with the k smallest asymmetric distances (k here being the number of requested candidates)
- [Optional] Re-ranking (if k > 1):
  - Load the candidates’ data vectors and compute the actual Euclidean distances
  - Return the one with the smallest distance
(Illustration: for two encoded points t1 and t2, the lookups give d²(q, t1) = ‖q(1) − c_{1,t1(1)}‖² + ‖q(2) − c_{2,t1(2)}‖² and d²(q, t2) = ‖q(1) − c_{1,t2(1)}‖² + ‖q(2) − c_{2,t2(2)}‖²; the sub-distances come from the codebook tables.)
- Pre-processing:
  - Step 1: partition the data vectors
  - Step 2: generate the codebooks (e.g., via k-means)
  - Step 3: encode the data
- Query:
  - Step 1: compute the distances between q and the codewords
  - Step 2: compute the asymmetric distance for each point and return the candidates
  - Step 3: re-ranking (optional)