
Dimension Reduction and Nearest Neighbor Search

Advanced Algorithms, Nanjing University, Fall 2018

Dimension reduction: why do we care? High-dimensional data are common, yet working on them directly is expensive.


  1. Lemma. Let $k \in O(\epsilon^{-2} \cdot \log n)$ and $A \in \mathbb{R}^{k \times d}$, where each entry of $A$ is chosen i.i.d. from $\mathcal{N}(0, 1/k)$. Then for any unit vector $u \in \mathbb{R}^d$: $\Pr\left[\left| \|Au\|_2^2 - 1 \right| > \epsilon\right] < 1/n^3$.

  2. Proof of the lemma:
     • Each $A_{ij}$ is chosen i.i.d. from $\mathcal{N}(0, 1/k)$.
     • A linear combination of independent Gaussian r.v.s is also Gaussian: $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$, $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$ $\Rightarrow$ $aX + bY \sim \mathcal{N}(a\mu_X + b\mu_Y,\ a^2\sigma_X^2 + b^2\sigma_Y^2)$.
     • $u$ is a unit vector.
     Hence each $(Au)_i = \sum_{j=1}^{d} A_{ij} u_j \sim \mathcal{N}(0, 1/k)$. Moreover, these $(Au)_i$ are mutually independent!

  6. $\mathbb{E}\left[\|Au\|_2^2\right] = \sum_{i=1}^{k} \mathbb{E}\left[(Au)_i^2\right] = k \cdot (1/k) = 1$. In terms of expectation we are fine, but how fast do we deviate from the expectation?

  11. Chernoff bound for the $\chi^2$-distribution: for i.i.d. $X_1, X_2, \dots, X_k \sim \mathcal{N}(0,1)$ and $0 < \epsilon < 1$, $\Pr\left[\left|\frac{1}{k}\sum_{i=1}^{k} X_i^2 - 1\right| > \epsilon\right] < 2e^{-k\epsilon^2/8}$. Notice $X_i = \sqrt{k} \cdot Y_i$, where $Y_i = (Au)_i \sim \mathcal{N}(0, 1/k)$.

  13. Therefore $\Pr\left[\left| \|Au\|_2^2 - 1 \right| > \epsilon\right] = \Pr\left[\left|\frac{1}{k}\sum_{i=1}^{k} X_i^2 - 1\right| > \epsilon\right] < 2e^{-k\epsilon^2/8} < 1/n^3$ for suitable $k \in O(\epsilon^{-2} \cdot \log n)$.

  14. Chernoff bound for the $\chi^2$-distribution: for i.i.d. $X_1, X_2, \dots, X_k \sim \mathcal{N}(0,1)$ and $0 < \epsilon < 1$, $\Pr\left[\left|\frac{1}{k}\sum_{i=1}^{k} X_i^2 - 1\right| > \epsilon\right] < 2e^{-k\epsilon^2/8}$.

  19. Moment generating function: if $X \sim \mathcal{N}(0,1)$ and $s < 1/2$, then $\mathbb{E}\left[e^{sX^2}\right] = \frac{1}{\sqrt{1-2s}}$.

  23. The proof applies Markov's inequality to $e^{\lambda \sum_i X_i^2}$ and bounds the moment generating function at $s = \lambda$; the bound used holds when $\lambda \le 1/4$.

  24. Finally, let $\lambda = \epsilon/4$.
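The steps on slides 19 to 24 can be assembled into the upper-tail calculation below. This is a reconstruction under assumptions: the slides' intermediate formulas did not survive extraction, and the specific inequality $(1-2\lambda)^{-1/2} \le e^{\lambda + 2\lambda^2}$ is assumed to be the step that the slide qualifies with "when $\lambda \le 1/4$".

```latex
% Upper tail, for X_1, ..., X_k i.i.d. N(0,1) and 0 < \lambda \le 1/4.
\begin{align*}
\Pr\Big[\tfrac{1}{k}\sum_{i=1}^{k} X_i^2 > 1+\epsilon\Big]
  &= \Pr\Big[e^{\lambda \sum_i X_i^2} > e^{\lambda k (1+\epsilon)}\Big] \\
  &\le e^{-\lambda k (1+\epsilon)} \prod_{i=1}^{k} \mathbb{E}\big[e^{\lambda X_i^2}\big]
     && \text{(Markov's inequality, independence)} \\
  &= e^{-\lambda k (1+\epsilon)} \, (1-2\lambda)^{-k/2}
     && \text{(moment generating function)} \\
  &\le e^{-\lambda k (1+\epsilon)} \, e^{k(\lambda + 2\lambda^2)}
     && \text{(assumed step, valid for } \lambda \le 1/4\text{)} \\
  &= e^{k(2\lambda^2 - \lambda\epsilon)}
   = e^{-k\epsilon^2/8}
     && \text{(setting } \lambda = \epsilon/4\text{)}
\end{align*}
% The lower tail is symmetric: apply Markov's inequality to e^{-\lambda \sum_i X_i^2}
% and use (1+2\lambda)^{-1/2} \le e^{-\lambda + \lambda^2}, which gives
% e^{k(\lambda^2 - \lambda\epsilon)} = e^{-3k\epsilon^2/16} \le e^{-k\epsilon^2/8}.
% A union bound over the two tails yields the factor 2.
```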

  25. Theorem (Johnson-Lindenstrauss 1984): $\forall\, 0 < \epsilon < 1$, for any set $S$ of $n$ points from $\mathbb{R}^d$, there is a $\phi: \mathbb{R}^d \to \mathbb{R}^k$ with $k \in O(\epsilon^{-2} \cdot \log n)$, such that $\forall x_i, x_j \in S$: $(1-\epsilon)\|x_i - x_j\|_2^2 \le \|\phi(x_i) - \phi(x_j)\|_2^2 \le (1+\epsilon)\|x_i - x_j\|_2^2$.
      "JLT states that in Euclidean space, it is always possible to embed a set of $n$ points in arbitrary dimension into $O(\log n)$ dimensions with constant distortion."
      "Even better, it is very easy to find such a $\phi$: just sample a random $k \times d$ matrix $A$."
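The random-matrix construction can be tried out with a short Python sketch. It assumes the map is $\phi(x) = Ax$ with i.i.d. $\mathcal{N}(0, 1/k)$ entries, as in the lemma. The function names and the pure-Python list representation are illustrative choices, not part of the slides.

```python
import math
import random


def sample_jl_matrix(k, d, rng):
    """Sample a k x d matrix with i.i.d. N(0, 1/k) entries (std dev 1/sqrt(k))."""
    sigma = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Compute the matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in matrix]


def sq_dist(x, y):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def max_distortion(points, matrix):
    """Largest |ratio - 1| over all pairs, where ratio compares the
    squared distance after projection with the one before."""
    images = [project(matrix, p) for p in points]
    worst = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            ratio = sq_dist(images[i], images[j]) / sq_dist(points[i], points[j])
            worst = max(worst, abs(ratio - 1.0))
    return worst
```

For a set of points, `max_distortion(points, sample_jl_matrix(k, d, rng))` plays the role of $\epsilon$; it should shrink roughly like $1/\sqrt{k}$ as $k$ grows.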

  26. Nearest Neighbor Search (NNS)
      Setup: metric space $(X, \mathrm{dist})$, i.e. a set $X$ with a distance function satisfying the triangle inequality.
      Data: $n$ points $y_1, y_2, \dots, y_n \in X$.
      Query: given a point $\vec{x} \in X$, find the $y_i$ which is closest to $\vec{x}$.

  29. NNS finds many applications in:
      • database systems
      • pattern recognition
      • machine learning
      • bioinformatics
      • …

  31. Nearest Neighbor Search (NNS)
      Data: $n$ points $y_1, y_2, \dots, y_n \in U^d$ for some finite $U$.
      Query: given a point $\vec{x} \in U^d$, find the $y_i$ which is closest to $\vec{x}$.
      Goal: efficiently answer the query.
      Which measures of efficiency do we care about? Usually space and time.
      Trivial solution: no preprocessing, just linear search.
      When the dimension $d$ is small:
      • binary search when $d = 1$
      • $k$-d tree
      • Voronoi diagram
      • …

  34. What if the dimension $d$ is large, say $d \gg \log n$?
      Curse of dimensionality: it is conjectured that solving NNS in high dimension requires either super-polynomial($n$) space or super-polynomial($d$) time.
      Blessing: randomization + approximation.

  35. Approximate Near(est) Neighbor (ANN)
      Setup: metric space $(X, \mathrm{dist})$. Data: $n$ points $y_1, y_2, \dots, y_n \in X$. Query: given a point $\vec{x} \in X$:
      $c$-ANN (Approximate Nearest Neighbor): return a $y_i$ s.t. $\mathrm{dist}(\vec{x}, y_i) \le c \cdot \min_{1 \le j \le n} \mathrm{dist}(\vec{x}, y_j)$.
      $(c, r)$-ANN (Approximate Near Neighbor):
      • Return a $y_i$ s.t. $\mathrm{dist}(\vec{x}, y_i) \le cr$ if $\exists y_j: \mathrm{dist}(\vec{x}, y_j) \le r$
      • Return "no" if $\forall y_j: \mathrm{dist}(\vec{x}, y_j) > cr$
      • Arbitrary answer otherwise

  37. If we can solve $(c, r)$-ANN, then we can solve $c$-ANN with little overhead.

  38. Let $D_{\min} = \min_{1 \le i < j \le n} \mathrm{dist}(y_i, y_j)$ and $D_{\max} = \max_{1 \le i < j \le n} \mathrm{dist}(y_i, y_j)$, and consider the radii $R = \left\{ \frac{D_{\min}}{2} \cdot (\sqrt{c})^{-1},\ \frac{D_{\min}}{2} \cdot (\sqrt{c})^{0},\ \frac{D_{\min}}{2} \cdot (\sqrt{c})^{1},\ \dots,\ D_{\max} \right\}$.

  40. Let $r^*$ be the min in $R$ s.t. $(\sqrt{c}, r^*)$-ANN returns yes with $y^*$. Then $\mathrm{dist}(\vec{x}, y^*) \le \sqrt{c} \cdot r^*$, and for every data point $y_j$: $\mathrm{dist}(\vec{x}, y_j) > r^*/\sqrt{c}$, because the query at the next smaller radius answered "no". Hence $\mathrm{dist}(\vec{x}, y^*) < c \cdot \min_j \mathrm{dist}(\vec{x}, y_j)$.

  41. If $\forall r$: $(\sqrt{c}, r)$-ANN can be solved with space $s$ and query time $t$, then $c$-ANN can be solved with space $O\left(s \cdot \log_{\sqrt{c}} \frac{D_{\max}}{D_{\min}}\right)$ and query time $O\left(t \cdot \log_2 \log_{\sqrt{c}} \frac{D_{\max}}{D_{\min}}\right)$.
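The reduction from $c$-ANN to near-neighbor queries can be sketched in Python as a binary search over a geometric sequence of radii. The sketch assumes an oracle `near_neighbor(query, r)` that solves $(\sqrt{c}, r)$-ANN and returns `None` for "no". It also assumes distinct data points and a query within $D_{\max}$ of some data point. The names `c_ann` and `brute_force_oracle` are hypothetical, and the brute-force oracle only serves as a stand-in for a real data structure.

```python
import math


def brute_force_oracle(points, dist, c):
    """Stand-in (sqrt(c), r)-ANN oracle: exact scan, returns None for "no"."""
    root = math.sqrt(c)

    def near_neighbor(query, r):
        best = min(points, key=lambda y: dist(query, y))
        return best if dist(query, best) <= root * r else None

    return near_neighbor


def c_ann(query, points, dist, c, near_neighbor):
    """Answer c-ANN using O(log log) calls to a (sqrt(c), r)-ANN oracle."""
    pairwise = [dist(a, b) for i, a in enumerate(points) for b in points[i + 1:]]
    d_min, d_max = min(pairwise), max(pairwise)
    if d_min <= 0:
        raise ValueError("data points must be distinct")
    root = math.sqrt(c)

    # Radii D_min/2 * sqrt(c)^(-1), D_min/2 * sqrt(c)^0, ..., capped by D_max.
    # (A real index would precompute these once, not per query.)
    radii, r = [], d_min / (2 * root)
    while r < d_max:
        radii.append(r)
        r *= root
    radii.append(d_max)

    # A hit at the smallest radius is within D_min/2, hence the exact nearest point.
    best = near_neighbor(query, radii[0])
    if best is not None:
        return best
    best = near_neighbor(query, radii[-1])
    if best is None:
        return None  # query is farther than D_max from all data: outside the assumption

    # Invariant: radii[lo] answered "no", radii[hi] answered yes with `best`.
    lo, hi = 0, len(radii) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        answer = near_neighbor(query, radii[mid])
        if answer is None:
            lo = mid
        else:
            hi, best = mid, answer
    return best
```

The binary search only needs one adjacent pair of radii where the smaller answers "no" and the larger answers yes, so it tolerates arbitrary answers in the gray zone between $r$ and $\sqrt{c} \cdot r$.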

  43. $(c, r)$-ANN in Hamming space
      Setup: consider Hamming space $\{0,1\}^d$. Data: $n$ points $y_1, y_2, \dots, y_n \in \{0,1\}^d$. Query: given a point $\vec{x} \in \{0,1\}^d$:
      • Return a $y_i$ s.t. $\mathrm{dist}(\vec{x}, y_i) \le cr$ if $\exists y_j: \mathrm{dist}(\vec{x}, y_j) \le r$
      • Return "no" if $\forall y_j: \mathrm{dist}(\vec{x}, y_j) > cr$
      • Arbitrary answer otherwise
      Let $k$, $p$ and $s$ be fixed later.
      Sample a $k \times d$ Boolean matrix $A$ with i.i.d. entries from $\mathrm{Bernoulli}(p)$.
      For $i = 1, 2, \dots, n$: let $z_i = A y_i \in \{0,1\}^k$ over the finite field GF(2).
      Store all $s$-balls $B_s(u) = \{ y_i : \mathrm{dist}(u, z_i) \le s \}$ for all $u \in \{0,1\}^k$.
      Now, upon a query $\vec{x} \in \{0,1\}^d$: retrieve $B_s(A\vec{x})$. If $B_s(A\vec{x}) = \emptyset$ return "no", else return any $y_i \in B_s(A\vec{x})$.
      GF(2): two elements $\{0,1\}$, XOR as sum, AND as multiplication. Therefore $(z_i)_j = (A y_i)_j = \sum_{l=1}^{d} A_{jl} \cdot (y_i)_l \bmod 2$.
      Space: $O(n \cdot 2^k)$. Query time: $O(dk)$ computation + $O(1)$ memory access.

  46. For suitable $k \in O(\log n)$, $p$ and $s$, $(c, r)$-ANN is solved w.h.p.

  47. Analysis. Random $k \times d$ Boolean matrix $A$ with i.i.d. entries $\in \mathrm{Bernoulli}(p)$; computation on GF(2): $(A\vec{x})_i = \sum_{j=1}^{d} A_{ij} \cdot \vec{x}_j \bmod 2$. Goal: for suitable $k \in O(\log n)$, $p$ and $s$, $\forall \vec{x}, \vec{y} \in \{0,1\}^d$: w.h.p. $\mathrm{dist}(A\vec{x}, A\vec{y}) \le s$ when $\mathrm{dist}(\vec{x}, \vec{y}) \le r$, and $\mathrm{dist}(A\vec{x}, A\vec{y}) > s$ when $\mathrm{dist}(\vec{x}, \vec{y}) > cr$.

  51. Each row vector $A_i$ of $A$ has i.i.d. entries $\in \mathrm{Bernoulli}(p)$. An alternative view regarding the generation of $A_i$:
      • build $C \subseteq [d]$ s.t. each element in $[d]$ is chosen independently with pr. $2p$
      • each coordinate in $C$ is independently set to 0 or 1, each with pr. $1/2$ (coordinates outside $C$ are 0)
      Observations:
      • if $j \notin C$ for all coordinates $j$ where $\vec{x}_j \ne \vec{y}_j$, then $(A\vec{x})_i = (A\vec{y})_i$
      • otherwise, if there exists such a $j \in C$, then once all other entries in $A_i$ are fixed, exactly one of the two choices for $A_{ij}$ will make $(A\vec{x})_i = (A\vec{y})_i$
      Hence $\Pr\left[(A\vec{x})_i \ne (A\vec{y})_i\right] = \frac{1}{2}\left(1 - (1-2p)^{\mathrm{dist}(\vec{x}, \vec{y})}\right)$.

  52. Choose $p$ to satisfy $1 - 2p = 2^{-1/r}$.

  56. $\mathrm{dist}(A\vec{x}, A\vec{y}) = X = \sum_{i=1}^{k} X_i$, where $X_i = 1$ if $(A\vec{x})_i \ne (A\vec{y})_i$ and $X_i = 0$ otherwise; the $X_i$ are independent. Choose $s = \frac{\frac{1}{4} + \left(\frac{1}{2} - 2^{-(c+1)}\right)}{2} \cdot k = \left(\frac{3}{8} - 2^{-(c+2)}\right) k$.

  59. Chernoff bound: let $X_1, X_2, \dots, X_k \in \{0,1\}$ be independent r.v.s and let $X = \sum_{i=1}^{k} X_i$; then for $t > 0$: $\Pr[X \ge \mathbb{E}[X] + t] \le \exp\left(-\frac{2t^2}{k}\right)$ and $\Pr[X \le \mathbb{E}[X] - t] \le \exp\left(-\frac{2t^2}{k}\right)$.
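The choices of $p$ and $s$ above can be checked with a short worked calculation. It is a reconstruction under assumptions: the slides' own intermediate formulas were lost in extraction, and the last line assumes $k = \ln n / \left(\frac{1}{8} - 2^{-(c+2)}\right)^2$.

```latex
% Per-coordinate disagreement probability with 1 - 2p = 2^{-1/r}:
\begin{align*}
\mathrm{dist}(\vec{x},\vec{y}) \le r
  &\;\Rightarrow\; \Pr[X_i = 1] = \tfrac12\big(1 - 2^{-\mathrm{dist}(\vec{x},\vec{y})/r}\big) \le \tfrac14
  &&\Rightarrow\; \mathbb{E}[X] \le \tfrac14 k, \\
\mathrm{dist}(\vec{x},\vec{y}) > cr
  &\;\Rightarrow\; \Pr[X_i = 1] > \tfrac12\big(1 - 2^{-c}\big) = \tfrac12 - 2^{-(c+1)}
  &&\Rightarrow\; \mathbb{E}[X] > \big(\tfrac12 - 2^{-(c+1)}\big) k.
\end{align*}
% The threshold s = (3/8 - 2^{-(c+2)}) k is the midpoint of the two bounds,
% so in both cases X must deviate from E[X] by at least
% t = (1/8 - 2^{-(c+2)}) k to land on the wrong side of s:
\begin{align*}
\Pr[\text{wrong side of } s]
  \le \exp\!\Big(-\frac{2t^2}{k}\Big)
  = \exp\!\Big(-2\big(\tfrac18 - 2^{-(c+2)}\big)^2 k\Big)
  = n^{-2}
  \quad\text{when } k = \frac{\ln n}{\big(\tfrac18 - 2^{-(c+2)}\big)^2}.
\end{align*}
```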

  62. $(c, r)$-ANN in Hamming space, with parameters fixed
      Setup: consider Hamming space $\{0,1\}^d$. Data: $n$ points $y_1, y_2, \dots, y_n \in \{0,1\}^d$. Query: given a point $\vec{x} \in \{0,1\}^d$:
      • Return a $y_i$ s.t. $\mathrm{dist}(\vec{x}, y_i) \le cr$ if $\exists y_j: \mathrm{dist}(\vec{x}, y_j) \le r$
      • Return "no" if $\forall y_j: \mathrm{dist}(\vec{x}, y_j) > cr$
      • Arbitrary answer otherwise
      Let $k = \frac{\ln n}{\left(1/8 - 2^{-(c+2)}\right)^2}$, $p = \frac{1 - 2^{-1/r}}{2}$ and $s = \left(\frac{3}{8} - 2^{-(c+2)}\right) k$.
      Sample a $k \times d$ Boolean matrix $A$ with i.i.d. entries from $\mathrm{Bernoulli}(p)$.
      For $i = 1, 2, \dots, n$: let $z_i = A y_i \in \{0,1\}^k$ over the finite field GF(2).
      Store all $s$-balls $B_s(u) = \{ y_i : \mathrm{dist}(u, z_i) \le s \}$ for all $u \in \{0,1\}^k$.
      Now, upon a query $\vec{x} \in \{0,1\}^d$: retrieve $B_s(A\vec{x})$. If $B_s(A\vec{x}) = \emptyset$ return "no", else return any $y_i \in B_s(A\vec{x})$.
      Space: $O(n \cdot 2^k) = n^{O(1)}$. Query time: $O(dk) = O(d \log n)$ computation + $O(1)$ memory access. This solves $(c, r)$-ANN w.h.p.
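A minimal Python sketch of this scheme follows. One deliberate deviation, stated as an assumption: instead of precomputing all $2^k$ balls $B_s(u)$, the query scans the stored sketches, which returns the same answers but trades the $O(1)$ lookup for an $O(nk)$ scan. The names `parameters`, `sketch` and `HammingANN` are illustrative and do not come from the slides.

```python
import math
import random


def parameters(n, c, r):
    """k, p, s as chosen on the slide (k rounded up to an integer)."""
    gap = 1 / 8 - 2 ** -(c + 2)
    k = math.ceil(math.log(n) / gap ** 2)
    p = (1 - 2 ** (-1 / r)) / 2
    s = (3 / 8 - 2 ** -(c + 2)) * k
    return k, p, s


def sample_matrix(k, d, p, rng):
    """k x d Boolean matrix with i.i.d. Bernoulli(p) entries."""
    return [[1 if rng.random() < p else 0 for _ in range(d)] for _ in range(k)]


def sketch(matrix, x):
    """A x over GF(2): AND as multiplication, XOR (sum mod 2) as addition."""
    return tuple(sum(a & b for a, b in zip(row, x)) % 2 for row in matrix)


def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))


class HammingANN:
    def __init__(self, points, k, p, s, rng):
        self.points = points
        self.s = s
        self.matrix = sample_matrix(k, len(points[0]), p, rng)
        self.sketches = [sketch(self.matrix, y) for y in points]

    def query(self, x):
        """Return any y_i whose sketch is within distance s of A x, else None ("no")."""
        u = sketch(self.matrix, x)
        for y, z in zip(self.points, self.sketches):
            if hamming(u, z) <= self.s:
                return y
        return None
```

For example, `parameters(100, 2, 4)` gives `k = 1179`, which illustrates that the constant hidden in $k \in O(\log n)$ is large for small $c$.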

  64. Locality-Sensitive Hashing (LSH)
      Given a metric space $(X, \mathrm{dist})$, a random $h: X \to U$ drawn from $\mathcal{H}$ is an $(r, cr, p, q)$-LSH if, for all $\vec{x}, \vec{y} \in X$:
      • $\mathrm{dist}(\vec{x}, \vec{y}) \le r \Rightarrow \Pr[h(\vec{x}) = h(\vec{y})] \ge p$
      • $\mathrm{dist}(\vec{x}, \vec{y}) > cr \Rightarrow \Pr[h(\vec{x}) = h(\vec{y})] \le q$
      where $p > q$.

  66. If there exists an $(r, cr, p, q)$-LSH $h: X \to U$, then there exists an $(r, cr, p^k, q^k)$-LSH $g: X \to U^k$.
      Construction: independently draw $h_1, h_2, \dots, h_k$ according to the distribution of $h$, and let $g(x) = (h_1(x), h_2(x), \dots, h_k(x)) \in U^k$.

  68. $(c, r)$-ANN with a single table
      Setup: metric space $(X, \mathrm{dist})$. Data: $n$ points $y_1, y_2, \dots, y_n \in X$. Query: given a point $\vec{x} \in X$:
      • Return a $y_i$ s.t. $\mathrm{dist}(\vec{x}, y_i) \le cr$ if $\exists y_j: \mathrm{dist}(\vec{x}, y_j) \le r$
      • Return "no" if $\forall y_j: \mathrm{dist}(\vec{x}, y_j) > cr$
      • Arbitrary answer otherwise
      Suppose we have an $(r, cr, p^*, 1/n)$-LSH $g: X \to U$.
      Store $y_1, y_2, \dots, y_n$ in nondecreasing order of $g(y_i)$.
      Upon query $\vec{x} \in X$: find all $y_i$ such that $g(\vec{x}) = g(y_i)$ by binary search. If we encounter some $y_i$ such that $\mathrm{dist}(\vec{x}, y_i) \le cr$ then return this $y_i$; otherwise return "no".
      If the real answer is "no": always correct. If the real answer is "yes": correct with probability at least $p^*$.
      Space: $O(n)$. Time: $O(\log n)$ + $O(1)$ in expectation.

  73. $(c, r)$-ANN with multiple tables
      $(c, r)$-ANN in metric space $(X, \mathrm{dist})$, given an $(r, cr, p^*, 1/n)$-LSH $g: X \to U$. Data: $n$ points $y_1, y_2, \dots, y_n \in X$. Query: some point $\vec{x} \in X$.
      Let $\ell = 1/p^*$, and independently draw $g_1, g_2, \dots, g_\ell$.
      Maintain $\ell$ sorted tables: for $j = 1, 2, \dots, \ell$, store $y_1, y_2, \dots, y_n$ in table-$j$ in nondecreasing order of $g_j(y_i)$.
      Upon query $\vec{x} \in X$: find the first $10 \cdot \ell$ such $y_i$ that $\exists j: g_j(\vec{x}) = g_j(y_i)$ by binary search. If we encounter some $y_i$ such that $\mathrm{dist}(\vec{x}, y_i) \le cr$ then return this $y_i$; otherwise return "no".
      If the real answer is "no": always correct.
      If there exists $y_s$ such that $\mathrm{dist}(\vec{x}, y_s) \le r$, then $\Pr[\text{answer "no"}] \le$ …
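The multi-table scheme can be sketched in Python as follows. The hash family is an assumption that the slides do not name: bit sampling on Hamming space, where $h$ reads one uniformly random coordinate and $g$ concatenates $k$ independent copies, as in the amplification construction. The names `draw_hash` and `LSHIndex` are illustrative.

```python
import bisect
import random


def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))


def draw_hash(d, k, rng):
    """Assumed family: g(x) = (x[i_1], ..., x[i_k]) for k random coordinates."""
    coords = [rng.randrange(d) for _ in range(k)]
    return lambda x: tuple(x[i] for i in coords)


class LSHIndex:
    def __init__(self, points, hashes):
        self.points = points
        self.hashes = hashes
        # One table per hash function, sorted by hash value (ties by index).
        self.tables = [
            sorted((g(y), i) for i, y in enumerate(points)) for g in hashes
        ]

    def query(self, x, c, r, dist):
        """Inspect at most 10 * (number of tables) colliding points and
        return one within distance c * r, or None for "no"."""
        budget = 10 * len(self.hashes)
        for g, table in zip(self.hashes, self.tables):
            key = g(x)
            # Binary search for the first entry whose hash value equals key.
            pos = bisect.bisect_left(table, (key, -1))
            while pos < len(table) and table[pos][0] == key:
                if budget == 0:
                    return None
                budget -= 1
                y = self.points[table[pos][1]]
                if dist(x, y) <= c * r:
                    return y
                pos += 1
        return None
```

Because every candidate is verified with `dist` before it is returned, a "no" instance is always answered correctly; only "yes" instances can fail, when no table produces a collision with a near point within the budget.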
