Approximate Nearest Neighbor via Point-Location among Balls
Outline
- Problem and Motivation
- Related Work
- Background Techniques
- Method of Har-Peled (in notes)
Problem
- P is a set of points in a metric space.
- Build a data structure to efficiently search for an approximate nearest neighbor (ANN) in P.
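Not from the slides, but as a baseline for what follows: a brute-force exact nearest-neighbor query is a single scan over P, as in the sketch below (Euclidean metric assumed; names are illustrative). The data structures in this talk aim to answer approximate queries much faster than this per-query scan.

```python
from math import dist  # Euclidean distance; the slides allow any metric

def exact_nn(P, q):
    """Brute-force exact nearest neighbor: one distance evaluation per point of P.
    This is the naive baseline the ANN data structures try to beat."""
    return min(P, key=lambda p: dist(p, q))

# Example usage (illustrative points, not from the slides):
P = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
print(exact_nn(P, (0.9, 1.2)))  # -> (1.0, 1.0)
```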
Motivation
- Nearest neighbor search has many applications.
- Curse of dimensionality: the Voronoi diagram method is exponential in the dimension.
- Settle for approximate answers.
Related Work
- Indyk and Motwani
- Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality
- Reduced ANN to Approximate Point-Location among Equal Balls.
- Polynomial construction time.
- Sublinear query time.
Related Work
- Har-Peled
- A Replacement for Voronoi Diagrams of Near Linear Size
- Simplified and improved the Indyk-Motwani reduction.
- Better construction and query time.
Related Work
- Sabharwal, Sharma and Sen
- Nearest Neighbors Search using Point Location in Balls with applications to approximate Voronoi Decompositions
- Improved the number of balls by a logarithmic factor.
- Also gave a more involved construction that requires only O(n) balls.
Metric Spaces
- Pair (X,d)
- d: X × X ➝ [0,∞)
- d(x,y) = 0 iff x = y
- d(x,y) = d(y,x)
- d(x,y) + d(y,z) ≥ d(x,z)
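As a quick illustration (not from the slides), the sketch below checks these axioms on a finite point set with a given distance function; `is_metric` is an illustrative helper name.

```python
from itertools import product
from math import dist

def is_metric(X, d, tol=1e-12):
    """Check the metric axioms on a finite point set X with distance function d:
    non-negativity, d(x,y)=0 iff x=y, symmetry, and the triangle inequality."""
    for x, y in product(X, repeat=2):
        if d(x, y) < 0 or abs(d(x, y) - d(y, x)) > tol:
            return False
        if (d(x, y) == 0) != (x == y):
            return False
    return all(d(x, y) + d(y, z) >= d(x, z) - tol
               for x, y, z in product(X, repeat=3))

# Example: the Euclidean metric on three points in the plane.
print(is_metric([(0, 0), (1, 0), (0, 2)], dist))  # -> True
```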
Hierarchically well-Separated Tree (HST)
- Each vertex u has a label ∆_u ≥ 0.
- ∆_u = 0 iff u is a leaf.
- If a vertex u is a child of a vertex v, then ∆_u ≤ ∆_v.
- The distance between two leaves u,v is defined as ∆_{lca(u,v)}, where lca is the least common ancestor.
Hierarchically well-Separated Tree (HST)
- Each vertex u has a representative descendant leaf rep_u.
- rep_u ∈ {rep_v | v is a child of u}.
- If u is a leaf, then rep_u = u.
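A minimal sketch of an HST as a data structure, assuming only the definitions above; the class and field names (`HSTNode`, `delta`, `rep`) are my own, and the LCA is found by a naive walk to the root.

```python
class HSTNode:
    """One vertex of an HST: label delta (0 iff leaf), children, and a
    representative descendant leaf `rep` chosen among the children's reps."""
    def __init__(self, delta=0.0, children=(), point=None):
        self.delta = delta
        self.children = list(children)
        self.point = point                      # set for leaves only
        self.parent = None
        for c in self.children:
            c.parent = self
        # rep_u = u for a leaf, otherwise the rep of some child
        self.rep = self if not self.children else self.children[0].rep

def hst_distance(u, v):
    """Distance between two leaves of the same HST: the delta label of their
    lowest common ancestor (naive ancestor walk, fine for a sketch)."""
    ancestors = set()
    node = u
    while node is not None:
        ancestors.add(id(node))
        node = node.parent
    node = v
    while id(node) not in ancestors:
        node = node.parent
    return node.delta
```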
Metric t-approximation
- A metric N t-approximates a metric M if they are on the same set of points and d_M(x,y) ≤ d_N(x,y) ≤ t·d_M(x,y) for any points x,y.
Any n-point metric is 2(n-1)-approximated by some HST
First Step: Compute a 2-spanner
- Given a metric space M, a 2-spanner is a weighted graph G whose vertices are the points of M and whose shortest-path metric 2-approximates M.
- d_M(x,y) ≤ d_G(x,y) ≤ 2·d_M(x,y) for all x,y.
- Can be computed in O(n log n) time; details in Chapter 4.
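The construction itself is deferred to Chapter 4, so the sketch below only verifies the 2-spanner property for a given weighted graph, using Dijkstra over its adjacency lists; it is not the construction, and the function names are my own.

```python
import heapq

def shortest_paths(n, adj, src):
    """Dijkstra from src over adjacency lists adj[u] = [(v, w), ...]."""
    dist = [float("inf")] * n
    dist[src] = 0.0
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

def is_2_spanner(points, d, adj):
    """Check d_M(x,y) <= d_G(x,y) <= 2*d_M(x,y) for all pairs, where d_G is the
    shortest-path metric of the weighted graph given by adj."""
    n = len(points)
    for x in range(n):
        dg = shortest_paths(n, adj, x)
        for y in range(n):
            dm = d(points[x], points[y])
            if not (dm <= dg[y] + 1e-9 and dg[y] <= 2 * dm + 1e-9):
                return False
    return True
```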
Construct a HST which (n-1)-approximates the 2-spanner
- Compute the minimum spanning tree of G, the 2-spanner.
Construct a HST which (n-1)-approximates the 2-spanner
- Construct the HST using a variation of Kruskal's algorithm.
- Order the edges in non-decreasing order of weight.
Construct a HST which (n-1)-approximates the 2-spanner
- Start with n one-element HSTs.
Construct a HST which (n-1)-approximates the 2-spanner
- Add the edges one by one, and merge the corresponding HSTs by adding a parent node with ∆ label equal to (n-1) times the edge's weight.
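A sketch of this merge loop, assuming the `HSTNode` class from the earlier sketch and a simple union-find; the edge list would come from the 2-spanner (its minimum spanning tree suffices), and the function name is my own.

```python
def build_hst(points, edges):
    """Kruskal-style HST construction sketch. `edges` is a list of (weight, i, j)
    over point indices. Each merge adds a parent node whose delta label is
    (n-1) times the edge weight, as described in the slide above."""
    n = len(points)
    parent = list(range(n))                     # union-find over point indices
    trees = [HSTNode(point=p) for p in points]  # start with n one-element HSTs

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]       # path compression
            i = parent[i]
        return i

    for w, i, j in sorted(edges):               # non-decreasing edge weight
        ri, rj = find(i), find(j)
        if ri == rj:
            continue                            # endpoints already in the same HST
        merged = HSTNode(delta=(n - 1) * w, children=(trees[ri], trees[rj]))
        parent[rj] = ri
        trees[ri] = merged
    return trees[find(0)]                       # root of the final HST
```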
The HST (n-1)-approximates the 2-spanner
- Consider vertices x and y in the graph and the first edge e that connects their respective connected components.
The HST (n-1)-approximates the 2-spanner
- Let C be the connected component containing x and y after e is added.
- w(e) ≤ d_G(x,y) ≤ (|C|-1)·w(e) ≤ (n-1)·w(e) = d_H(x,y)
- Hence d_G(x,y) ≤ d_H(x,y) ≤ (n-1)·d_G(x,y).
Any n-point metric is 2(n-1)-approximated by some HST
Target Balls
- Let B be a set of balls such that the union of the balls in B contains the metric space M.
- For a point q in M, the target ball of q in B, denoted ⊙_B(q), is the smallest ball in B that contains q.
- We want to reduce ANN to target ball queries.
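For concreteness, a target ball query over an explicit list of (center, radius) pairs might look like the sketch below; the brute-force scan is only for illustration, not how an efficient structure would answer it.

```python
def target_ball(balls, q, d):
    """Return the smallest ball (center, radius) in `balls` that contains q,
    i.e. the target ball of q in this set; brute-force scan for illustration."""
    containing = [(c, r) for c, r in balls if d(c, q) <= r]
    return min(containing, key=lambda cr: cr[1]) if containing else None
```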
A Trivial Result — Using Balls to Find ANN
- Let B(P,r) be the set of balls of radius r around each point p in P.
- Let B be the union of B(P, (1+ε)^i), where i ranges from −∞ to ∞.
- For a point q, let p be the center of b = ⊙_B(q). Then p is a (1+ε)-ANN of q.
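A sketch of this trivial scheme; the unbounded range of exponents i is truncated to a finite interval just so the code runs, which is exactly the issue addressed in the next slides. The function name and the example data are illustrative.

```python
from math import dist

def trivial_ann(P, q, d, eps, i_lo=-30, i_hi=30):
    """(1+eps)-ANN via the target ball among balls of radius (1+eps)^i around
    every point of P. The i-range is truncated to [i_lo, i_hi] for this sketch;
    the slides use an unbounded range, which is what makes this 'trivial'."""
    balls = [(p, (1 + eps) ** i) for i in range(i_lo, i_hi + 1) for p in P]
    containing = [(c, r) for c, r in balls if d(c, q) <= r]
    center, _ = min(containing, key=lambda cr: cr[1])   # center of the target ball
    return center

# Example usage with the Euclidean metric (illustrative data):
print(trivial_ann([(0, 0), (5, 5), (1, 0)], (0.9, 0.1), dist, eps=0.1))  # -> (1, 0)
```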
A Trivial Result — Using Balls to Find ANN
- Let s be the nearest neighbor to q in P.
- Let r = d(s,q).
- Fix i such that (1+ε)^i < r ≤ (1+ε)^{i+1}.
- Radius of b > (1+ε)^i, since no ball of radius (1+ε)^i can contain q; and since the ball of radius (1+ε)^{i+1} around s contains q, the radius of b (the smallest ball containing q) is at most (1+ε)^{i+1}.
- d(s,q) ≤ d(p,q) ≤ (1+ε)^{i+1} ≤ (1+ε)·d(s,q)
What We Need to Fix
- This works, but has unbounded complexity.
- We want the number of balls we need to check to be linear.
- We first try limiting the range of the radii of the balls.
- First, we need to figure out how to handle a range of distances.
Near-Neighbor Data Structure (NNbr)
- Let d(q,P) be the infimum of d(q,p) over p ∈ P.
- NNbr(P,r) is a data structure such that, given a query point q, it can decide whether d(q,P) ≤ r.
- If d(q,P) ≤ r, NNbr(P,r) also returns a witness point p such that d(q,p) ≤ r.
(Figure: NNbr(P,r) returns p on query y.)
Near-Neighbor Data Structure (NNbr)
- Can be realized by n balls of radius r around the points of P.
- Perform target ball queries on this set of balls.
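A minimal NNbr sketch along these lines, answered by a linear scan over the n radius-r balls; the class name and interface are illustrative, not the notes' implementation.

```python
class NNbr:
    """Near-neighbor structure for a fixed radius r: on query q, report whether
    d(q, P) <= r, and if so return a witness p with d(q, p) <= r.
    Realized here as a scan over the n balls of radius r (sketch only)."""
    def __init__(self, P, r, d):
        self.P, self.r, self.d = list(P), r, d

    def query(self, q):
        for p in self.P:                  # target-ball query over equal balls
            if self.d(p, q) <= self.r:
                return p                  # witness point
        return None                       # d(q, P) > r
```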
Interval Near-Neighbor Data Structure
- An NNbr data structure with exponential jumps in range.
- N_i = NNbr(P, (1+ε)^i · a)
- M = log_{1+ε}(b/a)
- I(P,a,b,ε) = {N_0, ..., N_M}
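A sketch of building I(P,a,b,ε), reusing the NNbr sketch above; here M is rounded up to an integer so the list is finite, and the function name is my own.

```python
from math import ceil, log

def interval_nnbr(P, a, b, eps, d):
    """Build I(P, a, b, eps) = [N_0, ..., N_M] with N_i = NNbr(P, (1+eps)^i * a)
    and M = ceil(log_{1+eps}(b/a)); assumes b > a > 0 and eps > 0."""
    M = ceil(log(b / a, 1 + eps))
    return [NNbr(P, (1 + eps) ** i * a, d) for i in range(M + 1)]
```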
Interval Near-Neighbor Data Structure
- log_{1+ε}(b/a) = O(log(b/a)/log(1+ε)) = O(ε^{-1} log(b/a)) NNbr data structures.
- O(ε^{-1} n log(b/a)) balls.
Using Interval NNbr to find ANN
- First check the boundaries: O(1) NNbr queries, O(n) target ball queries.
- Then do a binary search on the M NNbr's. This is O(log(ε^{-1} log(b/a))) NNbr queries, or O(n log(ε^{-1} log(b/a))) target ball queries.
- Fast if b/a is small.
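A sketch of this query procedure, reusing the structures built by `interval_nnbr`: check the two boundary structures, then binary search for the smallest i whose NNbr answers yes. The boundary handling is simplified relative to the full method, and the (1+ε) guarantee only applies when d(q,P) falls inside the interval [a, b].

```python
def ann_query(structures, q):
    """Answer an ANN query with I(P, a, b, eps) (list of NNbr, increasing radii).
    If N_i answers yes and N_{i-1} answers no, the witness is within
    (1+eps)^i * a while d(q, P) > (1+eps)^(i-1) * a, so it is a (1+eps)-ANN
    for distances inside the interval."""
    if (p := structures[0].query(q)) is not None:
        return p                               # d(q, P) <= a: witness is close enough
    if structures[-1].query(q) is None:
        return None                            # d(q, P) > b: outside the interval
    lo, hi = 0, len(structures) - 1            # invariant: N_lo says no, N_hi says yes
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if structures[mid].query(q) is None:
            lo = mid
        else:
            hi = mid
    return structures[hi].query(q)             # witness from the smallest "yes" radius
```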
Faraway Clusters of Points
- Let Q be a set of m points.
- Let U be the union of the balls of radius r around the points of Q.
- Suppose U is connected.
Faraway Clusters of Points
- Any two points p,q in Q are at distance ≤ 2r(m-1) from each other.
- If d(q,Q) > 2mr/δ, then any point of Q is a (1+δ)-ANN of q in Q.
Faraway Clusters of Points
- Let s be the closest point in Q to q.
- Let p be any member of Q.
- 2mr/δ < d(q,s) ≤ d(q,p) ≤ d(q,s) + d(s,p) ≤ d(q,s) + 2mr ≤ (1+δ)·d(q,s)