Approximate Nearest Line Search in High Dimensions
Sepideh Mahabadi

The NLS Problem
• Given: a set $L$ of $N$ lines in $\mathbb{R}^d$
• Goal: build a data structure such that
– given a query point $q$, it finds the closest line $\ell^*$ to $q$
– it uses polynomial space
– it has sub-linear query time

Approximation
• Find an approximate closest line $\ell$, i.e., $\mathrm{dist}(q, \ell) \le (1+\epsilon)\,\mathrm{dist}(q, \ell^*)$
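The sketch below gives a baseline for the problem statement. It solves NLS exactly by linear scan and checks the $(1+\epsilon)$ condition. It assumes that each line is stored as a pair (point $a$, nonzero direction $u$); that choice is ours, not the slides'. The function names are hypothetical. Its $O(Nd)$ query time is the cost that a sub-linear data structure has to beat.

```python
import math
from typing import List, Sequence, Tuple

# Assumption of this sketch: a line is stored as (point a, direction u) with u != 0,
# i.e. it is the set {a + s*u : s in R}.
Line = Tuple[Sequence[float], Sequence[float]]


def dist_point_line(q: Sequence[float], line: Line) -> float:
    """Euclidean distance from point q to the line {a + s*u}."""
    a, u = line
    w = [qi - ai for qi, ai in zip(q, a)]                               # q - a
    # s is the parameter of the orthogonal projection of q onto the line
    s = sum(wi * ui for wi, ui in zip(w, u)) / sum(ui * ui for ui in u)
    return math.sqrt(sum((wi - s * ui) ** 2 for wi, ui in zip(w, u)))


def nearest_line_linear_scan(q: Sequence[float], lines: List[Line]) -> int:
    """Exact NLS by brute force: O(N * d) work per query."""
    return min(range(len(lines)), key=lambda i: dist_point_line(q, lines[i]))


def is_approx_closest(q: Sequence[float], lines: List[Line], i: int, eps: float) -> bool:
    """True if lines[i] satisfies dist(q, l) <= (1 + eps) * dist(q, l*)."""
    best = min(dist_point_line(q, line) for line in lines)
    return dist_point_line(q, lines[i]) <= (1 + eps) * best


if __name__ == "__main__":
    lines = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),   # the x-axis
             ((0.0, 5.0, 0.0), (0.0, 0.0, 1.0))]   # vertical line through (0, 5, 0)
    q = (3.0, 1.0, 0.0)
    print(nearest_line_linear_scan(q, lines))      # 0 (distances are 1 and 5)
```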

BACKGROUND
• Nearest Neighbor Problems
• Motivation
• Previous Work
• Our Result
• Notation

Nearest Neighbor Problem
• NN: Given a set $P$ of $N$ points, build a data structure such that, given a query point $q$, it finds the closest point $p^*$ to $q$.
• Applications: databases, information retrieval, pattern recognition, computer vision
– Features: dimensions
– Objects: points
– Similarity: distance between points
• Current solutions suffer from the "curse of dimensionality":
– Either the space or the query time is exponential in $d$
– Little improvement over linear search

Approximate Nearest Neighbor (ANN)
• ANN: Given a set $P$ of $N$ points, build a data structure such that, given a query point $q$, it finds an approximate closest point $p$ to $q$, i.e., $\mathrm{dist}(q, p) \le (1+\epsilon)\,\mathrm{dist}(q, p^*)$
• There exist data structures with different tradeoffs. Example:
– Space: $(dN)^{O(1/\epsilon^2)}$
– Query time: $(d \log N / \epsilon)^{O(1)}$
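The $(1+\epsilon)$ condition usually admits more than one correct answer. The hypothetical helper below lists every point that an ANN structure may legally return for a query. It works by brute force and is meant only to illustrate the definition. It needs Python 3.8+ for `math.dist`.

```python
import math
from typing import List, Sequence


def valid_ann_answers(points: List[Sequence[float]], q: Sequence[float], eps: float) -> List[int]:
    """Indices of all points p with dist(q, p) <= (1 + eps) * dist(q, p*).

    Brute force, for illustrating the definition only; a real ANN structure
    returns just one of these indices, and faster than a linear scan.
    """
    dists = [math.dist(p, q) for p in points]
    best = min(dists)                          # dist(q, p*), the exact nearest distance
    return [i for i, d in enumerate(dists) if d <= (1 + eps) * best]
```

For example, `valid_ann_answers([(0, 0), (11, 0), (30, 0)], (5, 0), 0.5)` gives `[0, 1]`. The exact neighbor is at distance 5, and the point at distance 6 falls within the factor 1.5.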

Motivation for NLS
One of the simplest generalizations of ANN: data items are represented by $k$-flats (affine subspaces) instead of points.
• Model data under linear variations
• Unknown or unimportant parameters in the database
• Example:
– Varying the light gain parameter of images
– Each image/point becomes a line
– Search for the closest line to the query image

Previous and Related Work
• Magen [02]: Nearest Subspace Search for constant $k$
– Query time is fast: $(d + \log N + 1/\epsilon)^{O(1)}$
– Space is super-polynomial: $2^{(\log N)^{O(1)}}$
Dual Problem: the database is a set of points, and the query is a $k$-flat
• [AIKN] for 1-flats: for any $t > 0$
– Query time: $O(d^3 N^{0.5+t})$
– Space: $d^2 N^{O(1/\epsilon^2 + 1/t^2)}$
• Very recently, [MNSS] extended it to $k$-flats
– Query time: $\tilde{O}(N^{k/(k+1-\rho)+t})$
– Space: $\tilde{O}(N^{1+\sigma k/(k+1-\rho)} + N \log^{O(1/t)} N)$

Our Result
We give a randomized algorithm that, for any sufficiently small $\epsilon$, reports a $(1+\epsilon)$-approximate solution with high probability.
• Space: $(N + d)^{O(1/\epsilon^2)}$
• Query time: $(d + \log N + 1/\epsilon)^{O(1)}$
• Matches, up to polynomials, the performance of the best algorithm for ANN; no exponential dependence on $d$
• The first algorithm with polylogarithmic query time and polynomial space for objects other than points
• Only uses reductions to ANN

Notation
• $L$: the set of lines, of size $N$
• $q$: the query point
• $B(c, r)$: ball of radius $r$ around $c$
• $\mathrm{dist}$: the Euclidean distance between objects
• $\mathrm{angle}$: defined between lines
• $\delta$-close: two lines $\ell, \ell'$ are $\delta$-close if $\sin(\mathrm{angle}(\ell, \ell')) \le \delta$
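Here is a minimal sketch of the $\delta$-close test. It assumes that the angle between two lines is the angle between their direction vectors, taken in $[0, \pi/2]$ so that orientation does not matter. This is the standard convention, but the slides do not spell it out. The sine of that angle is computed as the length of the component of $v$ orthogonal to $u$, divided by $|v|$. The function names are hypothetical.

```python
import math
from typing import Sequence


def sin_angle(u: Sequence[float], v: Sequence[float]) -> float:
    """sin of the angle between two lines with (nonzero) directions u and v.

    Uses |v_perp| / |v|, where v_perp is the component of v orthogonal to u;
    reversing either direction does not change the result.
    """
    proj = sum(a * b for a, b in zip(u, v)) / sum(a * a for a in u)
    v_perp = [b - proj * a for a, b in zip(u, v)]
    return math.sqrt(sum(c * c for c in v_perp)) / math.sqrt(sum(b * b for b in v))


def delta_close(u: Sequence[float], v: Sequence[float], delta: float) -> bool:
    """Two lines are delta-close if sin(angle(l, l')) <= delta."""
    return sin_angle(u, v) <= delta
```

Only directions enter the test, so two parallel lines are $0$-close no matter how far apart they are.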

MODULES
• Net Module
• Unbounded Module
• Parallel Module

Net Module
• Intuition: sampling points from each line finely enough to get a point set $P$, and then building $\mathrm{ANN}(P, \epsilon)$, should suffice to find an approximate closest line.
Lemma:
• Let $x$ be the separation parameter, i.e., the distance between two adjacent samples on a line. Then
– either the returned line $\ell_p$ is an approximate closest line,
– or $\mathrm{dist}(q, \ell_p) \le x/\epsilon$.
Issue: the module should only be used inside a bounded region, since sampling an entire (unbounded) line with separation $x$ would require infinitely many points.
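This Python sketch illustrates the net module under stated assumptions. The bounded region is taken to be a ball $B(c, r)$, which is a hypothetical choice. Each line is sampled along its chord inside that ball with separation $x$. An exact linear-scan nearest neighbor stands in for $\mathrm{ANN}(P, \epsilon)$. This is not the paper's construction, and all names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Assumption of this sketch: a line is (point a, nonzero direction u).
Line = Tuple[Sequence[float], Sequence[float]]


def sample_line_in_ball(line: Line, center: Sequence[float], radius: float,
                        x: float) -> List[List[float]]:
    """Points of the line inside B(center, radius), adjacent samples x apart."""
    a, u = line
    norm = math.sqrt(sum(c * c for c in u))
    u = [c / norm for c in u]                                       # unit direction
    s0 = sum((ci - ai) * ui for ci, ai, ui in zip(center, a, u))    # param of point closest to center
    h2 = sum((ci - ai - s0 * ui) ** 2 for ci, ai, ui in zip(center, a, u))
    if h2 > radius ** 2:
        return []                                                   # line misses the ball
    w = math.sqrt(radius ** 2 - h2)                                 # half-length of the chord
    count = int(2 * w // x) + 1
    return [[ai + (s0 - w + k * x) * ui for ai, ui in zip(a, u)] for k in range(count)]


def build_net(lines: List[Line], center: Sequence[float], radius: float, x: float):
    """Point set P of all samples, plus owner[j] = index of the line sample j came from."""
    points, owner = [], []
    for i, line in enumerate(lines):
        for p in sample_line_in_ball(line, center, radius, x):
            points.append(p)
            owner.append(i)
    return points, owner


def net_query(points, owner, q: Sequence[float]) -> Optional[int]:
    """Exact nearest sample (stand-in for ANN(P, eps)); returns its line's index."""
    if not points:
        return None
    j = min(range(len(points)), key=lambda k: math.dist(points[k], q))
    return owner[j]
```

Each line contributes at most $2r/x + 1$ samples, so $x$ trades the size of $P$ against the $x/\epsilon$ slack in the lemma.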

Unbounded Module - Intuition
• All lines in $L$ pass through the origin $o$
• Data structure:
– Project all lines onto any sphere $S(o, r)$ to get a point set $P$
– Build the ANN data structure $\mathrm{ANN}(P, \epsilon)$
• Query algorithm:
– Project the query onto $S(o, r)$ to get $q'$
– Find the approximate closest point to $q'$, i.e., $p = \mathrm{ANN}_P(q')$
– Return the line corresponding to $p$
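Below is a minimal Python sketch of the unbounded module for the case where every line passes through $o$. It assumes that each line is given by a direction vector. It also assumes that projecting a line onto $S(o, r)$ means taking its two antipodal intersection points $o \pm r\hat{u}$. An exact linear scan again stands in for the ANN structure, and the names are hypothetical.

The exact version works for the following reason. Suppose the query is at angle $\theta \in [0, \pi/2]$ from a line. Then $\mathrm{dist}(q, \ell) = |q - o| \sin\theta$, and the distance from $q'$ to the nearer intersection point is $2r\sin(\theta/2)$. Both quantities increase with $\theta$, so the exact nearest sphere point belongs to the exact nearest line.

```python
import math
from typing import List, Sequence


def _unit(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def build_sphere_points(directions: List[Sequence[float]], o: Sequence[float], r: float):
    """Map every line through o to its two intersection points with S(o, r)."""
    points, owner = [], []
    for i, u in enumerate(directions):
        uh = _unit(u)
        for sign in (1.0, -1.0):                       # antipodal points o + r*uh, o - r*uh
            points.append([oi + sign * r * ui for oi, ui in zip(o, uh)])
            owner.append(i)
    return points, owner


def unbounded_query(points, owner, o: Sequence[float], r: float, q: Sequence[float]) -> int:
    """Project q onto S(o, r), find the nearest sphere point, return its line index."""
    qh = _unit([qi - oi for qi, oi in zip(q, o)])      # assumes q != o
    q_proj = [oi + r * ci for oi, ci in zip(o, qh)]
    j = min(range(len(points)), key=lambda k: math.dist(points[k], q_proj))  # ANN stand-in
    return owner[j]
```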

Unbounded Module
• All lines in $L$ pass through a small ball $B(o, r)$
• The query is far enough away, outside of $B(o, R)$
• Use the same data structure and query algorithm
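One elementary fact helps explain why the same structure can be reused when lines only pass through a small ball $B(o, r)$. Translating a line $\ell$ to the parallel line $\ell'$ through $o$ moves it by $\mathrm{dist}(o, \ell) \le r$. Hence $|\mathrm{dist}(q, \ell) - \mathrm{dist}(q, \ell')| \le r$ for every query $q$. The hypothetical helper below checks this numerically. It only bounds an additive error. Turning it into a $(1+\epsilon)$ guarantee for queries outside $B(o, R)$ requires the paper's own argument, which this sketch does not reproduce.

```python
import math
from typing import Sequence, Tuple


def dist_point_line(q: Sequence[float], a: Sequence[float], u: Sequence[float]) -> float:
    """Distance from q to the line {a + s*u : s in R}, u != 0."""
    w = [qi - ai for qi, ai in zip(q, a)]
    s = sum(wi * ui for wi, ui in zip(w, u)) / sum(ui * ui for ui in u)
    return math.sqrt(sum((wi - s * ui) ** 2 for wi, ui in zip(w, u)))


def recentering_error(q: Sequence[float], a: Sequence[float], u: Sequence[float],
                      o: Sequence[float]) -> Tuple[float, float]:
    """Compare line l = {a + s*u} with its parallel copy l' = {o + s*u} through o.

    Returns (|dist(q, l) - dist(q, l')|, dist(o, l)); the first is never larger
    than the second, because l is l' shifted by a vector of length dist(o, l).
    """
    err = abs(dist_point_line(q, a, u) - dist_point_line(q, o, u))
    return err, dist_point_line(o, a, u)
```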
