Approximate Nearest Line Search in High Dimensions
Sepideh Mahabadi
The NLS Problem
- Given: a set of $N$ lines $L$ in $\mathbb{R}^d$
- Goal: build a data structure s.t.
  - given a query point $q$, find the closest line $\ell^*$ to $q$
  - polynomial space
  - sub-linear query time
Approximation
- Finds an approximate closest line $\ell$:
  $\mathrm{dist}(q, \ell) \le \mathrm{dist}(q, \ell^*)(1 + \epsilon)$
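For reference against the sub-linear goal, here is a minimal linear-scan baseline in Python. It is not the data structure of this talk. It assumes each line is stored as a point $a$ on the line plus a unit direction $u$, and all helper names are hypothetical. It computes $\mathrm{dist}(q, \ell)$ as the length of the component of $q - a$ orthogonal to $u$ and checks the $(1 + \epsilon)$ condition directly.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Line = Tuple[Vector, Vector]  # hypothetical encoding: (point a on the line, unit direction u)


def dot(x: Vector, y: Vector) -> float:
    return sum(xi * yi for xi, yi in zip(x, y))


def dist_point_line(q: Vector, line: Line) -> float:
    """Euclidean distance from q to the line {a + s*u : s in R}; u must be a unit vector."""
    a, u = line
    w = [qi - ai for qi, ai in zip(q, a)]
    s = dot(w, u)  # signed position of q's orthogonal projection along the line
    residual = [wi - s * ui for wi, ui in zip(w, u)]
    return math.sqrt(dot(residual, residual))


def exact_nls(q: Vector, lines: List[Line]) -> int:
    """Linear-scan baseline: index of the closest line, O(N d) time per query."""
    return min(range(len(lines)), key=lambda i: dist_point_line(q, lines[i]))


def is_approx_closest(q: Vector, lines: List[Line], i: int, eps: float) -> bool:
    """True if dist(q, lines[i]) <= dist(q, l*) * (1 + eps)."""
    best = dist_point_line(q, lines[exact_nls(q, lines)])
    return dist_point_line(q, lines[i]) <= best * (1 + eps)


lines = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)), ((0.0, 3.0, 0.0), (0.0, 0.0, 1.0))]
q = (5.0, 1.0, 0.0)
print(exact_nls(q, lines))  # 0
```

The scan touches every line on every query. The goal above is to beat exactly this cost while paying only polynomial space.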
BACKGROUND
- Nearest Neighbor Problems
- Motivation
- Previous Work
- Our Result
- Notation
Nearest Neighbor Problem
NN: Given a set of $n$ points $P$, build a data structure s.t., given a query point $q$, it finds the closest point $p^*$ to $q$.
- Applications: databases, information retrieval, pattern recognition, computer vision
  - Features: dimensions
  - Objects: points
  - Similarity: distance between points
- Current solutions suffer from the "curse of dimensionality":
  - Either space or query time is exponential in $d$
  - Little improvement over linear search
Approximate Nearest Neighbor (ANN)
- ANN: Given a set of $n$ points $P$, build a data structure s.t., given a query point $q$, it finds an approximate closest point $p$ to $q$, i.e., $\mathrm{dist}(q, p) \le \mathrm{dist}(q, p^*)(1 + \epsilon)$
- There exist data structures with different tradeoffs. Example:
  - Space: $(nd)^{O(1/\epsilon^2)}$
  - Query time: $\left(\frac{d \log n}{\epsilon}\right)^{O(1)}$
Motivation for NLS
One of the simplest generalizations of ANN: data items are represented by $k$-flats (affine subspaces) instead of points
- Model data under linear variations
- Unknown or unimportant parameters in the database
- Example:
  - Varying the light gain parameter of images
  - Each image/point becomes a line
  - Search for the closest line to the query image
Previous and Related Work
- Magen [02]: Nearest Subspace Search for constant $k$
  - Query time is fast: $(d + \log N + 1/\epsilon)^{O(1)}$
  - Space is super-polynomial: $2^{(\log N)^{O(1)}}$
Dual Problem: the database is a set of points, the query is a $k$-flat
- [AIKN] for 1-flats: for any $t > 0$
  - Query time: $O(d^3 N^{0.5 + t})$
  - Space: $d^2 N^{O(1/\epsilon^2 + 1/t^2)}$
- Very recently, [MNSS] extended it to $k$-flats
  - Query time: $O(n^{k/(k+1-\rho) + t})$
  - Space: $O(n^{1 + \sigma k/(k+1-\rho)} + n \log^{O(1/t)} n)$
Our Result
We give a randomized algorithm that, for any sufficiently small $\epsilon$, reports a $(1 + \epsilon)$-approximate solution with high probability
- Space: $(N + d)^{O(1/\epsilon^2)}$
- Time: $(d + \log N + 1/\epsilon)^{O(1)}$
- Matches, up to polynomials, the performance of the best algorithm for ANN; no exponential dependence on $d$
- The first algorithm with poly-logarithmic query time and polynomial space for objects other than points
- Only uses reductions to ANN
Notation
- $L$: the set of lines, of size $N$
- $q$: the query point
- $B(c, r)$: ball of radius $r$ around $c$
- $\mathrm{dist}$: the Euclidean distance between objects
- $\mathrm{angle}$: defined between lines
- $\delta$-close: two lines $\ell, \ell'$ are $\delta$-close if $\sin(\mathrm{angle}(\ell, \ell')) \le \delta$
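The $\delta$-close test depends only on the two line directions. Here is a minimal Python sketch, assuming each line is given by any nonzero direction vector. The helper names are hypothetical. Because a line has no orientation, the sketch uses $\cos^2$, so flipping $u$ or $v$ does not change the result.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def sin_angle(u: Vector, v: Vector) -> float:
    """Sine of the angle between two lines with nonzero direction vectors u and v."""
    uu = sum(x * x for x in u)
    vv = sum(x * x for x in v)
    uv = sum(x * y for x, y in zip(u, v))
    cos2 = (uv * uv) / (uu * vv)  # cos^2 of the angle; independent of the signs of u, v
    return math.sqrt(max(0.0, 1.0 - cos2))  # clamp tiny negative rounding error


def are_delta_close(u: Vector, v: Vector, delta: float) -> bool:
    """Lines l, l' are delta-close if sin(angle(l, l')) <= delta."""
    return sin_angle(u, v) <= delta
```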
MODULES
- Net Module
- Unbounded Module
- Parallel Module
Net Module
- Intuition: sampling points from each line finely enough to get a set of points $P$, and building an $\mathrm{ANN}(P, \epsilon)$, should suffice to find the approximate closest line.
Lemma:
- Let $x$ be the separation parameter: the distance between two adjacent samples on a line. Then
  - either the returned line $\ell_p$ is an approximate closest line,
  - or $\mathrm{dist}(q, \ell_p) \le x/\epsilon$
Issue: it should be used inside a bounded region
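The net module can be sketched in Python as follows. The sketch assumes the bounded region is a ball $B(c, R)$ and samples each line's chord through that ball every $x$ units. An exact linear scan stands in for $\mathrm{ANN}(P, \epsilon)$. The function names and the (point, unit direction) line encoding are hypothetical. The sample count grows like $N \cdot 2R/x$, which is one way to see why the net is only usable inside a bounded region.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Line = Tuple[Vector, Vector]  # hypothetical encoding: (point a on the line, unit direction u)


def build_net(lines: List[Line], center: Vector, radius: float, x: float):
    """Samples each line every x units inside B(center, radius).

    Returns the point set P as (sample point, line index) pairs."""
    samples = []
    for idx, (a, u) in enumerate(lines):
        w = [ai - ci for ai, ci in zip(a, center)]
        b = sum(wi * ui for wi, ui in zip(w, u))
        # |w + s*u|^2 <= radius^2  <=>  s^2 + 2*b*s + (|w|^2 - radius^2) <= 0
        disc = b * b - (sum(wi * wi for wi in w) - radius * radius)
        if disc < 0:
            continue  # the line misses the ball
        s, s_max = -b - math.sqrt(disc), -b + math.sqrt(disc)
        while s <= s_max:
            samples.append(([ai + s * ui for ai, ui in zip(a, u)], idx))
            s += x
    return samples


def query_net(samples, q: Vector) -> int:
    """Returns the line of the nearest sample; exact scan in place of ANN(P, eps)."""
    _, idx = min(samples, key=lambda pi: math.dist(pi[0], q))
    return idx
```

Suppose the projection of $q$ onto the true closest line falls inside the sampled chord. Then some sample on that line lies within $x/2$ of it, so the returned line is at distance at most $\mathrm{dist}(q, \ell^*) + x/2$. This is the additive error that the lemma converts into the two cases above.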
Unbounded Module - Intuition
- All lines in $L$ pass through the origin $o$
- Data structure:
  - Project all lines onto any sphere $S(o, r)$ to get point set $P$
  - Build ANN data structure $\mathrm{ANN}(P, \epsilon)$
- Query algorithm:
  - Project the query onto $S(o, r)$ to get $q'$
  - Find the approximate closest point to $q'$, i.e., $p = \mathrm{ANN}_P(q')$
  - Return the line corresponding to $p$
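Here is a minimal Python sketch of this idealized case, where every line passes exactly through $o$. Each line with unit direction $u$ meets $S(o, r)$ at the two antipodal points $\pm r u$. The query is projected radially. An exact linear scan again stands in for $\mathrm{ANN}(P, \epsilon)$, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def build_unbounded(directions: List[Vector], r: float) -> List[Tuple[List[float], int]]:
    """Point set P: a line through o with unit direction u meets S(o, r) at +r*u and -r*u."""
    points = []
    for idx, u in enumerate(directions):
        points.append(([r * ui for ui in u], idx))
        points.append(([-r * ui for ui in u], idx))
    return points


def query_unbounded(points: List[Tuple[List[float], int]], q: Vector, r: float) -> int:
    """Projects q (assumed != o) radially onto S(o, r) to get q', then returns the line of
    the nearest point in P; an exact scan stands in for ANN(P, eps)."""
    norm = math.sqrt(sum(qi * qi for qi in q))
    q_proj = [r * qi / norm for qi in q]
    _, idx = min(points, key=lambda pi: math.dist(pi[0], q_proj))
    return idx
```

Why the projection is harmless here: let $\theta \in [0, \pi/2]$ be the angle between $q$ and a line through $o$. Then $\mathrm{dist}(q, \ell) = \|q\| \sin\theta$, and the distance from $q'$ to the nearer of $\pm r u$ is $2r\sin(\theta/2)$. Both increase with $\theta$, so with an exact nearest-point search the returned line is exactly the closest one, for any choice of $r$.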
Unbounded Module
- All lines in $L$ pass through a small ball $B(o, r)$
- The query is far enough away, outside of $B(o, R)$
- Use the same data structure and query algorithm
Lemma: if $R \ge \frac{r}{\epsilon\delta}$, the returned line $\ell_p$ is
- either an approximate closest line,
- or $\delta$-close to the closest line $\ell^*$
This helps us in two ways:
- Bound the region for the net module
- Restrict the search to almost parallel lines
Parallel Module - Intuition
- All lines in $L$ are parallel
- Data structure:
  - Project all lines onto any hyperplane $g$ that is perpendicular to all the lines to get point set $P$
  - Build ANN data structure $\mathrm{ANN}(P, \epsilon)$
- Query algorithm:
  - Project the query onto $g$ to get $q'$
  - Find the approximate closest point to $q'$, i.e., $p = \mathrm{ANN}_P(q')$
  - Return the line corresponding to $p$
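For exactly parallel lines, projecting onto $g$ collapses each line to one point. It also preserves distances: $\mathrm{dist}(q, \ell_i) = \|q' - a_i'\|$, since the component along the common direction cancels. Here is a minimal Python sketch that takes $g$ through the origin and uses an exact scan in place of $\mathrm{ANN}(P, \epsilon)$. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def project_perp(v: Vector, u: Vector) -> List[float]:
    """Orthogonal projection of v onto the hyperplane g through the origin perpendicular to unit u."""
    s = sum(vi * ui for vi, ui in zip(v, u))
    return [vi - s * ui for vi, ui in zip(v, u)]


def build_parallel(anchors: List[Vector], u: Vector):
    """All lines share unit direction u, and line i passes through anchors[i].

    Each line collapses to a single point of g."""
    return [(project_perp(a, u), idx) for idx, a in enumerate(anchors)]


def query_parallel(points, q: Vector, u: Vector) -> int:
    """Projects q onto g, then an exact scan stands in for ANN(P, eps)."""
    q_proj = project_perp(q, u)
    _, idx = min(points, key=lambda pi: math.dist(pi[0], q_proj))
    return idx
```

When the lines are only $\delta$-close to a base line rather than exactly parallel, this equality becomes approximate. That is why the query must stay close enough to $g$.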
Parallel Module
- All lines in $L$ are $\delta$-close to a base line $\ell_b$
- Project the lines onto a hyperplane $g$ that is perpendicular to $\ell_b$
- The query is close enough to $g$
- Use the same data structure and query algorithm
Lemma: if $\mathrm{dist}(q, g) \le \frac{D\epsilon}{\delta}$, then
- either the returned line $\ell_p$ is an approximate closest line,
- or $\mathrm{dist}(q, \ell_p) \le D$
Thus, for a set of almost parallel lines, we can use a set of parallel modules to cover a bounded region.
How the Modules Work Together
Given a set of lines, we come up with a polynomial number of balls.
- If $q$ is inside the ball
  - Use the net module
- If $q$ is outside the ball
  - First use the unbounded module to find a line $\ell$
  - Then use the parallel module to search among the lines parallel to $\ell$
Outline of the Algorithm
- Input: a set of $N$ lines $S$
- Randomly choose a subset of $N/2$ lines $T$
- Solve the problem over $T$ to get a line $\ell_p$
- For $\log N$ iterations (improvement step):
  - Use $\ell_p$ to find a much closer line $\ell'_p$
  - Update $\ell_p$ with $\ell'_p$
Why?
Let $l_1, \ldots, l_{\log N}$ be the $\log N$ closest lines to $q$ in the set $S$. With high probability, at least one of $\{l_1, \ldots, l_{\log N}\}$ is sampled in $T$
- $\Rightarrow \mathrm{dist}(q, \ell_p) \le \mathrm{dist}(q, l_{\log N})(1 + \epsilon)$
- $\Rightarrow$ $\log N$ improvement steps suffice to find an approximate closest line
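The "with high probability" claim can be checked directly. If $T$ is a uniformly random half of $S$ and $N$ is even, the probability that $T$ misses all $k$ closest lines is $\binom{N-k}{N/2} / \binom{N}{N/2} = \prod_{i=0}^{k-1} \frac{N/2 - i}{N - i} \le 2^{-k}$. With $k = \log_2 N$, this is at most $1/N$. Here is a small numeric check in Python. The helper name is hypothetical.

```python
from math import comb


def prob_none_sampled(n: int, k: int) -> float:
    """Probability that a uniformly random subset T of n // 2 of the n lines
    contains none of k fixed lines (e.g., the k closest lines to q)."""
    m = n // 2
    return comb(n - k, m) / comb(n, m)


n = 1024
k = 10  # k = log2(n)
print(prob_none_sampled(n, k) <= 2 ** -k)  # True
```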
Improvement Step
Given a line $\ell$, how do we improve it, i.e., find a closer line?
- Data structure
- Query processing algorithm
Use the three modules here
Conclusion
Bounds we get for the NLS problem:
- Polynomial space: $(N + d)^{O(1/\epsilon^2)}$
- Poly-logarithmic query time: $(d + \log N + 1/\epsilon)^{O(1)}$
Future Work
- The current result is not efficient in practice
  - Large exponents
  - The algorithm is complicated
- Can we get a simpler algorithm?
- Generalization to higher-dimensional flats
- Generalization to other objects, e.g., balls