Hypercube locality-sensitive hashing for approximate near neighbors
Thijs Laarhoven
MFCS 2017, Aalborg, Denmark (August 23, 2017)
Nearest neighbor searching
- Data set
- Target
- Nearest neighbor (ℓ2-norm)
- Nearest neighbor (ℓ1-norm)
- Nearest neighbor (angular distance)
- Distance guarantee (radius r)
- Approximate nearest neighbor
- Approximation factor c > 1 (radii r and c · r)
- Example: Precompute Voronoi cells
- Given a target, quickly find the right cell
- Works well in low dimensions
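As a concrete reference point for the definitions above, the exact problem can always be solved by a linear scan, and the c-approximate guarantee can be stated as a simple check. A minimal sketch (the data set, dimensions, and c below are illustrative assumptions, not from the talk):

```python
import numpy as np

def nearest_neighbor(data, target):
    """Exact nearest neighbor under the l2-norm, by linear scan."""
    dists = np.linalg.norm(data - target, axis=1)
    return int(np.argmin(dists))

def is_c_approximate(data, target, candidate, c):
    """c-approximate guarantee: the candidate is at most c times
    farther from the target than the true nearest neighbor."""
    best = np.linalg.norm(data[nearest_neighbor(data, target)] - target)
    return np.linalg.norm(data[candidate] - target) <= c * best

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 16))
target = rng.standard_normal(16)
nn = nearest_neighbor(data, target)
assert is_c_approximate(data, target, nn, c=1.0)  # the exact NN is 1-approximate
```

The linear scan costs O(n · d) per query; the point of the talk is to do better than this in high dimensions.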
Nearest neighbor searching
Problem setting
- High dimensions d
- Large data set of size n = 2^Ω(d/log d)
◮ Smaller n? =⇒ Use the Johnson–Lindenstrauss transform (JLT) to reduce d
- Assumption: Data set lies on the sphere
◮ Equivalent to angular distance/cosine similarity in all of ℝ^d
◮ Reduction from Euclidean NNS in ℝ^d to Euclidean NNS on the sphere [AR’15]
- Goal: Query time O(n^ρ) with ρ < 1
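The JLT step mentioned above can be sketched as a random Gaussian projection, which preserves pairwise distances up to a factor (1 ± ε) w.h.p. when the target dimension is k = O(log n / ε²). The dimensions and tolerance below are illustrative assumptions:

```python
import numpy as np

def jl_transform(points, k, rng):
    """Johnson-Lindenstrauss sketch: project to k dimensions with a
    random Gaussian matrix, scaled so squared norms are preserved
    in expectation."""
    d = points.shape[1]
    proj = rng.standard_normal((d, k)) / np.sqrt(k)
    return points @ proj

rng = np.random.default_rng(1)
points = rng.standard_normal((200, 1000))
reduced = jl_transform(points, k=300, rng=rng)

# The distance between the first two points is roughly preserved.
orig = np.linalg.norm(points[0] - points[1])
new = np.linalg.norm(reduced[0] - reduced[1])
assert abs(new - orig) / orig < 0.3
```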
Hyperplane LSH
[Charikar, STOC’02]
- A random point and its opposite point define two Voronoi cells: one hyperplane
- Another pair of points gives another hyperplane; together they define a partition
- Preprocessing: store each data point in its cell
- Query: look up the target’s cell and check for collisions
- Failure: rerandomize the partition and repeat until the near neighbor collides (success)
Hyperplane LSH
Overview
O
Hyperplane LSH
Overview
- Simple: one hyperplane corresponds to one inner product
O
Hyperplane LSH
Overview
- Simple: one hyperplane corresponds to one inner product
- Easy to analyze: collision probability 1 − θ
π for vectors at angle θ
O
Hyperplane LSH
Overview
- Simple: one hyperplane corresponds to one inner product
- Easy to analyze: collision probability 1 − θ
π for vectors at angle θ
- Can be made very efficient in practice
◮ Sparse hyperplane vectors [Ach’01, LHC’06] ◮ Orthogonal hyperplanes [TT’07]
O
Hyperplane LSH
Overview
- Simple: one hyperplane corresponds to one inner product
- Easy to analyze: collision probability 1 − θ
π for vectors at angle θ
- Can be made very efficient in practice
◮ Sparse hyperplane vectors [Ach’01, LHC’06] ◮ Orthogonal hyperplanes [TT’07]
- Theoretically suboptimal: use “nicer” (lattice-based) partitions
O
Hyperplane LSH
Overview
- Simple: one hyperplane corresponds to one inner product
- Easy to analyze: collision probability 1 − θ
π for vectors at angle θ
- Can be made very efficient in practice
◮ Sparse hyperplane vectors [Ach’01, LHC’06] ◮ Orthogonal hyperplanes [TT’07]
- Theoretically suboptimal: use “nicer” (lattice-based) partitions
◮ Random points [AI’06, AINR’14, ...]
O
Hyperplane LSH
Overview
- Simple: one hyperplane corresponds to one inner product
- Easy to analyze: collision probability 1 − θ
π for vectors at angle θ
- Can be made very efficient in practice
◮ Sparse hyperplane vectors [Ach’01, LHC’06] ◮ Orthogonal hyperplanes [TT’07]
- Theoretically suboptimal: use “nicer” (lattice-based) partitions
◮ Random points [AI’06, AINR’14, ...] ◮ Leech lattice [AI’06]
O
Hyperplane LSH
Overview
- Simple: one hyperplane corresponds to one inner product
- Easy to analyze: collision probability 1 − θ
π for vectors at angle θ
- Can be made very efficient in practice
◮ Sparse hyperplane vectors [Ach’01, LHC’06] ◮ Orthogonal hyperplanes [TT’07]
- Theoretically suboptimal: use “nicer” (lattice-based) partitions
◮ Random points [AI’06, AINR’14, ...] ◮ Leech lattice [AI’06] ◮ Classical root lattices Ad, Dd [JASG’08] ◮ Exceptional root lattices E6,7,8, F4, G2 [JASG’08]
O
Hyperplane LSH
Overview
- Simple: one hyperplane corresponds to one inner product
- Easy to analyze: collision probability 1 − θ
π for vectors at angle θ
- Can be made very efficient in practice
◮ Sparse hyperplane vectors [Ach’01, LHC’06] ◮ Orthogonal hyperplanes [TT’07]
- Theoretically suboptimal: use “nicer” (lattice-based) partitions
◮ Random points [AI’06, AINR’14, ...] ◮ Leech lattice [AI’06] ◮ Classical root lattices Ad, Dd [JASG’08] ◮ Exceptional root lattices E6,7,8, F4, G2 [JASG’08] ◮ Cross-polytopes [TT’07, AILRS’15, KW’17]
O
Hyperplane LSH
Overview
- Simple: one hyperplane corresponds to one inner product
- Easy to analyze: collision probability 1 − θ
π for vectors at angle θ
- Can be made very efficient in practice
◮ Sparse hyperplane vectors [Ach’01, LHC’06] ◮ Orthogonal hyperplanes [TT’07]
- Theoretically suboptimal: use “nicer” (lattice-based) partitions
◮ Random points [AI’06, AINR’14, ...] ◮ Leech lattice [AI’06] ◮ Classical root lattices Ad, Dd [JASG’08] ◮ Exceptional root lattices E6,7,8, F4, G2 [JASG’08] ◮ Cross-polytopes [TT’07, AILRS’15, KW’17] ◮ Hypercubes [TT’07]
O
Hyperplane LSH
Asymptotically “optimal”
- Simple: one hyperplane corresponds to one inner product
- Easy to analyze: collision probability 1 − θ
π for vectors at angle θ
- Can be made very efficient in practice
◮ Sparse hyperplane vectors [Ach’01, LHC’06] ◮ Orthogonal hyperplanes [TT’07]
- Theoretically suboptimal: use “nicer” (lattice-based) partitions
◮ Random points [AI’06, AINR’14, ...] ◮ Leech lattice [AI’06] ◮ Classical root lattices Ad, Dd [JASG’08] ◮ Exceptional root lattices E6,7,8, F4, G2 [JASG’08] ◮ Cross-polytopes [TT’07, AILRS’15, KW’17] ◮ Hypercubes [TT’07]
O
Hyperplane LSH
Topic of this paper
- Simple: one hyperplane corresponds to one inner product
- Easy to analyze: collision probability 1 − θ
π for vectors at angle θ
- Can be made very efficient in practice
◮ Sparse hyperplane vectors [Ach’01, LHC’06] ◮ Orthogonal hyperplanes [TT’07]
- Theoretically suboptimal: use “nicer” (lattice-based) partitions
◮ Random points [AI’06, AINR’14, ...] ◮ Leech lattice [AI’06] ◮ Classical root lattices Ad, Dd [JASG’08] ◮ Exceptional root lattices E6,7,8, F4, G2 [JASG’08] ◮ Cross-polytopes [TT’07, AILRS’15, KW’17] ◮ Hypercubes [TT’07]
O
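A minimal sketch of hyperplane LSH, with an empirical check of the 1 − θ/π collision probability for a single hyperplane (dimension, angle, and trial count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50

def simhash(x, planes):
    """Hyperplane LSH [Charikar'02]: one sign bit per random hyperplane."""
    return tuple((planes @ x > 0).astype(int))

# Empirically check the collision probability 1 - theta/pi per hyperplane.
theta = np.pi / 3
x = np.zeros(d); x[0] = 1.0
y = np.zeros(d); y[0] = np.cos(theta); y[1] = np.sin(theta)
planes = rng.standard_normal((20000, d))       # 20000 independent hyperplanes
same = (planes @ x > 0) == (planes @ y > 0)
print(same.mean())  # close to 1 - (pi/3)/pi = 2/3
```

In the actual data structure one would concatenate several such sign bits (e.g. `simhash(x, planes[:8])`) into a bucket key, and use many independent tables to boost the success probability.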
Hypercube LSH
[Terasawa–Tanaka, WADS’07]
- Take the vertices of the hypercube {±1}^d and apply a random rotation
- The vertices’ Voronoi regions (the rotated orthants) define a partition
Hypercube LSH
Collision probabilities
[Plot: base collision probability p(θ)^{1/d} as a function of the angle θ, for hyperplane LSH and hypercube LSH; marked angles arccos(2/π), π/3, π/2 and values √3/π, 1/π]
- Two vectors at angle (π/2)⁻ lie in the same orthant with probability (1/π)^d
- Two vectors at angle π/3 lie in the same orthant with probability (√3/π)^d
Hypercube LSH
Asymptotic performance (random data)
[Plot: query exponent ρ as a function of the approximation factor c, for hyperplane, hypercube, and cross-polytope LSH]
- Hyperplane LSH: ρ = √2/(πc ln 2) + O(1/c²)
- Hypercube LSH: ρ = √2/(πc ln π) + O(1/c²) – saves factor log2(π) ≈ 1.65
- Cross-polytope LSH: ρ = 1/(2c² − 1) + o(1/c²)
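Plugging in the leading terms of these exponents (error terms dropped, so the values are only asymptotic indications) makes the comparison concrete; the choice c = 2 is an illustrative assumption:

```python
import numpy as np

# Leading-order query exponents from the slides (error terms dropped).
def rho_hyperplane(c):
    return np.sqrt(2) / (np.pi * c * np.log(2))

def rho_hypercube(c):
    return np.sqrt(2) / (np.pi * c * np.log(np.pi))

def rho_cross_polytope(c):
    return 1.0 / (2 * c**2 - 1)

c = 2.0
# The hyperplane-to-hypercube ratio is exactly ln(pi)/ln(2) = log2(pi) ~ 1.65.
assert np.isclose(rho_hyperplane(c) / rho_hypercube(c), np.log2(np.pi))
# At this c the cross-polytope exponent is the smallest of the three.
assert rho_cross_polytope(c) < rho_hypercube(c) < rho_hyperplane(c)
```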
Conclusions
Positive results
- Exact asymptotics for full-dimensional hypercube LSH
- Exact asymptotics for partial hypercube LSH when d′ = O(d/ log d)
- Asymptotically superior to hyperplane LSH
- Theoretical justification for using orthogonal hyperplanes
Negative results
- Asymptotically inferior to e.g. cross-polytope LSH
- Need large hypercubes to beat hyperplane LSH
Open problems
- Exact asymptotics for all of partial hypercube LSH
- Other, better partition families?