Tutorial on Statistical N-Body Problems and Proximity Data Structures
Alexander Gray
School of Computer Science, Carnegie Mellon University

Outline:
1. Physics problems and methods
2. Generalized N-body problems
3. Proximity data structures
‘N-body problem’ of physics

Simulation (electrostatic, gravitational, statistical mechanics). Compute:

  ∀i:  Σ_{j≠i} K(x_i, x_j),   K(x_i, x_j) ∝ m_i m_j / ||x_i − x_j||^a

Some problems: Gaussian kernel
‘N-body problem’ of physics

Computational fluid dynamics (smoothed particle hydrodynamics): more complicated: nonstationary, anisotropic, edge-dependent (Gray thesis 2003). Compute:

  ∀i:  Σ_{j≠i} K(x_i, x_j),

  K(x_i, x_j) = 4 − 6t² + 3t³   if 0 ≤ t < 1
              = (2 − t)³        if 1 ≤ t < 2
              = 0               if t ≥ 2,

  where t = ||x_i − x_j||² / h²
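As a concrete sketch (function and variable names here are illustrative, not from the tutorial), the piecewise cubic-spline kernel above can be written directly, with t = ||x_i − x_j||²/h²; note the two branches agree (both give 1) at t = 1:

```python
import numpy as np

def sph_kernel(xi, xj, h):
    """Unnormalized SPH cubic-spline kernel above, t = ||xi - xj||^2 / h^2."""
    t = np.sum((np.asarray(xi, float) - np.asarray(xj, float)) ** 2) / h ** 2
    if t < 1.0:
        return 4.0 - 6.0 * t ** 2 + 3.0 * t ** 3
    if t < 2.0:
        return (2.0 - t) ** 3
    return 0.0   # compact support: zero beyond t = 2
```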
‘N-body problem’ of physics

Main obstacle: the O(N²) cost of direct summation.
Barnes-Hut Algorithm [Barnes and Hut, 87]

  Σ_{j∈R} K(x_i, x_j) ≈ N_R K(x_i, µ_R)   if s/r < θ

(s = size of node R, r = distance from x_i to its centroid µ_R)
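A minimal sketch of the acceptance test above (assuming the standard s/r < θ opening criterion; all names here are illustrative):

```python
import numpy as np

def bh_node_sum(x, centroid, size, count, K, theta=0.5):
    """Return N_R * K(x, mu_R) if the node is well-separated (s/r < theta),
    else None, signalling that the caller must open the node and recurse."""
    r = np.linalg.norm(np.asarray(x, float) - np.asarray(centroid, float))
    if r > 0.0 and size / r < theta:
        return count * K(x, centroid)   # summarize the whole node by its centroid
    return None
```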
Fast Multipole Method [Greengard and Rokhlin 1987]

  ∀x_i:  K(x_i, x) ≈ multipole/Taylor expansion, valid when the node is well-separated (r > s)

For the Gaussian kernel: “Fast Gauss Transform” [Greengard and Strain 91]
Quadtree/octree: non-rigorous runtime analysis, assumes a uniform distribution. [Callahan-Kosaraju 95]: O(N) is impossible for a log-depth tree.

Both are used. Often Barnes-Hut is chosen for several reasons: it handles arbitrary (non-analytic), heterogeneous kernels, and it can use any kind of tree, while the FMM requires a grid structure (quad-tree/oct-tree: not very adaptive; kd-tree: adaptive; ball-tree/metric tree: very adaptive).

FMM references:
– “Implementing the FMM in 3 Dimensions”, J. Stat. Phys. 1991
– “A New Error Estimate for the Fast Gauss Transform”, J. Sci. Comput. 2002
– “An Implementation of the FMM Without Multipoles”, SIAM J. Sci. Stat. Comput. 1992

                      Barnes-Hut    FMM
  runtime             O(N log N)    O(N)
  expansions          optional      required
  simple, recursive?  yes           no
  adaptive trees?     yes           no
  error bounds?       no            yes
Outline:
N-body problems in statistical learning
Obvious N-body problems: [Gray and Moore, NIPS 2000] [Gray PhD thesis 2003]
N-body problems in statistical learning

Typical kernels: Gaussian, Epanechnikov (optimal):

  Gaussian:      K(x_i, x_j) = e^{−||x_i − x_j||² / 2h²}

  Epanechnikov:  K(x_i, x_j) = 1 − t   if 0 ≤ t < 1
                             = 0       if t ≥ 1,

  where t = ||x_i − x_j||² / h²
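For reference, both kernels as plain functions (an illustrative sketch; t = ||x_i − x_j||²/h² as above):

```python
import numpy as np

def gaussian_kernel(xi, xj, h):
    """K = exp(-||xi - xj||^2 / 2h^2)."""
    d2 = np.sum((np.asarray(xi, float) - np.asarray(xj, float)) ** 2)
    return np.exp(-d2 / (2.0 * h ** 2))

def epanechnikov_kernel(xi, xj, h):
    """K = 1 - t on t < 1, zero outside (compact support)."""
    t = np.sum((np.asarray(xi, float) - np.asarray(xj, float)) ** 2) / h ** 2
    return 1.0 - t if t < 1.0 else 0.0
```

The Epanechnikov kernel's compact support is what later allows exact (not just approximate) pruning.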
N-body problems in statistical learning

Less obvious N-body problems: [Gray, Lee, Rotella & Moore 2005], [Gray and Moore, NIPS 2000], [Gray PhD thesis 2003]
Kernel density estimation: compute, for every query q,

  f̂(x_q) = (1/N) Σ_{r≠q} K_h(x_q, x_r)

(consistency as N → ∞, under smoothness conditions)

Abstract problem, specified by: Operator 1, Operator 2, Kernel function, Monadic function, Multiplicity, Chromatic number.

KDE: Compute  ∀i: Σ_{j≠i} K(x_i, x_j)
  as the tuple  { ∀, Σ; K(·), {x_r}; δ; 2 }

All-NN: Compute  ∀i: argmin_{j≠i} ||x_i − x_j||   (bichromatic, k)
  as the tuple  { ∀, argmin, min; ||·−·||; δ; 2 }
These are examples of generalized N-body problems [Gray PhD thesis 2003]:

  All-NN:  { ∀, argmin, min; ||·−·||; δ }
  2-point: { Σ, Σ; I_r(·); δ }
  3-point: { Σ, Σ, Σ; I_r(·); δ }
  KDE:     { ∀, Σ; K(·), {w_r}; δ }
  SPH:     { ∀, Σ; K_t(·), w; δ }
  etc.

Each is specified by its operators, its kernel function, the monadic function, the multiplicity, and the number of point sets required.
Fast Gauss Transform [Greengard and Strain 89, 91]
– specific to the Gaussian kernel
– approximation is O(Dp) instead of O(p^D)
– also ignores Gaussian tails beyond a threshold
– chooses K < √N, finds K clusters, compares each cluster to each other: O(K²) = O(N)
– not a tree, just a set of clusters

FFT-based methods (borrowed from signal processing) are considered state-of-the-art in statistics; the Fast Gauss Transform (borrowed from physics) is considered state-of-the-art in computer vision. The runtime of both depends explicitly on D.

Data in high D basically always lie on a manifold of (much) lower dimension, D’.
How are these problems solved?

  Nearest neighbor:       argmin_j ||x_q − x_j||          as  { ∀, argmin, min; ||·−·||; δ; 1 }
  Range-search (radial):  Σ_{j≠q} I(||x_q − x_j|| < r)    as  { ∀, Σ; I_r(·); δ; 1 }
Outline:
kd-trees: the most widely-used space-partitioning tree [Friedman, Bentley & Finkel 1977]
A kd-tree: levels 1 through 6 (figures showing successive splits)
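The construction shown in the figures can be sketched in a few lines: split on the widest dimension at the median, recursing until leaves are small. This is an illustrative toy, not the tutorial's implementation:

```python
import numpy as np

class KdNode:
    def __init__(self, points):
        self.lo = points.min(axis=0)   # bounding-box lower corner l
        self.hi = points.max(axis=0)   # bounding-box upper corner u
        self.points = points
        self.left = self.right = None

def build_kdtree(points, leaf_size=10):
    node = KdNode(points)
    if len(points) > leaf_size:
        d = int(np.argmax(node.hi - node.lo))   # widest dimension
        order = np.argsort(points[:, d])        # median split along it
        mid = len(points) // 2
        node.left = build_kdtree(points[order[:mid]], leaf_size)
        node.right = build_kdtree(points[order[mid:]], leaf_size)
    return node
```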
Exclusion and inclusion, using point-node kd-tree bounds.

O(D) bounds on the distance minima/maxima between a point x_i and a node with bounding box [l, u]:

  ||x_i − x||² ≥ Σ_{d=1}^{D} [ max(x_{i,d} − u_d, 0) + max(l_d − x_{i,d}, 0) ]²

  ||x_i − x||² ≤ Σ_{d=1}^{D} max( (x_{i,d} − l_d)², (u_d − x_{i,d})² )
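The two bounds translate directly into code (a sketch; l and u are the node's bounding-box corners, and the function names are illustrative):

```python
import numpy as np

def min_dist2(x, l, u):
    # sum_d [ max(x_d - u_d, 0) + max(l_d - x_d, 0) ]^2
    # (zero in every dimension where x_d lies inside [l_d, u_d])
    return float(np.sum((np.maximum(x - u, 0.0) + np.maximum(l - x, 0.0)) ** 2))

def max_dist2(x, l, u):
    # sum_d max( (x_d - l_d)^2, (u_d - x_d)^2 )
    # (the farthest box corner in each dimension)
    return float(np.sum(np.maximum((x - l) ** 2, (u - x) ** 2)))
```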
Idea #1.2: Recursive range-count algorithm [folk algorithm]

Range-count example (figure sequence): the recursion descends the kd-tree. Nodes lying entirely within the query radius are Pruned! (inclusion): their counts are added wholesale. Nodes lying entirely outside it are Pruned! (exclusion): they contribute nothing.
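A self-contained sketch of the recursive range-count (a toy tree builder is included so the example runs on its own; names are illustrative), pruning by exclusion and inclusion exactly as in the figures:

```python
import numpy as np

def build_tree(points, leaf_size=8):
    """Toy kd-tree with bounding boxes (illustrative only)."""
    node = {"pts": points, "lo": points.min(axis=0),
            "hi": points.max(axis=0), "kids": None}
    if len(points) > leaf_size:
        d = int(np.argmax(node["hi"] - node["lo"]))
        order = np.argsort(points[:, d])
        mid = len(points) // 2
        node["kids"] = (build_tree(points[order[:mid]], leaf_size),
                        build_tree(points[order[mid:]], leaf_size))
    return node

def range_count(node, q, r):
    """Count points within radius r of q, pruning by exclusion/inclusion."""
    lo, hi, pts = node["lo"], node["hi"], node["pts"]
    min_d2 = np.sum((np.maximum(q - hi, 0.0) + np.maximum(lo - q, 0.0)) ** 2)
    max_d2 = np.sum(np.maximum((q - lo) ** 2, (hi - q) ** 2))
    if min_d2 > r * r:            # Pruned! (exclusion): node wholly outside
        return 0
    if max_d2 < r * r:            # Pruned! (inclusion): node wholly inside
        return len(pts)
    if node["kids"] is None:      # leaf: test each point
        return int(np.sum(np.sum((pts - q) ** 2, axis=1) < r * r))
    return sum(range_count(k, q, r) for k in node["kids"])
```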
What’s the best data structure for proximity problems?

Many nearest-neighbor data structures have been proposed (maybe even thousands), and nobody really knows how they compare.

[Gray, Lee, Rotella, Moore 2005]: a careful agnostic empirical comparison, open source; 15 datasets, dimension 2-1M; the most well-known methods from 1972-2004.

The answer: ball-trees, basically [Moore 99], though there is high variance and dataset dependence.
A ball-tree: levels 1 through 5 (figures showing successive splits)

(Find √N clusters: this is the middle level of such a tree.)
Outline:
Is it really because of the expansions?
Single-tree vs. dual-tree (symmetric):

Simple recursive algorithm (single-tree):

  SingleTree(q, R) {
    if approximate(q, R), return.
    if leaf(R), SingleTreeBase(q, R).
    else,
      SingleTree(q, R.left).
      SingleTree(q, R.right).
  }

(NN or range-search: recurse on the closer node first)

Simple recursive algorithm (dual-tree):

  DualTree(Q, R) {
    if approximate(Q, R), return.
    if leaf(Q) and leaf(R), DualTreeBase(Q, R).
    else,
      DualTree(Q.left, R.left).
      DualTree(Q.left, R.right).
      DualTree(Q.right, R.left).
      DualTree(Q.right, R.right).
  }

(NN or range-search: recurse on the closer node first)
Dual-tree traversal (depth-first): figure sequence showing the simultaneous depth-first descent of the query tree and the reference tree over the query points and reference points.
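A runnable sketch of the dual-tree recursion for exact kernel summation, mirroring the DualTree pseudocode with no approximate() pruning (all names here are illustrative):

```python
import numpy as np

def build_index(points, idx=None, leaf_size=8):
    """Toy kd-style index over point indices (illustrative only)."""
    if idx is None:
        idx = np.arange(len(points))
    node = {"idx": idx, "kids": None}
    if len(idx) > leaf_size:
        pts = points[idx]
        d = int(np.argmax(pts.max(axis=0) - pts.min(axis=0)))
        order = idx[np.argsort(pts[:, d])]
        mid = len(order) // 2
        node["kids"] = (build_index(points, order[:mid], leaf_size),
                        build_index(points, order[mid:], leaf_size))
    return node

def dual_tree(Q, R, qpts, rpts, K, out):
    """Accumulate out[q] += sum_r K(||q - r||), as in DualTree(Q, R)."""
    if Q["kids"] is None and R["kids"] is None:   # DualTreeBase(Q, R)
        for qi in Q["idx"]:
            out[qi] += np.sum(K(np.linalg.norm(rpts[R["idx"]] - qpts[qi], axis=1)))
        return
    for q in (Q["kids"] or (Q,)):                 # expand each non-leaf side
        for r in (R["kids"] or (R,)):
            dual_tree(q, r, qpts, rpts, K, out)
```

An approximate(Q, R) test would be added at the top of dual_tree, exactly as in the pseudocode.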
Finite-difference function approximation.

Taylor expansion:
  f(x) ≈ f(a) + f′(a)(x − a)

Gregory-Newton finite form:
  f(x) ≈ f(x_i) + [(f(x_{i+1}) − f(x_i)) / (x_{i+1} − x_i)] (x − x_i)

Applied to the kernel:
  K(δ) ≈ K(δ_min) + [(K(δ_max) − K(δ_min)) / (δ_max − δ_min)] (δ − δ_min)
Finite-difference function approximation.

Approximate the contribution of reference node R to query q by N_R · K̃, with

  K̃ = ½ [ K(δ_min^QR) + K(δ_max^QR) ]

Error bound:

  err_q = | Σ_{r∈R} K(δ_qr) − N_R K̃ | ≤ (N_R / 2) [ K(δ_min^QR) − K(δ_max^QR) ]

(assumes a monotonically decreasing kernel; could also use the center of mass)

Stopping rule?
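A quick numeric check of this bound (a sketch; the kernel and distances are arbitrary, and the function name is illustrative). Each K(δ_qr) lies between K(δ_max) and K(δ_min), so the midpoint value K̃ is off by at most half that interval per point:

```python
import numpy as np

def node_approx_and_bound(K, dists):
    """Approximate sum_r K(d_r) by N_R * K~; return (approx, exact, bound)."""
    d_min, d_max = dists.min(), dists.max()
    k_mid = 0.5 * (K(d_min) + K(d_max))            # K~, the midpoint value
    n_r = len(dists)
    approx = n_r * k_mid
    exact = float(np.sum(K(dists)))
    bound = 0.5 * n_r * (K(d_min) - K(d_max))      # monotone decreasing kernel
    return approx, exact, bound
```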
Simple approximation method:

  approximate(Q, R) {
    if δ_min^QR ≥ τ · max(diam(Q), diam(R)),
      incorporate(dl, du).
  }

  dl = N_R K(δ_max^QR),  du = N_R K(δ_min^QR)

– trivial to change the kernel
– hard error bounds
Tweak parameters?
Case 1: the algorithm gives no error bounds.
Case 2: the algorithm gives hard error bounds, but you must run it many times.
Case 3: the algorithm automatically achieves your error tolerance.
Automatic approximation method:

  approximate(Q, R) {
    if K(δ_min^QR) − K(δ_max^QR) ≤ (2ε / N) φ^min(Q),
      incorporate(dl, du). return.
  }

  dl = N_R K(δ_max^QR),  du = N_R K(δ_min^QR)

– just set the error tolerance ε: no tweak parameters
– hard error bounds
THEOREM: the dual-tree algorithm is O(N).
ASSUMPTION: the N points are drawn from a density f.

  single-tree (point-node):  T(N) = T(N/2) + O(1),  T(1) = O(1)   ⇒  N · O(log N)
  dual-tree (node-node):     T(N) = 2T(N/2) + O(1), T(1) = O(1)   ⇒  O(N)

LEMMA: the number of nodes that are well-separated from a query node Q is bounded by a constant, g(s, c, 1+C)^D. Thus the recurrence yields the entire runtime.

On a manifold, use its dimension D’ (the data’s ‘intrinsic dimension’).
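The two recurrences above can be checked numerically (a sketch counting unit-cost node visits; function names are illustrative):

```python
def t_single(n):
    """T(N) = T(N/2) + 1: cost per query of the point-node recursion."""
    return 1 if n <= 1 else t_single(n // 2) + 1

def t_dual(n):
    """T(N) = 2 T(N/2) + 1: total cost of the node-node recursion."""
    return 1 if n <= 1 else 2 * t_dual(n // 2) + 1
```

For N = 2^20, t_single gives 21 (logarithmic, so N queries cost O(N log N)) while t_dual gives 2^21 − 1 (linear in N), matching the totals above.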
Speedup Results: Number of points

  N       naive     dual-tree
  12.5K   7         .12
  25K     31        .31
  50K     123       .46
  100K    494       1.0
  200K    1976*     2
  400K    7904*     5
  800K    31616*    10
  1.6M    35 hrs    23

One order-of-magnitude speedup after another as N grows; 5500x at N = 1.6M.
Speedup Results: Different kernels

  N       Epan.   Gauss.
  12.5K   .12     .32
  25K     .31     .70
  50K     .46     1.1
  100K    1.0     2
  200K    2       5
  400K    5       11
  800K    10      22
  1.6M    23      51

Epanechnikov: 10^-6 relative error. Gaussian: 10^-3 relative error.
Speedup Results: Dimensionality
Speedup Results: Different datasets

  Name     N      D     Time (sec)
  PSF2d    3M     2     9
  MNIST    10K    784   24
  CovType  136K   38    8
  Bio5     103K   5     10
Exclusion and inclusion, for multiple bandwidths: use binary search to locate the critical radius, since

  min ||x − x_i|| < h₁  ⇒  min ||x − x_i|| < h₂   (for h₁ < h₂)

Also needed: b_lo, b_hi as arguments; store bounds for each bandwidth b.

An application of the HODC principle.
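Because the inclusion test is monotone in h, the critical bandwidth index can be found with a standard binary search (a sketch using the stdlib `bisect`; the function name is illustrative):

```python
import bisect

def critical_bandwidth_index(min_dist, bandwidths_sorted):
    """Smallest index b with min_dist < bandwidths_sorted[b]; the condition
    min||x - x_i|| < h then holds for every bandwidth at or after b."""
    return bisect.bisect_right(bandwidths_sorted, min_dist)
```

This turns B per-bandwidth tests into a single O(log B) search per node.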
Speedup Results
One order-of-magnitude speedup
Outline:
Methods compared:
– FFT
– IFGT
– Dual-tree (Gaussian)
– Dual-tree (Epanechnikov)

Parameter tuning: 1) grid size doubled until the error tolerance is satisfied, r_y = r_x; 2) K = 10√N, get r_x; r_y = 16 and doubled until the error tolerance is satisfied; hand-tune p for each dataset: {8, 8, 5, 3, 2}.
colors (N=50k, D=2); columns: Exact | 1% | 10% | 50%
  Dualtree (Epanech.):  58.2 | 6.7 (6.7*) | 6.5 (6.7*) | 6.2 (6.7*)
  Dualtree (Gaussian):  (117.2*) | 18.7 (89.8*) | 12.2 (65.1*)
  IFGT:                 >Exhaust. | 1.7
  FFT:                  2.9 | 0.1
  Exhaustive: 329.7 [111.0]
sj2 (N=50k, D=2); columns: Exact | 1% | 10% | 50%
  Dualtree (Epanech.):  6.5 | 0.8 (0.8*) | 0.8 (0.8*) | 0.8 (0.8*)
  Dualtree (Gaussian):  3.4 (4.8*) | 2.7 (3.1*)
  IFGT:                 >Exhaust. | 12.2
  FFT:                  >Exhaust. | 3.1
  Exhaustive: 301.7 [109.2]
bio5 (N=100k, D=5); columns: Exact | 1% | 10% | 50%
  Dualtree (Epanech.):  408.9 | 28.4 (28.4*) | 28.4 (28.4*) | 27.0 (28.2*)
  Dualtree (Gaussian):  (128.7*) | 79.6 (111.8*) | 72.2 (98.8*)
  IFGT:                 >Exhaust. | >Exhaust.
  FFT:                  >Exhaust. | >Exhaust.
  Exhaustive: 1966.3 [1074.9]
corel (N=38k, D=32); columns: Exact | 1% | 10% | 50%
  Dualtree (Epanech.):  261.6 | 10.1 (10.1*) | 10.1 (10.1*) | 10.0 (10.0*)
  Dualtree (Gaussian):  (167.6*) | 159.9 (163*) | 155.9 (159.7*)
  IFGT:                 >Exhaust. | >Exhaust.
  FFT:                  >Exhaust. | >Exhaust.
  Exhaustive: 710.2 [558.7]
covtype (N=150k, D=38); columns: Exact | 1% | 10% | 50%
  Dualtree (Epanech.):  1572.0 | 56.4 (56.4*) | 56.3 (56.3*) | 54.3 (54.3*)
  Dualtree (Gaussian):  (148.6*) | 140.4 (145.7*) | 139.9 (143.6*)
  IFGT:                 >Exhaust. | >Exhaust.
  FFT:                  >Exhaust. | >Exhaust.
  Exhaustive: 13157.1 [11486.0]
Multipole expansions are needed to:

Generalized N-body solutions: multi-tree methods
– generalizes divide-and-conquer to multiple sets
[Gray PhD thesis 2003], [Gray 2005]
Tricks for different N-body problems:
– KDE [Gray, Lee, Rotella, Moore 2005]: vanilla dual-tree, plus multiple bandwidths
– the error bound is crucial; it is possible to get exact predictions
– tuples of more than two trees are possible; Monte Carlo for large radii

These speedups do not come from any single algorithm or trick, but from a fundamentally rethought methodology rooted in computational geometry.

Looking for comments and collaborators! agray@cs.cmu.edu
Simple recursive algorithm (prioritized version):

  DualTree(Q, R) {
    if approximate(Q, R), return.
    if leaf(Q) and leaf(R), DualTreeBase(Q, R).
    else,
      DualTree(Q.left,  closer-of(R.left, R.right)).
      DualTree(Q.left,  farther-of(R.left, R.right)).
      DualTree(Q.right, closer-of(R.left, R.right)).
      DualTree(Q.right, farther-of(R.left, R.right)).
  }

(Actually, recurse on the closer node first.)
Exclusion and inclusion, using kd-tree node-node bounds: O(D) bounds on distance minima/maxima, analogous to the point-node bounds. Also needed: nodewise bounds.