
Tutorial on Statistical N-Body Problems and Proximity Data Structures

Alexander Gray
School of Computer Science
Carnegie Mellon University

Outline:
1. Physics problems and methods
2. Generalized N-body problems
3. Proximity data structures


  1. The Proximity Project [Gray, Lee, Rotella, Moore 2005]
Careful, agnostic empirical comparison; open source
15 datasets, dimension 2 to 1M
The most well-known methods from 1972-2004:
• Exact NN: 15 methods
• All-NN, mono- and bichromatic: 3 methods
• Approximate NN: 10 methods
• Point location: 3 methods
• (NN classification: 3 methods)
• (Radial range search: 3 methods)

  2. …and the overall winner is? (exact NN, high-D)
Ball-trees, basically, though there is high variance and dataset dependence
• Auton ball-trees III [Omohundro 91], [Uhlmann 91], [Moore 99]
• Cover-trees [Alina B., Kakade, Langford 04]
• Crust-trees [Yianilos 95], [Gray, Lee, Rotella, Moore 2005]

  3. A ball-tree: level 1

  4. A ball-tree: level 2

  5. A ball-tree: level 3

  6. A ball-tree: level 4

  7. A ball-tree: level 5
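The level-by-level pictures can be reproduced with a short recursive builder. This is a minimal sketch, assuming Euclidean points stored as tuples, a ball centered at the centroid whose radius reaches the farthest member, and a split that sends each point to the nearer of two far-apart pivots. That pivot heuristic is one common choice, not necessarily the one used for the figures; `Ball`, `build_ball_tree` and `leaf_size` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Ball:
    center: Point
    radius: float
    points: List[Point]
    left: Optional["Ball"] = None
    right: Optional["Ball"] = None

    def is_leaf(self) -> bool:
        return self.left is None


def centroid(points: Sequence[Point]) -> Point:
    n, d = len(points), len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(d))


def build_ball_tree(points: Sequence[Point], leaf_size: int = 2) -> Ball:
    c = centroid(points)
    # Bounding ball: centered at the centroid, radius reaches the farthest member.
    node = Ball(c, max(math.dist(c, p) for p in points), list(points))
    if len(points) <= leaf_size:
        return node
    # Two far-apart pivots: the point farthest from the centroid, then the point farthest from it.
    a = max(points, key=lambda p: math.dist(c, p))
    b = max(points, key=lambda p: math.dist(a, p))
    left = [p for p in points if math.dist(p, a) <= math.dist(p, b)]
    right = [p for p in points if math.dist(p, a) > math.dist(p, b)]
    if not right:  # all remaining points are identical; stop splitting
        return node
    node.left = build_ball_tree(left, leaf_size)
    node.right = build_ball_tree(right, leaf_size)
    return node
```

For example, `build_ball_tree([(float(i), float(j)) for i in range(5) for j in range(4)], leaf_size=3)` builds a tree over a 5×4 grid; collecting the balls at depth 1, 2, 3, ... gives one picture per level.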

  8. Anchors Hierarchy [Moore 99]
• ‘Middle-out’ construction
• Uses the farthest-point method [Gonzalez 85] to find sqrt(N) clusters: this is the middle
• Bottom-up construction to get the top
• Top-down division to get the bottom
• Smart pruning throughout to make it fast
• O(N log N), very fast in practice
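The "middle" step picks about sqrt(N) anchors with the farthest-point heuristic [Gonzalez 85]: start from an arbitrary point, then repeatedly promote the point farthest from all anchors chosen so far. The sketch below is the plain O(N·k) version, assuming Euclidean tuples; it omits the triangle-inequality pruning that makes the Anchors Hierarchy fast, as well as the bottom-up and top-down phases. `farthest_point_anchors` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def farthest_point_anchors(points: Sequence[Point], k: int) -> Tuple[List[int], List[int]]:
    """Return up to k anchor indices and, for every point, the index of its nearest anchor."""
    n = len(points)
    anchors = [0]  # arbitrary first anchor
    # dist[i] = distance from point i to its nearest anchor so far; owner[i] = that anchor.
    dist = [math.dist(p, points[0]) for p in points]
    owner = [0] * n
    while len(anchors) < min(k, n):
        nxt = max(range(n), key=dist.__getitem__)  # farthest point from the current anchors
        if dist[nxt] == 0.0:  # only duplicates of existing anchors remain
            break
        anchors.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < dist[i]:
                dist[i], owner[i] = d, nxt
    return anchors, owner
```

With N = 100 points, `farthest_point_anchors(pts, math.isqrt(len(pts)))` returns 10 anchors, and `owner` assigns every point to its nearest one, which gives the sqrt(N) clusters the hierarchy is built around.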

  9. Outline:
1. Physics problems and methods
2. Generalized N-body problems
3. Proximity data structures
4. Dual-tree algorithms
5. Comparison

  10. Questions
• What’s the magic that allows O(N)? Is it really because of the expansions?
• Can we obtain a method that’s:
1. O(N)
2. Lightweight:
- works with or without expansions
- simple, recursive

  11. New algorithm
• Use an adaptive tree (kd-tree or ball-tree)
• Dual-tree recursion
• Finite-difference approximation

  12. Single-tree vs. dual-tree (symmetric) [figure]

  13. Simple recursive algorithm
SingleTree(q, R) {
  if approximate(q, R), return.
  if leaf(R), SingleTreeBase(q, R).
  else,
    SingleTree(q, R.left).
    SingleTree(q, R.right).
}
(NN or range search: recurse on the closer node first)
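For exact nearest-neighbor search, approximate(q, R) becomes an exact prune: skip R when even the closest point of its ball is no closer than the best neighbor found so far. A minimal sketch, assuming Euclidean tuples and a ball tree whose split is the median of the widest coordinate (a kd-style rule chosen for brevity); `Node`, `build`, `min_dist` and `single_tree_nn` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    center: Point
    radius: float
    points: List[Point]
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: List[Point], leaf_size: int = 4) -> Node:
    d = len(points[0])
    c = tuple(sum(p[i] for p in points) / len(points) for i in range(d))
    node = Node(c, max(math.dist(c, p) for p in points), points)
    if len(points) > leaf_size:
        # Split at the median of the widest coordinate (kd-style rule, assumed for brevity).
        dim = max(range(d), key=lambda i: max(p[i] for p in points) - min(p[i] for p in points))
        pts = sorted(points, key=lambda p: p[dim])
        mid = len(pts) // 2
        node.left, node.right = build(pts[:mid], leaf_size), build(pts[mid:], leaf_size)
    return node


def min_dist(q: Point, node: Node) -> float:
    # Lower bound on the distance from q to any point inside node's ball.
    return max(0.0, math.dist(q, node.center) - node.radius)


def single_tree_nn(q: Point, node: Node, best: Tuple[float, Optional[Point]] = (math.inf, None)):
    if min_dist(q, node) >= best[0]:  # approximate(q, R): R cannot hold a closer point
        return best
    if node.left is None:  # SingleTreeBase: brute force over the leaf
        for p in node.points:
            d = math.dist(q, p)
            if d < best[0]:
                best = (d, p)
        return best
    # Recurse on the closer child first so the pruning bound tightens early.
    near, far = sorted((node.left, node.right), key=lambda child: min_dist(q, child))
    best = single_tree_nn(q, near, best)
    return single_tree_nn(q, far, best)
```

`single_tree_nn(q, build(pts))` returns a `(distance, point)` pair. Visiting the closer child first is what the slide's note refers to: a good early candidate makes the prune fire sooner on the farther child.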

  14. Simple recursive algorithm
DualTree(Q, R) {
  if approximate(Q, R), return.
  if leaf(Q) and leaf(R), DualTreeBase(Q, R).
  else,
    DualTree(Q.left, R.left).
    DualTree(Q.left, R.right).
    DualTree(Q.right, R.left).
    DualTree(Q.right, R.right).
}
(NN or range search: recurse on the closer node first)
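The same recursion solves a generalized N-body problem such as counting all pairs (q, r) within distance h, the inner loop of a two-point correlation. Here approximate(Q, R) is exact: a pair of balls that is entirely farther apart than h contributes nothing, and a pair that is entirely within h contributes |Q|·|R| at once. A minimal sketch, assuming Euclidean tuples and a hypothetical median-split ball tree; unlike the slide's four-way recursion it splits one node at a time (the larger one that still has children), which also covers the case where only one of Q, R is a leaf. `dual_tree_count` is a hypothetical name.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    center: Point
    radius: float
    points: List[Point]
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: List[Point], leaf_size: int = 4) -> Node:
    d = len(points[0])
    c = tuple(sum(p[i] for p in points) / len(points) for i in range(d))
    node = Node(c, max(math.dist(c, p) for p in points), points)
    if len(points) > leaf_size:  # median split on the widest coordinate
        dim = max(range(d), key=lambda i: max(p[i] for p in points) - min(p[i] for p in points))
        pts = sorted(points, key=lambda p: p[dim])
        mid = len(pts) // 2
        node.left, node.right = build(pts[:mid], leaf_size), build(pts[mid:], leaf_size)
    return node


def dual_tree_count(Q: Node, R: Node, h: float) -> int:
    """Number of pairs (q, r), q in Q, r in R, with dist(q, r) <= h."""
    gap = math.dist(Q.center, R.center)
    if gap - Q.radius - R.radius > h:  # every pair is farther than h: prune
        return 0
    if gap + Q.radius + R.radius <= h:  # every pair is within h: count in bulk
        return len(Q.points) * len(R.points)
    if Q.left is None and R.left is None:  # DualTreeBase: brute force
        return sum(math.dist(q, r) <= h for q in Q.points for r in R.points)
    # Split the larger node that still has children.
    if R.left is None or (Q.left is not None and len(Q.points) >= len(R.points)):
        return dual_tree_count(Q.left, R, h) + dual_tree_count(Q.right, R, h)
    return dual_tree_count(Q, R.left, h) + dual_tree_count(Q, R.right, h)
```

Calling it with the same tree for Q and R counts ordered self-pairs, including each point paired with itself.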

  15-35. Dual-tree traversal (depth-first) [animated figure: reference points, query points]

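  36. Finite-difference function approximation.
Taylor expansion:
$f(x) \approx f(a) + f'(a)\,(x - a)$
Gregory-Newton finite form:
$f(x) \approx f(x_i) + \frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i}\,(x - x_i)$
Applied to the kernel on $[\delta_{\min}, \delta_{\max}]$:
$K(\delta) \approx K(\delta_{\min}) + \frac{K(\delta_{\max}) - K(\delta_{\min})}{\delta_{\max} - \delta_{\min}}\,(\delta - \delta_{\min})$
At the midpoint $\delta = \frac{1}{2}(\delta_{\min} + \delta_{\max})$ this gives $K(\delta) \approx \frac{1}{2}\left[K(\delta_{\min}) + K(\delta_{\max})\right]$.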

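  37. Finite-difference function approximation.
Assumes a monotonically decreasing kernel.
$\bar{K}_{QR} = \frac{1}{2}\left[K(\delta^{QR}_{\min}) + K(\delta^{QR}_{\max})\right]$
$\mathrm{err}_R = \left|\sum_{r \in R} \left[K(\delta_{qr}) - \bar{K}_{QR}\right]\right| \le \frac{N_R}{2}\left[K(\delta^{QR}_{\min}) - K(\delta^{QR}_{\max})\right]$
Could also use the center of mass.
Stopping rule?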
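A quick numerical check of this bound for one query point and one reference node. It assumes a Gaussian kernel (any monotonically decreasing kernel would do) and 1-D points, so the node's bounding box is an interval; `kernel` and `node_estimates` are hypothetical names. It also computes the center-of-mass variant: the query's distance to the center of mass still lies in $[\delta_{\min}, \delta_{\max}]$, so that estimate's error is at most twice the midpoint bound.

```python
import math
from typing import List, Tuple


def kernel(d: float, h: float = 1.0) -> float:
    return math.exp(-0.5 * (d / h) ** 2)  # Gaussian: decreasing in d >= 0


def node_estimates(q: float, refs: List[float], h: float = 1.0) -> Tuple[float, float, float, float]:
    """Return (midpoint estimate, center-of-mass estimate, error bound, exact sum)."""
    lo, hi = min(refs), max(refs)
    d_min = max(0.0, lo - q, q - hi)       # closest q can be to the node's interval
    d_max = max(abs(q - lo), abs(q - hi))  # farthest q can be from it
    n_r = len(refs)
    k_bar = 0.5 * (kernel(d_min, h) + kernel(d_max, h))  # finite-difference midpoint
    err_bound = 0.5 * n_r * (kernel(d_min, h) - kernel(d_max, h))
    com = sum(refs) / n_r
    exact = sum(kernel(abs(q - r), h) for r in refs)
    return n_r * k_bar, n_r * kernel(abs(q - com), h), err_bound, exact
```

For a node far from the query, $K(\delta_{\min})$ and $K(\delta_{\max})$ are close, so the bound is small and the node can be summarized without touching its points; that observation is what the stopping rule has to quantify.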

  38. Simple approximation method
approximate(Q, R) {
  dl = $N_R\,K(\delta_{\max})$, du = $N_R\,K(\delta_{\min})$.
  if $\delta_{\min} \ge \tau \cdot \max(\mathrm{diam}(Q), \mathrm{diam}(R))$,
    incorporate(dl, du).
    return.
}
• trivial to change kernel
• hard error bounds
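Putting the pieces together: a dual-tree kernel sum with the geometric stopping rule $\delta_{\min} \ge \tau \cdot \max(\mathrm{diam}(Q), \mathrm{diam}(R))$. A minimal sketch under several assumptions: 1-D data (a node is an index range of a sorted array, so its bounding box is an interval), a Gaussian kernel with bandwidth h, and incorporate(dl, du) replaced by adding the midpoint (dl + du)/2 to each query's estimate while accumulating the per-query error bound (du − dl)/2. All names are hypothetical.

```python
import math
from typing import List, Tuple


def gauss(d: float, h: float) -> float:
    return math.exp(-0.5 * (d / h) ** 2)  # decreasing in d >= 0


def dual_tree_kde(queries: List[float], refs: List[float], h: float, tau: float,
                  leaf_size: int = 8) -> Tuple[List[float], List[float], List[float]]:
    """Return (sorted queries, estimated kernel sums, per-query error bounds)."""
    q, r = sorted(queries), sorted(refs)
    est = [0.0] * len(q)
    bound = [0.0] * len(q)

    def recurse(qlo: int, qhi: int, rlo: int, rhi: int) -> None:
        # Nodes are half-open index ranges; their bounding boxes are [x[lo], x[hi - 1]].
        q_min, q_max, r_min, r_max = q[qlo], q[qhi - 1], r[rlo], r[rhi - 1]
        d_min = max(0.0, r_min - q_max, q_min - r_max)
        d_max = max(q_max - r_min, r_max - q_min)
        if d_min >= tau * max(q_max - q_min, r_max - r_min):  # approximate(Q, R)
            dl, du = (rhi - rlo) * gauss(d_max, h), (rhi - rlo) * gauss(d_min, h)
            for i in range(qlo, qhi):
                est[i] += 0.5 * (dl + du)
                bound[i] += 0.5 * (du - dl)
            return
        q_leaf, r_leaf = qhi - qlo <= leaf_size, rhi - rlo <= leaf_size
        if q_leaf and r_leaf:  # DualTreeBase: exact sums
            for i in range(qlo, qhi):
                est[i] += sum(gauss(abs(q[i] - r[j]), h) for j in range(rlo, rhi))
            return
        if r_leaf or (not q_leaf and qhi - qlo >= rhi - rlo):  # split Q
            mid = (qlo + qhi) // 2
            recurse(qlo, mid, rlo, rhi)
            recurse(mid, qhi, rlo, rhi)
        else:  # split R
            mid = (rlo + rhi) // 2
            recurse(qlo, qhi, rlo, mid)
            recurse(qlo, qhi, mid, rhi)

    recurse(0, len(q), 0, len(r))
    return q, est, bound
```

The returned bounds are hard, but how large they end up depends on τ and the data, which is exactly the "tweak parameters" problem raised on the next slide.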

  39. Big issue in practice… tweak parameters
• Case 1: algorithm gives no error bounds
• Case 2: algorithm gives hard error bounds; must run it many times
• Case 3: algorithm automatically achieves your error tolerance

  40. Automatic approximation method
approximate(Q, R) {
  dl = $N_R\,K(\delta_{\max})$, du = $N_R\,K(\delta_{\min})$.
  if $K(\delta_{\min}) - K(\delta_{\max}) \le \frac{2\varepsilon}{N}\,\phi_{\min}(Q)$,
    incorporate(dl, du).
    return.
}
• just set error tolerance, no tweak parameters
• hard error bounds
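The automatic rule changes only the test inside approximate(Q, R). The sketch below uses the same 1-D setup with hypothetical names and one simplifying assumption: instead of the running lower bound $\phi_{\min}(Q)$ that is tightened during the traversal, it uses the static bound $N \cdot K(\delta_{\max}(Q, \text{all references}))$, which holds for every q in Q before any work is done. Each approximated pair (Q, R) then contributes error at most $(N_R/N)\,\varepsilon\,\phi(q)$, and the approximated R's for one query are disjoint, so the total error is at most $\varepsilon\,\phi(q)$: a relative error of ε. The static bound is loose, so this version prunes less than a running bound would.

```python
import math
from typing import List, Tuple


def gauss(d: float, h: float) -> float:
    return math.exp(-0.5 * (d / h) ** 2)  # decreasing in d >= 0


def dual_tree_kde_auto(queries: List[float], refs: List[float], h: float, eps: float,
                       leaf_size: int = 8) -> Tuple[List[float], List[float]]:
    """Return (sorted queries, kernel sums, each within relative error eps)."""
    q, r = sorted(queries), sorted(refs)
    n = len(r)
    est = [0.0] * len(q)

    def recurse(qlo: int, qhi: int, rlo: int, rhi: int) -> None:
        q_min, q_max, r_min, r_max = q[qlo], q[qhi - 1], r[rlo], r[rhi - 1]
        k_near = gauss(max(0.0, r_min - q_max, q_min - r_max), h)  # K(delta_min)
        k_far = gauss(max(q_max - r_min, r_max - q_min), h)        # K(delta_max)
        # Static lower bound on phi(q) for all q in Q (simplifying assumption):
        # every reference lies within delta_max(Q, all references) of every query in Q.
        phi_min = n * gauss(max(q_max - r[0], r[-1] - q_min), h)
        if k_near - k_far <= 2.0 * eps * phi_min / n:  # approximate(Q, R)
            for i in range(qlo, qhi):
                est[i] += (rhi - rlo) * 0.5 * (k_near + k_far)  # (dl + du) / 2
            return
        q_leaf, r_leaf = qhi - qlo <= leaf_size, rhi - rlo <= leaf_size
        if q_leaf and r_leaf:  # DualTreeBase: exact sums
            for i in range(qlo, qhi):
                est[i] += sum(gauss(abs(q[i] - r[j]), h) for j in range(rlo, rhi))
            return
        if r_leaf or (not q_leaf and qhi - qlo >= rhi - rlo):  # split Q
            mid = (qlo + qhi) // 2
            recurse(qlo, mid, rlo, rhi)
            recurse(mid, qhi, rlo, rhi)
        else:  # split R
            mid = (rlo + rhi) // 2
            recurse(qlo, qhi, rlo, mid)
            recurse(qlo, qhi, mid, rhi)

    recurse(0, len(q), 0, n)
    return q, est
```

The only user-facing knob is `eps`; there is no geometric parameter to tune, which is the point of the slide.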

  41. Runtime analysis
THEOREM: The dual-tree algorithm is O(N).
ASSUMPTION: N points drawn from a density f with $0 < c \le f \le C$.

  42. Recurrence for self-finding
single-tree (point-node): $T(N) = T(N/2) + O(1)$, $T(1) = O(1)$ $\Rightarrow$ $N \cdot O(\log N)$
dual-tree (node-node): $T(N) = 2\,T(N/2) + O(1)$, $T(1) = O(1)$ $\Rightarrow$ $O(N)$
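Unrolling both recurrences for $N = 2^k$, writing $c$ for the constant hidden in $O(1)$, shows where the difference comes from: the point-node recursion pays $c$ once per level for each of the N queries, while the node-node recursion pays $c$ once per internal node of a binary tree, of which there are $N - 1$.

```latex
\begin{aligned}
\text{single-tree: } T(N) &= T(N/2) + c = T(1) + c\,k = T(1) + c\log_2 N
  \;\Rightarrow\; N \cdot O(\log N) \text{ over all } N \text{ queries},\\
\text{dual-tree: } T(N) &= 2\,T(N/2) + c = 2^k\,T(1) + c\,(2^0 + 2^1 + \dots + 2^{k-1})
  = N\,T(1) + c\,(N - 1) = O(N).
\end{aligned}
```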

  43. Packing bound
LEMMA: The number of nodes that are well-separated from a query node Q is bounded by a constant $\left(1 + g(s, c, C)\right)^{D}$.
Thus the recurrence yields the entire runtime. Done. (cf. [Callahan-Kosaraju 95])
On a manifold, use its dimension D’ (the data’s ‘intrinsic dimension’).
