Tutorial on Statistical N-Body Problems and Proximity Data Structures
Alexander Gray
School of Computer Science, Carnegie Mellon University

Outline:
1. Physics problems and methods
2. Generalized N-body problems
3. Proximity data structures
‘N-body problem’ of physics

Simulation (electrostatic, gravitational, statistical mechanics). Compute:

  ∀i:  Σ_{j≠i} K(x_i, x_j),   K(x_i, x_j) ∝ m_i m_j / ||x_i − x_j||^a

Some problems: Gaussian kernel
‘N-body problem’ of physics

Computational fluid dynamics (smoothed particle hydrodynamics): more complicated: nonstationary, anisotropic, edge-dependent (Gray thesis 2003). Compute:

  ∀i:  Σ_{j≠i} K(x_i, x_j),

  K(x_i, x_j) = 4 − 6t² + 3t³   if 0 ≤ t < 1
              = (2 − t)³        if 1 ≤ t < 2
              = 0               if t ≥ 2,

  where t = ||x_i − x_j||² / h²
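As a concrete sketch (function and variable names here are illustrative, not from the tutorial), the piecewise cubic-spline kernel above can be written directly, with t = ||x_i − x_j||²/h²; note the two branches agree (both give 1) at t = 1:

```python
import numpy as np

def sph_kernel(xi, xj, h):
    """Unnormalized SPH cubic-spline kernel above, t = ||xi - xj||^2 / h^2."""
    t = np.sum((np.asarray(xi, float) - np.asarray(xj, float)) ** 2) / h ** 2
    if t < 1.0:
        return 4.0 - 6.0 * t ** 2 + 3.0 * t ** 3
    if t < 2.0:
        return (2.0 - t) ** 3
    return 0.0   # compact support: zero beyond t = 2
```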
‘N-body problem’ of physics

Main obstacle: the O(N²) cost of direct summation.
Barnes-Hut Algorithm [Barnes and Hut, 87]

  Σ_{j∈R} K(x_i, x_j) ≈ N_R K(x_i, µ_R)   if s/r < θ

(s = size of node R, r = distance from x_i to its centroid µ_R)
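A minimal sketch of the acceptance test above (assuming the standard s/r < θ opening criterion; all names here are illustrative):

```python
import numpy as np

def bh_node_sum(x, centroid, size, count, K, theta=0.5):
    """Return N_R * K(x, mu_R) if the node is well-separated (s/r < theta),
    else None, signalling that the caller must open the node and recurse."""
    r = np.linalg.norm(np.asarray(x, float) - np.asarray(centroid, float))
    if r > 0.0 and size / r < theta:
        return count * K(x, centroid)   # summarize the whole node by its centroid
    return None
```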
Fast Multipole Method [Greengard and Rokhlin 1987]

  ∀x_i:  K(x_i, x) ≈ multipole/Taylor expansion, valid when the node is well-separated (r > s)

For the Gaussian kernel: “Fast Gauss Transform” [Greengard and Strain 91]
Quadtree/octree: non-rigorous runtime analysis, assumes a uniform distribution. [Callahan-Kosaraju 95]: O(N) is impossible for a log-depth tree.

Both are used. Often Barnes-Hut is chosen for several reasons: it handles arbitrary (non-analytic), heterogeneous kernels, and it can use any kind of tree, while the FMM requires a grid structure (quad-tree/oct-tree: not very adaptive; kd-tree: adaptive; ball-tree/metric tree: very adaptive).

FMM references:
– “Implementing the FMM in 3 Dimensions”, J. Stat. Phys. 1991
– “A New Error Estimate for the Fast Gauss Transform”, J. Sci. Comput. 2002
– “An Implementation of the FMM Without Multipoles”, SIAM J. Sci. Stat. Comput. 1992

                      Barnes-Hut    FMM
  runtime             O(N log N)    O(N)
  expansions          optional      required
  simple, recursive?  yes           no
  adaptive trees?     yes           no
  error bounds?       no            yes
Outline:
N-body problems in statistical learning
Obvious N-body problems: [Gray and Moore, NIPS 2000] [Gray PhD thesis 2003]
N-body problems in statistical learning

Typical kernels: Gaussian, Epanechnikov (optimal):

  Gaussian:      K(x_i, x_j) = e^{−||x_i − x_j||² / 2h²}

  Epanechnikov:  K(x_i, x_j) = 1 − t   if 0 ≤ t < 1
                             = 0       if t ≥ 1,

  where t = ||x_i − x_j||² / h²
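For reference, both kernels as plain functions (an illustrative sketch; t = ||x_i − x_j||²/h² as above):

```python
import numpy as np

def gaussian_kernel(xi, xj, h):
    """K = exp(-||xi - xj||^2 / 2h^2)."""
    d2 = np.sum((np.asarray(xi, float) - np.asarray(xj, float)) ** 2)
    return np.exp(-d2 / (2.0 * h ** 2))

def epanechnikov_kernel(xi, xj, h):
    """K = 1 - t on t < 1, zero outside (compact support)."""
    t = np.sum((np.asarray(xi, float) - np.asarray(xj, float)) ** 2) / h ** 2
    return 1.0 - t if t < 1.0 else 0.0
```

The Epanechnikov kernel's compact support is what later allows exact (not just approximate) pruning.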
N-body problems in statistical learning

Less obvious N-body problems: [Gray, Lee, Rotella & Moore 2005], [Gray and Moore, NIPS 2000], [Gray PhD thesis 2003]
Kernel density estimation: compute, for every query q,

  f̂(x_q) = (1/N) Σ_{r≠q} K_h(x_q, x_r)

(consistency as N → ∞, under smoothness conditions)

Abstract problem, specified by: Operator 1, Operator 2, Kernel function, Monadic function, Multiplicity, Chromatic number.

KDE: Compute  ∀i: Σ_{j≠i} K(x_i, x_j)
  as the tuple  { ∀, Σ; K(·), {x_r}; δ; 2 }

All-NN: Compute  ∀i: argmin_{j≠i} ||x_i − x_j||   (bichromatic, k)
  as the tuple  { ∀, argmin, min; ||·−·||; δ; 2 }
These are examples of generalized N-body problems [Gray PhD thesis 2003]:

  All-NN:  { ∀, argmin, min; ||·−·||; δ }
  2-point: { Σ, Σ; I_r(·); δ }
  3-point: { Σ, Σ, Σ; I_r(·); δ }
  KDE:     { ∀, Σ; K(·), {w_r}; δ }
  SPH:     { ∀, Σ; K_t(·), w; δ }
  etc.

Each is specified by its operators, its kernel function, the monadic function, the multiplicity, and the number of point sets required.
Fast Gauss Transform [Greengard and Strain 89, 91]
– specific to the Gaussian kernel
– approximation is O(Dp) instead of O(p^D)
– also ignores Gaussian tails beyond a threshold
– chooses K < √N, finds K clusters, compares each cluster to each other: O(K²) = O(N)
– not a tree, just a set of clusters

FFT-based methods (borrowed from signal processing) are considered state-of-the-art in statistics; the Fast Gauss Transform (borrowed from physics) is considered state-of-the-art in computer vision. The runtime of both depends explicitly on D.

Data in high D basically always lie on a manifold of (much) lower dimension, D’.
How are these problems solved?

  Nearest neighbor:       argmin_j ||x_q − x_j||          as  { ∀, argmin, min; ||·−·||; δ; 1 }
  Range-search (radial):  Σ_{j≠q} I(||x_q − x_j|| < r)    as  { ∀, Σ; I_r(·); δ; 1 }
Outline:
kd-trees: the most widely-used space-partitioning tree [Friedman, Bentley & Finkel 1977]
A kd-tree: levels 1 through 6 (figures showing successive splits)
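The construction shown in the figures can be sketched in a few lines: split on the widest dimension at the median, recursing until leaves are small. This is an illustrative toy, not the tutorial's implementation:

```python
import numpy as np

class KdNode:
    def __init__(self, points):
        self.lo = points.min(axis=0)   # bounding-box lower corner l
        self.hi = points.max(axis=0)   # bounding-box upper corner u
        self.points = points
        self.left = self.right = None

def build_kdtree(points, leaf_size=10):
    node = KdNode(points)
    if len(points) > leaf_size:
        d = int(np.argmax(node.hi - node.lo))   # widest dimension
        order = np.argsort(points[:, d])        # median split along it
        mid = len(points) // 2
        node.left = build_kdtree(points[order[:mid]], leaf_size)
        node.right = build_kdtree(points[order[mid:]], leaf_size)
    return node
```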
Exclusion and inclusion, using point-node kd-tree bounds.

O(D) bounds on the distance minima/maxima between a point x_i and a node with bounding box [l, u]:

  ||x_i − x||² ≥ Σ_{d=1}^{D} [ max(x_{i,d} − u_d, 0) + max(l_d − x_{i,d}, 0) ]²

  ||x_i − x||² ≤ Σ_{d=1}^{D} max( (x_{i,d} − l_d)², (u_d − x_{i,d})² )
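The two bounds translate directly into code (a sketch; l and u are the node's bounding-box corners, and the function names are illustrative):

```python
import numpy as np

def min_dist2(x, l, u):
    # sum_d [ max(x_d - u_d, 0) + max(l_d - x_d, 0) ]^2
    # (zero in every dimension where x_d lies inside [l_d, u_d])
    return float(np.sum((np.maximum(x - u, 0.0) + np.maximum(l - x, 0.0)) ** 2))

def max_dist2(x, l, u):
    # sum_d max( (x_d - l_d)^2, (u_d - x_d)^2 )
    # (the farthest box corner in each dimension)
    return float(np.sum(np.maximum((x - l) ** 2, (u - x) ** 2)))
```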
Idea #1.2: Recursive range-count algorithm [folk algorithm]

Range-count example (figure sequence): the recursion descends the kd-tree. Nodes lying entirely within the query radius are Pruned! (inclusion): their counts are added wholesale. Nodes lying entirely outside it are Pruned! (exclusion): they contribute nothing.
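A self-contained sketch of the recursive range-count (a toy tree builder is included so the example runs on its own; names are illustrative), pruning by exclusion and inclusion exactly as in the figures:

```python
import numpy as np

def build_tree(points, leaf_size=8):
    """Toy kd-tree with bounding boxes (illustrative only)."""
    node = {"pts": points, "lo": points.min(axis=0),
            "hi": points.max(axis=0), "kids": None}
    if len(points) > leaf_size:
        d = int(np.argmax(node["hi"] - node["lo"]))
        order = np.argsort(points[:, d])
        mid = len(points) // 2
        node["kids"] = (build_tree(points[order[:mid]], leaf_size),
                        build_tree(points[order[mid:]], leaf_size))
    return node

def range_count(node, q, r):
    """Count points within radius r of q, pruning by exclusion/inclusion."""
    lo, hi, pts = node["lo"], node["hi"], node["pts"]
    min_d2 = np.sum((np.maximum(q - hi, 0.0) + np.maximum(lo - q, 0.0)) ** 2)
    max_d2 = np.sum(np.maximum((q - lo) ** 2, (hi - q) ** 2))
    if min_d2 > r * r:            # Pruned! (exclusion): node wholly outside
        return 0
    if max_d2 < r * r:            # Pruned! (inclusion): node wholly inside
        return len(pts)
    if node["kids"] is None:      # leaf: test each point
        return int(np.sum(np.sum((pts - q) ** 2, axis=1) < r * r))
    return sum(range_count(k, q, r) for k in node["kids"])
```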
What’s the best data structure for proximity problems?

Many nearest-neighbor data structures have been proposed (maybe even thousands), and nobody really knows how they compare.

[Gray, Lee, Rotella, Moore 2005]: a careful agnostic empirical comparison, open source; 15 datasets, dimension 2-1M; the most well-known methods from 1972-2004.

The answer: ball-trees, basically [Moore 99], though there is high variance and dataset dependence.
A ball-tree: levels 1 through 5 (figures showing successive splits)

(Find √N clusters: this is the middle level of such a tree.)
Outline:
Is it really because of the expansions?
Single-tree vs. dual-tree (symmetric):

Simple recursive algorithm (single-tree):

  SingleTree(q, R) {
    if approximate(q, R), return.
    if leaf(R), SingleTreeBase(q, R).
    else,
      SingleTree(q, R.left).
      SingleTree(q, R.right).
  }

(NN or range-search: recurse on the closer node first)

Simple recursive algorithm (dual-tree):

  DualTree(Q, R) {
    if approximate(Q, R), return.
    if leaf(Q) and leaf(R), DualTreeBase(Q, R).
    else,
      DualTree(Q.left, R.left).
      DualTree(Q.left, R.right).
      DualTree(Q.right, R.left).
      DualTree(Q.right, R.right).
  }

(NN or range-search: recurse on the closer node first)
Dual-tree traversal (depth-first): figure sequence showing the simultaneous depth-first descent of the query tree and the reference tree over the query points and reference points.
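A runnable sketch of the dual-tree recursion for exact kernel summation, mirroring the DualTree pseudocode with no approximate() pruning (all names here are illustrative):

```python
import numpy as np

def build_index(points, idx=None, leaf_size=8):
    """Toy kd-style index over point indices (illustrative only)."""
    if idx is None:
        idx = np.arange(len(points))
    node = {"idx": idx, "kids": None}
    if len(idx) > leaf_size:
        pts = points[idx]
        d = int(np.argmax(pts.max(axis=0) - pts.min(axis=0)))
        order = idx[np.argsort(pts[:, d])]
        mid = len(order) // 2
        node["kids"] = (build_index(points, order[:mid], leaf_size),
                        build_index(points, order[mid:], leaf_size))
    return node

def dual_tree(Q, R, qpts, rpts, K, out):
    """Accumulate out[q] += sum_r K(||q - r||), as in DualTree(Q, R)."""
    if Q["kids"] is None and R["kids"] is None:   # DualTreeBase(Q, R)
        for qi in Q["idx"]:
            out[qi] += np.sum(K(np.linalg.norm(rpts[R["idx"]] - qpts[qi], axis=1)))
        return
    for q in (Q["kids"] or (Q,)):                 # expand each non-leaf side
        for r in (R["kids"] or (R,)):
            dual_tree(q, r, qpts, rpts, K, out)
```

An approximate(Q, R) test would be added at the top of dual_tree, exactly as in the pseudocode.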
Finite-difference function approximation.

Taylor expansion:
  f(x) ≈ f(a) + f′(a)(x − a)

Gregory-Newton finite form:
  f(x) ≈ f(x_i) + [(f(x_{i+1}) − f(x_i)) / (x_{i+1} − x_i)] (x − x_i)

Applied to the kernel:
  K(δ) ≈ K(δ_min) + [(K(δ_max) − K(δ_min)) / (δ_max − δ_min)] (δ − δ_min)
Finite-difference function approximation.

Approximate the contribution of reference node R to query q by N_R · K̃, with

  K̃ = ½ [ K(δ_min^QR) + K(δ_max^QR) ]

Error bound:

  err_q = | Σ_{r∈R} K(δ_qr) − N_R K̃ | ≤ (N_R / 2) [ K(δ_min^QR) − K(δ_max^QR) ]

(assumes a monotonically decreasing kernel; could also use the center of mass)

Stopping rule?
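A quick numeric check of this bound (a sketch; the kernel and distances are arbitrary, and the function name is illustrative). Each K(δ_qr) lies between K(δ_max) and K(δ_min), so the midpoint value K̃ is off by at most half that interval per point:

```python
import numpy as np

def node_approx_and_bound(K, dists):
    """Approximate sum_r K(d_r) by N_R * K~; return (approx, exact, bound)."""
    d_min, d_max = dists.min(), dists.max()
    k_mid = 0.5 * (K(d_min) + K(d_max))            # K~, the midpoint value
    n_r = len(dists)
    approx = n_r * k_mid
    exact = float(np.sum(K(dists)))
    bound = 0.5 * n_r * (K(d_min) - K(d_max))      # monotone decreasing kernel
    return approx, exact, bound
```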
Simple approximation method:

  approximate(Q, R) {
    if δ_min^QR ≥ τ · max(diam(Q), diam(R)),
      incorporate(dl, du).
  }

  dl = N_R K(δ_max^QR),  du = N_R K(δ_min^QR)

– trivial to change the kernel
– hard error bounds
Tweak parameters?
Case 1: the algorithm gives no error bounds.
Case 2: the algorithm gives hard error bounds, but you must run it many times.
Case 3: the algorithm automatically achieves your error tolerance.
Automatic approximation method:

  approximate(Q, R) {
    if K(δ_min^QR) − K(δ_max^QR) ≤ (2ε / N) φ^min(Q),
      incorporate(dl, du). return.
  }

  dl = N_R K(δ_max^QR),  du = N_R K(δ_min^QR)

– just set the error tolerance ε: no tweak parameters
– hard error bounds
THEOREM: the dual-tree algorithm is O(N).
ASSUMPTION: the N points are drawn from a density f.

  single-tree (point-node):  T(N) = T(N/2) + O(1),  T(1) = O(1)   ⇒  N · O(log N)
  dual-tree (node-node):     T(N) = 2T(N/2) + O(1), T(1) = O(1)   ⇒  O(N)

LEMMA: the number of nodes that are well-separated from a query node Q is bounded by a constant, g(s, c, 1+C)^D. Thus the recurrence yields the entire runtime.

On a manifold, use its dimension D’ (the data’s ‘intrinsic dimension’).
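The two recurrences above can be checked numerically (a sketch counting unit-cost node visits; function names are illustrative):

```python
def t_single(n):
    """T(N) = T(N/2) + 1: cost per query of the point-node recursion."""
    return 1 if n <= 1 else t_single(n // 2) + 1

def t_dual(n):
    """T(N) = 2 T(N/2) + 1: total cost of the node-node recursion."""
    return 1 if n <= 1 else 2 * t_dual(n // 2) + 1
```

For N = 2^20, t_single gives 21 (logarithmic, so N queries cost O(N log N)) while t_dual gives 2^21 − 1 (linear in N), matching the totals above.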
Speedup Results: Number of points

  N       naive     dual-tree
  12.5K   7         .12
  25K     31        .31
  50K     123       .46
  100K    494       1.0
  200K    1976*     2
  400K    7904*     5
  800K    31616*    10
  1.6M    35 hrs    23

One order-of-magnitude speedup after another as N grows; 5500x at N = 1.6M.
Speedup Results: Different kernels

  N       Epan.   Gauss.
  12.5K   .12     .32
  25K     .31     .70
  50K     .46     1.1
  100K    1.0     2
  200K    2       5
  400K    5       11
  800K    10      22
  1.6M    23      51

Epanechnikov: 10^-6 relative error. Gaussian: 10^-3 relative error.
Speedup Results: Dimensionality
Speedup Results: Different datasets

  Name     N      D     Time (sec)
  PSF2d    3M     2     9
  MNIST    10K    784   24
  CovType  136K   38    8
  Bio5     103K   5     10
Exclusion and inclusion, for multiple bandwidths: use binary search to locate the critical radius, since

  min ||x − x_i|| < h₁  ⇒  min ||x − x_i|| < h₂   (for h₁ < h₂)

Also needed: b_lo, b_hi as arguments; store bounds for each bandwidth b.

An application of the HODC principle.
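Because the inclusion test is monotone in h, the critical bandwidth index can be found with a standard binary search (a sketch using the stdlib `bisect`; the function name is illustrative):

```python
import bisect

def critical_bandwidth_index(min_dist, bandwidths_sorted):
    """Smallest index b with min_dist < bandwidths_sorted[b]; the condition
    min||x - x_i|| < h then holds for every bandwidth at or after b."""
    return bisect.bisect_right(bandwidths_sorted, min_dist)
```

This turns B per-bandwidth tests into a single O(log B) search per node.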
Speedup Results
One order-of-magnitude speedup
Outline:
Methods compared:
– FFT
– IFGT
– Dual-tree (Gaussian)
– Dual-tree (Epanechnikov)

Parameter tuning: 1) grid size doubled until the error tolerance is satisfied, r_y = r_x; 2) K = 10√N, get r_x; r_y = 16 and doubled until the error tolerance is satisfied; hand-tune p for each dataset: {8, 8, 5, 3, 2}.
colors (N=50k, D=2); columns: Exact | 1% | 10% | 50%
  Dualtree (Epanech.):  58.2 | 6.7 (6.7*) | 6.5 (6.7*) | 6.2 (6.7*)
  Dualtree (Gaussian):  (117.2*) | 18.7 (89.8*) | 12.2 (65.1*)
  IFGT:                 >Exhaust. | 1.7
  FFT:                  2.9 | 0.1
  Exhaustive: 329.7 [111.0]
sj2 (N=50k, D=2); columns: Exact | 1% | 10% | 50%
  Dualtree (Epanech.):  6.5 | 0.8 (0.8*) | 0.8 (0.8*) | 0.8 (0.8*)
  Dualtree (Gaussian):  3.4 (4.8*) | 2.7 (3.1*)
  IFGT:                 >Exhaust. | 12.2
  FFT:                  >Exhaust. | 3.1
  Exhaustive: 301.7 [109.2]
bio5 (N=100k, D=5); columns: Exact | 1% | 10% | 50%
  Dualtree (Epanech.):  408.9 | 28.4 (28.4*) | 28.4 (28.4*) | 27.0 (28.2*)
  Dualtree (Gaussian):  (128.7*) | 79.6 (111.8*) | 72.2 (98.8*)
  IFGT:                 >Exhaust. | >Exhaust.
  FFT:                  >Exhaust. | >Exhaust.
  Exhaustive: 1966.3 [1074.9]
corel (N=38k, D=32); columns: Exact | 1% | 10% | 50%
  Dualtree (Epanech.):  261.6 | 10.1 (10.1*) | 10.1 (10.1*) | 10.0 (10.0*)
  Dualtree (Gaussian):  (167.6*) | 159.9 (163*) | 155.9 (159.7*)
  IFGT:                 >Exhaust. | >Exhaust.
  FFT:                  >Exhaust. | >Exhaust.
  Exhaustive: 710.2 [558.7]
covtype (N=150k, D=38); columns: Exact | 1% | 10% | 50%
  Dualtree (Epanech.):  1572.0 | 56.4 (56.4*) | 56.3 (56.3*) | 54.3 (54.3*)
  Dualtree (Gaussian):  (148.6*) | 140.4 (145.7*) | 139.9 (143.6*)
  IFGT:                 >Exhaust. | >Exhaust.
  FFT:                  >Exhaust. | >Exhaust.
  Exhaustive: 13157.1 [11486.0]
Multipole expansions are needed to:

Generalized N-body solutions: multi-tree methods
– generalizes divide-and-conquer to multiple sets
[Gray PhD thesis 2003], [Gray 2005]
Tricks for different N-body problems:
– KDE [Gray, Lee, Rotella, Moore 2005]: vanilla dual-tree, plus multiple bandwidths
– the error bound is crucial; it is possible to get exact predictions
– tuples of more than two trees are possible; Monte Carlo for large radii

These speedups do not come from any single algorithm or trick, but from a fundamentally rethought methodology rooted in computational geometry.

Looking for comments and collaborators! agray@cs.cmu.edu
Simple recursive algorithm (prioritized version):

  DualTree(Q, R) {
    if approximate(Q, R), return.
    if leaf(Q) and leaf(R), DualTreeBase(Q, R).
    else,
      DualTree(Q.left,  closer-of(R.left, R.right)).
      DualTree(Q.left,  farther-of(R.left, R.right)).
      DualTree(Q.right, closer-of(R.left, R.right)).
      DualTree(Q.right, farther-of(R.left, R.right)).
  }

(Actually, recurse on the closer node first.)
Exclusion and inclusion, using kd-tree node-node bounds: O(D) bounds on distance minima/maxima, analogous to the point-node bounds. Also needed: nodewise bounds.