High-dimensional integration without Markov chains
Alexander Gray
Carnegie Mellon University, School of Computer Science

High-dimensional integration by: nonparametric statistics, computational geometry, computational physics


  1. Idea #9: Sample more where the error was larger
  • Choose each new x_i with probability p_i = v_i / Σ_j v_j, where

        v_i = [ f(x_i) − Î_q q(x_i) ]² / q(x_i)²

  • Draw the new point from N(x_i, h*)
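  A minimal numpy sketch of this resampling step; f and q are assumed to be vectorized callables, and the function names are illustrative, not from the talk:

      import numpy as np

      def resample_probs(x, f, q, I_hat):
          # v_i = [f(x_i) - I_hat q(x_i)]^2 / q(x_i)^2: the squared deviation of
          # the importance weight f(x_i)/q(x_i) from the current estimate I_hat.
          fx, qx = f(x), q(x)
          v = (fx - I_hat * qx) ** 2 / qx ** 2
          return v / v.sum()

      def jitter_resample(x, p, h_star, rng):
          # Pick centers with probability p, then perturb with N(center, h*^2 I).
          idx = rng.choice(len(x), size=len(x), p=p)
          return x[idx] + h_star * rng.standard_normal(x[idx].shape)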

  2. Should we forget old points? I tried that. It doesn’t work. So I remember all the old samples.

  3. Idea #10: Incrementally update estimates

        Î_q ← [ N_tot Î_q + Σ_{i=1}^{N} f(x_i) / q(x_i) ] / (N_tot + N)

        V̂(Î_q) ← [ N_tot² V̂(Î_q) + Σ_{i=1}^{N} ( f(x_i) − Î_q q(x_i) )² / q(x_i)² ] / (N_tot + N)²
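  A sketch of this incremental update in the same notation (N_tot samples already folded in, a new batch x_new of size N); the helper name is illustrative:

      def update_estimates(I_hat, V_hat, n_tot, x_new, f, q):
          # Fold a new batch into the running estimates without revisiting old
          # samples; note (w - I_hat)^2 = [f - I_hat q]^2 / q^2.
          w = f(x_new) / q(x_new)   # importance weights of the new batch
          n = len(x_new)
          I_new = (n_tot * I_hat + w.sum()) / (n_tot + n)
          V_new = (n_tot ** 2 * V_hat + ((w - I_hat) ** 2).sum()) / (n_tot + n) ** 2
          return I_new, V_new, n_tot + n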

  4. Overall method: FIRE
  Repeat:
  1. Resample N points {x̃_i} from {x_i} with probability p_i = v_i / Σ_j v_j, where

        v_i = [ f(x_i) − Î_q q(x_i) ]² / q(x_i)²

     Add the resampled points to the training set T and build/update the tree on T.
  2. Compute the new bandwidth

        h*_new = argmin over h ∈ { (2/3) h*, h*, (3/2) h* } of Σ_i [ f(x̃_i) − Î_q q_h(x̃_i) ]² / q_h(x̃_i)²

  3. Sample N points {x_i} from q(·) = (1 − α) f̂(·) + α f̂_0(·)
  4. For each x_i, compute q(x_i) using f̂(x_i), T, and h*
  5. Update Î and V̂
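  A schematic of the loop above, meant only to fix the data flow; nw_density (the kernel fit to the training set), crit (the step-2 bandwidth criterion), and sample_mixture (draws from the defensive mixture) are hypothetical helpers, not part of the original method's code:

      import numpy as np

      def fire_loop(f, f0_pdf, f0_sample, x, h, alpha, rounds, rng):
          def q(pts, train, h):
              # Defensive mixture: q = (1 - alpha) * f_hat + alpha * f0
              return (1 - alpha) * nw_density(train, h, pts) + alpha * f0_pdf(pts)
          train, I, V, n_tot = x.copy(), 1.0, 0.0, 0
          for _ in range(rounds):
              qx = q(x, train, h)
              v = (f(x) - I * qx) ** 2 / qx ** 2                   # step 1: error weights
              idx = rng.choice(len(x), size=len(x), p=v / v.sum())
              train = np.vstack([train, x[idx]])                   # grow training set
              h = min((2 * h / 3, h, 3 * h / 2),
                      key=lambda hh: crit(train, hh, I, f))        # step 2: bandwidth
              x = sample_mixture(train, h, alpha, f0_sample, len(x), rng)  # step 3
              w = f(x) / q(x, train, h)                            # step 4: weights
              I = (n_tot * I + w.sum()) / (n_tot + len(x))         # step 5: update
              V = (n_tot ** 2 * V + ((w - I) ** 2).sum()) / (n_tot + len(x)) ** 2
              n_tot += len(x)
          return I, V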

  5. Properties
  • Because FIRE is importance sampling, the estimate is consistent and unbiased
  • The NWR (Nadaraya-Watson regression) estimate approaches f(x)/I
  • Somewhat reminiscent of particle filtering; EM-like; like N interacting Metropolis samplers
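  For reference, a minimal Nadaraya-Watson regression with a Gaussian kernel; the names and kernel choice are illustrative:

      import numpy as np

      def nw_regress(x_train, y_train, x_query, h):
          # Nadaraya-Watson: kernel-weighted average of the training responses.
          # In FIRE's setting, y_i would be the observed f(x_i) values.
          d2 = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(-1)
          w = np.exp(-d2 / (2 * h ** 2))
          return (w * y_train).sum(1) / w.sum(1)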

  6. Test problems
  • Thin region: anisotropic Gaussian, with a·σ² in the off-diagonal covariance entries; a = {0.99, 0.9, 0.5}, D = {5, 10, 25, 100}
  • Isolated modes: mixture of two normals, 0.5 N(4, 1) + 0.5 N(4 + b, 1); b = {2, 4, 6, 8, 10}, D = {5, 10, 25, 100}
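  Sketches of the two test targets; the D-dimensional reading of the mixture (the shift b applied along every coordinate) is an assumption:

      import numpy as np

      def thin_region_cov(a, D):
          # Unit variances with a in every off-diagonal entry: strongly
          # anisotropic (a thin ridge) as a approaches 1.
          return a * np.ones((D, D)) + (1 - a) * np.eye(D)

      def mixture_sample(b, D, n, rng):
          # 0.5 N(4, 1) + 0.5 N(4 + b, 1), broadcast to D dimensions.
          shift = np.where(rng.random(n) < 0.5, 4.0, 4.0 + b)
          return shift[:, None] + rng.standard_normal((n, D))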

  7. Competitors
  • Standard Monte Carlo
  • MCMC (Metropolis-Hastings)
    – starting point [Gelman-Roberts-Gilks 95]
    – adaptation phase [Gelman-Roberts-Gilks 95]
    – burn-in time [Geyer 92]
    – multiple chains [Geyer 92]
    – thinning [Gelman 95]

  8. How to compare
  Look at the relative error over many runs.
  When should each method stop?
  1. Use its actual stopping criterion
  2. Use a fixed wall-clock time

  9. Anisotropic Gaussian (a = 0.9, D = 10)
  • MCMC
    – started at the center of mass
    – when it wants to stop: >2 hours
    – after 2 hours:
      • with the best σ: rel. err {24%, 11%, 3%, 62%}
      • small σ and large σ: >250% errors
      • automatic σ: {59%, 16%, 93%, 71%}
    – ~40M samples
  • FIRE
    – when it wants to stop: ~1 hour
    – after 2 hours: rel. err {1%, 2%, 1%, 1%}
    – ~1.5M samples

  10. Mixture of Gaussians (b = 10, D = 10)
  • MCMC
    – started at one mode
    – when it wants to stop: ~30 minutes
    – after 2 hours:
      • with the best σ: rel. err {54%, 42%, 58%, 47%}
      • small σ, large σ, automatic σ: similar
    – ~40M samples
  • FIRE
    – when it wants to stop: ~10 minutes
    – after 2 hours: rel. err {<1%, 1%, 32%, <1%}
    – ~1.2M samples

  11. Extension #1: Non-positive functions
  Positivization [Owen-Zhou 1998] (see the sketch below)
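  A sketch of positivization as I read it: split f into nonnegative parts and integrate each with its own importance sampler; all names are illustrative:

      import numpy as np

      def positivized_estimate(f, q_plus, q_minus, x_plus, x_minus):
          # f = f_plus - f_minus, with x_plus ~ q_plus and x_minus ~ q_minus;
          # each nonnegative part gets its own importance-sampling estimate.
          I_plus = np.mean(np.maximum(f(x_plus), 0.0) / q_plus(x_plus))
          I_minus = np.mean(np.maximum(-f(x_minus), 0.0) / q_minus(x_minus))
          return I_plus - I_minus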

  12. Extension #2: More defensiveness, and accuracy
  Control variates [Veach 1997] (see the sketch below)
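  A generic control-variate sketch (not specific to Veach's formulation): g has a known integral int_g, and beta is fit from the samples to minimize variance:

      import numpy as np

      def cv_estimate(f, q, g, int_g, x):
          # Subtract beta * g inside the importance sum, add beta * int_g back;
          # the optimal beta is cov(w_f, w_g) / var(w_g).
          w_f, w_g = f(x) / q(x), g(x) / q(x)
          c = np.cov(w_f, w_g)
          beta = c[0, 1] / c[1, 1]
          return np.mean(w_f - beta * w_g) + beta * int_g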

  13. Extension #3: More accurate regression
  Local linear regression (see the sketch below)
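  A minimal local linear regression sketch (Gaussian weights; names and bandwidth handling are illustrative): fit a weighted least-squares line around the query point and return its value there:

      import numpy as np

      def local_linear(x_train, y_train, x0, h):
          # Weighted least squares of y on (1, x - x0); the fitted intercept
          # is the estimate at x0.
          w = np.exp(-((x_train - x0) ** 2).sum(1) / (2 * h ** 2))
          X = np.hstack([np.ones((len(x_train), 1)), x_train - x0])
          sw = np.sqrt(w)
          beta, *_ = np.linalg.lstsq(X * sw[:, None], y_train * sw, rcond=None)
          return beta[0]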

  14. Extension #4 (maybe): Fully automatic stopping
  Function-wide confidence bands: stitch together pointwise bands, control with FDR (false discovery rate)

  15. Summary
  • We can do high-dimensional integration without Markov chains, by statistical inference
  • Promising alternative to MCMC: safer (e.g. isolated modes), not a black art, faster
  • Intrinsic dimension: multiple viewpoints
  • MUCH more work needed: please help me!

  16. One notion of intrinsic dimension
  'Correlation dimension': the slope of log C(r) versus log r, where C(r) is the fraction of point pairs within distance r.
  A similar notion exists in metric analysis.
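  A Grassberger-Procaccia-style estimator of the correlation dimension; r_lo must be chosen large enough that C(r) > 0 over the whole range:

      import numpy as np

      def correlation_dimension(x, r_lo, r_hi, n_r=20):
          # C(r) = fraction of point pairs within distance r; the slope of
          # log C(r) against log r estimates the intrinsic dimension.
          d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))
          pairs = d[np.triu_indices(len(x), k=1)]
          rs = np.logspace(np.log10(r_lo), np.log10(r_hi), n_r)
          C = np.array([(pairs <= r).mean() for r in rs])
          return np.polyfit(np.log(rs), np.log(C), 1)[0]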

  17-19. N-body problems
  • Coulombic (high accuracy required):

        K(x, x_i) = m m_i / ‖x − x_i‖^a

  • Kernel density estimation (only moderate accuracy required, often high-D), with t² = ‖x − x_i‖² / σ²:

        K(x, x_i) = e^{−‖x − x_i‖² / 2σ²},   or   K(x, x_i) = 1 − t² for 0 ≤ t < 1, 0 for t ≥ 1

  • SPH (smoothed particle hydrodynamics; only moderate accuracy required):

        K(x, x_i) = 4 − 6t² + 3t³ for 0 ≤ t < 1,   (2 − t)³ for 1 ≤ t < 2,   0 for t ≥ 2

  Also: the kernel may differ for every point, be non-isotropic, edge-dependent, …
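  The three kernels transcribed directly (unnormalized, as on the slide; r stands for the distance ‖x − x_i‖, and the Epanechnikov-type name for the truncated quadratic is my label):

      import numpy as np

      def coulomb_kernel(r, m1=1.0, m2=1.0, a=1.0):
          # Singular at r = 0, which is why high accuracy is required.
          return m1 * m2 / r ** a

      def epanechnikov_kernel(r, sigma):
          # Finite support: exactly zero beyond sigma, so far pairs contribute nothing.
          t = r / sigma
          return np.where(t < 1.0, 1.0 - t ** 2, 0.0)

      def sph_cubic_spline(r, h):
          # Piecewise cubic with support 2h, the standard SPH smoothing kernel.
          t = r / h
          return np.where(t < 1.0, 4 - 6 * t ** 2 + 3 * t ** 3,
                 np.where(t < 2.0, (2 - t) ** 3, 0.0))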

  20-21. N-body methods: Approximation
  • Barnes-Hut:

        Σ_{x_i ∈ R} K(x, x_i) ≈ N_R K(x, μ_R)   if r/s > θ

    (r: distance from x to node R; s: node size; μ_R: node centroid; N_R: point count)
  • FMM:

        ∀x, Σ_{x_i ∈ R} K(x, x_i) ≈ multipole/Taylor expansion of order p   if r/s > θ
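  A single-query Barnes-Hut sketch; the node fields (.center for the centroid, .count, .size for the node diameter s, .points, .left/.right children, None at leaves) are assumptions about the tree, and theta follows the slide's r/s > θ convention:

      import numpy as np

      def bh_sum(x, node, kernel, theta=2.0):
          r = np.linalg.norm(x - node.center)
          if node.size > 0 and r / node.size > theta:
              # Well-separated: one kernel evaluation at the centroid stands in
              # for the whole node.
              return node.count * kernel(r)
          if node.left is None:   # leaf: exact sum
              return sum(kernel(np.linalg.norm(x - p)) for p in node.points)
          return bh_sum(x, node.left, kernel, theta) + bh_sum(x, node.right, kernel, theta)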

  22-23. N-body methods: Runtime
  • Barnes-Hut: ≈ O(N log N), non-rigorous, assumes a uniform distribution
  • FMM: ≈ O(N), non-rigorous, assumes a uniform distribution
  • [Callahan-Kosaraju 95]: O(N) is impossible for a log-depth tree (in the worst case)

  24-26. Expansions
  • Constants matter! The p^D factor is the slowdown
  • Large dimension is infeasible
  • Adds much complexity (software, human time)
  • Non-trivial to do new kernels (assuming they're even analytic), heterogeneous kernels
  • BUT:
    Needed to achieve O(N) (?)
    Needed to achieve high accuracy (?)
    Needed to have hard error bounds (?)

  27. N-body methods: Adaptivity
  • Barnes-Hut: recursive → can use any kind of tree
  • FMM: hand-organized control flow → requires a grid structure
  • Tree choices: quad-tree/oct-tree (not very adaptive), kd-tree (adaptive), ball-tree/metric tree (very adaptive)

  28. kd-trees: the most widely-used space-partitioning tree [Friedman, Bentley & Finkel 1977]
  • Univariate axis-aligned splits
  • Split on the widest dimension
  • O(N log N) to build, O(N) space
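  A minimal kd-tree build matching the slide's recipe (widest-dimension, axis-aligned splits); the median split point and leaf size are my choices, and idx keeps the indices of each node's points for later use:

      import numpy as np

      class KDNode:
          def __init__(self, pts, idx=None, leaf_size=16):
              self.idx = np.arange(len(pts)) if idx is None else idx
              # Axis-aligned bounding box, used for distance bounds when pruning.
              self.lo, self.hi = pts[self.idx].min(0), pts[self.idx].max(0)
              self.left = self.right = None
              if len(self.idx) > leaf_size:
                  d = np.argmax(self.hi - self.lo)     # widest dimension
                  m = np.median(pts[self.idx, d])
                  mask = pts[self.idx, d] <= m
                  if 0 < mask.sum() < len(self.idx):   # guard degenerate splits
                      self.left = KDNode(pts, self.idx[mask], leaf_size)
                      self.right = KDNode(pts, self.idx[~mask], leaf_size)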

  29-34. A kd-tree: levels 1 through 6 (figures: successive axis-aligned splits of the point set)

  35-39. A ball-tree: levels 1 through 5 [Uhlmann 1991], [Omohundro 1991] (figures: successive covering balls)

  40. N-body methods: Comparison

                          Barnes-Hut    FMM
      runtime             O(N log N)    O(N)
      expansions          optional      required
      simple, recursive?  yes           no
      adaptive trees?     yes           no
      error bounds?       no            yes

  41. Questions
  • What's the magic that allows O(N)? Is it really because of the expansions?
  • Can we obtain a method that is:
    1. O(N)
    2. lightweight: works with or without expansions, simple, recursive?

  42. New algorithm
  • Use an adaptive tree (kd-tree or ball-tree)
  • Dual-tree recursion
  • Finite-difference approximation

  43. Single-tree vs. dual-tree (symmetric)
  (figure: one query point traversing the reference tree, versus query and reference trees traversed together)

  44. Simple recursive algorithm

      SingleTree(q, R):
          if approximate(q, R): return
          if leaf(R): SingleTreeBase(q, R)
          else:
              SingleTree(q, R.left)
              SingleTree(q, R.right)

  (For NN or range search, recurse on the closer node first.)

  45. Simple recursive algorithm

      DualTree(Q, R):
          if approximate(Q, R): return
          if leaf(Q) and leaf(R): DualTreeBase(Q, R)
          else:
              DualTree(Q.left, R.left)
              DualTree(Q.left, R.right)
              DualTree(Q.right, R.left)
              DualTree(Q.right, R.right)

  (For NN or range search, recurse on the closer node first. If only one of Q, R is a leaf, recurse on the other node's children.)
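  Putting the pieces together: a runnable sketch of dual-tree kernel summation with a finite-difference-style approximation, using the KDNode sketch from slide 28; the tolerance eps and the midpoint rule are my choices, and kernel must be non-increasing in distance (e.g. a Gaussian):

      import numpy as np

      def dual_tree(Q, R, qpts, rpts, sums, kernel, eps):
          # Min/max distances between the two nodes' bounding boxes.
          gap = np.maximum(0.0, np.maximum(Q.lo - R.hi, R.lo - Q.hi))
          d_lo = np.linalg.norm(gap)
          d_hi = np.linalg.norm(np.maximum(Q.hi - R.lo, R.hi - Q.lo))
          k_hi, k_lo = kernel(d_lo), kernel(d_hi)
          if k_hi - k_lo <= eps:
              # Every pairwise kernel value lies in [k_lo, k_hi], so charge
              # the midpoint for the whole block of pairs at once.
              sums[Q.idx] += len(R.idx) * 0.5 * (k_hi + k_lo)
          elif Q.left is None and R.left is None:
              # Base case: exact pairwise sums between two leaves.
              d = np.linalg.norm(qpts[Q.idx][:, None] - rpts[R.idx][None, :], axis=-1)
              sums[Q.idx] += kernel(d).sum(axis=1)
          else:
              # Recurse on the children of whichever nodes are internal.
              for q in ([Q] if Q.left is None else [Q.left, Q.right]):
                  for r in ([R] if R.left is None else [R.left, R.right]):
                      dual_tree(q, r, qpts, rpts, sums, kernel, eps)

  Usage, for example:

      sums = np.zeros(len(qpts))
      dual_tree(KDNode(qpts), KDNode(rpts), qpts, rpts, sums,
                lambda d: np.exp(-d ** 2), eps=1e-3)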

  46-58. Dual-tree traversal (depth-first)
  (animation frames: pairs of query-tree and reference-tree nodes visited in depth-first order over the query and reference points)

