High-dimensional integration without Markov chains Alexander Gray - PowerPoint PPT Presentation

Idea #9: Sample more where the error was larger • Choose new x i with probability p i [ ] 2 ˆ − f ( x ) I q ( x ) v = = i q i v , p i ∑ i i q ( x ) v i i i • Draw from N (x i ,h*)

Should we forget old points? I tried that. It doesn’t work. So I remember all the old samples.

Idea #10: Incrementally update estimates   N f ( x ) ∑ ( ) ˆ ˆ = + +   Ι Ι N i / N N q q tot tot q ( x )   i i [ ]   2 ˆ − f ( x ) I q ( x ) N  ∑  ( ) ˆ ˆ ˆ ˆ 2 = + i q i + 2 V ( I ) V ( I ) N / N N   q q tot tot q ( x )   i i

Overall method: FIRE Repeat: [ ] 2 ˆ − f ( x ) I q ( x ) 1. Resample N points from {x i } using = i q i v i q ( x ) { } Add to training set. i v ~ x = p i i ∑ i v T ~ Build/update i { } x i i [ ]  ~ ~  2 ˆ − f ( x ) I q ( x )   = i q h i * 2. Compute h min   ~ new q ( x )   * * * ∈ 2 3 h { h , h , h }   3 2 h i ˆ 3. Sample N points {x i } from = − α + α q () ( 1 ) f () f () 0 ˆ * 4. For each x i compute using f ( x ), T { } h i x i 5. Update I and V

Properties • Because FIRE is importance sampling: – consistent – unbiased • The NWR estimate approaches f(x)/I • Somewhat reminiscent of particle filtering; EM-like; like N interacting Metropolis samplers

Test problems • Thin region Anisotropic Gaussian a s 2 in off-diagonals a={0.99,0.9,0.5}, D={5,10,25,100} • Isolated modes Mixture of two normals 0.5 N (4,1) + 0.5 N (4+b,1) b={2,4,6,8,10}, D={5,10,25,100}

Competitors • Standard Monte Carlo • MCMC (Metropolis-Hastings) – starting point [Gelman-Roberts-Gilks 95] – adaptation phase [Gelman-Roberts-Gilks 95] – burn-in time [Geyer 92] – multiple chains [Geyer 92] – thinning [Gelman 95]

How to compare Look at its relative error over many runs When to stop it? 1. Use its actual stopping criterion 2. Use a fixed wall-clock time

Anisotropic Gaussian (a=0.9,D=10) • MCMC – started at center of mass – when it wants to stop: >2 hours – after 2 hours • with best s: rel. err {24%,11%,3%,62%} • small s and large s: >250% errors • automatic s: {59%,16%,93%,71%} – ~40M samples • FIRE – when it wants to stop: ~1 hour – after 2 hours: rel. err {1%,2%,1%,1%} – ~1.5M samples

Mixture of Gaussians (b=10,D=10) • MCMC – started at one mode – when it wants to stop: ~30 minutes – after 2 hours: • with best s: rel. err {54%,42%,58%,47%} • small s, large s, automatic s: similar – ~40M samples • FIRE – when it wants to stop: ~10 minutes – after 2 hours: rel. err {<1%,1%,32%,<1%} – ~1.2M samples

Extension #1 Non-positive functions Positivization [Owen-Zhou 1998]

Extension #2 More defensiveness, and accuracy Control variates [Veach 1997]

Extension #3 More accurate regression Local linear regression

Extension #4 (maybe) Fully automatic stopping Function-wide confidence bands stitch together pointwise bands, control with FDR

Summary • We can do high-dimensional integration without Markov chains, by statistical inference • Promising alternative to MCMC – safer (e.g. isolated modes) – not a black art – faster • Intrinsic dimension - multiple viewpoints • MUCH more work needed – please help me!

One notion of intrinsic dimension log C(r) log r ‘Correlation dimension’ Similar: notion in metric analysis

N-body problems mm = K ( x , x ) i • Coulombic i a − x x i (high accuracy required)

N-body problems mm = K ( x , x ) i • Coulombic i a − x x i (high accuracy required) 2 2 2 • Kernel density − − σ = x x / K ( x , x ) e i i estimation − ≤ < 2 a 1 t 0 t 1 2 / σ = − 2 = t x x K ( x , x ) i i ≥ 0 t 1 (only moderate accuracy required, often high-D)

N-body problems mm = K ( x , x ) i • Coulombic i a − x x i (high accuracy required) 2 2 2 • Kernel density − − σ = x x / K ( x , x ) e i i estimation − ≤ < 2 a 1 t 0 t 1 2 / σ = − 2 = t x x K ( x , x ) i i ≥ 0 t 1 (only moderate accuracy required, often high-D) • SPH (smoothed − + 2 3 ≤ < 4 6 t 3 t 0 t 1 particle = − 3 K ( x , x ) ( 2 t ) ≤ < 1 t 2 hydrodynamics) i 0 ≥ (only moderate accuracy required) t 2 Also: different for every point, non-isotropic, edge-dependent, …

N-body methods: Approximation r • Barnes-Hut s ∑ r ≈ µ K ( x , x ) N K ( x , ) s > if i R R θ i

N-body methods: Approximation r • Barnes-Hut s ∑ r ≈ µ K ( x , x ) N K ( x , ) s > if i R R θ i • FMM s ∑ ∀ ≈ s > x , K ( x , x ) r multipole/Taylor expansion if i of order p i

N-body methods: Runtime ≈ O ( N log N ) • Barnes-Hut ≈ non-rigorous, uniform distribution ≈ O ( N ) • FMM ≈ non-rigorous, uniform distribution

N-body methods: Runtime ≈ O ( N log N ) • Barnes-Hut ≈ non-rigorous, uniform distribution ≈ O ( N ) • FMM ≈ non-rigorous, uniform distribution [Callahan-Kosaraju 95]: O(N) is impossible for log-depth tree (in the worst case)

Expansions • Constants matter! p D factor is slowdown • Large dimension infeasible • Adds much complexity (software, human time) • Non-trivial to do new kernels (assuming they’re even analytic), heterogeneous kernels

Expansions • Constants matter! p D factor is slowdown • Large dimension infeasible • Adds much complexity (software, human time) • Non-trivial to do new kernels (assuming they’re even analytic), heterogeneous kernels • BUT: Needed to achieve O(N) Needed to achieve high accuracy Needed to have hard error bounds

Expansions • Constants matter! p D factor is slowdown • Large dimension infeasible • Adds much complexity (software, human time) • Non-trivial to do new kernels (assuming they’re even analytic), heterogeneous kernels • BUT: Needed to achieve O(N) (?) Needed to achieve high accuracy (?) Needed to have hard error bounds (?)

N-body methods: Adaptivity • Barnes-Hut recursive � can use any kind of tree • FMM hand-organized control flow � requires grid structure quad-tree/oct-tree not very adaptive kd -tree adaptive ball-tree/metric tree very adaptive

kd -trees: most widely-used space- partitioning tree [Friedman, Bentley & Finkel 1977] • Univariate axis-aligned splits • Split on widest dimension • O(N log N) to build, O(N) space

A kd- tree: level 1

A kd -tree: level 2

A kd -tree: level 3

A kd- tree: level 4

A kd -tree: level 5

A kd -tree: level 6

A ball-tree: level 1 [Uhlmann 1991], [Omohundro 1991]

A ball-tree: level 2

N-body methods: Comparison Barnes-Hut FMM runtime O ( N log N ) O(N) expansions optional required simple,recursive? yes no adaptive trees? yes no error bounds? no yes

Questions • What’s the magic that allows O(N) ? Is it really because of the expansions? • Can we obtain an method that’s: 1. O(N) 2. lightweight: works with or without ..............................expansions simple, recursive

New algorithm • Use an adaptive tree ( kd -tree or ball-tree) • Dual-tree recursion • Finite-difference approximation

Single-tree : Dual-tree (symmetric):

Simple recursive algorithm SingleTree (q,R) { if approximate (q,R), return. if leaf(R), SingleTreeBase (q,R). else, SingleTree (q,R.left). SingleTree (q,R.right). } (NN or range-search: recurse on the closer node first)

Simple recursive algorithm DualTree (Q,R) { if approximate (Q,R), return. if leaf(Q) and leaf(R), DualTreeBase (Q,R). else, DualTree (Q.left,R.left). DualTree (Q.left,R.right). DualTree (Q.right,R.left). DualTree (Q.right,R.right). } (NN or range-search: recurse on the closer node first)

Dual-tree traversal (depth-first) Reference points Query points

Dual-tree traversal Reference points Query points

High-dimensional integration without Markov chains Alexander Gray - PowerPoint PPT Presentation

High-dimensional integration without Markov chains Alexander Gray Carnegie Mellon University School of Computer Science High-dimensional integration by: Nonparametric statistics Computational geometry Computational physics

Markov Chains Markov Processes Discrete-time Markov Chains Continuous-time Markov Chains Dr

Markov chains and Hidden Markov Models 9000 Markov chains and HMMs We will discuss: Markov

CSCE 471/871 Lecture 3: Markov Chains Markov Chains and and Hidden Markov Models Hidden

Imprecise Markov chains From basic theory to applications II prof. Jasper De Bock Imprecise

Overview Motivation Verifying Continuous-Time Markov Chains 1 Lecture 1+2: Discrete-Time Markov

Discrete time Markov chains Today: Discrete Time Markov Chains, Limiting Discrete time Markov

Discrete Time Markov Chains Discrete-Time Markov Chains Books - Introduction to Stochastic

Overview Verifying Continuous-Time Markov Chains Negative exponential distributions 1 Lecture

Markov Chains and Hidden Markov Models COMP 571 Luay Nakhleh, Rice University Markov Chains and

Markov Chains and Hidden Markov Models COMP 571 Luay Nakhleh, Rice University 2 Markov Chains

Hidden Markov Models Discrete Markov Processes 1 Hidden Markov Models Hidden Markov Models 2

Simulation of Discrete-Time Markov Chains Discrete-Time Markov Chains (DTMCs) Numerical Solution

Under Interval and Fuzzy From the . . . Symmetric Markov Chains Uncertainty, Symmetric In

Stochastic Processes Markov Processes Hamid R. Rabiee 1 Overview o Markov Property o Markov

Markov chains and MCMC methods Ingo Blechschmidt November 7th, 2014 Kleine Bayessche AG Markov

Markov chains Dr. Jarad Niemi STAT 544 - Iowa State University April 2, 2018 Jarad Niemi

Topology-Aware Cooperative Data Protection in Blockchain-Based Decentralized Storage Networks

CSEP 590 B Computational Biology Gene Expression Analysis 1 Assaying Gene Expression 3

Chapt hapter er 6 6 Parallel Processors from Client to Cloud 6.1 Introduction Introduction

Lecture 3 SIMD and Vectorization GPU Architecture Todays lecture Vectorization and SSE

Targeted end-to-end knowledge graph decomposition Bla krlj, Jan Kralj and Nada Lavra c

Collaboration wont just happen Supervising Co-Teaching Teams: Whose Line is it Deliberate

Python - Data Analysis Essentials Day 2 Giuseppe Accaputo g@accaputo.ch 01.12.2018 Slide 1 IT

ML HPC: Optimizing Optimizers for Optimization Workshop on the Convergence of ML & HPC

High-dimensional integration without Markov chains Alexander Gray - PowerPoint PPT Presentation

High-dimensional integration without Markov chains Alexander Gray Carnegie Mellon University School of Computer Science High-dimensional integration by: Nonparametric statistics Computational geometry Computational physics

Markov Chains Markov Processes Discrete-time Markov Chains Continuous-time Markov Chains Dr

Markov chains and Hidden Markov Models 9000 Markov chains and HMMs We will discuss: Markov

CSCE 471/871 Lecture 3: Markov Chains Markov Chains and and Hidden Markov Models Hidden

Imprecise Markov chains From basic theory to applications II prof. Jasper De Bock Imprecise

Overview Motivation Verifying Continuous-Time Markov Chains 1 Lecture 1+2: Discrete-Time Markov

Discrete time Markov chains Today: Discrete Time Markov Chains, Limiting Discrete time Markov

Discrete Time Markov Chains Discrete-Time Markov Chains Books - Introduction to Stochastic

Overview Verifying Continuous-Time Markov Chains Negative exponential distributions 1 Lecture

Markov Chains and Hidden Markov Models COMP 571 Luay Nakhleh, Rice University Markov Chains and

Markov Chains and Hidden Markov Models COMP 571 Luay Nakhleh, Rice University 2 Markov Chains

Hidden Markov Models Discrete Markov Processes 1 Hidden Markov Models Hidden Markov Models 2

Simulation of Discrete-Time Markov Chains Discrete-Time Markov Chains (DTMCs) Numerical Solution

Under Interval and Fuzzy From the . . . Symmetric Markov Chains Uncertainty, Symmetric In

Stochastic Processes Markov Processes Hamid R. Rabiee 1 Overview o Markov Property o Markov

Markov chains and MCMC methods Ingo Blechschmidt November 7th, 2014 Kleine Bayessche AG Markov

Markov chains Dr. Jarad Niemi STAT 544 - Iowa State University April 2, 2018 Jarad Niemi

Topology-Aware Cooperative Data Protection in Blockchain-Based Decentralized Storage Networks

CSEP 590 B Computational Biology Gene Expression Analysis 1 Assaying Gene Expression 3

Chapt hapter er 6 6 Parallel Processors from Client to Cloud 6.1 Introduction Introduction

Lecture 3 SIMD and Vectorization GPU Architecture Todays lecture Vectorization and SSE

Targeted end-to-end knowledge graph decomposition Bla krlj, Jan Kralj and Nada Lavra c

Collaboration wont just happen Supervising Co-Teaching Teams: Whose Line is it Deliberate

Python - Data Analysis Essentials Day 2 Giuseppe Accaputo g@accaputo.ch 01.12.2018 Slide 1 IT

ML HPC: Optimizing Optimizers for Optimization Workshop on the Convergence of ML &amp; HPC

ML HPC: Optimizing Optimizers for Optimization Workshop on the Convergence of ML & HPC