Quasi-Reliable Estimates of Effective Sample Size Robert Skeel - PowerPoint PPT Presentation

Quasi-Reliable Estimates of Effective Sample Size Robert Skeel Purdue University / Arizona State University July 20, 2016 with Youhan Fang and Yudong Cao

Evaluating Performance of MCMC Methods Statistician Charles Geyer (1992): It would enforce a salutary discipline if the gold standard for comparison of Markov chain Monte Carlo schemes were asymptotic variance . . . it is easier to invent methods than to understand exactly what their strengths and weaknesses are . . . . Physicist Alan Sokal (1997): the better algorithm is the one that has the smaller autocorrelation time, when time is measured in units of CPU time

The Cost of an Independent Sample N τ × cost per step = ESS × cost per step where τ is the integrated autocorrelation time , N = sample size, and ESS is effective sample size .

The Cost of an Independent Sample N τ × cost per step = ESS × cost per step where τ is the integrated autocorrelation time , N = sample size, and ESS is effective sample size . However, not generally accepted for molecular simulation, because very often . . . ESS = 0.

The Cost of an Independent Sample N τ × cost per step = ESS × cost per step where τ is the integrated autocorrelation time , N = sample size, and ESS is effective sample size . However, not generally accepted for molecular simulation, because very often . . . ESS = 0. Nonetheless, for comparing methods; one can simply use small representative problems.

Evaluating Accuracy of MCMC Estimates Alan Sokal again: It is important to emphasize that unless further information is given—namely, the autocorrelation time of the algorithm— . . . statements [of sample size] have no value whatsoever . What about problems for which even ESS = 1 is infeasible?

What to Do With Intractable Problems 1. Do “exploration” instead of sampling. 2. Change the problem: 2.1 Use a simpler model (Daniel Zuckerman). A coarsened model with error bars is better than a refined model without error bars (because then the limitations are more explicit). 2.2 Artificially limit the extent of state space, and adjust conclusions accordingly.

Reliability and Quasi-reliability Reliability is impractical in general due to metastability. Here quasi-reliability means ensuring thorough sampling of that part of state space that has already been explored, to minimize the risk of missing an opening to an unexplored part of state space. More concretely, it means thorough sampling of those modes that have already been visited, to minimize the risk of missing an opening to an unvisited mode.

Outline Preliminaries Thorough Sampling Coping with Statistical Error Reversible Samplers

Problem Statement Given probability density ρ q ( q ), q ∈ R d , known only up to a multiplicative factor, compute observables E [ u ( Q )], which are expectations for specified functions u ( q ). Note use of upper case for random variables.

Markov Chains Q 0 → Q 1 → · · · → Q N − 1 Assume ergodic with stationary density ρ q ( q ). Also, assume Q 0 ∼ ρ q ( q ). To estimate an observable, use N − 1 E [ u ( Q )] ≈ U = 1 � u ( Q n ) , N n =0 but use just one realization q 0 → q 1 → · · · → q N − 1 .

Variance of the Estimated Mean � C ( k ) N − 1 � � Var [ U ] = 1 � 1 − k � N Var [ u ( Q )] 1 + 2 C (0) N k =1 where the covariances C ( k ) = E [( u ( Q 0 ) − µ )( u ( Q k ) − µ )] , with µ = E [ u ( Q )]) . In the limit N → ∞ , Var [ U ] = 1 N Var [ u ( Q )] τ + o ( 1 N ) where τ is the integrated autocorrelation time + ∞ C ( k ) � τ = 1 + 2 C (0) . k =1

Estimates of CoVariances N − k − 1 C N ( k ) = 1 � ( u ( q n ) − u )( u ( q n + k ) − u ) . N n =0 Priestley (1981, pp.323–324)

MCMC in Extended State Space Some samplers augment state variables q with auxiliary variables p , extend the p.d.f. to ρ ( q , p ) so that � ρ ( q , p ) d p = ρ q ( q ) , and make moves in extended state space z 0 → z 1 → · · · → z N − 1 where z = ( q , p ).

Forward Transfer Operator Associated with the MCMC propagator is an operator F , which maps a relative density u n − 1 = ρ n − 1 /ρ for Z n − 1 to a relative density u n = ρ n /ρ for Z n : 1 � ρ ( z | z ′ ) u n − 1 ( z ′ ) ρ ( z ′ ) d z ′ u n = F u n − 1 where F u n − 1 ( z ) = ρ ( z ) and ρ ( z | z ′ ) is the transition probablity for the chain. Note that u ≡ 1 is an eigenfunction of F for eigenvalue λ 1 = 1.

Error in Probability Density starting from ρ 0 ( q ) = δ ( q − q 0 ) is proportional to 1 / | 1 − λ 2 | , where λ 2 is the nonunit eigenvalue of F nearest 1. However , the spectral gap is not the relevant quantity—in general.

Error in Estimate of τ � An estimate of τ has a standard deviation ≈ M / N τ where M is the number of terms taken in formula for τ . Therefore, use instead N − 1 w ( k ) C ( k ) � τ ≈ 1 + 2 C (0) k =1 where w ( k ) is a decreasing function known as a lag window . The tiny program acor of Jonathan Goodman (2009) uses a lag window that is 1 from 0 to roughly 1 . 4 τ and decreases linearly from 1 . 4 τ to 2 . 4 τ . It requires the number of samples N to exceed 35 τ , roughly.

Outline Preliminaries Thorough Sampling Coping with Statistical Error Reversible Samplers

Weakness of Existing Approach Mixture of Gaussians − log ρ ( q 1 , q 2 ) = 2 (16( q 1 + 2) 2 + ( q 2 − 4) 2 ) + 1 2 (( q 1 − 2) 2 + 16 q 2 1 2 ) + const Observable function: u ( q 1 , q 2 ) = q 1 .

Estimate of τ as N Increases Sampler: Euler-Maruyama Brownian dynamics (w/o rejections). Need to ensure thorough sampling.

Definition of Thorough Sampling: for any subset A of state space, an estimate 1 A of E [1 A ( Q )] = Pr( Q ∈ A ) should satisfy Var [1 A − E [1 A ( Q )]] ≤ 1 4 tol 2 .

Definition of Thorough Sampling: for any subset A of state space, an estimate 1 A of E [1 A ( Q )] = Pr( Q ∈ A ) should satisfy Var [1 A − E [1 A ( Q )]] ≤ 1 4 tol 2 . Since N Var [1 A ( Q )] ≤ 1 1 Var [1 A − E [1 A ( Q )]] ≈ τ A 4 N τ A , it is enough to have 4 N τ A ≤ 1 1 4 tol 2 for all A .

Symmetries There may be permutations P such that ρ q ( P q ) = ρ q ( q ) and u ( P q ) = u ( q ) for all interesting u . Then, consider only A for which 1 A ( P q ) = 1 A ( q ).

Symmetries There may be permutations P such that ρ q ( P q ) = ρ q ( q ) and u ( P q ) = u ( q ) for all interesting u . Then, consider only A for which 1 A ( P q ) = 1 A ( q ). For simplicity, instead of only indicator functions, consider all functions in W = { u = u ( q ) | E [ u ( Q )] = 0 , u ( P q ) = u ( q ) for symmetries P } .

For More Explicit Notation Introduce the inner product � � v , u � = v ( z ) u ( z ) ρ ( z ) d z . Note that � 1 , u � = E [ u ( Z )]. One can show by induction that E [ v ( Z 0 ) u ( Z k )] = �F k v , u � . and, in particular, C ( k ) = �F k u , u � .

Maximum Autocorrelation Time Define thorough sampling using � + ∞ � �F k u , u � � τ max = max 1 + 2 . � u , u � u ∈ W k =1

Spatial Discretization Consider u ( q ) = a T u ( q ) where u i ∈ W are given and a i are unknown. Then C ( k ) = �F k u , u � = a T C k a where C k = �F k u , u T � = E [ u ( Q 0 ) u ( Q k ) T ] and + ∞ a T K a � τ max ≈ max where K = C 0 + 2 C k a T C 0 a a k =1 —an eigenvalue problem. For u , suggest u i ( q ) = q i .

Cross Covariance Matrices C 0 is symmetric positive definite. The eigenvalue problem is well conditioned if C k is symmetric. This depends on the sampler.

Reversible MCMC Samplers satisfy detailed balance: ρ ( z ′ | z ) ρ ( z ) = ρ ( z | z ′ ) ρ ( z ′ ) e.g., • Brownian sampler, one step of the Euler-Maruyama integrator for Brownian dynamics with or without a MRRTT ( Metropolis-Rosenbluth- Rosenbluth -Teller-Teller ) acceptance test, • hybrid Monte Carlo, • a generalized hybrid Monte Carlo step, followed by a momenta flip, • several steps with a reversible Langevin integrator, followed by a momenta flip.

Modified Reversible MCMC Samplers The momenta flip is counterproductive: Reversing the momenta flip seems to improve mixing. A modified reversible propagator couples the reversible substep z n − 1 → ¯ z n with a substep, z n = R (¯ z n ) , where R satisfies R ◦ R = id and ρ ◦ R = ρ . For a momenta flip, R ( q , p ) = ( q , − p ).

An Empirical Comparison Canc` es, Legoll & Stoltz (2007) Discrepancy for a pair of torsion angles in alkanes. dimension 36 36 87 114 Langevin dynamics 0.034 0.016 0.014 0.017 HMC 0.039 0.012 0.026 0.021 Brownian dynamics 0.079 0.023 0.040 0.031 Nos´ e-Hoover chains 0.103 0.046 0.029 0.035 Brownian sampler 0.104 0.034 0.048 0.061 1 million samples for column 1, 10 million for others.

Discretized Langevin Dynamics Let F ( q ) = −∇ q log ρ ( q , p ). O: p := √ 1 − γ ∆ t p + √ γ ∆ t N (0 , I ) B: p := p + 1 2 ∆ tF ( q ) A: q := q + ∆ t p B: p := p + 1 2 ∆ tF ( q ) O: p := √ 1 − γ ∆ t p + √ γ ∆ t N (0 , I ) The dynamics has a precise stationary density. It differs from the desired density by O (∆ t 2 ). Leimkuhler, Matthews & Stoltz (2015) Such error is much smaller than statistical error. (The Euler-Maruyama integrator is a special case of this.)

Quasi-Reliable Estimates of Effective Sample Size Robert Skeel - PowerPoint PPT Presentation

Quasi-Reliable Estimates of Effective Sample Size Robert Skeel Purdue University / Arizona State University July 20, 2016 with Youhan Fang and Yudong Cao Evaluating Performance of MCMC Methods Statistician Charles Geyer (1992): It would

10 slides that always work Simple text boxes (I) Sample text Sample text Sample text

SAMPLE SIZE IN TRIAXIAL LOADS How sample size affects the frictional behavior Photo by H.

Designs Chapter 11 Quasi-Experimentation Quasi-experiments resemble experiments, but lack

Sample 2 Inlet in western (Sunset) Bay 0 Sample 3 Inlet behind Christian Island 1 Sample

Sample Size Power, Sample Size, and the FDR How many observations do we need? Depends on

Agglomeration of Ash Particles due to Flue Gas Conditioning (a) Sample CA8S12F1 (b) Sample

Quasi-Resonant Converters Introduction 20.1 The zero-current-switching quasi-resonant switch

Well quasi-ordering Aronszajn lines. Carlos Martinez-Ranero Centro de Ciencias Matematicas March

Contents 1. General Problem 2. Quasi-primal algebras Logics associated with a quasi-primal

Sample Preparation Sample Preparation Sample Size 6 mm x 12 mm x 50 mm 10 mm x 12 mm

Math 1710 Class 24 Examples Power 2-Sample CIs Dr. Allen Back and HTs 2-Sample

Prior Work Consensus Consensus Reliable BGP Consensus Reliable BGP Consensus Routing

Prior Work Consensus Consensus Reliable BGP Consensus Reliable BGP Consensus Routing

Lumber Size Lumber Size Control Control Studies Studies Lumber Size Control Lumber Size

Lab 2 discussion Last Time Debugging Its a science use experiments to refine

SEM Photographs of Activated ash samples SEM Micrographs (Original ash samples) (a) Sample S1F1

Excursion 5 Tours I & II: Power: Pre-data, Post-data & How not to corrupt power A

Ambiguity and the Lexicon in Natural Language 2 The Lexicon Informatics 2A: Lecture 12 Closed vs.

Ambiguity and the Lexicon in Natural Language Informatics 2A: Lecture 12 2 The Lexicon Word

CIDA funded ICCIDD GN Project: Sustained Elimina<on of

Session Transcript: 07-07-2020 Yoga Alliance - CE Workshop | Yoga Philosophy Closed Captioning/

Fighting against theft, cloning and counterfeiting of integrated circuits Lilian Bossuet Associate

An Introduction to Analysing Qualitative Data PAUL GREENBANK CENTRE FOR LEARNING AND TEACHING

I believe that in all men's lives at certain periods, and in many men's lives at all periods

Quasi-Reliable Estimates of Effective Sample Size Robert Skeel - PowerPoint PPT Presentation

Quasi-Reliable Estimates of Effective Sample Size Robert Skeel Purdue University / Arizona State University July 20, 2016 with Youhan Fang and Yudong Cao Evaluating Performance of MCMC Methods Statistician Charles Geyer (1992): It would

10 slides that always work Simple text boxes (I) Sample text Sample text Sample text

SAMPLE SIZE IN TRIAXIAL LOADS How sample size affects the frictional behavior Photo by H.

Designs Chapter 11 Quasi-Experimentation Quasi-experiments resemble experiments, but lack

Sample 2 Inlet in western (Sunset) Bay 0 Sample 3 Inlet behind Christian Island 1 Sample

Sample Size Power, Sample Size, and the FDR How many observations do we need? Depends on

Agglomeration of Ash Particles due to Flue Gas Conditioning (a) Sample CA8S12F1 (b) Sample

Quasi-Resonant Converters Introduction 20.1 The zero-current-switching quasi-resonant switch

Well quasi-ordering Aronszajn lines. Carlos Martinez-Ranero Centro de Ciencias Matematicas March

Contents 1. General Problem 2. Quasi-primal algebras Logics associated with a quasi-primal

Sample Preparation Sample Preparation Sample Size 6 mm x 12 mm x 50 mm 10 mm x 12 mm

Math 1710 Class 24 Examples Power 2-Sample CIs Dr. Allen Back and HTs 2-Sample

Prior Work Consensus Consensus Reliable BGP Consensus Reliable BGP Consensus Routing

Prior Work Consensus Consensus Reliable BGP Consensus Reliable BGP Consensus Routing

Lumber Size Lumber Size Control Control Studies Studies Lumber Size Control Lumber Size

Lab 2 discussion Last Time Debugging Its a science use experiments to refine

SEM Photographs of Activated ash samples SEM Micrographs (Original ash samples) (a) Sample S1F1

Excursion 5 Tours I &amp; II: Power: Pre-data, Post-data &amp; How not to corrupt power A

Ambiguity and the Lexicon in Natural Language 2 The Lexicon Informatics 2A: Lecture 12 Closed vs.

Ambiguity and the Lexicon in Natural Language Informatics 2A: Lecture 12 2 The Lexicon Word

CIDA funded ICCIDD GN Project: Sustained Elimina&lt;on of

Session Transcript: 07-07-2020 Yoga Alliance - CE Workshop | Yoga Philosophy Closed Captioning/

Fighting against theft, cloning and counterfeiting of integrated circuits Lilian Bossuet Associate

An Introduction to Analysing Qualitative Data PAUL GREENBANK CENTRE FOR LEARNING AND TEACHING

I believe that in all men's lives at certain periods, and in many men's lives at all periods

Excursion 5 Tours I & II: Power: Pre-data, Post-data & How not to corrupt power A

CIDA funded ICCIDD GN Project: Sustained Elimina<on of