SLIDE 1
Stratified Markov Chain Monte Carlo
Brian Van Koten, University of Massachusetts, Amherst, Department of Mathematics and Statistics
with A. Dinner, J. Tempkin, E. Thiede, B. Vani, and J. Weare
June 28, 2019
SLIDE 2
SLIDE 3
Markov Chain Monte Carlo (MCMC)
Goal: Compute π(g) := ∫ g(x) π(dx).
MCMC Method: Choose a Markov chain X_n so that
lim_{N→∞} (1/N) ∑_{n=0}^{N−1} g(X_n) = π(g).
"X_n samples π."
[Figure: MCMC trajectory X_n (position vs. time step) alongside the target density π.]
SLIDE 4
Difficulties with MCMC
[Figure: MCMC trajectory X_n (position vs. time step) with tail level M marked, alongside the target density π.]
Multimodality: Multimodality ⟹ slow convergence.
Tails: Need a large sample to compute small probabilities, e.g. π([M, ∞)).
SLIDE 5
Sketch of Stratified MCMC
- 1. Choose a family of strata, i.e. distributions π_i whose supports cover the support of the target π.
- 2. Sample the strata by MCMC.
- 3. Estimate π(g) from the samples of the strata.
[Figure: target distribution π and strata π_i.]
Why Stratify?
- Strata may be unimodal, even if π is multimodal.
- Can concentrate sampling in the tail.
Typical Strata: πi(dx) ∝ ψi(x)π(dx) for “localized” ψi.
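Localized bias functions of this kind can be sketched concretely. Below is a minimal illustration (not from the talk): Gaussian bumps on a grid of hypothetical centers, normalized so that the ψ_i form a partition of unity, i.e. ψ_i(x) ≥ 0 and ∑_i ψ_i(x) = 1.

```python
import numpy as np

# Hypothetical "localized" bias functions psi_i: Gaussian bumps centered on
# a grid, divided by their pointwise sum. The normalization guarantees
# sum_i psi_i(x) = 1 and psi_i(x) >= 0 at every x.
def make_bias_functions(centers, width):
    def psi(x):
        x = np.atleast_1d(x)
        bumps = np.exp(-((x[:, None] - centers[None, :]) / width) ** 2)
        return bumps / bumps.sum(axis=1, keepdims=True)  # rows sum to 1
    return psi

psi = make_bias_functions(np.linspace(-2.0, 2.0, 5), width=0.8)
vals = psi(np.array([0.3]))   # values of all 5 bias functions at x = 0.3
print(vals.sum())             # 1.0 up to floating point
```

Each stratum π_i then reweights π by the bump ψ_i, concentrating samples near one center.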
SLIDE 6
History of Stratification
Surveys: [Russian census, late 1800s], [Neyman, 1937]
Bayes factors: [Geyer, 1994]
Selection bias models: [Vardi, 1985]
Free energy: [Umbrella Sampling, 1977], [WHAM, 1992], [MBAR, 2008]
Ion channels: [Berneche, et al, 2001]
Protein folding: [Boczko, et al, 1995]
Problems:
- 1. WHAM/MBAR are complicated iterative methods . . .
- 2. No clear story explaining benefits of stratification.
- 3. Stratification underappreciated as a general strategy.
- 4. Need good error bars for adaptivity.
SLIDE 7
History of Stratification
BvK, et al: Propose Eigenvector Method for Umbrella Sampling, develop story, error bars, stratification for dynamical quantities . . .
SLIDE 8
Eigenvector Method for Umbrella Sampling (EMUS)
[BvK, et al]
[Figure: target distribution, bias functions ψ_i, and strata π_i.]
Bias Functions: {ψ_i}_{i=1}^L with ∑_{i=1}^L ψ_i(x) = 1 and ψ_i(x) ≥ 0.
Note: User chooses the bias functions.
Weights: z_i = π(ψ_i)
Strata: π_i(dx) = z_i^{−1} ψ_i(x) π(dx)
SLIDE 9
Eigenvector Method for Umbrella Sampling (EMUS)
[BvK, et al]
Goal: Write π(g) in terms of averages over the strata π_i(dx) = z_i^{−1} ψ_i(x) π(dx).
First, decompose π(g) as a weighted sum:
π(g) = ∫ g(x) ∑_{i=1}^L ψ_i(x) π(dx)        (the ψ_i's sum to one)
     = ∑_{i=1}^L z_i ∫ g(x) z_i^{−1} ψ_i(x) π(dx)        (integral of g against π_i(dx))
     = ∑_{i=1}^L z_i π_i(g).
SLIDE 10
Eigenvector Method for Umbrella Sampling (EMUS)
[BvK, et al]
Goal: Write π(g) in terms of averages over the strata π_i(dx) = z_i^{−1} ψ_i(x) π(dx).
First, decompose π(g) as a weighted sum:
π(g) = ∫ g(x) ∑_{i=1}^L ψ_i(x) π(dx)        (the ψ_i's sum to one)
     = ∑_{i=1}^L z_i ∫ g(x) z_i^{−1} ψ_i(x) π(dx)        (integral of g against π_i(dx))
     = ∑_{i=1}^L z_i π_i(g).
How to express the weights z_i = π(ψ_i) as averages over the strata?
SLIDE 11
Eigenvector Method for Umbrella Sampling (EMUS)
[BvK, et al]
Goal: Write π(g) in terms of averages over the strata π_i(dx) = z_i^{−1} ψ_i(x) π(dx).
First, decompose π(g) as a weighted sum: π(g) = ∑_{i=1}^L z_i π_i(g).
How to express the weights z_i = π(ψ_i) as averages over the strata?
SLIDE 12
Eigenvector Method for Umbrella Sampling (EMUS)
[BvK, et al]
Goal: Write π(g) in terms of averages over the strata π_i(dx) = z_i^{−1} ψ_i(x) π(dx).
First, decompose π(g) as a weighted sum: π(g) = ∑_{i=1}^L z_i π_i(g).
How to express the weights z_i = π(ψ_i) as averages over the strata?
z_j = π(ψ_j) = ∑_{i=1}^L z_i π_i(ψ_j)  ⟺  zᵀ = zᵀF  (eigenproblem), where F_ij = π_i(ψ_j)  (overlap matrix).
SLIDE 13
Eigenvector Method for Umbrella Sampling (EMUS)
[BvK, et al]
Goal: Write π(g) in terms of averages over the strata π_i(dx) = z_i^{−1} ψ_i(x) π(dx).
First, decompose π(g) as a weighted sum: π(g) = ∑_{i=1}^L z_i π_i(g).
To express the weights z_i = π(ψ_i) as averages over the strata, solve zᵀ = zᵀF  (eigenproblem), where F_ij = π_i(ψ_j)  (overlap matrix).
SLIDE 14
Eigenvector Method for Umbrella Sampling (EMUS)
[BvK, et al]
Goal: Write π(g) in terms of averages over the strata π_i(dx) = z_i^{−1} ψ_i(x) π(dx).
First, decompose π(g) as a weighted sum: π(g) = ∑_{i=1}^L z_i π_i(g).
To express the weights z_i = π(ψ_i) as averages over the strata, solve zᵀ = zᵀF  (eigenproblem), where F_ij = π_i(ψ_j)  (overlap matrix).
Why does the eigenproblem determine z?
- 1. F is stochastic; z is a probability vector.
- 2. If F is irreducible, z is the unique solution of the eigenproblem.
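Numerically, the eigenproblem zᵀ = zᵀF amounts to finding the left eigenvector of F with eigenvalue 1 and normalizing it to a probability vector. A minimal NumPy sketch (the 3×3 matrix F below is an illustrative stochastic overlap matrix, not from the talk):

```python
import numpy as np

# Solve the EMUS weight eigenproblem z^T = z^T F for a stochastic matrix F:
# left eigenvectors of F are eigenvectors of F.T, so take the one whose
# eigenvalue is closest to 1 and normalize it to sum to one.
def emus_weights(F):
    evals, evecs = np.linalg.eig(F.T)       # columns are left eigenvectors of F
    k = np.argmin(np.abs(evals - 1.0))      # eigenvalue closest to 1
    z = np.abs(np.real(evecs[:, k]))        # Perron vector is sign-definite
    return z / z.sum()                      # probability vector

F = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.3, 0.7]])             # rows sum to 1, irreducible
z = emus_weights(F)
print(np.allclose(z @ F, z))                # True: z solves z^T = z^T F
```

By point 2 above, irreducibility of F makes this solution unique, so the normalized eigenvector is exactly z.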
SLIDE 15
Eigenvector Method for Umbrella Sampling (EMUS)
[BvK, et al]
Recall: π(g) = ∑_{i=1}^L z_i π_i(g), and zᵀ = zᵀF for F_ij = π_i(ψ_j).
EMUS Algorithm:
- 1. Choose bias functions ψ_i and processes X_n^i sampling the strata.
- 2. Compute ḡ_i := (1/N_i) ∑_{n=1}^{N_i} g(X_n^i) to estimate π_i(g).
- 3. Compute F̄_ij := (1/N_i) ∑_{n=1}^{N_i} ψ_j(X_n^i) to estimate F.
- 4. Solve the eigenproblem z̄ᵀ = z̄ᵀF̄ to estimate the weights z.
- 5. Output ḡ^EM = ∑_{i=1}^L z̄_i ḡ_i.
SLIDE 16
Eigenvector Method for Umbrella Sampling (EMUS)
[BvK, et al]
Recall: π(g) = ∑_{i=1}^L z_i π_i(g), and zᵀ = zᵀF for F_ij = π_i(ψ_j).
EMUS Algorithm:
- 1. Choose bias functions ψ_i and processes X_n^i sampling the strata.
- 2. Compute ḡ_i := (1/N_i) ∑_{n=1}^{N_i} g(X_n^i) to estimate π_i(g).
- 3. Compute F̄_ij := (1/N_i) ∑_{n=1}^{N_i} ψ_j(X_n^i) to estimate F.
- 4. Solve the eigenproblem z̄ᵀ = z̄ᵀF̄ to estimate the weights z.
- 5. Output ḡ^EM = ∑_{i=1}^L z̄_i ḡ_i.
Key Point: The simplicity of EMUS enables analysis of stratification.
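The five steps above can be sketched end to end on a toy problem. Everything concrete below is an illustrative assumption, not the authors' setup: a double-well density π(x) ∝ exp(−V(x)), Gaussian partition-of-unity bias functions, and a plain Metropolis chain within each stratum.

```python
import numpy as np

# Minimal EMUS sketch (steps 1-5) on a toy double well; all specifics here
# are illustrative assumptions, not the talk's examples.
rng = np.random.default_rng(0)
V = lambda x: 2.0 * (x**2 - 1.0)**2           # double-well potential
centers = np.linspace(-1.5, 1.5, 7)           # one stratum per center

def psi(x):                                   # partition-of-unity biases
    b = np.exp(-((x - centers) / 0.5)**2)
    return b / b.sum()

def sample_stratum(i, n, step=0.3):
    """Metropolis chain targeting pi_i ∝ psi_i(x) exp(-V(x))."""
    x = centers[i]
    logp = lambda y: -V(y) + np.log(psi(y)[i] + 1e-300)
    xs = np.empty(n)
    for k in range(n):
        y = x + step * rng.normal()           # symmetric proposal
        if np.log(rng.random()) < logp(y) - logp(x):
            x = y
        xs[k] = x
    return xs

L, N = len(centers), 2000
g = lambda x: x**2                            # observable
Fbar, gbar = np.zeros((L, L)), np.zeros(L)
for i in range(L):                            # steps 1-3: sample each stratum
    xs = sample_stratum(i, N)
    Fbar[i] = np.mean([psi(x) for x in xs], axis=0)   # estimates F_ij
    gbar[i] = np.mean(g(xs))                          # estimates pi_i(g)

evals, evecs = np.linalg.eig(Fbar.T)          # step 4: solve z^T = z^T F
z = np.abs(np.real(evecs[:, np.argmin(np.abs(evals - 1.0))]))
z /= z.sum()
print("EMUS estimate of pi(g):", z @ gbar)    # step 5
```

Because each ψ_i sums to one pointwise, every row of F̄ sums to one automatically, so F̄ stays stochastic no matter the sampling error.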
SLIDE 17
EMUS Analysis: Outline
- 1. Sensitivity of gEM to sampling error.
- 2. Dependence of sampling error on choice of strata.
- 3. Stories involving multimodality and tails.
SLIDE 18
Quantifying Sensitivity to Sampling Error I
For F irreducible and stochastic, let z(F) be the unique solution of z(F)T = z(F)TF. PF
i [τj < τi]: probability of hitting j before i, conditioned on starting
from i, for a Markov chain on 1, . . . , L with transition matrix F. Theorem [BvK, et al]: 1 2 1 PF
i [τj < τi] ≤
max
m=1,...L
- ∂ log zm
∂Fij (F)
- ≤
1 PF
i [τj < τi] ≤ 1
Fij . Led to new perturbation bounds for Markov chains [BvK, et al].
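The hitting probability P_i^F[τ_j < τ_i] in the bound can be computed by first-step analysis: let h(x) be the probability of reaching j before i starting from x, so h(j) = 1, h(i) = 0, and h(x) = ∑_y F_xy h(y) elsewhere; then P_i^F[τ_j < τ_i] = ∑_y F_iy h(y). A small sketch with an illustrative 3×3 F (not from the talk):

```python
import numpy as np

# First-step analysis for P_i^F[tau_j < tau_i]: solve the linear system
# (I - F_interior) h = F[:, j] on states other than i and j, with boundary
# values h(j) = 1 and h(i) = 0, then take one step from i.
def hit_before_return(F, i, j):
    L = F.shape[0]
    h = np.zeros(L)                                   # h[i] = 0 by default
    interior = [x for x in range(L) if x not in (i, j)]
    if interior:
        A = np.eye(len(interior)) - F[np.ix_(interior, interior)]
        b = F[np.ix_(interior, [j])].ravel()
        h[interior] = np.linalg.solve(A, b)
    h[j] = 1.0
    return float(F[i] @ h)

F = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.3, 0.7]])
p = hit_before_return(F, 0, 2)
print(p)  # 0.1 for this F: strata 0 and 2 barely communicate
```

A small overlap between strata i and j makes this probability small, and the theorem then says log z can be very sensitive to errors in F_ij.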
SLIDE 19
Quantifying Sensitivity to Sampling Error II
Assumption: A CLT holds for MCMC averages:
√N_i (ḡ_i − π_i(g)) →d N(0, C(ḡ_i)),  where C(ḡ_i) is the asymptotic variance.
Theorem [BvK, et al]:
√N (ḡ^EM − π(g)) →d N(0, C(ḡ^EM)), where
C(ḡ^EM) ≲ var_π(g) ∑_{i=1}^L [ ∑_{j≠i, F_ij>0} 1/P_i^F[τ_j < τ_i]²  (sensitivity to error in F̄)  × ∑_{j=1}^L C(F̄_ij)/κ_i  (error in F̄)  + z_i² C(ḡ_i)/κ_i ].
Notation: N is the total sample size, with N_i = κ_i N drawn from π_i.
SLIDE 20
EMUS Analysis: Outline
- 1. Sensitivity of gEM to sampling error.
- 2. Dependence of sampling error on choice of strata.
- 3. Stories involving multimodality and tails.
SLIDE 21
Dependence of Sampling Error on Strata I
Write π(dx) = Z^{−1} exp(−V(x)/ε) for some potential V:
[Figure: potential V and target density π.]
Assume the bias functions ψ_i are piecewise constant (supports of diameter D):
[Figure: piecewise-constant bias functions.]
Assume X_t^i is overdamped Langevin dynamics with reflecting boundaries:
dX_t^i = −∇V(X_t^i) dt  (gradient descent)  + √(2ε) dB_t^i  (noise)  + reflecting BCs
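This within-stratum dynamics can be simulated by an Euler-Maruyama step followed by reflection at the stratum boundaries. A minimal sketch, with illustrative choices of potential, interval, and step size (none taken from the talk):

```python
import numpy as np

# Euler-Maruyama discretization of overdamped Langevin dynamics
# dX = -V'(X) dt + sqrt(2 eps) dB, confined to [a, b] by mirror reflection.
def reflected_langevin(grad_V, a, b, eps, dt, n, x0, rng):
    xs = np.empty(n)
    x = x0
    for k in range(n):
        x = x - grad_V(x) * dt + np.sqrt(2.0 * eps * dt) * rng.normal()
        while x < a or x > b:          # reflect back into [a, b];
            if x < a:                  # loop handles rare large excursions
                x = 2.0 * a - x
            if x > b:
                x = 2.0 * b - x
        xs[k] = x
    return xs

rng = np.random.default_rng(1)
grad_V = lambda x: 4.0 * x * (x**2 - 1.0)     # V(x) = (x^2 - 1)^2
xs = reflected_langevin(grad_V, a=0.0, b=2.0, eps=0.5, dt=1e-3,
                        n=5000, x0=1.0, rng=rng)
print(xs.min() >= 0.0 and xs.max() <= 2.0)    # True: stays in the stratum
```

Reflection keeps the chain inside its stratum, which is what makes the local dynamics fast even when the global landscape is multimodal.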
SLIDE 22
Dependence of Sampling Error on Strata II
Let π(dx) = Z^{−1} exp(−V(x)/ε) for some potential V:
[Figure: potential V and target densities for two values of ε.]
Theorem [BvK, et al]: For overdamped Langevin dynamics with reflecting BCs,
C(ḡ_i) ≲ var_{π_i}(g) × (D²/ε)  (diffusion scaling)  × exp( (max_{supp π_i} V − min_{supp π_i} V) / ε )  (Arrhenius).
Notation: D is the diameter of the support of π_i.
SLIDE 23
EMUS Analysis: Outline
- 1. Sensitivity of gEM to sampling error.
- 2. Dependence of sampling error on choice of strata.
- 3. Stories involving multimodality and tails.
SLIDE 24
EMUS and Multimodality
Let π(dx) = Z −1 exp(−V (x)/ε) for double well V :
[Figure: double-well potential V and target densities for two values of ε.]
The asymptotic variance of naïve MCMC grows exponentially as ε ↓ 0.
Theorem [BvK, et al]: For the right choice of strata (L ∝ ε^{−1}), the asymptotic variance of the EMUS estimate ḡ^EM grows only polynomially as ε ↓ 0.
SLIDE 25
EMUS and Tails
Goal: Compute π([M, ∞)) = ∫_M^∞ π(dx).
[Figure: target density with the tail region [M, ∞) marked.]
For a broad class of distributions π, the relative asymptotic variance of MCMC grows exponentially as M ↑ ∞.
Theorem [BvK, et al]: For the right choice of strata, the relative asymptotic variance of EMUS grows only polynomially as M ↑ ∞.
SLIDE 26
Example: EMUS for Bayesian Inference
Goal: Fit set of thicknesses of 485 stamps by mix of 3 Gaussians:
[Figure: histogram of the 485 stamp thicknesses with fitted mixture of three Gaussians.]
Parameters: means µ1 ≤ µ2 ≤ µ3, precisions λ1, λ2, λ3, weights, etc.
Bayesian method: Define a posterior distribution on the parameter space.
SLIDE 27
Example: EMUS for Bayesian Inference
Parameters: means µ1 ≤ µ2 ≤ µ3, precisions λ1, λ2, λ3, weights, etc.
Objective: Compute the marginal in log10 λ1 and log10 λ2.
Strata: Cylinders over a grid of regions in the (log10 λ1, log10 λ2) plane:
[Figure: grid of strata in the (log10 λ1, log10 λ2) plane.]
SLIDE 28
Example: EMUS for Bayesian Inference
Parameters: means µ1 ≤ µ2 ≤ µ3, precisions λ1, λ2, λ3, weights, etc.
Objective: Compute the marginal in log10 λ1 and log10 λ2.
SLIDE 29
Example: EMUS for Bayesian Inference
Asymptotic variances of EMUS vs. unbiased MCMC for the marginal in log10 λ1:
[Figure: log10 asymptotic variance in log10 λ1, comparing EMUS against MCMC with no bias functions.]
SLIDE 30