
Adaptive and Interacting Markov chain Monte Carlo, Gersende FORT - PowerPoint PPT Presentation



1. Adaptive and Interacting Markov chain Monte Carlo
Gersende FORT, LTCI, CNRS & Telecom ParisTech, Paris, France.
Talk based on joint works with Eric Moulines (Telecom ParisTech, France), Pierre Priouret (Univ. Paris 6, France) and Pierre Vandekerkhove (Univ. Marne-la-Vallée, France); Amandine Schreck (Telecom ParisTech, France); Benjamin Jourdain (ENPC, France), Estelle Kuhn (INRA, France), Tony Lelièvre (ENPC, France) and Gabriel Stoltz (ENPC, France).

2. Adaptive and Interacting Markov chain Monte Carlo
Introduction
Hastings-Metropolis algorithm (1/2)
Given, on X ⊆ R^d (to simplify the talk), a target density π and a proposal transition kernel q(x, y), define {X_k, k ≥ 0} iteratively as:
(i) draw Y ∼ q(X_k, ·)
(ii) compute
$$\alpha(X_k, Y) = 1 \wedge \frac{\pi(Y)\, q(Y, X_k)}{\pi(X_k)\, q(X_k, Y)}$$
(iii) set X_{k+1} = Y with probability α(X_k, Y), and X_{k+1} = X_k with probability 1 − α(X_k, Y).
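
A minimal sketch of steps (i)-(iii) in Python, assuming a symmetric Gaussian random-walk proposal q(x, ·) = N(x, θ) on R (so the q-terms cancel in α); the function and parameter names are illustrative, not part of the talk.

```python
import numpy as np

def hastings_metropolis(log_pi, x0, theta, n_iter, rng=None):
    """Random-walk Hastings-Metropolis chain; log_pi is the log target density,
    theta the variance of the Gaussian proposal q(x, .) = N(x, theta)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(n_iter + 1)
    x[0] = x0
    for k in range(n_iter):
        y = x[k] + np.sqrt(theta) * rng.standard_normal()            # (i) draw Y ~ q(X_k, .)
        log_alpha = min(0.0, log_pi(y) - log_pi(x[k]))               # (ii) symmetric q: ratio reduces to pi(Y)/pi(X_k)
        x[k + 1] = y if np.log(rng.uniform()) < log_alpha else x[k]  # (iii) accept or stay
    return x

# usage: standard Gaussian target on R
chain = hastings_metropolis(lambda x: -0.5 * x**2, x0=0.0, theta=1.0, n_iter=10_000)
```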

3. Adaptive and Interacting Markov chain Monte Carlo
Introduction
Hastings-Metropolis algorithm (2/2)
Then (X_k)_{k≥0} is a Markov chain with transition kernel P:
$$P(x, A) = \int_A \alpha(x, y)\, q(x, y)\, \lambda(dy) + \mathbb{1}_A(x) \int \big(1 - \alpha(x, y)\big)\, q(x, y)\, \lambda(dy).$$
Under conditions on π and q:
Ergodic behavior: $P^k(x, \cdot) \to \pi$, with explicit control of ergodicity $\| P^k(x, \cdot) - \pi \|_{TV} \le B(x, k)$.
Law of Large Numbers: $\frac{1}{n} \sum_{k=1}^n f(X_k) \xrightarrow{a.s.} \int f\, \pi\, d\lambda$.
Central Limit Theorem: $\sqrt{n} \left( \frac{1}{n} \sum_{k=1}^n f(X_k) - \int f\, \pi\, d\lambda \right) \xrightarrow{d} \mathcal{N}(0, \sigma_f^2)$.
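
As an illustration of the LLN and CLT above, here is a small sketch that computes the ergodic average of f and a batch-means estimate of the asymptotic variance σ_f²; the batch-means estimator and the choice of f are my own illustrative additions, not taken from the slides.

```python
import numpy as np

def ergodic_average(chain, f):
    """Monte Carlo estimate of \int f \pi d\lambda (Law of Large Numbers)."""
    return np.mean(f(np.asarray(chain)))

def batch_means_variance(chain, f, n_batches=50):
    """Rough batch-means estimate of sigma_f^2 appearing in the CLT."""
    fx = f(np.asarray(chain))
    m = len(fx) // n_batches                      # batch length
    batch_avgs = fx[: m * n_batches].reshape(n_batches, m).mean(axis=1)
    return m * batch_avgs.var(ddof=1)

# usage with the chain from the previous sketch and f(x) = x
f = lambda x: x
print(ergodic_average(chain, f), batch_means_variance(chain, f))
```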

4. Adaptive and Interacting Markov chain Monte Carlo
Introduction
Example: efficiency of a Gaussian Random Walk Hastings-Metropolis
When λ ≡ Lebesgue on R and q(x, ·) ≡ N(x, θ), efficiency is compared through the (estimated) lag-s autocovariance function
$$\gamma_s = \mathbb{E}[X_0 X_s] - (\mathbb{E}[X_0])^2, \qquad X_0 \sim \pi.$$
[Figure: for 3 different values of θ, (top) a path (X_k, k ≥ 1), (bottom) s ↦ γ(s)/γ(0).]
→ Online adaptation of the design parameter θ.
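
A sketch of the diagnostic shown in this slide: the normalized lag-s autocovariance γ(s)/γ(0), compared for three proposal variances θ (the specific values are illustrative), reusing the hastings_metropolis sketch above.

```python
import numpy as np

def normalized_autocovariance(chain, max_lag=100):
    """Return s -> gamma(s)/gamma(0) for s = 0, ..., max_lag."""
    x = np.asarray(chain) - np.mean(chain)
    n = len(x)
    gamma = np.array([np.dot(x[: n - s], x[s:]) / n for s in range(max_lag + 1)])
    return gamma / gamma[0]

# compare three proposal variances, as in the figure: too small, moderate, too large
for theta in (0.01, 1.0, 100.0):
    chain = hastings_metropolis(lambda x: -0.5 * x**2, 0.0, theta, 10_000)
    print(theta, normalized_autocovariance(chain, max_lag=5))
```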

5. Adaptive and Interacting Markov chain Monte Carlo
Outline
Introduction
Examples of adaptive and interacting MCMC: the Adaptive Metropolis sampler, the Wang-Landau sampler, the Equi-Energy sampler
Convergence results: Unfortunately ..., Ergodic behavior, Central Limit Theorems
Conclusion
Bibliography

6. Adaptive and Interacting Markov chain Monte Carlo
Examples of adaptive and interacting MCMC
The Adaptive Metropolis sampler
Example 1: Adaptive Metropolis (1/2)
Proposed by Haario et al. (2001): learn on the fly the optimal covariance of the Gaussian proposal distribution.
Define a process {X_k, k ≥ 0} such that
(i) update the chain: P(X_{k+1} ∈ A | F_k) ≡ one step of Gaussian HM with covariance matrix θ_k;
(ii) update the estimate of the covariance matrix: θ_{k+1} = function(k, θ_k, X_{k+1}).

7. Adaptive and Interacting Markov chain Monte Carlo
Examples of adaptive and interacting MCMC
The Adaptive Metropolis sampler
Example 1: Adaptive Metropolis (2/2)
The general framework: let P_θ be a Gaussian Hastings-Metropolis kernel, where θ is the covariance matrix of the Gaussian proposal distribution. For any θ: π P_θ = π.
The adaptive algorithm:
(i) Sample X_{k+1} | F_k ∼ P_{θ_k}(X_k, ·).
(ii) Update the parameter θ_{k+1} by using θ_k, X_{k+1}.
Here, θ is a covariance matrix.
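
A minimal sketch of the recursion (i)-(ii), in the spirit of Haario et al. (2001); the step sizes γ_k = 1/(k+1), the 2.38²/d scaling and the ε-regularization are common illustrative choices rather than details taken from the slides.

```python
import numpy as np

def adaptive_metropolis(log_pi, x0, n_iter, eps=1e-6, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    d = len(x0)
    x = np.asarray(x0, dtype=float)
    mu, theta = x.copy(), np.eye(d)                  # running mean and covariance theta_k
    samples = [x.copy()]
    for k in range(1, n_iter + 1):
        # (i) one Gaussian Hastings-Metropolis step with covariance theta_k
        y = rng.multivariate_normal(x, (2.38**2 / d) * theta + eps * np.eye(d))
        if np.log(rng.uniform()) < log_pi(y) - log_pi(x):
            x = y
        samples.append(x.copy())
        # (ii) update theta_{k+1} = function(k, theta_k, X_{k+1}) via running mean/covariance
        gamma = 1.0 / (k + 1)
        mu = mu + gamma * (x - mu)
        theta = theta + gamma * (np.outer(x - mu, x - mu) - theta)
    return np.array(samples), theta

# usage: 2-dimensional standard Gaussian target
samples, cov = adaptive_metropolis(lambda x: -0.5 * float(x @ x), np.zeros(2), n_iter=5_000)
```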

8. Adaptive and Interacting Markov chain Monte Carlo
Examples of adaptive and interacting MCMC
The Wang-Landau sampler
Example 2: Wang-Landau (1/4)
Proposed by Wang and Landau (2001) for sampling systems in molecular dynamics; many metastable states ↔ many local modes separated by deep valleys.
Idea: let $\mathcal{X}_1, \dots, \mathcal{X}_d$ be a partition of X. Set
$$\pi_{\theta_\star}(x) \propto \sum_{i=1}^d \frac{\pi(x)}{\theta_\star(i)}\, \mathbb{1}_{\mathcal{X}_i}(x), \qquad \theta_\star(i) = \pi(\mathcal{X}_i).$$
The idea is to obtain samples (approximately) under $\pi_{\theta_\star}$. Then, by an importance ratio, these samples will approximate π; roughly,
$$\frac{1}{n} \sum_{k=1}^n \delta_{X_k} \approx \pi_{\theta_\star} \;\Longrightarrow\; \frac{1}{n} \sum_{k=1}^n \sum_{i=1}^d \theta_\star(i)\, \mathbb{1}_{X_k \in \mathcal{X}_i}\, \delta_{X_k} \approx \pi.$$
WL is an algorithm which provides an estimation of θ_⋆ and samples approximately distributed under $\pi_{\theta_\star}$.
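
A short sketch of the importance-reweighting step above: given samples approximately distributed under π_{θ_⋆} and the weights θ_⋆(i), a self-normalized reweighted average approximates ∫ f dπ. The stratum() helper, returning the index i of the set X_i containing a point, is a hypothetical name introduced for illustration.

```python
import numpy as np

def reweighted_estimate(f, xs, theta_star, stratum):
    """Self-normalized estimate of \int f d\pi from samples targeting pi_{theta_star}."""
    xs = np.asarray(xs)
    w = np.array([theta_star[stratum(x)] for x in xs])   # weight theta_star(i) when x lies in X_i
    return np.sum(w * np.array([f(x) for x in xs])) / np.sum(w)
```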

9. Adaptive and Interacting Markov chain Monte Carlo
Examples of adaptive and interacting MCMC
The Wang-Landau sampler
Example 2: Wang-Landau (2/4)
Define {X_k, k ≥ 0} iteratively:
(i) Sample X_{k+1} | F_k ∼ MCMC sampler with target distribution π_{θ_k}.
(ii) Update the parameter: θ_{k+1} = function(k, θ_k, X_{k+1}).
The parameter sequence {θ_k, k ≥ 0} is updated through a Stochastic Approximation procedure
$$\theta_{n+1} = \theta_n + \gamma_{n+1}\, h(\theta_n) + \gamma_{n+1}\, \xi_{n+1}$$
with noise term ξ_{n+1} and mean field h such that, if {θ_k, k ≥ 0} converges, its limiting value is θ_⋆.
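
A minimal sketch of the recursion (i)-(ii): a random-walk Hastings-Metropolis step targeting π_{θ_k}, followed by a stochastic-approximation update of θ. The multiplicative form of the update, the step size γ_k = 1/k and the stratum() helper are illustrative choices consistent with the mean-field description above, not a transcription of the exact scheme in the talk.

```python
import numpy as np

def wang_landau(log_pi, stratum, d, x0, n_iter, prop_std=1.0, rng=None):
    """stratum(x) returns the index i in {0, ..., d-1} of the set X_i containing x."""
    rng = np.random.default_rng() if rng is None else rng
    x, theta = float(x0), np.full(d, 1.0 / d)            # theta_0: uniform weights
    xs, thetas = [x], [theta.copy()]
    for k in range(1, n_iter + 1):
        # (i) one HM step with target pi_theta(x) proportional to pi(x) / theta(stratum(x))
        y = x + prop_std * rng.standard_normal()
        log_ratio = (log_pi(y) - np.log(theta[stratum(y)])) - (log_pi(x) - np.log(theta[stratum(x)]))
        if np.log(rng.uniform()) < log_ratio:
            x = y
        # (ii) stochastic approximation: increase the weight of the visited stratum, renormalize
        gamma = 1.0 / k
        theta = theta.copy()
        theta[stratum(x)] *= 1.0 + gamma
        theta /= theta.sum()
        xs.append(x)
        thetas.append(theta.copy())
    return np.array(xs), np.array(thetas)

# usage: bimodal 1-d target, partitioned into x < 0 and x >= 0
log_pi = lambda x: np.logaddexp(-0.5 * (x - 3.0)**2, -0.5 * (x + 3.0)**2)
xs, thetas = wang_landau(log_pi, stratum=lambda x: 0 if x < 0 else 1, d=2, x0=0.0, n_iter=20_000)
```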

10. Adaptive and Interacting Markov chain Monte Carlo
Examples of adaptive and interacting MCMC
The Wang-Landau sampler
Example 2: Wang-Landau (3/4)
[Figure: (left) level curves of π, (center) target density π, (right) partition of the state space.]
[Figure: (left) the sequences (θ_k(i))_k, (right) the limiting value θ_⋆(i).]

11. Adaptive and Interacting Markov chain Monte Carlo
Examples of adaptive and interacting MCMC
The Wang-Landau sampler
Example 2: Wang-Landau (3/4)
[Figure: (left) level curves of π, (center) target density π, (right) partition of the state space.]
[Figure: (left) Wang-Landau, T = 110 000; (right) Hastings-Metropolis, T = 2 × 10^6; the red line is at x = 110 000.]

12. Adaptive and Interacting Markov chain Monte Carlo
Examples of adaptive and interacting MCMC
The Wang-Landau sampler
Example 2: Wang-Landau (4/4)
The general framework: let π_θ be a distribution and P_θ an MCMC sampler with target distribution π_θ. For any θ: π_θ P_θ = π_θ.
The adaptive algorithm:
(i) Sample X_{k+1} | F_k ∼ P_{θ_k}(X_k, ·).
(ii) Update the parameter θ_{k+1} by using θ_k, X_{k+1}.
Here, θ = (θ(1), ..., θ(d)) is a probability on {1, ..., d}.

13. Adaptive and Interacting Markov chain Monte Carlo
Examples of adaptive and interacting MCMC
The Equi-Energy sampler
Example 3: Equi-Energy (1/3)
Proposed by Kou et al. (2006) to sample a multimodal target density π.
Based on an auxiliary process designed to admit π^{1/T} (T > 1) as target distribution.
[Figure: target distribution and tempered distribution, with a local move, an equi-energy jump, energy boundaries 1 and 2, and the current state.]
The transition kernel X_k → X_{k+1} is
$$\tilde{P}_{\theta_k}(X_k, \cdot) = (1 - \epsilon)\, \underbrace{Q(X_k, \cdot)}_{\text{MCMC with target } \pi} + \epsilon\, \underbrace{\bar{Q}_{\theta_k}(X_k, \cdot)}_{\text{kernel depending on the empirical distribution } \theta_k \text{ of the auxiliary process}}$$
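
A minimal sketch of one transition of this kernel: with probability 1 − ε a local MCMC move targeting π, otherwise an equi-energy jump to a stored point of the auxiliary process lying in the same energy ring, accepted with the ratio π(y) π^{1/T}(x) / (π(x) π^{1/T}(y)) as in Kou et al. (2006). The 1-d random-walk proposal and the ring() function are illustrative assumptions.

```python
import numpy as np

def ee_step(x, log_pi, log_pi_aux, aux_points, ring, eps=0.1, prop_std=1.0, rng=None):
    """One equi-energy transition: local move with prob. 1 - eps, EE jump otherwise."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.uniform() > eps:
        # local move: random-walk Hastings-Metropolis step with target pi
        y = x + prop_std * rng.standard_normal()
        return y if np.log(rng.uniform()) < log_pi(y) - log_pi(x) else x
    # equi-energy jump: propose a point of the auxiliary sample from the same energy ring
    same_ring = [z for z in aux_points if ring(z) == ring(x)]
    if not same_ring:
        return x
    y = same_ring[rng.integers(len(same_ring))]
    # accept with ratio pi(y) pi_aux(x) / (pi(x) pi_aux(y))
    log_alpha = (log_pi(y) - log_pi(x)) + (log_pi_aux(x) - log_pi_aux(y))
    return y if np.log(rng.uniform()) < log_alpha else x
```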

14. Adaptive and Interacting Markov chain Monte Carlo
Examples of adaptive and interacting MCMC
The Equi-Energy sampler
Example 3: Equi-Energy (2/3)
Target density: mixture of 20 two-dimensional Gaussians, $\pi = \sum_{i=1}^{20} \mathcal{N}_2(\mu_i, \Sigma_i)$.
5 processes with target distribution π^{1/T_k} (T_K = 1).
[Figure: draws and means of the components for the target density at temperatures 1 to 5, and for a plain Hastings-Metropolis sampler.]

15. Adaptive and Interacting Markov chain Monte Carlo
Examples of adaptive and interacting MCMC
The Equi-Energy sampler
Example 3: Equi-Energy (3/3)
The general framework: let P_θ be the kernel associated to an EE-transition when the equi-energy jump uses a point sampled under the distribution θ. Under assumptions, for any θ there exists π_θ such that π_θ P_θ = π_θ.
The adaptive algorithm:
(i) Sample X_{k+1} | F_k ∼ P_{θ_k}(X_k, ·).
(ii) Update the distribution θ_{k+1} by using θ_k and (auxiliary process)_{k+1}.
Here, θ_k is an empirical distribution on X.
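
To show how (i) and (ii) fit together, here is a sketch of the full adaptive loop, reusing ee_step from the sketch above: the auxiliary chain targets π^{1/T} and its past states play the role of the growing empirical distribution θ_k used for the equi-energy jumps. All numerical choices (T, the energy rings, the toy target) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 4.0
log_pi = lambda x: np.logaddexp(-0.5 * (x - 3.0)**2, -0.5 * (x + 3.0)**2)  # toy bimodal target
log_pi_aux = lambda x: log_pi(x) / T                                       # tempered target pi^(1/T)
ring = lambda x: int(-log_pi(x) // 2.0)                                    # crude energy rings

x_aux, x, aux_points = 0.0, 0.0, [0.0]
for k in range(5_000):
    # auxiliary chain: plain random-walk HM step targeting pi^(1/T)
    y = x_aux + rng.standard_normal()
    if np.log(rng.uniform()) < log_pi_aux(y) - log_pi_aux(x_aux):
        x_aux = y
    aux_points.append(x_aux)      # (ii) theta_{k+1}: empirical distribution of the auxiliary process
    # main chain: (i) one equi-energy transition driven by theta_k
    x = ee_step(x, log_pi, log_pi_aux, aux_points, ring, eps=0.1, rng=rng)
```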
