SLIDE 1

Convergence and Efficiency of Adaptive Importance Sampling techniques with partial biasing

Gersende Fort
Institut de Mathématiques de Toulouse, CNRS, France

Joint work with B. Jourdain, T. Lelièvre and G. Stoltz

Talk based on the paper: G. Fort, B. Jourdain, T. Lelièvre, G. Stoltz, Convergence and Efficiency of Adaptive Importance Sampling techniques with partial biasing, J. Stat. Phys. (2018)

SLIDE 2

The goal

Assumption: let π dµ be a probability distribution on X ⊆ R^p, assumed to be highly metastable and (possibly) known up to a normalizing constant.

Question 1: how to design a Monte Carlo sampler for an approximation of ∫_X f π dµ?

Question 2: how to compute the free energy −ln ∫_{X_i} π dµ, for X_i ⊂ X?

In this talk: an approach by a Free Energy-based Adaptive Importance Sampling technique, which generalizes Wang-Landau, Self-Healing Umbrella Sampling and Well-tempered Metadynamics.

SLIDE 3

The intuition (1/3) - a family of auxiliary distributions

π(x) = (1/Z) exp(−V(x))

◮ The auxiliary distribution: choose a partition X_1, …, X_d of X.

θ_{*,i} := ∫_{X_i} π dµ

SLIDE 4

The intuition (1/3) - a family of auxiliary distributions

π(x) = (1/Z) exp(−V(x))

◮ The auxiliary distribution: choose a partition X_1, …, X_d of X and, for positive weights* θ = (θ_1, …, θ_d), set

π_θ(x) ∝ Σ_{i=1}^d 1_{X_i}(x) exp(−V(x) − ln θ_i)

◮ Property 1: ∀i ∈ {1, …, d}, ∫_{X_i} π_θ dµ ∝ θ_{*,i}/θ_i, where θ_{*,i} := ∫_{X_i} π dµ.

◮ Property 2: ∀i ∈ {1, …, d}, ∫_{X_i} π_{θ_*} dµ = 1/d.

* θ_i ∈ (0, 1), Σ_{i=1}^d θ_i = 1
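Properties 1 and 2 are easy to sanity-check numerically. Below is a minimal sketch on a discrete toy space, where µ is the counting measure so all integrals are sums; the setup, names and constants are mine, not from the talk.

```python
# Numerical check of Properties 1 and 2 on a discrete toy model
# (hypothetical setup): X = {0,...,99}, counting measure, d = 4 strata.
import numpy as np

rng = np.random.default_rng(0)
d, m = 4, 25                       # d strata of m points each
V = rng.normal(size=d * m)         # arbitrary potential values
pi = np.exp(-V); pi /= pi.sum()    # target pi, normalized
strata = np.repeat(np.arange(d), m)

theta_star = np.array([pi[strata == i].sum() for i in range(d)])

def pi_theta(theta):
    """pi_theta(x) proportional to exp(-V(x) - ln theta_i) on stratum i."""
    w = np.exp(-V) / theta[strata]
    return w / w.sum()

# Property 1: mass of stratum i under pi_theta is proportional to theta_star_i / theta_i
theta = rng.dirichlet(np.ones(d))
mass = np.array([pi_theta(theta)[strata == i].sum() for i in range(d)])
ratio = mass / (theta_star / theta)
assert np.allclose(ratio, ratio[0])      # the ratio is constant in i

# Property 2: under pi_{theta_*} every stratum has mass 1/d
mass_star = np.array([pi_theta(theta_star)[strata == i].sum() for i in range(d)])
assert np.allclose(mass_star, 1.0 / d)
print(mass_star)
```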

SLIDE 5

Intuition (2/3) - How to choose θ?

π_θ(x) ∝ Σ_{i=1}^d 1_{X_i}(x) exp(−V(x) − ln θ_i),  θ_{*,i} := ∫_{X_i} π dµ

◮ If θ = θ_*: efficient exploration under π_{θ_*} (each subset X_i has the same weight under π_{θ_*}), but poor ESS. The IS approximation becomes

∫_X f π dµ ≈ (d/N) Σ_{n=1}^N Σ_{i=1}^d 1_{X_i}(X_n) θ_{*,i} f(X_n)

◮ Choose ρ ∈ (0, 1) and set θ_*^ρ ∝ (θ_{*,1}^ρ, …, θ_{*,d}^ρ):

∫_X f π dµ ≈ (Σ_{i=1}^d θ_{*,i}^{1−ρ}) · (1/N) Σ_{n=1}^N Σ_{i=1}^d 1_{X_i}(X_n) θ_{*,i}^ρ f(X_n)

◮ But θ_* is unknown.
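To make the two displays above concrete, here is a sketch on the same kind of discrete toy, with θ_* assumed known, drawing i.i.d. samples from π_{θ_*^ρ} and comparing the partially biased IS estimate of ∫ f π dµ with the exact value; `is_estimate`, the test function and all constants are my own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 4, 25
V = rng.normal(size=d * m)
pi = np.exp(-V); pi /= pi.sum()
strata = np.repeat(np.arange(d), m)
theta_star = np.array([pi[strata == i].sum() for i in range(d)])
f = np.sin(np.linspace(0, 3, d * m))        # an arbitrary test function
exact = (f * pi).sum()

def is_estimate(rho, N=100_000):
    """Partially biased IS: sample from pi_{theta_*^rho}, reweight by
    (sum_j theta_*_j^{1-rho}) * theta_*_{i(x)}^rho, as in the display above."""
    q = np.exp(-V) / theta_star[strata] ** rho
    q /= q.sum()                            # proposal pi_{theta_*^rho}
    X = rng.choice(len(q), size=N, p=q)
    w = theta_star[strata[X]] ** rho * (theta_star ** (1 - rho)).sum()
    ef = w.mean() ** 2 / (w ** 2).mean()    # normalized ESS (cf. Slide 20)
    return (w * f[X]).mean(), ef

for rho in (0.0, 0.5, 1.0):
    est, ef = is_estimate(rho)
    print(f"rho={rho}: estimate={est:+.4f} (exact {exact:+.4f}), EF={ef:.3f}")
```

At ρ = 0 the weights are constant (EF = 1) but there is no flattening of the metastability; at ρ = 1 the exploration is flat across strata but the weights θ_{*,i} d degrade the ESS: this is the trade-off the partial biasing parameter ρ interpolates.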

SLIDE 6

Intuition (3/3) - Estimation of the free energy

θ_{*,i} := ∫_{X_i} π dµ ≈ θ_{n,i} := C_{n,i} / Σ_{j=1}^d C_{n,j}   ("normalized count of the visits to X_i")

◮ Exact sampling. If X_{n+1} ∼ π dµ: C_{n+1,i} = C_{n,i} + 1_{X_i}(X_{n+1}).

This yields, for all i = 1, …, d,

C_{n+1,i} = Σ_{k=1}^{n+1} 1_{X_i}(X_k),  S_{n+1} := Σ_{i=1}^d C_{n+1,i} = n + 1 = O(n),

θ_{n+1,i} = (1/(n+1)) Σ_{k=1}^{n+1} 1_{X_i}(X_k) = θ_{n,i} + (1/(n+1)) (1_{X_i}(X_{n+1}) − θ_{n,i}),

i.e. a Stochastic Approximation scheme with learning rate 1/S_{n+1} and limiting point θ_{*,i}.
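Writing out the one-step algebra behind the last display (using C_{n,i} = S_n θ_{n,i} and S_{n+1} = S_n + 1 = n + 1):

```latex
\theta_{n+1,i} \;=\; \frac{C_{n+1,i}}{S_{n+1}}
\;=\; \frac{S_n\,\theta_{n,i} + \mathbf{1}_{X_i}(X_{n+1})}{S_n + 1}
\;=\; \theta_{n,i} + \frac{1}{n+1}\,\bigl(\mathbf{1}_{X_i}(X_{n+1}) - \theta_{n,i}\bigr).
```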

SLIDE 7

Intuition (3/3) - Estimation of the free energy (continued)

◮ IS sampling. If X_{n+1} ∼ π_θ dµ: C_{n+1,i} = C_{n,i} + γ θ_i 1_{X_i}(X_{n+1}).

This yields, for all i = 1, …, d,

C_{n+1,i} = γ θ_i Σ_{k=1}^{n+1} 1_{X_i}(X_k),  S_{n+1} := Σ_{i=1}^d C_{n+1,i} = O_{w.p.1}(n),

θ_{n+1,i} = θ_{n,i} + (γ/S_{n+1}) H_i(θ_n, X_{n+1}) + O(1/n²),

i.e. a Stochastic Approximation scheme with learning rate γ/S_{n+1} and limiting point θ_{*,i}.
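The mean field H_i can be made explicit by the same one-step expansion as on the previous slide; this is my own sketch, exact as a one-step identity (the slide's O(1/n²) appears once 1/S_{n+1} is replaced by its deterministic equivalent), and written up to the paper's exact normalization of H:

```latex
\theta_{n+1,i} \;=\; \theta_{n,i} + \frac{\gamma}{S_{n+1}}
\Bigl(\underbrace{\theta_i\,\mathbf{1}_{X_i}(X_{n+1})
  - \theta_{n,i}\sum_{j=1}^d \theta_j\,\mathbf{1}_{X_j}(X_{n+1})}_{=:\,H_i(\theta_n,\,X_{n+1})}\Bigr).
```

Since E_{π_θ}[θ_j 1_{X_j}] ∝ θ_{*,j} (Property 1), the mean field vanishes at θ_n ∝ θ_*, which is why θ_* is the limiting point.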

SLIDE 8

Intuition (3/3) - Estimation of the free energy (continued)

◮ IS sampling with a leverage effect. If X_{n+1} ∼ π_θ dµ:

C_{n+1,i} = C_{n,i} + γ (S_n / g(S_n)) θ_i 1_{X_i}(X_{n+1}),  with lim_{s→+∞} g(s) = +∞ and lim inf_{s→+∞} s/g(s) > 0.

This yields S_{n+1} ↑ +∞,

(S_{n+1} − S_n)/S_n = (γ/g(S_n)) Σ_{i=1}^d θ_i 1_{X_i}(X_{n+1}),

θ_{n+1,i} = θ_{n,i} + (γ/g(S_n)) H_i(θ_n, X_{n+1}) + O(γ²/g²(S_n)),

i.e. a Stochastic Approximation scheme with learning rate γ/g(S_n) and limiting point θ_{*,i}.

SLIDE 9

Intuition (3/3) - Estimation of the free energy (continued)

◮ If g(s) = (ln(1 + s))^{α/(1−α)}, the learning rate is O(n^{−α}).

◮ Key property: if X_{n+1} ∈ X_i, then for any j ≠ i, π_{θ_{n+1}}(X_j) > π_{θ_n}(X_j): the probability of stratum #j increases.
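A heuristic for the O(n^{−α}) rate (my own sketch: treat S_n as the solution of the ODE matching the mean drift from the previous slide, with some constant c > 0):

```latex
\frac{d}{dn}\ln S \approx \frac{c\,\gamma}{g(S)}
 = \frac{c\,\gamma}{(\ln(1+S))^{\alpha/(1-\alpha)}}
\;\Longrightarrow\; (\ln S_n)^{1/(1-\alpha)} \approx \frac{c\,\gamma\,n}{1-\alpha}
\;\Longrightarrow\; g(S_n) = (\ln S_n)^{\alpha/(1-\alpha)} \propto n^{\alpha},
```

so the learning rate γ/g(S_n) is of order n^{−α}; the constant c is identified at the limit in Result 1 of Slide 13.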

SLIDE 10

The algorithm: Adaptive IS with partial biasing

◮ Fix ρ ∈ (0, 1) and α ∈ (1/2, 1). Set g(s) := (ln(1 + s))^{α/(1−α)}.

◮ Initialisation: X_0 ∈ X, a positive weight vector θ_0.

◮ Repeat, for n = 0, …, N − 1:
- sample X_{n+1} ∼ P_{θ_n^ρ}(X_n, ·), a Markov kernel invariant w.r.t. π_{θ_n^ρ} dµ;
- compute C_{n+1,i} = C_{n,i} + γ (S_n/g(S_n)) θ_{n,i}^ρ 1_{X_i}(X_{n+1}),  S_{n+1} = Σ_{i=1}^d C_{n+1,i},  θ_{n+1,i} = C_{n+1,i}/S_{n+1}.

◮ Return the sequence (θ_n)_n of estimates of θ_*, and the IS estimator (a code sketch follows below)

∫ f π dµ ≈ (1/N) Σ_{n=1}^N (Σ_{i=1}^d θ_{n−1,i}^{1−ρ}) (Σ_{i=1}^d 1_{X_i}(X_n) θ_{n−1,i}^ρ) f(X_n)
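A self-contained sketch of the full loop, entirely my own toy implementation: a 1D double-well potential on a grid with counting measure, and a symmetric random-walk Hastings-Metropolis kernel as in the assumptions of Slide 12. All names and constants are illustrative, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setting (mine): 1D grid, double-well potential, d strata along x.
xs = np.linspace(-2.5, 2.5, 500)
V = 4.0 * (xs**2 - 1.0) ** 2                 # double-well potential
d = 5
strata = np.minimum((d * (xs - xs[0]) / (xs[-1] - xs[0])).astype(int), d - 1)

alpha, rho, gamma = 0.8, 0.5, 1.0            # design parameters of Slide 10
g = lambda s: np.log1p(s) ** (alpha / (1.0 - alpha))

def kernel_step(x, log_theta_rho):
    """One Hastings-Metropolis step, symmetric random-walk proposal,
    invariant w.r.t. pi_{theta^rho} prop. to exp(-V(x) - rho*ln theta_{i(x)})."""
    y = x + int(rng.integers(-20, 21))
    if y < 0 or y >= len(xs):
        return x                             # reject out-of-range proposals
    log_ratio = (-V[y] - log_theta_rho[strata[y]]) - (-V[x] - log_theta_rho[strata[x]])
    return y if np.log(rng.random()) < log_ratio else x

N = 200_000
x = 150                                      # X_0: start near the left minimum
C = np.full(d, 1.0)                          # positive initial weights theta_0
S = C.sum()
theta = C / S
is_sum = 0.0                                 # accumulates the IS estimator
f = xs                                       # test function f(x) = x

for n in range(N):
    tr = theta ** rho                        # theta_n^rho
    x = kernel_step(x, np.log(tr))           # X_{n+1} ~ P_{theta_n^rho}(X_n, .)
    # partially biased IS weight of X_{n+1} (Slide 10's estimator)
    is_sum += (theta ** (1.0 - rho)).sum() * tr[strata[x]] * f[x]
    # leverage update of the counts, then renormalize
    C[strata[x]] += gamma * (S / g(S)) * tr[strata[x]]
    S = C.sum()
    theta = C / S

pi = np.exp(-V); pi /= pi.sum()
theta_star = np.bincount(strata, weights=pi)
print("theta_N    :", np.round(theta, 3))
print("theta_star :", np.round(theta_star, 3))
print("IS estimate:", is_sum / N, "  exact:", float((pi * xs).sum()))
```

Note that the weight used for X_{n+1} is computed from θ_n, i.e. the value before the count update, matching the θ_{n−1,i} indexing of the returned estimator.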

SLIDE 11

Convergence results

1. The limiting behavior of the estimates (θ_n)_n
2. The limiting distribution of X_n
3. The limiting behavior of the IS estimator

SLIDE 12

Assumptions

1. On the target density and the strata X_i: sup_X π < ∞ and min_{1≤i≤d} θ_{*,i} > 0.
2. On the kernels P_θ: Hastings-Metropolis kernel, with a symmetric proposal q(x, y) dµ(y) such that inf_{X²} q > 0; for any compact subset K, there exist C and λ ∈ (0, 1) s.t. sup_{θ∈K} ‖P_θ^n(x, ·) − π_θ‖_TV ≤ C λ^n.
3. ρ ∈ (0, 1).
4. g(s) = (ln(1 + s))^{α/(1−α)} with α ∈ (1/2, 1).

SLIDE 13

Convergence results: on the sequence θ_n

◮ Recall: θ_{n+1} = θ_n + γ_{n+1} H(X_{n+1}, θ_n) + γ_{n+1}² Λ_{n+1}, with γ_{n+1} := γ/g(S_n), where
- (γ_n)_n is a positive random learning rate,
- sup_n Λ_{n+1} is bounded a.s.,
- ∫ H(·, θ) π_{θ^ρ} dµ = 0 iff θ = θ_*.

◮ Result 1: lim_n γ_n n^α = (1 − α)^α γ^{1−α} (Σ_{j=1}^d θ_{*,j}^{1−ρ})^α a.s.

◮ Result 2: lim_n θ_n = θ_* a.s.

SLIDE 14

Convergence results - on the samples X_n

◮ Recall: X_{n+1} ∼ P_{θ_n^ρ}(X_n, ·), with π_θ P_θ = π_θ.

◮ Result 1: for any bounded function f, lim_n E[f(X_n)] = ∫ f π_{θ_*^ρ} dµ.

◮ Result 2: for any bounded function f, lim_N (1/N) Σ_{n=1}^N f(X_n) = ∫ f π_{θ_*^ρ} dµ a.s.

SLIDE 15

Convergence results - on the IS estimator

◮ Result 1: for any bounded function f,

lim_N E[ (1/N) Σ_{n=1}^N f(X_n) (Σ_{j=1}^d θ_{n−1,j}^ρ 1_{X_j}(X_n)) (Σ_{j=1}^d θ_{n−1,j}^{1−ρ}) ] = ∫ f π dµ.

◮ Result 2: for any bounded function f, a.s.,

lim_N (1/N) Σ_{n=1}^N f(X_n) (Σ_{j=1}^d θ_{n−1,j}^ρ 1_{X_j}(X_n)) (Σ_{j=1}^d θ_{n−1,j}^{1−ρ}) = ∫ f π dµ.

SLIDE 16

Is it new?

◮ Theoretical contribution:
- Self-Healing Umbrella Sampling: ρ = 1 (no biasing intensity), g(s) = s (also covered by the theory; not detailed here).
- Well-tempered Metadynamics: ρ ∈ (0, 1) (biasing intensity), g(s) = s^{1−ρ} (also covered by the theory; not detailed here).

◮ Methodological contribution: the introduction of a function g(s) in the updating scheme of the estimator θ_n, allowing a random learning rate γ_n ∼ O_{w.p.1}(n^{−α}) for α ∈ (1/2, 1).

SLIDE 17

Is there a gain in such a self-tuned and partially biased algorithm?

[Figures: the toy example at β = 1 and β = 5]

Make the metastability larger by increasing β.

SLIDE 18

Case ρ ∈ [0, 1) (ρ = a on the plots) and α ∈ (1/2, 1) ⇒ γ_n = O_{w.p.1}(n^{−α})

Figure: exit time as a function of β, for ρ = a ∈ {0.2, 0.4, 0.6, 0.8, 1}. Left: α = 0.8. Right: α = 0.6.

Start from the left mode and compute the exit time T, i.e. the time to reach X_{n,1} > 1.
- T increases when β increases.
- For fixed β and ρ: T decreases when α decreases; a slowly decreasing learning rate is better.
- For fixed β and α: T decreases when ρ increases; a small (or no) bias is better.
- Linear fit, with a slope independent of ρ: ln T = c_ρ + (1 − α)^{−1} ln β.

SLIDE 19

Case ρ ∈ (0, 1) (ρ = a on the plots) and α = 1 ⇒ γ_n = O_{w.p.1}(n^{−1}) (the Well-tempered Metadynamics case)

Figure: Left: exit times for many values of ρ. Right: associated slopes, fitted by x ↦ 2.43(1 − x).

Exit time T:
- For fixed β: T decreases when ρ increases; a small bias is better.
- Linear fit: ln T = c + 2.43 (1 − ρ) β.

SLIDE 20

Normalized Effective Sample Size (EF)

Case γ_n = O(1/n^α) for α ∈ (1/2, 1), ρ ∈ [0, 1).

Figure: efficiency factors ρ ↦ EF(ρ) for various values of β (β ∈ {1, 2, 3, 5, 10}).

EF = (N^{−1} Σ_{n=1}^N w(X_n))² / (N^{−1} Σ_{n=1}^N w²(X_n)) ∈ [0, 1]

By definition, EF = 1 when the weights are constant. For fixed β, EF increases when ρ decreases; a strong bias is better.
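The EF above is straightforward to compute from stored IS weights; a small helper (mine, not from the talk), with the two boundary behaviors it should exhibit:

```python
import numpy as np

def efficiency_factor(w: np.ndarray) -> float:
    """Normalized effective sample size: (mean w)^2 / mean(w^2), in [0, 1]."""
    return float(np.mean(w) ** 2 / np.mean(w ** 2))

print(efficiency_factor(np.ones(1000)))     # constant weights: EF = 1.0
print(efficiency_factor(                    # heterogeneous weights: EF < 1
    np.random.default_rng(3).exponential(size=1000)))
```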

SLIDE 21

Conclusion

A new algorithm which:
- estimates the free energy of π by a Stochastic Approximation algorithm, where the stepsize sequence {γ_n, n ≥ 0} is tuned on the fly;
- provides an approximation of π by a set of weighted points with a controlled discrepancy of the weights;
- requires two design parameters (α, ρ) to be fixed by the user:
  · transient phase: ρ close to 1 and α close to 1/2;
  · at convergence: ρ close to 0 and α close to 1.

In the transient phase: far more efficient than Well-Tempered Metadynamics, SHUS and WL.