Properties of the Stochastic Approximation Schedule in the Wang-Landau Algorithm

The algorithm | Unsettled issues | Flat Histogram in finite time | Parallel Interacting Chains
Properties of the Stochastic Approximation Schedule in the Wang-Landau Algorithm
Pierre E. Jacob, CEREMADE, Université Paris Dauphine, funded by AXA research
MCQMC, February 2012
Joint work with Luke Bornn (UBC), Arnaud Doucet (Oxford), Pierre Del Moral (INRIA & Université de Bordeaux), Robin J. Ryder (Dauphine)
P. E. Jacob, Wang-Landau, 1/25

Outline
1. The algorithm
2. Unsettled issues
3. Flat Histogram in finite time
4. Parallel Interacting Chains

Motivation
[Plot: density (0.0 to 0.5) against X (−4 to 4).]
Figure: A normal distribution biased to get desired frequencies in specific parts of the space. Here we use $\phi = \{75\%, 25\%\}$ on $\{\,]-\infty, 0],\ [0, +\infty[\,\}$.

Motivation
[Plot: "Histogram of the binned coordinate", Density (0.0 to 0.4) against binned coordinate (−4 to 4).]
Figure: Normal biased to get the same frequency in each of 5 bins.

Setting
Partition the state space:
$$\mathcal{X} = \bigcup_{i=1}^{d} \mathcal{X}_i$$
Desired frequencies:
$$\phi = (\phi_1, \ldots, \phi_d) \quad \text{such that} \quad \sum_i \phi_i = 1$$
Penalized distribution:
$$\forall i \in \{1, \ldots, d\} \quad \forall x \in \mathcal{X}_i \qquad \pi_\theta(x) \propto \frac{\pi(x)}{\theta(i)}$$
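
The penalized distribution above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a one-dimensional state space, bins defined by interior cut points, and the hypothetical helper names `bin_index`, `log_penalized_density` and `edges`, none of which come from the slides.

```python
import math


def bin_index(x, edges):
    """Index i of the bin X_i containing x; `edges` holds the interior cut points (assumed 1-D)."""
    return sum(x > e for e in edges)


def log_penalized_density(x, log_pi, log_theta, edges):
    """Unnormalized log pi_theta(x) = log pi(x) - log theta(i) for x in X_i."""
    return log_pi(x) - log_theta[bin_index(x, edges)]


def log_standard_normal(x):
    """Standard normal log-density up to an additive constant (illustrative target)."""
    return -0.5 * x * x
```

A larger penalty `theta(i)` lowers the density on bin `i`, which is how the algorithm steers the chain away from bins it has visited too often.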

First algorithm
Algorithm 1: Wang-Landau with deterministic schedule $(\gamma_t)$
Init: $\forall i \in \{1, \ldots, d\}$ set $\theta_0(i) \leftarrow 1/d$.
Init: $X_0 \in \mathcal{X}$.
for $t = 1$ to $T$ do
    Sample $X_t$ from $K_{\theta_{t-1}}(X_{t-1}, \cdot)$, MH kernel targeting $\pi_{\theta_{t-1}}$.
    Update the penalties: $\log \theta_t(i) \leftarrow \log \theta_{t-1}(i) + \gamma_t \left( \mathbb{1}_{\mathcal{X}_i}(X_t) - \phi_i \right)$
end for
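
A minimal Python sketch of Algorithm 1 follows. It is not the authors' implementation. It assumes a one-dimensional state space, a Gaussian random-walk proposal (so the proposal ratio cancels), and bins given by interior cut points. The function name `wang_landau` and the parameters `edges`, `step` and `seed` are hypothetical.

```python
import math
import random


def wang_landau(log_pi, edges, phi, gamma, T, x0=0.0, step=1.0, seed=0):
    """Wang-Landau with a deterministic schedule gamma(t) on a 1-D state space.

    log_pi: unnormalized log target; edges: interior cut points of the bins;
    phi: desired frequencies; gamma: callable t -> gamma_t.
    Returns the final log-penalties and the bin visit counts.
    """
    rng = random.Random(seed)
    d = len(phi)
    log_theta = [math.log(1.0 / d)] * d  # theta_0(i) = 1/d
    counts = [0] * d

    def bin_of(x):
        return sum(x > e for e in edges)

    x = x0
    for t in range(1, T + 1):
        # Random-walk MH step targeting pi_theta_{t-1} (symmetric proposal, an assumption).
        y = x + rng.gauss(0.0, step)
        log_ratio = (log_pi(y) - log_theta[bin_of(y)]) - (log_pi(x) - log_theta[bin_of(x)])
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = y
        j = bin_of(x)
        counts[j] += 1
        # Penalty update: log theta_t(i) += gamma_t * (1{X_t in X_i} - phi_i).
        g = gamma(t)
        for i in range(d):
            log_theta[i] += g * ((1.0 if i == j else 0.0) - phi[i])
    return log_theta, counts
```

For instance, `wang_landau(lambda x: -0.5 * x * x, [0.0], [0.75, 0.25], lambda t: 1.0 / t, 10000)` would mimic the two-bin normal example from the motivation slide, with a $1/t$ schedule chosen purely for illustration.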

Flat Histogram
Issues with the first version: the choice of $\gamma$ has a huge impact on the results.
Flat Histogram: define the counters
$$\nu_t(i) := \sum_{n=1}^{t} \mathbb{1}_{\mathcal{X}_i}(X_n)$$
Flat Histogram (FH) is reached when
$$\max_{i \in \{1, \ldots, d\}} \left| \frac{\nu_t(i)}{t} - \phi_i \right| < c$$
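
The FH criterion translates directly into code. The sketch below assumes the counters are kept in a plain list; the function name `flat_histogram_reached` is hypothetical, and treating an empty histogram as "not flat" is a convention chosen here, not something stated on the slides.

```python
def flat_histogram_reached(counts, phi, c):
    """Return True when max_i |nu_t(i)/t - phi_i| < c, with t the total of the counters."""
    t = sum(counts)
    if t == 0:
        return False  # convention assumed here: no samples means no flat histogram
    return max(abs(n / t - p) for n, p in zip(counts, phi)) < c
```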

Flat Histogram
Idea: instead of decreasing $\gamma_t$ at each time step $t$, decrease it only when the Flat Histogram criterion is reached.
In practice: denote by $\kappa_t$ the number of FH criteria reached up to time $t$. Use $\gamma_{\kappa_t}$ instead of $\gamma_t$ at time $t$. If FH is reached at time $t$, reset $\nu_t(i)$ to 0 for all $i$.

Wang-Landau with Flat Histogram
Algorithm 2: Wang-Landau with stochastic schedule $(\gamma_{\kappa_t})$
Init as before: $X_0$, $\theta_0(i)$.
Init: $\kappa_0 \leftarrow 0$.
for $t = 1$ to $T$ do
    Sample $X_t$ from $K_{\theta_{t-1}}(X_{t-1}, \cdot)$, MH kernel targeting $\pi_{\theta_{t-1}}$.
    If (FH) then $\kappa_t \leftarrow \kappa_{t-1} + 1$, otherwise $\kappa_t \leftarrow \kappa_{t-1}$.
    Update the penalties: $\log \theta_t(i) \leftarrow \log \theta_{t-1}(i) + \gamma_{\kappa_t} \left( \mathbb{1}_{\mathcal{X}_i}(X_t) - \phi_i \right)$
end for
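
Algorithm 2 could be sketched as below, again as an illustration rather than the authors' code. The assumptions are a one-dimensional state space, a Gaussian random-walk proposal, bins given by interior cut points, and counters reset to zero whenever FH is reached. The name `wang_landau_fh` and the example schedule `gamma(k) = 1 / (k + 1)` are hypothetical; the slides do not specify the sequence $(\gamma_k)$.

```python
import math
import random


def wang_landau_fh(log_pi, edges, phi, gamma, c, T, x0=0.0, step=1.0, seed=0):
    """Wang-Landau with the stochastic schedule gamma(kappa_t).

    gamma: callable k -> gamma_k; c: flat-histogram threshold.
    Returns the final log-penalties and kappa_T, the number of FH criteria reached.
    """
    rng = random.Random(seed)
    d = len(phi)
    log_theta = [math.log(1.0 / d)] * d
    nu = [0] * d  # counters nu_t(i) since the last flat histogram
    kappa = 0

    def bin_of(x):
        return sum(x > e for e in edges)

    x = x0
    for _ in range(T):
        # Random-walk MH step targeting pi_theta_{t-1} (symmetric proposal, an assumption).
        y = x + rng.gauss(0.0, step)
        log_ratio = (log_pi(y) - log_theta[bin_of(y)]) - (log_pi(x) - log_theta[bin_of(x)])
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = y
        j = bin_of(x)
        nu[j] += 1
        # Flat-histogram check: if reached, increment kappa and reset the counters.
        n = sum(nu)
        if max(abs(nu[i] / n - phi[i]) for i in range(d)) < c:
            kappa += 1
            nu = [0] * d
        # Penalty update with the current gamma_{kappa_t}.
        g = gamma(kappa)
        for i in range(d):
            log_theta[i] += g * ((1.0 if i == j else 0.0) - phi[i])
    return log_theta, kappa
```

A call such as `wang_landau_fh(lambda x: -0.5 * x * x, [0.0], [0.75, 0.25], lambda k: 1.0 / (k + 1), 0.05, 10000)` returns the log-penalties together with the number of flat histograms reached.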

Understanding the algorithm
Pros and cons: it works much better than the first version; however, it is a bit tricky to analyse.
Putting a label on the algorithm: it is an adaptive MCMC algorithm, i.e. the kernel changes at every time step. Here the target distribution changes at every time step but the proposal stays the same. Between two FH, $\gamma_{\kappa_t}$ stays constant, so there is no diminishing adaptation. Hence the FH version is a bit more complicated than the deterministic version.

Understanding the algorithm
A reasonable first step: proof that FH is met in finite time (under strong assumptions).
Note: it means the desired frequencies are reached when $\gamma$ stays constant.
⇒ It might be a hint that the diminishing $\gamma$ does not play a big part in the algorithm.

FH is met in finite time
To be sure that eventually, for any $c > 0$:
$$\max_{i \in \{1, \ldots, d\}} \left| \frac{\nu_t(i)}{t} - \phi_i \right| < c$$
we want to prove:
$$\forall i \in \{1, \ldots, d\} \qquad \frac{\nu_t(i)}{t} \xrightarrow[t \to \infty]{P} \phi_i$$

Various updates
Right update:
$$\log \theta_t(i) \leftarrow \log \theta_{t-1}(i) + \gamma \left( \mathbb{1}_{\mathcal{X}_i}(X_t) - \phi_i \right) \qquad (1)$$
Wrong update:
$$\theta_t(i) \leftarrow \theta_{t-1}(i) \left[ 1 + \gamma \left( \mathbb{1}_{\mathcal{X}_i}(X_t) - \phi_i \right) \right]$$
$$\Leftrightarrow \quad \log \theta_t(i) \leftarrow \log \theta_{t-1}(i) + \log \left[ 1 + \gamma \left( \mathbb{1}_{\mathcal{X}_i}(X_t) - \phi_i \right) \right] \qquad (2)$$
(actually not wrong if $\forall i \ \phi_i = \frac{1}{d}$)
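
To make the difference between the two updates concrete, here is a small Python sketch of both, acting on a list of log-penalties. The function names are hypothetical, and update (2) is only defined here when $\gamma \phi_i < 1$ for all $i$. Since $\log(1 + u) \approx u$ for small $u$, the two updates agree to first order in $\gamma$.

```python
import math


def update_log_additive(log_theta, j, phi, gamma):
    """Update (1): log theta(i) += gamma * (1{i == j} - phi_i), where j is the bin of X_t."""
    return [lt + gamma * ((1.0 if i == j else 0.0) - phi[i])
            for i, lt in enumerate(log_theta)]


def update_multiplicative(log_theta, j, phi, gamma):
    """Update (2): theta(i) *= 1 + gamma * (1{i == j} - phi_i), written on the log scale."""
    return [lt + math.log(1.0 + gamma * ((1.0 if i == j else 0.0) - phi[i]))
            for i, lt in enumerate(log_theta)]
```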

Assumptions
From now on, there are only two bins: $d = 2$. Additionally:
Assumption: The bins are not empty with respect to $\mu$ and $\pi$:
$$\forall i \in \{1, 2\} \qquad \mu(\mathcal{X}_i) > 0 \ \text{ and } \ \pi(\mathcal{X}_i) > 0$$
Assumption: The state space $\mathcal{X}$ is compact.

Assumptions
Assumption: The proposal distribution $Q(x, y)$ is such that:
$$\exists q_{\min} > 0 \quad \forall x \in \mathcal{X} \quad \forall y \in \mathcal{X} \qquad Q(x, y) > q_{\min}$$
Assumption: The MH acceptance ratio is bounded from both sides:
$$\exists m > 0 \quad \exists M > 0 \quad \forall x \in \mathcal{X} \quad \forall y \in \mathcal{X} \qquad m < \frac{\pi(y)}{\pi(x)} \frac{Q(y, x)}{Q(x, y)} < M$$

Theorem
Consider the sequence of penalties $\theta_t$ introduced in the WL algorithm. We define:
$$Z_t = \log \frac{\theta_t(1)}{\theta_t(2)} = \log \theta_t(1) - \log \theta_t(2)$$
Then:
$$\frac{Z_t}{t} \xrightarrow[t \to \infty]{L^1} 0$$
and consequently, with update (1), (FH) is reached in finite time for any precision threshold $c$, whereas this is not guaranteed for update (2).

Consequence
Recall
$$\nu_t(i) := \sum_{n=1}^{t} \mathbb{1}_{\mathcal{X}_i}(X_n)$$
Using update (1), and starting from $Z_0 = 0$:
$$\begin{aligned}
Z_t &= \log \theta_t(1) - \log \theta_t(2) \\
&= \left( \nu_t(1) \gamma (1 - \phi_1) - (t - \nu_t(1)) \gamma \phi_1 \right) - \left( \nu_t(2) \gamma (1 - \phi_2) - (t - \nu_t(2)) \gamma \phi_2 \right) \\
&= \nu_t(1) (2\gamma) - t (2 \gamma \phi_1)
\end{aligned}$$
using $\nu_t(1) + \nu_t(2) = t$ and $\phi_1 + \phi_2 = 1$. Hence if
$$\frac{Z_t}{t} \xrightarrow[t \to \infty]{L^1} 0 \quad \text{then} \quad \frac{\nu_t(1)}{t} \xrightarrow[t \to \infty]{L^1} \phi_1$$
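
As an aside that is not on the slides, the same bookkeeping can be repeated for update (2) with $d = 2$, $Z_0 = 0$ and $\gamma < 1$, to see why the desired frequencies are not guaranteed in that case. The symbols $a$ and $b$ are introduced here for illustration only, and the last line assumes, hypothetically, that $Z_t / t \to 0$ also holds under update (2).

```latex
\begin{align*}
a &:= \log(1 + \gamma \phi_2) - \log(1 - \gamma \phi_2), \qquad
b := \log(1 + \gamma \phi_1) - \log(1 - \gamma \phi_1), \\
Z_t &= \nu_t(1)\, a - (t - \nu_t(1))\, b = \nu_t(1)\,(a + b) - t\, b, \\
\frac{Z_t}{t} \to 0 \;&\Longrightarrow\; \frac{\nu_t(1)}{t} \to \frac{b}{a + b}.
\end{align*}
```

Here a visit to bin 1 changes $Z$ by $a$ (using $1 - \phi_1 = \phi_2$) and a visit to bin 2 changes it by $-b$. The ratio $b / (a + b)$ equals $\phi_1$ when $\phi_1 = \phi_2 = 1/2$, which matches the remark that update (2) is not wrong for $\phi_i = 1/d$. For $\phi_1 \neq \phi_2$ it agrees with $\phi_1$ only to first order in $\gamma$.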

Proof
[Plot: trajectory of $Z_t$ against time (5 to 35), with the points $Z_s$, $\tilde{Z}_{s+T}$ and $Z_{s+T}$ marked.]
Figure: We prove that $Z_t$ returns below a given horizontal bar whenever it goes above it, and it does so in finite time. It then implies $Z_t / t \to 0$.

Parallel Interacting Chains
A parallel version of the algorithm runs $N$ chains in parallel (see e.g. F. Liang, JSP 2006).
Target the same distribution: each new value $X_t^{(k)}$ is drawn from a MH kernel $K_{\theta_{t-1}}(X_{t-1}^{(k)}, \cdot)$ using the same penalties $(\theta_t)$.
Interaction between chains: to update $\theta_t$ use an average
$$\frac{1}{N} \sum_{k=1}^{N} \mathbb{1}_{\mathcal{X}_i}(X_t^{(k)})$$
instead of $\mathbb{1}_{\mathcal{X}_i}(X_t)$.
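
The interaction between chains can be sketched as follows, with a constant $\gamma$ for simplicity. This is an illustrative reading of the slide, not the authors' code. It assumes a one-dimensional state space, a Gaussian random-walk proposal, and chains simulated sequentially inside a loop; the name `parallel_wang_landau` and its parameters are hypothetical.

```python
import math
import random


def parallel_wang_landau(log_pi, edges, phi, gamma, T, N, step=1.0, seed=0):
    """N chains sharing one set of penalties, updated with the average bin indicator."""
    rng = random.Random(seed)
    d = len(phi)
    log_theta = [math.log(1.0 / d)] * d
    xs = [0.0] * N  # current states X_t^(k), all started at 0 (an arbitrary choice)

    def bin_of(x):
        return sum(x > e for e in edges)

    for _ in range(T):
        hits = [0] * d
        for k in range(N):
            # Each chain moves with the same kernel K_theta_{t-1}.
            x = xs[k]
            y = x + rng.gauss(0.0, step)
            log_ratio = (log_pi(y) - log_theta[bin_of(y)]) - (log_pi(x) - log_theta[bin_of(x)])
            if rng.random() < math.exp(min(0.0, log_ratio)):
                xs[k] = y
            hits[bin_of(xs[k])] += 1
        # Shared update: the indicator is replaced by its average over the N chains.
        for i in range(d):
            log_theta[i] += gamma * (hits[i] / N - phi[i])
    return log_theta, xs
```

With `N = 1` this reduces to the single-chain update with constant $\gamma$.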

Parallel Interacting Chains
Reaching Flat Histogram
[Plot: number of FH criteria reached (#FH, 10 to 40) against iterations (2000 to 10000), for N = 1, N = 10 and N = 100.]

Parallel Interacting Chains
Stabilization of the log penalties
[Plot: value (−10 to 10) against iterations (2000 to 10000).]
Figure: $\log \theta_t$ against $t$, for $N = 1$

Parallel Interacting Chains
Stabilization of the log penalties
[Plot: value (−10 to 10) against iterations (2000 to 10000).]
Figure: $\log \theta_t$ against $t$, for $N = 10$
