

  1. How to update the different parameters m, σ, C?
     1. Adapting the mean m
     2. Adapting the step-size σ
     3. Adapting the covariance matrix C

  2. Why Step-size Adaptation? Assume a (1+1)-ES algorithm with fixed step-size σ (and C = I_d) optimizing the function f(x) = Σ_{i=1}^n x_i² = ‖x‖².
     Initialize m, σ
     While (stopping criterion not met)
       sample new solution: x ← m + σ𝒩(0, I_d)
       if f(x) ≤ f(m): m ← x
     What will happen if you look at the convergence of f(m)?
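A minimal Python sketch of this fixed-step-size (1+1)-ES on the sphere function (the dimension, σ value, and iteration count are illustrative choices, not taken from the slides):

```python
import numpy as np

def sphere(x):
    """f(x) = sum_i x_i^2 = ||x||^2, the function from the slide."""
    return np.sum(x ** 2)

def one_plus_one_es_fixed_sigma(dim=10, sigma=1e-3, iterations=2000, seed=0):
    """(1+1)-ES with a constant step-size sigma (C = I_d) on the sphere."""
    rng = np.random.default_rng(seed)
    m = rng.standard_normal(dim)                  # initialize the mean m
    history = []
    for _ in range(iterations):
        x = m + sigma * rng.standard_normal(dim)  # sample new solution
        if sphere(x) <= sphere(m):                # keep x if at least as good
            m = x
        history.append(sphere(m))
    return m, history
```

On a log scale, `history` falls quickly at first and then stalls once ‖m‖ is of the order of σ: without adaptation the algorithm cannot converge linearly, which is what the next two slides illustrate.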

  3. Why Step-size Adaptation? red curve: (1+1)-ES with optimal step-size (see later); green curve: (1+1)-ES with constant step-size (σ = 10⁻³)

  4. Why Step-size Adaptation? We need step-size adaptation to approach the optimum fast (i.e., to converge linearly). red curve: (1+1)-ES with optimal step-size (see later); green curve: (1+1)-ES with constant step-size (σ = 10⁻³)

  5. Methods for Step-size Adaptation
     - 1/5th success rule, typically applied with "+" selection [Rechenberg, 73][Schumer and Steiglitz, 68][Devroye, 72]
     - σ-self-adaptation, applied with "," selection [Schwefel, 81]: a random variation is applied to the step-size, and the better step-size, according to the objective function value, is selected
     - path-length control, or cumulative step-size adaptation (CSA), applied with "," selection [Ostermeier et al., 94][Hansen and Ostermeier, 2001]
     - two-point adaptation (TPA), applied with "," selection [Hansen, 2008]: test two solutions in the direction of the mean shift, and increase or decrease the step-size accordingly

  6. Step-size control: 1/5th Success Rule

  7. Step-size control: 1/5th Success Rule

  8. Step-size control: 1/5th Success Rule probability of success per iteration: p_s = #{candidate solutions better than m, i.e., with f(x) ≤ f(m)} / #{candidate solutions}
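A hedged Python sketch of one common form of the 1/5th success rule (the damping d and the exponential update below are standard illustrative choices, not necessarily the exact variant on these slides):

```python
import numpy as np

def one_plus_one_es_one_fifth(f, m, sigma, iterations=2000, seed=0):
    """(1+1)-ES with a 1/5th success rule on the step-size."""
    rng = np.random.default_rng(seed)
    m = np.asarray(m, dtype=float)
    d = 1.0 + len(m) / 2.0      # damping; a standard heuristic choice
    p_target = 0.2              # target success probability: the "1/5th"
    for _ in range(iterations):
        x = m + sigma * rng.standard_normal(len(m))
        success = f(x) <= f(m)
        if success:
            m = x
        # sigma is unchanged in expectation exactly when the success
        # probability equals p_target (0.2 * 1 + 0.8 * (-0.25) = 0).
        sigma *= np.exp((float(success) - p_target) / (d * (1.0 - p_target)))
    return m, sigma
```

For example, `one_plus_one_es_one_fifth(lambda x: np.sum(x**2), np.ones(10), 1.0)` converges linearly on the sphere, unlike the fixed-σ version sketched above.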

  9. (1+1)-ES with One-fifth Success Rule - Convergence

  10. Path Length Control - Cumulative Step-size Adaptation (CSA): the step-size adaptation used in the (μ/μ_w, λ)-ES algorithm framework (in CMA-ES in particular). Main Idea: compare the length of the evolution path (the cumulated sum of successive mean shifts) with its expected length under random selection; if the path is longer than expected, consecutive steps are correlated and the step-size should increase; if it is shorter, steps cancel each other out and the step-size should decrease.

  11. CSA-ES The Equations
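The equations on this slide did not survive extraction. As a hedged reconstruction, the standard CSA update from Hansen and Ostermeier (2001) reads as follows (c_σ is the cumulation rate, d_σ the damping, μ_eff the variance-effective selection mass):

```latex
% Evolution path: cumulate the normalized mean shifts
p_\sigma \leftarrow (1 - c_\sigma)\, p_\sigma
  + \sqrt{c_\sigma (2 - c_\sigma)\, \mu_{\mathrm{eff}}}\;
    C^{-1/2}\, \frac{m_{t+1} - m_t}{\sigma_t}

% Step-size: compare the path length to its expected length
% under random selection
\sigma_{t+1} = \sigma_t \exp\!\left( \frac{c_\sigma}{d_\sigma}
  \left( \frac{\lVert p_\sigma \rVert}
              {\mathbb{E}\lVert \mathcal{N}(0, I_d) \rVert} - 1 \right) \right)
```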

  12. Convergence of the (μ/μ_w, λ)-CSA-ES: 2×11 runs

  13. Convergence of the (μ/μ_w, λ)-CSA-ES. Note: the initial step-size is taken too small (σ₀ = 10⁻²) to illustrate the step-size adaptation

  14. Convergence of the (μ/μ_w, λ)-CSA-ES

  15. Optimal Step-size - Lower-bound for Convergence Rates In the previous slides we have displayed some runs with "optimal" step-size. The optimal step-size is proportional to the distance to the optimum, σ_t = σ‖x − x*‖, where x* is the optimum of the optimized function (with σ properly chosen). The associated algorithm is not a real algorithm (it needs to know the distance to the optimum), but it gives bounds on convergence rates and allows one to compute many important quantities. The goal for a step-size adaptive algorithm is to achieve convergence rates close to those obtained with the optimal step-size.

  16. We will formalize this in the context of the (1+1)-ES. Similar results can be obtained for other algorithm frameworks.

  17. Optimal Step-size - Bound on Convergence Rate - (1+1)-ES Consider a (1+1)-ES algorithm with any step-size adaptation mechanism:
     X_{t+1} = X_t + σ_t 𝒩_{t+1} if f(X_t + σ_t 𝒩_{t+1}) ≤ f(X_t), X_{t+1} = X_t otherwise,
     with {𝒩_t, t ≥ 1} i.i.d. ∼ 𝒩(0, I_d).
     Equivalent writing: X_{t+1} = X_t + σ_t 𝒩_{t+1} · 1{f(X_t + σ_t 𝒩_{t+1}) ≤ f(X_t)}

  18. Bound on Convergence Rate - (1+1)-ES Theorem: For any objective function f: ℝⁿ → ℝ and any y* ∈ ℝⁿ,
     E[ln ‖X_{t+1} − y*‖] ≥ E[ln ‖X_t − y*‖] − τ (lower bound),
     where τ = max_{σ ∈ ℝ_{>0}} φ(σ) with φ(σ) := E[ln⁻ ‖e₁ + σ𝒩(0, I_d)‖], e₁ = (1, 0, …, 0), and ln⁻ the negative part of the logarithm (counted positively).
     Theorem: The convergence rate lower-bound is reached on spherical functions f(x) = g(‖x − x*‖) (with g: ℝ_{≥0} → ℝ strictly increasing) and step-size proportional to the distance to the optimum, σ_t = σ_opt ‖x − x*‖, with σ_opt such that φ(σ_opt) = τ.
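Since φ and τ are defined by an expectation, they can be estimated numerically. A rough Monte Carlo sketch (the grid, the sample size, and the reading of ln⁻ as the negative part of the log counted positively are my assumptions):

```python
import numpy as np

def phi(sigma, dim=10, samples=200_000, seed=2):
    """Monte Carlo estimate of phi(sigma) = E[ln^- ||e1 + sigma*N(0,I_d)||]."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((samples, dim))
    e1 = np.zeros(dim)
    e1[0] = 1.0
    norms = np.linalg.norm(e1 + sigma * z, axis=1)
    # negative part of the log, counted positively
    return np.mean(np.maximum(0.0, -np.log(norms)))

# tau = max_sigma phi(sigma); a coarse grid search around sigma ~ 1/dim:
sigmas = np.linspace(0.01, 0.5, 50)
tau, sigma_opt = max((phi(s), s) for s in sigmas)
```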

  19. Log-Linear Convergence of the Scale-Invariant Step-size (1+1)-ES Theorem: The (1+1)-ES with step-size proportional to the distance to the optimum, σ_t = σ‖X_t‖, converges (log-)linearly on the sphere function f(x) = g(‖x‖) almost surely:
     (1/t) ln(‖X_t‖ / ‖X_0‖) → −φ(σ) =: CR_{(1+1)}(σ) as t → ∞
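A short sketch to observe this log-linear behavior empirically, assuming the optimum at 0 so that ‖X_t‖ is the distance to the optimum (σ = 0.1 and the other parameters are illustrative):

```python
import numpy as np

def empirical_convergence_rate(dim=10, sigma=0.1, iterations=5000, seed=1):
    """(1+1)-ES with scale-invariant step-size sigma_t = sigma * ||X_t||
    on the sphere (optimum at 0); returns (1/t) ln(||X_t|| / ||X_0||)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)
    x0_norm = np.linalg.norm(x)
    for _ in range(iterations):
        sigma_t = sigma * np.linalg.norm(x)  # distance-proportional step-size
        y = x + sigma_t * rng.standard_normal(dim)
        if np.dot(y, y) <= np.dot(x, x):     # elitist selection on ||.||^2
            x = y
    return np.log(np.linalg.norm(x) / x0_norm) / iterations
```

Over several seeds, the returned values should cluster around −φ(σ), the convergence rate CR_{(1+1)}(σ) from the theorem.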

  20. Asymptotic Results (n → ∞)
