Benchmarking the PSA-CMA-ES on the BBOB Noiseless Testbed



SLIDE 1

Benchmarking the PSA-CMA-ES on the BBOB Noiseless Testbed

Kouhei Nishida, Youhei Akimoto
Shinshu University, University of Tsukuba

SLIDE 2

CMA-ES

  • It maintains a multivariate normal distribution $\mathcal{N}(m, \Sigma)$ with $\Sigma = \sigma^2 C$
    ($m$: mean vector, $\sigma$: step-size, $C$: covariance matrix)
  • All of its hyper-parameters (i.e. the learning rate, the population size) have their default values
  • The population size needs tuning if the objective function is a noisy or multimodal function [Hansen 2004]

Each iteration: Step 1 Sample → Step 2 Rank → Step 3 Estimate → Step 4 Update

[Figure: samples ranked 1 to 6, labeled "Population Size"]
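The Sample and Rank steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the factor `A` (assumed to satisfy C = A Aᵀ), and the sphere test function are assumptions, and the Estimate and Update steps are omitted.

```python
import random

def sample_and_rank(f, m, sigma, A, lam, rng):
    """Draw lam candidates from N(m, sigma^2 * A A^T) and sort them by f (minimization)."""
    n = len(m)
    candidates = []
    for _ in range(lam):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]                     # z ~ N(0, I)
        y = [sum(A[i][j] * z[j] for j in range(n)) for i in range(n)]   # y ~ N(0, C)
        candidates.append([m[i] + sigma * y[i] for i in range(n)])      # x ~ N(m, sigma^2 C)
    return sorted(candidates, key=f)  # best candidate first

def sphere(x):
    """Hypothetical test objective."""
    return sum(v * v for v in x)

rng = random.Random(0)
n = 3
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
ranked = sample_and_rank(sphere, [1.0] * n, 0.5, identity, lam=6, rng=rng)
```

Here `lam` plays the role of the population size: it is the number of candidates sampled and ranked per iteration.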

SLIDE 3

CMA-ES: Population Size Tuning

Approach to Avoid Tuning by Users

  • To utilize a multi-run strategy with different population sizes
  • To adapt the population size

BIPOP-CMA-ES

First run: CMA-ES with the default population size → unimodal functions
Additional runs:

  • CMA-ES with an increased population size → well-structured multimodal or noisy functions
  • CMA-ES with a relatively small step-size and population size → weakly-structured multimodal functions

SLIDE 4

CMA-ES: Population Size Tuning

Approach to Avoid Tuning by Users

  • To utilize a multi-run strategy with different population sizes
  • To adapt the population size

PSA-CMA-ES [Nishida2018, Thursday 19, ENUM4]

  • Based on tendency of the parameter update

Key Observation

On multimodal functions and noisy functions, the parameter update has less tendency than on noiseless unimodal functions.
SLIDE 5

Population Size Adaptation (PSA)

  • Based on tendency of the parameter update

Key Observation

On multimodal functions and noisy functions, the parameter update has less tendency than on noiseless unimodal functions.

In the parameter space of the sampling distribution, $\theta = [m, \Sigma]$:

  • Update step $\Delta\theta$ at time $t \to t + 1$: $\mathcal{N}(m^{(t)}, \Sigma^{(t)}) \to \mathcal{N}(m^{(t+1)}, \Sigma^{(t+1)})$

SLIDE 6

Population Size Adaptation (PSA)

  • Based on tendency of the parameter update

In the parameter space of the sampling distribution…

On

  • noiseless unimodal function

Key Observation

On multimodal functions and noisy functions, the parameter update has less tendency than on noiseless unimodal functions.

SLIDE 7

Population Size Adaptation (PSA)

  • Based on tendency of the parameter update

In the parameter space of the sampling distribution…

On

  • multimodal functions
  • noisy functions

Key Observation

On multimodal functions and noisy functions, the parameter update has less tendency than on noiseless unimodal functions.

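SLIDE 8

PSA: Evolution Path

  • It accumulates steps in the parameter space

$$p_\theta^{(t+1)} \leftarrow (1 - \beta)\, p_\theta^{(t)} + \sqrt{\beta (2 - \beta)}\, \frac{\mathcal{I}_{\theta^{(t)}}^{1/2} \Delta\theta^{(t+1)}}{\sqrt{\mathbb{E}\left[ \| \mathcal{I}_{\theta^{(t)}}^{1/2} \Delta\theta^{(t+1)} \|^2 \right]}}$$

$\beta$: cumulation factor
$\mathcal{I}_\theta$: Fisher information matrix under $\theta$
$\mathbb{E}[\cdot]$: expectation under a random function $f(x) = \epsilon$

The denominator is a normalization factor → To absorb the effect of…

  • Parameterization of the sampling distribution
  • Change of the population size

$\|p_\theta\|^2 \approx 1$ under a random function
$\|p_\theta\|^2 \gg 1$ when $\lambda$ is too large ($\lambda$: population size)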
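The accumulation can be illustrated with a small sketch. It assumes the whitened step (the Fisher-information square root applied to Δθ) and its expected squared norm are already available; computing them is outside this sketch, and the function names, the unit steps, and `beta = 0.2` are illustrative assumptions.

```python
import math

def update_evolution_path(p, whitened_step, expected_sq_norm, beta):
    """One path update; whitened_step stands for I^(1/2) * delta_theta (assumed given)."""
    scale = math.sqrt(beta * (2.0 - beta) / expected_sq_norm)
    return [(1.0 - beta) * p_i + scale * s_i for p_i, s_i in zip(p, whitened_step)]

def squared_norm(v):
    return sum(x * x for x in v)

def run(steps, beta=0.2):
    """Accumulate a sequence of steps, starting from a zero path."""
    p = [0.0, 0.0]
    for s in steps:
        p = update_evolution_path(p, s, expected_sq_norm=1.0, beta=beta)
    return squared_norm(p)

# Steps that keep pointing the same way build up a long path ...
consistent = run([[1.0, 0.0]] * 100)
# ... while steps that keep cancelling each other leave it short.
alternating = run([[1.0, 0.0], [-1.0, 0.0]] * 50)
```

In this toy setting the squared path norm ends well above 1 for the consistent steps and well below 1 for the alternating ones, which is the signal the population size update reads.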

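SLIDE 9

PSA: Population Size Update

$$\lambda^{(t+1)} \leftarrow \lambda^{(t)} \exp\left( \beta \left( \gamma^{(t+1)} - \frac{\| p_\theta^{(t+1)} \|^2}{\alpha} \right) \right)$$

$$\gamma^{(t+1)} \leftarrow (1 - \beta)^2 \gamma^{(t)} + \beta (2 - \beta)$$

$\alpha$: threshold
$\gamma$: normalization factor, $\gamma^{(t)} \approx 1$ ($t \gg 1$)

  • $\|p_\theta\|^2 < \alpha$ ⇒ The population size increases
  • $\|p_\theta\|^2 > \alpha$ ⇒ The population size decreases

→ The population size is adapted so that the parameter update has sufficient tendency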
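A minimal sketch of the two update rules on this slide. The clamping to `[lam_min, lam_max]` and the numeric values of `alpha` and `beta` are assumptions made for illustration; they are not taken from the slides.

```python
import math

def update_gamma(gamma, beta):
    """Normalization factor update; its fixed point is 1."""
    return (1.0 - beta) ** 2 * gamma + beta * (2.0 - beta)

def update_population_size(lam, gamma_next, path_sq_norm, alpha, beta, lam_min, lam_max):
    """Multiplicative update of the (real-valued) population size."""
    lam_next = lam * math.exp(beta * (gamma_next - path_sq_norm / alpha))
    return min(max(lam_next, lam_min), lam_max)  # assumed bounds, not from the slide

alpha, beta = 1.4, 0.4  # illustrative values only

gamma = 0.0
for _ in range(50):
    gamma = update_gamma(gamma, beta)

# Short path (little tendency): the population size grows.
grown = update_population_size(12.0, gamma, 0.5, alpha, beta, 12.0, 1e6)
# Long path (strong tendency): the population size shrinks.
shrunk = update_population_size(100.0, gamma, 5.0, alpha, beta, 12.0, 1e6)
```

With `gamma` close to 1, the sign of the exponent is decided by whether the squared path norm is below or above `alpha`.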

SLIDE 10

PSA: Step-size Correction

  • Based on the quality gain analysis [Akimoto 2017], the optimal step-size depends on the population size:

$$\sigma^*(\lambda) = \frac{c(\lambda) \cdot n \cdot \mu_w(\lambda)}{n - 1 + c(\lambda)^2 \cdot \mu_w(\lambda)}, \qquad c(\lambda) = -\sum_{i=1}^{\lambda} w_i\, \mathbb{E}[\mathcal{N}_{i:\lambda}]$$

  • A practical step-size adaptation in the CMA-ES usually follows the optimal value well [Krause 2017]
  • It implies that the step-size is increased when the population size increases, and vice versa.
  • The step-size adaptation is corrupted by the population size adaptation.

After updating the population size…

$$\sigma^{(t+1)} \leftarrow \sigma^{(t+1)} \cdot \frac{\sigma^*(\lambda^{(t+1)})}{\sigma^*(\lambda^{(t)})}$$
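The correction can be sketched numerically under several assumptions that are not on the slide: the recombination weights are the usual positive log-weights of CMA-ES, the variance-effective selection mass is 1 / Σ wᵢ², expected normal order statistics are approximated with Blom's formula, and the population size is an integer. All function names are hypothetical.

```python
import math
from statistics import NormalDist

def recombination_weights(lam):
    """Assumed weights: positive log-weights on the better half, normalized to sum to 1."""
    mu = lam // 2
    raw = [math.log((lam + 1) / 2.0) - math.log(i) for i in range(1, mu + 1)]
    total = sum(raw)
    return [r / total for r in raw] + [0.0] * (lam - mu)

def expected_order_statistic(i, lam):
    """Blom's approximation of the expected i-th smallest of lam standard normal samples."""
    return NormalDist().inv_cdf((i - 0.375) / (lam + 0.25))

def optimal_step_size(n, lam):
    """Normalized optimal step-size for dimension n and population size lam."""
    w = recombination_weights(lam)
    c = -sum(w[i - 1] * expected_order_statistic(i, lam) for i in range(1, lam + 1))
    mu_w = 1.0 / sum(w_i * w_i for w_i in w)
    return c * n * mu_w / (n - 1 + c * c * mu_w)

def correct_step_size(sigma, n, lam_old, lam_new):
    """Rescale sigma by the ratio of optimal step-sizes after a population size change."""
    return sigma * optimal_step_size(n, lam_new) / optimal_step_size(n, lam_old)

doubled = correct_step_size(1.0, 20, 12, 24)
```

Under these assumptions a larger population size yields a larger optimal step-size, so `doubled` exceeds the uncorrected value of 1.0.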

SLIDE 11

PSA-CMA-ES

  • 1. An iteration of CMA-ES (Step 1 Sample → Step 2 Rank → Step 3 Estimate → Step 4 Update)

$$\mathcal{N}(m^{(t)}, (\sigma^{(t)})^2 C^{(t)}) \to \mathcal{N}(m^{(t+1)}, (\sigma^{(t+1)})^2 C^{(t+1)})$$

A step in the parameter space:

$$\Delta\theta = [\Delta m, \Delta\Sigma], \qquad \Delta m = m^{(t+1)} - m^{(t)}, \qquad \Delta\Sigma = (\sigma^{(t+1)})^2 C^{(t+1)} - (\sigma^{(t)})^2 C^{(t)}$$

  • 2. Update the evolution path and the population size

$$p_\theta^{(t+1)} \leftarrow (1 - \beta)\, p_\theta^{(t)} + \sqrt{\beta (2 - \beta)}\, \frac{\mathcal{I}_{\theta^{(t)}}^{1/2} \Delta\theta^{(t+1)}}{\sqrt{\mathbb{E}\left[ \| \mathcal{I}_{\theta^{(t)}}^{1/2} \Delta\theta^{(t+1)} \|^2 \right]}}$$

$$\lambda^{(t+1)} \leftarrow \lambda^{(t)} \exp\left( \beta \left( \gamma^{(t+1)} - \frac{\| p_\theta^{(t+1)} \|^2}{\alpha} \right) \right)$$

  • 3. Correct the step-size

$$\sigma^{(t+1)} \leftarrow \frac{\sigma^*(\lambda^{(t+1)})}{\sigma^*(\lambda^{(t)})}\, \sigma^{(t+1)}$$
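The step in the parameter space used in item 1 can be computed directly from the distribution parameters before and after a CMA-ES iteration. The sketch below uses plain nested lists; the function name and the toy values are assumptions.

```python
def parameter_step(m_old, sigma_old, C_old, m_new, sigma_new, C_new):
    """Return (delta_m, delta_Sigma), where Sigma = sigma^2 * C."""
    n = len(m_old)
    delta_m = [m_new[i] - m_old[i] for i in range(n)]
    delta_Sigma = [
        [sigma_new ** 2 * C_new[i][j] - sigma_old ** 2 * C_old[i][j] for j in range(n)]
        for i in range(n)
    ]
    return delta_m, delta_Sigma

identity = [[1.0, 0.0], [0.0, 1.0]]
delta_m, delta_Sigma = parameter_step(
    [0.0, 0.0], 1.0, identity,   # parameters at time t
    [1.0, -1.0], 2.0, identity,  # parameters at time t + 1
)
```

In this toy case the mean moves by (1, -1) and the covariance grows from I to 4I, so the covariance part of the step is 3I.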

SLIDE 12

Restart Strategy for PSA-CMA-ES

Simple Restart

  • All runs: PSA-CMA-ES ($\sigma^{(0)} = 2$, $\lambda_{\max} = \infty$)

Restart Strategy for PSA-CMA-ES

  • First run: CMA-ES with the default population size ($\sigma^{(0)} = 2$) → unimodal functions
  • Second run: PSA-CMA-ES ($\sigma^{(0)} = 2$) → well-structured multimodal functions
  • Additional runs: PSA-CMA-ES with a relatively small step-size ($\sigma^{(0)} = 2 \cdot 10^{-2 \cdot \mathcal{U}[0,1]}$) → weakly-structured multimodal functions
  • Max population size: $\lambda_{\max} = 2^{9} \cdot \lambda_{\mathrm{def}}$
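The per-run settings of the proposed restart strategy might be drawn as in the sketch below. It rests on assumptions: the default population size is taken to be the common CMA-ES default 4 + ⌊3 ln D⌋, the maximum population size is read as 2^9 times that default, and the run indexing and function names are hypothetical.

```python
import math
import random

def default_population_size(dim):
    """Assumed CMA-ES default: 4 + floor(3 * ln(dim))."""
    return 4 + int(3 * math.log(dim))

def max_population_size(dim):
    """Assumed cap: 2^9 times the default population size."""
    return 2 ** 9 * default_population_size(dim)

def initial_step_size(run_index, rng):
    """Runs 0 and 1 start from sigma = 2; later runs draw a smaller step-size."""
    if run_index < 2:
        return 2.0
    return 2.0 * 10.0 ** (-2.0 * rng.random())  # exponent uniform in (-2, 0]

rng = random.Random(0)
later_sigma = initial_step_size(2, rng)
```

For a 20-dimensional problem this gives a default population size of 12 and a cap of 6144 under the stated assumptions.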

SLIDE 13

Simulation

Algorithm Variants

  • PSA: PSA-CMA-ES with the simple restart
  • PSAwRS: PSA-CMA-ES with the proposed restart strategy
  • BIPOP: BIPOP-CMA-ES [Hansen 2009]

Common Setting

  • Initialization: $m^{(0)} \sim \mathcal{U}[-4, 4)^D$ ($D$: problem dimension)
  • Termination:
  • The target function value is reached
  • The number of evaluations exceeds $10^6 \cdot D$
  • One of the termination conditions [Hansen 2009] is satisfied
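The common setting can be sketched as follows, assuming the initial mean is uniform on [-4, 4) in each coordinate and the evaluation budget is 10^6 times the dimension. The additional termination conditions of [Hansen 2009] are not modeled, and the function names are hypothetical.

```python
import random

def initial_mean(dim, rng):
    """Draw the initial mean uniformly from [-4, 4)^dim."""
    return [-4.0 + 8.0 * rng.random() for _ in range(dim)]

def should_terminate(f_best, f_target, evaluations, dim):
    """Stop when the target is reached or the evaluation budget is exhausted."""
    return f_best <= f_target or evaluations >= 10 ** 6 * dim

rng = random.Random(1)
m0 = initial_mean(5, rng)
```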

SLIDE 14

Overall Performance (f1-f24)

[Figure: overall performance of PSA, PSAwRS, and BIPOP on f1-f24; panels for 5D, 10D, 20D, 40D]

SLIDE 15

Unimodal Functions

[Figure: $\lambda$ and $f_{\mathrm{best}}$ versus number of iterations for PSA, PSAwRS, and BIPOP in 20D; reference levels $\lambda_{\mathrm{default}}$ and $\lambda = 10^2$]

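SLIDE 16

Unimodal Functions

[Figure: $\lambda_{\mathrm{median}}$ versus dimension (2, 3, 5, 10, 20, 40) on f1 Sphere for PSA, PSAwRS, and BIPOP, with $\lambda_{\mathrm{default}}$ for reference; axis ticks $10^1$, $10^2$]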

SLIDE 17

Well-structured Multimodal Functions

[Figure: $\lambda$ and $f_{\mathrm{best}}$ versus number of iterations for PSA, PSAwRS, and BIPOP in 20D; reference levels $\lambda_{\mathrm{default}}$ and $\lambda = 10^2$]

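SLIDE 18

Repetitive Multimodal Functions

[Figure: $\lambda$ and $f_{\mathrm{best}}$ versus number of iterations for PSA, PSAwRS, and BIPOP in 20D with $\sigma^{(0)} = 2$; reference levels $\lambda_{\mathrm{default}}$, $\lambda = 10^2$, and $\lambda = 10^5$]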

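SLIDE 19

Repetitive Multimodal Functions

[Figure: $\lambda$ and $f_{\mathrm{best}}$ versus number of iterations for PSA, PSAwRS, and BIPOP in 20D with $\sigma^{(0)} = 2/100$; reference levels $\lambda_{\mathrm{default}}$ and $\lambda = 10^2$]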

SLIDE 20

Summary

  • PSA-CMA-ESwRS is comparable with BIPOP-CMA-ES.

On unimodal functions

  • PSA-CMA-ES performs worse as the dimension increases.

On well-structured multimodal functions

  • PSA-CMA-ES works better than BIPOP-CMA-ES.

On repetitive multimodal functions

  • An initial step-size is important to avoid an inefficient increase of the population size.

Future Work

  • To investigate the hyper-parameter setting