

SLIDE 1

Evaluating the Population Size Adaptation Mechanism for CMA-ES on the BBOB Noiseless Testbed

Kouhei Nishida¹, Youhei Akimoto¹

¹Shinshu University, Japan

SLIDE 2

Introduction: CMA-ES

◮ The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is a stochastic search algorithm using the multivariate normal distribution. It repeats:

1. Generate candidate solutions $(x_i^{(t)})_{i=1,\dots,\lambda}$ from $\mathcal{N}(m^{(t)}, C^{(t)})$.
2. Evaluate $f(x_i^{(t)})$ and sort them, $f(x_{1:\lambda}) < \cdots < f(x_{\lambda:\lambda})$.
3. Update the distribution parameters $\theta^{(t)} = (m^{(t)}, C^{(t)})$ using the ranking of the candidate solutions.

◮ The CMA-ES has default values for all strategy parameters (such as the population size λ and the learning rate η_c).

◮ A population size larger than the default improves performance in the following scenarios:
1. well-structured multimodal functions
2. noisy functions

◮ Tuning the population size in advance can easily become very expensive.

SLIDE 3

Introduction: Population Size Adaptation

◮ As a measure for the adaptation, we consider the randomness of the parameter update.

◮ To quantify the randomness of the parameter update, we introduce the evolution path in the parameter space.

◮ To keep the randomness of the parameter update at a certain level, the population size is adapted online.

Advantages of adapting the population size online:

◮ It does not require tuning the population size in advance.

◮ On rugged functions, it may accelerate the search by reducing the population size after converging to the basin of a local minimum.

SLIDE 4

Rank-µ update CMA-ES

◮ The rank-µ update CMA-ES, which is a component of the CMA-ES, repeats the following procedure:

1. Generate candidate solutions $(x_i^{(t)})_{i=1,\dots,\lambda}$ from $\mathcal{N}(m^{(t)}, C^{(t)})$.
2. Evaluate $f(x_i^{(t)})$ and sort them, $f(x_{1:\lambda}) < \cdots < f(x_{\lambda:\lambda})$.
3. Update the distribution parameters $\theta^{(t)} = (m^{(t)}, C^{(t)})$ using the ranking of the candidate solutions:

$$\theta^{(t+1)} = \theta^{(t)} + \Delta\theta^{(t)}$$
$$\Delta m^{(t)} = \eta_m \sum_{i=1}^{\lambda} w_i \left( x_{i:\lambda}^{(t)} - m^{(t)} \right)$$
$$\Delta C^{(t)} = \eta_c \sum_{i=1}^{\lambda} w_i \left( \left( x_{i:\lambda}^{(t)} - m^{(t)} \right) \left( x_{i:\lambda}^{(t)} - m^{(t)} \right)^{\mathsf T} - C^{(t)} \right)$$
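A minimal NumPy sketch of one iteration of this loop. The µ = λ/2 truncation, the log-linear recombination weights, and the learning-rate defaults are illustrative assumptions, not the exact settings used in the experiments:

```python
import numpy as np

def rank_mu_update_step(f, m, C, lam, eta_m=1.0, eta_c=0.1, rng=None):
    """One iteration of the rank-mu update CMA-ES (illustrative sketch)."""
    rng = rng or np.random.default_rng()
    # 1. Generate lambda candidate solutions from N(m, C).
    X = rng.multivariate_normal(m, C, size=lam)
    # 2. Evaluate f and sort the candidates by fitness (minimization).
    X = X[np.argsort([f(x) for x in X])]
    # Positive recombination weights on the best mu candidates,
    # normalized to sum to one (a common choice, assumed here).
    mu = lam // 2
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()
    # 3. Update theta = (m, C) from the ranked steps y_i = x_{i:lam} - m.
    Y = X[:mu] - m
    dm = eta_m * (w @ Y)
    dC = eta_c * (np.einsum("i,ij,ik->jk", w, Y, Y) - C)  # sum_i w_i (y_i y_i^T - C)
    return m + dm, C + dC
```

For example, `rank_mu_update_step(lambda x: float(np.sum(x**2)), np.zeros(5), np.eye(5), lam=20)` performs one update step on the 5-D sphere.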

SLIDE 5

Population Size Adaptation: Measurement

To quantify the randomness of the parameter update, we introduce the evolution path in the space Θ of the distribution parameter θ = (m, C):

$$p^{(t+1)} = (1 - \beta)\, p^{(t)} + \sqrt{\beta(2 - \beta)}\, \Delta\theta^{(t)}$$

The evolution path accumulates the successive steps in the parameter space Θ.

Figure: An image of the evolution path. (a) less tendency, (b) strong tendency.
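As a sketch of the accumulation, assuming θ is flattened into a single vector so that Δθ stacks Δm and the entries of ΔC:

```python
import numpy as np

def update_evolution_path(p, dm, dC, beta):
    """Accumulate the parameter-space step into the evolution path p.
    Sketch: theta = (m, C) is flattened so that p lives in one vector."""
    dtheta = np.concatenate([dm, dC.ravel()])
    return (1.0 - beta) * p + np.sqrt(beta * (2.0 - beta)) * dtheta
```

Note that the *length* of p is not the plain Euclidean norm of this vector; it is measured in the metric of Θ, as described on the next slide.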

SLIDE 6

Population Size Adaptation: Measurement

◮ We measure the length of the evolution path based on the KL divergence:

$$\|p\|_{\theta}^{2} = p^{\mathsf T} I(\theta)\, p \approx \mathrm{KL}(\theta \parallel \theta + p)$$

The KL divergence measures the difference between two probability distributions.

◮ We measure the randomness of the parameter update by the ratio between $\|p^{(t+1)}\|_{\theta}^{2}$ and its expected value $\gamma^{(t+1)} \approx \mathbb{E}\big[\|p^{(t+1)}\|_{\theta}^{2}\big]$ under a random function:

$$\gamma^{(t+1)} = (1 - \beta)^{2}\, \gamma^{(t)} + \beta(2 - \beta) \sum_{i=1}^{\lambda} w_i^{2} \left( d\,\eta_m^{2} + \frac{d(d+1)}{2}\,\eta_c^{2} \right)$$

◮ Two important cases:
  ◮ on a random function: $\|p\|_{\theta}^{2} / \gamma \approx 1$
  ◮ for too large λ: $\|p\|_{\theta}^{2} / \gamma \to \infty$
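A sketch of both quantities in code. It assumes the closed-form Fisher metric of the Gaussian family, under which $\|p\|_\theta^2 = p_m^{\mathsf T} C^{-1} p_m + \tfrac{1}{2}\operatorname{tr}\big((C^{-1} P_C)^2\big)$ for a path split into a mean part $p_m$ and a covariance part $P_C$; the split itself is bookkeeping assumed here:

```python
import numpy as np

def fisher_norm_sq(p_m, P_C, C):
    """||p||_theta^2 = p^T I(theta) p for theta = (m, C) of a Gaussian,
    with the path split into its mean part p_m and covariance part P_C."""
    Cinv = np.linalg.inv(C)
    A = Cinv @ P_C
    return float(p_m @ Cinv @ p_m + 0.5 * np.trace(A @ A))

def update_gamma(gamma, w, d, eta_m, eta_c, beta):
    """Recursion for gamma ~ E[||p||_theta^2] under a random function,
    as given on this slide."""
    s = np.sum(w**2) * (d * eta_m**2 + d * (d + 1) / 2 * eta_c**2)
    return (1 - beta) ** 2 * gamma + beta * (2 - beta) * s
```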

SLIDE 7

Population Size Adaptation: Adaptation

◮ If $\|p^{(t+1)}\|_{\theta^{(t)}}^{2} / \gamma^{(t+1)} < \alpha$, regarding the update as inaccurate, the population size is increased with

$$\lambda^{(t+1)} = \lambda^{(t)} \exp\!\left( \beta \left( \alpha - \frac{\|p^{(t+1)}\|_{\theta^{(t)}}^{2}}{\gamma^{(t+1)}} \right) \right) \vee \left( \lambda^{(t)} + 1 \right)$$

◮ If $\|p^{(t+1)}\|_{\theta^{(t)}}^{2} / \gamma^{(t+1)} > \alpha$, regarding the update as sufficiently accurate, the population size is decreased with

$$\lambda^{(t+1)} = \lambda^{(t)} \exp\!\left( \beta \left( \alpha - \frac{\|p^{(t+1)}\|_{\theta^{(t)}}^{2}}{\gamma^{(t+1)}} \right) \right) \vee \lambda_{\min}$$

(∨ denotes the maximum, so an increase grows λ by at least one and a decrease never falls below λ_min.)
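The rule fits in a few lines; this sketch transcribes the two cases above, with the integer rounding and the default λ_min as assumptions:

```python
import numpy as np

def adapt_population_size(lam, path_norm_sq, gamma, alpha, beta, lam_min=4):
    """Online population size adaptation (sketch of the rule above)."""
    ratio = path_norm_sq / gamma
    lam_new = lam * np.exp(beta * (alpha - ratio))
    if ratio < alpha:
        lam_new = max(lam_new, lam + 1)   # inaccurate update: grow by at least 1
    else:
        lam_new = max(lam_new, lam_min)   # accurate update: shrink, floored at lam_min
    return int(round(lam_new))
```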

SLIDE 8

Algorithm Variant

We use the default setting for most of the parameters. The modified parameters are the learning rate for the mean vector, c_m, and the threshold α that decides whether the parameter update is considered accurate or not.

Variant     α      c_m
PSAaLmC     √2     0.1
PSAaLmD     √2     1/D
PSAaSmC     1.1    0.1
PSAaSmD     1.1    1/D

◮ The greater α is, the larger the population size tends to stay.
◮ From our preliminary study, we set c_c = (√2/(D+1)) c_m.
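For concreteness, the four configurations could be resolved with a helper like the following (a hypothetical bit of bookkeeping; the aL/aS and mC/mD naming scheme is read off the table above):

```python
import math

def variant_params(name, D):
    """Resolve a variant name from the table above to its parameters."""
    alpha = math.sqrt(2) if "aL" in name else 1.1   # aL: alpha = sqrt(2); aS: alpha = 1.1
    c_m = 0.1 if name.endswith("mC") else 1.0 / D   # mC: c_m = 0.1; mD: c_m = 1/D
    c_c = math.sqrt(2) / (D + 1) * c_m              # from the preliminary study
    return {"alpha": alpha, "c_m": c_m, "c_c": c_c}

# Example: variant_params("PSAaSmD", D=10) -> {"alpha": 1.1, "c_m": 0.1, ...}
```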

SLIDE 9

Restart Strategy

For each (re-)start of the algorithm, we initialize the mean vector m ∼ U[−4, 4]^D and the covariance matrix C = 2²I. The maximum #f-call is set to 10⁵D.

Termination conditions:

tolf: median(fiqr_hist) < 10⁻¹² · abs(median(fmin_hist))
  ◮ the objective function value differences are too small to sort them without being affected by numerical errors.

tolx: median(xiqr_hist) < 10⁻¹² · min(abs(xmed_hist))
  ◮ the coordinate value differences are too small to update the parameters without being affected by numerical errors.

maxcond: cond(C) > 10¹⁴
  ◮ the matrix operations using C are not reliable due to numerical errors.

maxeval: #f-call ≥ 5 × 10⁴D (for noiseless) or 10⁵D (for noisy)
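A sketch of how the four checks might be evaluated together; how the history buffers fiqr_hist, fmin_hist, xiqr_hist, xmed_hist are filled, and their window length, are bookkeeping assumptions:

```python
import numpy as np

def should_terminate(fiqr_hist, fmin_hist, xiqr_hist, xmed_hist,
                     C, n_evals, D, noisy=False):
    """Evaluate the four termination conditions above (sketch)."""
    tolf = np.median(fiqr_hist) < 1e-12 * abs(np.median(fmin_hist))
    tolx = np.median(xiqr_hist) < 1e-12 * np.min(np.abs(xmed_hist))
    maxcond = np.linalg.cond(C) > 1e14
    maxeval = n_evals >= (1e5 if noisy else 5e4) * D
    return tolf or tolx or maxcond or maxeval
```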

SLIDE 10

BIPOP-CMA-ES

BIPOP restart strategy: a restart strategy with two budgets of function evaluations (see the sketch below).

◮ One budget is for runs with an incrementally increasing population size,
  ◮ to tackle well-structured multimodal functions or noisy functions.

◮ The other budget is for runs with a relatively small population size and a relatively small step-size,
  ◮ to tackle weakly-structured multimodal functions.
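A rough sketch of the budget bookkeeping. The doubling rule, the random small population size, and the step-size factor follow Hansen's published BIPOP description and are assumptions here, not details given on this slide:

```python
import numpy as np

def next_restart(lam_default, lam_large, evals_large, evals_small, rng):
    """Choose the regime with less consumed budget (BIPOP sketch)."""
    if evals_large <= evals_small:
        # First budget: double the population size at every large restart.
        return {"regime": "large", "lam": 2 * lam_large, "sigma_factor": 1.0}
    # Second budget: a small random population and a smaller step-size.
    u = rng.uniform()
    lam_small = int(lam_default * (0.5 * lam_large / lam_default) ** (u ** 2))
    return {"regime": "small", "lam": max(lam_small, lam_default),
            "sigma_factor": 10.0 ** (-2.0 * rng.uniform())}
```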

SLIDE 11

Noiseless: Unimodal Function

[Figure: aRT scaling plots over dimension (2–40), target Δf = 1e-8, for f1 Sphere, f5 Linear slope, f7 Step-ellipsoid, f8 Rosenbrock original. Legend: PSAaLmC, PSAaLmD, PSAaSmC, PSAaSmD, BIPOP-CMA-ES.]

The aRT is higher than the best 2009 portfolio for most of the unimodal functions due to the lack of step-size adaptation. On the Step-ellipsoid function, where step-size adaptation is less important, our algorithm performs well.

SLIDE 12

Noiseless: Well-structured Multimodal Function

[Figure: aRT scaling plots over dimension (2–40), target Δf = 1e-8, for f15 Rastrigin, f17 Schaffer F7 condition 10, f18 Schaffer F7 condition 1000, f19 Griewank-Rosenbrock F8F2.]

The performance of the tested algorithms is similar to that of the BIPOP-CMA-ES, despite the lack of step-size adaptation. In particular, on Griewank-Rosenbrock the tested algorithms are partly better than the best 2009 portfolio.

SLIDE 13

Noiseless: Weakly-structured Multimodal Function

[Figure: ECDFs of runtimes (proportion of function+target pairs vs. log10 of #f-evals / dimension) on f20–f24, in 5-D and 20-D. Legend: PSAaSmD, PSAaSmC, PSAaLmC, PSAaLmD, BIPOP-CMA, best 2009.]

The BIPOP-CMA-ES performs better than the tested algorithms because they do not have a mechanism to tackle weakly-structured multimodal functions.

SLIDE 14

Noiseless: Comparing the variants

[Figure: ECDF of runtimes (proportion of function+target pairs vs. log10 of #f-evals / dimension) on f15–f19, 10-D. Legend: PSAaLmC, PSAaLmD, PSAaSmD, PSAaSmC, BIPOP-CMA, best 2009.]

Variants with α = 1.1 are better than those with α = √2.

SLIDE 15

Noiseless Summary

◮ On well-structured multimodal functions, the tested algorithms perform well without the step-size adaptation.

◮ For lack of the step-size adaptation, the aRT is higher than the best 2009 portfolio for most of the unimodal functions.

◮ When the step-size is less important, the tested algorithms perform well.

◮ Variants with α = 1.1 tend to be better than those with α = √2.

SLIDE 16

Noisy: Unimodal Function

[Figure: aRT scaling plots over dimension (2–40), target Δf = 1e-8, for f101 Sphere moderate Gauss, f102 Sphere moderate unif, f103 Sphere moderate Cauchy, f104 Rosenbrock moderate Gauss, f105 Rosenbrock moderate unif, f106 Rosenbrock moderate Cauchy. Legend: PSAaLmC, PSAaLmD, PSAaSmC, PSAaSmD, BIPOP-CMA-ES.]

On the Sphere functions, the algorithm is slower than the BIPOP-CMA-ES for lack of the step-size adaptation. The failure on the Rosenbrock functions is mainly due to the same reason.

SLIDE 17

Noisy: Unimodal Function

[Figure: aRT scaling plots over dimension (2–40), target Δf = 1e-8, for f113 Step-ellipsoid Gauss, f114 Step-ellipsoid unif, f115 Step-ellipsoid Cauchy.]

On the Step-ellipsoid functions, where the step-size adaptation is less important, the algorithm performs well.

SLIDE 18

Noisy: Well-structured Multimodal Function

[Figure: aRT scaling plots over dimension (2–40), target Δf = 1e-8, for f122 Schaffer F7 Gauss, f123 Schaffer F7 unif, f124 Schaffer F7 Cauchy.]

On the Schaffer functions, the performance of the tested algorithms is similar to the best 2009 portfolio, and partly better than it.

SLIDE 19

Noisy: Comparing the variants

[Figure: aRT scaling plots over dimension (2–40), target Δf = 1e-8, for f113 Step-ellipsoid Gauss, f116 Ellipsoid Gauss, f119 Sum of different powers Gauss.]

The algorithms using c_m = 1/D sometimes get worse in low dimensions because the learning rate is too large.

SLIDE 20

Noisy: Comparing the variants

[Figure: ECDF of runtimes (proportion of function+target pairs vs. log10 of #f-evals / dimension) on f101–f130, 10-D. Legend: PSAaLmD, PSAaLmC, PSAaSmD, PSAaSmC, BIPOP-CMA, best 2009.]

Variants with α = 1.1 are better than those with α = √2.

SLIDE 21

Noisy Summary

◮ On well-structured multimodal functions, the performance of the tested algorithms is similar to the best 2009 portfolio.

◮ For lack of the step-size adaptation, the convergence speed scales worse on the Sphere function, and the aRT is higher than the best 2009 portfolio for most of the unimodal functions.

◮ Variants with α = 1.1 tend to be better than those with α = √2.

◮ c_m = 1/D is too large at low dimension.

SLIDE 22

Conclusion

Summary

◮ On well-structured multimodal functions, the performance of the tested algorithms is similar to the best 2009 portfolio.

◮ For lack of the step-size adaptation, the aRT is higher than the best 2009 portfolio for most of the unimodal functions and the weakly-structured functions.

◮ On noisy functions, c_m = 1/D is too large at low dimension.

Future Work

◮ We plan to incorporate the rank-one update and the step-size adaptation.

SLIDE 23

Using a small learning rate works as averaging the mean vector over successive iterations.

Figure: (a) with a larger learning rate, (b) with a smaller learning rate.
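To see the averaging effect, unroll the mean update from slide 4 with $\sum_i w_i = 1$ and $c_m$ in the role of $\eta_m$, writing $\bar{x}^{(t)} = \sum_i w_i\, x_{i:\lambda}^{(t)}$ for the weighted mean of the selected candidates (a sketch of the argument, not stated on the slide):

$$m^{(t+1)} = (1 - c_m)\, m^{(t)} + c_m\, \bar{x}^{(t)} = c_m \sum_{k=0}^{t} (1 - c_m)^{k}\, \bar{x}^{(t-k)} + (1 - c_m)^{t+1}\, m^{(0)}$$

The smaller c_m is, the more past iterations contribute, so the mean vector behaves like an exponentially weighted average that smooths out selection noise.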
