Surrogate models for Single and Multi-Objective Stochastic - - PowerPoint PPT Presentation



SLIDE 1

Covariance Matrix Adaptation-Evolution Strategy Support Vector Machines Comparison-Based Surrogate Model for CMA-ES Dominance-based Surrogate Model for Multi-Objective Optimization

Surrogate models for Single and Multi-Objective Stochastic Optimization: Integrating Support Vector Machines and Covariance-Matrix Adaptation-ES

Ilya Loshchilov, Marc Schoenauer, Michèle Sebag
TAO, CNRS − INRIA − Univ. Paris-Sud
May 23rd, 2011

Michèle Sebag Surrogate optimization: SVM for CMA 1/ 47

SLIDE 2

Motivations

Find $\operatorname{Argmin}\{F : X \to \mathbb{R}\}$

Context: ill-posed optimization problems (continuous)
  • Function $F$ (fitness function) on $X \subset \mathbb{R}^d$
  • Gradient not available or not useful
  • $F$ available as an oracle (black box)

Build $\{x_1, x_2, \ldots\} \to \operatorname{Argmin}(F)$

Black-box approaches
  + Applicable
  + Robust: comparison-based approaches are invariant under order-preserving transformations of $F$
  − High computational costs: number of function evaluations

SLIDE 3

Surrogate optimization

Principle
  • Gather $E = \{(x_i, F(x_i))\}$ (training set)
  • Build $\hat{F}$ from $E$ (learn surrogate model)
  • Use surrogate model $\hat{F}$ for some time:
      Optimization: use $\hat{F}$ instead of the true $F$ in a standard algorithm
      Filtering: select promising $x_i$ based on $\hat{F}$ in a population-based algorithm
  • Compute $F(x_i)$ for some $x_i$
  • Update $\hat{F}$
  • Iterate
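The gather / build / filter / evaluate / update cycle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `sphere` stands in for the expensive black box, the inverse-distance-weighted `fit_surrogate` is a hypothetical placeholder for the learned model (it is not the Rank-SVM used later in the talk), and all parameter values are arbitrary.

```python
import random

def sphere(x):
    """Stand-in for the expensive black-box F (an assumption of this sketch)."""
    return sum(v * v for v in x)

def fit_surrogate(archive):
    """Hypothetical surrogate F_hat: inverse-distance-weighted average of the archive."""
    def f_hat(x):
        num = den = 0.0
        for xi, fi in archive:
            w = 1.0 / (sum((a - b) ** 2 for a, b in zip(x, xi)) + 1e-12)
            num += w * fi
            den += w
        return num / den
    return f_hat

def surrogate_filtering(f, dim=2, iterations=30, n_candidates=20, n_evaluated=3, seed=0):
    rng = random.Random(seed)
    # Gather E = {(x_i, F(x_i))}: the initial training set.
    archive = []
    for _ in range(5):
        x = [rng.uniform(-3.0, 3.0) for _ in range(dim)]
        archive.append((x, f(x)))
    init_best_f = min(fx for _, fx in archive)
    for _ in range(iterations):
        f_hat = fit_surrogate(archive)                  # build / update F_hat from E
        best_x = min(archive, key=lambda e: e[1])[0]
        candidates = [[v + rng.gauss(0.0, 0.5) for v in best_x]
                      for _ in range(n_candidates)]
        candidates.sort(key=f_hat)                      # filtering: rank by F_hat
        for x in candidates[:n_evaluated]:              # true F only on the most promising
            archive.append((x, f(x)))
    best_f = min(fx for _, fx in archive)
    return best_f, init_best_f, len(archive)
```

With these defaults the true function is called 5 + 30 × 3 = 95 times, although 600 candidates are generated; the surrogate decides which candidates deserve a true evaluation.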

SLIDE 4

Surrogate optimization, cont

Issues

Learning
  • Hypothesis space (polynomials, neural nets, Gaussian processes, ...)
  • Selection of training set (prune, update, ...)
  • What is the learning target?

Interaction of Learning & Optimization modules
  • Schedule (when to relearn)
  • ∗ How to use $\hat{F}$ to support optimization search
  • ∗∗ How to use search results to support learning $\hat{F}$

This talk
  ∗ Using Covariance-Matrix Estimation within Support Vector Machines
  ∗∗ Using SVM for multi-objective optimization

SLIDE 5

Contents

1. Covariance Matrix Adaptation-Evolution Strategy
     • Evolution Strategies
     • CMA-ES
     • The state-of-the-art of (Stochastic) Optimization
2. Support Vector Machines
     • Statistical Machine Learning
     • Linear classifiers
     • The kernel trick
3. Comparison-Based Surrogate Model for CMA-ES
     • Previous Work
     • Mixing Rank-SVM and Local Information
     • Experiments
4. Dominance-based Surrogate Model for Multi-Objective Optimization
     • Background
     • Dominance-based Surrogate
     • Experimental Validation

SLIDE 6

Stochastic Search

A black-box search template to minimize $f : \mathbb{R}^n \to \mathbb{R}$

Initialize distribution parameters $\theta$, set sample size $\lambda \in \mathbb{N}$
While not terminate
  1. Sample distribution $P(x \mid \theta) \to x_1, \ldots, x_\lambda \in \mathbb{R}^n$
  2. Evaluate $x_1, \ldots, x_\lambda$ on $f$
  3. Update parameters $\theta \leftarrow F_\theta(\theta, x_1, \ldots, x_\lambda, f(x_1), \ldots, f(x_\lambda))$

Covers
  • Deterministic algorithms
  • Evolutionary Algorithms, PSO, DE ($P$ implicitly defined by the variation operators)
  • Estimation of Distribution Algorithms
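The sample / evaluate / update template can be written as a small higher-order function. The sketch below is illustrative only: the function names are hypothetical, and the instantiation (an isotropic Gaussian with a fixed step-size whose mean moves to the average of the best points) is a deliberately simple choice, not CMA-ES.

```python
import random

def stochastic_search(f, theta, sample, update, lam=10, iterations=100, seed=0):
    """Generic template: sample from P(x | theta), evaluate on f, update theta."""
    rng = random.Random(seed)
    for _ in range(iterations):
        xs = [sample(theta, rng) for _ in range(lam)]   # 1. sample x_1 .. x_lambda
        fs = [f(x) for x in xs]                         # 2. evaluate on f
        theta = update(theta, xs, fs)                   # 3. update the parameters
    return theta

# Hypothetical instantiation: theta = (mean m, fixed step-size sigma).
def gaussian_sample(theta, rng):
    m, sigma = theta
    return [mi + sigma * rng.gauss(0.0, 1.0) for mi in m]

def mean_of_best(theta, xs, fs, mu=3):
    _, sigma = theta
    ranked = sorted(zip(fs, xs), key=lambda pair: pair[0])
    best = [x for _, x in ranked[:mu]]
    m = [sum(column) / mu for column in zip(*best)]
    return (m, sigma)

def sphere(x):
    return sum(v * v for v in x)

m_final, sigma_final = stochastic_search(
    sphere, ([3.0, 3.0], 0.3), gaussian_sample, mean_of_best)
```

Only the ranking of the `fs` values is used by `mean_of_best`, which is what makes this instantiation comparison-based.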

SLIDE 7

The (µ, λ)−Evolution Strategy

Gaussian mutations

$x_i \sim m + \sigma\, \mathcal{N}_i(0, C)$ for $i = 1, \ldots, \lambda$

as perturbations of $m$, where $x_i, m \in \mathbb{R}^n$, $\sigma \in \mathbb{R}_+$, and $C \in \mathbb{R}^{n \times n}$, and where
  • the mean vector $m \in \mathbb{R}^n$ represents the favorite solution
  • the so-called step-size $\sigma \in \mathbb{R}_+$ controls the step length
  • the covariance matrix $C \in \mathbb{R}^{n \times n}$ determines the shape of the distribution ellipsoid

How to update $m$, $\sigma$, and $C$?
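One common way to draw the offspring $x_i = m + \sigma y_i$ with $y_i \sim \mathcal{N}(0, C)$ is to factor $C$ and transform standard normal vectors. The NumPy sketch below assumes a Cholesky factorization; the function name and the numerical values of `m`, `sigma` and `C` are illustrative.

```python
import numpy as np

def sample_offspring(m, sigma, C, lam, rng):
    """Draw lam points x_i = m + sigma * y_i with y_i ~ N(0, C)."""
    A = np.linalg.cholesky(C)                 # C = A A^T, so A z ~ N(0, C) for z ~ N(0, I)
    z = rng.standard_normal((lam, len(m)))
    return m + sigma * (z @ A.T)

rng = np.random.default_rng(0)
m = np.array([1.0, -2.0])
C = np.array([[4.0, 1.5], [1.5, 1.0]])
X = sample_offspring(m, sigma=0.5, C=C, lam=20000, rng=rng)
```

The empirical covariance of `X` approaches $\sigma^2 C$, which shows how $\sigma$ scales the ellipsoid while $C$ fixes its shape.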

SLIDE 8

History

The one-fifth rule (Rechenberg, 73)
  • One single parameter $\sigma$ for the whole population
  • Measure empirical success rate
  • Increase $\sigma$ if the success rate is too large, decrease $\sigma$ if it is too small
  • Often wrong in non-smooth landscapes

Self-adaptive mutations (Schwefel, 81)
  • Each individual carries its own mutation parameters: from 1 to $\frac{n^2 - n}{2}$
  • Log-normal mutation of mutation parameters
  • (Normal) mutation of individual
  • Adaptation is slow for full covariance case
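The one-fifth rule is easy to try on a (1+1)-ES. The sketch below uses a per-iteration variant rather than a success rate measured over a window: the two multiplicative factors are an assumption chosen so that $\sigma$ is stationary when exactly one mutation in five succeeds, since $\frac{1}{5} \cdot \frac{1}{3} = \frac{4}{5} \cdot \frac{1}{12}$.

```python
import math
import random

def sphere(x):
    return sum(v * v for v in x)

def one_plus_one_es(f, x, sigma=1.0, iterations=2000, seed=0):
    """(1+1)-ES with a one-fifth success rule for the step-size sigma."""
    rng = random.Random(seed)
    fx = f(x)
    for _ in range(iterations):
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        fy = f(y)
        if fy <= fx:                        # success: accept the child, enlarge sigma
            x, fx = y, fy
            sigma *= math.exp(1.0 / 3.0)
        else:                               # failure: keep the parent, shrink sigma
            sigma *= math.exp(-1.0 / 12.0)
    return x, fx, sigma

x_best, f_best, sigma_end = one_plus_one_es(sphere, [3.0] * 5)
```

A single $\sigma$ adapts the overall scale only; it cannot learn the shape of an ill-conditioned or rotated landscape, which motivates adapting a full covariance matrix.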

SLIDE 9

Cumulative Step-Size Adaptation (CSA)

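$x_i = m + \sigma\, y_i$, $\quad m \leftarrow m + \sigma\, y_w$

Measure the length of the evolution path, i.e. the pathway of the mean vector $m$ in the generation sequence.
  • Path shorter than expected (consecutive steps cancel each other out): decrease $\sigma$
  • Path longer than expected (consecutive steps point in the same direction): increase $\sigma$

Loosely speaking, steps are
  • perpendicular under random selection (in expectation)
  • perpendicular in the desired situation (to be most efficient)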

SLIDE 10

Covariance Matrix Adaptation

Rank-One Update

$m \leftarrow m + \sigma\, y_w$, $\quad y_w = \sum_{i=1}^{\mu} w_i\, y_{i:\lambda}$, $\quad y_i \sim \mathcal{N}_i(0, C)$

Steps illustrated by the successive overlays of this slide:
  1. Initial distribution, $C = I$
  2. $y_w$: movement of the population mean $m$ (disregarding $\sigma$)
  3. Mixture of distribution $C$ and step $y_w$: $C \leftarrow 0.8 \times C + 0.2 \times y_w y_w^T$
  4. New distribution (disregarding $\sigma$)
  5. Movement of the population mean $m$
  6. Mixture of distribution $C$ and step $y_w$: $C \leftarrow 0.8 \times C + 0.2 \times y_w y_w^T$

New distribution: $C \leftarrow 0.8 \times C + 0.2 \times y_w y_w^T$

Ruling principle: the adaptation increases the probability of successful steps, $y_w$, to appear again
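A worked instance of the rank-one update $C \leftarrow 0.8 \times C + 0.2 \times y_w y_w^T$, in plain Python. The helper names are hypothetical, and the two selected steps and their equal weights are made up for the example.

```python
def weighted_step(ys_sorted, weights):
    """y_w = sum_i w_i * y_{i:lambda}, with ys_sorted ordered from best to worst."""
    n = len(ys_sorted[0])
    return [sum(w * y[k] for w, y in zip(weights, ys_sorted)) for k in range(n)]

def rank_one_update(C, y_w, c1=0.2):
    """C <- (1 - c1) * C + c1 * y_w y_w^T; c1 = 0.2 gives the 0.8 / 0.2 mixture."""
    n = len(y_w)
    return [[(1 - c1) * C[i][j] + c1 * y_w[i] * y_w[j] for j in range(n)]
            for i in range(n)]

identity = [[1.0, 0.0], [0.0, 1.0]]
# Two selected steps (mu = 2) with equal weights; both point along the first axis.
y_w = weighted_step([[2.0, 0.5], [2.0, -0.5]], [0.5, 0.5])
C_new = rank_one_update(identity, y_w)
```

Here `y_w` is `[2.0, 0.0]`, so the variance along the successful direction grows from 1 to 1.6 while the orthogonal variance shrinks to 0.8: steps like $y_w$ become more likely to be sampled again.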

SLIDE 17

Rank-µ Update

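$x_i = m + \sigma\, y_i$, $\quad y_i \sim \mathcal{N}_i(0, C)$, $\quad m \leftarrow m + \sigma\, y_w$, $\quad y_w = \sum_{i=1}^{\mu} w_i\, y_{i:\lambda}$

Three panels:
  • Sampling of $\lambda = 150$ solutions where $C = I$ and $\sigma = 1$
  • Calculating $C$ from $\mu = 50$ points, $w_1 = \cdots = w_\mu = \frac{1}{\mu}$: $\quad C_\mu = \frac{1}{\mu} \sum y_{i:\lambda}\, y_{i:\lambda}^T$, $\quad C \leftarrow (1 - 1) \times C + 1 \times C_\mu$
  • New distribution: $m_{new} \leftarrow m + \frac{1}{\mu} \sum y_{i:\lambda}$

Remark: the old (sample) distribution shape has a great influence on the new distribution, so iterations are needed.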
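The rank-µ estimate can be reproduced with the slide's sizes ($\lambda = 150$, $\mu = 50$, $C = I$, $\sigma = 1$). In this NumPy sketch the fitness used to rank the samples (minimize the first coordinate) is a hypothetical choice made only to have something to select on.

```python
import numpy as np

def rank_mu_estimate(Y_sorted, mu):
    """C_mu = (1/mu) * sum over the mu best steps of y y^T (equal weights 1/mu)."""
    Y = Y_sorted[:mu]
    return Y.T @ Y / mu

rng = np.random.default_rng(1)
lam, mu = 150, 50
Y = rng.standard_normal((lam, 2))               # y_i ~ N(0, I), sigma = 1
order = np.argsort(Y[:, 0])                     # hypothetical fitness f(x) = x_1, minimized
Y_sorted = Y[order]
C_mu = rank_mu_estimate(Y_sorted, mu)
C = (1 - 1) * np.eye(2) + 1 * C_mu              # learning rate 1, as on the slide
m_new = np.zeros(2) + Y_sorted[:mu].mean(axis=0)
```

Note that $C_\mu$ is computed from steps relative to the old mean, not to the mean of the selected points, so it captures the direction of improvement as well as the spread.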

SLIDE 18

Invariance: Guarantee for Generalization

Invariance properties of CMA-ES
  • Invariance to order-preserving transformations in function space (like all comparison-based algorithms)
  • Translation and rotation invariance (to rigid transformations of the search space)

[Figure: two plots of the search space, both axes ranging from −3 to 3]

CMA-ES is almost parameterless
  • Tuning on a small set of functions (Hansen & Ostermeier 2001)
  • Default values generalize to whole classes
  • Exception: population size for multi-modal functions (but try the Restart-CMA-ES, Auger & Hansen, 2005)

SLIDE 19

State-of-the-art Results

BBOB – Black-Box Optimization Benchmarking
  • ACM-GECCO workshop, in 2009 and 2010
  • Set of 25 benchmark functions, dimensions 2 to 40
  • With known difficulties (ill-conditioning, non-separability, ...)
  • Noisy and non-noisy versions

Competitors include
  • BFGS (Matlab version)
  • Fletcher-Powell
  • DFO (Derivative-Free Optimization, Powell 04)
  • Differential Evolution
  • Particle Swarm Optimization
  • and many more

SLIDE 20

Contents, part 2: Support Vector Machines (Statistical Machine Learning, Linear classifiers, The kernel trick)

SLIDE 21

Supervised Machine Learning

Context: Universe → instance $x_i$ → Oracle → $y_i$

  • Input: training set $E = \{(x_i, y_i),\ i = 1 \ldots n,\ x_i \in X,\ y_i \in Y\}$
  • Output: hypothesis $h : X \to Y$
  • Criterion: quality of $h$

SLIDE 22

Supervised Machine Learning, 2

Definitions

$E = \{(x_i, y_i),\ x_i \in X,\ y_i \in Y,\ i = 1 \ldots n\}$
  • Classification: $Y$ finite (e.g. failure/ok)
  • Regression: $Y \subseteq \mathbb{R}$ (e.g. time to failure)

Hypothesis space $H : X \to Y$

Tasks
  • Select $H$ (model selection)
  • Assess $h \in H$ (expected accuracy $\mathbb{E}[h(x) = y]$)
  • Find $h^*$ in $H$ minimizing the error cost in expectation: $h^* = \operatorname{Arg\,min}\{\mathbb{E}[\ell(h(x), y)],\ h \in H\}$

SLIDE 23

Dilemma

Fitting the data: bias-variance tradeoff

SLIDE 24

Statistical Machine Learning

Minimize expected loss: minimize $\mathbb{E}[\ell(h(x), y)]$

Principle: if $h$ is well-behaved on the training set, if the training set is “representative” and if $h$ is “regular”, then $h$ is well-behaved in expectation.

$\mathbb{E}[F] \le \frac{1}{n} \sum_{i=1}^{n} F(x_i) + c(F, n)$

SLIDE 25

Linear classification; the noiseless case

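$H : X \subset \mathbb{R}^d \to \mathbb{R}$

prediction $= \operatorname{sgn}(h(x))$, with $h(x) = \langle w, x \rangle + b$

[Figure: three candidate separating lines L1, L2, L3, with normal vector $w$ and offset $b$]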

SLIDE 26

Linear classification; the noiseless case, 2

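Example → constraint: $y_i(\langle w, x_i \rangle + b) \ge \text{margin} \ge 0$

Maximize minimum margin $2 / \|w\|$

Formalisation: minimize $\frac{1}{2}\|w\|^2$ subject to $\forall i,\ y_i(\langle w, x_i \rangle + b) \ge 1$

[Figure: separating hyperplane with a margin of width $2 / \|w\|$ between the levels $-1$ and $+1$; the examples lying on these levels are the support vectors]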

SLIDE 27

Linear classification; the noiseless case, 3

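Primal form: minimize $\frac{1}{2}\|w\|^2$ subject to $\forall i,\ y_i(\langle w, x_i \rangle + b) \ge 1$

Using Lagrange multipliers:

$\min_{w, b} \max_{\alpha} \left\{ \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i(\langle w, x_i \rangle + b) - 1 \right] \right\}$

Dual form:

$\max_{\alpha} \left\{ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle \right\}$ subject to $\alpha_i \ge 0,\ i = 1 \ldots n$, and $\sum_i \alpha_i y_i = 0$

Optimization: quadratic programming

$w = \sum_i \alpha_i y_i x_i$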

SLIDE 28

Linear classification; the noisy case

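Allow constraint violation; consider slack variables $\xi_i$

Primal form: minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i$ subject to $\forall i,\ y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0$

Lagrange multipliers:

$\min_{w, b, \xi} \max_{\alpha, \beta} \left\{ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left[ y_i(\langle w, x_i \rangle + b) - 1 + \xi_i \right] - \sum_{i=1}^{n} \beta_i \xi_i \right\}$

Dual form:

$\max_{\alpha} \left\{ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle \right\}$ subject to $0 \le \alpha_i \le C,\ i = 1 \ldots n$, and $\sum_i \alpha_i y_i = 0$

Solution (support vectors): $h(x) = \langle w, x \rangle = \sum_i \alpha_i y_i \langle x_i, x \rangle$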

SLIDE 29

The kernel trick

Intuition: map $X \to \Omega$ with $\Phi : x = (x_1, x_2) \mapsto (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$

Principle: choose $\Phi$, $K$ such that $\langle \Phi(x), \Phi(x') \rangle = K(x, x')$
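For the feature map above, the matching kernel is $K(x, x') = \langle x, x' \rangle^2$; the slide does not spell $K$ out, so that pairing is stated here as a standard fact rather than taken from the talk. A short Python check on one pair of points (the points are arbitrary):

```python
import math

def phi(x):
    """Explicit feature map (x1, x2) -> (x1^2, sqrt(2) * x1 * x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def k_quadratic(x, xp):
    """K(x, x') = <x, x'>^2, evaluated without leaving the input space."""
    return dot(x, xp) ** 2

x, xp = (1.0, 2.0), (3.0, -1.0)
lhs = dot(phi(x), phi(xp))      # scalar product in the feature space
rhs = k_quadratic(x, xp)        # kernel evaluated in the input space
```

Both sides equal 1 for this pair: $9 - 12 + 4$ on the left, $(3 - 2)^2$ on the right. The kernel side never builds the three-dimensional feature vectors.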

SLIDE 30

The kernel trick, 2

SVM only considers the scalar product
  • Linear case: $h(x) = \sum_i \alpha_i y_i \langle x_i, x \rangle$
  • Kernel trick: $h(x) = \sum_i \alpha_i y_i K(x_i, x)$

PROS
  • A rich hypothesis space
  • No computational overhead: no explicit mapping on the feature space

Open problem: kernel design

SLIDE 31

The kernel trick, 3

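Kernels
  • Polynomial: $k(x_i, x_j) = (\langle x_i, x_j \rangle + 1)^d$
  • Gaussian or Radial Basis Function: $k(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$
  • Hyperbolic tangent: $k(x_i, x_j) = \tanh(\kappa \langle x_i, x_j \rangle + c)$

Examples for Polynomial (left) and Gaussian (right) kernels: [figure not reproduced]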

SLIDE 32

Rank-based SVM

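Learning to order things
  • On training set $E = \{x_i,\ i = 1 \ldots n\}$, the expert gives preferences: $(x_{i_k} \succ x_{j_k}),\ k = 1 \ldots K$
  • Underconstrained regression

Order constraints, primal form: minimize $\frac{1}{2}\|w\|^2 + C \sum_{k=1}^{K} \xi_k$ subject to $\forall k,\ \langle w, x_{i_k} \rangle - \langle w, x_{j_k} \rangle \ge 1 - \xi_k,\ \ \xi_k \ge 0$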
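The order constraints only involve differences $x_{i_k} - x_{j_k}$, so a linear Rank-SVM can be trained like a classifier on those difference vectors. The sketch below minimizes the equivalent unconstrained hinge-loss form by plain subgradient descent, which is an assumption made for brevity (the primal is normally solved as a quadratic program); the toy data, learning rate and number of epochs are made up.

```python
def train_linear_rank_svm(points, preferences, C=10.0, eta=0.01, epochs=500):
    """Subgradient descent on 0.5*||w||^2 + C * sum_k max(0, 1 - <w, x_ik - x_jk>).

    preferences is a list of index pairs (i, j): points[i] is preferred to points[j].
    """
    dim = len(points[0])
    diffs = [[a - b for a, b in zip(points[i], points[j])] for i, j in preferences]
    w = [0.0] * dim
    for _ in range(epochs):
        grad = list(w)                                       # gradient of the regularizer
        for d in diffs:
            if sum(wk * dk for wk, dk in zip(w, d)) < 1.0:   # violated order constraint
                grad = [g - C * dk for g, dk in zip(grad, d)]
        w = [wk - eta * g for wk, g in zip(w, grad)]
    return w

def score(w, x):
    return sum(wk * xk for wk, xk in zip(w, x))

# Toy data, listed from most to least preferred (an assumption of this example).
points = [(2.0, 0.0), (1.0, 0.0), (0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
preferences = [(k, k + 1) for k in range(len(points) - 1)]
w = train_linear_rank_svm(points, preferences)
scores = [score(w, x) for x in points]
```

Only the ordering induced by `scores` matters, not their values, which is why such a model fits a comparison-based optimizer.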

SLIDE 33

Contents, part 3: Comparison-Based Surrogate Model for CMA-ES (Previous Work, Mixing Rank-SVM and Local Information, Experiments)

SLIDE 34

Surrogate Models for CMA-ES

lmm-CMA-ES
  • Build a full quadratic meta-model around the current point
  • Weighted by Mahalanobis distance from the covariance matrix
  • Speed-up: a factor of 2-3 for $n \ge 4$
  • Complexity: from $O(n^4)$ to $O(n^6)$ (intractable for $n > 16$)
  • Rank-invariance is lost

  • S. Kern et al. (2006). "Local Meta-Models for Optimization Using Evolution Strategies"
  • Z. Bouzarkouna et al. (2010). "Investigating the lmm-CMA-ES for Large Population Sizes"

SLIDE 35

Surrogate Models for CMA-ES, cont

Using Rank-SVM
  • Builds a global model using Rank-SVM: $x_i \succ x_j$ iff $F(x_i) < F(x_j)$
  • Kernel and parameters highly problem-dependent
  • Note: no use of information from current state of CMA

  • T. Runarsson (2006). "Ordinal Regression in Evolutionary Computation"

ACM Algorithm
  • Use $C$ from CMA-ES as Gaussian kernel

  • I. Loshchilov et al. (2010). "Comparison-based optimizers need comparison-based surrogates"

SLIDE 36

Model Learning

Non-separable Ellipsoid problem

[Figure: two contour plots with axes $X_1$ and $X_2$ ranging from −1 to 1]
  • Rank-SVM regression in original coordinate system
  • Rank-SVM regression in transformed coordinate system given by current covariance matrix $C$ and mean $m$: $x' = C^{-\frac{1}{2}}(x - m)$
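The change of coordinates $x' = C^{-\frac{1}{2}}(x - m)$ can be computed from an eigendecomposition of $C$. The NumPy sketch below is illustrative: the function names and the values of `C` and `m` are assumptions, and the last line only checks the well-known link with the Mahalanobis distance.

```python
import numpy as np

def inv_sqrt(C):
    """Symmetric inverse square root C^{-1/2} from C = B diag(d) B^T."""
    d, B = np.linalg.eigh(C)
    return B @ np.diag(1.0 / np.sqrt(d)) @ B.T

def to_cma_coordinates(X, m, C):
    """Apply x' = C^{-1/2} (x - m) to each row of X."""
    return (X - m) @ inv_sqrt(C).T

C = np.array([[4.0, 1.5], [1.5, 1.0]])
m = np.array([1.0, -2.0])
W = inv_sqrt(C)
X_new = to_cma_coordinates(np.array([[1.0, -2.0], [3.0, -1.0]]), m, C)
# Squared Euclidean norm in the new coordinates = squared Mahalanobis distance to m.
maha = np.array([2.0, 1.0]) @ np.linalg.inv(C) @ np.array([2.0, 1.0])
```

In the transformed coordinates an isotropic RBF kernel measures distances the way the current CMA-ES distribution does, which is how local information from the optimizer reaches the surrogate.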

SLIDE 37

Using the Surrogate Model

Optimization: significant speed-up ... if global and accurate model

Filtering: "guaranteed" speed-up

[Figure: filtering pipeline with the stages "Prescreen", "Retain with rank" and "Evaluate ($\lambda'$)", shown with a plot of probability density (0.2 to 0.6) against rank (100 to 500)]

SLIDE 38

ACM-ES Optimization Loop

[Figure: ACM-ES optimization loop with four panels over $X_1$, $X_2$ (axes from −1 to 1): (A) select training points, (B) build a surrogate model, (C) generate pre-children, (D) select most promising children]

Steps shown around the figure:
  1. Select best training points.
  2. Change of coordinates, defined from the current covariance matrix and the current mean value [4].
  3. Build a surrogate model using Rank-SVM.
  4. Generate pre-children and rank them according to the surrogate fitness function.
  5-6. Prescreen the pre-children, retain them according to rank (rank-based probability density), and evaluate $\lambda'$ of them.
  7. Add new training points and update parameters of CMA-ES.

SLIDE 39

Parameters

SVM Learning
  • Number of training points: $N_{training} = 30\sqrt{d}$ for all problems, except Rosenbrock and Rastrigin, where $N_{training} = 70\sqrt{d}$
  • Number of iterations: $N_{iter} = 50000\sqrt{d}$
  • Kernel function: RBF function with $\sigma$ equal to the average distance of the training points
  • The cost of constraint violation: $C_i = 10^6 (N_{training} - i)^{2.0}$

Offspring Selection
  • Number of test points: $N_{test} = 500$
  • Number of evaluated offspring: $\lambda' = \frac{\lambda}{3}$
  • Offspring selection pressure parameters: $\sigma^2_{sel0} = 2\sigma^2_{sel1} = 0.8$

SLIDE 40

Results

Speed-up

[Figure: speed-up (1 to 5) against problem dimension (5 to 40) for Schwefel, Schwefel^1/4, Rosenbrock, Noisy Sphere, Ackley and Ellipsoid]

Function       n   λ    λ′   e     ACM-ES                spu   CMA-ES
Schwefel       10  10   3          801 ± 36              3.3   2667 ± 87
               20  12   4          3531 ± 179            2.0   7042 ± 172
               40  15   5          13440 ± 281           1.7   22400 ± 289
Schwefel^1/4   10  10   3          1774 ± 37             4.1   7220 ± 206
               20  12   4          6138 ± 82             2.5   15600 ± 294
               40  15   5          22658 ± 390           1.8   41534 ± 466
Rosenbrock     10  10   3          2059 ± 143 (0.95)     3.7   7669 ± 691 (0.90)
               20  12   4          11793 ± 574 (0.75)    1.8   21794 ± 1529
               40  15   5          49750 ± 2412 (0.9)    1.6   82043 ± 3991
NoisySphere    10  10   3    0.15  766 ± 90 (0.95)       2.7   2058 ± 148
               20  12   4    0.11  1361 ± 212            2.8   3777 ± 127
               40  15   5    0.08  2409 ± 120            2.9   7023 ± 173
Ackley         10  10   3          892 ± 28              4.1   3641 ± 154
               20  12   4          1884 ± 50             3.5   6641 ± 108
               40  15   5          3690 ± 80             3.3   12084 ± 247
Ellipsoid      10  10   3          1628 ± 95             3.8   6211 ± 264
               20  12   4          8250 ± 393            2.3   19060 ± 501
               40  15   5          33602 ± 548           2.1   69642 ± 644
Rastrigin      5   140  70         23293 ± 1374 (0.3)    0.5   12310 ± 1098 (0.75)

SLIDE 41

Results

Learning Time

[Figure: log-log plot of learning time (ms) against problem dimension, with fitted slope 1.13]

Cost of model learning/testing increases quasi-linearly with $d$ on the Sphere function.

SLIDE 42

ACM-ES: conclusion

ACM-ES
  • From 2 to 4 times faster on uni-modal problems
  • Invariance to rank-preserving transformations preserved
  • Computational complexity is $O(n)$
  • Available online at http://www.lri.fr/~ilya/publications/ACMESppsn2010.zip

Open Issues
  • Extension to multi-modal optimization
  • On-line adaptation of selection pressure and surrogate model complexity

SLIDE 43

Contents, part 4: Dominance-based Surrogate Model for Multi-Objective Optimization (Background, Dominance-based Surrogate, Experimental Validation)

SLIDE 44

Multi-objective CMA-ES (MO-CMA-ES)

  • MO-CMA-ES = $\mu_{mo}$ independent (1+1)-CMA-ES.
  • Each (1+1)-CMA samples a new offspring. The size of the temporary population is $2\mu_{mo}$.
  • Only the $\mu_{mo}$ best solutions are chosen for the new population after the hypervolume-based non-dominated sorting.
  • Update of CMA individuals takes place.

[Figure: objective space (axis "Objective 2" legible) with dominated points and the Pareto front]

SLIDE 45

A Multi-Objective Surrogate Model

Rationale: find a unique function $F(x)$ that defines the aggregated quality of the solution $x$ in the multi-objective case.

Idea originally proposed using a mixture of One-Class SVM and regression-SVM [a]

[a] I. Loshchilov, M. Schoenauer, M. Sebag (GECCO 2010). "A Mono Surrogate for Multiobjective Optimization"

[Figure: decision space ($X_1$, $X_2$) and objective space (Objective 1, Objective 2) with dominated, Pareto and new Pareto points; the surrogate $F_{SVM}$ is drawn with the levels $p - e$, $p$ and $p + e$ (band of width $2e$)]

SLIDE 46

Using the Surrogate Model

Filtering
  • Generate $N_{inform}$ pre-children
  • For each pre-child $A$ and the nearest parent $B$, calculate $Gain(A, B) = F_{svm}(A) - F_{svm}(B)$
  • The new child is the point with the maximum value of Gain

[Figure: decision space ($X_1$, $X_2$) showing the true Pareto set and the SVM Pareto set]
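The Gain-based filtering step can be sketched in a few lines of Python. The surrogate `f_svm` below is a hypothetical stand-in (any function where larger means better would do), distances are taken in decision space as an assumption, and the parents and pre-children are made-up points.

```python
def nearest(point, candidates):
    """Return the candidate closest to point (squared Euclidean distance)."""
    return min(candidates, key=lambda c: sum((a - b) ** 2 for a, b in zip(point, c)))

def select_child(pre_children, parents, f_svm):
    """Pick the pre-child A maximizing Gain(A, B) = f_svm(A) - f_svm(B), B = nearest parent."""
    def gain(a):
        return f_svm(a) - f_svm(nearest(a, parents))
    return max(pre_children, key=gain)

# Hypothetical surrogate for illustration only: larger is better.
f_svm = lambda x: -(x[0] ** 2 + x[1] ** 2)
parents = [(1.0, 1.0), (-2.0, 0.0)]
pre_children = [(0.5, 1.0), (1.5, 1.0), (-1.0, 0.0)]
child = select_child(pre_children, parents, f_svm)
```

The gains are 0.75, −1.25 and 3 respectively, so the third pre-child is kept: it improves most over its own nearest parent, even though the first pre-child has a similar absolute surrogate value.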

SLIDE 47

Dominance-Based Surrogate

Using Rank-SVM

Which ordered pairs? Considering all possible ≻ relations may be too expensive.
  • Primary constraints: x and its nearest dominated point
  • Secondary constraints: any 2 points not belonging to the same front (according to non-dominated sorting)

[Figure: objective space (Objective 1 vs. Objective 2) with points a to f and the surrogate FSVM; primary and secondary ">" constraints are marked.]

All primary constraints, and a limited number of secondary constraints
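A possible way to enumerate the two constraint sets is sketched below. Several points are assumptions, since the slide does not fix them: objectives are minimized, "nearest" is measured by Euclidean distance in objective space, and a secondary pair is oriented so that the point in the better (lower-rank) front is preferred. A pair (i, j) means "point i should be ranked above point j"; all function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Objs = Sequence[Tuple[float, ...]]
Pair = Tuple[int, int]  # (i, j): i is preferred to j


def dominates(a, b) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def front_ranks(objs: Objs) -> Dict[int, int]:
    """Rank 0 is the non-dominated front, rank 1 the next one, and so on."""
    ranks, remaining, rank = {}, set(range(len(objs))), 0
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


def primary_constraints(objs: Objs) -> List[Pair]:
    """Each point paired with the nearest point it dominates, if any."""
    pairs = []
    for i, a in enumerate(objs):
        dominated = [j for j, b in enumerate(objs) if dominates(a, b)]
        if dominated:
            j = min(dominated, key=lambda k: math.dist(a, objs[k]))
            pairs.append((i, j))
    return pairs


def secondary_constraints(objs: Objs) -> List[Pair]:
    """Pairs from different fronts that are not already primary."""
    ranks = front_ranks(objs)
    primary = set(primary_constraints(objs))
    pairs = []
    for i, j in combinations(range(len(objs)), 2):
        if ranks[i] == ranks[j]:
            continue
        pair = (i, j) if ranks[i] < ranks[j] else (j, i)
        if pair not in primary:
            pairs.append(pair)
    return pairs
```

The primary set grows linearly with the number of points, while the secondary set can grow quadratically, which is why only a limited number of secondary constraints is used.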


slide-48
SLIDE 48


Dominance-Based Surrogate (2)

Construction of the surrogate model
  • Initialize the archive Ωactive as the set of Primary constraints, and Ωpassive as the set of Secondary constraints.
  • Learn the model for 1000 |Ωactive| iterations.
  • Add the most violated passive constraint from Ωpassive to Ωactive and optimize the model for 10 |Ωactive| iterations.
  • Repeat the last step 0.1 |Ωactive| times.
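The active-set schedule above can be illustrated with a deliberately simplified learner. The sketch below swaps the kernel Rank-SVM for a linear scoring function trained by stochastic subgradient steps on a pairwise hinge loss; this substitution, the learning rate, the regularization weight, and all function names are assumptions made for illustration. The number of repetitions is taken from the initial size of Ωactive, which is one reading of the slide.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # (i, j): point i must score higher than point j


def score(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def violation(w, X, pair: Pair) -> float:
    """Hinge violation of the constraint score(x_i) - score(x_j) >= 1."""
    i, j = pair
    return max(0.0, 1.0 - (score(w, X[i]) - score(w, X[j])))


def train(w: List[float], X, active: List[Pair], iterations: int,
          rng: random.Random, lr: float = 0.01, reg: float = 1e-3) -> None:
    """Stochastic subgradient steps on randomly drawn active constraints."""
    for _ in range(iterations):
        i, j = rng.choice(active)
        violated = violation(w, X, (i, j)) > 0.0
        for d in range(len(w)):
            grad = reg * w[d]
            if violated:
                grad -= X[i][d] - X[j][d]
            w[d] -= lr * grad


def build_surrogate(X, primary: Sequence[Pair], secondary: Sequence[Pair],
                    init_mult: int = 1000, step_mult: int = 10, seed: int = 0):
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    active, passive = list(primary), list(secondary)
    train(w, X, active, init_mult * len(active), rng)   # 1000 |active| steps
    for _ in range(int(0.1 * len(active))):             # 0.1 |active| rounds
        if not passive:
            break
        worst = max(passive, key=lambda p: violation(w, X, p))
        passive.remove(worst)
        active.append(worst)                            # most violated one
        train(w, X, active, step_mult * len(active), rng)  # 10 |active| steps
    return w, active
```

The multipliers `init_mult` and `step_mult` default to the values on the slide and can be lowered for quick experiments.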


slide-49
SLIDE 49


Experimental Validation

Parameters

Surrogate Models
  • ASM: aggregated surrogate model based on One-Class SVM and Regression SVM
  • RASM: proposed Rank-based SVM
SVM Learning
  • Number of training points: at most Ntraining = 1000 points
  • Number of iterations: 1000 |Ωactive| + |Ωactive|² ≈ 2 Ntraining²
  • Kernel function: RBF function with σ equal to the average distance of the training points
  • The cost of constraint violation: C = 1000
Offspring Selection
  • Number of pre-children: p = 2 and p = 10
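Two of these settings are easy to make concrete. The sketch below assumes that "average distance" means the mean pairwise Euclidean distance, and that the RBF kernel has the common form exp(−‖x − y‖² / (2σ²)); neither detail is stated on the slide, and the function names are hypothetical. The iteration budget follows the construction schedule: 1000 |Ωactive| initial iterations plus 0.1 |Ωactive| rounds of 10 |Ωactive| iterations each.

```python
import math
from itertools import combinations
from typing import Sequence

Vector = Sequence[float]


def average_distance(points: Sequence[Vector]) -> float:
    """Mean Euclidean distance over all unordered pairs of points."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def rbf_kernel(x: Vector, y: Vector, sigma: float) -> float:
    """Gaussian RBF kernel with bandwidth sigma (assumed parameterization)."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * sigma ** 2))


def iteration_budget(n_active: int) -> int:
    """1000 |active| + (0.1 |active|) * (10 |active|) = 1000 n + n^2."""
    return 1000 * n_active + n_active ** 2
```

For |Ωactive| close to Ntraining = 1000, the budget is 1000 · 1000 + 1000² = 2 · 10⁶ iterations, which matches the 2 Ntraining² approximation.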


slide-50
SLIDE 50


Experimental Validation

Comparative Results

ASM and Rank-based ASM (RASM) are applied on top of NSGA-II (with hypervolume as secondary criterion) and MO-CMA-ES, on ZDT and IHR functions. N = how many more true evaluations are needed than by the best performer.


slide-51
SLIDE 51


Discussion

Results on ZDT and IHR problems
  • Rank-SVM versions are 1.5 times faster for p = 2 and 2-5 times faster for p = 10, before the algorithm reaches nearly-optimal Pareto points.
  • Premature convergence of the approximation of the optimal µ-distribution: the surrogate model only enforces convergence toward the Pareto front, but does not care about diversity.


slide-52
SLIDE 52


Dominance-based Surrogate: Conclusion

The proposed aggregated surrogate model is invariant to ≻-preserving transformations of the objective functions. The speed-up is significant, but limited to the convergence to the optimal Pareto front.

[Figure: objective space (Objective 1 vs. Objective 2) with points a to g and the surrogate FSVM; primary and secondary constraints of types ">" and "=" are marked.]

Thanks to the flexibility of SVM, "any" kind of preference could be taken into account: extreme points, "=" relations, Hypervolume Contribution, Decision-Maker-defined ≻ relations.


slide-53
SLIDE 53


Machine Learning for Optimization: Discussion

Learning about the landscape: easy
  • Using available samples
  • Using prior knowledge / constraints
...using it to speed up the search: very doable; more difficult
  • Dilemma: Exploration vs Exploitation
  • Multi-modal optimization
