

SLIDE 1

DTS-CMA-ES Surrogate models Experimental results

Comparison of Ordinal and Metric Gaussian Process Regression as Surrogate Models for CMA Evolution Strategy

Zbyněk Pitra (1,2,3), Lukáš Bajer (1,4), Jakub Repický (1,4), Martin Holeňa (1)

1 Institute of Computer Science, Czech Academy of Sciences
2 Faculty of Nuclear Sciences and Physical Engineering
3 National Institute of Mental Health
4 Faculty of Mathematics and Physics, Charles University

Prague, Czech Republic

GECCO 2017

Z Pitra, L Bajer, J Repický, M Holeňa Comparison Ordinal vs. Metric GP for CMA-ES

SLIDE 2

Contents

1 DTS-CMA-ES
2 Surrogate models
    Metric Gaussian Processes
    Ordinal Gaussian Processes
3 Experimental results

SLIDE 10

DTS-CMA-ES

Initialize: standard CMA-ES initialization with population doubled
while not terminate
  1 CMA-ES sampling of population $x_i \sim N(m, \sigma^2 C)$, for $i = 1, \ldots, \lambda$
  2 train the first model $f_{M1}$ on the so-far original-evaluated points
  3 get mean $\hat\mu_i$ and variance $\hat s^2_i$ of all $x_i$ with the model $f_{M1}$
  4 select the most promising $\lceil \alpha\lambda \rceil$ points according to the model $f_{M1}$
  5 evaluate the chosen points with the original fitness $f$
  6 re-train the second model $f_{M2}$ with these new points
  7 predict the fitness for the non-original-evaluated points with $f_{M2}$
  8 CMA-ES update of $m$, $\sigma$, $C$

[Figure: step 8, update of the CMA-ES state m, σ, C, giving the new state m2, σ2]
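Steps 1 to 7 of the loop can be sketched in Python as follows. This is only an illustration under several assumptions that are not in the slides: `TinyGP` is a hypothetical stand-in surrogate (zero-mean GP with a squared-exponential kernel and fixed hyperparameters), `dts_generation` is a made-up name, and "most promising" is taken to mean lowest predicted mean for a minimized fitness. Step 8 is left to whichever CMA-ES implementation is used.

```python
import math
import numpy as np


class TinyGP:
    """Hypothetical stand-in surrogate: zero-mean GP, squared-exponential kernel."""

    def __init__(self, length_scale=1.0, jitter=1e-8):
        self.length_scale = length_scale
        self.jitter = jitter

    def _k(self, A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-0.5 * d2 / self.length_scale ** 2)

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        self.shift = float(np.mean(y))  # centre targets so a zero prior mean is sensible
        C = self._k(self.X, self.X) + self.jitter * np.eye(len(self.X))
        self.C_inv = np.linalg.inv(C)
        self.w = self.C_inv @ (np.asarray(y, dtype=float) - self.shift)
        return self

    def predict(self, X):
        K = self._k(np.asarray(X, dtype=float), self.X)
        mean = K @ self.w + self.shift
        var = 1.0 - np.einsum("ij,jk,ik->i", K, self.C_inv, K)
        return mean, np.maximum(var, 0.0)


def dts_generation(f, m, sigma, C, lam, alpha, archive_X, archive_y, rng):
    """Steps 1-7 of one generation; step 8 (CMA-ES update) is done by the caller."""
    # 1: sample lam points from N(m, sigma^2 C)
    X = rng.multivariate_normal(m, sigma ** 2 * C, size=lam)
    # 2-3: first model on the archive, then predictive mean and variance
    model1 = TinyGP().fit(archive_X, archive_y)
    mu, s2 = model1.predict(X)  # s2 is unused by the assumed criterion below
    # 4: pick ceil(alpha * lam) points; lowest predicted mean is an assumed criterion
    n_orig = math.ceil(alpha * lam)
    chosen = np.argsort(mu)[:n_orig]
    # 5: evaluate the chosen points with the original fitness
    y_chosen = np.array([f(x) for x in X[chosen]])
    # 6: re-train on the archive extended by the new evaluations
    archive_X = np.vstack([archive_X, X[chosen]])
    archive_y = np.concatenate([archive_y, y_chosen])
    model2 = TinyGP().fit(archive_X, archive_y)
    # 7: model predictions for the rest, true values for the chosen points
    y, _ = model2.predict(X)
    y[chosen] = y_chosen
    return X, y, chosen, archive_X, archive_y
```

The returned `X` and `y` are what a CMA-ES update would rank in step 8; the predictive variance from step 3 is computed but not used here, since the selection criterion in this sketch looks at the mean only.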

SLIDE 12

Gaussian Process

GP is a stochastic process, where any finite collection of random variables has a joint Gaussian distribution

$f_{GP}(x) \sim GP(\mu(x), k(x_1, x_2))$

Defined by the mean function $\mu(x)$ (usually constant) and covariance function $k(x_1, x_2)$ and their (hyper)parameters

GP can express uncertainty of the prediction in a new point $x$: it gives a probability distribution of the output value

SLIDE 14

Gaussian Process

given a set of N training points $X_N = (x_1 \ldots x_N)$, $x_i \in \mathbb{R}^d$, and corresponding measured values $y_N = (y_1, \ldots, y_N)^\top$ of a function $f$ being approximated

$y_i = f(x_i), \quad i = 1, \ldots, N$

GP considers the vector of these function values as a sample from an N-variate Gaussian distribution

$y_N \sim N(0, C_N)$

SLIDE 16

Gaussian Process prediction

When considering a new point $(x_*, y_*)$, the probability density of its f-values is a 1D Gaussian

$p(y_* \mid X_N, x_*, y_N) \sim N(\hat\mu_{N+1}, \hat s^2_{N+1})$

with the mean and variance given by

$\hat\mu_{N+1} = k^\top C_N^{-1} y_N, \qquad \hat s^2_{N+1} = \kappa - k^\top C_N^{-1} k$

where
$C_N$ is the GP covariance matrix: the matrix of the covariance function's values $k(x_i, x_j)$ for each pair $x_i, x_j$
$k$ is the vector of the covariance function's values $k(x_*, x_i)$ between the new point $x_*$ and $x_i \in X_N$
$\kappa$ is the variance of the new point itself, $k(x_*, x_*)$
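The predictive mean and variance above translate almost directly into NumPy. The following is a minimal sketch, assuming a zero-mean GP and a squared-exponential covariance function picked purely for illustration (the slides do not fix a kernel at this point); the names `sq_exp_kernel` and `gp_predict` and the small `jitter` term added for numerical stability are assumptions, not part of the slides.

```python
import numpy as np


def sq_exp_kernel(A, B, length_scale=1.0):
    """Squared-exponential covariance, chosen here only for illustration."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)


def gp_predict(X_train, y_train, x_new, kernel=sq_exp_kernel, jitter=1e-10):
    """Return (mu_hat, s2_hat) at one new point for a zero-mean GP.

    X_train must have shape (N, d); x_new has shape (d,).
    """
    X_train = np.atleast_2d(X_train)
    x_new = np.atleast_2d(x_new)
    C_N = kernel(X_train, X_train) + jitter * np.eye(len(X_train))
    k = kernel(X_train, x_new)[:, 0]      # covariances k(x*, x_i)
    kappa = kernel(x_new, x_new)[0, 0]    # prior variance k(x*, x*)
    # Solve linear systems rather than forming the explicit inverse of C_N.
    mu_hat = k @ np.linalg.solve(C_N, y_train)
    s2_hat = kappa - k @ np.linalg.solve(C_N, k)
    return mu_hat, s2_hat
```

At a training point the prediction reproduces the measured value with near-zero variance, while far from all training points it falls back to the prior mean 0 and the prior variance $\kappa$.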

SLIDE 17

Ordinal Gaussian Processes

Ordinal GP = Gaussian process $f_{GP}(x) \sim GP(\mu(x), k(x_1, x_2))$

trained on ordinal values $0, 1, \ldots, r$ instead of original f-values (including the following transformation)

linearly mapped via a set of additional parameters $\alpha_0, \alpha, b_1, \ldots, b_{r-1}$ onto the space of ordinal values $0, 1, \ldots, r$ as

$f_{ORD}(x) = \alpha_0 - \alpha f_{GP}(x)$

where $-\infty = b_0 < b_1 < \cdots < b_{r-1} < b_r = \infty$.
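To make the role of the thresholds concrete, the sketch below computes the probability that the mapped value $\alpha_0 - \alpha f_{GP}(x)$ lands between consecutive thresholds. It assumes that the GP prediction at $x$ is Gaussian with mean `mu` and variance `s2`, so the mapped value is Gaussian as well; this is one plausible reading, not necessarily the likelihood the authors use. The function name `interval_probabilities` is hypothetical.

```python
import math


def interval_probabilities(mu, s2, alpha0, alpha, b):
    """Probabilities that alpha0 - alpha * f_GP(x) falls between consecutive thresholds.

    b holds the finite thresholds b_1 < ... < b_{r-1};
    b_0 = -inf and b_r = +inf are implied. Requires s2 > 0 and alpha != 0.
    """
    mean = alpha0 - alpha * mu          # mean of the linearly mapped value
    std = abs(alpha) * math.sqrt(s2)    # its standard deviation

    def cdf(t):
        # Gaussian CDF of the mapped value, with the infinite ends handled explicitly
        if math.isinf(t):
            return 0.0 if t < 0 else 1.0
        return 0.5 * (1.0 + math.erf((t - mean) / (std * math.sqrt(2.0))))

    edges = [-math.inf, *b, math.inf]
    return [cdf(hi) - cdf(lo) for lo, hi in zip(edges[:-1], edges[1:])]
```

The returned list has one entry per interval and sums to 1, so it can be read as a distribution over ordinal levels for the point $x$.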

SLIDE 20

Ordinal Gaussian Processes

Training

1 $(x_i, y_i)_{i=1}^{N} \leftarrow A$   {load data from archive}
2 $\{y_i^{ord}\}_{i=1}^{N} \leftarrow \mathrm{cluster}(\{y_i\}_{i=1}^{N}, r)$
3 $(\alpha, \{\beta_j\}_{j=1}^{r-1}, \theta)^* \leftarrow \arg\max_{\alpha, \{\beta_j\}_{j=1}^{r-1}, \theta} \log \hat L(\{y_i^{ord}\}_{i=1}^{N} \mid \{x_i\}_{i=1}^{N}, \alpha, \{\beta_j\}_{j=1}^{r-1}, \theta)$

$A$ – original data archive
$r$ – number of cluster levels
$\alpha, \alpha_0$ – linear mapping parameters
$\beta_i = \alpha_0 + b_i$
$\theta$ – latent GP hyperparameters

[Figure: step 3, ordinal GP model trained through likelihood maximization; the mapped value $\alpha_0 - \alpha\mu$ lies on an axis split by the thresholds $b_0 = -\infty$, $b_1$, $b_2$, $b_3 = \infty$ into intervals $I_1$, $I_2$, $I_3$, matching the ordinal levels $y^{ord}_1$, $y^{ord}_2$, $y^{ord}_3$]
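Step 2 leaves the `cluster` routine unspecified. One simple possibility is a quantile-based split into `r` levels, sketched below; the function name `quantile_cluster`, the level numbering 1..r, and the tie handling are assumptions made for this example and may differ from what the authors implemented.

```python
import numpy as np


def quantile_cluster(y, r):
    """Map fitness values to ordinal levels 1..r using equal-probability quantile bins."""
    y = np.asarray(y, dtype=float)
    # r - 1 interior cut points at the 1/r, 2/r, ... quantiles of y
    cuts = np.quantile(y, np.arange(1, r) / r)
    # side="left": a value equal to a cut point falls into the lower level
    return np.searchsorted(cuts, y, side="left") + 1
```

For example, `quantile_cluster([10, 20, 30, 40, 50, 60], 3)` assigns the levels 1, 1, 2, 2, 3, 3. The mapping is monotone, so the ordering of the original f-values is preserved up to ties within a level.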

SLIDE 24

Ordinal Gaussian Processes

Prediction

1 $p_{i,k} \leftarrow P(f(x_i) \in I_k \mid x_i, \alpha, \{\beta_j\}_{j=1}^{r-1}, \theta)$   $\forall k = 1, \ldots, r$, $\forall i = 1, \ldots, \lambda$
2 $q_i \leftarrow \sum_{k=1}^{r} p_{i,k}\, k$   $\forall i = 1, \ldots, \lambda$
3 $\{x_{i:\lambda}\}_{i=1}^{\lambda} \leftarrow$ order $\{x_i\}_{i=1}^{\lambda}$ according to $q_{1:\lambda} \le q_{2:\lambda} \le \cdots \le q_{\lambda:\lambda}$

$\{x_i\}_{i=1}^{\lambda}$ – population to predict
$r$ – number of cluster levels
$\alpha, \alpha_0$ – linear mapping parameters
$\beta_i = \alpha_0 + b_i$
$\theta$ – latent GP hyperparameters

[Figure: step 3, ordered population (1st, 2nd, 3rd) ranked by the weighted prediction $\sum_{k=1}^{3} p_k k$ over the intervals $I_1$, $I_2$, $I_3$ with thresholds $b_0 = -\infty$, $b_1$, $b_2$, $b_3 = \infty$]
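Steps 2 and 3 of the prediction are a weighted sum followed by a sort. The sketch below takes the probabilities $p_{i,k}$ from step 1 as a given matrix `P` (row `i`, column `k-1`); how they are obtained from the ordinal GP is outside this example, and the function name `rank_by_expected_level` is hypothetical.

```python
import numpy as np


def rank_by_expected_level(P):
    """P[i, k-1] = p_{i,k}; return (q, order) where order sorts q ascending."""
    P = np.asarray(P, dtype=float)
    levels = np.arange(1, P.shape[1] + 1)   # k = 1, ..., r
    q = P @ levels                          # q_i = sum_k p_{i,k} * k
    order = np.argsort(q, kind="stable")    # indices i with q ascending
    return q, order
```

With `P = [[0.1, 0.2, 0.7], [0.8, 0.1, 0.1], [0.3, 0.4, 0.3]]` the weighted predictions are 2.6, 1.3 and 2.0, so the second point is ranked first, the third point second, and the first point last.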

SLIDE 26

Experimental settings

Noiseless part of the BBOB
100 FE/D budget
Algorithms
    CMA-ES
    DTS-CMA-ES
    Ord-N-DTS – no clustering
    Ord-Q-DTS – quantile-based clustering
    Ord-H-DTS – agglomerative hierarchical clustering
Ordinal settings
    λ ordinal levels
    Matérn GP kernel

SLIDE 27

Experimental results on BBOB (2 D)

[Figure: ECDF plot, proportion of function+target pairs (0.0 to 1.0) against log10 of (# f-evals / dimension); bbob f1-f24, 2-D, 31 target RLs/dim: 0.5..50 from refalgs/best2009-bbob.tar.gz, 15 instances, v2.1. Legend as extracted: CMA-ES, Ord-H-DTS, Ord-N-DTS, Ord-Q-DTS, DTS-CMA-ES, best 2009]

SLIDE 28

Experimental results on BBOB (5 D)

[Figure: ECDF plot, proportion of function+target pairs (0.0 to 1.0) against log10 of (# f-evals / dimension); bbob f1-f24, 5-D, 31 target RLs/dim: 0.5..50 from refalgs/best2009-bbob.tar.gz, 15 instances, v2.1. Legend as extracted: CMA-ES, Ord-N-DTS, Ord-H-DTS, Ord-Q-DTS, DTS-CMA-ES, best 2009]

SLIDE 29

Experimental results on BBOB (10 D)

[Figure: ECDF plot, proportion of function+target pairs (0.0 to 1.0) against log10 of (# f-evals / dimension); bbob f1-f24, 10-D, 31 target RLs/dim: 0.5..50 from refalgs/best2009-bbob.tar.gz, 15 instances, v2.1. Legend as extracted: CMA-ES, Ord-H-DTS, Ord-N-DTS, Ord-Q-DTS, DTS-CMA-ES, best 2009]

SLIDE 30

ECDF results on the whole BBOB (5 D)

[Figure: five ECDF panels in 5-D, each showing the proportion of function+target pairs (0.0 to 1.0) against log10 of (# f-evals / dimension); 31 target RLs/dim: 0.5..50 from refalgs/best2009-bbob.tar.gz, 15 instances, v2.1. Panels: separable (f1-f5), moderate (f6-f9), ill-conditioned (f10-f14), multi-modal (f15-f19), weakly structured multi-modal (f20-f24). Algorithms compared: CMA-ES, Ord-N-DTS, Ord-H-DTS, Ord-Q-DTS, DTS-CMA-ES, best 2009]

SLIDE 31

Results on f6 and f22

[Figure: three panels, f6 Attractive sector, f22 Gallagher 21 peaks, and f1 Sphere; horizontal axis ticks 2, 3, 5, 10, 20, 40; 15 instances, target RL/dim: 10, v2.1. Legend: Ord-N-DTS, Ord-H-DTS, Ord-Q-DTS, DTS-CMA-ES, CMA-ES]

SLIDE 32

Conclusions

Effect of different clustering methods is not crucial
Performance of the ordinal GP models is considerably lower than that of the standard GP models, with few exceptions (e.g., attractive sector f6)
Further investigation:
    Adaptive switch between metric and ordinal models

SLIDE 33

Thank you!

z.pitra@gmail.com
bajeluk@gmail.com
j.repicky@gmail.com
martin@cs.cas.cz