Black-box expensive optimization: Learn to optimize (slide presentation)


slide-1
SLIDE 1

Introduction Nuclear power plant control design Learning for optimization Surrogate-assisted optimization Conclusions

Black-box expensive optimization: Learn to optimize

Sébastien Verel

Laboratoire d'Informatique, Signal et Image de la Côte d'Opale (LISIC)

Université du Littoral Côte d'Opale, Calais, France

http://www-lisic.univ-littoral.fr/~verel/

August 29th/30th, 2019

slide-2
SLIDE 2

AI: Machine Learning, Optimization, perception, etc.

Learning: minimize an error function
  • {M_θ}: models to fit on data
  • Search θ⋆ = arg min_θ Error(M_θ, data)
  • Depending on the model dimension, variables, error function, etc., there is a huge number of optimization algorithms

Optimization: learn a design algorithm
  • {A_θ}: search algorithms for problems (X, f)
  • Learn A_θ⋆ such that x = A_θ⋆(X, f) is a good solution
  • Depending on the class of algorithms, search spaces, functions, etc., there is a huge number of learning algorithms
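The first direction can be made concrete with a toy sketch: learning here is just optimization of θ. Everything below (the toy model M_θ(x) = θ·x, the data, the randomized local search) is illustrative, not from the talk:

```python
import random

# Toy data generated by y = 2x; model M_theta(x) = theta * x
data = [(x, 2.0 * x) for x in range(1, 6)]

def error(theta):
    # Error(M_theta, data): mean squared error over the data set
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)

# Search theta* = argmin_theta Error(M_theta, data) with a simple
# randomized local search (one of the many possible optimizers)
rng = random.Random(0)
best_t, best_e = 0.0, error(0.0)
for _ in range(2000):
    t = best_t + rng.uniform(-0.5, 0.5)  # local random perturbation
    e = error(t)
    if e < best_e:
        best_t, best_e = t, e
# best_t ends up close to the true parameter 2.0
```

Any other optimizer (gradient descent, CMA-ES, ...) could replace the search loop; the point is only that "learning" reduces to "search θ⋆".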

slide-3
SLIDE 3

Black-box (expensive) optimization

x → [black box] → f(x)

No direct analytic definition of the objective function f.

The objective function can be irregular, non-continuous, non-differentiable, etc., and is given by a computation or an (expensive) simulation.

A few examples from the OSMOSE team at LISIC:

  • Nuclear power plant (Valentin Drouet),
  • Logistic simulation (Brahim Aboutaib),
  • Mobility simulation (Florian Leprêtre),
  • Plant biology, plant growth (Amaury Dubois),
  • Cellular automata, grammatical inference (Bilal El Ghadyry).
slide-4
SLIDE 4

Real-world black-box expensive optimization

PhD of Mathieu Muniglia (2014-2017) and Valentin Drouet (2017-2020), with Jean-Michel Do and Jean-Charles Le Pallec, CEA-Saclay, Paris

x → [multi-physics simulator] → f(x), here (73, .., 8) → (∆zP, Ve)

Expensive multiobjective optimization: parallel computing, and surrogate model.

slide-5
SLIDE 5

Search algorithms

Principle: enumeration of the search space; there are many ways to enumerate it.

Main topics: local search, evolutionary algorithms, metaheuristics, etc.

Main challenges: understanding search dynamics, parameters, design of components.

slide-6
SLIDE 6

Multiobjective optimization

Multiobjective optimization problem:
  • X: set of feasible solutions in the decision space
  • M ≥ 2 objective functions f = (f1, f2, . . . , fM) (to maximize)
  • Z = f(X) ⊆ IR^M: set of feasible outcome vectors in the objective space

(Figure: decision space (x1, x2) mapped by f to objective space (f1, f2).)

slide-7
SLIDE 7

Definition

Pareto dominance relation: a solution x ∈ X dominates a solution x′ ∈ X (x′ ≺ x) iff
  • ∀i ∈ {1, 2, . . . , M}, fi(x′) ≤ fi(x), and
  • ∃j ∈ {1, 2, . . . , M} such that fj(x′) < fj(x).

(Figure: decision space and objective space, showing dominated vectors, non-dominated vectors, and the corresponding non-dominated solutions.)

slide-8
SLIDE 8

Pareto set, Pareto front

(Figure: the Pareto optimal set in decision space maps to the Pareto front in objective space; a Pareto optimal solution is one whose image lies on the front.)

Goal: find the Pareto optimal set, or a good approximation of it.

NPP problem: discrete optimization problem, ≈ 30 min of computation per evaluation, 24 h in total, 10³ to 3·10³ cores.
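The dominance relation and the non-dominated filter fit in a few lines; a sketch for the maximization convention used above (function names are mine):

```python
def dominates(fa, fb):
    # a dominates b iff a >= b on every objective and > on at least one
    return (all(a >= b for a, b in zip(fa, fb))
            and any(a > b for a, b in zip(fa, fb)))

def pareto_front(points):
    # keep the objective vectors that no other vector dominates
    return [p for p in points if not any(dominates(q, p) for q in points)]

front = pareto_front([(1, 3), (2, 2), (3, 1), (1, 1), (2, 1)])
# front == [(1, 3), (2, 2), (3, 1)]
```

This quadratic-time filter is fine for small archives; practical solvers use faster non-dominated sorting.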

slide-9
SLIDE 9

Decomposition-based approaches, MOEA/D

(Figure: objective space (f1, f2) with weight vectors λ¹, . . . , λ^µ, reference point z⋆, and one point z1, . . . , z_µ per sub-problem.)

Population at iteration t:
  • Minimization problem; one solution x_i for each sub-problem i
  • Representation of solutions in objective space: z_i = g(x_i | λ^i, z⋆)
  • Same reference point for all sub-problems: z⋆ = z⋆_1 = . . . = z⋆_µ
  • Scalar function g: weighted Tchebycheff
  • Neighborhood size ♯B(i) = T = 3

slide-10
SLIDE 10

Decomposition-based approaches, MOEA/D

(Figure: same setting as the previous slide; the neighborhood B(i) is highlighted.)

From the neighborhood B(i) of sub-problem i, x_{i+1} is selected.

slide-11
SLIDE 11

Decomposition-based approaches, MOEA/D

(Figure: same setting as the previous slide.)

The mutated solution y is created.

slide-12
SLIDE 12

Decomposition-based approaches, MOEA/D

(Figure: same setting as the previous slide.)

According to the scalar function, y is worse than x_{i−1}; y is better than x_i and replaces it.

slide-13
SLIDE 13

Decomposition-based approaches, MOEA/D

(Figure: same setting as the previous slide.)

According to the scalar function, y is also better than x_{i+1} and replaces it for the next iteration.

slide-14
SLIDE 14

Asynchronous distributed (1 + λ)-Evolution Strategy

Master-slave architecture: each slave i evaluates its x_i and sends the fitness f_i back; the master updates x_best and returns a new x_i = mutation(x_best).

Algorithm on master:

    {x_1, . . . , x_λ} ← Initialization()
    for i = 1..λ do
        Send (non-blocking) x_i to slave S_i
    end for
    x_best ← ∅, f_best ← ∞
    repeat
        if there is a pending message from S_i then
            Receive fitness f_i of x_i from S_i
            if f_i < f_best then
                x_best ← x_i, f_best ← f_i
            end if
            x_i ← mutation(x_best)
            Send (non-blocking) x_i to slave S_i
        end if
    until time limit

Parameters of the mutation: fitness landscape analysis (random walk, and variable importance).
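The master loop above can be simulated with a thread pool standing in for the slaves. A sketch on a toy fitness (minimizing the number of ones); the function names, toy fitness, and parameters are all illustrative:

```python
import random
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def fitness(x):
    # stand-in for the expensive simulation
    return sum(x)  # minimize the number of ones

def mutate(x, rng):
    y = list(x)
    y[rng.randrange(len(y))] ^= 1  # flip one random bit
    return y

def async_one_plus_lambda(n=20, lam=4, budget=200, seed=1):
    rng = random.Random(seed)
    xbest, fbest = None, float("inf")
    with ThreadPoolExecutor(max_workers=lam) as pool:
        pending = {}
        for _ in range(lam):  # initial non-blocking sends to the slaves
            x = [rng.randrange(2) for _ in range(n)]
            pending[pool.submit(fitness, x)] = x
        evals = lam
        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:  # a slave finished: update best, resubmit
                x = pending.pop(fut)
                fx = fut.result()
                if fx < fbest:
                    xbest, fbest = x, fx
                if evals < budget:
                    y = mutate(xbest, rng)
                    pending[pool.submit(fitness, y)] = y
                    evals += 1
    return xbest, fbest

xb, fb = async_one_plus_lambda()
```

The key property carried over from the slide: the master never blocks on a slow evaluation, it reacts to whichever slave finishes first (asynchronous, elitist update of x_best).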

slide-15
SLIDE 15

Results

slide-16
SLIDE 16


Adaptive distributed optimization algorithms

  • B. Aboutaib (2018-), C. Jankee (2014-2018), C. Fonlupt (LISIC), B. Derbel (univ. Lille)
slide-17
SLIDE 17

slide-18
SLIDE 18

Main challenges

How to select an algorithm? Design reinforcement-learning methods for distributed computing (ε-greedy, adaptive pursuit, UCB, ...).

How to compute a reward? Aggregation function of local rewards (mean, max, etc.) for a global selection.

Methodology: use benchmark functions designed with controlled properties, and experimental analysis.
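An ε-greedy selector of the kind mentioned above fits in a few lines. The two simulated "algorithms" with Bernoulli rewards, the running-mean aggregation, and all names are illustrative:

```python
import random

def eps_greedy(means, counts, eps, rng):
    # explore with probability eps (or if nothing was tried yet),
    # otherwise exploit the algorithm with the best mean reward
    if rng.random() < eps or not any(counts):
        return rng.randrange(len(means))
    return max(range(len(means)), key=lambda a: means[a])

rng = random.Random(42)
p = [0.8, 0.2]                   # true success rate of each algorithm
means, counts = [0.0, 0.0], [0, 0]
for _ in range(1000):
    a = eps_greedy(means, counts, 0.1, rng)
    r = 1.0 if rng.random() < p[a] else 0.0    # local reward
    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]     # running mean as aggregation
# the better algorithm ends up selected far more often
```

Adaptive pursuit or UCB would replace only the selection function; the reward-aggregation question from the slide is orthogonal to the selector.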

slide-19
SLIDE 19

Features to learn: multi-objective fitness landscape

K. Tanaka, H. Aguirre (Univ. Shinshu), A. Liefooghe, B. Derbel (Univ. Lille), 2010-2018

Fitness landscape (X, f, N): search space, objective function, neighborhood relation.

(Figure: three objective-space scatter plots, Objective 1 vs. Objective 2, for conflicting, independent, and correlated objectives.)

Fitness landscape analysis:

(Figure: neighbors in the search space classified as dominating, dominated, or incomparable; locally non-dominated solutions; supported vs. non-supported solutions.)

slide-20
SLIDE 20

Features to learn: multi-objective fitness landscape

Performance prediction of MO algorithms based on fitness landscape features (regression model: random forest, etc.).

(Figure: Kendall's tau correlation matrix of the landscape features, e.g. f_cor_rws, rho, #lsupp_avg_rws, #lnd_avg_rws, hv_avg_aws, ...)

Performance prediction (cross-validation) for GSEMO:

    feature set       MAE       MSE       R2        rank
    all               0.007781  0.000118  0.951609  1
    enumeration       0.008411  0.000142  0.943046  2
    sampling all      0.009113  0.000161  0.932975  3
    sampling rws      0.009284  0.000167  0.930728  4
    sampling aws      0.010241  0.000195  0.917563  5
    {r, m, n, k/n}    0.010609  0.000215  0.911350  6
    {r, m, n}         0.026974  0.001123  0.518505  7
    {m, n}            0.032150  0.001545  0.340715  8

slide-21
SLIDE 21


Learning/tuning parameters according to features

slide-22
SLIDE 22


Learning/tuning parameters according to features

slide-23
SLIDE 23

Learning/tuning parameters according to features

Overview of some results:

(Figure: for each feature value — N, K, type, p|q, avg fitness, r1 fitness, neutral rate — the number of test instances, up to 200, on which the landscape-aware configuration is better, tied, or the baseline configuration is better, across three instance partitions.)
slide-24
SLIDE 24

Surrogate-assisted optimization of pseudo-boolean problems

Florian Leprêtre, Virginie Marion, Cyril Fonlupt, K. Tanaka, H. Aguirre (Univ. Shinshu), A. Liefooghe, B. Derbel (Univ. Lille)

Goal: replace/learn the (expensive) objective function with a (cheap) meta-model during the optimization process.

  • Continuous optimization: a huge number of works (Gaussian process/kriging (EGO), polynomial chaos, NN, etc.)
  • Combinatorial optimization: few works. f : {0, 1}^n → IR

Surrogate-assisted optimization:

    X ← initial sample
    repeat
        M ← Build model from X
        x⋆ ← Optimize w.r.t. an acquisition function
        X ← X ∪ {x⋆}
    until time limit
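The loop can be sketched end to end with a deliberately simple stand-in surrogate: an order-1 linear model fitted by least squares instead of a GP or Walsh model, and a crude random-candidate acquisition step. Everything here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
w_true = rng.normal(size=n)

def f(x):
    # stand-in for the expensive black-box objective (to minimize)
    return float(w_true @ x)

# X <- initial sample
X = [rng.integers(0, 2, n) for _ in range(10)]
y = [f(x) for x in X]

for _ in range(15):                      # repeat ... until budget exhausted
    # M <- build a (linear) model from X by least squares
    A = np.column_stack([np.ones(len(X)), np.array(X)])
    coef, *_ = np.linalg.lstsq(A, np.array(y), rcond=None)
    # x* <- optimize the surrogate over random candidates (crude acquisition)
    cands = rng.integers(0, 2, (200, n))
    preds = coef[0] + cands @ coef[1:]
    xstar = cands[int(np.argmin(preds))]
    # X <- X U {x*}: evaluate the true objective and augment the sample
    X.append(xstar)
    y.append(f(xstar))

best = min(y)   # never worse than the best initial sample
```

Each iteration spends one true (expensive) evaluation, guided by many cheap surrogate evaluations; swapping in a GP with expected improvement, or the Walsh model of the next slides, changes only the two middle steps.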

slide-25
SLIDE 25

Example: Efficient Global Optimization (EGO)

Model: Gaussian Process (GP). Acquisition function: Expected Improvement.

GP: random variables which have a joint Gaussian distribution.
  • mean: m(y(x)) = µ
  • covariance: cov(y(x), y(x′)) = exp(−θ d(x, x′)^p)

(Figure from: Rasmussen, Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.)

A Hamming distance can be used for pseudo-boolean functions.

slide-26
SLIDE 26

Surrogate model for pseudo-boolean functions

Walsh functions: ∀x ∈ {0, 1}^n,

    ϕ_k(x) = (−1)^(Σ_{j=0}^{n−1} k_j x_j)

They form a normalized, orthogonal basis:

    (1/2^n) Σ_{x ∈ {0,1}^n} ϕ_i(x) ϕ_j(x) = δ_ij

Any pseudo-boolean function can be written as:

    f(x) = Σ_{k=0}^{2^n − 1} w_k ϕ_k(x)

Example, with order 2:

    f(x) = w_0 + Σ_{i=1}^{n} w_i (−1)^{x_i} + Σ_{1 ≤ i < j ≤ n} w_ij (−1)^{x_i + x_j}
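The basis property can be checked numerically for a small n; a sketch indexing each k by its bit vector (k_0, ..., k_{n−1}) (function names are mine):

```python
import itertools

def walsh(k, x):
    # phi_k(x) = (-1)^(sum_j k_j * x_j), with k and x given as bit tuples
    return (-1) ** sum(kj * xj for kj, xj in zip(k, x))

n = 3
pts = list(itertools.product((0, 1), repeat=n))  # all of {0,1}^n

# orthonormality: (1/2^n) * sum_x phi_i(x) * phi_j(x) = delta_ij
for ki in pts:
    for kj in pts:
        inner = sum(walsh(ki, x) * walsh(kj, x) for x in pts) / 2 ** n
        assert inner == (1 if ki == kj else 0)
```

Orthonormality is what makes the coefficients w_k well defined: w_k is just the inner product of f with ϕ_k.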

slide-27
SLIDE 27

Walsh Surrogate-assisted Optimizer (WSaO)

Surrogate model of order d:

    W(x) = Σ_{k : ord(ϕ_k) ≤ d} w_k ϕ_k(x)

Estimation of the coefficients: linear regression with sparse techniques, LARS, or LASSO.

(Figure: mean absolute error of fitness vs. sample size, 100 to 500, comparing the kriging and Walsh surrogates.)

slide-28
SLIDE 28

Walsh Surrogate-assisted Optimizer (WSaO)

    X ← initial sample
    repeat
        W ← Build Walsh model from X
        x⋆ ← Optimize W
        X ← X ∪ {x⋆}
    until time limit

Optimizer: an efficient optimization algorithm (ILS) using the additive property:

    δ_i(x) = W(x^i) − W(x) = −2 Σ_{k ⊃ i} w_k ϕ_k(x)

    δ_ij(x) = δ_i(x^j) − δ_i(x) = 4 Σ_{k ⊃ i & k ⊃ j} w_k ϕ_k(x)

Find the best improving move in O(d) at each step.
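The identity behind δ_i can be verified numerically: flipping bit i changes ϕ_k only when k_i = 1 (i.e. k ⊃ i), in which case ϕ_k changes sign. A sketch with a random sparse coefficient dictionary (names illustrative):

```python
import itertools
import random

def walsh(k, x):
    # phi_k(x) = (-1)^(sum_j k_j * x_j), with k and x as bit tuples
    return (-1) ** sum(kj * xj for kj, xj in zip(k, x))

def W(x, coeffs):
    # coeffs: dict mapping the bit tuple k to the coefficient w_k
    return sum(w * walsh(k, x) for k, w in coeffs.items())

def delta(i, x, coeffs):
    # delta_i(x) = W(x^i) - W(x) = -2 * sum_{k : k_i = 1} w_k * phi_k(x)
    return -2 * sum(w * walsh(k, x) for k, w in coeffs.items() if k[i] == 1)

rng = random.Random(5)
n = 6
keys = list(itertools.product((0, 1), repeat=n))
coeffs = {k: rng.uniform(-1, 1) for k in rng.sample(keys, 12)}

x = tuple(rng.randrange(2) for _ in range(n))
for i in range(n):
    xi = tuple(b ^ 1 if j == i else b for j, b in enumerate(x))  # flip bit i
    assert abs(delta(i, x, coeffs) - (W(xi, coeffs) - W(x, coeffs))) < 1e-9
```

In a real sparse order-d model only the few terms containing bit i survive in the sum, which is what gives the O(d) per-move cost on the slide.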

slide-29
SLIDE 29

Walsh Surrogate-assisted Optimizer (WSaO)

(Figure: results on UBQP (minimization) for n = 10, 25, 100.)

  • Kriging: relies on a distance, which degrades in high dimension
  • BOCS: Bayesian estimation of a multilinear polynomial; same basis, different optimization method (SA)

Will be applied to the bus-stop positioning problem for the city of Calais.

slide-30
SLIDE 30

Agronomy: plant irrigation

Amaury Dubois, 2018-2021, Jérôme Leroy (Weenat), F. Teytaud, Eric Ramat

Numerical simulator. Goal: setting of plant and soil parameters for hydric stress prediction.

Key points:
  • Algorithm design for multimodal optimization problems (QRDS)
  • Combine offline learning (optimization techniques) and online learning during the season (ensemble learning techniques)

slide-31
SLIDE 31

Agronomy: plant irrigation

Amaury Dubois, 2018-2021, Jérôme Leroy (Weenat), F. Teytaud, Eric Ramat

Goal: setting of plant and soil parameters for hydric stress prediction.

Key points:
  • Algorithm design for multimodal optimization problems (QRDS)
  • Combine offline learning (optimization techniques) and online learning during the season (ensemble learning techniques)

slide-32
SLIDE 32

Grammatical Inference

Bilal El Ghadyry (2019-2022), Faissal Ouardi (Univ. M5, Rabat)

Goal: learning a Deterministic Finite Automaton from examples of words.

Example:
    {abbab, bababaa, bbb, bbbabbb} ∈ L
    {bb, ba, bbbb, abbabbb} ∉ L

(Figure: a two-state automaton over the alphabet {a, b, c}.)

Applications: biology (genetics), physics, computer science, etc.

Topics: combinatorial optimization, genetic programming, active learning, representation.

slide-33
SLIDE 33

Conclusions

  • Combinatorial optimization: black-box, expensive
  • Multiobjective optimization
  • Fitness landscapes for performance prediction
  • Adaptive algorithms
  • Surrogate models for combinatorial optimization