SLIDE 1

(1,λ)-Evolution Strategy with Cumulative Step size Adaptation on linear cost functions - Adrien Couëtoux – 25/01/10

(1,λ)-Evolution Strategy with Cumulative Step size Adaptation on linear cost functions

Adrien Couëtoux, Anne Auger, Nikolaus Hansen. INRIA Saclay - Île-de-France, Project team TAO

SLIDE 2

Problem statement

Continuous optimization: minimization of a cost function f(x), with x ∈ ℝ^d, d being the dimension of the search space, which is unbounded. We study the behavior of one evolution strategy, the (1,λ)-ES with Cumulative Step size Adaptation (CSA). We limit our study to the case where the cost function f(·) is linear. This case is important because most cost functions can be approximated, on small neighborhoods, by linear functions.

SLIDE 3

Problem statement - (1,λ)-Evolution Strategy

Representation in 2-D, with the cost function f(x1,x2)=x1

From one parent, we generate λ offspring and select one of them as the next parent.

[Figure: parent and the λ candidate offspring in the (x1, x2) plane]

SLIDE 4

Problem statement - (1,λ)-Evolution Strategy

Representation in 2-D, with the cost function f(x1,x2)=x1

From one parent, we generate λ offspring and select one of them as the next parent.

[Figure: parent, candidate offspring, and the selected candidate in the (x1, x2) plane]

SLIDE 5

Problem statement - Cumulative Step size Adaptation

Given a point X in the search space, one standard way of generating a candidate solution is

X_candidate = X + σ N(0, I_d),

σ being called the step size. In our case, the cost function is linear and the search space is unbounded, which means that the optimum is at infinity. Hence, we want our population to diverge in the optimal direction (the direction opposite to the gradient of f(·)). To move quickly toward the optimum, we therefore need a large step size.
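In code, the sampling rule above is a single isotropic Gaussian step (a minimal sketch; the function and variable names are ours, not from the slides):

```python
import numpy as np

def sample_candidate(x, sigma, rng):
    """X_candidate = X + sigma * N(0, I_d): an isotropic Gaussian step around the parent."""
    return x + sigma * rng.standard_normal(x.shape)

rng = np.random.default_rng(0)
parent = np.zeros(20)                     # d = 20
candidate = sample_candidate(parent, 0.5, rng)
```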

SLIDE 6

Our goals

We want to show that CSA is efficient on linear functions. This means making sure that the step size grows, and that our population moves in the right direction. For practical reasons, we will study the series ln(σ_{n+1}) − ln(σ_n) rather than the step size itself. To simplify our computations, we consider the case where the cost function is f(x1,…,xd) = x1. The (1,λ)-ES is rotationally invariant, and the gradient of a linear function is a constant vector; hence, our results do not suffer from this simplification.

SLIDE 7

Parameters

 0 < c < 1 (usually between 1/√d and 1/d); it represents the weight of the past of the search in the update procedure

 λ, the number of candidates sampled at each iteration

 X0 the initial parent, σ0 > 0 the initial step size, and p0 a vector of the search space, the initial path (usually 0)

CSA algorithm

At the nth iteration, given the current parent Xn, step size σn and path pn:

 We sample λ candidates (X_{n,i})_{1≤i≤λ}, independent and identically distributed, with X_{n,i} = Xn + σn N(0, I_d)

 We select the member that minimizes the cost function: X_{n+1} = argmin_{1≤i≤λ} f(X_{n,i}) (= the candidate with minimal first coordinate, in our case)
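The sample-and-select step can be sketched as follows (a sketch assuming f(x) = x1 as in the slides, so selection looks only at the first coordinate; the names are ours):

```python
import numpy as np

def sample_and_select(parent, sigma, lam, rng):
    """One (1, lambda) step: draw lam i.i.d. candidates, keep the best under f(x) = x1."""
    cands = parent + sigma * rng.standard_normal((lam, parent.size))
    best = cands[np.argmin(cands[:, 0])]   # minimizing f means minimizing the first coordinate
    return best, cands

rng = np.random.default_rng(1)
parent = np.zeros(20)                      # d = 20
child, cands = sample_and_select(parent, 1.0, 15, rng)   # lambda = 15
```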

SLIDE 8

CSA algorithm – update of the step size and the path

We update the path:

p_{n+1} = (1 − c) p_n + √(c(2 − c)) Y_n,  with  Y_n = (X_{n+1} − X_n) / σ_n

We update the step size:

σ_{n+1} = σ_n exp( (c/2) ( ‖p_{n+1}‖² / d − 1 ) )
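Putting the sampling, the selection, and the two updates together gives a minimal CSA loop (a sketch of the algorithm as stated on the slides; the default parameter values and names are ours):

```python
import numpy as np

def csa_es(d=20, lam=15, c=0.05, sigma0=1.0, n_iter=200, seed=0):
    """(1, lambda)-ES with CSA on f(x) = x1; returns the step-size history."""
    rng = np.random.default_rng(seed)
    x, sigma, p = np.zeros(d), sigma0, np.zeros(d)
    sigmas = [sigma]
    for _ in range(n_iter):
        cands = x + sigma * rng.standard_normal((lam, d))
        x_next = cands[np.argmin(cands[:, 0])]             # selection on f(x) = x1
        y = (x_next - x) / sigma                           # normalized selected step Y_n
        p = (1 - c) * p + np.sqrt(c * (2 - c)) * y         # path update
        sigma *= np.exp((c / 2) * (p @ p / d - 1))         # step-size update
        x = x_next
        sigmas.append(sigma)
    return np.array(sigmas)

sigmas = csa_es()
```

On a linear function the step size should, on average, grow along the run, which is what the slides set out to prove.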

SLIDE 9

Simulations - logarithm of the step size

Evolution of the logarithm of the step size, as a function of the number of iterations (d=20, λ=15, c=0.05):

SLIDE 10

Study of the step size

We study the growth of the step size through the difference between the logarithms of the updated and current step sizes:

ln(σ_{n+1}) − ln(σ_n) = (c/2) ( ‖p_{n+1}‖² / d − 1 )

Note that E‖N(0, I_d)‖² = d. We compare the expectation of the squared norm of the path to its expectation if the selection were done completely at random (i.e. "random search"). This clearly suggests that, to obtain results on the step size, we first need to study the path, or at least its squared norm.
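The baseline E‖N(0, I_d)‖² = d used above is easy to confirm numerically (a quick sketch; the sample size and tolerance are ours):

```python
import numpy as np

d = 20
rng = np.random.default_rng(0)
samples = rng.standard_normal((100_000, d))
# Mean squared norm over the sample estimates E||N(0, I_d)||^2, which equals d
mean_sq_norm = (samples ** 2).sum(axis=1).mean()
```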

SLIDE 11

Theoretical results - How does the path evolve?

Remember:

p_{n+1} = (1 − c) p_n + √(c(2 − c)) Y_n

Looks like a Markov chain…

SLIDE 12

Theoretical results - How does the path evolve?

Remember:

p_{n+1} = (1 − c) p_n + √(c(2 − c)) Y_n

Looks like a Markov chain… Selection relies only on the first coordinate: the selected member is the one with the lowest first coordinate among λ independent and identically distributed candidates. Its first coordinate, N_{1:λ}, is the first order statistic of λ standard normal variables:

Y = (N_{1:λ}, N_2, …, N_d)ᵀ,  with density  φ_{1:λ}(x) = λ φ(x) (1 − Φ(x))^{λ−1}

We can show that (p_n)_{n≥0} is indeed a Markov chain.
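The density φ_{1:λ} above can be checked against a direct simulation of the minimum of λ standard normals (a sketch; the grid, sample size, and tolerances are ours):

```python
import numpy as np
from math import erf, exp, pi, sqrt

LAM = 15

def phi(x):
    """Standard normal density."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def phi_first(x, lam=LAM):
    """Density of N_{1:lam}, the minimum of lam i.i.d. N(0, 1) variables."""
    return lam * phi(x) * (1 - Phi(x)) ** (lam - 1)

# Riemann-sum moments of the density vs. a Monte Carlo estimate of the minimum
step = 0.001
xs = np.arange(-8.0, 8.0, step)
mass = sum(phi_first(x) for x in xs) * step              # total mass, close to 1
mean_from_density = sum(x * phi_first(x) for x in xs) * step

rng = np.random.default_rng(0)
mean_from_samples = rng.standard_normal((200_000, LAM)).min(axis=1).mean()
```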

SLIDE 13

Theoretical results - How does the path evolve?

As before, Y = (N_{1:λ}, N_2, …, N_d)ᵀ, where N_{1:λ} is the first order statistic of λ standard normal variables, with density φ_{1:λ}(x) = λ φ(x) (1 − Φ(x))^{λ−1}; the law of Y is independent of n. We can show that (p_n)_{n≥0} is indeed a Markov chain, with

p_{n+1} = (1 − c) p_n + √(c(2 − c)) Y_n
SLIDE 14

Expectation of the squared norm of the path

Using nothing but the previous equations, we prove that E‖p_{n+1}‖² has a limit, and we find the explicit expression

Lp := lim_{n→∞} E‖p_n‖² = d − 1 + E(N_{1:λ}²) + (2(1 − c)/c) E(N_{1:λ})²

and E(N_{1:λ}²) − 1 ≥ 0. In our simulations, with d=20, c=0.01, and λ=15, this gives Lp = 144.4945.
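The explicit expression for Lp can be cross-checked by estimating the two moments of N_{1:λ} by Monte Carlo and iterating the path recursion directly (a sketch; run lengths, seeds, and the 5 % tolerance are ours, and the resulting value depends on the chosen c):

```python
import numpy as np

d, c, lam = 20, 0.01, 15
rng = np.random.default_rng(0)

# Monte Carlo moments of N_{1:lam}, the minimum of lam standard normals
mins = rng.standard_normal((200_000, lam)).min(axis=1)
m1, m2 = mins.mean(), (mins ** 2).mean()

# Explicit limit from the slide: Lp = d - 1 + E(N^2) + 2(1-c)/c * E(N)^2
Lp = d - 1 + m2 + 2 * (1 - c) / c * m1 ** 2

# Empirical limit: iterate p_{n+1} = (1-c) p_n + sqrt(c(2-c)) Y_n and average ||p||^2
p, acc = np.zeros(d), 0.0
burn, n_iter = 2_000, 20_000
for n in range(n_iter):
    y = np.empty(d)
    y[0] = rng.standard_normal(lam).min()   # first coordinate: selected step N_{1:lam}
    y[1:] = rng.standard_normal(d - 1)      # remaining coordinates: plain N(0, 1)
    p = (1 - c) * p + np.sqrt(c * (2 - c)) * y
    if n >= burn:
        acc += p @ p
empirical = acc / (n_iter - burn)
```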

SLIDE 15

Expectation of the squared norm of the path

As above, Lp := lim_{n→∞} E‖p_n‖² = d − 1 + E(N_{1:λ}²) + (2(1 − c)/c) E(N_{1:λ})². Why is it interesting?

SLIDE 16

Expectation of the squared norm of the path

The limit Lp is interesting because the squared norm of the path is exactly what drives the step-size update:

ln(σ_{n+1}) − ln(σ_n) = (c/2) ( ‖p_{n+1}‖² / d − 1 )

so the limit of E‖p_{n+1}‖² tells us whether the step size grows on average.

SLIDE 17

Simulations - squared norm of the path

Evolution of the squared norm of the path ‖p_{n+1}‖² (blue), and of its expectation (red), with respect to n (d=20, λ=15, c=0.05)

SLIDE 18

Maximum number of iterations to obtain a growing step size

If we add the (very reasonable) assumption that p0 is initialized such that E(p0) = 0, as one usually does, we can give an upper bound on the number of iterations required for the squared norm of the path to reach a certain level. For all n > M,

E‖p_n‖² ≥ d + c(2 − c) ( E(N_{1:λ}²) − 1 ) > d,

with

M = d / [ c(2 − c) ( E(N_{1:λ}²) − 1 ) ] = 20.5426
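Rather than relying on the closed form for M, one can iterate exact recursions for E(p¹_n) (the first coordinate) and E‖p_n‖², both derived from the path update with E(p0) = 0, and read off the first n at which E‖p_n‖² exceeds d (a sketch; the parameter values and the Monte Carlo moment estimates are ours):

```python
import numpy as np

d, c, lam = 20, 0.05, 15
rng = np.random.default_rng(0)
mins = rng.standard_normal((200_000, lam)).min(axis=1)
m1, m2 = mins.mean(), (mins ** 2).mean()   # E(N_{1:lam}) and E(N_{1:lam}^2)

# a_n = E(p^1_n), b_n = E||p_n||^2, starting from p_0 = 0 so a_0 = b_0 = 0.
# From p_{n+1} = (1-c) p_n + k Y_n with k = sqrt(c(2-c)):
#   a_{n+1} = (1-c) a_n + k m1
#   b_{n+1} = (1-c)^2 b_n + 2(1-c) k a_n m1 + c(2-c)(m2 + d - 1)
a, b, n = 0.0, 0.0, 0
k = np.sqrt(c * (2 - c))
while b <= d:
    b = (1 - c) ** 2 * b + 2 * (1 - c) * k * a * m1 + c * (2 - c) * (m2 + d - 1)
    a = (1 - c) * a + k * m1
    n += 1
first_n = n   # first iteration at which E||p_n||^2 > d
```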

SLIDE 19

Simulations - Evolution of ln(σ_{n+1}) − ln(σ_n)

The evolution of (c/2)( ‖p_{n+1}‖² / d − 1 ) as a function of n

SLIDE 20

Theoretical results: stability properties

The Law of Large Numbers holds for the Markov chain (p_n)_{n≥0}, and we have the limit of the expectation of the squared norm of the path. Combining this with

ln(σ_{n+1}) − ln(σ_n) = (c/2) ( ‖p_{n+1}‖² / d − 1 ),

we show that, almost surely,

Lσ := lim_{n→∞} ln(σ_n)/n = (c/2d) [ E(N_{1:λ}²) − 1 + (2(1 − c)/c) E(N_{1:λ})² ]

Using our parameters (d=20, c=0.01, λ=15), this gives a limit of 0.1556.
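The limit Lσ can be compared with the empirical growth rate of ln(σ_n) over a long run (a sketch that simulates the normalized selected step Y_n directly, which is valid on a linear f because the law of Y_n does not depend on X_n or σ_n; run length and tolerance are ours):

```python
import numpy as np

d, c, lam = 20, 0.01, 15
rng = np.random.default_rng(0)

# Moments of N_{1:lam} and the closed-form limit of ln(sigma_n)/n
mins = rng.standard_normal((200_000, lam)).min(axis=1)
m1, m2 = mins.mean(), (mins ** 2).mean()
L_sigma = c / (2 * d) * (m2 - 1 + 2 * (1 - c) / c * m1 ** 2)

# Accumulate ln(sigma_{n+1}) - ln(sigma_n) = (c/2)(||p_{n+1}||^2 / d - 1)
p, log_sigma = np.zeros(d), 0.0
n_iter = 50_000
for _ in range(n_iter):
    y = np.empty(d)
    y[0] = rng.standard_normal(lam).min()   # selected step, first coordinate
    y[1:] = rng.standard_normal(d - 1)      # unselected coordinates
    p = (1 - c) * p + np.sqrt(c * (2 - c)) * y
    log_sigma += (c / 2) * (p @ p / d - 1)
rate = log_sigma / n_iter   # empirical estimate of L_sigma
```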

SLIDE 21

Simulations - logarithm of the step size

Evolution of the logarithm of the step size, divided by n, as a function of n (in green, the limit Lσ):

SLIDE 22

Stability properties

The Central Limit Theorem holds for the Markov chain (p_n)_{n≥0}, which leads to the following: as n grows to infinity,

(√n / γ) ( ln(σ_n)/n − Lσ ) → N(0, 1),

γ being a strictly positive constant, a function of (p_n)_{n≥0}. In our simulations we have estimated γ² at 0.2051.

SLIDE 23

Conclusion

We proved that, asymptotically, the expectation of the logarithm of the step size grows linearly. Under very reasonable assumptions, we also gave an upper bound on the number of steps required before the step size starts growing. We studied the path, the underlying Markov chain, proving that the Law of Large Numbers and the Central Limit Theorem hold. This gives the limit of ln(σ_n)/n and the asymptotic distribution of ln(σ_n). This directly relates to the performance of the (1,λ)-ES with CSA through:

X_{n+1} = X_n + σ_n Y_n,  with  Y_n = (N_{1:λ}, N_2, …, N_d)ᵀ