On the Analysis of the Simple Genetic Algorithm Pietro Oliveto 1 - - PowerPoint PPT Presentation


slide-1
SLIDE 1

1/10 Introduction Proof Ideas End

On the Analysis of the Simple Genetic Algorithm

Pietro Oliveto (1), Carsten Witt (2)

(1) School of Computer Science, The University of Birmingham, UK
(2) Technical University of Denmark, Kgs. Lyngby, Denmark

ThRaSH 2012, 2nd May 2012

Pietro Oliveto, Carsten Witt On the Analysis of the SGA

slide-4
SLIDE 4

Context

Runtime Analysis of Evolutionary Computation (EC)
- has made significant progress in the last 15 years,
- is believed to explain the working principles of EC,
- is often able to capture intuitive ideas about these working principles in rigorous theorems.

One Long-Term Aim of Runtime Analysis
Prove rigorous statements about the working principles of “realistic”/standard EC, e.g., Genetic Algorithms with crossover.

Is state-of-the-art runtime analysis sufficiently far developed for this?

slide-7
SLIDE 7

A Challenge

Runtime Analysis of Crossover-Based EC
- is often avoided (→ (1+1) EA),
- but has been done with success in several cases (e.g., Real Royal Road functions, crossover in combinatorial optimization, ...),
- is an emergent issue (→ next talk(s)),
- definitely gives insight into the working principles of crossover.

But
Do these analyses study “standard GAs” on “simple standard problems”?
Not necessarily: mostly non-standard selection or non-standard problems.

slide-9
SLIDE 9

A Standard GA

The Simple GA (SGA) for maximization of $f : \{0,1\}^n \to \mathbb{R}$

1. Create population P of µ randomly chosen individuals.
2. C := ∅.
3. While |C| < µ do
   - Fitness-proportional selection: Select two parents x′ and x′′ from P proportional to their fitness, without replacement.
   - Uniform crossover: Create an offspring x by setting each bit $x_i = x'_i$ with probability 1/2 and $x_i = x''_i$ otherwise, for 1 ≤ i ≤ n.
   - Standard Bit Mutation: Flip each bit $x_i$ of x with probability 1/n.
   - C := C ∪ {x}.
4. Set P := C and go to 2.

Why “Standard”
- studied in monographs on GAs (Goldberg 1989)
- theoretically analyzed in the infinite-population model (Vose 1999)
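The SGA steps on this slide can be sketched in Python roughly as follows. This is a minimal illustration, not the authors' code: the function names, the fixed number of generations, the reading of "without replacement" as "two distinct population members", and the uniform fallback when all fitness values are zero are assumptions made for the example.

```python
import random


def onemax(x):
    """OneMax fitness: the number of one-bits in x."""
    return sum(x)


def select_two_parents(pop, fitness, rng):
    """Fitness-proportional selection of two distinct population members."""
    assert len(pop) >= 2, "need at least two individuals"
    weights = [fitness(x) for x in pop]
    if sum(weights) == 0:
        # Assumption: choose uniformly if every individual has fitness 0.
        weights = [1] * len(pop)
    first = rng.choices(range(len(pop)), weights=weights)[0]
    rest = [i for i in range(len(pop)) if i != first]
    rest_weights = [weights[i] for i in rest]
    if sum(rest_weights) == 0:
        rest_weights = [1] * len(rest)
    second = rng.choices(rest, weights=rest_weights)[0]
    return pop[first], pop[second]


def sga_generation(pop, fitness, rng):
    """Steps 2 to 4: build a new population C of the same size as P."""
    n = len(pop[0])
    children = []
    while len(children) < len(pop):
        p1, p2 = select_two_parents(pop, fitness, rng)
        # Uniform crossover: take each bit from either parent with prob. 1/2.
        child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
        # Standard bit mutation: flip each bit independently with prob. 1/n.
        child = [bit ^ 1 if rng.random() < 1.0 / n else bit for bit in child]
        children.append(child)
    return children


def run_sga(n, mu, generations, seed=0):
    """Run the SGA on OneMax; return the final population and best value seen."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(mu)]
    best = max(onemax(x) for x in pop)
    for _ in range(generations):
        pop = sga_generation(pop, onemax, rng)
        best = max(best, max(onemax(x) for x in pop))
    return pop, best
```

Calling `run_sga(n=100, mu=4, generations=1000)` and inspecting the returned best value gives a quick empirical feel for how far the population moves away from n/2 one-bits.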

slide-14
SLIDE 14

The Challenge, More Precisely

Can we do a runtime analysis for the Simple GA, e.g., on OneMax?
“It might be simple, but it is not easy” (from an amazon.com review)

What Would We Expect
- Backward drift due to mutation close to the optimum,
- no positive drift due to crossover,
- selection too weak to keep positive fluctuations.

Main New Result
Let $\mu \le n^{1/8-\epsilon}$ for an arbitrarily small constant $\epsilon > 0$. Then with probability $1 - 2^{-\Omega(n^{\epsilon/9})}$, the SGA on OneMax does not create individuals with more than $(1+c)\frac{n}{2}$ or fewer than $(1-c)\frac{n}{2}$ one-bits, for an arbitrarily small constant $c > 0$, within the first $2^{n^{\epsilon/10}}$ generations. In particular, it does not reach the optimum then.

[Figure: search space with the points (1, ..., 1) and (0, ..., 0) marked.]

slide-19
SLIDE 19

Towards A Proof

Prior Work
- Happ/Johannsen/Klein/Neumann (GECCO 08): (1+1) EA with fitness-proportional selection needs exponential time on OneMax
- Neumann/Oliveto/W. (GECCO 09): same for a variant of SGA without crossover

Difficulties When Introducing Crossover
- Lehre’s drift theorem for populations doesn’t allow crossover
- Variance of offspring distribution
  - # flipping bits due to mutation Poisson-distributed → variance $O(1)$
  - # of one-bits created by crossover binomially distributed according to Hamming distance of parents and 1/2 → variance $\Omega(\sqrt{n})$ possible
- Classical negative (“simplified”) drift theorem needs variance $O(1)$.
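To make the two variance sources concrete, here is a small sketch (not from the talk). It assumes the number of bits flipped by mutation is Binomial(n, 1/n), which is approximately Poisson with parameter 1, and that the number of one-bits an offspring inherits on the d positions where its parents differ is Binomial(d, 1/2). The function names are hypothetical.

```python
def mutation_flip_variance(n):
    """Variance of Binomial(n, 1/n): bits flipped by standard bit mutation."""
    p = 1.0 / n
    return n * p * (1 - p)


def crossover_ones_variance(hamming_distance):
    """Variance of Binomial(d, 1/2): one-bits inherited on the d differing positions."""
    return hamming_distance * 0.5 * 0.5


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        d = n // 2  # typical Hamming distance of two uniformly random bit strings
        print(n, mutation_flip_variance(n), crossover_ones_variance(d))
```

The mutation variance stays below 1 for every n, whereas the crossover variance grows linearly in the Hamming distance of the parents, so it is unbounded while the population is still diverse.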

slide-20
SLIDE 20

Negative Drift Theorem With Scaling

Theorem
Let $X_t$, $t \ge 0$, be random variables describing a stochastic process over a finite state space $S \subseteq \mathbb{R}$, and let $\Delta_t(i) := (X_{t+1} - X_t \mid X_t = i)$ for $i \in S$ and $t \ge 0$. Suppose there exist an interval $[a, b]$ and, possibly depending on $\ell := b - a$, a bound $\epsilon(\ell) > 0$ and a scaling factor $r(\ell)$ such that for all $t \ge 0$:

1. $E(\Delta_t(i)) \le -\epsilon(\ell)$ for $a < i < b$,
2. $\mathrm{Prob}(\Delta_t(i) \ge j \cdot r(\ell)) \le e^{-j+1}$ for $i > a$ and $j \in \mathbb{N}$,
3. $r(\ell) \le \min\{\ell, \sqrt{\epsilon(\ell)\ell / (1352 \log(\ell/\epsilon(\ell)))}\}$.

Then for the hitting time $T^* := \min\{t \ge 0 : X_t \ge b \mid X_0 \le a\}$ it holds that
$\mathrm{Prob}(T^* \le e^{\epsilon(\ell)\ell/(1352 r^2(\ell))}) = O(e^{-\epsilon(\ell)\ell/(1352 r^2(\ell))})$.

[Figure: number line with start, a, b and target marked; drift away from target, large jumps towards target unlikely.]
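The intuition behind the theorem can be illustrated with a toy process (an assumption for illustration, not the process analyzed in the talk): a random walk on the non-negative integers that moves towards the target with probability `p_up` < 1/2 and away from it otherwise. Its drift is negative everywhere inside the interval, and its steps have size at most 1, so large jumps towards the target cannot occur. The function and parameter names are hypothetical.

```python
import random


def hitting_time(length, p_up, max_steps, rng):
    """Steps until a walk started at 0 first reaches `length`, or None if
    this does not happen within max_steps."""
    x = 0
    for t in range(1, max_steps + 1):
        if rng.random() < p_up:
            x += 1          # step towards the target
        elif x > 0:
            x -= 1          # step away from the target (stay put at 0)
        if x >= length:
            return t
    return None


def mean_hitting_time(length, p_up=0.4, runs=200, max_steps=10**6, seed=0):
    """Average hitting time over the runs that finished within max_steps."""
    rng = random.Random(seed)
    times = [hitting_time(length, p_up, max_steps, rng) for _ in range(runs)]
    finished = [t for t in times if t is not None]
    return sum(finished) / len(finished) if finished else None
```

Comparing `mean_hitting_time(4)`, `mean_hitting_time(8)` and `mean_hitting_time(12)` shows the average hitting time growing rapidly with the interval length, which is the behaviour the theorem turns into a rigorous exponential bound.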

slide-21
SLIDE 21

Negative Drift Theorem With Scaling

(Theorem as stated on the previous slide.)

Problem: maybe $r(\ell) = \Omega(\sqrt{\ell})$. (Hajek’s version breaks down then, too.)

Solution
Find bits that are “converged” within the population, i.e., either ones or zeros only. Crossover is irrelevant for these.

slide-27
SLIDE 27

Diversity and Bandwidth

Diversity s: # bits where both values are present in the population
Show: diversity collapses; quickly $s = O(\sqrt[4]{n})$.

Proof Idea
$I_t$: # individuals with 1 in some fixed position at time t
Assume uniform selection. Then:
- $E(I_{t+1} \mid I_t) = I_t$ (martingale)
- But random fluctuations → absorbing state 0 or $\mu$.
- Not a new idea: similarly, Neumann/Sudholt/W. (Swarm Intelligence J., 2009) analyzed “random” bits in ACO

[Figure: scale for $I_t$ with $\mu$ and $\mu/2$ marked.]

Compare fitness-prop. and uniform selection:
- Basically no difference for small population bandwidth (difference of best and worst OneMax-value in pop.)
- Neither a new idea: see Jägersküpper/W. (GECCO 05), Zarges (FOGA 09) and own prior work (GECCO 09)
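The martingale argument can be illustrated with a small simulation. The sketch below makes two simplifying assumptions that are not in the talk: mutation is ignored, and each offspring's bit at the fixed position is modelled as a copy of the bit of one uniformly chosen parent (which is what uniform selection followed by uniform crossover amounts to for a single position). The function names are hypothetical.

```python
import random


def next_count(count, mu, rng):
    """One generation: each of the mu offspring carries a 1 at the fixed
    position with probability count / mu, so E(I_{t+1} | I_t) = I_t."""
    return sum(1 for _ in range(mu) if rng.random() < count / mu)


def generations_until_converged(mu, start, rng, max_generations=10**6):
    """Simulate I_t until it is absorbed in 0 or mu.

    Returns (generations, final_count); generations is None if the limit
    max_generations is reached first."""
    count = start
    for t in range(max_generations):
        if count == 0 or count == mu:
            return t, count
        count = next_count(count, mu, rng)
    return None, count
```

Running `generations_until_converged(10, 5, random.Random(0))` a few times with different seeds shows that, although the expected count does not change, the position ends up all-zeros or all-ones after a modest number of generations.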

slide-31
SLIDE 31

Putting It Together

Overall Proof Structure

[Figure: diagram with the boxes “Small diversity”, “Small bandw.”, “fitness-prop. ≈ uniform” and “Drift”, and the labels “n/2” and “init.”]

Not a loop, but in each step only exponentially small failure prob.

Potential Function
For drift theorem, capture whole population in one value: for $X = \{x_1, \dots, x_\mu\}$ let $g(X) := \sum_{i=1}^{\mu} e^{\kappa \cdot \mathrm{OneMax}(x_i)}$ (similar in GECCO 09).
Choose $\kappa = \Theta(1/\sqrt[4]{n})$ (to dampen variance due to non-converged bits).

Why $\mu = O(n^{1/8})$?
Have to have a short initial phase to collapse diversity. During this phase, drift analysis doesn’t work.
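The potential function on this slide is straightforward to compute. In the sketch below, the function names are hypothetical, and the hidden constant in κ = Θ(1/n^(1/4)) is set to 1 purely for illustration.

```python
import math


def onemax(x):
    """OneMax fitness: the number of one-bits in x."""
    return sum(x)


def default_kappa(n):
    """kappa = n^(-1/4); assumes constant factor 1 inside the Theta."""
    return n ** -0.25


def potential(population, kappa):
    """g(X): sum over all individuals x_i of exp(kappa * OneMax(x_i))."""
    return sum(math.exp(kappa * onemax(x)) for x in population)
```

Because every individual contributes one exponential term, g(X) is dominated by the best individuals, while the small value of κ keeps a change of a few one-bits from altering the potential by more than a constant factor.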

slide-34
SLIDE 34

Summary and Outlook

Done
- Analyzed standard GA on OneMax, including crossover
- Proved exponential lower bounds
- Combined different powerful techniques from runtime analysis

To Do
- Remove assumption on µ
- Simpler analysis?

Thank you!