1/16 Introduction

The impact of genetic drift on the runtime of simple estimation-of-distribution algorithms

Dirk Sudholt and Carsten Witt

University of Sheffield and Technical University of Denmark

Dagstuhl, January 2016

Dirk Sudholt and Carsten Witt Genetic drift and runtime of EDAs


2/16 Introduction

Context

  • We are interested in the runtime analysis of evolutionary algorithms. This talk will not be about (1+1)-type EAs.

  • Genetic algorithms involving populations and crossover have been modeled in different ways, to ease both theoretical analysis and efficient implementation.
  • Estimation-of-distribution algorithms (EDAs) do not use explicit populations and crossover but build a probabilistic model instead.
  • Recently, there has been increased interest in the runtime analysis of simple EDAs (Friedrich et al., 2015).
  • We consider the compact GA (cGA) by Harik et al. (1999) and variants thereof.
  • Goal always: find the optimum of a function f : {0, 1}^n → R.



3/16 Introduction

The Compact GA – cGA

Instead of a population, the cGA uses a vector of (allele) frequencies. The parameter K (“population size”) determines the strength of the updates.

t ← 0; p1,t ← p2,t ← · · · ← pn,t ← 1/2
while termination criterion not met do
    for i ∈ {1, . . . , n} do xi ← 1 with prob. pi,t, xi ← 0 with prob. 1 − pi,t
    for i ∈ {1, . . . , n} do yi ← 1 with prob. pi,t, yi ← 0 with prob. 1 − pi,t
    if f(x) < f(y) then swap x and y
    for i ∈ {1, . . . , n} do
        if xi > yi then pi,t+1 ← pi,t + 1/K
        if xi < yi then pi,t+1 ← pi,t − 1/K
        if xi = yi then pi,t+1 ← pi,t
    restrict pi,t+1 to be within [1/n, 1 − 1/n]
    t ← t + 1
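The loop above can be sketched in a few lines of Python (a hedged illustration, not the authors' code; OneMax is used as the fitness function, and all names are mine):

```python
import random

def one_max(x):
    return sum(x)

def cga(n, K, max_steps, f=one_max, rng=random):
    """Compact GA: evolves a frequency vector instead of a population."""
    p = [0.5] * n
    best = 0
    for _ in range(max_steps):
        # Sample two offspring from the current frequency vector.
        x = [1 if rng.random() < p[i] else 0 for i in range(n)]
        y = [1 if rng.random() < p[i] else 0 for i in range(n)]
        if f(x) < f(y):
            x, y = y, x                      # x is now the winner
        best = max(best, f(x))
        if best == n:
            break
        for i in range(n):
            if x[i] != y[i]:                 # update only where the offspring differ
                p[i] += 1.0 / K if x[i] > y[i] else -1.0 / K
            # keep frequencies inside the borders [1/n, 1 - 1/n]
            p[i] = min(max(p[i], 1.0 / n), 1.0 - 1.0 / n)
    return best, p
```

Note that only positions where the two offspring differ move a frequency; this is exactly where the rw-step/b-step distinction of the later slides applies.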



4/16 Introduction

Related Algorithms

cGA can be understood as a model of a classical genetic algorithm with a population of K individuals, so-called gene-pool recombination, and tournament selection. It is well known (Hauschild and Pelikan, 2011) that cGA is related to other EDAs like UMDA and PBIL, and also to simple ant colony optimizers (ACO). E.g., 2-MMASib, a simple ACO, uses the following update rule:

if f(x) < f(y) then swap x and y
for i ∈ {1, . . . , n} do
    if xi = 1 then pi,t+1 ← pi,t(1 − ρ) + ρ
    else pi,t+1 ← pi,t(1 − ρ)

Also UMDA follows a similar idea. We will study an effect common to cGA, MMAS, UMDA, etc.
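The 2-MMASib update can be sketched analogously (an illustrative sketch; the border handling and all names are my own assumptions):

```python
def mmas_update(p, x, y, fx, fy, rho, n):
    """One 2-MMAS_ib pheromone update: reinforce the better of two samples."""
    if fx < fy:
        x, y = y, x                      # x is the better ("iteration-best") sample
    for i in range(len(p)):
        if x[i] == 1:
            p[i] = p[i] * (1 - rho) + rho
        else:
            p[i] = p[i] * (1 - rho)
        # keep pheromones inside the borders [1/n, 1 - 1/n]
        p[i] = min(max(p[i], 1.0 / n), 1.0 - 1.0 / n)
    return p
```

The structural similarity to the cGA update is the point: both shift a per-bit probability toward the winning sample, differing only in step size (1/K versus a multiplicative ρ-step).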



5/16 Introduction

Research Questions

Focus now on cGA. What is interesting from a runtime analysis perspective?
  • How fast is it on benchmark functions?
  • How should the parameter K be set?
  • Can it outperform other evolutionary algorithms, or is it outperformed by them?

Known results for cGA
  • Droste (2006) derived the first runtime results, in particular for the classical benchmark functions OneMax and LeadingOnes.
  • Friedrich et al. (2015) show that the cGA, due to its fine-grained probabilistic model, outperforms mutation-based EAs in noisy optimization.



6/16 Introduction

Research Questions

The best known runtime bound for cGA on OneMax is O(n^(1+ε)). (OneMax(x1, . . . , xn) = x1 + · · · + xn, but one can also think of minimizing the Hamming distance to some z ∈ {0, 1}^n.) There is a lack of good lower bounds for EDAs in general; nothing better than Ω(n/log n) (following from a general structural theory) is known! For comparison, Ω(n log n) is a natural lower bound for simple EAs on non-trivial functions. The simple (1+1) EA optimizes OneMax in expected Θ(n log n) time.

Open problem: What is the true runtime of cGA on OneMax? Can one come up with matching upper and lower bounds? Is less than n log n possible?



7/16 Introduction

Behavior on OneMax

How to set K on OneMax?

Trade-off
  • Want small K to learn the optimal frequency 1 − 1/n quickly.
  • But: the update decision is not based on the outcome of a single bit but on the accumulated OneMax value, so the process can also learn wrong values.
  • Say x = (0, 0, 1, 1, 1) and y = (1, 0, 0, 1, 0). Then x wins the tournament and p1 is decreased, although bit 1 is set correctly in y.
  • A frequency that has dropped to its minimum 1/n must be “unlearned” later, which leads to large runtimes.
  • Hence, K must be large enough to avoid learning wrong values.
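This drift toward wrong values is easy to observe in simulation. The sketch below (my own illustration; parameters chosen for speed, not taken from the talk) runs the cGA update on OneMax and counts how many frequencies sit at the lower border afterwards:

```python
import random

def frequencies_at_lower_border(n, K, steps, seed=0):
    """Run cGA on OneMax and count frequencies stuck at the border 1/n."""
    rng = random.Random(seed)
    p = [0.5] * n
    for _ in range(steps):
        x = [1 if rng.random() < q else 0 for q in p]
        y = [1 if rng.random() < q else 0 for q in p]
        if sum(x) < sum(y):
            x, y = y, x                      # x is the winner
        for i in range(n):
            if x[i] != y[i]:
                p[i] += 1.0 / K if x[i] > y[i] else -1.0 / K
            p[i] = min(max(p[i], 1.0 / n), 1.0 - 1.0 / n)
    return sum(1 for q in p if q <= 1.0 / n)
```

With a tiny K (say K = 2) many frequencies jump to the borders almost immediately, including the wrong one; with a large K the walk is too slow for any frequency to reach 1/n in the same number of steps.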



8/16 Introduction

What is the Optimal Choice of K?

Droste proves an expected runtime of O(K√n) for K = Ω(n^(1/2+ε)), which gives the claimed O(n^(1+ε)) for K = n^(1/2+ε). Still, the optimal K could be smaller than n^(1/2+ε). We cannot beat runtime Θ(n log n) if too many (n^ε, e.g., √n) frequencies reach their minimum 1/n. (A so-called coupon-collector effect is then triggered.) For example, if K = 2 (or any value independent of n), then Ω(n) frequencies will reach the minimum and we get Ω(n log n).



9/16 Introduction

The Stochastic Behavior of a Single Bit (1/2)

Look into a single bit (bit 1). We want to analyze ∆t = p1,t+1 − p1,t (the change of the allele frequency). It depends on the other bits; their difference Dt := x2 + · · · + xn − (y2 + · · · + yn) is crucial.

Two cases:

Case 1: bit 1 is not relevant for the selection of the updating offspring (|Dt| ≥ 2 or Dt = 1). (Diagram of the two sampled offspring omitted.) Then p1 can go up, down, or stay the same. Such a step is called an rw-step (random-walk step).



10/16 Introduction

The Stochastic Behavior of a Single Bit (2/2)

Case 2: bit 1 is relevant, since x and y have almost the same number of ones in the other bits (Dt = 0 or Dt = −1). (Diagram of the two sampled offspring omitted.) Then p1 cannot decrease, and it increases if bit 1 is sampled differently in the two offspring. Such a step is called a b-step (Bernoulli step).



11/16 Introduction

Formalization of Two Cases

1. In rw-steps, ∆t = Ft, where

    Ft = +1/K with probability pt(1 − pt),
         −1/K with probability pt(1 − pt),
         0 with the remaining probability.

2. In b-steps, ∆t = Bt, where

    Bt = +1/K with probability 2pt(1 − pt),
         0 with the remaining probability.
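These two step distributions are simple enough to write down explicitly. A small sketch (names are mine) that makes the key asymmetry computable: rw-steps are unbiased, while b-steps have expectation 2pt(1 − pt)/K:

```python
def rw_step_distribution(p, K):
    """Distribution of the frequency change F_t in an rw-step."""
    q = p * (1 - p)
    return {+1.0 / K: q, -1.0 / K: q, 0.0: 1 - 2 * q}

def b_step_distribution(p, K):
    """Distribution of the frequency change B_t in a b-step."""
    q = 2 * p * (1 - p)
    return {+1.0 / K: q, 0.0: 1 - q}

def mean(dist):
    """Expected value of a finite {value: probability} distribution."""
    return sum(v * pr for v, pr in dist.items())
```

The zero mean of Ft is exactly the “genetic drift” part of the process; the positive mean of Bt is the selection signal.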



12/16 Introduction

Change of Frequency as Superposition of Two Processes

Whether a step is an rw-step or a b-step for bit i is due to events external to the bit (and independent of it). Let Rt be the event that an rw-step occurs (i.e., Dt = 1 or |Dt| ≥ 2). We get the equality

    ∆t = Ft · Pr(Rt) + Bt · (1 − Pr(Rt)),

which we call the superposition. Can prove: Pr(Rt) is fairly large in the beginning, namely 1 − O(1/√n).



13/16 Introduction

Analyzing the Random Walk Steps

Random walk on {0, . . . , K} with absorbing barriers and movements in {−1, 0, +1}. It is fair (−1 as likely as +1) but loops on state i ∈ {1, . . . , K − 1} with probability 1 − 2i(K − i)/K².

(Chain diagram omitted: at the middle state K/2 the walk loops with probability 1/2 and moves 1/4 each way; at the border states it moves with probability 1/K each way and loops with probability 1 − 2/K.)

Starting at K/2, the expected time until absorption is at least Ω(K²). Upper bound? Any bets? O(K² log K)? Truth is Θ(K²). So K = C√n could be a good choice for large C? O(K√n) = O(Cn) b-steps suffice to reach the maximum frequency, and that can be less than C²n, the expected time for absorption by rw-steps.
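One way to check the Θ(K²) claim numerically (my own sketch, not from the talk): the expected absorption time Ei from state i satisfies Ei+1 − 2Ei + Ei−1 = −1/qi with qi = i(K − i)/K² and E0 = EK = 0, which can be summed via the Green's function of the path graph:

```python
def expected_absorption_time(K):
    """Expected absorption time of the lazy fair walk, started at K/2.

    Solves E[i+1] - 2E[i] + E[i-1] = -1/q_i with q_i = i(K-i)/K^2
    and E[0] = E[K] = 0, using the Green's function
    G(i, j) = min(i, j) * (K - max(i, j)) / K of the path graph.
    """
    mid = K // 2
    total = 0.0
    for j in range(1, K):
        green = j * (K - mid) / K if j <= mid else mid * (K - j) / K
        total += green * K**2 / (j * (K - j))
    return total
```

For moderate K the ratio expected_absorption_time(K) / K² settles near a constant below 1, consistent with Θ(K²) rather than Θ(K² log K).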



14/16 Introduction

Lack of Concentration

Unfortunately, the time T for absorption is NOT concentrated around Θ(K²) with high probability. Instead (by a non-trivial analysis):

    Pr(T ≤ αK²) ≥ (1/2 − o(1)) · (1/√(2/α) − 1/(2/α)^(3/2)) · (1/√(2π)) · e^(−1/α)

Consequence: K must be larger than √n to avoid reaching the minimum frequency at any bit. More precisely, K = Ω(√n log n). We get:

Conjecture: The expected optimization time of cGA on OneMax is Ω(n log n + K√n).
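The lack of concentration is also easy to see empirically. A Monte Carlo sketch of the lazy walk (parameters and names illustrative, my own):

```python
import random

def absorption_time(K, rng):
    """Simulate the lazy fair walk from K/2 until it hits 0 or K."""
    i, t = K // 2, 0
    while 0 < i < K:
        q = i * (K - i) / K**2           # probability of moving in each direction
        u = rng.random()
        if u < q:
            i -= 1
        elif u < 2 * q:
            i += 1
        t += 1
    return t

def fraction_absorbed_by(K, alpha, trials, seed=0):
    """Empirical estimate of Pr(T <= alpha * K^2)."""
    rng = random.Random(seed)
    return sum(absorption_time(K, rng) <= alpha * K**2
               for _ in range(trials)) / trials
```

Running this for several α shows a wide spread of absorption times around the mean: a non-negligible fraction of walks finishes far earlier than K², which is exactly why some frequency among n bits reaches the border unless K is enlarged by a log-factor.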



15/16 Introduction

Conclusions

Insights
  • We performed a runtime analysis of cGA on OneMax and observed a superposition of “genetic drift” and “real selection”.
  • The effect seems to occur in many different EDAs and maybe also in standard GAs.
  • A careful analysis of the stochastic process can give lower runtime bounds.
  • cGA is no more efficient than the (1+1) EA on OneMax: the (1+1) EA incurs the log n factor because there is a slow bit that takes a long time to flip; cGA incurs it because some random walks move too fast and drift in the wrong direction.



16/16 Introduction

Outlook

Questions
  • Are the results known in other (sub)communities? E.g., is the “superposition” known under a different name?
  • Where else are the techniques applicable? E.g., to standard GAs with stochastic selection?
  • Can we get upper bounds on the runtime? Anything beyond OneMax?
  • Is it reasonable to say that we are analyzing “genetic drift”?

Thank you!
