Natural Computing Lecture 9: Evolutionary Strategies




slide-1
SLIDE 1

Natural Computing

Michael Herrmann mherrman@inf.ed.ac.uk phone: 0131 6 517177 Informatics Forum 1.42

INFR09038 22/10/2010

Lecture 9: Evolutionary Strategies

slide-2
SLIDE 2

Evolutionary algorithms

Genetic algorithm
  • genotype (encoding): strings of binary or integer numbers
  • mutation/crossover: e.g. 1-point for either one, with pm, pc
  • phenotype (applied to): optimization or search of optimal solutions

Genetic programming
  • genotype (encoding): trees (can be represented as strings)
  • mutation/crossover: like GA plus additional operators
  • phenotype (applied to): computer programs for a computational problem

Evolutionary programming
  • genotype (encoding): real-valued parameter vector
  • mutation/crossover: mutation with self-adaptive rates
  • phenotype (applied to): parameters of a computer program with fixed structure

Evolution strategy
  • genotype (encoding): real-valued encoding
  • mutation/crossover: mutation with self-adaptive rates
  • phenotype (applied to): optimization or search of optimal solutions

slide-3
SLIDE 3

Characteristics Suggesting the Use of GP

1. Discovering the size and shape of the solution
2. Reusing substructures
3. Discovering a set of useful substructures
4. Discovering the nature of the hierarchical references among substructures
5. Passing parameters to a substructure
6. Discovering the type of substructures (e.g., subroutines, iterations, loops, recursions, or storage)
7. Discovering the number of arguments possessed by a substructure
8. Maintaining syntactic validity and locality by means of a developmental process
9. Discovering a general solution in the form of a parametrized topology containing free variables

slide-4
SLIDE 4

Fundamental differences between GP and other approaches to AI and ML

  • 1. Representation: Genetic programming overtly conducts its search for a solution to the given problem in program space.
  • 2. Role of point-to-point transformations in the search: Genetic programming does not conduct its search by transforming a single point in the search space into another single point, but instead transforms a set of points into another set of points.
  • 3. Role of hill climbing in the search: Genetic programming does not rely exclusively on greedy hill climbing to conduct its search, but instead allocates a certain number of trials, in a principled way, to choices that appear to be inferior at a given stage.
  • 4. Role of determinism in the search: Genetic programming conducts its search probabilistically.
  • 5. Role of an explicit knowledge base: None (perhaps for initialisation).
  • 6. Role of formal logic in the search: None (perhaps for editing).
  • 7. Underpinnings of the technique: Biologically inspired.
slide-5
SLIDE 5

Promising GP Application Areas

  • Problem areas involving many variables that are interrelated in highly non-linear ways
  • Inter-relationship of variables is not well understood
  • A good approximate solution is satisfactory
      • design, control, classification and pattern recognition, data mining, system identification and forecasting
  • Discovery of the size and shape of the solution is a major part of the problem
  • Areas where humans find it difficult to write programs
      • parallel computers, cellular automata, multi-agent strategies / distributed AI, FPGAs
  • "black art" problems
      • synthesis of topology and sizing of analog circuits, synthesis of topology and tuning of controllers, quantum computing circuits, synthesis of designs for antennas
  • Areas where you simply have no idea how to program a solution, but where the objective (fitness measure) is clear
  • Problem areas where large computerized databases are accumulating and computerized techniques are needed to analyze the data

slide-6
SLIDE 6

Open Questions/Research Areas

  • Scaling up to more complex problems and larger programs
  • Using large function and terminal sets.
  • How well do the evolved programs generalise?
  • How can we evolve nicer programs?
      • size, efficiency, correctness
  • What sort of problems is GP good at / not-so-good at?
  • Convergence, optimality etc.?
  • Relation to human-based evolutionary processes (e.g. wikipedia)

slide-7
SLIDE 7

Cross-Domain Features

  • Native representations are sufficient when working with genetic programming
  • Genetic programming breeds “simulatability” (Koza)
  • Genetic programming starts small and controls bloat
  • Genetic programming frequently exploits a simulator’s built-in assumption of reasonableness
  • Genetic programming engineers around existing patents and creates novel designs more frequently than it creates infringing solutions

slide-8
SLIDE 8

Overview

1. Introduction: History
2. The genetic code
3. The canonical genetic algorithm
4. Examples & Variants of GA
5. The schema theorem
6. The building block hypothesis
7. Hybrid algorithms
8. Multiobjective Optimization
9. Genetic Programming
10. Evolutionary strategies
11. Differential evolution
slide-9
SLIDE 9

Evolution strategies

  • Natural problem-dependent representation for search and optimisation (without “genetic” encoding)
  • Individuals are vectors of real numbers which describe current solutions of the problem
  • Recombination by exchange or averaging of components (but is sometimes not used)
  • Mutation in continuous steps with adaptation of the mutation rate to account for different scales and correlations of the components
  • Selection by fitness from various parent sets
  • Elitism, islands, adaptation of parameters

1964: Ingo Rechenberg; Hans-Paul Schwefel
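
The two recombination variants named on this slide (exchange of components and averaging of components) can be sketched as follows. This is a minimal illustration rather than the lecture's own code: the function names and the 50/50 choice between parents are assumptions.

```python
import random

def discrete_recombination(parent_a, parent_b, rng):
    """Exchange of components: each component is copied from one parent at random."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]

def intermediate_recombination(parent_a, parent_b):
    """Averaging of components: the child is the midpoint of the two parents."""
    return [0.5 * (a + b) for a, b in zip(parent_a, parent_b)]

rng = random.Random(0)                      # fixed seed only for reproducibility
a = [0.0, 2.0, 4.0]
b = [2.0, 4.0, 8.0]
child_exchange = discrete_recombination(a, b, rng)
child_average = intermediate_recombination(a, b)
```

Exchange keeps every component at a value that already exists in a parent, while averaging produces new values between the parents.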

slide-10
SLIDE 10

Multidimensional Mutations in ES

A.E. Eiben and J.E. Smith, Introduction to Evolutionary Computing. Evolution Strategies

  • Uncorrelated mutations
  • Uncorrelated mutation (scaled)
  • Correlated mutations

Generation of offspring: y = x + N(0, C')
  • x stands for the vector (x1, …, xn) describing a parent
  • C' is the covariance matrix C after mutation of the σ values, where
      • C = diag(σ, ..., σ) for uncorrelated mutations,
      • C = diag(σ1, ..., σn) for scaled axes, or
      • C = (Cij) for correlated mutations
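
A sketch of the three mutation types, under stated assumptions. The slide says only that the σ values are mutated before the offspring is generated. The log-normal update and the learning rates used here (proportional to 1/√n, 1/√(2n) and 1/√(2√n)) follow common ES practice and are not taken from the slide. Each σ is treated as a standard deviation, so the covariance matrix has σ² on its diagonal. The correlated case simply samples from a given covariance matrix C and leaves its adaptation out.

```python
import numpy as np

def mutate_uncorrelated(x, sigma, rng):
    """One step size shared by all components: C = sigma^2 * I."""
    n = x.size
    tau = 1.0 / np.sqrt(n)                                   # assumed learning rate
    sigma_new = sigma * np.exp(tau * rng.standard_normal())  # mutate sigma first
    return x + sigma_new * rng.standard_normal(n), sigma_new

def mutate_scaled(x, sigmas, rng):
    """One step size per axis: C = diag(sigma_1^2, ..., sigma_n^2)."""
    n = x.size
    tau_global = 1.0 / np.sqrt(2.0 * n)                      # assumed learning rates
    tau_local = 1.0 / np.sqrt(2.0 * np.sqrt(n))
    sigmas_new = sigmas * np.exp(tau_global * rng.standard_normal()
                                 + tau_local * rng.standard_normal(n))
    return x + sigmas_new * rng.standard_normal(n), sigmas_new

def mutate_correlated(x, C, rng):
    """Full covariance matrix C = (C_ij); C itself is not adapted in this sketch."""
    return x + rng.multivariate_normal(np.zeros(x.size), C)

rng = np.random.default_rng(0)
parent = np.zeros(4)
child_1, sigma_1 = mutate_uncorrelated(parent, 1.0, rng)
child_n, sigmas_n = mutate_scaled(parent, np.ones(4), rng)
child_c = mutate_correlated(parent, np.eye(4), rng)
```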

slide-11
SLIDE 11

Multidimensional Mutations in ES

  • Offspring vectors: xi := m + zi, zi ~ N(0, C)
  • Select λ best, i.e. (1,λ)-ES
  • Correlations among successful offspring: Z := 1/λ Σ zi zi^T
  • Update correlations: C := (1-ε) C + ε Z
  • New state vector: m := m + 1/λ Σ zi (smooths fitness fluctuations); or: m = best
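
The update rules on this slide can be put together in a short loop. This is a rough sketch, not the lecture's implementation. The function name, the default values and the use of a larger sample from which the best are kept are assumptions: `n_offspring` vectors are drawn, and `n_selected` plays the role of the λ in the averages on the slide.

```python
import numpy as np

def es_with_covariance_update(f, m, n_offspring=40, n_selected=10,
                              eps=0.2, n_iter=60, seed=0):
    """Minimise f by sampling around m and adapting the covariance matrix C."""
    rng = np.random.default_rng(seed)
    m = np.asarray(m, dtype=float)
    C = np.eye(m.size)
    for _ in range(n_iter):
        z = rng.multivariate_normal(np.zeros(m.size), C, size=n_offspring)
        x = m + z                                   # x_i := m + z_i, z_i ~ N(0, C)
        cost = np.array([f(xi) for xi in x])
        z_best = z[np.argsort(cost)[:n_selected]]   # steps of the best offspring
        Z = z_best.T @ z_best / n_selected          # Z := mean of z_i z_i^T
        C = (1.0 - eps) * C + eps * Z               # C := (1 - eps) C + eps Z
        m = m + z_best.mean(axis=0)                 # m := m + mean of z_i
    return m, C

def sphere(x):
    return float(np.sum(x ** 2))

start = np.array([3.0, 3.0])
m_final, C_final = es_with_covariance_update(sphere, start)
```

Because Z is built from the successful steps only, C stretches along directions in which progress was made and shrinks once the samples surround the optimum.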

slide-12
SLIDE 12

Evolution strategies

  • (μ, λ): selection from a set of λ children
  • (μ + λ): selection from a set of μ parents and λ children
  • (μ',λ'(μ,λ)^γ): isolate the children for γ generations, where each time λ children are created (total population is λλ'). Then the best subpopulation is selected and becomes the parents (e.g. λ=μ') for the new cycle of γ generations
  • Analogous: (μ'+λ'(μ,λ)^γ), (μ'+λ'(μ+λ)^γ), (μ',λ'(μ+λ)^γ)
  • Heuristic 1/5 rule: if fewer than 1/5 of the children are better than their parents, then decrease the size of mutations
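
A small sketch of the two basic selection schemes and of the 1/5 rule, assuming minimisation. The function names and the factor 0.85 are illustrative assumptions. The slide states only the "decrease" half of the rule, so the "increase" branch is the usual counterpart added here as an assumption.

```python
def select(parents, children, fitness, mu, plus=False):
    """(mu, lambda) selection by default; (mu + lambda) selection if plus=True."""
    pool = list(children) + (list(parents) if plus else [])
    return sorted(pool, key=fitness)[:mu]          # keep the mu best (lowest cost)

def one_fifth_rule(sigma, success_rate, factor=0.85):
    """Adapt the mutation size from the fraction of children that beat their parents."""
    if success_rate < 0.2:
        return sigma * factor      # too few successes: smaller mutations
    if success_rate > 0.2:
        return sigma / factor      # assumed counterpart: larger mutations
    return sigma

cost = lambda x: x[0] ** 2
parents = [[0.0]]
children = [[1.0], [2.0]]
comma_survivors = select(parents, children, cost, mu=1)             # parent is discarded
plus_survivors = select(parents, children, cost, mu=1, plus=True)   # parent may survive
```

In the comma scheme a good parent is lost after one generation, whereas the plus scheme is elitist.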

slide-13
SLIDE 13

Nested Evolution Strategy

http://www.bionik.tu-berlin.de/intseit2/xs2mulmo.html

  • Hills are not independently distributed (hills of hills)
  • Find a local maximum as a start state
  • Generate 3 offspring populations (founder populations) that then evolve in isolation
  • Local hill-climbing (if convergent: increase diversity of offspring populations)
  • Select only highest population
  • Walking process from peak to peak within an “ordered hill scenery” named Meta-Evolution
  • Takes the role of crossover in GA

slide-14
SLIDE 14

ES: Conclusion

  • A class of metaheuristic search algorithms
  • Adaptive parameters important
  • Relations to Gaussian adaptation
  • Advanced ESs compare favourably to other metaheuristic algorithms (see www.lri.fr/~Hansen)
  • Diversity of the population of solutions needs to be specifically considered
  • See also www.scholarpedia.org/article/Evolution_strategies

slide-15
SLIDE 15

Rainer Storn & Kenneth Price (1997) Differential Evolution – A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. Journal of Global Optimization 11: 341–359.

slide-16
SLIDE 16
slide-17
SLIDE 17

Differential Evolution

slide-18
SLIDE 18
slide-19
SLIDE 19

DE: Details

  • Properties
      • Simple, very fast
      • Reasonably good results
      • Diversity increases in flat regions (divergence property)
  • Parameters
      • NP = 5D (4 … 10D)
      • CR = 0.1 (0 … 1.0)
      • F = 0.5 (0.4 … 1.0)
      • A proof exists that effectiveness requires $F \geq F_{\mathrm{crit}} = \sqrt{\frac{1 - CR/2}{NP}}$
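
To show where the parameters NP, CR and F enter, here is a minimal sketch of the classic DE/rand/1/bin scheme from Storn & Price (1997). It assumes minimisation and does no bound handling after initialisation. The function name, the number of generations and the test function are illustrative choices, not taken from the slides.

```python
import random

def differential_evolution(f, bounds, NP=None, CR=0.1, F=0.5,
                           generations=200, seed=0):
    """Minimise f over a box; returns the best vector found and its cost."""
    rng = random.Random(seed)
    D = len(bounds)
    NP = NP if NP is not None else 5 * D            # slide default: NP = 5D
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(NP)]
    cost = [f(x) for x in pop]
    for _ in range(generations):
        new_pop, new_cost = list(pop), list(cost)
        for i in range(NP):
            # three mutually distinct members, all different from i
            r1, r2, r3 = rng.sample([k for k in range(NP) if k != i], 3)
            j_forced = rng.randrange(D)             # at least one mutated coordinate
            trial = list(pop[i])
            for j in range(D):
                if j == j_forced or rng.random() < CR:
                    # difference-vector mutation, scaled by F
                    trial[j] = pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
            trial_cost = f(trial)
            if trial_cost <= cost[i]:               # trial replaces its own parent only
                new_pop[i], new_cost[i] = trial, trial_cost
        pop, cost = new_pop, new_cost
    best = min(range(NP), key=cost.__getitem__)
    return pop[best], cost[best]

def sphere(x):
    return sum(v * v for v in x)

best_x, best_cost = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```

The step size is never set explicitly: it comes from the differences between population members, so it shrinks automatically as the population contracts.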

slide-20
SLIDE 20

Search in Differential Evolution

Rainer Storn (2008) Differential Evolution Research – Trends and Open Questions. Chapter 1 of Uday K. Chakraborty: Advances in Differential Evolution

slide-21
SLIDE 21

Objective function used here:

slide-22
SLIDE 22
slide-23
SLIDE 23
slide-24
SLIDE 24

DE with Crossover

slide-25
SLIDE 25

Invariant representations

  • Crossover depends on the coordinate directions and is thus not rotationally invariant
  • Using randomly rotated coordinate systems the search becomes isotropic
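
One possible reading of the second point, sketched under assumptions: draw a random orthogonal matrix Q, express the target and mutant vectors in that basis, apply the usual coordinate-wise crossover there, and rotate back. The slide does not say how the rotation is generated. The QR-based construction and the function names are illustrative, and Q may include a reflection.

```python
import numpy as np

def random_orthogonal(D, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((D, D)))
    return Q * np.sign(np.diag(R))          # sign fix makes the distribution uniform

def rotated_crossover(target, mutant, CR, rng):
    """Binomial crossover carried out in a randomly rotated coordinate system."""
    D = target.size
    Q = random_orthogonal(D, rng)
    t_rot, m_rot = Q.T @ target, Q.T @ mutant      # coordinates in the rotated basis
    take_mutant = rng.random(D) < CR
    take_mutant[rng.integers(D)] = True            # at least one mutant coordinate
    return Q @ np.where(take_mutant, m_rot, t_rot) # rotate the trial vector back

rng = np.random.default_rng(0)
target = np.array([1.0, 2.0, 3.0])
mutant = np.array([0.0, -1.0, 4.0])
trial_all = rotated_crossover(target, mutant, 1.0, rng)   # CR = 1: trial equals mutant
trial_one = rotated_crossover(target, mutant, 0.0, rng)   # CR = 0: one rotated axis only
```

With CR = 0 the trial differs from the target along a single, randomly oriented direction instead of along a fixed coordinate axis.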

slide-26
SLIDE 26
slide-27
SLIDE 27

DE with Jitter

choose for each vector i and for each coordinate j a different random increment, e.g.:

slide-28
SLIDE 28
slide-29
SLIDE 29

DE: Variants

  • Mutability and threshold parameters can also be evolved for each individual (as the step sizes in ES), i.e. dimension becomes D+2.
  • Scheme for denoting DE variants: e.g. best/2
  • Also a number of self-adapting variants exist, cf. [Storn, 08]

slide-30
SLIDE 30

Meta-Heuristic Search

  • μετα "beyond", ευρισκειν "to find"
  • applied mainly to combinatorial optimization
  • The user has to modify the algorithm to a greater or lesser extent in order to adapt it to a specific problem
  • These algorithms seem to defy the no-free-lunch (NFL) theorem due to the combination of
      − biased choice of problems
      − user-generated modifications
  • Can often be outperformed by a problem-dependent heuristic

slide-31
SLIDE 31

The General Scheme

1. Use populations of solutions/trials/individuals
2. Transfer information in the population from the best individuals to others by selection+crossover/attraction
3. Maintain diversity by adding noise/mutations/intrinsic dynamics/amplifying differences
4. Avoid local minima (leapfrog/crossover/more noise/subpopulations/border of instability/checking success, random insertions)
5. Whenever possible, use building blocks/partial solutions/royal road functions
6. Store good solutions in memory as best-so-far/iteration best/individual best/elite/pheromones
7. Use domain knowledge and intuition for encoding, initialization, termination, choice of the algorithm
8. Tweak the parameters, develop your own variants

slide-32
SLIDE 32

“Banal Metaheuristic”

*** in three easy steps ***

  • 1. Call the user-provided state generator.
  • 2. Print the resulting state.
  • 3. Stop.

Given any two distinct metaheuristics M and N, and almost any goal function f, it is usually possible to write a set of auxiliary procedures that will make M find the optimum much more efficiently than N, by many orders of magnitude; or vice versa. In fact, since the auxiliary procedures are usually unrestricted, one can submit the basic step of metaheuristic M as the generator or mutator for N.

en.wikipedia.org/wiki/Metaheuristic
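
The point of the "banal metaheuristic" can be made concrete in a few lines: all the work is hidden in the user-provided generator. In this sketch the generator is a hypothetical integer hill-climber, and the state is returned as well as printed so that the result can be checked.

```python
def banal_metaheuristic(generate_state):
    state = generate_state()    # 1. call the user-provided state generator
    print(state)                # 2. print the resulting state
    return state                # 3. stop

def hill_climbing_generator(f=lambda x: (x - 3) ** 2, start=0):
    """Hypothetical 'auxiliary procedure' that does all of the actual searching."""
    x = start
    while True:
        best_neighbour = min((x - 1, x + 1), key=f)
        if f(best_neighbour) >= f(x):
            return x            # no neighbour improves: local optimum reached
        x = best_neighbour

result = banal_metaheuristic(hill_climbing_generator)
```

Comparing two metaheuristics is therefore only meaningful if the auxiliary procedures they are given are comparable.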

slide-33
SLIDE 33

Contra

  • The no-free-lunch theorem implies that there must be some implicit assumptions that single out “good” problems (one such assumption is the correlation between goal function values at nearby candidate solutions)
  • If these assumptions were made explicit, more specific algorithms could be designed
  • Random search often seems to be the essential component
  • The quality of an ME algorithm is not well-defined because user-provided domain knowledge enters
  • There are many “classical” problems which are fully understood and where ME algorithms perform comparatively poorly. (LS is usually not state of the art)
  • Dilettantism: A few hours of reading, thinking and programming can easily save months of computer time used up by ME

en.wikipedia.org/wiki/Metaheuristic

slide-34
SLIDE 34

Pro

  • If you know a better solution, then why use ME? But if not, then why not?
  • It's not just random search
  • There are a number of applications where ME are performing reasonably well
  • Theoretical expertise, problem analysis, modeling and implementation are cost factors in real-world problems
  • There are domains where modeling is questionable, but the combination of existing solutions is possible (minority games, e.g. esthetic design, financial markets)
  • Nature is an important source of inspiration
  • It may help to understand decision making in nature and society

slide-35
SLIDE 35

Ecological niches for MH algorithms

PSO Mini Tutorial on Particle Swarm Optimisation (2004) Maurice.Clerc@WriteMe.com

slide-36
SLIDE 36

Some of the dimensions of the problems space

slide-37
SLIDE 37

Some References

  • Michalewicz, Z. (1996). Springer-Verlag, ISBN 3-540-60676-9, Chapter 8 Evolutionary Programming: Programs that write programs
  • Fogel, D. B. (1994) IEEE Transactions on Neural Networks 5:1, 3-14.
  • Fogel, D.B. (1998) Evolutionary Computation: The Fossil Record, IEEE Press, ISBN 0-7803-3481-7, 3-14
  • Michalewicz, Z. and Fogel, D. (2000). How to Solve It: Modern Heuristics. Springer, ISBN 3-540-66061-5