CSE 460 Evolutionary Methods
In this section we will look at evolutionary methods
- Genetic Algorithms
- Evolution Strategies
- Genetic Programming
- Constraints in Evolutionary Methods
References
http://www.cs.bham.ac.uk/Mirrors/ftp.de.uu.net/EC/clife/
ftp://alife.santafe.edu/pub/USER-AREA/EC/GA/papers/over93.ps.gz ftp://alife.santafe.edu/pub/USER-AREA/EC/GA/papers/over93-2.ps.gz
Z. Michalewicz and D.B. Fogel. “How to Solve It: Modern Heuristics”. Springer-Verlag, 1999.
C. Reeves and J.E. Rowe. “Genetic Algorithms”. Kluwer, 2003.
J.H. Holland. “Adaptation in Natural and Artificial Systems”. University of Michigan Press, Ann Arbor/MI, 1975.
D.B. Fogel (Ed.). “Evolutionary Computation: The Fossil Record”. IEEE Press, Piscataway/NJ, 1998.
Optimization Methods
- single candidate: deterministic (tabu search), stochastic (simulated annealing)
- multiple candidates (population): stochastic + competition (evolutionary algorithms)
Basic Idea: Average population fitness increases due to “Survival of the Fittest”
Example: Color Matching. Task: find the color code for a given color.
Candidates: represented as an 8-bit RGB color ci = (Red, Green, Blue) — the genotype — with integers Red, Green, Blue such that 0 <= Red, Green, Blue <= 255.
Evaluation: display the corresponding RGB color — the phenotype.
Selection: pick the n current candidates that are perceived as the colors most similar to the target color.
Generation: slightly vary the values in the selected candidates.
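The steps above can be sketched as a tiny EA. This is a minimal illustration, not the slide's exact procedure: the target color, population size, number of survivors and mutation step below are all illustrative assumptions, and perceived similarity is approximated by squared RGB distance.

```python
import random

random.seed(0)
TARGET = (180, 60, 200)  # hypothetical target color (assumption)

def fitness(c):
    # negative squared distance to the target color: higher is better
    return -sum((a - b) ** 2 for a, b in zip(c, TARGET))

def mutate(c, step=16):
    # slightly vary one randomly chosen channel, clamped to 0..255
    c = list(c)
    i = random.randrange(3)
    c[i] = max(0, min(255, c[i] + random.randint(-step, step)))
    return tuple(c)

def evolve(pop_size=20, keep=5, generations=300):
    # random initial population of 8-bit RGB genotypes
    pop = [tuple(random.randint(0, 255) for _ in range(3)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:keep]  # selection: the candidates closest to the target
        # generation: fill the population with slightly varied copies
        pop = parents + [mutate(random.choice(parents)) for _ in range(pop_size - keep)]
    return max(pop, key=fitness)

best = evolve()
```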
Fitness can be evaluated in various ways
e.g. evolution of neural networks
Sometimes fitness can only be judged against another (also evolving) agent who counters the first agent’s actions. In these cases we can evolve two inter-dependent populations of agents together and put them in direct competition.
Example: Design an optimal “call rate plan” for a mobile phone provider. Customer reactions depend on the competitors’ available call plans. Solution: evolve call plans for all companies simultaneously (develop a plan based on projected opponents’ strategies).
Mutation on green: add a random variable to the Green channel.
Representation: binary chromosomes? Unsuitable! Use a list of cities instead.
Initial Population: generate random permutations of a list of all cities, e.g. (a b c d e f g h).
1 2 3 4 5 6 7 8 → 1 2 3 4 7 6 5 8 (a 2-exchange: the segment between two cut points is reversed)
[figure: a crossed tour with crossing edges e, f replaced by uncrossed edges a, b, c, d] Since a + c < e + f and b + d < e + f: “an uncrossed path is always shorter than a crossed path”.
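The 2-exchange move can be sketched as a segment reversal. A minimal illustration on a toy instance: the four points (corners of a unit square) and the starting tour are assumptions, chosen so that the initial tour crosses itself.

```python
import math

def tour_length(tour, pts):
    # total length of the closed tour visiting pts in the given order
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_exchange(tour, i, k):
    # reverse the segment tour[i..k]: removes two edges and re-adds
    # two uncrossed ones
    return tour[:i] + tour[i:k + 1][::-1] + tour[k + 1:]

# four corners of a unit square; the tour 0-3-1-2 crosses itself
pts = [(0, 0), (1, 0), (0, 1), (1, 1)]
crossed = [0, 3, 1, 2]
uncrossed = two_exchange(crossed, 1, 2)  # reverses the middle segment
```

Here `uncrossed` is the perimeter tour, which is strictly shorter than the crossed one, illustrating the quoted rule.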
[garbled benchmark objective: a sum over i = 1 … n of terms in xi minus a product over i = 1 … n of cos terms in xi]
Representation: array of floats x1 x2 x3 x4 ... xn. Offspring: add a (Gaussian) random variable to each component (standard deviation σ, mean m).
p(x) = 1/(σ√(2π)) · exp(−(x − m)² / (2σ²))
If m = 0, the offspring will (statistically) be similar to the parent, provided the standard deviation is not too high. A high standard deviation results in “random search”. The Gaussian distribution will (theoretically) explore every value of the search space (after infinite time). Fitness: value of the objective − penalties (if applicable).
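A minimal sketch of Gaussian mutation in a simple (1+1)-style loop: the offspring adds N(m, σ) to each component and replaces the parent only if it is better. The objective (sum of squares), σ and the starting point are toy assumptions, not from the slides.

```python
import random

random.seed(1)

def mutate(x, sigma=0.3, m=0.0):
    # offspring: add an independent Gaussian N(m, sigma) to each component
    return [xi + random.gauss(m, sigma) for xi in x]

def objective(x):
    # toy objective (assumption): minimize the sum of squares
    return sum(xi * xi for xi in x)

start = [5.0, -3.0, 2.0]
parent = start
for _ in range(500):
    child = mutate(parent)
    if objective(child) < objective(parent):  # keep the better of the two
        parent = child
```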
Design decisions:
- Representation
- Evaluation (fitness)
- Variation (mutation, crossover)
- Selection
- Initialization
- Population size
Is there a set of problem-independent design decisions that performs above average? No! Statistically (on average), all non-resampling search procedures perform identically to perfect random sampling. This is the main theorem of D.H. Wolpert and W.G. Macready in “No Free Lunch Theorems for Optimization”. IEEE Transactions on Evolutionary Computation, Vol. 1, No. 1, 1997, pp. 67-82. The same holds for the choice of representation: for a given representation, we can always find variation operators etc. that mimic any other choice. Positive consequence: we are free to choose any conceptually reasonable representation.
Fixed-Length Symbol Vectors
suitable for
n1 n2 n3 n4 ... nm means: take n1 units of item 1, n2 units of item 2, ...
a1 a2 a3 a4 ... an means: the approximation is the function with coefficients a1 … an (e.g. Σ i=1..n ai·xⁱ)
Permutations
suitable for
c1 c2 c3 c4 ... cn means: first visit city c1, then c2, then ... t1 t2 t3 t4 ... tn means: first perform task t1, then t2, then ...
State Automata
[figure: a Mealy machine with states 1, 2, 3 and edge labels such as a/0, a/1, b/0, b/1, c/1; the chromosome is its transition table, one row (next state, output) per (state, input) pair]
Symbolic Expressions (Tree Structures)
Examples: expression trees over operators such as sin and cos, with variables and constants at the leaves.
On Fixed-Length Symbol Vectors
Mutation:
- flipping (for binary strings)
- Gaussian perturbation (real-valued strings)
- adding/subtracting a random integer (integer-valued strings)
- changing an element (for a fixed alphabet)
Crossover:
- n-point crossover (cut at n positions and shuffle the substrings)
- arithmetic means
2-point crossover
Arithmetic crossover combines x = (x1 x2 x3 x4 ... xn) and y = (y1 y2 y3 y4 ... yn) into z = (z1 z2 z3 z4 ... zn) with zi = (xi + yi)/2.
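Both crossover families can be sketched in a few lines. The example vectors and cut positions are assumptions chosen to make the effect visible; the n-point variant alternates parental substrings between the cuts, and the arithmetic variant takes the componentwise mean.

```python
def n_point_crossover(x, y, cuts):
    # cut both parents at the given positions and alternate the substrings
    child1, child2 = [], []
    prev, swap = 0, False
    for c in list(cuts) + [len(x)]:
        (child1 if not swap else child2).extend(x[prev:c])
        (child2 if not swap else child1).extend(y[prev:c])
        swap = not swap
        prev = c
    return child1, child2

def arithmetic_crossover(x, y):
    # componentwise mean: z_i = (x_i + y_i) / 2
    return [(a + b) / 2 for a, b in zip(x, y)]

# 2-point crossover: cuts at positions 2 and 4
c1, c2 = n_point_crossover([1, 1, 1, 1, 1, 1], [2, 2, 2, 2, 2, 2], [2, 4])
z = arithmetic_crossover([0.0, 2.0], [2.0, 4.0])
```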
On Permutations
- swap two elements
- reverse the order between two cut points (e.g. 2-exchange)
- analogues to k-exchange (k-opt)
Simple cut & splice generates duplicates, so duplicate-free recombination methods are needed. Method 1 puts emphasis on maintaining subsequences.
Method 1: [1 3 5 6 2 4 7 8] & [3 6 2 1 5 4 8 7] ⇒ [5 6 2 3 1 4 8 7]
Method 2: [1 3 5 6 2 4 7 8] & [3 6 2 1 5 4 8 7] ⇒ [1 6 8 3 2 5 4 7]
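The slides do not spell out the two methods in detail. As one representative duplicate-free operator, here is a sketch of order crossover (OX), which — like a subsequence-preserving method — copies a segment of the first parent and fills the rest in the order of the second parent; the cut positions below are assumptions.

```python
def order_crossover(p1, p2, i, k):
    # copy p1[i:k] into the child, then fill the remaining positions with
    # the missing elements in the order they appear in p2
    middle = p1[i:k]
    rest = [g for g in p2 if g not in middle]
    return rest[:i] + middle + rest[i:]

child = order_crossover([1, 3, 5, 6, 2, 4, 7, 8],
                        [3, 6, 2, 1, 5, 4, 8, 7], 2, 5)
```

The child is always a valid permutation: every city appears exactly once, so no repair step is needed.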
On State Machines
- add a state (with new random transitions)
- delete a state
- change an output symbol for a transition
- change a follow-up state for a transition
(the above requires an exhaustive transition table)
Normal crossover is hardly useful. Democratic majority vote: take n parent machines; for each (state, input) pair (s, x), determine the follow-up state s’ and output x’ by majority vote among the parents; in the offspring, replace the entry for (s, x) with (s’, x’).
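The majority vote can be sketched directly on transition tables represented as dictionaries; the three toy parent machines below (one state symbol, states 1 and 2) are assumptions.

```python
from collections import Counter

def majority_vote_crossover(parents):
    # parents: transition tables mapping (state, input) -> (next_state, output);
    # the offspring takes, for each (state, input) pair, the most common entry
    child = {}
    for key in parents[0]:
        votes = Counter(p[key] for p in parents)
        child[key] = votes.most_common(1)[0][0]
    return child

p1 = {(1, 'a'): (2, 0), (2, 'a'): (1, 1)}
p2 = {(1, 'a'): (2, 0), (2, 'a'): (2, 0)}
p3 = {(1, 'a'): (1, 1), (2, 'a'): (2, 0)}
child = majority_vote_crossover([p1, p2, p3])
```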
On Symbolic Expressions
- perturb: change a constant
- truncate: replace a subtree by a leaf (constant)
- grow: insert a random subtree
- change an operator (in an inner node)
- swap: swap two subtrees
Crossover: swap subtrees between different trees (only useful for very specific problems; however, standard in genetic programming).
[figure: subtree crossover — two parent trees built from sin, cos and tan exchange subtrees]
(µ, λ): generate λ offspring from µ parents by variation (crossover, mutation). Among the λ offspring, choose the µ best individuals as the new population.
(µ+λ): generate λ offspring from µ parents by variation (crossover, mutation). Among the µ parents and λ offspring, choose the µ best individuals.
Selection:
- deterministic: (µ, λ), (µ+λ)
- stochastic: proportional, tournament, iterated ranked
Proportional: evaluate the fitness F(i) for each individual i; to pick n offspring, pick individual i with probability p(i) = F(i) / Σj F(j).
Tournament: select n individuals at random; evaluate and keep the fittest individual. Repeat until sufficiently many individuals are selected for the next generation.
Iterated ranked: select n individuals at random; rank each individual of the original population against these; keep the k best.
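Proportional and tournament selection can be sketched with the standard library. The toy population (integers 1..10 with fitness equal to the value) and the sample sizes are assumptions; with either operator, the mean fitness of the selected individuals exceeds the population mean of 5.5 — the selection pressure the slides describe.

```python
import random

random.seed(0)

def proportional_selection(pop, fitness, n):
    # pick n individuals, each with probability F(i) / sum_j F(j)
    weights = [fitness(i) for i in pop]
    return random.choices(pop, weights=weights, k=n)

def tournament_selection(pop, fitness, n, size=3):
    # repeatedly draw `size` individuals at random and keep the fittest
    return [max(random.sample(pop, size), key=fitness) for _ in range(n)]

pop = list(range(1, 11))      # toy individuals (assumption): integers 1..10
fit = lambda i: float(i)      # fitness = the value itself
sel_p = proportional_selection(pop, fit, 100)
sel_t = tournament_selection(pop, fit, 100)
```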
Good candidates often have a common structure (called a schema). Consider e.g. a SAT problem where all solutions have x2 = x3 = 0, i.e. they adhere to the schema (*, 0, 0, *, *, ...).
For proportional selection and an adequate fitness function it is easy to show that the number of good (bad) schemata in the population increases (decreases) exponentially:
Let m(S, t) be the proportion of individuals adhering to schema S in generation t. Assume that the fitness f(S) is dominated by (depends only on) the schema. Since the probability to be picked is proportional to fitness, we have
m(S, t+1) = m(S, t) · n · f(S) / Σi f(i)
Assuming (as a simplification) that f(S) is above the average by a constant factor,
f(S) = (1 + c) · (Σi f(i)) / n, with 0 < c,
then m(S, t) = (1 + c)ᵗ · m(S, 0).
In reality things are not as simple; the theory of “building blocks” is much more complex.
EA approaches to TSP are easily outperformed (in runtime and quality) by good heuristics.
“Hybridization” with local search is a possible remedy. The population now consists only of local optima! Disadvantage: expensive computation Successful for TSP (n=500), but still not competitive enough.
Lamarckian evolution:
° learning interacts with evolution, i.e. changes the genetic information; combining local search with an EA has the same effect ⇒ repairing is essentially a local search method
° 5% rule (experimentally observed, theoretically questionable): “best results when repairing and replacing 5% of the infeasible individuals”
° can get stuck in local optima
Baldwin effect:
° changes the fitness landscape without changing the population
° often converges more slowly than Lamarckian evolution
Baldwin effect for local optimizer
n-th incest law: prohibit crossover (“mating”) with ancestors up to n generations back. Note the close relation to tabu lists in tabu search. In EAs (as in nature) incest laws serve to increase and guarantee the diversity of the population’s gene pool. In TSP experiments this proved beneficial and improved the performance of TSP EAs (at least one record solution for E-96). However, it is important to keep in mind that above a certain mutation rate incest laws appear to decrease performance!
[figure: performance over mutation rate 0.01–0.10 for the 0-th, 1-st, 2-nd and 4-th incest laws]
ES do not only evolve the chromosomes, but also the evolution parameters. Chromosomes are real-valued. Instead of a simple gene x1 x2 x3 x4 ... xn we use a gene that is augmented with so-called strategy parameters si:
x1 s1 x2 s2 ... xn sn
(chromosome values xi interleaved with strategy parameters si)
Mutation:
xi ⇒ xi + N(si), where N(σ) is a Gaussian random variable with mean 0 and standard deviation σ
si ⇒ α · si if random[0,1) ≥ 0.5
si ⇒ α⁻¹ · si if random[0,1) < 0.5
Recombination:
Discrete (= exchange of chromosomes and strategy parameters):
x & x’ ⇒ y with yj = xj if random[0,1) ≥ 0.5, and yj = x’j if random[0,1) < 0.5
Intermediate (= interpolation):
x & x’ ⇒ y with yj = ½ · (xj + x’j)
The idea to use trees (operator trees) as representations and to use decoders can be combined into a new form of EA called “genetic programming”.
Idea: evolve a program that solves a class of problems.
Program representation: usually based on Lisp-like languages, since operator trees can be used.
Fitness evaluation: by simulation (running the program).
J.R. Koza. “Genetic Programming: On the Programming of Computers by Means of Natural Selection”. MIT Press, Cambridge/MA, 1992.
essentially as in variation for symbolic expressions (with the same complications here regarding syntactical correctness & types)
[figure: boolean expression trees combining comparisons such as X < 5.7 and X > 2.5 with AND/OR]
Evolving an n-bit parity function is one of the earliest examples in the GP literature (and one of the most over-used). 3-bit even parity:
Typical programmer solution:
(lambda (x y z) (or (and X Y (not Z)) (and X (not Y) Z) (and (not X) Y Z)))
Reported GP solution (population size 4000, 5 generations):
(and (or (or X (nor y z)) z) (and (nand (nor (nor x z) (and (and (and (and y y y) y) y)) y)) (nand (or (and x y) z) x)) (or (nand (and x z) (or (nor x (or z x) y)) (nand (nand y (nand x y)) z))))
Treatment of Constraint Problems:
Searching only the feasible region: smaller search area, but requires bigger jumps. Searching infeasible points too: larger area and smoother paths — but can feasible points be generated efficiently?
° Basic fitness function for feasible individuals?
° Comparison of infeasible individuals? (what is “less infeasible”?)
° Relation between feasible and infeasible individuals?
° Eliminate infeasible individuals?
° Repair infeasible individuals?
° Replace infeasible individuals with the repaired version, vs. evaluate the repaired version but keep the infeasible individual in the population?
° Concentrate on the critical boundary (the border between F and U)?
°Start with fully feasible population?
°Maintain feasibility with variation operators?
°Use decoders? (transforms structure of search space)
Death Penalty = Infinite Penalty (remove all infeasible individuals in every step)
As usual we can employ a penalty formulation: eval(i) := f(i) + penalty(i). However, determining the penalty functions is often difficult. Consider the task of finding an obstacle-free path.
Straight path: ° shorter ° cuts only across one block ° cuts far into the block
Bent path: ° longer ° cuts across two blocks ° cuts close to the block corners
⇒ unclear which is “less infeasible”, i.e. closer to a good feasible solution
Infeasible path: ° only one bend ° shorter
Feasible path: ° too many bends ° unnecessarily long
⇒ unclear which is closer to a good feasible solution. Crossover between the feasible and the infeasible path would generate a reasonable solution — an indication not to remove infeasible individuals?
Penalties:
- static: fixed levels of penalties; penalties by degree of violation
- dynamic: annealing penalties — the selective pressure of the constraints slowly increases with runtime
- adaptive penalties
- segregated penalties
- death penalty
Instead of making the penalties only runtime-dependent, learn from the search! If all of the best individuals in the last k iterations were feasible, decrease the penalty. If all of them were infeasible, increase the penalty. Otherwise leave the penalty unchanged. (Compare Constraint Simulated Annealing!)
eval(x) = f(x) + λ(t) · Σ i=1..n fi(x)
λ(t+1) = (1/α) · λ(t)  if bi is feasible for all t−k+1 ≤ i ≤ t
λ(t+1) = β · λ(t)       if bi is infeasible for all t−k+1 ≤ i ≤ t
λ(t+1) = λ(t)           otherwise
where bi denotes the best individual at generation i
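The update rule is a three-way case split on the feasibility of the recent best individuals. A minimal sketch; the concrete values α = β = 2 are assumptions, and the window length k is implicit in the length of the flag list passed in.

```python
def update_lambda(lam, recent_best_feasible, alpha=2.0, beta=2.0):
    # recent_best_feasible: feasibility flags of the best individual b_i
    # over the last k generations
    if all(recent_best_feasible):
        return lam / alpha   # all feasible: relax the penalty
    if not any(recent_best_feasible):
        return lam * beta    # all infeasible: tighten the penalty
    return lam               # mixed: leave the penalty unchanged
```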
Too high penalties: the infeasible region is excluded from the search — almost like the death penalty. Too low penalties: ineffective; the search may never produce a feasible solution. Segregated penalties: use two different penalties p1 and p2, where p1 is intentionally too small and p2 intentionally too high. After variation, parents and offspring are ranked into two different rankings: (1) according to eval(x) = f(x) + p1(x), (2) according to eval(x) = f(x) + p2(x). Interleaved selection selects the n survivors strictly alternating from these two lists.
Behavioural memory methods work in two distinct phases, applying the death penalty:
(1) satisfy the constraints ° proceeds in k sub-phases; the constraints are ordered c1 ... ck ° in phase j only the constraints c1 ... cj are considered
(2) optimize for the objective function

procedure optimize-behavioural
begin
  j := 1;
  initialize population P;
  while (j <= k) do
  begin
    repeat
      evolve P using the penalty for cj as objective
      (disregarding the original objective function);
    until (a threshold percentage of individuals in P satisfies c1 ... cj);
    remove all individuals that do not satisfy c1 ... cj;
    j := j + 1;
  end;
  evolve P according to the original objective function,
  using the death penalty for c1 ... ck;
end.
Conditions
NLP under linear constraints:
° initial solution found by search (single-solution start)
° linear constraints define a convex search space: arithmetic crossover etc. stay feasible
(x1, ..., xn) & (y1, ..., yn) ⇒ (a·x1 + (1−a)·y1, ..., a·xn + (1−a)·yn) = a·x + (1−a)·y, for 0 < a < 1
Decoders contain a blueprint (construction instructions) for a phenotype.
Advantage: it is often easier to maintain feasibility for a decoder than for an individual.
Disadvantage: the effect of variation operators is often difficult to understand.
Examples:
° the ordinal representation in TSP can be regarded as a decoder
° 0-1 knapsack problem: for n items use a binary list with n elements, e.g. [1 1 0 1 1 1 0], and interpret a one as “take item i if it still fits” (instead of just “take item i”). Effect: any genotype is feasible, and any variation operator will maintain feasibility; but the effect of and/or/xor crossover etc. is difficult to understand.
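The knapsack decoder can be sketched directly: the genotype is any bit list, and the decoder greedily skips items that would overflow the capacity, so every genotype decodes to a feasible packing. The item weights and capacity below are a toy instance (assumption).

```python
def decode(bits, weights, capacity):
    # interpret a 1 as "take item i if it still fits": every genotype
    # therefore decodes to a feasible packing
    packed, load = [], 0
    for i, b in enumerate(bits):
        if b == 1 and load + weights[i] <= capacity:
            packed.append(i)
            load += weights[i]
    return packed, load

weights = [4, 3, 5, 2]   # toy instance (assumption), capacity 7
packed, load = decode([1, 1, 1, 1], weights, 7)
```

Note that the genotype [1 1 1 1] would be infeasible under the naive “take item i” reading, but the decoder yields a feasible phenotype.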
In many problems the optimum is found on the boundary between the feasible and infeasible regions. Consider resource constraints: resources should be exploited until we run out of some resource, i.e. we reach the boundary of the feasible region (cf. the Simplex algorithm). Strategic oscillation tries to concentrate the search on this “critical boundary” by oscillating between feasible and infeasible points close to it. Another option is single-sided oscillation. Keeping the search in the boundary region: use adaptive (complex) penalties. Keeping it on the boundary: initialize with boundary points and use boundary-closed variation operators.
[figure: strategic oscillation]
[figure: single-sided oscillation]
[figure: boundary search]
An interesting case of constraint handling in EA is GENOCOP-III “GEnetic algorithm for Numerical Optimization of COnstrained Problems”
GENOCOP: A Genetic Algorithm for Numerical Optimization Problems with Linear Constraints
repair + closed variation operators
Arithmetic interpolation is also used for the repair of search points towards reference points:
(x1, ..., xn) & (y1, ..., yn) ⇒ (a·x1 + (1−a)·y1, ..., a·xn + (1−a)·yn) = a·x + (1−a)·y, for 0 < a < 1
For any infeasible search point S:
begin
  randomly select a (fully feasible) reference point R;
  repeat
    perform a random interpolation Z = a·S + (1−a)·R;
  until a feasible Z is found;
  let eval(S) := eval(Z);
  if eval(Z) is better than eval(R) then replace R with Z;
  if (random() > p) replace S with Z;
end
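The interpolation step of this repair can be sketched as follows. The feasible region (the unit disc) and the example points are assumptions; the bookkeeping with eval, R and p from the pseudocode above is omitted to keep the sketch minimal.

```python
import random

random.seed(3)

def feasible(x):
    # toy feasible region (assumption): the unit disc x1^2 + x2^2 <= 1
    return x[0] ** 2 + x[1] ** 2 <= 1.0

def repair(s, r, max_tries=1000):
    # interpolate Z = a*S + (1-a)*R towards the feasible reference point R
    # with random a until a feasible Z is found
    for _ in range(max_tries):
        a = random.random()
        z = [a * si + (1 - a) * ri for si, ri in zip(s, r)]
        if feasible(z):
            return z
    return list(r)  # fall back to the reference point itself

z = repair([3.0, 4.0], [0.0, 0.0])  # infeasible point, feasible reference
```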
Uniform mutation:
(x1, ..., xn) ⇒ (x1, ..., xj−1, x’j, xj+1, ..., xn) for a randomly chosen j and min_j ≤ x’j ≤ max_j
Since the feasible space is convex, min_j and max_j must exist. Additionally, Genocop assumes that they can be computed efficiently! Genocop starts from a single solution (!); this operator is used to produce a population.
Non-uniform mutation:
(x1, ..., xn) ⇒ (x1, ..., xj−1, x’j, xj+1, ..., xn) for a randomly chosen j and x’j = xj ± Δ(t, max_j − xj)
The sign of Δ (+/−) is chosen randomly. Δ returns a value in [0 ... y] such that the probability of Δ(t, y) being close to 0 increases as t increases (i.e. with increasing runtime).
Δ(t, y) = y · random[0,1] · (1 − t/t_max)^b
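The Δ function can be sketched in one line; the shape parameter b = 2 is an assumption. Its range shrinks towards 0 as t approaches t_max, so mutations become finer as the run progresses.

```python
import random

random.seed(4)

def delta(t, y, t_max, b=2.0):
    # returns a value in [0, y]; the range shrinks towards 0 as t -> t_max
    return y * random.random() * (1.0 - t / t_max) ** b

val = delta(t=10, y=1.0, t_max=100)  # early in the run: still coarse
```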
Arithmetic cut & splice:
(x1, ..., xn) & (y1, ..., yn) ⇒ (x1, ..., xk, a·xk+1 + (1−a)·yk+1, ..., a·xn + (1−a)·yn) for a randomly chosen k
Theorem: if the search space is convex, then there is an a such that the arithmetic cut & splice of two feasible points is feasible. In contrast to arithmetic cut & splice, simple cut & splice is not closed under linear constraints.
Heuristic crossover: for two individuals x1, x2 with eval(x2) better than eval(x1):
x1 & x2 ⇒ x3 = x1 + random(0,1) · (x2 − x1)
This directs the search heuristically in the direction of good solutions. If x3 is not feasible, the same parents are re-tried with a different random value (up to a fixed maximum number of attempts). Experimentally, this operator has proven very beneficial.
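This operator, including the bounded retry loop, can be sketched as follows. The feasible region (the unit box), the parent vectors and the retry limit are assumptions.

```python
import random

random.seed(5)

def heuristic_crossover(x1, x2, feasible, max_tries=10):
    # x2 is the better parent: x3 = x1 + r * (x2 - x1); retried with a new
    # random r (up to max_tries) if the result is infeasible
    for _ in range(max_tries):
        r = random.random()
        x3 = [a + r * (b - a) for a, b in zip(x1, x2)]
        if feasible(x3):
            return x3
    return None  # give up after max_tries attempts

in_box = lambda x: all(0.0 <= v <= 1.0 for v in x)  # toy region (assumption)
child = heuristic_crossover([0.2, 0.2], [0.8, 0.9], in_box)
```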