Machine Learning: Algorithms and Applications

Floriano Zini Free University of Bozen-Bolzano Faculty of Computer Science Academic Year 2011-2012 Lecture 5: 26th March 2012

Evolutionary computing

These slides are mainly taken from A.E. Eiben and J.E. Smith, Introduction to Evolutionary Computing


Genetic Algorithms

(continued)

Population Models

› SGA uses a Generational model:
  › each individual survives for exactly one generation
  › the entire set of parents is replaced by the offspring
› At the other end of the scale are Steady-State models:
  › one offspring is generated per generation
  › one member of the population is replaced
› Generation Gap
  › the proportion of the population replaced
  › makes a parameterized transition between generational and steady-state GAs
  › gg = 1.0 for SGA, gg = 1/pop_size for SSGA
› The name SSGA is often used for any GA with a generation gap < 1
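The two population models can be written as update rules. A minimal Python sketch, assuming a hypothetical `make_offspring` callback (not from the slides) that builds one child from the current population:

```python
def generational_step(population, fitness, make_offspring):
    """SGA: the entire parent population is replaced by offspring (gg = 1.0)."""
    return [make_offspring(population, fitness) for _ in range(len(population))]

def steady_state_step(population, fitness, make_offspring):
    """SSGA: one offspring per step replaces one member (gg = 1/pop_size).
    Here the replaced member is the current worst."""
    child = make_offspring(population, fitness)
    worst = min(range(len(population)), key=lambda i: fitness(population[i]))
    population = list(population)
    population[worst] = child
    return population
```

Which member gets replaced in the steady-state step is itself a design choice; replacing the worst is only one option (see survivor selection below).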


Fitness Based Competition

› Selection can occur in two places:
  › selection from the current generation to take part in mating (parent selection)
  › selection from parents + offspring to go into the next generation (survivor selection)
› Selection operators work on the whole individual
  › i.e. they are representation-independent!

Fitness-Proportionate Selection

› Problems include:
  › one highly fit member can rapidly take over if the rest of the population is much less fit: premature convergence
  › at the end of runs, when fitnesses are similar, loss of selection pressure
  › highly susceptible to function transposition (see next slide)
› Scaling can fix the last two problems
  › Windowing: f′(i) = f(i) − β, where β is the worst fitness in this generation
  › Sigma scaling: f′(i) = max(f(i) − (f̄ − c·σf), 0.0), where c is a constant, usually 2.0
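Both scalings are one-liners over a generation's fitness list. A minimal sketch of the two formulas above:

```python
import statistics

def windowed(fitnesses):
    # Windowing: f'(i) = f(i) - beta, beta = worst fitness in this generation
    beta = min(fitnesses)
    return [f - beta for f in fitnesses]

def sigma_scaled(fitnesses, c=2.0):
    # Sigma scaling: f'(i) = max(f(i) - (mean - c * stdev), 0.0)
    mean = statistics.mean(fitnesses)
    sd = statistics.pstdev(fitnesses)
    return [max(f - (mean - c * sd), 0.0) for f in fitnesses]
```

Note that sigma scaling keeps selection pressure up when fitnesses are similar, since it rescales relative to the population's spread rather than its absolute values.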


Function transposition for FPS

Rank-based Selection

› Attempt to remove problems of FPS by basing selection probabilities on relative rather than absolute fitness
› Rank population according to fitness and then base selection probabilities on rank (fittest has rank µ, worst has rank 1)
› This imposes a sorting overhead on the algorithm, but this is usually negligible compared to the fitness evaluation time


Linear Ranking

› Parameterised by factor s: 1.0 < s ≤ 2.0
  › measures the advantage of the best individual
  › in SGA this is the number of children allotted to it
› Simple 3 member example

P_lin-rank(i) = (2 − s)/µ + 2(i − 1)(s − 1)/(µ(µ − 1))
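The formula translates directly into code. A minimal sketch (ranks follow the slide's convention: worst = 1, fittest = µ):

```python
def p_lin_rank(i, mu, s):
    """Linear-ranking selection probability for the individual of rank i
    (worst has rank 1, fittest has rank mu); 1.0 < s <= 2.0."""
    return (2 - s) / mu + 2 * (i - 1) * (s - 1) / (mu * (mu - 1))
```

With s = 2.0 and µ = 3 this gives probabilities 0, 1/3, 2/3 for the worst, middle, and best individual, and the probabilities always sum to 1.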

Tournament Selection

› All selection methods above rely on global population statistics
  › could be a bottleneck, esp. on parallel machines
  › relies on the presence of an external fitness function, which might not exist: e.g. evolving game players
› Idea for a procedure using only local fitness information:
  › pick k members at random, then select the best of these
  › repeat to select more individuals


Tournament Selection

› Probability of selecting i will depend on:
  › rank of i
  › size of sample k
    › higher k increases selection pressure because the probability of including above-average-fitness individuals increases
  › whether the fittest contestant always wins or is selected with probability p
    › p < 1 → lower selection pressure
  › whether contestants are picked with replacement
    › picking without replacement increases selection pressure: the k−1 least-fit individuals cannot be selected if p = 1
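All four parameters above fit in one small routine. A minimal sketch of tournament selection:

```python
import random

def tournament_select(population, fitness, k=2, p=1.0, replacement=True,
                      rng=random):
    """Pick k contestants at random; with probability p return the fittest,
    otherwise return a random contestant (p < 1 lowers selection pressure)."""
    if replacement:
        contestants = [rng.choice(population) for _ in range(k)]
    else:
        contestants = rng.sample(population, k)
    if rng.random() < p:
        return max(contestants, key=fitness)
    return rng.choice(contestants)
```

Only local information is used: the routine never touches global population statistics, which is exactly what makes it attractive on parallel machines.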

Survivor Selection

› Most of the selection methods above are used for parent selection
› Survivor selection can be divided into two approaches:
  › Age-based selection
    › in SGA the population is fully replaced at each generation
    › in SSGA it can be implemented as “delete-random” (not recommended) or as first-in-first-out (a.k.a. delete-oldest)
  › Fitness-based selection
    › using one of the methods above

Two Special Cases of fitness-based survivor selection

› Replace-worst
  › the worst (in terms of fitness) individuals are replaced at each generation by the offspring
  › rapid takeover: use with large populations or a “no duplicates” policy
› Elitism
  › always keep at least one copy of the fittest solution so far
  › widely used in both population models (SGA, SSGA)
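Both special cases can be sketched in a few lines of Python (the function names are illustrative, not from the slides):

```python
def replace_worst(population, offspring, fitness):
    # Replace the least-fit individuals of the population by the offspring
    ranked = sorted(population, key=fitness, reverse=True)
    return ranked[: len(population) - len(offspring)] + list(offspring)

def with_elitism(old_population, new_population, fitness):
    # Keep at least one copy of the fittest solution so far: if the new
    # population lost it, reinsert it in place of the new worst member
    best_old = max(old_population, key=fitness)
    if fitness(max(new_population, key=fitness)) < fitness(best_old):
        worst = min(range(len(new_population)),
                    key=lambda i: fitness(new_population[i]))
        new_population = list(new_population)
        new_population[worst] = best_old
    return new_population
```

`with_elitism` is applied after the regular survivor selection step, so it works with either population model.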

SGA technical summary tableau

Representation       Binary strings
Recombination        N-point or uniform crossover
Mutation             Bitwise bit-flipping with fixed probability
Parent selection     Fitness-proportionate
Survivor selection   All children replace parents


Genetic Programming GP quick overview

› Developed: USA in the 1990s
› Early names: J. Koza
› Typically applied to:
  › machine learning tasks (prediction, classification, …)
› Attributed features:
  › needs huge populations (thousands)
  › slow
› Special:
  › non-linear chromosomes: trees, graphs
  › mutation possible but not necessary (disputed!)


GP technical summary tableau

Representation       Tree structures
Recombination        Exchange of subtrees
Mutation             Random change in trees
Parent selection     Fitness proportional
Survivor selection   Generational replacement

Introductory example: credit scoring

› Bank wants to distinguish good from bad loan applicants
› Model needed that matches historical data

ID         No of children   Salary   Marital status   OK?
ID-1       2                45000    Married          0
ID-2       0                30000    Single           1
ID-3       1                40000    Married          1
ID-4       2                60000    Divorced         1
….         ….               ….       ….               ….
ID-10000   2                50000    Married          1


Introductory example: credit scoring

› A possible model:
  › IF (NOC = 2) AND (S > 80000) THEN good ELSE bad
› In general:
  › IF formula THEN good ELSE bad
› The only unknown is the right formula, hence:
  › our search space (phenotypes) is the set of formulas
› Natural fitness of a formula: percentage of well-classified cases of the model it stands for
› Natural representation of formulas (genotypes): parse trees

Introductory example: credit scoring

IF (NOC = 2) AND (S > 80000) THEN good ELSE bad can be represented by the following parse tree:

[parse tree: root AND; left subtree “=” with children NOC and 2; right subtree “>” with children S and 80000]


Tree based representation

› Trees are a universal form, e.g. consider:
  › arithmetic formula: 2·π + ((x + 3) − y/(5 + 1))
  › logical formula: (x ∧ true) → ((x ∨ y) ∨ (z ↔ (x ∧ y)))
  › program: i = 1; while (i < 20) { i = i + 1 }

Tree based representation

2·π + ((x + 3) − y/(5 + 1))


Tree based representation

(x ∧ true) → (( x ∨ y ) ∨ (z ↔ (x ∧ y)))

Tree based representation

i = 1; while (i < 20) { i = i +1 }


Tree based representation

› In GA, chromosomes are linear structures (bit strings)
› Tree-shaped chromosomes are non-linear structures
› In GA the size of the chromosomes is fixed
› Trees in GP may vary in depth and width

Tree based representation

› Symbolic expressions (s-expressions) can be defined by:
  › terminal set T
  › function set F (with the arities of the function symbols)
› Adopting the following general recursive definition:
  › every t ∈ T is a correct expression
  › f(e1, …, en) is a correct expression if f ∈ F, arity(f) = n, and e1, …, en are correct expressions
  › there are no other forms of correct expressions
› In general, expressions in GP are not typed (closure property: any f ∈ F can take any g ∈ F as argument)
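The recursive definition above maps naturally onto nested tuples. A minimal sketch, assuming a hypothetical encoding where a terminal is a number or variable name and an expression f(e1, …, en) is the tuple (f, e1, …, en):

```python
import math
import operator

# Function set F with Python implementations (illustrative choice)
FUNCS = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.truediv}

def evaluate(expr, env):
    if isinstance(expr, tuple):        # f(e1, ..., en) with f in F
        return FUNCS[expr[0]](*(evaluate(e, env) for e in expr[1:]))
    if isinstance(expr, str):          # variable terminal
        return env[expr]
    return expr                        # constant terminal

# The arithmetic example 2*pi + ((x + 3) - y/(5 + 1)) as a tree:
tree = ("+", ("*", 2, math.pi),
             ("-", ("+", "x", 3), ("/", "y", ("+", 5, 1))))
```

Because every function accepts the result of every other function (the closure property), `evaluate` needs no type checks.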


Offspring creation scheme

› GA scheme uses crossover AND mutation sequentially
  › each operator is applied probabilistically
› GP scheme uses crossover OR (exclusive) mutation
  › the choice between them is made probabilistically

Offspring creation: GA vs GP
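The contrast between the two schemes can be sketched directly; `crossover` and `mutate` are hypothetical operator callbacks, and the rates pc and pm are illustrative defaults:

```python
import random

def ga_offspring(p1, p2, crossover, mutate, pc=0.7, pm=0.01, rng=random):
    # GA: crossover AND mutation in sequence, each applied probabilistically
    child = crossover(p1, p2) if rng.random() < pc else p1
    return mutate(child) if rng.random() < pm else child

def gp_offspring(p1, p2, crossover, mutate, pm=0.05, rng=random):
    # GP: crossover OR (exclusive) mutation, chosen probabilistically
    return mutate(p1) if rng.random() < pm else crossover(p1, p2)
```

In the GA scheme a child may undergo both operators in one step; in the GP scheme exactly one operator is applied.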


Mutation

› Most common mutation: replace a randomly chosen subtree by a randomly generated tree

2·π + ((x + 3) − y/(5 + 1))  →  2·π + ((x + 3) − y)

Mutation cont’d

› Mutation has two parameters:
  › probability pm to choose mutation vs. recombination
  › probability to choose an internal point as the root of the subtree to be replaced
› Remarkably, pm is advised to be 0 (Koza ’92) or very small, like 0.05 (Banzhaf et al. ’98)
› The size of the child can exceed the size of the parent
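Subtree mutation can be sketched on the nested-tuple tree encoding (an illustrative assumption: a tree is either a terminal or a tuple `(function_symbol, child1, …, child_n)`; the descend/stop probability 0.3 is arbitrary):

```python
import random

def random_tree(depth, funcs, terminals, rng=random):
    """Grow a random tree of at most the given depth; funcs is a list of
    (symbol, arity) pairs."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(terminals)
    f, arity = rng.choice(funcs)
    return (f,) + tuple(random_tree(depth - 1, funcs, terminals, rng)
                        for _ in range(arity))

def mutate(tree, funcs, terminals, rng=random):
    """Replace one randomly chosen subtree by a randomly generated tree."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(2, funcs, terminals, rng)
    i = rng.randrange(1, len(tree))        # descend into one random child
    return tree[:i] + (mutate(tree[i], funcs, terminals, rng),) + tree[i + 1:]
```

Since the replacement subtree is generated independently of the one removed, the child can indeed grow larger than its parent.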

Recombination

› Most common recombination: exchange two randomly chosen subtrees among the parents
› Recombination has two parameters:
  › probability pc = 1 − pm to choose recombination vs. mutation
  › probability to choose an internal point within each parent as crossover point
› The size of the offspring can exceed that of the parents

Parent 1: 2·π + ((x + 3) − y/(5 + 1))
Parent 2: (a·3) · (3 + (y + 12))

Recombination

Child 1: 2·π + ((x + 3) − (a·3))
Child 2: (y/(5 + 1)) · (3 + (y + 12))
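Subtree crossover can be sketched on the same nested-tuple encoding used above (an illustrative assumption: a tree is a terminal or a tuple `(function_symbol, child1, …)`; the 0.3 stop probability is arbitrary):

```python
import random

def random_path(tree, rng=random):
    """Path of child indices leading to a randomly chosen subtree."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return ()
    i = rng.randrange(1, len(tree))
    return (i,) + random_path(tree[i], rng)

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def put(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + (put(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(p1, p2, rng=random):
    """Exchange two randomly chosen subtrees between the parents; the
    offspring may therefore be larger than either parent."""
    a, b = random_path(p1, rng), random_path(p2, rng)
    return put(p1, a, get(p2, b)), put(p2, b, get(p1, a))
```

Note the invariant: the two children together contain exactly as many nodes as the two parents together, even though either child alone can outgrow both parents.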


Selection

› Parent selection typically fitness proportionate
› Over-selection in very large populations:
  › rank population by fitness and divide it into two groups:
    › group 1: best x% of population; group 2: the other (100 − x)%
  › 80% of selection operations choose from group 1, 20% from group 2
  › for pop. size = 1000, 2000, 4000, 8000 → x = 32%, 16%, 8%, 4%
  › the percentages come from a rule of thumb
› Survivor selection:
  › typical: generational scheme (thus none)
  › recently steady-state is becoming popular for its elitism
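The over-selection rule above can be sketched in a few lines (function name is illustrative):

```python
import random

def over_select(population, fitness, x_percent, rng=random):
    """Over-selection: 80% of picks come from the best x% of the ranked
    population, 20% from the remaining (100 - x)%."""
    ranked = sorted(population, key=fitness, reverse=True)
    cut = max(1, len(ranked) * x_percent // 100)
    group1, group2 = ranked[:cut], ranked[cut:]
    if rng.random() < 0.8 or not group2:
        return rng.choice(group1)
    return rng.choice(group2)
```

For a population of 1000 with x = 32%, group 1 holds the 320 fittest individuals, which receive 80% of all selection operations.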

Initialisation

› Maximum initial depth of trees Dmax is set
› Full method (each branch has depth = Dmax):
  › nodes at depth d < Dmax (root and inner nodes) randomly chosen from function set F
  › nodes at depth d = Dmax (leaves) randomly chosen from terminal set T
› Grow method (each branch has depth ≤ Dmax):
  › nodes at depth d < Dmax randomly chosen from F ∪ T
  › nodes at depth d = Dmax randomly chosen from T
› Common GP initialisation: ramped half-and-half, where the grow and full methods each deliver half of the initial population
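A minimal sketch of the three initialisation methods on nested-tuple trees (assumptions: `funcs` is a list of (symbol, arity) pairs, depth counts down from Dmax, and the grow method picks a terminal with probability |T| / (|T| + |F|), one simple way to choose from F ∪ T):

```python
import random

def full(depth, funcs, terminals, rng=random):
    # Full method: every branch reaches depth == Dmax
    if depth == 0:
        return rng.choice(terminals)
    f, arity = rng.choice(funcs)
    return (f,) + tuple(full(depth - 1, funcs, terminals, rng)
                        for _ in range(arity))

def grow(depth, funcs, terminals, rng=random):
    # Grow method: branches may stop early (depth <= Dmax)
    stop = len(terminals) / (len(terminals) + len(funcs))
    if depth == 0 or rng.random() < stop:
        return rng.choice(terminals)
    f, arity = rng.choice(funcs)
    return (f,) + tuple(grow(depth - 1, funcs, terminals, rng)
                        for _ in range(arity))

def ramped_half_and_half(pop_size, max_depth, funcs, terminals, rng=random):
    # Half the population from full, half from grow, with depths
    # ramping over 2..max_depth (max_depth must be >= 2)
    pop = []
    for i in range(pop_size):
        d = 2 + i % (max_depth - 1)
        method = full if i % 2 == 0 else grow
        pop.append(method(d, funcs, terminals, rng))
    return pop
```

Ramping the depth as well as alternating the method gives the initial population a broad mix of tree shapes and sizes.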


Bloat

› Bloat = “survival of the fattest”, i.e., the tree sizes in the population increase over time
› Ongoing research and debate about the reasons
› Needs countermeasures, e.g.
  › prohibiting variation operators that would deliver “too big” children
  › parsimony pressure: penalty for being oversized

Example application: symbolic regression

› Given some points in R², (x1, y1), …, (xn, yn)
› Find function f(x) s.t. ∀i = 1, …, n : f(xi) = yi
› Possible GP solution:
  › representation by F = {+, −, /, exp, sin, cos}, T = R ∪ {x}
  › fitness is the error: err(f) = Σi=1..n (f(xi) − yi)²
  › standard mutation and recombination
  › FP or 2-tournament parent selection
  › generational population update
  › pop. size = 1000, ramped half-and-half initialisation
  › termination: n “hits” or 50000 fitness evaluations reached (where a “hit” is |f(xi) − yi| < 0.0001)
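The fitness and termination criteria above are straightforward to write down. A minimal sketch, where `f` is any candidate function (e.g. a compiled GP tree):

```python
def err(f, points):
    """Sum-of-squared-errors fitness (to be minimised):
    err(f) = sum_i (f(x_i) - y_i)**2."""
    return sum((f(x) - y) ** 2 for x, y in points)

def hits(f, points, eps=0.0001):
    """Number of 'hits': data points where |f(x_i) - y_i| < eps."""
    return sum(1 for x, y in points if abs(f(x) - y) < eps)
```

A run would then terminate as soon as `hits(f, points) == len(points)` for some individual, or after 50000 calls to `err`.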