Machine Learning: Algorithms and Applications

Floriano Zini Free University of Bozen-Bolzano Faculty of Computer Science Academic Year 2011-2012 Lecture 5: 26th March 2012

Evolutionary computing

These slides are mainly taken from A.E. Eiben and J.E. Smith, Introduction to Evolutionary Computing


Genetic Algorithms

(continued)

Population Models

› SGA uses a Generational model:
  › each individual survives for exactly one generation
  › the entire set of parents is replaced by the offspring
› At the other end of the scale are Steady-State models:
  › one offspring is generated per generation
  › one member of the population is replaced
› Generation Gap
  › the proportion of the population replaced
  › makes a parameterized transition between generational and steady-state GAs
  › gg = 1.0 for SGA, gg = 1/pop_size for SSGA
› The name SSGA is often used for any GA with a generation gap < 1
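The two population models can be written as update rules. A minimal Python sketch, assuming a hypothetical `make_offspring` callback (not from the slides) that builds one child from the current population:

```python
def generational_step(population, fitness, make_offspring):
    """SGA: the entire parent population is replaced by offspring (gg = 1.0)."""
    return [make_offspring(population, fitness) for _ in range(len(population))]

def steady_state_step(population, fitness, make_offspring):
    """SSGA: one offspring per step replaces one member (gg = 1/pop_size).
    Here the replaced member is the current worst."""
    child = make_offspring(population, fitness)
    worst = min(range(len(population)), key=lambda i: fitness(population[i]))
    population = list(population)
    population[worst] = child
    return population
```

Which member gets replaced in the steady-state step is itself a design choice; replacing the worst is only one option (see survivor selection below).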


Fitness Based Competition

› Selection can occur in two places:
  › selection from the current generation to take part in mating (parent selection)
  › selection from parents + offspring to go into the next generation (survivor selection)
› Selection operators work on the whole individual
  › i.e. they are representation-independent!

Fitness-Proportionate Selection

› Problems include:
  › one highly fit member can rapidly take over if the rest of the population is much less fit: premature convergence
  › at the end of runs, when fitnesses are similar, loss of selection pressure
  › highly susceptible to function transposition (see next slide)
› Scaling can fix the last two problems
  › Windowing: f′(i) = f(i) − β, where β is the worst fitness in this generation
  › Sigma scaling: f′(i) = max(f(i) − (f̄ − c·σf), 0.0), where c is a constant, usually 2.0
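Both scalings are one-liners over a generation's fitness list. A minimal sketch of the two formulas above:

```python
import statistics

def windowed(fitnesses):
    # Windowing: f'(i) = f(i) - beta, beta = worst fitness in this generation
    beta = min(fitnesses)
    return [f - beta for f in fitnesses]

def sigma_scaled(fitnesses, c=2.0):
    # Sigma scaling: f'(i) = max(f(i) - (mean - c * stdev), 0.0)
    mean = statistics.mean(fitnesses)
    sd = statistics.pstdev(fitnesses)
    return [max(f - (mean - c * sd), 0.0) for f in fitnesses]
```

Note that sigma scaling keeps selection pressure up when fitnesses are similar, since it rescales relative to the population's spread rather than its absolute values.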


Function transposition for FPS

Rank-based Selection

› Attempt to remove problems of FPS by basing selection probabilities on relative rather than absolute fitness
› Rank population according to fitness and then base selection probabilities on rank (fittest has rank µ, worst has rank 1)
› This imposes a sorting overhead on the algorithm, but this is usually negligible compared to the fitness evaluation time


Linear Ranking

› Parameterised by factor s: 1.0 < s ≤ 2.0
  › measures the advantage of the best individual
  › in SGA this is the number of children allotted to it
› Simple 3 member example

P_lin-rank(i) = (2 − s)/µ + 2(i − 1)(s − 1)/(µ(µ − 1))
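The formula translates directly into code. A minimal sketch (ranks follow the slide's convention: worst = 1, fittest = µ):

```python
def p_lin_rank(i, mu, s):
    """Linear-ranking selection probability for the individual of rank i
    (worst has rank 1, fittest has rank mu); 1.0 < s <= 2.0."""
    return (2 - s) / mu + 2 * (i - 1) * (s - 1) / (mu * (mu - 1))
```

With s = 2.0 and µ = 3 this gives probabilities 0, 1/3, 2/3 for the worst, middle, and best individual, and the probabilities always sum to 1.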

Tournament Selection

› All selection methods above rely on global population statistics
  › could be a bottleneck, esp. on parallel machines
  › relies on the presence of an external fitness function, which might not exist: e.g. evolving game players
› Idea for a procedure using only local fitness information:
  › pick k members at random, then select the best of these
  › repeat to select more individuals


Tournament Selection

› Probability of selecting i will depend on:
  › rank of i
  › size of sample k
    › higher k increases selection pressure because the probability of including above-average-fitness individuals increases
  › whether the fittest contestant always wins or is selected with probability p
    › p < 1 → lower selection pressure
  › whether contestants are picked with replacement
    › picking without replacement increases selection pressure: the k−1 least-fit individuals cannot be selected if p = 1
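All four parameters above fit in one small routine. A minimal sketch of tournament selection:

```python
import random

def tournament_select(population, fitness, k=2, p=1.0, replacement=True,
                      rng=random):
    """Pick k contestants at random; with probability p return the fittest,
    otherwise return a random contestant (p < 1 lowers selection pressure)."""
    if replacement:
        contestants = [rng.choice(population) for _ in range(k)]
    else:
        contestants = rng.sample(population, k)
    if rng.random() < p:
        return max(contestants, key=fitness)
    return rng.choice(contestants)
```

Only local information is used: the routine never touches global population statistics, which is exactly what makes it attractive on parallel machines.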

Survivor Selection

› Most of the selection methods above are used for parent selection
› Survivor selection can be divided into two approaches:
  › Age-based selection
    › in SGA the population is fully replaced at each generation
    › in SSGA it can be implemented as “delete-random” (not recommended) or as first-in-first-out (a.k.a. delete-oldest)
  › Fitness-based selection
    › using one of the methods above

Two Special Cases of fitness-based survivor selection

› Replace-worst
  › the worst (in terms of fitness) individuals are replaced at each generation by the offspring
  › rapid takeover: use with large populations or a “no duplicates” policy
› Elitism
  › always keep at least one copy of the fittest solution so far
  › widely used in both population models (SGA, SSGA)
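Both special cases can be sketched in a few lines of Python (the function names are illustrative, not from the slides):

```python
def replace_worst(population, offspring, fitness):
    # Replace the least-fit individuals of the population by the offspring
    ranked = sorted(population, key=fitness, reverse=True)
    return ranked[: len(population) - len(offspring)] + list(offspring)

def with_elitism(old_population, new_population, fitness):
    # Keep at least one copy of the fittest solution so far: if the new
    # population lost it, reinsert it in place of the new worst member
    best_old = max(old_population, key=fitness)
    if fitness(max(new_population, key=fitness)) < fitness(best_old):
        worst = min(range(len(new_population)),
                    key=lambda i: fitness(new_population[i]))
        new_population = list(new_population)
        new_population[worst] = best_old
    return new_population
```

`with_elitism` is applied after the regular survivor selection step, so it works with either population model.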

SGA technical summary tableau

Representation       Binary strings
Recombination        N-point or uniform crossover
Mutation             Bitwise bit-flipping with fixed probability
Parent selection     Fitness-proportionate
Survivor selection   All children replace parents


Genetic Programming GP quick overview

› Developed: USA in the 1990s
› Early names: J. Koza
› Typically applied to:
  › machine learning tasks (prediction, classification, …)
› Attributed features:
  › needs huge populations (thousands)
  › slow
› Special:
  › non-linear chromosomes: trees, graphs
  › mutation possible but not necessary (disputed!)


GP technical summary tableau

Representation       Tree structures
Recombination        Exchange of subtrees
Mutation             Random change in trees
Parent selection     Fitness proportional
Survivor selection   Generational replacement

Introductory example: credit scoring

› Bank wants to distinguish good from bad loan applicants
› Model needed that matches historical data

ID         No of children   Salary   Marital status   OK?
ID-1       2                45000    Married          0
ID-2       0                30000    Single           1
ID-3       1                40000    Married          1
ID-4       2                60000    Divorced         1
….         ….               ….       ….               ….
ID-10000   2                50000    Married          1


Introductory example: credit scoring

› A possible model:
  › IF (NOC = 2) AND (S > 80000) THEN good ELSE bad
› In general:
  › IF formula THEN good ELSE bad
› The only unknown is the right formula, hence:
  › our search space (phenotypes) is the set of formulas
› Natural fitness of a formula: percentage of well-classified cases of the model it stands for
› Natural representation of formulas (genotypes): parse trees

Introductory example: credit scoring

IF (NOC = 2) AND (S > 80000) THEN good ELSE bad can be represented by the following parse tree:

[parse tree: root AND; left subtree “=” with children NOC and 2; right subtree “>” with children S and 80000]


Tree based representation

› Trees are a universal form, e.g. consider:
  › arithmetic formula: 2·π + ((x + 3) − y/(5 + 1))
  › logical formula: (x ∧ true) → ((x ∨ y) ∨ (z ↔ (x ∧ y)))
  › program: i = 1; while (i < 20) { i = i + 1 }

Tree based representation

2·π + ((x + 3) − y/(5 + 1))


Tree based representation

(x ∧ true) → (( x ∨ y ) ∨ (z ↔ (x ∧ y)))

Tree based representation

i = 1; while (i < 20) { i = i +1 }


Tree based representation

› In GA, chromosomes are linear structures (bit strings)
› Tree-shaped chromosomes are non-linear structures
› In GA the size of the chromosomes is fixed
› Trees in GP may vary in depth and width

Tree based representation

› Symbolic expressions (s-expressions) can be defined by:
  › terminal set T
  › function set F (with the arities of the function symbols)
› Adopting the following general recursive definition:
  › every t ∈ T is a correct expression
  › f(e1, …, en) is a correct expression if f ∈ F, arity(f) = n, and e1, …, en are correct expressions
  › there are no other forms of correct expressions
› In general, expressions in GP are not typed (closure property: any f ∈ F can take any g ∈ F as argument)
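The recursive definition above maps naturally onto nested tuples. A minimal sketch, assuming a hypothetical encoding where a terminal is a number or variable name and an expression f(e1, …, en) is the tuple (f, e1, …, en):

```python
import math
import operator

# Function set F with Python implementations (illustrative choice)
FUNCS = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.truediv}

def evaluate(expr, env):
    if isinstance(expr, tuple):        # f(e1, ..., en) with f in F
        return FUNCS[expr[0]](*(evaluate(e, env) for e in expr[1:]))
    if isinstance(expr, str):          # variable terminal
        return env[expr]
    return expr                        # constant terminal

# The arithmetic example 2*pi + ((x + 3) - y/(5 + 1)) as a tree:
tree = ("+", ("*", 2, math.pi),
             ("-", ("+", "x", 3), ("/", "y", ("+", 5, 1))))
```

Because every function accepts the result of every other function (the closure property), `evaluate` needs no type checks.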


Offspring creation scheme

› GA scheme uses crossover AND mutation sequentially
  › each operator is applied probabilistically
› GP scheme uses crossover OR (exclusive) mutation
  › the choice between them is made probabilistically

Offspring creation: GA vs GP
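The contrast between the two schemes can be sketched directly; `crossover` and `mutate` are hypothetical operator callbacks, and the rates pc and pm are illustrative defaults:

```python
import random

def ga_offspring(p1, p2, crossover, mutate, pc=0.7, pm=0.01, rng=random):
    # GA: crossover AND mutation in sequence, each applied probabilistically
    child = crossover(p1, p2) if rng.random() < pc else p1
    return mutate(child) if rng.random() < pm else child

def gp_offspring(p1, p2, crossover, mutate, pm=0.05, rng=random):
    # GP: crossover OR (exclusive) mutation, chosen probabilistically
    return mutate(p1) if rng.random() < pm else crossover(p1, p2)
```

In the GA scheme a child may undergo both operators in one step; in the GP scheme exactly one operator is applied.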


Mutation

› Most common mutation: replace a randomly chosen subtree by a randomly generated tree

2·π + ((x + 3) − y/(5 + 1))  →  2·π + ((x + 3) − y)

Mutation cont’d

› Mutation has two parameters:
  › probability pm to choose mutation vs. recombination
  › probability to choose an internal point as the root of the subtree to be replaced
› Remarkably, pm is advised to be 0 (Koza ’92) or very small, like 0.05 (Banzhaf et al. ’98)
› The size of the child can exceed the size of the parent
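Subtree mutation can be sketched on the nested-tuple tree encoding (an illustrative assumption: a tree is either a terminal or a tuple `(function_symbol, child1, …, child_n)`; the descend/stop probability 0.3 is arbitrary):

```python
import random

def random_tree(depth, funcs, terminals, rng=random):
    """Grow a random tree of at most the given depth; funcs is a list of
    (symbol, arity) pairs."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(terminals)
    f, arity = rng.choice(funcs)
    return (f,) + tuple(random_tree(depth - 1, funcs, terminals, rng)
                        for _ in range(arity))

def mutate(tree, funcs, terminals, rng=random):
    """Replace one randomly chosen subtree by a randomly generated tree."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(2, funcs, terminals, rng)
    i = rng.randrange(1, len(tree))        # descend into one random child
    return tree[:i] + (mutate(tree[i], funcs, terminals, rng),) + tree[i + 1:]
```

Since the replacement subtree is generated independently of the one removed, the child can indeed grow larger than its parent.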

Recombination

› Most common recombination: exchange two randomly chosen subtrees among the parents
› Recombination has two parameters:
  › probability pc = 1 − pm to choose recombination vs. mutation
  › probability to choose an internal point within each parent as crossover point
› The size of the offspring can exceed that of the parents

Parent 1: 2·π + ((x + 3) − y/(5 + 1))
Parent 2: (a·3) · (3 + (y + 12))

Recombination

Child 1: 2·π + ((x + 3) − (a·3))
Child 2: (y/(5 + 1)) · (3 + (y + 12))
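Subtree crossover can be sketched on the same nested-tuple encoding used above (an illustrative assumption: a tree is a terminal or a tuple `(function_symbol, child1, …)`; the 0.3 stop probability is arbitrary):

```python
import random

def random_path(tree, rng=random):
    """Path of child indices leading to a randomly chosen subtree."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return ()
    i = rng.randrange(1, len(tree))
    return (i,) + random_path(tree[i], rng)

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def put(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + (put(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(p1, p2, rng=random):
    """Exchange two randomly chosen subtrees between the parents; the
    offspring may therefore be larger than either parent."""
    a, b = random_path(p1, rng), random_path(p2, rng)
    return put(p1, a, get(p2, b)), put(p2, b, get(p1, a))
```

Note the invariant: the two children together contain exactly as many nodes as the two parents together, even though either child alone can outgrow both parents.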


Selection

› Parent selection typically fitness proportionate
› Over-selection in very large populations:
  › rank population by fitness and divide it into two groups:
    › group 1: best x% of population; group 2: the other (100 − x)%
  › 80% of selection operations choose from group 1, 20% from group 2
  › for pop. size = 1000, 2000, 4000, 8000 → x = 32%, 16%, 8%, 4%
  › the percentages come from a rule of thumb
› Survivor selection:
  › typical: generational scheme (thus none)
  › recently steady-state is becoming popular for its elitism
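The over-selection rule above can be sketched in a few lines (function name is illustrative):

```python
import random

def over_select(population, fitness, x_percent, rng=random):
    """Over-selection: 80% of picks come from the best x% of the ranked
    population, 20% from the remaining (100 - x)%."""
    ranked = sorted(population, key=fitness, reverse=True)
    cut = max(1, len(ranked) * x_percent // 100)
    group1, group2 = ranked[:cut], ranked[cut:]
    if rng.random() < 0.8 or not group2:
        return rng.choice(group1)
    return rng.choice(group2)
```

For a population of 1000 with x = 32%, group 1 holds the 320 fittest individuals, which receive 80% of all selection operations.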

Initialisation

› Maximum initial depth of trees Dmax is set
› Full method (each branch has depth = Dmax):
  › nodes at depth d < Dmax (root and inner nodes) randomly chosen from function set F
  › nodes at depth d = Dmax (leaves) randomly chosen from terminal set T
› Grow method (each branch has depth ≤ Dmax):
  › nodes at depth d < Dmax randomly chosen from F ∪ T
  › nodes at depth d = Dmax randomly chosen from T
› Common GP initialisation: ramped half-and-half, where the grow and full methods each deliver half of the initial population
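A minimal sketch of the three initialisation methods on nested-tuple trees (assumptions: `funcs` is a list of (symbol, arity) pairs, depth counts down from Dmax, and the grow method picks a terminal with probability |T| / (|T| + |F|), one simple way to choose from F ∪ T):

```python
import random

def full(depth, funcs, terminals, rng=random):
    # Full method: every branch reaches depth == Dmax
    if depth == 0:
        return rng.choice(terminals)
    f, arity = rng.choice(funcs)
    return (f,) + tuple(full(depth - 1, funcs, terminals, rng)
                        for _ in range(arity))

def grow(depth, funcs, terminals, rng=random):
    # Grow method: branches may stop early (depth <= Dmax)
    stop = len(terminals) / (len(terminals) + len(funcs))
    if depth == 0 or rng.random() < stop:
        return rng.choice(terminals)
    f, arity = rng.choice(funcs)
    return (f,) + tuple(grow(depth - 1, funcs, terminals, rng)
                        for _ in range(arity))

def ramped_half_and_half(pop_size, max_depth, funcs, terminals, rng=random):
    # Half the population from full, half from grow, with depths
    # ramping over 2..max_depth (max_depth must be >= 2)
    pop = []
    for i in range(pop_size):
        d = 2 + i % (max_depth - 1)
        method = full if i % 2 == 0 else grow
        pop.append(method(d, funcs, terminals, rng))
    return pop
```

Ramping the depth as well as alternating the method gives the initial population a broad mix of tree shapes and sizes.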


Bloat

› Bloat = “survival of the fattest”, i.e., the tree sizes in the population increase over time
› Ongoing research and debate about the reasons
› Needs countermeasures, e.g.
  › prohibiting variation operators that would deliver “too big” children
  › parsimony pressure: penalty for being oversized

Example application: symbolic regression

› Given some points in R², (x1, y1), …, (xn, yn)
› Find function f(x) s.t. ∀i = 1, …, n : f(xi) = yi
› Possible GP solution:
  › representation by F = {+, −, /, exp, sin, cos}, T = R ∪ {x}
  › fitness is the error: err(f) = Σi=1..n (f(xi) − yi)²
  › standard mutation and recombination
  › FP or 2-tournament parent selection
  › generational population update
  › pop. size = 1000, ramped half-and-half initialisation
  › termination: n “hits” or 50000 fitness evaluations reached (where a “hit” is |f(xi) − yi| < 0.0001)
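The fitness and termination criteria above are straightforward to write down. A minimal sketch, where `f` is any candidate function (e.g. a compiled GP tree):

```python
def err(f, points):
    """Sum-of-squared-errors fitness (to be minimised):
    err(f) = sum_i (f(x_i) - y_i)**2."""
    return sum((f(x) - y) ** 2 for x, y in points)

def hits(f, points, eps=0.0001):
    """Number of 'hits': data points where |f(x_i) - y_i| < eps."""
    return sum(1 for x, y in points if abs(f(x) - y) < eps)
```

A run would then terminate as soon as `hits(f, points) == len(points)` for some individual, or after 50000 calls to `err`.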