Adaptive Operator Selection with Rank-based Multi-Armed Bandits



  1. Adaptive Operator Selection with Rank-based Multi-Armed Bandits
     Álvaro Fialho, Marc Schoenauer & Michèle Sebag
     26th COW, April 22, 2013

  2. Outline
     1 Context & Motivation
     2 Operator Selection
     3 Credit Assignment
     4 Empirical Validation
     5 Conclusions & Further Work
     (Fialho, Schoenauer, Sebag — Rank-based Adaptive Operator Selection)

  3. Context & Motivation
     - Evolutionary Algorithms
     - Adaptive Operator Selection

  4. Evolutionary Algorithms
     Stochastic optimization algorithms (Darwinian paradigm)
     Bottleneck: parameter setting
     - Population size and number of offspring
     - Selection and replacement methods (and their parameters)
     - Variation operators (application rate, internal parameters)
     Goal: automatic setting ("Crossing the Chasm" [Moore, 1991])

  6. Parameter Setting of Variation Operators
     Difficult to predict the performance:
     - Problem-dependent and inter-dependent choices
     - Off-line tuning → best static strategy (expensive)
     Performance also depends on:
     - Fitness of the parents
     - Population fitness distribution
     [Figure: performance of the 1-Bit, 3-Bit, 5-Bit and 1/n BitFlip operators on OneMax, plotted against the fitness of the parent, with a (1+50)-EA]
     ⇒ Should be adapted on-line, while solving the problem

  7. Adaptive Operator Selection: Position of the Problem
     - Given a set of K variation operators
     - Select on-line the operator to be applied next
     - Based on their recent effects
     [Diagram: the EA applies an operator and evaluates its impact; the AOS module turns the impact into a credit/reward (Credit Assignment) and maintains a quality per operator (op1 ... opK), which drives the choice of the next operator (Operator Selection)]
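The select–apply–credit loop in the diagram can be sketched in a few lines. This is a minimal illustration, not the authors' method: the epsilon-greedy selection rule, the function names, and the use of the raw impact as credit are our own simplifying choices.

```python
import random

def aos_loop(operators, impact, steps=100, epsilon=0.1, seed=0):
    """Minimal AOS loop sketch (illustrative, not the paper's algorithm).

    operators: list of operator identifiers
    impact: callable mapping an operator to a scalar reward
    """
    rng = random.Random(seed)
    quality = {op: 0.0 for op in operators}   # empirical quality per operator
    count = {op: 0 for op in operators}       # applications per operator
    for _ in range(steps):
        if rng.random() < epsilon:            # exploration
            op = rng.choice(operators)
        else:                                 # exploitation: best empirical quality
            op = max(operators, key=lambda o: quality[o])
        r = impact(op)                        # impact evaluation -> credit (raw here)
        count[op] += 1
        quality[op] += (r - quality[op]) / count[op]   # incremental mean update
    return quality
```

Any bandit-style rule (such as the UCB variants discussed later) can replace the epsilon-greedy choice in the exploitation/exploration step.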

  8. Operator Selection
     - A Multi-Armed Bandit problem
     - Operator Selection: Discussion

  9. A (kind of) Multi-Armed Bandit Problem
     The basic Multi-Armed Bandit problem:
     - Given $K$ arms (≡ operators)
     - At time $t$, the gambler plays arm $j$ and gets $r_{j,t} = 1$ with (unknown) probability $p_j$, and $r_{j,t} = 0$ with probability $1 - p_j$
     - Goal: maximize the cumulative reward, i.e. minimize the regret
       $L(T) = \sum_{t=1}^{T} (r^*_t - r_t)$
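A toy simulation makes the regret concrete. The arm probabilities, the uniformly random policy, and the use of the best arm's expected reward for $r^*_t$ are our illustrative choices, assumed only for this sketch.

```python
import random

def play(T=1000, probs=(0.9, 0.1), seed=42):
    """Cumulative regret of a uniformly random policy on a Bernoulli bandit.

    Illustrative sketch: r*_t is approximated by the best arm's success
    probability rather than its realized reward.
    """
    rng = random.Random(seed)
    best = max(probs)                        # expected reward of the best arm
    regret = 0.0
    for _ in range(T):
        j = rng.randrange(len(probs))        # gambler plays a random arm
        r = 1.0 if rng.random() < probs[j] else 0.0
        regret += best - r                   # per-step regret r*_t - r_t
    return regret
```

A non-adaptive policy accumulates regret linearly in $T$ (about $0.4\,T$ here), whereas UCB, discussed next, keeps it logarithmic.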

  10. The Upper Confidence Bound (UCB) MAB Algorithm
      Asymptotic optimality guarantees in the static context [Auer et al., 2002]: optimal regret $L(T) = O(\log T)$.
      At time $t$, choose the arm $i$ maximizing
      $\text{score}_{i,t} = \underbrace{\hat{q}_{i,t}}_{\text{exploitation}} + \underbrace{\sqrt{\frac{2 \log \sum_k n_{k,t}}{n_{i,t}}}}_{\text{exploration}}$
      with $n_{i,t+1} = n_{i,t} + 1$ (number of times arm $i$ was played)
      and $\hat{q}_{i,t+1} = \left(1 - \frac{1}{n_{i,t+1}}\right) \hat{q}_{i,t} + \frac{1}{n_{i,t+1}} \, r_{i,t}$ (empirical quality).
      Efficiency comes from the optimal EvE balance: the interval between exploration trials increases exponentially w.r.t. the number of time steps.
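The rule above translates directly into code. A minimal sketch (class and method names are ours; the convention of playing each arm once before applying the bound is the standard UCB1 initialization):

```python
import math

class UCB:
    """Sketch of the UCB selection rule on this slide."""

    def __init__(self, k):
        self.n = [0] * k      # n_i: number of times arm i was played
        self.q = [0.0] * k    # q^_i: empirical quality of arm i

    def select(self):
        for i, ni in enumerate(self.n):
            if ni == 0:       # play each arm once before using the bound
                return i
        t = sum(self.n)       # total plays = sum_k n_k
        return max(range(len(self.n)),
                   key=lambda i: self.q[i]
                   + math.sqrt(2 * math.log(t) / self.n[i]))

    def update(self, i, r):
        self.n[i] += 1
        self.q[i] += (r - self.q[i]) / self.n[i]   # incremental empirical mean
```

With deterministic rewards (arm 0 always pays 1, arm 1 always 0), the exploration term still forces occasional pulls of the bad arm, but their frequency decays logarithmically.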

  11. Operator Selection with UCB: Shortcomings
      Exploration vs. Exploitation (EvE) balance:
      - In UCB theory, rewards ∈ {0, 1}; fitness-based rewards ∈ [a, b]
      - UCB's EvE balance is broken; scaling is needed:
        $\text{score}_{i,t} = \hat{q}_{i,t} + C \sqrt{\frac{2 \log \sum_k n_{k,t}}{n_{i,t}}}$
      Dynamic setting (the best arm/operator changes along evolution):
      - Adjusting the $\hat{q}$'s after a change takes a long time
      - Use a change-detection test (e.g. Page-Hinkley) [Hinkley, 1969]
      ⇒ Upon detection of a change, restart the MAB.
      DMAB = UCB + Scaling + Page-Hinkley
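The Page-Hinkley test used by DMAB can be sketched as follows. This follows the common formulation of the test (cumulative deviation from the running mean, alarm when it drifts from its minimum by more than a threshold γ); the parameter values and names are illustrative, not taken from the slides.

```python
class PageHinkley:
    """Sketch of the Page-Hinkley change-detection test (upward drift)."""

    def __init__(self, delta=0.05, gamma=5.0):
        self.delta = delta      # tolerated per-step drift
        self.gamma = gamma      # detection threshold (DMAB's hyper-parameter)
        self.mean = 0.0         # running mean of observations
        self.count = 0
        self.cum = 0.0          # cumulative deviation m_t
        self.min_cum = 0.0      # running minimum M_t of m_t

    def add(self, x):
        """Feed one reward; return True if a change is detected."""
        self.count += 1
        self.mean += (x - self.mean) / self.count
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.gamma
```

In DMAB, a `True` return triggers a restart of the bandit's counters. As noted on the next slide, this works well only when the reward distribution changes abruptly.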

  12. Operator Selection: Discussion
      - MAB = UCB + Scaling: optimal EvE, but in a static setting... AOS is dynamic
      - DMAB = MAB + Page-Hinkley change detection
        - Won the PASCAL challenge on on-line EvE trade-off [Hartland et al., 2007]
        - Used in the AOS context [GECCO'08]
        - 2 hyper-parameters: scaling factor C and Page-Hinkley threshold γ
      - Very efficient, but very sensitive to the hyper-parameter setting
      - Change detection works only when changes are abrupt
      An alternative: a "more dynamic" reward

  13. Credit Assignment
      - Fitness-based Rewards
      - Area-Under-the-Curve (AUC)
      - Rank-based AUC with MAB

  14. Fitness-based Rewards
      Impact of an operator application?
      - Most common: fitness improvement ΔF
      - For multi-modal problems, diversity is also important [CEC'09]
      From impact to credit (or reward):
      - Instantaneous (ΔF of the last application): likely to be unstable
      - Average over the last W applications
      - Extreme value over the last W applications [PPSN'08]
        (rare extreme events matter more than the average, e.g. rogue waves, epidemic propagation)
      Issue: high sensitivity to scaling parameters, which are likely to be dynamic, too
      ⇒ Higher robustness: credit assignment based on ranks
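The three window-based credit schemes listed above can be sketched with a bounded buffer of the last W fitness improvements. The class and method names are ours, for illustration only:

```python
from collections import deque

class WindowCredit:
    """Sketch of instantaneous / average / extreme credit over a window."""

    def __init__(self, W=10):
        # deque(maxlen=W) keeps only the last W fitness improvements
        self.window = deque(maxlen=W)

    def record(self, delta_f):
        self.window.append(delta_f)

    def instantaneous(self):          # ΔF of the last application
        return self.window[-1] if self.window else 0.0

    def average(self):                # mean over the last W applications
        return sum(self.window) / len(self.window) if self.window else 0.0

    def extreme(self):                # max over the last W applications
        return max(self.window) if self.window else 0.0
```

Note how a single large improvement dominates the extreme credit long after it occurred, which is exactly the intended emphasis on rare events; it also shows why these schemes remain sensitive to the scale of ΔF, motivating the rank-based approach that follows.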

  15. Area-Under-the-Curve (AUC)
      Area under the ROC curve in ML:
      - Evaluation of binary classifiers [Fawcett, 2006], e.g. over a ranked list [+ + - - + + + - - - - ...]
      - Performance: % of misclassification
      - Equivalent to the Mann-Whitney-Wilcoxon test: Pr(rank(n+) > rank(n−))
      Area under the ROC curve in AOS [GECCO'10]:
      - One operator versus the others: [op1, op2, op1, op1, op1, op2, op2, ...]
      - Fitness improvements are ranked
      - Size of the segment = assigned rank-value
      [Figure: ROC-like curve; vertical axis = operator under assessment, horizontal axis = other operators]

  16. Rank-Based AUC (operator 2 under assessment)

      R   ΔF    Op
      1   5.0   2
      2   4.7   2
      3   4.2   1
      4   3.5   1
      5   3.4   2
      6   3.3   2
      7   3.1   2
      8   3.0   2
      9   2.9   2
      10  2.8   2
      11  2.5   3
      12  2.0   1
      13  1.5   0
      14  1.0   3
      15  0.8   0

  17.–20. Rank-Based AUC, Steps 1–4
      [The same ranked table, revealed incrementally: walking down the ranks, the ROC-like curve steps up when the improvement was produced by operator 2 and right otherwise, tracing the area under the curve step by step.]
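The stepwise walk in slides 16–20 amounts to the following computation. This sketch uses unit segment sizes, so it reduces to the plain Mann-Whitney AUC; the slides additionally assign rank-based segment sizes that weight early ranks more, which is omitted here. The function name is ours.

```python
def auc(ranked_ops, op):
    """AUC of operator `op` vs. the rest, over a list of operator ids
    sorted by decreasing fitness improvement (unit segment sizes)."""
    up = 0       # improvements by `op` seen so far = current curve height
    area = 0
    for o in ranked_ops:
        if o == op:
            up += 1      # vertical step
        else:
            area += up   # horizontal step adds the current height
    pos = up
    neg = len(ranked_ops) - pos
    # normalize to [0, 1]: Pr(an improvement by `op` outranks one by others)
    return area / (pos * neg) if pos and neg else 0.0
```

On the table of slide 16 (operators, best ΔF first: 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 3, 1, 0, 3, 0), operator 2 has 8 of the 15 improvements and obtains an AUC of 44/56 ≈ 0.79, well above the 0.5 of a non-informative operator.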
