Adaptive Operator Selection for Optimization
Álvaro Fialho. Advisors: Marc Schoenauer & Michèle Sebag
Ph.D. Defense, École Doctorale d'Informatique, Université Paris-Sud, Orsay, France
December 22, 2010

Context · Operator Selection · Credit Assignment · Empirical Validation · Conclusion
Outline
1. Context & Motivation
2. Operator Selection
3. Credit Assignment
4. Empirical Validation
5. Conclusions & Further Work
Álvaro Fialho – Ph.D. Defense – December 22, 2010 – Adaptive Operator Selection for Optimization – 2/46
Context & Motivation
Evolutionary Algorithms · Parameter Setting in EAs · Parameter Setting of Variation Operators · Adaptive Operator Selection
Evolutionary Algorithms

Stochastic optimization algorithms (Darwinian paradigm)

Bottleneck: parameter setting
- Population size and number of offspring generated
- Parameters of the selection and replacement methods
- Parameters of the variation operators (application rate, etc.)

Goal: automatic parameter setting ("Crossing the Chasm")
Parameter Setting in EAs
(figure, from [Eiben et al., 2007])
Parameter Setting of Variation Operators

- Difficult to predict the performance; problem-dependent and inter-dependent choices
- Off-line tuning can find the best static strategy (expensive)
(figure: performance of the 1-Bit, 3-Bit, 5-Bit and 1/n BitFlip operators on OneMax, as a function of the fitness of the parent; sample with a (1+50)-EA)
- Depends also on the fitness of the parents and on the population fitness distribution

⇒ Should be adapted on-line, while solving the problem
Adaptive Operator Selection

Position of the problem
- Given a set of K variation operators
- Select on-line the operator to be applied next, based on their recent performance

(diagram of the AOS loop: the EA applies the selected operator; the impact of the application is evaluated; the Credit Assignment module turns it into a credit or reward; the Operator Selection module maintains the qualities quality op1, quality op2, ..., quality opK and picks the next operator)
Operator Selection
Related Work · Discussion on Operator Selection · A (kind of) Multi-Armed Bandit problem · Dynamic Multi-Armed Bandit (DMAB) · Sliding Multi-Armed Bandit (SLMAB) · Contributions to Operator Selection: Summary
Operator Selection - Related Work

"Empirical quality": q̂_{j,t+1} = (1 − α) · q̂_{j,t} + α · r_{j,t}

Probability Matching (PM) [Goldberg, 1990]
- s_i proportional to q̂_i:
  s_{i,t+1} = p_min + (1 − K · p_min) · q̂_{i,t+1} / Σ_{j=1..K} q̂_{j,t+1}

Adaptive Pursuit (AP) [Thierens, 2005]
- s_{i*} is pushed to p_max, the others to p_min:
  i* = argmax{ q̂_{i,t}, i = 1..K }
  s_{i*,t+1} = s_{i*,t} + β · (p_max − s_{i*,t})
  s_{i,t+1} = s_{i,t} + β · (p_min − s_{i,t}), for i ≠ i*
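The two update rules above can be written down in a few lines of Python. This is a minimal sketch; the function names and the values of α, β and p_min are illustrative placeholders, not the settings used in the thesis.

```python
def pm_update(q, s, op, reward, alpha=0.1, p_min=0.05):
    """One Probability Matching step: update the quality of `op`, then set
    all selection probabilities proportional to quality, floored at p_min."""
    K = len(q)
    q[op] = (1 - alpha) * q[op] + alpha * reward
    total = sum(q)
    for i in range(K):
        s[i] = p_min + (1 - K * p_min) * (q[i] / total)
    return q, s

def ap_update(q, s, op, reward, alpha=0.1, beta=0.1, p_min=0.05):
    """One Adaptive Pursuit step: the current best operator is pushed
    toward p_max, every other one toward p_min (winner-take-all pursuit)."""
    K = len(q)
    p_max = 1 - (K - 1) * p_min
    q[op] = (1 - alpha) * q[op] + alpha * reward
    best = max(range(K), key=lambda i: q[i])
    for i in range(K):
        target = p_max if i == best else p_min
        s[i] = s[i] + beta * (target - s[i])
    return q, s
```

Note that in PM the probabilities always sum to one (K · p_min plus the (1 − K · p_min) mass distributed proportionally), while AP only converges toward the p_max/p_min profile at rate β.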
Discussion on Operator Selection

Exploration versus Exploitation (EvE)
- In the operator search space, not in the problem search space
- Acquire new information (use other operators) vs. capitalize on the available knowledge (use the current best)

Probability-based methods (PM and AP)
- Conservative approach: fixed p_min
- Entails over-exploration when there are many operators

EvE balance ⇒ Game Theory: Multi-Armed Bandits
- The level of exploration should depend on the confidence about the knowledge, i.e., p_min should be "dynamic"
A (kind of) Multi-Armed Bandit problem

Original Multi-Armed Bandits (Machine Learning)
- Given K arms (≡ operators)
- At time t, the gambler plays arm j and gets r_{j,t} = 1 with (unknown) probability p_j, and r_{j,t} = 0 otherwise
- Goal: maximize the cumulative reward, i.e., minimize the regret L(T) = Σ_{t=1..T} (r*_t − r_t)
The Upper Confidence Bound (UCB) MAB algorithm

Asymptotic optimality guarantees in the static context [Auer et al., 2002]: optimal regret L(T) = O(log T)

At time t, choose the arm i maximizing:
score_{i,t} = q̂_{i,t} [exploitation] + sqrt( (2 · log Σ_k n_{k,t}) / n_{i,t} ) [exploration]
with n_{i,t+1} = n_{i,t} + 1 (number of applications)
and q̂_{i,t+1} = (1 − 1/n_{i,t+1}) · q̂_{i,t} + (1/n_{i,t+1}) · r_{i,t} (empirical quality)

Efficiency comes from the optimal EvE balance: the interval between exploration trials increases exponentially w.r.t. the number of time steps
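As a rough illustration, the UCB rule can be sketched as follows. The scaling factor C anticipates the AOS variant discussed on the next slide (original UCB has C = 1); the helper names are my own, not the thesis code.

```python
import math

def ucb_choose(q_hat, n, C=1.0):
    """UCB1-style arm choice: empirical quality plus a scaled confidence
    term that shrinks as an arm gets played more often.
    Unplayed arms are tried once first, so the log/division are defined."""
    total = sum(n)
    for i, n_i in enumerate(n):
        if n_i == 0:
            return i
    scores = [q_hat[i] + C * math.sqrt(2 * math.log(total) / n[i])
              for i in range(len(n))]
    return max(range(len(n)), key=lambda i: scores[i])

def ucb_update(q_hat, n, arm, reward):
    """Incremental mean: equivalent to (1 - 1/n)*q_hat + (1/n)*reward."""
    n[arm] += 1
    q_hat[arm] += (reward - q_hat[arm]) / n[arm]
```

With equal counts, the confidence terms cancel out and the arm with the best empirical quality is exploited; a rarely played arm eventually accumulates enough confidence bonus to be re-explored.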
Operator Selection with UCB: shortcomings

Exploration vs. Exploitation (EvE) balance
- Original MAB: rewards ∈ {0, 1}; AOS: rewards ∈ [a, b] (e.g., fitness improvement)
- UCB's EvE balance is broken; scaling is needed:
  score_{i,t} = q̂_{i,t} + C · sqrt( (2 · log Σ_k n_{k,t}) / n_{i,t} )

Dynamics
- When op_i is no longer the best:
  q̂_{i,t+1} = (1 − 1/n_{i,t+1}) · q̂_{i,t} + (1/n_{i,t+1}) · r_{i,t}
- The weight of r is inversely proportional to n: adjusting the q̂'s after a change takes a long time
Dynamic Multi-Armed Bandit (DMAB)

Rationale: no need for exploration in stationary situations ⇒ upon the detection of a change, restart the MAB.

How to detect a change in a distribution? The Page-Hinkley statistical test [Page, 1954]:
1. r̄_t = (1/t) · Σ_{i=1..t} r_i
2. m_t = Σ_{i=1..t} (r_i − r̄_i + δ)
3. M_t = max{ |m_i|, i = 1..t }
4. Return (M_t − |m_t| > γ)

DMAB = UCB + Scaling + Page-Hinkley
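The four steps of the test translate directly into an incremental detector. A possible sketch follows; the values of δ (tolerated drift per step) and γ (detection threshold) are illustrative, not the tuned values from the thesis.

```python
class PageHinkley:
    """Page-Hinkley change-detection test, as used in DMAB: signals a
    change when the deviation statistic |m_t| drops more than `gamma`
    below its running maximum M_t."""
    def __init__(self, delta=0.005, gamma=0.05):
        self.delta, self.gamma = delta, gamma
        self.t, self.mean, self.m, self.M = 0, 0.0, 0.0, 0.0

    def add(self, r):
        self.t += 1
        self.mean += (r - self.mean) / self.t     # running average r̄_t
        self.m += r - self.mean + self.delta      # cumulative deviation m_t
        self.M = max(self.M, abs(self.m))         # running maximum M_t
        return self.M - abs(self.m) > self.gamma  # change detected?
```

While the reward distribution is stationary, m_t drifts by roughly δ per step and tracks its own maximum; when the mean reward drops, m_t falls away from M_t and the test fires.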
Sliding Multi-Armed Bandit (SLMAB)

MAB: q̂_{i,t+1} = (1 − 1/n_{i,t+1}) · q̂_{i,t} + (1/n_{i,t+1}) · r_{i,t}
- Too slow: the weight of r is inversely proportional to n

AP/PM: q̂_{i,t+1} = (1 − α) · q̂_{i,t} + α · r_{i,t}
- Fixed weight, extra hyper-parameter

Rationale: the weight of r_{i,t} is adapted w.r.t. the operator application frequency: smaller weight if frequently applied, bigger otherwise. With t_i the last time step at which op_i was applied:
q̂_{i,t+1} = q̂_{i,t} · W/(W + (t − t_i)) + (1/n_{i,t+1}) · r_{i,t}
n_{i,t+1} = n_{i,t} · W/(W + (t − t_i)) + 1

SLMAB = MAB with Sliding update rule
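The sliding update rule can be sketched as below. This is an illustrative reading of the two formulas above; the bookkeeping names and the value of W are placeholders.

```python
def slmab_update(q_hat, n, t_last, op, r, t, W=50):
    """SLMAB sliding update (sketch): the old estimates are discounted by
    W / (W + time since op's last application), so operators that have not
    been applied for a while forget their past faster; W is the window size."""
    decay = W / (W + (t - t_last[op]))
    n[op] = n[op] * decay + 1
    q_hat[op] = q_hat[op] * decay + r / n[op]
    t_last[op] = t
```

An operator applied at every step keeps a decay close to 1 (weight close to 1/n, as in plain MAB), while an operator idle for a long gap has its old quality heavily discounted, so a single new reward dominates.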
Contributions to Operator Selection: Summary

MAB = UCB + Scaling
- Optimal EvE, but in a static setting... AOS is dynamic

DMAB = MAB + Page-Hinkley change-detection
- Won the Pascal challenge on on-line EvE trade-off [Hartland et al., 2007]; used in the AOS context [GECCO'08]
- 2 hyper-parameters: scaling factor C and Page-Hinkley threshold γ
- Very efficient, but very sensitive to the hyper-parameter setting; change-detection works only when changes are abrupt

SLMAB = MAB + Sliding update rule [AMAI'10]
- 2 hyper-parameters: scaling factor C and sliding window size W (W set to the Credit Assignment sliding window size)
- Better than or similar to DMAB on artificial scenarios
Credit Assignment
Related Work · Extreme Value Based Credit Assignment · Discussion on Credit Assignment · Rank-based Area-Under-Curve (AUC) · Rank-based AUC with MAB · Contributions to Credit Assignment: Summary
Credit Assignment - Related Work

Impact of an operator application?
- Most common: fitness improvement ΔF. Given fitness function F, operator o, and individual x: ΔF = F(o(x)) − F(x)
- For multi-modal problems, diversity is also important

From Impact to Credit (or reward)
- Instantaneous (ΔF of the last application): likely to be unstable
- Average of the last W applications
Extreme Value Based Credit Assignment

Impact: fitness improvement. Operators producing rare but large improvements (outliers) are rarely considered, as they have a smaller expectation.
In EC, focus on extreme rather than average events [Whitacre et al., 2006], as in complex systems (e.g., rogue waves, epidemic propagation).

Extreme Value-Based Credit Assignment: r = extreme value over a window of size W [PPSN'08]
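The three window-based credit schemes discussed here (instantaneous, average, extreme) can be sketched over a fixed-size window of the last W fitness improvements; function names are illustrative.

```python
from collections import deque

def make_window(W=10):
    """Sliding window keeping only the last W fitness improvements."""
    return deque(maxlen=W)

def credit_instantaneous(window):
    return window[-1] if window else 0.0

def credit_average(window):
    return sum(window) / len(window) if window else 0.0

def credit_extreme(window):
    """Extreme value-based credit: the best ΔF in the window, rewarding
    operators able to produce rare, large improvements."""
    return max(window) if window else 0.0
```

On a window containing one large outlier among small improvements, the extreme credit is dominated by the outlier while the average credit smooths it out, which is exactly the distinction the slide draws.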
Discussion on Credit Assignment

Schemes based on raw values of ΔF (Instantaneous, Average, Extreme)
- Rewards are problem-dependent: different fitness landscapes mean different ranges and variances
- Rewards are "moment"-dependent: range and variance might shrink as the search advances, and improvements tend to become more and more scarce

Sensitivity of the scaling factor C
- MAB, DMAB and SLMAB are very sensitive to C
- C has a double role: correction of the EvE balance, and scaling of the rewards

Higher robustness: Credit Assignment based on ranks
Rank-based Area-Under-Curve (AUC)

Area Under the ROC Curve in ML: evaluation of binary classifiers [Fawcett, 2006]
- Ranked list: [ + + - - + + + - - - - ... ]
- Performance: % of misclassification; equivalent to the Mann-Whitney-Wilcoxon test, Pr(rank(n+) > rank(n−))

Area Under the ROC Curve in AOS: one operator versus the others [GECCO'10]
- Ranked list of operator applications: [ op1, op2, op1, op1, op1, op2, op2, ... ]
- The fitness improvements are ranked; the size of each curve segment is the assigned rank-value
(figure: ROC-like curve for the operator under assessment (1) vs. the other operators)
Rank-based Area-Under-Curve (AUC): example

Window of the last 15 operator applications, ranked by fitness improvement:

R   ΔF   Op
1   5.0  2
2   4.7  2
3   4.2  1
4   3.5  1
5   3.4  2
6   3.3  2
7   3.1  2
8   3.0  2
9   2.9  2
10  2.8  2
11  2.5  3
12  2.0  1
13  1.5
14  1.0  3
15  0.8

For the operator under assessment (here Op. 2), the ranked list is walked step by step: the curve goes up when the operator appears and right otherwise, and the area under the curve is its credit. Resulting credits: Op. 0: 1.39, Op. 1: 31.94, Op. 2: 61.11, Op. 3: 5.56.

In the original AUC in ML, all segments have the same width; here the size of a segment is its assigned rank-value. With an exponential decay of the rank-values, D^R · (W − R) (example with D = 0.5), the top-ranked applications dominate the credit: Op. 0: 0.00, Op. 1: 5.36, Op. 2: 94.64, Op. 3: 0.00.
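One possible reading of this credit computation, as a sketch: the areas below are left unnormalized, whereas the credits on the slide are normalized to percentages, and the decayed rank-value D^R · (W − R) follows the slide's formula with a 0-based rank R.

```python
def auc_credit(ops, op, D=0.5):
    """Rank-based AUC credit (sketch). `ops` lists the operators that
    produced the windowed fitness improvements, already sorted by
    decreasing ΔF. The ROC-like curve steps up when `op` appears and
    right otherwise; the step length at rank R is D**R * (W - R)."""
    W = len(ops)
    x = y = area = 0.0
    for R, o in enumerate(ops):        # R = 0-based rank in the window
        step = (D ** R) * (W - R)      # decayed rank-value of this step
        if o == op:
            y += step                  # up: one more success for `op`
        else:
            x += step
            area += y * step           # right: accumulate an area strip
    return area
```

An operator whose applications occupy the top ranks gets almost all the area; one appearing only at the bottom of the window gets next to none, matching the D = 0.5 example above.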
Rank-based AUC with MAB

MAB for AOS
- Original MAB: very slow adaptation → AUC reflects the recent behavior of all operators: dynamic by construction
- Extreme-DMAB/SLMAB: sensitive hyper-parameter setting → AUC is rank-based, so fitness ranges don't matter

Use of AUC within MAB
- Originally, q̂ is the average of the received rewards; AUC is already an aggregation ⇒ use AUC directly in UCB:
  score_{j,t} = AUC_{j,t} + C · sqrt( (2 · log Σ_k n_{k,t}) / n_{j,t} )
Contributions to Credit Assignment: Summary

Extreme Value Based
- Empirically shown to outperform the Instantaneous/Average baselines
- But based on raw values of ΔF: problem-dependent

Area-Under-Curve (AUC)
- Ranks over fitness improvements (ΔF)
- Invariant w.r.t. linear scaling of F

Fitness-based AUC (FAUC)
- Ranks over fitness values (F), rather than over ΔF
- Invariant with respect to monotonous transformations: e.g., same behavior on F and on F^n, log(F), exp(F), ...
- Comparison-based property maintained
Empirical Validation
AOS Combinations and Hyper-Parameters · Goals of Experiments · Comparative Performance on the OneMax Problem · Invariance Analysis on the OneMax Problem · Comparative Results on BBOB
AOS Combinations and Hyper-Parameters

Proposed AOS combinations
- Extreme(ΔF) with MAB, DMAB and SLMAB [GECCO'08, PPSN'08, LION'09, GECCO'09, AMAI'10]
- AUC(ΔF) and FAUC(F) with MAB [GECCO'10, BBOB'10, PPSN'10]

Hyper-parameters: off-line tuned by F-Race [Birattari et al., 2002]
- Operator Selection:
  - MAB: scaling factor C
  - DMAB: scaling factor C, PH threshold γ
  - SLMAB: scaling factor C (if using W from the Credit Assignment)
  - AUC-Bandit: scaling factor C, decay factor D (≡ 0.5 ok)
- Credit Assignment: sliding window size W and type (Average, Extreme, AUC)
Goals of Experiments

Given a set of K operators...

Performance? Baseline methods:
1. Each operator applied alone
2. Naive uniform selection among the operators
3. Static off-line tuning of the application rates (at a much higher cost)
4. Optimal behavior (available only on simple benchmarks)
5. State-of-the-art OS method: Adaptive Pursuit [Thierens, 2005]

Robustness/Generality? AOS methods have hyper-parameters:
- Robustness w.r.t. the hyper-parameter setting
- Generality w.r.t. different problems/landscapes
- Invariance properties
The OneMax Problem

10^4 bits; fitness: number of "1"s; (1+50)-GA; 4 mutation operators: 1-Bit, 3-Bit, 5-Bit, 1/n BitFlip
(figure: performance of the operators on OneMax, as a function of the fitness of the parent)
(figure: optimal operator selection (Oracle) over 5000 generations; the best operator changes as fitness increases)
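The benchmark setting can be sketched in a few lines; the operator implementations below follow their standard definitions (b-Bit flips exactly b distinct bits, BitFlip flips each bit with probability 1/n), with illustrative function names.

```python
import random

def onemax(bits):
    """OneMax fitness: the number of '1' bits."""
    return sum(bits)

def mutate_b_bits(bits, b):
    """b-Bit mutation: flip exactly b distinct, uniformly chosen bits."""
    child = bits[:]
    for i in random.sample(range(len(child)), b):
        child[i] = 1 - child[i]
    return child

def mutate_bitflip(bits, p=None):
    """Standard BitFlip: flip each bit independently with probability 1/n."""
    n = len(bits)
    p = 1.0 / n if p is None else p
    return [1 - b if random.random() < p else b for b in bits]
```

In a (1+50)-GA, each generation would produce 50 offspring from the single parent with the operator chosen by the AOS mechanism, and keep the best individual.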
Comparative Behavior on the OneMax Problem
(figures over 5000 generations: Extreme - Adaptive Pursuit, Extreme - Dynamic Multi-Armed Bandit, and Area-Under-Curve - Bandit)
Comparative Performance on the OneMax Problem
(figure: AUC-MAB, Ext-SLMAB, Ext-DMAB, Ext-MAB, Ext-AP, Naive, Best Static, Oracle)
*Best Static: 1-Bit 80% + 5-Bit 20%

Other scenarios: Artificial, Long K-Path, Royal Road, SAT problems, Continuous
Analysis of Invariance w.r.t. Monotonous Transformations

Original OneMax: F = Σ_{i=1..n} b_i; three monotonous transformations: log(F), exp(F) and F²

(h-l)   F = Σ b_i   log(F)      exp(F)        F²           AOS tech.
485     5103/427    5195/430    5562/950      5588/950     AUC-MAB
807     5123/218    5431/223    5930/334      5792/382     Ext-AP
        5726/399    5726/399    5726/399      5726/399     FAUC-MAB
2591    5376/285    7967/718    7722/2151     6138/516     Ext-DMAB
6971    6059/667    8863/694    13030/3053    12136/949    Ext-SLMAB
7052    9044/840    7947/1267   14999/0       14999/0      Ext-MAB

FAUC-MAB, being comparison-based, behaves identically under all transformations.
Black-Box Optimization Benchmark (BBOB)

Experimental framework for rigorous benchmarking [Hansen et al., 2010]
- 24 continuous functions, 15 instances per function
- Several problem dimensions (2, 3, 5, 10, 20, 40)

Adaptive Operator Selection in Differential Evolution
- A completely different evolutionary algorithm: NP = 100 · DIM, CR = 1.0, F = 0.5
- With 4 possible mutation strategies: rand/1, rand/2, rand-to-best/2, current-to-rand/1
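The four DE mutation strategies, in their commonly used forms, can be sketched as below (vector arithmetic written out explicitly; this is an illustration of the standard strategy definitions, not the thesis implementation).

```python
import random

def de_mutation(pop, best, i, F=0.5, strategy="rand/1"):
    """One DE mutant vector for target index i, using one of the four
    strategies acting as 'operators' for the AOS. `pop` is a list of
    real-valued vectors, `best` the current best vector."""
    r = random.sample([j for j in range(len(pop)) if j != i], 5)
    x = pop[i]
    x1, x2, x3, x4, x5 = (pop[j] for j in r)
    if strategy == "rand/1":
        return [a + F * (b - c) for a, b, c in zip(x1, x2, x3)]
    if strategy == "rand/2":
        return [a + F * (b - c) + F * (d - e)
                for a, b, c, d, e in zip(x1, x2, x3, x4, x5)]
    if strategy == "rand-to-best/2":
        return [a + F * (g - a) + F * (b - c) + F * (d - e)
                for a, g, b, c, d, e in zip(x1, best, x2, x3, x4, x5)]
    if strategy == "current-to-rand/1":
        k = random.random()   # per-mutation random combination coefficient
        return [t + k * (a - t) + F * (b - c)
                for t, a, b, c in zip(x, x1, x2, x3)]
    raise ValueError(strategy)
```

With CR = 1.0, as in the setting above, the mutant vector replaces the trial vector entirely, so the strategy choice is the only varying ingredient the AOS has to control.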
Pairwise comparison of FAUC-Bandit with DE1-DE4 (sample figure)
(figure: proportion vs. log10 of FEvals(A1)/FEvals(A0), moderate functions f6-9: DE1: 4/4, DE2: 4/3, DE3: 4/4, DE4: 4/0)
Pairwise comparisons of FAUC-Bandit with DE1-DE4
(figures: proportion vs. log10 of FEvals(A1)/FEvals(A0))
(a) all functions, f1-24: DE1: 15/15, DE2: 15/12, DE3: 15/15, DE4: 15/0
(b) separable functions, f1-5: DE1: 3/3, DE2: 3/3, DE3: 3/3, DE4: 3/0
(c) moderate functions, f6-9: DE1: 4/4, DE2: 4/3, DE3: 4/4, DE4: 4/0
(d) ill-conditioned functions, f10-14: DE1: 5/5, DE2: 5/5, DE3: 5/5, DE4: 5/0
Pairwise comparisons of FAUC-Bandit with the other AOS combinations
(figures: proportion vs. log10 of FEvals(A1)/FEvals(A0))
(e) all functions, f1-24: pm: 15/15, AP: 15/15, DMAB: 15/15, SLMAB: 15/15, MAB: 15/15
(f) separable functions, f1-5: pm: 3/3, AP: 3/3, DMAB: 3/3, SLMAB: 3/3, MAB: 3/3
(g) moderate functions, f6-9: pm: 4/4, AP: 4/4, DMAB: 4/4, SLMAB: 4/4, MAB: 4/4
(h) ill-conditioned functions, f10-14: pm: 5/5, AP: 5/5, DMAB: 5/5, SLMAB: 5/5, MAB: 5/5
Pairwise comparisons of FAUC-Bandit with the baselines
(figures: proportion vs. log10 of FEvals(A1)/FEvals(A0))
(i) all functions, f1-24: Naive: 15/15, StAll: 15/15, StEach: 15/15, CMA: 15/19
(j) separable functions, f1-5: Naive: 3/3, StAll: 3/3, StEach: 3/3, CMA: 3/3
(k) moderate functions, f6-9: Naive: 4/4, StAll: 4/4, StEach: 4/4, CMA: 4/4
(l) ill-conditioned functions, f10-14: Naive: 5/5, StAll: 5/5, StEach: 5/5, CMA: 5/5
Conclusions & Further Work
Summary of Contributions · Some Perspectives for Further Work
Summary of Contributions I

Algorithmic contributions

Operator Selection
- MAB = UCB + Scaling
- DMAB = MAB + Page-Hinkley test [GECCO'08]
- SLMAB = MAB + Sliding update rule [AMAI'10]

Credit Assignment
- Extreme value-based (ΔF) [PPSN'08]
- Rank-based methods [GECCO'10]

AOS Combinations
- Extreme-xMAB: efficient, but sensitive w.r.t. hyper-parameters
- (F)AUC-MAB: efficient and robust w.r.t. hyper-parameters
- FAUC: comparison-based

⇒ Combining concepts from ML (MABs and AUC) and extending them to a dynamic context
Summary of Contributions II

Proposal of new artificial scenarios
- Boolean and Outlier [GECCO'08], derived from Uniform [Thierens, 2005]
- Family of Two-Values scenarios [AMAI'10]: two parameters control the variance and the expectation of the rewards, enabling the analysis of different behavioral aspects of AOS methods

Empirical validation (performance, robustness and generality)
- Genetic Algorithms: artificial scenarios [GECCO'08, AMAI'10, GECCO'10]; Boolean problems (OneMax, Long K-Path and Royal Road) [PPSN'08, LION'09, GECCO'09, AMAI'10, GECCO'10]
- Memetic Algorithms: SAT problems, with the Compass Credit Assignment [CEC'09, Chapter'10]
- Differential Evolution: continuous problems [BBOB'10, PPSN'10]
Some Perspectives for Further Work

Application extensions: the AOS paradigm is very general
- Use within other meta-heuristics
- Use at the level of hyper-heuristics: Cross-domain Heuristic Search Challenge (CHeSC)

Algorithmic extensions: towards real-world problems
- Extend to multi-modal optimization (diversity, population size, ...)
- Extend to multi-objective optimization (Pareto, hyper-volume, ...)

First trial in the real world, on sustainable development: optimization of building designs for energy efficiency
Our Publications I

Da Costa, L., Fialho, A., Schoenauer, M., and Sebag, M. (2008). Adaptive operator selection with dynamic multi-armed bandits. In Proc. Genetic and Evolutionary Computation Conference (GECCO). ACM.
Fialho, A., Da Costa, L., Schoenauer, M., and Sebag, M. (2008). Extreme value based adaptive operator selection. In Proc. Intl. Conf. on Parallel Problem Solving from Nature (PPSN). Springer.
Fialho, A., Da Costa, L., Schoenauer, M., and Sebag, M. (2009). Dynamic multi-armed bandits and extreme value-based rewards for AOS in evolutionary algorithms. In Proc. Intl. Conf. on Learning and Intelligent Optimization (LION). Springer.
Maturana, J., Fialho, A., Saubion, F., Schoenauer, M., and Sebag, M. (2009). Extreme compass and dynamic multi-armed bandits for adaptive operator selection. In Proc. IEEE Congress on Evolutionary Computation (CEC). IEEE.
Fialho, A., Schoenauer, M., and Sebag, M. (2009). Analysis of adaptive operator selection techniques on the royal road and long k-path problems. In Proc. Genetic and Evolutionary Computation Conference (GECCO). ACM.
Maturana, J., Fialho, A., Saubion, F., Schoenauer, M., Lardeux, F., and Sebag, M. (2010). Adaptive operator selection and management in evolutionary algorithms. In Y. Hamadi et al., editor, Autonomous Search. Springer. (to appear)
Fialho, A., Da Costa, L., Schoenauer, M., and Sebag, M. (2010). Analyzing bandit-based adaptive operator selection mechanisms. Annals of Mathematics and A.I. – Special Issue on Learning and Intelligent Optimization. Springer.
Our Publications II
Fialho, A., Schoenauer, M., and Sebag, M. (2010). Toward comparison-based adaptive operator selection. In Proc. Genetic and Evolutionary Computation Conference (GECCO). ACM.
Gong, W., Fialho, A., and Cai, Z. (2010). Adaptive strategy selection in differential evolution. In Proc. Genetic and Evolutionary Computation Conference (GECCO). ACM.
Fialho, A., Schoenauer, M., and Sebag, M. (2010). Fitness-AUC bandit adaptive strategy selection vs. the probability matching one within DE. In Black-Box Optimization Benchmarking Workshop (BBOB-GECCO). ACM.
Fialho, A., Gong, W., and Cai, Z. (2010). Probability matching-based adaptive strategy selection vs. uniform strategy selection within DE. In Black-Box Optimization Benchmarking Workshop (BBOB-GECCO). ACM.
Fialho, A. and Ros, R. (2010). Analysis of adaptive strategy selection within differential evolution on the BBOB-2010 noiseless benchmark. Research Report RR-7259, INRIA.
Fialho, A., Ros, R., Schoenauer, M., and Sebag, M. (2010). Comparison-based adaptive strategy selection in differential evolution. In Proc. Intl. Conf. on Parallel Problem Solving from Nature (PPSN). Springer.
Li, K., Fialho, A., and Kwong, S. (2011). Multi-objective differential evolution with adaptive control of parameters and operators. In Proc. Intl. Conf. on Learning and Intelligent Optimization (LION). Springer. (to appear)