
slide-1
SLIDE 1

Automating the Configuration of Algorithms for Solving Hard Computational Problems

Ph.D. Thesis Defence

Frank Hutter Supervisory committee:

  • Prof. Holger Hoos (supervisor)
  • Prof. Kevin Leyton-Brown (co-supervisor)
  • Prof. Kevin Murphy (co-supervisor)
  • Prof. Alan Mackworth

University Examiners:

  • Prof. Michael Friedlander (CS)
  • Prof. Lutz Lampe (ECE)

External Examiner: Prof. ?
Chair: Prof. John Nelson (Forestry)

slide-3
SLIDE 3

Parameters in Algorithms

Most algorithms have parameters

◮ Decisions that are left open during algorithm design

– numerical parameters (e.g., real-valued thresholds)
– categorical parameters (e.g., which heuristic to use)

◮ Set to maximize empirical performance

2

slide-11
SLIDE 11

Real-world example for parameterized algorithms: commercial optimization tool CPLEX

◮ State of the art for mixed integer programming (MIP)
◮ Large user base

– Over 1 300 corporations and over 1 000 universities

◮ 63 parameters that affect search trajectory

“Integer programming problems are more sensitive to specific parameter settings, so you may need to experiment with them.” [CPLEX 10.0 user manual, page 130]

◮ “Experiment with them”

– Perform manual optimization in 63-dimensional space
– Complex, unintuitive interactions between parameters
– Humans are not good at that

⇒ We developed the first automated tools for this type of problem

3

slide-13
SLIDE 13

Automated Algorithm Configuration

Automate the setting of algorithm parameters

◮ Eliminate most tedious part of algorithm design and end use
◮ Save development time
◮ Improve performance
◮ First to consider the general problem, in particular many categorical parameters

  – E.g., 50 of 63 CPLEX parameters are categorical

⇒ Algorithm configuration

4

slide-20
SLIDE 20

Main Contribution of this thesis

Comprehensive study of the algorithm configuration problem

◮ Empirical analysis of configuration scenarios
◮ Two fundamentally different solution approaches

– 1st and 2nd approach to configure algorithms with many categorical parameters

◮ Demonstrated practical relevance of algorithm configuration

– CPLEX: up to 23-fold speedup
– SAT solver: 500-fold speedup for software verification

5

slide-21
SLIDE 21

Outline

  • 1. Problem Definition & Intuition
  • 2. Model-Free Search for Algorithm Configuration
  • 3. Model-Based Search for Algorithm Configuration
  • 4. Conclusions

6

slide-25
SLIDE 25

Algorithm Configuration as Function Optimization

Deterministic algorithm with continuous parameters

  – “Blackbox function” f : R^n → R
  – Can query function at arbitrary points θ ∈ R^n

  Find min_{θ ∈ R^n} f(θ)

Randomized algorithm with continuous parameters

  – For each θ: distribution D_θ
  – Optimize statistical parameter τ (e.g., expected value)
  – Can sample from distribution D_θ at arbitrary points θ ∈ Θ

  Find min_{θ ∈ R^n} τ(D_θ)

8

slide-28
SLIDE 28

Algorithm Configuration: General Case

Difference to “standard” blackbox optimization

◮ Categorical parameters
◮ Distribution of costs

– across multiple repeated runs for randomized algorithms
– across problem instances

◮ Can terminate unsuccessful runs early

9

slide-29
SLIDE 29

Outline

  • 1. Problem Definition & Intuition
  • 2. Model-Free Search for Algorithm Configuration

  – ParamILS: Iterated Local Search in Configuration Space
  – “Real-World” Applications of ParamILS

  • 3. Model-Based Search for Algorithm Configuration
  • 4. Conclusions

10

slide-35
SLIDE 35

Simple manual approach for configuration

Start with some parameter configuration
repeat
    Modify a single parameter
    if results on benchmark set improve then keep new configuration
until no more improvement possible (or “good enough”)

⇒ Manually-executed local search

12

slide-39
SLIDE 39

The ParamILS Framework

Iterated Local Search in parameter configuration space:

Choose initial parameter configuration θ
Perform subsidiary local search on θ
While tuning time left:
    θ′ := θ
    Perform perturbation on θ
    Perform subsidiary local search on θ
    Based on acceptance criterion, keep θ or revert to θ := θ′
    With probability p_restart randomly pick new θ

⇒ Performs biased random walk over local optima

13
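The iterated local search loop above can be sketched as follows; the toy two-parameter space and cost function are illustrative stand-ins for real parameter spaces and benchmark runs, and the perturbation and restart details are simplified:

```python
import random

VALUES = list(range(-5, 6))              # toy domain for each parameter

def cost(theta):                         # toy surrogate for benchmark cost
    return (theta[0] - 3) ** 2 + (theta[1] + 1) ** 2

def neighbors(theta, rng):
    idx = list(range(len(theta)))
    rng.shuffle(idx)                     # one-exchange neighbourhood
    for i in idx:
        for v in VALUES:
            if v != theta[i]:
                yield theta[:i] + (v,) + theta[i + 1:]

def local_search(theta, rng):
    improved = True
    while improved:                      # first-improvement descent
        improved = False
        for cand in neighbors(theta, rng):
            if cost(cand) < cost(theta):
                theta, improved = cand, True
                break
    return theta

def paramils(rng, iterations=30, p_restart=0.05):
    theta = local_search((0, 0), rng)
    for _ in range(iterations):
        prev = theta
        pert = list(theta)               # perturbation: re-randomize one parameter
        i = rng.randrange(len(pert))
        pert[i] = rng.choice(VALUES)
        theta = local_search(tuple(pert), rng)
        if cost(theta) > cost(prev):     # acceptance: keep the better of the two
            theta = prev
        if rng.random() < p_restart:     # occasional restart from a random config
            theta = local_search(tuple(rng.choice(VALUES) for _ in range(2)), rng)
    return theta

best = paramils(random.Random(0))
print(best, cost(best))
```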

slide-42
SLIDE 42

Instantiations of ParamILS Framework

How to evaluate each configuration?

◮ BasicILS(N): perform a fixed number N of runs to evaluate a configuration θ

  – Blocking: use same N (instance, seed) pairs for each θ

◮ FocusedILS: adaptive choice of N(θ)

– small N(θ) for poor configurations θ
– large N(θ) only for good θ
– typically outperforms BasicILS

14
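The blocking idea can be illustrated as below; `run` is a hypothetical stand-in for a single run of a randomized algorithm, and the fixed list of (instance, seed) pairs ensures every configuration is compared under identical conditions:

```python
import random

def blocked_cost(run, theta, pairs):
    """BasicILS-style evaluation: mean cost of theta over a FIXED list of
    (instance, seed) pairs, so all configurations see identical runs."""
    return sum(run(theta, inst, seed) for inst, seed in pairs) / len(pairs)

# Hypothetical runner: "runtime" depends on the configuration, the
# instance, and seed-driven noise (one run of a randomized algorithm).
def run(theta, instance, seed):
    noise = random.Random(instance * 1000 + seed * 10 + theta).uniform(0, 0.5)
    return abs(theta - instance) + noise

pairs = [(inst, seed) for inst in (1, 2, 3) for seed in (0, 1)]  # N = 6 runs
costs = {theta: blocked_cost(run, theta, pairs) for theta in (0, 2, 5)}
best = min(costs, key=costs.get)
print(best, costs[best])
```

FocusedILS keeps this blocking but grows N(θ) adaptively, so poor configurations are discarded after only a few of these runs.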

slide-44
SLIDE 44

Empirical Comparison to Previous Configuration Procedure

CALIBRA system [Adenso-Diaz & Laguna, ’06]

◮ Based on fractional factorial designs
◮ Limited to continuous parameters
◮ Limited to 5 parameters

Empirical comparison

◮ FocusedILS typically did better, never worse
◮ More importantly, much more general

15

slide-47
SLIDE 47

Adaptive Choice of Cutoff Time

◮ Evaluation of poor configurations takes especially long
◮ Can terminate evaluations early

◮ Incumbent solution provides bound
◮ Can stop evaluation once bound is reached

◮ Results

– Provably never hurts
– Sometimes substantial speedups (factor 10)

16
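A sketch of the early-termination idea, assuming runs can be capped at a given runtime (`run_until` is a hypothetical capped runner): once a challenger's accumulated runtime reaches the incumbent's total, it provably cannot become the new best, so the remaining runs are skipped:

```python
def evaluate_with_cap(run_until, theta, instances, incumbent_total):
    """Evaluate theta, but stop as soon as its accumulated runtime
    reaches the incumbent's total: the rest cannot change the outcome."""
    total = 0.0
    for inst in instances:
        remaining = incumbent_total - total   # runtime budget left for theta
        if remaining <= 0:
            return None                       # provably no better: reject early
        total += run_until(theta, inst, cap=remaining)
    return total

# Hypothetical runner: true runtime, truncated at the cutoff.
def run_until(theta, inst, cap):
    true_time = abs(theta - inst) + 0.1
    return min(true_time, cap)

instances = [1, 2, 3]
incumbent = evaluate_with_cap(run_until, 2, instances, float("inf"))
challenger = evaluate_with_cap(run_until, 10, instances, incumbent)
print(incumbent, challenger)   # challenger rejected after one capped run
```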

slide-48
SLIDE 48

Outline

  • 1. Problem Definition & Intuition
  • 2. Model-Free Search for Algorithm Configuration

  – ParamILS: Iterated Local Search in Configuration Space
  – “Real-World” Applications of ParamILS

  • 3. Model-Based Search for Algorithm Configuration
  • 4. Conclusions

17

slide-52
SLIDE 52

Configuration of ILOG CPLEX

◮ Recall: 63 parameters, 1.78 × 10^38 possible configurations
◮ Ran FocusedILS for 2 days on 10 machines
◮ Compared against default

“A great deal of algorithmic development effort has been devoted to establishing default ILOG CPLEX parameter settings that achieve good performance on a wide variety of MIP models.” [CPLEX 10.0 user manual, page 247]

[Scatter plot: per-instance runtimes (s), default vs. auto-tuned, log-log axes from 10^-2 to 10^4]
Combinatorial auctions: 7-fold speedup

[Scatter plot: per-instance runtimes (s), default vs. auto-tuned, log-log axes from 10^-2 to 10^4]
Mixed integer knapsack: 23-fold speedup

18

slide-55
SLIDE 55

Configuration of SAT Solver for Verification

SAT (propositional satisfiability problem)

– Prototypical NP-hard problem – Interesting theoretically and in practical applications

Formal verification

– Bounded model checking
– Software verification
– Recent progress based on SAT solvers

Spear, tree search solver for industrial SAT instances

– 26 parameters, 8.34 × 10^17 configurations

19

slide-59
SLIDE 59

Configuration of SAT Solver for Verification

◮ Ran FocusedILS for 2 days on 10 machines ◮ Compared to manually-engineered default

– 1 week of performance tuning
– Competitive with the state of the art

[Scatter plot: SPEAR original default (s) vs. SPEAR optimized for IBM-BMC (s), log-log axes from 10^-2 to 10^4]
IBM Bounded Model Checking: 4.5-fold speedup

[Scatter plot: SPEAR original default (s) vs. SPEAR optimized for SWV (s), log-log axes from 10^-2 to 10^4]
Software verification: 500-fold speedup; won 2007 SMT competition

20

slide-64
SLIDE 64

Other Fielded Applications of ParamILS

◮ SAPS, local search for SAT

8-fold and 130-fold speedup

◮ SAT4J, tree search for SAT

11-fold speedup

◮ GLS+ for Most Probable Explanation (MPE) problem

> 360-fold speedup

◮ Applications by others

– Protein folding [Thachuk, Shmygelska & Hoos ’07]
– Time-tabling [Fawcett, Hoos & Chiarandini ’09]
– Local search for SAT [KhudaBukhsh, Xu, Hoos & Leyton-Brown ’09]

⇒ demonstrates versatility & maturity

21

slide-65
SLIDE 65

Outline

  • 1. Problem Definition & Intuition
  • 2. Model-Free Search for Algorithm Configuration
  • 3. Model-Based Search for Algorithm Configuration

  – State of the Art
  – Improvements for Stochastic Blackbox Optimization
  – Beyond Stochastic Blackbox Optimization

  • 4. Conclusions

22

slide-68
SLIDE 68

Model-Based Optimization: Motivation

Fundamentally different approach for algorithm configuration

◮ So far: discussed local search approach
◮ Now: alternative choice, based on predictive models

  – Model-based optimization was less well developed
    ⇒ emphasis on methodological improvements

◮ In the end: state-of-the-art configuration tool

23

slide-69
SLIDE 69

Outline

  • 1. Problem Definition & Intuition
  • 2. Model-Free Search for Algorithm Configuration
  • 3. Model-Based Search for Algorithm Configuration

  – State of the Art
  – Improvements for Stochastic Blackbox Optimization
  – Beyond Stochastic Blackbox Optimization

  • 4. Conclusions

24

slide-76
SLIDE 76

Model-Based Deterministic Blackbox Optimization (BBO)

EGO algorithm [Jones, Schonlau & Welch ’98]

  • 1. Get response values at initial design points
  • 2. Fit a model to the data
  • 3. Use model to pick most promising next design point
  • 4. Repeat 2. and 3. until time is up

[Plots after first and second steps: DACE mean prediction ± 2·stddev, true function, function evaluations, and EI (scaled), over parameter x vs. response y]

25
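The EGO loop (steps 1-4 above) can be sketched as follows. The surrogate here is a deliberately crude stand-in for the DACE/Gaussian-process model (inverse-distance-weighted mean, uncertainty growing with distance to the nearest sample); only the expected-improvement criterion is the standard closed-form formula:

```python
import math

def surrogate(X, y, x):
    """Crude stand-in for the DACE/GP model: IDW mean, distance-based std."""
    w = [1.0 / ((x - xi) ** 2 + 1e-9) for xi in X]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    sigma = min(abs(x - xi) for xi in X)      # zero at observed points
    return mu, sigma

def expected_improvement(mu, sigma, best):
    if sigma == 0:
        return 0.0
    z = (best - mu) / sigma
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    pdf = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    return (best - mu) * cdf + sigma * pdf

def ego(f, candidates, init, steps):
    X = list(init)                            # 1. initial design points
    y = [f(x) for x in X]
    for _ in range(steps):
        best = min(y)
        # 2.-3. fit model, pick candidate with maximal expected improvement
        x_next = max((c for c in candidates if c not in X),
                     key=lambda c: expected_improvement(*surrogate(X, y, c), best))
        X.append(x_next)                      # 4. evaluate and repeat
        y.append(f(x_next))
    return X[y.index(min(y))], min(y)

f = lambda x: (x - 0.62) ** 2                 # toy "response" function
candidates = [i / 100 for i in range(101)]
x_best, y_best = ego(f, candidates, init=[0.0, 0.5, 1.0], steps=10)
print(x_best, y_best)
```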

slide-79
SLIDE 79

Stochastic Blackbox Optimization (BBO): State of the Art

Extensions of EGO algorithm for stochastic case

– Sequential Parameter Optimization (SPO) [Bartz-Beielstein, Preuss & Lasarczyk, ’05-’09]
– Sequential Kriging Optimization (SKO) [Huang, Allen, Notz & Zeng, ’06]

Application domain for stochastic BBO

◮ Randomized algorithms with continuous parameters
◮ Optimization for single instances

Empirical Evaluation

◮ SPO more robust

26

slide-80
SLIDE 80

Outline

  • 1. Problem Definition & Intuition
  • 2. Model-Free Search for Algorithm Configuration
  • 3. Model-Based Search for Algorithm Configuration

  – State of the Art
  – Improvements for Stochastic Blackbox Optimization
  – Beyond Stochastic Blackbox Optimization

  • 4. Conclusions

27

slide-84
SLIDE 84

Improvements for stochastic BBO

I: Studied SPO components

◮ Improved component: “intensification mechanism”

– Increase N(θ) similarly as in FocusedILS
– Improved robustness

II: Better Models

◮ Compared various probabilistic models

– Model SPO uses
– Approximate Gaussian process (GP)
– Random forest (RF)

◮ New models much better

– Resulting configuration procedure: ActiveConfigurator
– Improved state of the art for model-based stochastic BBO
  (randomized algorithms with continuous parameters; single instances)

28

slide-85
SLIDE 85

Outline

  • 1. Problem Definition & Intuition
  • 2. Model-Free Search for Algorithm Configuration
  • 3. Model-Based Search for Algorithm Configuration

  – State of the Art
  – Improvements for Stochastic Blackbox Optimization
  – Beyond Stochastic Blackbox Optimization

  • 4. Conclusions

29

slide-88
SLIDE 88

Extension I: Categorical Parameters

Models that can handle categorical inputs

◮ Random forests: out of the box
◮ Extended (approximate) Gaussian processes

– new kernel based on weighted Hamming distance

Application domain

◮ Algorithms with categorical parameters
◮ Single instances

Empirical evaluation

◮ ActiveConfigurator outperformed FocusedILS

30
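A sketch of such a kernel: similarity decays exponentially with the weighted Hamming distance between two categorical configurations. The weights play the role of learned length-scales, and the parameter values below are hypothetical:

```python
import math

def hamming_kernel(theta1, theta2, weights):
    """Kernel for categorical configurations: exp(-weighted Hamming distance).
    Only positions where the two configurations DIFFER contribute."""
    d = sum(w for w, a, b in zip(weights, theta1, theta2) if a != b)
    return math.exp(-d)

# Hypothetical 3-parameter configurations (e.g., heuristic choices).
w = [2.0, 0.5, 1.0]
k_same = hamming_kernel(("h1", "on", "dfs"), ("h1", "on", "dfs"), w)
k_near = hamming_kernel(("h1", "on", "dfs"), ("h1", "off", "dfs"), w)
k_far  = hamming_kernel(("h1", "on", "dfs"), ("h2", "off", "bfs"), w)
print(k_same, k_near, k_far)   # similarity drops as more parameters differ
```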

slide-91
SLIDE 91

Extension II: Multiple Instances

Models incorporating multiple instances

◮ Can still learn probabilistic models of algorithm performance
◮ Model inputs:

  – algorithm parameters
  – instance features

General algorithm configuration

◮ Algorithms with categorical parameters
◮ Multiple instances

Empirical evaluation

◮ ActiveConfigurator never worse than FocusedILS
◮ Overall: model-based approaches very promising

31
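The model-input idea can be sketched as below: each training point concatenates a parameter configuration with the instance's feature vector, and the model predicts performance for unseen (configuration, instance) combinations. A 1-nearest-neighbour regressor stands in for the random forest to keep the sketch dependency-free; all data values are hypothetical:

```python
def joint_features(theta, instance_feats):
    """Model input for general algorithm configuration: concatenate the
    parameter configuration with the instance's feature vector."""
    return tuple(theta) + tuple(instance_feats)

def predict(train, x):
    """Stand-in predictor: value of the nearest training point
    (the thesis uses a random forest here)."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(train, key=lambda xy: dist(xy[0], x))[1]

train = [
    (joint_features((0, 1), (10.0,)), 3.2),   # (theta, features) -> runtime
    (joint_features((1, 0), (10.0,)), 1.1),
    (joint_features((1, 0), (50.0,)), 9.7),
]
x = joint_features((1, 0), (12.0,))           # unseen instance, known theta
print(predict(train, x))
```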

slide-92
SLIDE 92

Outline

  • 1. Problem Definition & Intuition
  • 2. Model-Free Search for Algorithm Configuration
  • 3. Model-Based Search for Algorithm Configuration
  • 4. Conclusions

32

slide-96
SLIDE 96

Conclusions

Algorithm configuration

◮ Is a high-dimensional optimization problem

– Can be solved by automated approaches
– Sometimes much better than by human experts

◮ Can cut development time & improve results

Scaling to very complex problems allows us to

◮ Build very flexible algorithm frameworks
◮ Apply automated tool to instantiate framework

Generate custom algorithms for different problem types

Blackbox approaches

◮ Very general
◮ Can be used to optimize your parameters

33

slide-104
SLIDE 104

Main Contribution of this thesis

Comprehensive study of the algorithm configuration problem

◮ Empirical analysis of configuration scenarios [Ready for submission]
◮ Two fundamentally different solution approaches

  – Model-free Iterated Local Search approach [AAAI’07]
  – Improved & Extended Sequential Model-Based Optimization [GECCO’09; EMAA’09]

◮ Demonstrated practical relevance of algorithm configuration

  – CPLEX: up to 23-fold speedup [JAIR’09]
  – SPEAR: 500-fold speedup for software verification [FMCAD’07]

34

slide-107
SLIDE 107

Important Directions for the Next Few Years

◮ Improve configuration procedures from practical point of view

– Mixed categorical/numerical optimization
– Make easier to use off the shelf

◮ More sophisticated model-based methods

– Use model to select most informative instance
– Use model to select best cutoff time
– Per-instance setting of parameters

◮ Explore other fields of applications

35

slide-108
SLIDE 108

Thanks to

◮ Supervisory committee

– Holger Hoos (supervisor)
– Kevin Leyton-Brown (co-supervisor)
– Kevin Murphy (co-supervisor)
– Alan Mackworth

◮ Further collaborators

– Domagoj Babić
– Thomas Bartz-Beielstein
– Youssef Hamadi
– Alan Hu
– Thomas Stützle
– Dave Tompkins
– Lin Xu

◮ LCI and BETA lab faculty and students

36