AI-Augmented Algorithms: How I Learned to Stop Worrying and Love Choice


slide-1
SLIDE 1

AI-Augmented Algorithms

How I Learned to Stop Worrying and Love Choice

Lars Kotthoff, University of Wyoming
larsko@uwyo.edu

Boulder, 16 January 2019

slide-2
SLIDE 2

Outline

▷ Big Picture
▷ Motivation
▷ Choosing Algorithms
▷ Tuning Algorithms
▷ (NCAR-relevant) Applications
▷ Outlook and Resources

2

slide-3
SLIDE 3

Big Picture

▷ advance the state of the art through meta-algorithmic techniques
▷ rather than inventing new things, use existing things more intelligently – automatically
▷ invent new things through combinations of existing things

https://xkcd.com/720/

3

slide-5
SLIDE 5

Motivation – What Difference Does It Make?

4

slide-6
SLIDE 6

Prominent Application

Fréchette, Alexandre, Neil Newman, and Kevin Leyton-Brown. “Solving the Station Packing Problem.” In Association for the Advancement of Artificial Intelligence (AAAI), 2016.

5

slide-7
SLIDE 7

Performance Differences

[Scatter plot: Virtual Best SAT vs. Virtual Best CSP, both axes 0.1 to 1000, log scale]

Hurley, Barry, Lars Kotthoff, Yuri Malitsky, and Barry O’Sullivan. “Proteus: A Hierarchical Portfolio of Solvers and Transformations.” In CPAIOR, 2014.

6

slide-8
SLIDE 8

Leveraging the Differences

Xu, Lin, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. “SATzilla: Portfolio-Based Algorithm Selection for SAT.” J. Artif. Intell. Res. (JAIR) 32 (2008): 565–606.

7

slide-9
SLIDE 9

Performance Improvements

[Scatter plot: SPEAR, original default (s) vs. SPEAR, optimized for SWV (s); both axes 10⁻² to 10⁴, log scale]

Hutter, Frank, Domagoj Babic, Holger H. Hoos, and Alan J. Hu. “Boosting Verification by Automatic Tuning of Decision Procedures.” In FMCAD ’07: Proceedings of the Formal Methods in Computer Aided Design, 27–34. Washington, DC, USA: IEEE Computer Society, 2007.

8

slide-10
SLIDE 10

Common Theme

Performance models of black-box processes
▷ also called surrogate models
▷ substitute the expensive underlying process with a cheap approximate model
▷ build the approximate model using machine learning, based on results of evaluations of the underlying process (a minimal sketch below)
▷ no knowledge of the underlying process is required (but it can be helpful)
▷ may facilitate better understanding of the underlying process through interrogation of the model
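A minimal sketch of the idea, assuming a Python environment with numpy and scikit-learn; `expensive_process`, the sample points, and the random-forest choice are illustrative, not taken from the slides:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def expensive_process(x):
    # stand-in for the costly black-box process (e.g. one solver run)
    return np.sin(3 * x) + 0.1 * np.random.randn()

# evaluate the underlying process at a handful of points
X = np.random.uniform(-2, 2, size=(20, 1))
y = np.array([expensive_process(x[0]) for x in X])

# fit a cheap approximate (surrogate) model to the observed evaluations
surrogate = RandomForestRegressor(n_estimators=100).fit(X, y)

# interrogate the surrogate instead of the expensive process
X_query = np.linspace(-2, 2, 200).reshape(-1, 1)
predictions = surrogate.predict(X_query)
```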

9

slide-11
SLIDE 11

Choosing Algorithms

10

slide-12
SLIDE 12

Algorithm Selection

Given a problem, choose the best algorithm to solve it.

Rice, John R. “The Algorithm Selection Problem.” Advances in Computers 15 (1976): 65–118.
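Rice's framework can be written compactly; the notation below is one standard rendering, not copied from the slide: instances x with features f(x) in feature space F, algorithms A, and a cost measure m.

```latex
S \colon F \to A, \qquad
S \in \operatorname*{arg\,min}_{S' \colon F \to A} \;
      \mathbb{E}_{x}\!\left[\, m\bigl(S'(f(x)),\, x\bigr) \right]
```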

11

slide-13
SLIDE 13

Algorithm Selection

[Diagram: training instances go through feature extraction and, together with the measured performance of each algorithm in the portfolio (Algorithm 1, 2, 3), are used to build a performance model. For new instances (Instance 4, 5, 6, …), features are extracted and the model selects an algorithm from the portfolio, e.g. Instance 4: Algorithm 2, Instance 5: Algorithm 3, Instance 6: Algorithm 3.]
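A minimal sketch of this pipeline, assuming scikit-learn; the feature values, algorithm names, and training labels are placeholders, not data from the talk:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# offline: features of training instances and the algorithm that performed
# best on each (labels obtained by running the whole portfolio beforehand)
features = np.array([[0.2, 5.0], [0.9, 1.2], [0.4, 3.3], [0.7, 4.1]])
best_algorithm = np.array(["Algorithm 2", "Algorithm 1",
                           "Algorithm 3", "Algorithm 2"])

# performance model: instance features -> best algorithm in the portfolio
selector = RandomForestClassifier(n_estimators=100).fit(features, best_algorithm)

# online: extract features of a new instance and select an algorithm for it
new_instance = np.array([[0.5, 2.8]])
print("selected:", selector.predict(new_instance)[0])
```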

12

slide-14
SLIDE 14

Algorithm Portfolios

▷ instead of a single algorithm, use several complementary algorithms
▷ idea from Economics – minimise risk by spreading it out across several securities
▷ same for computational problems – minimise the risk of an algorithm performing poorly
▷ in practice often constructed from competition winners or other algorithms known to have good performance

Huberman, Bernardo A., Rajan M. Lukose, and Tad Hogg. “An Economics Approach to Hard Computational Problems.” Science 275, no. 5296 (1997): 51–54. doi:10.1126/science.275.5296.51.

13

slide-15
SLIDE 15

Algorithms

“algorithm” used in a very loose sense
▷ algorithms
▷ heuristics
▷ machine learning models
▷ software systems
▷ machines
▷ …

14

slide-16
SLIDE 16

Parallel Portfolios

Why not simply run all algorithms in parallel?
▷ not enough resources may be available / waste of resources
▷ algorithms may be parallelized themselves
▷ memory/cache contention

15

slide-17
SLIDE 17

Building an Algorithm Selection System

▷ requires algorithms with complementary performance
▷ most approaches rely on machine learning
▷ train with representative data, i.e. the performance of all algorithms in the portfolio on a number of instances
▷ evaluate performance on a separate set of instances (see the sketch below)
▷ potentially a large amount of prep work
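A minimal sketch of training on one set of instances and evaluating on a held-out set, with synthetic stand-in data; the features, labels, and split ratio are illustrative only:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# synthetic stand-in: 200 instances with 5 features each, labelled with the
# algorithm that performed best on each instance
features = rng.random((200, 5))
best_algorithm = np.where(features[:, 0] > 0.5, "A1", "A2")

# train on one set of instances, evaluate on a separate, held-out set
X_train, X_test, y_train, y_test = train_test_split(
    features, best_algorithm, test_size=0.3, random_state=1)
selector = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
print("held-out selection accuracy:",
      accuracy_score(y_test, selector.predict(X_test)))
```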

16

slide-18
SLIDE 18

Key Components of an Algorithm Selection System

▷ feature extraction
▷ performance model
▷ prediction-based selector/scheduler

optional:
▷ presolver
▷ secondary/hierarchical models and predictors (e.g. for feature extraction time)

17

slide-19
SLIDE 19

Types of Performance Models

Regression models: predict a performance value for each algorithm (e.g. A1: 1.2, A2: 4.5, A3: 3.9) and select the best.
Classification model: predict the best algorithm for an instance directly (A1, A2, or A3).
Pairwise classification models: one classifier per pair of algorithms (A1 vs. A2, A1 vs. A3, …) votes for the better one; the algorithm with the most votes wins (e.g. A1: 1 vote, A2: 0 votes, A3: 2 votes).
Pairwise regression models: predict performance differences for each pair (A1 − A2, A1 − A3, …) and aggregate (e.g. A1: −1.3, A2: 0.4, A3: 1.7).
In all cases the output is a selected algorithm per instance (Instance 1: Algorithm 2, Instance 2: Algorithm 1, Instance 3: Algorithm 3, …).

18
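A minimal sketch of the pairwise-classification variant, assuming scikit-learn; the synthetic features, runtimes, and algorithm names are illustrative only:

```python
import numpy as np
from collections import Counter
from itertools import combinations
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
algorithms = ["A1", "A2", "A3"]
X = rng.random((100, 4))                                  # instance features
runtimes = {a: rng.random(100) * 10 for a in algorithms}  # measured runtimes

# one classifier per pair of algorithms, predicting which of the two is faster
pairwise = {}
for a, b in combinations(algorithms, 2):
    label = np.where(runtimes[a] < runtimes[b], a, b)
    pairwise[(a, b)] = RandomForestClassifier(n_estimators=50).fit(X, label)

def select(x):
    # each pairwise model casts one vote; the algorithm with most votes wins
    votes = Counter(m.predict(x.reshape(1, -1))[0] for m in pairwise.values())
    return votes.most_common(1)[0][0]

print("selected:", select(rng.random(4)))
```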

slide-20
SLIDE 20

Tuning Algorithms

19

slide-21
SLIDE 21

Algorithm Configuration

Given a (set of) problem(s), find the best parameter configuration.
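In the same spirit as Rice's formulation for selection, this can be stated compactly; the notation is a common rendering, not taken from the slide: configuration space Θ, instance set Π, and cost metric c (e.g. runtime).

```latex
\theta^{*} \in \operatorname*{arg\,min}_{\theta \in \Theta} \;
    \mathbb{E}_{\pi \in \Pi}\!\left[\, c(\theta, \pi) \right]
```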

20

slide-22
SLIDE 22

Parameters?

▷ anything you can change that makes sense to change
▷ e.g. search heuristic, optimization level, computational resolution
▷ not the random seed, whether to enable debugging, etc.
▷ some will affect performance, others will have no effect at all
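One way such a parameter space might be written down; the parameter names, value ranges, and structure are hypothetical, not from the talk:

```python
# hypothetical configuration space for a solver, mixing categorical,
# integer, and continuous parameters (the random seed is deliberately absent)
configuration_space = {
    "search_heuristic":   ["dom/wdeg", "impact", "activity"],  # categorical
    "restart_factor":     (1.1, 2.0),                          # continuous range
    "optimization_level": [0, 1, 2, 3],                        # integer choices
}
```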

21

slide-23
SLIDE 23

Automated Algorithm Configuration

▷ no background knowledge on the parameters or the algorithm – black-box process
▷ as little manual intervention as possible
▷ failures are handled appropriately
▷ resources are not wasted
▷ can run unattended on large-scale compute infrastructure

22

slide-24
SLIDE 24

Algorithm Configuration

Frank Hutter and Marius Lindauer, “Algorithm Configuration: A Hands-on Tutorial”, AAAI 2016

23

slide-25
SLIDE 25

General Approach

▷ evaluate the algorithm as a black-box function
▷ observe the effect of parameters without knowing the inner workings, build a surrogate model based on this data
▷ decide where to evaluate next, based on the surrogate model
▷ repeat (a minimal sketch of the loop follows)
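A minimal sketch of this loop, assuming numpy and scikit-learn; the target function, the random-forest surrogate, and the greedy "evaluate where the model predicts the lowest cost" rule are illustrative simplifications (real systems use a proper acquisition function, as on the model-based search slides):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def run_algorithm(config):
    # stand-in for one evaluation of the black-box algorithm
    return (config - 0.3) ** 2 + 0.05 * np.random.randn()

configs = list(np.random.uniform(0, 1, 5))       # initial design
costs = [run_algorithm(c) for c in configs]

for _ in range(20):
    # build a surrogate model from all evaluations so far
    model = RandomForestRegressor(n_estimators=100).fit(
        np.array(configs).reshape(-1, 1), costs)
    # decide where to evaluate next: the candidate predicted to be best
    candidates = np.random.uniform(0, 1, 100).reshape(-1, 1)
    nxt = float(candidates[np.argmin(model.predict(candidates))][0])
    configs.append(nxt)
    costs.append(run_algorithm(nxt))

print("best configuration found:", configs[int(np.argmin(costs))])
```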

24

slide-26
SLIDE 26

When are we done?

▷ most approaches are incomplete, i.e. they do not exhaustively explore the parameter space
▷ cannot prove optimality, not guaranteed to find the optimal solution (in finite time)
▷ performance highly dependent on the configuration space

How do we know when to stop?

25

slide-27
SLIDE 27

Time Budget

How much time / how many function evaluations?
▷ too much: wasted resources
▷ too little: suboptimal result
▷ use statistical tests
▷ evaluate on parts of the instance set
▷ for runtime: adaptive capping (sketched below)
▷ in general: whatever resources you can reasonably invest
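A minimal sketch of adaptive capping; the simulated runtimes are a stand-in for actually running the target algorithm with a time limit:

```python
import random

def run_with_cutoff(configuration, instance, cutoff):
    # stand-in for running the target algorithm with a time limit: return the
    # runtime if it finishes within the cutoff, otherwise the cutoff itself
    true_runtime = random.uniform(0, 10) * configuration
    return min(true_runtime, cutoff)

def capped_cost(configuration, instances, incumbent_cost):
    # adaptive capping: a challenger that has already used more time than the
    # incumbent needed in total cannot win, so its evaluation stops early
    total = 0.0
    for instance in instances:
        remaining = incumbent_cost - total
        if remaining <= 0:
            return float("inf")            # challenger rejected early
        total += run_with_cutoff(configuration, instance, remaining)
    return total

instances = range(10)
incumbent = capped_cost(1.0, instances, float("inf"))   # first configuration
challenger = capped_cost(1.5, instances, incumbent)     # capped by incumbent
print(incumbent, challenger)
```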

26

slide-28
SLIDE 28

Grid and Random Search

▷ evaluate certain points in the parameter space (a random-search sketch follows the citation below)

Bergstra, James, and Yoshua Bengio. “Random Search for Hyper-Parameter Optimization.” J. Mach. Learn. Res. 13, no. 1 (February 2012): 281–305.
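A minimal random-search sketch over the hypothetical parameter space from the Parameters slide; the evaluation function is a stand-in for actually running and timing the algorithm:

```python
import random

space = {
    "search_heuristic": ["dom/wdeg", "impact", "activity"],  # categorical
    "restart_factor":   (1.1, 2.0),                          # continuous range
}

def sample(space):
    # draw one random configuration: pick from lists, sample ranges uniformly
    return {k: random.choice(v) if isinstance(v, list) else random.uniform(*v)
            for k, v in space.items()}

def evaluate(config):
    return random.random()   # stand-in for measuring the algorithm's cost

best_config, best_cost = None, float("inf")
for _ in range(100):         # fixed evaluation budget
    config = sample(space)
    cost = evaluate(config)
    if cost < best_cost:
        best_config, best_cost = config, cost
print(best_config, best_cost)
```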

27

slide-29
SLIDE 29

Model-Based Search

▷ evaluate a small number of configurations
▷ build a model of the parameter–performance surface based on the results
▷ use the model to predict where to evaluate next
▷ repeat
▷ allows targeted exploration of new configurations
▷ can take instance features into account, like algorithm selection

Hutter, Frank, Holger H. Hoos, and Kevin Leyton-Brown. “Sequential Model-Based Optimization for General Algorithm Configuration.” In LION 5, 507–23, 2011.
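A minimal sketch of one such loop with a Gaussian-process surrogate and an expected-improvement criterion, assuming numpy, scipy, and scikit-learn; the target function and the specific surrogate/acquisition choices are illustrative, though they mirror the y, yhat, and ei panels in the example plots that follow:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def target(x):
    # stand-in for the expensive parameter-performance surface
    return np.sin(3 * x) + x ** 2

X = np.random.uniform(-1, 1, 4).reshape(-1, 1)   # initial design ("init")
y = target(X).ravel()

for _ in range(10):
    gp = GaussianProcessRegressor().fit(X, y)
    candidates = np.linspace(-1, 1, 200).reshape(-1, 1)
    yhat, sd = gp.predict(candidates, return_std=True)
    # expected improvement over the best value observed so far (minimisation)
    best = y.min()
    z = (best - yhat) / np.maximum(sd, 1e-9)
    ei = (best - yhat) * norm.cdf(z) + sd * norm.pdf(z)
    x_next = candidates[np.argmax(ei)].reshape(1, -1)   # proposed point ("prop")
    X = np.vstack([X, x_next])
    y = np.append(y, target(x_next).ravel())

print("best configuration found:", X[np.argmin(y)][0])
```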

28

slide-30
SLIDE 30

Model-Based Search Example

[Plot: observed evaluations (y), surrogate prediction (yhat), and expected improvement (ei) over the parameter range, with initial and proposed points marked. Iter = 1, Gap = 1.9909e−01]

29

slide-31
SLIDE 31

Model-Based Search Example

[Plot: y, yhat, and expected improvement after this iteration. Iter = 2, Gap = 1.9909e−01]

30

slide-32
SLIDE 32

Model-Based Search Example

[Plot: y, yhat, and expected improvement after this iteration. Iter = 3, Gap = 1.9909e−01]

31

slide-33
SLIDE 33

Model-Based Search Example

[Plot: y, yhat, and expected improvement after this iteration. Iter = 4, Gap = 1.9992e−01]

32

slide-34
SLIDE 34

Model-Based Search Example

[Plot: y, yhat, and expected improvement after this iteration. Iter = 5, Gap = 1.9992e−01]

33

slide-35
SLIDE 35

Model-Based Search Example

[Plot: y, yhat, and expected improvement after this iteration. Iter = 6, Gap = 1.9996e−01]

34

slide-36
SLIDE 36

Model-Based Search Example

[Plot: y, yhat, and expected improvement after this iteration. Iter = 7, Gap = 2.0000e−01]

35

slide-37
SLIDE 37

Model-Based Search Example

[Plot: y, yhat, and expected improvement after this iteration. Iter = 8, Gap = 2.0000e−01]

36

slide-38
SLIDE 38

Model-Based Search Example

[Plot: y, yhat, and expected improvement after this iteration. Iter = 9, Gap = 2.0000e−01]

37

slide-39
SLIDE 39

Model-Based Search Example

[Plot: y, yhat, and expected improvement after this iteration. Iter = 10, Gap = 2.0000e−01]

38

slide-40
SLIDE 40

Selected Applications

39

slide-41
SLIDE 41

Compiler Parameter Tuning

▷ pre-defined optimization levels offer little flexibility
▷ improvements possible by tuning the full compiler parameter space
▷ tuned compute-intensive AI algorithms
▷ up to 40% runtime improvement over gcc -O2/-O3

Pérez Cáceres, Leslie, Federico Pagnozzi, Alberto Franzin, and Thomas Stützle. “Automatic Configuration of GCC Using Irace.” In Artificial Evolution, edited by Evelyne Lutton, Pierrick Legrand, Pierre Parrend, Nicolas Monmarché, and Marc Schoenauer, 202–16. Cham: Springer International Publishing, 2018.
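A toy illustration of searching over compiler flag settings and timing the result; the flag subset, the benchmark.c file, and the exhaustive enumeration are placeholders – the paper tunes gcc's full parameter space with irace, not this simple loop:

```python
import itertools
import subprocess
import time

# a few real gcc flags as an illustrative (tiny) search space
flag_choices = [
    ["-O2", "-O3"],
    ["-funroll-loops", ""],
    ["-ftree-vectorize", ""],
]

def benchmark(flags, source="benchmark.c"):
    # compile with the given flags and time one run of the resulting binary
    subprocess.run(["gcc", *[f for f in flags if f], source, "-o", "bench"],
                   check=True)
    start = time.perf_counter()
    subprocess.run(["./bench"], check=True)
    return time.perf_counter() - start

best = min(itertools.product(*flag_choices), key=benchmark)
print("fastest flag combination:", [f for f in best if f])
```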

40

slide-42
SLIDE 42

Compiler Parameter Tuning

▷ not only for C/C++
▷ JavaScript (JavaScriptCore, V8) optimized for standard benchmarks
▷ up to 35% runtime improvement

[Scatter plot: PAR10 (CPU s, log scale) of the default configuration vs. the tuned configuration, both axes 1.0–2.0]

Fawcett, Chris, Lars Kotthoff, and Holger H. Hoos. “Hot-Rodding the Browser Engine: Automatic Configuration of JavaScript Compilers.” CoRR abs/1707.04245 (2017). http://arxiv.org/abs/1707.04245.

41

slide-43
SLIDE 43

“Deep” Parameter Tuning

▷ automatically identify non-exposed parameters and allow them to be tuned (e.g. magic constants)
▷ tuned the dlmalloc library, specialized for e.g. awk, flex, sed
▷ runtime improvements of up to 12%, decreases in memory consumption of up to 21%

Wu, Fan, Westley Weimer, Mark Harman, Yue Jia, and Jens Krinke. “Deep Parameter Optimisation.” In Conference on Genetic and Evolutionary Computation, 1375–82. GECCO ’15. New York, NY, USA: ACM, 2015. https://doi.org/10.1145/2739480.2754648.

42

slide-44
SLIDE 44

Beyond Software

43

slide-45
SLIDE 45

Outlook

44

slide-46
SLIDE 46

Quo Vadis, Software Engineering?

Run

Hoos, Holger H. “Programming by Optimization.” Communications of the Association for Computing Machinery (CACM) 55, no. 2 (February 2012): 70–80. https://doi.org/10.1145/2076450.2076469.

45

slide-47
SLIDE 47

Quo Vadis, Software Engineering?

Run + AI

Hoos, Holger H. “Programming by Optimization.” Communications of the Association for Computing Machinery (CACM) 55, no. 2 (February 2012): 70–80. https://doi.org/10.1145/2076450.2076469.

45

slide-48
SLIDE 48

(Much) More Information

https://larskotthoff.github.io/assurvey/

Kotthoff, Lars. “Algorithm Selection for Combinatorial Search Problems: A Survey.” AI Magazine 35, no. 3 (2014): 48–60.

46

slide-49
SLIDE 49

Tools and Resources

LLAMA https://bitbucket.org/lkotthoff/llama
SATzilla http://www.cs.ubc.ca/labs/beta/Projects/SATzilla/
iRace http://iridia.ulb.ac.be/irace/
mlrMBO https://github.com/mlr-org/mlrMBO
SMAC http://www.cs.ubc.ca/labs/beta/Projects/SMAC/
Spearmint https://github.com/HIPS/Spearmint
TPE https://jaberg.github.io/hyperopt/
autofolio https://bitbucket.org/mlindauer/autofolio/
Auto-WEKA http://www.cs.ubc.ca/labs/beta/Projects/autoweka/
Auto-sklearn https://github.com/automl/auto-sklearn

47

slide-50
SLIDE 50

Summary

Algorithm Selection: choose the best algorithm for solving a problem
Algorithm Configuration: choose the best parameter configuration for solving a problem with an algorithm
▷ mature research areas
▷ configuration and selection can be combined
▷ effective tools are available
▷ COnfiguration and SElection of ALgorithms group COSEAL http://www.coseal.net

Don’t set parameters prematurely, embrace choice!

48

slide-51
SLIDE 51

I’m hiring!

Several funded graduate positions available.

49