AI-Augmented Algorithms
How I Learned to Stop Worrying and Love Choice Lars Kotthoff
University of Wyoming larsko@uwyo.edu Boulder, 16 January 2019
▷ Big Picture ▷ Motivation ▷ Choosing Algorithms ▷ Tuning Algorithms ▷ (NCAR-relevant) Applications ▷ Outlook and Resources
▷ advance the state of the art through meta-algorithmic techniques ▷ rather than inventing new things, use existing things more intelligently – automatically ▷ invent new things through combinations of existing things https://xkcd.com/720/
Fréchette, Alexandre, Neil Newman, and Kevin Leyton-Brown. “Solving the Station Repacking Problem.” In Association for the Advancement of Artificial Intelligence (AAAI), 2016.
[Scatter plot: per-instance runtimes, Virtual Best SAT vs. Virtual Best CSP, 0.1–1000 s on both axes, log scale]
Hurley, Barry, Lars Kotthoff, Yuri Malitsky, and Barry O’Sullivan. “Proteus: A Hierarchical Portfolio of Solvers and Transformations.” In CPAIOR, 2014.
Xu, Lin, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. “SATzilla: Portfolio-Based Algorithm Selection for SAT.” J. Artif. Intell. Res. (JAIR) 32 (2008): 565–606.
[Scatter plot: SPEAR, original default (s) vs. SPEAR, optimized for SWV (s); both axes from 10⁻² to 10⁴, log scale]
Hutter, Frank, Domagoj Babic, Holger H. Hoos, and Alan J. Hu. “Boosting Verification by Automatic Tuning of Decision Procedures.” In FMCAD ’07: Proceedings of the Formal Methods in Computer Aided Design, 27–34. Washington, DC, USA: IEEE Computer Society, 2007.
Performance models of black-box processes ▷ also called surrogate models ▷ substitute the expensive underlying process with a cheap approximate model ▷ build the approximate model using machine learning techniques, based on the results of evaluations of the underlying process ▷ requires no knowledge of what the underlying process is (though such knowledge can be helpful) ▷ may facilitate better understanding of the underlying process through interrogation of the model
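A minimal sketch of the idea in Python. The expensive function, the sample points, and the 1-nearest-neighbour predictor are all illustrative stand-ins for the regression models (e.g. random forests) used in practice:

```python
import math

def expensive_process(x):
    # stand-in for an expensive black-box evaluation
    return math.sin(3 * x) + x * x

# evaluate the real process at a few points only
samples = [(i / 10.0, expensive_process(i / 10.0)) for i in range(-10, 11, 2)]

def surrogate(x):
    # cheap approximate model: return the observed value of the
    # nearest evaluated point (1-nearest-neighbour regression)
    return min(samples, key=lambda s: abs(s[0] - x))[1]

# the surrogate can now be queried anywhere without paying
# the cost of another run of the underlying process
approximation = surrogate(0.45)
```

Interrogating the cheap model (e.g. plotting it, or checking which inputs change its output) is what enables the "better understanding" mentioned above.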
Given a problem, choose the best algorithm to solve it.
Rice, John R. “The Algorithm Selection Problem.” Advances in Computers 15 (1976): 65–118.
[Diagram: algorithm selection workflow — training instances pass through feature extraction and are solved by every algorithm in the portfolio (Algorithms 1–3) to train a performance model; new instances (Instances 4, 5, 6, …) pass through feature extraction, and the model selects an algorithm per instance (e.g. Instance 4: Algorithm 2, Instance 5: Algorithm 3, Instance 6: Algorithm 3)]
▷ instead of a single algorithm, use several complementary algorithms ▷ idea from economics – minimise risk by spreading it across several securities ▷ same for computational problems – minimise the risk of an algorithm performing poorly ▷ in practice often constructed from competition winners or other state-of-the-art solvers
Huberman, Bernardo A., Rajan M. Lukose, and Tad Hogg. “An Economics Approach to Hard Computational Problems.” Science 275, no. 5296 (1997): 51–54. doi:10.1126/science.275.5296.51.
“algorithm” used in a very loose sense ▷ algorithms ▷ heuristics ▷ machine learning models ▷ software systems ▷ machines ▷ …
Why not simply run all algorithms in parallel? ▷ not enough resources may be available / running all of them wastes resources ▷ algorithms may be parallelized themselves ▷ memory/cache contention
▷ requires algorithms with complementary performance ▷ most approaches rely on machine learning ▷ train with representative data, i.e. performance of all algorithms in portfolio on a number of instances ▷ evaluate performance on separate set of instances ▷ potentially large amount of prep work
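A toy sketch of this setup in Python, with invented runtimes and a single instance feature; a 1-nearest-neighbour predictor stands in for the machine learning model, and the held-out instances play the role of the separate evaluation set:

```python
# hypothetical training data: one feature per instance and the
# measured runtime (in seconds) of each portfolio algorithm on it
train = [
    (0.1, {"A1": 1.0, "A2": 9.0}),
    (0.2, {"A1": 2.0, "A2": 8.0}),
    (0.8, {"A1": 9.0, "A2": 1.5}),
    (0.9, {"A1": 10.0, "A2": 1.0}),
]

def predict_runtime(algo, feature):
    # one regression model per algorithm; here the model is a
    # 1-nearest-neighbour predictor over the training instances
    nearest = min(train, key=lambda t: abs(t[0] - feature))
    return nearest[1][algo]

def select(feature):
    # pick the algorithm with the lowest predicted runtime
    return min(("A1", "A2"), key=lambda a: predict_runtime(a, feature))

# evaluate on instances held out from training
chosen = select(0.15)
```

The "large amount of prep work" is visible even here: every algorithm must be run on every training instance before the selector can be trained.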
▷ feature extraction ▷ performance model ▷ prediction-based selector/scheduler
▷ presolver ▷ secondary/hierarchical models and predictors (e.g. for feature extraction time)
[Diagram: ways to build the selector — regression models predicting each algorithm’s performance (e.g. A1: 1.2, A2: 4.5, A3: 3.9); a single classification model predicting the best algorithm directly; pairwise classification models whose predictions are aggregated by voting (e.g. A1 vs. A2, A1 vs. A3, …, yielding A1: 1 vote, A2: 0 votes, A3: 2 votes); pairwise regression models predicting performance differences (e.g. A1 − A2, A1 − A3); the resulting selector maps instances to algorithms (e.g. Instance 1: Algorithm 2, Instance 2: Algorithm 1, Instance 3: Algorithm 3)]
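One of these designs, pairwise classification with voting, can be sketched as follows; the algorithm names, scores, and the fixed decision rule are hypothetical stand-ins for classifiers trained on instance features:

```python
from itertools import combinations
from collections import Counter

algorithms = ["A1", "A2", "A3"]

def pairwise_winner(a, b, features):
    # stand-in for a trained pairwise classifier; a real system
    # would predict from the instance features -- here a fixed
    # hypothetical score decides which algorithm is better
    score = {"A1": 0.3, "A2": 0.9, "A3": 0.6}
    return a if score[a] >= score[b] else b

def select(features):
    # each pairwise model casts one vote; most votes wins
    votes = Counter()
    for a, b in combinations(algorithms, 2):
        votes[pairwise_winner(a, b, features)] += 1
    return votes.most_common(1)[0][0]
```

With n algorithms this requires n(n−1)/2 models, but each one solves an easier binary problem than predicting the best algorithm directly.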
Given a (set of) problem(s), find the best parameter configuration.
▷ anything you can change that makes sense to change ▷ e.g. search heuristic, optimization level, computational resolution ▷ not random seed, whether to enable debugging, etc. ▷ some will affect performance, others will have no effect at all
▷ no background knowledge of parameters or algorithm – treat it as a black-box process ▷ as little manual intervention as possible ▷ failures are handled appropriately ▷ resources are not wasted ▷ can run unattended on large-scale compute infrastructure
Frank Hutter and Marius Lindauer, “Algorithm Configuration: A Hands on Tutorial”, AAAI 2016
▷ evaluate the algorithm as a black-box function ▷ observe the effect of parameters without knowing the inner workings, and build a surrogate model based on this data ▷ decide where to evaluate next, based on the surrogate model ▷ repeat
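The loop above can be sketched in Python. The 1-nearest-neighbour surrogate and the random candidate generation are deliberate simplifications of what real configurators use; `evaluate` is the black-box function being tuned, here over a single numeric parameter:

```python
import random

def tune(evaluate, low, high, budget):
    # black-box tuning loop: evaluate, refit a surrogate on all
    # observations so far, pick a promising next point, repeat
    history = []  # (configuration value, observed performance)
    for _ in range(budget):
        if len(history) < 3:
            cand = random.uniform(low, high)  # initial design
        else:
            # cheap surrogate: predict the value of the nearest
            # evaluated point; real tuners use e.g. random
            # forests or Gaussian processes here
            def predicted(x):
                return min(history, key=lambda h: abs(h[0] - x))[1]
            # pick the most promising of some random candidates
            cand = min((random.uniform(low, high) for _ in range(50)),
                       key=predicted)
        history.append((cand, evaluate(cand)))
    return min(history, key=lambda h: h[1])[0]
```

Note that the loop never looks inside `evaluate` — only its inputs and outputs are used, exactly the black-box assumption stated above.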
▷ most approaches incomplete, i.e. do not exhaustively explore the parameter space ▷ cannot prove optimality, not guaranteed to find the optimal solution (in finite time) ▷ performance highly dependent on the configuration space
How much time/how many function evaluations? ▷ too much: wasted resources ▷ too little: suboptimal result ▷ use statistical tests ▷ evaluate on parts of the instance set ▷ for runtime: adaptive capping ▷ in general: whatever resources you can reasonably invest
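Adaptive capping can be illustrated with simulated runtimes: a configuration's run is aborted once it exceeds the incumbent's runtime, since it can no longer become the new best. The runtimes below are invented:

```python
def tune_with_capping(true_runtimes):
    # evaluate configurations in sequence; abort any run that
    # exceeds the best (incumbent) runtime seen so far
    best = float("inf")
    time_spent = 0.0
    for t in true_runtimes:        # hypothetical true runtimes
        observed = min(t, best)    # run is capped at the incumbent
        time_spent += observed
        if observed < best:
            best = observed        # new incumbent, tighter cap
    return best, time_spent
```

On the runtimes `[10, 3, 7, 1]`, the third run is aborted after 3 seconds instead of running for 7, so the tuner spends 17 seconds instead of 21 while still finding the same best configuration.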
▷ evaluate a pre-defined grid of points, or randomly sampled points, in the parameter space
Bergstra, James, and Yoshua Bengio. “Random Search for Hyper-Parameter Optimization.” J. Mach. Learn. Res. 13, no. 1 (February 2012): 281–305.
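A sketch of random search over a hypothetical two-parameter space. The target function is invented to have one important and one unimportant parameter — the setting in which Bergstra and Bengio show random search covers the important dimension better than an equally sized grid:

```python
import random

def random_search(evaluate, space, budget):
    # sample configurations uniformly at random from the space
    # and keep the best one found within the evaluation budget
    best_cfg, best_val = None, float("inf")
    for _ in range(budget):
        cfg = {p: random.uniform(lo, hi) for p, (lo, hi) in space.items()}
        val = evaluate(cfg)
        if val < best_val:
            best_cfg, best_val = cfg, val
    return best_cfg, best_val

# hypothetical tuning target: one parameter matters a lot,
# the other barely at all
def target(cfg):
    return (cfg["important"] - 0.7) ** 2 + 0.001 * cfg["unimportant"]
```

A 10×10 grid would test only 10 distinct values of the important parameter; 100 random samples test 100, which is why random search tends to win when few parameters matter.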
▷ evaluate a small number of configurations ▷ build a model of the parameter–performance surface based on the results ▷ use the model to predict where to evaluate next ▷ repeat ▷ allows targeted exploration of new configurations ▷ can take instance features into account, like algorithm selection
Hutter, Frank, Holger H. Hoos, and Kevin Leyton-Brown. “Sequential Model-Based Optimization for General Algorithm Configuration.” In LION 5, 507–23, 2011.
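The "predict where to evaluate next" step is commonly driven by expected improvement, which trades off predicted performance against model uncertainty. A sketch, assuming the surrogate returns a Gaussian prediction (mean and standard deviation) at each candidate point:

```python
import math

def expected_improvement(mu, sigma, best):
    # expected improvement of a candidate over the incumbent
    # `best` (minimisation), given the surrogate's Gaussian
    # prediction at that point: mean `mu`, std. dev. `sigma`
    if sigma == 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf

# a point predicted slightly worse than the incumbent but with
# high uncertainty can be more attractive than a point predicted
# marginally better with no uncertainty: exploration vs. exploitation
uncertain = expected_improvement(mu=1.2, sigma=1.0, best=1.0)
confident = expected_improvement(mu=0.99, sigma=0.0, best=1.0)
```

The next configuration to evaluate is the candidate maximising this quantity under the surrogate, which is exactly what the `ei` curves in the following figure show.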
[Figure: iterations 1–10 of sequential model-based optimization on a 1-D function; each panel plots the true function (y), the surrogate prediction (yhat), and the expected improvement (ei) that determines the next evaluation point; the reported gap moves from 1.9909e−01 (iterations 1–3) through 1.9992e−01 (4–5) and 1.9996e−01 (6) to 2.0000e−01 (7–10)]
▷ pre-defined optimization levels offer little flexibility ▷ improvements possible by tuning the full compiler parameter space ▷ tuned compute-intensive AI algorithms ▷ up to 40% runtime improvement over gcc -O2/-O3
Pérez Cáceres, Leslie, Federico Pagnozzi, Alberto Franzin, and Thomas Stützle. “Automatic Configuration of GCC Using Irace.” In Artificial Evolution, edited by Evelyne Lutton, Pierrick Legrand, Pierre Parrend, Nicolas Monmarché, and Marc Schoenauer, 202–16. Cham: Springer International Publishing, 2018.
▷ not only for C/C++ ▷ JavaScript engines (JavaScriptCore, V8) optimized for standard benchmarks ▷ up to 35% runtime improvement
[Scatter plot: default configuration vs. tuned configuration, PAR10 (CPU s) on both axes, log scale, range 1.0–2.0]
Fawcett, Chris, Lars Kotthoff, and Holger H. Hoos. “Hot-Rodding the Browser Engine: Automatic Configuration of JavaScript Compilers.” CoRR abs/1707.04245 (2017). http://arxiv.org/abs/1707.04245.
▷ automatically identify non-exposed parameters and allow them to be tuned (e.g. magic constants) ▷ tuned the dlmalloc library, specialized for e.g. awk, flex, sed ▷ runtime improvements of up to 12%, decrease in memory consumption of up to 21%
Wu, Fan, Westley Weimer, Mark Harman, Yue Jia, and Jens Krinke. “Deep Parameter Optimisation.” In Conference on Genetic and Evolutionary Computation, 1375–82. GECCO ’15. New York, NY, USA: ACM, 2015. https://doi.org/10.1145/2739480.2754648.
Run
Hoos, Holger H. “Programming by Optimization.” Communications of the Association for Computing Machinery (CACM) 55, no. 2 (February 2012): 70–80. https://doi.org/10.1145/2076450.2076469.
Run + AI
https://larskotthoff.github.io/assurvey/
Kotthoff, Lars. “Algorithm Selection for Combinatorial Search Problems: A Survey.” AI Magazine 35, no. 3 (2014): 48–60.
LLAMA https://bitbucket.org/lkotthoff/llama
SATzilla http://www.cs.ubc.ca/labs/beta/Projects/SATzilla/
iRace http://iridia.ulb.ac.be/irace/
mlrMBO https://github.com/mlr-org/mlrMBO
SMAC http://www.cs.ubc.ca/labs/beta/Projects/SMAC/
Spearmint https://github.com/HIPS/Spearmint
TPE https://jaberg.github.io/hyperopt/
autofolio https://bitbucket.org/mlindauer/autofolio/
Auto-WEKA http://www.cs.ubc.ca/labs/beta/Projects/autoweka/
Auto-sklearn https://github.com/automl/auto-sklearn
Algorithm Selection: choose the best algorithm for solving a problem
Algorithm Configuration: choose the best parameter configuration for solving a problem with an algorithm
▷ mature research areas ▷ can combine configuration and selection ▷ effective tools are available ▷ COnfiguration and SElection of ALgorithms group COSEAL http://www.coseal.net
Several funded graduate positions available.