Frank Hutter: Towards efficient automatic learning
Towards efficient automatic end-to-end learning
Frank Hutter
University of Freiburg, Germany
Based on joint work with great students and collaborators (named throughout)
[Figure: validation set accuracy over training epochs]
– Architectural choices
– Optimization algorithm, learning rates, momentum, batch normalization, batch sizes, dropout rates, weight decay, …
– Data augmentation & preprocessing
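The choices above form a mixed discrete/continuous search space. A minimal sketch of how such a space might be written down and sampled in Python (all names and value ranges here are illustrative, not from the talk):

```python
import random

# Hypothetical DNN hyperparameter search space covering the kinds of
# choices listed above: architecture, optimization, regularization.
search_space = {
    "num_conv_layers": [2, 3, 4, 5],
    "learning_rate":   (1e-5, 1e-1),
    "momentum":        (0.5, 0.99),
    "batch_size":      [32, 64, 128, 256],
    "dropout_rate":    (0.0, 0.5),
    "weight_decay":    (1e-6, 1e-2),
    "batch_norm":      [True, False],
}

def sample_configuration(space, rng=random):
    """Draw one random configuration: categorical choices from lists,
    continuous values uniformly from (low, high) tuples."""
    config = {}
    for name, domain in space.items():
        if isinstance(domain, list):
            config[name] = rng.choice(domain)
        else:
            low, high = domain
            config[name] = rng.uniform(low, high)
    return config

config = sample_configuration(search_space)
```

In practice log-uniform sampling would be preferable for scale parameters such as the learning rate and weight decay; a plain uniform draw is used here only to keep the sketch short.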
[Figure: deep network classifying images (… dog, cat); architectural hyperparameters: # convolutional layers, # fully connected layers, units per layer, kernel size]
Blackbox view: a DNN hyperparameter setting goes in, the blackbox trains the DNN and validates it, and validation performance f() comes out. The goal is to maximize f().
Methods: grid search, random search, population-based & evolutionary methods, …, Bayesian optimization
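The simplest of the blackbox methods listed, random search, fits in a few lines. The objective f() below is a cheap synthetic stand-in for "train DNN and validate it"; its quadratic form and all names are invented for illustration:

```python
import random

def f(config):
    """Stand-in for the blackbox: train a DNN with `config` and return
    validation accuracy. Here a cheap synthetic surrogate."""
    lr, dropout = config["learning_rate"], config["dropout_rate"]
    return 1.0 - (lr - 0.01) ** 2 - (dropout - 0.2) ** 2

def random_search(space, n_trials=50, seed=0):
    """Maximize f() by evaluating uniformly random configurations and
    keeping the best one seen."""
    rng = random.Random(seed)
    best_config, best_value = None, float("-inf")
    for _ in range(n_trials):
        config = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        value = f(config)
        if value > best_value:
            best_config, best_value = config, value
    return best_config, best_value

space = {"learning_rate": (1e-4, 0.1), "dropout_rate": (0.0, 0.5)}
best, value = random_search(space)
```

Every method named on the slide, from grid search to Bayesian optimization, plugs into this same loop; they differ only in how the next configuration to evaluate is chosen.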
– Efficient transfer learning method for automatic hyperparameter tuning
– Initializing Bayesian hyperparameter optimization via meta-learning
– Bayesian optimization with robust Bayesian neural networks
[Springenberg, Klein, Falkner, Hutter; NIPS 2016]
Example: SVM error surface, trained on data subsets of size s, for s = smax/128, smax/16, smax/4, smax
[Figure: error surfaces over log(C) and log(γ) for subset sizes from smin up to smax]
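The core idea in the figure, that cheap evaluations on small data subsets can weed out bad configurations before anyone pays for a full-size training run, can be illustrated with a successive-halving-style sketch. This is not the Bayesian multi-fidelity method of the cited paper, and the error model below is invented:

```python
import random

def subset_error(config, subset_size, s_max):
    """Invented stand-in for SVM validation error after training on
    `subset_size` points: the true error plus a pessimistic bias that
    shrinks as the subset approaches the full data set."""
    true_error = 0.05 + 0.02 * (config - 1.0) ** 2  # config plays log(C)
    return true_error + 0.3 * (1.0 - subset_size / s_max)

def filter_on_subsets(configs, s_max=4096):
    """Evaluate all configs on a small subset, keep the better half,
    and re-evaluate the survivors on ever larger subsets."""
    survivors = list(configs)
    for size in (s_max // 128, s_max // 16, s_max // 4, s_max):
        if len(survivors) == 1:
            break
        survivors.sort(key=lambda c: subset_error(c, size, s_max))
        survivors = survivors[: max(1, len(survivors) // 2)]
    return survivors[0]

rng = random.Random(0)
configs = [rng.uniform(-2.0, 4.0) for _ in range(16)]
best = filter_on_subsets(configs)
```

Most of the total computation is spent on the handful of survivors at size smax, which is where the large speedups over always training on the full data come from.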
[Klein, Falkner, Bartels, Hennig, Hutter, arXiv 2016]
10x-1000x speedup for SVMs, 5x-10x for DNNs
[Figure: error vs. budget of the optimizer [s]]
[Klein, Falkner, Bartels, Hennig, Hutter, under review at AISTATS 2016]
Example: DNN learning curves with different hyperparameter settings
[Figure: accuracy over time t]
Optimization of Deep Neural Networks by Extrapolation of Learning Curves
Parametric model: a convex combination of $K = 11$ parametric learning-curve models,

$$y_t \sim \mathcal{N}\!\Big(\sum_{k=1}^{K} w_k\, f_k(t \mid \theta_k),\ \sigma^2\Big),$$

fit by maximum likelihood; MCMC to quantify model uncertainty.
[Domhan, Springenberg, Hutter; AutoML 2014 & IJCAI 2015]
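For concreteness, here is a tiny maximum-likelihood fit (least squares under Gaussian noise) of a single such parametric family, pow3 with f(t) = c - a * t^(-alpha), by brute-force grid search. The actual method fits a convex combination of all K = 11 families and uses proper optimization plus MCMC rather than a grid; the data and parameter grids below are synthetic:

```python
def pow3(t, c, a, alpha):
    """One plausible learning-curve family: accuracy approaches the
    asymptote c as a power law in the epoch t."""
    return c - a * t ** (-alpha)

def fit_pow3(ts, ys):
    """Crude maximum-likelihood fit (= least squares under Gaussian
    noise) by exhaustive grid search over the three parameters."""
    best, best_sse = None, float("inf")
    for c in [0.5 + 0.01 * i for i in range(51)]:          # asymptote
        for a in [0.1 * i for i in range(1, 11)]:           # scale
            for alpha in [0.1 * i for i in range(1, 21)]:   # decay rate
                sse = sum((y - pow3(t, c, a, alpha)) ** 2
                          for t, y in zip(ts, ys))
                if sse < best_sse:
                    best, best_sse = (c, a, alpha), sse
    return best

# Noise-free synthetic curve generated from the model itself:
ts = list(range(1, 21))
ys = [pow3(t, 0.8, 0.4, 0.5) for t in ts]
c, a, alpha = fit_pow3(ts, ys)
```

With the curve parameters in hand, extrapolating to a later epoch is just evaluating pow3 at that epoch; the MCMC version yields a distribution over such extrapolations instead of a point estimate.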
If P(ym > ybest | y1:n) ≥ 5%: let the run continue.
[Figure: partial learning curve y1:n (validation set accuracy per epoch), its probabilistic extrapolation ym, and the best accuracy seen so far, ybest]
If P(ym > ybest | y1:n) < 5%: terminate the run early.
[Figure: partial learning curve y1:n (validation set accuracy per epoch) whose extrapolation ym is unlikely to exceed ybest]
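The termination rule can be sketched by Monte Carlo: given posterior samples of the curve parameters (here invented Gaussian samples standing in for MCMC output, with the pow3 family f(t) = c - a * t^(-alpha)), extrapolate each sample to the final epoch and estimate P(ym > ybest | y1:n) as the fraction of samples that beat the best accuracy seen so far:

```python
import random

def predictive_termination(posterior_samples, y_best, t_final, threshold=0.05):
    """Extrapolate each posterior sample of the curve parameters to
    t_final; estimate P(y_m > y_best | y_1:n) as the fraction of samples
    beating y_best; recommend termination if it falls below threshold."""
    def pow3(t, c, a, alpha):
        return c - a * t ** (-alpha)
    exceed = sum(1 for (c, a, alpha) in posterior_samples
                 if pow3(t_final, c, a, alpha) > y_best)
    p = exceed / len(posterior_samples)
    return p < threshold, p

# Invented posterior: curves with asymptotes around 0.75, well below the
# best accuracy of 0.85 seen so far, so the run should be terminated.
rng = random.Random(0)
samples = [(rng.gauss(0.75, 0.02), 0.4, 0.5) for _ in range(1000)]
terminate, p = predictive_termination(samples, y_best=0.85, t_final=100)
```

The 5% threshold trades off wasted training time against the risk of killing a run that would eventually have become the new best.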
[Klein, Falkner, Springenberg, Hutter; Bayesian Deep Learning Workshop 2016]
– Daniel et al., AAAI 2016: Learning step size controllers for robust neural network training
– Hansen, arXiv 2016: Using deep Q-learning to control optimization hyperparameters
– Andrychowicz et al., arXiv 2016: Learning to learn by gradient descent by gradient descent
One way to inspect the model: functional ANOVA, which explains the performance variation due to each subset of the hyperparameters.
[Figure: marginal loss as a function of hyperparameter 1, hyperparameter 2, and hyperparameter 3]
Possible future insights:
1. How stable are good hyperparameter settings across datasets?
2. Which hyperparameters need to change as the dataset grows?
3. Which factors affect empirical convergence rates of SGD?
[Hutter, Hoos, Leyton-Brown; ICML 2014]
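The fANOVA idea can be illustrated in miniature on a full grid of evaluated configurations: each hyperparameter's importance is the variance of its marginal (the loss averaged over all other hyperparameters) as a fraction of the total variance. The loss table below is invented; the real method works on a random-forest model of performance rather than an exhaustive grid:

```python
from itertools import product
from statistics import mean, pvariance

def loss(lr_idx, bs_idx):
    """Invented performance table: learning rate matters a lot,
    batch size only a little."""
    return 0.1 * (lr_idx - 2) ** 2 + 0.02 * bs_idx

grid = {"learning_rate": range(5), "batch_size": range(4)}
losses = {cfg: loss(*cfg) for cfg in product(*grid.values())}
total_var = pvariance(list(losses.values()))

def marginal_variance_fraction(dim):
    """Variance explained by hyperparameter `dim` alone: average the
    loss over all other hyperparameters, then take the variance of
    that marginal as a fraction of the total variance."""
    dim_values = list(grid.values())[dim]
    marginals = [mean(v for cfg, v in losses.items() if cfg[dim] == x)
                 for x in dim_values]
    return pvariance(marginals) / total_var

frac_lr = marginal_variance_fraction(0)
frac_bs = marginal_variance_fraction(1)
```

Because the invented loss is purely additive, the two fractions sum to one; any shortfall from one would be variance attributed to the interaction of the two hyperparameters.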
– 500x for software verification [Hutter, Babic, Hoos, Hu, FMCAD 2007]
– 50x for MIP [Hutter, Hoos, Leyton-Brown, CPAIOR 2011]
– 100x for finding better domain encodings in AI planning [Vallati, Hutter, Chrpa, McCluskey, IJCAI 2015]
– E.g., SATzilla won the SAT competitions in 2007, 2009, and 2012 (every time we entered) [Xu, Hutter, Hoos, Leyton-Brown, JAIR 2008]
– E.g., Cedalion won the IPC 2014 Planning & Learning track [Seipp, Sievers, Helmert, Hutter, AAAI 2015]