Towards efficient automatic end-to-end learning - Frank Hutter - PowerPoint PPT Presentation


SLIDE 1

Towards efficient automatic end-to-end learning

Frank Hutter

University of Freiburg, Germany

Based on joint work with great students and collaborators (named throughout)

SLIDE 2

What will this partial learning curve converge to?

[Figure: a partial learning curve whose continuation is unknown ("?"); axes: epoch vs. validation set accuracy]

SLIDE 3

One Problem of Deep Learning

Performance is very sensitive to many hyperparameters:

• Architectural choices (# convolutional layers, # fully connected layers, units per layer, kernel size)
• Optimization: algorithm, learning rates, momentum, batch normalization, batch sizes, dropout rates, weight decay, ...
• Data augmentation & preprocessing

[Figure: a convolutional network classifying images, e.g. "dog" vs. "cat"]

SLIDE 4

Towards True End-to-end Learning

• Current deep learning practice: an expert chooses the architecture & hyperparameters
• Deep learning: "end-to-end" learning of the task itself
• AutoML: true end-to-end learning, where meta-level learning & optimization configure the learning box automatically

SLIDE 5

The standard approach: blackbox optimization

DNN hyperparameter setting λ → train the DNN and validate it → validation performance f(λ). The blackbox optimizer sees only these (λ, f(λ)) pairs and solves max_λ f(λ).

Blackbox optimizers: grid search, random search, population-based & evolutionary methods, ..., Bayesian optimization. (A minimal version of this loop is sketched below.)

SLIDE 6

The standard approach: blackbox optimization

DNN hyperparameter setting λ → train the DNN and validate it → validation performance f(λ); the blackbox optimizer solves max_λ f(λ).

Two limitations of this blackbox view:

• Too slow for tuning DNNs: every query of f(λ) is a full training run
• Need more fine-grained actions than just evaluating f(λ)

SLIDE 7

Ways in which we can go beyond blackbox optimization

SLIDE 8

1. We can use transfer learning

Transfer learning from other datasets D: model f(λ, D); this needs a scalable model. (A warm-starting sketch follows after the references below.)

Using Gaussian process models:

• Bardenet et al., ICML 2013: Collaborative hyperparameter tuning
• Swersky et al., NIPS 2013: Multi-task Bayesian optimization
• Yogatama & Mann, AISTATS 2014: Efficient transfer learning method for automatic hyperparameter tuning

Using other models:

• Feurer et al., AAAI 2015: Initializing Bayesian hyperparameter optimization via meta-learning
• Springenberg et al., NIPS 2016: Bayesian optimization with robust Bayesian neural networks
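A minimal warm-starting sketch in the spirit of Feurer et al. (AAAI 2015): seed the optimizer with configurations that worked best on the most similar previous datasets. The `past_runs` structure, the meta-features, and the Euclidean distance are illustrative assumptions.

```python
import numpy as np

def warmstart_configs(new_meta_features, past_runs, k=3):
    """Pick the best-known configurations from the k most similar
    previous datasets (similarity = Euclidean distance between
    dataset meta-features) and use them as the optimizer's initial
    design. `past_runs` is a list of (meta_features, best_config)."""
    d = [np.linalg.norm(np.asarray(mf) - np.asarray(new_meta_features))
         for mf, _ in past_runs]
    return [past_runs[i][1] for i in np.argsort(d)[:k]]

# Toy usage: meta-features = (log #samples, log #features, class entropy).
past = [((9.2, 4.1, 0.9), {"lr": 1e-3}),
        ((5.0, 2.0, 0.3), {"lr": 1e-1}),
        ((9.0, 4.0, 1.0), {"lr": 3e-4})]
print(warmstart_configs((9.1, 4.2, 0.95), past, k=2))
```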

SLIDE 9

Example: BO with robust Bayesian neural nets

• Well-calibrated uncertainty estimates
• Scalable for multitask problems

[Springenberg, Klein, Falkner, Hutter; NIPS 2016]

https://github.com/automl/RoBO

SLIDE 10

2. We can reason over cheaper subsets of data

Large datasets: start from small subsets of size s, i.e. model f(λ, s); this needs a model that extrapolates well from small s to smax.

[Figure: SVM error surfaces over log(C) and log(γ), trained on data subsets of size smax/128, smax/16, smax/4, and smax]

SLIDE 11

Example: Fast Bayesian optimization on large datasets

• Automatically choose the dataset size for each evaluation
  – Trading off information gain about the global optimum vs. cost
• Entropy Search: based on a probability distribution p_min over where the minimum lies
• Strategy: pick the configuration & data size pair (x, s) that maximally decreases the entropy of p_min per time spent (a cost-aware selection sketch follows below)

[Klein, Falkner, Bartels, Hennig, Hutter; arXiv 2016]
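A minimal sketch of the cost-aware selection step. The actual method scores candidates by the expected reduction in the entropy of p_min; here `information_gain` and `predicted_cost` are hypothetical model callbacks (e.g. backed by a GP over f(λ, s) and a regression model of training time), so only the "gain per second" ratio from the slide is shown.

```python
import numpy as np

def select_next(candidates, information_gain, predicted_cost):
    """Among (config, subset_size) pairs, pick the one with the best
    information gain about the optimum *per second of training*,
    the criterion behind the entropy-search strategy above."""
    scores = [information_gain(x, s) / predicted_cost(x, s)
              for (x, s) in candidates]
    return candidates[int(np.argmax(scores))]

# Toy usage with made-up callbacks: small subsets are cheap but less
# informative; the ratio decides which trade-off wins.
cands = [((0.1,), 1000), ((0.1,), 8000), ((0.9,), 1000)]
gain = lambda x, s: np.log(s) * (1.0 - x[0])
cost = lambda x, s: s / 100.0
print(select_next(cands, gain, cost))
```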

SLIDE 12

Example: Fast Bayesian optimization on large datasets

Result: 10x-1000x speedups for SVMs, 5x-10x for DNNs

[Figure: error vs. budget of the optimizer [s]]

[Klein, Falkner, Bartels, Hennig, Hutter; under review at AISTATS 2016]

https://github.com/automl/RoBO

SLIDE 13

3. We can model the online performance of DNNs

Graybox optimization: model f(λ, t) as a function of training time t.

[Figure: DNN learning curves over time t for different hyperparameter settings]

• Swersky et al., arXiv 2014: Freeze-thaw Bayesian optimization
• Domhan et al., AutoML 2014 & IJCAI 2015: Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves

SLIDE 14

Example: extrapolating learning curves

Parametric model: a convex combination of K = 11 parametric models,

$$y_t = \sum_{k=1}^{K} w_k \, f_k(t \mid \theta_k) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2).$$

Maximum-likelihood fit of the parameters; MCMC to quantify model uncertainty. (A fitting sketch follows below.)

[Domhan, Springenberg, Hutter; AutoML 2014 & IJCAI 2015]
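A least-squares sketch of fitting such a convex combination to a partial learning curve, using just two illustrative parametric families (the paper combines K = 11 and uses MCMC over all parameters to quantify uncertainty); the specific basis functions and starting point are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def pow_curve(t, a, alpha):         # power-law family: rises towards a
    return a * (1.0 - t ** (-alpha))

def exp_curve(t, a, b):             # saturating-exponential family
    return a * (1.0 - np.exp(-b * t))

def fit_combination(t, y):
    """Least-squares fit of y_t ~ w*f1(t) + (1-w)*f2(t). The full
    method is a maximum-likelihood fit including sigma^2, with MCMC
    sampling of the posterior over (w, theta, sigma^2)."""
    def loss(p):
        w, a1, alpha, a2, b = p
        w = np.clip(w, 0.0, 1.0)     # keep the combination convex
        pred = w * pow_curve(t, a1, alpha) + (1 - w) * exp_curve(t, a2, b)
        return np.sum((y - pred) ** 2)
    p0 = np.array([0.5, y[-1], 1.0, y[-1], 0.1])
    return minimize(loss, p0, method="Nelder-Mead").x

# Toy usage: fit the first 20 epochs of a noisy synthetic curve.
t = np.arange(1, 21, dtype=float)
y = 0.8 * (1 - np.exp(-0.2 * t)) + 0.01 * np.random.randn(20)
print(fit_combination(t, y))
```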

SLIDE 15

Example: extrapolating learning curves

If $P(y_m > y_{\text{best}} \mid y_{1:n}) \geq 5\%$: continue training...

[Figure: observed partial curve y_{1:n}, predicted accuracy y_m at the final epoch m, and best accuracy so far y_best; axes: epoch vs. validation set accuracy]

SLIDE 16

Example: extrapolating learning curves

If $P(y_m > y_{\text{best}} \mid y_{1:n}) < 5\%$: terminate! (This rule is sketched below.)

[Figure: the run is stopped once the extrapolated curve is unlikely to beat y_best; axes: epoch vs. validation set accuracy]
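A sketch of the resulting termination rule, assuming we already have MCMC samples of extrapolated curves (one row per posterior sample) from a fit like the one above; the array layout and names are illustrative.

```python
import numpy as np

def should_terminate(curve_samples, m, y_best, threshold=0.05):
    """Estimate P(y_m > y_best | y_1:n) as the fraction of sampled
    extrapolations that beat the best accuracy seen so far at the
    final epoch m; stop this training run if that probability falls
    below the 5% threshold, otherwise keep training."""
    p_improve = float(np.mean(curve_samples[:, m] > y_best))
    return p_improve < threshold

# Toy usage: 1000 sampled curves over 100 epochs, none very promising.
samples = 0.7 + 0.02 * np.random.randn(1000, 101)
print(should_terminate(samples, m=100, y_best=0.78))   # True: terminate
```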

SLIDE 17

Example: extrapolating learning curves

SLIDE 18

Easy to include in a Bayesian neural network

[Klein, Falkner, Springenberg, Hutter; Bayesian Deep Learning Workshop 2016]

SLIDE 19

4. We can change hyperparameters on the fly

• Common practice: change SGD learning rates over time
• Can automate & improve with reinforcement learning (a hand-coded stand-in is sketched below):
  – Daniel et al., AAAI 2016: Learning step size controllers for robust neural network training
  – Hansen, arXiv 2016: Using deep Q-learning to control optimization hyperparameters
  – Andrychowicz et al., arXiv 2016: Learning to learn by gradient descent by gradient descent
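To make "hyperparameter control" concrete, here is a hand-coded stand-in for a step-size controller: halve the learning rate whenever validation loss plateaus. This is my illustrative rule, not the method of the cited works, which replace such fixed rules with policies learned by reinforcement learning from features of the training state.

```python
class LearningRateController:
    """Fixed-rule controller: halve the LR after `patience` epochs
    without improvement in validation loss. An RL-based controller
    would learn when and by how much to change the LR instead."""
    def __init__(self, lr=0.1, patience=3, factor=0.5):
        self.lr, self.patience, self.factor = lr, patience, factor
        self.best, self.bad_epochs = float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor       # act on the fly
                self.bad_epochs = 0
        return self.lr

# Toy usage: the loss stalls after epoch 3, so the LR drops at epoch 6.
ctrl = LearningRateController()
for loss in [1.0, 0.8, 0.7, 0.7, 0.71, 0.7, 0.7]:
    print(ctrl.step(loss))
```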

SLIDE 20

5. We can automatically gain scientific insights

One way to inspect the model: functional ANOVA, which explains the performance variation due to each subset of hyperparameters [Hutter, Hoos, Leyton-Brown; ICML 2014]. (A Monte-Carlo sketch follows below.)

[Figure: marginal loss as a function of hyperparameter 1, 2, and 3]

Possible future insights:

1. How stable are good hyperparameter settings across datasets?
2. Which hyperparameters need to change as the dataset grows?
3. Which factors affect empirical convergence rates of SGD?
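A Monte-Carlo sketch of the quantity functional ANOVA computes for a single hyperparameter: the variance of its marginal performance (all other hyperparameters averaged out) as a share of the total variance. In practice `f` would be a random-forest surrogate fitted to the optimizer's evaluation history; all names and the binned estimator here are illustrative assumptions.

```python
import numpy as np

def importance(f, dim, low, high, n=20_000, n_bins=10, seed=0):
    """Share of total performance variance explained by hyperparameter
    `dim` alone: bin samples along that dimension, average the other
    dimensions out within each bin (the marginal), and compare the
    marginal's variance to the total variance."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(low, high, size=(n, len(low)))
    y = np.apply_along_axis(f, 1, X)
    edges = np.linspace(low[dim], high[dim], n_bins + 1)
    which = np.digitize(X[:, dim], edges[1:-1])
    marginal = np.array([y[which == b].mean() for b in range(n_bins)])
    return marginal.var() / y.var()

# Toy usage: f depends strongly on dim 0, weakly on dim 1.
f = lambda x: (x[0] - 0.3) ** 2 + 0.01 * x[1]
print(importance(f, 0, low=[0, 0], high=[1, 1]))  # close to 1
print(importance(f, 1, low=[0, 0], high=[1, 1]))  # close to 0
```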

SLIDE 21

Learning to optimize, to plan, etc.

• Algorithm configuration often speeds up solvers a lot
  – 500x for software verification [Hutter, Babic, Hoos, Hu; FMCAD 2007]
  – 50x for MIP [Hutter, Hoos, Leyton-Brown; CPAIOR 2011]
  – 100x for finding better domain encodings in AI planning [Vallati, Hutter, Chrpa, McCluskey; IJCAI 2015]
• Algorithm portfolios won many competitions
  – E.g., SATzilla won the SAT competitions 2007, 2009, and 2012 (every time we entered) [Xu, Hutter, Hoos, Leyton-Brown; JAIR 2008]
  – E.g., Cedalion won the IPC 2014 Planning & Learning Track [Seipp, Sievers, Helmert, Hutter; AAAI 2015]

SLIDE 22

Conclusion: moving beyond hand-designed learners

Some ways of making this efficient:

• Transfer learning: exploit strong priors
• Exploit cheaper, approximate blackboxes
• Graybox: partial feedback during evaluation
• Whitebox: hyperparameter control (RL)

https://github.com/automl/RoBO