
Towards efficient automatic end-to-end learning - Frank Hutter (PowerPoint PPT presentation)



  1. Towards efficient automatic end-to-end learning. Frank Hutter, University of Freiburg, Germany. Based on joint work with great students and collaborators (named throughout).

  2. What will this partial learning curve converge to? [Figure: a partial learning curve of validation set accuracy over epochs, with a question mark at its extrapolated end]

  3. One problem of deep learning: performance is very sensitive to many hyperparameters
     – Architectural choices: units per layer, kernel size, # convolutional layers, # fully connected layers
     – Optimization: algorithm, learning rates, momentum, batch normalization, batch sizes, dropout rates, weight decay, …
     – Data augmentation & preprocessing

  4. Towards true end-to-end learning
     – Current deep learning practice: an expert chooses the deep architecture & learning hyperparameters; only the resulting network is trained "end-to-end"
     – AutoML: true end-to-end learning, where a meta-level learning & optimization process configures the end-to-end learning box itself

  5. The standard approach: blackbox optimization
     – A blackbox optimizer proposes a hyperparameter setting λ; the DNN is trained with λ and validated, yielding performance f(λ); the optimizer seeks max_λ f(λ)
     – Blackbox optimizers: grid search, random search, population-based & evolutionary methods, …, Bayesian optimization

  6. The standard approach: blackbox optimization
     – Same loop as before: propose λ, train & validate the DNN, feed back f(λ)
     – Too slow for tuning DNNs → need more fine-grained actions
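To make the blackbox loop concrete, here is a minimal sketch of the simplest such optimizer, random search; the search space and the train_and_validate placeholder are illustrative assumptions, not the configuration space used in the talk.

```python
import math
import random

# The optimizer only ever sees pairs (lambda, f(lambda)).
# This toy search space is an assumption for illustration.
SEARCH_SPACE = {
    "learning_rate": (1e-5, 1e-1),      # sampled log-uniformly
    "batch_size": [32, 64, 128, 256],
    "dropout": (0.0, 0.6),
}

def sample_configuration():
    """Draw one hyperparameter setting lambda at random."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "learning_rate": 10 ** random.uniform(math.log10(lo), math.log10(hi)),
        "batch_size": random.choice(SEARCH_SPACE["batch_size"]),
        "dropout": random.uniform(*SEARCH_SPACE["dropout"]),
    }

def train_and_validate(lam):
    """Placeholder for the expensive blackbox f(lambda): train the DNN to
    completion with hyperparameters `lam`, return validation accuracy."""
    raise NotImplementedError

def random_search(n_trials=50):
    """Random search: one full, expensive training run per query."""
    best_lam, best_f = None, float("-inf")
    for _ in range(n_trials):
        lam = sample_configuration()
        f_lam = train_and_validate(lam)   # the costly blackbox evaluation
        if f_lam > best_f:
            best_lam, best_f = lam, f_lam
    return best_lam, best_f
```

Bayesian optimization replaces sample_configuration with a model-guided proposal but sees exactly the same interface, which is why every query remains a full training run.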

  7. Five ways in which we can go beyond blackbox optimization

  8. 1. We can use transfer learning
     – Transfer learning from other datasets D → f(λ, D); needs a scalable model
     – Using Gaussian process models:
       • Bardenet et al., ICML 2013: Collaborative hyperparameter tuning
       • Swersky et al., NIPS 2013: Multi-task Bayesian optimization
       • Yogatama & Mann, AISTATS 2014: Efficient transfer learning method for automatic hyperparameter tuning
     – Using other models (see the warm-start sketch below):
       • Feurer et al., AAAI 2015: Initializing Bayesian hyperparameter optimization via meta-learning
       • Springenberg et al., NIPS 2016: Bayesian optimization with robust Bayesian neural networks
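As a rough illustration of the meta-learning initialization idea cited above (Feurer et al., AAAI 2015), the sketch below warm-starts the optimizer with the best known configurations of the k datasets nearest in meta-feature space; the prior_runs structure and the distance measure are assumptions, not the authors' implementation.

```python
import numpy as np

def warm_start_configs(new_meta_features, prior_runs, k=5):
    """Rank previously seen datasets by distance in meta-feature space and
    seed the optimizer with the best known configuration of each of the
    k nearest ones. `prior_runs` is an assumed list of
    (meta_features, best_configuration) pairs from earlier experiments."""
    dists = [np.linalg.norm(np.asarray(new_meta_features) - np.asarray(mf))
             for mf, _ in prior_runs]
    nearest = np.argsort(dists)[:k]
    return [prior_runs[i][1] for i in nearest]

# These k configurations are evaluated first; Bayesian optimization then
# continues from their observed performance instead of starting blind.
```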

  9. Example: BO with robust Bayesian neural nets [Springenberg, Klein, Falkner, Hutter; NIPS 2016], https://github.com/automl/RoBO
     – Well-calibrated uncertainty estimates
     – Scalable for multitask problems

  10. 2. We can reason over cheaper subsets of data
     – Large datasets: start from small subsets of size s → f(λ, s); needs a model that extrapolates well from s_min to s_max
     – Example: SVM error surface over log(C) and log(γ), trained on data subsets of size s_max/128, s_max/16, s_max/4, and s_max [figure: four error-surface panels, one per subset size]
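A hedged sketch of what f(λ, s) looks like for the SVM example: evaluate the same hyperparameters (C, γ) on a random subset of s training points. Dataset handling and the error metric are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

def f_lambda_s(C, gamma, X_train, y_train, X_val, y_val, s, seed=0):
    """Validation error of an RBF-SVM with hyperparameters (C, gamma),
    trained on a random subset of size s of the training data."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X_train), size=s, replace=False)
    clf = SVC(C=C, gamma=gamma).fit(X_train[idx], y_train[idx])
    return 1.0 - clf.score(X_val, y_val)

# Cheap evaluations on small subsets can guide the search before spending
# full-budget evaluations, e.g.:
# for s in [s_max // 128, s_max // 16, s_max // 4, s_max]: ...
```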

  11. Example: Fast Bayesian optimization on large datasets [Klein, Falkner, Bartels, Hennig, Hutter; arXiv 2016]
     – Automatically choose the dataset size for each evaluation, trading off information gain about the global optimum vs. cost
     – Entropy Search: based on a probability distribution p_min over where the minimum lies
     – Strategy: pick the configuration & data size pair (x, s) that maximally decreases the entropy of p_min per time spent
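The selection rule can be sketched independently of the Gaussian-process machinery: score each candidate (x, s) by the expected entropy reduction of p_min divided by the predicted evaluation time. The three callables below are placeholders for the model-based quantities in the paper, not its implementation.

```python
import numpy as np

def select_next(candidates, entropy_of_pmin, expected_entropy_after,
                expected_cost):
    """Pick the (configuration, dataset size) pair with the largest expected
    entropy reduction of p_min per second of predicted training time.
    All three callables are assumed to come from the surrogate model."""
    h_now = entropy_of_pmin()
    scores = [(h_now - expected_entropy_after(x, s)) / expected_cost(x, s)
              for (x, s) in candidates]
    return candidates[int(np.argmax(scores))]
```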

  12. Example: Fast Bayesian optimization on large datasets [Klein, Falkner, Bartels, Hennig, Hutter; under review at AISTATS 2016]
     – 10x-1000x speedup for SVMs, 5x-10x for DNNs
     – https://github.com/automl/RoBO
     [Figure: error vs. budget of the optimizer [s]]

  13. 3. We can model the online performance of DNNs
     – Graybox optimization: f(λ, t)
     – Example: DNN learning curves over training time t for different hyperparameter settings
     – Swersky et al., arXiv 2014: Freeze-Thaw Bayesian optimization
     – Domhan et al., AutoML 2014 & IJCAI 2015: Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves

  14. Example: extrapolating learning curves [Domhan, Springenberg, Hutter; AutoML 2014 & IJCAI 2015]
     – K = 11 parametric models f_k(t | θ_k), e.g. a power law
     – Convex combination of these models with observation noise: y(t) = Σ_{k=1}^{K} w_k f_k(t | θ_k) + ε, with ε ~ N(0, σ²)
     – Maximum likelihood fit of the parameters; MCMC to quantify model uncertainty
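A simplified sketch of the extrapolation step, assuming a single power-law family ("pow3") instead of the full convex combination of K = 11 models, and a least-squares fit instead of MCMC:

```python
import numpy as np
from scipy.optimize import curve_fit

def pow3(t, c, a, alpha):
    """One parametric learning-curve family: accuracy approaches the
    asymptote c as a power law in the epoch t."""
    return c - a * np.power(t, -alpha)

def fit_and_extrapolate(epochs, accuracies, horizon):
    """Fit the curve to the observed prefix y_{1:n} and predict the
    accuracy y_m at the final epoch `horizon` (a point estimate only;
    the paper's MCMC yields a full predictive distribution)."""
    (c, a, alpha), _ = curve_fit(pow3, epochs, accuracies,
                                 p0=[max(accuracies), 1.0, 0.5],
                                 maxfev=10000)
    return pow3(horizon, c, a, alpha)
```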

  15. Example: extrapolating learning curves
     – y_best: best validation accuracy seen in any run so far; y_m: predicted accuracy of the current run at the final epoch m, extrapolated from the observed prefix y_{1:n}
     – If P(y_m > y_best | y_{1:n}) ≥ 5%: continue training [figure: validation set accuracy vs. epoch, with the extrapolated curve ending above y_best]

  16. Example: extrapolating learning curves
     – If P(y_m > y_best | y_{1:n}) < 5%: terminate the run! [figure: validation set accuracy vs. epoch, with the extrapolated curve ending below y_best]
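Combining the two slides, a rough Monte-Carlo version of the termination test, building on the fit_and_extrapolate sketch above; the paper's MCMC posterior over model parameters is replaced here by refits on noise-perturbed copies of the observed curve, which is only a crude stand-in.

```python
import numpy as np

def prob_exceeds_best(epochs, accuracies, horizon, y_best,
                      n_samples=200, seed=0):
    """Monte-Carlo estimate of P(y_m > y_best | y_{1:n})."""
    acc = np.asarray(accuracies, dtype=float)
    noise = acc[-5:].std() + 1e-6        # crude observation-noise estimate
    rng = np.random.default_rng(seed)
    exceed, valid = 0, 0
    for _ in range(n_samples):
        noisy = acc + rng.normal(0.0, noise, size=acc.shape)
        try:
            y_m = fit_and_extrapolate(epochs, noisy, horizon)
        except RuntimeError:             # curve_fit failed to converge
            continue
        valid += 1
        exceed += float(y_m > y_best)
    return exceed / max(valid, 1)

# Termination rule from the slide:
# if prob_exceeds_best(...) < 0.05: stop this training run early.
```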

  17. Example: extrapolating learning curves

  18. Easy to include in a Bayesian neural network [Klein, Falkner, Springenberg, Hutter; Bayesian Deep Learning Workshop 2016]

  19. 4. We can change hyperparameters on the fly
     – Common practice: change SGD learning rates over time
     – Can automate & improve this with reinforcement learning:
       • Daniel et al., AAAI 2016: Learning step size controllers for robust neural network training
       • Hansen, arXiv 2016: Using deep Q-learning to control optimization hyperparameters
       • Andrychowicz et al., arXiv 2016: Learning to learn by gradient descent by gradient descent
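For intuition about the interface such controllers act on, here is a hand-written (non-learned) controller that observes the validation loss each epoch and rescales the learning rate; the cited papers learn this policy with reinforcement learning or gradient-based meta-learning rather than hard-coding it.

```python
class LearningRateController:
    """Illustrative on-the-fly controller: observe a training statistic each
    epoch (state) and keep or shrink the learning rate (action). A fixed
    heuristic standing in for a learned policy."""

    def __init__(self, lr=0.1, patience=3, factor=0.5):
        self.lr, self.patience, self.factor = lr, patience, factor
        self.best_loss, self.bad_epochs = float("inf"), 0

    def act(self, val_loss):
        """Return the learning rate to use for the next epoch."""
        if val_loss < self.best_loss - 1e-4:
            self.best_loss, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor   # shrink lr after `patience` bad epochs
                self.bad_epochs = 0
        return self.lr
```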

  20. 5. We can automatically gain scientific insights [Hutter, Hoos, Leyton-Brown; ICML 2014]
     – One way to inspect the model: functional ANOVA explains the performance variation due to each subset of hyperparameters [figure: marginal loss as a function of hyperparameters 1, 2, and 3]
     – Possible future insights:
       1. How stable are good hyperparameter settings across datasets?
       2. Which hyperparameters need to change as the dataset grows?
       3. Which factors affect empirical convergence rates of SGD?
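A simplified stand-in for the functional-ANOVA analysis: fit a random forest to (configuration, loss) observations and read off the marginal effect of a single hyperparameter by averaging predictions over the others. The authors' actual tool lives at https://github.com/automl/fanova and decomposes the variance exactly over subsets of hyperparameters; the code below is only an illustration with assumed inputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def marginal_performance(configs, losses, dim, grid, n_mc=200, seed=0):
    """Estimate the marginal loss curve of hyperparameter `dim`:
    fix it to each value in `grid`, average model predictions over
    random settings of all other hyperparameters."""
    configs = np.asarray(configs, dtype=float)
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(configs, losses)
    rng = np.random.default_rng(seed)
    marginal = []
    for v in grid:
        samples = configs[rng.integers(0, len(configs), size=n_mc)].copy()
        samples[:, dim] = v            # fix the hyperparameter of interest
        marginal.append(model.predict(samples).mean())
    return np.array(marginal)          # marginal loss, as plotted on the slide
```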

  21. Learning to optimize, to plan, etc.
     – Algorithm configuration often speeds up solvers a lot:
       • 500x for software verification [Hutter, Babic, Hoos, Hu; FMCAD 2007]
       • 50x for MIP [Hutter, Hoos, Leyton-Brown; CPAIOR 2011]
       • 100x for finding better domain encodings in AI planning [Vallati, Hutter, Chrpa, McCluskey; IJCAI 2015]
     – Algorithm portfolios won many competitions:
       • E.g., SATzilla won the SAT competitions 2007, 2009, 2012 (every time we entered) [Xu, Hutter, Hoos, Leyton-Brown; JAIR 2008]
       • E.g., Cedalion won the IPC 2014 Planning & Learning Track [Seipp, Sievers, Helmert, Hutter; AAAI 2015]

  22. Conclusion: moving beyond hand-designed learners
     – Some ways of making this efficient:
       • Transfer learning: exploit strong priors
       • Exploit cheaper, approximate blackboxes
       • Graybox: partial feedback during evaluation
       • Whitebox: hyperparameter control (RL)
     – https://github.com/automl/RoBO
