Using Meta Learning to Initialize Bayesian Optimization

Matthias Feurer (1), Jost Tobias Springenberg (2), Frank Hutter (1)
(1) Research Group on Learning, Optimization, and Automated Algorithm Design
(2) Machine Learning Lab
Department of Computer Science, University of Freiburg (Albert-Ludwigs-Universität Freiburg), Germany
ECAI-2014 Workshop on Meta-learning & Algorithm Selection, 19 August 2014

Your task: Build an Iris classification system
- Choose an algorithm based on dataset characteristics, e.g. for the Iris dataset this could be an SVM
- Manual tuning -> fiddling with hyperparameters
- Better: Use automated methods like PSO, GA or SMBO
- Best: AutoWeka
The iris pictures on this slide are from Wikimedia Commons and used under the following licenses: Top left: Iris Versicolor is public domain; Bottom left: Iris setosa is licensed by Radomil under CC BY-SA 3.0; Top right: Iris Virginica is licensed by C T Johansson under CC BY 3.0.
MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 2 / 18

Adding the Iris Japonica to the dataset
- Manual tuning: Use experience and start from the parameters found on the Iris dataset
- Automated methods -> start from scratch
→ Cast the use of experience into an algorithm.
Picture licenses as on the previous slide; Bottom right: Iris Japonica is licensed by KENPEI under CC BY-SA 3.0.

Sequential Model-based Bayesian Optimization (SMBO)
Inputs: ML algorithm A, configuration space Λ of A, dataset D
Configuration task, repeated in a loop:
1. Fit a regression model on (λ, A_λ(D)) pairs
2. Select a promising configuration λ ∈ Λ
3. Evaluate A_λ(D)
Output: configuration λ*
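
The fit / select / evaluate loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the surrogate is a nearest-neighbour predictor swapped in for the regression model a real SMBO tool would fit, and `smbo`, `toy_loss` and `sample_config` are hypothetical names. Configurations are tuples (log10 C, log10 gamma), and the toy loss stands in for the validation error A_λ(D).

```python
import math
import random


def smbo(evaluate, sample_config, n_init=3, n_iter=15, n_candidates=200, seed=0):
    """Minimise `evaluate` over configurations drawn by `sample_config`."""
    rng = random.Random(seed)

    # History of (configuration, observed loss) pairs, i.e. (lambda, A_lambda(D)).
    history = []
    for _ in range(n_init):
        lam = sample_config(rng)
        history.append((lam, evaluate(lam)))

    def score(lam):
        # Surrogate model (assumption: 1-nearest-neighbour, so "fitting" is
        # just storing the history). Predicted loss is the loss of the closest
        # evaluated configuration; the distance to it is subtracted as a crude
        # exploration bonus.
        dist, loss = min((math.dist(lam, seen), seen_loss)
                         for seen, seen_loss in history)
        return loss - dist

    for _ in range(n_iter):
        # Select a promising configuration among random candidates.
        candidates = [sample_config(rng) for _ in range(n_candidates)]
        lam = min(candidates, key=score)
        # Evaluate it and extend the history the surrogate is built from.
        history.append((lam, evaluate(lam)))

    best_lam, best_loss = min(history, key=lambda pair: pair[1])
    return best_lam, best_loss, history


def toy_loss(lam):
    # Hypothetical stand-in for A_lambda(D): smallest at log10 C = 1, log10 gamma = -2.
    log_c, log_gamma = lam
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


def sample_config(rng):
    # Uniform sampling over an assumed search box [-3, 3] x [-3, 3].
    return (rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0))


best_lam, best_loss, history = smbo(toy_loss, sample_config)
print(best_lam, best_loss)
```

The first `n_init` configurations are drawn at random here, which is the "start from scratch" behaviour described earlier in the talk.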

Metalearning-Initialized SMBO (MI-SMBO)
Inputs: ML algorithm A, configuration space Λ of A, dataset D_new
Initialize search: find datasets D_i similar to D_new and start from their best configurations λ*_{D_i}
Configuration task, repeated in a loop:
1. Fit a regression model on pairs (λ, A_λ(D_new))
2. Select a promising configuration λ ∈ Λ
3. Evaluate A_λ(D_new)
Output: configuration λ*

Metafeatures
Number of training examples: 150
Number of classes: 3
Number of features: 4
Number of numerical features: 4
Number of categorical features: 0
Missing values? No
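
Metafeatures of this kind are cheap to compute from the raw data. The sketch below is an illustration only: `simple_metafeatures` is a hypothetical helper, `None` is assumed to mark a missing value, and a column is treated as numerical when all of its non-missing values are numbers. The six-row sample is a toy dataset in the style of Iris, not the full 150-example dataset.

```python
def simple_metafeatures(rows, labels):
    """Compute six simple metafeatures for a tabular classification dataset.

    `rows` is a list of equal-length feature lists; `labels` holds one class
    label per row. None marks a missing value (an assumption of this sketch).
    """
    n_features = len(rows[0]) if rows else 0

    numerical = 0
    for j in range(n_features):
        values = [row[j] for row in rows if row[j] is not None]
        # A column is numerical if it has values and all of them are numbers;
        # bool is excluded because it is a subclass of int in Python.
        if values and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        ):
            numerical += 1

    return {
        "n_training_examples": len(rows),
        "n_classes": len(set(labels)),
        "n_features": n_features,
        "n_numerical_features": numerical,
        "n_categorical_features": n_features - numerical,
        "has_missing_values": any(v is None for row in rows for v in row),
    }


# Toy sample: four numerical measurements per flower, three classes.
toy_rows = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
]
toy_labels = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

toy_metafeatures = simple_metafeatures(toy_rows, toy_labels)
print(toy_metafeatures)
```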

Metalearning-Initialized Bayesian Optimization
For a new dataset D_new:
1. Sort the known datasets D_1:N by distance to D_new.
2. For each of these datasets, extract the best known hyperparameter configuration λ*_{D_i}.
3. Initialize SMBO with the first k hyperparameter configurations from the sorted list.
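
The three steps above can be sketched as follows. The slide does not say which distance is used, so the L1 distance between metafeature vectors is an assumption of this example, as is the premise that the vectors are already scaled to comparable ranges. The function names, dataset names, metafeature values and SVM configurations are all hypothetical.

```python
def l1_distance(a, b):
    """L1 distance between two metafeature vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def initial_configurations(new_metafeatures, known_datasets, k):
    """Return the best known configurations of the k datasets nearest to D_new.

    `known_datasets` is a list of (name, metafeature_vector, best_config)
    triples, where best_config is the best configuration found earlier on
    that dataset.
    """
    # Step 1: sort known datasets by distance to the new dataset.
    ranked = sorted(
        known_datasets,
        key=lambda entry: l1_distance(new_metafeatures, entry[1]),
    )
    # Steps 2 and 3: take the best known configuration of each of the first k.
    return [best_config for _, _, best_config in ranked[:k]]


# Hypothetical metadata: scaled metafeatures and best SVM settings per dataset.
known = [
    ("dataset_a", (0.10, 0.20, 0.00), {"C": 1.0, "gamma": 0.1}),
    ("dataset_b", (0.90, 0.80, 1.00), {"C": 100.0, "gamma": 0.001}),
    ("dataset_c", (0.15, 0.25, 0.00), {"C": 10.0, "gamma": 0.01}),
]
new_dataset = (0.12, 0.22, 0.00)

init = initial_configurations(new_dataset, known, k=2)
print(init)
```

In this toy example `dataset_a` and `dataset_c` are closest to the new dataset, so their configurations would be evaluated first, before the SMBO loop takes over.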

Similarity of Datasets
[Figure: dataset similarity plot; the text labels in the figure are not recoverable]
