

slide-1
SLIDE 1

Using Meta Learning to Initialize Bayesian Optimization

Albert-Ludwigs-Universität Freiburg

Matthias Feurer (1), Jost Tobias Springenberg (2), Frank Hutter (1)

(1) Research Group on Learning, Optimization, and Automated Algorithm Design
(2) Machine Learning Lab

Department of Computer Science, University of Freiburg, Germany

ECAI-2014 Workshop on Meta-learning & Algorithm Selection, 19 August 2014

slide-2
SLIDE 2

Your task: Build an Iris classification system

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 2 / 18 The iris pictures on this slide are from wikimedia commons and used under the following licenses: Top left: Iris Versicolor is public domain; Bottom left: Iris setosa is licensed by Radomil under CC BY-SA 3.0; Top right: Iris Virginica is licensed by C T Johansson under CC BY 3.0.


slide-6
SLIDE 6

Your task: Build an Iris classification system

  • Choose an algorithm based on dataset characteristics, e.g. for the Iris dataset this could be an SVM
  • Manual tuning → fiddling with hyperparameters
  • Better: use automated methods like PSO, GA or SMBO
  • Best: Auto-WEKA

slide-7
SLIDE 7

Adding the Iris Japonica to the dataset

MetaSel ’14 Feurer, Springenberg and Hutter – MI-SMBO 3 / 18 The iris pictures on this slide are from wikimedia commons and used under the following licenses: Top left: Iris Versicolor is public domain; Bottom left: Iris setosa is licensed by Radomil under CC BY-SA 3.0; Top right: Iris Virginica is licensed by C T Johansson under CC BY 3.0; Bottom right: Iris Japonica is licensed by KENPEI under CC BY-SA 3.0


slide-10
SLIDE 10

Adding the Iris Japonica to the dataset

  • Manual tuning: use experience and start from the parameters found on the Iris dataset
  • Automated methods → start from scratch

→ Cast the use of experience into an algorithm.


slide-13
SLIDE 13

Sequential Model-based Bayesian Optimization (SMBO)

Configuration task (input):
  • ML algorithm $A$
  • Configuration space $\Lambda$ of $A$
  • Dataset $D$

Optimization loop:
  • Fit regression model on $(\lambda, A_\lambda(D))$ pairs
  • Select promising configuration $\lambda \in \Lambda$
  • Evaluate $A_\lambda(D)$

Output: configuration $\lambda^*$
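The fit, select, evaluate loop on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `smbo`, the 1-nearest-neighbour surrogate and the distance-based exploration bonus are assumptions chosen to keep the example free of dependencies (SMAC and Spearmint use random forests and Gaussian processes as their models).

```python
def smbo(objective, candidates, initial, n_iter, explore=0.5):
    """Toy SMBO over a finite set of numeric configurations (hypothetical helper)."""
    # Evaluate the initial design: (configuration, observed loss) pairs.
    history = [(lam, objective(lam)) for lam in initial]
    for _ in range(n_iter):
        seen = {lam for lam, _ in history}
        pool = [lam for lam in candidates if lam not in seen]
        if not pool:
            break  # every configuration has been evaluated

        # 1. "Fit" the surrogate: here simply the stored history, queried
        #    by 1-nearest-neighbour lookup (an assumption for illustration).
        def score(lam):
            nearest_lam, nearest_loss = min(history, key=lambda h: abs(h[0] - lam))
            # Predicted loss minus a bonus for being far from known points.
            return nearest_loss - explore * abs(nearest_lam - lam)

        # 2. Select the most promising unevaluated configuration.
        lam_next = min(pool, key=score)
        # 3. Evaluate it and extend the history.
        history.append((lam_next, objective(lam_next)))
    # Return the best configuration found and its loss.
    return min(history, key=lambda h: h[1])


# Toy objective with its minimum at lam = 3; the budget covers all 21 candidates.
best_lam, best_loss = smbo(lambda lam: (lam - 3) ** 2, range(-10, 11), [-10, 10], n_iter=19)
```

The `initial` argument is the hook that metalearning can use: passing configurations that worked on similar datasets replaces the arbitrary starting points used here.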


slide-15
SLIDE 15

Metalearning-Initialized SMBO (MI-SMBO)

Configuration task (input):
  • ML algorithm $A$
  • Configuration space $\Lambda$ of $A$
  • Dataset $D_{new}$

Metalearning initialization:
  • Find datasets $D_i$ similar to $D_{new}$
  • Initialize search with $\lambda^*_{D_i}$

Optimization loop:
  • Fit regression model on pairs of $(\lambda, A_\lambda(D_{new}))$
  • Select promising configuration $\lambda \in \Lambda$
  • Evaluate $A_\lambda(D_{new})$

Output: configuration $\lambda^*$

slide-16
SLIDE 16

Metafeatures

  • # training examples: 150
  • # classes: 3
  • # features: 4
  • # numerical features: 4
  • # categorical features: 0
  • Missing values? No
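The simple metafeatures listed above can be computed directly from a dataset. The following is a sketch under stated assumptions: the helper name `simple_metafeatures`, the use of `None` for missing values, and the rule "a feature is numerical if all its non-missing values are numbers" are illustrative choices, not taken from the talk.

```python
def simple_metafeatures(rows, labels):
    """Compute the simple metafeatures listed above (hypothetical helper).

    rows: list of equal-length feature lists; None marks a missing value.
    A feature counts as numerical if all its non-missing values are int or float.
    """
    n_features = len(rows[0]) if rows else 0
    n_numerical = 0
    for j in range(n_features):
        column = [row[j] for row in rows if row[j] is not None]
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in column):
            n_numerical += 1
    return {
        "n_training_examples": len(rows),
        "n_classes": len(set(labels)),
        "n_features": n_features,
        "n_numerical_features": n_numerical,
        "n_categorical_features": n_features - n_numerical,
        "missing_values": any(v is None for row in rows for v in row),
    }


# Three Iris-like rows (sepal length, sepal width, petal length, petal width).
rows = [[5.1, 3.5, 1.4, 0.2], [7.0, 3.2, 4.7, 1.4], [6.3, 3.3, 6.0, 2.5]]
labels = ["setosa", "versicolor", "virginica"]
features = simple_metafeatures(rows, labels)
```

Turning the resulting dictionary into a fixed-order numeric vector gives the representation that a distance between datasets can operate on.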

slide-17
SLIDE 17

Metalearning-Initialized Bayesian Optimization

For a new dataset $D_{new}$:
  • Sort known datasets $D_{1:N}$ by distance to $D_{new}$.
  • For each of these datasets, extract the best known hyperparameter configuration $\lambda^*_{D_i}$.
  • Initialize SMBO with the first $k$ hyperparameter configurations from the sorted list.
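The three steps above can be sketched as follows. The function names, the metafeature vectors and the stored configurations are hypothetical; the distance is the L1 norm that the talk names as its distance metric.

```python
def l1_distance(m_a, m_b):
    """L1 distance between two metafeature vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(m_a, m_b))


def initial_configurations(m_new, known, k):
    """Return the best known configurations of the k datasets closest to m_new.

    known: list of (name, metafeature_vector, best_configuration) triples.
    """
    # Sort the known datasets by their distance to the new dataset.
    ranked = sorted(known, key=lambda entry: l1_distance(m_new, entry[1]))
    # Keep the best known configuration of each of the first k datasets.
    return [best_config for _, _, best_config in ranked[:k]]


# Hypothetical metafeatures (#examples, #classes, #features) and configurations.
known = [
    ("A", [1000, 10, 64], {"log2_C": 5, "log2_gamma": -7}),
    ("B", [160, 3, 5], {"log2_C": 1, "log2_gamma": -3}),
    ("C", [300, 2, 8], {"log2_C": -2, "log2_gamma": 0}),
]
init = initial_configurations([150, 3, 4], known, k=2)
```

Here dataset B (distance 11) ranks before C (distance 155) and A (distance 917), so the search would start from the configurations stored for B and C.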

slide-18
SLIDE 18

Similarity of Datasets

[Figure: plot of the 57 datasets, labelled: waveform-5000, iris, pendigits, yeast, satimage, balance-scale, ecoli, cmc, page-blocks, tae, glass, nursery, spambase, vehicle, segment, car, optdigits, primary-tumor, mfeat-fourier, mfeat-karhunen, mfeat-zernike, abalone, vowel, mfeat-factors, dermatology, letter, mfeat-morphological, mfeat-pixel, eucalyptus, arrhythmia, zoo, audiology, autos, soybean, cylinder-bands, vote, ionosphere, tic-tac-toe, braziltourism, hepatitis, labor, heart-statlog, breast-w, mushroom, heart-c, heart-h, postoperative-patient-data, lymph, breast-cancer, diabetes, liver-disorders, kr-vs-kp, credit-a, anneal.ORIG, credit-g, sonar, haberman]

slide-19
SLIDE 19

Finding the nearest datasets (1)

[Figure: dataset plot, step 1; dataset labels as on the "Similarity of Datasets" slide]

slide-20
SLIDE 20

Finding the nearest datasets (2)

[Figure: dataset plot, step 2; dataset labels as on the "Similarity of Datasets" slide]

slide-21
SLIDE 21

Finding the nearest datasets (3)

[Figure: dataset plot, step 3; dataset labels as on the "Similarity of Datasets" slide]


slide-23
SLIDE 23

Finding the nearest datasets (4)

[Figure: dataset plot, step 4; dataset labels as on the "Similarity of Datasets" slide]

slide-24
SLIDE 24

Distance metric (1)

Commonly used in the literature, the L1 norm:

$$d(D_{new}, D_j) = \sum_i \left| m_i^{new} - m_i^{j} \right| \qquad (1)$$

where $m_i^{j}$ denotes the $i$-th metafeature of dataset $D_j$.
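As a worked example with made-up numbers (not taken from the talk), take the metafeature vectors $m^{new} = (150, 3, 4)$ and $m^{j} = (160, 3, 5)$, read as number of training examples, classes and features:

```latex
\begin{aligned}
d(D_{new}, D_j) &= |150 - 160| + |3 - 3| + |4 - 5| \\
                &= 10 + 0 + 1 \\
                &= 11
\end{aligned}
```

Because the terms are summed as they are, a metafeature with a large numeric range (here the number of training examples) contributes most of the distance. Whether and how the metafeatures were rescaled is not stated on the slide.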


slide-30
SLIDE 30

Experimental Setup

  • 57 datasets from the OpenML repository
  • 46 metafeatures from the literature, split into five different subsets, including landmarking [Pfahringer et al. 2000]
  • Two case studies:
      • Support Vector Machine with MI-Spearmint [Snoek et al. 2012]
      • AutoSklearn with MI-SMAC [Hutter et al. 2011]
  • Tried 5, 10, 20 and 25 initial configurations
  • Ran each instantiation 10 times on each dataset → 26220 optimization runs
  • To make this many runs feasible, precomputed a dense grid for every dataset
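A precomputed grid turns every later function evaluation into a table lookup, which is what makes tens of thousands of optimization runs affordable. The sketch below illustrates the idea only: the helper names, the stand-in scoring function and the value ranges are assumptions, not the setup used in the talk.

```python
import itertools


def precompute_grid(train_and_score, grid_axes):
    """Evaluate every combination on the grid once and store the results."""
    names = sorted(grid_axes)
    table = {}
    for values in itertools.product(*(grid_axes[n] for n in names)):
        table[values] = train_and_score(dict(zip(names, values)))
    return names, table


def make_lookup_objective(names, table):
    """Wrap the table so an optimizer can call it like the real objective."""
    def objective(config):
        return table[tuple(config[n] for n in names)]
    return objective


# Hypothetical stand-in for an expensive training run; records each call.
calls = []
def fake_train_and_score(config):
    calls.append(config)
    return abs(config["log2_C"] - 3) + abs(config["log2_gamma"] + 5)


# Hypothetical ranges with 21 and 19 values per axis.
axes = {"log2_C": range(-5, 16), "log2_gamma": range(-15, 4)}
names, table = precompute_grid(fake_train_and_score, axes)
objective = make_lookup_objective(names, table)
loss = objective({"log2_C": 3, "log2_gamma": -5})  # no new training run
```

The trade-off is that the optimizer can only propose configurations that lie on the grid.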


slide-32
SLIDE 32

Combined Algorithm Selection and Hyperparameter Optimization problem (CASH)

Classifier
  • SVM: C(SVM), gamma
  • LinearSVM: C(LinearSVM), loss
  • Random Forest: Criterion, Max Features, Min Samples Split

[Auto-WEKA, Thornton et al. 2013]
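In a CASH space the choice of classifier decides which other hyperparameters are active. A minimal sketch of such a conditional space follows; the hyperparameter names come from the slide, while the value lists and the function name `sample_configuration` are hypothetical.

```python
import random

# Hypothetical value lists; the slide names the hyperparameters only.
SPACE = {
    "SVM": {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]},
    "LinearSVM": {"C": [0.1, 1.0, 10.0], "loss": ["l1", "l2"]},
    "Random Forest": {
        "criterion": ["gini", "entropy"],
        "max_features": [1, 2, 3],
        "min_samples_split": [2, 4, 8],
    },
}


def sample_configuration(rng):
    """Pick a classifier, then sample only the hyperparameters it activates."""
    classifier = rng.choice(sorted(SPACE))
    config = {"classifier": classifier}
    for name, values in SPACE[classifier].items():
        config[name] = rng.choice(values)
    return config


config = sample_configuration(random.Random(0))
```

Every sampled configuration contains the top-level `classifier` choice plus the hyperparameters of that classifier and nothing else, so the optimizer never has to model inactive parameters.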


slide-34
SLIDE 34

AutoSklearn: Hyperparameters

Component    Hyperparameter      # Values
Main         λclassifier         3
Main         preprocessing       2
SVM          log2(C)             21
SVM          log2(γ)             19
LinearSVM    log2(C)             21
LinearSVM    penalty             2
RF           min splits          5
RF           max features        10
RF           criterion           2
PCA          variance to keep    2

1623 hyperparameter configurations
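One way to arrive at the stated total from the table is shown below. The per-hyperparameter counts are copied from the table; how they combine is an assumption, namely that preprocessing is either switched off or PCA with one of its two settings, and that each classifier contributes only its own hyperparameters.

```python
# Number of values per hyperparameter, copied from the table.
svm = 21 * 19               # log2(C) x log2(gamma)
linear_svm = 21 * 2         # log2(C) x penalty
random_forest = 5 * 10 * 2  # min splits x max features x criterion

# Assumption: preprocessing is either off (1 option) or PCA with one of
# 2 "variance to keep" settings, giving 1 + 2 = 3 preprocessing options.
preprocessing = 1 + 2

total = (svm + linear_svm + random_forest) * preprocessing
```

Under this reading the classifiers contribute 399 + 42 + 100 = 541 configurations, and 541 × 3 = 1623 matches the number on the slide.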

slide-35
SLIDE 35

AutoSklearn: Results (1)

[Figure: difference to min function value (y-axis, 0.00 to 0.07) against #function evaluations (x-axis, 10 to 50). Curve: SMAC.]

slide-36
SLIDE 36

AutoSklearn: Results (1)

[Figure: difference to min function value (y-axis, 0.00 to 0.10) against #function evaluations (x-axis, 10 to 50). Curves: SMAC, random, TPE, MI-SMAC(10,L1,landmarking).]

slide-37
SLIDE 37

AutoSklearn: Results (2)

[Figure: curves for SMAC, random, TPE and MI-SMAC(10,L1,landmarking); x-axis ticks 10 to 50, y-axis ticks 1.8 to 3.0. Axis labels are not recoverable from the extracted text.]


slide-41
SLIDE 41

AutoSklearn: Results (3)

[Figure: pairwise comparisons against #function evaluations (x-axis, 10 to 50); both vertical axes run from 0.0 to 0.7. Series: MI-SMAC(10,L1,landmarking) vs SMAC; MI-SMAC(10,L1,landmarking) vs TPE; MI-SMAC(10,L1,landmarking) vs random.]

slide-42
SLIDE 42

Open questions

  • Does MI-SMBO scale to larger configuration spaces?
  • What if grid search is too expensive?
  • Can the metalearning component be added directly into the SMBO procedure?

slide-43
SLIDE 43

Take home messages

  • SMBO can be substantially improved by providing good initial configurations.
  • Metalearning provides a sound framework to find these configurations.
  • MI-SMAC improves on state-of-the-art methods on a large configuration space, namely AutoSklearn.

slide-44
SLIDE 44

The end

Thank you for your attention.

Further questions: feurerm@cs.uni-freiburg.de

This presentation was partially supported by an ECCAI Travel Award and the ECCAI sponsors.

slide-45
SLIDE 45

AutoSklearn: Results (5)

[Figure: min function value (y-axis, 0.00 to 0.10) against #function evaluations (x-axis, 10 to 50). Curves: SMAC, random, TPE, MI-SMAC(10,L1,landmarking).]

slide-46
SLIDE 46

AutoSklearn: Results (7)

[Figure: pairwise comparisons against function evaluations (x-axis, 10 to 50); vertical axes run from 0.0 to 0.6 and from −0.6 to 0.0. Series: MI-SMAC(10,L1,all) vs MI-SMAC(10,L1,landmarking); MI-SMAC(10,L1,all) vs SMAC; MI-SMAC(10,L1,all) vs TPE; MI-SMAC(10,L1,all) vs random.]

slide-47
SLIDE 47

AutoSklearn: Results (8)

[Figure: pairwise comparisons against #function evaluations (x-axis, 10 to 50); both vertical axes run from 0.0 to 0.6. Series: SMAC vs MI-SMAC(10,L1,landmarking); SMAC vs TPE; SMAC vs random.]

slide-48
SLIDE 48

AutoSklearn: Results (9)

[Figure: pairwise comparisons against function evaluations (x-axis, 10 to 50); vertical axes run from 0.0 to 0.7 and from −0.7 to 0.0. Series: MI-SMAC(25,L1,landmarking) vs MI-SMAC(10,L1,landmarking); vs MI-SMAC(20,L1,landmarking); vs MI-SMAC(5,L1,landmarking); vs SMAC; vs TPE; vs random.]