SLIDE 1

Hyperparameter Search in Machine Learning

Marc Claesen and Bart De Moor
marc.claesen@esat.kuleuven.be

ESAT-STADIUS, KU Leuven
iMinds Medical IT Department
STADIUS: Center for Dynamical Systems, Signal Processing and Data Analytics

SLIDE 2

Outline

1 Introduction
2 Example: optimizing hyperparameters for an SVM classifier
3 Challenges in hyperparameter search
4 State-of-the-art


SLIDE 5

Machine learning

Methods capable of learning patterns of interest from data, by formulating the learning task as an optimization problem.

Machine learning sits at the intersection of various fields: statistics, computer science, optimization, (biology), ...

The field encompasses learning methods with various origins, e.g.:
  • biology, e.g. neural networks [1]
  • convex optimization, e.g. support vector machines [2]
  • statistics, e.g. hidden Markov models [3]
  • tensor decompositions, e.g. recommender systems [4]


SLIDE 8

Hyperparameter search

Most machine learning methods are (hyper)parameterized, e.g. Occam's razor: model complexity and overfitting.

Hyperparameters can significantly impact performance, so suitable hyperparameters must be determined for each task:
  • occurs in both supervised and unsupervised learning

→ need for disciplined, automated optimization methods

Some examples:
  • SVM: regularization and kernel hyperparameters
  • ANN: regularization, network architecture, transfer functions


SLIDE 14

Formalizing hyperparameter tuning

In a general sense, tuning involves these components:
  • a learning algorithm A, parameterized by hyperparameters λ
  • training and test data X(tr), X(te)
  • a model M = A(X(tr) | λ)
  • a loss function L to assess the quality of M, typically using X(te): L(M | X(te))

In optimization terms, we aim to find λ∗ (assuming minimization):

  λ∗ = arg min_λ L( A(X(tr) | λ) | X(te) )
     = arg min_λ F(λ | A, X(tr), X(te), L)

where F is the objective function.
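The formalization above can be sketched in code. This is a minimal illustration, not an implementation from the talk: the learning algorithm A, loss L, induced objective F, the toy data, and the random search over λ are all hypothetical stand-ins.

```python
import random

# Hypothetical stand-ins for the components above: a "learning algorithm" A
# with a single hyperparameter lam that shrinks the predicted training mean,
# a squared-error loss L, and the induced objective F(lam | A, X_tr, X_te, L).
def A(X_tr, lam):
    ys = [y for _, y in X_tr]
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y / (1.0 + lam)   # model M = A(X_tr | lam)

def L(M, X_te):                             # loss L(M | X_te)
    return sum((M(x) - y) ** 2 for x, y in X_te) / len(X_te)

def F(lam, X_tr, X_te):                     # train, then evaluate
    return L(A(X_tr, lam), X_te)

# lam* = arg min_lam F, here via plain random search over [0, 5]
rng = random.Random(0)
X_tr = [(float(x), 2.0) for x in range(10)]
X_te = [(float(x), 2.0) for x in range(10)]
candidates = [rng.uniform(0.0, 5.0) for _ in range(100)]
lam_star = min(candidates, key=lambda lam: F(lam, X_tr, X_te))
```

Any optimizer over λ fits this template; only the way candidates are proposed changes.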


SLIDE 16

Tuning in practice

Most often done using a combination of grid and manual search:
  • grid search suffers from the curse of dimensionality
  • manual tuning leads to poor reproducibility

Better solutions exist but lack adoption because:
  • potential performance improvements are underestimated
  • lack of availability and/or ease of use
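The curse of dimensionality mentioned above is easy to quantify: a grid with r candidate values per hyperparameter and d hyperparameters requires r^d evaluations. A small illustration (the resolution of 10 values per axis is arbitrary):

```python
from itertools import product

# A full grid over d hyperparameters with r candidate values each
# contains r**d points; grid search must evaluate every one of them.
def grid_points(values_per_dim, dims):
    axes = [range(values_per_dim)] * dims
    return list(product(*axes))

sizes = {d: len(grid_points(10, d)) for d in (1, 2, 3, 4)}
# 10 values per axis: 10, 100, 1000, 10000 evaluations
```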


SLIDE 18

Support vector machine (SVM) classifiers

(figure slides: SVM classifier illustration; no text content recovered)

SLIDE 20

Support vector machine (SVM) classifiers

  min_{α,ξ,b}  (1/2) Σ_{i∈SV} Σ_{j∈SV} α_i α_j y_i y_j κ(x_i, x_j) + C Σ_{i=1}^{n} ξ_i,

  subject to  y_i ( Σ_{j∈SV} α_j y_j κ(x_i, x_j) + b ) ≥ 1 − ξ_i,
              ξ_i ≥ 0, ∀i.


SLIDE 22

Task: optimize hyperparameters for an SVM

Tune an SVM classifier with RBF kernel κ(u, v) = exp(−γ ‖u − v‖²):

  min_{α,b,ξ}  (1/2) Σ_{i∈SV} Σ_{j∈SV} α_i α_j y_i y_j exp(−γ ‖x_i − x_j‖²) + C Σ_{i∈SV} ξ_i

  • optimize the regularization parameter C and kernel parameter γ
  • evaluate each (C, γ) pair using 2× iterated 10-fold cross-validation via Optunity's particle swarm optimizer [5]
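Optunity's actual solver is more elaborate; the following is a minimal, generic particle swarm sketch of the idea, minimizing a smooth toy stand-in for the cross-validation error surface over (log10 C, log10 γ). The inertia and acceleration constants, swarm size, and the toy surface are all illustrative assumptions.

```python
import random

# Minimal particle swarm optimizer over a 2-D box (a crude stand-in for
# Optunity's PSO solver); w, c1, c2 are common illustrative constants.
def pso(f, bounds, n_particles=20, n_iters=50, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Hypothetical smooth error surface over (log10 C, log10 gamma),
# with its minimum at C = 10, gamma = 0.1 (purely illustrative).
def toy_cv_error(x):
    logC, logg = x
    return (logC - 1.0) ** 2 + (logg + 1.0) ** 2

best, err = pso(toy_cv_error, bounds=[(-3, 3), (-4, 2)])
```

In a real run, `toy_cv_error` would be replaced by the cross-validated loss of an SVM trained with the candidate (C, γ), which is exactly the expensive part.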


SLIDE 24

Response surface I

(figure: response surface plot; no further text recovered)

SLIDE 25

Response surface II

(figure: response surface plot, second view; no further text recovered)



SLIDE 29

Expensive function evaluations

A single objective function evaluation consists of:
  1 training a model via the learning method
    → can be very time consuming (days up to weeks! [6, 7, 8])
  2 predicting on a test set (for supervised methods)
  3 computing some evaluation metric for the model / its predictions

All of the above is often done in cross-validation [9, 10]:
  • used to reliably estimate generalization performance
  • involves many repetitions → exacerbates computation time

Training/evaluation time is a function of hyperparameter choice!
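The repetition cost compounds multiplicatively. A back-of-the-envelope sketch (the candidate count of 500 is an arbitrary example):

```python
# One objective function evaluation with r-times iterated k-fold
# cross-validation requires training and evaluating r * k models;
# total tuning cost scales with the number of candidates tried.
def models_trained(num_candidates, k=10, repeats=2):
    return num_candidates * repeats * k

total = models_trained(500)   # 500 (C, gamma) candidates -> 10000 model fits
```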


SLIDE 33

Randomness

The objective function measures empirical performance based on a finite sample (data set) → induces discrete, non-smooth jumps.

This gives rise to a stochastic component, inherent to:
  • the learning method (e.g. resampling methods [11, 12, 13])
  • random sampling (e.g. cross-validation, bootstrap [9, 10])

The objective function F is not a strict mathematical function
→ evaluating F(x) multiple times yields multiple results.

The empirical optimum might not really be best!
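The stochastic component can be illustrated directly (hypothetical numbers, not an experiment from the talk): re-evaluating the same hyperparameter value with different random cross-validation splits yields different empirical scores.

```python
import random

# Hypothetical noisy objective: an underlying performance curve plus
# split-dependent noise, so F(x) evaluated twice gives two results.
def F_noisy(lam, split_seed):
    rng = random.Random(split_seed)
    true_score = 0.9 - 0.1 * (lam - 1.0) ** 2   # underlying performance
    return true_score + rng.gauss(0.0, 0.02)    # split-dependent noise

scores = [F_noisy(1.0, seed) for seed in range(5)]
spread = max(scores) - min(scores)              # nonzero: F is stochastic
```

This is why the empirically best candidate need not be the truly best one: its score may simply have drawn favourable noise.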


SLIDE 35

Exotic search spaces

Hyperparameter search spaces can be extremely complex:
  • mixed integer-continuous (e.g. regularization & kernel)
  • often domain constrained (e.g. positive regularization)
  • combinatorial (e.g. feature selection)
  • conditional dimensions (*)

(*) Consider the architecture of an artificial neural network:
  • number of hidden layers
  • size per hidden layer
  • (transfer functions per layer)
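The conditional-dimension case can be sketched as a sampler (illustrative, not any specific library's API): the per-layer size dimensions only exist once the number of hidden layers has been drawn.

```python
import random

# Sketch of sampling from a conditional search space: "units_l{i}"
# dimensions are created only for the sampled number of layers.
def sample_architecture(rng):
    config = {"n_layers": rng.randint(1, 3)}          # integer dimension
    for i in range(config["n_layers"]):               # conditional dimensions
        config[f"units_l{i}"] = rng.choice([16, 32, 64, 128])
    return config

arch = sample_architecture(random.Random(0))
```

Such spaces have no fixed dimensionality, which rules out naive grid search entirely.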

SLIDE 36

Desiderata for hyperparameter optimizers

Optimization routines for hyperparameter search are ideally:
  • efficient in terms of function evaluations,
  • appropriate for wildly varying objective functions,
  • able to account for randomness,
  • flexible in terms of search space,
  • parallelizable.

The practical performance bottleneck is evaluating F
→ deciding on the next point to evaluate need not be fast.



SLIDE 39

Sequential model-based optimization (SMBO)

Commonly used for time-consuming objective functions F [14, 15]. SMBO is an iterative approach, in which each iteration involves:
  1 modeling the response surface M, based on previous evaluations
    → evaluating M is cheap, so use M as a surrogate for F
  2 finding the optimal test point x∗ based on M
    → optimize some criterion, e.g. expected improvement [16]

Approaches differ in terms of model and criterion [14, 15, 17].

But: inherently sequential!
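The two-step loop can be sketched end to end. The surrogate and criterion below are deliberately crude stand-ins (1-nearest-neighbour prediction with a distance-based exploration bonus, and a grid scan) for the Gaussian-process models and expected-improvement criterion used in practice; the target function and all constants are illustrative.

```python
import random

def expensive_f(x):
    return (x - 0.3) ** 2            # stand-in for F; pretend each call is costly

def smbo(n_init=3, n_iters=15, seed=0):
    rng = random.Random(seed)
    X = [rng.uniform(0.0, 1.0) for _ in range(n_init)]
    y = [expensive_f(x) for x in X]
    for _ in range(n_iters):
        def criterion(x):
            # surrogate M: value of the nearest evaluated point,
            # minus a bonus for being far from all evaluations (exploration)
            d = [abs(x - xi) for xi in X]
            return y[d.index(min(d))] - 0.05 * min(d)
        # optimizing the cheap criterion: dense grid scan over [0, 1]
        cand = min((i / 1000.0 for i in range(1001)), key=criterion)
        X.append(cand)
        y.append(expensive_f(cand))  # the only expensive step per iteration
    i_best = min(range(len(y)), key=y.__getitem__)
    return X[i_best], y[i_best]

x_best, f_best = smbo()
```

Note that each iteration needs the previous evaluation before proposing the next point, which is exactly the sequential bottleneck the slide flags.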

SLIDE 40

Metaheuristic optimization techniques

A large variety of metaheuristic methods have been used, such as:
  • particle swarm optimization [18, 19, 20]
  • genetic algorithms [21, 22]
  • artificial bee colony [23]
  • harmony search [24]
  • simulated annealing [25]
  • Nelder-Mead simplex [26]

Advantages:
  • ease of implementation and parallelization
  • general purpose solvers → few implicit assumptions


SLIDE 42

Software

Several packages offer Bayesian SMBO approaches:
  • Hyperopt [27], Spearmint [17]
  • ParamILS [28], Auto-WEKA [29]
  • BayesOpt [30], DiceKriging [31]

Optunity offers fundamentally distinct methods [5]:
  • focus on metaheuristic techniques not offered elsewhere
  • PSO, CMA-ES, random search, Sobol sequences, ...
  • multiplatform: Python, R, MATLAB, Octave

General purpose optimization libraries are also applicable
→ but often difficult to integrate in a machine learning pipeline

SLIDE 43

Metaheuristic methods are competitive with SMBO

Optunity's standard PSO [5] versus Hyperopt's tree-structured Parzen estimator [15, 27] on the two-dimensional Rastrigin function.

(figure: error versus function evaluation number, over 500 evaluations, comparing random search, the tree of Parzen estimators, and particle swarm optimization)
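For reference, the Rastrigin benchmark used in this comparison is a standard, highly multimodal test function; a direct transcription of its two-dimensional form (A = 10 is the conventional constant):

```python
import math

# Two-dimensional Rastrigin function: many regularly spaced local minima,
# with the global minimum f(0, 0) = 0.
def rastrigin(x, y, A=10.0):
    return (A * 2
            + (x * x - A * math.cos(2 * math.pi * x))
            + (y * y - A * math.cos(2 * math.pi * y)))

val = rastrigin(0.0, 0.0)  # global optimum
```

Its dense grid of local minima is what makes it a useful stress test for optimizers that must not get stuck near their starting point.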


SLIDE 46

Conclusion

Hyperparameter search in machine learning:
  • requires disciplined optimization methods
  • is receiving a lot of research attention, e.g. ChaLearn AutoML

The main challenges are:
  • expensive function evaluations with a stochastic component
  • exotic search spaces

Hyperparameter search is an interesting optimization problem
→ metaheuristic optimization methods are good candidates

SLIDE 47

Acknowledgements

Research Council KU Leuven: GOA/10/09 MaNet

Flemish Government:
  • FWO: project G.0871.12N (Neural circuits)
  • IWT: TBM Logic Insulin (100793), TBM Rectal Cancer (100783), TBM IETA (130256); PhD grant (111065)
  • Industrial Research Fund (IOF): IOF/HB/13/027 Logic Insulin
  • iMinds Medical Information Technologies SBO 2014
  • VLK Stichting E. van der Schueren: rectal cancer

Federal Government:
  • FOD: Cancer Plan 2012-2015 KPC-29-023 (prostate)
  • COST: Action BM1104: Mass Spectrometry Imaging

SLIDE 48

References

[1] Simon Haykin. Neural Networks: A Comprehensive Foundation. 2004.
[2] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[3] Lawrence Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.
[4] Alexandros Karatzoglou, Xavier Amatriain, Linas Baltrunas, and Nuria Oliver. Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering. In Proceedings of the Fourth ACM Conference on Recommender Systems, pages 79–86. ACM, 2010.
[5] Marc Claesen, Jaak Simm, Dusan Popovic, Yves Moreau, and Bart De Moor. Easy hyperparameter search using Optunity. arXiv preprint arXiv:1412.1114, 2014.
[6] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[7] Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, Paul Tucker, Ke Yang, Quoc V. Le, et al. Large scale distributed deep networks. In Advances in Neural Information Processing Systems, pages 1223–1231, 2012.
[8] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112, 2014.
[9] Bradley Efron and Gail Gong. A leisurely look at the bootstrap, the jackknife, and cross-validation. The American Statistician, 37(1):36–48, 1983.
[10] Ron Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In International Joint Conference on Artificial Intelligence, volume 14, pages 1137–1145, 1995.
[11] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[12] Marc Claesen, Frank De Smet, Johan A.K. Suykens, and Bart De Moor. EnsembleSVM: A library for ensemble learning using support vector machines. Journal of Machine Learning Research, 15:141–145, 2014.
[13] Marc Claesen, Frank De Smet, Johan A.K. Suykens, and Bart De Moor. A robust ensemble approach to learn from positive and unlabeled data using SVM base models. Neurocomputing, 160:73–84, 2015.
[14] Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In Learning and Intelligent Optimization, pages 507–523. Springer, 2011.
[15] James S. Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems, pages 2546–2554, 2011.
[16] Donald R. Jones, Matthias Schonlau, and William J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455–492, 1998.
[17] Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pages 2951–2959, 2012.
[18] Michael Meissner, Michael Schmuker, and Gisbert Schneider. Optimized particle swarm optimization (OPSO) and its application to artificial neural network training. BMC Bioinformatics, 7(1):125, 2006.
[19] X.C. Guo, J.H. Yang, C.G. Wu, C.Y. Wang, and Y.C. Liang. A novel LS-SVMs hyper-parameter selection based on particle swarm optimization. Neurocomputing, 71(16):3211–3215, 2008.
[20] Shih-Wei Lin, Kuo-Ching Ying, Shih-Chieh Chen, and Zne-Jung Lee. Particle swarm optimization for parameter determination and feature selection of support vector machines. Expert Systems with Applications, 35(4):1817–1824, 2008.
[21] Jinn-Tsong Tsai, Jyh-Horng Chou, and Tung-Kuan Liu. Tuning the structure and parameters of a neural network by using hybrid Taguchi-genetic algorithm. IEEE Transactions on Neural Networks, 17(1):69–80, 2006.
[22] Carlos Ansótegui, Meinolf Sellmann, and Kevin Tierney. A gender-based genetic algorithm for the automatic configuration of algorithms. In Principles and Practice of Constraint Programming - CP 2009, pages 142–157. Springer, 2009.
[23] Dervis Karaboga, Bahriye Akay, and Celal Ozturk. Artificial bee colony (ABC) optimization algorithm for training feed-forward neural networks. In Modeling Decisions for Artificial Intelligence, pages 318–329. Springer, 2007.
[24] João P. Papa, Gustavo H. Rosa, Aparecido N. Marana, Walter Scheirer, and David D. Cox. Model selection for discriminative restricted Boltzmann machines through meta-heuristic techniques. Journal of Computational Science, 9:14–18, 2015.
[25] Samuel Xavier-de Souza, Johan A.K. Suykens, Joos Vandewalle, and Désiré Bollé. Coupled simulated annealing. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 40(2):320–335, 2010.
[26] Gavin C. Cawley and Nicola L.C. Talbot. Fast exact leave-one-out cross-validation of sparse least-squares support vector machines. Neural Networks, 17(10):1467–1475, 2004.
[27] James Bergstra, Dan Yamins, and David D. Cox. Hyperopt: A Python library for optimizing the hyperparameters of machine learning algorithms. In Proceedings of the 12th Python in Science Conference, pages 13–20. SciPy, 2013.
[28] Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown, and Thomas Stützle. ParamILS: An automatic algorithm configuration framework. Journal of Artificial Intelligence Research, 36(1):267–306, 2009.
[29] Chris Thornton, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. Auto-WEKA: Automated selection and hyper-parameter optimization of classification algorithms. CoRR, abs/1208.3719, 2012.
[30] Ruben Martinez-Cantin. BayesOpt: A Bayesian optimization library for nonlinear optimization, experimental design and bandits. arXiv preprint arXiv:1405.7430, 2014.
[31] Olivier Roustant, David Ginsbourger, Yves Deville, et al. DiceKriging, DiceOptim: Two R packages for the analysis of computer experiments by kriging-based metamodeling and optimization. 2012.