Hyperparameter optimization strategies


SLIDE 1

meetup.com/IASI-AI/ facebook.com/AI.Iasi/ iasiai.net

Hyperparameter optimization strategies

git clone https://github.com/IASIAI/hyperparameter-optimization-strategies.git

SLIDE 2

Gabriel Marchidan, Software architect
Bogdan Burlacu, AI researcher, PhD

SLIDE 3

“Algorithms are conceived in analytic purity in the high citadels of academic research, heuristics are midwifed by expediency in the dark corners of the practitioner’s lair.” (Fred Glover, 1977)

SLIDE 4

Contents

  • Problem statement
  • Disclaimer
  • Grid search
  • Random search
  • Bayesian optimization
  • Covariance Matrix Adaptation Evolution Strategy (CMA-ES)

SLIDE 5

Problem statement

  • Hyperparameters are parameters whose values are set before the learning process begins (see the example below)
  • By contrast, the values of other parameters are derived via training
  • The problem of (hyper)parameter optimization is not specific to ML
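
A minimal sketch of the distinction, using scikit-learn (the estimator and dataset are illustrative, not from the talk): C and gamma are hyperparameters fixed before training, while the support-vector coefficients are parameters derived by training.

    from sklearn.datasets import load_iris
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Hyperparameters: chosen before training starts
    model = SVC(C=1.0, gamma="scale")

    # Parameters: derived via training
    model.fit(X, y)
    print(model.dual_coef_.shape)  # learned support-vector coefficients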

SLIDE 6

Problem statement

SLIDE 7

Problem statement

SLIDE 8

Disclaimer

Things that will improve a real-life algorithm more than in-depth parameter optimization:

  • Having better data
  • Having more data
  • Changing the algorithm (or the weights with which multiple algorithms’ results are combined)

So don’t start with this!

SLIDE 9

Grid search

  • Scans the parameter space in a grid pattern with a certain step size
  • Hence the name “Grid search”

SLIDE 10

Grid search

  • Grid search probes parameter configurations deterministically, by laying down a grid of all possible configurations inside your parameter space
  • In each continuous dimension of the parameter space, a step size is chosen (it defines the granularity of the grid)

SLIDE 11

Grid search

  • Requires a lot of function evaluations
  • Is highly impractical for algorithms with more than 4 parameters
  • The number of function evaluations grows exponentially with each additional parameter (curse of dimensionality; see the sketch below)
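
A minimal sketch of grid search with scikit-learn's GridSearchCV (the estimator, dataset, and grid values are assumptions for illustration):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Every combination is evaluated: 4 x 4 = 16 configurations, each
    # fitted cv=5 times; one more parameter with 4 values would
    # already mean 64 configurations.
    param_grid = {
        "C": [0.1, 1, 10, 100],
        "gamma": [0.001, 0.01, 0.1, 1],
    }

    search = GridSearchCV(SVC(), param_grid, cv=5)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)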

SLIDE 12

Curse of dimensionality
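
A quick illustration of this growth (the 10 candidate values per parameter are an assumption for the example):

    # With k candidate values per parameter, a full grid over
    # d parameters needs k**d function evaluations.
    k = 10
    for d in range(1, 7):
        print(f"{d} parameters -> {k ** d:,} grid points")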

SLIDE 13

Random search

  • Random points from the parameter space are being chosen
  • In turn, random points from the solution space are being sampled

SLIDE 14

Random search

  • Random Search suggests configurations randomly from your parameter space
  • The best result is saved along with the corresponding parameters
  • The next result is sampled either randomly from the whole parameter space or randomly from a sphere around the current result
  • The process is repeated until a termination criterion is met (usually, number of iterations)

SLIDE 15

Random search

  • Can be applied to functions that are not continuous or differentiable
  • It makes no assumptions about the properties of the function
  • Has multiple variants: fixed step, optimum step, adaptive step, etc. (a sketch of the basic variant follows below)
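
A minimal sketch of the basic variant described above, sampling uniformly from the whole parameter space and keeping the best result (the objective and bounds are illustrative):

    import random

    def random_search(f, bounds, n_iter=1000, seed=0):
        """Minimize f by uniform random sampling over `bounds`."""
        rng = random.Random(seed)
        best_x, best_y = None, float("inf")
        for _ in range(n_iter):  # termination criterion: iteration count
            x = [rng.uniform(lo, hi) for lo, hi in bounds]
            y = f(x)
            if y < best_y:  # save the best result and its parameters
                best_x, best_y = x, y
        return best_x, best_y

    # Example: minimize a simple quadratic over [-5, 5]^2
    print(random_search(lambda x: sum(v * v for v in x), [(-5, 5)] * 2))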

SLIDE 16

Bayesian optimization

  • With each observation we are improving the model of the objective function
  • We are sampling the points that have the highest chance to improve the objective function

SLIDE 17

Bayesian optimization

SLIDE 18

Bayesian optimization

  • To use Bayesian optimization, we need a way to flexibly model distributions over objective functions
  • For this problem, Gaussian Processes are a particularly elegant technique (see the sketch below)
  • Used for problems where each sample is costly in time or resources
  • Historically, Gaussian Processes were developed to help search for gold
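
A minimal sketch with scikit-optimize's gp_minimize, which fits a Gaussian Process to the observations so far and picks the next most promising point (the toy objective and bounds are assumptions; install with pip install scikit-optimize):

    from skopt import gp_minimize

    # Toy objective; each call stands in for an expensive evaluation
    # (e.g. training and validating a model).
    def objective(x):
        return (x[0] - 2) ** 2 + (x[1] + 1) ** 2

    result = gp_minimize(
        objective,
        dimensions=[(-5.0, 5.0), (-5.0, 5.0)],  # bounds per parameter
        n_calls=30,                             # evaluation budget
        random_state=0,
    )
    print(result.x, result.fun)  # best parameters, best objective value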

SLIDE 19

Bayesian optimization

SLIDE 20

CMA-ES

  • CMA-ES stands for Covariance Matrix Adaptation Evolution Strategy
  • It is an evolutionary algorithm for difficult non-linear, non-convex, black-box optimization problems in continuous domains
  • CMA-ES is considered state-of-the-art in evolutionary computation and has been adopted as one of the standard tools for continuous optimization

SLIDE 21

CMA-ES

  • It is an evolutionary algorithm
  • Solutions are represented by real-valued parameter vectors
  • Initial solutions are randomly generated
  • Subsequent solutions are generated from the fittest solutions of the previous generation by recombination and mutation

SLIDE 22

CMA-ES

  • Runs have the same population size at each step (λ, with λ > 4, usually λ > 20)
  • Each value in the solution's parameter vector is modified by sampling from a distribution
  • The distribution is updated by covariance matrix adaptation (CMA), based on the best solutions found in the current step (ES)

SLIDE 23

CMA-ES

  • The mean is updated each time to provide a new centroid for new solutions
  • Two paths of the time evolution of the distribution mean of the strategy are recorded, called search or evolution paths
  • The two paths contain information about the correlation between consecutive iterations

SLIDE 24

CMA-ES

  • Each iteration, the mean is adjusted
  • The two evolution paths are updated
  • A new step size is calculated (see the sketch below)
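
A minimal sketch with the cma package (pip install cma); the initial mean, step size, and objective below are assumptions. The ask/tell loop samples λ candidates from the current distribution, then adapts the mean, covariance matrix, and step size from the best ones, as outlined above:

    import cma

    def objective(x):
        return sum((v - 1) ** 2 for v in x)  # toy objective, minimum at (1, ..., 1)

    # Initial centroid x0 and initial step size sigma0 (both assumed here);
    # the population size lambda defaults from the problem dimension.
    es = cma.CMAEvolutionStrategy([0.0] * 5, 0.5)
    while not es.stop():
        solutions = es.ask()                                   # sample lambda candidates
        es.tell(solutions, [objective(s) for s in solutions])  # rank, adapt distribution
    print(es.result.xbest, es.result.fbest)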

SLIDE 25

Running the simulations

  • Windows – WinPython – https://winpython.github.io/
  • Linux and macOS – Python 3.5+, SciPy, scikit-learn, skopt (scikit-optimize)
  • A Python virtualenv is recommended for Linux and macOS
  • To test your setup, you should be able to run the examples at https://scikit-optimize.github.io/ (a quick check follows below)

git clone https://github.com/IASIAI/hyperparameter-optimization-strategies.git
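
A quick way to check the setup (assumes the packages listed above are installed):

    # These imports should all succeed in a working environment.
    import scipy
    import sklearn
    import skopt

    print(scipy.__version__, sklearn.__version__, skopt.__version__)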

SLIDE 26

Simulation

  • Trying to find the maximum value of the Rastrigin function (defined in the sketch below)
  • Will run, in turn:

○ Grid search
○ Random search
○ Bayesian optimization
○ CMA-ES
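
The Rastrigin function itself (per the Wikipedia reference below) is a standard highly multimodal benchmark; a sketch of its standard minimization form, with A = 10:

    import math

    def rastrigin(x, A=10):
        # f(x) = A*n + sum(x_i**2 - A*cos(2*pi*x_i))
        return A * len(x) + sum(v * v - A * math.cos(2 * math.pi * v) for v in x)

    print(rastrigin([0.0, 0.0]))  # 0.0, the global optimum of the standard form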

SLIDE 27

Bibliography

  • https://www.lri.fr/~hansen/cmaesintro.html
  • https://blog.sigopt.com/posts/evaluating-hyperparameter-optimization-strategies
  • https://cloud.google.com/blog/big-data/2017/08/hyperparameter-tuning-in-cloud-machine-learning-engine-using-bayesian-optimization

  • https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)
  • https://en.wikipedia.org/wiki/CMA-ES
  • https://en.wikipedia.org/wiki/Rastrigin_function

SLIDE 28

Questions?

SLIDE 29

Thank You!