Bayesian Optimization of Gaussian Processes applied to Performance Tuning



slide-1
SLIDE 1

Bayesian Optimization of Gaussian Processes applied to Performance Tuning

Ramki Ramakrishna @ysr1729 #TwitterVMTeam

QCon Sao Paulo, 2019

slide-2
SLIDE 2

A JVM Engineer talks to a Data Scientist


slide-3
SLIDE 3

Many Hundreds of Services

slide-4
SLIDE 4

Several Tens of Thousands of Physical Servers

slide-5
SLIDE 5

Several Millions of CPU Cores

slide-6
SLIDE 6

Several Hundreds of Thousands of Twitter JVMs

slide-7
SLIDE 7

A Few Hundred Tunable JVM Parameters

slide-8
SLIDE 8
slide-9
SLIDE 9
slide-10
SLIDE 10
slide-11
SLIDE 11
slide-12
SLIDE 12

Mining for Gold

  • 1930s South Africa
  • Prospecting for gold and other minerals
  • Daniel Krige, 1951: “Kriging” in geostatistics
  • Jonas Mockus, 1970s
  • Jones et al., 1980s
  • Rasmussen & Williams, 1990s: Gaussian Processes

slide-13
SLIDE 13

Applications

  • design of expensive experiments
  • optimal designs
  • optimization of engineered materials
  • hyperparameter tuning (architectural parameters) of neural networks

slide-14
SLIDE 14

Engineering as Optimization

  • linear or non-linear objective function
  • finite convex or non-convex space, rectangular, linear (affine) or non-linear constraints
  • black box objective function
  • black box constraints
  • noisy objective function
  • noisy constraints
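The bullets above can be read as one generic problem statement. The following is a sketch under assumptions: the symbols for the search space, the constraint functions c_i, their count m, and the Gaussian form of the noise term are illustrative choices, not taken from the slides.

```latex
\[
\begin{aligned}
x^{*} &= \arg\max_{x \in \mathcal{X}} f(x)
  \quad \text{subject to } c_i(x) \le 0,\; i = 1, \dots, m, \\
y &= f(x) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma_{\text{noise}}^{2}).
\end{aligned}
\]
```

In the black-box setting neither f nor the c_i has a known closed form; only noisy evaluations y at chosen points x are available.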

slide-15
SLIDE 15

Black Box Modeling

  • Model the unknown objective function
  • Model the unknown constraints
  • Model is a “surrogate”
  • Evaluations are expensive


slide-16
SLIDE 16

Models and Model Parameters

  • Parametric models
  • Non-parametric models


slide-17
SLIDE 17

Probabilistic Models

  • A measure of our uncertainty
  • A measure of measurement/observation noise


slide-18
SLIDE 18

Gaussian Process

GP(μ, κ)

  • μ(x): mean function
  • κ(x, x′): covariance function

slide-19
SLIDE 19

Gaussian Process

  • Two different views
  • a collection of possibly uncountably many Gaussian random variables with a given mean and joint covariance
  • a Gaussian distribution over functions
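Both views can be made concrete by restricting the process to a finite set of inputs: the function values there are jointly Gaussian, so one draw from that multivariate Gaussian is one sampled "function". The sketch below assumes a zero mean function and a squared-exponential covariance; the names `se_kernel`, `cholesky`, and `sample_gp_prior` are hypothetical helpers written for this illustration.

```python
import math
import random


def se_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(a):
    """Return lower-triangular L with L @ L^T == a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_gp_prior(xs, rng, jitter=1e-9):
    """Draw one sample of a zero-mean GP at the finite set of inputs xs."""
    n = len(xs)
    # Covariance matrix of the function values at xs; jitter aids numerical stability.
    cov = [[se_kernel(xs[i], xs[j]) + (jitter if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    low = cholesky(cov)
    # If z ~ N(0, I), then L z ~ N(0, cov).
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


xs = [float(i) for i in range(8)]
sample = sample_gp_prior(xs, random.Random(0))
```

Each call with a different random state yields a different plausible function; nearby inputs receive similar values because the kernel makes them strongly correlated.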

slide-20
SLIDE 20

Gaussian Process


slide-21
SLIDE 21

Gaussian Process


slide-22
SLIDE 22

Gaussian Process


slide-23
SLIDE 23

Gaussian Process


slide-24
SLIDE 24

Gaussian Process


slide-25
SLIDE 25

Gaussian Process


slide-26
SLIDE 26

Gaussian Process


slide-27
SLIDE 27

Gaussian Process


slide-28
SLIDE 28

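GP_n

\[ \mu_n(x) = \kappa^{T} \left( K + \sigma_{\text{noise}}^{2} I \right)^{-1} Y \]

\[ \kappa_n(x, x') = \kappa(x, x') - \kappa^{T} \left( K + \sigma_{\text{noise}}^{2} I \right)^{-1} \kappa' \]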
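The posterior formulas on this slide can be evaluated directly. The sketch below assumes the usual reading of the symbols, which the slide does not spell out: K is the kernel matrix over the n observed inputs, Y the observed values, and κ and κ′ the vectors of kernel values between the observed inputs and x and x′ respectively. It also assumes a zero prior mean and a squared-exponential kernel; `kernel`, `solve`, and `gp_posterior` are hypothetical helper names.

```python
import math


def kernel(x1, x2, length_scale=1.0):
    """Squared-exponential covariance for scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def gp_posterior(xs, ys, x, x_prime, noise_var=1e-6):
    """Return (mu_n(x), kappa_n(x, x')) for a zero-mean GP prior."""
    n = len(xs)
    # K + sigma_noise^2 I
    gram = [[kernel(xs[i], xs[j]) + (noise_var if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    k_x = [kernel(x, xi) for xi in xs]         # kappa
    k_xp = [kernel(x_prime, xi) for xi in xs]  # kappa'
    alpha = solve(gram, ys)    # (K + sigma^2 I)^-1 Y
    beta = solve(gram, k_xp)   # (K + sigma^2 I)^-1 kappa'
    mean = sum(k * a for k, a in zip(k_x, alpha))
    cov = kernel(x, x_prime) - sum(k * b for k, b in zip(k_x, beta))
    return mean, cov


xs = [0.0, 1.0, 2.0]
ys = [0.0, 0.8, 0.9]
mean_at_obs, var_at_obs = gp_posterior(xs, ys, 1.0, 1.0)
mean_far, var_far = gp_posterior(xs, ys, 10.0, 10.0)
```

With near-zero noise the posterior mean passes almost exactly through the observed value at x = 1 and the variance there collapses, while far from the data the posterior reverts to the prior (mean 0, variance 1).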

slide-29
SLIDE 29

Covariance Kernel Function

  • Squared exponentials (SE)
  • “n/2” Matérn kernels

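For reference, the two kernel families named above have simple closed forms as functions of the distance r between two inputs. The sketch below covers the squared exponential and the three most common half-integer Matérn cases (smoothness 1/2, 3/2, 5/2); the function names and the `length_scale` and `variance` parameters are illustrative choices.

```python
import math


def squared_exponential(r, length_scale=1.0, variance=1.0):
    """SE kernel: infinitely differentiable sample paths."""
    return variance * math.exp(-0.5 * (r / length_scale) ** 2)


def matern(r, nu, length_scale=1.0, variance=1.0):
    """Matern kernel for the half-integer cases nu in {0.5, 1.5, 2.5}."""
    s = abs(r) / length_scale
    if nu == 0.5:
        return variance * math.exp(-s)
    if nu == 1.5:
        t = math.sqrt(3.0) * s
        return variance * (1.0 + t) * math.exp(-t)
    if nu == 2.5:
        t = math.sqrt(5.0) * s
        return variance * (1.0 + t + t * t / 3.0) * math.exp(-t)
    raise ValueError("only nu in {0.5, 1.5, 2.5} is implemented in this sketch")


# Correlation at unit distance for each kernel, roughest to smoothest.
at_unit_distance = [matern(1.0, 0.5), matern(1.0, 1.5), matern(1.0, 2.5),
                    squared_exponential(1.0)]
```

Smaller smoothness values give rougher sample functions; as the smoothness grows the Matérn kernel approaches the squared exponential.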

slide-30
SLIDE 30

Covariance Kernel Functions


slide-31
SLIDE 31

Covariance Kernel Functions


slide-32
SLIDE 32

Acquisition Function

\[ GP_{\text{prior}} + \text{Data}_n \xrightarrow{\text{Bayes}} GP_n \xrightarrow{?} x_{n+1} \]

slide-33
SLIDE 33

Acquisition Function

\[ GP_{\text{prior}} + \text{Data}_{n+1} \xrightarrow{\text{Bayes}} GP_{n+1} \xrightarrow{?} x_{n+2} \]

slide-34
SLIDE 34

Acquisition Functions

  • Thompson Sampling from the posterior GP (TS)
  • Probability of Improvement (PI)
  • Upper Confidence Bound (UCB)
  • Expected Improvement (EI)
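Three of the listed acquisition functions have closed forms in terms of the posterior mean and standard deviation at a candidate point. The sketch below assumes a maximization problem, a strictly positive standard deviation, and the common textbook forms; the exploration parameters `xi` and `beta` and all function names are illustrative. Thompson Sampling is left out because it needs a joint sample from the posterior rather than pointwise statistics.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mean, std, best, xi=0.0):
    """P(f(x) > best + xi) under the posterior N(mean, std^2)."""
    return normal_cdf((mean - best - xi) / std)


def upper_confidence_bound(mean, std, beta=2.0):
    """Optimistic estimate: posterior mean plus beta standard deviations."""
    return mean + beta * std


def expected_improvement(mean, std, best, xi=0.0):
    """E[max(f(x) - best - xi, 0)] under the posterior N(mean, std^2)."""
    gain = mean - best - xi
    z = gain / std
    return gain * normal_cdf(z) + std * normal_pdf(z)


# Two hypothetical candidates with the same mean but different uncertainty.
best_so_far = 1.0
certain = expected_improvement(mean=0.9, std=0.05, best=best_so_far)
uncertain = expected_improvement(mean=0.9, std=0.50, best=best_so_far)
```

The comparison at the end shows the exploration effect: with equal posterior means, the more uncertain candidate has the larger expected improvement.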

slide-35
SLIDE 35

Thompson Sampling


slide-36
SLIDE 36

Probability of Improvement


slide-37
SLIDE 37

Upper Confidence Bound


slide-38
SLIDE 38

Expected Improvement


slide-39
SLIDE 39

Acquisition Function

  • Thompson Sampling from the posterior GP (TS)
  • Probability of Improvement (PI)
  • Upper Confidence Bound (UCB)
  • Expected Improvement (EI)

slide-40
SLIDE 40

Maximizing the Acquisition Function

  • piecewise infinitely smooth
  • gradient-based techniques work
  • modified Monte-Carlo techniques are typically used
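One simple way to approximate the maximizer of an acquisition function over a box is to sample candidates at random and then refine the best one locally. The slides do not say which Monte-Carlo variant was used, so the sketch below is only an assumed stand-in: uniform sampling followed by accept-if-better Gaussian perturbations. The function name, its parameters, and the toy acquisition are hypothetical.

```python
import random


def maximize_acquisition(acquisition, bounds, rng,
                         n_candidates=2000, n_refine=200, step=0.05):
    """Approximate the argmax of `acquisition` over the box `bounds`."""

    def clip(point):
        # Keep every coordinate inside its [lo, hi] interval.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(point, bounds)]

    # Stage 1: uniform Monte-Carlo candidates over the whole box.
    best = max(
        ([rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_candidates)),
        key=acquisition,
    )
    best_value = acquisition(best)

    # Stage 2: local refinement; a perturbed point is kept only if it improves.
    for _ in range(n_refine):
        trial = clip([v + rng.gauss(0.0, step * (hi - lo))
                      for v, (lo, hi) in zip(best, bounds)])
        value = acquisition(trial)
        if value > best_value:
            best, best_value = trial, value
    return best, best_value


def toy_acquisition(p):
    """Hypothetical smooth surface with its maximum at (0.3, -1.0)."""
    return -((p[0] - 0.3) ** 2 + (p[1] + 1.0) ** 2)


point, value = maximize_acquisition(toy_acquisition, [(-2.0, 2.0), (-2.0, 2.0)],
                                    random.Random(0))
```

Because evaluating the acquisition function only queries the surrogate model, thousands of such evaluations are cheap compared with a single real experiment.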

slide-41
SLIDE 41

Optimizing Performance Parameters

myExpt = Optimizer.declareDevice(Parm1: {Int, Min1, Max1},
                                 Parm2: {Real, Min2, Max2},
                                 Parm3: {Enum, enum1, enum2, enum3} …)
myExpt.setSLA(…)                            // set performance SLA
myExpt.setTerminationCriteria(…)            // set termination criteria
while (!myExpt.shouldTerminate()) {
  parmSuggestion = myExpt.suggest()         // get another test suggestion
  newRun = myDevice.test(parmSuggestion)    // test device at given setting
  if (myExpt.isValid(newRun)) {             // is SLA met?
    myExpt.update(parmSuggestion, newRun)   // update w/new result
  }
}
return myExpt.bestConfig()
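To make the loop above concrete, here is a minimal self-contained sketch of the same suggest, test, validate, update cycle for one real-valued parameter. It is not the Optimizer API from the slide: the GP surrogate with Expected Improvement, the grid of candidates, the fake device `run_device`, and the latency SLA are all assumptions made for illustration.

```python
import math


def kernel(a, b, length_scale=0.2):
    """Squared-exponential covariance for scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def posterior(xs, ys, x, noise_var=1e-4):
    """Posterior mean and standard deviation at x for a zero-mean GP."""
    n = len(xs)
    gram = [[kernel(xs[i], xs[j]) + (noise_var if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    k_x = [kernel(x, xi) for xi in xs]
    mean = sum(k * a for k, a in zip(k_x, solve(gram, ys)))
    var = kernel(x, x) - sum(k * b for k, b in zip(k_x, solve(gram, k_x)))
    return mean, math.sqrt(max(var, 1e-12))  # clamp guards against round-off


def expected_improvement(mean, std, best):
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def run_device(x):
    """Hypothetical device test: returns (score to maximize, latency in ms)."""
    return -(x - 0.7) ** 2, 100.0 * x


def tune(budget=12, sla_latency_ms=90.0):
    candidates = [i / 100.0 for i in range(101)]
    seeds = [0.0, 0.5]             # fixed initial settings before the model is used
    xs, ys, tried = [], [], set()
    for step in range(budget):     # termination criterion: fixed test budget
        if step < len(seeds):
            x = seeds[step]
        else:                      # suggest(): maximize EI over untried settings
            best = max(ys)
            untried = [c for c in candidates if c not in tried]
            x = max(untried,
                    key=lambda c: expected_improvement(*posterior(xs, ys, c), best))
        tried.add(x)               # never re-test a setting, valid or not
        score, latency = run_device(x)    # test()
        if latency <= sla_latency_ms:     # isValid(): is the SLA met?
            xs.append(x)                  # update(): only valid runs feed the model
            ys.append(score)
    best_index = max(range(len(ys)), key=ys.__getitem__)
    return xs[best_index], ys[best_index]  # bestConfig()


best_x, best_y = tune()
```

One design point worth noting: because runs that violate the SLA are not fed back into the model, the sketch tracks tested settings separately so that it does not keep suggesting the same invalid point.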

slide-42
SLIDE 42

Bayesian Optimization


slide-43
SLIDE 43

Constraints


slide-44
SLIDE 44

AUTOTUNE AS A SERVICE

slide-45
SLIDE 45
slide-46
SLIDE 46
slide-47
SLIDE 47
slide-48
SLIDE 48
slide-49
SLIDE 49
slide-50
SLIDE 50
slide-51
SLIDE 51
slide-52
SLIDE 52
slide-53
SLIDE 53
slide-54
SLIDE 54
slide-55
SLIDE 55
slide-56
SLIDE 56
slide-57
SLIDE 57
slide-58
SLIDE 58

GizmoDuck & Garbage Collection Overhead via Tuning JVM Parameters

slide-59
SLIDE 59

TweetyPie & CPU Utilization via Tuning Graal JIT Parameters

slide-60
SLIDE 60