SLIDE 1

Bayesian Optimization of Composite Functions

Raúl Astudillo, Cornell University

Joint work with Peter I. Frazier

ICML 2019

Raúl Astudillo ra598@cornell.edu Bayesian Optimization of Composite Functions

SLIDE 2

[Figure: log10(regret) versus function evaluations (20 to 100) for Random, EI, PES, and our method.]

SLIDE 3

Problem

We consider problems of the form

$$\max_{x \in \mathcal{X}} f(x),$$

where $f(x) = g(h(x))$ and

  • $h : \mathcal{X} \subset \mathbb{R}^d \to \mathbb{R}^m$ is a time-consuming-to-evaluate black box,

  • $g : \mathbb{R}^m \to \mathbb{R}$ and its gradient are known in closed form and are fast to evaluate.

SLIDE 4

Composite functions arise naturally in practice

  • Hyperparameter tuning of classification algorithms:

$$g(h(x)) = -\sum_{j=1}^{m} h_j(x),$$

where $h_j(x)$ is the classification error on the j-th class under hyperparameters x.

  • Calibration of expensive simulators:

$$g(h(x)) = -\sum_{j=1}^{m} (h_j(x) - y_j)^2,$$

where h(x) is the output of the simulator under parameters x and y is a vector of observed data.
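The two outer functions above are cheap to write down together with their gradients. The following is a minimal sketch, not the authors' code: `toy_simulator` is a hypothetical stand-in for the expensive black box h, and all function names are illustrative.

```python
def g_classification(h):
    """g(h) = -sum_j h_j: negated sum of per-class errors."""
    return -sum(h)


def grad_g_classification(h):
    """Gradient of g_classification with respect to h."""
    return [-1.0 for _ in h]


def g_calibration(h, y):
    """g(h) = -sum_j (h_j - y_j)^2: negated squared misfit to observed data y."""
    return -sum((hj - yj) ** 2 for hj, yj in zip(h, y))


def grad_g_calibration(h, y):
    """Gradient of g_calibration with respect to h."""
    return [-2.0 * (hj - yj) for hj, yj in zip(h, y)]


def toy_simulator(x):
    """Hypothetical stand-in for the expensive black box h: R^2 -> R^2."""
    return [x[0] + x[1], x[0] * x[1]]


def composite_objective(x, y):
    """f(x) = g(h(x)) for the calibration example."""
    return g_calibration(toy_simulator(x), y)
```

For instance, `composite_objective([1.0, 2.0], [3.0, 2.0])` evaluates the simulator once and returns zero misfit, since the toy simulator reproduces the data exactly at that point.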

Raúl Astudillo ra598@cornell.edu Bayesian Optimization of Composite Functions

slide-5
SLIDE 5

Standard BayesOpt approach

  • Set a Gaussian process distribution on f.
  • While the evaluation budget is not exhausted:

      • Compute the posterior distribution on f given the evaluations so far, $\{(x_i, f(x_i))\}_{i=1}^{n}$.

      • Choose the next point to evaluate as the one that maximizes an acquisition function a: $x_{n+1} \in \arg\max_x a_n(x)$, where the subscript n indicates the dependence on the posterior distribution at time n.
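The loop above can be sketched in a few dozen lines for a one-dimensional problem. This is an illustrative sketch under several assumptions that are not on the slide: a zero-mean GP with a squared-exponential kernel of fixed length scale, expected improvement as the acquisition function, and maximization over a finite candidate grid rather than a continuous optimizer. All function names are hypothetical.

```python
import math


def rbf(a, b, length_scale=0.5):
    """Squared-exponential kernel on scalars (an assumed kernel choice)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A, for a small positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def gp_posterior(xs, ys, x, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    v = solve_lower(L, [rbf(a, x) for a in xs])
    w = solve_lower(L, ys)
    mean = sum(vi * wi for vi, wi in zip(v, w))
    var = max(rbf(x, x) - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean, std, best):
    """Expected improvement of a Gaussian N(mean, std^2) over best."""
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf


def bayes_opt(f, grid, initial, budget):
    """Evaluate f at most `budget` times, choosing points by maximizing EI."""
    xs = list(initial)
    ys = [f(x) for x in xs]
    while len(xs) < budget:
        best = max(ys)  # best observed value so far
        candidates = [x for x in grid if x not in xs]
        x_next = max(
            candidates,
            key=lambda x: expected_improvement(*gp_posterior(xs, ys, x), best),
        )
        xs.append(x_next)
        ys.append(f(x_next))
    return xs, ys
```

A call such as `bayes_opt(lambda x: -(x - 0.3) ** 2, [i / 20 for i in range(21)], [0.0, 1.0], 8)` returns the evaluated points and their values. Refitting the posterior from scratch for every candidate is wasteful but keeps the sketch short.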

SLIDE 6

Background: Expected Improvement (EI)

The most widely used acquisition function in standard BayesOpt is

$$\mathrm{EI}_n(x) = \mathbb{E}_n\left[\{f(x) - f_n^*\}^+\right],$$

where

  • $f_n^*$ is the best observed value so far,

  • $\mathbb{E}_n$ is the conditional expectation under the posterior after n evaluations.

SLIDE 7

Background: Expected Improvement (EI)

The most widely used acquisition function in standard BayesOpt is

$$\mathrm{EI}_n(x) = \mathbb{E}_n\left[\{f(x) - f_n^*\}^+\right].$$

When f(x) is Gaussian, EI and its derivative have closed forms, which makes EI easy to optimize.
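For reference, the closed form can be written out as follows. The symbols $\mu_n(x)$ and $\sigma_n(x)$ for the posterior mean and standard deviation of f(x), and $\Phi$, $\varphi$ for the standard normal CDF and PDF, are notation introduced here and do not appear on the slide.

```latex
\Delta_n(x) = \mu_n(x) - f_n^*, \qquad z_n(x) = \frac{\Delta_n(x)}{\sigma_n(x)},

\mathrm{EI}_n(x) = \Delta_n(x)\,\Phi\big(z_n(x)\big) + \sigma_n(x)\,\varphi\big(z_n(x)\big),

\nabla \mathrm{EI}_n(x) = \Phi\big(z_n(x)\big)\,\nabla \mu_n(x) + \varphi\big(z_n(x)\big)\,\nabla \sigma_n(x).
```

The gradient follows from the chain rule, since the partial derivative of EI with respect to the mean is $\Phi(z)$ and with respect to the standard deviation is $\varphi(z)$.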

SLIDE 8

Our contribution

  1. A statistical approach for modeling f that greatly improves over the standard BayesOpt approach.

  2. An efficient way to optimize the expected improvement under this new statistical model.

SLIDE 9

Our approach

  • Model h using a multi-output Gaussian process instead of f directly.

  • This implies a (non-Gaussian) posterior on f(x) = g(h(x)).

  • To decide where to sample next: compute and optimize the expected improvement under this new posterior.

SLIDE 10

Expected Improvement for Composite Functions

Our acquisition function is Expected Improvement for Composite Functions (EI-CF):

$$\mathrm{EI\text{-}CF}_n(x) = \mathbb{E}_n\left[\{g(h(x)) - f_n^*\}^+\right],$$

where h is a GP, making h(x) Gaussian.

SLIDE 11

[Figure: five panels plotted against x: h(x) with posterior mean and 95% confidence interval; f(x) with posterior mean and 95% confidence interval, followed by EI-CF(x); f(x) with posterior mean and 95% confidence interval, followed by EI(x).]

SLIDE 12

Challenge: maximizing EI-CF is hard

Expected Improvement for Composite Functions (EI-CF):

$$\mathrm{EI\text{-}CF}_n(x) = \mathbb{E}_n\left[\{g(h(x)) - f_n^*\}^+\right].$$

Challenge:

  • When h is a GP and g is nonlinear, f(x) = g(h(x)) is not Gaussian.

  • EI-CF does not have a closed form, making it hard to optimize.
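Although EI-CF has no closed form, its value at a fixed x can be approximated by sampling. The sketch below assumes the posterior mean vector and covariance matrix of h(x) are already available (they are plain inputs here, not computed from a fitted GP), writes h(x) = mean + C Z with C the Cholesky factor of the covariance and Z standard normal, and averages the improvement over draws of Z. The function names are illustrative.

```python
import math
import random


def cholesky(K):
    """Lower-triangular C with C C^T = K, for a small positive-definite K."""
    n = len(K)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            C[i][j] = math.sqrt(s) if i == j else s / C[j][j]
    return C


def ei_cf_monte_carlo(g, mean, cov, f_best, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[{g(h) - f_best}^+] for h ~ N(mean, cov)."""
    rng = random.Random(seed)
    C = cholesky(cov)
    m = len(mean)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        # Reparametrization: h = mean + C z has the required distribution.
        h = [mean[i] + sum(C[i][k] * z[k] for k in range(i + 1)) for i in range(m)]
        total += max(g(h) - f_best, 0.0)
    return total / n_samples
```

A useful sanity check is the linear case: when g(h) is the sum of the components of h, f(x) is Gaussian again and the estimate should agree with the closed-form expected improvement up to Monte Carlo error.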

SLIDE 13

Our approach to maximize EI-CF

  • Construct an unbiased estimator of $\nabla \mathrm{EI\text{-}CF}_n(x)$ using the reparametrization trick and infinitesimal perturbation analysis.

  • Use this estimator within multi-start stochastic gradient ascent to find an approximate solution of $\arg\max_x \mathrm{EI\text{-}CF}_n(x)$.
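The two steps above can be illustrated on a toy problem. This is a sketch under strong simplifying assumptions, not the authors' implementation: x is a scalar in [0, 1], the two outputs of h are treated as independent, and `posterior` is a hypothetical closed-form stand-in for a fitted GP that also returns the derivatives of the mean and standard deviation. For a fixed draw Z, the sample h = mean + std · Z is differentiable in x, so the per-sample gradient is ∇g(h) · dh/dx when the improvement is positive and zero otherwise.

```python
import random

F_BEST = -0.25  # hypothetical best observed value f_n^*


def posterior(x):
    """Hypothetical stand-in for a GP posterior at scalar x.

    Returns per-output mean, std, d(mean)/dx and d(std)/dx.
    """
    mean = [2.0 * (x - 0.5), x - 0.5]
    std = [0.1 + 0.2 * x, 0.2]
    dmean = [2.0, 1.0]
    dstd = [0.2, 0.0]
    return mean, std, dmean, dstd


def g(h):
    """Calibration-style outer function with target y = 0."""
    return -sum(v * v for v in h)


def grad_g(h):
    return [-2.0 * v for v in h]


def sample_value_and_grad(x, z):
    """Improvement and its derivative in x for one fixed draw z."""
    mean, std, dmean, dstd = posterior(x)
    h = [m + s * zj for m, s, zj in zip(mean, std, z)]
    improvement = g(h) - F_BEST
    if improvement <= 0.0:
        return 0.0, 0.0
    dh_dx = [dm + ds * zj for dm, ds, zj in zip(dmean, dstd, z)]
    return improvement, sum(a * b for a, b in zip(grad_g(h), dh_dx))


def estimate(x, zs):
    """Sample averages of EI-CF and of its gradient over the draws zs."""
    pairs = [sample_value_and_grad(x, z) for z in zs]
    n = len(pairs)
    return sum(p[0] for p in pairs) / n, sum(p[1] for p in pairs) / n


def draw(rng, n, m=2):
    return [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]


def multistart_sga(starts, rng, iters=200, batch=32, lo=0.0, hi=1.0):
    """Stochastic gradient ascent from each start; keep the best final point."""
    finals = []
    for x in starts:
        for t in range(1, iters + 1):
            _, grad = estimate(x, draw(rng, batch))
            x = min(hi, max(lo, x + (0.1 / t) * grad))  # projected step
        finals.append(x)
    zs = draw(rng, 2000)  # common random numbers for a fair comparison
    return max(finals, key=lambda x: estimate(x, zs)[0])
```

Calling `multistart_sga([0.1, 0.3, 0.5, 0.7, 0.9], random.Random(1))` returns a point near 0.5, where the toy posterior mean is closest to the target. Starts in regions where improvement is very unlikely receive zero gradient estimates most of the time, which is one reason several starting points are used.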

SLIDE 14

Asymptotic consistency

Theorem. Under suitable regularity conditions, EI-CF is asymptotically consistent, i.e., it finds the true global optimum as the number of evaluations goes to infinity.

SLIDE 15

Numerical experiments

[Figure: four panels of log10(regret) versus function evaluations (20 to 100), comparing Random, PI, EI, PES, Random-CF, PI-CF, and EI-CF.]

SLIDE 16

Conclusion

  • Exploiting composite objectives can improve BayesOpt performance by 3-6 orders of magnitude.

  • Come to our poster: Wed 6:30-9pm, Pacific Ballroom #237.

  • Check out our code: https://github.com/RaulAstudillo06/BOCF
