SLIDE 1

Bayesian Optimization of Composite Functions

Raúl Astudillo, Cornell University

Joint work with Peter I. Frazier

ICML 2019

Raúl Astudillo ra598@cornell.edu Bayesian Optimization of Composite Functions

SLIDE 2

[Figure: log10(regret) versus number of function evaluations (20 to 100) for Random, EI, PES, and our method.]

SLIDE 3

Problem

We consider problems of the form

$$\max_{x \in X} f(x),$$

where $f(x) = g(h(x))$ and

$h : X \subset \mathbb{R}^d \to \mathbb{R}^m$ is a time-consuming-to-evaluate black box,

$g : \mathbb{R}^m \to \mathbb{R}$ and its gradient are known in closed form and are fast to evaluate.

SLIDE 4

Composite functions arise naturally in practice

Hyperparameter tuning of classification algorithms:

$$g(h(x)) = -\sum_{j=1}^{m} h_j(x),$$

where $h_j(x)$ is the classification error on the j-th class under hyperparameters x.

Calibration of expensive simulators:

$$g(h(x)) = -\sum_{j=1}^{m} (h_j(x) - y_j)^2,$$

where h(x) is the output of the simulator under parameters x and y is a vector of observed data.
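
The two outer functions on this slide are cheap to write down together with their gradients, which is what makes the composite structure exploitable. The sketch below is only an illustration: `toy_simulator` is a hypothetical stand-in for the expensive black box h, and the data vector `y` is made up.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def neg_total_error(h: Vector) -> Tuple[float, List[float]]:
    """g(h) = -sum_j h_j and its gradient with respect to h."""
    return -sum(h), [-1.0 for _ in h]


def make_neg_squared_error(y: Vector) -> Callable[[Vector], Tuple[float, List[float]]]:
    """Build g(h) = -sum_j (h_j - y_j)^2 for a fixed vector of observed data y."""

    def g(h: Vector) -> Tuple[float, List[float]]:
        residuals = [hj - yj for hj, yj in zip(h, y)]
        value = -sum(r * r for r in residuals)
        grad = [-2.0 * r for r in residuals]  # d g / d h_j = -2 (h_j - y_j)
        return value, grad

    return g


def toy_simulator(x: float) -> List[float]:
    """Hypothetical stand-in for the expensive black box h (assumption)."""
    return [x, x * x]


# Calibration example: f(x) = g(h(x)) evaluated at x = 2 with made-up data y.
g_calibration = make_neg_squared_error(y=[1.0, 1.0])
value, grad = g_calibration(toy_simulator(2.0))
```

Only the call to `toy_simulator` would be expensive in a real problem; `g` and its gradient cost a handful of arithmetic operations.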

SLIDE 5

Standard BayesOpt approach

Set a Gaussian process distribution on f.

While the evaluation budget is not exhausted:

    Compute the posterior distribution on f given the evaluations so far, $\{(x_i, f(x_i))\}_{i=1}^{n}$.

    Choose the next point to evaluate as the one that maximizes an acquisition function a: $x_{n+1} \in \arg\max_x a_n(x)$, where the subscript n indicates the dependence on the posterior distribution at time n.
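
The loop above can be sketched in a few lines of NumPy. This is a minimal illustration under several assumptions that are not in the slides: a zero-mean GP with a unit-variance squared-exponential kernel and fixed length scale, a finite grid of candidate points instead of a continuous optimizer, and expected improvement as the acquisition function.

```python
import math

import numpy as np


def rbf(a, b, length=0.3):
    """Squared-exponential kernel between two 1-D arrays of points."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)


def gp_posterior(x_obs, y_obs, x_new, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    k_oo = rbf(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    k_on = rbf(x_obs, x_new)
    mean = k_on.T @ np.linalg.solve(k_oo, y_obs)
    var = 1.0 - np.sum(k_on * np.linalg.solve(k_oo, k_on), axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))


def expected_improvement(mean, std, best):
    """Closed-form EI for a Gaussian posterior (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def bayes_opt(f, candidates, n_init=3, budget=15, seed=0):
    """Run the standard loop: fit posterior, maximize acquisition, evaluate."""
    rng = np.random.default_rng(seed)
    x_obs = rng.choice(candidates, size=n_init, replace=False)
    y_obs = np.array([f(x) for x in x_obs])
    while len(x_obs) < budget:  # evaluation budget not yet exhausted
        mean, std = gp_posterior(x_obs, y_obs, candidates)
        acquisition = expected_improvement(mean, std, y_obs.max())
        x_next = candidates[np.argmax(acquisition)]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, f(x_next))
    return x_obs, y_obs


# Toy objective standing in for an expensive f (assumption).
x_hist, y_hist = bayes_opt(lambda x: -(x - 0.3) ** 2, np.linspace(0.0, 1.0, 101))
```

A practical implementation would also fit the kernel hyperparameters and optimize the acquisition function over a continuous domain.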

SLIDE 6

Background: Expected Improvement (EI)

The most widely used acquisition function in standard BayesOpt is

$$\mathrm{EI}_n(x) = \mathbb{E}_n\left[\{f(x) - f_n^*\}^+\right],$$

where

$f_n^*$ is the best observed value so far,

$\mathbb{E}_n$ is the conditional expectation under the posterior after n evaluations.

SLIDE 7

Background: Expected Improvement (EI)

The most widely used acquisition function in standard BayesOpt is

$$\mathrm{EI}_n(x) = \mathbb{E}_n\left[\{f(x) - f_n^*\}^+\right].$$

When f(x) is Gaussian, EI and its derivative have a closed form, which makes EI easy to optimize.
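
For reference, when the posterior of f(x) is N(mu, sigma^2) with sigma > 0, the standard closed form is EI = (mu - f*) Phi(z) + sigma phi(z) with z = (mu - f*) / sigma, and its partial derivatives with respect to mu and sigma are Phi(z) and phi(z). The sketch below evaluates these quantities using only the standard library; the function names are illustrative.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ei_and_partials(mu: float, sigma: float, best: float):
    """Return EI, dEI/dmu, and dEI/dsigma for f(x) ~ N(mu, sigma^2), sigma > 0."""
    z = (mu - best) / sigma
    ei = (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)
    return ei, normal_cdf(z), normal_pdf(z)


ei, d_mu, d_sigma = ei_and_partials(mu=0.5, sigma=1.0, best=0.0)
```

The gradient of EI with respect to x then follows from the chain rule, using the gradients of the posterior mean and standard deviation.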

SLIDE 8

Our contribution

1. A statistical approach for modeling f that greatly improves over the standard BayesOpt approach.

2. An efficient way to optimize the expected improvement under this new statistical model.

SLIDE 9

Our approach

Model h using a multi-output Gaussian process instead of f directly.

This implies a (non-Gaussian) posterior on f(x) = g(h(x)).

To decide where to sample next: compute and optimize the expected improvement under this new posterior.

SLIDE 10

Expected Improvement for Composite Functions

Our acquisition function is Expected Improvement for Composite Functions (EI-CF):

$$\text{EI-CF}_n(x) = \mathbb{E}_n\left[\{g(h(x)) - f_n^*\}^+\right],$$

where h is a GP, making h(x) Gaussian.
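
Because h(x) is Gaussian under the posterior, EI-CF at a fixed x can be estimated by sampling. The sketch below is one simple way to do this, assuming the posterior mean vector and covariance matrix of h(x) are already available; the outer function `g` and all numeric values are made up for illustration.

```python
import numpy as np


def ei_cf_monte_carlo(g, mean, cov, best, n_samples=10_000, seed=0):
    """Monte Carlo estimate of E[{g(h) - best}^+] for h ~ N(mean, cov)."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(cov)               # cov = chol @ chol.T
    z = rng.standard_normal((n_samples, len(mean)))
    h_samples = mean + z @ chol.T                # each row is one draw of h(x)
    improvement = np.maximum(g(h_samples) - best, 0.0)
    return improvement.mean()


def g(h):
    """Example outer function g(h) = -sum_j (h_j - 1)^2, applied row-wise."""
    return -np.sum((h - 1.0) ** 2, axis=-1)


estimate = ei_cf_monte_carlo(
    g, mean=np.array([1.0, 1.0]), cov=0.01 * np.eye(2), best=-1.0
)
```

In this toy setting the improvement is almost surely positive, so the exact value is 1 - 2(0.01) = 0.98, which gives a quick sanity check on the estimate.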

SLIDE 11

[Figure: panels plotted against x in [−4, 4]: posterior mean and 95% confidence interval of h(x); posterior mean and 95% confidence interval of f(x) with the resulting EI-CF(x); posterior mean and 95% confidence interval of f(x) with the resulting EI(x).]

SLIDE 12

Challenge: maximizing EI-CF is hard

Expected Improvement for Composite Functions (EI-CF):

$$\text{EI-CF}_n(x) = \mathbb{E}_n\left[\{g(h(x)) - f_n^*\}^+\right].$$

Challenge:

When h is a GP and g is nonlinear, f(x) = g(h(x)) is not Gaussian.

EI-CF does not have a closed form, making it hard to optimize.

SLIDE 13

Our approach to maximize EI-CF

Construct an unbiased estimator of $\nabla \text{EI-CF}_n(x)$ using the reparametrization trick and infinitesimal perturbation analysis.

Use this estimator within multi-start stochastic gradient ascent to find an approximate solution of $\arg\max_x \text{EI-CF}_n(x)$.
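
The two steps can be illustrated on a toy problem. The sketch below is not the authors' implementation: `toy_posterior` is a hypothetical posterior for h(x) with independent outputs and hand-written derivatives (a real GP posterior would supply these), and the start points, step sizes, and sample sizes are arbitrary choices. Writing h(x) = mean(x) + std(x) z with z standard normal, the pathwise gradient of each sample is differentiated directly, which gives an unbiased gradient estimate when derivative and expectation can be interchanged.

```python
import numpy as np

Y = np.array([1.0, 1.0])  # hypothetical observed data (assumption)


def g(h):
    """Outer function g(h) = -sum_j (h_j - y_j)^2, applied row-wise."""
    return -np.sum((h - Y) ** 2, axis=-1)


def grad_g(h):
    """Gradient of g with respect to h, row-wise."""
    return -2.0 * (h - Y)


def toy_posterior(x):
    """Hypothetical posterior of h(x): mean, std, and their derivatives in x."""
    mean = np.array([x, x ** 2])
    d_mean = np.array([1.0, 2.0 * x])
    std = np.full(2, 0.1 + 0.05 * x ** 2)
    d_std = np.full(2, 0.1 * x)
    return mean, std, d_mean, d_std


def ei_cf_and_grad(x, best, z):
    """Sample-average EI-CF and its pathwise gradient for normal draws z."""
    mean, std, d_mean, d_std = toy_posterior(x)
    h = mean + std * z                  # reparametrization of h(x)
    gain = g(h) - best
    dh_dx = d_mean + d_std * z          # derivative of each sample path in x
    path_grad = np.sum(grad_g(h) * dh_dx, axis=-1)
    grad = np.where(gain > 0.0, path_grad, 0.0)  # zero where no improvement
    return np.maximum(gain, 0.0).mean(), grad.mean()


def maximize_ei_cf(best, starts=(-1.5, 0.0, 0.8, 1.5), bounds=(-2.0, 2.0),
                   n_steps=200, seed=0):
    """Multi-start stochastic gradient ascent with decaying step sizes."""
    rng = np.random.default_rng(seed)
    z_eval = rng.standard_normal((4096, 2))  # common draws to rank restarts
    best_x, best_val = None, -np.inf
    for x in starts:
        for t in range(1, n_steps + 1):
            _, grad = ei_cf_and_grad(x, best, rng.standard_normal((32, 2)))
            x = float(np.clip(x + 0.1 / np.sqrt(t) * grad, *bounds))
        val, _ = ei_cf_and_grad(x, best, z_eval)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


x_star, value = maximize_ei_cf(best=-1.0)
```

Restarts that begin where improvement is essentially impossible receive zero gradient and stay put, which is one reason several start points are used.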

SLIDE 14

Asymptotic consistency

Theorem. Under suitable regularity conditions, EI-CF is asymptotically consistent, i.e., it finds the true global optimum as the number of evaluations goes to infinity.

SLIDE 15

Numerical experiments

[Figure: four panels of log10(regret) versus number of function evaluations (20 to 100), each comparing Random, PI, EI, PES, Random-CF, PI-CF, and EI-CF.]

SLIDE 16

Conclusion

Exploiting composite objectives can improve BayesOpt performance by 3-6 orders of magnitude.

Come to our poster: Wed 6:30-9pm, Pacific Ballroom #237.

Check out our code: https://github.com/RaulAstudillo06/BOCF
