Regret bounds for meta Bayesian optimization with an unknown Gaussian process prior (PowerPoint presentation)



SLIDE 1

Zi Wang* Beomjoon Kim* Leslie Pack Kaelbling

Regret bounds for meta Bayesian optimization with an unknown Gaussian process prior

Dec 5 @ NeurIPS 18

Poster #22

SLIDE 2

Bayesian optimization

Goal: x* = argmax_{x ∈ 𝔜} f(x)

Challenges:

  • f is expensive to evaluate
  • f is multi-peak
  • no gradient information
  • evaluations can be noisy
SLIDE 3

Bayesian optimization

Goal: x* = argmax_{x ∈ 𝔜} f(x)

Challenges:

  • f is expensive to evaluate
  • f is multi-peak
  • no gradient information
  • evaluations can be noisy

Assume a GP prior f ∼ GP(μ, k)

LOOP

  • choose new query point(s) to evaluate
  • compute the posterior GP model
[Figure: GP posterior of f(x) over x after queries 1, 2, and 3]
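The loop on this slide can be sketched in a few lines for a finite candidate set. This is an illustrative reconstruction, not the authors' code: it assumes a zero-mean GP prior with a squared-exponential kernel and a plain UCB acquisition rule, and all function names and parameter values here are made up for the example.

```python
import numpy as np

def sq_exp_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel matrix between 1-D point sets A and B."""
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(X_obs, y_obs, X_query, noise=1e-4):
    """Standard GP posterior mean and variance under a zero-mean prior."""
    K = sq_exp_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    Ks = sq_exp_kernel(X_obs, X_query)
    alpha = np.linalg.solve(K, y_obs)
    mean = Ks.T @ alpha
    v = np.linalg.solve(K, Ks)
    var = 1.0 - np.sum(Ks * v, axis=0)   # k(x, x) = 1 for this kernel
    return mean, np.maximum(var, 0.0)

def bayes_opt(f, X_cand, n_iter=10, beta=4.0):
    """LOOP: evaluate the UCB-maximizing candidate, update the posterior."""
    X_obs = np.array([X_cand[0]])
    y_obs = np.array([f(X_cand[0])])
    for _ in range(n_iter):
        mean, var = gp_posterior(X_obs, y_obs, X_cand)
        x_next = X_cand[np.argmax(mean + np.sqrt(beta * var))]
        X_obs = np.append(X_obs, x_next)
        y_obs = np.append(y_obs, f(x_next))
    best = np.argmax(y_obs)
    return X_obs[best], y_obs[best]
```

For example, `bayes_opt(lambda x: -(x - 0.3)**2, np.linspace(0, 1, 51))` concentrates its queries near x = 0.3 after a handful of evaluations.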

SLIDE 4

Bayesian optimization

Goal: x* = argmax_{x ∈ 𝔜} f(x)

Challenges:

  • f is expensive to evaluate
  • f is multi-peak
  • no gradient information
  • evaluations can be noisy

Assume a GP prior f ∼ GP(μ, k)

LOOP

  • choose new query point(s) to evaluate
  • compute the posterior GP model
[Figure: GP posterior of f(x) over x after queries 1, 2, and 3]

How to choose the prior?

SLIDE 5

Bayesian optimization

Goal: x* = argmax_{x ∈ 𝔜} f(x)

Assume a GP prior f ∼ GP(μ, k)

LOOP

  • choose new query point(s) to evaluate
  • compute the posterior GP model
[Figure: GP posterior of f(x) over x after queries 1, 2, and 3]

  • re-estimate the prior parameters, e.g. by maximizing the marginal data likelihood every few iterations

Challenges:

  • f is expensive to evaluate
  • f is multi-peak
  • no gradient information
  • evaluations can be noisy
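The re-estimation step ("maximizing marginal data likelihood") can be sketched as evidence maximization over kernel hyperparameters. A minimal illustration, assuming a zero-mean GP with a squared-exponential kernel and a simple grid search; the function names and the grid are hypothetical, not from the talk.

```python
import numpy as np

def log_marginal_likelihood(X, y, lengthscale, noise=1e-2):
    """log p(y | X, lengthscale) for a zero-mean GP with an SE kernel."""
    d = X[:, None] - X[None, :]
    K = np.exp(-0.5 * (d / lengthscale) ** 2) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha                 # data-fit term
            - np.sum(np.log(np.diag(L)))     # complexity penalty (log det)
            - 0.5 * len(X) * np.log(2 * np.pi))

def fit_lengthscale(X, y, grid):
    """Re-estimate the prior: pick the lengthscale with highest evidence."""
    return max(grid, key=lambda ls: log_marginal_likelihood(X, y, ls))
```

The grid search stands in for the gradient-based optimization a practical GP library would use; the trade-off it balances (data fit vs. the log-determinant complexity penalty) is the same.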

SLIDE 6

Bayesian optimization

Goal: x* = argmax_{x ∈ 𝔜} f(x)

Assume a GP prior f ∼ GP(μ, k)

Which comes first? Data or prior?

Challenges:

  • f is expensive to evaluate
  • f is multi-peak
  • no gradient information
  • evaluations can be noisy

LOOP

  • choose new query point(s) to evaluate
  • compute the posterior GP model
  • re-estimate the prior parameters, e.g. by maximizing the marginal data likelihood every few iterations

SLIDE 7

Bayesian optimization

Goal: x* = argmax_{x ∈ 𝔜} f(x)

Assume a GP prior f ∼ GP(μ, k)

Hard to analyze. Which comes first? Data or prior?

Challenges:

  • f is expensive to evaluate
  • f is multi-peak
  • no gradient information
  • evaluations can be noisy

LOOP

  • choose new query point(s) to evaluate
  • compute the posterior GP model
  • re-estimate the prior parameters, e.g. by maximizing the marginal data likelihood every few iterations

SLIDE 8

Bayesian optimization with an unknown GP prior

[Figure: prior model and data collected on f, plotted over x]

SLIDE 9

Bayesian optimization with an unknown GP prior

[Figure: prior model and data collected on f, plotted over x]

Our problem setup: use past experience with similar functions as meta-training data to break the circular dependency.

SLIDE 10


Meta Bayesian optimization with an unknown GP prior

Offline phase | Online phase

SLIDE 11

[Figure: estimated prior μ̂(x) with μ̂(x) ± 3 √k̂(x) band over x]

Meta Bayesian optimization with an unknown GP prior

Offline phase: estimated prior μ̂, k̂

Online phase

Estimate the GP prior from offline data sampled from the same prior
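On a finite input set, the offline phase can be sketched as taking the empirical mean and the unbiased empirical covariance across the N meta-training functions, each evaluated on the same inputs. This is a simplified reading of the setup, not the paper's exact estimator:

```python
import numpy as np

def estimate_prior(F):
    """F: (N, m) array; row i holds meta-training function f_i evaluated
    on the same m inputs. Returns the estimated prior mean vector mu_hat
    and covariance matrix k_hat (unbiased empirical estimates)."""
    N = F.shape[0]
    mu_hat = F.mean(axis=0)
    centered = F - mu_hat
    k_hat = centered.T @ centered / (N - 1)
    return mu_hat, k_hat
```

Drawing many functions from a known GP and feeding them to `estimate_prior` recovers that GP's mean and covariance up to sampling noise, which shrinks as N grows (the point of the N = 1000 vs. N = 100 comparison later in the talk).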

SLIDE 12

[Figures: online posterior μ̂₀(x) with μ̂₀(x) ± ζ₁ √k̂₀(x) band; offline estimate μ̂(x) with μ̂(x) ± 3 √k̂(x) band, both over x]

Meta Bayesian optimization with an unknown GP prior

Offline phase: estimated prior μ̂, k̂

Online phase

Estimate the GP prior from offline data sampled from the same prior. Construct unbiased estimators of the posterior and use a variant of GP-UCB.
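One online step under the estimated prior can be sketched as a GP posterior update followed by a UCB-style pick with width ζ. This is an illustrative reconstruction on a finite input set; the paper's actual unbiased estimators and its schedule for ζ differ in detail, and all names here are made up for the example.

```python
import numpy as np

def meta_bo_step(mu_hat, k_hat, obs_idx, obs_y, zeta):
    """One online step: GP posterior under the *estimated* prior
    (mu_hat, k_hat) on a finite input set, then a UCB-style pick
    with confidence width zeta. Returns the next index to query."""
    if not obs_idx:
        post_mean = mu_hat.copy()
        post_var = np.diag(k_hat).copy()
    else:
        K_oo = k_hat[np.ix_(obs_idx, obs_idx)] + 1e-8 * np.eye(len(obs_idx))
        K_xo = k_hat[:, obs_idx]
        sol = np.linalg.solve(K_oo, np.array(obs_y) - mu_hat[obs_idx])
        post_mean = mu_hat + K_xo @ sol
        v = np.linalg.solve(K_oo, K_xo.T)
        post_var = np.diag(k_hat) - np.sum(K_xo.T * v, axis=0)
    ucb = post_mean + zeta * np.sqrt(np.maximum(post_var, 0.0))
    ucb[obs_idx] = -np.inf          # do not re-query observed points
    return int(np.argmax(ucb))
```

With no observations the rule picks the point with the highest estimated prior mean plus ζ times the estimated prior standard deviation; each observation then shrinks the band around the queried point, exactly the behaviour the animation on the next slides shows.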

SLIDE 13

[Figure, online phase: μ̂₁(x) with μ̂₁(x) ± ζ₂ √k̂₁(x) band over x]

SLIDE 14

[Figure, online phase: μ̂₂(x) with μ̂₂(x) ± ζ₃ √k̂₂(x) band over x]

SLIDE 15

[Figure, online phase: μ̂₃(x) with μ̂₃(x) ± ζ₄ √k̂₃(x) band over x]

SLIDE 16

[Figure, online phase: μ̂₄(x) with μ̂₄(x) ± ζ₅ √k̂₄(x) band over x]

SLIDE 17

Effect of N, the number of meta training functions

[Figure: μ̂ₜ(x) with μ̂ₜ(x) ± ζₜ₊₁ √k̂ₜ(x) band over x ∈ [0, 1], y-axis from −10 to 10, for N = 1000 (left) and N = 100 (right)]

SLIDE 18

Bounding the regret of meta BO with an unknown GP prior

Important assumptions:

  • meta-training functions come from the same prior
  • enough meta-training functions: N ≳ T + constant (constant ≈ 10)
  • observation noise σ²

Theorem (finite input space, linear kernel). Given T observations on the test function, with high probability, the simple regret satisfies

R_T ≲ O(√(1/(N − T))) + C_σ · O(√(log T / T)),

where C_σ is a constant depending on the observation noise σ².

Results for continuous input space @ poster #22

SLIDE 19

Empirical results on block picking and placing

[Figure: meta-training data f₁, f₂, …, f_N and a test function f; N = 1500]

[Plot: max observed value vs. number of evaluations of the test function (5 to 30), comparing Our method, UCB, TransLearn, and Rand]

[Plot: max observed value vs. proportion of meta-training data used (0.00 to 0.09), comparing Our method and UCB]

SLIDE 20

Poster #22

More results on:

  • estimation details for discrete and continuous input spaces
  • regret bounds for compact input spaces in ℝᵈ
  • regret bounds for probability of improvement in the meta learning setting
  • empirical results on robotics tasks

Regret bounds for meta Bayesian optimization with an unknown Gaussian process prior

https://ziw.mit.edu/meta_bo