High Dimensional Bayesian Optimisation and Bandits via Additive Models
Kirthevasan Kandasamy, Jeff Schneider, Barnabás Póczos
ICML ’15
July 8, 2015
Bandits & optimisation: e.g. maximum likelihood inference in computational cosmology.
A simulator maps cosmological parameters (e.g. Hubble constant, baryonic density) to a likelihood of the observed data; we want the parameters that maximise it.
Problem: maximise an expensive black-box function f.
[Figure: an unknown payoff function f(x) on X = [0, 1].]
Sequentially choose query points x_t, t = 1, ..., T, and observe f(x_t).
Goal: after T queries, be close to the optimum. Simple regret: S_T = f(x*) − max_{t≤T} f(x_t), where x* = argmax_x f(x).
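To make the objective concrete, here is a minimal sketch (a toy illustration, not from the talk) of computing the simple regret for a 1-D f; the specific function, with its peak placed at 0.828 to echo the figure, is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the expensive black-box f on [0, 1].
def f(x):
    return np.exp(-(x - 0.828) ** 2 / 0.01)

x_star = 0.828                       # argmax of the toy f
queries = rng.random(20)             # x_t, t = 1, ..., T
S_T = f(x_star) - f(queries).max()   # S_T = f(x*) - max_t f(x_t)
print(S_T)
```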
Bayesian optimisation with a GP model of f:
[Figure, top panel: the GP posterior for f(x) on [0, 1], with mean and confidence band.]
[Figure, bottom panel: the acquisition function ϕ_t(x); its maximiser is the next query, x_t = 0.828.]
Challenges when the dimension D is large:
◮ Statistical difficulty: nonparametric estimation of f needs exponentially many samples in D.
◮ Computational difficulty: maximising the acquisition function over a D-dimensional domain is itself expensive.

Existing approaches:
◮ (Chen et al. 2012): f depends on a small number of variables.
◮ (Wang et al. 2013): f varies along a lower dimensional subspace.
◮ (Djolonga et al. 2013): f varies along a lower dimensional subspace.
◮ These methods perform BO on a low dimensional subspace, but the assumption is too strong in realistic settings.
Our approach: assume f is additive,
f(x) = f^(1)(x^(1)) + f^(2)(x^(2)) + ... + f^(M)(x^(M)),
where the groups x^(j) are disjoint subsets of at most d coordinates.
E.g. D = 10, d = 3: we call {X^(j)}_{j=1}^M = {(1, 3, 9), (2, 4, 8), (5, 6, 10)} the “decomposition”.
Model each f^(j) ~ GP(0, κ^(j)); then f ~ GP(0, κ) with κ = Σ_j κ^(j).
Given observations {(x_i, y_i)}_{i=1}^n and a test point x†, the posterior f(x†) | X, Y ∼ N(μ(x†), σ²(x†)) has closed form, and so do the posteriors of the individual f^(j).
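A minimal NumPy sketch of posterior inference under the additive kernel κ = Σ_j κ^(j) (an illustration only; the SE kernel, length-scale, noise level, and toy data are all assumptions):

```python
import numpy as np

def se_kernel(A, B, ls=0.2):
    """Squared-exponential kernel evaluated on the given coordinates."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

def additive_kernel(A, B, groups):
    """kappa(x, x') = sum_j kappa^(j)(x^(j), x'^(j)) over disjoint groups."""
    return sum(se_kernel(A[:, g], B[:, g]) for g in groups)

def gp_posterior(X, y, Xq, groups, noise=1e-3):
    """Posterior mean and variance of f(x†) | X, Y at the query points Xq."""
    K = additive_kernel(X, X, groups) + noise * np.eye(len(X))
    Ks = additive_kernel(Xq, X, groups)
    mu = Ks @ np.linalg.solve(K, y)
    cov = additive_kernel(Xq, Xq, groups) - Ks @ np.linalg.solve(K, Ks.T)
    return mu, np.maximum(np.diag(cov), 0.0)

# D = 10 in M = 3 groups: the slide's decomposition {(1,3,9),(2,4,8),(5,6,10)},
# written 0-indexed.
groups = [[0, 2, 8], [1, 3, 7], [4, 5, 9]]
X = np.random.rand(30, 10)
y = np.sin(3 * X[:, 0]) + np.cos(3 * X[:, 1]) + X[:, 4]   # an additive toy f
mu, var = gp_posterior(X, y, np.random.rand(5, 10), groups)
```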
Our contributions:
◮ Bounds on S_T: exponential in D → linear in D.
◮ An easy-to-optimise acquisition function.
◮ Performs well even when f is not additive.
GP-UCB (Srinivas et al. 2010) chooses
x_t = argmax_{x∈X} μ_{t−1}(x) + β_t^{1/2} σ_{t−1}(x).
In high dimensions this inherits both difficulties: the bound on S_T is exponential in D, and the argmax is over all D coordinates at once.
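A hedged sketch of one GP-UCB step in 1-D (grid search stands in for a global optimiser, and β is a fixed constant rather than the theoretically prescribed β_t schedule):

```python
import numpy as np

def se_kernel(A, B, ls=0.2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

def gp_ucb_step(X, y, beta=4.0, noise=1e-3, n_grid=1000):
    """Return x_t = argmax_x mu_{t-1}(x) + beta^{1/2} sigma_{t-1}(x)."""
    grid = np.linspace(0.0, 1.0, n_grid)[:, None]
    K = se_kernel(X, X) + noise * np.eye(len(X))
    Ks = se_kernel(grid, X)
    mu = Ks @ np.linalg.solve(K, y)
    # k(x, x) = 1 for the SE kernel, hence the posterior variance below.
    var = np.maximum(1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T)), 0.0)
    return grid[np.argmax(mu + np.sqrt(beta * var))]

# One step on a toy 1-D history of five queries:
X = np.random.rand(5, 1)
y = np.sin(6 * X[:, 0])
x_next = gp_ucb_step(X, y)
```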
Add-GP-UCB: with M groups, use the acquisition
ϕ̃_t(x) = μ_{t−1}(x) + β_t^{1/2} Σ_{j=1}^M σ^(j)_{t−1}(x^(j)).
Since μ_{t−1}(x) = Σ_j μ^(j)_{t−1}(x^(j)), the acquisition decomposes: each
ϕ̃^(j)_t(x^(j)) = μ^(j)_{t−1}(x^(j)) + β_t^{1/2} σ^(j)_{t−1}(x^(j))
is maximised separately over only d coordinates, and x_t concatenates the maximisers x_t^(j).
Theorem (informal): suppose f = Σ_j f^(j). Then w.h.p. the bound on S_T scales only linearly in D.
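A sketch of the decomposed maximisation (an illustration, with crude random search in place of a global optimiser): each ϕ̃^(j) is maximised over its own d coordinates and the winners are stitched into x_t. Each component posterior conditions f^(j) on the observed sums via the full additive kernel matrix:

```python
import numpy as np

def se_kernel(A, B, ls=0.2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

def add_gp_ucb_step(X, y, groups, beta=4.0, noise=1e-3, n_cand=500):
    """Maximise each phi_tilde^(j) independently over its d coordinates."""
    D = X.shape[1]
    # Full additive kernel matrix: sum_j kappa^(j)(X^(j), X^(j)) + noise * I.
    K = sum(se_kernel(X[:, g], X[:, g]) for g in groups) + noise * np.eye(len(X))
    Kinv_y = np.linalg.solve(K, y)
    x_t = np.empty(D)
    for g in groups:
        cand = np.random.rand(n_cand, len(g))   # candidates in [0, 1]^d
        Ks = se_kernel(cand, X[:, g])           # kappa^(j)(x^(j), X^(j))
        mu_j = Ks @ Kinv_y                      # posterior mean of f^(j)
        var_j = np.maximum(
            1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T)), 0.0)
        x_t[g] = cand[np.argmax(mu_j + np.sqrt(beta * var_j))]   # x_t^(j)
    return x_t   # x_t = concatenation of the per-group maximisers

groups = [[0, 2, 8], [1, 3, 7], [4, 5, 9]]
X, y = np.random.rand(30, 10), np.random.rand(30)
x_next = add_gp_ucb_step(X, y, groups)
```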
Illustration in D = 2 with groups {1} and {2}:
[Figure: the two components f^(1)(x^(1)) and f^(2)(x^(2)), each on [0, 1].]
[Figure: their GP posteriors after a few queries.]
[Figure: the per-group acquisitions ϕ̃^(1)(x^(1)) and ϕ̃^(2)(x^(2)); maximising them independently gives x_t^(1) = 0.869 and x_t^(2) = 0.141.]
The next query is the concatenation x_t = (0.869, 0.141).
Why additive models?
◮ Additive models are common in high dimensional regression.
◮ Additive models are statistically simpler ⇒ lower variance, at the price of bias when f is not additive.
◮ In BO applications queries are expensive, so we usually operate with only a few samples.
◮ Observation: even when f is not additive, the additive model offers
  ◮ a better bias/variance trade-off in high dimensional regression, and
  ◮ an acquisition function that is easy to maximise.
Experiment: Cosmological Simulator
[Diagram: parameters (e.g. Hubble constant, baryonic density) → Cosmological Simulator → Observation.]
◮ Task: find the parameter values that maximise the likelihood of the observation.
◮ 20 dimensions, but only 9 parameters are relevant.
◮ Each query takes 2-5 seconds.
◮ Use 500 DiRect evaluations to maximise the acquisition function (see the sketch below).
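The "DiRect evaluations" refer to a budgeted run of the DIRECT global optimiser on the acquisition. A hedged sketch with SciPy's implementation (assuming a SciPy version that provides scipy.optimize.direct; the acquisition here is a hypothetical stand-in for a per-group ϕ̃^(j)):

```python
import numpy as np
from scipy.optimize import direct  # available in recent SciPy releases

def acquisition(x):
    """Hypothetical stand-in for a per-group acquisition phi_tilde^(j)."""
    return np.exp(-np.sum((x - 0.5) ** 2))

d = 3  # dimension of one group
# DIRECT minimises, so negate the acquisition; cap the budget at
# 500 evaluations, as on the slide.
res = direct(lambda x: -acquisition(x), bounds=[(0.0, 1.0)] * d, maxfun=500)
x_j = res.x   # approximate maximiser within the evaluation budget
```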
Experiment: Viola & Jones face detection
◮ Task: tune the detection cascade's 22 threshold parameters.
◮ 22 dimensions.
◮ Each query takes 30-40 seconds.
◮ Use 1000 DiRect evaluations to maximise the acquisition function.
Conclusions:
◮ Additive assumption improves regret: exponential in D → linear in D.
◮ Acquisition function is easy to maximise.
◮ Even when f is not additive, Add-GP-UCB does well.
◮ Similar results hold for the Matérn kernel.
Open questions:
◮ How to choose (d, M)?
◮ Can we generalise to other acquisition functions?