Applications of Constrained BayesOpt in Robotics and Rethinking Priors & Hyperparameters
Marc Toussaint
Machine Learning & Robotics Lab, University of Stuttgart
marc.toussaint@informatik.uni-stuttgart.de
NIPS BayesOpt, Dec 2016
1/20
(1) Learning Manipulation Skills
- Englert & Toussaint: Combined Optimization and Reinforcement Learning for Manipulation Skills. R:SS’16
2/20
Combined Black-Box and Analytical Optimization
Englert & Toussaint: Combined Optimization and Reinforcement Learning for Manipulation Skills. R:SS’16
- CORL (Combined Optimization and RL):
  – Policy parameters w
  – Analytically known cost function J(w) = E{ Σ_{t=0}^{T} c_t(x_t, u_t) | w }
  – Projection, implicitly given by a constraint h(w, θ) = 0
  – Unknown black-box return function R(θ) ∈ ℝ
  – Unknown black-box success constraint S(θ) ∈ {0, 1}
  – Problem:  min_{w,θ} J(w) − R(θ)   s.t.   h(w, θ) = 0,  S(θ) = 1
- Alternate between path optimization  min_w J(w)  s.t.  h(w, θ) = 0
  and Bayesian optimization  max_θ R(θ)  s.t.  S(θ) = 1  (sketch below)
3/20
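As a rough illustration of the alternation (not the authors' code), a minimal Python sketch with placeholder stand-ins for the analytical path optimizer, the robot rollout, and the constrained-BayesOpt update of θ; all function names and the toy objective are assumptions made so the sketch runs.

```python
import numpy as np

# --- Hypothetical stand-ins (not from the paper) --------------------------
def solve_path_problem(theta):
    """Analytical inner problem: min_w J(w) s.t. h(w, theta) = 0.
    Placeholder: pretend the constrained optimum is linear in theta."""
    w = -0.5 * theta
    J = float(np.sum(w ** 2))
    return w, J

def execute_on_robot(w, theta):
    """Black-box rollout: returns (return R, success S). Placeholder."""
    R = -float(np.sum((theta - 1.0) ** 2)) - 0.01 * float(np.sum(w ** 2))
    S = bool(np.all(np.abs(theta) < 2.0))
    return R, S

def constrained_bayesopt_step(data):
    """One constrained-BayesOpt proposal for theta given (theta, R, S) data.
    Placeholder: perturb the best successful sample."""
    successful = [(t, r) for t, r, s in data if s]
    if not successful:
        return np.random.uniform(-2, 2, size=2)
    best_theta, _ = max(successful, key=lambda x: x[1])
    return best_theta + 0.3 * np.random.randn(*best_theta.shape)

# --- CORL-style alternation (outer loop) -----------------------------------
data = []
theta = np.random.uniform(-2, 2, size=2)
for it in range(20):
    w, J = solve_path_problem(theta)          # analytical optimization given theta
    R, S = execute_on_robot(w, theta)         # black-box return and success
    data.append((theta, R, S))
    theta = constrained_bayesopt_step(data)   # BayesOpt update of theta
print("best return:", max(r for _, r, s in data if s))
```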
Heuristic to handle constraints
- Prior mean µ = 2 for g
- Sample only points s.t. g(x) ≤ 0
- Acquisition function combines PI with Boundary Uncertainty (sketch below):
  α_PIBU(x) = [g(x) ≥ 0] · PI_f(x) + [g(x) = 0] · β σ_g²(x)
4/20
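A minimal sketch of this acquisition, assuming the indicator brackets are evaluated on the GP posterior mean of g, that "[g(x) = 0]" means "within a small tolerance of the constraint boundary", and that PI is the standard probability of improvement for minimization; the function name, the tolerance `tol`, and these readings are assumptions, not the paper's definitions.

```python
import numpy as np
from scipy.stats import norm

def pi_bu_acquisition(mu_f, sigma_f, mu_g, sigma_g, f_best, beta=1.0, tol=0.1):
    """PI + Boundary-Uncertainty acquisition (sketch).
    mu_f, sigma_f : GP posterior of the objective at candidate points
    mu_g, sigma_g : GP posterior of the constraint g (prior mean 2, i.e. 'infeasible')
    """
    # probability of improving over the incumbent f_best (minimization)
    pi_f = norm.cdf((f_best - mu_f) / np.maximum(sigma_f, 1e-9))
    feasible_side = (mu_g >= 0).astype(float)        # literal reading of [g(x) >= 0]
    near_boundary = (np.abs(mu_g) < tol).astype(float)  # reading of [g(x) = 0]
    return feasible_side * pi_f + near_boundary * beta * sigma_g ** 2
```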
(2) Optimizing Controller Parameters
- Drieß, Englert & Toussaint: Constrained Bayesian Optimization of Combined Interaction Force/Task Space Controllers for Manipulations. IROS Workshop’16
5/20
Controller Details
- Non-switching controller for smoothly establishing contacts
  – In (each) task space:  ÿ* = ÿ_ref + K_p (y_ref − y) + K_d (ẏ_ref − ẏ)
  – Operational space controller (linearized):  q̈* = K̄_p q + K̄_d q̇ + k̄
      K̄_p = (H + Jᵀ C J)⁻¹ [H K^q_p + Jᵀ C K_p J]
      K̄_d = (H + Jᵀ C J)⁻¹ [H K^q_d + Jᵀ C K_d J]
      k̄  = (H + Jᵀ C J)⁻¹ [H k^q + Jᵀ C k]
  – Contact force limit control:  e ← γ e + [|f| > |f_ref|] (f_ref − f),   u = Jᵀ α e
- Many parameters! Esp. α, ẏ_ref, K_d  (gain computation sketched below)
6/20
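A small numpy sketch of the linearized gain computation above; the matrix names and shapes (H joint-space inertia, J task Jacobian, C task-space weighting, K^q joint-space gains, K task-space gains) are my reading of the slide.

```python
import numpy as np

def operational_space_gains(H, J, C, Kp_task, Kd_task, Kp_joint, Kd_joint, k_task, k_joint):
    """Gains of the linearized operational-space controller qdd* = Kbar_p q + Kbar_d qd + kbar."""
    A = H + J.T @ C @ J
    Kbar_p = np.linalg.solve(A, H @ Kp_joint + J.T @ C @ Kp_task @ J)
    Kbar_d = np.linalg.solve(A, H @ Kd_joint + J.T @ C @ Kd_task @ J)
    kbar   = np.linalg.solve(A, H @ k_joint + J.T @ C @ k_task)
    return Kbar_p, Kbar_d, kbar

# toy usage with random kinematics (7 joints, 3-D task space)
n, m = 7, 3
H = np.eye(n); J = np.random.randn(m, n); C = 10.0 * np.eye(m)
Kbar_p, Kbar_d, kbar = operational_space_gains(
    H, J, C, 100 * np.eye(m), 20 * np.eye(m), 5 * np.eye(n), 1 * np.eye(n),
    np.zeros(m), np.zeros(n))
```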
Optimizing Controller Parameters
- Optimization objectives (see the sketch after this list):
  – Low compliance: tr(K̄_p) and tr(K̄_d)
  – Contact force error: ∫ (f_ref − f)² dt
  – Peak force on onset: |f_os|
  – Smooth force profile: ∫ (|df/dt| + |d²f/dt²| + |d³f/dt³|) dt
  – Boolean success: making contact and staying in contact
- Establishing contact
- Sliding
7/20
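A hedged sketch of how these objectives could be evaluated from a sampled force trajectory; the finite-difference smoothness term, the onset window, and the success test are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def controller_objectives(f, f_ref, dt, Kbar_p, Kbar_d, in_contact):
    """f, f_ref: arrays of measured / reference contact force over time;
    in_contact: boolean array; Kbar_p, Kbar_d: effective gain matrices."""
    compliance = np.trace(Kbar_p) + np.trace(Kbar_d)          # low-compliance term
    force_error = np.sum((f_ref - f) ** 2) * dt                # integral of squared error
    onset = np.argmax(in_contact) if in_contact.any() else 0
    peak_onset = (np.max(np.abs(f[onset:onset + int(0.2 / dt)]))  # force peak in an
                  if in_contact.any() else 0.0)                   # assumed 0.2 s onset window
    d1 = np.gradient(f, dt); d2 = np.gradient(d1, dt); d3 = np.gradient(d2, dt)
    smoothness = np.sum(np.abs(d1) + np.abs(d2) + np.abs(d3)) * dt
    success = bool(in_contact.any() and in_contact[onset:].all())  # contact made and kept
    return compliance, force_error, peak_onset, smoothness, success
```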
(3) Safe Active Learning & BayesOpt
- SAFEOPT: Safety threshold on the objective, f(x) ≥ h
  Sui, Gotovos, Burdick, Krause: Safe Exploration for Optimization with Gaussian Processes. ICML’15
- Guarantee to never step outside an unknown constraint g(x) ≤ 0...
  – Impossible when no failure data g(x) > 0 exists...
  – Unless you assume observation of near-boundary discriminative values
  Schreiter et al: Safe Exploration for Active Learning with Gaussian Processes. ECML’15
8/20
Probabilistic guarantees on non-failure
- Acquisition function (sketch below):
  α(x) = σ_f²(x)   s.t.   µ_g(x) + ν σ_g(x) ≥ 0
- Specify probability of failure δ after n points with m₀ initializations → ν
- Application on cart-pole
9/20
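A minimal sketch of this safe acquisition: maximize the objective variance σ_f²(x) over candidates whose GP posterior on g satisfies µ_g + ν σ_g ≥ 0. The GP posteriors are passed in as plain callables; how ν is derived from δ, n, and m₀ follows the paper and is not reproduced here.

```python
import numpy as np

def safe_active_learning_step(candidates, gp_f_var, gp_g_mean, gp_g_std, nu):
    """Pick the next query point among 'candidates' (array of inputs).
    gp_f_var(x): posterior variance of the objective f,
    gp_g_mean(x), gp_g_std(x): posterior mean/std of the safety function g."""
    mu_g = gp_g_mean(candidates)
    sd_g = gp_g_std(candidates)
    safe = mu_g + nu * sd_g >= 0.0            # safety condition from the slide
    if not np.any(safe):
        return None                           # no certified-safe candidate
    var_f = np.where(safe, gp_f_var(candidates), -np.inf)
    return candidates[np.argmax(var_f)]       # most informative safe point
```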
So, what are the issues?
– Choice of hyperparameters!
– Stationary covariance functions!
– Isotropic stationary covariance functions!
10/20
- Actually, I’m a fan of Newton methods
- Two messages of classical (convex) optimization:
  – Step size (line search, trust region, Wolfe)
  – Step direction (Newton, quasi-Newton, BFGS, conjugate, covariant)
- Newton methods are perfect for running down-hill into a local optimum
11/20
Model-based Optimization
- If the model is not given: classical model-based optimization (Nocedal et al., “Derivative-free optimization”); a runnable sketch follows the pseudocode below
 1: Initialize D with at least ½(n+1)(n+2) data points
 2: repeat
 3:   Compute a regression f̂(x) = φ₂(x)ᵀβ on D
 4:   Compute x⁺ = argmin_x f̂(x) s.t. |x − x̂| < α
 5:   Compute the improvement ratio ϱ = [f(x̂) − f(x⁺)] / [f̂(x̂) − f̂(x⁺)]
 6:   if ϱ > ε then
 7:     Increase the stepsize α
 8:     Accept x̂ ← x⁺
 9:     Add to data, D ← D ∪ {(x⁺, f(x⁺))}
10:   else
11:     if det(D) is too small then   // data improvement
12:       Compute x⁺ = argmax_x det(D ∪ {x}) s.t. |x − x̂| < α
13:       Add to data, D ← D ∪ {(x⁺, f(x⁺))}
14:     else
15:       Decrease the stepsize α
16:     end if
17:   end if
18:   Prune the data, e.g., remove argmax_{x∈D} det(D \ {x})
19: until x converges
12/20
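A runnable toy version of the loop above, assuming a 2-D problem; the poisedness test "det(D)" is replaced here by the determinant of the feature Gram matrix, and the inner argmin is done by random sampling inside the trust region — both are simplifications, not the book's algorithm.

```python
import numpy as np

def phi2(x):
    """Quadratic features for 2-D x: [1, x1, x2, x1^2, x1*x2, x2^2]."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2])

def model_based_optimize(f, x0, alpha=0.5, eps=0.1, iters=50):
    rng = np.random.default_rng(0)
    x_hat = np.asarray(x0, float)
    # step 1: initialize with at least (n+1)(n+2)/2 points (n = 2 -> 6 points)
    X = [x_hat + 0.1 * rng.standard_normal(2) for _ in range(6)]
    y = [f(x) for x in X]
    for _ in range(iters):
        Phi = np.array([phi2(x) for x in X])
        beta, *_ = np.linalg.lstsq(Phi, np.array(y), rcond=None)     # step 3: fit f_hat
        # step 4: minimize the model in the trust region |x - x_hat| < alpha (by sampling)
        cand = x_hat + alpha * rng.uniform(-1, 1, size=(200, 2))
        x_plus = cand[np.argmin([phi2(c) @ beta for c in cand])]
        num = f(x_hat) - f(x_plus)                                   # step 5: improvement ratio
        den = phi2(x_hat) @ beta - phi2(x_plus) @ beta
        rho = num / den if abs(den) > 1e-12 else 0.0
        if rho > eps:                               # steps 6-9: accept and grow the step
            alpha *= 1.5
            x_hat = x_plus
            X.append(x_plus); y.append(f(x_plus))
        else:
            if np.linalg.det(Phi.T @ Phi) < 1e-6:   # steps 11-13: improve the data geometry
                X.append(x_hat + alpha * rng.uniform(-1, 1, 2))
                y.append(f(X[-1]))
            else:                                   # step 15: shrink the step
                alpha *= 0.5
        if len(X) > 30:                             # step 18: prune old data
            X.pop(0); y.pop(0)
    return x_hat

x_opt = model_based_optimize(lambda x: (x[0] - 1) ** 2 + 10 * (x[1] + 0.5) ** 2, [0.0, 0.0])
```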
This is similar to BayesOpt with a polynomial kernel!
13/20
A prior about local polynomial optima
- Assume that the objective has multiple local optima
  – Local optimum: locally convex
  – Each local optimum might be differently conditioned → we need a highly non-stationary, non-isotropic covariance function
- “Between” the local optima, the function is smooth → standard squared-exponential kernel
- The mixed-global-local kernel (sketch below):
  k_MGL(x, x′) = { k_q(x, x′)  if x, x′ ∈ U_i
                   k_s(x, x′)  if x ∉ U_i and x′ ∉ U_j for any i, j
                   0           else }
  with k_q(x, x′) = (xᵀx′ + 1)²
14/20
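A sketch of k_MGL, assuming for illustration that the neighborhoods U_i can be represented as simple (center, radius) balls; the slides instead define them via locally convex quadratic fits (next slide).

```python
import numpy as np

def k_se(x, xp, length_scale=0.3):
    """Squared-exponential kernel: the smooth, global part k_s."""
    return np.exp(-np.sum((x - xp) ** 2) / (2.0 * length_scale ** 2))

def k_quad(x, xp):
    """Quadratic kernel (x^T x' + 1)^2 used inside a local-optimum neighborhood."""
    return (x @ xp + 1.0) ** 2

def neighborhood_index(x, neighborhoods):
    """Index of the neighborhood containing x, or -1 (ball model is an assumption)."""
    for i, (center, radius) in enumerate(neighborhoods):
        if np.linalg.norm(x - center) <= radius:
            return i
    return -1

def k_mgl(x, xp, neighborhoods):
    """Mixed-global-local kernel from the slide."""
    i, j = neighborhood_index(x, neighborhoods), neighborhood_index(xp, neighborhoods)
    if i >= 0 and i == j:          # both in the same local neighborhood: quadratic
        return k_quad(x, xp)
    if i < 0 and j < 0:            # both in the global region: squared exponential
        return k_se(x, xp)
    return 0.0                     # mixed case: zero covariance

# tiny usage
U = [(np.array([0.5, 0.5]), 0.2)]
print(k_mgl(np.array([0.5, 0.55]), np.array([0.45, 0.5]), U))
```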
Finding convex neighborhoods
- Data set D = {(x_i, y_i)}
- U ⊂ D is a convex neighborhood if
  {β₀*, β*, B*} = argmin_{β₀,β,B} Σ_{k: x_k ∈ U} ( (β₀ + βᵀ x_k + ½ x_kᵀ B x_k) − y_k )²
  has a positive definite Hessian B*  (sketch below)
15/20
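A direct least-squares reading of this definition: fit β₀, β, B to the points in U and test whether B is positive definite. The feature parametrization is standard; nothing here is specific to the authors' code.

```python
import numpy as np

def is_convex_neighborhood(X, y):
    """X: (m, n) points of the candidate neighborhood U, y: (m,) function values.
    Fits beta0 + beta^T x + 0.5 x^T B x by least squares and checks B > 0."""
    m, n = X.shape
    cols = [np.ones(m)] + [X[:, i] for i in range(n)]
    idx = []
    for i in range(n):
        for j in range(i, n):
            scale = 0.5 if i == j else 1.0      # 0.5 x^T B x with symmetric B
            cols.append(scale * X[:, i] * X[:, j])
            idx.append((i, j))
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    B = np.zeros((n, n))
    for (i, j), c in zip(idx, coef[1 + n:]):
        B[i, j] = B[j, i] = c
    return bool(np.all(np.linalg.eigvalsh(B) > 0)), B

# tiny usage: a convex quadratic should be recognized as such
X = np.random.randn(30, 2)
y = 1.0 + X @ np.array([0.2, -0.1]) + np.sum(X ** 2, axis=1)
print(is_convex_neighborhood(X, y)[0])   # True
```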
A heuristic to decrease length-scale
- The SE part still has a length-scale hyperparameter l
- In each iteration we consider decreasing it to l̃_t < l_{t−1}:
  α_{r,t} := α*(l̃_t) / α*(l_{t−1}),   with α*(l) = min_x α(x; l)
  for any acquisition function α(x; l)
- Accept the smaller length-scale only if α_{r,t} ≥ h  (e.g., h ≈ 2; sketch below)
- Robust to non-stationary objectives
[Figure: counter-example function (x–y surface) and median log10 immediate regret over iterations, comparing correlation adaptation by LOO-CV, by the alpha ratio, and the optimal length-scale]
16/20
Mixed-global-local kernel + alpha ratio
[Figure: median log10 immediate regret over iterations on Quadratic 2D, Rosenbrock, Branin-Hoo, Hartmann 3D, Hartmann 6D, Exponential 3D, Exponential 4D, and Exponential 5D; methods compared: PES, IMGPO, EI, EI AR+MGL]
- PES: Bayesian integration over hyperparameters
- IMGPO: Bayesian update of the hyperparameters in each iteration
17/20
...work with Kim Wabersich
18/20
Conclusions
- Solid optimization methods are the savior of robotics!
- Rethink the priors we use for BayesOpt
– Local optima with varying conditioning
- Rethink the objective for choosing hyperparameters
– Maximize optimization progress (∼ expected acquisition) rather than data likelihood
19/20
Thanks
- for your attention!
- to the students:
– Peter Englert (BayesOpt for Manipulation)
– Jens Schreiter (Safe Active Learning)
– Danny Drieß (BayesOpt for Controller Optimization)
– Kim Wabersich (Mixed-global-local kernel & alpha ratio)
- and my lab: