

SLIDE 1

Applications of Constrained BayesOpt in Robotics and Rethinking Priors & Hyperparameters

Marc Toussaint

Machine Learning & Robotics Lab, University of Stuttgart
marc.toussaint@informatik.uni-stuttgart.de
NIPS BayesOpt, Dec 2016

SLIDE 2

(1) Learning Manipulation Skills

  • Englert & Toussaint: Combined Optimization and Reinforcement Learning for Manipulation Skills. R:SS’16

SLIDE 3

Combined Black-Box and Analytical Optimization

Englert & Toussaint: Combined Optimization and Reinforcement Learning for Manipulation Skills. R:SS’16

  • CORL

(Combined Optimization and RL):

– Policy parameters w, with analytically known cost function
  J(w) = E{ Σ_{t=0}^T c_t(x_t, u_t) | w }
– Projection, implicitly given by a constraint h(w, θ) = 0
– Unknown black-box return function R(θ) ∈ R
– Unknown black-box success constraint S(θ) ∈ {0, 1}
– Problem:  min_{w,θ} J(w) − R(θ)   s.t.  h(w, θ) = 0,  S(θ) = 1

  • Alternate path optimization
    min_w J(w)  s.t.  h(w, θ) = 0
    with Bayesian Optimization:  max_θ R(θ)  s.t.  S(θ) = 1
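The alternation above can be made concrete on a toy instance. Everything named below (the quadratic J, the return R, the box success constraint S, the constraint h(w, θ) = w − θ with its closed-form inner solve, and plain random search standing in for the GP-based Bayesian optimization of the paper) is a hypothetical stand-in chosen only to illustrate the loop:

```python
import numpy as np

def J(w):            # analytically known path cost (toy)
    return float(w ** 2)

def R(theta):        # black-box return, peaked at theta = 1 (toy)
    return -float((theta - 1.0) ** 2)

def S(theta):        # black-box success constraint (toy)
    return 1 if -2.0 <= theta <= 2.0 else 0

def solve_path(theta):
    # min_w J(w) s.t. h(w, theta) = w - theta = 0 has the closed form w = theta
    return theta

def corl_alternation(iters=30, seed=0):
    rng = np.random.default_rng(seed)
    best_theta, best_val = 0.0, -np.inf
    for _ in range(iters):
        theta = rng.uniform(-2.0, 2.0)   # stand-in for the BayesOpt proposal
        if S(theta) != 1:                # reject unsuccessful parameters
            continue
        w = solve_path(theta)            # inner analytic path optimization
        val = R(theta) - J(w)            # maximizing R - J = minimizing J - R
        if val > best_val:
            best_theta, best_val = theta, val
    return best_theta, best_val

theta_star, val_star = corl_alternation()
```

The true optimum of this toy problem is θ = 0.5 with value −0.5, so the sampled best can only approach −0.5 from below.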

SLIDE 4

Heuristic to handle constraints

  • Prior mean µ = 2 for g
  • Sample only points s.t. g(x) ≤ 0
  • Acquisition function combines PI with Boundary Uncertainty

    α_PIBU(x) = [g(x) ≤ 0] · PI_f(x) + [g(x) = 0] · β σ²_g(x)
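A sketch of evaluating this acquisition on a candidate set, given GP posterior summaries for f and g. Two relaxations are assumptions made here for illustration: the indicator [g(x) ≤ 0] is evaluated on the posterior mean of g, and [g(x) = 0] is relaxed to a tolerance band |µ_g(x)| ≤ tol:

```python
import math
import numpy as np

def normal_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def alpha_pibu(mu_f, sigma_f, mu_g, sigma_g, y_best, beta=1.0, tol=1e-2):
    """PI-with-boundary-uncertainty acquisition on a candidate set.

    mu_f, sigma_f : GP posterior of the objective f (minimization)
    mu_g, sigma_g : GP posterior of the constraint g (feasible iff g <= 0)
    """
    # probability of improvement over the incumbent y_best
    pi = np.array([normal_cdf((y_best - m) / s) for m, s in zip(mu_f, sigma_f)])
    feasible = (mu_g <= 0.0).astype(float)           # [g(x) <= 0], via posterior mean
    boundary = (np.abs(mu_g) <= tol).astype(float)   # [g(x) = 0], relaxed to a band
    return feasible * pi + boundary * beta * sigma_g ** 2
```

Feasible candidates are ranked by PI; candidates sitting on the estimated constraint boundary additionally earn the boundary-uncertainty bonus β σ²_g.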

SLIDE 5

(2) Optimizing Controller Parameters

  • Drieß, Englert & Toussaint: Constrained Bayesian Optimization of Combined Interaction Force/Task Space Controllers for Manipulations. IROS Workshop’16

SLIDE 6

Controller Details

  • Non-switching controller for smoothly establishing contacts

– In (each) task space:  ÿ* = ÿ_ref + K_p (y_ref − y) + K_d (ẏ_ref − ẏ)
– Operational space controller (linearized):  q̈* = K̄_p q + K̄_d q̇ + k̄

    K̄_p = (H + JᵀCJ)⁻¹ [H K_p^q + JᵀC K_p J]
    K̄_d = (H + JᵀCJ)⁻¹ [H K_d^q + JᵀC K_d J]
    k̄ = (H + JᵀCJ)⁻¹ [H k^q + JᵀC k]

– Contact force limit control:  e ← γ e + [|f| > |f_ref|] (f_ref − f),   u = Jᵀ α e

  • Many parameters!
  • Esp. α, ẏ_ref, K_d
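The gain formulas above can be checked numerically. All matrices below (inertia H, Jacobian J, weighting C, and the joint- and task-space gains) are small random or diagonal stand-ins, chosen only to make the linear-algebra structure concrete:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 2                         # joint-space and task-space dimensions (toy sizes)
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)         # stand-in joint-space inertia, symmetric positive definite
J = rng.standard_normal((m, n))     # stand-in task Jacobian
C = 2.0 * np.eye(m)                 # stand-in task weighting
Kq_p, Kq_d = 5.0 * np.eye(n), 1.0 * np.eye(n)   # joint-space gains K_p^q, K_d^q
Kp, Kd = 10.0 * np.eye(m), 2.0 * np.eye(m)      # task-space gains
kq, k = np.zeros(n), np.ones(m)

M = H + J.T @ C @ J                 # common factor (H + J^T C J)
Minv = np.linalg.inv(M)
Kbar_p = Minv @ (H @ Kq_p + J.T @ C @ Kp @ J)
Kbar_d = Minv @ (H @ Kq_d + J.T @ C @ Kd @ J)
kbar = Minv @ (H @ kq + J.T @ C @ k)
```

Since H is positive definite and JᵀCJ is positive semi-definite, the common factor is always invertible, so the linearized gains are well defined.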

SLIDE 7

Optimizing Controller Parameters

  • Optimization objectives:

– Low compliance: tr(K̄_p) and tr(K̄_d)
– Contact force error: ∫ (f_ref − f)² dt
– Peak force on onset: |f_os|
– Smooth force profile: ∫ ( |df/dt| + |d²f/dt²| + |d³f/dt³| ) dt
– Boolean success: contact and staying in contact
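A sketch of evaluating the force-related objectives on a sampled trajectory. The force signal f(t) and the reference f_ref are synthetic stand-ins; derivatives use finite differences and the integrals use a simple Riemann sum:

```python
import numpy as np

dt = 1e-3
t = np.arange(0.0, 1.0, dt)
f_ref = 1.0
# toy contact force: rises to f_ref with a small oscillation on top
f = f_ref * (1.0 - np.exp(-10.0 * t)) + 0.05 * np.sin(40.0 * t)

# contact force error: integral of (f_ref - f)^2 dt
force_error = float(np.sum((f_ref - f) ** 2) * dt)

# smooth force profile: integral of |f'| + |f''| + |f'''| dt
d1 = np.gradient(f, dt)
d2 = np.gradient(d1, dt)
d3 = np.gradient(d2, dt)
smoothness = float(np.sum(np.abs(d1) + np.abs(d2) + np.abs(d3)) * dt)

# peak force on onset, taking an (assumed) onset window of the first 0.2 s
peak_onset = float(np.max(np.abs(f[t < 0.2])))

# Boolean success: force stays positive (in contact) after settling
in_contact = bool(np.all(f[t > 0.5] > 0.0))
```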

SLIDE 8

Optimizing Controller Parameters

  • Optimization objectives:

– Low compliance: tr(K̄_p) and tr(K̄_d)
– Contact force error: ∫ (f_ref − f)² dt
– Peak force on onset: |f_os|
– Smooth force profile: ∫ ( |df/dt| + |d²f/dt²| + |d³f/dt³| ) dt
– Boolean success: contact and staying in contact

  • Establishing contact
  • Sliding

SLIDE 9

(3) Safe Active Learning & BayesOpt

  • SAFEOPT: Safety threshold on the objective f(x) ≥ h
  • Sui, Gotovos, Burdick, Krause: Safe Exploration for Optimization with Gaussian Processes. ICML’15

SLIDE 10

(3) Safe Active Learning & BayesOpt

  • SAFEOPT: Safety threshold on the objective f(x) ≥ h
  • Sui, Gotovos, Burdick, Krause: Safe Exploration for Optimization with Gaussian Processes. ICML’15
  • Guarantee to never step outside an unknown g(x) ≤ 0...
    – Impossible when no failure data g(x) > 0 exists...

SLIDE 11

(3) Safe Active Learning & BayesOpt

  • SAFEOPT: Safety threshold on the objective f(x) ≥ h
  • Sui, Gotovos, Burdick, Krause: Safe Exploration for Optimization with Gaussian Processes. ICML’15
  • Guarantee to never step outside an unknown g(x) ≤ 0...
    – Impossible when no failure data g(x) > 0 exists...
    – Unless you assume observation of near-boundary discriminative values

Schreiter et al: Safe Exploration for Active Learning with Gaussian Processes. ECML’15

SLIDE 12

Probabilistic guarantees on non-failure

  • Acquisition function

    α(x) = σ²_f(x)   s.t.   µ_g(x) + ν σ_g(x) ≥ 0

  • Specify probability of failure δ after n points with m_0 initializations → ν
  • Application on cart-pole
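A sketch of the selection rule above: maximize the predictive variance of f over the candidates whose constraint GP certifies safety. The posterior arrays are synthetic stand-ins for real GP predictions, and the negative ν used here is an illustrative conservative choice (the slide derives ν from the failure probability δ):

```python
import numpy as np

def safe_al_choice(sigma2_f, mu_g, sigma_g, nu):
    # safety certificate: mu_g(x) + nu * sigma_g(x) >= 0
    safe = mu_g + nu * sigma_g >= 0.0
    if not np.any(safe):
        return None                           # no certified-safe candidate
    scores = np.where(safe, sigma2_f, -np.inf)
    return int(np.argmax(scores))             # most informative safe candidate

sigma2_f = np.array([0.2, 0.9, 0.3])          # stand-in predictive variances of f
mu_g = np.array([1.0, -0.5, 0.3])             # stand-in constraint posterior means
sigma_g = np.array([0.1, 0.1, 0.1])
idx = safe_al_choice(sigma2_f, mu_g, sigma_g, nu=-2.0)
```

With these numbers, candidate 1 has the largest variance but fails the certificate, so candidate 2 is chosen.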

SLIDE 13

So, what are the issues?


SLIDE 14

So, what are the issues?

– Choice of hyperparameters!

SLIDE 15

So, what are the issues?

– Choice of hyperparameters!
– Stationary covariance functions!

SLIDE 16

So, what are the issues?

– Choice of hyperparameters!
– Stationary covariance functions!
– Isotropic stationary covariance functions!

SLIDE 17
  • Actually, I’m a fan of Newton Methods


SLIDE 18
  • Actually, I’m a fan of Newton Methods
  • Two messages of classical (convex) optimization:

– Step size (line search, trust region, Wolfe conditions)
– Step direction (Newton, quasi-Newton, BFGS, conjugate, covariant)

  • Newton methods are perfect for running down-hill to a local optimum

SLIDE 19

Model-based Optimization

  • If the model is not given: classical model-based optimization (Nocedal et al: “Derivative-free optimization”)

 1: Initialize D with at least ½(n+1)(n+2) data points
 2: repeat
 3:   Compute a regression f̂(x) = φ₂(x)ᵀβ on D
 4:   Compute x⁺ = argmin_x f̂(x) s.t. |x − x̂| < α
 5:   Compute the improvement ratio ϱ = (f(x̂) − f(x⁺)) / (f̂(x̂) − f̂(x⁺))
 6:   if ϱ > ε then
 7:     Increase the step size α
 8:     Accept x̂ ← x⁺
 9:     Add to data, D ← D ∪ {(x⁺, f(x⁺))}
10:   else
11:     if det(D) is too small then   // data improvement
12:       Compute x⁺ = argmax_x det(D ∪ {x}) s.t. |x − x̂| < α
13:       Add to data, D ← D ∪ {(x⁺, f(x⁺))}
14:     else
15:       Decrease the step size α
16:     end if
17:   end if
18:   Prune the data, e.g., remove argmax_{x∈D} det(D \ {x})
19: until x̂ converges
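A runnable 1D sketch of this loop, under a few simplifying assumptions: the quadratic model is fit by least squares, the trust-region subproblem and the det-based data-improvement step are solved on a grid, pruning simply drops the oldest point, and the test function (x − 2)² + 1 and all tolerances are illustrative:

```python
import numpy as np

def phi2(x):
    # quadratic features for the regression f_hat(x) = phi2(x)^T beta
    return np.array([1.0, x, x * x])

def fit(D):
    # least-squares fit of the quadratic model on the data set D
    X = np.array([phi2(x) for x, _ in D])
    y = np.array([fx for _, fx in D])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def model_based_minimize(f, x0, alpha=1.0, iters=40, eps=0.1):
    x_hat = x0
    D = [(x0 + d, f(x0 + d)) for d in (-0.5, 0.0, 0.5)]   # >= (n+1)(n+2)/2 = 3 points
    for _ in range(iters):
        beta = fit(D)
        f_hat = lambda x: phi2(x) @ beta
        grid = x_hat + np.linspace(-alpha, alpha, 201)    # trust region |x - x_hat| < alpha
        x_plus = grid[np.argmin([f_hat(x) for x in grid])]
        denom = f_hat(x_hat) - f_hat(x_plus)
        rho = (f(x_hat) - f(x_plus)) / denom if abs(denom) > 1e-12 else 0.0
        if rho > eps:
            alpha *= 1.5                                  # increase the step size
            x_hat = x_plus                                # accept
            D.append((x_plus, f(x_plus)))
        else:
            X = np.array([phi2(x) for x, _ in D])
            if abs(np.linalg.det(X.T @ X)) < 1e-6:        # data improvement
                def gain(x):
                    Xa = np.vstack([X, phi2(x)])
                    return np.linalg.det(Xa.T @ Xa)
                x_new = max(grid, key=gain)
                D.append((x_new, f(x_new)))
            else:
                alpha *= 0.5                              # decrease the step size
        if len(D) > 10:
            D.pop(0)                                      # prune (simplified: drop the oldest point)
    return x_hat

x_min = model_based_minimize(lambda x: (x - 2.0) ** 2 + 1.0, x0=0.0)
```

Because x⁺ is only accepted when the improvement ratio is positive, the incumbent's function value never increases, which is what makes the loop robust to a poorly conditioned model fit.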

SLIDE 20

This is similar to BayesOpt with polynomial kernel!


SLIDE 21

A prior about local polynomial optima

  • Assume that the objective has multiple local optima

– Local optimum: locally convex
– Each local optimum might be differently conditioned → we need a highly non-stationary, non-isotropic covariance function

  • “Between” the local optima, the function is smooth → standard squared-exponential kernel

SLIDE 22

A prior about local polynomial optima

  • Assume that the objective has multiple local optima

– Local optimum: locally convex
– Each local optimum might be differently conditioned → we need a highly non-stationary, non-isotropic covariance function

  • “Between” the local optima, the function is smooth → standard squared-exponential kernel
  • The Mixed-global-local kernel

    k_MGL(x, x′) = { k_q(x, x′)   if x, x′ ∈ U_i
                   { k_s(x, x′)   if x ∉ U_i, x′ ∉ U_j for any i, j
                   { 0            else

    k_q(x, x′) = (xᵀx′ + 1)²
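A sketch of the mixed-global-local kernel as a function of two points plus their neighborhood labels. Here membership is given explicitly (label i ≥ 0 means "in U_i", −1 means "in no neighborhood"); in the talk it comes from the convex-region detection, and the SE length-scale is an illustrative choice:

```python
import numpy as np

def k_q(x, y):
    # quadratic (polynomial) kernel used inside a convex neighborhood
    return float((np.dot(x, y) + 1.0) ** 2)

def k_s(x, y, ell=1.0):
    # squared-exponential kernel for the smooth region between optima
    return float(np.exp(-np.sum((x - y) ** 2) / (2.0 * ell ** 2)))

def k_mgl(x, y, ux, uy, ell=1.0):
    # ux, uy: neighborhood labels of x and y
    if ux >= 0 and ux == uy:        # both in the same neighborhood U_i
        return k_q(x, y)
    if ux < 0 and uy < 0:           # both outside every neighborhood
        return k_s(x, y, ell)
    return 0.0                      # mixed pairs decouple
```

Setting mixed pairs to zero makes the GP posterior inside each U_i depend only on the data in that neighborhood, so each local optimum gets its own quadratic model.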

SLIDE 23

Finding convex neighborhoods

  • Data set D = {(x_i, y_i)}
  • U ⊂ D is a convex neighborhood if

    (β₀*, β*, B*) = argmin_{β₀,β,B} Σ_{k: x_k∈U} [ (β₀ + βᵀx_k + ½ x_kᵀB x_k) − y_k ]²

    has a positive definite Hessian B*
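A sketch of this test: fit the local quadratic model by least squares and check whether the estimated Hessian B* is positive definite. For brevity the data is 1D (so B* is a 1×1 matrix); the feature construction generalizes to higher dimensions:

```python
import numpy as np

def quad_features(x):
    # features matching beta0 + beta * x + 0.5 * B * x^2 in 1D
    return np.array([1.0, x, 0.5 * x * x])

def is_convex_neighborhood(xs, ys):
    X = np.array([quad_features(x) for x in xs])
    coef, *_ = np.linalg.lstsq(X, np.array(ys), rcond=None)
    B = np.array([[coef[2]]])                 # 1x1 Hessian estimate
    # positive definite iff all eigenvalues are positive
    return bool(np.all(np.linalg.eigvalsh(B) > 0.0)), coef

xs = np.linspace(-1.0, 1.0, 9)
ok_convex, _ = is_convex_neighborhood(xs, (xs - 0.2) ** 2)   # locally convex data
ok_concave, _ = is_convex_neighborhood(xs, -(xs ** 2))       # concave data
```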

SLIDE 24

A heuristic to decrease length-scale

  • The SE-part still has a length-scale hyperparameter l
  • In each iteration we consider decreasing it to l̃_t < l_{t−1}

    α_{r,t} := α*(l̃_t) / α*(l_{t−1}),   α*(l) = min_x α(x; l)

    for any acquisition function α(x; l)

  • Accept the smaller length-scale only if α_{r,t} ≥ h (e.g., h ≈ 2)
  • Robust to non-stationary objectives

[Figure: counter-example function, and median log10 IR over iterations for correlation adaptation on the counter example, comparing LOO-CV, Alpha Ratio, and Optimal]
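A sketch of computing the alpha ratio. The acquisition here is a stand-in (pure predictive variance of a zero-mean SE-kernel GP), the optimization over x is a grid search, and the data, length-scales, and noise level are all illustrative choices:

```python
import numpy as np

def se_variance(x_star, X, ell, noise=0.1):
    # posterior variance of a zero-mean GP with SE kernel at the points x_star;
    # a non-negligible noise level keeps the ratio of minima well-conditioned
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * ell ** 2))
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(x_star, X)
    return 1.0 - np.sum(Ks * (Ks @ np.linalg.inv(K)), axis=1)

def alpha_ratio(X, ell_new, ell_old, grid):
    # alpha*(l) = min_x alpha(x; l), here with alpha = predictive variance
    a_new = np.min(se_variance(grid, X, ell_new))
    a_old = np.min(se_variance(grid, X, ell_old))
    return a_new / a_old

X = np.array([-1.0, 0.0, 1.0])           # observed inputs (toy data)
grid = np.linspace(-1.0, 1.0, 101)
r = alpha_ratio(X, ell_new=0.25, ell_old=1.0, grid=grid)
accept = r >= 2.0                        # accept the shorter length-scale only if r >= h (h ~ 2)
```

Shrinking the length-scale decorrelates the data and raises the predictive variance everywhere, so the ratio exceeds 1; the threshold h demands that the gain be substantial before the shorter scale is adopted.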

SLIDE 25

Mixed-global-local kernel + alpha ratio

[Figure: median log10 IR over iterations on Quadratic 2D, Rosenbrock, Branin-Hoo, Hartmann 3D, Hartmann 6D, Exponential 3D, Exponential 4D, and Exponential 5D, comparing PES, IMGPO, EI, and EI AR+MGL]

  • PES: Bayesian integration over hyperparameters
  • IMGPO: Bayesian update for hyperparameters in each iteration

SLIDE 26

...work with Kim Wabersich


SLIDE 27

Conclusions

  • Solid optimization methods are the savior of robotics!
  • Rethink the priors we use for BayesOpt

– Local optima with varying conditioning

  • Rethink the objective for choosing hyper parameters

– Maximize optimization progress (∼ expected acquisition) rather than data likelihood


SLIDE 28

Thanks

  • for your attention!
  • to the students:

– Peter Englert (BayesOpt for Manipulation)
– Jens Schreiter (Safe Active Learning)
– Danny Drieß (BayesOpt for Controller Optimization)
– Kim Wabersich (Mixed-global-local kernel & alpha ratio)

  • and my lab:
