Applications of Constrained BayesOpt in Robotics and Rethinking Priors & Hyperparameters



  1. Applications of Constrained BayesOpt in Robotics and Rethinking Priors & Hyperparameters Marc Toussaint Machine Learning & Robotics Lab – University of Stuttgart marc.toussaint@informatik.uni-stuttgart.de NIPS BayesOpt, Dec 2016 1/20

  2. (1) Learning Manipulation Skills ◦ Englert & Toussaint: Combined Optimization and Reinforcement Learning for Manipulation Skills. R:SS’16 2/20

  3. Combined Black-Box and Analytical Optimization
Englert & Toussaint: Combined Optimization and Reinforcement Learning for Manipulation Skills. R:SS’16
• CORL (Combined Optimization and RL):
– Policy parameters w
– analytically known cost function J(w) = E{ Σ_{t=0}^T c_t(x_t, u_t) | w }
– projection, implicitly given by a constraint h(w, θ) = 0
– unknown black-box return function R(θ) ∈ ℝ
– unknown black-box success constraint S(θ) ∈ {0, 1}
– Problem: min_{w,θ} J(w) − R(θ) s.t. h(w, θ) = 0, S(θ) = 1
• Alternate path optimization min_w J(w) s.t. h(w, θ) = 0 with Bayesian optimization max_θ R(θ) s.t. S(θ) = 1
3/20
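
A minimal sketch of this alternation, with all problem-specific pieces passed in as callables; path_opt, rollout, and bayes_opt_step are placeholder names, not the CORL implementation:

    import numpy as np

    def corl_alternation(path_opt, rollout, bayes_opt_step, theta0, n_iters=20):
        """Alternate the analytic inner problem with BayesOpt over theta.

        path_opt(theta)      -> (w_star, J)  solves min_w J(w) s.t. h(w, theta) = 0
        rollout(w)           -> (R, S)       black-box return and success flag
        bayes_opt_step(data) -> theta_next   constrained-BayesOpt proposal from data
        All three are supplied by the caller.
        """
        theta, data = theta0, []
        for _ in range(n_iters):
            w_star, _ = path_opt(theta)          # analytic path optimization
            R, S = rollout(w_star)               # expensive robot rollout
            data.append((theta, R, S))
            theta = bayes_opt_step(data)         # next candidate parameters
        successes = [d for d in data if d[2]] or data
        return max(successes, key=lambda d: d[1])  # best (theta, R, S) seen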

  4. Heuristic to handle constraints
• Prior mean μ = 2 for g
• Sample only points s.t. g(x) ≤ 0
• Acquisition function combines PI with Boundary Uncertainty:
α_PIBU(x) = [g(x) ≥ 0] PI_f(x) + [g(x) = 0] β σ²_g(x)
4/20
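
A hedged sketch of this acquisition over two GP models (one for f, one for g). Evaluating the indicators on the GP mean of g, the eps-based near-boundary test, and a scikit-learn-style predict(X, return_std=True) interface for gp_f/gp_g are all assumptions, not details from the slide:

    import numpy as np
    from scipy.stats import norm

    def alpha_pibu(X_cand, gp_f, gp_g, f_best, beta=1.0, eps=0.05):
        """PI on the objective plus boundary uncertainty on the constraint."""
        mu_f, sig_f = gp_f.predict(X_cand, return_std=True)
        mu_g, sig_g = gp_g.predict(X_cand, return_std=True)
        pi_f = norm.cdf((f_best - mu_f) / np.maximum(sig_f, 1e-9))  # PI for minimization
        ind_pi = (mu_g >= 0.0)               # stand-in for the indicator [g(x) >= 0]
        ind_boundary = (np.abs(mu_g) < eps)  # stand-in for [g(x) = 0]
        return ind_pi * pi_f + ind_boundary * beta * sig_g ** 2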

  5. (2) Optimizing Controller Parameters ◦ Drieß, Englert & Toussaint: Constrained Bayesian Optimization of Combined Interaction Force/Task Space Controllers for Manipulations. IROS Workshop’16 5/20

  6. Controller Details
• Non-switching controller for smoothly establishing contacts
– In (each) task space: ÿ* = ÿ_ref + K_p (y_ref − y) + K_d (ẏ_ref − ẏ)
– Operational space controller (linearized): q̈* = K̄_p q + K̄_d q̇ + k̄ with
K̄_p = (H + J^⊤ C J)⁻¹ [H K^q_p + J^⊤ C K_p J]
K̄_d = (H + J^⊤ C J)⁻¹ [H K^q_d + J^⊤ C K_d J]
k̄ = (H + J^⊤ C J)⁻¹ [H k^q + J^⊤ C k]
– Contact force limit control: e ← γ e + [|f| > |f_ref|] (f_ref − f),  u = J^⊤ α e
• Many parameters! Esp. α, ẏ_ref, K_d
6/20
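
A minimal NumPy sketch of the linearized gain computation above; H, J, C and the joint-space/task-space gains are assumed given with compatible shapes:

    import numpy as np

    def operational_space_gains(H, J, C, Kp, Kd, k, Kp_q, Kd_q, k_q):
        """Return (Kp_bar, Kd_bar, k_bar) as on the slide:
        Kp_bar = (H + J^T C J)^-1 [H Kp_q + J^T C Kp J], etc."""
        A = H + J.T @ C @ J
        Kp_bar = np.linalg.solve(A, H @ Kp_q + J.T @ C @ Kp @ J)
        Kd_bar = np.linalg.solve(A, H @ Kd_q + J.T @ C @ Kd @ J)
        k_bar = np.linalg.solve(A, H @ k_q + J.T @ C @ k)
        return Kp_bar, Kd_bar, k_bar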

  7. Optimizing Controller Parameters
• Optimization objectives:
– Low compliance: tr(K̄_p) and tr(K̄_d)
– Contact force error: ∫ (f_ref − f)² dt
– Peak force on onset: |f_os|
– Smooth force profile: ∫ ( |d f(t)/dt| + |d² f(t)/dt²| + |d³ f(t)/dt³| ) dt
– Boolean success: contact and staying in contact
7/20

  8. Optimizing Controller Parameters
• Optimization objectives:
– Low compliance: tr(K̄_p) and tr(K̄_d)
– Contact force error: ∫ (f_ref − f)² dt
– Peak force on onset: |f_os|
– Smooth force profile: ∫ ( |d f(t)/dt| + |d² f(t)/dt²| + |d³ f(t)/dt³| ) dt
– Boolean success: contact and staying in contact
• Establishing contact
• Sliding
7/20
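
A discretized sketch of the force-related objectives for a sampled force signal f with timestep dt; the onset window used for the peak force is an assumption:

    import numpy as np

    def force_objectives(f, f_ref, dt, onset_window=50):
        """Return (contact force error, peak onset force, smoothness penalty)."""
        err = dt * np.sum((f_ref - f) ** 2)          # ~ integral of (f_ref - f)^2 dt
        f_os = np.max(np.abs(f[:onset_window]))      # |f_os| in a window after contact onset
        d1 = np.gradient(f, dt)                      # finite-difference derivatives
        d2 = np.gradient(d1, dt)
        d3 = np.gradient(d2, dt)
        smooth = dt * np.sum(np.abs(d1) + np.abs(d2) + np.abs(d3))
        return err, f_os, smooth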

  9. (3) Safe Active Learning & BayesOpt
• SafeOpt: Safety threshold on the objective f(x) ≥ h
◦ Sui, Gotovos, Burdick, Krause: Safe Exploration for Optimization with Gaussian Processes. ICML’15
8/20

  10. (3) Safe Active Learning & BayesOpt
• SafeOpt: Safety threshold on the objective f(x) ≥ h
◦ Sui, Gotovos, Burdick, Krause: Safe Exploration for Optimization with Gaussian Processes. ICML’15
• Guarantee to never step outside an unknown g(x) ≤ 0 ...
– Impossible when no failure data g(x) > 0 exists...
8/20

  11. (3) Safe Active Learning & BayesOpt
• SafeOpt: Safety threshold on the objective f(x) ≥ h
◦ Sui, Gotovos, Burdick, Krause: Safe Exploration for Optimization with Gaussian Processes. ICML’15
• Guarantee to never step outside an unknown g(x) ≤ 0 ...
– Impossible when no failure data g(x) > 0 exists...
– Unless you assume observation of near-boundary discriminative values
◦ Schreiter et al.: Safe Exploration for Active Learning with Gaussian Processes. ECML’15
8/20

  12. Probabilistic guarantees on non-failure
• Acquisition function: α(x) = σ²_f(x) s.t. μ_g(x) + ν σ_g(x) ≥ 0
• Specify the probability of failure δ after n points with m_0 initializations ↦ ν
• Application on cart-pole
9/20
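
A sketch of maximizing this acquisition over a finite candidate set, with the safety condition written as on the slide; gp_f and gp_g are assumed to expose a scikit-learn-style predict(X, return_std=True):

    import numpy as np

    def safe_acquisition_argmax(X_cand, gp_f, gp_g, nu):
        """Pick the most informative candidate that the constraint model certifies safe."""
        _, sig_f = gp_f.predict(X_cand, return_std=True)
        mu_g, sig_g = gp_g.predict(X_cand, return_std=True)
        safe = mu_g + nu * sig_g >= 0.0          # safety condition from the slide
        if not np.any(safe):
            raise RuntimeError("no candidate certified safe under the GP model")
        scores = np.where(safe, sig_f ** 2, -np.inf)
        return X_cand[np.argmax(scores)]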

  13. So, what are the issues? 10/20

  14. So, what are the issues? – Choice of hyperparameters! 10/20

  15. So, what are the issues? – Choice of hyperparameters! – Stationary covariance functions! 10/20

  16. So, what are the issues? – Choice of hyperparameters! – Isotropic stationary covariance functions! 10/20

  17. • Actually, I’m a fan of Newton Methods 11/20

  18. • Actually, I’m a fan of Newton Methods
• Two messages of classical (convex) optimization:
– Step size (line search, trust region, Wolfe)
– Step direction (Newton, quasi-Newton, BFGS, conjugate, covariant)
• Newton methods are perfect for going downhill to a local optimum
11/20
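
A generic sketch (not code from the talk) of these two ingredients: a damped Newton step direction plus a backtracking (Armijo) line search for the step size:

    import numpy as np

    def newton_minimize(f, grad, hess, x0, tol=1e-8, max_iter=100):
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:
                break
            H = hess(x)
            lam = 1e-6                       # Levenberg-style damping of the Hessian
            while True:
                try:
                    d = np.linalg.solve(H + lam * np.eye(len(x)), -g)
                    if g @ d < 0:            # accept only descent directions
                        break
                except np.linalg.LinAlgError:
                    pass
                lam *= 10.0
            t = 1.0                          # backtracking line search (Armijo condition)
            while f(x + t * d) > f(x) + 1e-4 * t * (g @ d) and t > 1e-12:
                t *= 0.5
            x = x + t * d
        return x

    # e.g. newton_minimize(lambda x: x @ x, lambda x: 2 * x,
    #                      lambda x: 2 * np.eye(len(x)), np.ones(3))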

  19. Model-based Optimization
• If the model is not given: classical model-based optimization (Nocedal et al., “Derivative-free optimization”)
1: Initialize D with at least ½ (n+1)(n+2) data points
2: repeat
3:   Compute a regression f̂(x) = φ₂(x)^⊤ β on D
4:   Compute x⁺ = argmin_x f̂(x) s.t. |x − x̂| < α
5:   Compute the improvement ratio ϱ = [f(x̂) − f(x⁺)] / [f̂(x̂) − f̂(x⁺)]
6:   if ϱ > ε then
7:     Increase the stepsize α
8:     Accept x̂ ← x⁺
9:     Add to data, D ← D ∪ {(x⁺, f(x⁺))}
10:  else
11:    if det(D) is too small then  // data improvement
12:      Compute x⁺ = argmax_x det(D ∪ {x}) s.t. |x − x̂| < α
13:      Add to data, D ← D ∪ {(x⁺, f(x⁺))}
14:    else
15:      Decrease the stepsize α
16:    end if
17:  end if
18:  Prune the data, e.g., remove argmax_{x ∈ D} det(D \ {x})
19: until x̂ converges
12/20
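
A sketch of steps 3–4 of this loop (fit a full quadratic surrogate by least squares, then minimize it inside the trust region), assuming NumPy/SciPy; the accept/reject, data-improvement, and pruning logic is left out:

    import numpy as np
    from itertools import combinations_with_replacement
    from scipy.optimize import minimize

    def phi2(x):
        """Quadratic feature map [1, x_i, x_i*x_j]; length (n+1)(n+2)/2."""
        x = np.atleast_1d(x)
        quad = [x[i] * x[j] for i, j in combinations_with_replacement(range(len(x)), 2)]
        return np.concatenate(([1.0], x, quad))

    def model_step(X, y, x_hat, alpha):
        """Fit f_hat(x) = phi2(x)^T beta on (X, y) and minimize it within |x - x_hat| < alpha."""
        Phi = np.array([phi2(x) for x in X])
        beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        f_hat = lambda x: phi2(x) @ beta
        res = minimize(f_hat, x_hat,
                       constraints=[{"type": "ineq",
                                     "fun": lambda x: alpha - np.linalg.norm(x - x_hat)}])
        return res.x, f_hat

The returned candidate x⁺ and surrogate f̂ then feed the improvement ratio ϱ in step 5, which decides whether to accept the step and grow or shrink α.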

  20. This is similar to BayesOpt with polynomial kernel! 13/20

  21. A prior about local polynomial optima
• Assume that the objective has multiple local optima
– Local optimum: locally convex
– Each local optimum might be differently conditioned
→ we need a highly non-stationary, non-isotropic covariance function
• “Between” the local optima, the function is smooth
→ standard squared exponential kernel
14/20

  22. A prior about local polynomial optima
• Assume that the objective has multiple local optima
– Local optimum: locally convex
– Each local optimum might be differently conditioned
→ we need a highly non-stationary, non-isotropic covariance function
• “Between” the local optima, the function is smooth
→ standard squared exponential kernel
• The mixed-global-local kernel:
k_MGL(x, x′) = k_q(x, x′) if x, x′ ∈ U_i;  k_s(x, x′) if x ∉ U_i and x′ ∉ U_j for any i, j;  0 else
with k_q(x, x′) = (x^⊤ x′ + 1)²
14/20
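
A sketch of this piecewise kernel as stated on the slide; the neighborhood lookup region_of(x) (returning the index of the convex neighborhood containing x, or None) is a placeholder supplied by the caller, and k_s is a standard squared exponential:

    import numpy as np

    def k_quad(x, xp):
        return (x @ xp + 1.0) ** 2                       # k_q(x, x') = (x^T x' + 1)^2

    def k_se(x, xp, ell=1.0, var=1.0):
        return var * np.exp(-0.5 * np.sum((x - xp) ** 2) / ell ** 2)

    def k_mgl(x, xp, region_of, ell=1.0):
        i, j = region_of(x), region_of(xp)
        if i is not None and i == j:
            return k_quad(x, xp)        # both points in the same local neighborhood
        if i is None and j is None:
            return k_se(x, xp, ell)     # both points "between" the optima
        return 0.0                      # mixed case: no correlation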

  23. Finding convex neighborhoods
• Data set D = {(x_i, y_i)}
• U ⊂ D is a convex neighborhood if
{β*_0, β*, B*} = argmin_{β_0, β, B} Σ_{k: x_k ∈ U} ( (β_0 + β^⊤ x_k + ½ x_k^⊤ B x_k) − y_k )²
has a positive definite Hessian B*
15/20
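
A sketch of this test: fit the full quadratic by least squares on the subset U and check the eigenvalues of the recovered Hessian (plain NumPy, nothing beyond the slide's formula assumed):

    import numpy as np
    from itertools import combinations_with_replacement

    def fit_quadratic(X, y):
        """Least-squares fit of beta0 + beta^T x + 0.5 x^T B x; returns (beta0, beta, B)."""
        n = X.shape[1]
        idx = list(combinations_with_replacement(range(n), 2))
        feats = [np.concatenate(([1.0], x, [x[i] * x[j] for i, j in idx])) for x in X]
        coef, *_ = np.linalg.lstsq(np.array(feats), y, rcond=None)
        beta0, beta, q = coef[0], coef[1:n + 1], coef[n + 1:]
        B = np.zeros((n, n))
        for c, (i, j) in zip(q, idx):
            if i == j:
                B[i, i] = 2.0 * c          # c * x_i^2   ->  0.5 * B_ii * x_i^2
            else:
                B[i, j] = B[j, i] = c      # c * x_i x_j ->  0.5 * (B_ij + B_ji) x_i x_j
        return beta0, beta, B

    def is_convex_neighborhood(X, y):
        _, _, B = fit_quadratic(X, y)
        return bool(np.all(np.linalg.eigvalsh(B) > 0.0))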

  24. A heuristic to decrease length-scale
• The SE part still has a length-scale hyperparameter ℓ
• In each iteration we consider decreasing it to ℓ̃_t < ℓ_{t−1}:
α_{r,t} := α*(ℓ̃_t) / α*(ℓ_{t−1}),  where α*(ℓ) = min_x α(x; ℓ),  for any acquisition function α(x; ℓ)
• Accept the smaller length-scale only if α_{r,t} ≥ h (e.g., h ≈ 2)
• Robust to non-stationary objectives
[Figure: “Correlation adaption: Counter example”: the counter-example function (y over x) and median log10 IR over iterations for LOO-CV, Alpha Ratio, and the optimal length-scale]
16/20
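
A sketch of the accept/reject test, with the best acquisition value computed on a fixed candidate grid; using the predictive standard deviation as a stand-in for the generic acquisition α(x; ℓ) and scikit-learn's GP with a fixed RBF length-scale are assumed implementation choices, not details from the slide:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def best_acquisition(X, y, X_cand, ell):
        """Best acquisition value on X_cand under length-scale ell (here: max predictive std)."""
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=ell), optimizer=None)
        gp.fit(X, y)
        _, sig = gp.predict(X_cand, return_std=True)
        return np.max(sig)

    def maybe_shrink_lengthscale(X, y, X_cand, ell_prev, shrink=0.5, h=2.0):
        """Accept the smaller length-scale only if the acquisition ratio reaches h."""
        ell_tilde = shrink * ell_prev
        ratio = (best_acquisition(X, y, X_cand, ell_tilde)
                 / best_acquisition(X, y, X_cand, ell_prev))
        return (ell_tilde if ratio >= h else ell_prev), ratio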

  25. Mixed-global-local kernel + alpha ratio
[Figure: median log10 IR over iterations on the Quadratic 2D, Rosenbrock, Branin-Hoo, Hartmann 3D, Hartmann 6D, and Exponential 3D/4D/5D benchmarks, comparing PES, IMGPO, EI, and EI with AR+MGL]
• PES: Bayesian integration over hyperparameters
• IMGPO: Bayesian update for hyperparameters in each iteration
17/20

  26. ...work with Kim Wabersich 18/20

  27. Conclusions
• Solid optimization methods are the savior of robotics!
• Rethink the priors we use for BayesOpt
– Local optima with varying conditioning
• Rethink the objective for choosing hyperparameters
– Maximize optimization progress (∼ expected acquisition) rather than data likelihood
19/20

  28. Thanks
• for your attention!
• to the students:
– Peter Englert (BayesOpt for Manipulation)
– Jens Schreiter (Safe Active Learning)
– Danny Drieß (BayesOpt for Controller Optimization)
– Kim Wabersich (Mixed-global-local kernel & alpha ratio)
• and my lab:
20/20
