Applications of Constrained BayesOpt in Robotics and Rethinking Priors & Hyperparameters
Marc Toussaint
Machine Learning & Robotics Lab, University of Stuttgart
marc.toussaint@informatik.uni-stuttgart.de
NIPS BayesOpt, Dec 2016
1/20
(1) Learning Manipulation Skills
- Englert & Toussaint: Combined Optimization and Reinforcement Learning for Manipulation Skills. R:SS’16
2/20
Combined Black-Box and Analytical Optimization
Englert & Toussaint: Combined Optimization and Reinforcement Learning for Manipulation Skills. R:SS’16
- CORL (Combined Optimization and RL):
  – Policy parameters w
  – Analytically known cost function J(w) = E{ Σ_{t=0}^{T} c_t(x_t, u_t) | w }
  – Projection, implicitly given by a constraint h(w, θ) = 0
  – Unknown black-box return function R(θ) ∈ ℝ
  – Unknown black-box success constraint S(θ) ∈ {0, 1}
  – Problem:  min_{w,θ} J(w) − R(θ)   s.t.   h(w, θ) = 0,  S(θ) = 1
- Alternate between path optimization  min_w J(w)  s.t.  h(w, θ) = 0
  and Bayesian optimization  max_θ R(θ)  s.t.  S(θ) = 1  (sketch below)
3/20
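As a rough illustration of the alternation (not the authors' code), a minimal Python sketch with placeholder stand-ins for the analytical path optimizer, the robot rollout, and the constrained-BayesOpt update of θ; all function names and the toy objective are assumptions made so the sketch runs.

```python
import numpy as np

# --- Hypothetical stand-ins (not from the paper) --------------------------
def solve_path_problem(theta):
    """Analytical inner problem: min_w J(w) s.t. h(w, theta) = 0.
    Placeholder: pretend the constrained optimum is linear in theta."""
    w = -0.5 * theta
    J = float(np.sum(w ** 2))
    return w, J

def execute_on_robot(w, theta):
    """Black-box rollout: returns (return R, success S). Placeholder."""
    R = -float(np.sum((theta - 1.0) ** 2)) - 0.01 * float(np.sum(w ** 2))
    S = bool(np.all(np.abs(theta) < 2.0))
    return R, S

def constrained_bayesopt_step(data):
    """One constrained-BayesOpt proposal for theta given (theta, R, S) data.
    Placeholder: perturb the best successful sample."""
    successful = [(t, r) for t, r, s in data if s]
    if not successful:
        return np.random.uniform(-2, 2, size=2)
    best_theta, _ = max(successful, key=lambda x: x[1])
    return best_theta + 0.3 * np.random.randn(*best_theta.shape)

# --- CORL-style alternation (outer loop) -----------------------------------
data = []
theta = np.random.uniform(-2, 2, size=2)
for it in range(20):
    w, J = solve_path_problem(theta)          # analytical optimization given theta
    R, S = execute_on_robot(w, theta)         # black-box return and success
    data.append((theta, R, S))
    theta = constrained_bayesopt_step(data)   # BayesOpt update of theta
print("best return:", max(r for _, r, s in data if s))
```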
Heuristic to handle constraints
- Prior mean µ = 2 for g
- Sample only points s.t. g(x) ≤ 0
- Acquisition function combines PI with Boundary Uncertainty (sketch below):
  α_PIBU(x) = [g(x) ≥ 0] · PI_f(x) + [g(x) = 0] · β σ_g²(x)
4/20
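A minimal sketch of this acquisition, assuming the indicator brackets are evaluated on the GP posterior mean of g, that "[g(x) = 0]" means "within a small tolerance of the constraint boundary", and that PI is the standard probability of improvement for minimization; the function name, the tolerance `tol`, and these readings are assumptions, not the paper's definitions.

```python
import numpy as np
from scipy.stats import norm

def pi_bu_acquisition(mu_f, sigma_f, mu_g, sigma_g, f_best, beta=1.0, tol=0.1):
    """PI + Boundary-Uncertainty acquisition (sketch).
    mu_f, sigma_f : GP posterior of the objective at candidate points
    mu_g, sigma_g : GP posterior of the constraint g (prior mean 2, i.e. 'infeasible')
    """
    # probability of improving over the incumbent f_best (minimization)
    pi_f = norm.cdf((f_best - mu_f) / np.maximum(sigma_f, 1e-9))
    feasible_side = (mu_g >= 0).astype(float)        # literal reading of [g(x) >= 0]
    near_boundary = (np.abs(mu_g) < tol).astype(float)  # reading of [g(x) = 0]
    return feasible_side * pi_f + near_boundary * beta * sigma_g ** 2
```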
(2) Optimizing Controller Parameters
- Drieß, Englert & Toussaint: Constrained Bayesian Optimization of Combined Interaction Force/Task Space Controllers for Manipulations. IROS Workshop’16
5/20
Controller Details
- Non-switching controller for smoothly establishing contacts
  – In (each) task space:  ÿ* = ÿ_ref + K_p (y_ref − y) + K_d (ẏ_ref − ẏ)
  – Operational space controller (linearized):  q̈* = K̄_p q + K̄_d q̇ + k̄
      K̄_p = (H + Jᵀ C J)⁻¹ [H K^q_p + Jᵀ C K_p J]
      K̄_d = (H + Jᵀ C J)⁻¹ [H K^q_d + Jᵀ C K_d J]
      k̄  = (H + Jᵀ C J)⁻¹ [H k^q + Jᵀ C k]
  – Contact force limit control:  e ← γ e + [|f| > |f_ref|] (f_ref − f),   u = Jᵀ α e
- Many parameters! Esp. α, ẏ_ref, K_d  (gain computation sketched below)
6/20
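A small numpy sketch of the linearized gain computation above; the matrix names and shapes (H joint-space inertia, J task Jacobian, C task-space weighting, K^q joint-space gains, K task-space gains) are my reading of the slide.

```python
import numpy as np

def operational_space_gains(H, J, C, Kp_task, Kd_task, Kp_joint, Kd_joint, k_task, k_joint):
    """Gains of the linearized operational-space controller qdd* = Kbar_p q + Kbar_d qd + kbar."""
    A = H + J.T @ C @ J
    Kbar_p = np.linalg.solve(A, H @ Kp_joint + J.T @ C @ Kp_task @ J)
    Kbar_d = np.linalg.solve(A, H @ Kd_joint + J.T @ C @ Kd_task @ J)
    kbar   = np.linalg.solve(A, H @ k_joint + J.T @ C @ k_task)
    return Kbar_p, Kbar_d, kbar

# toy usage with random kinematics (7 joints, 3-D task space)
n, m = 7, 3
H = np.eye(n); J = np.random.randn(m, n); C = 10.0 * np.eye(m)
Kbar_p, Kbar_d, kbar = operational_space_gains(
    H, J, C, 100 * np.eye(m), 20 * np.eye(m), 5 * np.eye(n), 1 * np.eye(n),
    np.zeros(m), np.zeros(n))
```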
Optimizing Controller Parameters
- Optimization objectives (see the sketch after this list):
  – Low compliance: tr(K̄_p) and tr(K̄_d)
  – Contact force error: ∫ (f_ref − f)² dt
  – Peak force on onset: |f_os|
  – Smooth force profile: ∫ (|df/dt| + |d²f/dt²| + |d³f/dt³|) dt
  – Boolean success: making contact and staying in contact
- Establishing contact
- Sliding
7/20
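A hedged sketch of how these objectives could be evaluated from a sampled force trajectory; the finite-difference smoothness term, the onset window, and the success test are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def controller_objectives(f, f_ref, dt, Kbar_p, Kbar_d, in_contact):
    """f, f_ref: arrays of measured / reference contact force over time;
    in_contact: boolean array; Kbar_p, Kbar_d: effective gain matrices."""
    compliance = np.trace(Kbar_p) + np.trace(Kbar_d)          # low-compliance term
    force_error = np.sum((f_ref - f) ** 2) * dt                # integral of squared error
    onset = np.argmax(in_contact) if in_contact.any() else 0
    peak_onset = (np.max(np.abs(f[onset:onset + int(0.2 / dt)]))  # force peak in an
                  if in_contact.any() else 0.0)                   # assumed 0.2 s onset window
    d1 = np.gradient(f, dt); d2 = np.gradient(d1, dt); d3 = np.gradient(d2, dt)
    smoothness = np.sum(np.abs(d1) + np.abs(d2) + np.abs(d3)) * dt
    success = bool(in_contact.any() and in_contact[onset:].all())  # contact made and kept
    return compliance, force_error, peak_onset, smoothness, success
```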
(3) Safe Active Learning & BayesOpt
- SAFEOPT: Safety threshold on the objective, f(x) ≥ h
  Sui, Gotovos, Burdick, Krause: Safe Exploration for Optimization with Gaussian Processes. ICML’15
- Guarantee to never step outside an unknown constraint g(x) ≤ 0...
  – Impossible when no failure data g(x) > 0 exists...
  – Unless you assume observation of near-boundary discriminative values
  Schreiter et al: Safe Exploration for Active Learning with Gaussian Processes. ECML’15
8/20
Probabilistic guarantees on non-failure
- Acquisition function (sketch below):
  α(x) = σ_f²(x)   s.t.   µ_g(x) + ν σ_g(x) ≥ 0
- Specify probability of failure δ after n points with m₀ initializations → ν
- Application on cart-pole
9/20
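A minimal sketch of this safe acquisition: maximize the objective variance σ_f²(x) over candidates whose GP posterior on g satisfies µ_g + ν σ_g ≥ 0. The GP posteriors are passed in as plain callables; how ν is derived from δ, n, and m₀ follows the paper and is not reproduced here.

```python
import numpy as np

def safe_active_learning_step(candidates, gp_f_var, gp_g_mean, gp_g_std, nu):
    """Pick the next query point among 'candidates' (array of inputs).
    gp_f_var(x): posterior variance of the objective f,
    gp_g_mean(x), gp_g_std(x): posterior mean/std of the safety function g."""
    mu_g = gp_g_mean(candidates)
    sd_g = gp_g_std(candidates)
    safe = mu_g + nu * sd_g >= 0.0            # safety condition from the slide
    if not np.any(safe):
        return None                           # no certified-safe candidate
    var_f = np.where(safe, gp_f_var(candidates), -np.inf)
    return candidates[np.argmax(var_f)]       # most informative safe point
```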
So, what are the issues?
– Choice of hyperparameters!
– Stationary covariance functions!
– Isotropic stationary covariance functions!
10/20
- Actually, I’m a fan of Newton methods
- Two messages of classical (convex) optimization:
  – Step size (line search, trust region, Wolfe)
  – Step direction (Newton, quasi-Newton, BFGS, conjugate, covariant)
- Newton methods are perfect for running down-hill into a local optimum
11/20
Model-based Optimization
- If the model is not given: classical model-based optimization (Nocedal et al., “Derivative-free optimization”); a runnable sketch follows the pseudocode below
 1: Initialize D with at least ½(n+1)(n+2) data points
 2: repeat
 3:   Compute a regression f̂(x) = φ₂(x)ᵀβ on D
 4:   Compute x⁺ = argmin_x f̂(x) s.t. |x − x̂| < α
 5:   Compute the improvement ratio ϱ = [f(x̂) − f(x⁺)] / [f̂(x̂) − f̂(x⁺)]
 6:   if ϱ > ε then
 7:     Increase the stepsize α
 8:     Accept x̂ ← x⁺
 9:     Add to data, D ← D ∪ {(x⁺, f(x⁺))}
10:   else
11:     if det(D) is too small then   // data improvement
12:       Compute x⁺ = argmax_x det(D ∪ {x}) s.t. |x − x̂| < α
13:       Add to data, D ← D ∪ {(x⁺, f(x⁺))}
14:     else
15:       Decrease the stepsize α
16:     end if
17:   end if
18:   Prune the data, e.g., remove argmax_{x∈D} det(D \ {x})
19: until x converges
12/20
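A runnable toy version of the loop above, assuming a 2-D problem; the poisedness test "det(D)" is replaced here by the determinant of the feature Gram matrix, and the inner argmin is done by random sampling inside the trust region — both are simplifications, not the book's algorithm.

```python
import numpy as np

def phi2(x):
    """Quadratic features for 2-D x: [1, x1, x2, x1^2, x1*x2, x2^2]."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2])

def model_based_optimize(f, x0, alpha=0.5, eps=0.1, iters=50):
    rng = np.random.default_rng(0)
    x_hat = np.asarray(x0, float)
    # step 1: initialize with at least (n+1)(n+2)/2 points (n = 2 -> 6 points)
    X = [x_hat + 0.1 * rng.standard_normal(2) for _ in range(6)]
    y = [f(x) for x in X]
    for _ in range(iters):
        Phi = np.array([phi2(x) for x in X])
        beta, *_ = np.linalg.lstsq(Phi, np.array(y), rcond=None)     # step 3: fit f_hat
        # step 4: minimize the model in the trust region |x - x_hat| < alpha (by sampling)
        cand = x_hat + alpha * rng.uniform(-1, 1, size=(200, 2))
        x_plus = cand[np.argmin([phi2(c) @ beta for c in cand])]
        num = f(x_hat) - f(x_plus)                                   # step 5: improvement ratio
        den = phi2(x_hat) @ beta - phi2(x_plus) @ beta
        rho = num / den if abs(den) > 1e-12 else 0.0
        if rho > eps:                               # steps 6-9: accept and grow the step
            alpha *= 1.5
            x_hat = x_plus
            X.append(x_plus); y.append(f(x_plus))
        else:
            if np.linalg.det(Phi.T @ Phi) < 1e-6:   # steps 11-13: improve the data geometry
                X.append(x_hat + alpha * rng.uniform(-1, 1, 2))
                y.append(f(X[-1]))
            else:                                   # step 15: shrink the step
                alpha *= 0.5
        if len(X) > 30:                             # step 18: prune old data
            X.pop(0); y.pop(0)
    return x_hat

x_opt = model_based_optimize(lambda x: (x[0] - 1) ** 2 + 10 * (x[1] + 0.5) ** 2, [0.0, 0.0])
```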
This is similar to BayesOpt with a polynomial kernel!
13/20
A prior about local polynomial optima
- Assume that the objective has multiple local optima
  – Local optimum: locally convex
  – Each local optimum might be differently conditioned → we need a highly non-stationary, non-isotropic covariance function
- “Between” the local optima, the function is smooth → standard squared-exponential kernel
- The mixed-global-local kernel (sketch below):
  k_MGL(x, x′) = { k_q(x, x′)  if x, x′ ∈ U_i
                   k_s(x, x′)  if x ∉ U_i and x′ ∉ U_j for any i, j
                   0           else }
  with k_q(x, x′) = (xᵀx′ + 1)²
14/20
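A sketch of k_MGL, assuming for illustration that the neighborhoods U_i can be represented as simple (center, radius) balls; the slides instead define them via locally convex quadratic fits (next slide).

```python
import numpy as np

def k_se(x, xp, length_scale=0.3):
    """Squared-exponential kernel: the smooth, global part k_s."""
    return np.exp(-np.sum((x - xp) ** 2) / (2.0 * length_scale ** 2))

def k_quad(x, xp):
    """Quadratic kernel (x^T x' + 1)^2 used inside a local-optimum neighborhood."""
    return (x @ xp + 1.0) ** 2

def neighborhood_index(x, neighborhoods):
    """Index of the neighborhood containing x, or -1 (ball model is an assumption)."""
    for i, (center, radius) in enumerate(neighborhoods):
        if np.linalg.norm(x - center) <= radius:
            return i
    return -1

def k_mgl(x, xp, neighborhoods):
    """Mixed-global-local kernel from the slide."""
    i, j = neighborhood_index(x, neighborhoods), neighborhood_index(xp, neighborhoods)
    if i >= 0 and i == j:          # both in the same local neighborhood: quadratic
        return k_quad(x, xp)
    if i < 0 and j < 0:            # both in the global region: squared exponential
        return k_se(x, xp)
    return 0.0                     # mixed case: zero covariance

# tiny usage
U = [(np.array([0.5, 0.5]), 0.2)]
print(k_mgl(np.array([0.5, 0.55]), np.array([0.45, 0.5]), U))
```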
Finding convex neighborhoods
- Data set D = {(x_i, y_i)}
- U ⊂ D is a convex neighborhood if
  {β₀*, β*, B*} = argmin_{β₀,β,B} Σ_{k: x_k ∈ U} ( (β₀ + βᵀ x_k + ½ x_kᵀ B x_k) − y_k )²
  has a positive definite Hessian B*  (sketch below)
15/20
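A direct least-squares reading of this definition: fit β₀, β, B to the points in U and test whether B is positive definite. The feature parametrization is standard; nothing here is specific to the authors' code.

```python
import numpy as np

def is_convex_neighborhood(X, y):
    """X: (m, n) points of the candidate neighborhood U, y: (m,) function values.
    Fits beta0 + beta^T x + 0.5 x^T B x by least squares and checks B > 0."""
    m, n = X.shape
    cols = [np.ones(m)] + [X[:, i] for i in range(n)]
    idx = []
    for i in range(n):
        for j in range(i, n):
            scale = 0.5 if i == j else 1.0      # 0.5 x^T B x with symmetric B
            cols.append(scale * X[:, i] * X[:, j])
            idx.append((i, j))
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    B = np.zeros((n, n))
    for (i, j), c in zip(idx, coef[1 + n:]):
        B[i, j] = B[j, i] = c
    return bool(np.all(np.linalg.eigvalsh(B) > 0)), B

# tiny usage: a convex quadratic should be recognized as such
X = np.random.randn(30, 2)
y = 1.0 + X @ np.array([0.2, -0.1]) + np.sum(X ** 2, axis=1)
print(is_convex_neighborhood(X, y)[0])   # True
```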
A heuristic to decrease length-scale
- The SE part still has a length-scale hyperparameter l
- In each iteration we consider decreasing it to l̃_t < l_{t−1}:
  α_{r,t} := α*(l̃_t) / α*(l_{t−1}),   with α*(l) = min_x α(x; l)
  for any acquisition function α(x; l)
- Accept the smaller length-scale only if α_{r,t} ≥ h  (e.g., h ≈ 2; sketch below)
- Robust to non-stationary objectives
[Figure: counter-example function (x–y surface) and median log10 immediate regret over iterations, comparing correlation adaptation by LOO-CV, by the alpha ratio, and the optimal length-scale]
16/20
Mixed-global-local kernel + alpha ratio
[Figure: median log10 immediate regret over iterations on Quadratic 2D, Rosenbrock, Branin-Hoo, Hartmann 3D, Hartmann 6D, Exponential 3D, Exponential 4D, and Exponential 5D; methods compared: PES, IMGPO, EI, EI AR+MGL]
- PES: Bayesian integration over hyperparameters
- IMGPO: Bayesian update of the hyperparameters in each iteration
17/20
...work with Kim Wabersich
18/20
Conclusions
- Solid optimization methods are the savior of robotics!
- Rethink the priors we use for BayesOpt
– Local optima with varying conditioning
- Rethink the objective for choosing hyperparameters
– Maximize optimization progress (∼ expected acquisition) rather than data likelihood
19/20
Thanks
- for your attention!
- to the students:
– Peter Englert (BayesOpt for Manipulation)
– Jens Schreiter (Safe Active Learning)
– Danny Drieß (BayesOpt for Controller Optimization)
– Kim Wabersich (Mixed-global-local kernel & alpha ratio)
- and my lab: