From trajectory optimization to inverse KKT and sequential manipulation
Marc Toussaint
Machine Learning & Robotics Lab, University of Stuttgart
marc.toussaint@informatik.uni-stuttgart.de
Zurich, July 2016

- Motivation:
– Combined Task and Motion Planning – Learning Sequential Manipulation from Demonstration
- Approach: Optimization
- Outline
(1) k-order Markov Path Optimization (KOMO)
(2) Learning from demonstration – Inverse KKT
(3) Cooperative Manipulation Learning
(4) Logic-Geometric Programming
(1) k-order Markov Path Optimization (KOMO)
- Actually, there is nothing "novel" about this, except for the specific choice of conventions. Just Newton (∼1700). Still, it generalizes CHOMP and many others...
Conventional Formulation
- Given a time-discrete controlled system x_{t+1} = f(x_t, u_t), minimize

    min ∑_{t=1}^T c_t(x_t, u_t)   s.t.  x_{t+1} = f(x_t, u_t)
– Indirect methods: optimize over u_{0:T−1} → shooting to recover x_{1:T}
– Direct methods: optimize over x_{1:T} subject to existence of u_t
- Standard approaches
– Differential Dynamic Programming, iLQG, Approximate Inference Control
– Newton steps, Gauss-Newton steps
– SQP
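To make the direct-method idea concrete, here is a minimal sketch (not from the talk) on a made-up single-integrator system with quadratic costs, with scipy standing in for the solver: the optimization is over the state path x_{1:T}, and the controls are recovered from the states.

```python
import numpy as np
from scipy.optimize import minimize

# Toy direct method: optimize over the state path x_{1:T}; for this
# trivially invertible system x_{t+1} = x_t + u_t the controls are
# recovered as u_t = x_{t+1} - x_t.  System, cost, and horizon are
# made-up toy choices.
T, x0 = 10, 1.0

def cost(x):
    xs = np.concatenate([[x0], x])
    us = np.diff(xs)                          # recovered controls
    return np.sum(xs[1:]**2) + np.sum(us**2)  # sum_t c_t(x_t, u_t)

res = minimize(cost, np.zeros(T))
x_opt = res.x  # state decays smoothly toward zero
```

With genuinely nonlinear dynamics, x_{t+1} = f(x_t, u_t) would instead appear as an equality constraint on the path.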
KOMO formulation
- We represent x_t in configuration space. → We have k-order Markov dynamics x_t = f(x_{t−k:t−1}, u_{t−1})
- k-order Motion Optimization (KOMO):

    min_x ∑_{t=1}^T f_t(x_{t−k:t})   s.t.  ∀t=1..T :  g_t(x_{t−k:t}) ≤ 0 ,  h_t(x_{t−k:t}) = 0

  for a path x ∈ R^{T×n}, prefix x_{1−k:0}, smooth scalar functions f_t, smooth vector functions g_t and h_t.
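The k-order objective can be sketched numerically; here is a toy k=2 example (made-up τ and paths) where f_t penalizes squared accelerations:

```python
import numpy as np

# Sketch of k=2 transition costs in KOMO convention, summed over the path:
#   f_t(x_{t-2:t}) = ||(x_t - 2 x_{t-1} + x_{t-2}) / tau^2||^2
# tau and the example paths are made-up toy values.
def sos_transition_cost(x, tau=0.1):
    acc = (x[2:] - 2 * x[1:-1] + x[:-2]) / tau**2
    return float(np.sum(acc**2))

x_line = np.linspace(0.0, 1.0, 12).reshape(-1, 1)  # constant velocity: zero cost
x_bent = x_line.copy()
x_bent[6] += 0.1                                   # a kink incurs acceleration cost
```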
KOMO formulation
– The path costs are typically sum-of-squares, e.g.

    f_t(x_{t−k:t}) = ||M(x_t + x_{t−2} − 2x_{t−1})/τ² + F||²_H

– The equality constraints typically represent non-holonomic/non-trivial dynamics and hard task constraints, e.g. h_T(x_T) = φ(x_T) − y*_T
– The inequality constraints typically represent collisions & limits.
The structure of the Hessian
- The Hessian in the inner loop of a constrained solver will contain terms

    ∇²f(x) ,   ∑_j ∇h_j(x) ∇h_j(x)^⊤ ,   ∑_i ∇g_i(x) ∇g_i(x)^⊤

- The efficiency of optimization hinges on whether we can efficiently compute Newton steps with such Hessians!
- Properties, with φ_t(x_{t−k:t}) := (f_t(x_{t−k:t}), g_t(x_{t−k:t}), h_t(x_{t−k:t})), φ(x) = (φ_t(x_{t−k:t}))_{t=1}^T, and J(x) = ∂φ(x)/∂x:
  – The matrix J(x)^⊤ J(x) is banded symmetric with width 2(k+1)n − 1.
  – The Hessian ∇²f(x) is banded symmetric with width 2(k+1)n − 1.
  – The complexity of computing Newton steps is O(T k² n³).
  – Computing a (Gauss-)Newton step in O(T) is "equivalent" to a DDP (Riccati) sweep.
Augmented Lagrangian
- Define the Augmented Lagrangian

    L̂(x) = f(x) + ∑_j κ_j h_j(x) + ∑_i λ_i g_i(x) + ν ∑_j h_j(x)² + µ ∑_i [g_i(x) > 0] g_i(x)²

- Centered updates:

    κ_j ← κ_j + 2ν h_j(x′) ,   λ_i ← max(λ_i + 2µ g_i(x′), 0)

  (Hardly mentioned in the literature; analyzed in:)
Toussaint: A Novel Augmented Lagrangian Approach for Inequalities and Convergent Any-Time Non-Central Updates. arXiv:1412.4329, 2014
- In practice, the first iteration, which uses conventional squared penalties, typically dominates computational cost → hand-tune the scalings of h and g for fast convergence in practice. Later iterations do not change conditioning (!) and make the constraints precise.

Toussaint: KOMO: Newton methods for k-order Markov Constrained Motion Problems. arXiv:1407.0414, 2014
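A minimal sketch of the centered updates on a 1-D toy problem (min x² s.t. x − 1 = 0, with a made-up fixed penalty ν and a closed-form inner minimization); the multiplier κ converges to the true value −2:

```python
# Sketch of the centered Augmented Lagrangian updates on a 1-D toy problem
#   min_x x^2   s.t.   h(x) = x - 1 = 0
# (true optimum x* = 1 with multiplier kappa* = -2).  The fixed penalty nu
# is a made-up choice and the inner minimization is done in closed form.
nu, kappa, x = 1.0, 0.0, 0.0
for _ in range(50):
    # inner step: argmin_x  x^2 + kappa*(x - 1) + nu*(x - 1)^2
    x = (2 * nu - kappa) / (2 + 2 * nu)
    # centered update
    kappa = kappa + 2 * nu * (x - 1)
```

The multiplier error halves per iteration here, so constraints become precise even with a moderate fixed penalty.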
Further Comments
- Unconstrained KOMO is a factor graph → solvable by standard Graph-SLAM solvers (GTSAM). This outperforms CHOMP, TrajOpt by orders of magnitude. (R:SS'16, Boots et al.)
- CHOMP = include only transition costs in the Hessian. Otherwise it's just Newton.
- We can include a large-scale (> k-order) smoothing objective, equivalent to a Gaussian Process prior over the path, still O(T).
- Approximate (fixed Lagrangian) constrained MPC regulator (acMPC) around the path:

    π_t : x_{t−k:t−1} ↦ argmin_{x_{t:t+H}} ∑_{s=t}^{t+H−1} f_s(x_{s−k:s}) + J_{t+H}(x_{t+H−k:t+H−1}) + ϱ ||x_{t+H} − x*_{t+H}||²
        s.t.  ∀s=t..t+H−1 :  g_s(x_{s−k:s}) ≤ 0 ,  h_s(x_{s−k:s}) = 0

Toussaint—in preparation: A tutorial on Newton methods for constrained trajectory optimization and relations to SLAM, Gaussian Process smoothing, and probabilistic inference. Book chapter.
Nathan’s work
- Differential-geometric interpretation. Online MPC.
Ratliff, Toussaint, Bohg, Schaal: On the Fundamental Importance of Gauss-Newton in Motion Optimization. arXiv:1605.09296
Ratliff, Toussaint, Schaal: Understanding the geometry of workspace obstacles in motion optimization. ICRA'15
Doerr, Ratliff, Bohg, Toussaint, Schaal: Direct loss minimization inverse optimal control. R:SS'15
Why care about this?
- Actually we care about higher-level behaviors
– Sequential Manipulation
– Learning/Extracting Manipulation Models from Demonstration
– Reinforcement Learning of Manipulation
– Cooperative Manipulation (IKEA Assembly)
- In all these cases, KOMO became our underlying model of motion
– E.g., we parameterize the objectives f, and learn these parameters – E.g., we view sequential manipulation as logic+KOMO
(2) Learning Manipulation Skills from Single Demonstration
Research Questions
- The policy (the space of possible manipulations) is high-dimensional.
– Learning from a single demonstration and few own trials?
- What is the prior?
- How to generalize? What are the relevant implicit tasks/objectives?
(Inverse Optimal Control)
Sample-efficient (Manipulation) Skill Learning
- Great existing work in policy search
– Stochastic search (CMA, PI2), "trust region" optimization (REPS)
– Bayesian Optimization
– Not many demonstrations on (sequential) manipulation
- These methods are good – but on what level do they apply?
– Sample-efficient only in low-dimensional policies (No Free Lunch)
– Can't we identify more structure in demonstrated manipulations?
– Can't we exploit partial models – e.g. of the robot's own kinematics? (not the environment!)
A more structured Manipulation Learning formulation
Englert & Toussaint: Combined Optimization and Reinforcement Learning for Manipulation Skills. R:SS'16
- CORL:
  – Policy: (controller around a) path x
  – Analytically known cost function f(x) in KOMO convention
  – Projection, implicitly given by a constraint h(x, θ) = 0
  – Unknown black-box return function R(θ) ∈ R
  – Unknown black-box success constraint S(θ) ∈ {0, 1}
  – Problem:   min_{x,θ} f(x) − R(θ)   s.t.  h(x, θ) = 0 ,  S(θ) = 1
- Alternate path optimization  min_x f(x) s.t. h(x, θ) = 0  with Bayesian Optimization  max_θ R(θ) s.t. S(θ) = 1
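The combined problem can be sketched on a toy example (all functions below are made up; the inner path optimization has a closed form, and a grid search stands in for Bayesian optimization over θ):

```python
import numpy as np

# Sketch of the CORL objective on a toy problem; every function here is
# made up: known cost f(x) = x^2, projection constraint h(x, theta) = x - theta,
# black-box return R(theta) = -(theta - 2)^2, success S(theta) = 1 everywhere.
def f(x):
    return x**2

def project(theta):
    # min_x f(x)  s.t.  h(x, theta) = 0   has the closed-form solution x = theta
    return theta

def R(theta):
    return -(theta - 2.0)**2

# Grid search stands in for Bayesian optimization over theta
thetas = np.linspace(-3.0, 3.0, 601)
scores = np.array([f(project(t)) - R(t) for t in thetas])
theta_star = thetas[np.argmin(scores)]     # combined optimum at theta = 1
```

The optimum trades off the known motion cost f against the black-box return R, which is the point of the combined formulation.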
Caveat
- The projection h, which defines θ, needs to be known!
- But is this really so unclear?
  – θ should capture all aspects we do not know a priori in the KOMO problem
  – We assume the robot's own kinematics/dynamics and control costs are known
  – What is not known is how to interact with the environment
  – θ captures the interaction parameters: points of contact, amount of rotation/movement of external DOFs
And Generalization?
- The above reinforces a single demonstration
- Generalization means to capture/model the underlying task
Inverse KKT to gain generalization
- We take KOMO as the generative assumption of demonstrations
    min_x ∑_{t=1}^T f_t(x_{t−k:t})   s.t.  ∀t=1..T :  g_t(x_{t−k:t}) ≤ 0 ,  h_t(x_{t−k:t}) = 0
- Problem:
– Infer ft from demonstrations – We assume ft = wt ◦ Φt (weighted features). – Invert the KKT conditions → QP over w’s
Englert & Toussaint: Inverse KKT – Learning Cost Functions of Manipulation Tasks from Demonstrations. ISRR'15
Details
- Given a large set of potential cost features Φ, we parameterize

    f(x_{0:T}, w) = ∑_t f_t(x_{t−k:t})^⊤ f_t(x_{t−k:t}) = Φ(x_{0:T})^⊤ diag(w) Φ(x_{0:T})

- Lagrangian

    L(x_{0:T}, λ, w) = f(x_{0:T}, w) + λ^⊤ (g(x_{0:T}); h(x_{0:T}))

- 1st KKT condition

    0 = ∇_{x_{0:T}} L(x_{0:T}, λ, w) = 2 J_φ(x_{0:T})^⊤ diag(w) Φ(x_{0:T}) + λ^⊤ J_{gh}(x_{0:T})

- For the d-th demonstration we define the loss

    ℓ^(d)(w, λ^(d)) = ||∇_{x_{0:T}} L(x^(d)_{0:T}, λ^(d), w)||²

- Choose λ^(d) to minimize ℓ^(d)(w, λ) s.t. KKT complementarity:

    ∂_λ ℓ^(d)(w, λ^(d)) = 0  ⇒  λ^(d) = −(J̃_gh J̃_gh^⊤)^{−1} J̃_gh J_φ^⊤ diag(Φ) w

- Reduces to

    min_w w^⊤ Λ w   s.t.  w ≥ 0 ,   Λ^(d) = diag(Φ) J_φ [I − J̃_gh^⊤ (J̃_gh J̃_gh^⊤)^{−1} J̃_gh] J_φ^⊤ diag(Φ)
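The inverse-KKT idea can be sketched on an unconstrained 1-D toy problem (made-up features and demonstration; a simplex grid search stands in for the QP): the weight vector that zeroes the KKT residual at the demonstration identifies the demonstrated task.

```python
import numpy as np

# Sketch of inverse KKT on an unconstrained 1-D toy problem.  The candidate
# cost features are made up: Phi(x) = [x - 1, x - 3] (distances to two targets),
# f(x, w) = Phi(x)^T diag(w) Phi(x).  The demonstration x_d = 3 is optimal
# exactly when all weight sits on the second feature.
x_d = 3.0
Phi = np.array([x_d - 1.0, x_d - 3.0])   # feature values at the demonstration
J_phi = np.array([[1.0], [1.0]])         # feature Jacobian dPhi/dx

def kkt_loss(w):
    # squared 1st KKT residual: || 2 J_phi^T diag(w) Phi ||^2
    return float(np.sum((2 * J_phi.T @ (np.diag(w) @ Phi))**2))

# minimize over the simplex w >= 0, sum_i w_i = 1 (grid stands in for the QP)
grid = np.linspace(0.0, 1.0, 1001)
w_star = min(([w1, 1.0 - w1] for w1 in grid), key=kkt_loss)
```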
Reduction to a Quadratic Program
    min_w w^⊤ Λ w   s.t.  w ≥ 0
- Two ways to enforce a non-singular solution
  – Enforce positive-definiteness of the Hessian at the demonstrations → maximize log |∇²_x f(x)| (cf. Levine & Koltun)
  – Add the constraint ∑_i w_i ≥ 1 → Quadratic Program
- Even if Φ(x_{0:T}), g(x_{0:T}), h(x_{0:T}) are arbitrarily non-linear, this ends up as a QP!
- Related work:
  – Levine & Koltun: Continuous inverse optimal control with locally optimal examples. ICML'12
  – Puydupin-Jamin, Johnson & Bretl: A convex approach to inverse optimal control and its application to modeling human locomotion. ICRA'12
  – Albrecht et al.: Imitating human reaching motions using physically inspired optimization principles. HUMANOIDS'11
  – Jetchev & Toussaint: TRIC: Task space retrieval using inverse optimal control. Autonomous Robots, 2014
  – Mühlig et al.: Automatic selection of task spaces for imitation learning. IROS'09
Inverse KKT
(3) Cooperative Manipulation Learning
- (EU-Project 3rdHand; PIs: Manuel Lopes, Jan Peters, Justus Piater, Marc Toussaint)
Research Questions
- What is a good formalization of such processes?
– multi-agent
– concurrent activities
– durative actions
– probabilistic outcomes
- What are methods for
– anticipation (of human actions) & planning (of robot actions)? – learning from demonstration (imitation and inverse RL)?
- Bridging between symbolic and geometric problem formalization
- Enriching the interaction: active querying, hesitation, etc
Process formalization
- Existing formulations: semi-MDPs over multi-actions
– Concurrent Action Models (Rohanimanesh & Mahadevan); Concurrent MDPs & Probabilistic Temporal Planning (Mausam & Weld)
– At certain episode times, the planner makes a multi-action decision (a1, a2, .., an) for all n agents; the decision space becomes combinatorial
Rohanimanesh, Mahadevan (NIPS’02)
- Issues:
– Subjectively: Complicated, mutexing, awkward synchronization
– Requires specialized planners (unsuccessful: IP)
– No existing extensions to relational domains
– No direct transfer of existing inverse RL methods
Relational Activity Processes (RAPs)
Toussaint, Munzer, Mollard & Lopes: Relational Activity Processes for Modeling Concurrent Cooperation. ICRA’16
- The current state lists the current activities (relational (1st-order logic)):
(object Handle), (free humanLeft), (humanLeft graspingScrew)=1.0, (humanRight grasped Handle), (Handle held), (robot releasing Long1)=1.5, ..
- A planner reasons about decisions (for all agents!), which are
– Initiate an activity, e.g. (Initiate humanLeft graspingScrew)
– Terminate an activity, e.g. (Terminate robot releasing Long1)
– Wait for a change in relational state
- Stochastic Relational Rules (cf. NDRs) determine the effects
    (Terminate X grasping Y) {
      { (X grasping Y)! (X grasped Y) (X free)! (Y held) (X busy)! (Y busy)! }
      { (X grasping Y)! (X busy)! (Y busy)! }
      p=[0.9 0.1]
    }
- This defines a decision process, which initiates, waits, and terminates activities of all agents, and predicts the effects.
Relational Activity Processes (RAPs)
- We “serialized” the decision process over concurrent activities
– Reduction to a standard semi-MDP – Standard methods for MCTS, direct policy learning & inverse RL become applicable in relational concurrent multi-agent domains
Planning using Monte Carlo
[Tree diagram: from state s, decisions such as (Initiate A grasps X) or (Initiate B releases Y) lead to successor activity states s,(A grasps X)=2.0 and s,(B releases Y)=1.5; a wait decision branches stochastically, e.g. to s,(A holds X) with p=0.9 or s,(A lost X) with p=0.1.]
- Every sample path gives a potential future, with rewards
- Given a set of samples, we can compute
– a Q(d, s)-function over the next decision (including which agent it involves) – a reward-weighted probability over the future decision
→ anticipation of human action, planning of own actions
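The Q-function estimate from sample paths can be sketched as follows (the decisions and reward distributions are made-up stand-ins for RAP decisions):

```python
import random

# Sketch: estimate Q(d, s) for the next decision by averaging sampled rollout
# returns.  The two decisions and their reward distributions are made-up
# stand-ins for RAP decisions such as (Initiate ...) or wait.
random.seed(0)

def rollout(decision):
    if decision == "initiate":
        # stochastic outcome, cf. the 0.9 / 0.1 branching of activity effects
        return 10.0 if random.random() < 0.9 else -5.0
    return 1.0  # "wait": small but certain reward

samples = {d: [rollout(d) for _ in range(2000)] for d in ("initiate", "wait")}
Q = {d: sum(rs) / len(rs) for d, rs in samples.items()}
best = max(Q, key=Q.get)  # the decision to execute (or to anticipate)
```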
Imitation learning & inverse RL for cooperative manipulation
- Wonderful prior work:
Munzer et al.: Inverse reinforcement learning in relational domains. IJCAI'15
– Imitation: Tree Boosted Relational Imitation Learning (TBRIL) to train a policy

    π(a | s) = exp(f(a, s)) / ∑_{a′∈D(s)} exp(f(a′, s)) ,   f(a, s) = ψ(a, s)^⊤ β
– Use relational reward shaping and CSI to infer a relational reward function
- Directly translates to RAPs
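A minimal numeric sketch of such a softmax policy over the applicable decisions D(s), with made-up features ψ and weights β:

```python
import numpy as np

# Sketch of the softmax policy over the applicable decisions D(s):
#   pi(a|s) = exp(f(a,s)) / sum_{a' in D(s)} exp(f(a',s)),  f(a,s) = psi(a,s)^T beta
# The features psi and weights beta are made-up toy values.
beta = np.array([1.0, -0.5])
psi = {
    "initiate_grasp": np.array([2.0, 0.0]),   # features of each decision in s
    "wait":           np.array([0.0, 1.0]),
}
scores = {a: float(p @ beta) for a, p in psi.items()}
z = sum(np.exp(s) for s in scores.values())
pi = {a: float(np.exp(s)) / z for a, s in scores.items()}
```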
Toussaint, Munzer, Mollard & Lopes: Relational Activity Processes for Modeling Concurrent Cooperation. ICRA'16
Oh no! We lost the geometry!
(4) Logic-Geometric Programming
Combined Task and Motion Planning
from Garrett et al: FFRob..., WAFR’14 from Srivastava et al: Combined TAMP ..., ICRA’14
– Srivastava et al. (ICRA 2014): Combined task and motion planning through an extensible planner-independent interface layer
– Siméon et al. (IJRR 2004): Manipulation planning with probabilistic roadmaps
– Lozano-Pérez et al. (IROS 2014): A constraint-based method for solving sequential manipulation planning problems
– Garrett et al. (WAFR 2014): An efficient heuristic for task and motion planning
- All previous work on TAMP and Rearrangement is sample- and satisfiability-based:
– sample-based path finding
– sample-based grasp finding (or pre-discretized)
– search heuristics: which objects to try to move next/before
– backtracking
- I don’t think this reflects the hybrid structure well
Sequential Manipulation
- m rigid objects, n articulated joints, state space X = R^n × SE(3)^m
- Path x : [0, T] → X, kinematics (description of the allowed motions):

    h_path(x(t), ẋ(t)) = 0 ,  g_path(x(t), ẋ(t)) ≤ 0

- First-order logic language L to describe kinematic structure → symbolic kinematic states s(t) = s(x(t)) ∈ L:

    h_path(x(t), ẋ(t) | s(t)) = 0 ,  g_path(x(t), ẋ(t) | s(t)) ≤ 0 ,  smooth

- Assume that s ∈ L is sufficient to describe possible successors; s_k ∈ succ(s_{k−1}) switches kinematic symbols at time t_k:

    h_switch(x(t_k) | s_k, s_{k−1}) = 0 ,  g_switch(x(t_k) | s_k, s_{k−1}) ≤ 0
- The role of symbols is to make the remaining problem smooth
- Piece-wise smooth paths for given sk ∈ L
- Categorial decisions about sk ∈ succ(sk-1)
- The logic/categorial state implies constraints on the path
A Logic-Geometric Programming Formulation
- m rigid objects, n articulated joints, path x : [0, T] → R^n × SE(3)^m

    min_{x, s_{1:K}, t_{1:K}}  ∫_0^T c(x(t), ẋ(t), ẍ(t)) dt + f_goal(x(T))
    s.t.  h_goal(x(T)) = 0 ,  g_goal(x(T)) ≤ 0
          ∀t∈[0,T] :  h_path(x(t), ẋ(t) | s_{k(t)}) = 0 ,  g_path(x(t), ẋ(t) | s_{k(t)}) ≤ 0
          ∀k=1..K :  h_switch(x(t_k) | s_k, s_{k−1}) = 0 ,  g_switch(x(t_k) | s_k, s_{k−1}) ≤ 0
          ∀k=1..K :  s_k ∈ succ(s_{k−1}) ,   s_K ⊨ g

- An LGP uses a logic state representation s_k to define the constraints on the geometric variable x

Toussaint: Logic-Geometric Programming: An Optimization-Based Approach to Combined Task and Motion Planning. IJCAI'15
“Relational Mathematical Programming”
- Relational/Logic Mathematical Programs form a new, more general class of optimization problems
  – I'm influenced by discussions with Kristian Kersting & Luc De Raedt
  – Very general declarative language
  – Solvers for relational LPs (e.g. Kristian Kersting)
Example Domain
- Building high, physically stable constructions
Solver
- Use plain MC(!) to generate proposal sequences s_{1:K}
  – Huge possibilities for improvement, but not the bottleneck here
- Fast approximate heuristics (lower bounds) to focus geometric optimization
- Three levels of approximation on the geometric side
– Effective Kinematics of leaf configurations → Newton method over leaf configurations → optimistic heuristic to inform search
– Newton method over keyframes
– Newton method over the full path
- Use KOMO to optimize over leaf configurations, keyframes, or full paths
Effective Kinematics as Optimistic Heuristic
- The set of all possible configurations x(T) ∈ X conditional on a sequence s_{1:K}:

    X*(s_{1:K}) = {x(T) ∈ X | s_{1:K}}

- This describes a smooth manifold of all (optimistically!) "reachable" configurations after s_{1:K}, and a smooth NLP over such configurations, including ψ(x(T))
- This "defers" parametric decisions of interactions (cf. Lozano-Pérez) ↔ "Geometric reasoning"
Logic to Describe Effective Kinematics
- We need sufficient symbols to capture the structure of X*(s_{1:K}). In the tower case, supports is sufficient:

    CylinderOnBoard(X, Y):      Cylin(X) free(X) Board(Y) supports(Y X)
    BoardOnCylinder(X, Y):      Board(X) Cylin(Y) free(Y) free(Y)! supports(Y X)
    BoardOn2Cylinders(X, Y, Z): Board(X) free(Y) Cylin(Y) free(Z) Cylin(Z) depth(Y)=depth(Z) free(Y)! free(Z)! supports(Y X) supports(Z X)
- supports implies constraints and costs in the effective kinematics:
– Create transXYPhi joints
– Inequality constraint of support being "inside"
– Multiple support: Maximize distance to center
– Single support: center align cost
- The logic state s_k defines the path constraints & the effective kinematics
- The transitions sk ∈ succ(sk-1) are described by logic rules
Scaling of MC & Effective Kinematics Optimization
[Plot: wall-clock time in seconds (log scale, 0.001–10) vs. number of objects (10–110); curves: time for a single MCTS rollout and time for a single end space optimization.]
Scaling of Keyframe Sequence & Path Optimization
[Plot: wall-clock time in seconds (log scale, 1–1000) vs. problem size (10–30); curves: keyframe-sequence optimization and full-path optimization.]
Ongoing Work: Cooperative Manipulation
- Ongoing:
– Including the geometry of cooperation in the reasoning – “Multi-agent Logic-Geometric Programming”
- Technical challenges:
– kinematic switches add and delete DOFs; dim(x_t) varies
– handling "delayed effects" correctly
– ...all this without breaking the banded-ness of the Hessian
Conclusions
- The challenge is to find good representations of the problems
– Learning from single demonstration → strong priors about own motion; black-box RL w.r.t. interaction
– Cooperative manipulation → RAP to represent concurrent activities → planning, inverse RL
– Logic-Geometric Programming: a unified problem formulation of the symbolic and subsymbolic levels
- Don’t leave “system integration” to the software engineer!
Instead, formalize integrated problems.
- Q: Would LGP also work for walking/climbing?
– Same categorial decisions about kinematic switches – Same “delayed effects” as for SeqManip
Thanks
- for your attention!
- to collaborators/co-authors
– Relational RL topics: Tobias Lang, Manuel Lopes, Kristian Kersting
– Physical Exploration: Oliver Brock
– Human-Robot Collaborative work: Manuel Lopes, Thibaut Munzer, Jan Peters, Justus Piater
– Newton Methods for Path Optimization: Nathan Ratliff, Jeannette Bohg, Stefan Schaal
- to colleagues
– Logic-Geometric Programming topic: Tomas Lozano-Pérez, Leslie Pack Kaelbling, Kristian Kersting, Luc De Raedt, Karl Tuyls, George Konidaris
- and my lab: