SLIDE 1

Robotics

Part II: From Learning Model-based Control to Model-free Reinforcement Learning

Stefan Schaal, Max-Planck-Institute for Intelligent Systems, Tübingen, Germany & Computer Science, Neuroscience, & Biomedical Engineering, University of Southern California, Los Angeles

sschaal@is.mpg.de http://www-amd.is.tuebingen.mpg.de

SLIDE 2

Where Did We Stop ...

SLIDE 3

Outline

  • A Bit of Robotics History
  • Foundations of Control
  • Adaptive Control
  • Learning Control
  • Model-based Robot Learning
  • Reinforcement Learning
SLIDE 4

What Needs to Be Learned in Learning Control?

  • Coordinate Transformations
  • Unsupervised Learning & Classification
  • Control Policies
  • Value Functions
  • Internal Models

The Majority of the Learning Problems Involve Function Approximation

SLIDE 5

Learning Internal Models

  • Forward Models
    – model the causal functional relationship: $y = f(x)$
    – for example, the forward dynamics: $\ddot{q} = B(q)^{-1}\left(\tau - C(q,\dot{q})\,\dot{q} - G(q)\right)$
  • Inverse Models
    – model the inverse of the causal functional relationship: $x = f^{-1}(y)$
    – for example, the inverse dynamics: $B(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + G(q) = \tau$
  • NOTE: inverse models are not necessarily functions any more!

SLIDE 6

Inverse Models May Not Be Trivially Learnable

SLIDE 7

Inverse Models May Not Be Trivially Learnable

$$t = f\left(\theta_1^1, \theta_2^1\right), \qquad t = f\left(\theta_1^2, \theta_2^2\right)$$

What is $f^{-1}(t)$? (Two different joint configurations reach the same target $t$, so the inverse mapping is not unique.)

SLIDE 8

Characteristics of Function Approximation in Robotics

  • Incremental Learning
    – large amounts of data
    – continual learning
    – to-be-approximated functions of growing and unknown complexity
  • Fast Learning
    – data efficient
    – computationally efficient
    – real-time
  • Robust Learning
    – minimal interference
    – hundreds of inputs

SLIDE 9

Linear Regression: One of the Simplest Function Approximation Methods

  • find the line through all data points
  • imagine a spring attached between the line and each data point
  • all springs have the same spring constant
  • points far away generate more “force” (danger of outliers)
  • springs are vertical
  • the solution is the minimum-energy solution achieved by the springs

[Figure: data points in the $(x, y)$ plane with vertical springs to the fitted line $f(x) = \theta x$]

Recall the simple adaptive control model with $f(x) = \theta x$.

SLIDE 10

Linear Regression: One of the Simplest Function Approximation Methods

  • The data-generating model:
$$y = \tilde{w}^T \tilde{x} + w_0 + \epsilon = w^T x + \epsilon \qquad \text{where } x = \begin{bmatrix} \tilde{x}^T & 1 \end{bmatrix}^T,\;\; w = \begin{bmatrix} \tilde{w} \\ w_0 \end{bmatrix},\;\; E\{\epsilon\} = 0$$
  • The least-squares cost function:
$$J = \frac{1}{2}\left(\mathbf{t} - \mathbf{y}\right)^T\left(\mathbf{t} - \mathbf{y}\right) = \frac{1}{2}\left(\mathbf{t} - Xw\right)^T\left(\mathbf{t} - Xw\right) \qquad \text{where } \mathbf{t} = \begin{bmatrix} t_1 \\ t_2 \\ \vdots \\ t_n \end{bmatrix},\;\; X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{bmatrix}$$
  • Minimizing the cost gives the least-squares solution:
$$\frac{\partial J}{\partial w} = \frac{\partial}{\partial w}\left(\frac{1}{2}\left(\mathbf{t} - Xw\right)^T\left(\mathbf{t} - Xw\right)\right) = -\left(\mathbf{t} - Xw\right)^T X = -\mathbf{t}^T X + w^T X^T X = 0$$
$$\text{thus } \mathbf{t}^T X = w^T X^T X \;\text{ or }\; X^T \mathbf{t} = X^T X w, \qquad \text{result: } w = \left(X^T X\right)^{-1} X^T \mathbf{t}$$
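To make the normal-equation result above concrete, here is a minimal NumPy sketch (the data and coefficients are invented for illustration); it solves the linear system rather than forming the inverse explicitly:

```python
import numpy as np

# Toy data: t = 2*x1 - 1*x2 + 0.5 + noise (coefficients chosen arbitrarily)
rng = np.random.default_rng(0)
X_raw = rng.uniform(-1, 1, size=(100, 2))
t = 2.0 * X_raw[:, 0] - 1.0 * X_raw[:, 1] + 0.5 + 0.01 * rng.standard_normal(100)

# Append the bias term so that x = [x_tilde^T, 1]^T as on the slide
X = np.hstack([X_raw, np.ones((X_raw.shape[0], 1))])

# Normal-equation solution w = (X^T X)^{-1} X^T t
w = np.linalg.solve(X.T @ X, X.T @ t)
print(w)  # approximately [2.0, -1.0, 0.5]
```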

SLIDE 11

Recursive Least Squares: An Incremental Version of Linear Regression

  • Based on the matrix inversion theorem:
$$\left(A - BC\right)^{-1} = A^{-1} + A^{-1}B\left(I - CA^{-1}B\right)^{-1}CA^{-1}$$
  • Incremental updating of a linear regression model:

Initialize: $P^0 = \frac{1}{\gamma} I$ where $\gamma \ll 1$ (note $P \equiv \left(X^T X\right)^{-1}$).

For every new data point $(x, t)$ (note that $x$ includes the bias):
$$P^{n+1} = \frac{1}{\lambda}\left(P^n - \frac{P^n x\, x^T P^n}{\lambda + x^T P^n x}\right) \qquad \text{where } \lambda \begin{cases} = 1 & \text{if no forgetting} \\ < 1 & \text{if forgetting} \end{cases}$$
$$w^{n+1} = w^n + P^{n+1} x \left(t - w^{n\,T} x\right)^T$$
  • NOTE: RLS gives exactly the same solution as linear regression if there is no forgetting
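A minimal sketch of the update above on synthetic data; the forgetting factor and initialization follow the slide, and with `lam = 1` the estimate matches the batch least-squares solution:

```python
import numpy as np

def rls_update(P, w, x, t, lam=1.0):
    """One recursive least-squares step for a new sample (x, t).
    x must already include the bias entry; lam < 1 enables forgetting."""
    Px = P @ x
    P_new = (P - np.outer(Px, Px) / (lam + x @ Px)) / lam
    w_new = w + P_new @ x * (t - w @ x)
    return P_new, w_new

d = 3                      # input dimension including the bias
gamma = 1e-4
P = np.eye(d) / gamma      # P^0 = I / gamma with gamma << 1
w = np.zeros(d)

rng = np.random.default_rng(1)
w_true = np.array([1.5, -0.7, 0.3])
for _ in range(500):
    x = np.append(rng.uniform(-1, 1, size=d - 1), 1.0)
    t = w_true @ x + 0.01 * rng.standard_normal()
    P, w = rls_update(P, w, x, t)
print(w)  # converges towards w_true
```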

SLIDE 12

Making Linear Regression Nonlinear: Locally Weighted Regression

[Figure: a receptive field $k$ with its region of validity, local linear model, and receptive field activation $w$]

Locally weighted cost for one receptive field: $J = \sum_{i=1}^{N} w_i \left(y_i - x_i^T\beta\right)^2$

Note: GPs, SVR, mixture models, etc., are other routes to nonlinear regression.

SLIDE 13

Locally Weighted Regression

  • Piecewise linear function approximation
  • Each local model is learned from only local data
  • No over-fitting due to too many local models (unlike RBFs, ME)
SLIDE 14

Locally Weighted Regression

Linear model:
$$y = \beta_x^T x + \beta_0 = \beta^T \tilde{x} \qquad \text{where } \tilde{x} = \begin{bmatrix} x^T & 1 \end{bmatrix}^T$$

Weighting kernel:
$$w = \exp\left(-\frac{1}{2}\left(x - c\right)^T D \left(x - c\right)\right) \qquad \text{where } D = M^T M$$

Combined prediction:
$$y = \frac{\sum_{k=1}^{K} w_k y_k}{\sum_{k=1}^{K} w_k}$$

The regression parameters $\beta_k$ are learned with recursive weighted least squares:
$$\beta_k^{n+1} = \beta_k^n + w\, P_k^{n+1}\, \tilde{x}\left(y - \tilde{x}^T \beta_k^n\right)^T, \qquad P_k^{n+1} = \frac{1}{\lambda}\left(P_k^n - \frac{P_k^n\, \tilde{x}\,\tilde{x}^T P_k^n}{\frac{\lambda}{w} + \tilde{x}^T P_k^n\, \tilde{x}}\right)$$

The distance metric $M_k$ is learned with gradient descent in a penalized leave-one-out local cross-validation (PRESS) cost function:
$$M_k^{n+1} = M_k^n - \alpha \frac{\partial J}{\partial M}, \qquad J = \frac{1}{\sum_{i=1}^{N} w_{k,i}} \sum_{i=1}^{N} w_{k,i}\left(y_i - \hat{y}_{k,i,-i}\right)^2 + \gamma \sum_{i,j=1}^{n} D_{k,ij}^2$$

A model is added when $\min_k\left(w_k\right) < w_{gen}$: create a new receptive field at $c_{K+1} = x$.
SLIDE 15

Locally Weighted Regression

Target function (the “cross” function):
$$z = \max\left(\exp\left(-10x^2\right),\; \exp\left(-50y^2\right),\; 1.25\exp\left(-5\left(x^2 + y^2\right)\right)\right)$$

[Figure: training data, learned receptive fields (⊕ marks their centers), and the resulting fit over the $(x, y)$ plane]

SLIDE 16

Locally Weighted Regression Inserted into Adaptive Control

SLIDE 17

Locally Weighted Regression

Learn a forward model of the task dynamics, then compute the controller

SLIDE 18

Locally Weighted Regression

Learn a forward model of the task dynamics, then compute the controller

SLIDE 19
Criticism of Locally Weighted Learning

  • Breaks down in high-dimensional spaces
  • Computationally expensive and numerically brittle due to the (incremental) d×d matrix inversion
  • Not compatible with modern probabilistic statistical learning algorithms
  • Too many “manual tuning parameters”

SLIDE 20

The Curse of Dimensionality

  • The power of local learning comes from exploiting the discriminative power of local neighborhood relations.
  • But the notion of “local” breaks down in high-dimensional spaces!

SLIDE 21

The Curse of Dimensionality

Movement Data is Locally Low Dimensional

[Figure: histogram of the local dimensionality of movement data (probability vs. dimensionality, with an axis break up to 105), derived with Bayesian factor analysis]

Thus, locally weighted learning can work if used with local dimensionality reduction!

SLIDE 22

A Bayesian Approach to Locally Weighted Learning

  • Linear Regression as a Graphical Model

$$y_i = x_i^T \beta + \epsilon, \qquad \epsilon \sim N\left(0, \psi_y\right), \qquad \beta = \left(X^T X\right)^{-1} X^T \mathbf{y}$$
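The following slides develop a full Bayesian treatment of this model; as a minimal illustration of treating β probabilistically rather than as a point estimate, here is the standard conjugate Gaussian posterior for the model above, assuming a prior β ~ N(0, I/α) and a known noise variance ψ_y (both of which are assumptions of this sketch, not taken from the slides):

```python
import numpy as np

def bayes_linreg_posterior(X, y, psi_y=0.01, alpha=1.0):
    """Posterior over beta for y = X beta + eps, eps ~ N(0, psi_y),
    with prior beta ~ N(0, I/alpha).  Prior and noise values are assumptions."""
    d = X.shape[1]
    S_inv = alpha * np.eye(d) + (X.T @ X) / psi_y   # posterior precision
    S = np.linalg.inv(S_inv)                        # posterior covariance
    mu = S @ (X.T @ y) / psi_y                      # posterior mean
    return mu, S

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(50, 2))
y = X @ np.array([0.8, -0.2]) + 0.1 * rng.standard_normal(50)
mu, S = bayes_linreg_posterior(X, y, psi_y=0.01)
print(mu)  # posterior mean, shrunk towards zero relative to the OLS estimate
```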

SLIDE 23

A Bayesian Approach to Locally Weighted Learning

  • Inserting a Partial-Least-Squares-like projection as a set of hidden variables:
$$z_{i,m} = x_{i,m}\beta_m + \eta_m, \qquad y_i = \sum_{m=1}^{d} z_{i,m} + \epsilon, \qquad \epsilon \sim N\left(0, \psi_y\right), \qquad \eta_m \sim N\left(0, \psi_{z,m}\right)$$

SLIDE 24

A Bayesian Approach to Locally Weighted Learning

  • Robust linear regression with automatic relevance detection (ARD, sparsification):
$$z_{i,m} = x_{i,m}\beta_m + \eta_m, \qquad y_i = \sum_{m=1}^{d} z_{i,m} + \epsilon, \qquad \epsilon \sim N\left(0, \psi_y\right), \qquad \eta_m \sim N\left(0, \psi_{z,m}\right)$$
$$\beta_m \sim N\left(0, \frac{1}{\alpha_m}\right), \qquad \alpha_m \sim \text{Gamma}\left(a_\alpha, b_\alpha\right)$$

SLIDE 25

A Full Bayesian Treatment of Locally Weighted Learning

  • The final model for full Bayesian parameter adaptation for regression and locality:

[Graphical model: inputs $x_{i1} \ldots x_{id}$ project onto hidden variables $z_{i1} \ldots z_{id}$ (with weights $b_1 \ldots b_d$ and noise variances $\psi_{z1} \ldots \psi_{zd}$), which sum to the output $y_i$ with noise $\psi_y$; local weights $w_{i1} \ldots w_{id}$ with parameters $h_1 \ldots h_d$ capture locality, for $i = 1, \ldots, N$]

SLIDE 26

Locally Weighted Learning In High Dimensional Spaces

  • Learning the “cross” function in 20-dimensional space

[Figure: true “cross” function and the learned reconstruction, plotted over two of the input dimensions]

SLIDE 27

Locally Weighted Learning In High Dimensional Spaces

  • Learning the “cross” function in 20-dimensional space

[Figure: nMSE on the test set and number of receptive fields / average number of projections vs. number of training data points (1,000 to 100,000), for the 2D-, 10D-, and 20D-cross problems]

SLIDE 28

Locally Weighted Learning In High Dimensional Spaces

  • Learning inverse kinematics in 60-dimensional space
SLIDE 29

Locally Weighted Learning In High Dimensional Spaces

  • Skill learning
SLIDE 30

Outline

  • A Bit of Robotics History
  • Foundations of Control
  • Adaptive Control
  • Learning Control
  • Model-based Robot Learning
  • Reinforcement Learning
SLIDE 31

Given: A Parameterized Policy and a Controller

Note: we are now starting to address planning, i.e., where do desired trajectories come from?

SLIDE 32

Trial & Error Learning: Reinforcement Learning from Trajectories

  • Problem:
    – How can a motor system learn a novel motor skill?
    – Reinforcement learning is a general approach to this problem, but little work has been done to scale it to the high-dimensional continuous state-action domains of humans
  • Approach:
    – Teach the initial skill with imitation learning, using a parameterized control policy
    – Provide an objective function for the skill
    – Perform trial-and-error learning from exploratory trajectories

SLIDE 33

Reinforcement Learning Terminology

  • Policies
    – perceived state to action mapping (can be probabilistic)
  • Reward functions
    – map the perceived state-action pair into a single number, an immediate reward (stochastic)
  • Value functions
    – map the state into the accumulated expected reward that would be received if starting in that state
  • Models
    – predict the next state given the current state and action (can be probabilistic)

Objective: Optimize Reward!
  • Policy: what to do
  • Reward: what is good
  • Value: what is good because it predicts reward
  • Model: what follows what

SLIDE 34

Value Functions

  • The value of a state is the expected return starting from that state; it depends on the agent’s policy:
$$\text{State-value function for policy } \pi: \quad V^\pi(x) = E_\pi\left\{R_t \mid x_t = x\right\} = E_\pi\left\{\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \,\Big|\, x_t = x\right\}$$
  • The value of taking an action in a state under policy $\pi$ is the expected return starting from that state, taking that action, and thereafter following $\pi$:
$$\text{Action-value function for policy } \pi: \quad Q^\pi(x,u) = E_\pi\left\{R_t \mid x_t = x, u_t = u\right\} = E_\pi\left\{\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \,\Big|\, x_t = x, u_t = u\right\}$$

SLIDE 35

Bellman Equation for a Policy π

The basic idea:
$$V^\pi(x) = E_\pi\left\{R_t \mid x_t = x\right\} = E_\pi\left\{r_{t+1} + \gamma V^\pi\left(x_{t+1}\right) \mid x_t = x\right\}$$

SLIDE 36

Bellman Optimality Equation for V*

  • The value of a state under an optimal policy must equal the expected return for the best action from that state:
$$V^*(x) = \max_{u \in A(x)} Q^{\pi^*}(x,u) = \max_{u \in A(x)} E\left\{r_{t+1} + \gamma V^*\left(x_{t+1}\right) \mid x_t = x, u_t = u\right\}$$
$V^*$ is the unique solution of this system of equations.

SLIDE 37

Bellman Optimality Equation for Q*

  • The value of a state/action pair under an optimal policy must equal the expected return for this action from that state, and then following the optimal policy:
$$Q^*(x,u) = E\left\{r_{t+1} + \gamma \max_{u'} Q^*\left(x_{t+1}, u'\right) \mid x_t = x, u_t = u\right\}$$
$Q^*$ is the unique solution of this system of equations.
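The two Bellman optimality equations above are what dynamic-programming methods iterate on; a minimal tabular value-iteration sketch on an invented 5-state chain MDP (the states, transitions, and rewards are illustrative only):

```python
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.9
# P[u, x, x'] : transition probabilities; R[x, u] : expected immediate reward
P = np.zeros((n_actions, n_states, n_states))
for x in range(n_states):
    P[0, x, max(x - 1, 0)] = 1.0              # action 0: move left
    P[1, x, min(x + 1, n_states - 1)] = 1.0   # action 1: move right
R = np.zeros((n_states, n_actions))
R[n_states - 2, 1] = 1.0                      # reward for stepping into the last state

V = np.zeros(n_states)
for _ in range(200):
    Q = R + gamma * np.einsum('uxy,y->xu', P, V)   # Q(x,u) = r + gamma * E[V(x')]
    V = Q.max(axis=1)                              # Bellman optimality backup
policy = Q.argmax(axis=1)
print(V, policy)   # the learned policy moves right towards the rewarded transition
```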

SLIDE 38

Example: Learning a Pendulum Swing-Up

[Figure: learned value function and policy for the pendulum swing-up]

Note: Both the policy and the value function are rather complex landscapes with discontinuities!

SLIDE 39

Some More Exciting Examples

SLIDE 40

State-Based vs. Trajectory-based Reinforcement Learning

  • From about 1980-2000, value-function-based (i.e., state-based) reinforcement learning was dominant (textbook: Sutton & Barto)
    – Pros:
      • well-understood theory
      • convergence proofs for discrete state-action systems
      • a useful set of algorithms to work with (model-based and model-free)
      • ideally a globally optimal solution
    – Cons:
      • problematic in continuous state-action spaces (max-operator in continuous spaces)
      • curse of dimensionality in high-dimensional systems
      • hard to combine with function approximation
      • greedy (= aggressive) updating
  • Trajectory-based reinforcement learning
    – Pros:
      • can work in high-dimensional continuous state-action spaces
      • does not suffer from the curse of dimensionality
    – Cons:
      • locally optimal solutions
      • classical methods learn very slowly
SLIDE 41

Trajectory-based Reinforcement Learning with Parameterized Policies

$$u(t) = \pi\left(x(t), t, \alpha\right) \qquad \text{or} \qquad \dot{x}_d(t) = \pi\left(x_d(t), t, \alpha\right)$$

Example: dynamic systems policies (motor primitives), initialized by imitation:
$$\tau \ddot{y} = \alpha_z\left(\beta_z\left(g - y\right) - \dot{y}\right) + \frac{\sum_{i=1}^{k} w_i b_i x}{\sum_{i=1}^{k} w_i}, \qquad \tau \dot{x} = -\alpha_x x
$$
SLIDE 42

Trajectory-based Reinforcement Learning

  • Define a cost function along the trajectory:
$$J = E_\tau\left\{\sum_{i=0}^{T} r_i\right\}$$
  • And a parameterized control policy (e.g., a movement primitive):
$$\tau \dot{y} = f\left(y, \text{goal}, b\right)$$
  • Optimize J with respect to the parameters b, e.g., by gradient descent (a sketch follows below):
$$b^{n+1} = b^n + \alpha \frac{\partial J}{\partial b}$$
SLIDE 43

Example: Learning with Natural Gradients

Goal: hit the ball so that it flies as far as possible. Note: about 150-200 trials are needed.

SLIDE 44

Reinforcement Learning from Trajectories

  • State of the art of reinforcement learning from trajectories:
  • Given the cost per trajectory: $J = E_\tau\left\{\sum_{i=0}^{T} r_i\right\}$
  • and the motor primitives with parameters b: $\tau \dot{y} = f\left(y, \text{goal}, b\right)$
    – RL with natural gradients: $b_{new} = b_{old} + \alpha \dfrac{\partial J_{NAC}}{\partial b}$
    – Probabilistic RL with reward-weighted regression: $b_{new} \propto \sum_\tau R_\tau b_\tau \Big/ \sum_\tau R_\tau$
    – Trajectory-based Q-learning (fitted Q-iteration)
      • an actor-critic based method based on an action-value function over trajectories
    – RL with path integrals (a probabilistic, model-based/model-free approach derived from stochastic optimal control)

SLIDE 45

Reinforcement Learning Based on Path Integrals

  • Pre-requisites

Cost function:
$$r_t = q(x_t) + \frac{1}{2} u_t^T R u_t, \qquad J_{x_t} = E_{x_t}\left\{q_T + \int_{t'=t}^{T} r_{t'}\, dt'\right\} \;\rightarrow\; \text{Goal: find commands } u \text{ that minimize this cost}$$

System dynamics (control-affine):
$$\dot{x} = f(x,t) + G(x)\left(u(t) + \epsilon(t)\right) = F(x,u,t)$$

Note: this is a more structured approach to RL

SLIDE 46

Reinforcement Learning Based on Path Integrals

  • Sketch of the Path-Integral Derivation

Stochastic HJB equation:
$$-\partial_t V(x_t,t) = \min_{u_{t:t_m}}\left[\,r_t + \partial_x V(x_t,t)^T F(x,u,t) + \frac{1}{2}\text{Tr}\left\{\Omega(x,u,t)\,\partial_x^2 V(x_t,t)\right\}\right]$$

Inserting the cost $r_t = q_t + \frac{1}{2}u_t^T R u_t$ and the control-affine dynamics:
$$\min_{u_{t:t_m}}\left[\frac{1}{2}u_t^T R u_t + q_t + \partial_x V(x_t,t)^T f(x,t) + \partial_x V(x_t,t)^T G(x)\,u(t) + \frac{1}{2}\text{Tr}\left\{G(x)\Sigma\, G(x)^T \partial_x^2 V(x_t,t)\right\}\right]$$

Setting the derivative with respect to $u$ to zero gives the optimal control:
$$u_t^T R + \partial_x V(x_t,t)^T G(x_t) = 0 \qquad\Rightarrow\qquad u_t = -R^{-1} G(x_t)^T \partial_x V(x_t,t)$$

SLIDE 47

Reinforcement Learning Based on Path Integrals

  • Sketch of the Path-Integral Derivation (continued)

Substituting the optimal control $u_t = -R^{-1} G(x_t)^T \partial_x V(x_t,t)$ and the dynamics $\dot{x} = f(x,t) + G(x)\left(u(t) + \epsilon(t)\right)$ back into the HJB equation gives:
$$-\partial_t V(x_t,t) = -\frac{1}{2}\,\partial_x V(x_t,t)^T G(x) R^{-1} G(x)^T \partial_x V(x_t,t) + q_t + \partial_x V(x_t,t)^T f(x,t) + \frac{1}{2}\text{Tr}\left\{G(x)\Sigma\, G(x)^T \partial_x^2 V(x_t,t)\right\}$$

SLIDE 48

Reinforcement Learning Based on Path Integrals

  • Sketch of the Path-Integral Derivation (continued)

Applying the log-transformation trick $V(x_t,t) = -\lambda \log \psi(x_t,t)$ together with the simplification $\lambda R^{-1} = \Sigma$ turns the equation above into a 2nd-order, linear PDE of Chapman-Kolmogorov type:
$$\partial_t \psi(x_t,t) = \frac{1}{\lambda}\,\psi(x_t,t)\, q_t - \partial_x \psi(x_t,t)^T f(x,t) - \frac{1}{2}\text{Tr}\left\{G(x)\Sigma\, G(x)^T \partial_x^2 \psi(x_t,t)\right\}$$
SLIDE 49

Reinforcement Learning Based on Path Integrals

  • Sketch of the Path-Integral Derivation

∂tψ xt,t

( ) = 1

λ ψ xt,t

( )qt − ∂xψ xt,t ( )

T f x,t

( ) − 1

2 Tr G x

( )Σ G x ( )

T ∂2 xψ xt,t

( )

{ }

Application of Feynman-Kac Theorem: A numerical method to solve certain PDEs

ψ xt,t

( ) = Eτ ψ xT ,T ( )exp −

1 λ qt ' dt

t '=t t '=T

' ⎛ ⎝ ⎜ ⎞ ⎠ ⎟ ⎧ ⎨ ⎪ ⎩ ⎪ ⎫ ⎬ ⎪ ⎭ ⎪

SLIDE 50

Reinforcement Learning Based on Path Integrals

  • Sketch of the Path-Integral Derivation

ψ xt,t

( ) = Eτ ψ xT ,T ( )exp −

1 λ qt ' dt

t '=t t '=T

' ⎛ ⎝ ⎜ ⎞ ⎠ ⎟ ⎧ ⎨ ⎪ ⎩ ⎪ ⎫ ⎬ ⎪ ⎭ ⎪ ut = −R−1G xt

( )

T ∂xV xt,t

( )

A bit of algebra ...

ut = Eτ wτR−1G xt

( )

T G xt

( )R−1G xt ( )

T

( )

−1

G xt

( )εt

{ }

Optimal Control Law

SLIDE 51

Path Integral RL Applied to Parameterized Policies (Motor Primitives)

  • Note that a version of the motor primitives can be written as a control-affine stochastic differential equation:
$$\dot{x} = f(x) + g^T\left(\theta + \epsilon\right)$$
    – $\epsilon$ is interpreted as intentionally injected exploration noise
    – the parameters $\theta$ are the control vector
    – $f(x)$ is the spring-damper part of the primitives
    – $g(x)$ are the basis functions of the function approximator
  • It is also necessary to create an iterative version of path-integral optimal control
    – the original path-integral optimal control framework explores only based on the passive dynamics, i.e., u = 0

SLIDE 52

PI2 Reinforcement Learning

  • For parameterized policies like dynamic motor primitives, a beautifully simple algorithm results (a sketch in code follows below):

1) Create K trajectories of the motor primitive for a given task with exploration noise.
2) Write the cost-to-go from every time step t of the trajectory as:
$$R_t = q_T + \sum_{i=t}^{T} r_i$$
3) The probability of a trajectory becomes:
$$P\left(\xi_t^k\right) = \frac{\exp\left(-\frac{1}{\lambda}R_t^k\right)}{\sum_{j=1}^{K}\exp\left(-\frac{1}{\lambda}R_t^j\right)}$$
4) Update the parameters $\theta$ of the motor primitive as:
$$\Delta\theta_t = \sum_{k=1}^{K} P\left(\xi_t^k\right)\frac{R^{-1} g_k(x_t)\, g_k(x_t)^T}{g_k(x_t)^T R^{-1} g_k(x_t)}\,\epsilon_t^k$$
5) Final parameter update:
$$\theta_{new} = \theta_{old} + \Delta\theta_t$$

Note that there are NO open tuning parameters except for the exploration noise.
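A minimal sketch of steps 2)-5) for a single time step, given K noisy rollouts; the cost-to-go values, basis vectors g_k(x_t), and noise samples ε_t^k are assumed to be supplied by the surrounding rollout machinery, and the soft-max is computed with the usual max-subtraction for numerical stability:

```python
import numpy as np

def pi2_parameter_update(R_t, g_t, eps_t, lam=0.1, R_cost=None):
    """One PI^2 update at a single time step t.
    R_t:   (K,)   cost-to-go of each of the K rollouts from time t
    g_t:   (K, n) basis-function vector g_k(x_t) for each rollout
    eps_t: (K, n) exploration noise added to theta in each rollout
    lam:   temperature of the soft-max over costs
    R_cost:(n, n) control-cost matrix R (identity if None)."""
    K, n = g_t.shape
    if R_cost is None:
        R_cost = np.eye(n)
    R_inv = np.linalg.inv(R_cost)
    # 3) soft-max probabilities of the rollouts (lower cost -> higher weight);
    #    subtracting the minimum cost does not change the normalized weights
    expo = np.exp(-(R_t - R_t.min()) / lam)
    P = expo / expo.sum()
    # 4) probability-weighted, projected exploration noise
    dtheta = np.zeros(n)
    for k in range(K):
        g = g_t[k]
        M = (R_inv @ np.outer(g, g)) / (g @ R_inv @ g + 1e-10)  # projection matrix
        dtheta += P[k] * (M @ eps_t[k])
    return dtheta

# usage sketch: theta_new = theta_old + pi2_parameter_update(R_t, g_t, eps_t)
```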

SLIDE 53

PI2 Reinforcement Learning

  • The intuition of path-integral reinforcement learning:
  • Generate multiple trials i with some variation, e.g., due to noise or exploration
  • For every time t, compute the cost $R_t^i$ for every trial:
$$R_t^i = q_T + \int_{t}^{T}\left(q(x_t) + \frac{1}{2}u_t^T R u_t\right) d\tau$$
  • Convert the cost into a positive weight:
$$w_t^i = \exp\left(-\lambda R_t^i\right)$$
  • Update the motor command at every time step to be the reward-weighted average of all experienced commands in the trials:
$$u_t^{new} = \frac{\sum_i w_t^i u_t^i}{\sum_i w_t^i}$$

Surprisingly, this intuition turns out to be the optimal solution.

SLIDE 54

PI2 Reinforcement Learning: Some Remarks

  • PI2 can be used from model-based to model-free settings
  • PI2 can optimize trajectory plans, controllers, or both
  • PI2 has only one open parameter, i.e., the level of exploration noise
  • PI2 allows a rather simple derivation of inverse reinforcement learning

Rigid-body dynamics:
$$\ddot{q} = M(q)^{-1}\left(u - C(q,\dot{q})\,\dot{q} - G(q)\right)$$
Control law:
$$u = u_{ff} + K_P\left(q_d - q\right) + K_D\left(\dot{q}_d - \dot{q}\right)$$
Motor primitives:
$$\ddot{q}_i^d = \alpha_z\left(\beta_z\left(g_i - q_i^d\right) - \dot{q}_i^d\right) + \psi^T\theta$$

SLIDE 55

Example: Results on 2D Reaching Through a Via Point

[Figure: learning curves (cost vs. number of roll-outs, 1 to 15,000) and the resulting end-effector paths through the via-point in the (x, y) plane, comparing the initial policy, PI2, REINFORCE, PG, and NAC]

SLIDE 56

Example: Results on 20D Reaching Through a Via Point

[Figure: learning curves (cost vs. number of roll-outs) and end-effector paths through the via-point for the 20D version of the task]

SLIDE 57

Example: Results on 50D Reaching Through a Via Point

[Figure: learning curves (cost vs. number of roll-outs) and end-effector paths through the via-point for the 50D version of the task]

SLIDE 58

Example: Dog Jump

[Figure: learning curve, cost vs. number of roll-outs (1 to 100)]

  • This is a 12-DOF motor system, using 50 basis functions per primitive. Learning converges after about 20-30 trials! Performance improved by 15 cm (0.5 body lengths).

SLIDE 59

Reinforcement Learning in Manipulation

SLIDE 60

Learning Locomotion over Rough Terrain

SLIDE 61

Outline

  • A Bit of Robotics History
  • Foundations of Control
  • Adaptive Control
  • Learning Control
  • Model-based Robot Learning
  • Reinforcement Learning

What Comes Next?

SLIDE 62

Towards Truly Autonomous Robots

[Images: big robots and very little robots]