SLIDE 1

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

(M.P. Deisenroth and C.E. Rasmussen) CSC2541 November 4, 2016

SLIDE 2

PILCO Graphical Model

PILCO: Probabilistic Inference for Learning COntrol.

• Latent states {Xt} evolve through time based on the previous states and controls.
• The policy π maps Zt, a noisy observation of Xt, to a control Ut.
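
As a concrete illustration of this generative structure, here is a minimal simulation sketch; the dynamics f, the observation-noise scale, and the linear policy are invented stand-ins, not PILCO's own components.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, u):
    # Invented latent dynamics, standing in for the unknown f.
    return x + 0.1 * np.tanh(u - x)

def pi(z, theta):
    # Invented linear policy acting on the noisy observation z.
    return theta @ z

theta = np.array([0.5])
x = np.array([1.0])                               # latent state X_0
for t in range(5):
    z = x + rng.normal(scale=0.01, size=x.shape)  # noisy observation Z_t of X_t
    u = pi(z, theta)                              # control U_t = pi(Z_t)
    x = f(x, u)                                   # X_{t+1} = f(X_t, U_t)
    print(t, x)
```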

SLIDE 3

PILCO Objective

Transitions follow the dynamical system xt = f(xt−1, ut−1), where x ∈ R^D, u ∈ R^F, and f is a latent function. Let π be parameterized by θ, with ut = π(xt, θ). The objective is to find the π that minimizes the expected cost of following π for T steps, J^π(θ) = Σ_{t=1}^{T} E[c(xt)]. The cost function encodes information about a target state, e.g., the saturating cost c(x) = 1 − exp(−‖x − x_target‖²/σc²).
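
A direct transcription of this saturating cost (note that some presentations place a factor of 1/2 in the exponent; this follows the form above):

```python
import numpy as np

def saturating_cost(x, x_target, sigma_c):
    """c(x) = 1 - exp(-||x - x_target||^2 / sigma_c^2): near 0 at the
    target, saturating at 1 far away."""
    d2 = np.sum((np.asarray(x) - np.asarray(x_target)) ** 2)
    return 1.0 - np.exp(-d2 / sigma_c**2)

print(saturating_cost([0.05, 0.0], [0.0, 0.0], sigma_c=0.25))  # near target: ~0.04
print(saturating_cost([2.00, 0.0], [0.0, 0.0], sigma_c=0.25))  # far away:   ~1.0
```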

SLIDE 4

Algorithm

PILCO iterates between model learning, policy evaluation, and policy improvement (Section 2 of reference 1):
1. Apply random controls to the system and record the resulting data.
2. Learn a GP dynamics model from all data collected so far.
3. Policy evaluation: predict the state distributions p(x1), . . . , p(xT) under π and compute J^π(θ).
4. Policy improvement: update θ using the analytic gradient dJ^π(θ)/dθ.
5. Apply the improved policy to the system, record the new data, and return to step 2.


SLIDE 6

Dynamics Model Learning

Given a finite set of observed transitions, there are multiple plausible function approximators of f.

SLIDE 8

Dynamics Model Learning

Define a Gaussian process (GP) prior on the latent dynamics function f.

SLIDE 9

Dynamics Model Learning

Let the prior on f be GP(0, k(x̃, x̃′)), where x̃ = [xᵀ uᵀ]ᵀ and the squared exponential kernel is given by k(x̃, x̃′) = σf² exp(−½ (x̃ − x̃′)ᵀ Λ⁻¹ (x̃ − x̃′)), with Λ a diagonal matrix of squared characteristic length-scales and σf² the signal variance.
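
A minimal numpy transcription of this kernel; the parameterization Λ = diag(lengthscales²) is the standard automatic-relevance-determination convention assumed here.

```python
import numpy as np

def se_kernel(xa, xb, sigma_f, lengthscales):
    """k(xa, xb) = sigma_f^2 * exp(-0.5 * (xa-xb)^T Lambda^{-1} (xa-xb)),
    with Lambda = diag(lengthscales^2)."""
    d = (np.asarray(xa) - np.asarray(xb)) / np.asarray(lengthscales)
    return sigma_f**2 * np.exp(-0.5 * np.dot(d, d))

print(se_kernel([0.0, 0.0], [0.0, 0.0], sigma_f=1.0, lengthscales=[1.0, 0.5]))  # = sigma_f^2
print(se_kernel([0.0, 0.0], [1.0, 1.0], sigma_f=1.0, lengthscales=[1.0, 0.5]))  # decays with distance
```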

SLIDE 10

Dynamics Model Learning

Let ∆t = xt − xt−1 + ε, where ε ∼ N(0, Σε) and Σε = diag(σ²ε1, . . . , σ²εD). The GP yields one-step predictions (see Section 2.2 of reference 3). Given n training inputs X̃ = [x̃1, . . . , x̃n] and corresponding training targets y = [∆1, . . . , ∆n], the GP hyperparameters are learned by evidence maximization (type-II maximum likelihood), yielding the posterior GP.
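
A sketch of this model-learning step using scikit-learn as a stand-in for the paper's own GP machinery: the toy transition data are invented, and while PILCO trains one GP per target dimension, this sketch uses a single 1-D target. GaussianProcessRegressor fits the kernel hyperparameters by maximizing the log marginal likelihood, i.e., evidence maximization.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

rng = np.random.default_rng(0)

# Invented toy system: inputs x_tilde = [x, u], targets Delta = x' - x + noise.
X = rng.uniform(-1.0, 1.0, size=(50, 2))
y = 0.1 * np.sin(X[:, 0]) + 0.05 * X[:, 1] + rng.normal(scale=0.01, size=50)

# SE kernel with per-dimension length-scales plus a noise term; .fit()
# maximizes the log marginal likelihood over the hyperparameters.
kernel = ConstantKernel(1.0) * RBF(length_scale=[1.0, 1.0]) + WhiteKernel(1e-4)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# One-step prediction of Delta at a new (x, u), with model uncertainty.
mu, std = gp.predict(np.array([[0.3, -0.2]]), return_std=True)
print(mu, std)
```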

SLIDE 11

Algorithm

SLIDE 12

Policy Evaluation

Evaluating the objective J^π(θ) = Σ_{t=1}^{T} E[c(xt)] requires the predictive state distributions p(x1), . . . , p(xT). We have xt = xt−1 + ∆t − ε, where, in general, computing p(∆t) is analytically intractable. Instead, p(∆t) is approximated with a Gaussian via moment matching.

SLIDE 13

Moment Matching

• The input distribution p(xt−1, ut−1) is assumed Gaussian.
• Propagating it through the GP model yields p(∆t), which is non-Gaussian.
• p(∆t) is approximated by a Gaussian via moment matching.

SLIDE 14

Moment Matching

p(xt) can now be approximated by N(µt, Σt), where the moments µ∆ and Σ∆ of p(∆t) are computed exactly via the laws of iterated expectation and (co)variance; then µt = µt−1 + µ∆ and Σt = Σt−1 + Σ∆ + Cov[xt−1, ∆t] + Cov[∆t, xt−1].
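
PILCO obtains µ∆ and Σ∆ in closed form; the Monte Carlo sketch below only illustrates what moment matching means (the nonlinearity g and the input moments are invented, and sampling replaces the exact computation).

```python
import numpy as np

rng = np.random.default_rng(0)

# Gaussian input (1-D for illustration): p(x_{t-1}) = N(0.5, 0.2).
mu_in, var_in = 0.5, 0.2

# Invented nonlinearity standing in for the GP's predictive map.
g = lambda x: np.sin(3.0 * x)

# The pushforward distribution p(Delta) is non-Gaussian; moment matching
# replaces it with the Gaussian N(mu_d, var_d) sharing its first two moments.
x = rng.normal(mu_in, np.sqrt(var_in), size=200_000)
delta = g(x)
mu_d, var_d = delta.mean(), delta.var()
print(f"moment-matched approximation: N({mu_d:.4f}, {var_d:.4f})")
```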

SLIDE 15

Algorithm

SLIDE 16

Analytic Gradient for Policy Improvement

Let Et = E[c(xt)], so that J^π(θ) = Σ_{t=1}^{T} Et.

Et depends on θ through p(xt); p(xt) depends on θ through p(xt−1); p(xt−1) depends on θ through its moments µt−1 and Σt−1, and so on, down to the moments µu and Σu of the control distribution, since ut = π(xt, θ). The chain rule is applied along this chain of dependencies to compute dJ^π(θ)/dθ analytically. Analytic gradients allow gradient-based non-convex optimization methods, e.g., CG or L-BFGS.
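
A sketch of this optimization pattern with SciPy's L-BFGS; the toy objective and its hand-coded gradient are invented stand-ins for J^π(θ) and dJ^π(θ)/dθ as produced by the chain rule in PILCO.

```python
import numpy as np
from scipy.optimize import minimize

def objective_and_grad(theta):
    # Invented toy objective and its analytic gradient.
    J = np.sum((theta - 1.0) ** 2) + 0.1 * np.sum(np.sin(5.0 * theta))
    dJ = 2.0 * (theta - 1.0) + 0.5 * np.cos(5.0 * theta)
    return J, dJ

# jac=True tells SciPy the function returns (value, gradient) as a pair.
res = minimize(objective_and_grad, x0=np.zeros(3), jac=True, method="L-BFGS-B")
print(res.x, res.fun)
```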

SLIDE 17

Data-Efficiency

SLIDE 18

Advantages and Disadvantages

Advantages:
• Data-efficient.
• Incorporates model uncertainty into long-term planning.
• Does not rely on expert knowledge, e.g., demonstrations or task-specific prior knowledge.

Disadvantages:
• Not an optimal-control method.
• If the p(xt) do not cover the target region and σc induces a cost that is very peaked around the target, PILCO gets stuck in a local optimum because of zero gradients.
• Learned dynamics models are only confident in areas of the state space previously observed.
• Does not take temporal correlation into account: model uncertainty is treated as uncorrelated noise.

SLIDE 19

Extension: PILCO with Bayesian Filtering

• R. McAllister and C.E. Rasmussen, "Data-Efficient Reinforcement Learning in Continuous-State POMDPs." https://arxiv.org/abs/1602.02523

SLIDE 20

References

1. M.P. Deisenroth and C.E. Rasmussen, "PILCO: A Model-Based and Data-Efficient Approach to Policy Search," in Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, USA, 2011.
2. R. McAllister and C.E. Rasmussen, "Data-Efficient Reinforcement Learning in Continuous-State POMDPs." https://arxiv.org/abs/1602.02523
3. C.E. Rasmussen and C.K.I. Williams, Gaussian Processes for Machine Learning. MIT Press, 2006. www.gaussianprocess.org/gpml/chapters
4. C.M. Bishop, Pattern Recognition and Machine Learning, Chapter 6.4. Springer, 2006. ISBN 0-387-31073-8.