Control Regularization for Reduced Variance Reinforcement Learning
Richard Cheng, Abhinav Verma, Gabor Orosz, Swarat Chaudhuri, Yisong Yue, Joel W. Burdick

Reinforcement Learning
Reinforcement learning (RL) studies how to use data from interactions with the environment to learn an optimal policy:

Policy: $\pi_\theta(a \mid s) : \mathcal{S} \times \mathcal{A} \to [0, 1]$
Trajectory: $\tau : (s_t, a_t, \ldots, s_{t+T}, a_{t+T})$
Reward optimization: $\max_\theta J(\theta) = \max_\theta \, \mathbb{E}_{\tau \sim p_\theta} \Big[ \sum_t \gamma^t r(s_t, a_t) \Big]$
Figure from Sergey Levine
Policy gradient-based optimization with no prior information:
Williams, 1992; Sutton et al., 1999; Baxter and Bartlett, 2000; Greensmith et al., 2004
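The policy-gradient optimization above can be sketched with a minimal REINFORCE-style estimator (Williams, 1992). Everything below is an illustrative assumption, not from the slides: a toy 1-D environment `s' = s + a`, a Gaussian policy with mean `theta * s` and fixed noise `sigma`, and the helper names `sample_trajectory` / `policy_gradient`.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5     # fixed Gaussian exploration noise (assumed)
gamma = 0.99    # discount factor

def sample_trajectory(theta, horizon=20):
    """Roll out a toy 1-D system s' = s + a; reward penalizes |s|."""
    s = 1.0
    states, actions, rewards = [], [], []
    for _ in range(horizon):
        a = theta * s + sigma * rng.standard_normal()
        states.append(s)
        actions.append(a)
        s = s + a
        rewards.append(-abs(s))
    return states, actions, rewards

def policy_gradient(theta, n_traj=64):
    """Monte Carlo estimate of grad_theta J = E[sum_t R_t * grad log pi]."""
    grads = []
    for _ in range(n_traj):
        states, actions, rewards = sample_trajectory(theta)
        # discounted return-to-go R_t
        R, returns = 0.0, []
        for r in reversed(rewards):
            R = r + gamma * R
            returns.append(R)
        returns.reverse()
        # for a Gaussian mean theta*s: grad log pi(a|s) = (a - theta*s) * s / sigma^2
        g = sum(Rt * (a - theta * s) * s / sigma**2
                for s, a, Rt in zip(states, actions, returns))
        grads.append(g)
    return np.mean(grads), np.var(grads)

g_mean, g_var = policy_gradient(theta=-0.5)
```

The spread of `g_var` across seeds is exactly the high-variance issue the next slides discuss.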
RL methods suffer from high variance in learning
(Islam et al. 2017; Henderson et al. 2018)
Variance in Reinforcement Learning
Allows us to optimize policy with no prior information (only sampled trajectories from interactions)
Greensmith et al. 2004; Zhao et al. 2012; Zhao et al. 2015; Thodoroff et al. 2018
Inverted pendulum, 10 random seeds
Figure from Alex Irpan
Figure from Kris Hauser
However, is learning with no prior information necessary or even desirable?
Cartpole
Nominal controller: $u = u_{\mathrm{prior}}(s)$, based on the model $s_{t+1} \approx f(s_t) + g(s_t)\, u_t$
Nominal controller is stable but based on:
- An error-prone model
- Linearized dynamics
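A nominal controller of this kind, built from linearized (and possibly error-prone) dynamics $s_{t+1} \approx A s_t + B u_t$, can be sketched as a discrete-time LQR gain. The matrices `A`, `B`, `Q`, `R` below are hypothetical cartpole-like values, not the paper's; the gain is obtained by iterating the Riccati recursion.

```python
import numpy as np

# Assumed linearization of an unstable cartpole-like system, dt = 0.02.
A = np.array([[1.0, 0.02],
              [0.3, 1.0]])
B = np.array([[0.0],
              [0.02]])
Q = np.eye(2)              # state cost
R = np.array([[0.1]])      # control cost

# Fixed-point iteration of the discrete Riccati equation.
P = Q.copy()
for _ in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)

def u_prior(s):
    """Stabilizing nominal controller u = -K s from the (inexact) model."""
    return -K @ s

# Closed-loop matrix A - B K should have spectral radius < 1.
rho = max(abs(np.linalg.eigvals(A - B @ K)))
```

Even when `A` and `B` are only approximations of the true dynamics, a robust gain of this form can serve as the control prior below.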
Regularization with a Control Prior
Combine the control prior, $u_{\mathrm{prior}}(s)$, with the learned controller, $u_{\theta_k}(s)$, sampled from $\pi_{\theta_k}(a \mid s)$, which is learned in the same manner but with samples drawn from the new (mixed) distribution:

$u_k(s) = \dfrac{1}{1+\lambda}\, u_{\theta_k}(s) + \dfrac{\lambda}{1+\lambda}\, u_{\mathrm{prior}}(s)$

$\lambda$ is a regularization parameter weighting the prior vs. the learned controller.

Under the assumption of Gaussian exploration noise (i.e., $\pi_{\theta_k}(a \mid s)$ has a Gaussian distribution):
Johannink et al. 2018; Silver et al. 2019
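The regularized (mixed) policy, weighting the learned controller against the prior by $\lambda$, can be sketched as follows; `u_prior` and `u_theta` are placeholder linear controllers, not the paper's.

```python
import numpy as np

def u_prior(s):
    """A stabilizing control prior (placeholder linear gain)."""
    return -2.0 * s

def u_theta(s, theta=-0.5):
    """Mean action of the current learned controller (placeholder)."""
    return theta * s

def u_mixed(s, lam):
    """Regularized policy: (u_theta + lam * u_prior) / (1 + lam)."""
    return (u_theta(s) + lam * u_prior(s)) / (1.0 + lam)

# lam = 0 recovers pure RL; lam -> infinity recovers the prior.
s = 1.0
assert np.isclose(u_mixed(s, 0.0), u_theta(s))
assert np.isclose(u_mixed(s, 1e6), u_prior(s), atol=1e-4)
```

Note the limiting behavior: small $\lambda$ leaves exploration unconstrained, large $\lambda$ pins the policy to the prior, matching the strong/weak regularization discussion below.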
which can be equivalently expressed as the constrained optimization problem

$u_k(s) = \arg\min_{u} \lVert u_{\theta_k}(s) - u \rVert \;\; \text{s.t.} \;\; \lVert u_{\mathrm{prior}}(s) - u \rVert \le \tfrac{1}{1+\lambda} \lVert u_{\theta_k}(s) - u_{\mathrm{prior}}(s) \rVert$
Interpretation of the Prior
Theorem 1. Using the mixed policy above, the variance from each policy gradient step is reduced by the factor $\frac{1}{(1+\lambda)^2}$.
However, this may introduce bias into the policy; the bias grows with the total variation distance between the learned policy and the control prior.
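Theorem 1's variance factor can be checked numerically in a toy setting (an assumption for illustration, not the paper's proof): with Gaussian exploration noise entering only through the learned action, the mixed action's variance shrinks by exactly $\frac{1}{(1+\lambda)^2}$.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, lam = 1.0, 2.0
eps = sigma * rng.standard_normal(200_000)   # Gaussian exploration noise

u_theta, u_prior = 0.3, -0.1                 # fixed mean actions (toy values)
u_mixed = (u_theta + eps + lam * u_prior) / (1.0 + lam)

ratio = u_mixed.var() / eps.var()            # empirical variance reduction
assert abs(ratio - 1.0 / (1.0 + lam) ** 2) < 1e-3   # expect ~ 1/9 for lam = 2
```

This illustrates the bias-variance trade-off: the same factor that shrinks the gradient variance also shrinks how far the policy can move from the prior.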
Strong regularization: The control prior heavily constrains exploration; the policy stabilizes to the red trajectory but misses the green one.
Weak regularization: Greater room for exploration, but the policy may not stabilize around the red trajectory.
Stability Properties from the Prior
Theorem 2. Assume a stabilizing $\mathcal{H}_\infty$ control prior within the set $\mathcal{C}$ for the dynamical system above. Then asymptotic stability and forward invariance of a set $\mathcal{S} \subseteq \mathcal{C}$ are guaranteed under the regularized policy for all $s \in \mathcal{C}$.
Cartpole: With a robust control prior, the regularized controller always remains near the equilibrium point, even during learning.
Regularization allows us to "capture" stability properties from a robust control prior.
Data gathered from a chain of cars following each other. Goal is to optimize the fuel efficiency of the middle car.
Results
Goal is to minimize the lap time of a simulated racecar.
Control Regularization helps by providing:
- Reduced variance
- Higher rewards
- Faster learning
- Potential safety guarantees
However, high regularization also leads to potential bias.
See the poster for similar results on the CartPole domain.
Code at: https://github.com/rcheng805/CORE-RL
Poster Number: 42