Bayesian Methods in Reinforcement Learning
ICML-07 Tutorial, Wednesday, June 20th, 2007, Corvallis, Oregon, USA
Pascal Poupart (Univ. of Waterloo), Mohammad Ghavamzadeh (Univ. of Alberta), Yaakov Engel (Univ. of Alberta)
Motivation
- Why a tutorial on Bayesian Methods for Reinforcement Learning?
- Bayesian methods sporadically used in RL
- Bayesian RL can be traced back to the 1950’s
- Some advantages:
– Uncertainty fully captured by a probability distribution
– Natural optimization of the exploration/exploitation tradeoff
– Unifying framework for plain RL, inverse RL, multi-agent RL, imitation learning, active learning, etc.
Goal
- Add another tool to the toolbox of Reinforcement Learning researchers
[Portrait: Thomas Bayes]
Outline
- Intro to RL and Bayesian Learning
- History of Bayesian RL
- Model-based Bayesian RL
– Prior knowledge, policy optimization, discussion, Bayesian approaches for other RL variants
- Model-free Bayesian RL
– Gaussian process temporal difference, Gaussian process SARSA, Bayesian policy gradient, Bayesian actor-critic algorithms
- Demo: control of an octopus arm
Common Belief
- Reinforcement Learning in AI:
– Formalized in the 1980’s by Sutton, Barto and others
– Traditional RL algorithms are not Bayesian
Bayesian RL is a new approach
Wrong!
A Bit of History
- RL is the problem of controlling a Markov chain with unknown probabilities.
- While the AI community started working on this problem in the 1980’s and called it Reinforcement Learning, the control of Markov chains with unknown probabilities had already been extensively studied in Operations Research since the 1950’s, including Bayesian methods.
A Bit of History
- Operations Research: Bayesian Reinforcement Learning already studied under the names of
– Adaptive control processes [Bellman]
– Dual control [Fel’dbaum]
– Optimal learning
- 1950’s & 1960’s: Bellman, Fel’dbaum, Howard and others develop Bayesian techniques to control Markov chains with uncertain probabilities and rewards
Bayesian RL Work
- Operations Research
– Theoretical foundation
– Algorithmic solutions for special cases
- Bandit problems: Gittins indices
– Intractable algorithms for the general case
- Artificial Intelligence
– Algorithmic advances to improve scalability
Artificial Intelligence
- (Non-exhaustive list)
- Model-based Bayesian RL: Dearden et al. (1999), Strens (2000), Duff (2002, 2003), Mannor et al. (2004, 2007), Madani et al. (2004), Wang et al. (2005), Jaulmes et al. (2005), Poupart et al. (2006), Delage et al. (2007), Wilson et al. (2007)
- Model-free Bayesian RL: Dearden et al. (1998), Engel et al. (2003, 2005), Ghavamzadeh et al. (2006, 2007)
Model-based Bayesian RL
- Markov Decision Process:
– X: set of states <xs,xr>
- xs: physical state component
- xr: reward component
– A: set of actions
– p(x’|x,a): transition and reward probabilities
- Bayesian model-based Reinforcement Learning: encode the unknown probabilities with random variables θ
– i.e., θxax’ = Pr(x’|x,a): random variable in [0,1]
– i.e., θxa = Pr(•|x,a): multinomial distribution
Model Learning
- Assume prior b(θxa) = Pr(θxa)
- Learning: use Bayes’ theorem to compute the posterior bxax’(θxa) = Pr(θxa|x,a,x’) (sketched below):
– bxax’(θxa) = k Pr(θxa) Pr(x’|x,a,θxa) = k b(θxa) θxax’
- What is the prior b?
- Could we choose b to be in the same class as bxax’?
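To make the update concrete, here is a minimal numerical sketch (an illustration, not from the tutorial): a single unknown transition probability θ = Pr(x’=1|x,a) is discretized on a grid, and Bayes’ theorem reweights the belief after each observed transition.

```python
import numpy as np

# Unknown parameter: theta = Pr(x' = 1 | x, a) for one fixed (x, a) pair.
# Discretize theta on a grid and represent the belief b as grid weights.
grid = np.linspace(0.01, 0.99, 99)
belief = np.ones_like(grid) / len(grid)     # uniform prior b(theta)

def update(belief, next_state):
    """Posterior b_xax'(theta) = k * b(theta) * Pr(x'|x,a,theta)."""
    likelihood = grid if next_state == 1 else 1.0 - grid
    posterior = belief * likelihood
    return posterior / posterior.sum()      # the normalization constant k

# Each observed transition reweights the belief; it concentrates around
# the empirical frequency of x' = 1.
for x_next in (1, 0, 1, 1):
    belief = update(belief, x_next)
print("posterior mean of theta:", float(grid @ belief))
```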
Conjugate Prior
- Suppose b is a monomial in θ
– i.e., b(θxa) = k Πx’’ (θxax’’)^(nxax’’ − 1)
- Then bxax’ is also a monomial in θ
– bxax’(θxa) = k [Πx’’ (θxax’’)^(nxax’’ − 1)] θxax’ = k Πx’’ (θxax’’)^(nxax’’ − 1 + δ(x’,x’’))
- Distributions that are closed under Bayesian updates are called conjugate priors
Dirichlet Distributions
- Dirichlets are monomials over discrete random variables:
– Dir(θxa; nxa) = k Πx’’ (θxax’’)^(nxax’’ − 1)
- Dirichlets are conjugate priors for discrete likelihood distributions
[Figure: Dir(p; 1, 1), Dir(p; 2, 8) and Dir(p; 20, 80) densities over p; as the counts grow, the density concentrates around p = 0.2]
Encoding Prior Knowledge
- No knowledge: uniform distribution
– E.g., Dir(p; 1, 1)
- If I believe p is roughly 0.2, then set (n1, n2) = (0.2k, 0.8k)
– Dir(p; 0.2k, 0.8k)
– k: level of confidence (see the sketch below)
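A small sketch of how the confidence level k behaves (an illustration built on the conjugate update from the previous slides, not tutorial code): the prior Dir(p; 0.2k, 0.8k) encodes the guess p ≈ 0.2, and the larger k is, the more data it takes to move the posterior mean.

```python
import numpy as np

def dirichlet_posterior(prior_counts, observations):
    """Conjugate update: posterior counts = prior counts + observed counts."""
    counts = np.asarray(prior_counts, dtype=float).copy()
    for outcome in observations:
        counts[outcome] += 1.0
    return counts

# Data whose empirical frequency (p = 0.5) contradicts the prior guess p = 0.2.
data = [0] * 10 + [1] * 10

for k in (2, 10, 100):
    prior = [0.2 * k, 0.8 * k]         # "p is roughly 0.2" with confidence k
    post = dirichlet_posterior(prior, data)
    print(f"k = {k:3d}: posterior mean of p = {post[0] / post.sum():.3f}")
# k = 2 yields ~0.47 (the data dominate); k = 100 yields 0.25 (the prior dominates).
```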
Structural Priors
- Suppose the probability of two transitions is the same
– Tie the identical parameters: if Pr(•|x,a) = Pr(•|x’,a’) then θxa = θx’a’
– Fewer parameters, and evidence is pooled (see the sketch below)
- Suppose the transition dynamics are factored
– E.g., transition probabilities can be encoded with a dynamic Bayesian network
– Exponentially fewer parameters
– E.g., θx,pa(X) = Pr(X=x|pa(X))
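A minimal sketch of parameter tying (an illustration with hypothetical state-action pairs, not tutorial code): tied pairs index one shared Dirichlet, so their counts pool the evidence.

```python
import numpy as np

n_outcomes = 3
# Hypothetical tying: suppose Pr(.|x=0,a=0) = Pr(.|x=1,a=1).
tie_group = {(0, 0): "shared", (1, 1): "shared", (0, 1): "own"}
counts = {g: np.ones(n_outcomes) for g in set(tie_group.values())}  # Dir(1,1,1)

def observe(x, a, x_next):
    # Evidence from tied state-action pairs pools into a single Dirichlet.
    counts[tie_group[(x, a)]][x_next] += 1.0

observe(0, 0, 2)
observe(1, 1, 2)    # counts toward the same shared parameter vector
print(counts["shared"] / counts["shared"].sum())   # posterior mean
```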
POMDP Formulation
- Traditional RL:
– X: set of states
– A: set of actions
– p(x’|x,a): transition probabilities (unknown)
- Bayesian RL = POMDP:
– X × θ: set of states <x,θ>
- x: physical state (observable)
- θ: model (hidden)
– A: set of actions
– Pr(x’,θ’|x,θ,a): transition probabilities (known)
Transition Probabilities
- Pr(x’|x,a) = ?
- Pr(x’,θ’|x,θ,a) = Pr(x’|x,θ,a) Pr(θ’|θ)
– Pr(x’|x,θ,a) = θxax’
– Pr(θ’|θ) = 1 if θ’ = θ, 0 otherwise
[Figure: dynamic Bayesian networks for traditional RL (x, a, x’) and for Bayesian RL (x, θ, a, x’, θ’)]
Belief MDP Formulation
- Bayesian RL = POMDP:
– X × θ: set of states <x,θ>
– A: set of actions
– Pr(x’,θ’|x,θ,a): transition probabilities (known)
- Bayesian RL = Belief MDP:
– X × B: set of states <x,b>
– A: set of actions
– p(x’,b’|x,b,a): transition probabilities (known)
Transition Probabilities
- Pr(x’,θ’|x,θ,a) = Pr(x’|x,θ,a) Pr(θ’|θ)
- Pr(x’,b’|x,b,a) = Pr(x’|x,b,a) Pr(b’|x,b,a,x’)
– Pr(x’|x,b,a) = ∫θ b(θ) Pr(x’|x,θ,a) dθ
– Pr(b’|x,b,a,x’) = 1 if b’ = bxax’, 0 otherwise (see the sketch below)
(Compare with the POMDP view: Pr(x’|x,θ,a) = θxax’ and Pr(θ’|θ) = 1 if θ’ = θ, 0 otherwise.)
[Figure: dynamic Bayesian networks over (x, b, a, x’, b’) and (x, θ, a, x’, θ’)]
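With Dirichlet beliefs, both quantities have simple closed forms: Pr(x’|x,b,a) is the Dirichlet mean, and the successor belief bxax’ just increments one count. A minimal sketch (an illustration with arbitrary sizes):

```python
import numpy as np

n_states, n_actions = 3, 2
# Belief over each theta_xa as Dirichlet counts alpha[x, a, x'].
alpha = np.ones((n_states, n_actions, n_states))   # uniform Dir(1,...,1) priors

def transition_prob(x, a):
    """Pr(x'|x,b,a) = integral of b(theta) * theta_xax' = Dirichlet mean."""
    return alpha[x, a] / alpha[x, a].sum()

def belief_update(x, a, x_next):
    """Deterministic successor belief b_xax': increment a single count."""
    alpha[x, a, x_next] += 1.0

p = transition_prob(0, 1)    # next-state distribution under the current belief
belief_update(0, 1, 2)       # after observing the transition (x=0, a=1) -> x'=2
```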
Policy Optimization
- Classic RL:
– V*(x) = maxa Σx’ Pr(x’|x,a) [xr’ + γ V*(x’)]
– Hard to tell what needs to be explored
– Exploration heuristics: ε-greedy, Boltzmann, etc.
- Bayesian RL:
– V*(x,b) = maxa Σx’ Pr(x’|x,b,a) [xr’ + γ V*(x’,bxax’)]
– The belief b tells us what parts of the model are not well known and therefore worth exploring
Exploration/Exploitation Tradeoff
- Dilemma:
– Maximize immediate rewards (exploitation)?
– Or, maximize information gain (exploration)?
- Wrong question!
- Single objective: maximize the expected total rewards
– Vμ(x0) = Σt γ^t E[xr,t] (expectation w.r.t. P(xt|μ))
– Optimal policy μ*: Vμ*(x) ≥ Vμ(x) for all x, μ
- The optimal policy achieves the optimal exploration/exploitation tradeoff
Policy Optimization
- Use favorite RL/MDP/POMDP algorithm to solve
– V*(x,b) = maxa Σx’ Pr(x’|x,b,a) [xr’ + γ V*(x’,bxax’)]
- Some approaches (non-exhaustive list):
– Myopic value of information (Dearden et al. 1999)
– Thompson sampling (Strens 2000)
– Bayesian sparse sampling (Wang et al. 2005)
– Policy gradient (Duff 2002)
– POMDP discretization (Jaulmes et al. 2005)
– BEETLE (Poupart et al. 2006)
Myopic Value of Information
- Dearden, Friedman, Andre (1999)
- Myopic value of information:
– Expected gain from the observation of a transition
- Myopic value of perfect information MVPI(x,a):
– Upper bound on the myopic value of information
– Expected gain from learning the true value of a in x
- Action selection (a Monte-Carlo sketch follows):
– a* = argmaxa [ Q(x,a) + MVPI(x,a) ]   (exploitation term + exploration term)
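A hedged Monte-Carlo sketch of the exploration term (a reconstruction, assuming posterior samples of one state’s Q-values are already available, e.g., from Bayesian Q-learning, Dearden et al. 1998):

```python
import numpy as np

def mvpi(q_samples):
    """Monte-Carlo myopic value of perfect information, one value per action.

    q_samples: array (n_samples, n_actions) of posterior samples of the
    Q-values of a single state (an assumption of this sketch).
    """
    means = q_samples.mean(axis=0)
    a1 = int(np.argmax(means))              # current best action
    q2 = np.sort(means)[-2]                 # mean value of the second best
    gains = np.zeros(q_samples.shape[1])
    for a in range(q_samples.shape[1]):
        q = q_samples[:, a]
        if a == a1:   # learning q could reveal that a1 is actually worse
            gains[a] = np.mean(np.maximum(q2 - q, 0.0))
        else:         # learning q could reveal that a beats the current best
            gains[a] = np.mean(np.maximum(q - means[a1], 0.0))
    return gains

rng = np.random.default_rng(0)
qs = rng.normal([1.0, 0.9, 0.2], [0.1, 0.5, 0.1], size=(1000, 3))
a_star = int(np.argmax(qs.mean(axis=0) + mvpi(qs)))    # exploit + explore
```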
Thompson Sampling
- Strens (2000)
- Action selection (sketched below):
– Sample θ from b(θ)   (this injects exploration)
– Select the best action for the sampled θ   (this exploits)
- Yields an exploration heuristic
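A minimal sketch of one Thompson-sampling step for model-based Bayesian RL (a reconstruction that assumes Dirichlet beliefs over transitions and a known state-reward vector):

```python
import numpy as np

def thompson_action(x, alpha, rewards, gamma=0.95, n_iter=200):
    """alpha: Dirichlet counts (S, A, S) for b(theta); rewards: vector (S,)."""
    S, A, _ = alpha.shape
    rng = np.random.default_rng()
    # Sample one full transition model theta ~ b(theta).
    theta = np.stack([[rng.dirichlet(alpha[s, a]) for a in range(A)]
                      for s in range(S)])
    # Solve the sampled MDP by value iteration, then act greedily for it.
    V = np.zeros(S)
    for _ in range(n_iter):
        Q = theta @ (rewards + gamma * V)    # shape (S, A)
        V = Q.max(axis=1)
    return int(np.argmax(Q[x]))
```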
Empirical Comparison
Method | Loop Phase 1 | Loop Phase 2 | Chain Phase 1 | Chain Phase 2
QL semi-uniform | 337 ± 2 | 392 ± 1 | 1594 ± 2 | 1597 ± 2
Bayesian DP | 377 ± 1 | 397.5 ± 0.1 | 3158 ± 31 | 3611 ± 27
Heuristic DP | 314 ± 3 | 376 ± 2 | 2855 ± 29 | 3450 ± 21
Bayes VPI+MIX | 326 ± 31 | 340 ± 31 | 1697 ± 112 | 2417 ± 217
IEQL+ | 264 ± 1 | 293 ± 1 | 2344 ± 78 | 2557 ± 90
QL Boltzmann | 186 ± 1 | 200 ± 1 | 1606 ± 26 | 1623 ± 22
From Strens (2000)
Bayesian Sparse Sampling
- Wang, Lizotte, Bowling & Schuurmans (2005)
- Perform a lookahead search by growing a sparse tree of reachable beliefs (see the sketch below)
- Evaluate the mean model at the leaves
[Figure: lookahead tree alternating max (action) nodes and expectation (chance) nodes]
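A compact recursive sketch of the idea (a reconstruction, not the authors’ code): sample a few next states per action, recurse on the updated Dirichlet beliefs, and score the leaves with an assumed mean-model evaluator leaf_value:

```python
import numpy as np

def sparse_value(x, alpha, rewards, depth, gamma=0.95, width=2,
                 leaf_value=None, rng=None):
    """Sparse lookahead over reachable beliefs; alpha = Dirichlet counts (S, A, S)."""
    rng = np.random.default_rng(0) if rng is None else rng
    if depth == 0:
        # Evaluate the mean model at the leaf; leaf_value is an assumed helper,
        # e.g. the value of the MDP with posterior-mean transition probabilities.
        return leaf_value(x, alpha) if leaf_value else 0.0
    S, A, _ = alpha.shape
    q = np.zeros(A)
    for a in range(A):
        p = alpha[x, a] / alpha[x, a].sum()      # Pr(x'|x,b,a): Dirichlet mean
        for x_next in rng.choice(S, size=width, p=p):
            child = alpha.copy()
            child[x, a, x_next] += 1.0           # successor belief b_xax'
            q[a] += (rewards[x_next] + gamma *
                     sparse_value(x_next, child, rewards, depth - 1,
                                  gamma, width, leaf_value, rng)) / width
    return float(q.max())
```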
Policy Gradient
- Duff (2002)
- Policy: stochastic finite-state controller
– Action selection: Pr(a|n)
– Node transition: Pr(n’|n,o)
- Estimate the gradient by Monte-Carlo sampling
- Policy improvement: take small steps in the gradient direction
POMDP Discretization
- Jaulmes, Pineau and Precup (2005)
- Idea: discretize θ with a grid.
- Use your favorite POMDP algorithm
- Problem: the state space grows exponentially with the number of θxax’ parameters
Policy Optimization
- Bayesian RL:
– V*(x,b) = maxa Σx’ Pr(x’|x,b,a) [xr’ + γV*(x’,bxax’)]
- Difficulty:
– b (and θ) are continuous
– What is the form/parameterization of V*?
- Poupart et al. (2006)
– Optimal value function: Vx*(θ) = maxi polyi(θ)
– BEETLE algorithm
Value Function Parameterization
- Theorem: V* is the upper envelope of a set of multivariate polynomials: Vx(θ) = maxi polyi(θ)
- Proof: by induction
– Define the value function in terms of θ instead of b
- i.e., V*(x,b) = ∫θ b(θ) Vx(θ) dθ
– Bellman’s equation:
Vx(θ) = maxa Σx’ Pr(x’|x,a,θ) [xr’ + γ Vx’(θ)]
      = maxa Σx’ θxax’ [k + γ maxi polyi(θ)]
      = maxj polyj(θ)
Partially Observable Domains
- Beliefs: mixtures of Dirichlets
- The theorem also holds for partially observable domains:
– Vx(θ) = maxi polyi(θ)
BEETLE Algorithm
- Sample a set of reachable belief points B
- V ← {0}
- Repeat:
– V’ ← {}
– For each b ∈ B, compute a multivariate polynomial:
- polyax’(θ) ← argmaxpoly∈V ∫θ bxax’(θ) poly(θ) dθ
- a* ← argmaxa ∫θ b(θ) Σx’ θxax’ [xr’ + γ polyax’(θ)] dθ
- poly(θ) ← Σx’ θxa*x’ [xr’ + γ polya*x’(θ)]
- V’ ← V’ ∪ {poly}
– V ← V’
Polynomials
- Computational issue:
– the number of monomials in each polynomial grows by a factor of O(|X|) at each iteration:
poly(θ) = Σx’ θxa*x’ [xr’ + γ polya*x’(θ)]
        = Σx’ θxa*x’ [xr’ + γ Σi monoi(θ)]
        = xr’ + γ Σi,x’ monoi,x’(θ)
- After n iterations, the polynomials have O(|X|^n) monomials!
Projection Scheme
- Approximate each polynomial by a linear combination of a fixed set of monomial basis functions φi(θ):
– i.e., poly(θ) ≈ Σi ci φi(θ)
- Find the best coefficients ci by minimizing an Ln norm:
– minc ∫θ |poly(θ) − Σi ci φi(θ)|^n dθ
- For the Euclidean norm (L2), this can be done by solving a system of linear equations Ax = b (sketched below) such that
– Aij = ∫θ φi(θ) φj(θ) dθ
– bi = ∫θ poly(θ) φi(θ) dθ
– xi = ci
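For monomial basis functions, the entries of A and b have closed forms, because the moment of any monomial over the simplex is a ratio of Gamma functions. A sketch of the L2 projection built on that observation (an illustration, not the BEETLE implementation):

```python
import numpy as np
from scipy.special import gammaln

def simplex_moment(c):
    """Closed form for the integral of prod_x theta_x**c[x] over the simplex."""
    c = np.asarray(c, dtype=float)
    return np.exp(gammaln(c + 1.0).sum() - gammaln(c.sum() + len(c)))

def project(poly, basis):
    """L2 projection of a polynomial onto monomial basis functions.

    poly:  list of (coefficient, exponent_vector) monomials.
    basis: list of exponent vectors, one per basis monomial phi_i.
    """
    A = np.array([[simplex_moment(np.add(bi, bj)) for bj in basis]
                  for bi in basis])                       # A_ij = integral of phi_i phi_j
    b = np.array([sum(c * simplex_moment(np.add(m, bi)) for c, m in poly)
                  for bi in basis])                       # b_i = integral of poly phi_i
    return np.linalg.solve(A, b)                          # coefficients c_i

# Toy example over a 2-simplex (3 outcomes): project theta_0**2 * theta_1
coeffs = project(poly=[(1.0, [2, 1, 0])],
                 basis=[[0, 0, 0], [1, 0, 0], [0, 1, 0]])
```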
Basis functions
- Which monomials should we use as basis functions?
- Recall that:
– bxax’(θ) = k b(θ) θxax’
– poly(θ) = Σx’ θxax’ [xr’ + γ polyax’(θ)]
- Hence we use beliefs as basis functions
BEETLE Properties
- Offline: optimize the policy at sampled belief points
– Time: minutes to hours
- Online: learn the transition model (belief monitoring)
– Time: fraction of a second
- Advantages:
– Fast enough for online learning
– Optimizes the exploration/exploitation tradeoff
– Easy to encode prior knowledge in the initial belief
- Disadvantage:
– The policy may not be good for all belief points
Empirical Evaluation
- Comparison with two heuristics
- Exploit: pure exploitation strategy
– Greedily select the best action of the mean model at each time step
– Slow execution: must solve an MDP at each time step
- Discrete POMDP: discretize θ
– Discretization leads to an exponential number of states
– Intractable for medium to large problems
Empirical Evaluation
Problem | |S| | |A| | Free params | Opt | Discrete POMDP | Exploit | Beetle | Beetle time (minutes)
Chain1 | 5 | 2 | 1 | 3677 | 3661 ± 27 | 3642 ± 43 | 3650 ± 41 | 1.9
Chain2 | 5 | 2 | 2 | 3677 | 3651 ± 32 | 3257 ± 124 | 3648 ± 41 | 2.6
Chain3 | 5 | 2 | 40 | 3677 | na-m | 3078 ± 49 | 1754 ± 42 | 32.8
Handw1 | 9 | 2 | 4 | 1153 | 1149 ± 12 | 1133 ± 12 | 1146 ± 12 | 14.0
Handw2 | 9 | 2 | 8 | 1153 | 990 ± 8 | 991 ± 31 | 1082 ± 17 | 55.7
Handw3 | 9 | 6 | 270 | 1083 | na-m | 297 ± 10 | 385 ± 10 | 133.6
Informative Priors
Expected total reward as the prior confidence k increases:

Problem | Opt | k = 0 | k = 10 | k = 20 | k = 30
Chain3 | 3677 | 1754 ± 42 | 3453 ± 47 | 2034 ± 57 | 3656 ± 32
Handw2 | 1153 | 1082 ± 17 | 1056 ± 18 | 1097 ± 17 | 1106 ± 16
Handw3 | 1083 | 385 ± 10 | 540 ± 10 | 1056 ± 12 | 1056 ± 12
Discussion
- Priors
- Online learning
- Active learning
Misconceptions
- Wouldn’t it be better to learn everything from scratch, without having to specify any prior?
- No!
- There is no such thing as RL without any prior.
- Every learning algorithm has a learning bias:
– Bayesian RL: the bias is explicit in the prior
– Other RL techniques: the bias is implicit, but always present
- Policy search: parameterization of the policy space
- Value function approximation: type of function approximator
Generalization Assumption
- Consider RL with continuous states
- Approximate V(x) with your favorite approximator
– polynomial, neural network, radial basis functions, etc.
- Common problem: divergence
- Possible cause: an implicit (inaccurate) assumption regarding the generalization across states
- Bayesian RL forces an explicit encoding of the assumptions made
– Easier to verify that the assumptions are reasonable
Inaccurate priors
- What if the prior is wrong?
– This is the same as asking: what if the learning bias is wrong?
- All RL algorithms use a learning bias that may be wrong. You just have to live with this!
Inaccurate priors
- OK, but I still want to know what will happen if my prior is wrong…
- A prior is wrong when the probability it assigns to each hypothesis differs from the underlying distribution
- Consequences:
– Learning may take longer
– Learning may not converge to the true hypothesis
Convergence
- Bayesian learning converges to the hypothesis with highest likelihood:
– If the true hypothesis has a non-zero prior probability, Bayesian learning will converge to it (in the limit).
– If the true hypothesis has zero prior probability, Bayesian learning converges to the hypotheses that have the highest likelihood of generating the data.
- For n independent pieces of evidence (illustrated below):
– Pr(h|e) = k Pr(h) Pr(e1|h) Pr(e2|h) … Pr(en|h)
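A tiny numerical illustration of this product form (an example, not from the tutorial): with a non-zero prior on the true hypothesis, the posterior concentrates on it as independent evidence accumulates.

```python
import numpy as np

# Three hypotheses about a coin's bias; the data actually come from p = 0.7.
hypotheses = np.array([0.3, 0.7, 0.9])
prior = np.array([0.4, 0.4, 0.2])

rng = np.random.default_rng(1)
evidence = rng.random(200) < 0.7           # 200 independent coin flips

posterior = prior.copy()
for heads in evidence:
    likelihood = hypotheses if heads else 1.0 - hypotheses
    posterior = posterior * likelihood     # Pr(h|e) = k Pr(h) prod_t Pr(et|h)
    posterior /= posterior.sum()           # the constant k

print(posterior)   # mass concentrates on the true hypothesis p = 0.7
```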
Benefits of Explicit Priors
- Facilitates encoding of domain knowledge
- Assumptions made can be easily verified
- Prior information simplifies learning
– Faster training (assuming good prior)
Online Learning
- Online learning:
– Must bear the reward/cost of each action
– Exploration/exploitation tradeoff
– Data samples often limited due to interaction with the environment
- Bayesian RL:
– Naturally balances exploration and exploitation
– Facilitates the inclusion of prior knowledge
- Reduces the need for data samples
Active Learning
- Active learning: the learner chooses the training data
- In RL:
– The learner chooses actions, which influence future states
– How can we choose actions that reveal the most information at the least cost?
– Same problem as the exploration/exploitation tradeoff
– Bayesian RL provides a solution (in principle)
Other variants of RL
- Bayesian methods can also be used for several variants of reinforcement learning:
– Bayesian inverse RL [Ramachandran et al., 2007]
– Bayesian imitation learning [Price et al., 2003]
– Bayesian coordination [Chalkiadakis et al., 2003]
– Bayesian coalition formation [Chalkiadakis et al., 2004]
– Bayesian partially observable stochastic games [Gmytrasiewicz & Doshi, 2005]
– Bayesian multi-task reinforcement learning [Wilson et al., 2007]
Bayesian Inverse RL
- Ramachandran and Amir (2007)
- Bayesian inverse RL: <X,A,p,μ*>
- Unknown: R
- Prior: Pr(R)
- Likelihood: Pr(x,a|R) = k e^(α Q*(x,a,R)) (see the sketch below)
- Posterior: Pr(R|x,a)
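A hedged sketch of the posterior computation (a reconstruction restricted to a discrete set of candidate reward functions, with the e^(αQ*) likelihood normalized over actions; state-dependent rewards are an assumption of this sketch):

```python
import numpy as np

def q_star(P, R, gamma=0.9, n_iter=300):
    """Optimal Q-values of the MDP <X, A, P, R> by value iteration."""
    V = np.zeros(P.shape[0])
    for _ in range(n_iter):
        Q = R[:, None] + gamma * (P @ V)     # P has shape (S, A, S)
        V = Q.max(axis=1)
    return Q

def birl_posterior(P, candidates, prior, demo, alpha=5.0, gamma=0.9):
    """Posterior over candidate reward vectors given demonstrated (x, a) pairs."""
    log_post = np.log(np.asarray(prior, dtype=float))
    for i, R in enumerate(candidates):
        Q = q_star(P, np.asarray(R, dtype=float), gamma)
        for x, a in demo:
            # log Pr(a|x,R), with Pr(a|x,R) proportional to exp(alpha * Q*(x,a,R))
            log_post[i] += alpha * Q[x, a] - np.log(np.exp(alpha * Q[x]).sum())
    post = np.exp(log_post - log_post.max())
    return post / post.sum()
```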
Bayesian Inverse RL
- Reward learning
– R* = argmaxR Pr(R|x,a)
- Apprenticeship learning
– Let R̄ = ΣR Pr(R|x,a) R
– μ* = the best policy for <X,A,p,R̄>
- Advantages:
– Natural encoding of the uncertainty about R
– Facilitates the inclusion of prior knowledge
– The mentor does not need to be infallible
– The mentor’s policy may be only partially known
Bayesian Imitation Learning
- Price and Boutilier (2003)
- Two agents: learner and mentor
- They share:
– Same state space
– Same action space
- The learner observes the mentor’s states, but not the mentor’s actions
- The mentor executes a fixed policy (not necessarily optimal), which is unknown to the learner
Bayesian Imitation Learning
- Idea: the learner can learn faster by observing the mentor’s state trajectories
- Two unknowns:
– θ: model (same for both agents)
– μm: policy of the mentor
- Prior: Pr(θ,μm)
- Posterior: Pr(θ,μm|ao,xo,xm)
- Belief MDP algorithm based on an approximate value of information
[Figure: graphical model relating the learner’s actions and states (ao, xo), the mentor’s states (xm), the mentor’s policy μm and the model θ]
Bayesian Multiagent Coordination
- Chalkiadakis & Boutilier (2003)
- Multiagent RL: Stochastic Game
- Problem: Multiple equilibria
- Coordination
– Necessary to converge to the same equilibrium
– Induces an exploration/exploitation tradeoff
- Bayesian coordination optimizes this tradeoff
Bayesian Multiagent Coordination
- Stochastic Game: <α, {Ai}i∈α, X, p, {Ri}i∈α>
- Unknowns:
– θ = <p, {Ri}i∈α>: model (game)
– μ-i: the other agents’ policies
– H: relevant aspects of the game history used by μ-i
- Prior: Pr(θ, μ-i, H)
- Posterior: Pr(θ, μ-i, H|x,a,r,x’)
- Belief MDP algorithm based on an approximate value of information
Partially Observable Stochastic Games (POSGs)
- Gmytrasiewicz and Doshi (2005)
- Interactive POMDPs: <ISi, A, pi, Oi, Ωi, Ri>
– a hierarchical Bayesian formulation of POSGs
– ISi: interactive state
– Ωi: set of observations
– Oi: A × Xi × Ωi → [0,1]: observation function
- Nested beliefs: isi,l = <xi, θi,l-1> s.t. θi,l-1 = <b(is-i,l-1), A, pi, Oi, Ωi, Ri>
Partially Observable Stochastic Games (POSGs)
- Bayesian POSGs:
– Natural model
– No assumption of common knowledge among agents
– Facilitate the encoding of prior knowledge
Summary
- History of Bayesian RL
- Formulation of model-based Bayesian RL
- Priors
– Dirichlets (conjugate priors for multinomials)
– Inclusion of structure and parameter knowledge
- Natural balance of exploration and exploitation
- Optimal value function
– Can use your favorite RL/MDP/POMDP algorithm
– Closed form: upper envelope of multivariate polynomials
- Bayesian approaches for several variants of RL
Open Problems
- Prior:
– What are common types of domain knowledge in RL?
– How to encode this knowledge in a prior?
– Hierarchical priors for Bayesian RL?
- Belief inference:
– Non-parametric Bayesian techniques?
– Monte Carlo techniques?
- Policy optimization:
– Closed-form value functions for continuous domains?
– Scalable, yet non-myopic approaches?
Bayesian RL Related Surveys
- R. Bellman (1961) Adaptive Control Processes: A Guided Tour, Princeton University Press
- A. Fel’dbaum (1965) Optimal Control Systems, Academic Press, NY
- J.J. Martin (1967) Bayesian Decision Problems and Markov Chains, Wiley & Sons
- D.A. Berry & B. Fristedt (1985) Bandit Problems: Sequential Allocation of Experiments, Chapman & Hall
- P.R. Kumar & P. Varaiya (1986) Stochastic Systems: Estimation, Identification and Adaptive Control, Prentice-Hall
- M.O. Duff (2002) Optimal Learning: Computational Procedures for Bayes-Adaptive Markov Decision Processes, PhD Thesis, University of Massachusetts, Amherst
ICML-07 Papers Related to Bayesian RL
- E. Delage, S. Mannor (2007) Percentile Optimization in Uncertain MDPs with Application to Efficient Exploration, ICML.
- M. Ghavamzadeh, Y. Engel (2007) Bayesian Actor-Critic, ICML.
- A. Krause, C. Guestrin (2007) Nonmyopic Active Learning of Gaussian Processes: an Exploration-Exploitation Approach, ICML.
- S. Pandey, D. Chakrabarti, D. Agarwal (2007) Multi-armed Bandit Problems with Dependent Arms, ICML.
- A. Wilson, A. Fern, S. Ray, P. Tadepalli (2007) Multi-Task Reinforcement Learning: A Hierarchical Bayesian Approach, ICML.