ICML-07 Bayesian RL Tutorial (Pascal Poupart)


1. Outline
• Intro to RL and Bayesian Learning
• History of Bayesian RL
• Model-based Bayesian RL
  – Prior knowledge, policy optimization, discussion, Bayesian approaches for other RL variants
• Model-free Bayesian RL
  – Gaussian process temporal difference, Gaussian process SARSA, Bayesian policy gradient, Bayesian actor-critic algorithms
• Demo: control of an octopus arm

2. Common Belief
• Reinforcement Learning in AI:
  – Formalized in the 1980's by Sutton, Barto and others
  – Traditional RL algorithms are not Bayesian
  – Bayesian RL is a new approach
• Wrong!

3. A Bit of History
• RL is the problem of controlling a Markov chain with unknown probabilities.
• While the AI community started working on this problem in the 1980's and called it Reinforcement Learning, the control of Markov chains with unknown probabilities had already been extensively studied in Operations Research since the 1950's, including Bayesian methods.

4. A Bit of History
• Operations Research: Bayesian Reinforcement Learning already studied under the names of
  – Adaptive control processes [Bellman]
  – Dual control [Fel'dbaum]
  – Optimal learning
• 1950's & 1960's: Bellman, Fel'dbaum, Howard and others develop Bayesian techniques to control Markov chains with uncertain probabilities and rewards

5. Bayesian RL Work
• Operations Research
  – Theoretical foundation
  – Algorithmic solutions for special cases
    • Bandit problems: Gittins indices
  – Intractable algorithms for the general case
• Artificial Intelligence
  – Algorithmic advances to improve scalability

6. Artificial Intelligence (non-exhaustive list)
• Model-based Bayesian RL: Dearden et al. (1999), Strens (2000), Duff (2002, 2003), Mannor et al. (2004, 2007), Madani et al. (2004), Wang et al. (2005), Jaulmes et al. (2005), Poupart et al. (2006), Delage et al. (2007), Wilson et al. (2007).
• Model-free Bayesian RL: Dearden et al. (1998), Engel et al. (2003, 2005), Ghavamzadeh et al. (2006, 2007).

7. Outline
• Intro to RL and Bayesian Learning
• History of Bayesian RL
• Model-based Bayesian RL
  – Prior knowledge, policy optimization, discussion, Bayesian approaches for other RL variants
• Model-free Bayesian RL
  – Gaussian process temporal difference, Gaussian process SARSA, Bayesian policy gradient, Bayesian actor-critic algorithms
• Demo: control of an octopus arm

8. Model-based Bayesian RL
• Markov Decision Process (the Reinforcement Learning setting):
  – X: set of states <x_s, x_r>
    • x_s: physical state component
    • x_r: reward component
  – A: set of actions
  – p(x'|x,a): transition and reward probabilities
• Bayesian Model-based Reinforcement Learning
  – Encode unknown probabilities with random variables θ
  – i.e., θ_xax' = Pr(x'|x,a): random variable in [0,1]
  – i.e., θ_xa = Pr(·|x,a): multinomial distribution

9. Model Learning
• Assume prior b(θ_xa) = Pr(θ_xa)
• Learning: use Bayes theorem to compute the posterior b_xax'(θ_xa) = Pr(θ_xa | x, a, x')
  – b_xax'(θ_xa) = k Pr(θ_xa) Pr(x'|x,a,θ_xa) = k b(θ_xa) θ_xax'
• What is the prior b?
• Could we choose b to be in the same class as b_xax'?
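To make the update concrete, here is a minimal sketch (mine, not from the tutorial) for a single (x,a) pair with two possible successor states: the belief over θ_xa is discretized on a grid, multiplied by the likelihood θ_xax' of each observed transition, and renormalized, exactly as in b_xax'(θ_xa) = k b(θ_xa) θ_xax'.

```python
import numpy as np

# Minimal sketch: Bayesian update of b(theta_xa) for one fixed (x, a) pair
# with two possible successor states, so theta_xa = (p, 1 - p).
# We discretize p on a grid and apply b'(p) = k * b(p) * Pr(x' | x, a, p).

grid = np.linspace(0.0, 1.0, 101)       # candidate values of p = Pr(x'=0 | x, a)
prior = np.ones_like(grid)              # uniform prior b(theta_xa)
prior /= prior.sum()

def update(belief, observed_successor):
    """One Bayes update after observing the transition (x, a) -> observed_successor."""
    likelihood = grid if observed_successor == 0 else (1.0 - grid)
    posterior = belief * likelihood     # b(theta) * theta_xax'
    return posterior / posterior.sum()  # normalization constant k

belief = prior
for x_next in [0, 0, 1, 0]:             # hypothetical observed transitions
    belief = update(belief, x_next)

print("posterior mean of p:", np.dot(grid, belief))  # about 0.67 after 3 of 4 transitions to state 0
```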

10. Outline
• Intro to RL and Bayesian Learning
• History of Bayesian RL
• Model-based Bayesian RL
  – Prior knowledge, policy optimization, discussion, Bayesian approaches for other RL variants
• Model-free Bayesian RL
  – Gaussian process temporal difference, Gaussian process SARSA, Bayesian policy gradient, Bayesian actor-critic algorithms
• Demo: control of an octopus arm

11. Conjugate Prior
• Suppose b is a monomial in θ
  – i.e., b(θ_xa) = k Π_x'' (θ_xax'')^(n_xax'' − 1)
• Then b_xax' is also a monomial in θ
  – b_xax'(θ_xa) = k [Π_x'' (θ_xax'')^(n_xax'' − 1)] θ_xax' = k Π_x'' (θ_xax'')^(n_xax'' − 1 + δ(x',x''))
• Distributions that are closed under Bayesian updates are called conjugate priors

12. Dirichlet Distributions
• Dirichlets are monomials over discrete random variables:
  – Dir(θ_xa; n_xa) = k Π_x'' (θ_xax'')^(n_xax'' − 1)
• Dirichlets are conjugate priors for discrete likelihood distributions
• [Figure: Pr(p) vs. p for Dir(p; 1, 1), Dir(p; 2, 8), and Dir(p; 20, 80), increasingly peaked around p = 0.2]
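A minimal sketch (my own, not the tutorial's code) of why conjugacy matters in practice: with a Dirichlet belief per (x,a) pair, the Bayes update of slide 9 reduces to incrementing the count of the observed successor state, and the predictive distribution is just the normalized counts. The state/action encoding below is illustrative.

```python
from collections import defaultdict

# Minimal sketch: conjugate Dirichlet update for a discrete MDP model.
# counts[(x, a)] holds the Dirichlet parameters n_xax'' over successor states.

n_states = 3
counts = defaultdict(lambda: [1.0] * n_states)   # Dir(1, ..., 1) = uniform prior

def observe(x, a, x_next):
    """Posterior after seeing (x, a) -> x_next: add 1 to the matching count."""
    counts[(x, a)][x_next] += 1.0

def predictive(x, a):
    """Predictive Pr(x' | x, a) under the current belief: normalized counts."""
    n = counts[(x, a)]
    total = sum(n)
    return [c / total for c in n]

observe(0, 1, 2)
observe(0, 1, 2)
print(predictive(0, 1))   # [0.2, 0.2, 0.6]
```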

13. Encoding Prior Knowledge
• No knowledge: uniform distribution
  – E.g., Dir(p; 1, 1)
• I believe p is roughly 0.2, then (n_1, n_2) ≈ (0.2k, 0.8k)
  – Dir(p; 0.2k, 0.8k)
  – k: level of confidence
• [Figure: Dir(p; 1, 1), Dir(p; 2, 8), Dir(p; 20, 80); larger k gives a sharper peak around p = 0.2]
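A small numeric illustration (my own example, with illustrative values of k) of how the confidence parameter behaves: the mean of Dir(p; 0.2k, 0.8k) stays at 0.2 while its variance shrinks as k grows.

```python
# Minimal sketch: mean and variance of Beta(a, b) = Dir(p; a, b) priors
# encoding "p is roughly 0.2" with confidence level k.
for k in [2, 10, 100]:
    a, b = 0.2 * k, 0.8 * k
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    print(f"k={k:4d}  Dir(p; {a:.1f}, {b:.1f})  mean={mean:.2f}  var={var:.4f}")
```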

14. Structural Priors
• Suppose the probability of two transitions is the same
  – Tie identical parameters
  – If Pr(·|x,a) = Pr(·|x',a') then θ_xa = θ_x'a'
  – Fewer parameters and pooled evidence
• Suppose transition dynamics are factored
  – E.g., transition probabilities can be encoded with a dynamic Bayesian network
  – Exponentially fewer parameters
  – E.g., θ_x,pa(X) = Pr(X = x | pa(X))
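One way parameter tying could be realized on top of the count-based representation above (a sketch under assumed names, not the tutorial's code): map each (x,a) pair to a shared parameter group so that tied pairs pool their evidence into a single Dirichlet.

```python
from collections import defaultdict

# Minimal sketch of parameter tying: (x, a) pairs that share dynamics map to
# one parameter group, so their observations pool into a single Dirichlet.
# The tying map and the state/action names below are illustrative assumptions.

n_states = 3
tie = {(0, 'a'): 'g1', (1, 'a'): 'g1',    # these two pairs share theta
       (0, 'b'): 'g2'}
counts = defaultdict(lambda: [1.0] * n_states)

def observe(x, a, x_next):
    counts[tie[(x, a)]][x_next] += 1.0    # evidence pooled per group

observe(0, 'a', 2)
observe(1, 'a', 2)                        # both updates hit group 'g1'
print(counts['g1'])                       # [1.0, 1.0, 3.0]
```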

15. Outline
• Intro to RL and Bayesian Learning
• History of Bayesian RL
• Model-based Bayesian RL
  – Prior knowledge, policy optimization, discussion, Bayesian approaches for other RL variants
• Model-free Bayesian RL
  – Gaussian process temporal difference, Gaussian process SARSA, Bayesian policy gradient, Bayesian actor-critic algorithms
• Demo: control of an octopus arm

16. POMDP Formulation
• Traditional RL:
  – X: set of states
  – A: set of actions
  – p(x'|x,a): transition probabilities (unknown)
• Bayesian RL = POMDP:
  – X × Θ: set of states <x, θ>
    • x: physical state (observable)
    • θ: model (hidden)
  – A: set of actions
  – Pr(x', θ'|x, θ, a): transition probabilities (known)

17. Transition Probabilities
• Traditional RL: Pr(x'|x,a) = ? (unknown)
• Bayesian RL POMDP: Pr(x', θ'|x, θ, a) = Pr(x'|x, θ, a) Pr(θ'|θ), where
  – Pr(x'|x, θ, a) = θ_xax'
  – Pr(θ'|θ) = 1 if θ' = θ, 0 otherwise
• [Diagram: transition x → x' under action a; the model θ' = θ stays fixed]

18. Belief MDP Formulation
• Bayesian RL POMDP:
  – X × Θ: set of states <x, θ>
  – A: set of actions
  – Pr(x', θ'|x, θ, a): transition probabilities (known)
• Bayesian RL Belief MDP:
  – X × B: set of states <x, b>
  – A: set of actions
  – p(x', b'|x, b, a): transition probabilities (known)

19. Transition Probabilities
• POMDP view: Pr(x', θ'|x, θ, a) = Pr(x'|x, θ, a) Pr(θ'|θ), where
  – Pr(x'|x, θ, a) = θ_xax'
  – Pr(θ'|θ) = 1 if θ' = θ, 0 otherwise
• Belief MDP view: Pr(x', b'|x, b, a) = Pr(x'|x, b, a) Pr(b'|x, b, a, x'), where
  – Pr(x'|x, b, a) = ∫_θ b(θ) Pr(x'|x, θ, a) dθ
  – Pr(b'|x, b, a, x') = 1 if b' = b_xax', 0 otherwise
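Under a Dirichlet belief, the integral ∫ b(θ) Pr(x'|x,θ,a) dθ is just the normalized count for (x,a), and b' is the deterministic count increment. A sketch of one belief-MDP transition, with my own function and variable names:

```python
import random

# Minimal sketch of one transition of the Bayesian RL belief MDP, assuming the
# belief over each theta_xa is a Dirichlet stored as a list of counts.
# b maps (x, a) -> counts over successor states (names are illustrative).

def belief_mdp_step(x, b, a, rng=random):
    counts = b[(x, a)]
    total = sum(counts)
    probs = [c / total for c in counts]          # Pr(x'|x,b,a) = E_b[theta_xax']
    x_next = rng.choices(range(len(counts)), weights=probs)[0]
    b_next = {k: list(v) for k, v in b.items()}  # copy, then deterministic update
    b_next[(x, a)][x_next] += 1.0                # b' = b_xax'
    return x_next, b_next

b0 = {(0, 'a'): [1.0, 1.0], (1, 'a'): [1.0, 1.0],
      (0, 'b'): [1.0, 1.0], (1, 'b'): [1.0, 1.0]}
x1, b1 = belief_mdp_step(0, b0, 'a')
print(x1, b1[(0, 'a')])
```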

20. Policy Optimization
• Classic RL:
  – V*(x) = max_a Σ_x' Pr(x'|x,a) [x_r' + γ V*(x')]
  – Hard to tell what needs to be explored
  – Exploration heuristics: ε-greedy, Boltzmann, etc.
• Bayesian RL:
  – V*(x,b) = max_a Σ_x' Pr(x'|x,b,a) [x_r' + γ V*(x', b_xax')]
  – Belief b tells us what parts of the model are not well known and therefore worth exploring
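For intuition, here is a brute-force finite-horizon expansion of the Bayesian Bellman equation above. It is a didactic sketch only (exponential in the horizon), and the tiny two-state MDP, the reward function, and the depth are assumptions of mine; the augmented state is the pair (physical state, Dirichlet counts).

```python
# Minimal sketch: brute-force finite-horizon evaluation of
#   V*(x, b) = max_a sum_x' Pr(x'|x,b,a) [ r(x') + gamma * V*(x', b_xax') ]
# with Dirichlet beliefs stored as count tuples.

GAMMA = 0.9
STATES = [0, 1]
ACTIONS = ['a', 'b']
reward = {0: 0.0, 1: 1.0}                       # assumed reward of entering x'

def value(x, b, depth):
    if depth == 0:
        return 0.0
    best = float('-inf')
    for a in ACTIONS:
        counts = b[(x, a)]
        total = sum(counts)
        q = 0.0
        for x_next in STATES:
            p = counts[x_next] / total          # Pr(x'|x,b,a)
            c = list(counts); c[x_next] += 1.0  # posterior counts b_xax'
            b_next = dict(b); b_next[(x, a)] = tuple(c)
            q += p * (reward[x_next] + GAMMA * value(x_next, b_next, depth - 1))
        best = max(best, q)
    return best

b0 = {(x, a): (1.0, 1.0) for x in STATES for a in ACTIONS}
print(value(0, b0, depth=3))
```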

21. Exploration/Exploitation Tradeoff
• Dilemma:
  – Maximize immediate rewards (exploitation)?
  – Or, maximize information gain (exploration)?
• Wrong question!
• Single objective: maximize expected total rewards
  – V_μ(x_0) = Σ_t γ^t E[x_r,t | μ]  (expectation over trajectories Pr(x_t | μ))
  – Optimal policy μ*: V_μ*(x) ≥ V_μ(x) for all x, μ
• Optimal exploration/exploitation tradeoff

22. Policy Optimization
• Use favorite RL/MDP/POMDP algorithm to solve
  – V*(x,b) = max_a Σ_x' Pr(x'|x,b,a) [x_r' + γ V*(x', b_xax')]
• Some approaches (non-exhaustive list):
  – Myopic value of information (Dearden et al. 1999)
  – Thompson sampling (Strens 2000), sketched below
  – Bayesian sparse sampling (Wang et al. 2005)
  – Policy gradient (Duff 2002)
  – POMDP discretization (Jaulmes et al. 2005)
  – BEETLE (Poupart et al. 2006)
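Of the approaches listed above, Thompson sampling is simple enough to sketch in a few lines (my own minimal rendering, not Strens's code): sample one model from the posterior, solve it, act greedily with the resulting policy while updating the posterior, then resample. Here `true_P` stands for the environment, unknown to the agent and used only to simulate experience.

```python
import numpy as np

# Minimal Thompson-sampling-style sketch for model-based Bayesian RL:
# Dirichlet posteriors over transitions, a known reward per state, and
# value iteration to solve each sampled MDP.

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 3, 2, 0.9
rewards = np.array([0.0, 0.0, 1.0])                   # assumed known rewards
counts = np.ones((n_states, n_actions, n_states))     # Dirichlet(1,...,1) priors

def greedy_policy(P, iters=200):
    """Value iteration on a sampled transition model P[x, a, x']."""
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = P @ (rewards + gamma * V)                 # shape (n_states, n_actions)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def thompson_episode(true_P, x, steps=20):
    """Sample a model from the posterior, act greedily, update the counts."""
    sampled = np.array([[rng.dirichlet(counts[s, a]) for a in range(n_actions)]
                        for s in range(n_states)])
    policy = greedy_policy(sampled)
    for _ in range(steps):
        a = policy[x]
        x_next = rng.choice(n_states, p=true_P[x, a])
        counts[x, a, x_next] += 1.0                   # Dirichlet posterior update
        x = x_next
    return x
```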

23. Myopic Value of Information
• Dearden, Friedman, Andre (1999)
• Myopic value of information:
  – Expected gain from the observation of a transition
• Myopic value of perfect information MVPI(x,a):
  – Upper bound on the myopic value of information
  – Expected gain from learning the true value of a in x
• Action selection: a* = argmax_a Q(x,a) + MVPI(x,a)
  – Q(x,a): exploit; MVPI(x,a): explore
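A hedged sketch of the action-selection rule a* = argmax_a Q(x,a) + MVPI(x,a). Dearden et al. maintain posteriors over Q-values with Bayesian Q-learning; the Gaussian (mean, std) posteriors below are stand-ins of mine, and MVPI is estimated by Monte Carlo as the expected improvement in the greedy choice if the true value of a were revealed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of a* = argmax_a Q(x,a) + MVPI(x,a), with assumed Gaussian posteriors
# over each Q(x,a) purely for illustration.

def mvpi(q_means, q_stds, a, n_samples=10_000):
    """Monte Carlo estimate of the myopic value of perfect information for action a."""
    best = int(np.argmax(q_means))
    second = np.partition(q_means, -2)[-2]            # second-best mean value
    q = rng.normal(q_means[a], q_stds[a], n_samples)  # hypothetical true values of a
    if a == best:
        gain = np.maximum(second - q, 0.0)            # gain if a turns out worse than 2nd best
    else:
        gain = np.maximum(q - q_means[best], 0.0)     # gain if a turns out better than best
    return gain.mean()

def select_action(q_means, q_stds):
    scores = [q_means[a] + mvpi(q_means, q_stds, a) for a in range(len(q_means))]
    return int(np.argmax(scores))                     # exploit + explore in one objective

q_means = np.array([1.0, 0.9, 0.2])
q_stds  = np.array([0.05, 0.4, 0.05])                 # action 1 is uncertain
print(select_action(q_means, q_stds))                 # prints 1: the uncertain action wins
```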
