Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments




  1. Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments
     Yi Sun, Faustino Gomez, Jürgen Schmidhuber
     IDSIA, USI & SUPSI, Switzerland. August 2011.

  2. Motivation
     - An intelligent agent is sent to explore an unknown environment.
     - It learns through sequential interactions.
     - Its time and resources are limited.
     - Question: how should the agent choose its actions so that it learns the environment as effectively as possible?
     - Example: learning the transition model of a Markovian environment from only 100 ⟨s, a, s′⟩ triples.

  3. Preliminary
     A Markov Reward Process (MRP) is defined by the 4-tuple ⟨S, P, r, γ⟩:
     - S = {1, …, S} is the state space.
     - P is an S × S transition matrix with {P}ᵢ,ⱼ = Pr[sₜ₊₁ = j ∣ sₜ = i].
     - r ∈ ℝ^S is the reward function.
     - γ ∈ [0, 1) is the discount factor.
     The value function v ∈ ℝ^S is the solution of the Bellman equation v = r + γPv.
     Let L = I − γP; then v = L⁻¹r.
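     As a concrete sanity check of v = L⁻¹r, here is a minimal sketch assuming numpy; the 3-state transition matrix and rewards are made-up toy values, not from the talk:

```python
import numpy as np

# Minimal sketch: value function of a toy 3-state MRP via v = L^{-1} r,
# with L = I - gamma * P. P and r below are illustrative, not from the slides.
gamma = 0.9
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.2, 0.8]])   # row-stochastic transition matrix
r = np.array([0.0, 1.0, -0.5])   # reward per state

L = np.eye(3) - gamma * P
v = np.linalg.solve(L, r)                  # solve L v = r
assert np.allclose(v, r + gamma * P @ v)   # v satisfies the Bellman equation
print(v)
```

     Note that np.linalg.solve is used rather than explicitly inverting L, which is both cheaper and numerically safer.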

  4. Preliminary
     Linear function approximation (LFA): v̂ = Φθ, where
     - Φ = [φ₁, …, φ_N] are N (N ≪ S) basis functions, and
     - θ = [θ₁, …, θ_N]ᵀ are the weights.
     The Bellman error ε ∈ ℝ^S is defined as ε = r + γPv̂ − v̂ = r − LΦθ.
     - ε ≡ 0 ⟺ v ≡ Φθ.
     - ε is the expectation of the TD error.
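     A minimal sketch of the Bellman error under LFA, reusing the toy MRP above; the two basis functions and the least-squares fit for θ are illustrative assumptions, not the talk's method:

```python
import numpy as np

gamma = 0.9
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.2, 0.8]])
r = np.array([0.0, 1.0, -0.5])
L = np.eye(3) - gamma * P

# N = 2 basis functions over S = 3 states (N << S in realistic settings)
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])
# One illustrative way to pick theta: least squares on r ~ L Phi theta
theta, *_ = np.linalg.lstsq(L @ Phi, r, rcond=None)

v_hat = Phi @ theta
eps = r + gamma * (P @ v_hat) - v_hat         # Bellman error of v_hat
assert np.allclose(eps, r - L @ Phi @ theta)  # equivalently, eps = r - L Phi theta
print(eps)
```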

  5. Preliminary
     The LFA v̂ = Φθ depends on both θ and Φ.
     To find θ: TD (Sutton, 1988), LSTD (Bradtke et al., 1996), etc. (a minimal LSTD sketch follows this list).
     To construct Φ:
     - Bellman error basis functions (BEBFs; Wu and Givan, 2005; Keller et al., 2006; Parr et al., 2007; Mahadevan and Liu, 2010)
     - Proto-value basis functions (Mahadevan et al., 2006)
     - Reduced-rank predictive state representations (Boots and Gordon, 2010)
     - L1-regularized feature selection (Kolter and Ng, 2009)
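     Since the slides only name LSTD, here is a minimal LSTD(0) sketch for estimating θ from sampled transitions; the function name, the ridge term, and the (s, reward, s′) transition format are assumptions for illustration:

```python
import numpy as np

def lstd0(transitions, phi, gamma=0.9, reg=1e-6):
    """LSTD(0) sketch (after Bradtke et al., 1996): solve A theta = b with
    A = sum_t phi(s_t)(phi(s_t) - gamma * phi(s_{t+1}))^T,
    b = sum_t phi(s_t) * r_t."""
    n = len(phi(transitions[0][0]))
    A = reg * np.eye(n)            # small ridge term keeps A invertible
    b = np.zeros(n)
    for s, rew, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)
        b += rew * f
    return np.linalg.solve(A, b)
```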

  6. Bellman Error Basis Functions
     Intuition: "Bellman error, loosely speaking, point[s] towards the optimal value function" (Parr et al., 2007).
     Construction:
     - φ(1) = r.
     - At stage k > 1, fit θ with the current basis and append the resulting Bellman error, φ(k) = r − LΦθ, as the next basis function.
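     A minimal sketch of this construction in the model-based setting above (P and r known); the least-squares fit for θ and the stopping tolerance are illustrative choices:

```python
import numpy as np

def bebf_basis(P, r, gamma, k_max, tol=1e-10):
    """Grow a basis by repeatedly appending the Bellman error:
    phi(1) = r; at stage k > 1, phi(k) = r - L Phi theta."""
    L = np.eye(len(r)) - gamma * P
    Phi = r.reshape(-1, 1)                        # phi(1) = r
    for _ in range(1, k_max):
        theta, *_ = np.linalg.lstsq(L @ Phi, r, rcond=None)
        eps = r - L @ (Phi @ theta)               # Bellman error of current fit
        if np.linalg.norm(eps) < tol:             # value function represented exactly
            break
        Phi = np.column_stack([Phi, eps])         # phi(k) = eps
    return Phi
```

     Each appended column is exactly the direction in which the current approximation fails to satisfy the Bellman equation, which is the sense in which the Bellman error "points towards" the value function.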
