A Bayesian Approach to Finding Compact Representations for Reinforcement Learning


  1. A Bayesian Approach to Finding Compact Representations for Reinforcement Learning. Special thanks to Joelle Pineau for presenting our paper (July 2012).

  2. Authors: Alborz Geramifard, Stefanie Tellex, David Wingate, Nicholas Roy, Jonathan How.

  3. Vision: solving large sequential decision-making problems formulated as MDPs.

  4. Reinforcement Learning: a policy maps states to actions, π(s): S → A. At each step t the agent observes state s_t, takes an action a_t, and receives reward r_t. The action-value function is the expected discounted return: Q^π(s, a) = E_π[ Σ_{t=1}^∞ γ^{t−1} r_t | s_0 = s, a_0 = a ].
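To make the definition concrete, here is a minimal sketch (not from the paper) of estimating Q^π(s, a) by Monte Carlo rollouts; the simulator interface (reset_to, step) and the policy callable are hypothetical.

```python
def mc_q_estimate(env, policy, s, a, gamma=0.95, n_rollouts=100, horizon=200):
    """Monte Carlo estimate of Q^pi(s, a): average discounted return over
    rollouts that start in state s, take action a, then follow policy pi."""
    total = 0.0
    for _ in range(n_rollouts):
        env.reset_to(s)                          # hypothetical: set simulator state to s
        ret, discount, action = 0.0, 1.0, a
        for _ in range(horizon):
            s_next, r, done = env.step(action)   # hypothetical simulator API
            ret += discount * r
            discount *= gamma
            if done:
                break
            action = policy(s_next)
        total += ret
    return total / n_rollouts
```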

  5. Linear Function Approximation: the action-value function is approximated as a weighted sum of features, Q^π(s, a) ≈ φ(s, a)^⊤ θ, where φ = (φ_1, ..., φ_n) is the feature vector of a state-action pair and θ = (θ_1, ..., θ_n) are the learned weights.
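A minimal sketch of linear value-function approximation with binary features; the toy primitive features below are illustrative placeholders, not the representations used in the paper.

```python
import numpy as np

# Illustrative primitive binary features for a toy grid world (hypothetical).
primitive_features = [
    lambda s, a: s[0] == 0,        # agent is in the top row
    lambda s, a: s[1] == 0,        # agent is in the left column
    lambda s, a: a == "right",     # the chosen action is "right"
]

def phi(s, a):
    """Binary feature vector for a state-action pair."""
    return np.array([1.0 if f(s, a) else 0.0 for f in primitive_features])

def q_value(theta, s, a):
    """Linear approximation: Q(s, a) = phi(s, a)^T theta."""
    return float(phi(s, a) @ theta)

theta = np.zeros(len(primitive_features))   # weights, learned by e.g. LSPI
print(q_value(theta, (0, 3), "right"))      # 0.0 before any learning
```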

  6. Challenge: a good representation (φ) yields a good value function (Q), which yields a good policy (π). Our focus is the representation φ.

  7. Approach: from observed data samples D and a representation φ, compute a value function Q and a policy π; a binary variable G ∈ {0, 1} indicates whether the resulting policy is good.

  8. Approach (cont.): ideally we would pick φ* = argmax_φ P(φ | G, D), the representation most likely to yield a good policy given the data (writing G instead of G = 1 for brevity).

  9. Approach (cont.): the problem is that the search space is big. φ ranges over extended features, i.e., logical combinations (∧, ∨) of primitive features, such as f_8 = f_4 ∧ f_6 built from primitive features f_1, ..., f_6.

  10. Approach (cont.): the insight is to apply Bayes' rule, P(φ | G, D) ∝ P(G | φ, D) P(φ).
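A minimal sketch of how extended features can be built as logical combinations of primitive binary features; the specific primitives and indices are illustrative, not taken from the paper.

```python
# Two illustrative primitive binary features (hypothetical).
f4 = lambda s, a: s[0] == 0          # e.g. "agent is in the top row"
f6 = lambda s, a: a == "right"       # e.g. "the action is right"

def conjunction(f_i, f_j):
    """Extended feature that fires only when both parents fire."""
    return lambda s, a: f_i(s, a) and f_j(s, a)

def disjunction(f_i, f_j):
    """Extended feature that fires when either parent fires."""
    return lambda s, a: f_i(s, a) or f_j(s, a)

f8 = conjunction(f4, f6)             # f_8 = f_4 AND f_6, as in the slide
print(f8((0, 3), "right"))           # True: both parent features fire
```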

  11. Approach (cont.): P(φ | G, D) ∝ P(G | φ, D) P(φ), i.e., likelihood times prior.

  12. Approach (cont.), likelihood: find the best policy π given φ and D (we used LSPI [Lagoudakis et al., 2003]); then P(G | φ, D) ∝ e^{η V^π(s_0)}.
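The paper uses LSPI for this step; the following is only a minimal sketch of the LSTD-Q evaluation at the core of LSPI, with an illustrative transition format and regularization term (assumptions, not the paper's exact setup).

```python
import numpy as np

def lstdq(samples, phi, policy, gamma=0.95, reg=1e-3):
    """One LSTD-Q step: solve A theta = b for the weights of Q^pi.
    samples: list of (s, a, r, s_next) transitions (illustrative format);
    policy: maps a state to the action the current policy would take."""
    s0, a0, _, _ = samples[0]
    k = len(phi(s0, a0))
    A = reg * np.eye(k)                          # small ridge term for invertibility
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        x = phi(s, a)
        x_next = phi(s_next, policy(s_next))
        A += np.outer(x, x - gamma * x_next)
        b += r * x
    return np.linalg.solve(A, b)
```

In full LSPI, this evaluation step alternates with greedy policy improvement until the policy stops changing.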

  13. Approach (cont.): V^π(s_0) is estimated by simulating trajectories; a well-performing policy is more likely to be a good policy.

  14. Approach (cont.), prior [Goodman et al., 2008]: representations with fewer features are more likely, and representations with simpler features are more likely.
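Combining the two terms, here is a minimal sketch of the unnormalized log-posterior over representations; η, the simplicity penalty lam, and the helper callables are illustrative assumptions rather than the paper's exact choices.

```python
def log_posterior(phi, data, lspi_fn, value_fn, eta=1.0, lam=0.1):
    """Unnormalized log P(phi | G, D) = log P(G | phi, D) + log P(phi) + const.

    lspi_fn(phi, data) -> policy      hypothetical: policy iteration (e.g. LSPI)
    value_fn(policy)   -> V^pi(s_0)   hypothetical: estimated from simulated trajectories
    """
    policy = lspi_fn(phi, data)
    v_s0 = value_fn(policy)
    log_likelihood = eta * v_s0      # P(G | phi, D) proportional to exp(eta * V^pi(s_0))
    # f.size is an assumed attribute giving the size of each feature expression,
    # so representations with fewer and simpler features score higher.
    log_prior = -lam * sum(f.size for f in phi)
    return log_likelihood + log_prior
```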

  15. Approach (cont.), posterior inference: use Metropolis-Hastings (MH) to sample from the posterior P(φ | G, D); combining MH with LSPI gives MHPI.

  16. Approach (cont.), Markov chain Monte Carlo: propose a new representation φ' from the current φ, then accept or reject it probabilistically based on the posterior.

  17. Approach (cont.), proposal function: φ' is generated from the current φ by adding, mutating, or removing extended features, i.e., logical combinations (∧, ∨) of the primitive features. (Figure 2: representation of primitive and extended features.)
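A minimal sketch of the resulting Metropolis-Hastings loop over representations (MHPI); the propose callable and a scoring callable log_posterior(phi, data) (for example, the earlier sketch with its helper arguments bound) are hypothetical, and a symmetric proposal is assumed so the Hastings correction cancels.

```python
import math
import random

def mhpi(phi0, data, propose, log_posterior, n_iters=1000):
    """Metropolis-Hastings over feature representations (MH + LSPI = MHPI)."""
    phi, score = phi0, log_posterior(phi0, data)
    samples = []
    for _ in range(n_iters):
        phi_new = propose(phi)                   # add / mutate / remove an extended feature
        score_new = log_posterior(phi_new, data)
        # Accept with probability min(1, P(phi_new | G, D) / P(phi | G, D)).
        if random.random() < math.exp(min(0.0, score_new - score)):
            phi, score = phi_new, score_new
        samples.append(phi)
    return samples
```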

  18. Maze: 200 initial samples; initial features are row and column indicators; actions (→, ←, ↓, ↑) are noiseless. (Figure 3: Maze domain empirical results; (a) domain, (b) posterior distribution over the number of extended features, (c) sampled performance in steps to the goal per MHPI iteration, (d) resulting policy.)

  19. BlocksWorld: 1000 initial samples; initial features are on(A, B) predicates; 20% chance of dropping the block (noise). (Figure 4: BlocksWorld; (a) domain, (b) posterior distribution over the number of extended features, (c) sampled performance in steps to make the tower per MHPI iteration, (d) performance distribution.)


  22. Inverted Pendulum: 1000 initial samples; initial features discretize θ and θ̇ separately into 21 buckets each; Gaussian noise was added to the torque values. (Figure 5: inverted pendulum; (a) domain, (b) posterior distribution over the number of extended features, (c) performance in balancing steps per MHPI iteration, (d) performance distribution.)

  23. Inverted Pendulum (cont.): many proposed representations were rejected initially.

  24. Inverted Pendulum (cont.): while some extended features hurt performance, the key feature (−π/21 ≤ θ < 0) ∧ (0.4 ≤ θ̇ < 0.6) allowed the agent to complete the task successfully.

  25. Contributions: introduced a Bayesian approach for finding concise yet expressive representations for solving MDPs; introduced MHPI, a new RL technique that expands the representation using limited samples; empirically demonstrated the effectiveness of the approach in three domains. Future work: reuse the data for estimating V^π(s_0) for policy iteration; relax the need for a simulator to generate trajectories, e.g., via importance sampling [Sutton and Barto, 1998] or model-free Monte Carlo [Fonteneau et al., 2010].
