Model Estimation Within Planning and Learning
Alborz Geramifard (PowerPoint presentation)


SLIDE 1

Model Estimation Within Planning and Learning

Alborz Geramifard
ICML Workshop, June 2011
agf@mit.edu

SLIDE 2

Joint Work

Joshua Joseph, Joshua Redding, Jonathan How

SLIDE 3

Rewards: +1, -1, -0.01
20% noise
Policies: Conservative, Aggressive, Optimal
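The 20% noise in this example means the intended action is occasionally replaced by a random one. A minimal sketch of such a noisy gridworld transition (the grid size and action set here are illustrative placeholders, not from the slides):

```python
import random

# Four compass moves as (row, col) offsets on a small grid.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def noisy_step(state, action, rows=4, cols=5, noise=0.2, rng=random):
    """With probability `noise`, a random action replaces the intended one;
    moves off the grid leave the agent in place."""
    if rng.random() < noise:
        action = rng.choice(list(ACTIONS))
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), rows - 1)
    c = min(max(state[1] + dc, 0), cols - 1)
    return (r, c)

noisy_step((0, 0), "right")  # usually (0, 1); sometimes a random slip
```

Under this dynamics, a conservative policy detours around the -1 states while an aggressive one risks them for the shorter path, which is the trade-off the slide's three policies illustrate.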


SLIDE 7

Big Picture

Planner: fast, safe, sub-optimal solution.
Model: estimator of the true model using a parametric form.
Learner: a reinforcement learning algorithm running online.

SLIDE 8

Question

A framework to integrate the planner, model, and learner.
Goal: explore safely, reduce sample complexity, and reach the optimal solution asymptotically.

SLIDE 9

Existing Gap

Overly restrictive [Heger 1994]
Lack of analytical convergence [Geibel et al. 2005]
No safety guarantees [Abbeel et al. 2005]
Requires the planner's value function [Knox et al. 2010]

SLIDE 10

Contributions

Extended our previous framework to support adaptive modeling.
Empirically verified the advantage of the new approach.
Discussed the limitations of our approach and provided two potential solutions.

SLIDE 11

Approach

intelligent Cooperative Control Architecture (iCCA): the planner initializes the policy and regulates the exploration of the learner.

Exploit & Explore

SLIDE 12

Previous Work [ACC 2011]

Offline: the planner computes a policy πp from a static model.

Online, at each step:
Suggest action? The learner applies an Rmax-type knownness test.
No: execute the planner's action, a ∼ πp.
Yes: the learner proposes a ∼ πl, and the planner asks "Safe action?"
No: fall back to a ∼ πp.
Yes: execute the learner's action.
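The action-selection logic in the flowchart above can be sketched as follows. This is not the authors' code: `knownness`, `pi_l`, `pi_p`, `is_safe`, and the threshold are hypothetical stand-ins for the learner's Rmax-style knownness test, the learner and planner policies, and the planner's safety check.

```python
def choose_action(state, pi_l, pi_p, knownness, is_safe, threshold=0.9):
    """Follow the learner only where its policy is both known and safe."""
    if knownness(state) >= threshold:   # "Suggest action?" (Rmax-type test)
        a = pi_l(state)                 # learner suggests a ~ pi_l
        if is_safe(state, a):           # planner's "Safe action?" check
            return a
    return pi_p(state)                  # otherwise fall back to a ~ pi_p
```

For example, with `knownness` returning 1.0 and `is_safe` always true, the learner's action is executed; lowering knownness below the threshold, or failing the safety check, defers to πp.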

SLIDE 23

New Approach

The same architecture, but the static model is replaced with an adaptive model: online experience from the learner updates the model, and the planner replans against the updated model to refresh πp. The Rmax-type suggestion test and the safety check are unchanged.
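One simple way to realize the adaptive model for this gridworld is to estimate its single noise parameter from observed transitions. The slides do not give the estimator; the counting scheme below is an illustrative assumption.

```python
class NoiseModel:
    """Estimate transition noise as the empirical fraction of steps
    whose observed outcome differs from the deterministic prediction."""

    def __init__(self):
        self.n_steps = 0
        self.n_slips = 0

    def update(self, predicted_next, observed_next):
        """Record one transition: did the intended action's outcome occur?"""
        self.n_steps += 1
        if observed_next != predicted_next:
            self.n_slips += 1

    @property
    def noise(self):
        return self.n_slips / self.n_steps if self.n_steps else 0.0
```

The planner can then replan whenever this estimate changes enough to alter which of its policies is appropriate.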

SLIDE 24

Empirical Results

100 learning trials with the Gridworld; ε-greedy policy.

Sarsa, iCCA: noise = 40%, planner's policy = conservative.
AM-iCCA: initial noise estimate = 40%; if the estimated noise ≤ 25%, the planner's policy = aggressive, else conservative.
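The AM-iCCA switching rule in this setup reduces to a one-line threshold test (the function name is illustrative):

```python
def planner_policy(noise_estimate, threshold=0.25):
    """AM-iCCA rule from the slide: aggressive when estimated noise <= 25%,
    conservative otherwise."""
    return "aggressive" if noise_estimate <= threshold else "conservative"
```

Starting from the 40% initial estimate, AM-iCCA begins conservative and switches to the aggressive planner policy once its noise estimate drops to 25% or below.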

SLIDE 25

Empirical Results

[Figure: Return vs. Steps (0 to 10,000) for the Aggressive Policy, AM-iCCA, Sarsa, iCCA, and the Conservative Policy.]

SLIDE 26

Extensions

What if the parametric form of the model cannot represent the true model? Two potential solutions:
When knownness is high, ignore the safety check.
Estimate the value of planner policies by reflecting back on the past data.
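One concrete way to "reflect back on the past data" is off-policy evaluation with per-decision importance sampling, estimating a planner policy's value from trajectories gathered under a different behavior policy. The slides do not name an estimator; this sketch is an assumption, with illustrative names throughout.

```python
def importance_sampling_return(trajectory, pi_eval, pi_behavior, gamma=1.0):
    """Per-decision importance-sampling estimate of the evaluation
    policy's return from one trajectory of (state, action, reward)
    tuples collected under the behavior policy. `pi_eval(a, s)` and
    `pi_behavior(a, s)` return action probabilities."""
    weight, ret, discount = 1.0, 0.0, 1.0
    for s, a, r in trajectory:
        weight *= pi_eval(a, s) / pi_behavior(a, s)  # cumulative ratio
        ret += discount * weight * r                  # reweighted reward
        discount *= gamma
    return ret
```

When `pi_eval` equals `pi_behavior`, every ratio is 1 and the estimate reduces to the plain discounted return, which is a useful sanity check.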

SLIDE 27

Contributions

Extended our previous framework to support adaptive modeling.
Empirically verified the advantage of the new approach.
Discussed the limitations of our approach and provided two potential solutions.