Metareasoning for Deliberation Time Distribution in the Prost Planner

Metareasoning for Deliberation Time Distribution in the Prost Planner
Ferdinand Badenberg
University of Basel
Bachelor Thesis Presentation, 2017

Outline
1. Motivation
   - Why Metareasoning?
   - Metareasoning Problem
2. Approaches for Metareasoners
   - Hand-Made Functions
   - Metareasoner of Lin et al.
   - Improvements for the Metareasoner
3. Results
   - Results for the Hand-Made Functions
   - Results for the Formal Procedure


Cycle
[Figure: the planner thinks, then acts on the environment; the environment returns a reward and the next state.]

Motivation: Why Metareasoning?
- Optimise the policy within the given time
- Allocate thinking time where it is needed:
  - act if the decision is easy (there is a clear best action)
  - think if the decision is difficult (several actions are very close)

Metareasoning Problem
Setting:
- steps of a finite-horizon MDP
- several rounds
- limited time
- an anytime search algorithm
Metareasoner:
- decides whether to think or to act
- bases this decision on the specific values of these factors
- makes the decision after each thinking cycle of the algorithm
Goal: think only when necessary
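A minimal sketch of the think-or-act loop for a single step, assuming the anytime search can be driven one thinking cycle at a time. The hooks `think_once`, `should_act` and `best_action` and the `deadline` parameter are hypothetical names for illustration, not the Prost planner's actual interface.

```python
import time
from typing import Callable


def run_step(
    think_once: Callable[[], None],
    should_act: Callable[[float], bool],
    best_action: Callable[[], str],
    deadline: float,
) -> str:
    """Think in cycles until the metareasoner says act or the time runs out.

    Hypothetical hooks (not Prost's API):
    think_once  -- runs one thinking cycle of an anytime search algorithm
    should_act  -- the metareasoner; receives the seconds spent thinking so far
    best_action -- returns the current best action of the search
    deadline    -- time.monotonic() value by which this step must act
    """
    start = time.monotonic()
    while time.monotonic() < deadline:
        think_once()
        # The think-or-act decision is made after every thinking cycle.
        if should_act(time.monotonic() - start):
            break
    # An anytime algorithm can always return its current best action.
    return best_action()
```

A hand-made function would only set `deadline` (and let `should_act` always return `False`), whereas a formal metareasoner supplies a `should_act` that inspects the state of the search.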

Hand-Made Functions
Idea:
- allocate a time budget to each step
- think until the step's budget is used up
- the state of the search algorithm is not considered
Functions tested:
1. Uniform (standard)
2. First
3. Linear
4. Hyperbolic
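The slides name the functions but do not define them. The sketch below assumes that Uniform splits the remaining time evenly over the remaining steps and expresses every function as a weight over the remaining steps. `linear_front_loaded` is a hypothetical shape chosen for illustration, not the thesis's Linear function, and First and Hyperbolic are not modelled.

```python
from typing import Callable

# A weight function maps "steps left (including the current one)" to a weight.
Weight = Callable[[int], float]


def uniform(steps_left: int) -> float:
    """Every remaining step gets the same weight."""
    return 1.0


def linear_front_loaded(steps_left: int) -> float:
    """HYPOTHETICAL shape for illustration: the weight grows with the number
    of steps left, so earlier steps receive more time than later ones."""
    return float(steps_left)


def budget_for_step(remaining_time: float, steps_left: int, weight: Weight) -> float:
    """Seconds to think in the current step.

    The current step receives a share of the remaining time proportional to its
    weight among all remaining steps. Recomputing this at every step means that
    time left unused by earlier steps is redistributed automatically.
    """
    if steps_left < 1:
        raise ValueError("steps_left must be at least 1")
    total = sum(weight(k) for k in range(1, steps_left + 1))
    return remaining_time * weight(steps_left) / total
```

For example, with 40 seconds and 4 steps left, `uniform` gives the current step 10.0 seconds, while `linear_front_loaded` gives it 16.0.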

Example Time Distribution
[Figure: example time distribution of the hand-made functions]

Formal Metareasoner of Lin et al.
- Idea: think if a change of policy is likely, act if the policy will stay the same
- Only considers the expected reward estimates (Q-values) of the search algorithm
- Act if $Q_{act} \geq Q_{think}$
- How are these two values calculated?

Formal Metareasoner: $Q_{think}$ and $Q_{act}$
$Q_{think}$: the expected reward of the policy after another thinking cycle
- Simplification: only the best action is relevant
- Estimate the probability of each action $a$ being the best after the next thinking cycle
- Estimate the expected reward given that action $a$ is chosen
- Needed: the next Q-value of each action
$Q_{act}$:
- Intuitive idea: the Q-value of the current best action
- But: the average of the current best action's current Q-value and its next Q-value

Formal Metareasoner: Estimation of Next Q-values
- Idea: base the next change in Q-values on the previous change in Q-values
- Assumption: the next $\Delta Q$-value is no larger (in absolute value) than the previous one
- Draw a random $\rho$ between 0 and 1
- $\Delta Q(a) = \widehat{\Delta Q}(a) \cdot \rho$ for all actions $a$, where $\widehat{\Delta Q}(a)$ is the previous change in the Q-value of $a$
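The estimation step can be sketched as follows, assuming Q-values are stored per action in a dictionary and that one $\rho$ is drawn per estimate and shared by all actions, as the bullet "for all actions $a$" suggests. The function and parameter names are hypothetical.

```python
import random


def sample_next_q(
    q_prev: dict[str, float], q_curr: dict[str, float], rng: random.Random
) -> dict[str, float]:
    """Estimate the Q-values after the next thinking cycle.

    The previous change of each action, q_curr[a] - q_prev[a], is scaled by a
    single random factor rho in [0, 1) that is shared by all actions, so no
    estimated change is larger in absolute value than the previous one.
    """
    rho = rng.random()
    return {a: q_curr[a] + rho * (q_curr[a] - q_prev[a]) for a in q_curr}
```

Because $\rho$ is shared, each action's next Q-value moves along its own line as $\rho$ varies over the unit interval.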

Line Segment Example: UCT, $Q_{think} > Q_{act}$
[Figure: scaled Q-values of actions a1, a2 and a3 (y-axis, 0 to 1) plotted over the unit interval (x-axis, 0 to 1); points b, e1 and e2 are marked.]

Line Segment Example: UCT, $Q_{think} = Q_{act}$
[Figure: scaled Q-values of actions a1, a2 and a3 (y-axis, 0 to 1) plotted over the unit interval (x-axis, 0 to 1); points e1 and b are marked at the same position.]
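One way to read the line segment examples, offered as a sketch rather than as Lin et al.'s exact formulation: with a shared $\rho$ drawn uniformly from $[0, 1]$, each action's next Q-value $Q(a) + \rho \cdot \widehat{\Delta Q}(a)$ is a line segment over the unit interval. $Q_{think}$ is then the mean of the upper envelope of these segments, and $Q_{act}$ is the average of the current best action's current Q-value and its value at $\rho = 1$ (which equals its mean over $\rho$). Under this reading $Q_{think} \geq Q_{act}$ always, with equality exactly when the current best action stays best over the whole interval, matching the two UCT examples. All names below are hypothetical.

```python
from itertools import combinations


def _breakpoints(q: dict[str, float], dq: dict[str, float]) -> list[float]:
    """Sorted rho values in [0, 1] at which the best action may change."""
    points = {0.0, 1.0}
    for a, b in combinations(q, 2):
        slope_diff = dq[a] - dq[b]
        if slope_diff != 0.0:
            # q[a] + rho*dq[a] == q[b] + rho*dq[b]  <=>  rho = (q[b]-q[a]) / (dq[a]-dq[b])
            rho = (q[b] - q[a]) / slope_diff
            if 0.0 < rho < 1.0:
                points.add(rho)
    return sorted(points)


def best_probabilities(q: dict[str, float], dq: dict[str, float]) -> dict[str, float]:
    """P(action a is best after the next cycle), with rho uniform on [0, 1]."""
    grid = _breakpoints(q, dq)
    probs = {a: 0.0 for a in q}
    for r0, r1 in zip(grid, grid[1:]):
        mid = (r0 + r1) / 2.0
        winner = max(q, key=lambda a: q[a] + mid * dq[a])
        probs[winner] += r1 - r0
    return probs


def q_think_and_act(q: dict[str, float], dq: dict[str, float]) -> tuple[float, float]:
    grid = _breakpoints(q, dq)

    def upper(rho: float) -> float:
        # Value of the best action if the next change is rho times the previous one.
        return max(q[a] + rho * dq[a] for a in q)

    # The upper envelope is linear between consecutive breakpoints, so the
    # trapezoid rule gives the exact mean of the best next Q-value over rho.
    q_think = sum((r1 - r0) * (upper(r0) + upper(r1)) / 2.0
                  for r0, r1 in zip(grid, grid[1:]))
    best = max(q, key=lambda a: q[a])  # current best action a*
    # Average of the current Q-value of a* and its next Q-value at rho = 1.
    q_act = (q[best] + (q[best] + dq[best])) / 2.0
    return q_think, q_act


def act_now(q: dict[str, float], dq: dict[str, float], tol: float = 1e-9) -> bool:
    """Act if Q_act >= Q_think; tol only absorbs floating-point rounding."""
    q_think, q_act = q_think_and_act(q, dq)
    return q_act >= q_think - tol
```

This sketch assumes the reward setting that UCT uses in Prost. In a cost setting such as BRTDP, where lower values are better, the envelope would be a minimum and the inequality would flip.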

Improvement: Minimum Thinking Time
- Problem: the assumption (the next change is no larger than the previous one) is often not true early on
- Improvement: always think for at least $T_{min}$ seconds

Improvement: $C_{think}^{+}$
- Problem: the time left is not considered
- Improvement: subtract a cost of thinking, $C_{think}$, from $Q_{think}$

Improvement: $C_{think}^{-}$
- Problem: stopping with time left over is useless
- Improvement: allow $C_{think}$ to be negative
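The three improvements can be folded into the decision rule as below. This is a sketch under stated assumptions: the slides do not say how $C_{think}$ is derived from the time left, so `c_think` is simply passed in as a number, and the function and parameter names are hypothetical.

```python
def act_now_improved(
    q_think: float,
    q_act: float,
    seconds_thought: float,
    t_min: float,
    c_think: float,
) -> bool:
    """Think-or-act rule with the minimum thinking time and a cost of thinking.

    c_think is taken as given; how it is computed from the time left is an
    open assumption and is left to the caller.
    """
    # Minimum thinking time: never act before t_min seconds of thinking.
    if seconds_thought < t_min:
        return False
    # C_think^+: thinking costs c_think, so it is subtracted from Q_think.
    # C_think^-: the same rule with c_think < 0, which makes thinking more attractive.
    return q_act >= q_think - c_think
```

With a negative `c_think`, the metareasoner keeps thinking even when $Q_{act} = Q_{think}$, which is the intended effect when much time is still left.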

Results: Hand-Made Functions

Problem       Uniform  Hyperbolic  First  Linear
Wildfire           74          71     80      81
Triangle           72          65     72      75
Academic           37          37     34      45
Elevators          93          93     91      94
Tamarisk           93          92     91      94
Sysadmin           94          94     90      91
Recon              97          97     96      99
Game               97          93     94      93
Traffic            96          96     97      97
Crossing           87          89     91      99
Skill              91          91     88      93
Navigation         65          58     83      82
Total              83          82     84      86

Results: Formal Procedure

Problem       Uniform  Lin et al.  Minimum  Cthink+  Cthink-
Wildfire           60          90       86       95       68
Triangle           78          67       62       59       68
Academic           39          32       36       35       38
Elevators          98          71       83       83       97
Tamarisk           68          86       90       92       96
Sysadmin          100          36       67       74       82
Recon              56          75       75       97       98
Game               97          64       82       86       96
Traffic            85          90       87       98       99
Crossing           88          58       78       83       89
Skill              25          71       69       86      100
Navigation         82          26       25       28       83
Total              86          56       70       72       83

Result Summary
- The hand-made functions performed very well
- The default metareasoner severely underestimates the value of thinking
- The improvements proved to be very useful

Outlook
- More general hand-made functions
- Improve the formal procedure:
  - consider all previous $\Delta Q$-values
  - replace the random $\rho$
- A more sophisticated cost of thinking: a combination of the two approaches

Questions?

Appendix: BRTDP vs UCT
BRTDP:
- used in the original paper
- cost setting
- uses an upper bound of the actual Q-value
- monotonically decreasing
UCT:
- used by the Prost planner
- reward setting
- no such guarantees

Appendix: BRTDP vs UCT: Visualisation
[Figure: visualisation of BRTDP vs UCT]

Appendix: Line Segment Example: BRTDP, $Q_{think} < Q_{act}$
[Figure: scaled Q-values of actions a1, a2 and a3 (y-axis, 0 to 1) plotted over the unit interval (x-axis, 0 to 1); points b, e2 and e3 are marked.]

Appendix: Wildfire Time per Step
[Figure: time spent per step in the Wildfire domain]
