Lecture slides for Automated Planning: Theory and Practice
Chapter 16: Planning Based on Markov Decision Processes
Dana S. Nau, University of Maryland
Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License: http://creativecommons.org/licenses/by-nc-sa/2.0/
Motivation
- Until now, we’ve assumed that each action has only one possible outcome
◆ But often that’s unrealistic
- In many situations, actions may have more than one possible outcome
◆ Action failures
» e.g., gripper drops its load
◆ Exogenous events
» e.g., road closed
- Would like to be able to plan in such situations
- One approach: Markov Decision Processes
[Figure: grasp(c) applied to blocks a, b, c: intended outcome (the gripper holds c) vs. unintended outcome (the gripper drops c)]
Stochastic Systems
- Stochastic system: a triple Σ = (S, A, P)
◆ S = finite set of states
◆ A = finite set of actions
◆ Pa(s′ | s) = probability of going to s′ if we execute a in s
◆ ∑s′∈S Pa(s′ | s) = 1
- Several different possible action representations
◆ e.g., Bayes networks, probabilistic operators
- The book does not commit to any particular representation
◆ It only deals with the underlying semantics
◆ Explicit enumeration of each Pa(s′ | s)
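To make the definition concrete, here is a minimal Python sketch of a stochastic system Σ = (S, A, P) for the robot example used in the following slides. The transition probabilities are reconstructed from the history calculations later in the chapter (only the actions used by the example policies are listed), so treat the specific numbers as illustrative assumptions rather than the book's official model.

```python
# A stochastic system Sigma = (S, A, P), represented with plain dictionaries.
# P[a][s] is a dict mapping each successor state s2 to Pa(s2 | s).

S = {"s1", "s2", "s3", "s4", "s5"}
A = {"move(r1,l1,l2)", "move(r1,l2,l3)", "move(r1,l3,l4)",
     "move(r1,l1,l4)", "move(r1,l5,l4)", "wait"}

# Transition probabilities (reconstructed from the slides' examples):
P = {
    "move(r1,l1,l2)": {"s1": {"s2": 1.0}},
    "move(r1,l2,l3)": {"s2": {"s3": 0.8, "s5": 0.2}},   # may fail into s5
    "move(r1,l3,l4)": {"s3": {"s4": 1.0}},
    "move(r1,l1,l4)": {"s1": {"s4": 0.5, "s1": 0.5}},    # may leave the robot in s1
    "move(r1,l5,l4)": {"s5": {"s4": 1.0}},
    "wait":           {s: {s: 1.0} for s in S},          # wait never changes the state
}

# Sanity check: for every action a and state s where a is applicable,
# the outgoing probabilities must sum to 1.
for a, by_state in P.items():
    for s, dist in by_state.items():
        assert abs(sum(dist.values()) - 1.0) < 1e-9, (a, s)
```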
Example
- Robot r1 starts at location l1
◆ State s1 in the diagram
- Objective is to get r1 to location l4
◆ State s4 in the diagram
[Figure: state-transition diagram with states s1–s5 (Start = s1, Goal = s4) and move and wait actions]
Example
- Robot r1 starts at location l1
◆ State s1 in the diagram
- Objective is to get r1 to location l4
◆ State s4 in the diagram
- No classical plan (sequence of actions) can be a solution, because we can’t guarantee we’ll be in a state where the next action is applicable
◆ e.g., π = 〈move(r1,l1,l2), move(r1,l2,l3), move(r1,l3,l4)〉
[Figure: same state-transition diagram]
Policies
- Policy: a function that maps states into actions
◆ Write it as a set of state-action pairs
- π1 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s4, wait), (s5, wait)}
[Figure: same state-transition diagram]
Initial States
- For every state s, there will be a probability P(s) that the system starts in s
- The book assumes there’s a unique state s0 such that the system always starts in s0
- In the example, s0 = s1
◆ P(s1) = 1
◆ P(s) = 0 for all s ≠ s1
[Figure: same state-transition diagram]
Histories
- History: a sequence of system states
◆ h = 〈s0, s1, s2, s3, s4, … 〉
- Examples from the diagram:
◆ h0 = 〈s1, s3, s1, s3, s1, … 〉
◆ h1 = 〈s1, s2, s3, s4, s4, … 〉
◆ h2 = 〈s1, s2, s5, s5, s5, … 〉
◆ h3 = 〈s1, s2, s5, s4, s4, … 〉
◆ h4 = 〈s1, s4, s4, s4, s4, … 〉
◆ h5 = 〈s1, s1, s4, s4, s4, … 〉
◆ h6 = 〈s1, s1, s1, s4, s4, … 〉
◆ h7 = 〈s1, s1, s1, s1, s1, … 〉
- Each policy induces a probability distribution over histories
◆ If h = 〈s0, s1, … 〉 then P(h | π) = P(s0) ∏i≥0 Pπ(si)(si+1 | si)
» The P(s0) factor is omitted in the book, because the book assumes a unique starting state
[Figure: same state-transition diagram]
Example
- π1 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s4, wait), (s5, wait)}
- h1 = 〈s1, s2, s3, s4, s4, … 〉;  P(h1 | π1) = 1 × 1 × 0.8 × 1 × … = 0.8
- h2 = 〈s1, s2, s5, s5, … 〉;  P(h2 | π1) = 1 × 1 × 0.2 × 1 × … = 0.2
- P(h | π1) = 0 for all other h, so π1 reaches the goal with probability 0.8
[Figure: same diagram; the goal state s4 is highlighted]
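A small sketch of how P(h | π) can be computed for finite prefixes of these histories. The transition model below is an assumption reconstructed from this example (the original diagram is not reproduced here); once a history settles into an absorbing state, the remaining factors are 1, so a finite prefix gives the exact probability.

```python
# P(h | pi) = P(s0) * prod_i P_{pi(si)}(s_{i+1} | s_i), evaluated over a finite prefix of h.

# Transition model reconstructed from the example (assumed):
P = {
    ("s1", "move(r1,l1,l2)"): {"s2": 1.0},
    ("s2", "move(r1,l2,l3)"): {"s3": 0.8, "s5": 0.2},
    ("s3", "move(r1,l3,l4)"): {"s4": 1.0},
    ("s4", "wait"): {"s4": 1.0},
    ("s5", "wait"): {"s5": 1.0},
}

pi1 = {"s1": "move(r1,l1,l2)", "s2": "move(r1,l2,l3)",
       "s3": "move(r1,l3,l4)", "s4": "wait", "s5": "wait"}

def history_probability(history, policy, P, p_start=1.0):
    """Probability of a finite prefix of a history under a policy."""
    prob = p_start
    for s, s_next in zip(history, history[1:]):
        prob *= P[(s, policy[s])].get(s_next, 0.0)
    return prob

h1 = ["s1", "s2", "s3", "s4", "s4"]
h2 = ["s1", "s2", "s5", "s5"]
print(history_probability(h1, pi1, P))   # 0.8
print(history_probability(h2, pi1, P))   # 0.2
```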
Example
- π2 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s4, wait), (s5, move(r1,l5,l4))}
- h1 = 〈s1, s2, s3, s4, s4, … 〉;  P(h1 | π2) = 1 × 0.8 × 1 × 1 × … = 0.8
- h3 = 〈s1, s2, s5, s4, s4, … 〉;  P(h3 | π2) = 1 × 0.2 × 1 × 1 × … = 0.2
- P(h | π2) = 0 for all other h, so π2 reaches the goal with probability 1
[Figure: same diagram; the goal state s4 is highlighted]
Example
- π3 = {(s1, move(r1,l1,l4)), (s2, move(r1,l2,l1)), (s3, move(r1,l3,l4)), (s4, wait), (s5, move(r1,l5,l4))}
- h4 = 〈s1, s4, s4, s4, … 〉;  P(h4 | π3) = 0.5 × 1 × 1 × 1 × … = 0.5
- h5 = 〈s1, s1, s4, s4, s4, … 〉;  P(h5 | π3) = 0.5 × 0.5 × 1 × 1 × … = 0.25
- h6 = 〈s1, s1, s1, s4, s4, … 〉;  P(h6 | π3) = 0.5 × 0.5 × 0.5 × 1 × 1 × … = 0.125
- …
- h7 = 〈s1, s1, s1, s1, s1, … 〉;  P(h7 | π3) = 0.5 × 0.5 × 0.5 × 0.5 × 0.5 × … = 0
- So π3 also reaches the goal with probability 1.0
[Figure: same diagram; the goal state s4 is highlighted]
Utility
- Numeric cost C(s,a) for each state s and action a
- Numeric reward R(s) for each state s
- No explicit goals any more
◆ Desirable states have high rewards
- Example:
◆ C(s, wait) = 0 at every state except s3
◆ C(s,a) = 1 for each “horizontal” action
◆ C(s,a) = 100 for each “vertical” action
◆ R as shown in the diagram
- Utility of a history:
◆ If h = 〈s0, s1, … 〉, then V(h | π) = ∑i≥0 [R(si) – C(si, π(si))]
[Figure: same diagram, annotated with rewards (r = –100 at s5) and action costs]
Example
- π1 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s4, wait), (s5, wait)}
- h1 = 〈s1, s2, s3, s4, s4, … 〉
- h2 = 〈s1, s2, s5, s5, … 〉
- V(h1 | π1) = [R(s1) – C(s1,π1(s1))] + [R(s2) – C(s2,π1(s2))] + [R(s3) – C(s3,π1(s3))] + [R(s4) – C(s4,π1(s4))] + [R(s4) – C(s4,π1(s4))] + …
  = [0 – 100] + [0 – 1] + [0 – 100] + [100 – 0] + [100 – 0] + … = ∞
- V(h2 | π1) = [0 – 100] + [0 – 1] + [–100 – 0] + [–100 – 0] + [–100 – 0] + … = –∞
[Figure: same diagram with rewards (r = –100 at s5)]
Discounted Utility
- We often need to use a discount factor, γ
◆ 0 ≤ γ ≤ 1
- Discounted utility of a history:
◆ V(h | π) = ∑i≥0 γ^i [R(si) – C(si, π(si))]
◆ Distant rewards/costs have less influence
◆ Convergence is guaranteed if 0 ≤ γ < 1
- Expected utility of a policy:
◆ E(π) = ∑h P(h | π) V(h | π)
[Figure: same diagram with rewards (r = –100 at s5); γ = 0.9]
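A sketch of the two formulas in Python, assuming the rewards R, costs C, and histories are supplied explicitly as dictionaries and lists (an assumed interface, not the book's). Infinite histories are handled by evaluating a long finite prefix; with 0 ≤ γ < 1 the truncated tail is negligible.

```python
def discounted_utility(history, policy, R, C, gamma):
    """V(h | pi) = sum_i gamma^i * (R(s_i) - C(s_i, pi(s_i))), over a finite prefix of h."""
    return sum(gamma**i * (R[s] - C[(s, policy[s])]) for i, s in enumerate(history))

def expected_utility(histories_with_probs, policy, R, C, gamma):
    """E(pi) = sum_h P(h | pi) * V(h | pi), over an explicit list of (history, probability) pairs."""
    return sum(p * discounted_utility(h, policy, R, C, gamma)
               for h, p in histories_with_probs)
```

For the example on the next slide, one would evaluate a few hundred states of h1 and h2 with γ = 0.9 and combine them with probabilities 0.8 and 0.2.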
Example
- π1 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s4, wait), (s5, wait)}
- h1 = 〈s1, s2, s3, s4, s4, … 〉
- h2 = 〈s1, s2, s5, s5, … 〉
- V(h1 | π1) = 0.9^0[0 – 100] + 0.9^1[0 – 1] + 0.9^2[0 – 100] + 0.9^3[100 – 0] + 0.9^4[100 – 0] + … = 547.9
- V(h2 | π1) = 0.9^0[0 – 100] + 0.9^1[0 – 1] + 0.9^2[–100 – 0] + 0.9^3[–100 – 0] + … = –910.1
- E(π1) = 0.8 V(h1 | π1) + 0.2 V(h2 | π1) = 0.8(547.9) + 0.2(–910.1) = 256.3
[Figure: same diagram with rewards (r = –100 at s5); γ = 0.9]
Planning as Optimization
- For the rest of this chapter, a special case:
◆ Start at state s0
◆ All rewards are 0
◆ Consider cost rather than utility
» the negative of what we had before
- This makes the equations slightly simpler
◆ Can easily generalize everything to the case of nonzero rewards
- Discounted cost of a history h:
◆ C(h | π) = ∑i≥0 γ^i C(si, π(si))
- Expected cost of a policy π:
◆ E(π) = ∑h P(h | π) C(h | π)
- A policy π is optimal if for every π′, E(π) ≤ E(π′)
- A policy π is everywhere optimal if for every s and every π′, Eπ(s) ≤ Eπ′(s)
◆ where Eπ(s) is the expected cost if we start at s rather than s0
Bellman’s Theorem
- If π is any policy, then for every s,
◆ Eπ(s) = C(s, π(s)) + γ ∑s′∈S Pπ(s)(s′ | s) Eπ(s′)
- Let Qπ(s,a) be the expected cost in a state s if we start by executing the action a, and use the policy π from then onward
◆ Qπ(s,a) = C(s,a) + γ ∑s′∈S Pa(s′ | s) Eπ(s′)
- Bellman’s theorem: Suppose π* is everywhere optimal. Then for every s, Eπ*(s) = mina∈A(s) Qπ*(s,a).
- Intuition:
◆ If we use π* everywhere else, then the set of optimal actions at s is arg mina∈A(s) Qπ*(s,a)
◆ If π* is optimal, then at each state it should pick one of those actions
◆ Otherwise we can construct a better policy by using an action in arg mina∈A(s) Qπ*(s,a), instead of the action that π* uses
- From Bellman’s theorem it follows that for all s,
◆ Eπ*(s) = mina∈A(s) {C(s,a) + γ ∑s′∈S Pa(s′ | s) Eπ*(s′)}
[Figure: applying π(s) in state s leads to one of the successor states s1, s2, …, sn]
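A one-function sketch of Qπ(s,a) and of the Bellman backup mina Q(s,a) that the next two algorithms repeat; the dictionary-based MDP interface (C for costs, P for transition distributions) is an assumption carried over from the earlier sketches.

```python
def q_value(s, a, E, C, P, gamma):
    """Q(s, a) = C(s, a) + gamma * sum over s2 of Pa(s2 | s) * E(s2)."""
    return C[(s, a)] + gamma * sum(p * E[s2] for s2, p in P[(s, a)].items())

def bellman_backup(s, actions, E, C, P, gamma):
    """Return (min_a Q(s, a), argmin_a Q(s, a)): the right-hand side of Bellman's equation."""
    best = min(actions, key=lambda a: q_value(s, a, E, C, P, gamma))
    return q_value(s, best, E, C, P, gamma), best
```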
Policy Iteration
- Policy iteration is a way to find π*
◆ Suppose there are n states s1, …, sn
◆ Start with an arbitrary initial policy π1
◆ For i = 1, 2, …
» Compute πi’s expected costs by solving n equations with n unknowns (n instances of the first equation on the previous slide):
  Eπi(s1) = C(s1, πi(s1)) + γ ∑k=1…n Pπi(s1)(sk | s1) Eπi(sk)
  ⋮
  Eπi(sn) = C(sn, πi(sn)) + γ ∑k=1…n Pπi(sn)(sk | sn) Eπi(sk)
» For every sj,
  πi+1(sj) = arg mina∈A Qπi(sj, a) = arg mina∈A [ C(sj, a) + γ ∑k=1…n Pa(sk | sj) Eπi(sk) ]
» If πi+1 = πi then exit
- Converges in a finite number of iterations
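A sketch of policy iteration in Python, using numpy to solve the n linear equations of the policy-evaluation step; the dictionary-based interface (A_of for applicable actions, C for costs, P for transition distributions) is an assumption, not the book's notation.

```python
import numpy as np

def policy_iteration(S, A_of, C, P, gamma, pi=None):
    """S: list of states; A_of[s]: applicable actions; C[(s, a)]: cost;
    P[(s, a)]: dict mapping successors to probabilities. Returns (policy, expected costs)."""
    idx = {s: i for i, s in enumerate(S)}
    pi = pi or {s: A_of[s][0] for s in S}          # arbitrary initial policy pi_1
    while True:
        # Policy evaluation: solve E = C_pi + gamma * P_pi * E  (n equations, n unknowns)
        M, c = np.eye(len(S)), np.zeros(len(S))
        for s in S:
            c[idx[s]] = C[(s, pi[s])]
            for s2, p in P[(s, pi[s])].items():
                M[idx[s], idx[s2]] -= gamma * p
        E = dict(zip(S, np.linalg.solve(M, c)))
        # Policy improvement: pi_{i+1}(s_j) = argmin_a Q_{pi_i}(s_j, a)
        new_pi = {s: min(A_of[s], key=lambda a: C[(s, a)] +
                         gamma * sum(p * E[s2] for s2, p in P[(s, a)].items()))
                  for s in S}
        if new_pi == pi:                           # exit when the policy stops changing
            return pi, E
        pi = new_pi
```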
Example
- Modification of the previous example
◆ To get rid of the rewards but still make s5 undesirable:
» C(s5, wait) = 100
◆ To provide incentive to leave non-goal states:
» C(s1, wait) = C(s2, wait) = 1
◆ All other costs are the same as before
◆ As before, discount factor γ = 0.9
[Figure: same diagram, now labeled with the wait costs c = 1 (s1, s2), c = 0 (s4), c = 100 (s5); γ = 0.9]
- Initial policy:
π1 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s4, wait), (s5, wait)}
[Figure: same diagram with costs and γ = 0.9]
Example (Continued)
- At each state s, let π2(s) = arg mina∈A(s) Qπ1(s,a):
- π1 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s4, wait), (s5, wait)}
- π2 = {(s1, move(r1,l1,l4)), (s2, move(r1,l2,l1)), (s3, move(r1,l3,l4)), (s4, wait), (s5, move(r1,l5,l4))}
[Figure: same diagram with costs and γ = 0.9]
Value Iteration
- Start with an arbitrary cost E0(s) for each s and a small ε > 0
- For i = 1, 2, …
◆ for every s in S and a in A,
- Qi(s,a) := C(s,a) + γ ∑s′∈S Pa(s′ | s) Ei–1(s′)
» Ei(s) = mina∈A(s) Qi(s,a)
» πi(s) = arg mina∈A(s) Qi(s,a)
◆ If maxs∈S |Ei(s) – Ei–1(s)| < ε then exit
- πi converges to π* after finitely many iterations, but how to tell it has converged?
◆ In Policy Iteration, we checked whether πi stopped changing
◆ In Value Iteration, that doesn’t work
- In general, Ei ≠ Eπi
◆ When πi doesn’t change, Ei may still change
◆ The changes in Ei may make πi start changing again
Value Iteration
- Start with an arbitrary cost E0(s) for each s and a small ε > 0
- For i = 1, 2, …
◆ for each s in S do
» for each a in A do
- Q(s,a) := C(s,a) + γ ∑s′∈S Pa(s′ | s) Ei–1(s′)
» Ei(s) = mina∈A(s) Q(s,a)
» πi(s) = arg mina∈A(s) Q(s,a)
◆ If maxs∈S |Ei(s) – Ei–1(s)| < ε then exit
- If Ei changes by < ε and if ε is small enough, then πi will no longer change
◆ In this case πi has converged to π*
- How small is small enough?
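A sketch of the loop above; the same assumed dictionary interface is used as in the earlier sketches, and E0 is simply initialized to 0 rather than being passed in.

```python
def value_iteration(S, A_of, C, P, gamma, epsilon):
    """Return a greedy policy and cost estimates once the values change by less than epsilon."""
    E = {s: 0.0 for s in S}                                    # arbitrary initial costs E_0
    while True:
        Q = {(s, a): C[(s, a)] + gamma * sum(p * E[s2] for s2, p in P[(s, a)].items())
             for s in S for a in A_of[s]}
        new_E = {s: min(Q[(s, a)] for a in A_of[s]) for s in S}
        pi = {s: min(A_of[s], key=lambda a: Q[(s, a)]) for s in S}
        if max(abs(new_E[s] - E[s]) for s in S) < epsilon:     # values have (nearly) converged
            return pi, new_E
        E = new_E
```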
Example
- Let aij be the action that moves from si to sj
◆ e.g., a11 = wait and a12 = move(r1,l1,l2)
- Start with E0(s) = 0 for all s, and ε = 1
- First iteration (γ = 0.9):
◆ Q(s1, a11) = 1 + 0.9×0 = 1;  Q(s1, a12) = 100 + 0.9×0 = 100;  Q(s1, a14) = 1 + 0.9(0.5×0 + 0.5×0) = 1
◆ Q(s2, a21) = 100 + 0.9×0 = 100;  Q(s2, a22) = 1 + 0.9×0 = 1;  Q(s2, a23) = 1 + 0.9(0.5×0 + 0.5×0) = 1
◆ Q(s3, a32) = 1 + 0.9×0 = 1;  Q(s3, a34) = 100 + 0.9×0 = 100
◆ Q(s4, a41) = 1 + 0.9×0 = 1;  Q(s4, a43) = 100 + 0.9×0 = 100;  Q(s4, a44) = 0 + 0.9×0 = 0;  Q(s4, a45) = 100 + 0.9×0 = 100
◆ Q(s5, a52) = 1 + 0.9×0 = 1;  Q(s5, a54) = 100 + 0.9×0 = 100;  Q(s5, a55) = 100 + 0.9×0 = 100
- Resulting values and greedy actions:
◆ E1(s1) = 1; π1(s1) = a11 = wait
◆ E1(s2) = 1; π1(s2) = a22 = wait
◆ E1(s3) = 1; π1(s3) = a32 = move(r1,l3,l2)
◆ E1(s4) = 0; π1(s4) = a44 = wait
◆ E1(s5) = 1; π1(s5) = a52 = move(r1,l5,l2)
- What other actions could we have chosen?
- Is ε small enough?
[Figure: same diagram with costs c = 1, 1, 0, 100 and γ = 0.9]
Discussion
- Policy iteration computes an entire policy in each iteration, and computes values based on that policy
◆ More work per iteration, because it needs to solve a set of simultaneous equations
◆ Usually converges in a smaller number of iterations
- Value iteration computes new values in each iteration, and chooses a policy based on those values
◆ In general, the values are not the values that one would get from the chosen policy or any other policy
◆ Less work per iteration, because it doesn’t need to solve a set of equations
◆ Usually takes more iterations to converge
Discussion (Continued)
- For both, the number of iterations is polynomial in the number of states
◆ But the number of states is usually quite large
◆ Need to examine the entire state space in each iteration
- Thus, these algorithms can take huge amounts of time and space
- To do a complexity analysis, we need to get explicit about the syntax of the planning problem
◆ Can define probabilistic versions of set-theoretic, classical, and state-variable planning problems
◆ I will do this for set-theoretic planning
Probabilistic Set-Theoretic Planning
- The statement of a probabilistic set-theoretic planning problem is P = (S0, g, A)
◆ S0 = {(s1, p1), (s2, p2), …, (sj, pj)}
» Every state that has nonzero probability of being the starting state
◆ g is the usual set-theoretic goal formula (a set of propositions)
◆ A is a set of probabilistic set-theoretic actions
» Like ordinary set-theoretic actions, but with multiple possible outcomes and a probability for each outcome
» a = (name(a), precond(a), effects1+(a), effects1–(a), p1(a), effects2+(a), effects2–(a), p2(a), …, effectsk+(a), effectsk–(a), pk(a))
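One possible Python encoding of such an action; the field and class names mirror the tuple above but are otherwise illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Outcome:
    effects_plus: frozenset     # propositions added by this outcome
    effects_minus: frozenset    # propositions deleted by this outcome
    probability: float          # p_i(a)

@dataclass
class ProbabilisticAction:
    name: str
    precond: frozenset          # propositions that must hold for a to be applicable
    outcomes: list = field(default_factory=list)   # Outcome objects; probabilities sum to 1

    def applicable(self, state: frozenset) -> bool:
        return self.precond <= state

    def successors(self, state: frozenset):
        """Yield (next_state, probability) pairs, one per outcome."""
        for o in self.outcomes:
            yield (state - o.effects_minus) | o.effects_plus, o.probability
```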
Probabilistic Set-Theoretic Planning
- Probabilistic set-theoretic planning is EXPTIME-complete
◆ Much harder than ordinary set-theoretic planning, which was only PSPACE-complete
- Worst case requires exponential time
- Unknown whether worst case requires exponential space
◆ PSPACE ⊆ EXPTIME ⊆ NEXPTIME ⊆ EXPSPACE
- What does this say about the complexity of solving an MDP?
- Value Iteration and Policy Iteration take exponential amounts of time and space because they iterate over all states in every iteration
◆ In some cases we can do better
Real-Time Value Iteration
- A class of algorithms that work roughly as follows
- loop
◆ Forward search from the initial state(s), following the current policy π
» Each time you visit a new state s, use a heuristic function to estimate its expected cost E(s)
» For every state s along the path followed
- Update π to choose the action a that minimizes Q(s,a)
- Update E(s) accordingly
- Best-known example: Real-Time Dynamic Programming
Real-Time Dynamic Programming
- Need explicit goal states
◆ If s is a goal, then actions at s have no cost and produce no change
- For each state s, maintain a value V(s) that gets updated as the algorithm proceeds
◆ Initially V(s) = h(s), where h is a heuristic function
- Greedy policy: π(s) = arg mina∈A(s) Q(s,a)
◆ where Q(s,a) = C(s,a) + γ ∑s′∈S Pa(s′ | s) V(s′)
- procedure RTDP(s)
◆ loop until termination condition
» RTDP-trial(s)
- procedure RTDP-trial(s)
◆ while s is not a goal state
» a := arg mina∈A(s) Q(s,a)
» V(s) := Q(s,a)
» randomly pick s′ with probability Pa(s′ | s)
» s := s′
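A direct sketch of the two procedures above in Python; the successor is sampled with the random module, and the outer loop's termination condition is left as a fixed trial count (a placeholder assumption, since the slides don't commit to one).

```python
import random

def q(s, a, V, C, P, gamma):
    return C[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())

def rtdp_trial(s, V, A_of, C, P, gamma, goals):
    while s not in goals:
        a = min(A_of[s], key=lambda act: q(s, act, V, C, P, gamma))     # greedy action
        V[s] = q(s, a, V, C, P, gamma)                                  # value update
        dist = P[(s, a)]
        s = random.choices(list(dist), weights=list(dist.values()))[0]  # sample successor

def rtdp(s0, h, A_of, C, P, gamma, goals, n_trials=100):
    V = dict(h)                       # V(s) = h(s) initially; h must cover every state
    for _ in range(n_trials):         # "loop until termination condition" (placeholder)
        rtdp_trial(s0, V, A_of, C, P, gamma, goals)
    return V
```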
Real-Time Dynamic Programming (Example)
- Procedures RTDP (the outer loop) and RTDP-trial (the forward search) are as defined on the previous slide
- Example: the modified DWR problem, with γ = 0.9 and h(s) = 0 for all s, so initially V(s) = 0 at every state
- The original slides animate one trial of RTDP-trial(s1):
◆ At s1, compute Q for each applicable action: the cost-100 moves get Q = 100 + 0.9×0 = 100, and the action a with two equally likely outcomes (move(r1,l1,l4)) gets Q = 1 + 0.9(½×0 + ½×0) = 1
◆ a is the greedy choice, so V(s1) := 1, and a successor is sampled; here the robot happens to stay at s1 (probability ½)
◆ On the next pass through the loop, Q(s1, a) = 1 + 0.9(½×1 + ½×0) = 1.45 is still the minimum, so V(s1) := 1.45
◆ The trial ends once a sampled outcome reaches the goal state s4
[Figure: same diagram with costs c = 1, 1, 0, 100; the animation shows the V and Q values next to the states as the trial proceeds]
Real-Time Dynamic Programming
- In practice, it can solve much larger problems than policy iteration and value iteration
- Won’t always find an optimal solution, won’t always terminate
◆ If h doesn’t overestimate, and if a goal is reachable (with positive probability) at every state
» Then it will terminate
◆ If in addition to the above, there is a positive-probability path between every pair of states
» Then it will find an optimal solution
POMDPs
- Partially observable Markov Decision Process (POMDP):
◆ a stochastic system Σ = (S, A, P) as defined earlier
◆ A finite set O of observations
» Pa(o | s) = probability of observation o after executing action a in state s
◆ Require that for each a and s, ∑o∈O Pa(o | s) = 1
- O models partial observability
◆ The controller can’t observe s directly; it can only do a then observe o
◆ The same observation o can occur in more than one state
- Why do the observations depend on the action a?
» Why do we have Pa(o|s) rather than P(o|s)?
POMDPs
- Partially observable Markov Decision Process (POMDP):
◆ a stochastic system Σ = (S, A, P) as defined earlier
» Pa(s′ | s) = probability of being in state s′ after executing action a in state s
◆ A finite set O of observations
» Pa(o | s) = probability of observation o after executing action a in state s
◆ Require that for each a and s, ∑o∈O Pa(o | s) = 1
- O models partial observability
◆ The controller can’t observe s directly; it can only do a then observe o
◆ The same observation o can occur in more than one state
- Why do the observations depend on the action a?
» Why do we have Pa(o|s) rather than P(o|s)?
◆ This is a way to model sensing actions
» e.g., a is the action of obtaining observation o from a sensor
More about Sensing Actions
- Suppose a is an action that never changes the state
◆ Pa(s | s) = 1 for all s
- Suppose there are a state s and an observation o such that a gives us observation o iff we’re in state s
◆ Pa(o | s′) = 0 for all s′ ≠ s
◆ Pa(o | s) = 1
- Then to tell whether you’re in state s, just perform action a and see whether you observe o
- Two states s and s′ are indistinguishable if for every o and a, Pa(o | s) = Pa(o | s′)
Belief States
- At each point we will have a probability distribution b(s) over the states in S
◆ b is called a belief state
◆ Our current belief about what state we’re in
- Basic properties:
◆ 0 ≤ b(s) ≤ 1 for every s in S
◆ ∑s∈S b(s) = 1
- Definitions:
◆ ba = the belief state after doing action a in belief state b
» ba(s) = P(we’re in s after doing a in b) = ∑s′∈S Pa(s | s′) b(s′)
◆ ba(o) = P(we observe o after doing a in b) = ∑s′∈S Pa(o | s′) b(s′)
◆ ba^o(s) = P(we’re in s | we observe o after doing a in b)
Belief States (Continued)
- According to the book,
◆ ba^o(s) = Pa(o | s) ba(s) / ba(o)      (16.14)
- I’m not completely sure whether that formula is correct
- But it can be used (possibly with corrections) to distinguish states that would otherwise be indistinguishable
◆ Example on next page
Example
- Modified version of DWR
- Robot r1 can move between l1 and l2
» move(r1,l1,l2)
» move(r1,l2,l1)
◆ With probability 0.5, there’s a container c1 in location l2
» in(c1,l2)
◆ O = {full, empty}
» full: c1 is present
» empty: c1 is absent
» abbreviate full as f, and empty as e
[Figure: the four states s1–s4 (r1 at l1 or l2, with or without c1); belief state b before move(r1,l1,l2) and belief state b′ = bmove(r1,l1,l2) after it]
Example (Continued)
- move doesn’t return a useful observation
- For every state s and for every move action a,
◆ Pa(f | s) = Pa(e | s) = 0.5
- Thus if there are no other actions, then
◆ s1 and s2 are indistinguishable
◆ s3 and s4 are indistinguishable
[Figure: same four-state belief-state diagram]
Example (Continued)
- Suppose there’s a sensing action see that works perfectly in location l2
◆ Psee(f | s4) = Psee(e | s3) = 1
◆ Psee(f | s3) = Psee(e | s4) = 0
- Then s3 and s4 are distinguishable
- Suppose see doesn’t work elsewhere
◆ Psee(f | s1) = Psee(e | s1) = 0.5
◆ Psee(f | s2) = Psee(e | s2) = 0.5
[Figure: same four-state belief-state diagram]
Example (Continued)
- In b, see doesn’t help us any:
◆ bsee^e(s1) = Psee(e | s1) bsee(s1) / bsee(e) = 0.5 × 0.5 / 0.5 = 0.5
- In b′, see tells us what state we’re in:
◆ b′see^e(s3) = Psee(e | s3) b′see(s3) / b′see(e) = 1 × 0.5 / 0.5 = 1
[Figure: same four-state belief-state diagram; b′ = bmove(r1,l1,l2)]
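A sketch of the belief-update computations, using formula (16.14) exactly as stated above (so it carries the same caveat); the sensing model below is the one from this example, written as a dictionary keyed by (action, observation, state), which is an assumed encoding.

```python
def update_after_action(b, a, P_state):
    """b_a(s) = sum over s_prev of Pa(s | s_prev) * b(s_prev)."""
    ba = {}
    for s_prev, p_prev in b.items():
        for s, p in P_state[(s_prev, a)].items():
            ba[s] = ba.get(s, 0.0) + p * p_prev
    return ba

def update_after_observation(ba, a, o, P_obs):
    """b_a^o(s) = Pa(o | s) * b_a(s) / b_a(o), i.e. formula (16.14)."""
    ba_o = sum(P_obs[(a, o, s)] * p for s, p in ba.items())      # b_a(o)
    return {s: P_obs[(a, o, s)] * p / ba_o for s, p in ba.items()}

# The example: b' assigns 0.5 to s3 (no container) and 0.5 to s4 (container),
# and see is a perfect sensor in l2 that does not change the state.
b_prime = {"s3": 0.5, "s4": 0.5}
P_obs = {("see", "e", "s3"): 1.0, ("see", "f", "s3"): 0.0,
         ("see", "e", "s4"): 0.0, ("see", "f", "s4"): 1.0}
print(update_after_observation(b_prime, "see", "e", P_obs))      # {'s3': 1.0, 's4': 0.0}
```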
Policies on Belief States
- In a fully observable MDP, a policy is a partial function from S into A
- In a partially observable MDP, a policy is a partial function from B into A
◆ where B is the set of all belief states
- S was finite, but B is infinite and continuous
◆ A policy may be either finite or infinite
Example
- Suppose we know the initial belief state is b
- Policy to tell if there’s a container in l2:
◆ π = {(b, move(r1,l1,l2)), (b′, see)}
[Figure: same four-state belief-state diagram]
Planning Algorithms
- POMDPs are very hard to solve
- The book says very little about it
- I’ll say even less!
Reachability and Extended Goals
- The usual definition of MDPs does not contain explicit goals
◆ Can get the same effect by using absorbing states
- Can also handle problems where the objective is more general, such as maintaining some state rather than just reaching it
- DWR example: whenever a ship delivers cargo to l1, move it to l2