
Planning and Optimization

G1. Factored MDPs

Malte Helmert and Thomas Keller

Universität Basel

December 4, 2019


Overview:

G1.1 Factored MDPs
G1.2 Probabilistic Planning Tasks
G1.3 Complexity
G1.4 Estimated Policy Evaluation
G1.5 Summary


Content of this Course

Planning
◮ Classical: Foundations, Logic, Heuristics, Constraints
◮ Probabilistic: Explicit MDPs, Factored MDPs


Content of this Course: Factored MDPs

Factored MDPs
◮ Foundations
◮ Heuristic Search
◮ Monte-Carlo Methods



G1.1 Factored MDPs


We would like to specify MDPs and SSPs with large state spaces. In classical planning, we introduced planning tasks to represent large transition systems compactly:

◮ represent aspects of the world in terms of state variables
◮ a state is a valuation of the state variables
◮ n (binary) state variables induce 2^n states

⇒ exponentially more compact than the "explicit" representation


Finite-Domain State Variables

Definition (Finite-Domain State Variable)
A finite-domain state variable is a symbol v with an associated domain dom(v), which is a finite non-empty set of values.

Let V be a finite set of finite-domain state variables. A state s over V is an assignment s : V → ⋃_{v∈V} dom(v) such that s(v) ∈ dom(v) for all v ∈ V.

A formula over V is a propositional logic formula whose atomic propositions are of the form v = d, where v ∈ V and d ∈ dom(v).

For simplicity, we only consider finite-domain state variables here.
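To make this concrete, here is a minimal Python sketch (ours, not from the slides; all names are illustrative): states over finite-domain variables as dictionaries, with the state space enumerated explicitly to show the exponential blow-up.

```python
from itertools import product

# Finite-domain state variables: each variable maps to its (finite, non-empty) domain.
variables = {
    "location": {"home", "office"},
    "battery": {0, 1, 2},
}

def is_state(s, variables):
    """A state assigns every variable a value from its own domain."""
    return s.keys() == variables.keys() and all(
        s[v] in dom for v, dom in variables.items()
    )

# Enumerating all states: here 2 * 3 = 6 states;
# n binary variables would induce 2^n states.
names = sorted(variables)
all_states = [dict(zip(names, values))
              for values in product(*(sorted(variables[v], key=str) for v in names))]
assert all(is_state(s, variables) for s in all_states)
print(len(all_states))  # 6
```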


Syntax of Operators

Definition (SSP and MDP Operators)
An SSP operator o over state variables V is an object with three properties:

◮ a precondition pre(o), a logical formula over V
◮ an effect eff(o) over V, defined on the following slides
◮ a cost cost(o) ∈ ℝ⁺

An MDP operator o over state variables V is an object with three properties:

◮ a precondition pre(o), a logical formula over V
◮ an effect eff(o) over V, defined on the following slides
◮ a reward reward(o) over V, defined on the following slides

Whenever we just say operator (without SSP or MDP), both kinds of operators are allowed.


Syntax of Effects

Definition (Effect)
Effects over state variables V are inductively defined as follows:

◮ If v ∈ V is a finite-domain state variable and d ∈ dom(v), then v := d is an effect (atomic effect).
◮ If e1, . . . , en are effects, then (e1 ∧ · · · ∧ en) is an effect (conjunctive effect). The special case with n = 0 is the empty effect ⊤.
◮ If e1, . . . , en are effects and p1, . . . , pn ∈ [0, 1] such that ∑_{i=1}^{n} pi = 1, then (p1 : e1 | . . . | pn : en) is an effect (probabilistic effect).

Note: To simplify definitions, conditional effects are omitted.


Effects: Intuition

Intuition for effects:

◮ Atomic effects can be understood as assignments that update the value of a state variable.
◮ A conjunctive effect e = (e1 ∧ · · · ∧ en) means that all subeffects e1, . . . , en take place simultaneously.
◮ A probabilistic effect e = (p1 : e1 | . . . | pn : en) means that exactly one subeffect ei ∈ {e1, . . . , en} takes place, with probability pi.


Semantics of Effects

Definition
The effect set [e] of an effect e is a set of pairs ⟨p, w⟩, where p is a probability 0 < p ≤ 1 and w is a partial assignment. The effect set [e] is obtained recursively as

[v := d] = {⟨1.0, {v ↦ d}⟩}
[e ∧ e′] = ⊎_{⟨p,w⟩ ∈ [e]} ⊎_{⟨p′,w′⟩ ∈ [e′]} {⟨p · p′, w ∪ w′⟩}
[p1 : e1 | . . . | pn : en] = ⊎_{i=1}^{n} {⟨pi · p, w⟩ | ⟨p, w⟩ ∈ [ei]},

where ⊎ is like ∪ but merges ⟨p, w⟩ and ⟨p′, w⟩ into ⟨p + p′, w⟩.
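The recursion translates directly into code. The following sketch (our own encoding, not from the slides) represents atomic effects as ('atom', v, d), conjunctive effects as ('and', [...]), probabilistic effects as ('prob', [(p, e), ...]), and computes [e] with the merging union ⊎:

```python
from collections import defaultdict

def merge(pairs):
    """The merging union: sum the probabilities of identical partial assignments."""
    acc = defaultdict(float)
    for p, w in pairs:
        acc[frozenset(w.items())] += p
    return [(p, dict(items)) for items, p in acc.items()]

def effect_set(e):
    """Compute [e] as a list of (probability, partial assignment) pairs."""
    kind = e[0]
    if kind == "atom":                      # [v := d] = {<1.0, {v -> d}>}
        _, v, d = e
        return [(1.0, {v: d})]
    if kind == "and":                       # combine subeffects pairwise
        result = [(1.0, {})]                # the empty effect (case n = 0)
        for sub in e[1]:
            # assumes consistent effects (no conflicting atomic assignments)
            result = merge(
                (p * q, {**w, **u})
                for p, w in result
                for q, u in effect_set(sub)
            )
        return result
    if kind == "prob":                      # one subeffect e_i fires with prob p_i
        return merge(
            (p_i * p, w)
            for p_i, sub in e[1]
            for p, w in effect_set(sub)
        )
    raise ValueError(f"unknown effect: {e!r}")

# (0.8 : v := 1 | 0.2 : T) conjoined with (u := 0):
e = ("and", [("prob", [(0.8, ("atom", "v", 1)), (0.2, ("and", []))]),
             ("atom", "u", 0)])
print(effect_set(e))  # [(0.8, {'v': 1, 'u': 0}), (0.2, {'u': 0})]
```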


Semantics of Operators

Definition (Applicable, Outcomes)
Let V be a set of finite-domain state variables, let s be a state over V, and let o be an operator over V.

Operator o is applicable in s if s ⊨ pre(o).

The outcomes of applying an operator o in s, written s⟦o⟧, are

s⟦o⟧ = ⊎_{⟨p,w⟩ ∈ [eff(o)]} {⟨p, s′_w⟩},

where s′_w(v) = d if (v = d) ∈ w and s′_w(v) = s(v) otherwise, and ⊎ is like ∪ but merges ⟨p, s′⟩ and ⟨p′, s′⟩ into ⟨p + p′, s′⟩.
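Continuing the sketch, the outcomes s⟦o⟧ are obtained by completing each partial assignment with the current state and again merging equal successors (the applicability test s ⊨ pre(o) is left out, since we have not fixed a formula encoding):

```python
from collections import defaultdict

def outcomes(s, effect_pairs):
    """s[[o]]: complete each partial assignment w with s, merging equal successors."""
    acc = defaultdict(float)
    for p, w in effect_pairs:
        succ = {**s, **w}   # s'_w: take w where defined, s elsewhere
        acc[frozenset(succ.items())] += p
    return [(p, dict(items)) for items, p in acc.items()]

# Outcomes of the effect from the previous sketch in state {u: 1, v: 0}:
pairs = [(0.8, {"v": 1, "u": 0}), (0.2, {"u": 0})]
print(outcomes({"u": 1, "v": 0}, pairs))
# [(0.8, {'u': 0, 'v': 1}), (0.2, {'u': 0, 'v': 0})]
```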


Rewards

Definition (Reward)
A reward over state variables V is inductively defined as follows:

◮ every c ∈ ℝ is a reward
◮ if χ is a propositional formula over V, then [χ] is a reward (the indicator that is 1 if χ holds and 0 otherwise)
◮ if r and r′ are rewards, then r + r′, r − r′, r · r′ and r/r′ are rewards

Applying an MDP operator o in s induces the reward reward(o)(s), i.e., the value of the arithmetic function reward(o) where all occurrences of v ∈ V are replaced with s(v).
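A reward term can be evaluated in a state by structural recursion. A sketch under the same caveats (our encoding; only atomic formulas v = d are supported inside the Iverson bracket):

```python
def eval_reward(r, s):
    """Evaluate a reward term in state s by structural recursion."""
    if isinstance(r, (int, float)):          # constant c in R
        return r
    kind = r[0]
    if kind == "iverson":                    # [v = d]: 1 if s(v) = d, else 0
        _, v, d = r
        return 1 if s[v] == d else 0
    if kind in ("+", "-", "*", "/"):         # arithmetic combinations
        _, a, b = r
        x, y = eval_reward(a, s), eval_reward(b, s)
        return {"+": x + y, "-": x - y, "*": x * y, "/": x / y}[kind]
    raise ValueError(f"unknown reward term: {r!r}")

# reward(o) = 10 * [location = office] - 1, evaluated in two states:
r = ("-", ("*", 10, ("iverson", "location", "office")), 1)
print(eval_reward(r, {"location": "office"}))  # 9
print(eval_reward(r, {"location": "home"}))    # -1
```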

G1.2 Probabilistic Planning Tasks


Definition (SSP and MDP Planning Task)
An SSP planning task is a 4-tuple Π = ⟨V, I, O, γ⟩ where

◮ V is a finite set of finite-domain state variables,
◮ I is a valuation over V called the initial state,
◮ O is a finite set of SSP operators over V, and
◮ γ is a formula over V called the goal.

An MDP planning task is a 4-tuple Π = ⟨V, I, O, d⟩ where

◮ V is a finite set of finite-domain state variables,
◮ I is a valuation over V called the initial state,
◮ O is a finite set of MDP operators over V, and
◮ d ∈ (0, 1) is the discount factor.

A probabilistic planning task is an SSP or MDP planning task.
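As a data structure, the 4-tuple is immediate; a minimal sketch (names are ours), reusing the effect encoding from the earlier sketches:

```python
from dataclasses import dataclass

@dataclass
class Operator:
    name: str
    pre: object    # precondition: a formula over V (encoding left open)
    eff: tuple     # effect, encoded as in the earlier sketches
    cost: float    # SSP operators carry a cost; MDP operators a reward term instead

@dataclass
class SSPTask:
    variables: dict        # V: maps each state variable to its finite domain
    initial_state: dict    # I: a valuation over V
    operators: list        # O: a finite set of SSP operators
    goal: object           # gamma: a formula over V

# A one-variable toy task: flipping v succeeds with probability 0.5.
task = SSPTask(
    variables={"v": {0, 1}},
    initial_state={"v": 0},
    operators=[Operator("flip", pre=True,
                        eff=("prob", [(0.5, ("atom", "v", 1)),
                                      (0.5, ("and", []))]),
                        cost=1.0)],
    goal=("=", "v", 1),
)
```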


Mapping SSP Planning Tasks to SSPs

Definition (SSP Induced by an SSP Planning Task)
The SSP planning task Π = ⟨V, I, O, γ⟩ induces the SSP T = ⟨S, L, c, T, s0, S⋆⟩, where

◮ S is the set of all states over V,
◮ L is the set of operators O,
◮ c(o) = cost(o) for all o ∈ O,
◮ T(s, o, s′) = p if o is applicable in s and ⟨p, s′⟩ ∈ s⟦o⟧, and T(s, o, s′) = 0 otherwise,
◮ s0 = I, and
◮ S⋆ = {s ∈ S | s ⊨ γ}.


Mapping MDP Planning Tasks to MDPs

Definition (MDP Induced by an MDP Planning Task)
The MDP planning task Π = ⟨V, I, O, d⟩ induces the MDP T = ⟨S, L, R, T, s0, γ⟩, where

◮ S is the set of all states over V,
◮ L is the set of operators O,
◮ R(s, o) = reward(o)(s) for all o ∈ O and s ∈ S,
◮ T(s, o, s′) = p if o is applicable in s and ⟨p, s′⟩ ∈ s⟦o⟧, and T(s, o, s′) = 0 otherwise,
◮ s0 = I, and
◮ γ = d.


G1.3 Complexity


Complexity of Probabilistic Planning

Definition (Policy Existence)
Policy existence (PolicyEx) is the following decision problem:

Given: an SSP planning task Π
Question: Is there a proper policy for Π?


Membership in EXP

Theorem
PolicyEx ∈ EXP.

Proof.
The number of states of an SSP planning task is exponential in the number of state variables. The induced SSP can be solved in time polynomial in |S| · |L| via linear programming, and hence in time exponential in the input size.


EXP-completeness of Probabilistic Planning

Theorem
PolicyEx is EXP-complete.

Proof Sketch.
Membership: see the previous slide. Hardness is shown by Littman (1997) by reducing the EXP-complete game G4 to PolicyEx.


G1.4 Estimated Policy Evaluation


Large SSPs and MDPs

◮ Before: optimal policies and exact state-values for small SSPs and MDPs.
◮ Now: focus on large SSPs and MDPs.
◮ Further algorithms are not necessarily optimal (they may generate suboptimal policies).


Interleaved Planning & Execution

◮ The number of states of an executable policy is usually exponential in the number of state variables.
◮ For large SSPs and MDPs, an executable policy therefore cannot be provided explicitly.
◮ Solution: a (possibly approximate) compact representation of the executable policy is required to describe a solution ⇒ not part of this lecture.
◮ Alternative solution: interleave planning and execution.


Interleaved Planning & Execution for SSPs

Plan-execute-monitor cycle for an SSP T (a code sketch follows the list):

◮ plan an action a for the current state s
◮ execute a
◮ observe the new current state s′
◮ set s := s′
◮ repeat until s ∈ S⋆
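A generic sketch of this cycle (the planner and the environment are placeholders to be supplied by the caller; none of the names below are prescribed by the slides):

```python
def plan_execute_monitor(s0, plan, step, is_goal, max_steps=10_000):
    """Plan-execute-monitor cycle for an SSP.

    plan(s)    -- returns an action for state s (the per-state planner)
    step(s, a) -- executes a in the environment, returns (successor, incurred cost)
    is_goal(s) -- membership test for S*
    """
    s, total_cost = s0, 0.0
    for _ in range(max_steps):          # cap in case the policy is not proper
        if is_goal(s):
            return s, total_cost
        a = plan(s)                     # plan an action for the current state
        s, cost = step(s, a)            # execute it, observe the new state
        total_cost += cost
    raise RuntimeError("goal not reached; policy may not be proper")
```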


Interleaved Planning & Execution for MDPs

Plan-execute-monitor cycle for an MDP T:

◮ plan an action a for the current state s
◮ execute a
◮ observe the new current state s′
◮ set s := s′
◮ repeat until the discounted reward is sufficiently small


Interleaved Planning & Execution in Practice

◮ avoids the loss of precision that often comes with a compact description of an executable policy
◮ does not waste time planning for states that are never reached during execution
◮ poor decisions can be avoided by spending more time on planning before execution
◮ in SSPs, poor decisions can even mean that the computed policy is not proper and execution never reaches the goal
◮ in MDPs, it is not clear when the discounted reward is sufficiently small


Estimated Policy Evaluation

◮ The quality of a policy π is described by the state-value of the initial state, Vπ(s0).
◮ In small SSPs or MDPs, the quality of a given policy π can be computed (via LP or backward induction) or approximated arbitrarily closely (via iterative policy evaluation).
◮ This is impossible if planning and execution are interleaved, as the policy is incomplete.
⇒ Estimate the quality of policy π by executing it n ∈ ℕ times.


Executing a Policy

Definition (Run in SSP)
Let T be an SSP and let π be a proper policy for T. A sequence of transitions

ρπ = s0 −[p1 : π(s0)]→ s1 −[p2 : π(s1)]→ · · · −[pn : π(sn−1)]→ sn

is a run ρπ of π if si+1 ∼ si⟦π(si)⟧ and sn ∈ S⋆. The cost of run ρπ is cost(ρπ) = ∑_{i=0}^{n−1} cost(π(si)).

A run in an SSP can easily be generated by executing π from s0 until a state s ∈ S⋆ is encountered.
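Generating such a run is a short simulation loop. A sketch, assuming an outcomes(s, a) function that returns the ⟨p, s′⟩ pairs of s⟦a⟧ as in the earlier sketches:

```python
import random

def sample_run(s0, policy, outcomes, cost, is_goal):
    """Generate a run of a proper policy: execute pi from s0 until a goal state."""
    s, run_cost = s0, 0.0
    while not is_goal(s):
        a = policy(s)
        probs, succs = zip(*[(p, t) for p, t in outcomes(s, a)])
        s = random.choices(succs, weights=probs)[0]   # s_{i+1} ~ s[[pi(s)]]
        run_cost += cost(a)
    return run_cost
```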


Executing a Policy

Definition (Run in MDP)
Let T be an MDP and let π be a policy for T. A sequence of transitions

ρπ = s0 −[p1 : π(s0)]→ s1 −[p2 : π(s1)]→ · · · −[pn : π(sn−1)]→ sn

is a run ρπ of π if si+1 ∼ si⟦π(si)⟧. The reward of run ρπ is reward(ρπ) = ∑_{i=0}^{n−1} γ^i · reward(si, π(si)).

To generate a run, a termination criterion (e.g., based on the change of the accumulated reward) must be specified.


Estimated Policy Evaluation

Definition (Estimated Policy Evaluation)
Let T be an SSP, let π be a policy for T and let ρ^1_π, . . . , ρ^n_π be a sequence of runs of π. The estimated quality of π via estimated policy evaluation is

Ṽπ := (1/n) · ∑_{i=1}^{n} cost(ρ^i_π).
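Given a run generator such as sample_run above, the estimate is just the sample mean of the run costs:

```python
from statistics import mean

def estimated_policy_evaluation(n, sample_run_cost):
    """V~_pi: the average cost of n independent runs of pi."""
    return mean(sample_run_cost() for _ in range(n))

# e.g. estimated_policy_evaluation(1000, lambda: sample_run(s0, pi, outcomes, cost, is_goal))
```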


Convergence of Estimated Policy Evaluation in SSPs

Theorem
Let T be an SSP, let π be a policy for T and let ρ^1_π, . . . , ρ^n_π be a sequence of runs of π. Then Ṽπ → Vπ(s0) for n → ∞.

Proof.
Holds due to the strong law of large numbers.

⇒ Ṽπ is a good approximation of Vπ(s0) if n is sufficiently large.
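A toy simulation illustrates the convergence (purely illustrative, not from the slides): runs that cost 1 with probability 0.3 and 4 otherwise have expected cost 3.1, and the estimate approaches this value as n grows:

```python
import random

random.seed(0)

def toy_run_cost():
    """A toy run: cost 1 with probability 0.3, cost 4 with probability 0.7."""
    return 1.0 if random.random() < 0.3 else 4.0

true_value = 0.3 * 1.0 + 0.7 * 4.0    # 3.1
for n in (10, 100, 10_000):
    estimate = sum(toy_run_cost() for _ in range(n)) / n
    print(n, round(estimate, 3))       # the estimate approaches 3.1 as n grows
```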


G1.5 Summary


Summary

◮ MDP and SSP planning tasks represent MDPs and SSPs compactly.
◮ Policy existence in SSPs is EXP-complete.
◮ Interleaving planning and execution avoids the representation issues of a (typically exponentially sized) policy.
◮ The quality of such an incomplete policy can be estimated by executing it a fixed number of times.
◮ In SSPs, estimated policy evaluation converges to the true quality of the policy.
