Planning and Optimization G1. Factored MDPs Malte Helmert and - PowerPoint PPT Presentation

Planning and Optimization G1. Factored MDPs Malte Helmert and Thomas Keller Universit¨ at Basel December 4, 2019

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Content of this Course Foundations Logic Classical Heuristics Constraints Planning Explicit MDPs Probabilistic Factored MDPs

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Content of this Course: Factored MDPs Foundations Heuristic Factored MDPs Search Monte-Carlo Methods

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Factored MDPs

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Factored MDPs We would like to specify MDPs and SSPs with large state spaces. In classical planning, we introduced planning tasks to represent large transition systems compactly. represent aspects of the world in terms of state variables states are a valuation of state variables n state variables induce 2 n states � exponentially more compact than “explicit” representation

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Finite-Domain State Variables Definition (Finite-Domain State Variable) A finite-domain state variable is a symbol v with an associated domain dom( v ), which is a finite non-empty set of values. Let V be a finite set of finite-domain state variables. A state s over V is an assignment s : V → � v ∈ V dom( v ) such that s ( v ) ∈ dom( v ) for all v ∈ V . A formula over V is a propositional logic formula whose atomic propositions are of the form v = d where v ∈ V and d ∈ dom( v ). For simplicity, we only consider finite-domain state variables here.

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Syntax of Operators Definition (SSP and MDP Operators) An SSP operator o over state variables V is an MDP operator with three properties: a precondition pre ( o ), a logical formula over V an effect eff ( o ) over V , defined on the following slides a cost cost ( o ) ∈ R + 0 An MDP operator o over state variables V is an object with three properties: a precondition pre ( o ), a logical formula over V an effect eff ( o ) over V , defined on the following slides a reward reward ( o ) over V , defined on the following slides Whenever we just say operator (without SSP or MDP), both kinds of operators are allowed.

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Syntax of Effects Definition (Effect) Effects over state variables V are inductively defined as follows: If v ∈ V is a finite-domain state variable and d ∈ dom( v ), then v := d is an effect (atomic effect). If e 1 , . . . , e n are effects, then ( e 1 ∧ · · · ∧ e n ) is an effect (conjunctive effect). The special case with n = 0 is the empty effect ⊤ . If e 1 , . . . , e n are effects and p 1 , . . . , p n ∈ [0 , 1] such that � n i =1 p i = 1, then ( p 1 : e 1 | . . . | p n : e n ) is an effect (probabilistic effect). Note: To simplify definitions, conditional effects are omitted.

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Effects: Intuition Intuition for effects: Atomic effects can be understood as assignments that update the value of a state variable. A conjunctive effect e = ( e 1 ∧ · · · ∧ e n ) means that all subeffects e 1 , . . . , e n take place simultaneously. A probabilistic effect e = ( p 1 : e 1 | . . . | p n : e n ) means that exactly one subeffect e i ∈ { e 1 , . . . , e n } takes place with probability p i .

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Semantics of Effects Definition The effect set [ e ] of an effect e is a set of pairs � p , w � , where p is a probability 0 < p ≤ 1 and w is a partial assignment. The effect set [ e ] is the set obtained recursively as [ v := d ] = {� 1 . 0 , { v �→ d }�} , � � [ e ∧ e ′ ] = {� p · p ′ , w ∪ w ′ �} , � p , w �∈ [ e ] � p ′ , w ′ �∈ [ e ′ ] n � [ p 1 : e 1 | . . . | p n : e n ] = {� p i · p , w � | � p , w � ∈ [ e i ] } . i =1 where � is like � but merges � p , w ′ � and � p ′ , w ′ � to � p + p ′ , w ′ � .

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Semantics of Operators Definition (Applicable, Outcomes) Let V be a set of finite-domain state variables. Let s be a state over V , and let o be an operator over V . Operator o is applicable in s if s | = pre ( o ). The outcomes of applying an operator o in s , written s � o � , are � {� p , s ′ s � o � = w �} , � p , w �∈ [ eff ( o )] with s ′ w ( v ) = d if v = d ∈ w and s ′ w ( v ) = s ( v ) otherwise and � is like � but merges � p , s ′ � and � p ′ , s ′ � to � p + p ′ , s ′ � .

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Rewards Definition (Reward) A reward over state variables V is inductively defined as follows: c ∈ R is a reward If χ is a propositional formula over V , [ χ ] is a reward If r and r ′ are rewards, r + r ′ , r − r ′ , r · r ′ and r r ′ are rewards Applying an MDP operator o in s induces reward reward ( o )( s ), i.e., the value of the arithmetic function reward ( o ) where all occurrences of v ∈ V are replaced with s ( v ).

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Probabilistic Planning Tasks

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Probabilistic Planning Tasks Definition (SSP and MDP Planning Task) An SSP planning task is a 4-tuple Π = � V , I , O , γ � where V is a finite set of finite-domain state variables, I is a valuation over V called the initial state, O is a finite set of SSP operators over V , and γ is a formula over V called the goal. An MDP planning task is a 4-tuple Π = � V , I , O , d � where V is a finite set of finite-domain state variables, I is a valuation over V called the initial state, O is a finite set of MDP operators over V , and d ∈ (0 , 1) is the discount factor. A probabilistic planning task is an SSP or MDP planning task.

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Mapping SSP Planning Tasks to SSPs Definition (SSP Induced by an SSP Planning Task) The SSP planning task Π = � V , I , O , γ � induces the SSP T = � S , L , c , T , s 0 , S ⋆ � , where S is the set of all states over V , L is the set of operators O , c ( o ) = cost ( o ) for all o ∈ O , � if o applicable in s and � p , s ′ � ∈ s � o � p T ( s , o , s ′ ) = 0 otherwise s 0 = I , and S ⋆ = { s ∈ S | s | = γ } .

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Mapping MDP Planning Tasks to MDPs Definition (MDP Induced by an MDP Planning Task) The MDP planning task Π = � V , I , O , γ � induces the MDP T = � S , L , R , T , s 0 , γ � , where S is the set of all states over V , L is the set of operators O , R ( s , o ) = reward ( o )( s ) for all o ∈ O and s ∈ S , � if o applicable in s and � p , s ′ � ∈ s � o � p T ( s , o , s ′ ) = 0 otherwise s 0 = I , and γ = d .

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Complexity

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Complexity of Probabilistic Planning Definition (Policy Existence) Policy existence ( PolicyEx ) is the following decision problem: Given: SSP planning task Π Is there a proper policy for Π? Question:

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Membership in EXP Theorem PolicyEx ∈ EXP Proof. The number of states in an SSP planning task is exponential in the number of variables. The induced SSP can be solved in time polynomial in | S | · | L | via linear programming and hence in time exponential in the input size.

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary EXP-completeness of Probabilistic Planning Theorem PolicyEx is EXP -complete. Proof Sketch. Membership for PolicyEx : see previous slide. Hardness is shown by Littman (1997) by reducing the EXP-complete game G 4 to PolicyEx .

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Estimated Policy Evaluation

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Large SSPs and MDPs Before: optimal policies and exact state-values for small SSPs and MDPs. Now: focus on large SSPs and MDPs Further algorithms not necessarily optimal (may generate suboptimal policies)

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Interleaved Planning & Execution Number of states of executable policy usually exponential in number of state variables For large SSPs and MDPs, executable policy cannot be provided explicitly. Solution: (possibly approximate) compact representation of executable policy required to describe solution ⇒ not part of this lecture. Alternative solution: interleave planning and execution

Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Interleaved Planning & Execution for SSPs Plan-execute-monitor cycle for SSP T : plan action a for the current state s execute a observe new current state s ′ set s := s ′ repeat until s ∈ S ⋆

Planning and Optimization G1. Factored MDPs Malte Helmert and - PowerPoint PPT Presentation

Planning and Optimization G1. Factored MDPs Malte Helmert and Thomas Keller Universit at Basel December 4, 2019 Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Content of this Course Foundations Logic

15-780: Optimization J. Zico Kolter March 14-16, 2015 1 Outline Introduction to optimization

Convex Optimization 4. Convex Optimization Problems Prof. Ying Cui Department of Electrical

P2P Combinatorial Optimization Amir H. Payberah (amir@sics.se) P2P Combinatorial Optimization, 13

Classical Planning Systems ICS 271 Fall 2014 Outline: Planning Planning environments

Planning and Optimization December 4, 2019 G1. Factored MDPs G1.1 Factored MDPs Planning and

Planning and Optimization October 16, 2019 C2. Delete Relaxation: Properties of Relaxed

Planning 2.0 BLMs Final Planning Rule http://www.blm.gov/plan2 1 Planning 2.0 Outline

Classical Planning Systems Chapter 10 R&N ICS 271 Fall 2016 Outline: Planning Planning

Convex Optimization by Stephen Boyd, and Lieven Vandenberghe. Optimization for Machine Learning by

Evolutionary Algorithm 2. Swarm Intelligence and Ant Colony Optimization Ant Colony Optimization

CS675: Convex and Combinatorial Optimization Fall 2019 Convex Optimization Problems Instructor:

CS675: Convex and Combinatorial Optimization Spring 2018 Convex Optimization Problems

Five Steps to Optimization Five Steps to Optimization Beyond Best Practices Beyond Best

Introduction to Optimization Dr. Mihail October 23, 2018 (Dr. Mihail) Optimization October 23,

Optimization of HPSG Grammar Implementations in Trale Georgiana Dinu Optimization of HPSG

Search Engine Optimization What is Search Engine Optimization Search Engine Optimization is the

Successive Shortest Path (SSP) Algorithm with Multipliers Birgit Engels ZAIK University of

The Recycling Rate of Atmospheric Moisture Over the Past Two Decades (1988-2008) Liming Li,

IIT Bombay Course Code : EE 611 Department: Electrical Engineering Instructor Name: Jayanta

TLS/SSL TLS/SSL aims to provide a secure channel Username and Server Bob Client Alice password

The Advance Payment Model Application Process January 5, 2012 Agenda Todays Open Door Forum

allman-dkim-ssp-02 Jim Fenton <fenton@cisco.com> IETF 68 Prague March 21, 2007 1

Developing an International Sponsored Student Program (SSP) 2 0 15 N A F S A A n n u a l Co n

Knapsack problems in hyperbolic groups Andrey Nikolaev (Stevens Institute) GAGTA, May 2013

Planning and Optimization G1. Factored MDPs Malte Helmert and - PowerPoint PPT Presentation

Planning and Optimization G1. Factored MDPs Malte Helmert and Thomas Keller Universit at Basel December 4, 2019 Factored MDPs Planning Tasks Complexity Estimated Policy Evaluation Summary Content of this Course Foundations Logic

15-780: Optimization J. Zico Kolter March 14-16, 2015 1 Outline Introduction to optimization

Convex Optimization 4. Convex Optimization Problems Prof. Ying Cui Department of Electrical

P2P Combinatorial Optimization Amir H. Payberah (amir@sics.se) P2P Combinatorial Optimization, 13

Classical Planning Systems ICS 271 Fall 2014 Outline: Planning Planning environments

Planning and Optimization December 4, 2019 G1. Factored MDPs G1.1 Factored MDPs Planning and

Planning and Optimization October 16, 2019 C2. Delete Relaxation: Properties of Relaxed

Planning 2.0 BLMs Final Planning Rule http://www.blm.gov/plan2 1 Planning 2.0 Outline

Classical Planning Systems Chapter 10 R&amp;N ICS 271 Fall 2016 Outline: Planning Planning

Convex Optimization by Stephen Boyd, and Lieven Vandenberghe. Optimization for Machine Learning by

Evolutionary Algorithm 2. Swarm Intelligence and Ant Colony Optimization Ant Colony Optimization

CS675: Convex and Combinatorial Optimization Fall 2019 Convex Optimization Problems Instructor:

CS675: Convex and Combinatorial Optimization Spring 2018 Convex Optimization Problems

Five Steps to Optimization Five Steps to Optimization Beyond Best Practices Beyond Best

Introduction to Optimization Dr. Mihail October 23, 2018 (Dr. Mihail) Optimization October 23,

Optimization of HPSG Grammar Implementations in Trale Georgiana Dinu Optimization of HPSG

Search Engine Optimization What is Search Engine Optimization Search Engine Optimization is the

Successive Shortest Path (SSP) Algorithm with Multipliers Birgit Engels ZAIK University of

The Recycling Rate of Atmospheric Moisture Over the Past Two Decades (1988-2008) Liming Li,

IIT Bombay Course Code : EE 611 Department: Electrical Engineering Instructor Name: Jayanta

TLS/SSL TLS/SSL aims to provide a secure channel Username and Server Bob Client Alice password

The Advance Payment Model Application Process January 5, 2012 Agenda Todays Open Door Forum

allman-dkim-ssp-02 Jim Fenton &lt;fenton@cisco.com&gt; IETF 68 Prague March 21, 2007 1

Developing an International Sponsored Student Program (SSP) 2 0 15 N A F S A A n n u a l Co n

Knapsack problems in hyperbolic groups Andrey Nikolaev (Stevens Institute) GAGTA, May 2013

Classical Planning Systems Chapter 10 R&N ICS 271 Fall 2016 Outline: Planning Planning

allman-dkim-ssp-02 Jim Fenton <fenton@cisco.com> IETF 68 Prague March 21, 2007 1