Planning and Optimization

F2. Bellman Equation & Linear Programming

Malte Helmert and Thomas Keller

Universität Basel

November 27, 2019


Content of this Course

Planning
  Classical: Foundations, Logic, Heuristics, Constraints
  Probabilistic: Explicit MDPs, Factored MDPs


Content of this Course: Explicit MDPs

Explicit MDPs: Foundations, Linear Programming, Policy Iteration, Value Iteration


Introduction


Quality of Solutions

Solution in classical planning: plan
Optimality criterion of a solution in classical planning: minimize plan cost
Solution in probabilistic planning: policy
What is the optimality criterion of a solution in probabilistic planning?


Example: Swiss Lotto

Example (Swiss Lotto)
What is the expected payoff of placing one bet in Swiss Lotto for a cost of CHF 2.50, with (simplified) payouts and probabilities?
  CHF 30.000.000 with prob. 1/31474716 (6 + 1)
  CHF 1.000.000 with prob. 1/5245786 (6)
  CHF 5.000 with prob. 1/850668 (5)
  CHF 50 with prob. 1/111930 (4)
  CHF 10 with prob. 1/11480 (3)
Solution: 30000000/31474716 + 1000000/5245786 + 5000/850668 + 50/111930 + 10/11480 − 2.5 ≈ −1.35.


Expected Values under Uncertainty

Definition (Expected Value of a Random Variable)
Let X be a random variable with a finite number of outcomes d_1, . . . , d_n ∈ R, and let d_i happen with probability p_i ∈ [0, 1] (for i = 1, . . . , n) s.t. ∑_{i=1}^n p_i = 1.
The expected value of X is E[X] = ∑_{i=1}^n (p_i · d_i).
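
To make the definition concrete, here is a minimal Python sketch (the function name expected_value and the use of fractions.Fraction are illustrative choices, not from the slides) that reproduces the Swiss Lotto computation:

```python
from fractions import Fraction

def expected_value(outcomes):
    """E[X] = sum of p_i * d_i over (probability, value) pairs."""
    return sum(p * d for p, d in outcomes)

# Swiss Lotto payouts from the example; with the remaining
# probability mass the payout is 0 and contributes nothing.
lotto = [
    (Fraction(1, 31474716), 30_000_000),
    (Fraction(1, 5245786), 1_000_000),
    (Fraction(1, 850668), 5_000),
    (Fraction(1, 111930), 50),
    (Fraction(1, 11480), 10),
]
print(float(expected_value(lotto)) - 2.5)  # ≈ -1.35
```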


Bellman Equation


Value Functions for MDPs

Definition (Value Functions for MDPs)
Let T = ⟨S, L, R, T, s0, γ⟩ be an MDP and π be an executable policy for T. The state-value Vπ(s) of s under π is defined as

  Vπ(s) := Qπ(s, π(s)),

where the action-value Qπ(s, ℓ) of s and ℓ under π is defined as

  Qπ(s, ℓ) := R(s, ℓ) + γ · ∑_{s′∈succ(s,ℓ)} T(s, ℓ, s′) · Vπ(s′).

The state-value Vπ(s) describes the expected reward of applying π in MDP T, starting from s.
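
For a fixed policy, this definition can be evaluated iteratively. A minimal sketch, assuming a dictionary/function-based encoding of S, R, T and π (the names and the tolerance eps are our own, not from the slides):

```python
def policy_evaluation(S, R, T, gamma, pi, eps=1e-10):
    """Approximate V_pi by repeatedly applying V(s) = Q(s, pi(s)).

    S: list of states; R(s, l): reward; pi(s): the policy's label for s;
    T(s, l, s2): transition probability (0 for non-successors).
    """
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            l = pi(s)
            q = R(s, l) + gamma * sum(T(s, l, s2) * V[s2] for s2 in S)
            delta = max(delta, abs(q - V[s]))
            V[s] = q
        if delta < eps:
            return V
```

For γ < 1 the update is a contraction, so the iteration converges.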

Bellman Equation in MDPs

Definition (Bellman Equation in MDPs)
Let T = ⟨S, L, R, T, s0, γ⟩ be an MDP. The Bellman equation for a state s of T is the set of equations that describes V⋆(s), where

  V⋆(s) := max_{ℓ∈L(s)} Q⋆(s, ℓ)
  Q⋆(s, ℓ) := R(s, ℓ) + γ · ∑_{s′∈succ(s,ℓ)} T(s, ℓ, s′) · V⋆(s′).

The solution V⋆(s) of the Bellman equation describes the maximal expected reward that can be achieved from state s in MDP T.
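
The Bellman equation can be solved numerically by using it as an update rule; this is value iteration, covered later in this course. A sketch under the same assumed encoding as above (L(s) gives the labels applicable in s):

```python
def value_iteration(S, L, R, T, gamma, eps=1e-10):
    """Approximate V* by iterating V(s) <- max over l of Q(s, l)."""
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            q = max(R(s, l) + gamma * sum(T(s, l, s2) * V[s2] for s2 in S)
                    for l in L(s))
            delta = max(delta, abs(q - V[s]))
            V[s] = q
        if delta < eps:
            return V
```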


Optimal Policy in MDPs

What is the policy that achieves the maximal expected reward?

Definition (Optimal Policy in MDPs)
Let T = ⟨S, L, R, T, s0, γ⟩ be an MDP. A policy π is an optimal policy if π(s) ∈ arg max_{ℓ∈L(s)} Q⋆(s, ℓ) for all s ∈ S and the expected reward of π in T is V⋆(s0). W.l.o.g., we assume the optimal policy is unique and written as π⋆.
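
Given (an approximation of) V⋆, such a policy can be extracted greedily; a sketch with the same assumed encoding as in the sketches above:

```python
def greedy_policy(S, L, R, T, gamma, V):
    """Map each state to a label that maximizes Q*(s, l) under V."""
    return {s: max(L(s), key=lambda l: R(s, l) +
                   gamma * sum(T(s, l, s2) * V[s2] for s2 in S))
            for s in S}
```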


Value Functions for SSPs

Definition (Value Functions for SSPs)
Let T = ⟨S, L, c, T, s0, S⋆⟩ be an SSP and π be an executable policy for T. The state-value Vπ(s) of s under π is defined as

  Vπ(s) := 0 if s ∈ S⋆, and
  Vπ(s) := Qπ(s, π(s)) otherwise,

where the action-value Qπ(s, ℓ) of s and ℓ under π is defined as

  Qπ(s, ℓ) := c(ℓ) + ∑_{s′∈succ(s,ℓ)} T(s, ℓ, s′) · Vπ(s′).

The state-value Vπ(s) describes the expected cost of applying π in SSP T, starting from s.
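
For a fixed proper policy (properness is defined below) these equations are linear, so Vπ can be computed exactly by solving a linear system over the non-goal states. A sketch using numpy; the encoding of states, costs and transitions is our own assumption:

```python
import numpy as np

def ssp_policy_value(S, goals, c, T, pi):
    """Solve V(s) = c(pi(s)) + sum_s' T(s, pi(s), s') * V(s'),
    with V(s) = 0 for goal states. Assumes pi is proper, so the
    system (I - T_pi) V = c_pi has a unique solution."""
    nongoal = [s for s in S if s not in goals]
    idx = {s: i for i, s in enumerate(nongoal)}
    A = np.eye(len(nongoal))
    b = np.zeros(len(nongoal))
    for s in nongoal:
        l = pi(s)
        b[idx[s]] = c(l)
        for s2 in nongoal:              # goal successors contribute 0
            A[idx[s], idx[s2]] -= T(s, l, s2)
    x = np.linalg.solve(A, b)
    return {**{s: x[idx[s]] for s in nongoal}, **{g: 0.0 for g in goals}}
```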

Bellman Equation in SSPs

Definition (Bellman Equation in SSPs)
Let T = ⟨S, L, c, T, s0, S⋆⟩ be an SSP. The Bellman equation for a state s of T is the set of equations that describes V⋆(s), where

  V⋆(s) := min_{ℓ∈L(s)} Q⋆(s, ℓ)
  Q⋆(s, ℓ) := c(ℓ) + ∑_{s′∈succ(s,ℓ)} T(s, ℓ, s′) · V⋆(s′).

The solution V⋆(s) of the Bellman equation describes the minimal expected cost that can be achieved from state s in SSP T.


Optimal Policy in SSPs

What is the policy that achieves the minimal expected cost?

Definition (Optimal Policy in SSPs)
Let T = ⟨S, L, c, T, s0, S⋆⟩ be an SSP. A policy π is an optimal policy if π(s) ∈ arg min_{ℓ∈L(s)} Q⋆(s, ℓ) for all s ∈ S and the expected cost of π in T is V⋆(s0). W.l.o.g., we assume the optimal policy is unique and written as π⋆.


Proper SSP Policy

Definition (Proper SSP Policy)
Let T = ⟨S, L, c, T, s0, S⋆⟩ be an SSP and π be an executable policy for T. π is proper if it reaches a goal state from each reachable state with probability 1, i.e. if

  ∑_{s −(p1:ℓ1)→ s′, . . . , s′′ −(pn:ℓn)→ s⋆} ∏_{i=1}^n pi = 1

for all states s ∈ Sπ(s0) (the states reachable from s0 under π), where the sum ranges over all paths from s to a goal state s⋆ that are compatible with π.
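
On a finite state space, this path-based condition is equivalent to a simple graph property: from every state reachable under π, some goal state must be reachable in π's transition graph. That makes properness easy to check; a sketch, with the encoding assumed as before:

```python
def is_proper(S, goals, T, pi, s0):
    """Check that from every state reachable from s0 under pi,
    a goal state remains reachable in pi's transition graph."""
    succ = {s: [s2 for s2 in S if T(s, pi(s), s2) > 0]
            for s in S if s not in goals}
    reachable, stack = {s0}, [s0]       # forward reachability from s0
    while stack:
        for s2 in succ.get(stack.pop(), []):
            if s2 not in reachable:
                reachable.add(s2)
                stack.append(s2)
    covered = set(goals)                # backward reachability from goals
    changed = True
    while changed:
        changed = False
        for s in reachable - covered:
            if any(s2 in covered for s2 in succ.get(s, [])):
                covered.add(s)
                changed = True
    return reachable <= covered
```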


Linear Programming


Content of this Course: Explicit MDPs

Explicit MDPs: Foundations, Linear Programming, Policy Iteration, Value Iteration


Linear Programming for SSPs

The Bellman equation gives a set of equations that describes the expected cost for each state: there are |S| variables and |S| equations (assuming Q⋆ is replaced in V⋆ with the corresponding equation). If we solve these equations, we have solved the SSP.
Problem: how can we deal with the minimization?
⇒ We have solved the "same" problem before with the help of an LP solver.


Reminder: LP for Shortest Path in State Space

Variables
  Non-negative variable Distance_s for each state s
Objective
  Maximize Distance_s0
Subject to
  Distance_s⋆ = 0 for all goal states s⋆
  Distance_s ≤ Distance_s′ + c(ℓ) for all transitions s −ℓ→ s′
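
As a sketch of how this LP looks in code, using scipy.optimize.linprog on a small invented graph (linprog minimizes, so the objective is negated; the goal equality is expressed via variable bounds):

```python
import numpy as np
from scipy.optimize import linprog

# Invented 4-state graph: (s, c(l), s'), start 0, goal 3.
transitions = [(0, 1.0, 1), (0, 5.0, 2), (1, 1.0, 2), (1, 4.0, 3), (2, 1.0, 3)]
n, s0, goals = 4, 0, {3}

obj = np.zeros(n)
obj[s0] = -1.0                         # maximize Distance_s0
A_ub, b_ub = [], []
for s, cost, s2 in transitions:        # Distance_s - Distance_s' <= c(l)
    row = np.zeros(n)
    row[s] += 1.0
    row[s2] -= 1.0
    A_ub.append(row)
    b_ub.append(cost)
bounds = [(0, 0) if s in goals else (0, None) for s in range(n)]
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x[s0])                       # 3.0, via 0 -> 1 -> 2 -> 3
```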


LP for Expected Cost in SSP

Variables
  Non-negative variable ExpCost_s for each state s
Objective
  Maximize ExpCost_s0
Subject to
  ExpCost_s⋆ = 0 for all goal states s⋆
  ExpCost_s ≤ (∑_{s′∈S} T(s, ℓ, s′) · ExpCost_s′) + c(ℓ) for all s ∈ S and ℓ ∈ L(s)
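
The same construction in code, again with scipy.optimize.linprog and an invented two-state SSP in which a cheap but unreliable action competes with a safe one:

```python
import numpy as np
from scipy.optimize import linprog

# Invented SSP: state 0 is the start, state 1 the goal.
# "safe":  cost 2.0, reaches the goal with probability 1.
# "risky": cost 0.8, reaches the goal with prob. 0.5, else stays in 0.
n, s0, goals = 2, 0, {1}
actions = {0: [(2.0, {1: 1.0}), (0.8, {0: 0.5, 1: 0.5})]}

obj = np.zeros(n)
obj[s0] = -1.0                        # maximize ExpCost_s0
A_ub, b_ub = [], []
for s, acts in actions.items():
    for cost, dist in acts:           # ExpCost_s - sum T(...) ExpCost_s' <= c(l)
        row = np.zeros(n)
        row[s] += 1.0
        for s2, p in dist.items():
            row[s2] -= p
        A_ub.append(row)
        b_ub.append(cost)
bounds = [(0, 0) if s in goals else (0, None) for s in range(n)]
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x[s0])                      # 1.6: "risky" is optimal (0.8 + 0.5 * 1.6)
```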


LP for Expected Reward in MDP

Variables
  Non-negative variable ExpReward_s for each state s
Objective
  Minimize ExpReward_s0
Subject to
  ExpReward_s ≥ (γ · ∑_{s′∈S} T(s, ℓ, s′) · ExpReward_s′) + R(s, ℓ) for all s ∈ S and ℓ ∈ L(s)


Complexity of Probabilistic Planning

An optimal solution for MDPs or SSPs can be computed with an LP solver: this requires |S| variables and |S| · |L| constraints.
We know that LPs can be solved in polynomial time.
⇒ Solving MDPs or SSPs is a polynomial-time problem.
How does this relate to the complexity result for classical planning?
Solving MDPs or SSPs is polynomial in |S| · |L|, i.e. in the size of the explicit state space, which is in general exponential in the size of a compact (factored) task description, so this does not contradict the hardness of classical planning.


Summary


Summary

The state-value of a policy describes the expected reward (cost) of following that policy.
The related Bellman equation describes the optimal state-values.
The solution of the Bellman equation gives an optimal policy.
Linear programming can be used to solve MDPs and SSPs in time polynomial in the size of the state space and the number of actions.