

  1. Solving stochastic dynamic programming models without transition matrices
Paul L. Fackler
Department of Agricultural & Applied Economics and Department of Applied Ecology
North Carolina State University
Computational Sustainability Seminar, Nov. 3, 2017

  2. Outline
- Brief review of dynamic programming: curses of dimensionality, index vectors, DP algorithms
- Expected Value (EV) functions
- Staged models
- Models with deterministic post-action states
- Factored models & conditional independence
- Evaluation of EV functions
- Results for two spatial models: dynamic reserve site selection; control of an invasive species on a spatial network
- Models with transition functions and random noise
- Wrap-up

  3. Dynamic Programming Problems
Given state values $S$, action values $A$, a reward function $R(S,A)$, a state transition probability matrix $P(S^+|S,A)$ and a discount factor $\delta$, solve
$$V(S) = \max_{A(\cdot)} \sum_{t=0}^{\infty} \delta^t E_t\left[R(S_t, A(S_t))\right]$$
Equivalently, solve Bellman's equation:
$$V(S) = \max_{A(S)} R(S, A(S)) + \delta \sum_{S^+} P(S^+|S, A(S))\, V(S^+)$$
Find the strategy $A(S)$ that maximizes the current reward $R$ plus the discount factor $\delta$ times the expected future value $\sum_{S^+} P(S^+|S, A)\, V(S^+)$.
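As a concrete illustration, here is a minimal value-iteration sketch of Bellman's equation in NumPy. The sizes, rewards and transition probabilities are randomly generated and purely illustrative, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n_s, n_a = 4, 3                    # illustrative numbers of states and actions
n_x = n_s * n_a                    # state/action combinations
delta = 0.95                       # discount factor

R = rng.random(n_x)                # reward for each state/action pair
P = rng.random((n_s, n_x))
P /= P.sum(axis=0, keepdims=True)  # column-stochastic: P[s+, (s,a)]
Ix = np.tile(np.arange(n_s), n_a)  # state index of each state/action pair

V = np.zeros(n_s)
for _ in range(10_000):
    q = R + delta * (P.T @ V)      # value of every state/action pair
    Vnew = np.full(n_s, -np.inf)
    np.maximum.at(Vnew, Ix, q)     # maximize over the actions for each state
    if np.max(np.abs(Vnew - V)) < 1e-10:
        V = Vnew
        break
    V = Vnew
```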

  4. Curses of dimensionality
Problem size grows exponentially with increases in the number of variables. Powell discusses 3 curses:
- growth in the state space
- growth in the action space
- growth in the outcome space
In discrete models we denote the size of the state space by $n_s$ and the size of the state/action space by $n_x$. The state transition probability matrix is $n_s \times n_x$.
The focus here is on problems for which vectors of size $n_x$ can be stored and manipulated but matrices of size $n_s \times n_x$ are problematic, i.e., on moderately sized problems. Having techniques to solve moderately sized problems exactly gives insight into the quality of the heuristic or approximate methods that must be used for large problems.

  5. Index Vectors
Index vectors are vectors composed of positive integers, used for extraction, expansion and shuffling. Let
A = [1 0; 1 1; 2 0; 2 1; 3 0; 3 1]
B = [1 0 0; 1 0 1; 1 1 0; 1 1 1; 2 0 0; 2 0 1; 2 1 0; 2 1 1; 3 0 0; 3 0 1; 3 1 0; 3 1 1]
$I = [5\ 6\ 7\ 8]$ extracts the rows of $B$ with the first column equal to 2: $B(I,1) = 2$
$I = [1\ 1\ 2\ 2\ 3\ 3\ 4\ 4\ 5\ 5\ 6\ 6]$ expands $A$ so that $A(I,:) = B(:,[1\ 2])$
$I = [1\ 2\ 1\ 2\ 3\ 4\ 3\ 4\ 5\ 6\ 5\ 6]$ expands $A$ so that $A(I,:) = B(:,[1\ 3])$
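A quick NumPy check of these three index vectors (variable names mirror the slide; note that NumPy indexing is 0-based, so every index below is one less than on the slide):

```python
import numpy as np

A = np.array([[1,0],[1,1],[2,0],[2,1],[3,0],[3,1]])
B = np.array([[a,s1,s2] for a in (1,2,3) for s1 in (0,1) for s2 in (0,1)])

# extraction: rows of B whose first column equals 2 (0-based rows 4..7)
I = np.array([4, 5, 6, 7])
assert (B[I, 0] == 2).all()

# expansion: repeat rows of A so that A[I,:] matches columns [1 2] of B
I = np.array([0,0,1,1,2,2,3,3,4,4,5,5])
assert (A[I, :] == B[:, [0, 1]]).all()

# expansion matching columns [1 3] of B
I = np.array([0,1,0,1,2,3,2,3,4,5,4,5])
assert (A[I, :] == B[:, [0, 2]]).all()
```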

  6. Dynamic Programming with Index Vectors
Consider a DP model with 2 binary state variables and 3 possible actions. $S$ lists all possible states and the matrix $X$ lists all possible state/action combinations:
S = [0 0; 0 1; 1 0; 1 1]
X = [1 0 0; 1 0 1; 1 1 0; 1 1 1; 2 0 0; 2 0 1; 2 1 0; 2 1 1; 3 0 0; 3 0 1; 3 1 0; 3 1 1]
Column 1 of $X$ is the action and columns 2 and 3 are the 2 states. The expansion index vector that gives the state in each row of $X$ is
$I_x = [1\ 2\ 3\ 4\ 1\ 2\ 3\ 4\ 1\ 2\ 3\ 4]$
This expands $S$ so that $S(I_x,:) = X(:,[2\ 3])$.
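A minimal sketch of how $S$, $X$ and $I_x$ could be built programmatically (the construction is mine; the slide only displays the arrays):

```python
import numpy as np
from itertools import product

# enumerate the states (all combinations of 2 binary variables) and actions
S = np.array(list(product([0, 1], repeat=2)))   # 4 x 2
actions = np.array([1, 2, 3])

# X stacks every (action, state) combination: 12 x 3
X = np.array([[a, *s] for a in actions for s in S])

# expansion index: which row of S appears in each row of X (0-based)
Ix = np.tile(np.arange(len(S)), len(actions))   # [0 1 2 3 0 1 2 3 0 1 2 3]
assert (S[Ix, :] == X[:, 1:]).all()
```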

  7. Strategies as Index Vectors
A strategy can be specified as an extraction index vector whose $j$th element is associated with state $j$:
$I_a = [1\ 6\ 7\ 12]$ yields
X(I_a,:) = [1 0 0; 2 0 1; 2 1 0; 3 1 1]
i.e., a strategy that associates action 1 with state 1, action 2 with states 2 and 3, and action 3 with state 4.
Strategy vectors select a single row of $X$ for each state, so $X(I_a, J_s) = S$, where $J_s$ indexes the columns of $X$ associated with the state variables.
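Continuing the sketch above, the strategy $I_a = [1\ 6\ 7\ 12]$ in 0-based form:

```python
# a strategy picks one row of X per state
Ia = np.array([0, 5, 6, 11])     # 0-based version of [1 6 7 12]
Js = np.array([1, 2])            # columns of X holding the state variables
assert (X[np.ix_(Ia, Js)] == S).all()
print(X[Ia, :])                  # the chosen action for each of the 4 states
```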

  8. Dynamic Programming Algorithms
DP models are typically solved with function iteration or policy iteration. Both use a maximization step that, for a given value function vector $V$, solves
$$\tilde{V}_j = \max_{k:\, I_x(k)=j} \left[R + \delta P^\top V\right]_k$$
with the associated strategy vector $I_a$:
$$I_a(j) = \operatorname*{argmax}_{k:\, I_x(k)=j} \left[R + \delta P^\top V\right]_k$$
This is followed by a value function update step.
Function iteration updates $V$ using: $V \leftarrow \tilde{V}$
Policy iteration updates $V$ by solving: $(I - \delta P[:, I_a]^\top)\, V = R[I_a]$
When the discount factor $\delta < 1$, the matrix $I - \delta P[:, I_a]^\top$ is row-wise diagonally dominant.
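A compact policy-iteration sketch using an explicit transition matrix, for contrast with the EV-function version on the next slide. It assumes the column-stochastic $n_s \times n_x$ matrix $P$ of slide 3; all names are illustrative:

```python
import numpy as np

def policy_iteration(R, P, Ix, n_s, delta, max_iters=100):
    """Policy iteration with an explicit n_s x n_x transition matrix P.

    R:  reward per state/action pair (length n_x)
    Ix: state index of each state/action pair (length n_x)
    """
    V, Ia = np.zeros(n_s), None
    for _ in range(max_iters):
        q = R + delta * (P.T @ V)        # value of each state/action pair
        # argmax of q within each group of pairs sharing the same state
        best = np.full(n_s, -np.inf)
        Ia_new = np.zeros(n_s, dtype=int)
        for k, j in enumerate(Ix):
            if q[k] > best[j]:
                best[j], Ia_new[j] = q[k], k
        if Ia is not None and np.array_equal(Ia_new, Ia):
            return V, Ia                 # policy stable: optimal
        Ia = Ia_new
        # value update: solve (I - delta * P[:, Ia].T) V = R[Ia]
        V = np.linalg.solve(np.eye(n_s) - delta * P[:, Ia].T, R[Ia])
    return V, Ia
```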

  9. Dynamic Programming with Expected Value (EV) functions
An EV function $v$ transforms the future state value vector into its expectation conditional on the current states and actions ($X$):
$$v(V^+) = E[V^+ | X]$$
An indexed evaluation transforms the future state value vector into its expectation conditional on the states and actions indexed by $I_a$:
$$v(V^+, I_a) = E[V^+ | X[I_a, :]]$$
The maximization step uses a full EV evaluation:
$$\max_{k:\, I_x(k)=j} R_k + \delta\, [v(V)]_k$$
Value function updates use an indexed evaluation.
Function iteration: $V \leftarrow R[I_a] + \delta\, v(V, I_a)$
Policy iteration (solve for $V$): $g(V) = V - \delta\, v(V, I_a) = R[I_a]$
Note that policy iteration with EV functions cannot be solved using direct methods (e.g., LU decomposition) but can be solved efficiently using iterative Krylov methods.
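A sketch of the Krylov-based value update, assuming the EV function is available only as a black-box routine ev(w, Ia) (a hypothetical name) that returns $E[w(S^+)\,|\,X[I_a,:]]$:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def policy_value(R, ev, Ia, delta, n_s):
    """Solve g(V) = V - delta * ev(V, Ia) = R[Ia] without forming a matrix.

    ev(w, Ia) must return E[w(S+) | X[Ia, :]] for any vector w of length n_s.
    """
    g = LinearOperator((n_s, n_s), matvec=lambda w: w - delta * ev(w, Ia))
    V, info = gmres(g, R[Ia])
    if info != 0:
        raise RuntimeError("GMRES failed to converge")
    return V
```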

  10. Advantages to using EV functions
The EV function $v$ can often be evaluated far faster, and with far less memory, than multiplying by the transition matrix $P$. There are at least 3 situations in which EV functions are advantageous:
- sparse staged transition matrices
- deterministic actions
- factored models with conditional independence
When the state transition occurs in 2 stages, the transition matrix can be written as $P = P_2 P_1$, where $P_1$ and $P_2$ are both sparse but their product is not.
A deterministic action transforms the current state into a post-decision state; the transition matrix can then be written as $P = \tilde{P}\tilde{A}$, where $\tilde{A}$ has a single 1 in each column.
In factored models individual state variables have their own transition matrices, each conditioned on a subset of the current states and actions.
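For the deterministic-action case, multiplying by $\tilde{A}^\top$ is a pure gather, so the EV evaluation needs no matrix product involving $\tilde{A}$ at all. A minimal sketch, assuming a vector g that holds the post-decision state index of each state/action pair (my notation, not the talk's):

```python
import numpy as np

def ev_deterministic(w, Ptilde, g):
    """E[w(S+) | X] when P = Ptilde @ Atilde and column k of Atilde is e_{g[k]}.

    Ptilde: n_s x n_p transition from post-decision states (column-stochastic)
    g:      length-n_x array, post-decision state of each state/action pair
    """
    u = Ptilde.T @ w   # expectation conditional on the post-decision state
    return u[g]        # multiplying by Atilde.T is just a gather
```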

  11. SPOMs with staged transitions
Stochastic Patch Occupancy Models (SPOMs): $N$ sites, with each site either empty or occupied (0/1).
The individual site transition matrices for each stage are triangular:
$$E_i = \begin{bmatrix} 1 & e_i \\ 0 & 1-e_i \end{bmatrix} \qquad C_i = \begin{bmatrix} 1-c_i & 0 \\ c_i & 1 \end{bmatrix}$$
There are $2^N$ possible state values; $P$ has $4^N$ elements and is dense.
If the transition is decomposed into extinction and colonization phases, $P = EC$ or $P = CE$, where $E$ and $C$ are sparse, each having $3^N$ non-zero elements.
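A sketch of the staged SPOM evaluation, building $E$ and $C$ as Kronecker products of the 2x2 site matrices and checking that the factored form matches the full form (actions are omitted for brevity, and the probabilities are randomly generated):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(1)
N = 10                              # number of sites
e = rng.uniform(0.1, 0.5, N)        # extinction probabilities (illustrative)
c = rng.uniform(0.1, 0.5, N)        # colonization probabilities

# build E and C as Kronecker products of the 2x2 site matrices
E = C = sp.identity(1, format="csr")
for i in range(N):
    Ei = sp.csr_matrix(np.array([[1.0, e[i]], [0.0, 1.0 - e[i]]]))
    Ci = sp.csr_matrix(np.array([[1.0 - c[i], 0.0], [c[i], 1.0]]))
    E = sp.kron(E, Ei, format="csr")
    C = sp.kron(C, Ci, format="csr")

V = rng.random(2 ** N)
ev_factored = E.T @ (C.T @ V)       # factored form: two sparse multiplies
P = C @ E                           # full matrix: dense once formed
ev_full = P.T @ V
assert np.allclose(ev_factored, ev_full)
```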

  12. Sparsity patterns for extinction and colonization transition matrices
[Figure: sparsity patterns of $E$ and $C$ for $N = 10$]

  13. Typical computational times for the SPOM model
Time required to do a basic matrix-vector and matrix-matrix multiply:

N             8       9      10      11      12      13      14
E'(C'v)   0.026   0.065   0.086   0.136   0.292   1.672   4.870
P'v       0.014   0.036   0.084   0.801   4.011  15.298  64.277
P = CE    0.008   0.008   0.046   0.154   0.724   3.499  19.332
density   0.100   0.075   0.056   0.042   0.032   0.024   0.018

Rows 1 & 2 display the time required for 1000 evaluations using the factored form $E^\top(C^\top v)$ and the full form $P^\top v$.
Row 3 shows the setup time required to form $P$.
Row 4 shows the fraction of non-zero elements in $E$ and $C$.
These results are even more dramatic if each site can be classified into more than 2 categories.
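Timings like these could be reproduced with the matrices built in the slide-11 sketch above; the absolute numbers will of course differ by machine and software version:

```python
import time

def time_it(f, reps=1000):
    t0 = time.perf_counter()
    for _ in range(reps):
        f()
    return time.perf_counter() - t0

# E, C, V, P and N come from the slide-11 sketch above
print("factored E'(C'v):", time_it(lambda: E.T @ (C.T @ V)))
print("full P'v:        ", time_it(lambda: P.T @ V))
print("density of E:    ", E.nnz / 4 ** N)   # (3/4)**N, approx. 0.056 for N=10
```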
