

  1. Solving stochastic dynamic programming models without transition matrices
Paul L. Fackler
Department of Agricultural & Applied Economics and Department of Applied Ecology
North Carolina State University
Computational Sustainability Seminar, Nov. 3, 2017

  2. Outline
- Brief review of dynamic programming: curses of dimensionality, index vectors, DP algorithms
- Expected Value (EV) functions
- Staged models
- Models with deterministic post-action states
- Factored models & conditional independence
- Evaluation of EV functions
- Results for two spatial models: dynamic reserve site selection; control of an invasive species on a spatial network
- Models with transition functions and random noise
- Wrap-up

  3. Dynamic Programming Problems
Given state values $S$, action values $A$, a reward function $R(S,A)$, a state transition probability matrix $P(S^+|S,A)$ and a discount factor $\delta$, solve
$$V(S) = \max_{A(\cdot)} \sum_{t=0}^{\infty} \delta^t E_t\left[R(S_t, A(S_t))\right]$$
Equivalently, solve Bellman's equation:
$$V(S) = \max_{A(S)} R(S, A(S)) + \delta \sum_{S^+} P(S^+|S, A(S))\, V(S^+)$$
Find the strategy $A(S)$ that maximizes the current reward $R$ plus the discount factor $\delta$ times the expected future value $\sum_{S^+} P(S^+|S, A)\, V(S^+)$.
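As a concrete illustration, here is a minimal value-iteration sketch of Bellman's equation in NumPy. The sizes, rewards and transition probabilities are randomly generated and purely illustrative, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n_s, n_a = 4, 3                    # illustrative numbers of states and actions
n_x = n_s * n_a                    # state/action combinations
delta = 0.95                       # discount factor

R = rng.random(n_x)                # reward for each state/action pair
P = rng.random((n_s, n_x))
P /= P.sum(axis=0, keepdims=True)  # column-stochastic: P[s+, (s,a)]
Ix = np.tile(np.arange(n_s), n_a)  # state index of each state/action pair

V = np.zeros(n_s)
for _ in range(10_000):
    q = R + delta * (P.T @ V)      # value of every state/action pair
    Vnew = np.full(n_s, -np.inf)
    np.maximum.at(Vnew, Ix, q)     # maximize over the actions for each state
    if np.max(np.abs(Vnew - V)) < 1e-10:
        V = Vnew
        break
    V = Vnew
```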

  4. Curses of dimensionality
Problem size grows exponentially with increases in the number of variables. Powell discusses 3 curses:
- growth in the state space
- growth in the action space
- growth in the outcome space
In discrete models we denote the size of the state space by $n_s$ and the size of the state/action space by $n_x$. The state transition probability matrix is $n_s \times n_x$.
The focus here is on problems for which vectors of size $n_x$ can be stored and manipulated but matrices of size $n_s \times n_x$ are problematic, i.e., on moderately sized problems. Having techniques to solve moderately sized problems exactly gives insight into the quality of the heuristic or approximate methods that must be used for large problems.

  5. Index Vectors
Index vectors are vectors composed of positive integers, used for extraction, expansion and shuffling. Let
A = [1 0; 1 1; 2 0; 2 1; 3 0; 3 1]
B = [1 0 0; 1 0 1; 1 1 0; 1 1 1; 2 0 0; 2 0 1; 2 1 0; 2 1 1; 3 0 0; 3 0 1; 3 1 0; 3 1 1]
$I = [5\ 6\ 7\ 8]$ extracts the rows of $B$ with the first column equal to 2: $B(I,1) = 2$
$I = [1\ 1\ 2\ 2\ 3\ 3\ 4\ 4\ 5\ 5\ 6\ 6]$ expands $A$ so that $A(I,:) = B(:,[1\ 2])$
$I = [1\ 2\ 1\ 2\ 3\ 4\ 3\ 4\ 5\ 6\ 5\ 6]$ expands $A$ so that $A(I,:) = B(:,[1\ 3])$
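A quick NumPy check of these three index vectors (variable names mirror the slide; note that NumPy indexing is 0-based, so every index below is one less than on the slide):

```python
import numpy as np

A = np.array([[1,0],[1,1],[2,0],[2,1],[3,0],[3,1]])
B = np.array([[a,s1,s2] for a in (1,2,3) for s1 in (0,1) for s2 in (0,1)])

# extraction: rows of B whose first column equals 2 (0-based rows 4..7)
I = np.array([4, 5, 6, 7])
assert (B[I, 0] == 2).all()

# expansion: repeat rows of A so that A[I,:] matches columns [1 2] of B
I = np.array([0,0,1,1,2,2,3,3,4,4,5,5])
assert (A[I, :] == B[:, [0, 1]]).all()

# expansion matching columns [1 3] of B
I = np.array([0,1,0,1,2,3,2,3,4,5,4,5])
assert (A[I, :] == B[:, [0, 2]]).all()
```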

  6. Dynamic Programming with Index Vectors
Consider a DP model with 2 binary state variables and 3 possible actions. $S$ lists all possible states and the matrix $X$ lists all possible state/action combinations:
S = [0 0; 0 1; 1 0; 1 1]
X = [1 0 0; 1 0 1; 1 1 0; 1 1 1; 2 0 0; 2 0 1; 2 1 0; 2 1 1; 3 0 0; 3 0 1; 3 1 0; 3 1 1]
Column 1 of $X$ is the action and columns 2 and 3 are the 2 states. The expansion index vector that gives the state in each row of $X$ is
$I_x = [1\ 2\ 3\ 4\ 1\ 2\ 3\ 4\ 1\ 2\ 3\ 4]$
This expands $S$ so that $S(I_x,:) = X(:,[2\ 3])$.
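A minimal sketch of how $S$, $X$ and $I_x$ could be built programmatically (the construction is mine; the slide only displays the arrays):

```python
import numpy as np
from itertools import product

# enumerate the states (all combinations of 2 binary variables) and actions
S = np.array(list(product([0, 1], repeat=2)))   # 4 x 2
actions = np.array([1, 2, 3])

# X stacks every (action, state) combination: 12 x 3
X = np.array([[a, *s] for a in actions for s in S])

# expansion index: which row of S appears in each row of X (0-based)
Ix = np.tile(np.arange(len(S)), len(actions))   # [0 1 2 3 0 1 2 3 0 1 2 3]
assert (S[Ix, :] == X[:, 1:]).all()
```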

  7. Strategies as Index Vectors
A strategy can be specified as an extraction index vector whose $j$th element is associated with state $j$:
$I_a = [1\ 6\ 7\ 12]$ yields
X(I_a,:) = [1 0 0; 2 0 1; 2 1 0; 3 1 1]
i.e., a strategy that associates action 1 with state 1, action 2 with states 2 and 3, and action 3 with state 4.
Strategy vectors select a single row of $X$ for each state, so $X(I_a, J_s) = S$, where $J_s$ indexes the columns of $X$ associated with the state variables.
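Continuing the sketch above, the strategy $I_a = [1\ 6\ 7\ 12]$ in 0-based form:

```python
# a strategy picks one row of X per state
Ia = np.array([0, 5, 6, 11])     # 0-based version of [1 6 7 12]
Js = np.array([1, 2])            # columns of X holding the state variables
assert (X[np.ix_(Ia, Js)] == S).all()
print(X[Ia, :])                  # the chosen action for each of the 4 states
```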

  8. Dynamic Programming Algorithms
DP models are typically solved with function iteration or policy iteration. Both use a maximization step that, for a given value function vector $V$, solves
$$\tilde{V}_j = \max_{k:\, I_x(k)=j} \left[R + \delta P^\top V\right]_k$$
with the associated strategy vector $I_a$:
$$I_a(j) = \operatorname*{argmax}_{k:\, I_x(k)=j} \left[R + \delta P^\top V\right]_k$$
This is followed by a value function update step.
Function iteration updates $V$ using: $V \leftarrow \tilde{V}$
Policy iteration updates $V$ by solving: $(I - \delta P[:, I_a]^\top)\, V = R[I_a]$
When the discount factor $\delta < 1$, the matrix $I - \delta P[:, I_a]^\top$ is row-wise diagonally dominant.
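A compact policy-iteration sketch using an explicit transition matrix, for contrast with the EV-function version on the next slide. It assumes the column-stochastic $n_s \times n_x$ matrix $P$ of slide 3; all names are illustrative:

```python
import numpy as np

def policy_iteration(R, P, Ix, n_s, delta, max_iters=100):
    """Policy iteration with an explicit n_s x n_x transition matrix P.

    R:  reward per state/action pair (length n_x)
    Ix: state index of each state/action pair (length n_x)
    """
    V, Ia = np.zeros(n_s), None
    for _ in range(max_iters):
        q = R + delta * (P.T @ V)        # value of each state/action pair
        # argmax of q within each group of pairs sharing the same state
        best = np.full(n_s, -np.inf)
        Ia_new = np.zeros(n_s, dtype=int)
        for k, j in enumerate(Ix):
            if q[k] > best[j]:
                best[j], Ia_new[j] = q[k], k
        if Ia is not None and np.array_equal(Ia_new, Ia):
            return V, Ia                 # policy stable: optimal
        Ia = Ia_new
        # value update: solve (I - delta * P[:, Ia].T) V = R[Ia]
        V = np.linalg.solve(np.eye(n_s) - delta * P[:, Ia].T, R[Ia])
    return V, Ia
```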

  9. Dynamic Programming with Expected Value (EV) functions
An EV function $v$ transforms the future state value vector into its expectation conditional on the current states and actions ($X$):
$$v(V^+) = E[V^+ | X]$$
An indexed evaluation transforms the future state value vector into its expectation conditional on the states and actions indexed by $I_a$:
$$v(V^+, I_a) = E[V^+ | X[I_a, :]]$$
The maximization step uses a full EV evaluation:
$$\max_{k:\, I_x(k)=j} R_k + \delta\, [v(V)]_k$$
Value function updates use an indexed evaluation.
Function iteration: $V \leftarrow R[I_a] + \delta\, v(V, I_a)$
Policy iteration (solve for $V$): $g(V) = V - \delta\, v(V, I_a) = R[I_a]$
Note that policy iteration with EV functions cannot be solved using direct methods (e.g., LU decomposition) but can be solved efficiently using iterative Krylov methods.
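A sketch of the Krylov-based value update, assuming the EV function is available only as a black-box routine ev(w, Ia) (a hypothetical name) that returns $E[w(S^+)\,|\,X[I_a,:]]$:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def policy_value(R, ev, Ia, delta, n_s):
    """Solve g(V) = V - delta * ev(V, Ia) = R[Ia] without forming a matrix.

    ev(w, Ia) must return E[w(S+) | X[Ia, :]] for any vector w of length n_s.
    """
    g = LinearOperator((n_s, n_s), matvec=lambda w: w - delta * ev(w, Ia))
    V, info = gmres(g, R[Ia])
    if info != 0:
        raise RuntimeError("GMRES failed to converge")
    return V
```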

  10. Advantages to using EV functions
The EV function $v$ can often be evaluated far faster, and with far less memory, than multiplying by the transition matrix $P$. There are at least 3 situations in which EV functions are advantageous:
- sparse staged transition matrices
- deterministic actions
- factored models with conditional independence
When the state transition occurs in 2 stages, the transition matrix can be written as $P = P_2 P_1$, where $P_1$ and $P_2$ are both sparse but their product is not.
A deterministic action transforms the current state into a post-decision state; the transition matrix can then be written as $P = \tilde{P}\tilde{A}$, where $\tilde{A}$ has a single 1 in each column.
In factored models individual state variables have their own transition matrices, each conditioned on a subset of the current states and actions.
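For the deterministic-action case, multiplying by $\tilde{A}^\top$ is a pure gather, so the EV evaluation needs no matrix product involving $\tilde{A}$ at all. A minimal sketch, assuming a vector g that holds the post-decision state index of each state/action pair (my notation, not the talk's):

```python
import numpy as np

def ev_deterministic(w, Ptilde, g):
    """E[w(S+) | X] when P = Ptilde @ Atilde and column k of Atilde is e_{g[k]}.

    Ptilde: n_s x n_p transition from post-decision states (column-stochastic)
    g:      length-n_x array, post-decision state of each state/action pair
    """
    u = Ptilde.T @ w   # expectation conditional on the post-decision state
    return u[g]        # multiplying by Atilde.T is just a gather
```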

  11. SPOMs with staged transitions
Stochastic Patch Occupancy Models (SPOMs): $N$ sites, with each site either empty or occupied (0/1).
The individual site transition matrices for each stage are triangular:
$$E_i = \begin{bmatrix} 1 & e_i \\ 0 & 1-e_i \end{bmatrix} \qquad C_i = \begin{bmatrix} 1-c_i & 0 \\ c_i & 1 \end{bmatrix}$$
There are $2^N$ possible state values; $P$ has $4^N$ elements and is dense.
If the transition is decomposed into extinction and colonization phases, $P = EC$ or $P = CE$, where $E$ and $C$ are sparse, each having $3^N$ non-zero elements.
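A sketch of the staged SPOM evaluation, building $E$ and $C$ as Kronecker products of the 2x2 site matrices and checking that the factored form matches the full form (actions are omitted for brevity, and the probabilities are randomly generated):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(1)
N = 10                              # number of sites
e = rng.uniform(0.1, 0.5, N)        # extinction probabilities (illustrative)
c = rng.uniform(0.1, 0.5, N)        # colonization probabilities

# build E and C as Kronecker products of the 2x2 site matrices
E = C = sp.identity(1, format="csr")
for i in range(N):
    Ei = sp.csr_matrix(np.array([[1.0, e[i]], [0.0, 1.0 - e[i]]]))
    Ci = sp.csr_matrix(np.array([[1.0 - c[i], 0.0], [c[i], 1.0]]))
    E = sp.kron(E, Ei, format="csr")
    C = sp.kron(C, Ci, format="csr")

V = rng.random(2 ** N)
ev_factored = E.T @ (C.T @ V)       # factored form: two sparse multiplies
P = C @ E                           # full matrix: dense once formed
ev_full = P.T @ V
assert np.allclose(ev_factored, ev_full)
```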

  12. Sparsity patterns for extinction and colonization transition matrices
[Figure: sparsity patterns of $E$ and $C$ for $N = 10$]

  13. Typical computational times for the SPOM model
Time required to do a basic matrix-vector and matrix-matrix multiply:

N             8       9      10      11      12      13      14
E'(C'v)   0.026   0.065   0.086   0.136   0.292   1.672   4.870
P'v       0.014   0.036   0.084   0.801   4.011  15.298  64.277
P = CE    0.008   0.008   0.046   0.154   0.724   3.499  19.332
density   0.100   0.075   0.056   0.042   0.032   0.024   0.018

Rows 1 & 2 display the time required for 1000 evaluations using the factored form $E^\top(C^\top v)$ and the full form $P^\top v$.
Row 3 shows the setup time required to form $P$.
Row 4 shows the fraction of non-zero elements in $E$ and $C$.
These results are even more dramatic if each site can be classified into more than 2 categories.
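Timings like these could be reproduced with the matrices built in the slide-11 sketch above; the absolute numbers will of course differ by machine and software version:

```python
import time

def time_it(f, reps=1000):
    t0 = time.perf_counter()
    for _ in range(reps):
        f()
    return time.perf_counter() - t0

# E, C, V, P and N come from the slide-11 sketch above
print("factored E'(C'v):", time_it(lambda: E.T @ (C.T @ V)))
print("full P'v:        ", time_it(lambda: P.T @ V))
print("density of E:    ", E.nnz / 4 ** N)   # (3/4)**N, approx. 0.056 for N=10
```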
