Extending SDDP-style Algorithms for Multistage Stochastic Programming


SLIDE 1

Extending SDDP-style Algorithms for Multistage Stochastic Programming
Dave Morton, Industrial Engineering & Management Sciences, Northwestern University
Joint work with: Oscar Dowson, Daniel Duque, and Bernardo Pagnoncelli

SLIDE 2

Collaborators

SLIDE 3

Hydroelectric Power: Itaipu (14 GW)

SLIDE 4

Yuba, Bear and South Feather Hydrological Basin

SLIDE 5

SDDP: Stochastic Dual Dynamic Programming

SLIDE 6

SLP-T:

$$z^* = \min_{x_1 \ge 0} \; c_1 x_1 + \mathbb{E}_{\xi_2 \mid \xi_1} V_2(x_1, \xi_2) \quad \text{s.t. } A_1 x_1 = B_1 x_0 + b_1,$$

where for $t = 2, \ldots, T$,

$$V_t(x_{t-1}, \xi_t) = \min_{x_t \ge 0} \; c_t x_t + \mathbb{E}_{\xi_{t+1} \mid \xi_1, \ldots, \xi_t} V_{t+1}(x_t, \xi_{t+1}) \quad \text{s.t. } A_t x_t = B_t x_{t-1} + b_t,$$

and where $V_{T+1} \equiv 0$. Each $V_t(\cdot, \xi_t)$ is piecewise linear and convex.
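For instance (a worked special case added here, not a slide from the talk), with $T = 2$ the recursion collapses to the classical two-stage stochastic LP, since $V_3 \equiv 0$:

$$z^* = \min_{x_1 \ge 0} \left\{ c_1 x_1 + \mathbb{E}_{\xi_2 \mid \xi_1} \Big[ \min_{x_2 \ge 0} \{ c_2 x_2 : A_2 x_2 = B_2 x_1 + b_2 \} \Big] : A_1 x_1 = B_1 x_0 + b_1 \right\}.$$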

SLIDE 7

SLP-T Assumptions for SDDP

  • Relatively complete recourse, finite optimal solution
  • ξt = (At, Bt, bt, ct) is inter-stage independent
  • Or, (At, Bt, ct) is inter-stage independent and bt satisfies, e.g.:
    – bt = Ψ(bt−1) + εt with εt inter-stage independent; or
    – bt = Ψ(bt−1) · εt with εt inter-stage independent
  • Sample space: Ωt = Σ2 × Σ3 × · · · × Σt with |Σt| modest
  • T may be large
SLIDE 8

What Does “Solution” Mean?
A solution is a policy.

SLIDE 9

SDDP

[Figure: sampled paths through the scenario tree. (a) Forward pass. (b) Backward pass.]

SLIDE 10

SDDP Master Programs

$$\min_{x_t, \theta_t} \; c_t x_t + \theta_t$$
$$\text{s.t. } A_t x_t = B_t x_{t-1} + b_t$$
$$-G_t^k x_t + \theta_t \ge g_t^k, \quad k = 1, 2, \ldots, K$$
$$x_t \ge 0$$
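A minimal numerical sketch of this stage-t master LP (my own illustration, not from the talk; the helper solve_stage_master and its data layout are hypothetical), using scipy.optimize.linprog:

```python
# Stage-t SDDP master LP:  min c_t'x + theta
#   s.t.  A_t x = B_t x_{t-1} + b_t,  -G_k x + theta >= g_k,  x >= 0.
import numpy as np
from scipy.optimize import linprog

def solve_stage_master(c_t, A_t, B_t, b_t, x_prev, cuts, theta_lb=-1e6):
    n = len(c_t)
    obj = np.concatenate([c_t, [1.0]])                    # decision vector (x_t, theta_t)
    A_eq = np.hstack([A_t, np.zeros((A_t.shape[0], 1))])
    b_eq = B_t @ x_prev + b_t
    if cuts:                                              # cuts in <= form: G_k x - theta <= -g_k
        A_ub = np.array([np.concatenate([G, [-1.0]]) for G, _ in cuts])
        b_ub = np.array([-g for _, g in cuts])
    else:
        A_ub = b_ub = None                                # no cuts yet: theta rests on its lower bound
    bounds = [(0, None)] * n + [(theta_lb, None)]
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[n]                            # optimal x_t and theta_t
```

In a backward pass, the duals of the equality rows (res.eqlin.marginals with scipy's HiGHS-based methods) would supply the gradient data for a new cut at stage t − 1.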

SLIDE 11

Partially Observable Multistage Stochastic Programming

Or, an alternative to DRO when you don’t really know the distribution.
An apology: not talking about Wasserstein-based DRO for SLP-T via an SDDP algorithm (with Daniel Duque).

SLIDE 12

Policy Graphs (Dowson)

A policy graph for SLP-3 with inter-stage independence has nodes 1 → 2 → 3. It unfolds to a scenario tree with nodes 1; 2L, 2H; 3LH, 3LL, 3HL, 3HH.

SLIDE 13

Policy Graphs

[Figures: a Markov-switching model; random transitions.]

SLIDE 14

Inventory Example

[Figure: policy graph with root R and nodes DA, HA, DB, HB; ω ∈ {1, 2} at each node, D → H edges have probability 1, and H → D edges have probability ρ.]

Demand model A: P(ω = 1) = 0.2, P(ω = 2) = 0.8
Demand model B: P(ω = 1) = 0.8, P(ω = 2) = 0.2

$$D_i: \quad D_i(x) = \min_{u, x' \ge 0} \; u + \mathbb{E}_\omega[H_i(x', \omega)] \quad \text{s.t. } x' = x + u$$

$$H_i: \quad H_i(x, \omega) = \min_{u, x' \ge 0} \; 2u + x' + \rho D_i(x') \quad \text{s.t. } x' = x + u - \omega$$
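As a rough numerical check (my own sketch, not the talk's method; SDDP needs no state discretization), the $D_i$/$H_i$ recursion can be value-iterated on a crude inventory grid. The grid range, step size, and iteration count below are arbitrary choices:

```python
# Value iteration for the cyclic D_i -> H_i -> D_i recursion of the inventory example.
import numpy as np

rho, step = 0.9, 0.1
grid = np.arange(0.0, 6.0 + step, step)             # inventory levels x
U = np.arange(0.0, 4.0 + step, step)                # candidate order quantities u
idx = lambda v: np.clip(np.round(v / step).astype(int), 0, len(grid) - 1)

def value_iteration(p_omega, omegas=(1.0, 2.0), iters=300):
    D = np.zeros(len(grid))                          # D_i(x) on the grid
    H = np.zeros((len(grid), len(omegas)))           # H_i(x, omega)
    for _ in range(iters):
        # H_i(x, w) = min_{u >= 0} 2u + x' + rho * D_i(x'),  x' = x + u - w >= 0
        for wi, w in enumerate(omegas):
            xp = grid[:, None] + U[None, :] - w
            cost = np.where(xp >= 0, 2 * U[None, :] + xp + rho * D[idx(xp)], np.inf)
            H[:, wi] = cost.min(axis=1)
        # D_i(x) = min_{u >= 0} u + E_w[H_i(x + u, w)]
        EH = H @ np.asarray(p_omega)
        xp = grid[:, None] + U[None, :]
        D = (U[None, :] + EH[idx(xp)]).min(axis=1)   # states above the grid are clipped
    return D

D_A = value_iteration([0.2, 0.8])                    # demand model A
D_B = value_iteration([0.8, 0.2])                    # demand model B
```

Each full sweep contracts the error by ρ = 0.9, so a few hundred iterations give a plot-quality approximation.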

SLIDE 15

Policy Graphs

Each node i maps an incoming state x and a noise realization ω ∈ Ωi to a control u = πi(x, ω), incurs the one-step cost Ci(x, u, ω), and emits the outgoing state x′ = Ti(x, u, ω).

A policy graph (its ingredients are encoded in the sketch below):

  • G = (R, N, E, Φ)
  • ωj ∈ Ωj: node-wise independent noise
  • feasible controls: u ∈ Ui(x, ω)
  • transition function: x′ = Ti(x, u, ω)
  • one-step cost function: Ci(x, u, ω)
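One possible container for these node data (my own encoding; the talk defines only the mathematics, and the names here are hypothetical):

```python
# A policy-graph node: noise distribution, feasible controls, transition, cost,
# and outgoing transition probabilities.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Node:
    noise: List[Tuple[float, float]]        # Omega_i as (omega, probability) pairs
    controls: Callable[[float, float], Tuple[float, float]]  # bounds of U_i(x, omega)
    transition: Callable[[float, float, float], float]       # x' = T_i(x, u, omega)
    cost: Callable[[float, float, float], float]             # C_i(x, u, omega)
    children: Dict[str, float]              # {j: phi_ij}; may sum to < 1 (exit probability)

# Inventory example, node HA (demand model A; returns to DA with probability rho):
rho = 0.9
HA = Node(noise=[(1.0, 0.2), (2.0, 0.8)],
          controls=lambda x, w: (max(w - x, 0.0), float("inf")),  # keeps x' = x + u - w >= 0
          transition=lambda x, u, w: x + u - w,
          cost=lambda x, u, w: 2 * u + (x + u - w),
          children={"DA": rho})
```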
SLIDE 16

Policy Graphs

$$\min_\pi \; \mathbb{E}_{i \in R^+;\, \omega \in \Omega_i}\big[ V_i(x_R, \omega) \big] \tag{1}$$

where

$$V_i(x, \omega) = \min_{u, \bar{x}, x'} \; C_i(\bar{x}, u, \omega) + \mathbb{E}_{j \in i^+;\, \varphi \in \Omega_j}\big[ V_j(x', \varphi) \big] \quad \text{s.t. } \bar{x} = x, \; u \in U_i(\bar{x}, \omega), \; x' = T_i(\bar{x}, u, \omega) \tag{2}$$

Goal: Find $\pi_i(x, \omega)$ that solves (1) for each $i \in N$, $x$, and $\omega$.

(A1) N is finite
(A2) Ωi is finite, and ωi is node-wise independent ∀i ∈ N
(A3) Excluding the cost-to-go term, subproblem (2) is an LP
(A4) Subproblem (2) has a finite optimal solution
(A5) A leaf node is hit with probability 1 (or the graph G is acyclic)

SLIDE 17

Policy Graphs with Partial Observability

Extend the policy graph to G = (R, N, E, Φ, A), where A partitions N:

$$\bigcup_{A \in \mathcal{A}} A = N, \qquad A \cap A' = \emptyset \ \text{for} \ A \ne A'.$$

We know the current ambiguity set, A, but not which node within it.
Full observability: A = {{i} : i ∈ N}, i.e., |A| = 1 for every A ∈ A.
But we could have |A| = 2, where we know the stage but not the node.

SLIDE 18

Updates to the Belief State

[Figure: the inventory policy graph again: root R and nodes DA, HA, DB, HB.]

A = {A1, A2}, with A1 = {DA, DB} and A2 = {HA, HB}

By Bayes' rule,

$$P\{\text{Node} = k \mid \omega, A\} = \frac{\mathbb{1}_{k \in A} \cdot P\{\omega \mid \text{Node} = k\}\, P\{\text{Node} = k\}}{P\{\omega\}},$$

so the belief update is

$$b_k \leftarrow \frac{\mathbb{1}_{k \in A} \cdot P(\omega \in \Omega_k) \sum_{i \in N} b_i \phi_{ik}}{\sum_{i \in N} b_i \sum_{j \in A} \phi_{ij}\, P(\omega \in \Omega_j)},$$

or, in matrix form,

$$b \leftarrow B(b, \omega) = \frac{D_\omega A\, \Phi^\top b}{\sum_{i \in N} b_i \sum_{j \in A} \phi_{ij}\, P(\omega \in \Omega_j)}.$$
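A minimal numpy sketch of this update (the helper name and worked numbers are mine; note Φ's rows may sum to less than one because of the exit probability 1 − ρ):

```python
import numpy as np

def update_belief(b, Phi, like, in_A):
    """b: prior belief over nodes; Phi[i, j] = phi_ij; like[k] = P(omega | Node = k);
    in_A[k] = 1 if node k lies in the observed ambiguity set A."""
    prior = Phi.T @ b                  # predict step: push the belief through Phi
    post = in_A * like * prior         # numerator: 1_{k in A} * P(omega | k) * P(Node = k)
    return post / post.sum()           # normalize by P(omega)

# Inventory example, nodes ordered (DA, HA, DB, HB); observe omega = 1 in A2 = {HA, HB}
rho = 0.9
Phi = np.array([[0.0, 1.0, 0.0, 0.0],      # DA -> HA w.p. 1
                [rho, 0.0, 0.0, 0.0],      # HA -> DA w.p. rho (exits w.p. 1 - rho)
                [0.0, 0.0, 0.0, 1.0],      # DB -> HB w.p. 1
                [0.0, 0.0, rho, 0.0]])     # HB -> DB w.p. rho
b = np.array([0.5, 0.0, 0.5, 0.0])         # equal belief in the two demand models
print(update_belief(b, Phi, like=np.array([0.2, 0.2, 0.8, 0.8]),
                    in_A=np.array([0, 1, 0, 1])))    # -> [0.  0.2  0.  0.8]
```

Observing low demand (ω = 1) shifts the belief toward model B, under which ω = 1 has probability 0.8.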
SLIDE 19

Policy Graphs with Partial Observability

Each node now carries the belief state: given the incoming pair (x, b) and noise ω ∈ Ωi, update b ← B(b, ω), choose u = πi(x, ω, b), incur Ci(x, u, ω), and emit (x′, b) with x′ = Ti(x, u, ω).

  • All nodes in an ambiguity set have the same Ci, Ti, and Ui
  • Children i+, transition probabilities φij, and even Ωi may differ
SLIDE 20

Policy Graphs with Partial Observability

$$\min_\pi \; \mathbb{E}_{i \in R^+;\, \omega \in \Omega_i}\big[ V_i(x_R, B_i(b_R, \omega), \omega) \big] \tag{3}$$

where

$$V_i(x, b, \omega) = \min_{u, \bar{x}, x'} \; C_i(\bar{x}, u, \omega) + V(x', b) \quad \text{s.t. } \bar{x} = x, \; u \in U_i(\bar{x}, \omega), \; x' = T_i(\bar{x}, u, \omega)$$

and where

$$V(x', b) = \sum_{j \in N} b_j \sum_{k \in N} \phi_{jk} \sum_{\varphi \in \Omega_k} P(\varphi \in \Omega_k) \cdot V_k(x', B_k(b, \varphi), \varphi).$$

Goal: Find $\pi_A(x, b, \omega)$ that solves (3) for each $A \in \mathcal{A}$, $x$, $b$, and $\omega$.

SLIDE 21

Saddle Property of the Cost-to-go Function

$$V_i(x, b, \omega) = \min_{u, \bar{x}, x'} \; C_i(\bar{x}, u, \omega) + V(x', b) \quad \text{s.t. } \bar{x} = x, \; u \in U_i(\bar{x}, \omega), \; x' = T_i(\bar{x}, u, \omega)$$

where

$$V(x', b) = \sum_{j \in N} b_j \sum_{k \in N} \phi_{jk} \sum_{\varphi \in \Omega_k} P(\varphi \in \Omega_k) \cdot V_k(x', B_k(b, \varphi), \varphi).$$

Assume (A1)–(A5) with G acyclic.

Lemma 1. Fix i, b, ω. Then Vi(x, b, ω) is piecewise linear and convex in x.
Lemma 2. Fix x′. Then V(x′, b) is piecewise linear and concave in b.
Theorem 1. V(x′, b) is a piecewise linear saddle function: convex in x′ for fixed b and concave in b for fixed x′.

SLIDE 22

Linear Interpolation: Towards an SDDP Algorithm

[Figure: piecewise linear interpolation of V(b) over belief grid points b̄1 = 0 < b̄2 < b̄3 < b̄4 < b̄5 = 1.]

$$V(b) = \max_{\gamma \ge 0} \; \sum_{k=1}^K \gamma_k V(\bar{b}_k) \quad \text{s.t. } \sum_{k=1}^K \gamma_k = 1, \quad \sum_{k=1}^K \gamma_k \bar{b}_k = b$$
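The interpolation is itself a small LP. A sketch with scipy (hypothetical helper; the grid values below are made up):

```python
# Evaluate the interpolated V(b): maximize sum_k gamma_k V(b_bar_k) over convex
# weights gamma whose barycenter reproduces b.
import numpy as np
from scipy.optimize import linprog

def interp_value(b, b_bar, V_bar):
    K = len(b_bar)
    A_eq = np.vstack([np.ones(K), b_bar])          # sum gamma_k = 1;  sum gamma_k b_bar_k = b
    res = linprog(-np.asarray(V_bar),              # linprog minimizes, so negate
                  A_eq=A_eq, b_eq=np.array([1.0, b]), bounds=[(0, None)] * K)
    return -res.fun

b_bar = np.array([0.0, 0.25, 0.5, 0.75, 1.0])      # belief grid, b_bar_1 = 0, ..., b_bar_5 = 1
V_bar = np.array([3.0, 4.2, 4.8, 4.4, 3.1])        # illustrative concave values V(b_bar_k)
print(interp_value(0.6, b_bar, V_bar))             # 4.64, between V(0.5) and V(0.75)
```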

SLIDE 23

Saddle Function with Interpolated Cuts

[Figure: the saddle function V(x′, b), convex in x′ and concave in b, built from interpolated cuts.]

SLIDE 24

Computing Cuts for What?

$$V_i(x, b, \omega) = \min_{u, \bar{x}, x'} \; C_i(\bar{x}, u, \omega) + V_A(x', b) \quad \text{s.t. } \bar{x} = x, \; u \in U_i(\bar{x}, \omega), \; x' = T_i(\bar{x}, u, \omega)$$

where

$$V_A(x', b) = \sum_{j \in A} b_j \sum_{k \in j^+} \phi_{jk} \sum_{\varphi \in \Omega_k} P(\varphi \in \Omega_k) \cdot V_k(x', B_k(b, \varphi), \varphi).$$

SLIDE 25

SDDP Master Program

$$V_i^K(x, b, \omega) = \min_{u, \bar{x}, x', \theta} \max_{\gamma \ge 0} \; C_i(\bar{x}, u, \omega) + \sum_{k=1}^K \gamma_k \theta_k$$
$$\text{s.t. } \bar{x} = x \quad [\lambda]$$
$$u \in U_i(\bar{x}, \omega)$$
$$x' = T_i(\bar{x}, u, \omega)$$
$$\sum_{k=1}^K \gamma_k b_k = b \quad [\mu]$$
$$\sum_{k=1}^K \gamma_k = 1 \quad [\nu]$$
$$\theta_k \ge G_k x' + g_k, \quad k = 1, \ldots, K$$

SLIDE 26

SDDP Master Program

Dualizing the inner maximization over γ yields a single LP:

$$V_i^K(x, b, \omega) = \min_{u, \bar{x}, x', \nu, \mu} \; C_i(\bar{x}, u, \omega) + \mu^\top b + \nu$$
$$\text{s.t. } \bar{x} = x \quad [\lambda]$$
$$u \in U_i(\bar{x}, \omega)$$
$$x' = T_i(\bar{x}, u, \omega)$$
$$\mu^\top b_k + \nu \ge G_k x' + g_k, \quad k = 1, \ldots, K$$

Theorem 2. Assume (A1)–(A5) with G acyclic. Let the sample paths of the “obvious” SDDP algorithm be generated independently at each iteration. Then the algorithm converges to an optimal policy almost surely in a finite number of iterations.
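A sketch of this master LP for a single D-node of the inventory example, with scalar belief b = P(model A) (the helper and the cut data below are illustrative placeholders, not cuts computed by the algorithm):

```python
# Dualized saddle master:  min u + mu*b + nu
#   s.t.  x' = x + u,  u, x' >= 0,  mu*b_k + nu >= G_k x' + g_k for each cut.
# Decision vector: (u, x', mu, nu), with mu and nu free.
import numpy as np
from scipy.optimize import linprog

def solve_saddle_master(x_in, b, cuts):
    obj = np.array([1.0, 0.0, b, 1.0])               # u + mu*b + nu
    A_eq = np.array([[-1.0, 1.0, 0.0, 0.0]])         # x' - u = x_in
    A_ub = np.array([[0.0, G, -bk, -1.0] for bk, G, _ in cuts])
    b_ub = np.array([-g for _, _, g in cuts])        # G x' - mu b_k - nu <= -g
    bounds = [(0, None), (0, None), (None, None), (None, None)]
    return linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq,
                   b_eq=np.array([x_in]), bounds=bounds)

cuts = [(0.0, -1.0, 5.0),    # cut built at belief grid point b_k = 0.0
        (0.5, -0.8, 4.5),    # cut built at b_k = 0.5
        (1.0, -1.2, 5.5)]    # cut built at b_k = 1.0
res = solve_saddle_master(x_in=1.0, b=0.3, cuts=cuts)
print(res.x)                 # optimal (u, x', mu, nu)
```

By LP duality this agrees with the min-max form on the previous slide: μ and ν price the constraints that tie the interpolation weights γ to b.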

SLIDE 27

Inventory Example

[Figure: policy graph with root R and nodes DA, HA, DB, HB; ω ∈ {1, 2} at each node, D → H edges have probability 1, and H → D edges have probability ρ.]

Demand model A: P(ω = 1) = 0.2, P(ω = 2) = 0.8
Demand model B: P(ω = 1) = 0.8, P(ω = 2) = 0.2

$$D_i: \quad D_i(x) = \min_{u, x' \ge 0} \; u + \mathbb{E}_\omega[H_i(x', \omega)] \quad \text{s.t. } x' = x + u$$

$$H_i: \quad H_i(x, \omega) = \min_{u, x' \ge 0} \; 2u + x' + \rho D_i(x') \quad \text{s.t. } x' = x + u - \omega$$

SLIDE 28

Inventory Example: Train Four Policies

  1. Fully observable: distribution known upon departing R
  2. Partially observable: ambiguity partition {DA, DB}, {HA, HB}
  3. Risk-neutral average demand: demand equally likely to be 1 or 2
  4. DRO average demand: modified χ2 method with radius 0.25
SLIDE 29

Inventory Example: Train Four Policies

  • 2000 out-of-sample costs over 50 periods; quartiles; ρ = 0.9

[Figure: box plots of discounted simulated cost ($) and undiscounted simulated cost ($) for the four policies: fully observable, partially observable, risk-neutral average demand, and DRO average demand.]

SLIDE 30

Inventory Example: One Sample Path of the Partially Observable Policy

[Figure: three panels over 12 periods: (a) belief in model A, (b) first-stage buy (units), (c) inventory (units).]

SLIDE 31

Concluding Thoughts

  • Partially observable multistage stochastic programs
    – Saddle-cut SDDP algorithm
    – SDDP.jl (Dowson and Kapelevich)
  • Related saddle-function work in stochastic programming
    – Baucke et al. (2018): risk measures
    – Downward et al. (2018): stage-wise dependent objective coefficients
  • Closely related ideas are well known in POMDPs
    – Contextual, multi-model, concurrent MDPs
    – We allow continuous state and action spaces via convexity
  • Countably infinite LPs for the cyclic case
  • We did not handle decision-dependent learning
    – b ← B(b, ω) versus b ← B(b, ω, u)

SLIDE 32

Concluding Thoughts

http://www.optimization-online.org/DB_HTML/2019/03/7141.html