Extending SDDP-style Algorithms for Multistage Stochastic Programming

Dave Morton
Industrial Engineering & Management Sciences, Northwestern University

Joint work with: Oscar Dowson, Daniel Duque, and Bernardo Pagnoncelli

Collaborators
Hydroelectric Power: Itaipu (14 GW)
Yuba, Bear and South Feather Hydrological Basin
SDDP: Stochastic Dual Dynamic Programming
SLP-T

$$z^* = \min_{x_1 \ge 0} \; c_1 x_1 + \mathbb{E}_{\xi_2|\xi_1} V_2(x_1, \xi_2) \quad \text{s.t. } A_1 x_1 = B_1 x_0 + b_1,$$

where for $t = 2, \ldots, T$,

$$V_t(x_{t-1}, \xi_t) = \min_{x_t \ge 0} \; c_t x_t + \mathbb{E}_{\xi_{t+1}|\xi_1,\ldots,\xi_t} V_{t+1}(x_t, \xi_{t+1}) \quad \text{s.t. } A_t x_t = B_t x_{t-1} + b_t,$$

and where $V_{T+1} \equiv 0$. Each $V_t(\cdot, \xi_t)$ is piecewise linear and convex.
SLP-T Assumptions for SDDP
- Relatively complete recourse; finite optimal solution
- $\xi_t = (A_t, B_t, b_t, c_t)$ is inter-stage independent
- Or, $(A_t, B_t, c_t)$ is inter-stage independent and $b_t$ satisfies, e.g. (sketched below):
  – $b_t = \Psi(b_{t-1}) + \varepsilon_t$ with $\varepsilon_t$ inter-stage independent; or
  – $b_t = \Psi(b_{t-1}) \cdot \varepsilon_t$ with $\varepsilon_t$ inter-stage independent
- Sample space: $\Omega_t = \Sigma_2 \times \Sigma_3 \times \cdots \times \Sigma_t$, with $|\Sigma_t|$ modest
- T may be large
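A minimal simulation sketch of the first stage-wise dependent case, $b_t = \Psi(b_{t-1}) + \varepsilon_t$; the linear form of $\Psi$, its coefficients, and the three-point noise support are illustrative assumptions, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def psi(b_prev):
    # Assumed linear Psi; the 0.8 and 5.0 are placeholders for illustration.
    return 0.8 * b_prev + 5.0

T, b = 12, 20.0
noise = np.array([-2.0, 0.0, 2.0])   # small support, matching "|Sigma_t| modest"
for t in range(2, T + 1):
    eps = rng.choice(noise)          # eps_t sampled inter-stage independently
    b = psi(b) + eps                 # b_t = Psi(b_{t-1}) + eps_t
    print(f"t={t:2d}  b_t={b:6.2f}")
```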
What Does “Solution” Mean?

A solution is a policy.
SDDP

[Figure: sampled traversal of the scenario tree. (a) Forward Pass; (b) Backward Pass]
SDDP Master Programs

$$\begin{aligned}
\min_{x_t, \theta_t} \; & c_t x_t + \theta_t \\
\text{s.t. } & A_t x_t = B_t x_{t-1} + b_t \\
& -G_t^k x_t + \theta_t \ge g_t^k, \quad k = 1, 2, \ldots, K \\
& x_t \ge 0
\end{aligned}$$
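To make the master concrete, here is a minimal sketch that assembles and solves one stage-$t$ master LP with scipy.optimize.linprog; the helper name solve_stage_master and the toy data at the bottom are assumptions for this sketch:

```python
import numpy as np
from scipy.optimize import linprog

def solve_stage_master(c_t, A_t, rhs, G, g):
    """Solve  min c_t'x + theta  s.t.  A_t x = rhs,
    -G[k] x + theta >= g[k] (cuts),  x >= 0.

    'rhs' stands for B_t x_{t-1} + b_t; (G, g) hold the K cuts so far.
    """
    n = len(c_t)
    cost = np.append(c_t, 1.0)                     # variables z = [x_t, theta_t]
    A_eq = np.hstack([A_t, np.zeros((A_t.shape[0], 1))])
    # Cut  -G_k x + theta >= g_k  becomes  G_k x - theta <= -g_k  for linprog.
    A_ub = np.hstack([G, -np.ones((G.shape[0], 1))])
    res = linprog(cost, A_ub=A_ub, b_ub=-np.asarray(g),
                  A_eq=A_eq, b_eq=rhs,
                  bounds=[(0, None)] * n + [(None, None)],
                  method="highs")
    # Equality-constraint duals feed the gradient of the next backward-pass cut.
    return res.x[:n], res.x[n], res.eqlin.marginals

# Toy data (assumed): two variables, one balance row, two existing cuts.
x_t, theta_t, duals = solve_stage_master(
    c_t=np.array([1.0, 2.0]),
    A_t=np.array([[1.0, 1.0]]),
    rhs=np.array([3.0]),
    G=np.array([[-0.5, 0.0], [0.0, -0.25]]),
    g=np.array([1.0, 0.5]))
print(x_t, theta_t, duals)
```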
Partially Observable Multistage Stochastic Programming
Or: an alternative to DRO when you don’t really know the distribution.

An apology: this talk does not cover Wasserstein-based DRO for SLP-T via an SDDP algorithm (with Daniel Duque).
Policy Graphs (Dowson)

[Figure: a policy graph for SLP-3 with inter-stage independence (nodes 1 → 2 → 3), which unfolds to a scenario tree with nodes 1; 2L, 2H; 3LH, 3LL, 3HL, 3HH]
Policy Graphs

[Figure: a Markov-switching model and a policy graph with random transitions]
Inventory Example

[Figure: policy graph with root R branching to chains DA ⇄ HA and DB ⇄ HB; each D → H edge has probability 1 and each H → D return edge has probability ρ]

Demand model A: $P(\omega = 1) = 0.2$, $P(\omega = 2) = 0.8$
Demand model B: $P(\omega = 1) = 0.8$, $P(\omega = 2) = 0.2$

$$D_i: \quad D_i(x) = \min_{u, x' \ge 0} \; u + \mathbb{E}_\omega[H_i(x', \omega)] \quad \text{s.t. } x' = x + u$$

$$H_i: \quad H_i(x, \omega) = \min_{u, x' \ge 0} \; 2u + x' + \rho D_i(x') \quad \text{s.t. } x' = x + u - \omega$$
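For a feel of what these recursions compute, here is a rough value-iteration sketch for demand model A on a discretized inventory grid; the grid, iteration count, and brute-force search over $x'$ are assumptions standing in for the SDDP machinery the talk develops:

```python
import numpy as np

rho = 0.9
grid = np.linspace(0.0, 10.0, 101)            # assumed inventory grid
demand = {1: 0.2, 2: 0.8}                     # model A: P(w=1)=0.2, P(w=2)=0.8

D = np.zeros_like(grid)                       # D_i(x) at grid points
H = {w: np.zeros_like(grid) for w in demand}  # H_i(x, w) at grid points

for _ in range(300):                          # iterate toward the fixed point
    EH = sum(p * H[w] for w, p in demand.items())   # E_w[H_i(x', w)]
    # D(x) = min over x' >= x of (x' - x) + E_w[H(x', w)], with u = x' - x >= 0
    D = np.array([np.min((grid - x) + EH, initial=np.inf, where=grid >= x)
                  for x in grid])
    for w in demand:
        # H(x, w) = min over x' >= x - w (and x' >= 0) of 2u + x' + rho*D(x'),
        # with u = x' - x + w >= 0; the cost-to-go is evaluated at x'
        H[w] = np.array([np.min(2 * (grid - x + w) + grid + rho * D,
                                initial=np.inf, where=grid >= x - w)
                         for x in grid])

print(D[0], H[1][0], H[2][0])                 # converged values at x = 0
```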
Policy Graphs

Each node $i$:

[Figure: incoming state $x$; noise $\omega \in \Omega_i$ realized; control $u = \pi_i(x, \omega)$; transition $x' = T_i(x, u, \omega)$; cost $C_i(x, u, \omega)$; outgoing state $x'$]

A policy graph (see the sketch below):
- $G = (R, N, E, \Phi)$
- $\omega_j \in \Omega_j$: node-wise independent noise
- feasible controls: $u \in U_i(x, \omega)$
- transition function: $x' = T_i(x, u, \omega)$
- one-step cost function: $C_i(x, u, \omega)$
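A small data-structure sketch of these ingredients in Python; the field names and the toy DA node are illustrative assumptions, not SDDP.jl's interface:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Node:
    noise: List[Tuple[float, float]]                # Omega_i as (omega, prob) pairs
    cost: Callable[[float, float, float], float]    # C_i(x, u, omega)
    transition: Callable[[float, float, float], float]  # x' = T_i(x, u, omega)
    children: Dict[str, float] = field(default_factory=dict)  # phi_ij for j in i+

@dataclass
class PolicyGraph:
    root: str               # the root node R
    nodes: Dict[str, Node]  # N; the edges E and Phi live in each node's children

# Example: a DA-style node of the inventory model (no demand noise at D nodes,
# unit purchase cost, transition x' = x + u), transitioning to HA w.p. 1.
d_a = Node(noise=[(0.0, 1.0)],
           cost=lambda x, u, w: u,
           transition=lambda x, u, w: x + u,
           children={"HA": 1.0})
graph = PolicyGraph(root="R", nodes={"DA": d_a})
```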
Policy Graphs

$$\min_\pi \; \mathbb{E}_{i \in R^+;\, \omega \in \Omega_i}[V_i(x_R, \omega)] \tag{1}$$

where

$$\begin{aligned}
V_i(x, \omega) = \min_{u, \bar{x}, x'} \; & C_i(\bar{x}, u, \omega) + \mathbb{E}_{j \in i^+;\, \varphi \in \Omega_j}[V_j(x', \varphi)] \\
\text{s.t. } & \bar{x} = x \\
& u \in U_i(\bar{x}, \omega) \\
& x' = T_i(\bar{x}, u, \omega)
\end{aligned} \tag{2}$$

Goal: find $\pi_i(x, \omega)$ that solves (1) for each $i \in N$, $x$, and $\omega$.

(A1) $N$ is finite
(A2) $\Omega_i$ is finite and $\omega_i$ is node-wise independent $\forall i \in N$
(A3) Excluding the cost-to-go term, subproblem (2) is an LP
(A4) Subproblem (2) has a finite optimal solution
(A5) A leaf node is hit with probability 1 (or graph $G$ is acyclic)
Policy Graphs with Partial Observability

Extend the policy graph to $G = (R, N, E, \Phi, \mathcal{A})$, where $\mathcal{A}$ partitions $N$:

$$\bigcup_{A \in \mathcal{A}} A = N, \qquad A \cap A' = \emptyset \text{ for } A \ne A'$$

We know the current ambiguity set $A$, but not which node within it.

Full observability: $\mathcal{A} = \{\{i\} : i \in N\}$, i.e., $|A| = 1$ for every $A$. But we could have $|A| = 2$, where we know the stage but not the node.
Updates to the Belief State
[Figure: the inventory policy graph again — root R; chains DA ⇄ HA and DB ⇄ HB with return probability ρ]

$\mathcal{A} = \{A_1, A_2\}$, with $A_1 = \{D_A, D_B\}$ and $A_2 = \{H_A, H_B\}$

$$P\{\text{Node} = k \mid \omega, A\} = \frac{\mathbb{1}_{k \in A} \cdot P\{\omega \mid \text{Node} = k\}\, P\{\text{Node} = k\}}{P\{\omega\}}$$

$$b_k \leftarrow \frac{\mathbb{1}_{k \in A} \cdot P(\omega \in \Omega_k) \sum_{i \in N} b_i \phi_{ik}}{\sum_{i \in N} b_i \sum_{j \in A} \phi_{ij}\, P(\omega \in \Omega_j)}$$

In matrix form, with $D_A^\omega$ diagonal with entries $\mathbb{1}_{k \in A} \cdot P(\omega \in \Omega_k)$:

$$b \leftarrow B(b, \omega) = \frac{D_A^\omega \Phi^\top b}{\sum_{i \in N} b_i \sum_{j \in A} \phi_{ij}\, P(\omega \in \Omega_j)}$$
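A minimal sketch of this update as code; the function name is an assumption, and the toy numbers follow the inventory example's structure (the likelihoods on D nodes are placeholders, masked out by the indicator; the ρ-probability return edges are taken here with probability 1 for simplicity):

```python
import numpy as np

def update_belief(b, Phi, A_mask, like):
    """One belief update b <- B(b, omega), following the slide's formula.

    b      -- prior belief over nodes (length |N|)
    Phi    -- transition matrix, Phi[i, j] = phi_ij
    A_mask -- 0/1 indicator of the observed ambiguity set A
    like   -- like[k] = P(omega | Node = k), the observation likelihoods
    """
    prior = Phi.T @ b              # predicted node probabilities: sum_i b_i phi_ik
    post = A_mask * like * prior   # Bayes numerator: 1_{k in A} P(w|k) * prior
    return post / post.sum()       # normalize

# Inventory example: belief over (DA, HA, DB, HB) after seeing demand omega = 1
# in the H ambiguity set; likelihoods follow models A and B from the slides.
Phi = np.array([[0, 1, 0, 0],      # DA -> HA
                [1, 0, 0, 0],      # HA -> DA (rho-edge, leaf mass dropped here)
                [0, 0, 0, 1],      # DB -> HB
                [0, 0, 1, 0]])     # HB -> DB
b = np.array([0.5, 0.0, 0.5, 0.0]) # equal prior belief in models A and B
print(update_belief(b, Phi, A_mask=np.array([0, 1, 0, 1]),
                    like=np.array([1.0, 0.2, 1.0, 0.8])))
# -> [0, 0.2, 0, 0.8]: seeing omega = 1 shifts belief toward model B
```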
Policy Graphs with Partial Observability

Each node:

[Figure: incoming $(x, b)$; belief update $b \leftarrow B(b, \omega)$; control $u = \pi_i(x, \omega, b)$; transition $x' = T_i(x, u, \omega)$; cost $C_i(x, u, \omega)$; outgoing $(x', b)$]
- All nodes in an ambiguity set have the same $C_i$, $T_i$, and $U_i$
- Children $i^+$, transition probabilities $\phi_{ij}$, even $\Omega_i$ may differ
Policy Graphs with Partial Observability

$$\min_\pi \; \mathbb{E}_{i \in R^+;\, \omega \in \Omega_i}[V_i(x_R, B_i(b_R, \omega), \omega)] \tag{3}$$

where

$$\begin{aligned}
V_i(x, b, \omega) = \min_{u, \bar{x}, x'} \; & C_i(\bar{x}, u, \omega) + V(x', b) \\
\text{s.t. } & \bar{x} = x \\
& u \in U_i(\bar{x}, \omega) \\
& x' = T_i(\bar{x}, u, \omega)
\end{aligned}$$

and where

$$V(x', b) = \sum_{j \in N} b_j \sum_{k \in N} \phi_{jk} \sum_{\varphi \in \Omega_k} P(\varphi \in \Omega_k) \cdot V_k(x', B_k(b, \varphi), \varphi)$$

Goal: find $\pi_A(x, b, \omega)$ that solves (3) for each $A \in \mathcal{A}$, $x$, $b$, and $\omega$.
Saddle Property of the Cost-to-go Function

$$\begin{aligned}
V_i(x, b, \omega) = \min_{u, \bar{x}, x'} \; & C_i(\bar{x}, u, \omega) + V(x', b) \\
\text{s.t. } & \bar{x} = x \\
& u \in U_i(\bar{x}, \omega) \\
& x' = T_i(\bar{x}, u, \omega)
\end{aligned}$$

where

$$V(x', b) = \sum_{j \in N} b_j \sum_{k \in N} \phi_{jk} \sum_{\varphi \in \Omega_k} P(\varphi \in \Omega_k) \cdot V_k(x', B_k(b, \varphi), \varphi)$$

Assume (A1)–(A5) with $G$ acyclic.

Lemma 1. Fix $i$, $b$, $\omega$. Then $V_i(x, b, \omega)$ is piecewise linear convex in $x$.

Lemma 2. Fix $x'$. Then $V(x', b)$ is piecewise linear concave in $b$.

Theorem 1. $V(x', b)$ is a piecewise linear saddle function: convex in $x'$ for fixed $b$ and concave in $b$ for fixed $x'$.
Linear Interpolation: Towards an SDDP Algorithm

[Figure: piecewise linear interpolation of $V(b)$ through sample points $0 = \bar{b}_1 < \bar{b}_2 < \bar{b}_3 < \bar{b}_4 < \bar{b}_5 = 1$]

$$\begin{aligned}
V(b) = \max_{\gamma \ge 0} \; & \sum_{k=1}^K \gamma_k V(\bar{b}_k) \\
\text{s.t. } & \sum_{k=1}^K \gamma_k = 1 \\
& \sum_{k=1}^K \gamma_k \bar{b}_k = b
\end{aligned}$$
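This interpolation is itself a small LP; here is a sketch for scalar beliefs using scipy.optimize.linprog, with assumed sample points and values of a concave $V$:

```python
import numpy as np
from scipy.optimize import linprog

def interpolated_value(b, b_bar, v_bar):
    """Evaluate the slide's interpolation LP
        max_{gamma >= 0} sum_k gamma_k V(b_bar_k)
        s.t. sum_k gamma_k = 1,  sum_k gamma_k b_bar_k = b.
    linprog minimizes, so negate the objective and the result.
    """
    K = len(b_bar)
    A_eq = np.vstack([np.ones(K), b_bar])      # convexity row and b-matching row
    res = linprog(-np.asarray(v_bar), A_eq=A_eq, b_eq=[1.0, b],
                  bounds=[(0, None)] * K, method="highs")
    return -res.fun

# Assumed sample points on [0, 1] and values of a concave V at those points.
b_bar = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
v_bar = np.array([1.0, 1.8, 2.1, 1.9, 1.2])
print(interpolated_value(0.6, b_bar, v_bar))   # interpolate V at b = 0.6
```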
Saddle Function with Interpolated Cuts

[Figure: surface plot of the saddle function $V(x', b)$, convex in $x'$ and concave in $b$]
Computing Cuts for What?

$$\begin{aligned}
V_i(x, b, \omega) = \min_{u, \bar{x}, x'} \; & C_i(\bar{x}, u, \omega) + V_A(x', b) \\
\text{s.t. } & \bar{x} = x \\
& u \in U_i(\bar{x}, \omega) \\
& x' = T_i(\bar{x}, u, \omega)
\end{aligned}$$

where

$$V_A(x', b) = \sum_{j \in A} b_j \sum_{k \in j^+} \phi_{jk} \sum_{\varphi \in \Omega_k} P(\varphi \in \Omega_k) \cdot V_k(x', B_k(b, \varphi), \varphi)$$
SDDP Master Program

$$\begin{aligned}
V_i^K(x, b, \omega) = \min_{u, \bar{x}, x', \theta}\, \max_{\gamma \ge 0} \; & C_i(\bar{x}, u, \omega) + \sum_{k=1}^K \gamma_k \theta_k \\
\text{s.t. } & \bar{x} = x \quad [\lambda] \\
& u \in U_i(\bar{x}, \omega) \\
& x' = T_i(\bar{x}, u, \omega) \\
& \sum_{k=1}^K \gamma_k b^k = b \quad [\mu] \\
& \sum_{k=1}^K \gamma_k = 1 \quad [\nu] \\
& \theta_k \ge G^k x' + g^k, \quad k = 1, \ldots, K
\end{aligned}$$
SDDP Master Program

Dualizing the inner maximization over $\gamma$:

$$\begin{aligned}
V_i^K(x, b, \omega) = \min_{u, \bar{x}, x', \nu, \mu} \; & C_i(\bar{x}, u, \omega) + \mu^\top b + \nu \\
\text{s.t. } & \bar{x} = x \quad [\lambda] \\
& u \in U_i(\bar{x}, \omega) \\
& x' = T_i(\bar{x}, u, \omega) \\
& \mu^\top b^k + \nu \ge G^k x' + g^k, \quad k = 1, \ldots, K
\end{aligned}$$

Theorem 2. Assume (A1)–(A5) with $G$ acyclic. Let the sample paths of the “obvious” SDDP algorithm be generated independently at each iteration. Then the algorithm converges to an optimal policy almost surely in a finite number of iterations.
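A minimal sketch of this dualized master for a single inventory-style node, using scipy.optimize.linprog; the transition $x' = x + u - \omega$, the purchase cost c_u, and the two cuts below are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linprog

def saddle_master(x, omega, b, cuts, c_u=2.0):
    """Sketch of the dualized saddle master for one inventory-style node:
        min  c_u*u + mu'b + nu
        s.t. x' = x + u - omega,  u, x' >= 0,
             mu'b_k + nu >= G_k x' + g_k   for each saddle cut k.
    Decision vector z = [u, x', mu (len(b) entries), nu]; 'cuts' is a list
    of (b_k, G_k, g_k) triples.
    """
    n = len(b)
    cost = np.concatenate([[c_u, 0.0], b, [1.0]])   # c_u*u + mu'b + nu
    A_eq = np.zeros((1, n + 3))
    A_eq[0, 0], A_eq[0, 1] = -1.0, 1.0              # x' - u = x - omega
    # Each cut becomes  G_k x' - mu'b_k - nu <= -g_k  in linprog form.
    A_ub = np.array([np.concatenate([[0.0, Gk], -bk, [-1.0]])
                     for bk, Gk, gk in cuts])
    b_ub = np.array([-gk for _, _, gk in cuts])
    bounds = [(0, None), (0, None)] + [(None, None)] * (n + 1)
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[x - omega],
                  bounds=bounds, method="highs")
    return res.x[:2], res.fun

# Two assumed saddle cuts anchored at beliefs b_k in a two-node ambiguity set.
cuts = [(np.array([1.0, 0.0]), -0.5, 1.0),
        (np.array([0.0, 1.0]), -0.2, 0.8)]
(u, x_next), val = saddle_master(x=1.0, omega=1.0, b=np.array([0.5, 0.5]),
                                 cuts=cuts)
print(u, x_next, val)
```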
Inventory Example (revisited)

Recall the policy graph with nodes R, DA, HA, DB, HB, the two demand models, and the recursions $D_i$ and $H_i$ from the earlier Inventory Example slide.
Inventory Example: Train Four Policies

1. Fully observable: distribution known upon departing R
2. Partially observable: ambiguity partition {DA, DB}, {HA, HB}
3. Risk-neutral average demand: demand equally likely to be 1 or 2
4. DRO average demand: modified χ² method with radius 0.25
Inventory Example: Out-of-Sample Results

- 2000 out-of-sample cost realizations over 50 periods; quartiles shown; ρ = 0.9

[Figure: boxplots of discounted simulated cost ($, axis 10–30) and undiscounted simulated cost ($, axis 60–140) for the four policies: fully observable, partially observable, risk-neutral average demand, DRO average demand]
Inventory Example: One Sample Path of the Partially Observable Policy

[Figure: one sample path over 12 periods — (a) belief in model A; (b) first-stage buy (units); (c) inventory (units)]
Concluding Thoughts

- Partially observable multistage stochastic programs
  – Saddle-cut SDDP algorithm
  – SDDP.jl (Dowson and Kapelevich)
- Related saddle-function work in stochastic programming
  – Baucke et al. (2018): risk measures
  – Downward et al. (2018): stage-wise dependent objective coefficients
- Closely related ideas are well known in POMDPs
  – Contextual, multi-model, concurrent MDPs
  – We allow continuous state and action spaces via convexity
- Countably infinite LPs for the cyclic case
- We did not handle decision-dependent learning