
QMC methods for stochastic programs: ANOVA decomposition of integrands

W. Römisch

Humboldt-University Berlin www.math.hu-berlin.de/~romisch

(H. Heitsch, I. H. Sloan)

MCQMC 2012, Sydney, February 12–17, 2012


Introduction

  • Stochastic programs are optimization problems containing integrals in the objective function and/or constraints.

  • Applied stochastic programming models in production, transportation, energy, finance etc. are typically large scale.

  • The standard approach to solving such models is to use variants of Monte Carlo (MC) for generating scenarios (i.e., samples).

  • A few recent approaches to scenario generation in stochastic programming besides MC:

(a) Optimal quantization of probability distributions (Pflug-Pichler 2010).

(b) Quasi-Monte Carlo (QMC) methods (Koivu-Pennanen 05, Homem-de-Mello 06).

(c) Sparse grid quadrature rules (Chen-Mehrotra 08).


While the justification of MC and (a) may be based on available stability results for stochastic programs, there is almost no reasonable justification for applying (b) and (c).

Personal interest: Applying and justifying randomized QMC methods (randomly shifted and digitally shifted polynomial lattice rules) with application in energy models.


Two-stage linear stochastic programs

Two-stage stochastic programs arise as deterministic equivalents of improperly posed random linear programs

$$\min\{\langle c, x\rangle : x \in X,\ Tx = \xi\},$$

where $X$ is a convex polyhedral subset of $\mathbb{R}^m$, $T$ is a matrix, and $\xi$ is a $d$-dimensional random vector. A possible deviation $\xi - Tx$ is compensated by additional costs $\Phi(x, \xi)$ whose mean with respect to the probability distribution $P$ of $\xi$ is added to the objective. We assume that the additional costs represent the optimal value of a second-stage program, namely,

$$\Phi(x, \xi) = \inf\{\langle q, y\rangle : y \in \mathbb{R}^{\bar m},\ Wy = \xi - Tx,\ y \ge 0\},$$

where $q \in \mathbb{R}^{\bar m}$, $W$ is a $(d, \bar m)$-matrix (having rank $d$) and the right-hand side $t = \xi - Tx$ varies in the polyhedral cone $W(\mathbb{R}^{\bar m}_+)$.

The deterministic equivalent then is of the form

$$\min\Big\{\langle c, x\rangle + \int_{\mathbb{R}^d} \Phi(x, \xi)\,P(d\xi) : x \in X\Big\}.$$

We assume that the additional costs are of the form $\Phi(x, \xi) = \varphi(\xi - Tx)$ with the second-stage optimal value function

$$\varphi(t) = \inf\{\langle q, y\rangle : Wy = t,\ y \ge 0\} = \sup\{\langle t, z\rangle : W^\top z \le q\} = \sup_{z \in D} \langle t, z\rangle.$$

There exist vertices $v^j$ of the dual feasible set $D$ and polyhedral cones $K_j$, $j = 1, \dots, \ell$, decomposing $\operatorname{dom} \varphi$ such that

$$\varphi(t) = \langle v^j, t\rangle \quad \forall t \in K_j, \qquad \text{and} \qquad \varphi(t) = \max_{j=1,\dots,\ell} \langle v^j, t\rangle.$$

Hence, the integrands are of the form

$$f(\xi) = \max_{j=1,\dots,\ell} \langle v^j, \xi - Tx\rangle.$$

Problem: When transformed to $[0, 1]^d$, $f$ is not of bounded variation in the Hardy-Krause sense and does not belong to tensor product Sobolev spaces $\bigotimes_{i=1}^{d} W_2^1([0, 1])$ in general.


Model extensions

  • Two-stage models with affine functions $h(\xi)$ and/or $T(\xi)$; hence, the integrands $f$ are of the form

$$f(\xi) = \max_{j=1,\dots,\ell} \langle v^j, h(\xi) - T(\xi)x\rangle.$$

  • Two-stage models with random second-stage costs $q(\xi)$:

$$f(\xi) = \max_{j=1,\dots,\ell} \langle v^j(\xi), h(\xi) - T(\xi)x\rangle = \max_{j=1,\dots,\ell} \langle C_j q(\xi), h(\xi) - T(\xi)x\rangle.$$

  • Multi-period models: Random vector $\xi = (\xi_1, \dots, \xi_T)$ and $f(\xi) = \Psi_1(\xi, x)$, where $\Psi_1$ is given by the DP recursion

$$\Phi_t(\xi^t, u_{t-1}) := \sup\{\langle u_{t-1}, z_t\rangle + \Psi_{t+1}(\xi^{t+1}, z_t) : W_t^\top z_t \le q_t(\xi_t)\},$$

$$\Psi_t(\xi^t, z_{t-1}) := \Phi_t(\xi^t, h_t(\xi_t) - T_t(\xi_t) z_{t-1}), \quad t = T, \dots, 1,$$

where $z_0 = x$, $\xi^t = (\xi_t, \dots, \xi_T)$ and $\Psi_{T+1}(\xi^{T+1}, z_T) \equiv 0$.

  • Multi-stage models: The only difference to multi-period is

$$\Psi_t(\xi^t, z_{t-1}) := \mathbb{E}[\Phi_t(\xi^t, h_t(\xi_t) - T_t(\xi_t) z_{t-1}) \mid \xi_1, \dots, \xi_t].$$


ANOVA decomposition of multivariate functions

Idea: Decompose $f$ into terms, most of which are smooth, while hopefully only a few of them are relevant.

Let $D = \{1, \dots, d\}$ and $f \in L_{1,\rho_d}(\mathbb{R}^d)$ with $\rho_d(\xi) = \prod_{j=1}^{d} \rho_j(\xi_j)$.

Let the projection $P_k$, $k \in D$, be defined by

$$(P_k f)(\xi) := \int_{-\infty}^{\infty} f(\xi_1, \dots, \xi_{k-1}, s, \xi_{k+1}, \dots, \xi_d)\,\rho_k(s)\,ds \quad (\xi \in \mathbb{R}^d).$$

Clearly, $P_k f$ is constant with respect to $\xi_k$. For $u \subseteq D$ we write

$$P_u f = \Big(\prod_{k \in u} P_k\Big)(f),$$

where the product means composition, and note that the ordering within the product is not important because of Fubini's theorem. The function $P_u f$ is constant with respect to all $\xi_k$, $k \in u$. Note that $P_u$ satisfies the properties of a projection.


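ANOVA-decomposition of $f$:

$$f = \sum_{u \subseteq D} f_u,$$

where $f_\emptyset = I_d(f) = P_D(f)$ and recursively

$$f_u = P_{-u}(f) - \sum_{v \subsetneq u} f_v$$

or

$$f_u = \sum_{v \subseteq u} (-1)^{|u|-|v|} P_{-v} f = P_{-u}(f) + \sum_{v \subsetneq u} (-1)^{|u|-|v|} P_{u-v}(P_{-u}(f)),$$

where $P_{-u}$ and $P_{u-v}$ mean integration with respect to $\xi_j$, $j \in D \setminus u$ and $j \in u \setminus v$, respectively. The second representation motivates that $f_u$ is essentially as smooth as $P_{-u}(f)$.

If $f$ belongs to $L_{2,\rho_d}(\mathbb{R}^d)$, the ANOVA functions $\{f_u\}_{u \subseteq D}$ are orthogonal in $L_{2,\rho_d}(\mathbb{R}^d)$.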
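The recursion can be tried out on a toy case. The following is a minimal sketch for d = 2, assuming a discrete product measure (a few equally weighted nodes per coordinate) in place of the density ρ_d, so that every projection becomes a finite sum. The node set, the weights and the test function are illustrative choices, not taken from the talk.

```python
from itertools import product

# Hypothetical discrete product measure: equal weights on a few nodes per coordinate.
NODES = [-1.0, 0.0, 1.0, 2.0]
WEIGHTS = [0.25, 0.25, 0.25, 0.25]


def f(x1, x2):
    # Illustrative test function of max-of-affine type.
    return max(abs(x1), x2)


def integrate(g):
    """Integrate g(x1, x2) against the discrete product measure."""
    return sum(
        wa * wb * g(a, b)
        for (a, wa), (b, wb) in product(zip(NODES, WEIGHTS), repeat=2)
    )


f_empty = integrate(f)  # f_emptyset = P_D f


def f_1(x1, x2):
    # P_{-{1}} f integrates out x2; the result does not depend on x2.
    return sum(wb * f(x1, b) for b, wb in zip(NODES, WEIGHTS)) - f_empty


def f_2(x1, x2):
    # P_{-{2}} f integrates out x1; the result does not depend on x1.
    return sum(wa * f(a, x2) for a, wa in zip(NODES, WEIGHTS)) - f_empty


def f_12(x1, x2):
    # Remainder term, so that f = f_empty + f_1 + f_2 + f_12 holds pointwise.
    return f(x1, x2) - f_empty - f_1(x1, x2) - f_2(x1, x2)


if __name__ == "__main__":
    variance = integrate(lambda a, b: (f(a, b) - f_empty) ** 2)
    parts = [integrate(lambda a, b, g=g: g(a, b) ** 2) for g in (f_1, f_2, f_12)]
    print("variance:", variance, "sum of squared norms:", sum(parts))
    print("<f_1, f_2>:", integrate(lambda a, b: f_1(a, b) * f_2(a, b)))
```

Up to rounding, the printed inner product is zero and the variance equals the sum of the squared norms of the three non-constant terms, which is the orthogonality statement in this discrete setting.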


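We set $\sigma^2(f) = \|f - I_d(f)\|_{L_2}^2$ and have

$$\sigma^2(f) = \|f\|_{L_2}^2 - (I_d(f))^2 = \sum_{\emptyset \ne u \subseteq D} \|f_u\|_{L_2}^2.$$

The truncation dimension $d_t$ of $f$ is the smallest $d_t \in \mathbb{N}$ such that

$$\sum_{\emptyset \ne u \subseteq \{1,\dots,d_t\}} \|f_u\|_{L_2}^2 \ge p\,\sigma^2(f)$$

(where $p \in (0, 1)$ is close to 1). Then it holds, by orthogonality of the ANOVA terms,

$$\Big\|f - \sum_{u \subseteq \{1,\dots,d_t\}} f_u\Big\|_{L_2}^2 \le (1 - p)\,\sigma^2(f).$$

(Wang-Fang 03, Kuo-Sloan-Wasilkowski-Woźniakowski 10, Griebel-Holtz 10)

According to an observation of Griebel-Kuo-Sloan 10 the ANOVA terms $f_u$ can be smoother than $f$ under certain conditions.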
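The definition of the truncation dimension can be read as a simple search over d_t. The sketch below assumes the squared norms of the ANOVA terms are already known and stored in a dictionary keyed by index tuples. The function name and the numbers are hypothetical and serve only to illustrate the bookkeeping.

```python
def truncation_dimension(sq_norms, d, p=0.99):
    """Smallest d_t such that the terms f_u with nonempty u in {1..d_t}
    capture at least the fraction p of the total variance sigma^2(f).

    sq_norms maps index tuples u (1-based, nonempty) to ||f_u||^2.
    """
    sigma2 = sum(val for u, val in sq_norms.items() if u)
    for d_t in range(1, d + 1):
        # u is a subset of {1, ..., d_t} exactly when its largest index is <= d_t.
        captured = sum(val for u, val in sq_norms.items() if u and max(u) <= d_t)
        if captured >= p * sigma2:
            return d_t
    return d


# Hypothetical squared norms for d = 3 (illustrative values only).
SQ_NORMS = {
    (1,): 0.70, (2,): 0.20, (3,): 0.004,
    (1, 2): 0.09, (1, 3): 0.003, (2, 3): 0.002,
    (1, 2, 3): 0.001,
}

if __name__ == "__main__":
    for p in (0.5, 0.95, 0.999):
        print("p =", p, "-> d_t =", truncation_dimension(SQ_NORMS, d=3, p=p))
```

With these made-up values the first two variables and their interaction carry most of the variance, so moderate choices of p give a truncation dimension below d.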


ANOVA decomposition of two-stage integrands

Assumption:

(A1) $W(\mathbb{R}^{\bar m}_+) = \mathbb{R}^d$ (complete recourse).

(A2) $D \ne \emptyset$ (dual feasibility).

(A3) $\int_{\mathbb{R}^d} \|\xi\|\,P(d\xi) < \infty$.

(A4) $P$ has a density of the form $\rho_d(\xi) = \prod_{j=1}^{d} \rho_j(\xi_j)$ $(\xi \in \mathbb{R}^d)$ with $\rho_j \in C(\mathbb{R})$, $j = 1, \dots, d$.

(A1) and (A2) imply that $\operatorname{dom} \varphi = \mathbb{R}^d$ and $D$ is bounded and, hence, it is the convex hull of its vertices. Furthermore, the cones $K_j$ are the normal cones to $D$ at the vertices $v^j$, i.e.,

$$K_j = \{t \in \mathbb{R}^d : \langle t, z - v^j\rangle \le 0,\ \forall z \in D\} = \{t \in \mathbb{R}^d : \langle t, v^i - v^j\rangle \le 0,\ \forall i = 1, \dots, \ell,\ i \ne j\} \quad (j = 1, \dots, \ell).$$

It holds that $\bigcup_{j=1,\dots,\ell} K_j = \mathbb{R}^d$ and for $j \ne j'$ the intersection $K_j \cap K_{j'}$ is a common closed face of dimension $d - 1$ iff the two cones are adjacent and is contained in $\{t \in \mathbb{R}^d : \langle t, v^{j'} - v^j\rangle = 0\}$.


To compute projections $P_k(f)$ for $k \in D$, let $\xi_i \in \mathbb{R}$, $i = 1, \dots, d$, $i \ne k$, be given. We set $\xi^k = (\xi_1, \dots, \xi_{k-1}, \xi_{k+1}, \dots, \xi_d)$ and

$$\xi_s = (\xi_1, \dots, \xi_{k-1}, s, \xi_{k+1}, \dots, \xi_d) \in \mathbb{R}^d = \bigcup_{j=1,\dots,\ell} K_j.$$

Assuming (A1)–(A4) it is possible to derive an explicit representation of $P_k(f)$ that depends on $\xi^k$ and on the finitely many points at which the one-dimensional affine subspace $\{\xi_s : s \in \mathbb{R}\}$ meets the common face of two adjacent cones. This leads to

Proposition: Let $k \in D$. Assume (A1)–(A4) and that all adjacent vertices of $D$ have different $k$th components. Then the $k$th projection $P_k f$ is infinitely differentiable if the density $\rho_k$ is in $C^\infty(\mathbb{R})$ and all its derivatives are bounded on $\mathbb{R}$.


Theorem: Let $u \subset D$. Assume (A1)–(A4) and that all adjacent vertices of $D$ have different $k$th components for some $k \in D \setminus u$. Then the ANOVA term $f_u$ belongs to $C^\infty(\mathbb{R}^{d-|u|})$ if $\rho_k \in C^\infty(\mathbb{R})$ and all its derivatives are bounded on $\mathbb{R}$.

Remark: The algebraic condition on the vertices of $D$ is satisfied almost everywhere in the following sense: Given $D$ there are only finitely many orthogonal matrices $Q$ performing rotations of $\mathbb{R}^d$ such that the condition is not satisfied for $QD = \{z \in \mathbb{R}^d : (QW)^\top z \le q\}$. Note that then the optimal value $\varphi(t)$ is equal to $\max\{\langle Qt, z\rangle : z \in QD\}$. Such an orthogonal transformation of $D$ leads only to simple changes.


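Example: Let $\bar m = 3$, $d = 2$, $P$ denote the two-dimensional standard normal distribution and let the following vector $q$ and matrix $W$

$$W = \begin{pmatrix} -1 & 1 & 0 \\ 1 & 1 & -1 \end{pmatrix}, \qquad q = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}$$

be given. Then (A1) and (A2) are satisfied and the dual feasible set $D$ is the triangle (in $\mathbb{R}^2$)

$$D = \{z \in \mathbb{R}^2 : -z_1 + z_2 \le 1,\ z_1 + z_2 \le 1,\ -z_2 \le 0\},$$

with the vertices

$$v^1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad v^2 = \begin{pmatrix} -1 \\ 0 \end{pmatrix}, \qquad v^3 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

The normal cones $K_j$ to $D$ at $v^j$, $j = 1, 2, 3$, are

$$K_1 = \{z \in \mathbb{R}^2 : z_1 \ge 0,\ z_2 \le z_1\}, \quad K_2 = \{z \in \mathbb{R}^2 : z_1 \le 0,\ z_2 \le -z_1\}, \quad K_3 = \{z \in \mathbb{R}^2 : z_2 \ge z_1,\ z_2 \ge -z_1\}.$$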
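The example can be checked numerically. The sketch below takes the three vertices of the triangle D, evaluates the optimal value function as the maximum of the linear functions given by the vertices, and tests membership in a normal cone through the inequalities against the other vertices. The helper names are illustrative and not part of the talk.

```python
# Vertices v^1, v^2, v^3 of the dual feasible triangle D from the example.
VERTICES = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0)]


def inner(a, b):
    """Euclidean inner product in R^2."""
    return a[0] * b[0] + a[1] * b[1]


def phi(t):
    """Second-stage optimal value: maximum of <v^j, t> over the vertices."""
    return max(inner(v, t) for v in VERTICES)


def in_normal_cone(j, t, tol=1e-12):
    """t lies in K_j iff <t, v^i - v^j> <= 0 for all i != j (0-based j)."""
    vj = VERTICES[j]
    return all(
        inner(t, (vi[0] - vj[0], vi[1] - vj[1])) <= tol
        for i, vi in enumerate(VERTICES)
        if i != j
    )


if __name__ == "__main__":
    for t in [(2.0, 1.0), (-3.0, 1.0), (0.5, 2.0)]:
        cones = [j + 1 for j in range(len(VERTICES)) if in_normal_cone(j, t)]
        print("t =", t, "phi(t) =", phi(t), "normal cone indices:", cones)
```

On each cone K_j the maximum is attained at the vertex v^j, so phi is linear there; this is the piecewise linear structure used in the remainder of the example.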


Figure 1: Illustration of $D$, its vertices $v^j$ and the normal cones $K_j$ to its vertices

Hence, the second component of the two adjacent vertices $v^1$ and $v^2$ coincides. The function $\varphi$ is of the form

$$\varphi(t) = \max_{i=1,2,3} \langle v^i, t\rangle = \max\{t_1, -t_1, t_2\} = \max\{|t_1|, t_2\}$$

and the integrand is

$$f(\xi) = \max\{|\xi_1 - [Tx]_1|,\ \xi_2 - [Tx]_2\}.$$

The ANOVA projection $P_1 f$ is in $C^\infty$, but $P_2 f$ is not differentiable.
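The kink of P_2 f can be made visible with a short computation. The sketch assumes Tx = 0 (an assumption made here only for illustration); then integrating out the second variable against the standard normal density gives the closed form (P_2 f)(ξ_1) = a N(a) + n(a) with a = |ξ_1|, where N and n denote the standard normal distribution function and density. The one-sided difference quotients at ξ_1 = 0 then differ in sign.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sigma 1)


def p2f(xi1):
    """(P_2 f)(xi1) = E[max(|xi1|, S)] for S ~ N(0, 1), assuming Tx = 0.

    Uses E[max(a, S)] = a * cdf(a) + pdf(a), which follows from
    the identity: integral of s * pdf(s) over (a, infinity) equals pdf(a).
    """
    a = abs(xi1)
    return a * STD_NORMAL.cdf(a) + STD_NORMAL.pdf(a)


def one_sided_slopes(g, x0, h=1e-6):
    """Left and right difference quotients of g at x0."""
    left = (g(x0) - g(x0 - h)) / h
    right = (g(x0 + h) - g(x0)) / h
    return left, right


if __name__ == "__main__":
    left, right = one_sided_slopes(p2f, 0.0)
    print("left slope at 0: ", left)
    print("right slope at 0:", right)
```

The two slopes approach -1/2 and +1/2, so P_2 f inherits the kink of |ξ_1|: integrating over ξ_2 cannot smooth a kink that does not depend on ξ_2, which matches the coinciding second components of v^1 and v^2.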


Remark: Under the assumptions of the theorem the function

$$f_{d-1}(\xi) = \sum_{|u| \le d-1} f_u$$

is in $C^\infty(\mathbb{R}^d)$ if $\rho_k \in C^\infty(\mathbb{R})$ and all its derivatives are bounded on $\mathbb{R}$ for every $k \in D$. On the other hand, it holds $f = f_{d-1} + f_D$.

Hence, the question arises: For which two-stage linear stochastic programs is the $L_2$-norm of $f_D$ small or, equivalently, is $f_{d-1}$ a good approximation of $f$ in $L_{2,\rho_d}$?

Open problem: Estimates of the truncation dimension of two-stage linear stochastic programs?


Conclusions

  • The results provide a first theoretical explanation of our computational results close to the optimal rate for randomly shifted lattice rules applied to two-stage stochastic programs.

  • The results will be extended to more general two-stage situations.

  • Numerical experiments with and without orthogonal transformations will hopefully lead to more computational insight into the geometric condition on adjacent vertices.

  • Challenge: Multi-stage and integer stochastic programs.
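For readers who want to experiment, here is a minimal sketch of a randomly shifted rank-1 lattice rule applied to the integrand of the example, assuming Tx = 0. It uses a two-dimensional Fibonacci lattice (n = 987, generating vector (1, 610)) and maps the points to the standard normal distribution by the inverse distribution function. These choices, the number of shifts and the clamping of points away from 0 and 1 are assumptions made for illustration; this is not the construction used for the computational results mentioned above.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

INV_CDF = NormalDist().inv_cdf  # maps (0, 1) to the real line
EPS = 1e-12                     # safeguard: inv_cdf is undefined at 0 and 1


def integrand(xi1, xi2):
    # Example integrand max{|xi1|, xi2}, i.e. Tx = 0 is assumed.
    return max(abs(xi1), xi2)


def shifted_lattice_estimate(n, z, shift):
    """Average of the integrand over the points frac(i * z / n + shift)."""
    total = 0.0
    for i in range(n):
        u1 = (i * z[0] / n + shift[0]) % 1.0
        u2 = (i * z[1] / n + shift[1]) % 1.0
        u1 = min(max(u1, EPS), 1.0 - EPS)
        u2 = min(max(u2, EPS), 1.0 - EPS)
        total += integrand(INV_CDF(u1), INV_CDF(u2))
    return total / n


def rqmc_estimate(n=987, z=(1, 610), n_shifts=16, seed=0):
    """Mean and standard error over independent uniform random shifts."""
    rng = random.Random(seed)
    estimates = [
        shifted_lattice_estimate(n, z, (rng.random(), rng.random()))
        for _ in range(n_shifts)
    ]
    return mean(estimates), stdev(estimates) / sqrt(n_shifts)


if __name__ == "__main__":
    est, se = rqmc_estimate()
    print("RQMC estimate:", est, "standard error:", se)
```

The independent shifts make the estimator unbiased and provide a practical error estimate through the sample standard deviation, which is the usual reason for randomizing a lattice rule.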

References

  • M. Chen and S. Mehrotra: Epi-convergent scenario generation method for stochastic problems via sparse grid, SPEPS 7-2008.

  • J. Dick, F. Pillichshammer: Digital Nets and Sequences: Discrepancy Theory and Quasi-Monte Carlo Integration, Cambridge University Press, 2010.

  • M. Griebel, F. Y. Kuo, I. H. Sloan: The smoothing effect of the ANOVA decomposition, Journal of Complexity 26 (2010), 523–551.

  • M. Griebel, F. Y. Kuo and I. H. Sloan: The smoothing effect of integration in R^d and the ANOVA decomposition, Mathematics of Computation (to appear).

  • T. Homem-de-Mello: On rates of convergence for stochastic optimization problems under non-i.i.d. sampling, SIAM Journal on Optimization 19 (2008), 524–551.

  • F. Y. Kuo: Component-by-component constructions achieve the optimal rate of convergence in weighted Korobov and Sobolev spaces, Journal of Complexity 19 (2003), 301–320.

  • F. Y. Kuo, I. H. Sloan, G. W. Wasilkowski, H. Woźniakowski: On decomposition of multivariate functions, Mathematics of Computation 79 (2010), 953–966.

  • F. Y. Kuo, I. H. Sloan, G. W. Wasilkowski, B. J. Waterhouse: Randomly shifted lattice rules with the optimal rate of convergence for unbounded integrands, Journal of Complexity 26 (2010), 135–160.

  • A. B. Owen: Multidimensional variation for Quasi-Monte Carlo, in J. Fan, G. Li (Eds.), International Conference on Statistics, World Scientific Publ., 2005, 49–74.


  • D. Nuyens and R. Cools: Fast algorithms for component-by-component constructions of rank-1 lattice rules in shift-invariant reproducing kernel Hilbert spaces, Mathematics of Computation 75 (2006), 903–922.

  • T. Pennanen, M. Koivu: Epi-convergent discretizations of stochastic programs via integration quadratures, Numerische Mathematik 100 (2005), 141–163.

  • G. Ch. Pflug, A. Pichler: Scenario generation for stochastic optimization problems, in: Stochastic Optimization Methods in Finance and Energy (M.I. Bertocchi, G. Consigli, M.A.H. Dempster eds.), Springer, 2011.

  • A. Ruszczyński and A. Shapiro (Eds.): Stochastic Programming, Handbooks in Operations Research and Management Science Vol. 10, Elsevier, Amsterdam, 2003.

  • I. H. Sloan and H. Woźniakowski: When are Quasi Monte Carlo algorithms efficient for high-dimensional integration, Journal of Complexity 14 (1998), 1–33.

  • I. H. Sloan, F. Y. Kuo and S. Joe: On the step-by-step construction of Quasi-Monte Carlo integration rules that achieve strong tractability error bounds in weighted Sobolev spaces, Mathematics of Computation 71 (2002), 1609–1640.

  • X. Wang and K.-T. Fang: The effective dimension and Quasi-Monte Carlo integration, Journal of Complexity 19 (2003), 101–124.