Computational complexity of stochastic programs



  1. Computational complexity of stochastic programs
     A. Shapiro
     School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205, USA
     East Coast Optimization Meeting 2019

  2. Consider the optimization problem

         min_{x ∈ X} f(x) := E[F(x, ξ)],

     where X ⊂ R^n, F : R^n × R^m → R and ξ is an m-dimensional random vector. In the case of two-stage linear stochastic programming with recourse, X = {x ∈ R^n_+ : Ax = b} and F(x, ξ) is the first-stage cost c^⊤x plus the optimal value of the second-stage problem

         min_{y ∈ R^m}  q^⊤y  subject to  Tx + Wy = h,  y ≥ 0,

     with ξ formed from the random components of q, T, W, h. For fixed x ∈ X the expectation E[F(x, ξ)] is given by the integral

         E[F(x, ξ)] = ∫ F(x, z) dP(z),

     where P is the probability distribution of ξ.
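As a concrete illustration (not part of the slides), the recourse function F(x, ξ) can be evaluated numerically by solving the second-stage LP for a given realization of (q, T, W, h). A minimal sketch using `scipy.optimize.linprog`; all problem data in the example are hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

def second_stage_value(x, q, T, W, h):
    """Optimal value of the second-stage LP: min q'y s.t. Wy = h - Tx, y >= 0."""
    res = linprog(q, A_eq=W, b_eq=h - T @ x, bounds=(0, None))
    if not res.success:
        raise ValueError("second-stage problem is infeasible or unbounded")
    return res.fun

def F(x, c, q, T, W, h):
    """F(x, xi) = first-stage cost c'x plus the second-stage optimal value."""
    return float(c @ x + second_stage_value(x, q, T, W, h))
```

For instance, with one first-stage and one second-stage variable, x = 1, c = 1, q = 2, T = W = 1, h = 3, the second stage forces y = 2, so F = 1 + 4 = 5.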

  3. A standard approach to solving such stochastic programs is to discretize the distribution P, i.e., to construct scenarios ξ^k, k = 1, ..., K, with assigned probabilities p_k > 0, and hence to approximate E[F(x, ξ)] by Σ_{k=1}^K p_k F(x, ξ^k). In the two-stage linear case this leads to the linear program

         min_{x, y_1, ..., y_K}  c^⊤x + Σ_{k=1}^K p_k q_k^⊤ y_k
         s.t.  T_k x + W_k y_k = h_k,  k = 1, ..., K,
               Ax = b,  x ≥ 0,  y_k ≥ 0,  k = 1, ..., K.

     In order to have an accurate approximation of the 'true' distribution P, the number K of required scenarios typically grows exponentially with the dimension m.
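The scenario-based linear program above (the deterministic equivalent, or extensive form) can be assembled mechanically by stacking one copy of the second-stage constraints per scenario. A minimal sketch with dense matrices and `scipy.optimize.linprog`; the tiny two-scenario example in the test is hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

def extensive_form(c, A, b, scenarios):
    """Solve the deterministic equivalent of a two-stage linear program.
    scenarios: list of tuples (p_k, q_k, T_k, W_k, h_k).
    Variables are stacked as (x, y_1, ..., y_K), all nonnegative."""
    n = len(c)
    m = len(scenarios[0][1])          # second-stage dimension
    K = len(scenarios)
    # Objective: c'x + sum_k p_k q_k' y_k
    obj = np.concatenate([c] + [p * q for p, q, _, _, _ in scenarios])
    rows, rhs = [], []
    # First-stage constraints Ax = b (y-columns are zero)
    rows.append(np.hstack([A, np.zeros((A.shape[0], K * m))]))
    rhs.append(b)
    # Scenario constraints T_k x + W_k y_k = h_k
    for k, (_, _, T, W, h) in enumerate(scenarios):
        blk = np.zeros((W.shape[0], n + K * m))
        blk[:, :n] = T
        blk[:, n + k * m : n + (k + 1) * m] = W
        rows.append(blk)
        rhs.append(h)
    res = linprog(obj, A_eq=np.vstack(rows), b_eq=np.concatenate(rhs),
                  bounds=(0, None))
    return res.fun, res.x[:n]
```

Note the block-angular structure: each scenario adds its own y_k columns and constraint rows, which is exactly why K (and hence the LP size) blowing up with m is the computational bottleneck.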

  4. Computational complexity of solving two-stage linear stochastic programs (deterministic point of view): computing approximate solutions, with sufficiently high accuracy, of linear two-stage stochastic programs with fixed recourse is #P-hard even if the random problem data is governed by independent uniform distributions (Dyer and Stougie, 2006; Hanasusanto, Kuhn and Wiesemann, 2016).

     Sample complexity of solving stochastic programs. Generate a sample ξ^j, j = 1, ..., N, of the random vector ξ and approximate the expectation E[F(x, ξ)] by the respective sample average. This leads to the following so-called Sample Average Approximation (SAA) of the 'true' problem:

         min_{x ∈ X}  f̂_N(x) := (1/N) Σ_{j=1}^N F(x, ξ^j).
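The SAA objective is just an empirical average of F over the drawn sample. A one-function sketch (the quadratic F in the example is hypothetical, chosen only because its expectation is easy to check by hand):

```python
import numpy as np

def saa_objective(x, F, sample):
    """SAA estimate of f(x) = E[F(x, xi)]: average of F(x, .) over the sample."""
    return float(np.mean([F(x, xi) for xi in sample]))
```

For example, with F(x, ξ) = (x − ξ)² and the two-point sample {0, 1}, the SAA value at x = 0 is (0 + 1)/2 = 0.5.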

  5. Slow convergence of the sample average f̂_N(x) to the expectation f(x): by the Central Limit Theorem, for fixed x the error

         f̂_N(x) − f(x) = O_p(N^{−1/2}).

     Let v̂_N be the optimal value of the SAA problem, and let v_0 and S_0 be the optimal value and the set of optimal solutions of the true problem. Then under mild regularity conditions

         v̂_N = min_{x ∈ S_0} f̂_N(x) + o_p(N^{−1/2}).

     In particular, if S_0 = {x_0}, then N^{1/2}[v̂_N − v_0] ⇒ N(0, σ²(x_0)) (Shapiro, 1991).
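The O_p(N^{−1/2}) rate is the usual Monte Carlo standard error σ/√N: to halve the error one must quadruple the sample size. A trivial helper makes the scaling explicit (illustrative only):

```python
import math

def mc_stderr(sigma, N):
    """Standard error of the average of N i.i.d. terms with standard
    deviation sigma: sigma / sqrt(N), i.e. error of order N^(-1/2)."""
    return sigma / math.sqrt(N)
```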

  6. Large Deviations type bounds. Suppose that: ε > δ ≥ 0; the set X is of finite diameter D; there is a constant σ > 0 such that

         M_{x′,x}(t) ≤ exp{σ²t²/2},  t ∈ R,  x′, x ∈ X,

     where M_{x′,x}(t) is the moment generating function of the random variable F(x′, ξ) − F(x, ξ) − E[F(x′, ξ) − F(x, ξ)]; and there exists κ(ξ), whose moment generating function is finite valued in a neighborhood of zero, such that

         |F(x′, ξ) − F(x, ξ)| ≤ κ(ξ) ‖x′ − x‖,  x′, x ∈ X and a.e. ξ.

     Then for L := E[κ(ξ)] and sample size

         N ≥ [8σ²/(ε − δ)²] [ n log( O(1)DL/(ε − δ) ) + log( 2/α ) ],

     we are guaranteed that Pr( Ŝ^δ_N ⊂ S^ε ) ≥ 1 − α. Here Ŝ^δ_N and S^ε are the sets of δ-optimal and ε-optimal solutions of the SAA and true problems, respectively.
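The key feature of this bound is that N grows only linearly in the dimension n and logarithmically in 1/α. A small calculator sketch; the theory only guarantees an unspecified universal O(1) constant, which is set here to a hypothetical C = 1 purely for illustration:

```python
import math

def saa_sample_size(n, D, L, sigma, eps, delta, alpha, C=1.0):
    """Sample size N >= [8 sigma^2/(eps-delta)^2] *
    [n log(C*D*L/(eps-delta)) + log(2/alpha)], with the unspecified
    O(1) constant replaced by the hypothetical value C."""
    gap = eps - delta
    return math.ceil(8.0 * sigma**2 / gap**2
                     * (n * math.log(C * D * L / gap) + math.log(2.0 / alpha)))
```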

  7. Stochastic Approximation (SA) approach. Suppose that the problem is convex, i.e., the feasible set X is convex and F(·, ξ) is convex for a.e. ξ. Classical SA algorithm:

         x_{j+1} = Π_X( x_j − γ_j G(x_j, ξ^j) ),

     where G(x, ξ) ∈ ∂_x F(x, ξ) is a calculated (sub)gradient, Π_X is the orthogonal (Euclidean) projection onto X, and γ_j = θ/j. Theoretical bound (assuming f(·) is strongly convex and differentiable):

         E[ f(x_j) − v_0 ] = O(j^{−1}),

     for an optimal choice of the constant θ (recall that v_0 is the optimal value of the true problem). This algorithm is very sensitive to the choice of θ.
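The classical SA iteration is a projected stochastic (sub)gradient step with the γ_j = θ/j schedule. A self-contained sketch; the test problem (F(x, ξ) = (x − ξ)², ξ ~ N(0.5, 0.1), X = [0, 1], θ = 1) is a hypothetical strongly convex example with known minimizer x* = 0.5:

```python
import numpy as np

def classical_sa(x0, grad_sample, project, theta, n_iter, rng):
    """Classical SA: x_{j+1} = Pi_X(x_j - (theta/j) * G(x_j, xi_j))."""
    x = np.asarray(x0, dtype=float)
    for j in range(1, n_iter + 1):
        g = grad_sample(x, rng)            # stochastic (sub)gradient G(x, xi)
        x = project(x - (theta / j) * g)   # Euclidean projection onto X
    return x
```

On the quadratic example the iterates settle near 0.5; a poorly scaled θ (far from the strong-convexity constant) would slow this dramatically, which is the sensitivity the slide refers to.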

  8. Robust SA approach (B. Polyak, 1990; Nemirovski). Constant step size variant: fix in advance the sample size (number of iterations) N and the step size γ_j ≡ γ, j = 1, ..., N, and set x̃_N := (1/N) Σ_{j=1}^N x_j. Theoretical bound:

         E[ f(x̃_N) − v_0 ] ≤ D_X²/(2γN) + γM²/2,

     where D_X := max_{x ∈ X} ‖x − x_1‖_2 and M² := max_{x ∈ X} E‖G(x, ξ)‖_2². For the optimal (up to a factor θ) step size γ := θD_X/(M√N) we have

         E[ f(x̃_N) − v_0 ] ≤ D_X M/(2θ√N) + θD_X M/(2√N) ≤ κ D_X M/√N,

     where κ := max{θ, θ^{−1}}. By the Markov inequality it follows that

         Pr( f(x̃_N) − v_0 > ε ) ≤ κ D_X M/(ε√N),

     and hence we obtain the sample size estimate

         N ≥ κ²D_X²M²/(ε²α²).
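The robust variant differs from classical SA only in using a constant step and returning the iterate average. A sketch, reusing the same hypothetical quadratic test problem as before and the illustrative choice γ = 1/√N (i.e., taking θD_X/M = 1):

```python
import numpy as np

def robust_sa(x0, grad_sample, project, gamma, N, rng):
    """Robust SA: constant step size gamma for N iterations; the returned
    point is the average of the iterates, not the last iterate."""
    x = np.asarray(x0, dtype=float)
    total = np.zeros_like(x)
    for _ in range(N):
        x = project(x - gamma * grad_sample(x, rng))
        total += x
    return total / N
```

Averaging is what buys the robustness: the O(1/√N) guarantee degrades only by the factor κ = max{θ, 1/θ} when the step size is misscaled, instead of collapsing as in the θ/j scheme.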

  9. Multistage stochastic programming. Let ξ_t be a random (stochastic) process. Denote by ξ_[t] := (ξ_1, ..., ξ_t) the history of the process ξ_t up to time t. The values of the decision vector x_t, chosen at stage t, may depend on the information ξ_[t] available up to time t, but not on future observations. The decision process has the form

         decision(x_0) → observation(ξ_1) → decision(x_1) → ··· → observation(ξ_T) → decision(x_T).

     Risk neutral T-stage stochastic programming problem:

         min_{x_1, x_2(·), ..., x_T(·)}  E[ F_1(x_1) + F_2(x_2(ξ_[2]), ξ_2) + ··· + F_T(x_T(ξ_[T]), ξ_T) ]
         s.t.  x_1 ∈ X_1,  x_t(ξ_[t]) ∈ X_t(x_{t−1}(ξ_[t−1]), ξ_t),  t = 2, ..., T.

     In the linear case,

         F_t(x_t, ξ_t) := c_t^⊤ x_t  and  X_t(x_{t−1}, ξ_t) := { x_t : B_t x_{t−1} + A_t x_t = b_t, x_t ≥ 0 },  t = 2, ..., T.

  10. Optimization is performed over feasible policies (also called decision rules). A policy is a sequence of (measurable) functions x_t = x_t(ξ_[t]), t = 1, ..., T. Each x_t(ξ_[t]) is a function of the data process up to time t; this ensures the nonanticipativity of the considered policy. If the number of realizations (scenarios) of the process ξ_t is finite, then the above (linear) problem can be written as one large (linear) programming problem.

  11. Dynamic programming equations: going recursively backwards in time. At stage T consider

          Q_T(x_{T−1}, ξ_T) := inf_{x_T ∈ X_T(x_{T−1}, ξ_T)} F_T(x_T, ξ_T).

      At stages t = T−1, ..., 2, consider

          Q_t(x_{t−1}, ξ_[t]) := inf_{x_t ∈ X_t(x_{t−1}, ξ_t)} { F_t(x_t, ξ_t) + E[ Q_{t+1}(x_t, ξ_[t+1]) | ξ_[t] ] },

      where the conditional expectation term is denoted 𝒬_{t+1}(x_t, ξ_[t]). At the first stage solve

          min_{x_1 ∈ X_1}  F_1(x_1) + E[ Q_2(x_1, ξ_2) ].

      If the random process is stagewise independent, i.e., ξ_{t+1} is independent of ξ_[t], then 𝒬_{t+1}(x_t) = E[Q_{t+1}(x_t, ξ_{t+1})] does not depend on ξ_[t].
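Under stagewise independence the backward recursion can be carried out numerically when the per-stage scenarios are finite. The sketch below restricts decisions to a finite candidate grid, which is a toy discretization for illustration only (real multistage solvers exploit convexity of the cost-to-go functions instead of enumerating a grid); the two-stage test instance is hypothetical:

```python
def backward_dp(stage_cost, feasible, scen, grid, T):
    """Backward recursion for a stagewise-independent process.
    scen[t] is a finite list of (probability, xi) pairs for stage t;
    feasible(t, x_prev, xi) returns the feasible decisions from grid.
    Returns the expected cost-to-go of stage 1 as a dict over x_prev."""
    Q_next = {x: 0.0 for x in grid}      # cost-to-go beyond stage T is zero
    for t in range(T, 0, -1):
        Q = {}
        for x_prev in grid:
            # E over xi_t of: min over feasible x_t of stage cost + future cost
            Q[x_prev] = sum(
                p * min(stage_cost(t, x, xi) + Q_next[x]
                        for x in feasible(t, x_prev, xi))
                for p, xi in scen[t]
            )
        Q_next = Q
    return Q_next
```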

  12. For example, suppose that the problem is linear and only the right-hand-side vectors b_t are random and can be modeled as a (first order) autoregressive process

          b_t = µ + Φ b_{t−1} + ε_t,

      where µ and Φ are a (deterministic) vector and regression matrix, respectively, and the error process ε_t, t = 1, ..., T, is stagewise independent. The corresponding feasibility constraints can be written in terms of x_t and b_t as

          B_t x_{t−1} + A_t x_t ≤ b_t,   Φ b_{t−1} − b_t + µ + ε_t = 0.

      That is, in terms of the decision variables (x_t, b_t), this becomes a linear multistage stochastic programming problem governed by the stagewise independent random process ε_1, ..., ε_T.
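Simulating the autoregressive right-hand side is straightforward; a minimal sketch (the numeric values in the example are hypothetical, chosen so the process sits at its fixed point when the errors vanish):

```python
import numpy as np

def simulate_rhs(mu, Phi, b0, eps):
    """Simulate b_t = mu + Phi b_{t-1} + eps_t for the given sequence of
    stagewise-independent errors eps[0], ..., eps[T-1]; returns [b_1, ..., b_T]."""
    b = [np.asarray(b0, dtype=float)]
    for e in eps:
        b.append(mu + Phi @ b[-1] + e)
    return b[1:]
```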

  13. Discretization by Monte Carlo sampling. Random samples ξ^j_t = (c^j_t, B^j_t, A^j_t, b^j_t), j = 1, ..., N_t, of the respective ξ_t, t = 2, ..., T, are generated independently of each other, and the corresponding scenario tree is constructed by connecting every ancestor node at stage t−1 with the same set of children nodes ξ^1_t, ..., ξ^{N_t}_t. In that way the stagewise independence is preserved in the generated scenario tree. We refer to the constructed problem as the Sample Average Approximation (SAA) problem. The total number of scenarios of the SAA problem is given by the product N = Π_{t=2}^T N_t and quickly becomes astronomically large as the number of stages increases, even for moderate values of the sample sizes N_t.
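The scenario count N = Π_{t=2}^T N_t is a one-line computation, but it makes the blow-up concrete: even a modest branching factor per stage becomes astronomical over a handful of stages (illustrative numbers below):

```python
import math

def total_scenarios(branching):
    """Total scenarios of the SAA tree: the product of the per-stage
    sample sizes N_2, ..., N_T."""
    return math.prod(branching)
```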

  14. For T = 3, under certain regularity conditions, for ε > 0 and α ∈ (0, 1), and sample sizes N_1 and N_2 satisfying

          O(1) [ (D_1 L_1/ε)^{n_1} exp( −O(1) N_1 ε²/σ_1² ) + (D_2 L_2/ε)^{n_2} exp( −O(1) N_2 ε²/σ_2² ) ] ≤ α,

      we have that any ε/2-optimal first-stage solution of the SAA problem is an ε-optimal first-stage solution of the true problem with probability at least 1 − α. In particular, suppose that N_1 = N_2 and take L := max{L_1, L_2}, D := max{D_1, D_2}, σ² := max{σ_1², σ_2²} and n := max{n_1, n_2}. Then the required sample size N_1 = N_2 satisfies

          N_1 ≥ [O(1)σ²/ε²] [ n log( O(1)DL/ε ) + log( 1/α ) ],

      with total number of scenarios N = N_1² (Shapiro, 2006).
