Computational complexity of stochastic programs
A. Shapiro
School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205, USA
East Coast Optimization Meeting 2019

Consider optimization problem
$$\min_{x \in X} \big\{ f(x) = \mathbb{E}[F(x,\xi)] \big\},$$
where $X \subset \mathbb{R}^n$, $F : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ and $\xi$ is an $m$-dimensional random vector. In case of two-stage linear stochastic programming with recourse, $X = \{x \in \mathbb{R}^n_+ : Ax = b\}$ and $F(x,\xi)$ is the first stage cost $c^\top x$ plus the optimal value of the second stage problem
$$\min_{y \in \mathbb{R}^m} q^\top y \quad \text{subject to} \quad Tx + Wy = h,\; y \ge 0,$$
with $\xi$ formed from random components of $q, T, W, h$. For fixed $x \in X$ the expectation $\mathbb{E}[F(x,\xi)]$ is given by the integral
$$\mathbb{E}[F(x,\xi)] = \int F(x,z)\, dP(z),$$
where $P$ is the probability distribution of $\xi$.

A standard approach to solving such stochastic programs is to discretize distribution $P$, i.e., to construct scenarios $\xi_k$, $k = 1, \dots, K$, with assigned probabilities $p_k > 0$, and hence to approximate $\mathbb{E}[F(x,\xi)]$ by $\sum_{k=1}^K p_k F(x,\xi_k)$. In the two-stage linear case this leads to the linear program
$$\begin{array}{ll} \min\limits_{x,\, y_1, \dots, y_K} & c^\top x + \sum_{k=1}^K p_k q_k^\top y_k \\ \text{s.t.} & T_k x + W_k y_k = h_k,\; k = 1, \dots, K, \\ & Ax = b,\; x \ge 0,\; y_k \ge 0,\; k = 1, \dots, K. \end{array}$$
In order to have an accurate approximation of the 'true' distribution $P$, the number $K$ of required scenarios typically grows exponentially with dimension $m$.

Computational complexity of solving two-stage linear stochastic programs (deterministic point of view): computing approximate solutions, with sufficiently high accuracy, of linear two-stage stochastic programs with fixed recourse is #P-hard even if the random problem data is governed by independent uniform distributions (Dyer and Stougie, 2006; Hanasusanto, Kuhn and Wiesemann, 2016).

Sample complexity of solving stochastic programs. Generate a sample $\xi^j$, $j = 1, \dots, N$, of random vector $\xi$ and approximate the expectation $\mathbb{E}[F(x,\xi)]$ by the respective sample average. This leads to the following so-called Sample Average Approximation (SAA) of the 'true' problem
$$\min_{x \in X} \Big\{ \hat{f}_N(x) = \frac{1}{N} \sum_{j=1}^N F(x,\xi^j) \Big\}.$$
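The SAA construction can be illustrated on a one-dimensional toy problem. The sketch below is not from the slides: it assumes a newsvendor-type cost $F(x,\xi) = cx + b\max(\xi - x, 0) + h\max(x - \xi, 0)$ with hypothetical parameters $c = 1$, $b = 1.5$, $h = 0.1$ and demand $\xi$ uniform on $[0, 100]$, for which the true minimizer is the quantile of level $(b-c)/(b+h)$, i.e. $31.25$. Since the SAA objective is piecewise linear and convex, it is minimized here by enumerating its kinks.

```python
import random

def F(x, xi, c=1.0, b=1.5, h=0.1):
    """Assumed newsvendor cost: ordering cost plus shortage and holding penalties."""
    return c * x + b * max(xi - x, 0.0) + h * max(x - xi, 0.0)

def saa_objective(x, sample):
    """Sample average f_N(x) = (1/N) * sum_j F(x, xi_j)."""
    return sum(F(x, xi) for xi in sample) / len(sample)

def solve_saa(sample):
    """Minimize the SAA objective over x >= 0.

    The objective is convex and piecewise linear with kinks at the sample
    points, so a minimizer is attained at x = 0 or at one of the sample points.
    """
    candidates = [0.0] + sorted(sample)
    return min(candidates, key=lambda x: saa_objective(x, sample))

rng = random.Random(0)                      # fixed seed for reproducibility
sample = [rng.uniform(0.0, 100.0) for _ in range(1000)]
x_hat = solve_saa(sample)                   # SAA optimal solution
v_hat = saa_objective(x_hat, sample)        # SAA optimal value
print(x_hat, v_hat)
```

The enumeration costs $O(N^2)$ evaluations of $F$, which is acceptable for a demonstration; a real two-stage problem would require solving a linear program instead.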

Slow convergence of the sample average $\hat{f}_N(x)$ to the expectation $f(x)$. By the Central Limit Theorem, for fixed $x$ the error
$$\hat{f}_N(x) - f(x) = O_p(N^{-1/2}).$$
Let $\hat{v}_N$ be the optimal value of the SAA problem and $v^0$ and $S^0$ be the optimal value and set of optimal solutions of the true problem. Then under mild regularity conditions
$$\hat{v}_N = \min_{x \in S^0} \hat{f}_N(x) + o_p(N^{-1/2}).$$
In particular, if $S^0 = \{x^0\}$, then $N^{1/2}[\hat{v}_N - v^0] \Rightarrow \mathcal{N}(0, \sigma^2(x^0))$ (Shapiro, 1991).

Large Deviations type bounds. Suppose that: $\varepsilon > \delta \ge 0$; the set $X$ is of finite diameter $D$; there is a constant $\sigma > 0$ such that
$$M_{x',x}(t) \le \exp\{\sigma^2 t^2 / 2\}, \quad t \in \mathbb{R},\; x', x \in X,$$
where $M_{x',x}(t)$ is the moment generating function of the random variable $F(x',\xi) - F(x,\xi) - \mathbb{E}[F(x',\xi) - F(x,\xi)]$; there exists $\kappa(\xi)$ such that its moment generating function is finite valued in a neighborhood of zero and
$$\big| F(x',\xi) - F(x,\xi) \big| \le \kappa(\xi) \| x' - x \|, \quad x', x \in X \text{ and a.e. } \xi.$$
Then for $L = \mathbb{E}[\kappa(\xi)]$ and sample size
$$N \ge \frac{8\sigma^2}{(\varepsilon - \delta)^2} \left[ n \log\left( \frac{O(1) D L}{\varepsilon - \delta} \right) + \log\left( \frac{2}{\alpha} \right) \right],$$
we are guaranteed that $\Pr\big( \hat{S}^{\delta}_N \subset S^{\varepsilon} \big) \ge 1 - \alpha$. Here $\hat{S}^{\delta}_N$ and $S^{\varepsilon}$ are the sets of $\delta$-optimal and $\varepsilon$-optimal solutions of the SAA and true problems respectively.

Stochastic Approximation (SA) approach. Suppose that the problem is convex, i.e., the feasible set $X$ is convex and $F(\cdot,\xi)$ is convex for a.e. $\xi$. Classical SA algorithm
$$x_{j+1} = \Pi_X\big( x_j - \gamma_j G(x_j, \xi^j) \big),$$
where $G(x,\xi) \in \partial_x F(x,\xi)$ is a calculated (sub)gradient, $\Pi_X$ is the orthogonal (Euclidean) projection onto $X$ and $\gamma_j = \theta / j$. Theoretical bound (assuming $f(\cdot)$ is strongly convex and differentiable)
$$\mathbb{E}[f(x_j) - v^0] = O(j^{-1}),$$
for an optimal choice of constant $\theta$ (recall that $v^0$ is the optimal value of the true problem). This algorithm is very sensitive to choice of $\theta$.
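A minimal sketch of the classical SA iteration, under assumptions that are not in the slides: the toy objective is $f(x) = \mathbb{E}[(x - \xi)^2 / 2]$ with $\xi \sim \mathcal{N}(3, 1)$, so that $G(x,\xi) = x - \xi$ and the minimizer is $\mathbb{E}[\xi] = 3$; the feasible set is the interval $[-10, 10]$, for which the projection is a simple clipping. The function names are illustrative only.

```python
import random

def project(x, lo, hi):
    """Euclidean projection of a scalar onto the interval [lo, hi]."""
    return min(max(x, lo), hi)

def classical_sa(grad, sampler, x1, lo, hi, theta, n_iter):
    """Run x_{j+1} = Proj(x_j - (theta / j) * G(x_j, xi_j)) for j = 1, ..., n_iter."""
    x = x1
    for j in range(1, n_iter + 1):
        xi = sampler()                       # fresh independent sample xi_j
        x = project(x - (theta / j) * grad(x, xi), lo, hi)
    return x                                 # this is x_{n_iter + 1}

rng = random.Random(0)
x_final = classical_sa(
    grad=lambda x, xi: x - xi,               # gradient of (x - xi)^2 / 2 in x
    sampler=lambda: rng.gauss(3.0, 1.0),     # assumed distribution of xi
    x1=0.0, lo=-10.0, hi=10.0, theta=1.0, n_iter=5000,
)
print(x_final)
```

With $\theta = 1$ and no active projection, the iterate in this example reduces to the running sample mean of $\xi^1, \dots, \xi^j$. Rerunning with a much smaller $\theta$ (for instance $0.1$) shows how slowly the iterates can then approach the minimizer.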

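Robust SA approach (B. Polyak, 1990, Nemirovski). Constant step size variant: fixed in advance sample size (number of iterations) $N$ and step size $\gamma_j \equiv \gamma$, $j = 1, \dots, N$:
$$\tilde{x}_N = \frac{1}{N} \sum_{j=1}^N x_j.$$
Theoretical bound
$$\mathbb{E}[f(\tilde{x}_N) - v^0] \le \frac{D_X^2}{2\gamma N} + \frac{\gamma M^2}{2},$$
where $D_X = \max_{x \in X} \| x - x_1 \|_2$ and $M^2 = \max_{x \in X} \mathbb{E} \| G(x,\xi) \|_2^2$. For optimal (up to factor $\theta$) $\gamma = \frac{\theta D_X}{M \sqrt{N}}$ we have
$$\mathbb{E}[f(\tilde{x}_N) - v^0] \le \frac{D_X M}{2\theta \sqrt{N}} + \frac{\theta D_X M}{2 \sqrt{N}} \le \frac{\kappa D_X M}{\sqrt{N}},$$
where $\kappa = \max\{\theta, \theta^{-1}\}$. By Markov inequality it follows that
$$\Pr\big\{ f(\tilde{x}_N) - v^0 > \varepsilon \big\} \le \frac{\kappa D_X M}{\varepsilon \sqrt{N}},$$
and hence, requiring the right hand side to be at most $\alpha$, we arrive at the sample size estimate
$$N \ge \frac{\kappa^2 D_X^2 M^2}{\varepsilon^2 \alpha^2}.$$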
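The constant step size variant and the resulting sample size estimate can be sketched as follows. The test problem is an assumption made for illustration: $f(x) = \mathbb{E}|x - \xi|$ with $\xi$ uniform on $[0, 10]$ and $X = [0, 10]$, so that the minimizer is the median $5$, $f(x) - v^0 = (x - 5)^2 / 10$, $M = 1$, and $D_X = 10$ for the starting point $x_1 = 0$.

```python
import math
import random

def robust_sa(grad, sampler, x1, lo, hi, gamma, n_iter):
    """Constant-step SA on [lo, hi]; returns the average of x_1, ..., x_N."""
    x, total = x1, 0.0
    for _ in range(n_iter):
        total += x                                   # accumulate x_j before updating it
        x = min(max(x - gamma * grad(x, sampler()), lo), hi)
    return total / n_iter

def robust_sa_sample_size(D, M, eps, alpha, theta=1.0):
    """Smallest integer N with N >= (kappa * D * M)^2 / (eps * alpha)^2."""
    kappa = max(theta, 1.0 / theta)
    return math.ceil((kappa * D * M / (eps * alpha)) ** 2)

rng = random.Random(1)
N = 10_000
D_X, M, theta = 10.0, 1.0, 1.0
gamma = theta * D_X / (M * math.sqrt(N))             # step size from the bound above
x_avg = robust_sa(
    grad=lambda x, xi: 1.0 if x > xi else -1.0,      # a subgradient of |x - xi| in x
    sampler=lambda: rng.uniform(0.0, 10.0),
    x1=0.0, lo=0.0, hi=10.0, gamma=gamma, n_iter=N,
)
gap = (x_avg - 5.0) ** 2 / 10.0                      # f(x_avg) - v0 for this toy problem
print(x_avg, gap, robust_sa_sample_size(D_X, M, eps=0.5, alpha=0.25))
```

For these values the bound $\kappa D_X M / \sqrt{N}$ equals $0.1$. The sample size estimate depends on $\alpha$ through $\alpha^{-2}$, in contrast with the logarithmic dependence in the Large Deviations bound for SAA.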

Multistage stochastic programming. Let $\xi_t$ be a random (stochastic) process. Denote $\xi_{[t]} := (\xi_1, \dots, \xi_t)$ the history of the process $\xi_t$ up to time $t$. The values of the decision vector $x_t$, chosen at stage $t$, may depend on the information $\xi_{[t]}$ available up to time $t$, but not on the future observations. The decision process has the form
$$\text{decision}(x_0) \rightsquigarrow \text{observation}(\xi_1) \rightsquigarrow \text{decision}(x_1) \rightsquigarrow \dots \rightsquigarrow \text{observation}(\xi_T) \rightsquigarrow \text{decision}(x_T).$$
Risk neutral $T$-stage stochastic programming problem:
$$\begin{array}{ll} \min\limits_{x_1,\, x_2(\cdot), \dots, x_T(\cdot)} & \mathbb{E}\big[ F_1(x_1) + F_2(x_2(\xi_{[2]}), \xi_2) + \dots + F_T\big(x_T(\xi_{[T]}), \xi_T\big) \big] \\ \text{s.t.} & x_1 \in \mathcal{X}_1,\; x_t(\xi_{[t]}) \in \mathcal{X}_t(x_{t-1}(\xi_{[t-1]}), \xi_t),\; t = 2, \dots, T. \end{array}$$
In linear case, $F_t(x_t, \xi_t) := c_t^\top x_t$ and
$$\mathcal{X}_t(x_{t-1}, \xi_t) := \{ x_t : B_t x_{t-1} + A_t x_t = b_t,\; x_t \ge 0 \}, \quad t = 2, \dots, T.$$

Optimization is performed over feasible policies (also called decision rules). A policy is a sequence of (measurable) functions $x_t = x_t(\xi_{[t]})$, $t = 1, \dots, T$. Each $x_t(\xi_{[t]})$ is a function of the data process up to time $t$; this ensures the nonanticipative property of a considered policy. If the number of realizations (scenarios) of the process $\xi_t$ is finite, then the above (linear) problem can be written as one large (linear) programming problem.

Dynamic programming equations. Going recursively backwards in time. At stage $T$ consider
$$Q_T(x_{T-1}, \xi_T) := \inf_{x_T \in \mathcal{X}_T(x_{T-1}, \xi_T)} F_T(x_T, \xi_T).$$
At stages $t = T-1, \dots, 2$, consider
$$Q_t(x_{t-1}, \xi_{[t]}) := \inf_{x_t \in \mathcal{X}_t(x_{t-1}, \xi_t)} \Big\{ F_t(x_t, \xi_t) + \underbrace{\mathbb{E}\big[ Q_{t+1}(x_t, \xi_{[t+1]}) \,\big|\, \xi_{[t]} \big]}_{\mathcal{Q}_{t+1}(x_t,\, \xi_{[t]})} \Big\}.$$
At the first stage solve:
$$\min_{x_1 \in \mathcal{X}_1} F_1(x_1) + \mathbb{E}[Q_2(x_1, \xi_2)].$$
If the random process is stagewise independent, i.e., $\xi_{t+1}$ is independent of $\xi_{[t]}$, then $\mathcal{Q}_{t+1}(x_t) = \mathbb{E}[Q_{t+1}(x_t, \xi_{t+1})]$ does not depend on $\xi_{[t]}$.
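For a stagewise independent process with finitely many scenarios and a finite state grid, the backward recursion can be sketched directly. Everything in the example is an assumption made for illustration: a hypothetical purchasing problem in which $R = 4$ units must be held by stage $T = 3$, the state $x_t$ is the cumulative amount bought, the random price at stages 2 and 3 is 1 or 3 with probability $1/2$ each, and the first stage price is 2. The stage cost here also depends on $x_{t-1}$, a mild generalization of $F_t(x_t, \xi_t)$.

```python
def expected_cost_to_go(T, states, scenarios, feasible, cost):
    """Backward recursion under stagewise independence.

    Returns a dict mapping x_1 to the expected cost-to-go E[Q_2(x_1, xi_2)].
    scenarios is a list of (probability, xi) pairs shared by stages 2, ..., T.
    """
    EQ = {x: 0.0 for x in states}            # cost-to-go beyond stage T is zero
    for t in range(T, 1, -1):                # t = T, T-1, ..., 2
        new_EQ = {}
        for x_prev in states:
            total = 0.0
            for prob, xi in scenarios:
                # Q_t(x_prev, xi): best stage decision given the observed xi
                q = min(cost(t, x_prev, x, xi) + EQ[x]
                        for x in feasible(t, x_prev, xi))
                total += prob * q
            new_EQ[x_prev] = total
        EQ = new_EQ
    return EQ

R, T = 4, 3
states = range(R + 1)
scenarios = [(0.5, 1.0), (0.5, 3.0)]         # assumed price distribution

def feasible(t, x_prev, xi):
    # At the last stage the full requirement R must be met.
    return [R] if t == T else range(x_prev, R + 1)

def cost(t, x_prev, x, price):
    return price * (x - x_prev)              # pay the current price for new units

EQ2 = expected_cost_to_go(T, states, scenarios, feasible, cost)
first_price = 2.0
x1_opt = min(states, key=lambda x1: first_price * x1 + EQ2[x1])
v_opt = first_price * x1_opt + EQ2[x1_opt]
print(x1_opt, v_opt)
```

In this example the expected cost-to-go works out to $1.5\,(R - x_1)$, so buying nothing at the first stage price of 2 is optimal. The work per stage is proportional to the number of grid states times the number of scenarios times the number of feasible decisions, which is why the approach is limited to low-dimensional state vectors.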

For example, suppose that the problem is linear and only the right hand side vectors $b_t$ are random and can be modeled as a (first order) autoregressive process
$$b_t = \mu + \Phi b_{t-1} + \varepsilon_t,$$
where $\mu$ and $\Phi$ are (deterministic) vector and regression matrix, respectively, and the error process $\varepsilon_t$, $t = 1, \dots, T$, is stagewise independent. The corresponding feasibility constraints can be written in terms of $x_t$ and $b_t$ as
$$B_t x_{t-1} + A_t x_t \le b_t, \quad \Phi b_{t-1} - b_t + \mu + \varepsilon_t = 0.$$
That is, in terms of decision variables $(x_t, b_t)$ this becomes a linear multistage stochastic programming problem governed by the stagewise independent random process $\varepsilon_1, \dots, \varepsilon_T$.

Discretization by Monte Carlo sampling. Independent of each other random samples $\xi_t^j = (c_t^j, B_t^j, A_t^j, b_t^j)$, $j = 1, \dots, N_t$, of respective $\xi_t$, $t = 2, \dots, T$, are generated and the corresponding scenario tree is constructed by connecting every ancestor node at stage $t-1$ with the same set of children nodes $\xi_t^1, \dots, \xi_t^{N_t}$. In that way the stagewise independence is preserved in the generated scenario tree. We refer to the constructed problem as the Sample Average Approximation (SAA) problem. The total number of scenarios of the SAA problem is given by the product $N = \prod_{t=2}^T N_t$ and quickly becomes astronomically large with increase of the number of stages even for moderate values of sample sizes $N_t$.
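The growth of the scenario count is easy to check numerically. The short sketch below uses hypothetical helper names and an assumed choice of $T = 10$ stages with $N_t = 100$ samples per stage; it contrasts the number of scenarios in the tree with the number of random vectors that actually have to be generated.

```python
import math

def total_scenarios(stage_sample_sizes):
    """Number of scenarios in the tree: the product of N_2, ..., N_T."""
    return math.prod(stage_sample_sizes)

def samples_generated(stage_sample_sizes):
    """Number of sampled random vectors: the sum of N_2, ..., N_T."""
    return sum(stage_sample_sizes)

sizes = [100] * 9                    # T = 10 stages, so stages t = 2, ..., 10
print(total_scenarios(sizes))        # 100**9 = 10**18 scenarios
print(samples_generated(sizes))      # only 900 sampled vectors
```

The gap between the two numbers is what makes the equivalent large linear program intractable while stage-by-stage methods that exploit stagewise independence remain applicable.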

For $T = 3$, under certain regularity conditions, for $\varepsilon > 0$ and $\alpha \in (0, 1)$, and the sample sizes $N_1$ and $N_2$ satisfying
$$O(1) \left[ \left( \frac{D_1 L_1}{\varepsilon} \right)^{n_1} \exp\left\{ -\frac{O(1) N_1 \varepsilon^2}{\sigma_1^2} \right\} + \left( \frac{D_2 L_2}{\varepsilon} \right)^{n_2} \exp\left\{ -\frac{O(1) N_2 \varepsilon^2}{\sigma_2^2} \right\} \right] \le \alpha,$$
we have that any first-stage $\varepsilon/2$-optimal solution of the SAA problem is an $\varepsilon$-optimal first-stage solution of the true problem with probability at least $1 - \alpha$. In particular, suppose that $N_1 = N_2$ and take $L := \max\{L_1, L_2\}$, $D := \max\{D_1, D_2\}$, $\sigma^2 := \max\{\sigma_1^2, \sigma_2^2\}$ and $n := \max\{n_1, n_2\}$. Then the required sample size $N_1 = N_2$:
$$N_1 \ge \frac{O(1) \sigma^2}{\varepsilon^2} \left[ n \log\left( \frac{O(1) D L}{\varepsilon} \right) + \log\left( \frac{1}{\alpha} \right) \right],$$
with total number of scenarios $N = N_1^2$ (Shapiro, 2006).
