 
On the Solution of Optimization and Variational Problems with Imperfect Information

Uday V. Shanbhag (with Hao Jiang (@Illinois) and Hesam Ahmadi (@PSU))
Harold and Inge Marcus Department of Industrial and Manufacturing Engineering
Pennsylvania State University, University Park, PA

6th International Conference on Complementarity Problems (ICCP)
Humboldt-Universität zu Berlin, Berlin
August 8, 2014
A misspecified optimization problem I

◮ A prototypical misspecified* convex program, where $\theta^* \in \mathbb{R}^m$ is misspecified:

\[ (C(\theta^*)) \qquad \min_{x \in X} \; f(x, \theta^*) \]

◮ Generally, $\theta^*$ captures problem characteristics that may require estimation:
  ◮ Parameters of cost/price functions
  ◮ Efficiencies
  ◮ Representation of uncertainty
◮ Generally, this is part of the model-building process.
◮ Traditionally, there is a dichotomy in the roles of statisticians and optimizers:
  1. Statisticians learn (build the model, estimate parameters)
  2. Optimizers search (use the model/parameters to obtain a solution)
◮ Increasingly, the serial nature cannot persist.

* This is parametric misspecification (as opposed to model misspecification).
Offline learning I

◮ One avenue lies in collecting observations a priori.
◮ The learning problem $(L_\theta)$ is unaffected by the computational problem $(C(\theta^*))$:

\[ (L_\theta) \qquad \min_{\theta \in \Theta} \; g(\theta) \]

Concerns:
◮ Exact solutions are generally unavailable in finite time; in stochastic regimes, the solution error can be bounded (at best) in an expected-value sense.
◮ Premature termination of the learning process leads to an estimate $\hat\theta$; the error cascades into the computational problem, giving $\hat{x} \in \mathrm{SOL}(C(\hat\theta))$.
◮ It is unclear how to develop an implementable scheme that produces $x^*$ (a):
  ◮ (First-order) schemes that produce $x^*$ and $\theta^*$ asymptotically
  ◮ Non-asymptotic error bounds

(a) Note that schemes that produce approximations are available based on Lipschitzian properties.
An example I
An example II
An example III
Data-driven stochastic programming I

◮ Consider the following static stochastic program:

\[ (C_{\theta^*}) \qquad \min_{x \in X} \; \mathbb{E}[f(x, \xi_{\theta^*}(\omega))], \]

where $f : \mathbb{R}^n \times \mathbb{R}^d \to \mathbb{R}$, $\xi_{\theta^*} : \Omega \to \mathbb{R}^d$, and $(\Omega, \mathcal{F}, \mathbb{P}_{\theta^*})$ represents the probability space.
◮ Traditionally, the parameters of this distribution are estimated a priori (by MLE approaches, for instance). This is often a challenging problem (such as covariance selection).
Misspecified production planning problems I

◮ The production planner solves the following problem:

\[
\begin{aligned}
\min_{x_{fi} \ge 0} \quad & \sum_{f=1}^{N} \sum_{i=1}^{W} c_{fi}(x_{fi}) \\
\text{subject to} \quad & x_{fi} \le \mathrm{cap}_{fi}, \quad \text{for all } f, i, \\
& \sum_{f=1}^{N} x_{fi} = d_i.
\end{aligned}
\tag{1}
\]

◮ Machine type $f$'s production cost at node $i$ at time $l$, $l = 1, \dots, T$, is $c^{(l)}_{fi}(x^{(l)}_{fi})$:

\[ c^{(l)}_{fi}(x^{(l)}_{fi}) = d_{fi} (x^{(l)}_{fi})^2 + h_{fi} x^{(l)}_{fi} + \xi^{(l)}_{fi}. \]

◮ The planner will solve the following problem to estimate $d_{fi}$ and $h_{fi}$:

\[ \min_{\{d_{fi}, h_{fi}\} \in \Theta} \; \sum_{l=1}^{T} \sum_{f=1}^{N} \sum_{i=1}^{W} \left( d_{fi} (x^{(l)}_{fi})^2 + h_{fi} x^{(l)}_{fi} - c^{(l)}_{fi}(x^{(l)}_{fi}) \right)^2. \]
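The estimation problem on this slide separates across the pairs (f, i), so each pair reduces to a small least-squares fit. The sketch below illustrates this for a single pair under assumptions that are not in the slides: Θ is taken to be the whole space (no constraints on the coefficients), the function name `estimate_cost_coefficients` is hypothetical, and the data are synthetic.

```python
import numpy as np

def estimate_cost_coefficients(x_obs, c_obs):
    """Least-squares fit of (d, h) in c = d * x**2 + h * x + noise for one (f, i) pair."""
    x_obs = np.asarray(x_obs, dtype=float)
    c_obs = np.asarray(c_obs, dtype=float)
    # Design matrix with columns x^2 and x (no intercept, matching the cost model).
    A = np.column_stack([x_obs**2, x_obs])
    coeffs, *_ = np.linalg.lstsq(A, c_obs, rcond=None)
    d_hat, h_hat = coeffs
    return d_hat, h_hat

# Synthetic data (assumption): T = 200 observations with true d = 0.5 and h = 2.0.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=200)
c = 0.5 * x**2 + 2.0 * x + rng.normal(scale=0.1, size=200)
d_hat, h_hat = estimate_cost_coefficients(x, c)
```

If Θ imposes bounds on the coefficients, the unconstrained fit would need to be replaced by a constrained least-squares solve.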
A framework for learning and computation I

\[ (C(\theta^*)) \qquad \min_{x \in X} \; f(x, \theta^*) \]
\[ (L_\theta) \qquad \min_{\theta \in \Theta} \; g(\theta) \]

Our focus is on general-purpose algorithms that jointly generate sequences $\{x_k\}$ and $\{\theta_k\}$ with the following goals:

\[ \lim_{k \to \infty} x_k = x^* \quad \text{and} \quad \lim_{k \to \infty} \theta_k = \theta^* \qquad \text{(Global convergence)} \]
\[ \| f(x_K, \theta_K) - f(x^*, \theta^*) \| \le O(h(K)) \qquad \text{(Rate statements)} \]

where $h(K)$ specifies the rate.
A serial approach

1. Compute a solution $\tilde\theta$ to $(L_\theta)$.
2. Use this solution to solve $(C(\tilde\theta))$.

Challenges:
◮ Given the stage-wise nature, step 1 needs to provide an accurate/exact $\tilde\theta$ in finite time; this is possible for small problems.
◮ In stochastic regimes, solution bounds are available in an expected-value sense: $\mathbb{E}[\|\theta_K - \theta^*\|^2] \le O(1/K)$.
◮ In fact, unless the learning problem is solvable via a finite-termination algorithm, asymptotic statements are unavailable.
A complementarity approach

◮ A direct variational approach: under convexity assumptions, the equilibrium conditions are given by $\mathrm{VI}(Z, H)$, where

\[ H(z) \triangleq \begin{pmatrix} F(x, \theta) \\ \nabla_\theta g(\theta) \end{pmatrix} \quad \text{and} \quad Z \triangleq X \times \Theta. \]

Challenges:
◮ The problem is rarely monotone, and low-complexity first-order projection/stochastic approximation schemes cannot accommodate such problems.
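To see how monotonicity of the joint map can fail even when both problems are convex, the following sketch checks a small instance. Everything in it is an assumption made for illustration: the functions f and g are hypothetical, F is taken to be the gradient of f in x, and Z is the whole plane. For an affine map on the whole space, monotonicity is equivalent to the symmetric part of the Jacobian being positive semidefinite.

```python
import numpy as np

# Hypothetical instance (not from the slides):
#   f(x, theta) = 0.5 * x**2 - 3 * theta * x   (convex in x for every theta)
#   g(theta)    = 0.5 * (theta - 1)**2         (convex in theta)
# Assuming F(x, theta) = grad_x f(x, theta), the joint map is
#   H(x, theta) = (x - 3 * theta, theta - 1), with constant Jacobian J.
J = np.array([[1.0, -3.0],
              [0.0, 1.0]])

# H is monotone on R^2 iff the symmetric part of J is positive semidefinite.
sym_part = 0.5 * (J + J.T)
eigenvalues = np.linalg.eigvalsh(sym_part)
is_monotone = bool(eigenvalues.min() >= 0.0)
```

Here the coupling term between x and θ is large enough that the symmetric part has a negative eigenvalue, so the joint map is not monotone although each problem is strongly convex on its own.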
Research questions

◮ First-order schemes are available for the solution of deterministic/stochastic convex optimization and monotone variational problems.
◮ Can we develop analogous schemes that guarantee global/a.s. convergence?†
◮ Can rate statements be provided for such schemes?
  ◮ Are the original rates preserved?
  ◮ What is the price of learning in terms of the modification/degradation in rates?

† Not immediate, since the problems can be viewed as non-monotone VIs/SVIs.
Outline

Part I: Deterministic problems
◮ Gradient methods for smooth/nonsmooth and strongly convex/convex optimization
◮ Extragradient and regularization methods for monotone variational inequality problems

Part II: Stochastic problems
◮ Stochastic approximation schemes for strongly convex/convex stochastic optimization with stochastic learning problems
◮ Regularized stochastic approximation for monotone stochastic variational inequality problems with stochastic learning problems
Literature Review

Static decision-making problems with perfect information
◮ Optimization: convex programming [BNO03], integer programming [NW99], stochastic programming [BL97]
◮ Variational inequality problems [FP03a]

Learning
◮ Linear and nonlinear regression, support vector machines (SVMs), etc. [HTF01]

Joint schemes for related problems
◮ Adaptive control [AW94], iterative learning (tracking) control [Moo93]
◮ Bandit problems [Git89], regret problems [Zin03]
◮ Relatively less on joint schemes focusing on stylized problems in revenue management [CHdMK06, HKZ, CHdMK12]
Misspecified deterministic optimization

Consider the static misspecified convex optimization problem $(C(\theta^*))$:

\[ (C(\theta^*)) \qquad \min_{x \in X} \; f(x, \theta^*), \]

where $x \in \mathbb{R}^n$ and $f : X \times \Theta \to \mathbb{R}$ is a convex function in $x$ for every $\theta \in \Theta \subseteq \mathbb{R}^m$. Suppose $\theta^*$ denotes the solution to a convex learning problem denoted by $(L)$:

\[ (L) \qquad \min_{\theta \in \Theta} \; g(\theta), \]

where $g : \mathbb{R}^m \to \mathbb{R}$ is a convex function in $\theta$ and is defined on a closed and convex set $\Theta$.
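A joint gradient algorithm

Algorithm 1 (Joint gradient scheme). Given $x_0 \in X$, $\theta_0 \in \Theta$, and sequences $\{\gamma_{f,k}\}$, $\{\gamma_{g,k}\}$, for all $k \ge 0$:

\[
\begin{aligned}
x_{k+1} &:= \Pi_X\big(x_k - \gamma_{f,k} \nabla_x f(x_k, \theta_k)\big), && (\mathrm{Opt}(\theta_k)) \\
\theta_{k+1} &:= \Pi_\Theta\big(\theta_k - \gamma_{g,k} \nabla_\theta g(\theta_k)\big). && (\mathrm{Learn})
\end{aligned}
\]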
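The two updates of Algorithm 1 can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: the function name `joint_gradient`, the quadratic test instance, the box sets, and the use of constant steplengths in place of the sequences are all assumptions made here.

```python
import numpy as np

def joint_gradient(grad_x_f, grad_g, proj_X, proj_Theta, x0, theta0,
                   gamma_f, gamma_g, num_iters):
    """Run the joint scheme; both updates use the current pair (x_k, theta_k)."""
    x = np.asarray(x0, dtype=float)
    theta = np.asarray(theta0, dtype=float)
    for _ in range(num_iters):
        x_next = proj_X(x - gamma_f * grad_x_f(x, theta))         # (Opt(theta_k))
        theta_next = proj_Theta(theta - gamma_g * grad_g(theta))  # (Learn)
        x, theta = x_next, theta_next
    return x, theta

# Hypothetical instance: f(x, theta) = 0.5 * ||x - theta||^2 and
# g(theta) = 0.5 * ||theta - theta_star||^2, so x* = theta* = theta_star.
theta_star = np.array([1.0, 2.0])

def project_box(v):
    """Projection onto the box [-5, 5]^2, used for both X and Theta (assumption)."""
    return np.clip(v, -5.0, 5.0)

x_K, theta_K = joint_gradient(
    grad_x_f=lambda x, th: x - th,
    grad_g=lambda th: th - theta_star,
    proj_X=project_box, proj_Theta=project_box,
    x0=np.zeros(2), theta0=np.zeros(2),
    gamma_f=0.5, gamma_g=0.5, num_iters=200,
)
```

Note that the x-update never sees θ*; it only uses the current estimate θ_k, which is what distinguishes the joint scheme from solving the learning problem first.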
Assumptions

Assumption 1. The function $f(x, \theta)$ is continuously differentiable in $x$ for all $\theta \in \Theta$, and the function $g$ is continuously differentiable in $\theta$.

Assumption 2. The gradient map $\nabla_x f(x; \theta)$ is Lipschitz continuous in $x$ with constant $G_{f,x}$ uniformly over $\theta \in \Theta$, or

\[ \|\nabla_x f(x_1, \theta) - \nabla_x f(x_2, \theta)\| \le G_{f,x} \|x_1 - x_2\|, \quad \forall x_1, x_2 \in X, \ \forall \theta \in \Theta. \]

Additionally, the gradient map $\nabla_\theta g$ is Lipschitz continuous in $\theta$ with constant $G_g$.

Assumption 3. Let $\{\gamma_{f,k}\}$ and $\{\gamma_{g,k}\}$ be diminishing nonnegative sequences chosen such that

\[ \sum_{k=1}^{\infty} \gamma_{f,k} = \infty, \quad \sum_{k=1}^{\infty} \gamma_{f,k}^2 < \infty, \quad \sum_{k=1}^{\infty} \gamma_{g,k} = \infty, \quad \text{and} \quad \sum_{k=1}^{\infty} \gamma_{g,k}^2 < \infty. \]
Constant steplength schemes for strongly convex problems I

Assumption 4. The function $f$ is strongly convex in $x$ with constant $\eta_f$ for all $\theta \in \Theta$, and the function $g$ is strongly convex with constant $\eta_g$.

Assumption 5. The gradient $\nabla_x f(x^*, \theta)$ is Lipschitz continuous in $\theta$ with constant $L_\theta$.

Proposition 1 (Rate analysis in strongly convex regimes). Let Assumptions 1, 2, 4, and 5 hold. In addition, assume that $\gamma_f$ and $\gamma_g$ are chosen such that $\gamma_f \le \min(2\eta_f / G_{f,x}^2, 1/L_\theta)$ and $\gamma_g \le 2/G_g$. Let $\{x_k, \theta_k\}$ be the sequence generated by Algorithm 1. Then for every $k \ge 0$, we have the following:

\[ \|x_{k+1} - x^*\| \le q_x^{k+1} \|x_0 - x^*\| + k q_\theta q^k \|\theta_0 - \theta^*\|, \]

where $q_x \triangleq (1 + \gamma_f^2 G_{f,x}^2 - 2\gamma_f \eta_f)^{1/2}$, $q_\theta \triangleq \gamma_f L_\theta$, $q_g \triangleq (1 + \gamma_g^2 G_g^2 - 2\gamma_g \eta_g)^{1/2}$, and $q \triangleq \max(q_x, q_g)$.
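The constants in Proposition 1 are simple to evaluate for given problem data, which helps when choosing steplengths. The helper below is a sketch that transcribes the definitions of q_x, q_θ, q_g, and q; the function name `contraction_factors` and the numerical values are hypothetical.

```python
import math

def contraction_factors(gamma_f, gamma_g, eta_f, eta_g, G_fx, G_g, L_theta):
    """Return (q_x, q_theta, q_g, q) as defined in Proposition 1."""
    q_x = math.sqrt(1.0 + gamma_f**2 * G_fx**2 - 2.0 * gamma_f * eta_f)
    q_theta = gamma_f * L_theta
    q_g = math.sqrt(1.0 + gamma_g**2 * G_g**2 - 2.0 * gamma_g * eta_g)
    return q_x, q_theta, q_g, max(q_x, q_g)

# Hypothetical constants: eta_f = eta_g = 1, G_fx = G_g = 2, L_theta = 1.
# The steplengths satisfy gamma_f <= min(2 * eta_f / G_fx**2, 1 / L_theta) = 0.5
# and gamma_g <= 2 / G_g = 1.
q_x, q_theta, q_g, q = contraction_factors(
    gamma_f=0.25, gamma_g=0.25, eta_f=1.0, eta_g=1.0,
    G_fx=2.0, G_g=2.0, L_theta=1.0,
)
```

With these values both q_x and q_g are below one, so each term of the bound decays with k.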
Constant steplength schemes for strongly convex problems II

Remark: Notably, learning leads to a degradation in the convergence rate from the standard linear rate to a sub-linear rate. Furthermore, it is easily seen that when we have access to the true $\theta^*$, the original rate may be recovered.‡

[Figure 1: Strongly convex problems and learning: constant steplength (left) and diminishing steplength (right). Both panels plot generation cost against iteration for three curves: "Optimization and learning", "Optimization", and "Optimal generation cost".]