
On the Solution of Optimization and Variational Problems with Imperfect Information - PowerPoint PPT Presentation



  1. On the Solution of Optimization and Variational Problems with Imperfect Information. Uday V. Shanbhag (with Hao Jiang (@Illinois) and Hesam Ahmadi (@PSU)), Harold and Inge Marcus Department of Industrial and Manufacturing Engineering, Pennsylvania State University, University Park, PA. 6th International Conference on Complementarity Problems (ICCP), Humboldt-Universität zu Berlin, Berlin, August 8, 2014.

  2. A misspecified optimization problem I
A prototypical misspecified* convex program, where θ* ∈ R^m is misspecified:

    (C(θ*))    min_{x ∈ X} f(x, θ*)

Generally, θ* captures problem characteristics that may require estimation:
◮ Parameters of cost/price functions
◮ Efficiencies
◮ Representation of uncertainty
Generally, this is part of the model-building process.
◮ Traditionally, there has been a dichotomy between the roles of statisticians and optimizers:
  1. Statisticians learn (build the model, estimate its parameters)
  2. Optimizers search (use the model/parameters to obtain a solution)
◮ Increasingly, this serial division of labor cannot persist.
* This is parametric misspecification (as opposed to model misspecification).
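A minimal (entirely invented) numerical instance may help fix ideas: take f(x, θ*) = (x − θ*)^2 over X = [0, 10], with θ* the mean of an observable quantity that must be estimated from samples. A numpy sketch, assuming these toy choices:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star = 3.0                                  # true but unknown parameter
samples = theta_star + rng.normal(size=1000)      # noisy observations of theta*

# Computational problem C(theta): min_{x in [0, 10]} (x - theta)^2
def solve_C(theta):
    return np.clip(theta, 0.0, 10.0)              # closed-form projected minimizer

# Learning problem L: min_theta g(theta), with g the mean squared error
theta_hat = samples.mean()                        # closed-form least-squares solution

x_hat = solve_C(theta_hat)                        # estimation error cascades into x_hat
print(abs(x_hat - solve_C(theta_star)))           # here |x_hat - x*| <= |theta_hat - theta*|
```

Even in this trivially solvable instance, the quality of x̂ is entirely controlled by the quality of θ̂, which is the coupling the talk is about.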

  3. Offline learning I
◮ One avenue lies in collecting observations a priori.
◮ The learning problem (L_θ) is unaffected by the computational problem C(θ*):

    (L_θ)    min_{θ ∈ Θ} g(θ)

Concerns:
◮ Exact solutions are generally unavailable in finite time; in stochastic regimes, the solution error can (at best) be bounded in an expected-value sense.
◮ Premature termination of the learning process yields an estimate θ̂, and the error cascades into the computational problem: x̂ ∈ SOL(C(θ̂)).
◮ It is unclear how to develop an implementable scheme that produces x*, i.e.:
  ◮ (first-order) schemes that produce x* and θ* asymptotically;
  ◮ non-asymptotic error bounds.
(Note that schemes producing approximations are available, based on Lipschitzian properties.)

  4. An example I

  5. An example II

  6. An example III

  7. Data-driven stochastic programming I
◮ Consider the following static stochastic program:

    (C_{θ*})    min_{x ∈ X} E[f(x, ξ_{θ*}(ω))],

where f : R^n × R^d → R, ξ_{θ*} : Ω → R^d, and (Ω, F, P_{θ*}) denotes the probability space.
◮ Traditionally, the parameters of this distribution are estimated a priori (by maximum-likelihood (MLE) approaches, for instance). This is often a challenging problem in its own right (covariance selection, for example).
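As a hedged illustration of this a priori pipeline, suppose ξ_{θ*} is Gaussian demand with unknown mean and standard deviation: MLE then reduces to the sample mean and sample standard deviation, after which a sample-average approximation (SAA) of the stochastic program is solved. The newsvendor-style cost and all constants below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
demand_data = rng.normal(100.0, 20.0, size=500)    # historical observations of xi

# Learn: Gaussian maximum-likelihood estimates of the distribution parameters
mu_hat, sigma_hat = demand_data.mean(), demand_data.std()

# Compute: sample-average approximation of min_{x in X} E[f(x, xi)]
xi = rng.normal(mu_hat, sigma_hat, size=10_000)    # samples under the fitted law
grid = np.linspace(0.0, 200.0, 2001)               # X = [0, 200], solved by grid search
cost = lambda x: np.mean(1.0 * np.maximum(x - xi, 0.0)      # holding cost
                         + 4.0 * np.maximum(xi - x, 0.0))   # shortage cost
x_saa = grid[np.argmin([cost(x) for x in grid])]
```

Every downstream quantity here depends on (mu_hat, sigma_hat); misspecify them and the SAA solves the wrong problem, which is precisely the concern raised on the offline-learning slide.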

  8. Misspecified production planning problems I
◮ The production planner solves the following problem:

    min_{x_{fi} ≥ 0}  Σ_{f=1}^{N} Σ_{i=1}^{W} c_{fi}(x_{fi})
    subject to  x_{fi} ≤ cap_{fi}, for all f, i,             (1)
                Σ_{f=1}^{N} x_{fi} = d_i, for all i.

◮ Machine type f's production cost at node i at time l, for l = 1, ..., T:

    c^{(l)}_{fi}(x^{(l)}_{fi}) = d_{fi} (x^{(l)}_{fi})^2 + h_{fi} x^{(l)}_{fi} + ξ^{(l)}_{fi}.

◮ The planner solves the following problem to estimate d_{fi} and h_{fi}:

    min_{ {d_{fi}, h_{fi}} ∈ Θ }  Σ_{l=1}^{T} Σ_{f=1}^{N} Σ_{i=1}^{W} ( d_{fi} (x^{(l)}_{fi})^2 + h_{fi} x^{(l)}_{fi} − c^{(l)}_{fi}(x^{(l)}_{fi}) )^2.
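Since the estimation problem separates across (f, i) pairs, each pair reduces to an ordinary least-squares fit of (d_{fi}, h_{fi}) from the T observed (production, cost) pairs. A sketch on synthetic data (the dimensions, noise level, and coefficient ranges are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
T, N, W = 50, 3, 4                                 # observations, machine types, nodes
d_true = rng.uniform(0.5, 2.0, (N, W))             # ground-truth quadratic coefficients
h_true = rng.uniform(1.0, 5.0, (N, W))             # ground-truth linear coefficients

x_obs = rng.uniform(0.0, 10.0, (T, N, W))          # observed production levels x_fi^(l)
c_obs = (d_true * x_obs**2 + h_true * x_obs
         + 0.1 * rng.normal(size=(T, N, W)))       # observed noisy costs c_fi^(l)

d_hat, h_hat = np.empty((N, W)), np.empty((N, W))
for f in range(N):
    for i in range(W):
        A = np.column_stack([x_obs[:, f, i]**2, x_obs[:, f, i]])  # design matrix [x^2, x]
        coef, *_ = np.linalg.lstsq(A, c_obs[:, f, i], rcond=None)
        d_hat[f, i], h_hat[f, i] = coef
```

The fitted (d_hat, h_hat) then parameterize the planner's cost functions c_{fi} in problem (1).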

  9. A framework for learning and computation I

    (C(θ*))    min_{x ∈ X} f(x, θ*)
    (L_θ)      min_{θ ∈ Θ} g(θ)

Our focus is on general-purpose algorithms that jointly generate sequences {x_k} and {θ_k} with the following goals:

    lim_{k→∞} x_k = x*  and  lim_{k→∞} θ_k = θ*      (Global convergence)
    |f(x_K, θ_K) − f(x*, θ*)| ≤ O(h(K)),             (Rate statements)

where h(K) specifies the rate.

  10. A serial approach
1. Compute a solution θ̃ to (L_θ).
2. Use this solution to solve (C(θ̃)).
Challenges:
◮ Given the stage-wise nature, step 1 must provide an accurate/exact θ̃ in finite time; this is possible only for small problems.
◮ In stochastic regimes, solution bounds are available only in an expected-value sense: E[‖θ_K − θ*‖^2] ≤ O(1/K).
◮ In fact, unless the learning problem is solvable by a finite-termination algorithm, asymptotic statements are unavailable.

  11. A complementarity approach
◮ A direct variational approach: under convexity assumptions, the equilibrium conditions are given by VI(Z, H), where

    H(z) ≜ ( F(x, θ), ∇_θ g(θ) )   and   Z ≜ X × Θ.

Challenges:
◮ The problem is rarely monotone, and low-complexity first-order projection/stochastic approximation schemes cannot accommodate such problems.
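For concreteness, a schematic sketch of the stacked map on a toy instance; the maps, sets, and steplength are invented, and this particular toy happens to be strongly monotone, so plain projection converges here even though, per the caveat above, it need not in general:

```python
import numpy as np

# Toy maps: F(x, theta) = grad_x f for f(x, theta) = (x - theta)^2,
# and grad_g(theta) for g(theta) = (theta - 3)^2
F = lambda x, theta: 2.0 * (x - theta)
grad_g = lambda theta: 2.0 * (theta - 3.0)

def H(z):                                          # stacked map on z = (x, theta)
    x, theta = z
    return np.array([F(x, theta), grad_g(theta)])

proj_Z = lambda z: np.clip(z, 0.0, 10.0)           # Z = X x Theta = [0,10] x [0,10]

z = np.array([8.0, 0.0])
for _ in range(2000):                              # projection iteration on VI(Z, H)
    z = proj_Z(z - 0.05 * H(z))
print(z)                                           # approaches z* = (3, 3) for this toy
```

The asymmetry of the Jacobian of H (the x-block depends on θ, but not vice versa) is the structural reason monotonicity can fail for general f and g.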

  12. Research questions
◮ First-order schemes are available for the solution of deterministic/stochastic convex optimization and monotone variational problems.
◮ Can we develop analogous schemes that guarantee global/a.s. convergence?†
◮ Can rate statements be provided for such schemes?
  ◮ Are the original rates preserved?
  ◮ What is the price of learning, in terms of the modification/degradation of the rates?
† Not immediate, since these problems can be viewed as non-monotone VIs/SVIs.

  13. Outline
Part I: Deterministic problems
◮ Gradient methods for smooth/nonsmooth and strongly convex/convex optimization
◮ Extragradient and regularization methods for monotone variational inequality problems
Part II: Stochastic problems
◮ Stochastic approximation schemes for strongly convex/convex stochastic optimization with stochastic learning problems
◮ Regularized stochastic approximation for monotone stochastic variational inequality problems with stochastic learning problems

  14. Literature review
Static decision-making problems with perfect information:
◮ Optimization: convex programming [BNO03], integer programming [NW99], stochastic programming [BL97]
◮ Variational inequality problems [FP03a]
Learning:
◮ Linear and nonlinear regression, support vector machines (SVMs), etc. [HTF01]
Joint schemes for related problems:
◮ Adaptive control [AW94], iterative learning (tracking) control [Moo93]
◮ Bandit problems [Git89], regret problems [Zin03]
◮ Relatively little on joint schemes, beyond those focusing on stylized problems in revenue management [CHdMK06, HKZ, CHdMK12]

  15. Misspecified deterministic optimization
Consider the static misspecified convex optimization problem (C(θ*)):

    (C(θ*))    min_{x ∈ X} f(x, θ*),

where x ∈ R^n and f : X × Θ → R is convex in x for every θ ∈ Θ ⊆ R^m. Suppose θ* denotes the solution to a convex learning problem (L):

    (L)    min_{θ ∈ Θ} g(θ),

where g : R^m → R is convex in θ and is defined on the closed and convex set Θ.

  16. A joint gradient algorithm
Algorithm 1 (Joint gradient scheme). Given x_0 ∈ X, θ_0 ∈ Θ, and steplength sequences {γ_{f,k}}, {γ_{g,k}}, for all k ≥ 0:

    (Opt(θ_k))    x_{k+1} := Π_X ( x_k − γ_{f,k} ∇_x f(x_k, θ_k) ),
    (Learn)       θ_{k+1} := Π_Θ ( θ_k − γ_{g,k} ∇_θ g(θ_k) ).
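A minimal runnable sketch of Algorithm 1 on an invented quadratic instance; the functions, sets, starting points, and steplengths below are illustrative choices, not from the slides:

```python
import numpy as np

theta_star = 3.0                                        # unknown to the optimizer
grad_x_f = lambda x, theta: 2.0 * (x - theta)           # f(x, theta) = (x - theta)^2
grad_g   = lambda theta: 2.0 * (theta - theta_star)     # g(theta) = (theta - theta*)^2

proj_X     = lambda x: np.clip(x, 0.0, 10.0)            # X = [0, 10]
proj_Theta = lambda t: np.clip(t, -10.0, 10.0)          # Theta = [-10, 10]

x, theta = 8.0, -5.0                                    # x_0 and theta_0
for k in range(1, 5001):
    gamma_f = gamma_g = 1.0 / k                         # diminishing steps (Assumption 3)
    x     = proj_X(x - gamma_f * grad_x_f(x, theta))    # (Opt(theta_k))
    theta = proj_Theta(theta - gamma_g * grad_g(theta)) # (Learn)
print(x, theta)                                         # both approach x* = theta* = 3
```

Note that the x-update uses the current estimate θ_k rather than θ*; this is exactly where the learning sequence couples into the computational sequence.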

  17. Assumptions
Assumption 1. The function f(x, θ) is continuously differentiable in x for all θ ∈ Θ, and g is continuously differentiable in θ.
Assumption 2. The gradient map ∇_x f(x, θ) is Lipschitz continuous in x with constant G_{f,x}, uniformly over θ ∈ Θ; that is,

    ‖∇_x f(x_1, θ) − ∇_x f(x_2, θ)‖ ≤ G_{f,x} ‖x_1 − x_2‖,  for all x_1, x_2 ∈ X and all θ ∈ Θ.

Additionally, the gradient map ∇_θ g is Lipschitz continuous in θ with constant G_g.
Assumption 3. The steplengths {γ_{f,k}} and {γ_{g,k}} are diminishing nonnegative sequences chosen such that

    Σ_{k=1}^∞ γ_{f,k} = ∞,  Σ_{k=1}^∞ γ_{f,k}^2 < ∞,  Σ_{k=1}^∞ γ_{g,k} = ∞,  Σ_{k=1}^∞ γ_{g,k}^2 < ∞.
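For example, γ_{f,k} = γ_{g,k} = 1/k satisfies Assumption 3: the harmonic series Σ_k 1/k diverges, while Σ_k 1/k^2 converges. More generally, γ_k = c/k^a with c > 0 and a ∈ (1/2, 1] works, since Σ_k 1/k^a diverges and Σ_k 1/k^{2a} converges for such a.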

  18. Constant steplength schemes for strongly convex problems I
Assumption 4. The function f is strongly convex in x with constant η_f for all θ ∈ Θ, and the function g is strongly convex with constant η_g.
Assumption 5. The gradient ∇_x f(x*, θ) is Lipschitz continuous in θ with constant L_θ.
Proposition 1 (Rate analysis in strongly convex regimes). Let Assumptions 1, 2, 4, and 5 hold. In addition, assume that the constant steplengths γ_f and γ_g are chosen such that γ_f ≤ min(2η_f / G_{f,x}^2, 1/L_θ) and γ_g ≤ 2η_g / G_g^2. Let {x_k, θ_k} be the sequence generated by Algorithm 1. Then for every k ≥ 0, we have the following:

    ‖x_{k+1} − x*‖ ≤ q^{k+1} ‖x_0 − x*‖ + k q_θ q^k ‖θ_0 − θ*‖,

where q_x ≜ (1 + γ_f^2 G_{f,x}^2 − 2 γ_f η_f)^{1/2}, q_θ ≜ γ_f L_θ, q_g ≜ (1 + γ_g^2 G_g^2 − 2 γ_g η_g)^{1/2}, and q ≜ max(q_x, q_g).
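To make the constants tangible, one can evaluate q_x, q_g, and q_θ numerically; the problem constants below are invented but consistent with Assumptions 2, 4, and 5:

```python
import numpy as np

eta_f, G_fx = 1.0, 2.0        # strong convexity and Lipschitz constants for f in x
eta_g, G_g  = 1.0, 2.0        # same for g
L_theta     = 2.0             # Lipschitz constant of grad_x f(x*, .) in theta

gamma_f = 0.4                 # < min(2*eta_f/G_fx**2, 1/L_theta) = 0.5
gamma_g = 0.4                 # < 2*eta_g/G_g**2 = 0.5

q_x     = np.sqrt(1 + gamma_f**2 * G_fx**2 - 2 * gamma_f * eta_f)   # ~0.917
q_g     = np.sqrt(1 + gamma_g**2 * G_g**2  - 2 * gamma_g * eta_g)   # ~0.917
q_theta = gamma_f * L_theta                                         # 0.8
q       = max(q_x, q_g)
# Bound: ||x_{k+1} - x*|| <= q**(k+1)*||x0 - x*|| + k*q_theta*q**k*||theta0 - theta*||
```

With these values q ≈ 0.917 < 1, so the first term decays geometrically; the k q_θ q^k learning term is what produces the degradation discussed on the next slide.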

  19. Constant steplength schemes for strongly convex problems II
Remark: Notably, learning leads to a degradation of the convergence rate, from the standard linear rate to a sublinear one. Furthermore, it is easily seen that when we have access to the true θ*, the original rate is recovered.

[Figure 1: Strongly convex problems and learning, constant steplength (left) and diminishing steplength (right). Each panel plots generation cost versus iteration for the joint optimization-and-learning scheme and for optimization alone, against the optimal generation cost.]
