

  1. Primal-dual Subgradient Method for Convex Problems with Functional Constraints
Yurii Nesterov, CORE/INMA (UCL)
Workshop on Embedded Optimization EMBOPT2014, September 9, 2014 (Lucca)

  2. Outline
1 Constrained optimization problem
2 Lagrange multipliers
3 Dual function and dual problem
4 Augmented Lagrangian
5 Switching subgradient method
6 Finding the dual multipliers
7 Complexity analysis

  3. Optimization problem: simple constraints
Consider the problem $\min_{x \in Q} f(x)$, where
- $Q$ is a closed convex set: $x, y \in Q \Rightarrow [x, y] \subseteq Q$;
- $f$ is a convex function, subdifferentiable on $Q$: $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$ for all $x, y \in Q$, where $\nabla f(x) \in \partial f(x)$.
Optimality condition: a point $x_* \in Q$ is optimal iff $\langle \nabla f(x_*), x - x_* \rangle \ge 0$ for all $x \in Q$.
Interpretation: the function increases along any feasible direction.

  4. Examples
1. Interior solution. Let $x_* \in \operatorname{int} Q$. Then $\langle \nabla f(x_*), x - x_* \rangle \ge 0$ for all $x \in Q$ implies $\nabla f(x_*) = 0$.
2. Optimization over the positive orthant. Let $Q \equiv \mathbb{R}^n_+ = \{ x \in \mathbb{R}^n : x^{(i)} \ge 0,\ i = 1, \dots, n \}$. The optimality condition is $\langle \nabla f(x_*), x - x_* \rangle \ge 0$ for all $x \in \mathbb{R}^n_+$, or in coordinate form: $\nabla_i f(x_*) \, (x^{(i)} - x_*^{(i)}) \ge 0$ for all $x^{(i)} \ge 0$. This means that $\nabla_i f(x_*) \ge 0$, $i = 1, \dots, n$ (let $x^{(i)} \to \infty$), and $x_*^{(i)} \nabla_i f(x_*) = 0$, $i = 1, \dots, n$ (set $x^{(i)} = 0$).

  5. Optimization problem: functional constraints
Problem: $\min_{x \in Q} \{ f_0(x) : f_i(x) \le 0,\ i = 1, \dots, m \}$, where $Q$ is a closed convex set and all $f_i$, $i = 0, \dots, m$, are convex and subdifferentiable on $Q$: $f_i(y) \ge f_i(x) + \langle \nabla f_i(x), y - x \rangle$, $x, y \in Q$, with $\nabla f_i(x) \in \partial f_i(x)$.
Optimality condition (KKT, 1951): a point $x_* \in Q$ is optimal iff there exist Lagrange multipliers $\lambda_*^{(i)} \ge 0$, $i = 1, \dots, m$, such that
(1) $\langle \nabla f_0(x_*) + \sum_{i=1}^m \lambda_*^{(i)} \nabla f_i(x_*),\ x - x_* \rangle \ge 0$ for all $x \in Q$;
(2) $f_i(x_*) \le 0$, $i = 1, \dots, m$ (feasibility);
(3) $\lambda_*^{(i)} f_i(x_*) = 0$, $i = 1, \dots, m$ (complementary slackness).
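As a concrete check of the three conditions, consider a small worked instance (my own illustration, not from the slides): $f_0(x) = x^2$, one constraint $f_1(x) = 1 - x \le 0$, and $Q = \mathbb{R}$. The candidate pair $x_* = 1$, $\lambda_*^{(1)} = 2$ satisfies KKT(1)-(3):

```latex
% Toy instance: f_0(x) = x^2, f_1(x) = 1 - x, Q = R.
\begin{align*}
(1)\colon\ & \nabla f_0(x_*) + \lambda_*^{(1)} \nabla f_1(x_*)
             = 2 \cdot 1 + 2 \cdot (-1) = 0
  && \text{(so the variational inequality holds on } Q = \mathbb{R}\text{)},\\
(2)\colon\ & f_1(x_*) = 1 - 1 = 0 \le 0
  && \text{(feasibility)},\\
(3)\colon\ & \lambda_*^{(1)} f_1(x_*) = 2 \cdot 0 = 0
  && \text{(complementary slackness)}.
\end{align*}
```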

  6. Lagrange multipliers: interpretation
Let $\mathcal{I} \subseteq \{1, \dots, m\}$ be an arbitrary set of indexes. Denote $f_{\mathcal{I}}(x) = f_0(x) + \sum_{i \in \mathcal{I}} \lambda_*^{(i)} f_i(x)$ and consider the problem
$P_{\mathcal{I}}: \quad \min_{x \in Q} \{ f_{\mathcal{I}}(x) : f_i(x) \le 0,\ i \notin \mathcal{I} \}$.
Observation: in any case, $x_*$ is the optimal solution of problem $P_{\mathcal{I}}$.
Interpretation: the $\lambda_*^{(i)}$ are the shadow prices for resources (Kantorovich, 1939).
Application examples: traffic congestion (car flows on roads vs. size of queues); electrical networks (currents in the wires vs. voltage potentials), etc.
Main question: how to compute $(x_*, \lambda_*)$?

  7. Algebraic interpretation
Consider the Lagrangian $L(x, \lambda) = f_0(x) + \sum_{i=1}^m \lambda^{(i)} f_i(x)$.
Condition KKT(1), $\langle \nabla f_0(x_*) + \sum_{i=1}^m \lambda_*^{(i)} \nabla f_i(x_*),\ x - x_* \rangle \ge 0$ for all $x \in Q$, implies $x_* \in \operatorname{Arg\,min}_{x \in Q} L(x, \lambda_*)$.
Define the dual function $\varphi(\lambda) = \min_{x \in Q} L(x, \lambda)$, $\lambda \ge 0$. It is concave! By Danskin's theorem, $\nabla \varphi(\lambda) = (f_1(x(\lambda)), \dots, f_m(x(\lambda)))$, where $x(\lambda) \in \operatorname{Arg\,min}_{x \in Q} L(x, \lambda)$.
Conditions KKT(2,3), $f_i(x_*) \le 0$ and $\lambda_*^{(i)} f_i(x_*) = 0$, $i = 1, \dots, m$, imply (with $x_* = x(\lambda_*)$) that $\lambda_* \in \operatorname{Arg\,max}_{\lambda \ge 0} \varphi(\lambda)$.
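The dual function is easy to tabulate on a toy instance. The sketch below (my own illustration with assumed names, not code from the talk) uses $f_0(x) = x^2$, $f_1(x) = 1 - x$, $Q = \mathbb{R}$, for which $x(\lambda) = \lambda/2$ in closed form, and checks Danskin's formula $\varphi'(\lambda) = f_1(x(\lambda))$ against a finite difference:

```python
# Toy instance: min x^2 s.t. 1 - x <= 0, Q = R.
# Lagrangian: L(x, lam) = x^2 + lam * (1 - x).

def x_of_lam(lam):
    # Unconstrained minimizer of L(., lam): dL/dx = 2x - lam = 0.
    return lam / 2.0

def phi(lam):
    # Dual function phi(lam) = min_x L(x, lam) = lam - lam^2 / 4.
    x = x_of_lam(lam)
    return x ** 2 + lam * (1.0 - x)

def f1(x):
    return 1.0 - x

# Danskin: phi'(lam) = f1(x(lam)); compare with a central difference.
lam = 1.0
fd = (phi(lam + 1e-6) - phi(lam - 1e-6)) / 2e-6
print(phi(2.0), f1(x_of_lam(lam)), fd)  # phi is maximized at lam = 2, phi(2) = 1
```

Here the dual maximizer $\lambda_* = 2$ and $x(\lambda_*) = 1$ reproduce the KKT pair of this instance.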

  8. Algorithmic aspects
Main idea: solve the dual problem $\max_{\lambda \ge 0} \varphi(\lambda)$ by the subgradient method:
1. Compute $x(\lambda_k)$ and define $\nabla \varphi(\lambda_k) = (f_1(x(\lambda_k)), \dots, f_m(x(\lambda_k)))$.
2. Update $\lambda_{k+1} = \operatorname{Project}_{\mathbb{R}^m_+}(\lambda_k + h_k \nabla \varphi(\lambda_k))$.
The stepsizes $h_k > 0$ are defined in the usual way.
Main difficulties: each iteration is time consuming; the termination criterion is unclear; the rate of convergence is low ($O(\frac{1}{\epsilon^2})$ upper-level iterations).
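The two-step scheme above can be sketched in a few lines. This is a minimal illustration on the toy instance $\min\{x^2 : 1 - x \le 0\}$ over $Q = \mathbb{R}$ (where $x_* = 1$, $\lambda_* = 2$); the fixed step size is an illustrative choice, not the stepsize rule from the talk:

```python
# Projected gradient ascent on the dual of the toy instance.

def x_of_lam(lam):
    return lam / 2.0               # argmin_x L(x, lam) in closed form

def grad_phi(lam):
    return 1.0 - x_of_lam(lam)     # Danskin: f_1(x(lam))

lam, h = 0.0, 0.5
for k in range(100):
    # Step 1-2 of the scheme: oracle call, then projection onto R_+.
    lam = max(0.0, lam + h * grad_phi(lam))

print(lam, x_of_lam(lam))  # approaches (2, 1)
```

Because this $\varphi$ is smooth and quadratic, the iteration contracts linearly; for genuinely nonsmooth duals only the slow $O(1/\epsilon^2)$ rate quoted above is available.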

  9. Augmented Lagrangian (1970s) [Hestenes, Powell, Rockafellar, Polyak, Bertsekas, ...]
Define the Augmented Lagrangian
$L_K(x, \lambda) = f_0(x) + \frac{1}{2K} \sum_{i=1}^m \left( \lambda^{(i)} + K f_i(x) \right)_+^2 - \frac{1}{2K} \|\lambda\|^2, \quad \lambda \in \mathbb{R}^m$,
where $K > 0$ is a penalty parameter. Consider the dual function $\hat\varphi(\lambda) = \min_{x \in Q} L_K(x, \lambda)$.
Main properties. The function $\hat\varphi$ is concave. Its gradient is Lipschitz continuous with constant $\frac{1}{K}$. Its unconstrained maximum is attained at the optimal dual solution. The corresponding point $\hat x(\lambda_*)$ is the optimal primal solution.
Hint: check that the equation $\left( \lambda^{(i)} + K f_i(x) \right)_+ = \lambda^{(i)}$ is equivalent to KKT(2,3).

  10. Method of Augmented Lagrangians
Note that $\nabla \hat\varphi(\lambda) = \frac{1}{K} \left( \lambda + K f(\hat x(\lambda)) \right)_+ - \frac{1}{K} \lambda$. Therefore, the usual gradient method $\lambda_{k+1} = \lambda_k + K \nabla \hat\varphi(\lambda_k)$ is exactly as follows:
Method: $\lambda_{k+1} = \left( \lambda_k + K f(\hat x(\lambda_k)) \right)_+$.
Advantage: fast convergence of the dual process.
Disadvantages: difficult iteration; unclear termination; no global complexity analysis.
Do we have an alternative?
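The multiplier update above can be traced on the toy instance $\min\{x^2 : 1 - x \le 0\}$ (an illustration of mine, not from the talk). The inner minimizer of $L_K(\cdot, \lambda)$ is worked out in closed form here; in general each iteration requires an auxiliary minimization, which is the "difficult iteration" mentioned on the slide:

```python
# Method of augmented Lagrangians on the toy instance (x* = 1, lam* = 2).
K = 8.0  # penalty parameter (illustrative choice)

def x_hat(lam):
    # For this instance lam + K*(1 - x) stays positive at the minimizer, so
    # d/dx [x^2 + (lam + K(1 - x))_+^2 / (2K)] = 2x - (lam + K(1 - x)) = 0.
    return (lam + K) / (2.0 + K)

lam = 0.0
for k in range(30):
    # lam_{k+1} = (lam_k + K f(x_hat(lam_k)))_+  with f = f_1 = 1 - x.
    lam = max(0.0, lam + K * (1.0 - x_hat(lam)))

print(lam, x_hat(lam))  # approaches (2, 1)
```

The dual error contracts by the factor $2/(2+K)$ per iteration, illustrating the fast dual convergence claimed above.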

  11. Problem formulation
Problem: $f^* = \inf_{x \in Q} \{ f_0(x) : f_i(x) \le 0,\ i = 1, \dots, m \}$, where the $f_i(x)$, $i = 0, \dots, m$, are closed convex functions on $Q$ endowed with first-order black-box oracles, and $Q \subset E$ is a bounded, simple, closed convex set ("simple" means we can solve certain auxiliary optimization problems over $Q$).
Defining the Lagrangian $L(x, \lambda) = f_0(x) + \sum_{i=1}^m \lambda^{(i)} f_i(x)$, $x \in Q$, $\lambda \in \mathbb{R}^m_+$, we can introduce the Lagrangian dual problem $f_* \stackrel{\mathrm{def}}{=} \sup_{\lambda \in \mathbb{R}^m_+} \varphi(\lambda)$, where $\varphi(\lambda) \stackrel{\mathrm{def}}{=} \inf_{x \in Q} L(x, \lambda)$.
Clearly, $f^* \ge f_*$. Later, we will show $f^* = f_*$ algorithmically.

  12. Bregman distances
Prox-function: $d(\cdot)$ is strongly convex on $Q$ with parameter one: $d(y) \ge d(x) + \langle \nabla d(x), y - x \rangle + \frac{1}{2} \|y - x\|^2$, $x, y \in Q$.
Denote by $x_0$ the prox-center of the set $Q$: $x_0 = \arg\min_{x \in Q} d(x)$. Assume $d(x_0) = 0$.
Bregman distance: $\beta(x, y) = d(y) - d(x) - \langle \nabla d(x), y - x \rangle$, $x, y \in Q$. Clearly, $\beta(x, y) \ge \frac{1}{2} \|x - y\|^2$ for all $x, y \in Q$.
Bregman mapping: for $x \in Q$, $g \in E^*$ and $h > 0$, define $B_h(x, g) = \arg\min_{y \in Q} \{ h \langle g, y - x \rangle + \beta(x, y) \}$.
The first-order condition for the point $x_+ \stackrel{\mathrm{def}}{=} B_h(x, g)$ is as follows: $\langle h g + \nabla d(x_+) - \nabla d(x),\ y - x_+ \rangle \ge 0$, $y \in Q$.

  13. Examples
1. Euclidean distance. We choose $\|x\| = \left( \sum_{i=1}^n (x^{(i)})^2 \right)^{1/2}$ and $d(x) = \frac{1}{2} \|x\|^2$. Then $\beta(x, y) = \frac{1}{2} \|x - y\|^2$, and we have $B_h(x, g) = \operatorname{Projection}_Q(x - hg)$.
2. Entropy distance. We choose $\|x\| = \sum_{i=1}^n |x^{(i)}|$ and $d(x) = \ln n + \sum_{i=1}^n x^{(i)} \ln x^{(i)}$. Then $\beta(x, y) = \sum_{i=1}^n y^{(i)} [\ln y^{(i)} - \ln x^{(i)}]$. If $Q = \{ x \in \mathbb{R}^n_+ : \sum_{i=1}^n x^{(i)} = 1 \}$, then
$B_h^{(i)}(x, g) = x^{(i)} e^{-h g^{(i)}} \Big/ \sum_{j=1}^n x^{(j)} e^{-h g^{(j)}}, \quad i = 1, \dots, n$.
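The entropy case has the closed form above, so a Bregman step on the simplex is a multiplicative update followed by normalization. A minimal sketch (function name is mine):

```python
import math

# Entropy Bregman mapping on the simplex:
#   B_h(x, g)^(i) = x^(i) exp(-h g^(i)) / sum_j x^(j) exp(-h g^(j)).

def bregman_step_entropy(x, g, h):
    w = [xi * math.exp(-h * gi) for xi, gi in zip(x, g)]
    s = sum(w)
    return [wi / s for wi in w]  # normalization keeps the iterate on the simplex

x = [0.5, 0.5]
g = [0.0, math.log(4.0)]
y = bregman_step_entropy(x, g, h=1.0)
print(y)  # [0.8, 0.2]: mass moves away from the coordinate with larger gradient
```

Note the update never leaves the relative interior of the simplex, which is why the entropy prox-function is the standard choice for $Q$ equal to the simplex.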

  14. Switching subgradient method
Input parameter: the step size $h > 0$.
Initialization: compute the prox-center $x_0$.
Iteration $k \ge 0$:
a) Define $\mathcal{I}_k = \{ i \in \{1, \dots, m\} : f_i(x_k) > h \|\nabla f_i(x_k)\|_* \}$.
b) If $\mathcal{I}_k = \emptyset$, then compute $x_{k+1} = B_h\left( x_k, \frac{\nabla f_0(x_k)}{\|\nabla f_0(x_k)\|_*} \right)$.
c) If $\mathcal{I}_k \ne \emptyset$, then choose an arbitrary $i_k \in \mathcal{I}_k$, define $h_k = \frac{f_{i_k}(x_k)}{\|\nabla f_{i_k}(x_k)\|_*^2}$, and compute $x_{k+1} = B_{h_k}(x_k, \nabla f_{i_k}(x_k))$.
After $t \ge 0$ iterations, define $\mathcal{F}_t = \{ k \in \{0, \dots, t\} : \mathcal{I}_k = \emptyset \}$ and denote $N(t) = |\mathcal{F}_t|$. It is possible that $N(t) = 0$.
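The scheme above can be sketched on a toy instance of my own choosing (not from the talk): minimize $x_1 + x_2$ subject to $x_1^2 + x_2^2 - 1 \le 0$ over the box $Q = [-2, 2]^2$, with the Euclidean prox-function so that $B_h$ reduces to a projected step. Here $f^* = -\sqrt{2}$:

```python
import math

# Switching subgradient method, Euclidean setup (B_h = projected step).

def proj_Q(x):                      # projection onto the box Q = [-2, 2]^2
    return [min(2.0, max(-2.0, v)) for v in x]

def f0(x): return x[0] + x[1]
def g0(x): return [1.0, 1.0]                    # gradient of f0
def f1(x): return x[0] ** 2 + x[1] ** 2 - 1.0
def g1(x): return [2.0 * x[0], 2.0 * x[1]]      # gradient of f1

def norm(g): return math.hypot(g[0], g[1])

h = 0.01
x = [0.0, 0.0]                      # prox-center of Q
best = float("inf")                 # best objective over productive steps
for k in range(2000):
    if f1(x) > h * norm(g1(x)):     # I_k nonempty: constraint (switching) step
        g = g1(x)
        hk = f1(x) / norm(g) ** 2
        x = proj_Q([x[0] - hk * g[0], x[1] - hk * g[1]])
    else:                           # I_k empty: normalized objective step
        best = min(best, f0(x))
        g = g0(x)
        x = proj_Q([x[0] - h * g[0] / norm(g), x[1] - h * g[1] / norm(g)])

print(best)  # close to f* = -sqrt(2)
```

The iterate hovers within an $O(h)$ band around the constraint boundary: productive steps push the objective down, and constraint steps restore near-feasibility.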

  15. Finding the dual multipliers
If $N(t) > 0$, define the dual multipliers as follows:
$\lambda_t^{(0)} = h \sum_{k \in \mathcal{F}_t} \frac{1}{\|\nabla f_0(x_k)\|_*}, \qquad \lambda_t^{(i)} = \frac{1}{\lambda_t^{(0)}} \sum_{k \in \mathcal{A}_i(t)} h_k, \quad i = 1, \dots, m$,
where $\mathcal{A}_i(t) = \{ k \in \{0, \dots, t\} : i_k = i \}$, $0 \le i \le m$.
Denote $S_t = \sum_{k \in \mathcal{F}_t} \frac{1}{\|\nabla f_0(x_k)\|_*}$. If $\mathcal{F}_t = \emptyset$, then we define $S_t = 0$.
For proving convergence of the switching strategy, we find an upper bound for the gap
$\delta_t = \frac{1}{S_t} \sum_{k \in \mathcal{F}_t} \frac{f_0(x_k)}{\|\nabla f_0(x_k)\|_*} - \varphi(\lambda_t)$,
assuming that $N(t) > 0$.
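These accumulators are cheap to maintain along the run. The sketch below (my own illustration, same toy instance as before: minimize $x_1 + x_2$ with $x_1^2 + x_2^2 \le 1$ over $Q = [-2, 2]^2$, where $f^* = -\sqrt{2}$ and $\lambda_* = 1/\sqrt{2}$) records $S_t$, the weighted primal average, and $\sum_{k \in \mathcal{A}_1(t)} h_k$, then evaluates the dual function at the recovered multiplier:

```python
import math

def proj_Q(x): return [min(2.0, max(-2.0, v)) for v in x]
def f0(x): return x[0] + x[1]
def f1(x): return x[0] ** 2 + x[1] ** 2 - 1.0
def norm(g): return math.hypot(g[0], g[1])

def phi(lam):
    # Dual function: min over the box of x1 + x2 + lam*(x1^2 + x2^2 - 1),
    # separable per coordinate; minimizer u = clip(-1/(2 lam), [-2, 2]).
    u = -2.0 if lam <= 0.25 else -1.0 / (2.0 * lam)
    return 2.0 * (u + lam * u * u) - lam

h, x = 0.01, [0.0, 0.0]
S_t, sum_f0, sum_hk = 0.0, 0.0, 0.0
for k in range(2000):
    g1 = [2.0 * x[0], 2.0 * x[1]]
    if f1(x) > h * norm(g1):                  # constraint step: k in A_1(t)
        hk = f1(x) / norm(g1) ** 2
        sum_hk += hk
        x = proj_Q([x[0] - hk * g1[0], x[1] - hk * g1[1]])
    else:                                     # productive step: k in F_t
        gn = norm([1.0, 1.0])                 # ||grad f0||_* = sqrt(2)
        S_t += 1.0 / gn
        sum_f0 += f0(x) / gn
        x = proj_Q([x[0] - h / gn, x[1] - h / gn])

lam_0 = h * S_t                               # lambda_t^(0)
lam_1 = sum_hk / lam_0                        # lambda_t^(1)
primal_avg = sum_f0 / S_t
print(lam_1, primal_avg, phi(lam_1))
```

By weak duality $\varphi(\lambda) \le f^* = -\sqrt{2}$ for every $\lambda \ge 0$, so the printed dual value lower-bounds the weighted primal average, and $\lambda_t^{(1)}$ approaches the optimal multiplier as $h \to 0$ and $t \to \infty$.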
