SLIDE 1

Primal-dual Subgradient Method for Convex Problems with Functional Constraints

Yurii Nesterov, CORE/INMA (UCL). Workshop on Embedded Optimization EMBOPT2014, September 9, 2014 (Lucca).

SLIDE 2

Outline

1. Constrained optimization problem
2. Lagrange multipliers
3. Dual function and dual problem
4. Augmented Lagrangian
5. Switching subgradient method
6. Finding the dual multipliers
7. Complexity analysis

SLIDE 3

Optimization problem: simple constraints

Consider the problem: $\min\limits_{x \in Q} f(x)$,

where

  • $Q$ is a closed convex set: $x, y \in Q \Rightarrow [x, y] \subseteq Q$,
  • $f$ is a convex function, subdifferentiable on $Q$: $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$, $x, y \in Q$, with $\nabla f(x) \in \partial f(x)$.

Optimality condition: a point $x_* \in Q$ is optimal iff $\langle \nabla f(x_*), x - x_* \rangle \ge 0$ for all $x \in Q$.

Interpretation: the function increases along any feasible direction.

SLIDE 4

Examples

  • 1. Interior solution. Let $x_* \in \operatorname{int} Q$. Then $\langle \nabla f(x_*), x - x_* \rangle \ge 0$ for all $x \in Q$ implies $\nabla f(x_*) = 0$.

  • 2. Optimization over the positive orthant. Let $Q \equiv \mathbb{R}^n_+ = \{x \in \mathbb{R}^n : x^{(i)} \ge 0,\ i = 1, \dots, n\}$.

Optimality condition: $\langle \nabla f(x_*), x - x_* \rangle \ge 0$ for all $x \in \mathbb{R}^n_+$.

Coordinate form: $\nabla_i f(x_*)\,\big(x^{(i)} - x_*^{(i)}\big) \ge 0$ for all $x^{(i)} \ge 0$. This means that

$\nabla_i f(x_*) \ge 0$, $i = 1, \dots, n$ (let $x^{(i)} \to \infty$), and
$x_*^{(i)} \nabla_i f(x_*) = 0$, $i = 1, \dots, n$ (set $x^{(i)} = 0$).

SLIDE 5

Optimization problem: functional constraints

Problem: $\min\limits_{x \in Q}\{f_0(x) : f_i(x) \le 0,\ i = 1, \dots, m\}$,

where $Q$ is a closed convex set and all $f_i$ are convex and subdifferentiable on $Q$, $i = 0, \dots, m$:
$f_i(y) \ge f_i(x) + \langle \nabla f_i(x), y - x \rangle$, $x, y \in Q$, $\nabla f_i(x) \in \partial f_i(x)$.

Optimality condition (KKT, 1951): a point $x_* \in Q$ is optimal iff there exist Lagrange multipliers $\lambda_*^{(i)} \ge 0$, $i = 1, \dots, m$, such that

(1): $\big\langle \nabla f_0(x_*) + \sum_{i=1}^m \lambda_*^{(i)} \nabla f_i(x_*),\ x - x_* \big\rangle \ge 0$ for all $x \in Q$,
(2): $f_i(x_*) \le 0$, $i = 1, \dots, m$ (feasibility),
(3): $\lambda_*^{(i)} f_i(x_*) = 0$, $i = 1, \dots, m$ (complementary slackness).
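A one-dimensional illustration (not from the slides): take $Q = \mathbb{R}$, $f_0(x) = x^2$, $f_1(x) = 1 - x$. At $x_* = 1$ with $\lambda_*^{(1)} = 2$ we have $\nabla f_0(x_*) + \lambda_*^{(1)} \nabla f_1(x_*) = 2 - 2 = 0$, so (1) holds for all $x$; $f_1(x_*) = 0$ gives (2); and $\lambda_*^{(1)} f_1(x_*) = 0$ gives (3).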

SLIDE 6

Lagrange multipliers: interpretation

Let $I \subseteq \{1, \dots, m\}$ be an arbitrary set of indices. Denote $f_I(x) = f_0(x) + \sum_{i \in I} \lambda_*^{(i)} f_i(x)$ and consider the problem

$P_I : \ \min\limits_{x \in Q}\{f_I(x) : f_i(x) \le 0,\ i \notin I\}$.

Observation: in any case, $x_*$ is an optimal solution of problem $P_I$.

Interpretation: the $\lambda_*^{(i)}$ are the shadow prices for resources (Kantorovich, 1939).

Application examples:
  • Traffic congestion: car flows on roads ⇔ sizes of queues.
  • Electrical networks: currents in the wires ⇔ voltage potentials, etc.

Main question: how to compute $(x_*, \lambda_*)$?

SLIDE 7

Algebraic interpretation

Consider the Lagrangian $L(x, \lambda) = f_0(x) + \sum_{i=1}^m \lambda^{(i)} f_i(x)$.

Condition KKT(1), $\big\langle \nabla f_0(x_*) + \sum_{i=1}^m \lambda_*^{(i)} \nabla f_i(x_*),\ x - x_* \big\rangle \ge 0$ for all $x \in Q$, implies $x_* \in \operatorname{Arg\,min}\limits_{x \in Q} L(x, \lambda_*)$.

Define the dual function $\varphi(\lambda) = \min\limits_{x \in Q} L(x, \lambda)$, $\lambda \ge 0$. It is concave! By Danskin's theorem,

$\nabla \varphi(\lambda) = \big(f_1(x(\lambda)), \dots, f_m(x(\lambda))\big)$, with $x(\lambda) \in \operatorname{Arg\,min}\limits_{x \in Q} L(x, \lambda)$.

Conditions KKT(2,3), $f_i(x_*) \le 0$ and $\lambda_*^{(i)} f_i(x_*) = 0$, $i = 1, \dots, m$, imply (with $x_* = x(\lambda_*)$) that $\lambda_* \in \operatorname{Arg\,max}\limits_{\lambda \ge 0} \varphi(\lambda)$.
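Continuing the one-dimensional illustration from Slide 5 (again, not from the slides themselves): with $Q = \mathbb{R}$, $f_0(x) = x^2$, $f_1(x) = 1 - x$, we get $L(x, \lambda) = x^2 + \lambda(1 - x)$, $x(\lambda) = \lambda/2$, and $\varphi(\lambda) = \lambda - \lambda^2/4$. Indeed $\varphi'(\lambda) = 1 - \lambda/2 = f_1(x(\lambda))$, as Danskin's theorem states, and the maximum is attained at $\lambda_* = 2$ with $x(\lambda_*) = 1 = x_*$.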

SLIDE 8

Algorithmic aspects

Main idea: solve the dual problem $\max\limits_{\lambda \ge 0} \varphi(\lambda)$ by the subgradient method:

1. Compute $x(\lambda_k)$ and define $\nabla \varphi(\lambda_k) = \big(f_1(x(\lambda_k)), \dots, f_m(x(\lambda_k))\big)$.
2. Update $\lambda_{k+1} = \mathrm{Project}_{\mathbb{R}^m_+}\big(\lambda_k + h_k \nabla \varphi(\lambda_k)\big)$.

Stepsizes $h_k > 0$ are defined in the usual way.

Main difficulties:
  • Each iteration is time consuming.
  • Unclear termination criterion.
  • Low rate of convergence ($O(1/\epsilon^2)$ upper-level iterations).
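A minimal sketch of this dual scheme (assuming a user-supplied routine argmin_lagrangian that solves the inner problem $\min_{x \in Q} L(x, \lambda)$; all names are illustrative, not from the slides):

    import numpy as np

    def dual_subgradient(argmin_lagrangian, fs, lam0, steps, n_iters):
        """Projected subgradient method for the dual problem max_{lam >= 0} phi(lam).

        argmin_lagrangian(lam): returns x(lam), a minimizer of L(x, lam) over Q.
        fs: list of constraint functions f_1, ..., f_m.
        steps: sequence of stepsizes h_k > 0.
        """
        lam = np.array(lam0, dtype=float)
        for k in range(n_iters):
            x = argmin_lagrangian(lam)                    # expensive inner problem
            grad = np.array([fi(x) for fi in fs])         # Danskin: grad phi = (f_1(x), ..., f_m(x))
            lam = np.maximum(lam + steps[k] * grad, 0.0)  # projection onto R^m_+
        return lam

Each call to argmin_lagrangian is itself an optimization problem over $Q$, which is exactly the "time consuming iteration" mentioned above.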
SLIDE 9

Augmented Lagrangian (1970’s) [Hestenes, Powell, Rockafellar, Polyak, Bertsekas, . . .]

Define the Augmented Lagrangian

$L_K(x, \lambda) = f_0(x) + \frac{1}{2K} \sum_{i=1}^m \big[\big(\lambda^{(i)} + K f_i(x)\big)_+\big]^2 - \frac{1}{2K} \|\lambda\|_2^2$, $\quad \lambda \in \mathbb{R}^m$,

where $K > 0$ is a penalty parameter. Consider the dual function $\hat\varphi(\lambda) = \min\limits_{x \in Q} L_K(x, \lambda)$.

Main properties:
  • Function $\hat\varphi$ is concave.
  • Its gradient is Lipschitz continuous with constant $\frac{1}{K}$.
  • Its unconstrained maximum is attained at the optimal dual solution.
  • The corresponding point $\hat x(\lambda_*)$ is the optimal primal solution.

Hint: check that the equation $\big(\lambda^{(i)} + K f_i(x)\big)_+ = \lambda^{(i)}$ is equivalent to KKT(2,3).

SLIDE 10

Method of Augmented Lagrangians

Note that $\nabla \hat\varphi(\lambda) = \frac{1}{K}\big(\lambda + K f(\hat x(\lambda))\big)_+ - \frac{1}{K}\lambda$.

Therefore, the usual gradient method $\lambda_{k+1} = \lambda_k + K \nabla \hat\varphi(\lambda_k)$ is exactly as follows.

Method: $\lambda_{k+1} = \big(\lambda_k + K f(\hat x(\lambda_k))\big)_+$.

Advantage: fast convergence of the dual process.

Disadvantages: difficult iteration; unclear termination; no global complexity analysis.

Do we have an alternative?
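A minimal sketch of this multiplier update (assuming, as above, a user-supplied routine argmin_LK for the difficult inner problem $\min_{x \in Q} L_K(x, \lambda)$; illustrative only):

    import numpy as np

    def augmented_lagrangian_step(argmin_LK, fs, lam, K):
        """One multiplier update: lam_next = (lam + K * f(x_hat(lam)))_+ .

        argmin_LK(lam, K): returns x_hat(lam), a minimizer of L_K(x, lam) over Q.
        fs: list of constraint functions f_1, ..., f_m.
        """
        x_hat = argmin_LK(lam, K)                       # the difficult inner minimization
        f_vals = np.array([fi(x_hat) for fi in fs])
        return np.maximum(lam + K * f_vals, 0.0)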

SLIDE 11

Problem formulation

Problem: $f^* = \inf\limits_{x \in Q}\{f_0(x) : f_i(x) \le 0,\ i = 1, \dots, m\}$, where

  • $f_i(x)$, $i = 0, \dots, m$, are closed convex functions on $Q$, endowed with first-order black-box oracles,
  • $Q \subset E$ is a bounded, simple, closed convex set (we can solve some auxiliary optimization problems over $Q$).

Defining the Lagrangian $L(x, \lambda) = f_0(x) + \sum_{i=1}^m \lambda^{(i)} f_i(x)$, $x \in Q$, $\lambda \in \mathbb{R}^m_+$, we can introduce the Lagrangian dual problem

$f_* \stackrel{\mathrm{def}}{=} \sup\limits_{\lambda \in \mathbb{R}^m_+} \varphi(\lambda)$, where $\varphi(\lambda) \stackrel{\mathrm{def}}{=} \inf\limits_{x \in Q} L(x, \lambda)$.

Clearly, $f^* \ge f_*$. Later, we will show $f^* = f_*$ algorithmically.

SLIDE 12

Bregman distances

Prox-function: $d(\cdot)$ is strongly convex on $Q$ with parameter one:

$d(y) \ge d(x) + \langle \nabla d(x), y - x \rangle + \tfrac12 \|y - x\|^2$, $\quad x, y \in Q$.

Denote by $x_0$ the prox-center of the set $Q$: $x_0 = \arg\min\limits_{x \in Q} d(x)$. Assume $d(x_0) = 0$.

Bregman distance: $\beta(x, y) = d(y) - d(x) - \langle \nabla d(x), y - x \rangle$, $x, y \in Q$. Clearly, $\beta(x, y) \ge \tfrac12 \|x - y\|^2$ for all $x, y \in Q$.

Bregman mapping: for $x \in Q$, $g \in E^*$ and $h > 0$ define

$B_h(x, g) = \arg\min\limits_{y \in Q}\{h \langle g, y - x \rangle + \beta(x, y)\}$.

The first-order condition for the point $x_+ \stackrel{\mathrm{def}}{=} B_h(x, g)$ is as follows:

$\langle h g + \nabla d(x_+) - \nabla d(x),\ y - x_+ \rangle \ge 0$, $\quad y \in Q$.

SLIDE 13

Examples

  • 1. Euclidean distance. We choose $\|x\| = \big(\sum_{i=1}^n (x^{(i)})^2\big)^{1/2}$ and $d(x) = \tfrac12\|x\|^2$. Then $\beta(x, y) = \tfrac12\|x - y\|^2$, and we have $B_h(x, g) = \mathrm{Projection}_Q(x - hg)$.

  • 2. Entropy distance. We choose $\|x\| = \sum_{i=1}^n |x^{(i)}|$ and $d(x) = \ln n + \sum_{i=1}^n x^{(i)} \ln x^{(i)}$. Then $\beta(x, y) = \sum_{i=1}^n y^{(i)}\big[\ln y^{(i)} - \ln x^{(i)}\big]$. If $Q = \{x \in \mathbb{R}^n_+ : \sum_{i=1}^n x^{(i)} = 1\}$, then

$B_h^{(i)}(x, g) = x^{(i)} e^{-h g^{(i)}} \Big/ \sum_{j=1}^n x^{(j)} e^{-h g^{(j)}}$, $\quad i = 1, \dots, n$.
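A minimal sketch of these two Bregman mappings (the Euclidean case is shown for a box $Q$, an assumed example; names are illustrative):

    import numpy as np

    def bregman_step_euclidean_box(x, g, h, lo, hi):
        """Euclidean prox on a box Q = [lo, hi]^n: B_h(x, g) = Projection_Q(x - h*g)."""
        return np.clip(x - h * g, lo, hi)

    def bregman_step_entropy_simplex(x, g, h):
        """Entropy prox on the standard simplex:
        B_h(x, g)^(i) = x^(i) * exp(-h*g^(i)) / sum_j x^(j) * exp(-h*g^(j))."""
        w = x * np.exp(-h * g)
        return w / w.sum()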
SLIDE 14

Switching subgradient method

Input parameter: the step size $h > 0$.

Initialization: compute the prox-center $x_0$.

Iteration $k \ge 0$:
a) Define $I_k = \{i \in \{1, \dots, m\} : f_i(x_k) > h \|\nabla f_i(x_k)\|_*\}$.
b) If $I_k = \emptyset$, then compute $x_{k+1} = B_h\big(x_k,\ \nabla f_0(x_k)/\|\nabla f_0(x_k)\|_*\big)$.
c) If $I_k \ne \emptyset$, then choose an arbitrary $i_k \in I_k$, define $h_k = f_{i_k}(x_k)/\|\nabla f_{i_k}(x_k)\|_*^2$, and compute $x_{k+1} = B_{h_k}\big(x_k, \nabla f_{i_k}(x_k)\big)$.

After $t \ge 0$ iterations, define $F_t = \{k \in \{0, \dots, t\} : I_k = \emptyset\}$. Denote $N(t) = |F_t|$. It is possible that $N(t) = 0$.
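A minimal sketch of the scheme in the Euclidean setup (so $B_h(x, g) = \mathrm{Projection}_Q(x - hg)$), assuming oracles that return a (value, subgradient) pair and a user-supplied projector proj_Q; all names are illustrative, not from the slides. The records it returns are used on the next slide to form the dual multipliers.

    import numpy as np

    def switching_subgradient(f0, fs, proj_Q, x0, h, t):
        """Switching subgradient method with the Euclidean prox (||.|| = ||.||_2).

        f0(x) and each fs[i](x) return a pair (value, subgradient).
        Returns the last iterate and the records needed for the dual multipliers:
        F (feasible-step indices), prod_weights (1/||grad f0(x_k)|| for k in F),
        infeas_steps (pairs (i_k, h_k) for the infeasible steps).
        """
        x = proj_Q(np.array(x0, dtype=float))
        F, prod_weights, infeas_steps = [], [], []
        for k in range(t + 1):
            # a) constraints violated beyond the tolerance h * ||grad f_i(x_k)||
            violated = []
            for i, fi in enumerate(fs):
                v, g = fi(x)
                if v > h * np.linalg.norm(g):
                    violated.append((i, v, g))
            if not violated:
                # b) feasible step along the normalized objective subgradient
                _, g0 = f0(x)
                gnorm = np.linalg.norm(g0)
                F.append(k)
                prod_weights.append(1.0 / gnorm)
                x = proj_Q(x - h * g0 / gnorm)
            else:
                # c) infeasible step on one violated constraint
                i, v, g = violated[0]
                hk = v / np.linalg.norm(g) ** 2
                infeas_steps.append((i, hk))
                x = proj_Q(x - hk * g)
        return x, F, prod_weights, infeas_steps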

SLIDE 15

Finding the dual multipliers

If $N(t) > 0$, define the dual multipliers as follows:

$\lambda_t^{(0)} = h \sum\limits_{k \in F_t} \frac{1}{\|\nabla f_0(x_k)\|_*}$, $\qquad \lambda_t^{(i)} = \frac{1}{\lambda_t^{(0)}} \sum\limits_{k \in A_i(t)} h_k$, $\quad i = 1, \dots, m$,

where $A_i(t) = \{k \in \{0, \dots, t\} : i_k = i\}$, $0 \le i \le m$.

Denote $S_t = \sum\limits_{k \in F_t} \frac{1}{\|\nabla f_0(x_k)\|_*}$. If $F_t = \emptyset$, then we define $S_t = 0$.

For proving convergence of the switching strategy, we find an upper bound for the gap

$\delta_t = \frac{1}{S_t} \sum\limits_{k \in F_t} \frac{f_0(x_k)}{\|\nabla f_0(x_k)\|_*} - \varphi(\lambda_t)$,

assuming that $N(t) > 0$.
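Continuing the sketch above, the multipliers can be assembled from the recorded steps (again with illustrative names; it assumes $N(t) > 0$, i.e. at least one feasible step):

    def dual_multipliers(prod_weights, infeas_steps, h, m):
        """Assemble lambda_t from the records of the switching method.

        prod_weights: the values 1/||grad f0(x_k)|| over k in F_t.
        infeas_steps: the pairs (i_k, h_k) over the infeasible steps.
        """
        lam0 = h * sum(prod_weights)            # lambda_t^(0) = h * S_t
        lam = [0.0] * m
        for i_k, h_k in infeas_steps:
            lam[i_k] += h_k                     # accumulate over A_i(t)
        return lam0, [v / lam0 for v in lam]    # lambda_t^(i) = (1/lambda_t^(0)) * sum_{k in A_i(t)} h_k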

SLIDE 16

Convergence analysis

Note that $\lambda_t^{(0)} = h \cdot S_t$. Therefore

$\lambda_t^{(0)} \delta_t = \sup\limits_{x \in Q}\Big\{ h \sum\limits_{k \in F_t} \frac{f_0(x_k)}{\|\nabla f_0(x_k)\|_*} - \lambda_t^{(0)} f_0(x) - \sum\limits_{i=1}^m \sum\limits_{k \in A_i(t)} h_k f_i(x) \Big\}$

$= \sup\limits_{x \in Q}\Big\{ h \sum\limits_{k \in F_t} \frac{f_0(x_k) - f_0(x)}{\|\nabla f_0(x_k)\|_*} - \sum\limits_{k \notin F_t} h_k f_{i_k}(x) \Big\}$

$\le \sup\limits_{x \in Q}\Big\{ h \sum\limits_{k \in F_t} \frac{\langle \nabla f_0(x_k), x_k - x \rangle}{\|\nabla f_0(x_k)\|_*} + \sum\limits_{k \notin F_t} h_k \big[\langle \nabla f_{i_k}(x_k), x_k - x \rangle - f_{i_k}(x_k)\big] \Big\}$.

Let us estimate the right-hand side of this inequality from above.

SLIDE 17

Feasible step

For arbitrary $x \in Q$, denote $r_t(x) = \beta(x_t, x)$. Then

$r_{t+1}(x) - r_t(x) = \big[d(x) - d(x_{t+1}) - \langle \nabla d(x_{t+1}), x - x_{t+1} \rangle\big] - \big[d(x) - d(x_t) - \langle \nabla d(x_t), x - x_t \rangle\big]$
$= \langle \nabla d(x_t) - \nabla d(x_{t+1}), x - x_{t+1} \rangle - \big[d(x_{t+1}) - d(x_t) - \langle \nabla d(x_t), x_{t+1} - x_t \rangle\big]$
$\le \langle \nabla d(x_t) - \nabla d(x_{t+1}), x - x_{t+1} \rangle - \tfrac12 \|x_t - x_{t+1}\|^2$.

In view of the optimality condition, for all $x \in Q$ and $k \in F_t$ we have

$\frac{h}{\|\nabla f_0(x_k)\|_*} \langle \nabla f_0(x_k), x_{k+1} - x \rangle \le \langle \nabla d(x_{k+1}) - \nabla d(x_k), x - x_{k+1} \rangle$.

Assume that $k \in F_t$. In this case,

$r_{k+1}(x) - r_k(x) \le -\frac{h}{\|\nabla f_0(x_k)\|_*} \langle \nabla f_0(x_k), x_{k+1} - x \rangle - \tfrac12 \|x_k - x_{k+1}\|^2$
$\le -\frac{h}{\|\nabla f_0(x_k)\|_*} \langle \nabla f_0(x_k), x_k - x \rangle + \tfrac12 h^2$.

SLIDE 18

Infeasible step

If $k \notin F_t$, then the optimality condition defining the point $x_{k+1}$ looks as follows:

$h_k \langle \nabla f_{i_k}(x_k), x_{k+1} - x \rangle \le \langle \nabla d(x_{k+1}) - \nabla d(x_k), x - x_{k+1} \rangle$.

Therefore,

$r_{k+1}(x) - r_k(x) \le -h_k \langle \nabla f_{i_k}(x_k), x_{k+1} - x \rangle - \tfrac12 \|x_k - x_{k+1}\|^2$
$\le -h_k \langle \nabla f_{i_k}(x_k), x_k - x \rangle + \tfrac12 h_k^2 \|\nabla f_{i_k}(x_k)\|_*^2$.

Hence,

$h_k \big[\langle \nabla f_{i_k}(x_k), x_k - x \rangle - f_{i_k}(x_k)\big] \le r_k(x) - r_{k+1}(x) - \frac{f_{i_k}^2(x_k)}{2\|\nabla f_{i_k}(x_k)\|_*^2} \le r_k(x) - r_{k+1}(x) - \tfrac12 h^2$,

where the last step uses $f_{i_k}(x_k) > h \|\nabla f_{i_k}(x_k)\|_*$, which holds since $i_k \in I_k$.

SLIDE 19

Convergence result

Summing up all these inequalities for $k = 0, \dots, t$, and taking into account that $r_{t+1}(x) \ge 0$, we obtain

$\lambda_t^{(0)} \delta_t \le r_0(x) + \tfrac12 N(t) h^2 - \tfrac12 (t - N(t)) h^2 = r_0(x) - \tfrac12 t h^2 + N(t) h^2$.

Denote $D = \max\limits_{x \in Q} r_0(x)$.

  • Theorem. If $t \ge \frac{2}{h^2} D$, then $F_t \ne \emptyset$. In this case

$\delta_t \le M h$ and $\max\limits_{1 \le i \le m} f_i(x_k) \le M h$, $\quad k \in F_t$,

where $M = \max\limits_{0 \le k \le t}\ \max\limits_{0 \le i \le m} \|\nabla f_i(x_k)\|_*$.

Proof: If $F_t = \emptyset$, then $N(t) = 0$ and $\lambda_t^{(0)} = 0$, so the inequality above gives $0 \le r_0(x) - \tfrac12 t h^2$, which is impossible for $t$ big enough. Finally, $\lambda_t^{(0)} \ge \frac{h}{M} N(t)$. Therefore, if $t$ is big enough, then $\delta_t \le \frac{N(t) h^2}{\lambda_t^{(0)}} \le M h$.

(In particular, choosing $h = \epsilon / M$ gives $\delta_t \le \epsilon$ and $\max_i f_i(x_k) \le \epsilon$ after $t \ge 2 D M^2 / \epsilon^2$ iterations, i.e. the $O(1/\epsilon^2)$ complexity mentioned on Slide 8.)

SLIDE 20

Conclusion

  • 1. The optimal primal-dual solution can be approximated by a simple switching subgradient scheme.
  • 2. The dual process looks like a coordinate-descent method.
  • 3. The approximations of the dual multipliers have a natural interpretation: the relative importance of the corresponding constraints during the adjustment process.
  • 4. However, the method has the optimal worst-case efficiency estimate even if the dual optimal solution does not exist.
  • 5. Many interesting questions remain (influence of smoothness, strong convexity, etc.).

Thank you for your attention!
