Convex Optimization in Machine Learning and Inverse Problems, Part 3: Augmented Lagrangian Methods - PowerPoint PPT Presentation

1. Convex Optimization in Machine Learning and Inverse Problems
Part 3: Augmented Lagrangian Methods
Mário A. T. Figueiredo¹ and Stephen J. Wright²
¹ Instituto de Telecomunicações, Instituto Superior Técnico, Lisboa, Portugal
² Computer Sciences Department, University of Wisconsin, Madison, WI, USA
Condensed version of ICCOPT tutorial, Lisbon, Portugal, 2013

2-3. Augmented Lagrangian Methods
Consider a linearly constrained problem
  min f(x)  s.t.  Ax = b,
where f is a proper, lower semi-continuous, convex function. The augmented Lagrangian is (with ρ > 0)
  L(x, λ; ρ) := f(x) + λ^T(Ax − b) + (ρ/2)‖Ax − b‖²,
where the first two terms form the Lagrangian and the quadratic term is the "augmentation".
The basic augmented Lagrangian method (a.k.a. method of multipliers) is
  x_k = arg min_x L(x, λ_{k−1}; ρ);
  λ_k = λ_{k−1} + ρ(Ax_k − b).
(Hestenes, 1969; Powell, 1969)
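To make the iteration concrete, here is a minimal NumPy sketch of the method of multipliers for the toy problem min ½‖x − y‖² s.t. Ax = b, chosen because its x-subproblem has a closed-form solution. The data A, b, y and the choice ρ = 1 are illustrative assumptions, not from the slides.

```python
import numpy as np

def method_of_multipliers(A, b, y, rho=1.0, num_iters=100):
    """Method of multipliers for  min 0.5*||x - y||^2  s.t.  Ax = b.

    The x-subproblem
        min_x 0.5*||x - y||^2 + lam^T (Ax - b) + (rho/2)*||Ax - b||^2
    is an unconstrained quadratic, solved here via its normal equations.
    """
    m, n = A.shape
    lam = np.zeros(m)                      # multiplier estimate lambda_k
    H = np.eye(n) + rho * A.T @ A          # Hessian of the subproblem (fixed)
    for _ in range(num_iters):
        # x_k = argmin_x L(x, lambda_{k-1}; rho)   (closed form for this f)
        x = np.linalg.solve(H, y - A.T @ lam + rho * A.T @ b)
        # lambda_k = lambda_{k-1} + rho * (A x_k - b)
        lam = lam + rho * (A @ x - b)
    return x, lam

# Illustrative data (assumed, not from the slides).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 8))
b = rng.standard_normal(3)
y = rng.standard_normal(8)
x, lam = method_of_multipliers(A, b, y)
print("constraint residual:", np.linalg.norm(A @ x - b))
```

For a general f the x-step would be replaced by whatever solver suits the subproblem; the multiplier update is always the same one-line formula.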

4-6. A Favorite Derivation
...more or less rigorous for convex f. Write the problem as
  min_x max_λ  f(x) + λ^T(Ax − b).
Obviously, the max w.r.t. λ will be +∞ unless Ax = b, so this is equivalent to the original problem.
This equivalence is not very useful, computationally: the max_λ function is highly nonsmooth w.r.t. x. Smooth it by adding a "proximal point" term, penalizing deviations from a prior estimate λ̄:
  min_x max_λ  f(x) + λ^T(Ax − b) − (1/(2ρ))‖λ − λ̄‖².
Maximization w.r.t. λ is now trivial (a concave quadratic), yielding
  λ = λ̄ + ρ(Ax − b).
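Filling in the one line of calculus behind that formula (the maximand is a concave quadratic in λ, so setting its gradient to zero suffices):

```latex
% Gradient of the concave quadratic in \lambda, set to zero:
(Ax - b) - \tfrac{1}{\rho}\,(\lambda - \bar{\lambda}) = 0
\quad\Longleftrightarrow\quad
\lambda = \bar{\lambda} + \rho\,(Ax - b).
```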

7-9. A Favorite Derivation (Cont.)
Inserting λ = λ̄ + ρ(Ax − b) leads to
  min_x  f(x) + λ̄^T(Ax − b) + (ρ/2)‖Ax − b‖²  =  min_x  L(x, λ̄; ρ).
Hence we can view the augmented Lagrangian process as:
⋄ min_x L(x, λ̄; ρ) to get the new x;
⋄ shift the "prior" on λ by updating it to the latest max: λ̄ ← λ̄ + ρ(Ax − b);
⋄ repeat until convergence.
Add subscripts, and we recover the augmented Lagrangian algorithm of the first slide! Can also increase ρ (to sharpen the effect of the prox term), if needed.
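The substitution itself is a short computation; writing r := Ax − b for brevity (r is shorthand introduced here, not the slides' notation):

```latex
% With r := Ax - b and \lambda = \bar{\lambda} + \rho r:
\lambda^{T} r - \tfrac{1}{2\rho}\,\|\lambda - \bar{\lambda}\|^{2}
  = \bar{\lambda}^{T} r + \rho\,\|r\|^{2} - \tfrac{\rho}{2}\,\|r\|^{2}
  = \bar{\lambda}^{T} r + \tfrac{\rho}{2}\,\|r\|^{2},
```

which, added to f(x), is exactly L(x, λ̄; ρ).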

10-11. Inequality Constraints, Nonlinear Constraints
The same derivation can be used for inequality constraints: min f(x) s.t. Ax ≥ b. Apply the same reasoning to the constrained min-max formulation:
  min_x max_{λ ≥ 0}  f(x) − λ^T(Ax − b).
After the prox term is added, the maximizing λ can be found in closed form (as for prox-operators), leading to the update formula
  λ = max( λ̄ − ρ(Ax − b), 0 )   (componentwise).
This derivation extends immediately to nonlinear constraints c(x) = 0 or c(x) ≥ 0.
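As a sanity check on the signs (this derivation is my reconstruction, not on the slide), the prox-smoothed maximization over λ ≥ 0 separates over components and amounts to projecting the unconstrained maximizer onto the nonnegative orthant:

```latex
% Prox-smoothed maximization over the nonnegative orthant (componentwise):
\max_{\lambda \ge 0}\; -\lambda^{T}(Ax - b) - \tfrac{1}{2\rho}\,\|\lambda - \bar{\lambda}\|^{2}
\;\;\Longrightarrow\;\;
\lambda = \max\bigl(\bar{\lambda} - \rho\,(Ax - b),\, 0\bigr).
```

A multiplier component therefore grows when its constraint is violated (Ax − b < 0) and shrinks toward zero when the constraint is slack.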

12-13. "Explicit" Constraints, Inequality Constraints
There may be other constraints on x (such as x ∈ Ω) that we prefer to handle explicitly in the subproblem. For the formulation
  min_x f(x)  s.t.  Ax = b,  x ∈ Ω,
the min_x step can enforce x ∈ Ω explicitly:
  x_k = arg min_{x ∈ Ω} L(x, λ_{k−1}; ρ);
  λ_k = λ_{k−1} + ρ(Ax_k − b).
This gives an alternative way to handle inequality constraints: introduce slacks s, and enforce them explicitly. That is, replace
  min_x f(x)  s.t.  c(x) ≥ 0
by
  min_{x,s} f(x)  s.t.  c(x) = s,  s ≥ 0.

14-15. "Explicit" Constraints, Inequality Constraints (Cont.)
The augmented Lagrangian is now
  L(x, s, λ; ρ) := f(x) + λ^T(c(x) − s) + (ρ/2)‖c(x) − s‖².
Enforce s ≥ 0 explicitly in the subproblem:
  (x_k, s_k) = arg min_{x,s} L(x, s, λ_{k−1}; ρ)  s.t.  s ≥ 0;
  λ_k = λ_{k−1} + ρ(c(x_k) − s_k).
There are good algorithmic options for dealing with bound constraints s ≥ 0 (gradient projection and its enhancements). This is used in the Lancelot code (Conn et al., 1992).
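To illustrate the gradient-projection option for the bound-constrained subproblem, here is a minimal NumPy sketch for the toy instance f(x) = ½‖x − y‖² with c(x) = x (i.e. the constraint x ≥ 0). The data, step size, and iteration counts are assumptions made for the sketch; this is not Lancelot's actual strategy.

```python
import numpy as np

def aug_lag_with_slacks(y, rho=1.0, outer_iters=50, inner_iters=200):
    """Augmented Lagrangian for  min 0.5*||x - y||^2  s.t.  x >= 0,
    rewritten with slacks:  c(x) = x = s,  s >= 0.

    The (x, s) subproblem is solved by projected gradient: only s is
    bound-constrained, so the projection simply clips s at zero.
    """
    n = y.size
    x, s, lam = np.zeros(n), np.zeros(n), np.zeros(n)
    step = 1.0 / (1.0 + 2.0 * rho)          # 1/L for this quadratic subproblem
    for _ in range(outer_iters):
        for _ in range(inner_iters):
            r = x - s                        # c(x) - s
            grad_x = (x - y) + lam + rho * r
            grad_s = -lam - rho * r
            x = x - step * grad_x
            s = np.maximum(s - step * grad_s, 0.0)   # projection onto s >= 0
        lam = lam + rho * (x - s)            # multiplier update
    return x, s, lam

# Illustrative data (assumed).
y = np.array([1.5, -0.7, 0.3, -2.0])
x, s, lam = aug_lag_with_slacks(y)
print(x)   # should be close to max(y, 0) = [1.5, 0. , 0.3, 0. ]
```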

16. Quick History of Augmented Lagrangian
⋄ Dates from at least 1969: Hestenes, Powell.
⋄ Developments in the 1970s and early 1980s by Rockafellar, Bertsekas, and others.
⋄ Lancelot code for nonlinear programming: Conn, Gould, Toint, around 1992 (Conn et al., 1992).
⋄ Lost favor somewhat as an approach for general nonlinear programming during the next 15 years.
⋄ Recent revival in the context of sparse optimization and its many applications, in conjunction with splitting / coordinate descent.

17-18. Alternating Direction Method of Multipliers (ADMM)
Consider now problems with a separable objective of the form
  min_{x,z} f(x) + h(z)  s.t.  Ax + Bz = c,
for which the augmented Lagrangian is
  L(x, z, λ; ρ) := f(x) + h(z) + λ^T(Ax + Bz − c) + (ρ/2)‖Ax + Bz − c‖².
Standard AL would minimize L(x, z, λ; ρ) w.r.t. (x, z) jointly. However, x and z are coupled in the quadratic term, so separability is lost.
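ADMM sidesteps this coupling by minimizing over x and z alternately rather than jointly. As a concrete illustration, here is a minimal sketch of the standard ADMM iteration for the toy instance f(x) = ½‖Cx − d‖², h(z) = τ‖z‖₁, with A = I, B = −I, c = 0 (a lasso-style splitting); the problem instance, data, and parameters are assumptions for the sketch, not content from these slides.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(C, d, tau=0.1, rho=1.0, num_iters=200):
    """Standard ADMM iteration for  min_x 0.5*||Cx - d||^2 + tau*||z||_1
    s.t.  x - z = 0   (i.e. A = I, B = -I, c = 0 in the slide's notation)."""
    n = C.shape[1]
    x, z, lam = np.zeros(n), np.zeros(n), np.zeros(n)
    H = C.T @ C + rho * np.eye(n)          # Hessian of the x-subproblem
    for _ in range(num_iters):
        # x-step: minimize L over x with z, lambda fixed (a linear solve here)
        x = np.linalg.solve(H, C.T @ d + rho * z - lam)
        # z-step: minimize L over z with x, lambda fixed (prox of the l1 norm)
        z = soft_threshold(x + lam / rho, tau / rho)
        # multiplier update
        lam = lam + rho * (x - z)
    return x, z

# Illustrative data (assumed).
rng = np.random.default_rng(1)
C = rng.standard_normal((20, 10))
d = rng.standard_normal(20)
x, z = admm_lasso(C, d)
print("max |x - z|:", np.max(np.abs(x - z)))
```

Each step sees only one of the two blocks f or h, which is exactly the separability that joint minimization of the augmented Lagrangian destroys.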
