MATH 4211/6211 Optimization Algorithms for Constrained Optimization

MATH 4211/6211 – Optimization Algorithms for Constrained Optimization Xiaojing Ye Department of Mathematics & Statistics Georgia State University Xiaojing Ye, Math & Stat, Georgia State University 0

We know that the gradient method proceeds as

$$x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)},$$

where $d^{(k)}$ is a descent direction (often chosen as a function of $g^{(k)}$). However, $x^{(k+1)}$ is not necessarily in the feasible set $\Omega$.

Hence the projected gradient (PG) method proceeds as

$$x^{(k+1)} = \Pi(x^{(k)} + \alpha_k d^{(k)})$$

so that $x^{(k)} \in \Omega$ for all $k$. Here $\Pi(x)$ is the projection of $x$ onto $\Omega$.
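The PG update can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: it assumes the steepest descent direction $d^{(k)} = -\nabla f(x^{(k)})$ and a fixed step size, and the function name `projected_gradient` and the box-constrained test problem are hypothetical choices made for the example.

```python
import numpy as np

def projected_gradient(grad, project, x0, step=0.1, iters=200):
    """Iterate x <- project(x - step * grad(x)) with a fixed step size."""
    x = project(np.asarray(x0, dtype=float))
    for _ in range(iters):
        x = project(x - step * grad(x))
    return x

# Hypothetical test problem: minimize ||x - c||^2 / 2 over the box [0, 1]^2.
c = np.array([2.0, -0.5])
grad = lambda x: x - c                        # gradient of the objective
project_box = lambda z: np.clip(z, 0.0, 1.0)  # projection onto the box

x_star = projected_gradient(grad, project_box, x0=[0.5, 0.5])
```

Here the unconstrained minimizer $c$ lies outside the box, so the iterates settle at the point of the box closest to $c$.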

Definition. The projection $\Pi$ onto $\Omega$ is defined by

$$\Pi(z) = \arg\min_{x \in \Omega} \|x - z\|.$$

Namely, $\Pi(z)$ is the "closest point" in $\Omega$ to $z$.

Note that computing $\Pi(z)$ is itself an optimization problem, which in general may have no closed-form solution and may not be easy to solve.

Example. Find the projection operators $\Pi(x)$ for the following sets $\Omega$:

1. $\Omega = \{x \in \mathbb{R}^n : \|x\|_\infty \le 1\}$
2. $\Omega = \{x \in \mathbb{R}^n : a_i \le x_i \le b_i,\ \forall i\}$
3. $\Omega = \{x \in \mathbb{R}^n : \|x\| \le 1\}$
4. $\Omega = \{x \in \mathbb{R}^n : \|x\| = 1\}$
5. $\Omega = \{x \in \mathbb{R}^n : \|x\|_1 \le 1\}$
6. $\Omega = \{x \in \mathbb{R}^n : Ax = 0\}$, where $A \in \mathbb{R}^{m \times n}$ with $m \le n$ is full rank.
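For sets 1 to 4 the projections have well-known closed forms, sketched below. The function names are hypothetical, and the slide itself leaves the derivations as an exercise; sets 5 and 6 are omitted here because they need more work.

```python
import numpy as np

def proj_linf_ball(z):
    """Set 1: clip every coordinate to [-1, 1]."""
    return np.clip(z, -1.0, 1.0)

def proj_box(z, a, b):
    """Set 2: clip coordinate i to [a_i, b_i]."""
    return np.clip(z, a, b)

def proj_l2_ball(z):
    """Set 3: leave z alone if it is inside the ball, else rescale to norm 1."""
    nrm = np.linalg.norm(z)
    return z if nrm <= 1.0 else z / nrm

def proj_sphere(z):
    """Set 4: rescale to norm 1; the projection of 0 is not unique."""
    nrm = np.linalg.norm(z)
    if nrm == 0.0:
        raise ValueError("projection of 0 onto the unit sphere is not unique")
    return z / nrm

p1 = proj_linf_ball(np.array([2.0, -3.0, 0.5]))
p2 = proj_box(np.array([2.0, -3.0]), np.array([0.0, 0.0]), np.array([1.0, 5.0]))
p3 = proj_l2_ball(np.array([3.0, 4.0]))
p4 = proj_sphere(np.array([0.0, 0.5]))
```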

Example. Consider the constrained optimization problem:

$$\text{minimize } \tfrac{1}{2} x^\top Q x \quad \text{subject to } \|x\|^2 = 1,$$

where $Q \succ 0$. Apply the PG method with a fixed step size $\alpha > 0$ to this problem. Specifically:

• Write down the explicit formula of $x^{(k+1)}$ in terms of $x^{(k)}$ (assume never projecting $0$).
• Is it possible to ensure convergence when $\alpha$ is sufficiently small?
• Show that if $\alpha \in (0, 1/\lambda_{\max})$ and $x^{(0)}$ is not orthogonal to the eigenvector corresponding to $\lambda_{\min}$, then $x^{(k)}$ converges.

Here $\lambda_{\max}$ ($\lambda_{\min}$) is the largest (smallest) eigenvalue of $Q$.

Solution. We can see that the solution should be a unit eigenvector corresponding to $\lambda_{\min}$.

Recall that $\Pi(x) = x / \|x\|$ for all $x \ne 0$. We also know $\nabla f(x) = Qx$, and

$$x^{(k)} - \alpha \nabla f(x^{(k)}) = (I - \alpha Q) x^{(k)}.$$

Therefore, PG with step size $\alpha$ is given by

$$x^{(k+1)} = \beta_k (I - \alpha Q) x^{(k)}, \quad \text{where } \beta_k = \frac{1}{\|(I - \alpha Q) x^{(k)}\|}.$$

Note that, if $x^{(0)}$ is a unit eigenvector of $Q$ corresponding to eigenvalue $\lambda$ and $1 - \alpha\lambda > 0$, then $\beta_0 = 1/(1 - \alpha\lambda)$ and

$$x^{(1)} = \beta_0 (I - \alpha Q) x^{(0)} = \beta_0 (1 - \alpha\lambda) x^{(0)} = x^{(0)},$$

and hence $x^{(k)} = x^{(0)}$ for all $k$.

Solution (cont.) Denote by $\lambda_1 \le \cdots \le \lambda_n$ the eigenvalues of $Q$, and by $v_1, \dots, v_n$ the corresponding eigenvectors.

Now assume that

$$x^{(k)} = y_1^{(k)} v_1 + \cdots + y_n^{(k)} v_n.$$

Then we have

$$x^{(k+1)} = \Pi((I - \alpha Q) x^{(k)}) = \beta_k y_1^{(k)} (1 - \alpha\lambda_1) v_1 + \cdots + \beta_k y_n^{(k)} (1 - \alpha\lambda_n) v_n.$$

Denote $\beta^{(k)} = \prod_{j=0}^{k-1} \beta_j$, then

$$y_i^{(k)} = \beta_{k-1} y_i^{(k-1)} (1 - \alpha\lambda_i) = \cdots = \beta^{(k)} y_i^{(0)} (1 - \alpha\lambda_i)^k.$$

Solution (cont.) Therefore, we have

$$x^{(k)} = \sum_{i=1}^{n} y_i^{(k)} v_i = y_1^{(k)} \left( v_1 + \sum_{i=2}^{n} \frac{y_i^{(k)}}{y_1^{(k)}} v_i \right).$$

Furthermore,

$$\frac{y_i^{(k)}}{y_1^{(k)}} = \frac{\beta^{(k)} y_i^{(0)} (1 - \alpha\lambda_i)^k}{\beta^{(k)} y_1^{(0)} (1 - \alpha\lambda_1)^k} = \frac{y_i^{(0)}}{y_1^{(0)}} \left( \frac{1 - \alpha\lambda_i}{1 - \alpha\lambda_1} \right)^k.$$

Note that $y_1^{(0)} \ne 0$ (since $x^{(0)}$ is not orthogonal to the eigenvector corresponding to $\lambda_1$). As $0 < \alpha < 1/\lambda_n$, we have

$$0 < \frac{1 - \alpha\lambda_i}{1 - \alpha\lambda_1} < 1 \ \Rightarrow\ \left( \frac{1 - \alpha\lambda_i}{1 - \alpha\lambda_1} \right)^k \to 0 \quad \text{as } k \to \infty$$

for all $\lambda_i > \lambda_1$. Hence $x^{(k)} \to v_1$.
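The iteration $x^{(k+1)} = \beta_k (I - \alpha Q) x^{(k)}$ from this solution can be tried numerically. The sketch below uses a hypothetical $2 \times 2$ matrix $Q \succ 0$ and the step size $\alpha = 0.5/\lambda_{\max}$, which lies in $(0, 1/\lambda_{\max})$; it then checks that the iterate lines up with the eigenvector of $\lambda_{\min}$ (up to sign).

```python
import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])          # hypothetical symmetric positive definite matrix
lam, V = np.linalg.eigh(Q)          # eigenvalues in ascending order
alpha = 0.5 / lam[-1]               # fixed step size inside (0, 1/lambda_max)

x = np.array([1.0, 0.0])            # unit vector, not orthogonal to V[:, 0]
for _ in range(500):
    z = x - alpha * (Q @ x)         # gradient step: (I - alpha Q) x
    x = z / np.linalg.norm(z)       # projection onto the unit sphere

v_min = V[:, 0]                     # unit eigenvector for lambda_min
alignment = abs(v_min @ x)          # equals 1 when x = +/- v_min
rayleigh = x @ Q @ x                # equals lambda_min at the limit
```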

Projected gradient (PG) method for optimization with linear constraint:

$$\text{minimize } f(x) \quad \text{subject to } Ax = b.$$

Then PG is given by

$$x^{(k+1)} = \Pi(x^{(k)} - \alpha_k \nabla f(x^{(k)})),$$

where $\Pi$ is the projection onto $\Omega := \{x \in \mathbb{R}^n : Ax = b\}$.

We first consider the orthogonal projection onto the subspace $\Psi = \{x \in \mathbb{R}^n : Ax = 0\}$ (a hyperplane when $m = 1$): for any $v \in \mathbb{R}^n$, the projection onto $\Psi$ is the solution to

$$\text{minimize } \tfrac{1}{2} \|x - v\|^2 \quad \text{subject to } Ax = 0.$$

Let $P : \mathbb{R}^n \to \mathbb{R}^n$ denote this projector, i.e., $Pv$ is the point on $\Psi$ closest to $v$.

The Lagrange function is

$$l(x, \lambda) = \tfrac{1}{2} \|x - v\|^2 + \lambda^\top A x.$$

Hence the Lagrange (KKT) condition is

$$(x - v) + A^\top \lambda = 0, \qquad Ax = 0.$$

Left-multiplying the first equation by $A$ and using $Ax = 0$, we obtain $-Av + AA^\top \lambda = 0$, and therefore

$$\lambda = (AA^\top)^{-1} A v, \qquad x = v - A^\top \lambda = (I - A^\top (AA^\top)^{-1} A) v.$$

Denote the projector onto $\Psi$ by

$$P = I - A^\top (AA^\top)^{-1} A.$$

Thus, the projection of $v$ onto $\Psi$ is $Pv$.

Proposition. The projector $P$ has the following properties:

1. $P = P^\top$.
2. $P^2 = P$.
3. $Pv = 0$ iff $\exists\, \lambda \in \mathbb{R}^m$ s.t. $v = A^\top \lambda$. Namely $N(P) = R(A^\top)$.

Proof. Items 1 and 2 are easy to verify. For item 3:

($\Rightarrow$) If $Pv = 0$, then $v = A^\top (AA^\top)^{-1} A v$. Letting $\lambda = (AA^\top)^{-1} A v$ yields $v = A^\top \lambda$.

($\Leftarrow$) Suppose $v = A^\top \lambda$, then $Pv = (I - A^\top (AA^\top)^{-1} A) A^\top \lambda = A^\top \lambda - A^\top \lambda = 0$.
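The three properties can be checked numerically on a small instance. The matrix $A$, the vector $v$, and the multiplier below are hypothetical values chosen only for illustration; $A$ is assumed to have full row rank so that $AA^\top$ is invertible.

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, -1.0]])   # hypothetical full-row-rank matrix (m=2, n=4)
n = A.shape[1]

# P = I - A^T (A A^T)^{-1} A, using a linear solve instead of an explicit inverse
P = np.eye(n) - A.T @ np.linalg.solve(A @ A.T, A)

v = np.array([3.0, -1.0, 0.5, 2.0])
lam = np.array([0.7, -1.3])

sym_err = np.linalg.norm(P - P.T)            # property 1: P = P^T
idem_err = np.linalg.norm(P @ P - P)         # property 2: P^2 = P
feas_err = np.linalg.norm(A @ (P @ v))       # P v lies in {x : A x = 0}
null_err = np.linalg.norm(P @ (A.T @ lam))   # property 3: P A^T lam = 0
```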

Similar to the derivation of $P$, we can obtain the projection onto $\Omega$:

$$\text{minimize } \tfrac{1}{2} \|x - v\|^2 \quad \text{subject to } Ax = b.$$

(Write down the Lagrange function and KKT condition, and solve for $(x, \lambda)$.)

The projection $\Pi$ of $v$ onto $\Omega$ is

$$\Pi(v) = Pv + A^\top (AA^\top)^{-1} b.$$

Indeed, $A\,\Pi(v) = APv + AA^\top (AA^\top)^{-1} b = b$, since $AP = 0$.
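Solving the KKT system of this problem gives $x = v - A^\top (AA^\top)^{-1} (Av - b)$. The sketch below implements that expression and checks feasibility and optimality on hypothetical data; the helper name `project_affine` is an assumption made for the example, and $A$ is assumed to have full row rank.

```python
import numpy as np

def project_affine(v, A, b):
    """Closest point to v in {x : A x = b}, assuming A has full row rank."""
    return v - A.T @ np.linalg.solve(A @ A.T, A @ v - b)

A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, -1.0]])   # hypothetical data
b = np.array([1.0, 2.0])
v = np.array([3.0, -1.0, 0.5, 2.0])

x = project_affine(v, A, b)
residual = np.linalg.norm(A @ x - b)     # feasibility: A x = b

# Optimality: x - v must lie in R(A^T), i.e. P (x - v) = 0
P = np.eye(4) - A.T @ np.linalg.solve(A @ A.T, A)
optimality = np.linalg.norm(P @ (x - v))
```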

Proposition. Let $x^* \in \mathbb{R}^n$ be feasible (i.e., $Ax^* = b$), then $P \nabla f(x^*) = 0$ iff $x^*$ satisfies the Lagrange condition.

Proof. We have

$$\begin{aligned}
P \nabla f(x^*) = 0 &\iff \nabla f(x^*) \in N(P) \\
&\iff \nabla f(x^*) \in R(A^\top) \\
&\iff \nabla f(x^*) = -A^\top \lambda^* \text{ for some } \lambda^* \in \mathbb{R}^m.
\end{aligned}$$

Now we are ready to write down explicitly the PG:

$$\begin{aligned}
x^{(k+1)} &= \Pi(x^{(k)} - \alpha_k \nabla f(x^{(k)})) && (\because \text{PG definition}) \\
&= P(x^{(k)} - \alpha_k \nabla f(x^{(k)})) + A^\top (AA^\top)^{-1} b && (\because \Pi(v) = Pv + A^\top (AA^\top)^{-1} b) \\
&= P x^{(k)} + A^\top (AA^\top)^{-1} b - \alpha_k P \nabla f(x^{(k)}) \\
&= \Pi(x^{(k)}) - \alpha_k P \nabla f(x^{(k)}) && (\because \Pi(v) = Pv + A^\top (AA^\top)^{-1} b) \\
&= x^{(k)} - \alpha_k P \nabla f(x^{(k)}) && (\because x^{(k)} \in \Omega)
\end{aligned}$$

The only difference from the standard gradient method is the additional $P$.

Note that if $x^{(0)} \in \Omega$, then $x^{(k)} \in \Omega$ for all $k$.
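The final form $x^{(k+1)} = x^{(k)} - \alpha_k P \nabla f(x^{(k)})$ is easy to try out. The sketch below assumes a hypothetical objective $f(x) = \tfrac12 \|x - c\|^2$, a fixed step size, and the minimum-norm solution of $Ax = b$ as the feasible starting point; none of these choices come from the slides.

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, -1.0]])   # hypothetical full-row-rank constraint matrix
b = np.array([1.0, 2.0])
c = np.array([3.0, -1.0, 0.5, 2.0])

grad = lambda x: x - c                   # gradient of f(x) = ||x - c||^2 / 2
P = np.eye(4) - A.T @ np.linalg.solve(A @ A.T, A)

x = A.T @ np.linalg.solve(A @ A.T, b)    # feasible start: minimum-norm solution of A x = b
alpha = 0.5                              # fixed step size (assumed)
for _ in range(100):
    x = x - alpha * (P @ grad(x))        # projected gradient step; stays in Omega

feasibility = np.linalg.norm(A @ x - b)       # should remain ~0 for every k
stationarity = np.linalg.norm(P @ grad(x))    # should tend to 0
```

For this particular $f$ the limit is the projection of $c$ onto $\Omega$, which gives an independent check of the result.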

Now we can consider the choice of $\alpha_k$. For example, we can use the projected steepest descent (PSD) method:

$$\alpha_k = \arg\min_{\alpha > 0} f(x^{(k)} - \alpha P \nabla f(x^{(k)})).$$

Theorem. Let $x^{(k)}$ be generated by PSD. If $P \nabla f(x^{(k)}) \ne 0$, then $f(x^{(k+1)}) < f(x^{(k)})$.

Proof. For such $x^{(k)}$, consider the line search function

$$\phi(\alpha) := f(x^{(k)} - \alpha P \nabla f(x^{(k)})).$$

Then we have

$$\phi'(\alpha) = -\nabla f(x^{(k)} - \alpha P \nabla f(x^{(k)}))^\top P \nabla f(x^{(k)}).$$

Hence, using $P = P^2$ and $P = P^\top$,

$$\phi'(0) = -\nabla f(x^{(k)})^\top P \nabla f(x^{(k)}) = -\nabla f(x^{(k)})^\top P^2 \nabla f(x^{(k)}) = -\|P \nabla f(x^{(k)})\|^2 < 0.$$

So $\phi(\alpha) < \phi(0)$ for all sufficiently small $\alpha > 0$, and since $\alpha_k$ minimizes $\phi$, we get $\phi(\alpha_k) < \phi(0)$, i.e., $f(x^{(k+1)}) < f(x^{(k)})$.
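For a quadratic objective the PSD line search has a closed form, which makes the descent property easy to observe. The sketch below assumes a hypothetical $f(x) = \tfrac12 x^\top Q x - c^\top x$ with $Q \succ 0$; writing $d = P \nabla f(x^{(k)})$, setting $\phi'(\alpha) = 0$ gives $\alpha_k = \|d\|^2 / (d^\top Q d)$. The data and the stopping tolerance are illustrative assumptions.

```python
import numpy as np

Q = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 0.5],
              [0.0, 0.5, 1.0]])      # hypothetical symmetric positive definite matrix
c = np.array([1.0, 0.0, -1.0])
A = np.array([[1.0, 1.0, 1.0]])      # single linear constraint x1 + x2 + x3 = 1
b = np.array([1.0])

f = lambda x: 0.5 * x @ Q @ x - c @ x
grad = lambda x: Q @ x - c
P = np.eye(3) - A.T @ np.linalg.solve(A @ A.T, A)

x = A.T @ np.linalg.solve(A @ A.T, b)    # feasible starting point
values = [f(x)]
for _ in range(200):
    d = P @ grad(x)                      # projected gradient
    if np.linalg.norm(d) < 1e-8:         # (approximately) stationary: stop
        break
    alpha = (d @ d) / (d @ Q @ d)        # exact minimizer of phi for quadratic f
    x = x - alpha * d
    values.append(f(x))                  # f decreases at every step (up to rounding)

stationarity = np.linalg.norm(P @ grad(x))
feasibility = np.linalg.norm(A @ x - b)
```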

$P \nabla f(x^*) = 0$ is sufficient for global optimality if $f$ is convex:

Theorem. Let $f$ be convex and $x^*$ be feasible. Then $P \nabla f(x^*) = 0$ iff $x^*$ is a global minimizer.

Proof. From the previous proposition and convexity of $f$, we know

$$\begin{aligned}
P \nabla f(x^*) = 0 &\iff x^* \text{ satisfies the Lagrange condition} \\
&\iff x^* \text{ is a global minimizer.}
\end{aligned}$$

Lagrange algorithm

We first consider the Lagrange algorithm for equality-constrained optimization:

$$\text{minimize } f(x) \quad \text{subject to } h(x) = 0,$$

where $f, h \in C^2$.

Recall the Lagrange function $l : \mathbb{R}^{n+m} \to \mathbb{R}$ is

$$l(x, \lambda) = f(x) + h(x)^\top \lambda.$$

We denote its Hessian with respect to $x$ by

$$\nabla_x^2 l(x, \lambda) = \nabla_x^2 f(x) + D_x^2 h(x)^\top \lambda \in \mathbb{R}^{n \times n},$$

where $D_x^2 h(x)^\top \lambda$ stands for $\sum_{i=1}^{m} \lambda_i \nabla^2 h_i(x)$.

Recall the Lagrange condition is

$$\nabla f(x) + Dh(x)^\top \lambda = 0 \in \mathbb{R}^n, \qquad h(x) = 0 \in \mathbb{R}^m.$$

The Lagrange algorithm is given by

$$\begin{aligned}
x^{(k+1)} &= x^{(k)} - \alpha_k (\nabla f(x^{(k)}) + Dh(x^{(k)})^\top \lambda^{(k)}) \\
\lambda^{(k+1)} &= \lambda^{(k)} + \beta_k h(x^{(k)})
\end{aligned}$$

which is like "gradient descent for $x$" and "gradient ascent for $\lambda$" of $l$.

Here $\alpha_k, \beta_k \ge 0$ are step sizes. WLOG, we can assume $\alpha_k = \beta_k$ for all $k$ by scaling $\lambda^{(k)}$ properly.

It is easy to verify that, if $(x^{(k)}, \lambda^{(k)}) \to (x^*, \lambda^*)$, then $(x^*, \lambda^*)$ satisfies the Lagrange condition.
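A small run of the two updates illustrates the behavior. The sketch assumes a hypothetical problem, $f(x) = x_1^2 + x_2^2$ with $h(x) = x_1 + x_2 - 1$, and constant step sizes $\alpha_k = \beta_k = 0.1$; for this problem the Lagrange condition gives $x^* = (0.5, 0.5)$ and $\lambda^* = -1$.

```python
import numpy as np

f_grad = lambda x: 2.0 * x                      # gradient of f(x) = x1^2 + x2^2
h = lambda x: np.array([x[0] + x[1] - 1.0])     # single equality constraint
Dh = lambda x: np.array([[1.0, 1.0]])           # Jacobian of h (1 x 2)

x = np.zeros(2)
lam = np.zeros(1)
alpha = beta = 0.1                              # constant step sizes (assumed)
for _ in range(500):
    x_new = x - alpha * (f_grad(x) + Dh(x).T @ lam)  # descent step in x
    lam = lam + beta * h(x)                          # ascent step in lambda, uses x^(k)
    x = x_new

# Residuals of the Lagrange condition at the final iterate
grad_residual = np.linalg.norm(f_grad(x) + Dh(x).T @ lam)
constraint_residual = np.linalg.norm(h(x))
```

Both updates are evaluated at the current pair $(x^{(k)}, \lambda^{(k)})$, matching the simultaneous form of the algorithm.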
