Convex Optimization: 9. Unconstrained minimization (Prof. Ying Cui)



SLIDE 1

Convex Optimization

  • 9. Unconstrained minimization
  • Prof. Ying Cui

Department of Electrical Engineering Shanghai Jiao Tong University

2017 Autumn Semester

SJTU Ying Cui 1 / 40

SLIDE 2

Outline

◮ Unconstrained minimization problems
◮ Descent methods
◮ Gradient descent method
◮ Steepest descent method
◮ Newton's method
◮ Self-concordance
◮ Implementation

SLIDE 3

Unconstrained minimization

minimize f(x)

assumptions:
◮ f : Rn → R is convex and twice continuously differentiable (implying that dom f is open)
◮ there exists an optimal point x∗ (the optimal value p∗ = infx f(x) is attained and finite)

a necessary and sufficient condition for optimality: ∇f(x∗) = 0
◮ solving the unconstrained minimization problem is the same as finding a solution of the optimality equation
◮ in a few special cases, it can be solved analytically; usually it must be solved by an iterative algorithm
  ◮ produce a sequence of points x(k) ∈ dom f, k = 0, 1, . . ., with f(x(k)) → p∗ as k → ∞
  ◮ terminate when f(x(k)) − p∗ ≤ ǫ for some tolerance ǫ > 0
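As an illustration of the analytically solvable case: for the convex quadratic f(x) = ||Ax − b||2^2 we have ∇f(x) = 2A^T(Ax − b), so the optimality equation ∇f(x∗) = 0 reduces to the normal equations A^T A x = A^T b. A small sketch (the data A, b below are made up):

```python
import numpy as np

# f(x) = ||Ax - b||_2^2 is convex and twice differentiable; the optimality
# equation grad f(x) = 2 A^T (A x - b) = 0 is the normal equations.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

x_star = np.linalg.solve(A.T @ A, A.T @ b)  # analytic solution of grad f = 0
grad = 2 * A.T @ (A @ x_star - b)           # gradient at the solution (zero)
```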

SLIDE 4

Initial point and sublevel set

algorithms in this chapter require a starting point x(0) such that
◮ x(0) ∈ dom f
◮ the sublevel set S = {x | f(x) ≤ f(x(0))} is closed (hard to verify in general)

the 2nd condition is satisfied for all x(0) ∈ dom f if f is closed, i.e., all its sublevel sets are closed, equivalently epi f is closed
◮ true if f is continuous and dom f = Rn
◮ true if f(x) → ∞ as x → bd dom f

examples of differentiable functions with closed sublevel sets:
    f(x) = log(Σ_{i=1}^m exp(ai^T x + bi)),    f(x) = −Σ_{i=1}^m log(bi − ai^T x)

SLIDE 5

Strong convexity and implications

f is strongly convex on S if there exists an m > 0 such that ∇2f(x) ⪰ mI for all x ∈ S

implications
◮ for x, y ∈ S, f(y) ≥ f(x) + ∇f(x)^T (y − x) + (m/2)||y − x||2^2
  ◮ m = 0: recovers the basic inequality characterizing convexity
  ◮ m > 0: a better lower bound than follows from convexity alone
  ◮ implies that S is bounded
◮ p∗ > −∞ and, for x ∈ S, f(x) − p∗ ≤ (1/(2m))||∇f(x)||2^2
  ◮ if the gradient is small at a point, then the point is nearly optimal
  ◮ a condition for suboptimality generalizing the optimality condition:
      ||∇f(x)||2 ≤ (2mǫ)^{1/2}  =⇒  f(x) − p∗ ≤ ǫ
  ◮ useful as a stopping criterion if m is known
◮ upper bound on ∇2f(x): there exists an M > 0 such that ∇2f(x) ⪯ MI for all x ∈ S

SLIDE 6

Condition number of matrix and convex set

◮ condition number of a matrix: the ratio of its largest eigenvalue to its smallest eigenvalue
◮ condition number of a convex set: the square of the ratio of its maximum width to its minimum width
  ◮ width of a convex set C in the direction q, with ||q||2 = 1: W(C, q) = sup_{z∈C} q^T z − inf_{z∈C} q^T z
  ◮ minimum and maximum width of C: Wmin = inf_{||q||2=1} W(C, q) and Wmax = sup_{||q||2=1} W(C, q)
  ◮ condition number of C: cond(C) = Wmax^2 / Wmin^2
◮ a measure of its anisotropy or eccentricity: small cond(C) means C has approximately the same width in all directions (nearly spherical); large cond(C) means C is far wider in some directions than in others
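For the particular case of an ellipsoid C = {x | x^T Px ≤ 1} with P ≻ 0, the width in a unit direction q is W(C, q) = 2(q^T P^{−1}q)^{1/2}, so cond(C) works out to λmax(P)/λmin(P). A small numerical check (the matrix P here is an arbitrary example):

```python
import numpy as np

# Ellipsoid C = {x | x^T P x <= 1}: Wmax = 2/sqrt(lambda_min(P)),
# Wmin = 2/sqrt(lambda_max(P)), so cond(C) = lambda_max(P)/lambda_min(P).
P = np.diag([1.0, 25.0])
Pinv = np.linalg.inv(P)

def width(q):
    q = q / np.linalg.norm(q)          # direction with ||q||_2 = 1
    return 2.0 * np.sqrt(q @ Pinv @ q)

eigs = np.linalg.eigvalsh(P)           # ascending eigenvalues
cond_C = eigs[-1] / eigs[0]            # = 25: far wider along x1 than x2
w_x1 = width(np.array([1.0, 0.0]))     # widest direction: 2
w_x2 = width(np.array([0.0, 1.0]))     # narrowest direction: 0.4
```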

SLIDE 7

Condition number of sublevel sets

mI ⪯ ∇2f(x) ⪯ MI for all x ∈ S
◮ upper bound on the condition number of ∇2f(x): cond(∇2f(x)) ≤ M/m
◮ upper bound on the condition number of the sublevel set Cα = {x | f(x) ≤ α}, p∗ < α ≤ f(x(0)): cond(Cα) ≤ M/m
◮ geometric interpretation: lim_{α→p∗} cond(Cα) = cond(∇2f(x∗))
◮ the condition number of the sublevel sets of f (which is bounded by M/m) has a strong effect on the efficiency of some common methods for unconstrained minimization

SLIDE 8

Descent methods

the algorithms described in this chapter produce a minimizing sequence x(k), k = 0, 1, . . ., where
    x(k+1) = x(k) + t(k)∆x(k),  with f(x(k+1)) < f(x(k)) and t(k) > 0
◮ other notations: x+ = x + t∆x, x := x + t∆x
◮ ∆x is the step (or search direction); t is the step size (or step length)
◮ for convex f, descent requires ∇f(x(k))^T ∆x(k) < 0 (i.e., ∆x(k) must be a descent direction)

General descent method. given a starting point x ∈ dom f. repeat

  • 1. Determine a descent direction ∆x.
  • 2. Line search. Choose a step size t > 0.
  • 3. Update. x := x + t∆x

until stopping criterion is satisfied.
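The loop above can be sketched in code; `direction` and `line_search` are assumed caller-supplied callables, and a gradient-norm stopping criterion is used as one common choice:

```python
import numpy as np

# General descent method: direction(x, g) returns a descent direction,
# line_search(x, dx, g) returns a step size t > 0.
def descent(f, grad, x0, direction, line_search, tol=1e-8, max_iter=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:   # stopping criterion
            break
        dx = direction(x, g)           # 1. determine a descent direction
        t = line_search(x, dx, g)      # 2. choose a step size t > 0
        x = x + t * dx                 # 3. update
    return x

# example: gradient direction with a fixed step, on f(x) = ||x||^2 / 2
x_min = descent(lambda x: 0.5 * x @ x, lambda x: x, [3.0, -4.0],
                direction=lambda x, g: -g,
                line_search=lambda x, dx, g: 0.5)
```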

SLIDE 9

Line search types

exact line search: t = argmin_{t>0} f(x + t∆x)
◮ minimizes f along the ray {x + t∆x | t ≥ 0}
◮ used when the cost of the one-variable minimization is low compared to the cost of computing the search direction itself
◮ in some special cases the minimizer can be found analytically, and in others it can be computed efficiently

SLIDE 10

Line search types

backtracking line search (with parameters α ∈ (0, 1/2), β ∈ (0, 1))
◮ reduce f "enough" along the ray {x + t∆x | t ≥ 0}
◮ starting at t = 1, repeat t := βt until
    f(x + t∆x) < f(x) + αt∇f(x)^T ∆x
◮ convexity of f: f(x + t∆x) ≥ f(x) + t∇f(x)^T ∆x
◮ the constant α can be interpreted as the fraction of the decrease in f predicted by linear extrapolation that we will accept
◮ graphical interpretation: backtrack until t ≤ t0

Figure 9.1 Backtracking line search. The curve shows f, restricted to the line over which we search. The lower dashed line shows the linear extrapolation of f, and the upper dashed line has a slope a factor of α smaller. The backtracking condition is that f lie below the upper dashed line, i.e., 0 ≤ t ≤ t0.
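A direct transcription of the backtracking rule (the parameter values α = 0.25, β = 0.5 are arbitrary choices within the allowed ranges):

```python
import numpy as np

# Backtracking line search: start at t = 1, shrink by beta until the
# sufficient-decrease condition holds (g @ dx < 0 guarantees termination).
def backtracking(f, x, dx, g, alpha=0.25, beta=0.5):
    t = 1.0
    while f(x + t * dx) > f(x) + alpha * t * (g @ dx):
        t *= beta
    return t

# usage on f(x) = ||x||^2 / 2 with the descent direction dx = -grad f(x)
f = lambda x: 0.5 * x @ x
x = np.array([4.0, 0.0])
g = x.copy()                 # grad f(x) = x
t = backtracking(f, x, -g, g)
```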

SLIDE 11

Gradient descent method

general descent method with ∆x = −∇f(x)

Gradient descent method. given a starting point x ∈ dom f. repeat

  • 1. ∆x := −∇f(x).
  • 2. Line search. Choose step size t via exact or backtracking line search.
  • 3. Update. x := x + t∆x.

until stopping criterion is satisfied.

◮ stopping criterion usually of the form ||∇f(x)||2 ≤ ǫ
◮ convergence result: for strongly convex f,
    f(x(k)) − p∗ ≤ c^k (f(x(0)) − p∗)
  ◮ exact line search: c = 1 − m/M < 1
  ◮ backtracking line search: c = 1 − min{2mα, 2βαm/M} < 1
  ◮ linear convergence: the error lies below a line on a log-linear plot of error versus iteration number
◮ very simple, but often very slow; rarely used in practice
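The method can be exercised on a strongly convex quadratic to observe the (approximately) linear convergence; the matrix Q and starting point below are illustrative:

```python
import numpy as np

# Gradient descent with backtracking on f(x) = 0.5 x^T Q x, whose extreme
# Hessian eigenvalues are m = 1 and M = 10; p* = 0 is attained at x* = 0.
Q = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ Q @ x
grad = lambda x: Q @ x

x = np.array([10.0, 1.0])
errs = [f(x)]                  # f(x(k)) - p* with p* = 0
for _ in range(100):
    g = grad(x)
    t, alpha, beta = 1.0, 0.25, 0.5
    while f(x - t * g) > f(x) - alpha * t * (g @ g):  # backtracking
        t *= beta
    x = x - t * g
    errs.append(f(x))
```

The error sequence decreases monotonically and, on a log scale, roughly along a straight line, consistent with linear convergence.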

SLIDE 12

Examples

a quadratic problem in R2: f(x) = (1/2)(x1^2 + γx2^2), with γ > 0

with exact line search, starting at x(0) = (γ, 1), the iterates have closed-form expressions:
    x1(k) = γ((γ − 1)/(γ + 1))^k,  x2(k) = (−(γ − 1)/(γ + 1))^k,  f(x(k)) = ((γ − 1)/(γ + 1))^{2k} f(x(0))

◮ the exact solution is found in one iteration if γ = 1; convergence is rapid if γ is not far from 1, and very slow if γ ≫ 1 or γ ≪ 1

Figure 9.2 Some contour lines of the function f(x) = (1/2)(x1^2 + 10x2^2). The condition number of the sublevel sets, which are ellipsoids, is exactly 10. The figure shows the iterates of the gradient method with exact line search, started at x(0) = (10, 1).
SLIDE 13

Examples

a nonquadratic problem in R2: f(x1, x2) = e^{x1+3x2−0.1} + e^{x1−3x2−0.1} + e^{−x1−0.1}

◮ backtracking line search: approximately linear convergence (the sublevel sets of f are not too badly conditioned, i.e., M/m is not too large)
◮ exact line search: approximately linear convergence, about twice as fast as with backtracking line search

Figure 9.3 Iterates of the gradient method with backtracking line search, for the problem in R2 with objective f given in (9.20). The dashed curves are level curves of f, the small circles are the iterates, and the solid lines connecting successive iterates show the scaled steps t(k)∆x(k).
Figure 9.5 Iterates of the gradient method with exact line search for the same problem.
Figure 9.4 Error f(x(k)) − p⋆ versus iteration k for both line searches. The plot shows nearly linear convergence, with the error reduced approximately by the factor 0.4 per iteration with backtracking line search and by the factor 0.2 per iteration with exact line search.

SLIDE 14

Examples

a problem in R100: f(x) = c^T x − Σ_{i=1}^{500} log(bi − ai^T x)

◮ backtracking line search: approximately linear convergence
◮ exact line search: approximately linear convergence, only a bit faster than with backtracking line search

Figure 9.6 Error f(x(k)) − p⋆ versus iteration k for the gradient method with backtracking and exact line search, for a problem in R100.

SLIDE 15

Conclusions on the gradient descent method

characteristics:
◮ exhibits approximately linear convergence, i.e., the error converges to zero approximately as a geometric series
◮ the choice of backtracking parameters α, β has a noticeable but not dramatic effect on the convergence
◮ exact line search sometimes improves the convergence, but not by much (and is probably not worth the trouble of implementing)
◮ the convergence rate depends greatly on the condition number of the Hessian, or of the sublevel sets

main advantage and disadvantage:
◮ main advantage: simplicity
◮ main disadvantage: the convergence rate depends critically on the condition number of the Hessian or sublevel sets

SLIDE 16

Steepest descent method

normalized steepest descent direction (at x, for the norm || · ||):
    ∆xnsd = argmin{∇f(x)^T v | ||v|| = 1}
◮ the first-order Taylor approximation of f(x + v) around x is f(x + v) ≈ f(x) + ∇f(x)^T v
◮ the directional derivative of f at x in the direction v is ∇f(x)^T v
◮ ∆xnsd is the unit-norm direction with the most negative directional derivative

(unnormalized) steepest descent direction: ∆xsd = ||∇f(x)||∗ ∆xnsd, which satisfies ∇f(x)^T ∆xsd = −||∇f(x)||∗^2

SLIDE 17

Steepest descent method

general descent method with ∆x = ∆xsd

Steepest descent method. given a starting point x ∈ dom f. repeat

  • 1. Compute steepest descent direction ∆xsd.
  • 2. Line search. Choose step size t via exact or backtracking line search.
  • 3. Update. x := x + t∆xsd.

until stopping criterion is satisfied.

◮ when exact line search is used, scale factors in the descent direction have no effect, so either ∆xnsd or ∆xsd can be used
◮ convergence result: for strongly convex f,
    f(x(k)) − p∗ ≤ c^k (f(x(0)) − p∗)
  ◮ backtracking line search: c = 1 − 2mαγ̃^2 min{1, βγ^2/M} < 1
  ◮ any norm can be bounded in terms of the Euclidean norm, i.e., there exist constants γ, γ̃ ∈ (0, 1] such that ||x|| ≥ γ||x||2 and ||x||∗ ≥ γ̃||x||2
◮ linear convergence, the same as the gradient descent method

SLIDE 18

Steepest descent for different norms

◮ Euclidean norm: ∆xsd = −∇f(x)
  ◮ coincides with the gradient descent method
◮ quadratic norm ||x||P = (x^T Px)^{1/2} (P ∈ Sn++): ∆xsd = −P^{−1}∇f(x)
  ◮ can be thought of as the gradient descent method applied after the change of coordinates x̄ = P^{1/2}x
◮ ℓ1-norm: ∆xsd = −(∂f(x)/∂xi)ei, where i is an index with |∂f(x)/∂xi| = ||∇f(x)||∞
  ◮ a coordinate-descent algorithm (update the component with maximum absolute partial derivative)

Figure 9.9 Normalized steepest descent direction for a quadratic norm. The ellipsoid shown is the unit ball of the norm, translated to the point x. The normalized steepest descent direction ∆xnsd at x extends as far as possible in the direction −∇f(x) while staying in the ellipsoid.
Figure 9.10 Normalized steepest descent direction for the ℓ1-norm. The diamond is the unit ball of the ℓ1-norm, translated to the point x. The normalized steepest descent direction can always be chosen in the direction of a standard basis vector; in this example ∆xnsd = e1.
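The three directions can be computed directly; the gradient vector and matrix P below are arbitrary illustrative values:

```python
import numpy as np

# Unnormalized steepest descent directions at a point x, for three norms,
# given the gradient of some convex f at x.
grad = np.array([3.0, -1.0])

dx_euclid = -grad                         # Euclidean norm: -grad f(x)

P = np.array([[2.0, 0.0], [0.0, 8.0]])    # quadratic P-norm (P pos. def.)
dx_P = -np.linalg.solve(P, grad)          # -P^{-1} grad f(x)

i = np.argmax(np.abs(grad))               # l1-norm: coordinate with largest
dx_l1 = np.zeros_like(grad)               # absolute partial derivative
dx_l1[i] = -grad[i]                       # -(df/dxi) e_i
```

All three satisfy ∇f(x)^T ∆xsd < 0, and for the ℓ1-norm the identity ∇f(x)^T ∆xsd = −||∇f(x)||∞^2 (the dual of ℓ1 being ℓ∞) can be read off directly.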

SLIDE 19

Choice of norm for steepest descent

the choice of norm has a strong effect on the speed of convergence of the steepest descent method (consider the quadratic P-norm)
◮ steepest descent with the quadratic P-norm is the same as the gradient method after the change of coordinates x̄ = P^{1/2}x
◮ to increase the speed of convergence, choose P so that the sublevel sets of f, transformed by P^{−1/2}, are well conditioned
  ◮ the ellipsoid {x | x^T Px ≤ 1} approximates the shape of the sublevel sets
◮ works well in cases where we can identify a matrix P for which the transformed problem has moderate condition number
◮ steepest descent with backtracking line search for two quadratic norms (the ellipses show {x | ||x − x(k)||P = 1})

Figure 9.11 Steepest descent method with the quadratic norm || · ||P1. The ellipses are the boundaries of the norm balls {x | ||x − x(k)||P1 ≤ 1} at x(0) and x(1).
Figure 9.12 Steepest descent method with the quadratic norm || · ||P2.
Figure 9.13 Error f(x(k)) − p⋆ versus iteration k for the two norms. Convergence is rapid for || · ||P1 and very slow for || · ||P2.

SLIDE 20

Newton step

Newton step for f at x: ∆xnt = −∇2f(x)^{−1}∇f(x)
◮ convexity of f (∇2f(x) ≻ 0) implies ∇f(x)^T ∆xnt < 0 unless ∇f(x) = 0
  ◮ the Newton step is a descent direction unless x is optimal
◮ affine invariance: the Newton step of f̄(y) = f(Ty) (T nonsingular) at y and the Newton step of f at x = Ty satisfy ∆xnt = T∆ynt

SLIDE 21

Interpretations of the Newton step

Minimizer of the second-order approximation: x + ∆xnt minimizes the second-order Taylor approximation of f at x (a convex quadratic function of v)
    f̂(x + v) = f(x) + ∇f(x)^T v + (1/2)v^T ∇2f(x)v
◮ if f is quadratic, then x + ∆xnt is the exact minimizer of f
◮ if f is nearly quadratic, then x + ∆xnt should be a very good estimate of the minimizer of f
◮ when x is near x∗ (so the quadratic model of f is very accurate), x + ∆xnt should be a very good approximation of x∗

Figure 9.16 The function f (shown solid) and its second-order approximation f̂ at x (dashed). The Newton step ∆xnt is what must be added to x to give the minimizer of f̂.

SLIDE 22

Interpretations of the Newton step

Solution of the linearized optimality condition: x + ∆xnt solves the linearized optimality condition
    ∇f(x + v) ≈ ∇f(x) + ∇2f(x)v = 0
◮ when x is near x∗ (so the optimality condition almost holds), x + ∆xnt should be a very good approximation of x∗

Figure 9.18 The solid curve is the derivative f′ of the function f shown in figure 9.16; f̂′ is the linear approximation of f′ at x. The Newton step ∆xnt is the difference between the root of f̂′ and the point x.

SLIDE 23

Interpretations of the Newton step

Steepest descent direction in the Hessian norm: ∆xnt is the steepest descent direction at x for the quadratic norm defined by the Hessian ∇2f(x), i.e., ||u||∇2f(x) = (u^T ∇2f(x)u)^{1/2}
◮ when x is near x∗ (so that ∇2f(x), after the associated change of coordinates x̄ = (∇2f(x))^{1/2}x, has small condition number), steepest descent with || · ||∇2f(x) converges very rapidly

Figure 9.17 The dashed lines are level curves of a convex function. The ellipsoid shown (solid line) is {x + v | v^T ∇2f(x)v ≤ 1}. The arrow shows −∇f(x), the gradient descent direction. The Newton step ∆xnt is the steepest descent direction in the norm || · ||∇2f(x). The figure also shows ∆xnsd, the normalized steepest descent direction for the same norm.

SLIDE 24

Newton decrement

Newton decrement at x (a measure of the proximity of x to x∗):
    λ(x) = (∇f(x)^T ∇2f(x)^{−1}∇f(x))^{1/2}

properties
◮ (1/2)λ(x)^2 is an estimate of f(x) − p∗ based on the quadratic approximation f̂:
    f(x) − inf_v f̂(x + v) = f(x) − f̂(x + ∆xnt) = (1/2)λ(x)^2
◮ λ(x) equals the norm of the Newton step at x in the quadratic Hessian norm ||u||∇2f(x) = (u^T ∇2f(x)u)^{1/2}:
    λ(x) = ||∆xnt||∇2f(x) = (∆xnt^T ∇2f(x)∆xnt)^{1/2}
◮ −λ(x)^2 is the directional derivative of f at x in the Newton direction:
    −λ(x)^2 = ∇f(x)^T ∆xnt = (d/dt)f(x + t∆xnt)|_{t=0}
◮ affine invariance: the Newton decrement of f̄(y) = f(Ty) (T nonsingular) at y is the same as the Newton decrement of f at x = Ty
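For a quadratic f the estimate (1/2)λ(x)^2 of f(x) − p∗ is exact, since f coincides with its own second-order approximation; a quick numerical check (Q, b, and the point x are arbitrary):

```python
import numpy as np

# Quadratic f(x) = 0.5 x^T Q x + b^T x with positive definite Hessian Q:
# the gap f(x) - p* equals exactly (1/2) lambda(x)^2.
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, -2.0])
f = lambda x: 0.5 * x @ Q @ x + b @ x

x = np.array([2.0, 5.0])
g = Q @ x + b                        # gradient at x
dxnt = -np.linalg.solve(Q, g)        # Newton step
lam2 = -(g @ dxnt)                   # lambda(x)^2 = g^T Q^{-1} g

x_star = np.linalg.solve(Q, -b)      # exact minimizer
gap = f(x) - f(x_star)               # f(x) - p*
```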

SLIDE 25

Newton’s method

general descent method with ∆x = ∆xnt

Newton's method. given a starting point x ∈ dom f, tolerance ǫ > 0. repeat

  • 1. Compute the Newton step and decrement: ∆xnt := −∇2f(x)^{−1}∇f(x); λ^2 := ∇f(x)^T ∇2f(x)^{−1}∇f(x).
  • 2. Stopping criterion. quit if λ^2/2 ≤ ǫ.
  • 3. Line search. Choose step size t by backtracking line search.
  • 4. Update. x := x + t∆xnt.

Newton's method is affine invariant, due to the affine invariance of the Newton step and decrement
◮ independent of linear changes of coordinates
◮ the Newton iterates for f̄(y) = f(Ty) with starting point y(0) = T^{−1}x(0) are y(k) = T^{−1}x(k)
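A compact implementation, run on the nonquadratic R2 example f(x) = e^{x1+3x2−0.1} + e^{x1−3x2−0.1} + e^{−x1−0.1} of (9.20); the gradient and Hessian are worked out by hand below, and the parameter values are illustrative:

```python
import numpy as np

# Objective (9.20) and its derivatives, with e1, e2, e3 the three exponentials.
def f(x):
    return (np.exp(x[0] + 3*x[1] - 0.1) + np.exp(x[0] - 3*x[1] - 0.1)
            + np.exp(-x[0] - 0.1))

def grad(x):
    e1 = np.exp(x[0] + 3*x[1] - 0.1)
    e2 = np.exp(x[0] - 3*x[1] - 0.1)
    e3 = np.exp(-x[0] - 0.1)
    return np.array([e1 + e2 - e3, 3*e1 - 3*e2])

def hess(x):
    e1 = np.exp(x[0] + 3*x[1] - 0.1)
    e2 = np.exp(x[0] - 3*x[1] - 0.1)
    e3 = np.exp(-x[0] - 0.1)
    return np.array([[e1 + e2 + e3, 3*e1 - 3*e2],
                     [3*e1 - 3*e2, 9*e1 + 9*e2]])

def newton(x, eps=1e-10, alpha=0.1, beta=0.7, max_iter=50):
    for k in range(max_iter):
        g, H = grad(x), hess(x)
        dx = -np.linalg.solve(H, g)     # 1. Newton step
        lam2 = -(g @ dx)                #    decrement squared: g^T H^{-1} g
        if lam2 / 2 <= eps:             # 2. stopping criterion
            return x, k
        t = 1.0                         # 3. backtracking line search
        while f(x + t * dx) > f(x) + alpha * t * (g @ dx):
            t *= beta
        x = x + t * dx                  # 4. update
    return x, max_iter

x_star, iters = newton(np.array([-1.0, 1.0]))
```

Setting the gradient to zero by hand gives x2∗ = 0 and x1∗ = −(1/2) log 2, which the iterates reach in a handful of steps.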

SLIDE 26

Classical convergence analysis

assumptions
◮ f is strongly convex on S with constant m, and mI ⪯ ∇2f(x) ⪯ MI for all x ∈ S
◮ ∇2f is Lipschitz continuous on S with constant L > 0, i.e., ||∇2f(x) − ∇2f(y)||2 ≤ L||x − y||2 for all x, y ∈ S (L measures how well f can be approximated by a quadratic function)

outline: there exist constants η ∈ (0, m^2/L), γ > 0 such that
◮ if ||∇f(x(k))||2 ≥ η, then f(x(k+1)) − f(x(k)) ≤ −γ
◮ if ||∇f(x(k))||2 < η, then (L/(2m^2))||∇f(x(k+1))||2 ≤ ((L/(2m^2))||∇f(x(k))||2)^2
  ◮ implying that ||∇f(x(l))||2 < η for all l ≥ k

SLIDE 27

Classical convergence analysis

damped Newton phase (||∇f(x)||2 ≥ η)
◮ most iterations require backtracking steps
◮ the function value decreases by at least γ, i.e., f(x(k+1)) − f(x(k)) ≤ −γ
◮ if p∗ > −∞, this phase ends after at most (f(x(0)) − p∗)/γ iterations

quadratically convergent phase (||∇f(x)||2 < η)
◮ all iterations use step size t = 1 (no backtracking steps)
◮ ||∇f(x)||2 converges to zero quadratically: if ||∇f(x(k))||2 < η, then for l ≥ k
    (L/(2m^2))||∇f(x(l))||2 ≤ ((L/(2m^2))||∇f(x(k))||2)^{2^{l−k}} ≤ (1/2)^{2^{l−k}}
  =⇒ f(x(l)) − p∗ ≤ (1/(2m))||∇f(x(l))||2^2 ≤ (2m^3/L^2)(1/2)^{2^{l−k+1}}
◮ this phase ends (f(x(l)) − p∗ ≤ ǫ) after at most log2 log2(ǫ0/ǫ) iterations

SLIDE 28

Classical convergence analysis

conclusion: the number of iterations until f(x) − p∗ ≤ ǫ is bounded above by
    (f(x(0)) − p∗)/γ + log2 log2(ǫ0/ǫ)
◮ γ = αβη^2 m/M^2, η = min{1, 3(1 − 2α)}m^2/L, ǫ0 = 2m^3/L^2
◮ the second term is small (on the order of 6) and almost constant for practical purposes
◮ in practice the constants m, L (hence γ, ǫ0) are usually unknown
◮ the analysis provides qualitative insight into the convergence properties (i.e., it explains the two phases of the algorithm)

SLIDE 29

Examples

example in R2 (page 12)

Figure 9.19 Newton's method for the problem in R2, with objective f given in (9.20) and backtracking line search parameters α = 0.1, β = 0.7. Also shown are the ellipsoids {x | ||x − x(k)||∇2f(x(k)) ≤ 1} at the first two iterates.
Figure 9.20 Error versus iteration k of Newton's method for the problem in R2. Convergence to a very high accuracy is achieved in five iterations.

◮ backtracking parameters α = 0.1, β = 0.7
◮ converges in only 5 steps
◮ apparent quadratic convergence

SLIDE 30

Examples

example in R100 (page 13)

Figure 9.21 Error versus iteration for Newton's method for the problem in R100, with backtracking line search parameters α = 0.01, β = 0.5. Here too convergence is extremely rapid: very high accuracy is attained in only seven or eight iterations; Newton's method with exact line search is only one iteration faster than with backtracking line search.
Figure 9.22 Step size t versus iteration for Newton's method with backtracking and exact line search. The backtracking line search takes one backtracking step in the first two iterations; after that it always selects t = 1.

◮ backtracking parameters α = 0.01, β = 0.5
◮ backtracking line search is almost as fast as exact line search (and much simpler)
◮ clearly shows the two convergence phases (a damped phase of 2 iterations)

SLIDE 31

Examples

example in R10000 (with sparse ai):
    f(x) = −Σ_{i=1}^{10000} log(1 − xi^2) − Σ_{i=1}^{100000} log(bi − ai^T x)

Figure 9.23 Error versus iteration of Newton's method for a problem in R10000, with backtracking line search parameters α = 0.01, β = 0.5. Even for this large-scale problem, Newton's method requires only 18 iterations to achieve very high accuracy.

◮ backtracking parameters α = 0.01, β = 0.5
◮ a linearly convergent phase of about 13 iterations, followed by a quadratically convergent phase of 4 or 5 iterations
◮ convergence behavior similar to the small examples

SLIDE 32

Conclusions of Newton’s method

strong advantages over the gradient and steepest descent methods:
◮ convergence of Newton's method is rapid in general, and quadratic near x∗
◮ Newton's method is affine invariant: insensitive to the choice of coordinates, or to the condition number of the sublevel sets of f
◮ Newton's method scales well with problem size
◮ the good performance of Newton's method does not depend on the choice of algorithm parameters

main disadvantages:
◮ the cost of forming and storing the Hessian
◮ the cost of computing the Newton step, which requires solving a set of linear equations
◮ in many cases it is possible to exploit problem structure to substantially reduce the cost of computing the Newton step

SLIDE 33

Self-concordance

shortcomings of the classical convergence analysis
◮ depends on the unknown constants m, M, L, so it is only conceptually useful
◮ the bound is not affinely invariant (m, M, L change if the coordinates change), although Newton's method is

convergence analysis via self-concordance (Nesterov and Nemirovski)
◮ the analysis of Newton's method for self-concordant functions does not depend on any unknown constants
◮ gives an affine-invariant bound
◮ self-concordant functions include many logarithmic barrier functions that play an important role in interior-point methods for solving convex optimization problems

SLIDE 34

Self-concordant functions

definition
◮ a convex f : R → R is self-concordant if |f′′′(x)| ≤ 2f′′(x)^{3/2} for all x ∈ dom f
◮ f : Rn → R is self-concordant if it is self-concordant along every line in its domain, i.e., g(t) = f(x + tv) is self-concordant for all x ∈ dom f, v ∈ Rn

examples on R
◮ linear functions (zero second and third derivatives)
◮ quadratic functions (zero third derivative and nonnegative second derivative)
◮ negative logarithm f(x) = − log x
◮ negative entropy plus negative logarithm: f(x) = x log x − log x
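The last two examples can be verified numerically: for f(x) = −log x the defining inequality holds with equality, since f′′(x) = 1/x^2 and f′′′(x) = −2/x^3 give |f′′′(x)| = 2/x^3 = 2f′′(x)^{3/2}; for f(x) = x log x − log x it holds with some slack for x > 0:

```python
import numpy as np

xs = np.linspace(0.1, 10.0, 100)       # sample points in dom f = R++

# f(x) = -log x: f'' = 1/x^2, f''' = -2/x^3, equality in |f'''| <= 2 f''^{3/2}
f2 = 1.0 / xs**2
f3 = -2.0 / xs**3
ok = np.allclose(np.abs(f3), 2.0 * f2**1.5)

# f(x) = x log x - log x: f'' = 1/x + 1/x^2, f''' = -1/x^2 - 2/x^3
g2 = 1.0 / xs + 1.0 / xs**2
g3 = -1.0 / xs**2 - 2.0 / xs**3
ok2 = np.all(np.abs(g3) <= 2.0 * g2**1.5 + 1e-12)
```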

SLIDE 35

Self-concordant functions

remarks
◮ the constant 2 is chosen for convenience, to simplify later formulas; any other positive constant could be used instead
  ◮ if f : R → R satisfies |f′′′(x)| ≤ kf′′(x)^{3/2}, then f̃(x) = (k^2/4)f(x) satisfies |f̃′′′(x)| ≤ 2f̃′′(x)^{3/2}
  ◮ what matters is that the third derivative of the function is bounded by some multiple of the 3/2-power of its second derivative
◮ self-concordance is affine invariant
  ◮ if f : R → R is s.c., then f̃(y) = f(ay + b) is s.c.
  ◮ the self-concordance condition limits the third derivative of a function in a way that is independent of affine coordinate changes
SLIDE 36

Self-concordant calculus

properties
◮ preserved under scaling by a ≥ 1, and under sums
  ◮ if f is s.c. and a ≥ 1, then af is s.c.
  ◮ if f1 and f2 are s.c., then f1 + f2 is s.c.
◮ preserved under composition with an affine function
  ◮ if f : Rn → R is s.c. and A ∈ Rn×m, b ∈ Rn, then f(Ax + b) is s.c.
◮ preserved under composition with the logarithm
  ◮ if g : R → R is convex with dom g = R++ and |g′′′(x)| ≤ 3g′′(x)/x for all x, then f(x) = − log(−g(x)) − log x is s.c. on {x | x > 0, g(x) < 0}
  ◮ if |g′′′(x)| ≤ 3g′′(x)/x holds for g, then it also holds for g(x) + ax^2 + bx + c with a ≥ 0

examples: the following are s.c.
◮ f(x) = −Σ_{i=1}^m log(bi − ai^T x) on {x | ai^T x < bi, i = 1, . . . , m}
◮ f(X) = − log det X on Sn++
◮ f(x, y) = − log(y^2 − x^T x) on {(x, y) | ||x||2 < y}
SLIDE 37

Convergence analysis for self-concordant functions

assumptions: f : Rn → R is s.c.

summary: there exist constants η ∈ (0, 1/4], γ > 0 (given by η = (1 − 2α)/4 and γ = αβη^2/(1 + η)) such that
◮ if λ(x(k)) > η, then f(x(k+1)) − f(x(k)) ≤ −γ
◮ if λ(x(k)) ≤ η, then 2λ(x(k+1)) ≤ (2λ(x(k)))^2, which implies
    f(x(l)) − p∗ ≤ λ(x(l))^2 ≤ (1/2)^{2^{l−k+1}}, l ≥ k

complexity bound: the number of Newton iterations is bounded by
    (f(x(0)) − p∗)/γ + log2 log2(1/ǫ) = ((20 − 8α)/(αβ(1 − 2α)^2))(f(x(0)) − p∗) + log2 log2(1/ǫ)
which depends only on the line search parameters α, β and the final accuracy ǫ
◮ the second term is small and can be safely replaced with 6
◮ example: for α = 0.1, β = 0.8, ǫ = 10^{−10}, the bound is 375(f(x(0)) − p∗) + 6

SLIDE 38

Numerical example

150 randomly generated instances of ai and bi for  minimize f(x) = −Σ_{i=1}^m log(bi − ai^T x)

Figure 9.25 Number of Newton iterations required to minimize self-concordant functions versus f(x(0)) − p⋆, where the problem data ai and bi are randomly generated. The circles show problems with m = 100, n = 50; the squares show problems with m = 1000, n = 500; and the diamonds show problems with m = 1000, n = 50. Fifty instances of each are shown.

◮ the number of iterations is much smaller than 375(f(x(0)) − p∗) + 6
◮ a bound of the form c(f(x(0)) − p∗) + 6 with a much smaller c is (empirically) valid

SLIDE 39

Implementation

main effort in each iteration: evaluate the derivatives and solve the Newton system H∆x = −g, where H = ∇2f(x) and g = ∇f(x), via the Cholesky factorization
    H = LL^T,  ∆xnt = −L^{−T}L^{−1}g,  λ(x) = ||L^{−1}g||2
where L is a lower triangular matrix
◮ cost: (1/3)n^3 flops for an unstructured system
◮ cost ≪ (1/3)n^3 if H is sparse or banded

SLIDE 40

Example of dense Newton system with structure

f(x) = Σ_{i=1}^n ψi(xi) + ψ0(Ax + b),  H = D + A^T H0 A

◮ assume A ∈ Rp×n is dense, with p ≪ n
◮ D is diagonal with diagonal elements ψi′′(xi); H0 = ∇2ψ0(Ax + b)

method 1: form H, solve via dense Cholesky factorization (cost (1/3)n^3)

method 2: factor H0 = L0L0^T; write the Newton system as
    D∆x + A^T L0 w = −g,  L0^T A∆x − w = 0
eliminate ∆x from the first equation; compute w and ∆x from
    (I + L0^T AD^{−1}A^T L0)w = −L0^T AD^{−1}g,  D∆x = −g − A^T L0 w
cost: 2p^2 n flops (dominated by the computation of L0^T AD^{−1}A^T L0)
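The two methods can be checked against each other numerically; the dimensions and data below are made up, with p ≪ n as assumed:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 50
A = rng.standard_normal((p, n))        # dense, p << n
d = rng.uniform(1.0, 2.0, n)           # diagonal of D
D = np.diag(d)
M0 = rng.standard_normal((p, p))
H0 = M0 @ M0.T + np.eye(p)             # positive definite p x p block
g = rng.standard_normal(n)

# method 1: form H = D + A^T H0 A explicitly and solve
H = D + A.T @ H0 @ A
dx1 = np.linalg.solve(H, -g)

# method 2: factor H0 = L0 L0^T, solve the small p x p system for w,
# then recover dx from the diagonal system D dx = -g - A^T L0 w
L0 = np.linalg.cholesky(H0)
B = L0.T @ A                           # p x n, so B^T = A^T L0
S = np.eye(p) + B @ (B / d).T          # I + L0^T A D^{-1} A^T L0
w = np.linalg.solve(S, -(B @ (g / d))) # right-hand side -L0^T A D^{-1} g
dx2 = (-g - B.T @ w) / d
```

Both solves produce the same Newton step; method 2 only ever factors the small p × p system, which is where the 2p^2 n cost estimate comes from.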
