
Convex Optimization, 9. Unconstrained minimization (Prof. Ying Cui)



1. Convex Optimization, 9. Unconstrained minimization
Prof. Ying Cui, Department of Electrical Engineering, Shanghai Jiao Tong University
2017 Autumn Semester

2. Outline
◮ Unconstrained minimization problems
◮ Descent methods
◮ Gradient descent method
◮ Steepest descent method
◮ Newton’s method
◮ Self-concordance
◮ Implementation

3. Unconstrained minimization
minimize f(x)
assumptions:
◮ f : Rⁿ → R is convex and twice continuously differentiable (implying that dom f is open)
◮ there exists an optimal point x* (the optimal value p* = inf_x f(x) is attained and finite)
a necessary and sufficient condition for optimality: ∇f(x*) = 0
◮ solving the unconstrained minimization problem is the same as finding a solution of the optimality equation
◮ in a few special cases, it can be solved analytically (see the sketch below)
◮ usually, it must be solved by an iterative algorithm
  ◮ produces a sequence of points x^(k) ∈ dom f, k = 0, 1, ..., with f(x^(k)) → p* as k → ∞
  ◮ terminated when f(x^(k)) − p* ≤ ε for some tolerance ε > 0
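As a concrete illustration of the analytically solvable special case, the following minimal sketch (with a hypothetical P and q, not taken from the slides) solves the optimality equation ∇f(x*) = 0 for a convex quadratic f(x) = (1/2)xᵀPx + qᵀx with P positive definite:

```python
# Minimal sketch: for a convex quadratic, grad f(x*) = P x* + q = 0
# is a linear system, so the minimizer can be computed directly.
import numpy as np

P = np.array([[2.0, 0.5],
              [0.5, 1.0]])   # hypothetical positive definite P
q = np.array([1.0, -1.0])    # hypothetical q

x_star = np.linalg.solve(P, -q)                     # solves grad f(x*) = 0
p_star = 0.5 * x_star @ P @ x_star + q @ x_star     # optimal value
print(x_star, p_star)
```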

4. Initial point and sublevel set
algorithms in this chapter require a starting point x^(0) such that
◮ x^(0) ∈ dom f
◮ the sublevel set S = {x | f(x) ≤ f(x^(0))} is closed (hard to verify in general)
the 2nd condition is satisfied for all x^(0) ∈ dom f if f is closed, i.e., all its sublevel sets are closed, which is equivalent to epi f being closed
◮ true if f is continuous and dom f = Rⁿ
◮ true if f(x) → ∞ as x → bd dom f
examples of differentiable functions with closed sublevel sets (sketched in code below):
f(x) = log(Σ_{i=1}^m exp(aᵢᵀx + bᵢ)),   f(x) = −Σ_{i=1}^m log(bᵢ − aᵢᵀx)
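A minimal sketch of the two example functions, with hypothetical data aᵢ, bᵢ (not from the slides). The first has dom f = Rⁿ and is continuous there; the second tends to ∞ as x approaches the boundary of dom f, so both have closed sublevel sets:

```python
import numpy as np

A = np.array([[1.0, 2.0], [-1.0, 0.5], [0.3, -2.0]])   # rows are a_i^T (hypothetical)
b = np.array([0.1, -0.2, 0.5])                         # hypothetical b_i

def f_logsumexp(x):
    # f(x) = log(sum_i exp(a_i^T x + b_i)), dom f = R^n
    return np.log(np.sum(np.exp(A @ x + b)))

def f_logbarrier(x):
    # f(x) = -sum_i log(b_i - a_i^T x), dom f = {x | a_i^T x < b_i for all i}
    s = b - A @ x
    if np.any(s <= 0):
        return np.inf      # outside dom f; f -> infinity towards the boundary
    return -np.sum(np.log(s))
```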

5. Strong convexity and implications
f is strongly convex on S if there exists an m > 0 such that ∇²f(x) ⪰ mI for all x ∈ S
implications
◮ for x, y ∈ S: f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) + (m/2)||y − x||₂²
  ◮ m = 0: recovers the basic inequality characterizing convexity
  ◮ m > 0: a better lower bound than follows from convexity alone
◮ S is bounded
◮ p* > −∞ and, for x ∈ S, f(x) − p* ≤ (1/(2m))||∇f(x)||₂²
  ◮ if the gradient is small at a point, then the point is nearly optimal
  ◮ a condition for suboptimality generalizing the optimality condition: ||∇f(x)||₂ ≤ (2mε)^{1/2} ⟹ f(x) − p* ≤ ε
  ◮ useful as a stopping criterion if m is known (see the sketch below)
◮ upper bound on ∇²f(x): there exists an M > 0 such that ∇²f(x) ⪯ MI for all x ∈ S
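A minimal sketch of the stopping test, assuming the strong convexity constant m is known (the function name is illustrative):

```python
import numpy as np

def nearly_optimal(grad_x, m, eps):
    # ||grad f(x)||_2 <= sqrt(2*m*eps)  implies  f(x) - p* <= eps
    return np.linalg.norm(grad_x) <= np.sqrt(2.0 * m * eps)
```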

6. Condition number of a matrix and of a convex set
◮ condition number of a matrix: the ratio of its largest eigenvalue to its smallest eigenvalue
◮ condition number of a convex set: the square of the ratio of its maximum width to its minimum width
  ◮ width of a convex set C in the direction q, with ||q||₂ = 1: W(C, q) = sup_{z∈C} qᵀz − inf_{z∈C} qᵀz
  ◮ minimum and maximum width of C: W_min = inf_{||q||₂=1} W(C, q) and W_max = sup_{||q||₂=1} W(C, q)
  ◮ condition number of C: cond(C) = W²_max / W²_min
◮ a measure of its anisotropy or eccentricity: cond(C) small means C has approximately the same width in all directions (nearly spherical); cond(C) large means that C is far wider in some directions than in others
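A minimal sketch of the matrix condition number as the ratio of extreme eigenvalues, applied to a hypothetical example matrix (the Hessian of the quadratic used later on slide 12, with γ = 10):

```python
import numpy as np

def condition_number(H):
    # ratio of largest to smallest eigenvalue of a symmetric positive definite H
    eigvals = np.linalg.eigvalsh(H)      # returned in ascending order
    return eigvals[-1] / eigvals[0]

H = np.diag([1.0, 10.0])                 # Hessian of f(x) = (1/2)(x1^2 + 10*x2^2)
print(condition_number(H))               # 10.0
```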

7. Condition number of sublevel sets
mI ⪯ ∇²f(x) ⪯ MI for all x ∈ S
◮ upper bound on the condition number of ∇²f(x): cond(∇²f(x)) ≤ M/m
◮ upper bound on the condition number of the sublevel set C_α = {x | f(x) ≤ α}, p* < α ≤ f(x^(0)): cond(C_α) ≤ M/m
◮ geometric interpretation: lim_{α→p*} cond(C_α) = cond(∇²f(x*))
◮ the condition number of the sublevel sets of f (which is bounded by M/m) has a strong effect on the efficiency of some common methods for unconstrained minimization

8. Descent methods
algorithms described in this chapter produce a minimizing sequence x^(k), k = 1, ..., where x^(k+1) = x^(k) + t^(k) Δx^(k) with f(x^(k+1)) < f(x^(k)) and t^(k) > 0
◮ other notations: x⁺ = x + t Δx, x := x + t Δx
◮ Δx is the step (or search direction); t is the step size (or step length)
◮ convexity of f implies ∇f(x^(k))ᵀ Δx^(k) < 0 (i.e., Δx^(k) is a descent direction)
General descent method
given a starting point x ∈ dom f.
repeat
  1. Determine a descent direction Δx.
  2. Line search. Choose a step size t > 0.
  3. Update. x := x + t Δx.
until stopping criterion is satisfied.
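A minimal sketch of the general descent loop, with the direction and line-search rules passed in as functions (names and the gradient-norm stopping criterion are illustrative choices, not fixed by the slide):

```python
import numpy as np

def descent(x0, grad, direction, line_search, eps=1e-8, max_iter=1000):
    """Generic descent method: x := x + t*dx until a stopping criterion holds."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        if np.linalg.norm(grad(x)) <= eps:   # stopping criterion ||grad f(x)||_2 <= eps
            break
        dx = direction(x)                    # 1. determine a descent direction
        t = line_search(x, dx)               # 2. line search: choose a step size t > 0
        x = x + t * dx                       # 3. update
    return x
```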

9. Line search types
exact line search: t = argmin_{t>0} f(x + t Δx)
◮ minimizes f along the ray {x + t Δx | t ≥ 0}
◮ used when the cost of this one-variable minimization is low compared to the cost of computing the search direction itself
◮ in some special cases the minimizer can be found analytically, and in others it can be computed efficiently
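One way to realize exact line search numerically is to hand the one-variable problem to a scalar minimizer; a minimal sketch using SciPy, with an illustrative upper bound t_max on the search interval (not part of the slide):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def exact_line_search(f, x, dx, t_max=10.0):
    # minimize phi(t) = f(x + t*dx) over t in (0, t_max]
    phi = lambda t: f(x + t * dx)
    res = minimize_scalar(phi, bounds=(0.0, t_max), method="bounded")
    return res.x
```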

10. Line search types
backtracking line search (with parameters α ∈ (0, 1/2), β ∈ (0, 1))
◮ reduces f "enough" along the ray {x + t Δx | t ≥ 0}
◮ starting at t = 1, repeat t := βt until f(x + t Δx) < f(x) + αt ∇f(x)ᵀΔx
◮ convexity of f: f(x + t Δx) ≥ f(x) + t ∇f(x)ᵀΔx
◮ the constant α can be interpreted as the fraction of the decrease in f predicted by linear extrapolation that we will accept
◮ graphical interpretation: backtrack until t ≤ t₀
[Figure 9.1] Backtracking line search. The curve shows f, restricted to the line over which we search. The lower dashed line shows the linear extrapolation of f, and the upper dashed line has a slope a factor of α smaller. The backtracking condition is that f lies below the upper dashed line, i.e., 0 ≤ t ≤ t₀.
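A minimal sketch of backtracking line search with parameters α ∈ (0, 1/2) and β ∈ (0, 1) (the default values 0.3 and 0.8 are illustrative):

```python
import numpy as np

def backtracking(f, grad, x, dx, alpha=0.3, beta=0.8):
    t = 1.0
    slope = grad(x) @ dx                     # grad f(x)^T dx < 0 for a descent direction
    while f(x + t * dx) > f(x) + alpha * t * slope:
        t *= beta                            # shrink t until sufficient decrease holds
    return t
```

If f returns +∞ outside dom f (as in the barrier example above), the loop also backtracks until x + tΔx is feasible.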

11. Gradient descent method
the general descent method with Δx = −∇f(x)
Gradient descent method
given a starting point x ∈ dom f.
repeat
  1. Δx := −∇f(x).
  2. Line search. Choose step size t via exact or backtracking line search.
  3. Update. x := x + t Δx.
until stopping criterion is satisfied.
◮ stopping criterion usually of the form ||∇f(x)||₂ ≤ ε
◮ convergence result: for strongly convex f, f(x^(k)) − p* ≤ c^k (f(x^(0)) − p*)
  ◮ exact line search: c = 1 − m/M < 1
  ◮ backtracking line search: c = 1 − min{2mα, 2βαm/M} < 1
◮ linear convergence: the error lies below a line on a log-linear plot of error versus iteration number
◮ very simple, but often very slow; rarely used in practice
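A minimal sketch of gradient descent with backtracking line search, reusing the backtracking() helper sketched above (the tolerance and iteration limit are illustrative):

```python
import numpy as np

def gradient_descent(f, grad, x0, eps=1e-6, max_iter=5000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:        # stopping criterion ||grad f(x)||_2 <= eps
            break
        dx = -g                             # gradient descent direction
        t = backtracking(f, grad, x, dx)    # step size via backtracking line search
        x = x + t * dx
    return x
```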

12. Examples
a quadratic problem in R²: f(x) = (1/2)(x₁² + γx₂²), γ > 0
with exact line search, starting at x^(0) = (γ, 1): closed-form expressions for the iterates
x₁^(k) = γ ((γ−1)/(γ+1))^k,   x₂^(k) = (−(γ−1)/(γ+1))^k,   f(x^(k)) = ((γ−1)/(γ+1))^{2k} f(x^(0))
◮ exact solution found in one iteration if γ = 1; convergence rapid if γ is not far from 1; convergence very slow if γ ≫ 1 or γ ≪ 1
[Figure 9.2] Some contour lines of the function f(x) = (1/2)(x₁² + 10x₂²). The condition number of the sublevel sets, which are ellipsoids, is exactly 10. The figure shows the iterates of the gradient method with exact line search, started at x^(0) = (10, 1).
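A minimal sketch checking the closed-form iterates numerically: for a quadratic, the exact line-search step is t = gᵀg / gᵀHg with H = diag(1, γ), so the iteration can be run directly and compared with the formula (γ = 10 here, matching the figure):

```python
import numpy as np

gamma = 10.0
x = np.array([gamma, 1.0])                        # x^(0) = (gamma, 1)
for k in range(1, 6):
    g = np.array([x[0], gamma * x[1]])            # grad f(x)
    t = (g @ g) / (g[0]**2 + gamma * g[1]**2)     # exact line search for this quadratic
    x = x - t * g                                 # gradient step
    r = (gamma - 1.0) / (gamma + 1.0)
    x_closed = np.array([gamma * r**k, (-r)**k])  # closed-form iterate from the slide
    print(k, np.allclose(x, x_closed))            # True at every iteration
```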

13. Examples
a nonquadratic problem in R²: f(x₁, x₂) = e^{x₁+3x₂−0.1} + e^{x₁−3x₂−0.1} + e^{−x₁−0.1}
◮ backtracking line search: approximately linear convergence (the sublevel sets of f are not too badly conditioned, i.e., M/m is not too large)
◮ exact line search: approximately linear convergence, about twice as fast as with backtracking line search
[Figure 9.3] Iterates of the gradient method with backtracking line search, for the problem in R² with objective f given in (9.20). The dashed curves are level curves of f, and the small circles are the iterates of the gradient method. The solid lines, which connect successive iterates, show the scaled steps t^(k)Δx^(k).
[Figure 9.4] Error f(x^(k)) − p* versus iteration k of the gradient method with backtracking and exact line search, for the problem in R² with objective f given in (9.20). The plot shows nearly linear convergence, with the error reduced approximately by the factor 0.4 in each iteration with backtracking line search, and by the factor 0.2 in each iteration with exact line search.
[Figure 9.5] Iterates of the gradient method with exact line search for the problem in R² with objective f given in (9.20).
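A minimal sketch of this example, running the gradient_descent() helper sketched above (the starting point is an illustrative choice, not the one used in the figures):

```python
import numpy as np

def f(x):
    return (np.exp(x[0] + 3*x[1] - 0.1) + np.exp(x[0] - 3*x[1] - 0.1)
            + np.exp(-x[0] - 0.1))

def grad(x):
    e1 = np.exp(x[0] + 3*x[1] - 0.1)
    e2 = np.exp(x[0] - 3*x[1] - 0.1)
    e3 = np.exp(-x[0] - 0.1)
    return np.array([e1 + e2 - e3, 3*e1 - 3*e2])

x_star = gradient_descent(f, grad, x0=np.array([-1.0, 1.0]))
print(x_star, f(x_star))
```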

14. Examples
a problem in R¹⁰⁰: f(x) = cᵀx − Σ_{i=1}^{500} log(bᵢ − aᵢᵀx)
◮ backtracking line search: approximately linear convergence
◮ exact line search: approximately linear convergence, only a bit faster than with backtracking line search
[Figure 9.6] Error f(x^(k)) − p* versus iteration k for the gradient method with backtracking and exact line search, for a problem in R¹⁰⁰.
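A minimal sketch generating a random instance of this problem class (random data chosen so that x = 0 is strictly feasible; not the instance used in the slides) and minimizing it with the gradient_descent() helper sketched above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 500
A = rng.standard_normal((m, n))         # rows are a_i^T
b = rng.random(m) + 1.0                 # b_i > 0, so x = 0 satisfies a_i^T x < b_i
c = rng.standard_normal(n)

def f(x):
    s = b - A @ x
    return np.inf if np.any(s <= 0) else c @ x - np.sum(np.log(s))

def grad(x):
    return c + A.T @ (1.0 / (b - A @ x))

x_star = gradient_descent(f, grad, x0=np.zeros(n))
print(f(x_star))
```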

