

  1. CSCI 1951-G – Optimization Methods in Finance. Part 06: Algorithms for Unconstrained Convex Optimization. March 9, 2018

  2. This material is covered in S. Boyd and L. Vandenberghe's book Convex Optimization, https://web.stanford.edu/~boyd/cvxbook/ . Some of the material and the figures are taken from it.

  3. Outline: (1) Unconstrained minimization: descent methods; (2) Equality constrained minimization: Newton's method; (3) General minimization: interior point methods.

  4. Unconstrained minimization. Consider the unconstrained minimization problem: min f(x), where f: R^n → R is convex and twice continuously differentiable. x*: optimal solution, with optimal objective value p*. Necessary and sufficient condition for x* to be optimal: ∇f(x*) = 0. The above is a system of n equations in n variables. Solving ∇f(x) = 0 analytically is often not easy or not possible.

  5. Example: unconstrained geometric program.

     min f(x) = ln( Σ_{i=1}^m exp(a_i^T x + b_i) )

  f(x) is convex. The optimality condition is

     0 = ∇f(x*) = ( Σ_{i=1}^m exp(a_i^T x* + b_i) a_i ) / ( Σ_{j=1}^m exp(a_j^T x* + b_j) ),

  which in general has no analytical solution.
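To make this example concrete, here is a minimal NumPy sketch (not from the slides) that evaluates the log-sum-exp objective and its gradient; the problem data A (whose rows are the a_i) and b are hypothetical.

```python
import numpy as np

def f(x, A, b):
    """Objective f(x) = ln(sum_i exp(a_i^T x + b_i)); rows of A are the a_i."""
    z = A @ x + b
    zmax = z.max()                        # shift for numerical stability
    return zmax + np.log(np.exp(z - zmax).sum())

def grad_f(x, A, b):
    """Gradient: a softmax-weighted combination of the rows a_i."""
    z = A @ x + b
    w = np.exp(z - z.max())
    w /= w.sum()                          # w_i = exp(a_i^T x + b_i) / sum_j exp(...)
    return A.T @ w                        # sum_i w_i * a_i

# Example usage with random hypothetical data:
rng = np.random.default_rng(0)
A, b = rng.normal(size=(5, 3)), rng.normal(size=5)
x = np.zeros(3)
print(f(x, A, b), grad_f(x, A, b))
```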

  6. Iterative algorithms. Iterative algorithms for minimization compute a minimizing sequence x^(0), x^(1), ... of feasible points s.t. f(x^(k)) → p* as k → ∞. The algorithm terminates when f(x^(k)) − p* ≤ ε, for a specified tolerance ε > 0.

  7. How to know when to stop? Consider the sublevel set S = {x : f(x) ≤ f(x^(0))}. Additional assumption: f is strongly convex on S, i.e., there exists m > 0 s.t. ∇²f(x) − mI ≻ 0 for all x ∈ S, i.e., the difference on the l.h.s. is positive definite. Consequence: f(y) ≥ f(x) + ∇f(x)^T (y − x) + (m/2)‖y − x‖₂² for all x and y in S. (What happens when f is "just" convex?)

  8. Strong convexity gives a stopping rule. f(y) ≥ f(x) + ∇f(x)^T (y − x) + (m/2)‖y − x‖₂². For any fixed x, the r.h.s. is a convex quadratic function g_x(y) of y. Let's find the y for which the r.h.s. is minimal. How? Solve ∇g_x(y) = 0! Solution: ỹ = x − (1/m)∇f(x). Then, since g_x(y) ≥ g_x(ỹ) for every y:

     f(y) ≥ f(x) + ∇f(x)^T (ỹ − x) + (m/2)‖ỹ − x‖₂² = f(x) − (1/(2m))‖∇f(x)‖₂²

  9. Strong convexity gives a stopping rule. f(y) ≥ f(x) − (1/(2m))‖∇f(x)‖₂² for any x and y in S. For y = x*, the above becomes: p* ≥ f(x) − (1/(2m))‖∇f(x)‖₂² for any x ∈ S. Intuition: if ‖∇f(x)‖₂² is small, x is nearly optimal. Suboptimality condition: in order to have f(x) − p* ≤ ε, it suffices that ‖∇f(x)‖₂ ≤ √(2mε). Strong convexity also gives us a bound on ‖x − x*‖₂ in terms of ‖∇f(x)‖₂: ‖x − x*‖₂ ≤ (2/m)‖∇f(x)‖₂.
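As a sketch of how this rule is used in practice (assuming the strong-convexity constant m is known, which it often is not), the stopping test is one line; the helper name is hypothetical.

```python
from math import sqrt

def nearly_optimal(grad_norm: float, m: float, eps: float) -> bool:
    """True when ||grad f(x)||_2 <= sqrt(2*m*eps), which under m-strong
    convexity guarantees f(x) - p* <= eps."""
    return grad_norm <= sqrt(2 * m * eps)
```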

  10. Descent methods. We now describe algorithms producing a minimizing sequence (x^(k))_{k≥1} where x^(k+1) = x^(k) + t^(k) Δx^(k). • Δx^(k) ∈ R^n (vector): step/search direction. • t^(k) > 0 (scalar): step size/length. The algorithms are descent methods, i.e., f(x^(k+1)) < f(x^(k)).

  11. Descent direction. How to choose Δx^(k) so that f(x^(k+1)) < f(x^(k))? From convexity we know that ∇f(x^(k))^T (y − x^(k)) ≥ 0 ⇒ f(y) ≥ f(x^(k)), so Δx^(k) must satisfy: ∇f(x^(k))^T Δx^(k) < 0. I.e., the angle between −∇f(x^(k)) and Δx^(k) must be acute. Such a direction is known as a descent direction.

  12. General descent method. Input: function f, starting point x. Repeat: (1) determine a descent direction Δx; (2) line search: choose a step size t > 0; (3) update: x ← x + tΔx; until a stopping criterion is satisfied. Step 2 is called line search because it determines where on the ray {x + tΔx : t ≥ 0} the next iterate will be.
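A minimal Python sketch of this generic loop (not from the slides) follows; the `direction` and `line_search` callbacks are hypothetical interfaces that later slides instantiate with the negative gradient and backtracking, respectively.

```python
import numpy as np

def descent(f, grad, x, direction, line_search, eps=1e-8, max_iter=1000):
    """Generic descent method: `direction` picks Δx, `line_search` picks t."""
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:      # stopping criterion
            return x
        dx = direction(x, g)              # must satisfy g @ dx < 0
        t = line_search(f, x, dx, g)      # step size along the ray x + t*dx
        x = x + t * dx
    return x
```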

  13. Exact line search. Choose t to minimize f along the ray {x + tΔx : t ≥ 0}: t = argmin_{s≥0} f(x + sΔx). Useful when the cost of the above minimization problem is low w.r.t. computing Δx (e.g., when it has an analytical solution).
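One case where the analytical solution exists is a convex quadratic f(x) = (1/2)x^T Q x + q^T x with Q positive definite: the derivative of s ↦ f(x + sΔx) is ∇f(x)^T Δx + s Δx^T Q Δx, and setting it to zero gives the step below. A sketch under that assumption:

```python
import numpy as np

def exact_step_quadratic(Q, grad_x, dx):
    """Exact line search for f(x) = 0.5 x^T Q x + q^T x along direction dx:
    s* = -grad_x^T dx / (dx^T Q dx)."""
    s = -(grad_x @ dx) / (dx @ Q @ dx)
    return max(s, 0.0)                    # restrict to the ray s >= 0
```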

  14. Backtracking line search. Most line searches are inexact: they approximately minimize f along the ray {x + tΔx : t ≥ 0}. Backtracking line search: input: descent direction Δx for f at x, α ∈ (0, 0.5), β ∈ (0, 1). t ← 1; while f(x + tΔx) > f(x) + αt ∇f(x)^T Δx: t ← βt. "Backtracking": starts with a large t and iteratively shrinks it.
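A direct transcription of this routine in Python (assuming x, dx, g are NumPy arrays and g = ∇f(x)); it plugs into the generic descent loop sketched after slide 12.

```python
def backtracking(f, x, dx, g, alpha=0.25, beta=0.5):
    """Shrink t until the slide's condition
    f(x + t*dx) <= f(x) + alpha*t*g^T dx holds."""
    t = 1.0
    fx, slope = f(x), g @ dx              # slope = ∇f(x)^T Δx < 0
    while f(x + t * dx) > fx + alpha * t * slope:
        t *= beta
    return t
```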

  15. Why does backtracking line search terminate? For small t, f(x + tΔx) ≈ f(x) + t ∇f(x)^T Δx. It holds that f(x) + t ∇f(x)^T Δx < f(x) + αt ∇f(x)^T Δx, because α < 1 and ∇f(x)^T Δx < 0 (Δx is a descent direction), so the exit condition is eventually satisfied.

  16. Visualization. [Figure 9.1: Backtracking line search. The curve shows f restricted to the line over which we search. The lower dashed line shows the linear extrapolation of f, and the upper dashed line has a slope a factor of α smaller. The backtracking condition is that f lies below the upper dashed line, i.e., 0 ≤ t ≤ t₀.]

  17. Gradient descent method. Input: function f, starting point x. Repeat: (1) Δx ← −∇f(x); (2) line search: choose a step size t > 0 via exact or backtracking line search; (3) update: x ← x + tΔx; until a stopping criterion is satisfied (e.g., ‖∇f(x)‖₂ ≤ η).
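A self-contained sketch of gradient descent with backtracking line search, combining the two previous routines; the tolerance η and the α, β defaults are illustrative choices, not values from the slides.

```python
import numpy as np

def gradient_descent(f, grad, x0, eta=1e-6, alpha=0.25, beta=0.5, max_iter=10**4):
    """Gradient descent with backtracking line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eta:      # stopping criterion ||∇f(x)||_2 <= η
            break
        t = 1.0
        while f(x - t * g) > f(x) - alpha * t * (g @ g):
            t *= beta                     # backtracking with Δx = -∇f(x)
        x = x - t * g
    return x
```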

  18. Example. min f(x₁, x₂) = e^(x₁+3x₂−0.1) + e^(x₁−3x₂−0.1) + e^(−x₁−0.1). Let's solve it with gradient descent and backtracking line search with α = 0.1 and β = 0.7. [Figure 9.3: Iterates of the gradient method with backtracking line search, for the problem in R² with objective f given in (9.20). The dashed curves are level curves of f, and the small circles are the iterates of the gradient method.] The solid lines connecting successive iterates show the scaled steps: x^(k+1) − x^(k) = −t^(k) ∇f(x^(k)).
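The same example can be run numerically; below is a sketch with the slide's α = 0.1 and β = 0.7 and a hypothetical starting point. (Setting ∇f = 0 by hand gives x* = (−ln(2)/2, 0) ≈ (−0.347, 0), which the iterates approach.)

```python
import numpy as np

def f(x):
    x1, x2 = x
    return (np.exp(x1 + 3*x2 - 0.1) + np.exp(x1 - 3*x2 - 0.1)
            + np.exp(-x1 - 0.1))

def grad(x):
    x1, x2 = x
    e1, e2, e3 = (np.exp(x1 + 3*x2 - 0.1), np.exp(x1 - 3*x2 - 0.1),
                  np.exp(-x1 - 0.1))
    return np.array([e1 + e2 - e3, 3*e1 - 3*e2])

x = np.array([-1.0, 1.0])                 # hypothetical starting point
for _ in range(50):
    g = grad(x)
    t = 1.0
    while f(x - t*g) > f(x) - 0.1 * t * (g @ g):   # α = 0.1
        t *= 0.7                                    # β = 0.7
    x = x - t*g
print(x, f(x))   # x approaches (-0.347, 0)
```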

  19. Example. [Figure 9.5: Iterates of the gradient method with exact line search for the problem in R² with objective f given in (9.20).]

  20. Example. [Figure 9.4: Error f(x^(k)) − p* versus iteration k of the gradient method with backtracking and exact line search, for the problem in R² with objective f given in (9.20). The plot shows nearly linear convergence, with the error reduced approximately by the factor 0.4 in each iteration with backtracking line search, and by the factor 0.2 in each iteration with exact line search.]

  21. Convergence analysis. Fact: if f is strongly convex on S, then there exists M ∈ R₊ s.t. ∇²f(x) ⪯ MI for all x ∈ S. Convergence of gradient descent: let ε > 0 and let

     k ≥ log( (f(x^(0)) − p*) / ε ) / ( −log(1 − m/M) )

  After k iterations it must hold that f(x^(k)) − p* ≤ ε. More interpretable bound:

     f(x^(k)) − p* ≤ (1 − m/M)^k (f(x^(0)) − p*)

  I.e., the error converges to 0 at least as fast as a geometric series (linear convergence on a log-linear plot).
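A quick numerical reading of the bound, with hypothetical constants m, M, initial gap, and tolerance (in practice m and M are rarely known exactly):

```python
from math import log, ceil

m, M = 1.0, 100.0            # strong convexity and smoothness constants
gap0, eps = 10.0, 1e-6       # f(x^(0)) - p* and target accuracy

k = log(gap0 / eps) / -log(1 - m / M)
print(ceil(k))               # 1604: iterations sufficient when M/m = 100
```

This illustrates the point of the next slide: the iteration count grows with the condition-number-like ratio M/m.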

  22. Steepest descent method. We saw that gradient descent may converge very slowly if M/m is large. Is the gradient the best descent direction to take (and in what sense)? First-order Taylor approximation of f(x + v) around x: f(x + v) ≈ f(x) + ∇f(x)^T v. ∇f(x)^T v is the directional derivative of f at x in the direction v.

  23. Steepest descent method. v is a descent direction if the directional derivative ∇f(x)^T v is negative. How to choose v to make the directional derivative as negative as possible? Since ∇f(x)^T v is linear in v, we must restrict the choice of v somehow (otherwise we could just keep growing the magnitude of v). Let ‖·‖ be any norm on R^n. Normalized steepest descent direction w.r.t. ‖·‖: Δx_nsd = argmin{∇f(x)^T v : ‖v‖ = 1}. It gives the largest decrease in the linear approximation of f.

  24. Example. If ‖·‖ is the Euclidean norm, then Δx_nsd = −∇f(x)/‖∇f(x)‖₂: steepest descent w.r.t. the Euclidean norm is simply (normalized) gradient descent.

  25. Example. Consider the quadratic norm ‖z‖_P = (z^T P z)^(1/2) = ‖P^(1/2) z‖₂, where P is positive definite. The normalized steepest descent direction is

     Δx_nsd = −( ∇f(x)^T P⁻¹ ∇f(x) )^(−1/2) P⁻¹ ∇f(x),

  i.e., a normalization of the step v = −P⁻¹ ∇f(x).
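A short sketch computing this direction and checking that it has unit P-norm; P and the gradient vector are hypothetical data.

```python
import numpy as np

def nsd_quadratic(g, P):
    """Normalized steepest descent direction for the quadratic norm ||z||_P."""
    Pinv_g = np.linalg.solve(P, g)         # P^{-1} ∇f(x), without forming P^{-1}
    scale = np.sqrt(g @ Pinv_g)            # (∇f(x)^T P^{-1} ∇f(x))^{1/2}
    return -Pinv_g / scale                 # has unit P-norm

# Sanity check with hypothetical data:
P = np.array([[2.0, 0.5], [0.5, 1.0]])
g = np.array([1.0, -3.0])
d = nsd_quadratic(g, P)
print(d @ P @ d)                           # ≈ 1.0, so ||d||_P = 1
```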

  26. Geometric interpretation. [Figure 9.9: Normalized steepest descent direction for a quadratic norm. The ellipsoid shown is the unit ball of the norm, translated to the point x. The normalized steepest descent direction Δx_nsd at x extends as far as possible in the direction −∇f(x) while staying in the ellipsoid. The gradient and normalized steepest descent directions are shown.]
