Gradient and Epigraph (contd)


1. Gradient and Epigraph (contd): As an example, consider the paraboloid $f(x_1, x_2) = x_1^2 + x_2^2 - 9$, which attains its minimum at $(0, 0)$. The slide shows its epigraph, together with the supporting hyperplane (or lower bound) at $(0, 0)$.
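Since the original figure is not reproduced here, a minimal NumPy sketch (my own, not from the deck) can stand in for it: it samples the paraboloid on a grid and confirms that the horizontal plane $z = f(0,0) = -9$ supports the epigraph, i.e. lower-bounds $f$ everywhere.

```python
import numpy as np

# Paraboloid from the slide: f(x1, x2) = x1^2 + x2^2 - 9, minimum at (0, 0).
f = lambda x1, x2: x1**2 + x2**2 - 9

# Sample a grid around the minimum.
x1, x2 = np.meshgrid(np.linspace(-3, 3, 101), np.linspace(-3, 3, 101))

# The supporting hyperplane (lower bound) at (0, 0) is the horizontal
# plane z = f(0, 0) = -9; the surface never dips below it.
assert np.all(f(x1, x2) >= f(0.0, 0.0))
print("z = -9 supports the epigraph at (0, 0)")
```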

2. Illustrations to understand Gradient: For the paraboloid $f(x_1, x_2) = x_1^2 + x_2^2 - 9$, the corresponding level-surface function is $F(x_1, x_2, z) = x_1^2 + x_2^2 - 9 - z$, and the point $x^0 = (1, 1, -7)$ lies on the 0-level surface of $F$. The gradient $\nabla F(x_1, x_2, z)$ is $[2x_1, 2x_2, -1]$, which, evaluated at $x^0 = (1, 1, -7)$, is $[2, 2, -1]$. The equation of the tangent plane to $f$ at $x^0$ is therefore $z = 2(x_1 - 1) + 2(x_2 - 1) - 7$. The paraboloid attains its minimum at $(0, 0)$. Plot the tangent plane to the surface at $(0, 0, f(0, 0))$, as well as the gradient vector $\nabla F$ at $(0, 0, f(0, 0))$. What do you expect?

3. Illustrations to understand Gradient: For the paraboloid $f(x_1, x_2) = x_1^2 + x_2^2 - 9$, the corresponding level-surface function is $F(x_1, x_2, z) = x_1^2 + x_2^2 - 9 - z$, and the point $x^0 = (1, 1, -7)$ lies on the 0-level surface of $F$. The gradient $\nabla F(x_1, x_2, z)$ is $[2x_1, 2x_2, -1]$, which, evaluated at $x^0 = (1, 1, -7)$, is $[2, 2, -1]$. The equation of the tangent plane to $f$ at $x^0$ is therefore $z = 2(x_1 - 1) + 2(x_2 - 1) - 7$. The paraboloid attains its minimum at $(0, 0)$. Plot the tangent plane to the surface at $(0, 0, f(0, 0))$, as well as the gradient vector $\nabla F$ at $(0, 0, f(0, 0))$. What do you expect? Ans: A horizontal tangent plane and a vertical gradient!
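A short numerical check of the computation above (the helper names are mine, not from the slides): it evaluates $\nabla F$ at $(1, 1, -7)$, confirms the tangent-plane equation touches the surface, and then repeats the exercise at the minimum $(0, 0, -9)$, where the gradient indeed comes out vertical, so the tangent plane is horizontal.

```python
import numpy as np

f = lambda x1, x2: x1**2 + x2**2 - 9                   # the paraboloid
grad_F = lambda x1, x2: np.array([2*x1, 2*x2, -1.0])   # gradient of F = f - z

# At x0 = (1, 1, -7): gradient [2, 2, -1], tangent plane
# z = 2(x1 - 1) + 2(x2 - 1) - 7.
print(grad_F(1, 1))                             # [ 2.  2. -1.]
tangent = lambda x1, x2: 2*(x1 - 1) + 2*(x2 - 1) - 7
assert np.isclose(tangent(1, 1), f(1, 1))       # plane touches the surface

# At the minimum (0, 0, -9): gradient [0, 0, -1] is vertical, so the
# tangent plane z = -9 is horizontal -- exactly the expected answer.
print(grad_F(0, 0))                             # [ 0.  0. -1.]
```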

4. First-Order Convexity Conditions: The complete statement. Theorem: For differentiable $f: \mathcal{D} \rightarrow \Re$ and open convex set $\mathcal{D}$,
1. $f$ is convex iff, for any $x, y \in \mathcal{D}$, $f(y) \geq f(x) + \nabla^T f(x)(y - x)$. (9)
2. $f$ is strictly convex iff, for any $x, y \in \mathcal{D}$ with $x \neq y$, $f(y) > f(x) + \nabla^T f(x)(y - x)$ (a strict lower bound). (10)
3. $f$ is strongly convex iff, for any $x, y \in \mathcal{D}$ and for some constant $c > 0$, $f(y) \geq f(x) + \nabla^T f(x)(y - x) + \frac{c}{2} \|y - x\|^2$. (11)
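The three inequalities are easy to probe numerically. The sketch below is my own illustration, not part of the deck; it tests all three at random pairs of points for $f(x) = \|x\|^2 - 9$, which is strongly convex with $c = 2$, so every condition should hold.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x @ x - 9.0          # strongly convex, c = 2
grad = lambda x: 2.0 * x

c = 2.0
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    lin = f(x) + grad(x) @ (y - x)               # first-order lower bound
    assert f(y) >= lin - 1e-9                    # (9)  convexity
    if not np.allclose(x, y):
        assert f(y) > lin                        # (10) strict convexity
    assert f(y) >= lin + 0.5 * c * np.sum((y - x)**2) - 1e-9  # (11) strong
print("all three first-order conditions hold")
```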

5. First-Order Convexity Condition: Proof. Sufficiency: The proof of sufficiency is very similar for all three statements of the theorem, so we prove only statement (9). Suppose (9) holds. Consider $x_1, x_2 \in \mathcal{D}$ and any $\theta \in (0, 1)$. Let $x = \theta x_1 + (1 - \theta) x_2$. Then
$f(x_1) \geq f(x) + \nabla^T f(x)(x_1 - x)$ (multiply by $\theta$)
$f(x_2) \geq f(x) + \nabla^T f(x)(x_2 - x)$ (multiply by $1 - \theta$) (12)
and add.

6. First-Order Convexity Condition: Proof. Sufficiency (contd.): Adding $(1 - \theta)$ times the second inequality in (12) to $\theta$ times the first, the gradient terms cancel (since $\theta(x_1 - x) + (1 - \theta)(x_2 - x) = 0$) and we get
$\theta f(x_1) + (1 - \theta) f(x_2) \geq f(x)$,
which proves that $f$ is a convex function. In the case of strict convexity, strict inequality holds in (12) and the same argument goes through. In the case of strong convexity, we additionally need the identity
$\frac{c}{2}\theta(1 - \theta)\|x_1 - x_2\|^2 = \theta \frac{c}{2}\|x_1 - x\|^2 + (1 - \theta)\frac{c}{2}\|x_2 - x\|^2$,
which holds because $x_1 - x = (1 - \theta)(x_1 - x_2)$ and $x_2 - x = -\theta(x_1 - x_2)$; see the numeric check below.
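A quick verification of that norm identity (my own check, not from the slides), with $x = \theta x_1 + (1 - \theta) x_2$:

```python
import numpy as np

rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=3), rng.normal(size=3)
for theta in (0.1, 0.5, 0.9):
    x = theta * x1 + (1 - theta) * x2
    # theta*||x1-x||^2 + (1-theta)*||x2-x||^2 = theta*(1-theta)*||x1-x2||^2,
    # since x1 - x = (1-theta)(x1 - x2) and x2 - x = -theta(x1 - x2).
    lhs = theta * np.sum((x1 - x)**2) + (1 - theta) * np.sum((x2 - x)**2)
    rhs = theta * (1 - theta) * np.sum((x1 - x2)**2)
    assert np.isclose(lhs, rhs)
print("norm identity verified")
```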

7. First-Order Convexity Conditions: Proofs. Necessity: Suppose $f$ is convex. Then for all $\theta \in (0, 1)$ and $x_1, x_2 \in \mathcal{D}$, we must have
$f(\theta x_2 + (1 - \theta) x_1) \leq \theta f(x_2) + (1 - \theta) f(x_1)$.
Thus, $\nabla^T f(x_1)(x_2 - x_1)$ = the directional derivative of $f$ at $x_1$ along $x_2 - x_1$.

8. First-Order Convexity Conditions: Proofs. Necessity (contd.): Rearranging the convexity inequality $f(x_1 + \theta(x_2 - x_1)) \leq (1 - \theta) f(x_1) + \theta f(x_2)$ and taking the limit,
$\nabla^T f(x_1)(x_2 - x_1) = \lim_{\theta \rightarrow 0} \frac{f(x_1 + \theta(x_2 - x_1)) - f(x_1)}{\theta} \leq f(x_2) - f(x_1)$.
This proves necessity for (9). The necessity proofs for (10) and (11) are very similar, except for a small difference in the case of strict convexity: strict inequality is not preserved when we take limits. So suppose equality does hold for a strictly convex function $f$, that is,
$f(x_2) = f(x_1) + \nabla^T f(x_1)(x_2 - x_1)$ (13)
for some $x_2 \neq x_1$.
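The limit in this argument can also be seen numerically: for a convex $f$ the difference quotient is nondecreasing in $\theta$, so it decreases toward the directional derivative as $\theta \rightarrow 0$ and equals $f(x_2) - f(x_1)$ at $\theta = 1$. A sketch with $f(x) = \|x\|^2$ and illustrative points of my choosing:

```python
import numpy as np

f = lambda x: x @ x
grad = lambda x: 2.0 * x

x1 = np.array([1.0, 0.5])
x2 = np.array([-1.0, 2.0])
d = x2 - x1

for theta in (1.0, 0.1, 0.01, 0.001):
    q = (f(x1 + theta * d) - f(x1)) / theta     # difference quotient
    print(theta, q)                              # decreases as theta shrinks

# In the limit the quotient equals the directional derivative ...
assert np.isclose((f(x1 + 1e-8 * d) - f(x1)) / 1e-8, grad(x1) @ d, atol=1e-5)
# ... which, by convexity, is at most f(x2) - f(x1).
assert grad(x1) @ d <= f(x2) - f(x1)
```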

9. First-Order Convexity Conditions: Proofs. Necessity (contd. for the strict case): Because $f$ is strictly convex, for any $\theta \in (0, 1)$ we can write
$f((1 - \theta) x_1 + \theta x_2) = f(x_1 + \theta(x_2 - x_1)) < (1 - \theta) f(x_1) + \theta f(x_2)$. (14)
Since (9) is already proved for convex functions, we use it in conjunction with (13) and (14) to get:

10. First-Order Convexity Conditions: Proofs. Necessity (contd. for the strict case): Since (9) is already proved for convex functions, we use it in conjunction with (13) and (14) to get
$f(x_1) + \theta \nabla^T f(x_1)(x_2 - x_1) \leq f(x_1 + \theta(x_2 - x_1)) < (1 - \theta) f(x_1) + \theta f(x_2) = f(x_1) + \theta \nabla^T f(x_1)(x_2 - x_1)$,
where the last equality substitutes (13). This is a contradiction. Thus, equality can never hold in (9) for any $x_1 \neq x_2$, which proves the necessity of (10).

11. First-Order Convexity Conditions: The complete statement. The geometrical interpretation of this theorem is that at any point, the linear approximation based on the local derivative gives a lower estimate of the function, i.e., the convex function always lies above the supporting hyperplane at that point. [Figure: a convex function lying above its supporting hyperplane at a point.]

12. (Tight) Lower-bound for any (non-differentiable) Convex Function? For any convex function $f$ (even if non-differentiable):
The epigraph epi($f$) will be convex.
The convex epigraph epi($f$) will have a supporting hyperplane at any boundary point $(x, f(x))$.

13. (Tight) Lower-bound for any (non-differentiable) Convex Function? For any convex function $f$ (even if non-differentiable):
The epigraph epi($f$) will be convex.
The convex epigraph epi($f$) will have a supporting hyperplane at every boundary point $(x, f(x))$.
[Figure: epi($f$) with supporting hyperplanes at a point $x$, each with a normal of the form $[h, -1]$.]
There may exist multiple supporting hyperplanes.
Let a supporting hyperplane be characterized by a normal vector $[h(x), -1]$.
When $f$ was differentiable, this vector was $[\nabla f(x), -1]$.

14. (Tight) Lower-bound for any (non-differentiable) Convex Function? For any convex function $f$ (even if non-differentiable):
The epigraph epi($f$) will be convex.
The convex epigraph epi($f$) will have a supporting hyperplane at every boundary point $(x, f(x))$:
▶ $\{[v, z] \mid \langle [h(x), -1], [v, z] \rangle = \langle [h(x), -1], [x, f(x)] \rangle\}$ for all $[v, z]$ on the hyperplane, and
▶ $\langle [h(x), -1], [y, z] \rangle \leq \langle [h(x), -1], [x, f(x)] \rangle$ for all $[y, z] \in$ epi($f$), which also includes $[y, f(y)]$.

15. (Tight) Lower-bound for any (non-differentiable) Convex Function? For any convex function $f$ (even if non-differentiable):
The epigraph epi($f$) will be convex.
The convex epigraph epi($f$) will have a supporting hyperplane at every boundary point $(x, f(x))$:
▶ $\{[v, z] \mid \langle [h(x), -1], [v, z] \rangle = \langle [h(x), -1], [x, f(x)] \rangle\}$ for all $[v, z]$ on the hyperplane, and
▶ $\langle [h(x), -1], [y, z] \rangle \leq \langle [h(x), -1], [x, f(x)] \rangle$ for all $[y, z] \in$ epi($f$), which also includes $[y, f(y)]$.
Thus: $\langle [h(x), -1], [y, f(y)] \rangle \leq \langle [h(x), -1], [x, f(x)] \rangle$ for all $y$ in the domain of $f$, which rearranges to $f(y) \geq f(x) + h^T(x)(y - x)$. The normal to such a supporting hyperplane serves the same purpose as $[\nabla f(x), -1]$.
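To make the rearranged inequality concrete, consider the classic non-differentiable example $f(x) = |x|$ (my choice, not from the deck): at the kink $x = 0$, every $h \in [-1, 1]$ gives a valid supporting hyperplane, so the normal $[h(x), -1]$ is genuinely non-unique.

```python
import numpy as np

f = np.abs                      # convex but not differentiable at 0
ys = np.linspace(-5, 5, 1001)

# At x = 0, every h in [-1, 1] is a valid normal component: the plane
# with normal [h, -1] through (0, f(0)) lower-bounds f everywhere,
# i.e. f(y) >= f(0) + h * (y - 0) for all y.
for h in (-1.0, -0.3, 0.0, 0.7, 1.0):
    assert np.all(f(ys) >= f(0.0) + h * ys)
print("multiple supporting hyperplanes at the kink x = 0")
```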

16. The What, Why and How of (sub)gradients. What of (sub)gradient:
Normal to a supporting hyperplane at the point $(x, f(x))$ of epi($f$).
Need not be unique.
The gradient is a subgradient when the function is differentiable.

17. The What, Why and How of (sub)gradients. What of (sub)gradient: Normal to the tightly lower-bounding linear approximation to a convex function. Why of (sub)gradient:
(Sub)gradients give necessary and sufficient conditions of optimality for convex functions (see the sketch below).
Important for optimization algorithms.
Subgradients are important for non-differentiable functions and constrained optimization.
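One way to see the "why" (an illustration of my own, not stated on the slide in this form): $x^*$ minimizes a convex $f$ exactly when $0$ is a subgradient at $x^*$, i.e. the horizontal hyperplane supports epi($f$) there. For $f(x) = |x|$ this certifies the minimum at $0$ even though $\nabla f(0)$ does not exist.

```python
import numpy as np

f = np.abs
# Subdifferential of |x|: {-1} for x < 0, {+1} for x > 0, [-1, 1] at 0.
subdiff = lambda x: (-1.0, 1.0) if x == 0 else (np.sign(x), np.sign(x))

# 0 lies in the subdifferential only at x = 0 ...
lo, hi = subdiff(0.0)
assert lo <= 0.0 <= hi
# ... and h = 0 there means f(y) >= f(0) for all y: a global minimum.
assert np.all(f(np.linspace(-5, 5, 1001)) >= f(0.0))
print("0 in the subdifferential certifies the minimizer")
```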

18. The What, Why and How of (sub)gradients. What of (sub)gradient: Normal to the tightly lower-bounding linear approximation to a convex function. Why of (sub)gradient: The ability to deal with constraints, optimality conditions, and optimization algorithms. How of (sub)gradient:
How to compute the subgradient of complex non-differentiable convex functions.
The calculus of convex functions and of subgradients (one standard rule is illustrated below).
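A standard rule from that calculus, stated here as an illustration rather than quoted from the deck: for a pointwise maximum $f(x) = \max_i f_i(x)$ of differentiable convex functions, the gradient of any active piece is a subgradient of $f$. For a maximum of affine functions this gives a one-line subgradient oracle; the data below is made up for the example.

```python
import numpy as np

# f(x) = max_i (a_i^T x + b_i): convex, piecewise linear, and
# non-differentiable wherever two pieces tie. Illustrative data.
A = np.array([[1.0, 2.0], [-1.0, 0.5], [0.0, -1.0]])
b = np.array([0.0, 1.0, 2.0])

def f(x):
    return np.max(A @ x + b)

def subgradient(x):
    i = np.argmax(A @ x + b)    # any active piece yields a subgradient
    return A[i]

# Check the subgradient inequality f(y) >= f(x) + g^T (y - x).
rng = np.random.default_rng(2)
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    g = subgradient(x)
    assert f(y) >= f(x) + g @ (y - x) - 1e-12
print("max-of-affine subgradient oracle verified")
```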
