
Summary Key topics. Familiarity with form of basic network


  1. Summary
     Key topics.
     ◮ Familiarity with form of basic network gradient.
     ◮ Deep network initialization.
     ◮ Minibatches.
     ◮ Momentum.
     Next time: convexity.

  2. Part 2: convexity

  3. Why convexity?
     Deep networks are not convex in their parameters. Why study convexity?
     ◮ Convexity is pervasive in ML and mathematics; e.g., our losses for deep learning are still convex.
     ◮ Convexity exemplifies nice “local-to-global” structure.

  4. 6. Convex sets and functions

  6. Convex sets
     A set $S$ is convex if, for every pair of points $x, x'$ in $S$, the line segment between $x$ and $x'$ is also contained in $S$. ($\{x, x'\} \subseteq S \implies [x, x'] \subseteq S$.)
     [Figure: four example sets, labeled convex, not convex, convex, convex.]
     Examples:
     ◮ All of $\mathbb{R}^d$.
     ◮ Empty set.
     ◮ Half-spaces: $\{x \in \mathbb{R}^d : a^T x \le b\}$.
     ◮ Intersections of convex sets.
     ◮ Polyhedra: $\{x \in \mathbb{R}^d : Ax \le b\} = \bigcap_{i=1}^m \{x \in \mathbb{R}^d : a_i^T x \le b_i\}$.
     ◮ Convex hulls: $\operatorname{conv}(S) := \{\sum_{i=1}^k \alpha_i x_i : k \in \mathbb{N},\ x_i \in S,\ \alpha_i \ge 0,\ \sum_{i=1}^k \alpha_i = 1\}$. (Infinite convex hulls: intersection of all convex supersets.)
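
The definition of a convex set can be spot-checked numerically for a polyhedron. The following is a minimal sketch, not part of the original slides: the unit square, the helper names `in_polyhedron` and `segment_point`, and the sample count are all illustrative assumptions. It samples pairs of points inside $\{x : Ax \le b\}$ and confirms that a random point on the segment between them stays inside.

```python
import random


def in_polyhedron(A, b, x, tol=1e-12):
    """Return True if every row constraint a_i^T x <= b_i holds (up to tol)."""
    return all(
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) <= b_i + tol
        for row, b_i in zip(A, b)
    )


def segment_point(x, x_prime, alpha):
    """Point (1 - alpha) x + alpha x' on the segment [x, x']."""
    return [(1 - alpha) * u + alpha * v for u, v in zip(x, x_prime)]


# Hypothetical example: the unit square [0, 1]^2 written as {x : A x <= b}.
A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b = [1.0, 0.0, 1.0, 0.0]

rng = random.Random(0)
all_inside = True
for _ in range(1000):
    x = [rng.random(), rng.random()]        # sampled inside the square
    x_prime = [rng.random(), rng.random()]  # sampled inside the square
    alpha = rng.random()
    point = segment_point(x, x_prime, alpha)
    all_inside = all_inside and in_polyhedron(A, b, point)

print("all sampled segment points stay inside:", all_inside)
```

A random check like this can only refute convexity, never prove it; the proof for polyhedra is that each half-space is convex and intersections preserve convexity.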

  7. Convex functions from convex sets
     The epigraph of a function $f$ is the area above the curve:
     $\operatorname{epi}(f) := \{(x, y) \in \mathbb{R}^{d+1} : y \ge f(x)\}$.
     A function is convex if its epigraph is convex.
     [Figure: two panels, “f is not convex” and “f is convex”.]

  9. Convex functions (standard definition)
     A function $f : \mathbb{R}^d \to \mathbb{R}$ is convex if for any $x, x' \in \mathbb{R}^d$ and $\alpha \in [0, 1]$,
     $f((1 - \alpha) x + \alpha x') \le (1 - \alpha) \cdot f(x) + \alpha \cdot f(x')$.
     [Figure: two panels with points $x$ and $x'$ marked, “f is not convex” and “f is convex”.]
     Examples:
     ◮ $f(x) = c^x$ for any $c > 0$ (on $\mathbb{R}$).
     ◮ $f(x) = |x|^c$ for any $c \ge 1$ (on $\mathbb{R}$).
     ◮ $f(x) = b^T x$ for any $b \in \mathbb{R}^d$.
     ◮ $f(x) = \|x\|$ for any norm $\|\cdot\|$.
     ◮ $f(x) = x^T A x$ for symmetric positive semidefinite $A$.
     ◮ $f(x) = \ln\left(\sum_{i=1}^d \exp(x_i)\right)$, which approximates $\max_i x_i$.
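
As an illustration of the standard definition, the sketch below tests the inequality on random inputs for the log-sum-exp example and checks how closely it tracks the maximum. The helper names (`log_sum_exp`, `convexity_gap`), the dimension, and the sampling ranges are assumptions made for this example only. The bound $\max_i x_i \le \ln \sum_i \exp(x_i) \le \max_i x_i + \ln d$ is a standard fact used to make "approximates" concrete.

```python
import math
import random


def log_sum_exp(x):
    """ln(sum_i exp(x_i)), computed stably by factoring out max_i x_i."""
    m = max(x)
    return m + math.log(sum(math.exp(x_i - m) for x_i in x))


def convexity_gap(f, x, x_prime, alpha):
    """(1 - alpha) f(x) + alpha f(x') - f((1 - alpha) x + alpha x').

    This quantity is nonnegative for every input when f is convex.
    """
    mid = [(1 - alpha) * u + alpha * v for u, v in zip(x, x_prime)]
    return (1 - alpha) * f(x) + alpha * f(x_prime) - f(mid)


rng = random.Random(0)
d = 5
min_gap = float("inf")
for _ in range(1000):
    x = [rng.uniform(-3, 3) for _ in range(d)]
    x_prime = [rng.uniform(-3, 3) for _ in range(d)]
    gap = convexity_gap(log_sum_exp, x, x_prime, rng.random())
    min_gap = min(min_gap, gap)

# Smallest observed gap; it should be nonnegative up to rounding error.
print("smallest convexity gap:", min_gap)

# Log-sum-exp sits between max(x) and max(x) + ln(d).
x = [1.0, 2.0, 5.0]
lse = log_sum_exp(x)
print("max:", max(x), "log-sum-exp:", lse)
```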

  15. Example verification: norms
      Is $f(x) = \|x\|$ convex? Pick any $\alpha \in [0, 1]$ and any $x, x' \in \mathbb{R}^d$.
      $f((1 - \alpha) x + \alpha x') = \|(1 - \alpha) x + \alpha x'\|$
      $\le \|(1 - \alpha) x\| + \|\alpha x'\|$ (triangle inequality)
      $= (1 - \alpha) \|x\| + \alpha \|x'\|$ (homogeneity)
      $= (1 - \alpha) f(x) + \alpha f(x')$.
      Yes, $f$ is convex.
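
The argument above holds for any norm. As a rough numerical companion, the sketch below evaluates the same inequality for three common norms on random inputs. The choice of the $\ell_1$, $\ell_2$, and $\ell_\infty$ norms and the helper `worst_gap` are assumptions for illustration; the slides do not single out particular norms.

```python
import random


def norm_1(x):
    return sum(abs(v) for v in x)


def norm_2(x):
    return sum(v * v for v in x) ** 0.5


def norm_inf(x):
    return max(abs(v) for v in x)


def worst_gap(norm, trials=1000, d=4, seed=0):
    """Smallest observed value of (1-a)||x|| + a||x'|| - ||(1-a)x + a x'||."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        x = [rng.uniform(-10, 10) for _ in range(d)]
        x_prime = [rng.uniform(-10, 10) for _ in range(d)]
        alpha = rng.random()
        mid = [(1 - alpha) * u + alpha * v for u, v in zip(x, x_prime)]
        gap = (1 - alpha) * norm(x) + alpha * norm(x_prime) - norm(mid)
        worst = min(worst, gap)
    return worst


gaps = {
    "l1": worst_gap(norm_1),
    "l2": worst_gap(norm_2),
    "linf": worst_gap(norm_inf),
}
for name, gap in gaps.items():
    # Each gap should be nonnegative up to floating-point rounding.
    print(name, gap)
```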

  16. Operations preserving convexity
      Summations: if $(f_1, \ldots, f_k)$ are convex and $(\alpha_1, \ldots, \alpha_k)$ are nonnegative, $x \mapsto \alpha_1 f_1(x) + \cdots + \alpha_k f_k(x)$ is convex.
      Affine composition: if $f$ is convex, then for any $A \in \mathbb{R}^{m \times d}$ and $b \in \mathbb{R}^m$, $x \mapsto f(Ax + b)$ is convex.
      Maxima: if $(f_1, \ldots, f_k)$ are convex, $x \mapsto \max_i f_i(x)$ is convex. (Infinitely many functions: use a supremum.)
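
The three operations can be written as small function combinators. This is only a sketch under stated assumptions: the combinator names (`nonneg_sum`, `affine_compose`, `pointwise_max`), the base functions, and the matrix `A` and vector `b` are hypothetical choices, and the random test is a sanity check rather than a proof.

```python
import random


def nonneg_sum(fs, alphas):
    """x -> sum_i alpha_i f_i(x), with every alpha_i required to be >= 0."""
    assert all(a >= 0 for a in alphas)
    return lambda x: sum(a * f(x) for f, a in zip(fs, alphas))


def affine_compose(f, A, b):
    """x -> f(A x + b)."""
    def composed(x):
        y = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
             for row, b_i in zip(A, b)]
        return f(y)
    return composed


def pointwise_max(fs):
    """x -> max_i f_i(x)."""
    return lambda x: max(f(x) for f in fs)


def sq_norm(x):
    return sum(v * v for v in x)


def abs_sum(x):
    return sum(abs(v) for v in x)


# Hypothetical affine map from R^2 to R^3.
A = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
b = [0.5, -1.0, 2.0]

candidates = {
    "sum": nonneg_sum([sq_norm, abs_sum], [2.0, 0.5]),
    "affine": affine_compose(sq_norm, A, b),
    "max": pointwise_max([sq_norm, abs_sum]),
}


def min_convexity_gap(f, d=2, trials=1000, seed=0):
    """Smallest observed (1-a) f(x) + a f(x') - f((1-a) x + a x')."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        x = [rng.uniform(-3, 3) for _ in range(d)]
        x_prime = [rng.uniform(-3, 3) for _ in range(d)]
        alpha = rng.random()
        mid = [(1 - alpha) * u + alpha * v for u, v in zip(x, x_prime)]
        worst = min(worst, (1 - alpha) * f(x) + alpha * f(x_prime) - f(mid))
    return worst


gaps = {name: min_convexity_gap(f) for name, f in candidates.items()}
print(gaps)
```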

  17. Example: linear classification and margin losses
      If $\ell$ is convex and the predictor is linear, then the empirical risk is convex:
      ◮ Define $\ell_i(w) = \ell(w^T x_i y_i)$, convex since it is the composition of a convex function and an affine function;
      ◮ thus the empirical risk
        $R(w) = \frac{1}{n} \sum_{i=1}^n \ell(w^T x_i y_i) = \frac{1}{n} \sum_{i=1}^n \ell_i(w)$
        is a nonnegative combination of convex functions, and hence convex.
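
To make the empirical risk concrete, here is a minimal sketch that assumes the logistic loss $\ell(z) = \ln(1 + e^{-z})$ as the convex margin loss; the slide itself leaves $\ell$ generic. The data set is random and purely hypothetical, and `empirical_risk` is an illustrative helper name. The loop checks the convexity inequality in $w$ on random pairs of weight vectors.

```python
import math
import random


def logistic_loss(z):
    """ln(1 + exp(-z)), split by sign of z to avoid overflow in exp."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def empirical_risk(w, xs, ys, loss=logistic_loss):
    """R(w) = (1/n) sum_i loss(w^T x_i y_i) for a linear predictor."""
    n = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        margin = y * sum(w_j * x_j for w_j, x_j in zip(w, x))
        total += loss(margin)
    return total / n


# Hypothetical data: 50 random points in R^3 with random labels in {-1, +1}.
rng = random.Random(0)
d, n = 3, 50
xs = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
ys = [rng.choice([-1, 1]) for _ in range(n)]

min_gap = float("inf")
for _ in range(500):
    w = [rng.uniform(-5, 5) for _ in range(d)]
    w_prime = [rng.uniform(-5, 5) for _ in range(d)]
    alpha = rng.random()
    w_mid = [(1 - alpha) * u + alpha * v for u, v in zip(w, w_prime)]
    gap = ((1 - alpha) * empirical_risk(w, xs, ys)
           + alpha * empirical_risk(w_prime, xs, ys)
           - empirical_risk(w_mid, xs, ys))
    min_gap = min(min_gap, gap)

# Nonnegative (up to rounding) on every sampled pair, as convexity requires.
print("smallest convexity gap of the empirical risk:", min_gap)
```

Any other convex margin loss, such as the hinge loss, could be passed through the `loss` argument without changing the check.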

  18. 7. Various forms of convexity

  20. Convexity of differentiable functions
      Differentiable functions. If $f : \mathbb{R}^d \to \mathbb{R}$ is differentiable, then $f$ is convex if and only if
      $f(x) \ge f(x_0) + \nabla f(x_0)^T (x - x_0)$ for all $x, x_0 \in \mathbb{R}^d$.
      [Figure: graph of $f(x)$ lying above its affine approximation $a(x) = f(x_0) + f'(x_0)(x - x_0)$ at $x_0$.]
      Note: this implies increasing slopes: $(\nabla f(x) - \nabla f(y))^T (x - y) \ge 0$.
      Twice-differentiable functions. If $f : \mathbb{R}^d \to \mathbb{R}$ is twice-differentiable, then $f$ is convex if and only if
      $\nabla^2 f(x) \succeq 0$ for all $x \in \mathbb{R}^d$
      (i.e., the Hessian, or matrix of second derivatives, is positive semidefinite for all $x$).
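
The three conditions on this slide can be compared side by side on a small quadratic $f(x) = x^T A x$, whose gradient is $2Ax$ and whose Hessian is $2A$ when $A$ is symmetric. The sketch below is illustrative only: the matrix `A` is a hypothetical choice, and `is_psd_2x2` relies on the fact that a symmetric $2 \times 2$ matrix is positive semidefinite exactly when its trace and determinant are both nonnegative.

```python
import random

# Hypothetical symmetric matrix; trace 3 and determinant 1, so it is PSD.
A = [[2.0, 1.0], [1.0, 1.0]]


def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def f(x):
    """f(x) = x^T A x."""
    return dot(x, matvec(A, x))


def grad_f(x):
    """Gradient 2 A x (valid because A is symmetric)."""
    return [2.0 * g for g in matvec(A, x)]


def is_psd_2x2(M, tol=1e-12):
    """Symmetric 2x2 matrix is PSD iff trace >= 0 and determinant >= 0."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return trace >= -tol and det >= -tol


hessian = [[2.0 * a for a in row] for row in A]
second_order_ok = is_psd_2x2(hessian)

rng = random.Random(0)
min_first_order_gap = float("inf")
min_slope_product = float("inf")
for _ in range(1000):
    x = [rng.uniform(-5, 5) for _ in range(2)]
    x0 = [rng.uniform(-5, 5) for _ in range(2)]
    diff = [u - v for u, v in zip(x, x0)]
    # First-order condition: f(x) - f(x0) - grad f(x0)^T (x - x0) >= 0.
    gap = f(x) - f(x0) - dot(grad_f(x0), diff)
    min_first_order_gap = min(min_first_order_gap, gap)
    # Increasing slopes: (grad f(x) - grad f(x0))^T (x - x0) >= 0.
    grad_diff = [g - h for g, h in zip(grad_f(x), grad_f(x0))]
    min_slope_product = min(min_slope_product, dot(grad_diff, diff))

print(second_order_ok, min_first_order_gap, min_slope_product)
```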

  29. Verifying convexity of differentiable functions
      Is $f(x) = x^4$ convex? Use the second-order condition for convexity.
      $\frac{\partial}{\partial x} f(x) = 4x^3$,
      $\frac{\partial^2}{\partial x^2} f(x) = 12x^2 \ge 0$.
      Yes, $f$ is convex.
      Is $f(x) = e^{\langle a, x \rangle}$ convex? Use the first-order condition for convexity.
      $\nabla f(x) = e^{\langle a, x \rangle} \nabla\{\langle a, x \rangle\} = e^{\langle a, x \rangle} a$ (chain rule).
      Difference between $f$ and its affine approximation:
      $f(x) - \left(f(x_0) + \langle \nabla f(x_0), x - x_0 \rangle\right) = e^{\langle a, x \rangle} - \left(e^{\langle a, x_0 \rangle} + e^{\langle a, x_0 \rangle} \langle a, x - x_0 \rangle\right)$
      $= e^{\langle a, x_0 \rangle} \left(e^{\langle a, x - x_0 \rangle} - \left(1 + \langle a, x - x_0 \rangle\right)\right)$
      $\ge 0$ (because $1 + z \le e^z$ for all $z \in \mathbb{R}$).
      Yes, $f$ is convex.
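
The factoring step in the second derivation can be double-checked numerically. In the sketch below, `gap_direct` evaluates $f(x) - (f(x_0) + \langle \nabla f(x_0), x - x_0 \rangle)$ as written and `gap_factored` evaluates $e^{\langle a, x_0 \rangle}(e^{z} - (1 + z))$ with $z = \langle a, x - x_0 \rangle$. Both function names, the dimension, and the sampling range are assumptions for illustration; the two values should agree and be nonnegative up to rounding.

```python
import math
import random


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def f(a, x):
    """f(x) = exp(<a, x>)."""
    return math.exp(dot(a, x))


def grad_f(a, x):
    """Gradient exp(<a, x>) * a, from the chain rule."""
    scale = math.exp(dot(a, x))
    return [scale * a_i for a_i in a]


def gap_direct(a, x, x0):
    """f(x) minus the affine approximation of f at x0."""
    diff = [u - v for u, v in zip(x, x0)]
    return f(a, x) - (f(a, x0) + dot(grad_f(a, x0), diff))


def gap_factored(a, x, x0):
    """exp(<a, x0>) * (exp(z) - (1 + z)) with z = <a, x - x0>."""
    z = dot(a, [u - v for u, v in zip(x, x0)])
    return math.exp(dot(a, x0)) * (math.exp(z) - (1.0 + z))


rng = random.Random(0)
d = 3
max_mismatch = 0.0
min_gap = float("inf")
for _ in range(1000):
    a = [rng.uniform(-1, 1) for _ in range(d)]
    x = [rng.uniform(-1, 1) for _ in range(d)]
    x0 = [rng.uniform(-1, 1) for _ in range(d)]
    direct = gap_direct(a, x, x0)
    factored = gap_factored(a, x, x0)
    max_mismatch = max(max_mismatch, abs(direct - factored))
    min_gap = min(min_gap, direct)

print("largest disagreement between the two forms:", max_mismatch)
print("smallest first-order gap:", min_gap)
```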
