SLIDE 1

Summary

Key topics.
◮ Familiarity with form of basic network gradient.
◮ Deep network initialization.
◮ Minibatches.
◮ Momentum.
Next time: convexity.

17 / 42

SLIDE 2

Part 2: convexity

SLIDE 3

Why convexity?

Deep networks are not convex in their parameters. Why study convexity?
◮ Convexity is pervasive in ML and mathematics; e.g., our losses for deep learning are still convex.
◮ Convexity exemplifies nice “local-to-global” structure.

18 / 42

SLIDE 4
6. Convex sets and functions
SLIDE 6

Convex sets

A set S is convex if, for every pair of points x, x′ in S, the line segment between x and x′ is also contained in S. ({x, x′} ⊆ S ⟹ [x, x′] ⊆ S.)

[Figure: four example sets, labeled convex, not convex, convex, convex.]

Examples:
◮ All of Rᵈ.
◮ The empty set.
◮ Half-spaces: {x ∈ Rᵈ : a⊤x ≤ b}.
◮ Intersections of convex sets.
◮ Polyhedra: {x ∈ Rᵈ : Ax ≤ b} = ⋂ᵢ₌₁ᵐ {x ∈ Rᵈ : aᵢ⊤x ≤ bᵢ}.
◮ Convex hulls: conv(S) := {Σᵢ₌₁ᵏ αᵢxᵢ : k ∈ N, xᵢ ∈ S, αᵢ ≥ 0, Σᵢ₌₁ᵏ αᵢ = 1}.
(Infinite convex hulls: intersection of all convex supersets.)
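The segment definition is easy to spot-check numerically. Below is a minimal sketch, not from the slides, assuming numpy; the polyhedron {x ∈ R² : Ax ≤ b} (the particular A and b) is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary polyhedron {x in R^2 : Ax <= b} (illustrative choice).
A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 0.0])

def in_polyhedron(x):
    # Membership test, with a small tolerance for floating point.
    return np.all(A @ x <= b + 1e-12)

# Convexity spot-check: for random pairs of feasible points, every sampled
# point on the connecting segment should also be feasible.
for _ in range(1000):
    x, xp = rng.uniform(-1.0, 2.0, size=(2, 2))
    if in_polyhedron(x) and in_polyhedron(xp):
        alpha = rng.uniform()
        assert in_polyhedron((1 - alpha) * x + alpha * xp)
```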

19 / 42

SLIDE 7

Convex functions from convex sets

The epigraph of a function f is the area above the curve:

epi(f) := {(x, y) ∈ Rᵈ⁺¹ : y ≥ f(x)}.

A function is convex if its epigraph is convex.

[Figure: a non-convex and a convex function, shown with their epigraphs.]

20 / 42

SLIDE 9

Convex functions (standard definition)

A function f : Rᵈ → R is convex if for any x, x′ ∈ Rᵈ and α ∈ [0, 1],

f((1 − α)x + αx′) ≤ (1 − α) · f(x) + α · f(x′).

[Figure: a non-convex and a convex function, with the chord between x and x′ drawn.]

Examples:
◮ f(x) = cx for any c > 0 (on R).
◮ f(x) = |x|ᶜ for any c ≥ 1 (on R).
◮ f(x) = b⊤x for any b ∈ Rᵈ.
◮ f(x) = ‖x‖ for any norm ‖·‖.
◮ f(x) = x⊤Ax for symmetric positive semidefinite A.
◮ f(x) = ln(Σᵢ₌₁ᵈ exp(xᵢ)), which approximates maxᵢ xᵢ.
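The chord inequality in the definition can be tested numerically on random points. A minimal sketch, assuming numpy, using the log-sum-exp example from the list (the helper name and constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def logsumexp(x):
    # f(x) = ln(sum_i exp(x_i)), computed stably by factoring out max(x).
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

# Spot-check f((1-a)x + a x') <= (1-a) f(x) + a f(x') on random inputs.
for _ in range(1000):
    x, xp = rng.normal(size=(2, 5))
    a = rng.uniform()
    lhs = logsumexp((1 - a) * x + a * xp)
    rhs = (1 - a) * logsumexp(x) + a * logsumexp(xp)
    assert lhs <= rhs + 1e-9
```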

21 / 42

SLIDE 15

Example verification: norms

Is f(x) = ‖x‖ convex? Pick any α ∈ [0, 1] and any x, x′ ∈ Rᵈ.

f((1 − α)x + αx′) = ‖(1 − α)x + αx′‖
≤ ‖(1 − α)x‖ + ‖αx′‖ (triangle inequality)
= (1 − α)‖x‖ + α‖x′‖ (homogeneity)
= (1 − α)f(x) + αf(x′).

Yes, f is convex.

22 / 42

SLIDE 16

Operations preserving convexity

Summations: if f₁, . . . , fₖ are convex and α₁, . . . , αₖ are nonnegative, then x → α₁f₁(x) + · · · + αₖfₖ(x) is convex.

Affine composition: if f is convex, then for any A ∈ Rᵐ×ᵈ and b ∈ Rᵐ, x → f(Ax + b) is convex.

Maxima: if f₁, . . . , fₖ are convex, then x → maxᵢ fᵢ(x) is convex. (Infinitely many functions: use a supremum.)
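As a quick illustration of the maxima rule, the sketch below (assuming numpy; the random affine pieces are arbitrary) builds x → maxᵢ(aᵢ⊤x + bᵢ) and spot-checks the chord inequality:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each piece x -> a_i^T x + b_i is affine (hence convex); the pointwise
# maximum of convex functions is convex.
A = rng.normal(size=(4, 3))  # arbitrary slopes a_1, ..., a_4
b = rng.normal(size=4)       # arbitrary offsets

def f(x):
    return np.max(A @ x + b)

for _ in range(1000):
    x, xp = rng.normal(size=(2, 3))
    a = rng.uniform()
    assert f((1 - a) * x + a * xp) <= (1 - a) * f(x) + a * f(xp) + 1e-9
```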

23 / 42

SLIDE 17

Example: linear classification and margin losses

If ℓ is convex and the predictor is linear, then the empirical risk is convex:
◮ Define ℓᵢ(w) = ℓ(w⊤xᵢyᵢ), convex since it is the composition of a convex function with an affine map;
◮ thus the empirical risk R̂(w) = (1/n) Σᵢ₌₁ⁿ ℓ(w⊤xᵢyᵢ) = (1/n) Σᵢ₌₁ⁿ ℓᵢ(w) is a nonnegative combination of convex functions, and hence convex (spot-checked numerically in the sketch below).
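A minimal numerical sanity check of this argument, assuming numpy; the toy data X, y are arbitrary, and the loss is the logistic loss from later in the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary toy data: n examples x_i in R^d with labels y_i in {-1, +1}.
n, d = 100, 5
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)

def risk(w):
    # Empirical risk with logistic loss ln(1 + exp(-z)) at z = w^T x_i y_i.
    z = (X @ w) * y
    return np.mean(np.log1p(np.exp(-z)))

# The chord inequality holds along any segment in parameter space.
for _ in range(200):
    w, wp = rng.normal(size=(2, d))
    a = rng.uniform()
    assert risk((1 - a) * w + a * wp) <= (1 - a) * risk(w) + a * risk(wp) + 1e-9
```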

24 / 42

SLIDE 18
7. Various forms of convexity
SLIDE 20

Convexity of differentiable functions

Differentiable functions. If f : Rᵈ → R is differentiable, then f is convex if and only if

f(x) ≥ f(x₀) + ∇f(x₀)⊤(x − x₀) for all x, x₀ ∈ Rᵈ.

Note: this implies increasing slopes: (∇f(x) − ∇f(y))⊤(x − y) ≥ 0.

[Figure: f(x) and its tangent a(x) = f(x₀) + f′(x₀)(x − x₀) at x₀.]

Twice-differentiable functions. If f : Rᵈ → R is twice differentiable, then f is convex if and only if ∇²f(x) ⪰ 0 for all x ∈ Rᵈ (i.e., the Hessian, the matrix of second derivatives, is positive semidefinite for all x).
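The second-order condition is the easiest to check mechanically for quadratics. A minimal sketch, assuming numpy; the PSD matrix A is an arbitrary example, and f(x) = x⊤Ax is from the earlier list:

```python
import numpy as np

rng = np.random.default_rng(0)

# For f(x) = x^T A x with symmetric A, the Hessian is the constant matrix
# 2A; f is convex iff 2A is positive semidefinite.
M = rng.normal(size=(4, 4))
A = M @ M.T  # symmetric PSD by construction

eigvals = np.linalg.eigvalsh(2 * A)
print("min Hessian eigenvalue:", eigvals.min())  # >= 0 up to roundoff
assert eigvals.min() >= -1e-9
```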

25 / 42

SLIDE 29

Verifying convexity of differentiable functions

Is f(x) = x⁴ convex? Use the second-order condition for convexity:

∂/∂x f(x) = 4x³,  ∂²/∂x² f(x) = 12x² ≥ 0.

Yes, f is convex.

Is f(x) = exp(⟨a, x⟩) convex? Use the first-order condition for convexity.

∇f(x) = exp(⟨a, x⟩) · ∇⟨a, x⟩ = exp(⟨a, x⟩) · a (chain rule).

Difference between f and its affine approximation:

f(x) − (f(x₀) + ⟨∇f(x₀), x − x₀⟩) = exp(⟨a, x⟩) − exp(⟨a, x₀⟩) − exp(⟨a, x₀⟩)·⟨a, x − x₀⟩
= exp(⟨a, x₀⟩) · (exp(⟨a, x − x₀⟩) − 1 − ⟨a, x − x₀⟩) ≥ 0

(because 1 + z ≤ eᶻ for all z ∈ R). Yes, f is convex.

26 / 42

SLIDE 31

Strict convexity

Function values: ∀x ≠ y, ∀α ∈ (0, 1): f(αx + (1 − α)y) < αf(x) + (1 − α)f(y).

Derivatives: ∀x ≠ y, f(y) > f(x) + ∇f(x)⊤(y − x).

Hessians: ∀x, ∇²f(x) ≻ 0.

27 / 42

SLIDE 33

λ-strong convexity.

Function values: ∀x, y, ∀α ∈ [0, 1]: f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y) − (λ/2)·α(1 − α)·‖x − y‖².

Derivatives: ∀x, y: f(y) ≥ f(x) + ∇f(x)⊤(y − x) + (λ/2)·‖y − x‖².

Hessians: ∀x, ∇²f(x) ⪰ λI.
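For quadratics the Hessian characterization makes λ directly computable. A minimal sketch, assuming numpy; the symmetric matrix H is an arbitrary example:

```python
import numpy as np

# For f(x) = (1/2) x^T H x the Hessian is H everywhere, so f is
# lam-strongly convex exactly when the smallest eigenvalue of H is >= lam.
H = np.array([[3.0, 1.0],
              [1.0, 2.0]])  # arbitrary symmetric example
lam = np.linalg.eigvalsh(H).min()
print(f"f is {lam:.3f}-strongly convex")  # H - lam * I is PSD
```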

28 / 42

SLIDE 36

Convexity of key losses.

Logistic loss z → ln(1 + exp(−z)) is strictly convex. (E.g., verify that the second derivative is positive.)

Squared (margin) loss z → (1/2)(1 − z)² is 1-strongly convex. (E.g., its second derivative is the constant 1.)

Combined with our earlier linear prediction calculation, logistic regression and least squares are convex!
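Both second-derivative claims are quick to verify numerically. A small sketch, assuming numpy; the probe points are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Second derivative of the logistic loss ln(1 + exp(-z)) is
# sigmoid(z) * sigmoid(-z): always positive (strict convexity), but it
# decays to 0 as |z| grows, so no lam > 0 works (no strong convexity).
for z in [-10.0, 0.0, 10.0]:
    print(z, sigmoid(z) * sigmoid(-z))

# The squared margin loss (1/2)(1 - z)^2 has constant second derivative 1,
# which is what 1-strong convexity asserts.
```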

29 / 42

SLIDE 37
8. Convex optimization problems
SLIDE 45

Optimization problems

A typical optimization problem (in standard form) is written as

min_{x ∈ Rᵈ} f₀(x)  s.t.  fᵢ(x) ≤ 0 for all i = 1, . . . , n.

◮ f₀ : Rᵈ → R is the objective function;
◮ f₁, . . . , fₙ : Rᵈ → R are the constraint functions;
◮ the inequalities fᵢ(x) ≤ 0 are the constraints;
◮ A := {x ∈ Rᵈ : fᵢ(x) ≤ 0 for all i = 1, 2, . . . , n} is the feasible region.
◮ Goal: find x ∈ A so that f₀(x) is as small as possible.
◮ The (optimal) value of the optimization problem is the smallest value f₀(x) achieved by a feasible point x ∈ A.
◮ A point x ∈ A achieving the optimal value is a (global) minimizer.

30 / 42

SLIDE 47

Convex optimization problems

Standard form of a convex optimization problem:

min_{x ∈ Rᵈ} f₀(x)  s.t.  fᵢ(x) ≤ 0 for all i = 1, . . . , n,

where f₀, f₁, . . . , fₙ : Rᵈ → R are convex functions.

Fact: the feasible set A := {x ∈ Rᵈ : fᵢ(x) ≤ 0 for all i = 1, 2, . . . , n} is a convex set.

(SVMs next week will give us an example.)

31 / 42

SLIDE 51

Local minimizers

Consider an optimization problem (not necessarily convex):

min_{x ∈ Rᵈ} f₀(x)  s.t.  x ∈ A.

We say x̃ ∈ A is a local minimizer if there is an “open ball” U := {x ∈ Rᵈ : ‖x − x̃‖₂ < r} of positive radius r > 0 such that x̃ is a global minimizer for

min_{x ∈ Rᵈ} f₀(x)  s.t.  x ∈ A ∩ U.

Nothing looks better than x̃ in the immediate vicinity of x̃.

[Figure: a locally optimal point of a non-convex objective.]

This is one local-to-global consequence of convexity; more generally, tangents lower bound the function everywhere.

32 / 42

SLIDE 52

Local minimizers of convex problems

If the optimization problem is convex, and x̃ ∈ A is a local minimizer, then it is also a global minimizer.

[Figure: the local minimizer coincides with the global one.]

33 / 42

SLIDE 53
9. Convergence rates for gradient descent
SLIDE 54

Gradient descent

1. Let w₀ ∈ Rᵈ be given.
2. For i ∈ (0, 1, . . . , t − 1):
   2.1 wᵢ₊₁ := wᵢ − ηᵢ∇f(wᵢ).

Intuition: convexity implies “no bumps”.
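A minimal implementation of this loop, assuming numpy; the least-squares objective, step size, and iteration count are illustrative choices:

```python
import numpy as np

def gradient_descent(grad, w0, eta, t):
    # w_{i+1} := w_i - eta * grad(w_i), run for t steps (constant step size).
    w = np.asarray(w0, dtype=float)
    for _ in range(t):
        w = w - eta * grad(w)
    return w

# Example: f(w) = (1/2) ||Xw - y||^2 with gradient X^T (Xw - y).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 3)), rng.normal(size=50)
grad = lambda w: X.T @ (X @ w - y)
eta = 1.0 / np.linalg.eigvalsh(X.T @ X).max()  # 1/beta for this quadratic
w = gradient_descent(grad, np.zeros(3), eta, 500)
print(np.linalg.norm(grad(w)))  # near 0: an approximate critical point
```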

34 / 42

SLIDE 56

Smoothness

To analyze gradient descent, we’ll use a notion of gradient stability.

λ-strong convexity was a Taylor lower bound: ∀w, w′, f(w′) ≥ f(w) + ∇f(w)⊤(w′ − w) + (λ/2)·‖w′ − w‖².

Say f : Rᵈ → R is β-smooth when the reverse holds: ∀w, w′, f(w′) ≤ f(w) + ∇f(w)⊤(w′ − w) + (β/2)·‖w′ − w‖².

(This is also called having Lipschitz gradients.)
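The β-smoothness upper bound can be spot-checked in one dimension. A minimal sketch, assuming numpy, for the logistic loss, whose second derivative is at most 1/4 (so β = 1/4 works):

```python
import numpy as np

rng = np.random.default_rng(0)

f = lambda z: np.log1p(np.exp(-z))      # logistic loss
g = lambda z: -1.0 / (1.0 + np.exp(z))  # its derivative
beta = 0.25  # second derivative sigmoid(z) * sigmoid(-z) is at most 1/4

# Spot-check the quadratic upper bound defining beta-smoothness.
for _ in range(1000):
    w, wp = rng.normal(scale=3.0, size=2)
    upper = f(w) + g(w) * (wp - w) + 0.5 * beta * (wp - w) ** 2
    assert f(wp) <= upper + 1e-9
```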

35 / 42

SLIDE 59

GD for smooth, non-convex functions.

Theorem (smoothness leads to approximate critical points). Let w₀ be given, and wᵢ₊₁ := wᵢ − η∇f(wᵢ). If f is β-smooth and η = 1/β, then

min_{i ≤ t} ‖∇f(wᵢ₋₁)‖² ≤ (1/t) Σᵢ₌₁ᵗ ‖∇f(wᵢ₋₁)‖² ≤ (2β/t)·(f(w₀) − min_w f(w)).

Proof. Combining the definitions with the choice of iterates gives, for each i ≤ t,

f(wᵢ) ≤ f(wᵢ₋₁) + ∇f(wᵢ₋₁)⊤(wᵢ − wᵢ₋₁) + (β/2)·‖wᵢ − wᵢ₋₁‖² = f(wᵢ₋₁) − (1/(2β))·‖∇f(wᵢ₋₁)‖².

Averaging these inequalities over i ≤ t gives

(1/t) Σᵢ₌₁ᵗ ‖∇f(wᵢ₋₁)‖² ≤ (2β/t)·(f(w₀) − f(wₜ)).
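The guarantee is easy to check empirically, even on a non-convex function. A minimal sketch, assuming numpy; the test function f(x) = x² + sin(5x), its smoothness constant, the start point, and the grid estimate of min f are all illustrative choices:

```python
import numpy as np

f  = lambda x: x ** 2 + np.sin(5 * x)      # non-convex test function
fg = lambda x: 2 * x + 5 * np.cos(5 * x)   # its derivative
beta = 27.0  # |f''(x)| = |2 - 25 sin(5x)| <= 27, so f is 27-smooth

w, t = 3.0, 200
grad_sq = []
for _ in range(t):
    grad_sq.append(fg(w) ** 2)
    w -= fg(w) / beta  # eta = 1/beta

f_star = f(np.linspace(-3, 3, 100001)).min()  # grid estimate of min f
bound = 2 * beta / t * (f(3.0) - f_star)
print(min(grad_sq), "<=", bound)  # min grad norm^2 <= (2 beta / t)(f(w0) - min f)
```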

36 / 42

SLIDE 61

GD for smooth, convex functions.

Theorem. Let w₀ be given, and wᵢ₊₁ := wᵢ − η∇f(wᵢ). If convex f is β-smooth and η = 1/β, then for all u ∈ Rᵈ,

f(wₜ) − f(u) ≤ (1/t) Σᵢ₌₁ᵗ (f(wᵢ) − f(u)) ≤ (β/(2t))·(‖w₀ − u‖² − ‖wₜ − u‖²).

Proof. For each i ≤ t, using the previous proof,

‖wᵢ − u‖² = ‖wᵢ₋₁ − u‖² − 2η∇f(wᵢ₋₁)⊤(wᵢ₋₁ − u) + η²‖∇f(wᵢ₋₁)‖²
≤ ‖wᵢ₋₁ − u‖² + 2η·(f(u) − f(wᵢ₋₁)) + 2η²β·(f(wᵢ₋₁) − f(wᵢ))
= ‖wᵢ₋₁ − u‖² + (2/β)·(f(u) − f(wᵢ)).

Rearranging and then averaging these inequalities over i ≤ t gives the bound.
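The convex rate can be checked the same way on least squares. A minimal sketch, assuming numpy; the data, comparator u, and iteration count are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# f(w) = (1/2) ||Xw - y||^2 is convex and beta-smooth with
# beta = largest eigenvalue of X^T X.
X, y = rng.normal(size=(50, 3)), rng.normal(size=50)
f  = lambda w: 0.5 * np.sum((X @ w - y) ** 2)
fg = lambda w: X.T @ (X @ w - y)
beta = np.linalg.eigvalsh(X.T @ X).max()

w0 = np.zeros(3)
u = np.linalg.lstsq(X, y, rcond=None)[0]  # comparator: a minimizer

w, t = w0.copy(), 100
for _ in range(t):
    w -= fg(w) / beta  # eta = 1/beta
print(f(w) - f(u), "<=", beta / (2 * t) * np.sum((w0 - u) ** 2))
```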

37 / 42

SLIDE 62
10. Convexity and differentiability
SLIDE 63

Convexity and differentiability

Many useful convex functions are not differentiable, e.g., x → |x|. Question: how can we do gradient descent?

38 / 42

SLIDE 64

Subgradients

Derivatives give tangents and descent directions: f(w′) ≥ f(x) + ∇f(x)⊤(w′ − x).

Subdifferential set: ∂f(x) := {s ∈ Rᵈ : ∀w′, f(w′) ≥ f(x) + s⊤(w′ − x)}.
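For x → |x| the subdifferential is explicit ({−1} for x < 0, {+1} for x > 0, and [−1, 1] at 0), and descending along any subgradient with a decaying step size still reaches the minimizer. A minimal sketch, assuming numpy; the starting point and step schedule are illustrative:

```python
import numpy as np

def subgrad_abs(x):
    # sign(x) is a valid subgradient of |x| everywhere (sign(0) = 0 lies
    # in the subdifferential [-1, 1] at x = 0).
    return np.sign(x)

# Subgradient descent with decaying steps (needed: |x| is not smooth,
# so a constant step would oscillate forever).
x = 5.0
for i in range(1, 1001):
    x -= subgrad_abs(x) / np.sqrt(i)
print(x)  # oscillates toward the minimizer 0
```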

39 / 42

SLIDE 66

Subgradients: first-order condition.

Suppose f : Rᵈ → R is convex. First-order condition: for any y ∈ Rᵈ,

0 ∈ ∂f(y) ⟺ f(y) = infₓ f(x).

Magic of convexity: local information gives global structure.

40 / 42

SLIDE 67

Subgradients: Jensen’s inequality.

If f : Rᵈ → R is convex, then Ef(X) ≥ f(EX).

Proof. Set y := EX, and pick any s ∈ ∂f(EX). Then

Ef(X) ≥ E[f(y) + s⊤(X − y)] = f(y) + s⊤E(X − y) = f(y).

Note. This inequality comes up often!
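A quick Monte Carlo illustration, assuming numpy; the choices f(x) = eˣ (convex) and X ~ N(0, 1), for which Ef(X) = e^{1/2}, are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=1_000_000)   # samples of X ~ N(0, 1)
print(np.exp(X).mean())          # approx E f(X) = e^{1/2} ~ 1.649
print(np.exp(X.mean()))          # approx f(E X)  = e^0     = 1.0
# Ef(X) >= f(EX), with a strict gap here since exp is strictly convex.
```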

41 / 42

SLIDE 68
11. Summary
SLIDE 69

Summary

◮ Convex sets and functions.
◮ Ways to verify convexity.
◮ Strict convexity, strong convexity.
◮ Optimization problems and related terminology (feasible set, etc.).
◮ Intuition for gradient descent convergence: local-to-global structure, no bumps.
◮ Jensen’s inequality.

42 / 42