Optimization for Machine Learning, Lecture 1: Introduction to Convexity (PowerPoint presentation)

Optimization for Machine Learning, Lecture 1: Introduction to Convexity. S.V.N. (vishy) Vishwanathan, Purdue University (vishy@purdue.edu). July 12, 2012. Slide 1 of 43.


  2. Regularized Risk Minimization
     Machine learning: we want to build a model which predicts well on data. A model's performance is quantified by a loss function, a sophisticated discrepancy score. Our model must generalize to unseen data, so we avoid over-fitting by penalizing complex models (regularization).
     More formally: given training data $\{x_1, \ldots, x_m\}$ and labels $\{y_1, \ldots, y_m\}$, learn a vector $w$ by solving
     $$\min_w \; J(w) := \underbrace{\lambda \Omega(w)}_{\text{Regularizer}} + \underbrace{\frac{1}{m} \sum_{i=1}^{m} l(x_i, y_i, w)}_{\text{Risk } R_{\text{emp}}}$$
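The following minimal Python sketch evaluates $J(w)$ for a linear model, which makes the objective concrete. The slide leaves $\Omega$ and $l$ unspecified. The choices below are illustrative assumptions: $\Omega(w) = \frac{1}{2}\|w\|^2$ and the hinge loss $l(x, y, w) = \max(0, 1 - y\langle w, x\rangle)$ with labels $y \in \{-1, +1\}$. The function names and the toy data are also assumptions.

```python
def hinge_loss(x, y, w):
    """Hinge loss max(0, 1 - y * <w, x>) for a label y in {-1, +1}."""
    margin = y * sum(wj * xj for wj, xj in zip(w, x))
    return max(0.0, 1.0 - margin)


def regularized_risk(w, X, Y, lam):
    """J(w) = lam * Omega(w) + (1/m) * sum_i l(x_i, y_i, w).

    Assumes Omega(w) = 0.5 * ||w||^2 and l = hinge loss (illustrative choices).
    """
    m = len(X)
    omega = 0.5 * sum(wj * wj for wj in w)
    r_emp = sum(hinge_loss(x, y, w) for x, y in zip(X, Y)) / m  # empirical risk
    return lam * omega + r_emp


# Toy data (hypothetical): three 2-d points with labels in {-1, +1}.
X = [[1.0, 2.0], [-1.0, 0.5], [0.5, -1.5]]
Y = [1, -1, 1]
print(regularized_risk([0.5, -0.25], X, Y, lam=0.1))
```

The parameter `lam` plays the role of $\lambda$ and trades the regularizer off against the empirical risk.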

  7. Outline
     1. Convex Functions and Sets
     2. Operations Which Preserve Convexity
     3. First Order Properties
     4. Subgradients
     5. Constraints
     6. Warmup: Minimizing a 1-d Convex Function
     7. Warmup: Coordinate Descent

  8. Focus of my Lectures
     [Figure: plot, not recoverable from the extracted text]

  11. Disclaimer
      My focus is on showing connections between various methods. I will sacrifice mathematical rigor and focus on intuition.

  12. Convex Function
      [Figure: graph of f with the values f(x) and f(x') marked]
      A function $f$ is convex if, and only if, for all $x, x'$ and $\lambda \in (0, 1)$,
      $$f(\lambda x + (1 - \lambda) x') \le \lambda f(x) + (1 - \lambda) f(x').$$
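The convexity definition can be spot-checked numerically. The sketch below uses a hypothetical helper, `convexity_violations`, written in pure Python. It samples random $x, x'$ from a box and $\lambda \in [0, 1]$, then counts violations of the inequality. A zero count is only evidence of convexity on the sampled region, never a proof.

```python
import random


def convexity_violations(f, dim, trials=10_000, scale=5.0, tol=1e-9, seed=0):
    """Count sampled (x, x', lam) with f(lam*x + (1-lam)*x') > lam*f(x) + (1-lam)*f(x').

    Points are drawn from the box [-scale, scale]^dim; tol absorbs rounding error.
    """
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        x = [rng.uniform(-scale, scale) for _ in range(dim)]
        xp = [rng.uniform(-scale, scale) for _ in range(dim)]
        lam = rng.uniform(0.0, 1.0)
        mix = [lam * a + (1 - lam) * b for a, b in zip(x, xp)]
        if f(mix) > lam * f(x) + (1 - lam) * f(xp) + tol:
            violations += 1
    return violations


print(convexity_violations(lambda v: sum(t * t for t in v), dim=2))   # squared norm (convex)
print(convexity_violations(lambda v: -sum(t * t for t in v), dim=2))  # negated (concave)
```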

  13. Convex Function
      A function $f$ is strictly convex if, and only if, for all $x \neq x'$ and $\lambda \in (0, 1)$,
      $$f(\lambda x + (1 - \lambda) x') < \lambda f(x) + (1 - \lambda) f(x').$$

  14. Convex Function
      A function $f$ is $\sigma$-strongly convex if, and only if, $f(\cdot) - \frac{\sigma}{2} \|\cdot\|^2$ is convex. That is, for all $x, x'$ and $\lambda \in (0, 1)$,
      $$f(\lambda x + (1 - \lambda) x') \le \lambda f(x) + (1 - \lambda) f(x') - \frac{\sigma}{2} \lambda (1 - \lambda) \|x - x'\|^2.$$
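As a worked instance, take a quadratic $f(x) = \frac{1}{2} x^\top A x$ with symmetric $A$. It is $\sigma$-strongly convex exactly when $\sigma \le \lambda_{\min}(A)$. The reason is that $f(x) - \frac{\sigma}{2}\|x\|^2 = \frac{1}{2} x^\top (A - \sigma I) x$, which is convex if and only if $A - \sigma I$ is positive semidefinite. The sketch below checks the strong-convexity inequality on random samples for an arbitrarily chosen $2 \times 2$ matrix. The helper names and the matrix are assumptions made for illustration.

```python
import math
import random


def quad(A, v):
    """f(v) = 0.5 * v^T A v for a 2x2 matrix A given as nested lists."""
    return 0.5 * sum(v[i] * A[i][j] * v[j] for i in range(2) for j in range(2))


def min_eig_sym2(A):
    """Smallest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = A[0][0], A[0][1], A[1][1]
    return (a + c) / 2 - math.hypot((a - c) / 2, b)


def strongly_convex_on_samples(f, sigma, trials=5_000, tol=1e-9, seed=1):
    """Spot-check f(lam x + (1-lam) x') <= lam f(x) + (1-lam) f(x')
    - (sigma/2) lam (1-lam) ||x - x'||^2 for random 2-d x, x' and lam."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-3, 3) for _ in range(2)]
        xp = [rng.uniform(-3, 3) for _ in range(2)]
        lam = rng.random()
        mix = [lam * a + (1 - lam) * b for a, b in zip(x, xp)]
        dist2 = sum((a - b) ** 2 for a, b in zip(x, xp))
        rhs = lam * f(x) + (1 - lam) * f(xp) - 0.5 * sigma * lam * (1 - lam) * dist2
        if f(mix) > rhs + tol:
            return False
    return True


A = [[3.0, 1.0], [1.0, 2.0]]  # symmetric positive definite (example choice)
sigma = min_eig_sym2(A)  # 2.5 - sqrt(1.25), about 1.382


def f(v):
    return quad(A, v)


print(strongly_convex_on_samples(f, sigma))        # sigma = lambda_min(A): holds
print(strongly_convex_on_samples(f, sigma + 0.5))  # sigma too large: fails
```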

  15. Exercise: Jensen's Inequality
      Extend the definition of convexity to show that if $f$ is convex, then for all $\lambda_i \ge 0$ such that $\sum_i \lambda_i = 1$ we have
      $$f\Big(\sum_i \lambda_i x_i\Big) \le \sum_i \lambda_i f(x_i).$$
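The sketch below is a numerical sanity check of Jensen's inequality, not a proof and not the requested derivation. It uses scalar $x_i$ and the convex function $\exp$. The helper `jensen_gap` and the random data are hypothetical. For an affine $f$ the gap is zero, which gives a useful contrast.

```python
import math
import random


def jensen_gap(f, xs, weights):
    """Return sum_i lam_i f(x_i) - f(sum_i lam_i x_i), with lam_i = w_i / sum(w).

    The gap is nonnegative whenever f is convex (Jensen's inequality).
    """
    total = sum(weights)
    lams = [w / total for w in weights]  # nonnegative and summing to 1
    mean = sum(lam * x for lam, x in zip(lams, xs))
    return sum(lam * f(x) for lam, x in zip(lams, xs)) - f(mean)


rng = random.Random(42)
xs = [rng.uniform(-2.0, 2.0) for _ in range(5)]
weights = [rng.random() for _ in range(5)]
print(jensen_gap(math.exp, xs, weights))             # convex f: gap >= 0
print(jensen_gap(lambda t: 2 * t + 1, xs, weights))  # affine f: gap ~ 0
```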

  16. Some Familiar Examples
      [Figure: plot of f]
      $f(x) = \frac{1}{2} x^2$ (Square norm)

  17. Some Familiar Examples
      [Figure: plot of f]
      $f(x, y) = \frac{1}{2} \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} 10 & 1 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$

  18. Some Familiar Examples
      [Figure: plot of f on the interval (0, 1)]
      $f(x) = x \log x + (1 - x) \log(1 - x)$ (Negative entropy)

  19. Some Familiar Examples
      [Figure: plot of f]
      $f(x, y) = x \log x + y \log y - x - y$ (Un-normalized negative entropy)

  20. Some Familiar Examples
      [Figure: plot of f]
      $f(x) = \max(0, 1 - x)$ (Hinge loss)
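The familiar examples above can be spot-checked with a midpoint test, which for continuous functions is equivalent to convexity. The sampling domains are assumptions chosen to respect each function's domain, for instance $(0, 1)$ for the negative entropy. For the quadratic form, convexity also follows from the symmetric part of its matrix, $\frac{1}{2}(A + A^\top) = \begin{pmatrix} 10 & 1.5 \\ 1.5 & 1 \end{pmatrix}$. That matrix is positive definite, since its trace is $11 > 0$ and its determinant is $7.75 > 0$. The helper names are hypothetical.

```python
import math
import random


def square_norm(v):  # f(x) = x^2 / 2
    return 0.5 * v[0] ** 2


def quadratic_form(v):  # f(x, y) = 1/2 [x y] [[10, 1], [2, 1]] [x y]^T
    x, y = v
    return 0.5 * (10 * x * x + 3 * x * y + y * y)  # cross terms 1*xy + 2*yx


def neg_entropy(v):  # f(x) = x log x + (1 - x) log(1 - x) on (0, 1)
    x = v[0]
    return x * math.log(x) + (1 - x) * math.log(1 - x)


def unnormalized_neg_entropy(v):  # f(x, y) = x log x + y log y - x - y on x, y > 0
    x, y = v
    return x * math.log(x) + y * math.log(y) - x - y


def hinge(v):  # f(x) = max(0, 1 - x)
    return max(0.0, 1.0 - v[0])


def midpoint_convex_on_samples(f, sample, trials=2_000, tol=1e-9, seed=0):
    """Check f((x + x')/2) <= (f(x) + f(x'))/2 for random x, x' drawn by `sample`."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, xp = sample(rng), sample(rng)
        mid = [(a + b) / 2 for a, b in zip(x, xp)]
        if f(mid) > (f(x) + f(xp)) / 2 + tol:
            return False
    return True


def interval(lo, hi, dim):
    """Sampler for points uniform in the box [lo, hi]^dim."""
    return lambda rng: [rng.uniform(lo, hi) for _ in range(dim)]


examples = {
    "square norm": (square_norm, interval(-3.0, 3.0, 1)),
    "quadratic form": (quadratic_form, interval(-3.0, 3.0, 2)),
    "negative entropy": (neg_entropy, interval(0.01, 0.99, 1)),
    "un-normalized negative entropy": (unnormalized_neg_entropy, interval(0.01, 2.0, 2)),
    "hinge loss": (hinge, interval(-3.0, 3.0, 1)),
}
for name, (f, sample) in examples.items():
    print(f"{name}: {midpoint_convex_on_samples(f, sample)}")
```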

  21. Some Other Important Examples
      Linear functions: $f(x) = ax + b$
      Softmax (log-sum-exp): $f(x) = \log \sum_i \exp(x_i)$
      Norms: for example, the 2-norm $f(x) = \sqrt{\sum_i x_i^2}$
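Evaluated naively, the softmax (log-sum-exp) overflows for large inputs. A common remedy factors out the maximum: $\log \sum_i \exp(x_i) = m + \log \sum_i \exp(x_i - m)$ with $m = \max_i x_i$. It is sketched below under the assumption of a plain Python list input. The identity also shows that $\max_i x_i \le f(x) \le \max_i x_i + \log n$, which is why the function serves as a smooth surrogate for the max.

```python
import math


def log_sum_exp(xs):
    """Compute log(sum_i exp(x_i)) stably by factoring out the maximum."""
    m = max(xs)
    if math.isinf(m):  # all -inf, or some +inf: the shifted form would give nan
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


print(log_sum_exp([1000.0, 1000.0]))  # math.exp(1000.0) alone would overflow
```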

  22. Convex Sets
      A set $C$ is convex if, and only if, for all $x, x' \in C$ and $\lambda \in (0, 1)$ we have $\lambda x + (1 - \lambda) x' \in C$.
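The convex-set definition suggests a randomized search for a witness of non-convexity that needs only a membership test for $C$. The sketch below uses a hypothetical helper, `find_nonconvex_witness`, to compare the unit disc, which is convex, with an annulus, which is not. Finding no witness is evidence, not proof.

```python
import random


def find_nonconvex_witness(contains, sample, trials=10_000, seed=0):
    """Search for x, x' in C and lam in [0, 1) with lam*x + (1-lam)*x' outside C.

    `contains` is a membership test for C and `sample` draws candidate points.
    Returns a witness (x, x', lam), or None if none was found (evidence, not proof).
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x, xp = sample(rng), sample(rng)
        if not (contains(x) and contains(xp)):
            continue  # both points must lie in C
        lam = rng.random()
        mix = tuple(lam * a + (1 - lam) * b for a, b in zip(x, xp))
        if not contains(mix):
            return x, xp, lam
    return None


def box(rng):
    """Draw a candidate point uniformly from the square [-2, 2]^2."""
    return (rng.uniform(-2, 2), rng.uniform(-2, 2))


def disc(p):
    """Unit disc {p : ||p|| <= 1}: a convex set."""
    return p[0] ** 2 + p[1] ** 2 <= 1.0


def annulus(p):
    """Annulus {p : 1 <= ||p|| <= 2}: not convex."""
    return 1.0 <= p[0] ** 2 + p[1] ** 2 <= 4.0


print(find_nonconvex_witness(disc, box))     # convex set: no witness expected
print(find_nonconvex_witness(annulus, box))  # non-convex set: a witness is found
```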

  23. Convex Sets and Convex Functions
      A function $f$ is convex if, and only if, its epigraph $\operatorname{epi} f = \{(x, t) : f(x) \le t\}$ is a convex set.

  24. Convex Sets and Convex Functions
      Indicator functions of convex sets are convex:
      $$I_C(x) = \begin{cases} 0 & \text{if } x \in C, \\ \infty & \text{otherwise.} \end{cases}$$
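Here is a minimal sketch of an indicator function in Python that uses `math.inf` for $\infty$. IEEE arithmetic gives $\lambda \cdot \infty = \infty$ for $\lambda > 0$, so the convexity inequality can be evaluated directly. The helper `indicator` and both example sets are assumptions made for illustration.

```python
import math


def indicator(contains):
    """Return I_C for a set C given by a membership test: 0 on C, +inf off C."""
    def ind(x):
        return 0.0 if contains(x) else math.inf
    return ind


ind_c = indicator(lambda x: -1.0 <= x <= 1.0)  # C = [-1, 1], a convex set

# Check ind_c(lam*x + (1-lam)*x') <= lam*ind_c(x) + (1-lam)*ind_c(x') on a grid.
# If x or x' lies outside C the right side is +inf; if both lie in C, so does
# their combination (C is convex), making the left side 0.
grid = [i / 4 for i in range(-12, 13)]  # points in [-3, 3]
lams = [0.1, 0.5, 0.9]
ok = all(
    ind_c(lam * x + (1 - lam) * xp) <= lam * ind_c(x) + (1 - lam) * ind_c(xp)
    for x in grid for xp in grid for lam in lams
)
print(ok)

# For the non-convex set [-2, -1] U [1, 2] the inequality fails at x = -1, x' = 1:
ind_nonconvex = indicator(lambda x: 1.0 <= abs(x) <= 2.0)
print(ind_nonconvex(0.0) <= 0.5 * ind_nonconvex(-1.0) + 0.5 * ind_nonconvex(1.0))
```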

  25. Below Sets of Convex Functions
      [Figure: plot of f]
      $f(x, y) = x^2 + y^2$

  26. Below Sets of Convex Functions
      [Figure: plot of f]
      $f(x, y) = x \log x + y \log y - x - y$

  27. Below Sets of Convex Functions
      If $f$ is convex, then all its below sets (sublevel sets) $\{x : f(x) \le c\}$ are convex. Is the converse true? (Exercise: construct a counter-example.)

  28. Minima on Convex Sets
      The set of minima of a convex function is a convex set.
      Proof: let $f^* = \min_x f(x)$ and consider the set $\{x : f(x) \le f^*\}$. It is exactly the set of minimizers, and as a below set of a convex function it is convex.
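In one dimension a convex set is an interval, so for a convex function with a flat bottom the minimizing points should form a contiguous run. The sketch below checks this on a grid for the hypothetical example $f(x) = \max(0, |x| - 1)$, whose set of minima is $[-1, 1]$. A grid check only illustrates the claim.

```python
def minimizer_indices(f, grid, tol=1e-12):
    """Indices of grid points whose value is within tol of the grid minimum f*."""
    values = [f(x) for x in grid]
    f_star = min(values)
    return [i for i, v in enumerate(values) if v <= f_star + tol]


def f(x):
    """Convex and flat on [-1, 1]: f(x) = max(0, |x| - 1)."""
    return max(0.0, abs(x) - 1.0)


grid = [i / 10 for i in range(-30, 31)]  # points in [-3, 3]
idx = minimizer_indices(f, grid)
# A convex subset of the line is an interval, so the minimizing indices should be contiguous.
contiguous = idx == list(range(idx[0], idx[-1] + 1))
print(grid[idx[0]], grid[idx[-1]], contiguous)
```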

  29. Minima on Convex Sets
      The set of minima of a strictly convex function contains at most one point, so it is a singleton whenever a minimum exists. (A strictly convex function such as $\exp(x)$ need not attain a minimum.)
      Proof: try this at home!

  30. Outline (start of section 2: Operations Which Preserve Convexity)
