  1. Statistical Machine Learning Lecture 04: Optimization Refresher Kristian Kersting TU Darmstadt Summer Term 2020 K. Kersting based on Slides from J. Peters · Statistical Machine Learning · Summer Term 2020 1 / 65

  2. Today's Objectives
     - Make you remember calculus and teach you advanced topics! Brute-force right through optimization!
     - Covered topics: Unconstrained Optimization, Lagrangian Optimization, Numerical Methods (Gradient Descent)
     - Go deeper? Take the Optimization class of Prof. von Stryk / SIM! Read Convex Optimization by Boyd & Vandenberghe: http://www.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf

  3. Outline
     1. Motivation
     2. Convexity (Convex Sets, Convex Functions)
     3. Unconstrained & Constrained Optimization
     4. Numerical Optimization
     5. Wrap-Up

  4. Outline recap: 1. Motivation (current section)

  5. 1. Motivation
     "All learning problems are essentially optimization problems on data."
     (Christopher G. Atkeson, Professor at CMU)

  6. 1. Motivation: Robot Arm
     You want to predict the torques of a robot arm:
         y = I q̈ − µ q̇ + m l g sin(q)
           = [q̈, q̇, sin(q)] [I, −µ, m l g]ᵀ
           = φ(x)ᵀ θ
     Can we do this with a data set D = {(x_i, y_i) | i = 1, ..., n}?
     Yes, by minimizing the sum of squared errors:
         min_θ J(θ, D) = Σ_{i=1}^n (y_i − φ(x_i)ᵀ θ)²
     (Carl Friedrich Gauss, 1777–1855)
     Note that this is just one way to measure an error...
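As a sketch of the least-squares fit on the slide, assuming hypothetical values for I, µ, and m l g and synthetic trajectory data (none of which are given in the deck), the closed-form solution can be computed with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
I_true, mu, mlg = 2.0, 0.5, 9.81          # hypothetical ground truth
q = rng.uniform(-np.pi, np.pi, 200)        # joint positions
qd = rng.normal(size=200)                  # velocities
qdd = rng.normal(size=200)                 # accelerations

# Features phi(x) = [qdd, qd, sin(q)] and parameters theta = [I, -mu, m*l*g]
Phi = np.column_stack([qdd, qd, np.sin(q)])
theta_true = np.array([I_true, -mu, mlg])
y = Phi @ theta_true + 0.01 * rng.normal(size=200)  # torques with small noise

# Minimizing the sum of squared errors has the least-squares solution:
theta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(theta_hat)  # close to [2.0, -0.5, 9.81]
```

With enough data and little noise, the recovered parameters match the ones used to generate the torques.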

  7. 1. Motivation
     Will the previous method work? Sure! But the solution may be faulty, e.g., m = −1 kg, ...
     Hence, we need to ensure some extra conditions, and our problem results in a constrained optimization problem:
         min_θ J(θ, D) = Σ_{i=1}^n (y_i − φ(x_i)ᵀ θ)²
         s.t. g(θ, D) ≥ 0
     where g(θ, D) = [θ₁, −θ₂]ᵀ

  8. 1. Motivation
     ALL learning problems are optimization problems. In any learning system, we have:
     1. Parameters θ to enable learning
     2. A data set D to learn from
     3. A cost function J(θ, D) to measure our performance
     4. Some assumptions on the data, with equality and inequality constraints, f(θ, D) = 0 and g(θ, D) > 0
     How can we solve such problems in general?

  9. 1. Motivation: Optimization Problems in Machine Learning
     Machine learning tells us how to come up with data-based cost functions such that optimization can solve them!

  10. 1. Motivation: Most Cost Functions are Useless
      Good machine learning tells us how to come up with data-based cost functions such that optimization can solve them efficiently!

  11. 1. Motivation: Good Cost Functions Should Be Convex
      Ideally, the cost functions should be convex!

  12. Outline recap: 2. Convexity (current section)

  13. 2. Convexity: Convex Sets
      A set C ⊆ ℝⁿ is convex if ∀ x, y ∈ C and ∀ α ∈ [0, 1]:
          α x + (1 − α) y ∈ C
      This is the equation of the line segment between x and y, i.e., for a given α, the point α x + (1 − α) y lies on the line segment between x and y.

  14. 2. Convexity: Convex Sets, Examples
      - All of ℝⁿ (obvious)
      - Non-negative orthant ℝⁿ₊: let x ⪰ 0, y ⪰ 0; clearly α x + (1 − α) y ⪰ 0
      - Norm balls: let ‖x‖ ≤ 1, ‖y‖ ≤ 1; then
            ‖α x + (1 − α) y‖ ≤ ‖α x‖ + ‖(1 − α) y‖ = α ‖x‖ + (1 − α) ‖y‖ ≤ 1
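The norm-ball example above can be checked numerically; this is a minimal sketch (random points and the l2 norm are my choice, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
# Draw two points and project them into the unit l2-ball
x = rng.normal(size=3); x /= max(1.0, np.linalg.norm(x))
y = rng.normal(size=3); y /= max(1.0, np.linalg.norm(y))

# Every convex combination alpha*x + (1-alpha)*y stays inside the ball
alphas = np.linspace(0.0, 1.0, 11)
norms = [np.linalg.norm(a * x + (1 - a) * y) for a in alphas]
print(max(norms) <= 1.0 + 1e-12)  # True
```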

  15. 2. Convexity: Convex Sets, Examples
      - Affine subspaces (linear manifolds): if Ax = b and Ay = b, then
            A(α x + (1 − α) y) = α Ax + (1 − α) Ay = α b + (1 − α) b = b

  16. 2. Convexity: Convex Functions
      A function f: ℝⁿ → ℝ is convex if ∀ x, y ∈ dom(f) and ∀ α ∈ [0, 1]:
          f(α x + (1 − α) y) ≤ α f(x) + (1 − α) f(y)

  17. 2. Convexity: Convex Functions, Examples
      - Linear/affine functions: f(x) = bᵀx + c
      - Quadratic functions: f(x) = ½ xᵀAx + bᵀx + c, where A ⪰ 0 (positive semidefinite matrix)

  18. 2. Convexity: Convex Functions, Examples
      - Norms (such as l₁ and l₂):
            ‖α x + (1 − α) y‖ ≤ ‖α x‖ + ‖(1 − α) y‖ = α ‖x‖ + (1 − α) ‖y‖
      - Log-sum-exp (aka softmax, a smooth approximation to the maximum function often used in machine learning):
            f(x) = log ( Σ_{i=1}^n exp(x_i) )
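As a sketch of log-sum-exp in practice, the function is usually implemented with a max-shift for numerical stability (the shift trick and test points below are my additions, not from the slides); a midpoint check illustrates the convexity inequality:

```python
import numpy as np

def logsumexp(x):
    # log sum_i exp(x_i), shifted by the max so exp() cannot overflow
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

x = np.array([1000.0, 1001.0])   # naive log(sum(exp(x))) would overflow
print(logsumexp(x))              # ~1001.313

# Midpoint convexity check: f((x+y)/2) <= (f(x) + f(y)) / 2
y = np.array([-3.0, 5.0])
print(logsumexp((x + y) / 2) <= (logsumexp(x) + logsumexp(y)) / 2)  # True
```

SciPy ships the same helper as `scipy.special.logsumexp`.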

  19. 2. Convexity: Convex Functions, Important Convex Functions from Classification
      - SVM loss: f(w) = [1 − y_i x_iᵀ w]₊
      - Binary logistic loss: f(w) = log(1 + exp(−y_i x_iᵀ w))
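Both classification losses are one-liners; a minimal sketch with a made-up weight vector and sample (the numbers are illustrative, not from the slides):

```python
import numpy as np

def hinge_loss(w, x, y):
    # SVM loss: [1 - y * <x, w>]_+ = max(0, 1 - y * x^T w)
    return max(0.0, 1.0 - y * (x @ w))

def logistic_loss(w, x, y):
    # Binary logistic loss: log(1 + exp(-y * x^T w))
    return np.log1p(np.exp(-y * (x @ w)))

w = np.array([1.0, -2.0])
x = np.array([0.5, 0.5])           # x @ w = -0.5
print(hinge_loss(w, x, +1))        # 1.5 (margin violated)
print(hinge_loss(w, x, -1))        # 0.5 (inside the margin)
print(logistic_loss(w, x, +1))     # ~0.974
```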

  20. 2. Convexity: Convex Functions, First-Order Convexity Condition
      Suppose f: ℝⁿ → ℝ is differentiable. Then f is convex iff ∀ x, y ∈ dom(f):
          f(y) ≥ f(x) + ∇f(x)ᵀ(y − x)
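The first-order condition says the graph lies above all of its tangent planes. A minimal numerical sketch for f(x) = ‖x‖² (my choice of test function, not from the slides):

```python
import numpy as np

f = lambda x: float(x @ x)       # f(x) = ||x||^2, convex
grad = lambda x: 2.0 * x         # its gradient

rng = np.random.default_rng(2)
ok = True
for _ in range(100):
    x, y = rng.normal(size=3), rng.normal(size=3)
    # First-order condition: f(y) >= f(x) + grad(x)^T (y - x)
    ok &= f(y) >= f(x) + grad(x) @ (y - x) - 1e-12
print(ok)  # True
```

For this f the gap is exactly ‖y − x‖², so the inequality holds with equality only at y = x.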

  21. 2. Convexity: Convex Functions, First-Order Convexity Condition, generally...
      The subgradient, or subdifferential set, ∂f(x) of f at x is
          ∂f(x) = { g : f(y) ≥ f(x) + gᵀ(y − x), ∀ y }
      Differentiability is not a requirement!

  22. 2. Convexity: Convex Functions, Second-Order Convexity Condition
      Suppose f: ℝⁿ → ℝ is twice differentiable. Then f is convex iff ∀ x ∈ dom(f):
          ∇² f(x) ⪰ 0
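For the quadratic example from slide 17 the Hessian is the constant matrix A, so the second-order condition reduces to checking that A is positive semidefinite. A sketch (the random construction A = M Mᵀ is my way of generating a PSD matrix, not from the slides):

```python
import numpy as np

# f(x) = 1/2 x^T A x + b^T x + c has constant Hessian A
rng = np.random.default_rng(3)
M = rng.normal(size=(4, 4))
A = M @ M.T                       # A = M M^T is positive semidefinite

eigvals = np.linalg.eigvalsh(A)   # eigenvalues of the (symmetric) Hessian
print(eigvals.min() >= -1e-10)    # True -> f is convex
```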

  23. 2. Convexity: Ideal Machine Learning Cost Functions
          min_θ J(θ, D)          (convex function)
          s.t. f(θ, D) = 0       (affine/linear function)
               g(θ, D) ≥ 0       (convex set)

  24. 2. Convexity: Why are these conditions nice?
      - Local solutions are globally optimal!
      - Fast and well-studied optimizers have existed for a long time!

  25. Outline recap: 3. Unconstrained & Constrained Optimization (current section)

  26. 3. Unconstrained & Constrained Optimization: Unconstrained Optimization
      Can you solve this problem?
          max_θ J(θ) = 1 − θ₁² − θ₂²
      With θ* = [0, 0]ᵀ, J* = 1. For any other θ ≠ 0, J < 1.
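A quick sketch verifying the claimed maximizer: the gradient of J(θ) = 1 − θ₁² − θ₂² vanishes at θ* = 0, and any other θ gives a smaller value (the sample point below is my choice):

```python
import numpy as np

J = lambda t: 1.0 - t[0] ** 2 - t[1] ** 2
grad = lambda t: np.array([-2.0 * t[0], -2.0 * t[1]])

theta_star = np.zeros(2)
print(grad(theta_star))           # [0. 0.] -> stationary point
print(J(theta_star))              # 1.0, the maximum
print(J(np.array([0.3, -0.4])))   # 0.75 < 1 for any theta != 0
```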

  27. 3. Unconstrained & Constrained Optimization: Constrained Optimization
      Can you solve this problem?
          max_θ J(θ) = 1 − θ₁² − θ₂²
          s.t. f(θ) = θ₁ + θ₂ − 1 = 0
      First approach: convert the problem to an unconstrained problem.
      Second approach: Lagrange multipliers.
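The first approach (substitution) can be sketched directly: eliminating the equality constraint via θ₂ = 1 − θ₁ turns the problem into an unconstrained 1-D maximization (the grid search at the end is just my added sanity check):

```python
import numpy as np

# Substitute theta_2 = 1 - theta_1 into J(theta) = 1 - theta_1^2 - theta_2^2
J = lambda t1: 1.0 - t1 ** 2 - (1.0 - t1) ** 2

# dJ/dt1 = -2 t1 + 2 (1 - t1) = 0  =>  t1 = 1/2
t1 = 0.5
print(t1, 1.0 - t1, J(t1))  # theta* = [0.5, 0.5], J* = 0.5

# Sanity check against a dense grid
grid = np.linspace(-2.0, 2.0, 100001)
print(grid[np.argmax(J(grid))])  # ~0.5
```

Lagrange multipliers give the same solution: ∇J = λ∇f yields θ₁ = θ₂, and the constraint then forces θ₁ = θ₂ = 1/2.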
