SLIDE 1

Statistical Machine Learning

Lecture 04: Optimization Refresher

Kristian Kersting TU Darmstadt

Summer Term 2020

K. Kersting, based on slides from J. Peters · Statistical Machine Learning · Summer Term 2020

SLIDE 2

Today’s Objectives

Make you remember calculus and teach you advanced topics! Brute-force right through optimization!

Covered topics:

  • Unconstrained Optimization
  • Lagrangian Optimization
  • Numerical Methods (Gradient Descent)

Go deeper?

  • Take the Optimization class of Prof. von Stryk / SIM!
  • Read Convex Optimization by Boyd & Vandenberghe: http://www.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf

SLIDE 3

Outline

  • 1. Motivation
  • 2. Convexity

      Convex Sets
      Convex Functions

  • 3. Unconstrained & Constrained Optimization
  • 4. Numerical Optimization
  • 5. Wrap-Up
SLIDE 4
  • 1. Motivation

Outline

  • 1. Motivation
  • 2. Convexity

      Convex Sets
      Convex Functions

  • 3. Unconstrained & Constrained Optimization
  • 4. Numerical Optimization
  • 5. Wrap-Up
SLIDE 5
  • 1. Motivation

“All learning problems are essentially optimization problems on data.”

Christopher G. Atkeson, Professor at CMU

SLIDE 6
  • 1. Motivation

Robot Arm

You want to predict the torques of a robot arm

y = I q̈ − µ q̇ + mlg sin(q) = [q̈  q̇  sin(q)] [I  −µ  mlg]⊺ = φ(x)⊺ θ

Can we do this with a data set?

D = {(xᵢ, yᵢ) | i = 1, …, n}

Yes, by minimizing the sum of squared errors:

minθ J(θ, D) = ∑ᵢ₌₁ⁿ (yᵢ − φ(xᵢ)⊺ θ)²

Carl Friedrich Gauss (1777–1855)

Note that this is just one way to measure an error...
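The torque model above is linear in the parameters θ, so the unconstrained least-squares problem has a closed-form solution. A minimal sketch (my own illustration, not from the slides, with a synthetic data set and made-up parameter values):

```python
import numpy as np

# Synthetic robot-arm data (assumed values for illustration): x = (q, q_dot, q_ddot), y = torque
rng = np.random.default_rng(0)
n = 200
q = rng.uniform(-1.0, 1.0, n)
q_dot = rng.uniform(-1.0, 1.0, n)
q_ddot = rng.uniform(-1.0, 1.0, n)
theta_true = np.array([0.5, -0.1, 2.0])            # (I, -mu, m*l*g), made up for the demo

def phi(q, q_dot, q_ddot):
    """Feature map phi(x) = (q_ddot, q_dot, sin(q)) from the slide."""
    return np.stack([q_ddot, q_dot, np.sin(q)], axis=-1)

Phi = phi(q, q_dot, q_ddot)
y = Phi @ theta_true + 0.01 * rng.normal(size=n)   # noisy torques

# Minimize J(theta, D) = sum_i (y_i - phi(x_i)^T theta)^2  (ordinary least squares)
theta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(theta_hat)                                    # close to theta_true
```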

SLIDE 7
  • 1. Motivation

Will the previous method work?

Sure! But the solution may be faulty (physically implausible), e.g., m = −1 kg, ... Hence, we need to ensure some extra conditions, and our problem becomes a constrained optimization problem:

minθ J(θ, D) = ∑ᵢ₌₁ⁿ (yᵢ − φ(xᵢ)⊺ θ)²   s.t.   g(θ, D) ≥ 0

where g(θ, D) = (θ₁, −θ₂)⊺

SLIDE 8
  • 1. Motivation

Motivation

ALL learning problems are optimization problems. In any learning system, we have:

  • 1. Parameters θ to enable learning
  • 2. Data set D to learn from
  • 3. A cost function J(θ, D) to measure our performance
  • 4. Some assumptions on the data, expressed as equality and inequality constraints, f(θ, D) = 0 and g(θ, D) ≥ 0

How can we solve such problems in general?

SLIDE 9
  • 1. Motivation

Optimization problems in Machine Learning

Machine Learning tells us how to come up with data-based cost functions such that optimization can solve them!

SLIDE 10
  • 1. Motivation

Most Cost Functions are Useless

Good Machine Learning tells us how to come up with data-based cost functions such that optimization can solve them efficiently!

SLIDE 11
  • 1. Motivation

Good cost functions should be Convex

Ideally, the Cost Functions should be Convex!

SLIDE 12
  • 2. Convexity

Outline

  • 1. Motivation
  • 2. Convexity

      Convex Sets
      Convex Functions

  • 3. Unconstrained & Constrained Optimization
  • 4. Numerical Optimization
  • 5. Wrap-Up
SLIDE 13
  • 2. Convexity : Convex Sets

Convex Sets

A set C ⊆ Rⁿ is convex if ∀x, y ∈ C and ∀α ∈ [0, 1]:

αx + (1 − α) y ∈ C

This is the equation of the line segment between x and y, i.e., for a given α, the point αx + (1 − α) y lies on the line segment between x and y.

SLIDE 14
  • 2. Convexity : Convex Sets

Examples of Convex Sets

  • All of Rⁿ (obvious)
  • Non-negative orthant Rⁿ₊: let x ⪰ 0, y ⪰ 0, clearly αx + (1 − α) y ⪰ 0
  • Norm balls: let ‖x‖ ≤ 1, ‖y‖ ≤ 1, then ‖αx + (1 − α) y‖ ≤ ‖αx‖ + ‖(1 − α) y‖ = α‖x‖ + (1 − α)‖y‖ ≤ 1

SLIDE 15
  • 2. Convexity : Convex Sets

Examples of Convex Sets

Affine subspaces (linear manifold): Ax = b, Ay = b, then A (αx + (1 − α) y) = αAx + (1 − α) Ay = αb + (1 − α) b = b

SLIDE 16
  • 2. Convexity : Convex Functions

Convex Functions

A function f : Rⁿ → R is convex if ∀x, y ∈ dom(f) and ∀α ∈ [0, 1]:

f(αx + (1 − α) y) ≤ α f(x) + (1 − α) f(y)
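One way to build intuition (my own illustration, not from the slides) is to test the defining inequality numerically on random points; such a check can only refute convexity, never prove it:

```python
import numpy as np

def seems_convex(f, dim=2, trials=10_000, seed=0):
    """Randomized check of f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y).
    It can only refute convexity, never prove it."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.normal(size=dim), rng.normal(size=dim)
        a = rng.uniform()
        if f(a * x + (1 - a) * y) > a * f(x) + (1 - a) * f(y) + 1e-9:
            return False
    return True

print(seems_convex(lambda x: np.sum(x ** 2)))      # quadratic: True
print(seems_convex(np.linalg.norm))                # l2 norm: True
print(seems_convex(lambda x: np.sin(np.sum(x))))   # not convex: typically False
```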

SLIDE 17
  • 2. Convexity : Convex Functions

Examples of Convex Functions

  • Linear/affine functions: f(x) = b⊺x + c
  • Quadratic functions: f(x) = ½ x⊺Ax + b⊺x + c, where A ⪰ 0 (positive semidefinite matrix)

SLIDE 18
  • 2. Convexity : Convex Functions

Examples of Convex Functions

  • Norms (such as ℓ1 and ℓ2): ‖αx + (1 − α) y‖ ≤ ‖αx‖ + ‖(1 − α) y‖ = α‖x‖ + (1 − α)‖y‖
  • Log-sum-exp (aka softmax, a smooth approximation to the maximum function, often used in machine learning):

f(x) = log ∑ᵢ₌₁ⁿ exp(xᵢ)
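As an aside (my own illustration, not from the slides): log-sum-exp is usually computed with the maximum subtracted first, so that the exponentials cannot overflow:

```python
import numpy as np

def logsumexp(x):
    """Numerically stable log(sum(exp(x))): shift by max(x) so exp() cannot overflow."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

x = np.array([1000.0, 1001.0, 1002.0])
print(logsumexp(x))                                             # ~1002.41; naive np.log(np.sum(np.exp(x))) overflows
print(np.max(x) <= logsumexp(x) <= np.max(x) + np.log(len(x)))  # True: smooth upper bound on the max
```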

SLIDE 19
  • 2. Convexity : Convex Functions

Important Convex Functions from Classification

SVM (hinge) loss:

f(w) = [1 − yᵢ xᵢ⊺ w]₊

where [z]₊ = max(0, z). Binary logistic loss:

f(w) = log(1 + exp(−yᵢ xᵢ⊺ w))

SLIDE 20
  • 2. Convexity : Convex Functions

First-Order Convexity Condition

Suppose f : Rⁿ → R is differentiable. Then f is convex iff ∀x, y ∈ dom(f):

f(y) ≥ f(x) + ∇ₓf(x)⊺ (y − x)

SLIDE 21
  • 2. Convexity : Convex Functions

First-Order Convexity Condition - generally...

The subgradient, or subdifferential set, ∂f(x) of f at x is

∂f(x) = {g : f(y) ≥ f(x) + g⊺ (y − x), ∀y}

Differentiability is not a requirement!

SLIDE 22
  • 2. Convexity : Convex Functions

Second-Order Convexity Condition

Suppose f : Rⁿ → R is twice differentiable. Then f is convex iff ∀x ∈ dom(f):

∇²ₓ f(x) ⪰ 0

SLIDE 23
  • 2. Convexity : Convex Functions

Ideal Machine Learning Cost Functions

minθ  J(θ, D) = Convex Function
 s.t. f(θ, D) = Affine/Linear Function
      g(θ, D) ≥ Convex Set

SLIDE 24
  • 2. Convexity : Convex Functions

Why are these conditions nice?

  • Local solutions are globally optimal!
  • Fast and well-studied optimizers have existed for a long time!

SLIDE 25
  • 3. Unconstrained & Constrained Optimization

Outline

  • 1. Motivation
  • 2. Convexity

      Convex Sets
      Convex Functions

  • 3. Unconstrained & Constrained Optimization
  • 4. Numerical Optimization
  • 5. Wrap-Up
SLIDE 26
  • 3. Unconstrained & Constrained Optimization

Unconstrained optimization

Can you solve this problem?

maxθ J(θ) = 1 − θ₁² − θ₂²

With θ* = (0, 0)⊺, J* = 1. For any other θ ≠ 0, J < 1.

SLIDE 27
  • 3. Unconstrained & Constrained Optimization

Constrained optimization

Can you solve this problem?

maxθ J(θ) = 1 − θ₁² − θ₂²
s.t.  f(θ) = θ₁ + θ₂ − 1 = 0

First approach: convert the problem to an unconstrained problem
Second approach: Lagrange multipliers

SLIDE 28
  • 3. Unconstrained & Constrained Optimization

Key Insight

Taylor expansion in a vicinity of θA:

f(θA + δθ) ≈ f(θA) + δθ⊺ ∇f(θA)

If the displacement δθ is such that the gradient is normal to it,

δθ⊺ ∇f(θA) = 0,

then f(θA + δθ) = f(θA).

SLIDE 29
  • 3. Unconstrained & Constrained Optimization

Key Insight

We have to seek a point such that

∇J(θ) + λ⊺ ∇f(θ) = 0

where λ are the Lagrange multipliers. Hence, we have the Lagrangian function

L(θ, λ) = J(θ) + λ⊺ f(θ)

SLIDE 30
  • 3. Unconstrained & Constrained Optimization

Back to our problem...

Can you solve this problem?

maxθ J(θ) = 1 − θ₁² − θ₂²
s.t.  f(θ) = θ₁ + θ₂ − 1 = 0

We can write the Lagrangian

L(θ, λ) = (1 − θ₁² − θ₂²) + λ (θ₁ + θ₂ − 1)
SLIDE 31
  • 3. Unconstrained & Constrained Optimization

The optimal solution

L(θ, λ) = (1 − θ₁² − θ₂²) + λ (θ₁ + θ₂ − 1)

∇θ₁ L = −2θ₁ + λ = 0
∇θ₂ L = −2θ₂ + λ = 0
∇λ L = θ₁ + θ₂ − 1 = 0

θ₁* = θ₂* = λ*/2 = ½,   i.e., λ* = 1
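As a sanity check (my own illustration, not from the slides), the same optimum can be reproduced numerically, e.g. with SciPy's SLSQP solver applied to the equivalent minimization problem:

```python
import numpy as np
from scipy.optimize import minimize

# max 1 - th1^2 - th2^2  s.t.  th1 + th2 - 1 = 0   <=>   min th1^2 + th2^2 - 1 with the same constraint
res = minimize(fun=lambda th: th[0] ** 2 + th[1] ** 2 - 1.0,
               x0=np.zeros(2),
               constraints=[{"type": "eq", "fun": lambda th: th[0] + th[1] - 1.0}],
               method="SLSQP")
print(res.x)   # ~[0.5, 0.5], matching the Lagrange-multiplier solution
```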

SLIDE 32
  • 3. Unconstrained & Constrained Optimization

General Formulation

For a problem written in the form

maxθ J(θ)   s.t.   f(θ) = 0,   g(θ) ≥ 0

we have the Lagrangian

L(θ, λ, µ) = J(θ) + λ⊺ f(θ) + µ⊺ g(θ)

SLIDE 33
  • 3. Unconstrained & Constrained Optimization

Lagrangian Dual Formulation

The Primal Problem, with corresponding primal variables θ, is

minθ J(θ)   s.t.   gᵢ(θ) ≤ 0   ∀i = 1, . . . , m

where each equality constraint can be converted into two equivalent inequality constraints (f = 0 ≡ f ≥ 0 ∧ f ≤ 0). Hence we have the Lagrangian

L(θ, λ) = J(θ) + λ⊺ g(θ)

The Dual Problem¹, with corresponding dual variables λ, is

maxλ G(λ) = maxλ minθ L(θ, λ)   s.t.   λᵢ ≥ 0   ∀i = 1, . . . , m

¹ In words: add the constraints to the objective function using nonnegative Lagrange multipliers. Then solve for the primal variables θ that minimize this; the solution gives the primal variables θ as functions of the Lagrange multipliers. Now maximize this with respect to the dual variables under the derived constraints on the dual variables (including at least the nonnegativity constraints).
SLIDE 34
  • 3. Unconstrained & Constrained Optimization

Lagrangian Dual Formulation

Why maximization? If λ* is the solution of the dual problem, then G(λ*) is a lower bound for the primal problem, due to two concepts:

  • Minimax inequality: for any function of two arguments φ(x, y), the maximin is less than or equal to the minimax:

maxy minx φ(x, y) ≤ minx maxy φ(x, y)

  • Weak duality: the primal values are always greater than or equal to the dual values:

minθ maxλ≥0 L(θ, λ) ≥ maxλ≥0 minθ L(θ, λ)

Check Boyd, Convex Optimization, Ch. 5 for more detailed information.

SLIDE 35
  • 3. Unconstrained & Constrained Optimization

Duality Gap and Strong Duality

The duality gap is the difference between the values of any primal solution and any dual solution. It is always greater than or equal to 0, due to weak duality.

The duality gap is zero if and only if strong duality holds.

SLIDE 36
  • 3. Unconstrained & Constrained Optimization

Lagrangian Dual Formulation

Why do we care about the dual formulation?

  • minθ L(θ, λ) is an unconstrained problem for a given λ
  • If it is easy to solve, the overall problem is easy to solve, because G(λ) is a concave function and thus easy to optimize, even though J and gᵢ may be nonconvex

In ML, the dual is often more useful than the primal!

SLIDE 37
  • 3. Unconstrained & Constrained Optimization

General Recipe to Solve Optimization Problems with the Lagrangian Dual Formulation

We want to solve

minθ J(θ)   s.t.   gᵢ(θ) ≤ 0   ∀i = 1, . . . , m

(Assume J and gᵢ are both differentiable functions.)

  • Write down the Lagrangian L(θ, λ) = J(θ) + λ⊺ g(θ)
  • Solve the problem minθ L(θ, λ): differentiate L w.r.t. θ, set to zero, and write the solution θ* as a function of λ
  • Substitute θ* back into the Lagrangian, G(λ) = L(θ*, λ) = J(θ*) + λ⊺ g(θ*), and solve the optimization problem maxλ G(λ) s.t. λᵢ ≥ 0 ∀i
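A tiny worked example of this recipe (an illustration, not from the slides): minimize J(θ) = θ² subject to g(θ) = 1 − θ ≤ 0.

L(θ, λ) = θ² + λ (1 − θ)
∂L/∂θ = 2θ − λ = 0  ⟹  θ* = λ/2
G(λ) = L(θ*, λ) = λ²/4 + λ (1 − λ/2) = λ − λ²/4
maxλ≥0 G(λ):  G′(λ) = 1 − λ/2 = 0  ⟹  λ* = 2, θ* = 1

Indeed G(λ*) = 1 = J(θ*), so the duality gap is zero in this example.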

SLIDE 38
  • 4. Numerical Optimization

Outline

  • 1. Motivation
  • 2. Convexity

      Convex Sets
      Convex Functions

  • 3. Unconstrained & Constrained Optimization
  • 4. Numerical Optimization
  • 5. Wrap-Up
SLIDE 39
  • 4. Numerical Optimization

For some problems we do not know how to compute the solution analytically. What can we do in that situation? We solve it numerically using a computer!

SLIDE 40
  • 4. Numerical Optimization

Evaluation of Numerical Algorithms

The performance of different optimization algorithms can be measured by answering the following questions

  • Does the algorithm converge to the optimal solution?
  • How many steps does it take to converge?
  • Is the convergence smooth or bumpy?
  • Does it work for all types of functions, or just for a special type (for instance, convex functions)?
  • ...

SLIDE 41
  • 4. Numerical Optimization

Test Functions

To answer these questions, we evaluate the performance on a set of well-known functions with interesting properties.

Quadratic function (contour plot over θ₁, θ₂):

J(θ) = (θ₁ − 5)² + (θ₁ − 5)(θ₂ − 5) + (θ₂ − 5)²

Rosenbrock function (contour plot over θ₁, θ₂):

J(θ) = (θ₂ − θ₁²)² + 0.01 (1 − θ₁)²
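For reference (my own illustration, not from the slides), both test functions are easy to define and probe in code:

```python
import numpy as np

def quadratic(theta):
    """Quadratic test function from the slide, minimum at theta = (5, 5)."""
    t1, t2 = theta[0] - 5.0, theta[1] - 5.0
    return t1 ** 2 + t1 * t2 + t2 ** 2

def rosenbrock(theta):
    """Rosenbrock-type test function from the slide: a curved, narrow valley."""
    return (theta[1] - theta[0] ** 2) ** 2 + 0.01 * (1.0 - theta[0]) ** 2

print(quadratic(np.array([5.0, 5.0])))    # 0.0, the global minimum
print(rosenbrock(np.array([1.0, 1.0])))   # 0.0, the global minimum
```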

SLIDE 42
  • 4. Numerical Optimization

Numerical Optimization - Key Ideas

[Contour plot: the quadratic test function over θ₁, θ₂]

  • Find a δθ such that J(θ + α δθ) < J(θ)
  • Iterative update rules like θₙ₊₁ = θₙ + α δθ
  • Key questions: What is a good direction δθ? What is a good step size α?

SLIDE 43
  • 4. Numerical Optimization

Line Search vs Constant Learning Rate

Update rule: θₙ₊₁ = θₙ + αₙ δθₙ

Optimal step size by line search:
αₙ = arg minα J(θₙ + α δθₙ)

[Contour plot: quadratic test function, line-search step sizes]

Other step sizes: αₙ = const or αₙ = 1/n

[Contour plot: quadratic test function, constant/decaying step sizes]
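A minimal sketch (my own illustration, using the quadratic test function from the previous slide) comparing a constant step size with a crude grid-based line search:

```python
import numpy as np

def J(theta):
    """Quadratic test function from the slides, minimum at (5, 5)."""
    t1, t2 = theta[0] - 5.0, theta[1] - 5.0
    return t1 ** 2 + t1 * t2 + t2 ** 2

def grad_J(theta):
    t1, t2 = theta[0] - 5.0, theta[1] - 5.0
    return np.array([2 * t1 + t2, t1 + 2 * t2])

def descend(theta, steps=50, line_search=False, alpha=0.1):
    for _ in range(steps):
        d = -grad_J(theta)                      # steepest-descent direction
        if line_search:                         # crude line search over a grid of alphas
            alphas = np.linspace(1e-3, 1.0, 100)
            alpha = min(alphas, key=lambda a: J(theta + a * d))
        theta = theta + alpha * d
    return theta

print(descend(np.array([-5.0, 10.0])))                     # constant step size alpha = 0.1
print(descend(np.array([-5.0, 10.0]), line_search=True))   # per-step line search
```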

SLIDE 44
  • 4. Numerical Optimization

Method 1 - Axial Iteration (aka coordinate descent)

Alternate minimization over both axes!

[Contour plot: axial iteration on the quadratic test function]

SLIDE 45
  • 4. Numerical Optimization

Method 2 - Steepest descent

What you usually know as gradient descent: move in the direction of the gradient ∇J(θ)

[Contour plot: steepest descent on the quadratic test function]

SLIDE 46
  • 4. Numerical Optimization

Method 2 - Steepest descent

  • The gradient is perpendicular to the contour lines
  • After each line minimization, the new gradient is always orthogonal to the previous step direction (true for any line minimization)
  • Consequently, the iterations tend to zig-zag down the valley in a very inefficient manner

SLIDE 47
  • 4. Numerical Optimization

Method 2 - Steepest descent

A very basic but important word of caution about a common source of errors:

Remember that the gradient points in the direction of steepest ascent. Pay attention to the problem you're trying to solve!

  • For maxθ J(θ), the update rule becomes θ ← θ + α ∇θJ
  • For minθ J(θ), the update rule becomes θ ← θ − α ∇θJ

with α > 0
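A tiny sketch of the sign convention (my own illustration), maximizing the earlier toy objective J(θ) = 1 − θ₁² − θ₂² by gradient ascent:

```python
import numpy as np

def J(theta):
    return 1.0 - theta[0] ** 2 - theta[1] ** 2

def grad_J(theta):
    return np.array([-2.0 * theta[0], -2.0 * theta[1]])

theta = np.array([2.0, -3.0])
alpha = 0.1
for _ in range(100):
    theta = theta + alpha * grad_J(theta)   # '+' because we MAXIMIZE J; use '-' when minimizing
print(theta, J(theta))                      # theta -> (0, 0), J -> 1, the maximum
```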

SLIDE 48
  • 4. Numerical Optimization

Steepest descent on the Rosenbrock function

The algorithm crawls down the valley...

[Contour plot: steepest descent on the Rosenbrock function]

SLIDE 49
  • 4. Numerical Optimization

Method 3 - Newton’s Method

Taylor approximations can approximate functions locally. For instance:

J(θ + δθ) ≈ J(θ) + ∇θJ(θ)⊺ δθ + ½ δθ⊺ ∇²θJ(θ) δθ = c + g⊺ δθ + ½ δθ⊺ H δθ = J̃(δθ)

where g is the Jacobian and H is the Hessian.

We can minimize quadratic functions straightforwardly:

δθ = arg minδθ J̃(δθ) = arg minδθ (c + g⊺ δθ + ½ δθ⊺ H δθ)

SLIDE 50
  • 4. Numerical Optimization

Method 3 - Newton’s Method

We want to solve

δθ = arg minδθ J̃(δθ) = arg minδθ (c + g⊺ δθ + ½ δθ⊺ H δθ)

This leads to computing

∇δθ J̃(δθ) = ∇δθ (c + g⊺ δθ + ½ δθ⊺ H δθ) = g + H δθ = 0

which yields the solution δθ = −H⁻¹ g
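A minimal Newton iteration on the Rosenbrock test function (my own illustration, with hand-derived gradient and Hessian; not the exact setup behind the slide's iteration counts):

```python
import numpy as np

def J(t):
    """Rosenbrock-type test function from the slides."""
    return (t[1] - t[0] ** 2) ** 2 + 0.01 * (1.0 - t[0]) ** 2

def grad(t):
    return np.array([-4.0 * t[0] * (t[1] - t[0] ** 2) - 0.02 * (1.0 - t[0]),
                     2.0 * (t[1] - t[0] ** 2)])

def hess(t):
    return np.array([[12.0 * t[0] ** 2 - 4.0 * t[1] + 0.02, -4.0 * t[0]],
                     [-4.0 * t[0], 2.0]])

theta = np.array([-1.5, 2.0])
for _ in range(20):
    delta = -np.linalg.solve(hess(theta), grad(theta))   # Newton step: -H^{-1} g
    theta = theta + delta                                 # (a line search on delta would be safer)
print(theta, J(theta))                                    # -> close to the minimum at (1, 1)
```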

SLIDE 51
  • 4. Numerical Optimization

Method 3 - Newton’s Method

  • For quadratic J(θ), the optimal solution is found in one step
  • θₙ₊₁ = θₙ − H⁻¹(θₙ) g(θₙ) has quadratic convergence
  • The solution δθ = −H⁻¹g is guaranteed to be downhill if H is positive definite
  • Rather than jumping straight to the predicted solution at δθ = −H⁻¹g, it is better to do a line search θₙ₊₁ = θₙ − α H⁻¹ g
  • For H = I, this is just steepest descent

[Contour plot: Newton's method on the quadratic test function]

SLIDE 52
  • 4. Numerical Optimization

Newton’s Method on Rosenbrock’s Function

The algorithm converges in only 15 iterations, compared to 101 for conjugate gradients (to come later) and 300 for regular gradient descent.

[Contour plot: Newton's method on the Rosenbrock function]

What is the problem with this method? (δθ = −H⁻¹g)

Computing the Hessian matrix at each iteration – this is not always feasible and often too expensive

SLIDE 53
  • 4. Numerical Optimization

Quasi-Newton Method: BFGS

Approximate the Hessian matrix using the following ideas:

  • Hessians change slowly
  • Hessians are symmetric
  • Derivatives interpolate

These lead to the optimization problem

min ‖H − Hₙ‖   s.t.   H = H⊺,   H(θₙ₊₁ − θₙ) = g(θₙ₊₁) − g(θₙ)

SLIDE 54
  • 4. Numerical Optimization

Quasi-Newton Method: BFGS

Thus the inverse Hessian can be updated iteratively:

H⁻¹ₙ₊₁ = (I − (sₙ yₙ⊺)/(sₙ⊺ yₙ)) H⁻¹ₙ (I − (sₙ yₙ⊺)/(sₙ⊺ yₙ))⊺ + (sₙ sₙ⊺)/(sₙ⊺ yₙ)

where yₙ = g(θₙ₊₁) − g(θₙ) and sₙ = θₙ₊₁ − θₙ
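A compact BFGS sketch (my own illustration) that applies this inverse-Hessian update together with a simple backtracking line search, again on the Rosenbrock-type test function:

```python
import numpy as np

def J(t):
    return (t[1] - t[0] ** 2) ** 2 + 0.01 * (1.0 - t[0]) ** 2

def grad(t):
    return np.array([-4.0 * t[0] * (t[1] - t[0] ** 2) - 0.02 * (1.0 - t[0]),
                     2.0 * (t[1] - t[0] ** 2)])

theta = np.array([-1.5, 2.0])
H_inv = np.eye(2)                           # initial inverse-Hessian approximation
for _ in range(100):
    g = grad(theta)
    d = -H_inv @ g                          # quasi-Newton search direction
    alpha = 1.0                             # backtracking (Armijo) line search
    while J(theta + alpha * d) > J(theta) + 1e-4 * alpha * (g @ d) and alpha > 1e-10:
        alpha *= 0.5
    s = alpha * d                           # s_n = theta_{n+1} - theta_n
    y = grad(theta + s) - g                 # y_n = g(theta_{n+1}) - g(theta_n)
    theta = theta + s
    if s @ y > 1e-12:                       # curvature condition keeps H_inv positive definite
        rho = 1.0 / (s @ y)
        V = np.eye(2) - rho * np.outer(s, y)
        H_inv = V @ H_inv @ V.T + rho * np.outer(s, s)   # BFGS inverse-Hessian update
print(theta, J(theta))                      # -> approximately (1, 1)
```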

SLIDE 55
  • 4. Numerical Optimization

Quasi-Newton Method: BFGS

  • The first step can be fully off due to the initialization, but slight errors can be helpful along the way
  • For reasonable dimensions, BFGS is the preferred method

[Contour plots: BFGS on the quadratic and Rosenbrock test functions]

SLIDE 56
  • 4. Numerical Optimization

Method 4 - Conjugate Gradients (a sketch)

  • The method of conjugate gradients chooses successive descent directions δθₙ such that it is guaranteed to reach the minimum in a finite number of steps
  • Each δθₙ is chosen to be conjugate to all previous search directions with respect to the Hessian H:

δθₙ⊺ H δθⱼ = 0   for 0 ≤ j < n

  • The resulting search directions are mutually linearly independent
  • Remarkably, δθₙ can be chosen using only the knowledge of δθₙ₋₁, ∇J(θₙ) and ∇J(θₙ₋₁):

δθₙ = ∇θJ(θₙ) + [∇θJ(θₙ)⊺ ∇θJ(θₙ) / (∇θJ(θₙ₋₁)⊺ ∇θJ(θₙ₋₁))] δθₙ₋₁
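A compact Fletcher-Reeves sketch (my own illustration, written with the common d = −∇J sign convention and a crude grid line search), applied to the quadratic test function:

```python
import numpy as np

def J(t):
    t1, t2 = t[0] - 5.0, t[1] - 5.0
    return t1 ** 2 + t1 * t2 + t2 ** 2

def grad(t):
    t1, t2 = t[0] - 5.0, t[1] - 5.0
    return np.array([2 * t1 + t2, t1 + 2 * t2])

theta = np.array([-5.0, 10.0])
g = grad(theta)
d = -g                                       # first direction: steepest descent
for _ in range(2):                           # a 2-D quadratic needs at most 2 conjugate steps
    alphas = np.linspace(0.0, 2.0, 2001)     # crude line minimization along d
    alpha = min(alphas, key=lambda a: J(theta + a * d))
    theta = theta + alpha * d
    g_new = grad(theta)
    beta = (g_new @ g_new) / (g @ g)         # Fletcher-Reeves coefficient
    d = -g_new + beta * d                    # new direction, conjugate w.r.t. H on quadratics
    g = g_new
print(theta, J(theta))                       # -> close to (5, 5) in 2 steps
```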

SLIDE 57
  • 4. Numerical Optimization

Method 4 - Conjugate Gradients

  • It uses first derivatives only, but avoids "undoing" previous work
  • An N-dimensional quadratic form can be minimized in at most N conjugate descent steps

SLIDE 58
  • 4. Numerical Optimization

Method 4 - Conjugate Gradients

Three different starting points; the minimum is reached in exactly 2 steps.

[Three contour plots: conjugate gradients on the quadratic test function from different starting points]

SLIDE 59
  • 4. Numerical Optimization

Conjugate Gradients on Rosenbrock’s Function

  • The algorithm converges in 101 iterations
  • Far superior to steepest descent, but slower than Newton's method
  • However, it avoids computing the Hessian, which can be expensive in higher dimensions

[Contour plot: conjugate gradients on the Rosenbrock function]

SLIDE 60
  • 4. Numerical Optimization

Conjugate Gradients vs BFGS

  • BFGS is more costly than CG per iteration
  • BFGS converges in fewer steps than CG
  • BFGS has less of a tendency to get "stuck"
  • BFGS requires algorithmic "hacks" to achieve significant descent in each iteration
  • Which one is better depends on your problem!

SLIDE 61
  • 4. Numerical Optimization

Performance Issues

  • Number of iterations required
  • Cost per iteration
  • Memory footprint
  • Region of convergence
  • Is the cost function noisy?

SLIDE 62
  • 5. Wrap-Up

Outline

  • 1. Motivation
  • 2. Convexity

      Convex Sets
      Convex Functions

  • 3. Unconstrained & Constrained Optimization
  • 4. Numerical Optimization
  • 5. Wrap-Up
SLIDE 63
  • 5. Wrap-Up

You know now:

  • How machine learning relates to optimization
  • What a good cost function looks like
  • What convex sets and functions are
  • Why convex functions are important in machine learning
  • What unconstrained and constrained optimization are
  • What the Lagrangian is
  • Different numerical optimization methods

SLIDE 64
  • 5. Wrap-Up

Self-Test Questions

  • Why is optimization important for machine learning?
  • What do well-formulated learning problems look like?
  • What is a convex set and what is a convex function?
  • How do I find the maximum of a vector-valued function?
  • How to deal with constrained optimization problems?
  • How to solve such problems numerically?

SLIDE 65
  • 5. Wrap-Up

Homework

Reading Assignment for next lecture

  • Bishop, ch. 1.5
  • Murphy, ch. 5.7
