

SLIDE 1

Optimization for Machine Learning
Lecture 1: Introduction to Convexity

S.V.N. (Vishy) Vishwanathan
Purdue University
vishy@purdue.edu

July 12, 2012

S.V.N. Vishwanathan (Purdue University) Optimization for Machine Learning 1 / 43

SLIDE 2

Regularized Risk Minimization

Machine Learning
- We want to build a model which predicts well on data
- A model's performance is quantified by a loss function (a sophisticated discrepancy score)
- Our model must generalize to unseen data
- Avoid over-fitting by penalizing complex models (regularization)

More Formally
- Training data: {x_1, ..., x_m}
- Labels: {y_1, ..., y_m}
- Learn a vector w by solving

  minimize_w J(w) := λΩ(w) + (1/m) Σ_{i=1}^{m} l(x_i, y_i, w)

  where λΩ(w) is the regularizer and (1/m) Σ_{i=1}^{m} l(x_i, y_i, w) is the empirical risk R_emp


SLIDE 7

Convex Functions and Sets

Outline
1. Convex Functions and Sets
2. Operations Which Preserve Convexity
3. First Order Properties
4. Subgradients
5. Constraints
6. Warmup: Minimizing a 1-d Convex Function
7. Warmup: Coordinate Descent

SLIDE 8

Convex Functions and Sets

Focus of my Lectures



SLIDE 11

Convex Functions and Sets

Disclaimer
- My focus is on showing connections between various methods
- I will sacrifice mathematical rigor and focus on intuition

SLIDE 12

Convex Functions and Sets

Convex Function
A function f is convex if, and only if, for all x, x′ and λ ∈ (0, 1):

  f(λx + (1 − λ)x′) ≤ λf(x) + (1 − λ)f(x′)
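The defining chord inequality is easy to test numerically. Below is a small Python sketch (illustrative only; the function and variable names are my own, not from the lecture) that checks it on a grid of sample points:

```python
import numpy as np

def is_convex_on_samples(f, xs, n_lambdas=25, tol=1e-9):
    """Numerically test f(λx + (1-λ)x') <= λf(x) + (1-λ)f(x') on sample pairs."""
    lambdas = np.linspace(0.01, 0.99, n_lambdas)
    for x in xs:
        for xp in xs:
            for lam in lambdas:
                lhs = f(lam * x + (1 - lam) * xp)
                rhs = lam * f(x) + (1 - lam) * f(xp)
                if lhs > rhs + tol:
                    return False  # found a chord that lies below the function
    return True

xs = np.linspace(-3, 3, 21)
print(is_convex_on_samples(lambda x: 0.5 * x**2, xs))       # square norm: True
print(is_convex_on_samples(lambda x: max(0.0, 1 - x), xs))  # hinge loss: True
print(is_convex_on_samples(np.sin, xs))                     # sine: False
```

Of course a sampled check can only refute convexity, never prove it, but it is a handy sanity test for the examples later in the lecture.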

SLIDE 13

Convex Functions and Sets

Convex Function
A function f is strictly convex if, and only if, for all x, x′ and λ ∈ (0, 1):

  f(λx + (1 − λ)x′) < λf(x) + (1 − λ)f(x′)

SLIDE 14

Convex Functions and Sets

Convex Function
A function f is σ-strongly convex if, and only if, f(·) − (σ/2)‖·‖² is convex. That is, for all x, x′ and λ ∈ (0, 1):

  f(λx + (1 − λ)x′) ≤ λf(x) + (1 − λ)f(x′) − (σ/2) λ(1 − λ) ‖x − x′‖²

SLIDE 15

Convex Functions and Sets

Exercise: Jensen's Inequality
Extend the definition of convexity to show that if f is convex, then for all λ_i ≥ 0 such that Σ_i λ_i = 1 we have

  f(Σ_i λ_i x_i) ≤ Σ_i λ_i f(x_i)

SLIDE 16

Convex Functions and Sets

Some Familiar Examples
f(x) = (1/2) x² (square norm)

SLIDE 17

Convex Functions and Sets

Some Familiar Examples
f(x, y) = (1/2) [x y] A [x y]ᵀ with A = [[10, 1], [2, 1]]

SLIDE 18

Convex Functions and Sets

Some Familiar Examples
f(x) = x log x + (1 − x) log(1 − x) (negative entropy)

SLIDE 19

Convex Functions and Sets

Some Familiar Examples
f(x, y) = x log x + y log y − x − y (un-normalized negative entropy)

SLIDE 20

Convex Functions and Sets

Some Familiar Examples
f(x) = max(0, 1 − x) (hinge loss)

SLIDE 21

Convex Functions and Sets

Some Other Important Examples
- Linear functions: f(x) = ax + b
- Softmax: f(x) = log Σ_i exp(x_i)
- Norms: for example the 2-norm f(x) = √(Σ_i x_i²)
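These examples can be spot-checked along random chords. A short sketch (my own code, not part of the slides) for the softmax and the 2-norm:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax_f(x):
    """The 'softmax' function of the slide: log-sum-exp."""
    return np.log(np.sum(np.exp(x)))

def two_norm(x):
    return np.sqrt(np.sum(x**2))

def convex_along_random_chords(f, dim=4, trials=200, tol=1e-9):
    """Test the chord inequality on random pairs of points in R^dim."""
    for _ in range(trials):
        x, xp = rng.normal(size=dim), rng.normal(size=dim)
        lam = rng.uniform(0.01, 0.99)
        if f(lam * x + (1 - lam) * xp) > lam * f(x) + (1 - lam) * f(xp) + tol:
            return False
    return True

print(convex_along_random_chords(softmax_f))  # True
print(convex_along_random_chords(two_norm))   # True
```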

SLIDE 22

Convex Functions and Sets

Convex Sets
A set C is convex if, and only if, for all x, x′ ∈ C and λ ∈ (0, 1) we have λx + (1 − λ)x′ ∈ C

SLIDE 23

Convex Functions and Sets

Convex Sets and Convex Functions
A function f is convex if, and only if, its epigraph is a convex set

SLIDE 24

Convex Functions and Sets

Convex Sets and Convex Functions
Indicator functions of convex sets are convex:

  I_C(x) = 0 if x ∈ C, and ∞ otherwise.

SLIDE 25

Convex Functions and Sets

Below Sets of Convex Functions
f(x, y) = x² + y²

SLIDE 26

Convex Functions and Sets

Below Sets of Convex Functions
f(x, y) = x log x + y log y − x − y

SLIDE 27

Convex Functions and Sets

Below Sets of Convex Functions
- If f is convex, then all its level sets are convex
- Is the converse true? (Exercise: construct a counter-example)

SLIDE 28

Convex Functions and Sets

Minima on Convex Sets
- The set of minima of a convex function is a convex set
- Proof: consider the set {x : f(x) ≤ f*}

SLIDE 29

Convex Functions and Sets

Minima on Convex Sets
- The set of minima of a strictly convex function is a singleton
- Proof: try this at home!


SLIDE 31

Operations Which Preserve Convexity

Set Operations
- Intersection of convex sets is convex
- Image of a convex set under a linear transformation is convex
- Inverse image of a convex set under a linear transformation is convex

SLIDE 32

Operations Which Preserve Convexity

Function Operations
- Linear combination with non-negative weights: f(x) = Σ_i w_i f_i(x) with w_i ≥ 0
- Pointwise maximum: f(x) = max_i f_i(x)
- Composition with an affine function: f(x) = g(Ax + b)
- Projection along a direction: f(η) = g(x_0 + ηd)
- Restricting the domain to a convex set: f(x) s.t. x ∈ C
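The pointwise-maximum rule can be illustrated numerically. The sketch below (my own illustration; the affine pieces are arbitrary) checks the chord inequality for a maximum of affine functions, each of which is convex:

```python
import numpy as np

# A few affine pieces f_i(x) = a_i * x + b_i; each is convex (indeed linear).
pieces = [(-2.0, 0.0), (0.5, -1.0), (3.0, -4.0)]

def piecewise_max(x):
    """Pointwise maximum of the affine pieces."""
    return max(a * x + b for a, b in pieces)

# Check the convexity inequality for the pointwise maximum on a grid.
xs = np.linspace(-3, 3, 31)
ok = all(
    piecewise_max(lam * x + (1 - lam) * xp)
    <= lam * piecewise_max(x) + (1 - lam) * piecewise_max(xp) + 1e-9
    for x in xs for xp in xs for lam in (0.25, 0.5, 0.75)
)
print(ok)  # True
```

Note that the pointwise minimum of these pieces would fail the same check: minima of convex functions are not convex in general.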

SLIDE 33

Operations Which Preserve Convexity

One Quick Example
The piecewise linear function f(x) := max_i ⟨u_i, x⟩ is convex


SLIDE 35

First Order Properties

First Order Taylor Expansion
The first order Taylor approximation globally lower bounds the function: for any x and x′ we have

  f(x) ≥ f(x′) + ⟨x − x′, ∇f(x′)⟩

SLIDE 36

First Order Properties

Bregman Divergence
For any x and x′ the Bregman divergence defined by f is given by

  Δ_f(x, x′) = f(x) − f(x′) − ⟨x − x′, ∇f(x′)⟩

SLIDE 37

First Order Properties

Euclidean Distance Squared
Recall that the Bregman divergence defined by f is Δ_f(x, x′) = f(x) − f(x′) − ⟨x − x′, ∇f(x′)⟩.

Use f(x) = (1/2)‖x‖² and verify that

  Δ_f(x, x′) = (1/2)‖x − x′‖²

SLIDE 38

First Order Properties

Unnormalized Relative Entropy
Recall that the Bregman divergence defined by f is Δ_f(x, x′) = f(x) − f(x′) − ⟨x − x′, ∇f(x′)⟩.

Use f(x) = Σ_i x_i log x_i − x_i and verify that

  Δ_f(x, x′) = Σ_i (x_i log x_i − x_i − x_i log x′_i + x′_i)
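The general Bregman divergence is a one-liner given f and its gradient. The sketch below (helper names are mine) verifies both closed forms above at a sample pair of points:

```python
import numpy as np

def bregman(f, grad_f, x, xp):
    """Δ_f(x, x') = f(x) - f(x') - <x - x', ∇f(x')>."""
    return f(x) - f(xp) - np.dot(x - xp, grad_f(xp))

# f(x) = 1/2 ||x||^2  gives  Δ_f(x, x') = 1/2 ||x - x'||^2
sq = lambda x: 0.5 * np.dot(x, x)
sq_grad = lambda x: x

# f(x) = Σ_i x_i log x_i - x_i  gives the unnormalized relative entropy
ent = lambda x: np.sum(x * np.log(x) - x)
ent_grad = lambda x: np.log(x)

x, xp = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(np.isclose(bregman(sq, sq_grad, x, xp),
                 0.5 * np.dot(x - xp, x - xp)))        # True
print(np.isclose(bregman(ent, ent_grad, x, xp),
                 np.sum(x * np.log(x / xp) - x + xp)))  # True
```

The second check uses the algebraically equivalent form Σ_i x_i log(x_i/x′_i) − x_i + x′_i.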

SLIDE 39

First Order Properties

Identifying the Minimum
Let f : X → R be a differentiable convex function. Then x is a minimizer of f if, and only if,

  ⟨x′ − x, ∇f(x)⟩ ≥ 0 for all x′.

- One way to ensure this is to set ∇f(x) = 0
- Minimizing a smooth convex function is the same as finding an x such that ∇f(x) = 0


SLIDE 41

Subgradients

What if the Function is Non-Smooth?
The piecewise linear function f(x) := max_i ⟨u_i, x⟩ is convex but not differentiable at the kinks!

SLIDE 42

Subgradients

Subgradients to the Rescue
A subgradient at x′ is any vector s which satisfies

  f(x) ≥ f(x′) + ⟨x − x′, s⟩ for all x

The set of all subgradients at x′ is denoted ∂f(x′)


SLIDE 45

Subgradients

Example
f(x) = |x| and ∂f(0) = [−1, 1]
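The subgradient inequality at x′ = 0 for f(x) = |x| can be checked directly. A small sketch (function names are mine):

```python
import numpy as np

def is_subgradient_at_zero(s, xs, tol=1e-12):
    """Check f(x) >= f(0) + <x - 0, s> for f(x) = |x| at x' = 0."""
    return all(abs(x) >= 0.0 + x * s - tol for x in xs)

xs = np.linspace(-3, 3, 61)
print(is_subgradient_at_zero(0.5, xs))   # True: 0.5 ∈ [-1, 1]
print(is_subgradient_at_zero(-1.0, xs))  # True: boundary of ∂f(0)
print(is_subgradient_at_zero(1.5, xs))   # False: 1.5 lies outside [-1, 1]
```

Any slope in [−1, 1] gives a line through the origin lying below |x|; any slope outside that interval crosses the graph.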

SLIDE 46

Subgradients

Identifying the Minimum
Let f : X → R be a convex function. Then x is a minimizer of f if, and only if, there exists a μ ∈ ∂f(x) such that

  ⟨x′ − x, μ⟩ ≥ 0 for all x′.

One way to ensure this is to ensure that 0 ∈ ∂f(x)


SLIDE 48

Constraints

A Simple Example
Minimize (1/2) w² s.t. 1 ≤ w ≤ 2

SLIDE 49

Constraints

Projection

  P_C(x′) := argmin_{x ∈ C} ‖x − x′‖²

Assignment: compute P_C(x′) when C = {x s.t. l ≤ x_i ≤ u}

SLIDE 50

Constraints

First Order Conditions For Constrained Problems

  x = P_C(x − ∇f(x))

- If x − ∇f(x) ∈ C, then P_C(x − ∇f(x)) = x implies that ∇f(x) = 0
- Otherwise, it shows that the constraints are preventing further progress in the direction of descent
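This fixed-point condition is easy to see in action on the simple example from slide 48. The sketch below is my own (the step size 0.5 is an arbitrary choice): it uses the fact that projection onto a box is a coordinate-wise clip, iterates the projected gradient map, and confirms the fixed-point condition at the limit:

```python
import numpy as np

def project_box(x, lo, hi):
    """Euclidean projection onto the box {x : lo <= x_i <= hi} is a clip."""
    return np.clip(x, lo, hi)

def grad_f(x):
    """Gradient of f(w) = 1/2 w^2."""
    return x

# Projected gradient iteration for: minimize 1/2 w^2  s.t.  1 <= w <= 2
w = np.array([1.7])
for _ in range(100):
    w = project_box(w - 0.5 * grad_f(w), 1.0, 2.0)

print(w)  # converges to the constrained minimum w = 1
# At the limit the fixed-point condition x = P_C(x - ∇f(x)) holds:
print(np.allclose(w, project_box(w - grad_f(w), 1.0, 2.0)))  # True
```

Here ∇f(1) = 1 ≠ 0, so the unconstrained optimality condition fails; it is the constraint w ≥ 1 that blocks further descent, exactly as the slide describes.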


SLIDE 52

Warmup: Minimizing a 1-d Convex Function

Problem Statement
Given a black-box which can compute J : R → R and J′ : R → R, find the minimum value of J

SLIDE 53

Warmup: Minimizing a 1-d Convex Function

Increasing Gradients
From the first order conditions:

  J(w) ≥ J(w′) + (w − w′) · J′(w′)   and   J(w′) ≥ J(w) + (w′ − w) · J′(w)

Add the two:

  (w − w′) · (J′(w) − J′(w′)) ≥ 0

So w ≥ w′ implies that J′(w) ≥ J′(w′)


SLIDE 58

Warmup: Minimizing a 1-d Convex Function

Problem Restatement
Identify the point where the increasing function J′ crosses zero

SLIDE 59

Warmup: Minimizing a 1-d Convex Function

Bisection Algorithm
[Figure: J′(w) with interval endpoints L and U, repeatedly bisected at the midpoint M]

SLIDE 64

Warmup: Minimizing a 1-d Convex Function

Interval Bisection
Require: L, U, ε
 1: maxgrad ← J′(U)
 2: while (U − L) · maxgrad > ε do
 3:   M ← (U + L)/2
 4:   if J′(M) > 0 then
 5:     U ← M
 6:   else
 7:     L ← M
 8:   end if
 9: end while
10: return (U + L)/2
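The pseudocode above translates directly to Python. A sketch (the test objective J(w) = (w − 3)² is my own choice, with derivative J′(w) = 2(w − 3)):

```python
def interval_bisection(grad, L, U, eps=1e-10):
    """Find the zero crossing of the increasing function grad on [L, U]."""
    maxgrad = grad(U)
    while (U - L) * maxgrad > eps:
        M = (U + L) / 2
        if grad(M) > 0:      # minimizer lies to the left of M
            U = M
        else:                # minimizer lies to the right of M
            L = M
    return (U + L) / 2

# Minimize J(w) = (w - 3)^2 via its derivative J'(w) = 2(w - 3)
w_star = interval_bisection(lambda w: 2 * (w - 3), L=-10.0, U=10.0)
print(w_star)  # approximately 3.0
```

Each iteration halves the interval, so the loop terminates after roughly log2((U − L) · maxgrad / ε) steps.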


SLIDE 66

Warmup: Coordinate Descent

Problem Statement
Given a black-box which can compute J : Rⁿ → R and J′ : Rⁿ → Rⁿ, find the minimum value of J

SLIDE 67

Warmup: Coordinate Descent

Concrete Example
f(x, y) = (1/2) [x y] A [x y]ᵀ with A = [[10, 1], [2, 1]]

SLIDE 68

Warmup: Coordinate Descent

Concrete Example
Fix y = 3:

  f(x, 3) = (1/2) [x 3] A [x 3]ᵀ with A = [[10, 1], [2, 1]]

SLIDE 69

Warmup: Coordinate Descent

Concrete Example

  f(x, 3) = 5x² + (9/2)x + 9/2

Minimum: x = −9/20


SLIDE 72

Warmup: Coordinate Descent

Concrete Example

  f(−9/20, y) = (1/2)y² − (27/40)y + 81/80

Minimum: y = 27/40

SLIDE 74

Warmup: Coordinate Descent

Concrete Example
[Figure: f(x, 27/40) as a function of x, with the previous iterate x = −9/20 marked]

Are we done?
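No: minimizing over x again from x = −9/20 moves the iterate, and the process repeats. The full loop can be sketched in a few lines of Python (my own code, not the lecture's; the closed-form coordinate minimizers are derived in the comments):

```python
import numpy as np

A = np.array([[10.0, 1.0],
              [2.0, 1.0]])      # the matrix from the concrete example

def f(v):
    """f(x, y) = 1/2 [x y] A [x y]^T = 5x^2 + 1.5xy + 0.5y^2."""
    return 0.5 * v @ A @ v

# Exact 1-d minimization along each coordinate (the quadratic has closed forms):
#   fixing y: d/dx [5x^2 + 1.5xy] = 10x + 1.5y = 0  =>  x = -0.15 y
#   fixing x: d/dy [0.5y^2 + 1.5xy] = y + 1.5x = 0  =>  y = -1.5 x
v = np.array([0.0, 3.0])        # start at y = 3 as in the slides
for _ in range(50):
    v[0] = -0.15 * v[1]         # minimize over x with y fixed
    v[1] = -1.5 * v[0]          # minimize over y with x fixed

print(v)  # converges to the unconstrained minimum (0, 0)
```

The first pass reproduces the slides exactly: x = −0.15 · 3 = −9/20, then y = −1.5 · (−9/20) = 27/40. Each subsequent pass shrinks x by a factor of 0.225, so the iterates converge geometrically to the minimizer (0, 0).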