SLIDE 1

Binary Classification with Linear Models

CMSC 422 MARINE CARPUAT

marine@cs.umd.edu

Figures credit: Piyush Rai

SLIDE 2

Topics

  • Linear Models
    – Loss functions
    – Regularization
  • Gradient Descent
  • Calculus refresher
    – Convexity
    – Gradients

[CIML Chapter 6]

SLIDE 3

Binary classification via hyperplanes

  • A classifier is a hyperplane (w, b)
  • At test time, we check on which side of the hyperplane examples fall:

y = sign(wᵀx + b)

  • This is a linear classifier
    – Because the prediction is a linear combination of the feature values x
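To make the decision rule concrete, here is a minimal NumPy sketch of prediction with a linear classifier. The weight vector, bias, and test points are made-up illustrative values, not taken from the slides.

```python
import numpy as np

def predict(w, b, x):
    """Predict a label in {-1, +1} from the sign of w.x + b."""
    activation = np.dot(w, x) + b
    return 1 if activation >= 0 else -1

# Hypothetical 2D classifier: the hyperplane is the set of points where
# w.x + b = 0; the prediction depends only on which side x falls.
w = np.array([2.0, -1.0])
b = 0.5
print(predict(w, b, np.array([1.0, 1.0])))    # +1 (positive side)
print(predict(w, b, np.array([-1.0, 1.0])))   # -1 (negative side)
```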

SLIDE 4

SLIDE 5

Learning a Linear Classifier as an Optimization Problem

min over (w, b):   Σₙ 1[ yₙ (wᵀxₙ + b) ≤ 0 ]  +  λ R(w, b)

  • Indicator function 1[·]: 1 if (·) is true, 0 otherwise; the loss function above is called the 0-1 loss
  • Loss function: measures how well the classifier fits the training data
  • Regularizer: prefers solutions that generalize well
  • Loss plus regularizer together form the objective function
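A minimal NumPy sketch of this objective. The ℓ2 regularizer and the value of λ are illustrative assumptions; the slides leave the regularizer R generic.

```python
import numpy as np

def zero_one_loss(w, b, X, y):
    """Number of training examples the hyperplane (w, b) misclassifies.

    X: (n, d) array of examples; y: length-n array of labels in {-1, +1}.
    """
    margins = y * (X @ w + b)      # positive iff an example is classified correctly
    return np.sum(margins <= 0)    # sum of indicators 1[y_n (w.x_n + b) <= 0]

def objective(w, b, X, y, lam=0.1):
    """0-1 loss plus an (assumed) squared-l2 regularizer weighted by lam."""
    return zero_one_loss(w, b, X, y) + lam * np.dot(w, w)
```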

SLIDE 6

Learning a Linear Classifier as an Optimization Problem

  • Problem: the 0-1 loss above is NP-hard to optimize
  • Solution: different loss function approximations and regularizers lead to specific algorithms (e.g., perceptron, support vector machines, logistic regression)

SLIDE 7

The 0-1 Loss

  • Small changes in w, b can lead to big changes in the loss value
  • The 0-1 loss is non-smooth and non-convex
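A tiny numeric illustration (made-up numbers) of the first point: for an example near the boundary, an arbitrarily small change in w flips the 0-1 loss from 0 to 1, so the loss is discontinuous in the parameters.

```python
import numpy as np

# One example sitting very close to the decision boundary.
x, y = np.array([1.0, 0.0]), +1

for w0 in (0.001, -0.001):        # two nearly identical weight vectors
    w = np.array([w0, 1.0])
    margin = y * np.dot(w, x)     # y * (w.x), taking b = 0
    print(w, int(margin <= 0))    # 0-1 loss jumps from 0 to 1
```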
SLIDE 8

Calculus refresher: Smooth functions, convex functions
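Standard definition, stated here for reference (this wording is mine, not the slide's): a function f is convex when every chord lies on or above its graph, and smooth, for this course's purposes, when it is differentiable everywhere.

```latex
% Convexity: for all points a, b and all t in [0, 1],
\[
  f\bigl(t\,a + (1 - t)\,b\bigr) \;\le\; t\,f(a) + (1 - t)\,f(b).
\]
```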

SLIDE 9

Approximating the 0-1 loss with surrogate loss functions

  • Examples (with b = 0):
    – Hinge loss
    – Log loss
    – Exponential loss
  • All are convex upper bounds on the 0-1 loss
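These surrogates can be written as functions of the margin m = yₙ(w·xₙ), with b = 0 as on the slide. A sketch; the 1/log 2 scaling of the log loss follows CIML's convention, which is what makes all three upper bounds on the 0-1 loss:

```python
import numpy as np

def hinge_loss(m):
    """max(0, 1 - m): zero once the margin exceeds 1."""
    return np.maximum(0.0, 1.0 - m)

def log_loss(m):
    """log2(1 + exp(-m)); the 1/log(2) factor keeps it >= the 0-1 loss."""
    return np.log(1.0 + np.exp(-m)) / np.log(2.0)

def exponential_loss(m):
    """exp(-m): grows very fast for badly misclassified points."""
    return np.exp(-m)

margins = np.array([-2.0, 0.0, 2.0])   # misclassified, on boundary, well classified
for name, loss in [("hinge", hinge_loss), ("log", log_loss), ("exp", exponential_loss)]:
    print(name, loss(margins))
```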

SLIDE 10

Approximating the 0-1 loss with surrogate loss functions

  • Examples (with b = 0):
    – Hinge loss
    – Log loss
    – Exponential loss
  • Q: Which of these loss functions is not smooth?

SLIDE 11

Approximating the 0-1 loss with surrogate loss functions

  • Examples (with b = 0):
    – Hinge loss
    – Log loss
    – Exponential loss
  • Q: Which of these loss functions is most sensitive to outliers?
SLIDE 12

Casting Linear Classification as an Optimization Problem

min over (w, b):   Σₙ 1[ yₙ (wᵀxₙ + b) ≤ 0 ]  +  λ R(w, b)

  • Indicator function 1[·]: 1 if (·) is true, 0 otherwise; the loss function above is called the 0-1 loss
  • Loss function: measures how well the classifier fits the training data
  • Regularizer: prefers solutions that generalize well
  • Loss plus regularizer together form the objective function

SLIDE 13

The regularizer term

  • Goal: find simple solutions (inductive bias)
  • Ideally, we want most entries of w to be zero, so prediction depends only on a small number of features
  • Formally, we want to minimize the number of nonzero weights: ‖w‖₀ = Σ_d 1[w_d ≠ 0]
  • That's NP-hard, so we use approximations instead
    – E.g., we encourage the w_d's to be small
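A quick illustration of the quantity being minimized (the weight vector is made up):

```python
import numpy as np

w = np.array([0.0, 3.2, 0.0, 0.0, -1.1])   # hypothetical weight vector
print(np.count_nonzero(w))                  # ||w||_0 = 2: only 2 features matter
```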

SLIDE 14

Norm-based Regularizers

  • ℓp norms can be used as regularizers

[Figure: contour plots of the ℓp ball for p = 2, p = 1, and p < 1]

SLIDE 15

Norm-based Regularizers

  • ℓp norms can be used as regularizers
  • Smaller p favors sparse vectors w
    – i.e., most entries of w are close or equal to 0
  • ℓ2 norm: convex, smooth, easy to optimize
  • ℓ1 norm: encourages sparse w; convex, but not smooth at axis points
  • p < 1: the norm becomes non-convex and hard to optimize
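A small sketch comparing the regularizer values these norms assign to a dense versus a sparse weight vector (both vectors are made up); with p ≤ 1 the sparse vector is cheaper, which is how smaller p encourages sparsity:

```python
import numpy as np

def lp_norm(w, p):
    """(sum_d |w_d|^p)^(1/p); a true norm only for p >= 1."""
    return np.sum(np.abs(w) ** p) ** (1.0 / p)

dense  = np.array([0.5, 0.5, 0.5, 0.5])
sparse = np.array([1.0, 0.0, 0.0, 0.0])

for p in (2, 1, 0.5):
    print(p, lp_norm(dense, p), lp_norm(sparse, p))
# p=2: both cost 1.0; p=1: 2.0 vs 1.0; p=0.5: 8.0 vs 1.0
```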
SLIDE 16

Casting Linear Classification as an Optimization Problem

min over (w, b):   Σₙ 1[ yₙ (wᵀxₙ + b) ≤ 0 ]  +  λ R(w, b)

  • Indicator function 1[·]: 1 if (·) is true, 0 otherwise; the loss function above is called the 0-1 loss
  • Loss function: measures how well the classifier fits the training data
  • Regularizer: prefers solutions that generalize well
  • Loss plus regularizer together form the objective function

SLIDE 17

What is the perceptron optimizing?

  • Loss function is a variant of the hinge loss
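The variant is typically written as max(0, −yₙ(w·xₙ + b)): the hinge loss with the margin requirement removed. A sketch of this loss and the mistake-driven update that follows from its subgradient (the function names are mine, not the slides'):

```python
import numpy as np

def perceptron_loss(w, b, x, y):
    """0 when (x, y) is classified correctly; grows linearly when it is not."""
    return max(0.0, -y * (np.dot(w, x) + b))

def perceptron_update(w, b, x, y, eta=1.0):
    """One subgradient step: parameters change only on a mistake."""
    if y * (np.dot(w, x) + b) <= 0:
        w = w + eta * y * x
        b = b + eta * y
    return w, b
```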
SLIDE 18

Recap: Linear Models

  • General framework for binary classification
  • Cast learning as optimization problem
  • Optimization objective combines 2 terms
    – Loss function: measures how well classifier fits training data
    – Regularizer: measures how simple classifier is
  • Does not assume data is linearly separable
  • Lets us separate model definition from training algorithm

SLIDE 19

Calculus refresher: Gradients
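Standard definition, stated here for reference (this wording is mine, not the slide's):

```latex
% The gradient of f : R^d -> R collects all partial derivatives of f;
% it points in the direction in which f increases fastest.
\[
  \nabla_w f(w) \;=\;
  \left( \frac{\partial f}{\partial w_1},
         \frac{\partial f}{\partial w_2},
         \dots,
         \frac{\partial f}{\partial w_d} \right)
\]
```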

SLIDE 20

Gradient descent

  • A general solution for our optimization problem
  • Idea: take iterative steps to update the parameters in the direction of the negative gradient
SLIDE 21

Gradient descent algorithm
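The algorithm reduces to a short loop; a minimal sketch, assuming the caller supplies the gradient of the objective F (the step size, iteration count, and example objective below are illustrative):

```python
import numpy as np

def gradient_descent(grad_F, w0, eta=0.1, num_iters=100):
    """Minimize F by repeatedly stepping against its gradient.

    grad_F:    function returning the gradient of F at a point w
    w0:        initial parameter vector
    eta:       step size (learning rate)
    num_iters: number of gradient steps to take
    """
    w = w0.copy()
    for _ in range(num_iters):
        w = w - eta * grad_F(w)   # move in the direction of the negative gradient
    return w

# Hypothetical usage: F(w) = ||w - 3||^2 has gradient 2(w - 3),
# so gradient descent should drive every coordinate toward 3.
w_star = gradient_descent(lambda w: 2.0 * (w - 3.0), np.zeros(2))
print(w_star)   # close to [3.0, 3.0]
```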

SLIDE 22

Recap: Linear Models

  • General framework for binary classification
  • Cast learning as optimization problem
  • Optimization objective combines 2 terms
    – Loss function: measures how well classifier fits training data
    – Regularizer: measures how simple classifier is
  • Does not assume data is linearly separable
  • Lets us separate model definition from training algorithm (Gradient Descent)