CS 472 - Regression
1. Regression
- For classification the output(s) is nominal
- In regression the output is continuous
    – Function approximation
- Many models could be used
    – Simplest is linear regression
    – Fit data with the best hyper-plane which "goes through" the points
- y: dependent variable (output); x: independent variable (input)


3. Regression
- For classification the output(s) is nominal
- In regression the output is continuous
    – Function approximation
- Many models could be used
    – Simplest is linear regression
    – Fit data with the best hyper-plane which "goes through" the points
    – For each point, the difference between the predicted value and the actual observation is the residue (residual)
- y: dependent variable (output); x: independent variable (input)

4. Simple Linear Regression
- For now, assume just one (input) independent variable x and one (output) dependent variable y
    – Multiple linear regression assumes an input vector x
    – Multivariate linear regression assumes an output vector y
- We "fit" the points with a line (i.e. hyper-plane)
- Which line should we use?
    – Choose an objective function
    – For simple linear regression we choose sum squared error (SSE): Σ(predicted_i − actual_i)² = Σ(residue_i)²
    – Thus, find the line which minimizes the sum of the squared residues (i.e. least squares)
    – This mimics the case where the data points were sampled from the actual hyper-plane with Gaussian noise added
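In symbols, the model and the SSE objective described above can be written as follows (a notation sketch only, using the slide's definitions):

```latex
\hat{y}_i = \beta_0 + \beta_1 x_i ,\qquad
r_i = y_i - \hat{y}_i ,\qquad
\mathrm{SSE}(\beta_0,\beta_1) = \sum_{i=1}^{n} r_i^{2}
  = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^{2}
```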

5. How Do We "Learn" the Parameters?
- For the 2-d problem (a line) there are coefficients for the bias and the independent variable (the y-intercept and the slope): Y = β0 + β1X
- To find the values of the coefficients which minimize the objective function, we take the partial derivatives of the objective function (SSE) with respect to the coefficients, set these to 0, and solve:
    β1 = (n·Σxy − Σx·Σy) / (n·Σx² − (Σx)²)
    β0 = (Σy − β1·Σx) / n
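As a minimal sketch, the closed-form solution above can be computed directly from the sums; the function name and sample data below are illustrative, not from the slides:

```python
import numpy as np

def simple_linear_regression(x, y):
    """Closed-form least-squares fit for y = b0 + b1*x (one input variable)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
    b0 = (np.sum(y) - b1 * np.sum(x)) / n
    return b0, b1

# Illustrative data roughly following y = 1 + 2x with noise
x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.2, 6.8]
b0, b1 = simple_linear_regression(x, y)   # b0 = 1.09, b1 = 1.94
```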

6. Multiple Linear Regression
    Y = β0 + β1X1 + β2X2 + … + βnXn
- There is a closed form for finding the multiple linear regression weights, which requires matrix inversion, etc.
- There are also iterative techniques to find the weights
- One is the delta rule. For regression we use an output node which is not thresholded (it just does a linear sum) and iteratively apply the delta rule; thus the net is the output:
    Δw_i = c(t − net)x_i
    – where c is the learning rate and x_i is the input for that weight
- The delta rule will update towards the objective of minimizing the SSE, thus solving multiple linear regression
- There are other regression approaches that give different results by trying to better handle outliers and other statistical anomalies
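A minimal sketch of the iterative delta-rule approach just described, assuming a bias input fixed at 1 and an illustrative learning rate, epoch count, and data set (none of these values come from the slides):

```python
import numpy as np

def delta_rule_regression(X, t, c=0.01, epochs=500):
    """Fit linear regression weights iteratively with the delta rule."""
    X = np.hstack([np.ones((len(X), 1)), np.asarray(X, float)])  # bias column of 1s
    t = np.asarray(t, float)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, t_i in zip(X, t):
            net = w @ x_i                  # unthresholded (linear) output
            w += c * (t_i - net) * x_i     # delta rule: w_i += c * (t - net) * x_i
    return w                               # [b0, b1, ..., bn]

# Illustrative usage on a tiny two-input data set
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]]
t = [1.0, 2.0, 3.0, 5.0]
print(delta_rule_regression(X, t))
```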

7. SSE and Linear Regression
- SSE squares the difference between the predicted and actual values. Why square?
    – We don't want the residues to cancel each other out
    – We could instead use absolute (or other) distances: Σ|predicted_i − actual_i|  (L1 vs. L2)
- SSE leads to a parabolic error surface, which is good for gradient descent
- Which line would least squares choose?
    – There is always one "best" fit



10. SSE and Linear Regression
- SSE leads to a parabolic error surface, which is good for gradient descent
- Which line would least squares choose?
    – There is always one "best" fit
- Note that the squared error causes the model to be more highly influenced by outliers
    – Though it is still the best fit assuming Gaussian noise error from the true surface

11. SSE and Linear Regression - Generalization
- In generalization, all x values map to a y value on the chosen regression line
- (Figure: data points and the fitted regression line, with x as the input value on the horizontal axis and y as the output value on the vertical axis)

12. Linear Regression - Challenge Question
    Δw_i = c(t − net)x_i
- Assume we start with all weights as 1 (don't use a bias weight here, though you usually always will – otherwise the line is forced through the origin)
- Remember, for regression we use an output node which is not thresholded (it just does a linear sum) and iteratively apply the delta rule; thus the net is the output
- What are the new weights after one iteration through the following training set, using the delta rule with a learning rate c = 1?
- How does it generalize for the novel input (−.3, 0)?

    x1     x2     Target
    .5    −.2       1
     1      0     −.4

- After one epoch the weight vector (w1, w2) is:
    A. 1, .5
    B. 1.35, .94
    C. 1.35, .86
    D. −.4, .86
    E. None of the above

13. Linear Regression - Challenge Question
    Δw_i = c(t − net)x_i
- Assume we start with all weights as 1
- What are the new weights after one iteration through the training set, using the delta rule with a learning rate c = 1?
- How does it generalize for the novel input (−.3, 0)?

    x1     x2     Target    Net    w1            w2
                                    1             1
    .5    −.2       1              w1 = 1 + …
     1      0     −.4

14. Linear Regression - Challenge Question
    Δw_i = c(t − net)x_i
- Assume we start with all weights as 1
- What are the new weights after one iteration through the training set, using the delta rule with a learning rate c = 1?
- How does it generalize for the novel input (−.3, 0)?
    – (−.3)(−.4) + (0)(.86) = .12

    x1     x2     Target    Net     w1      w2
                                     1       1
    .5    −.2       1        .3    1.35     .86      (w1 = 1 + 1(1 − .3)(.5) = 1.35)
     1      0     −.4       1.35   −.4      .86
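For completeness, here is a tiny script that reproduces the arithmetic in the table above (the values are taken from the slide: weights start at 1, learning rate c = 1, no bias weight):

```python
c = 1.0
w = [1.0, 1.0]                                   # initial weights
training_set = [([0.5, -0.2], 1.0),              # (inputs, target)
                ([1.0, 0.0], -0.4)]

for x, t in training_set:
    net = sum(wi * xi for wi, xi in zip(w, x))   # unthresholded (linear) output
    w = [wi + c * (t - net) * xi for wi, xi in zip(w, x)]
    print(f"net = {net:.2f}, weights = {w}")     # net .3 -> [1.35, .86]; net 1.35 -> [-.4, .86]

novel = [-0.3, 0.0]
print(sum(wi * xi for wi, xi in zip(w, novel)))  # generalization for (-.3, 0): roughly .12
```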

15. Linear Regression Homework
    Δw_i = c(t − net)x_i
- Assume we start with all weights as 0 (include the bias!)
- What are the new weights after one iteration through the following training set, using the delta rule with a learning rate c = .2?
- How does it generalize for the novel input (1, .5)?

    x1     x2     Target
    .3     .8      .7
   −.3    1.6     −.1
    .9      0     1.3

16. Intelligibility (Interpretable ML, Transparent)
- One advantage of linear regression models (and linear classification) is the potential to look at the coefficients for insight into which input variables are most important in predicting the output
- The variables with the largest magnitudes have the highest correlation with the output
    – A large positive coefficient implies that the output will increase when this input is increased (positively correlated)
    – A large negative coefficient implies that the output will decrease when this input is increased (negatively correlated)
    – A small or 0 coefficient suggests that the input is uncorrelated with the output (at least at the first order)
- Linear regression can be used to find the best "indicators"
    – Be careful not to confuse correlation with causality
    – Linear models cannot detect higher-order correlations! This is the power of more complex machine learning models.
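As an illustrative sketch of reading coefficients this way (the synthetic data, feature names, and use of NumPy's least-squares solver are assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # three standardized inputs
y = 3.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

A = np.hstack([np.ones((len(X), 1)), X])       # prepend a bias column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares fit
for name, b in zip(["bias", "x1", "x2", "x3"], coef):
    print(f"{name}: {b:+.3f}")                 # x1 dominates, x3 is near 0

# Note: comparing coefficient magnitudes is most meaningful when the inputs
# are on comparable scales (e.g. standardized), as they are here.
```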

17. Anscombe's Quartet
- What lines "really" best fit each case? – different approaches
- (Figure: the four Anscombe data sets, which have very different shapes yet nearly identical least-squares fit lines)

18. Delta Rule Natural for Regression, Not Classification
    Δw_i = c(t − net)x_i
- First consider the one-dimensional case
- The decision surface for the perceptron would be any (first) point that divides the instances
- The delta rule will try to fit a line through the target values which minimizes SSE, and the decision point is where that line crosses .5 for 0/1 targets
    – Look down on the data for the perceptron view; flip it on its side for the delta rule view
- It will converge to the one optimal line (and dividing point) for this objective
- (Figure: 0/1 target values z plotted against the input x)

19. Delta Rule for Classification?
- What would happen in this adjusted case for the perceptron and the delta rule, and where would the decision point (i.e. the .5 crossing) be?
- (Figures: the original 0/1 data set and an adjusted version of it, each plotted as z against x)

20. Delta Rule for Classification?
- This leads to misclassifications even though the data is linearly separable
- For the delta rule the objective function is to minimize the regression line's SSE, not to maximize classification accuracy

21. Delta Rule for Classification?
- What would happen if we were doing a regression fit with a sigmoid/logistic curve rather than a line?

22. Delta Rule for Classification?
- A sigmoid fits many decision cases quite well! This is basically what logistic regression does.
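A minimal sketch of that idea: fitting a sigmoid to 0/1 targets with delta-rule-style updates on the squared error. The data, learning rate, and epoch count are illustrative assumptions; logistic regression proper minimizes a cross-entropy loss, but the fitted curve behaves similarly.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Illustrative 1-D data: class 0 on the left, class 1 on the right
x = np.array([-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0])
t = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

w, b, c = 0.0, 0.0, 0.5
for _ in range(2000):
    for x_i, t_i in zip(x, t):
        z = sigmoid(w * x_i + b)
        step = (t_i - z) * z * (1 - z)   # delta-rule-style error scaled by the sigmoid's slope
        w += c * step * x_i
        b += c * step

print(-b / w)   # input value where the fitted sigmoid crosses .5 (the decision point)
```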
