Lecture 6: Multiple and Poly Linear Regression (CS109A Introduction to Data Science)


  1. Lecture 6: Multiple and Poly Linear Regression. CS109A Introduction to Data Science. Pavlos Protopapas, Kevin Rader and Chris Tanner

  2. ANNOUNCEMENTS. Office Hours: more office hours will be added; the schedule will be posted soon. Online office hours are for everyone, please take advantage of them. Projects: project guidelines and project descriptions will be posted Thursday 9/25. Milestone 1: sign-up for projects is Wed 10/2.

  3. Summary from last lecture. We assume a simple form of the statistical model $f$: $Y = f(X) + \epsilon = \beta_0 + \beta_1 X + \epsilon$

  4. Summary from last lecture. We fit the model, i.e., estimate $\hat{\beta}_0, \hat{\beta}_1$ that minimize the loss function, which we assume to be the MSE: $L(\beta_0, \beta_1) = \frac{1}{n} \sum_i \left( y_i - (\beta_0 + \beta_1 x_i) \right)^2$, so that $\hat{\beta}_0, \hat{\beta}_1 = \underset{\beta_0, \beta_1}{\operatorname{argmin}}\, L(\beta_0, \beta_1)$.
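
As a concrete illustration, here is a minimal Python sketch (not from the slides; all names are illustrative) of the closed-form least-squares estimates on synthetic data:

```python
# A minimal sketch (illustrative, not the course's code): closed-form
# least-squares estimates for simple linear regression on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=100)   # true beta_0 = 2, beta_1 = 0.5

# beta_1 = cov(x, y) / var(x); beta_0 = mean(y) - beta_1 * mean(x)
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

mse = np.mean((y - (beta0_hat + beta1_hat * x)) ** 2)
print(beta0_hat, beta1_hat, mse)
```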

  5. Summary from last lecture. We acknowledge that because there are errors in measurements and a limited sample, there is an inherent uncertainty in the estimation of $\hat{\beta}_0, \hat{\beta}_1$. We used the bootstrap to estimate the distributions of $\hat{\beta}_0, \hat{\beta}_1$.
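
A minimal sketch of this bootstrap procedure, assuming `x` and `y` are 1-D numpy arrays (synthetic here):

```python
# Sketch of the bootstrap: resample (x, y) pairs with replacement, refit,
# and collect the fitted coefficients. x and y are synthetic stand-ins.
import numpy as np

def fit_simple(x, y):
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)

B = 1000
betas = np.empty((B, 2))                         # one (beta_0, beta_1) per sample
for b in range(B):
    idx = rng.integers(0, len(x), size=len(x))   # resample with replacement
    betas[b] = fit_simple(x[idx], y[idx])

print(betas.mean(axis=0), betas.std(axis=0))     # bootstrap means and SEs
```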

  6. Summary from last lecture. We calculate the confidence intervals, which are the ranges of values such that the true value of $\beta_1$ is contained in the interval with n percent probability (e.g., 95% or 68%).
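
Percentile confidence intervals can be read directly off the bootstrap draws; this short sketch assumes `betas` is the (B, 2) array from the bootstrap sketch above:

```python
# Percentile confidence intervals for beta_1 from the bootstrap draws
# (`betas` is the (B, 2) array from the previous sketch).
import numpy as np

ci68 = np.percentile(betas[:, 1], [16, 84])      # ~68% interval
ci95 = np.percentile(betas[:, 1], [2.5, 97.5])   # 95% interval
print("68% CI:", ci68, "95% CI:", ci95)
```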

  7. Summary from last lecture. We evaluate the importance of predictors using hypothesis testing, via the t-statistic and p-values: $t = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)}$
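
A hedged sketch of this t-test for $\hat{\beta}_1$, using the standard analytic SE formula for simple linear regression (synthetic data, illustrative names):

```python
# Sketch of the t-test for beta_1: t = (beta1_hat - 0) / SE(beta1_hat),
# with the usual analytic SE for simple linear regression.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
sigma2_hat = np.sum(resid ** 2) / (n - 2)             # residual variance
se_b1 = np.sqrt(sigma2_hat / np.sum((x - x.mean()) ** 2))

t_stat = (b1 - 0.0) / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)       # two-sided p-value
print(t_stat, p_value)
```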

  8. Summary from last lecture. Model Fitness: how well does the model perform at predicting? Comparison of Two Models: how do we choose between two different models? Evaluating Significance of Predictors: does the outcome depend on the predictors? This lecture: how well do we know $\hat{f}$? The confidence intervals of our $\hat{f}$.

  9. Summary. How well do we know $\hat{f}$? The confidence intervals of our $\hat{f}$. • Multi-linear Regression • Formulate it in Linear Algebra • Categorical Variables • Interaction terms • Polynomial Regression • Linear Algebra Formulation

  11. How well do we know $\hat{f}$? Our confidence in $\hat{f}$ is directly connected with our confidence in the $\beta$'s. So for each bootstrap sample, we have one pair $(\hat{\beta}_0, \hat{\beta}_1)$, which we can use to predict y for all x's.

  12. How well do we know $\hat{f}$? Here we show two different sets of models given the fitted coefficients.

  13. How well do we know $\hat{f}$? There is one such regression line for every bootstrapped sample.

  14. How well do we know $\hat{f}$? Below we show all regression lines for a thousand such bootstrapped samples. For a given $x$, we examine the distribution of $\hat{f}(x)$ and determine the mean and standard deviation.

  17. How well do we know $\hat{f}$? For every $x$, we calculate the mean of the models, $\hat{f}(x)$ (shown with a dotted line), and the 95% CI of those models (shaded area). This is the estimated $\hat{f}$.
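
A minimal sketch of how the dotted mean line and shaded band could be computed, assuming `betas` is the (B, 2) array of bootstrapped coefficients from the earlier sketch:

```python
# Sketch: evaluate every bootstrapped line on a grid, then take the
# pointwise mean (dotted line) and 2.5/97.5 percentiles (shaded band).
import numpy as np

x_grid = np.linspace(0, 10, 50)
lines = betas[:, [0]] + betas[:, [1]] * x_grid       # shape (B, 50)
mean_fit = lines.mean(axis=0)                        # mean of the models
lo, hi = np.percentile(lines, [2.5, 97.5], axis=0)   # 95% CI band
```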

  18. Confidence in predicting $\hat{y}$

  19. Confidence in predicting $\hat{y}$. • For a given x, we have a distribution of models $\hat{f}(x)$. • For each of these $\hat{f}(x)$, the prediction is $y \sim N(\hat{f}(x), \sigma_\epsilon^2)$.

  20. Confidence in predicting $\hat{y}$. • For a given x, we have a distribution of models $\hat{f}(x)$. • For each of these $\hat{f}(x)$, the prediction is $y \sim N(\hat{f}(x), \sigma_\epsilon^2)$. • The prediction confidence intervals are then obtained from this predictive distribution; they are wider than the confidence intervals for $\hat{f}$ because they also include the observation noise $\epsilon$.
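
One way to sketch a bootstrap prediction interval at a single point `x0`, again assuming the `betas` array from the earlier bootstrap sketch and an estimated residual standard deviation `sigma_hat` (both assumptions, not given on the slide):

```python
# Sketch of a bootstrap prediction interval at x0: draw
# y ~ N(f_hat(x0), sigma_hat^2) for each bootstrapped model,
# then take percentiles of the draws.
import numpy as np

rng = np.random.default_rng(1)
x0 = 5.0
sigma_hat = 1.0                                      # assumed residual SD
f_draws = betas[:, 0] + betas[:, 1] * x0             # model uncertainty
y_draws = f_draws + rng.normal(0, sigma_hat, size=len(f_draws))  # + noise
print(np.percentile(y_draws, [2.5, 97.5]))           # 95% prediction interval
```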

  21. Lecture Outline. How well do we know $\hat{f}$? The confidence intervals of our $\hat{f}$. • Multi-linear Regression: Brute Force, Exact method, Gradient Descent • Polynomial Regression

  22. Multiple Linear Regression. If you have to guess someone's height, would you rather be told: • their weight, only • their weight and gender • their weight, gender, and income • their weight, gender, income, and favorite number? Of course, you'd always want as much data about a person as possible. Even though height and favorite number may not be strongly related, at worst you could just ignore the information on favorite number. We want our models to be able to take in lots of data as they make their predictions.

  23. Response vs. Predictor Variables. X: predictors, features, covariates. Y: outcome, response variable, dependent variable. The data consist of n observations of p predictors:

      TV     radio  newspaper  sales
      230.1  37.8   69.2       22.1
      44.5   39.3   45.1       10.4
      17.2   45.9   69.3        9.3
      151.5  41.3   58.5       18.5
      180.8  10.8   58.4       12.9
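
A table like this could be loaded as follows; the file name `Advertising.csv` is an assumption, not something given in the slides:

```python
# Loading a dataset shaped like the table above; the file name
# "Advertising.csv" is an assumption.
import pandas as pd

df = pd.read_csv("Advertising.csv")    # columns: TV, radio, newspaper, sales
X = df[["TV", "radio", "newspaper"]]   # n x p predictor matrix
y = df["sales"]                        # response vector
print(X.shape, y.shape)
```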

  24. Multilinear Models. In practice, it is unlikely that any response variable Y depends solely on one predictor X. Rather, we expect Y to be a function of multiple predictors, $f(X_1, \ldots, X_J)$. Using the notation we introduced last lecture, $Y = (y_1, \ldots, y_n)$, $X = (X_1, \ldots, X_J)$ and $X_j = (x_{1j}, \ldots, x_{ij}, \ldots, x_{nj})$. In this case, we can still assume a simple form for $f$, a multilinear form: $Y = f(X_1, \ldots, X_J) + \epsilon = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_J X_J + \epsilon$. Hence, $\hat{f}$ has the form: $\hat{Y} = \hat{f}(X_1, \ldots, X_J) = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \ldots + \hat{\beta}_J X_J$

  25. Multiple Linear Regression. Again, to fit this model means to compute $\hat{\beta}_0, \ldots, \hat{\beta}_J$ that minimize a loss function; we will again choose the MSE as our loss function. Given a set of observations $\{(x_{1,1}, \ldots, x_{1,J}, y_1), \ldots, (x_{n,1}, \ldots, x_{n,J}, y_n)\}$, the data and the model can be expressed in vector notation:

$$Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{1,1} & \cdots & x_{1,J} \\ 1 & x_{2,1} & \cdots & x_{2,J} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n,1} & \cdots & x_{n,J} \end{pmatrix}, \quad \boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_J \end{pmatrix}$$

  26. Multilinear Model, example. For our data: $\text{Sales} = \beta_0 + \beta_1 \times \text{TV} + \beta_2 \times \text{Radio} + \beta_3 \times \text{Newspaper} + \epsilon$. In linear algebra notation:

$$Y = \begin{pmatrix} \text{Sales}_1 \\ \vdots \\ \text{Sales}_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & \text{TV}_1 & \text{Radio}_1 & \text{News}_1 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & \text{TV}_n & \text{Radio}_n & \text{News}_n \end{pmatrix}, \quad \boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_3 \end{pmatrix}$$

so that $Y = X\boldsymbol{\beta} + \epsilon$.
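
A hedged sketch of fitting this model with the statsmodels formula API; the file name is again an assumption, and the column names match the table on slide 23:

```python
# Sketch of the Sales model via the statsmodels formula API.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("Advertising.csv")    # file name assumed
model = smf.ols("sales ~ TV + radio + newspaper", data=df).fit()
print(model.params)                    # beta_0 (Intercept), beta_1, beta_2, beta_3
```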

  27. Multiple Linear Regression. The model takes a simple algebraic form: $Y = X\boldsymbol{\beta} + \epsilon$. Thus, the MSE can be expressed in vector notation as $\text{MSE}(\boldsymbol{\beta}) = \frac{1}{n} \lVert Y - X\boldsymbol{\beta} \rVert^2$. Minimizing the MSE using vector calculus yields: $\hat{\boldsymbol{\beta}} = (X^\top X)^{-1} X^\top Y = \underset{\boldsymbol{\beta}}{\operatorname{argmin}}\ \text{MSE}(\boldsymbol{\beta})$.
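
The normal-equations solution sketched in numpy on synthetic data (in practice `np.linalg.lstsq` is preferred for numerical stability):

```python
# Closed-form (normal-equations) solution: beta_hat = (X^T X)^{-1} X^T Y.
import numpy as np

rng = np.random.default_rng(0)
n, J = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, J))])  # prepend 1s column
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(0, 0.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # solves (X^T X) beta = X^T y
print(beta_hat)
```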

  28. Standard Errors for Multiple Linear Regression. As with simple linear regression, the standard errors can be calculated either using statistical modeling, $\operatorname{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2 (X^\top X)^{-1}$ (the standard errors are the square roots of the diagonal entries), or using the bootstrap.
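
A sketch of these standard errors, continuing the numpy example above (`X`, `y`, `beta_hat` as defined there):

```python
# Standard errors from the diagonal of sigma^2 (X^T X)^{-1}.
import numpy as np

resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (X.shape[0] - X.shape[1])   # unbiased estimate
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov_beta))                          # SE of each beta_j
print(se)
```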

  29. Collinearity. Collinearity refers to the case in which two or more predictors are correlated (related). We will revisit collinearity in the next lecture when we address overfitting, but for now we want to examine how collinearity affects our confidence in the coefficients and, consequently, in the importance of those coefficients.
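
An illustrative sketch (not from the slides) of the effect just described: make two predictors nearly identical and the coefficient standard errors computed via $(X^\top X)^{-1}$ blow up:

```python
# Two nearly collinear predictors inflate the coefficient standard errors.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.99 * x1 + 0.01 * rng.normal(size=n)   # nearly a copy of x1
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + x1 + x2 + rng.normal(0, 1, size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))
print(se)   # SEs for x1 and x2 are far larger than with uncorrelated predictors
```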
