

SLIDE 1

Multivariate Regression

Marc H. Mehlman

marcmehlman@yahoo.com

University of New Haven

Marc Mehlman (University of New Haven) Multivariate Regression 1 / 21

SLIDE 2

Table of Contents

1 Multivariate Regression
2 Confidence Intervals and Significance Tests
3 ANOVA Tables for Multivariate Regression
4 Chapter #11 R Assignment


SLIDE 3

Multivariate Regression

SLIDE 4

Multivariate Regression

Given multivariate data,

$(x_1^{(1)}, x_2^{(1)}, \cdots, x_k^{(1)}, y_1),\ (x_1^{(2)}, x_2^{(2)}, \cdots, x_k^{(2)}, y_2),\ \cdots,\ (x_1^{(n)}, x_2^{(n)}, \cdots, x_k^{(n)}, y_n)$

where $(x_1^{(i)}, x_2^{(i)}, \cdots, x_k^{(i)})$ is a predictor of the response $y_i$, one explores the following possible model.

Definition (Statistical Model of Multivariate Linear Regression)
Given a $k$-dimensional multivariate predictor, $(x_1^{(i)}, x_2^{(i)}, \cdots, x_k^{(i)})$, the response, $y_i$, is

$y_i = \beta_0 + \beta_1 x_1^{(i)} + \cdots + \beta_k x_k^{(i)} + \epsilon_i,$

where $\beta_0 + \beta_1 x_1^{(i)} + \cdots + \beta_k x_k^{(i)}$ is the mean response. The noise terms, the $\epsilon_i$'s, are assumed to be independent of each other and to be randomly sampled from $N(0, \sigma)$. The parameters of the model are $\beta_0, \beta_1, \cdots, \beta_k$ and $\sigma$.
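A quick way to see the model in action is to simulate data from it and recover the parameters with `lm`. The sketch below is illustrative only: the sample size, predictor names, and parameter values are all invented.

```r
# Simulate n = 200 observations from the model with k = 2 predictors.
# (Names and parameter values here are made up for illustration.)
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
beta0 <- 1; beta1 <- 2; beta2 <- -3; sigma <- 0.5
eps <- rnorm(n, mean = 0, sd = sigma)        # noise sampled from N(0, sigma)
y <- beta0 + beta1 * x1 + beta2 * x2 + eps   # mean response plus noise
fit <- lm(y ~ x1 + x2)
round(coef(fit), 2)   # estimates land close to (1, 2, -3)
```

With this much data and little noise, the fitted coefficients sit within a few hundredths of the true parameters.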


SLIDE 5

Multivariate Regression

Definition
Given a multivariate normal sample,

$(x_1^{(1)}, \cdots, x_k^{(1)}, y_1),\ \cdots,\ (x_1^{(n)}, \cdots, x_k^{(n)}, y_n),$

the least-squares multiple regression equation, $\hat{y} = b_0 + b_1 x_1 + \cdots + b_k x_k$, is the linear equation that minimizes

$\sum_{j=1}^{n} (\hat{y}_j - y_j)^2$, where $\hat{y}_j \stackrel{\text{def}}{=} b_0 + b_1 x_1^{(j)} + \cdots + b_k x_k^{(j)}$.
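The minimizing property can be checked numerically: any perturbation of the coefficients that `lm` returns can only increase the sum of squared residuals. A small sketch on the built-in `mtcars` data (the perturbation sizes are arbitrary):

```r
# lm's coefficients minimize the sum of squared residuals:
# nudging them in any direction makes the sum larger.
fit <- lm(mpg ~ wt + hp, data = mtcars)
ss <- function(b) {
  yhat <- b[1] + b[2] * mtcars$wt + b[3] * mtcars$hp
  sum((yhat - mtcars$mpg)^2)
}
ss(coef(fit))                  # the minimal sum of squares
ss(coef(fit) + c(0.5, 0, 0))   # larger
ss(coef(fit) + c(0, -0.1, 0))  # larger still
```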


SLIDE 6

Multivariate Regression

There must be at least $k + 2$ data points to obtain the estimators $b_0$, the $b_j$'s and

$s^2 \stackrel{\text{def}}{=} \dfrac{\sum_{j=1}^{n} (y_j - \hat{y}_j)^2}{n - k - 1}$

of $\beta_0$, the $\beta_j$'s and $\sigma^2$, where

$b_0$, the $y$-intercept, is the unbiased least-squares estimator of $\beta_0$.
$b_j$, the coefficient of $x_j$, is the unbiased least-squares estimator of $\beta_j$.
$s^2$ is an unbiased estimator of $\sigma^2$ and $s$ is an estimator of $\sigma$.

Due to computational intensity, computers are used to obtain $b_0$, the $b_j$'s and $s^2$.
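For the `mtcars` model used later in these slides ($k = 4$ predictors, $n = 32$ cars), $s$ can be computed directly from this definition and compared with what R reports:

```r
# s^2 = SSE / (n - k - 1); for mpg ~ disp + hp + wt + qsec,
# k = 4 and n = 32, so s is reported on 27 degrees of freedom.
fit <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
n <- nrow(mtcars); k <- 4
s2 <- sum(residuals(fit)^2) / (n - k - 1)
sqrt(s2)     # about 2.622, the "residual standard error" in summary(fit)
sigma(fit)   # R computes the same quantity
```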


SLIDE 7

Confidence Intervals and Significance Tests

SLIDE 8

Confidence Intervals and Significance Tests

Due to computational intensity, computer programs are used for multiple regression. In particular, computers are used to calculate the $SE_{b_j}$'s, the standard errors of the $b_j$'s.

Theorem
To test the hypothesis $H_0 : \beta_j = 0$, use the test statistic $t = \dfrac{b_j}{SE_{b_j}} \sim t(n - k - 1)$ under $H_0$. A level $(1 - \alpha)100\%$ confidence interval for $\beta_j$ is $b_j \pm t^*(n - k - 1)\, SE_{b_j}$.

Accepting $H_0 : \beta_j = 0$ is accepting that there is no linear association between $X_j$ and $Y$, i.e. that the correlation between $X_j$ and $Y$ is zero.
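The theorem can be replayed by hand against R's own output; here for the `wt` coefficient of the model fitted in the next example:

```r
# Reproduce the t statistic, p-value, and 95% CI for wt by hand.
fit <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
est <- summary(fit)$coefficients
n <- nrow(mtcars); k <- 4
t_wt <- est["wt", "Estimate"] / est["wt", "Std. Error"]      # b_j / SE_{b_j}
p_wt <- 2 * pt(abs(t_wt), df = n - k - 1, lower.tail = FALSE)
tstar <- qt(0.975, df = n - k - 1)                           # t*(n - k - 1)
ci_wt <- est["wt", "Estimate"] + c(-1, 1) * tstar * est["wt", "Std. Error"]
ci_wt   # agrees with confint(fit)["wt", ]
```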


SLIDE 9

Confidence Intervals and Significance Tests

Example

> g.lm=lm(mpg~disp+hp+wt+qsec, data=mtcars)
> par(mfrow=c(2,2))
> plot(g.lm)
> par(mfrow=c(1,1))

Does the linear model fit?

[Figure: the four standard lm diagnostic plots — Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage with Cook's distance contours. The Chrysler Imperial, Fiat 128, Toyota Corolla and Maserati Bora are the labeled extreme points.]


SLIDE 10

Confidence Intervals and Significance Tests

Example (cont.)

> summary(g.lm)

Call:
lm(formula = mpg ~ disp + hp + wt + qsec, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-3.8664 -1.5819 -0.3788  1.1712  5.6468

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.329638   8.639032   3.164  0.00383 **
disp         0.002666   0.010738   0.248  0.80576
hp          -0.018666   0.015613  -1.196  0.24227
wt          -4.609123   1.265851  -3.641  0.00113 **
qsec         0.544160   0.466493   1.166  0.25362
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.622 on 27 degrees of freedom
Multiple R-squared:  0.8351,    Adjusted R-squared:  0.8107
F-statistic: 34.19 on 4 and 27 DF,  p-value: 3.311e-10


SLIDE 11

Confidence Intervals and Significance Tests

Example (cont.)
And to find confidence intervals for the coefficients:

> confint(g.lm)
                  2.5 %      97.5 %
(Intercept)  9.60380809 45.05546784
disp        -0.01936545  0.02469831
hp          -0.05070153  0.01336912
wt          -7.20643496 -2.01181027
qsec        -0.41300458  1.50132521


SLIDE 12

ANOVA Tables for Multivariate Regression

SLIDE 13

ANOVA Tables for Multivariate Regression

Definition

$SSA \stackrel{\text{def}}{=}$ Sum of Squares of Model $= \sum_{j=1}^{n} (\hat{y}_j - \bar{y})^2$

$SSE \stackrel{\text{def}}{=}$ Sum of Squares of Error $= \sum_{j=1}^{n} (y_j - \hat{y}_j)^2$

$SSTOT \stackrel{\text{def}}{=}$ Sum of Squares of Total $= \sum_{j=1}^{n} (y_j - \bar{y})^2$

$MSA \stackrel{\text{def}}{=}$ Mean Square of Model $= \dfrac{SSA}{k}$

$MSE \stackrel{\text{def}}{=}$ Mean Square of Error $= \dfrac{SSE}{n - k - 1}$

Theorem
$SSTOT = SSA + SSE$.
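The decomposition in the theorem can be verified numerically on the running `mtcars` example:

```r
# Verify SSTOT = SSA + SSE for the model from the running example.
fit <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
y <- mtcars$mpg
yhat <- fitted(fit)
SSA   <- sum((yhat - mean(y))^2)   # model sum of squares
SSE   <- sum((y - yhat)^2)         # error sum of squares
SSTOT <- sum((y - mean(y))^2)      # total sum of squares
all.equal(SSTOT, SSA + SSE)        # TRUE
```

The identity holds exactly (up to floating point) for any least-squares fit that includes an intercept.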


SLIDE 14

ANOVA Tables for Multivariate Regression

Theorem (ANOVA F Test for Multivariate Regression)
The test statistic for $H_0 : \beta_1 = \beta_2 = \cdots = \beta_k = 0$ versus $H_A$: not $H_0$ is $f = \dfrac{MSA}{MSE}$. The p-value of the test is $P(F \geq f)$, where $F \sim F(k, n - k - 1)$ under $H_0$.

Statistical software usually summarizes the calculations and conclusion above in an ANOVA table:

Definition (ANOVA Table)

Source   df          SS      MS       F         p-value
Model    k           SSA     MSA      MSA/MSE   P(F(k, n - k - 1) >= f)
Error    n - k - 1   SSE     MSE
Total    n - 1       SSTOT
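The F statistic and its p-value can be computed directly from the definitions and checked against what `summary` reports:

```r
# Compute f = MSA/MSE and its p-value by hand for the running example.
fit <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
y <- mtcars$mpg; n <- nrow(mtcars); k <- 4
MSA <- sum((fitted(fit) - mean(y))^2) / k
MSE <- sum(residuals(fit)^2) / (n - k - 1)
f <- MSA / MSE                               # about 34.19
p <- pf(f, k, n - k - 1, lower.tail = FALSE) # about 3.3e-10
```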


SLIDE 15

ANOVA Tables for Multivariate Regression

Definition
The squared multiple correlation is given by $R^2 \stackrel{\text{def}}{=} \dfrac{SSA}{SSTOT}$. The multiple correlation coefficient is just $R = \sqrt{R^2}$.

SSA measures how much of the variation in the data is explained by the model. By taking the ratio of SSA to the total amount of variation, SSTOT, one obtains $R^2$, the proportion of the variation that is explained by the model. In fact, $R$ is just the correlation between the observations and the predicted values.

Inflation Problem: As $k$ increases, $R^2$ increases, but the increase in predictability is illusory.
Solution: It is best to use

Definition
The adjusted coefficient of determination is $R^2_{adj} = 1 - \dfrac{n - 1}{n - k - 1}(1 - R^2)$.


SLIDE 16

ANOVA Tables for Multivariate Regression

> g.lm=lm(mpg~disp+hp+wt+qsec, data=mtcars)
> summary(g.lm)

Call:
lm(formula = mpg ~ disp + hp + wt + qsec, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-3.8664 -1.5819 -0.3788  1.1712  5.6468

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.329638   8.639032   3.164  0.00383 **
disp         0.002666   0.010738   0.248  0.80576
hp          -0.018666   0.015613  -1.196  0.24227
wt          -4.609123   1.265851  -3.641  0.00113 **
qsec         0.544160   0.466493   1.166  0.25362
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.622 on 27 degrees of freedom
Multiple R-squared:  0.8351,    Adjusted R-squared:  0.8107
F-statistic: 34.19 on 4 and 27 DF,  p-value: 3.311e-10

Over 80% of the variation is explained by the model, but it seems that only weight matters.


SLIDE 17

ANOVA Tables for Multivariate Regression

> h.lm=lm(mpg~wt, data=mtcars)
> summary(h.lm)

Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5432 -2.3647 -0.1252  1.4096  6.8727

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528,    Adjusted R-squared:  0.7446
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

Using only weight, about 75% of the variation in the model is accounted for. Displacement, horsepower and quarter-second times did not have much predictive worth.


SLIDE 18

ANOVA Tables for Multivariate Regression

> anova(g.lm)
Analysis of Variance Table

Response: mpg
          Df Sum Sq Mean Sq  F value    Pr(>F)
disp       1 808.89  808.89 117.6500 2.415e-11 ***
hp         1  33.67   33.67   4.8965  0.035553 *
wt         1  88.50   88.50  12.8724  0.001302 **
qsec       1   9.36    9.36   1.3607  0.253616
Residuals 27 185.64    6.88
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since

SSA = 808.89 + 33.67 + 88.50 + 9.36 = 940.42
MSA = 940.42/4 = 235.105
MSE = 185.64/27 ≈ 6.88
f = MSA/MSE ≈ 34.19
P(F(4, 27) ≥ 34.19) = 3.315872e-10

one has

Source   df   Sum of Squares   Mean Square       F              p
Model     4           940.42       235.105   34.19   3.315872e-10
Error    27           185.64          6.88
Total    31         1,126.06


SLIDE 19

ANOVA Tables for Multivariate Regression

Factor Analysis: One strives for the best fit (largest $R^2$ and smallest p-value associated with the F statistic) with the fewest independent variables. Independent variables that are "mostly independent" of the dependent variable, or highly correlated with another independent variable, can be discarded. It is an art. Doing this mechanically (by machine) is called stepwise regression.
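Base R's `step` function is one mechanical version of this idea. Note it selects by AIC rather than by the $R^2$/p-value criteria described above, so the sketch below illustrates the flavor of stepwise regression rather than that exact procedure:

```r
# Backward stepwise selection by AIC on the running mtcars example.
full <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
reduced <- step(full, direction = "backward", trace = 0)
formula(reduced)          # the retained explanatory variables
c(AIC(full), AIC(reduced))  # the reduced model's AIC is no worse
```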


SLIDE 20

Chapter #11 R Assignment

SLIDE 21

Chapter #11 R Assignment

First enter into R:

> class(state.x77)               # "lm" needs a data.frame, not a matrix
[1] "matrix"
> st = as.data.frame(state.x77)  # make state.x77 a data.frame
> class(st)                      # "st" is a data.frame
[1] "data.frame"
> colnames(st)[4] = "Life.Exp"   # no spaces in variable names
> colnames(st)[6] = "HS.Grad"    # no spaces in variable names

1. Do a multivariate regression with "Life.Exp" as the response variable and "Population", "Income", "Illiteracy", "Murder", "HS.Grad", "Frost" and "Area" as explanatory variables.
   (a) Show that the multivariate regression linear model fits this data.
   (b) What are $R^2$ and adjusted $R^2$?
   (c) Which explanatory variables are relevant at the 0.05 significance level?
   (d) Find 95% confidence intervals for the y-intercept and for each of the coefficients of the explanatory variables.

2. Do another multivariate regression, but with only the explanatory variables "Murder" and "HS.Grad".
   (a) Show that the multivariate regression linear model fits this data.
   (b) What are $R^2$ and adjusted $R^2$?
   (c) Find 95% confidence intervals for the y-intercept and for each of the coefficients of the explanatory variables.

3. Comparing the adjusted $R^2$ in the above two problems, what do you conclude?
