Marc Mehlman Marc Mehlman
Multivariate Regression
Marc H. Mehlman
marcmehlman@yahoo.com
University of New Haven
Marc Mehlman (University of New Haven) Multivariate Regression 1 / 21
Table of Contents
1 Multivariate Regression
2 Confidence Intervals and Significance Tests
3 ANOVA Tables for Multivariate Regression
4 Chapter #11 R Assignment
Multivariate Regression
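Given data
$$(x_1^{(1)}, x_2^{(1)}, \dots, x_k^{(1)}, y_1),\ (x_1^{(2)}, x_2^{(2)}, \dots, x_k^{(2)}, y_2),\ \dots,\ (x_1^{(n)}, x_2^{(n)}, \dots, x_k^{(n)}, y_n),$$
the $i$th observation consists of the values $x_1^{(i)}, x_2^{(i)}, \dots, x_k^{(i)}$ of $k$ explanatory variables and a response $y_i$. Given $(x_1^{(i)}, x_2^{(i)}, \dots, x_k^{(i)})$, the response is modeled as
$$y_i = \beta_0 + \beta_1 x_1^{(i)} + \cdots + \beta_k x_k^{(i)} + \epsilon_i,$$
where the errors $\epsilon_i$ are independent and normally distributed with mean 0 and common standard deviation $\sigma$.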
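Given the data $(x_1^{(1)}, \dots, x_k^{(1)}, y_1), \dots, (x_1^{(n)}, \dots, x_k^{(n)}, y_n)$, the least squares estimates $b_0, b_1, \dots, b_k$ are the values that minimize
$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad \text{where } \hat{y}_i \overset{\text{def}}{=} b_0 + b_1 x_1^{(i)} + \cdots + b_k x_k^{(i)}.$$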
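The least squares estimates can also be computed directly from the normal equations and compared with what `lm` returns. The following is a minimal sketch, assuming the built-in `mtcars` data and the same four explanatory variables used later in these slides; the names `X`, `y`, `b` and `y.hat` are illustrative only.

```r
# Design matrix: a column of 1s for the intercept, then the k = 4 explanatory variables
X <- cbind(1, as.matrix(mtcars[, c("disp", "hp", "wt", "qsec")]))
y <- mtcars$mpg

# Solve the normal equations (X'X) b = X'y for the least squares estimates
b <- solve(t(X) %*% X, t(X) %*% y)

# Fitted values and the minimized sum of squared residuals
y.hat <- X %*% b
sum((y - y.hat)^2)

# lm() should give the same estimates (it uses a QR decomposition internally)
g.lm <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
cbind(normal.eq = as.vector(b), lm = coef(g.lm))
```

In practice `lm` is preferable, since solving the normal equations directly is less numerically stable when the explanatory variables are highly correlated.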
Sum of squared residuals: $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
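The sum of squared residuals is what R turns into the "Residual standard error" in `summary` output: divide by the $n - k - 1$ residual degrees of freedom and take the square root. A short sketch, assuming the `mtcars` model with $k = 4$ explanatory variables used in these slides; `sse` and `s` are illustrative names.

```r
g.lm <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

n <- nrow(mtcars)   # number of observations
k <- 4              # number of explanatory variables

# Sum of squared residuals, then the residual standard error
sse <- sum(residuals(g.lm)^2)
s <- sqrt(sse / (n - k - 1))
s

# Same quantity as stored by summary()
summary(g.lm)$sigma
```

The value of `s` should agree with the 2.622 on 27 degrees of freedom reported by `summary(g.lm)`.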
Confidence Intervals and Significance Tests
> g.lm=lm(mpg~disp+hp+wt+qsec, data=mtcars)
> par(mfrow=c(2,2))
> plot(g.lm)
> par(mfrow=c(1,1))

Does the linear model fit?
[Figure: the four diagnostic plots produced by plot(g.lm): Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage. Labeled points: Chrysler Imperial, Fiat 128, Toyota Corolla, Maserati Bora.]
> summary(g.lm)

Call:
lm(formula = mpg ~ disp + hp + wt + qsec, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
                         1.1712  5.6468

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.329638   8.639032   3.164  0.00383 **
disp         0.002666   0.010738   0.248  0.80576
hp                      0.015613          0.24227
wt                      1.265851          0.00113 **
qsec         0.544160   0.466493   1.166  0.25362
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.622 on 27 degrees of freedom
Multiple R-squared:  0.8351,    Adjusted R-squared:  0.8107
F-statistic: 34.19 on 4 and 27 DF,  p-value: 3.311e-10
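The estimates and standard errors in the coefficient table are what a confidence interval needs: each 95% interval has the form estimate ± t* × (standard error), with t* taken from the t distribution on the residual degrees of freedom. A minimal sketch, assuming the same `mtcars` model; `est`, `se`, `t.star` and `by.hand` are illustrative names.

```r
g.lm <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

# Built-in 95% confidence intervals for the intercept and each coefficient
confint(g.lm, level = 0.95)

# The same intervals by hand from the coefficient table
est <- coef(summary(g.lm))[, "Estimate"]
se <- coef(summary(g.lm))[, "Std. Error"]
t.star <- qt(0.975, df = df.residual(g.lm))   # critical value on 27 degrees of freedom
by.hand <- cbind(lower = est - t.star * se, upper = est + t.star * se)
by.hand
```

An interval that contains 0 corresponds to a coefficient whose t test is not significant at the 0.05 level.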
ANOVA Tables for Multivariate Regression
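$SSA \overset{\text{def}}{=} \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
$SSE \overset{\text{def}}{=} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
$SSTOT \overset{\text{def}}{=} \sum_{i=1}^{n} (y_i - \bar{y})^2$
$MSA \overset{\text{def}}{=} \frac{SSA}{k}$
$MSE \overset{\text{def}}{=} \frac{SSE}{n - k - 1}$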
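The sums of squares and mean squares can be computed directly from the fitted values. The sketch below assumes the usual definitions (SSA as the variation of the fitted values about the mean, SSE as the residual variation, SSTOT as the total variation, with $k$ and $n - k - 1$ degrees of freedom) and the `mtcars` model used in these slides.

```r
g.lm <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

y <- mtcars$mpg
y.hat <- fitted(g.lm)
n <- length(y)
k <- 4

SSA <- sum((y.hat - mean(y))^2)    # variation explained by the model
SSE <- sum((y - y.hat)^2)          # residual variation
SSTOT <- sum((y - mean(y))^2)      # total variation

MSA <- SSA / k
MSE <- SSE / (n - k - 1)

c(SSA = SSA, SSE = SSE, SSTOT = SSTOT, MSA = MSA, MSE = MSE)

# The decomposition SSTOT = SSA + SSE holds for a model with an intercept
all.equal(SSA + SSE, SSTOT)
```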
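To test the null hypothesis that the coefficients of all $k$ explanatory variables are zero, the test statistic is $f = \frac{MSA}{MSE}$. The p-value of this test is $P(F \ge f)$, where $F$ has the $F(k, n - k - 1)$ distribution.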
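The F statistic and its p-value can be reproduced with `pf`. This is a sketch under the assumption that the test is the overall ANOVA F test with $k$ and $n - k - 1$ degrees of freedom, again using the `mtcars` model; `f` and `p.value` are illustrative names.

```r
g.lm <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

y <- mtcars$mpg
k <- 4
df.res <- df.residual(g.lm)        # n - k - 1

MSA <- sum((fitted(g.lm) - mean(y))^2) / k
MSE <- sum(residuals(g.lm)^2) / df.res

# Observed F statistic and upper-tail probability P(F >= f)
f <- MSA / MSE
p.value <- pf(f, df1 = k, df2 = df.res, lower.tail = FALSE)
c(f = f, p.value = p.value)

# summary() stores the same statistic with its degrees of freedom
summary(g.lm)$fstatistic
```

These should match the "F-statistic: 34.19 on 4 and 27 DF, p-value: 3.311e-10" line of the `summary` output.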
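$R^2 = \frac{SSA}{SSTOT}$. The multiple correlation coefficient is $R = \sqrt{R^2}$. The adjusted coefficient is
$$R^2_{adj} = 1 - \frac{n - 1}{n - k - 1}\,(1 - R^2).$$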
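Both quantities are easy to check by hand against `summary`. The sketch assumes the standard definitions $R^2 = SSA/SSTOT$ and $R^2_{adj} = 1 - \frac{n-1}{n-k-1}(1 - R^2)$ and uses the `mtcars` model from these slides; `R2` and `R2.adj` are illustrative names.

```r
g.lm <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

y <- mtcars$mpg
n <- length(y)
k <- 4

SSA <- sum((fitted(g.lm) - mean(y))^2)
SSTOT <- sum((y - mean(y))^2)

R2 <- SSA / SSTOT
R2.adj <- 1 - (n - 1) / (n - k - 1) * (1 - R2)
c(R2 = R2, R2.adj = R2.adj)

# Values stored by summary()
c(summary(g.lm)$r.squared, summary(g.lm)$adj.r.squared)
```

Adding an explanatory variable never lowers $R^2$, whereas the adjusted version is penalized for each extra variable, which is why it is the more useful of the two for comparing models of different sizes.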
> g.lm=lm(mpg~disp+hp+wt+qsec, data=mtcars)
> summary(g.lm)

Call:
lm(formula = mpg ~ disp + hp + wt + qsec, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
                         1.1712  5.6468

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.329638   8.639032   3.164  0.00383 **
disp         0.002666   0.010738   0.248  0.80576
hp                      0.015613          0.24227
wt                      1.265851          0.00113 **
qsec         0.544160   0.466493   1.166  0.25362
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.622 on 27 degrees of freedom
Multiple R-squared:  0.8351,    Adjusted R-squared:  0.8107
F-statistic: 34.19 on 4 and 27 DF,  p-value: 3.311e-10
> h.lm=lm(mpg~wt, data=mtcars)
> summary(h.lm)

Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
                         1.4096  6.8727

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt                       0.5591
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528,    Adjusted R-squared:  0.7446
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
> anova(g.lm)
Analysis of Variance Table

Response: mpg
          Df Sum Sq Mean Sq  F value    Pr(>F)
disp       1 808.89  808.89 117.6500 2.415e-11 ***
hp         1  33.67   33.67   4.8965  0.035553 *
wt         1  88.50   88.50  12.8724  0.001302 **
qsec       1   9.36    9.36   1.3607  0.253616
Residuals 27 185.64    6.88
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
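When reading this table it helps to know that `anova` applied to a single `lm` fit reports sequential sums of squares: each row is the extra variation explained when that variable is added after the ones listed above it. The rows therefore add up to the model sum of squares, and they change if the variables are entered in a different order. A short sketch, assuming the same `mtcars` models; `g2.lm` is an illustrative reordering, not something from the slides.

```r
g.lm <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
tab <- anova(g.lm)

# The four sequential sums of squares add up to the model sum of squares
seq.ss <- tab[c("disp", "hp", "wt", "qsec"), "Sum Sq"]
sum(seq.ss)

# Same variables, different order: the individual rows change,
# but the residual row stays the same
g2.lm <- lm(mpg ~ wt + qsec + disp + hp, data = mtcars)
anova(g2.lm)

# Comparing nested models: does adding disp, hp and qsec to wt help?
h.lm <- lm(mpg ~ wt, data = mtcars)
anova(h.lm, g.lm)
```

Because of this order dependence, the per-variable F tests in `anova(g.lm)` generally differ from the t tests in `summary(g.lm)`, except for the variable entered last.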
Chapter #11 R Assignment
First enter into R:
> class(state.x77)                 # "lm" needs a data.frame not a matrix
[1] "matrix"
> st = as.data.frame(state.x77)    # make state.x77 a data.frame
> class(st)                        # "st" is a data.frame
[1] "data.frame"
> colnames(st)[4] = "Life.Exp"     # no spaces in variable names
> colnames(st)[6] = "HS.Grad"      # no spaces in variable names

1 Do a multivariate regression with “Life.Exp” as the response variable and “Population”, “Income”, “Illiteracy”, “Murder”, “HS.Grad”, “Frost” and “Area” as explanatory variables.
  (a) Show that the multivariate regression linear model fits this data.
  (b) What are $R^2$ and adjusted $R^2$?
  (c) Which explanatory variables are relevant at the 0.05 significance level?
  (d) Find 95% confidence intervals for the y-intercept and for the coefficient of each explanatory variable.

2 Do another multivariate regression, but only with explanatory variables “Murder” and “HS.Grad”.
  (a) Show that the multivariate regression linear model fits this data.
  (b) What are $R^2$ and adjusted $R^2$?
  (c) Find 95% confidence intervals for the y-intercept and for the coefficient of each explanatory variable.

3 Comparing the adjusted $R^2$ values in the above two problems, what do you conclude?