  1. BUS41100 Applied Regression Analysis
  Week 3: Finish SLR Inference Then Multiple Linear Regression
  I. Confidence and Prediction Intervals
  II. Polynomials, log transformation, categorical variables, interactions & main effects
  Max H. Farrell, The University of Chicago Booth School of Business

  2. Quick Recap
  I. We drew a line through a cloud of points: Ŷ = b0 + b1 X and e = Y − Ŷ
  ◮ It was a good line because:
  1. It minimized the SSE
  2. It extracted all linear information
  3. It implemented the model
  II. The regression model helped us understand uncertainty: Y = β0 + β1 X + ε, ε ∼ N(0, σ²)
  ◮ Sampling distribution: estimates change as data changes
  b1 ∼ N(β1, σ²_b1), where σ²_b1 = σ² / ((n − 1) s²_x)

  3. Our work today
  I. Finish SLR
  ◮ Put sampling distribution to work
  ◮ Communicable summaries of uncertainty
  II. Multiple Linear Regression: Y = β0 + β1 X1 + β2 X2 + ε, ε ∼ N(0, σ²)
  ◮ Everything carries over from SLR
  ◮ Interpretation requires one extra piece

  4. Summarizing the sampling distribution
  Remember the two types of regression questions: 1. Model 2. Prediction
  Model: Y = β0 + β1 X + ε    Fitted line: Ŷ = b0 + b1 X    Data: Y = b0 + b1 X + e
  1. Properties of βk
  ◮ Sign: Does Y go up when X goes up?
  ◮ Magnitude: By how much?
  ⇒ A confidence interval captures uncertainty about β
  2. Predicting Y
  ◮ Best guess for Y given (or “conditional on”) X.
  ⇒ A prediction interval captures uncertainty about Y

  5. Confidence Intervals and Testing
  Suppose we think that the true βj is equal to some value βj⁰ (often 0). Does the data support that guess?
  We can rephrase this in terms of competing hypotheses:
  (Null) H0: βj = βj⁰    (Alternative) H1: βj ≠ βj⁰
  Our hypothesis test will either reject or fail to reject the null hypothesis
  ◮ If the hypothesis test rejects the null hypothesis, we have statistical support for our claim
  ◮ Gives only a “yes” or “no” answer!
  ◮ You choose the “probability” of false rejection: α

  6. We use bj for our test about βj.
  ◮ Reject H0 if bj is “far” from βj⁰; assume H0 when close
  ◮ What we really care about is: how many standard errors bj is away from βj⁰
  ◮ standard error = s_b1, cf. σ_b1
  The t-statistic for this test is z_bj = (bj − βj⁰) / s_bj ∼ N(0, 1) under H0.
  “Big” |z_bj| makes our guess βj⁰ look silly ⇒ reject
  ◮ If H0 is true, then P[ |z_bj| > 2 ] < 0.05 = α
  But: |z_bj| = |(bj − βj⁰) / s_bj| > 2 ⇔ βj⁰ ∉ (bj ± 2 s_bj)
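The test statistic and the rule-of-thumb rejection rule above can be sketched in a few lines. Python is used here purely for illustration (the course code is R), and the estimate, hypothesized value, and standard error are made-up numbers:

```python
def z_stat(b, b0, se):
    """Standardized distance of the estimate b from the hypothesized value b0."""
    return (b - b0) / se

def reject_at_05(b, b0, se):
    """Two-sided test at roughly alpha = 0.05 using the rule-of-thumb cutoff |z| > 2."""
    return abs(z_stat(b, b0, se)) > 2

# Hypothetical estimate b = 1.3 with standard error 0.5
print(z_stat(1.3, 0.0, 0.5))       # 2.6 standard errors away from zero
print(reject_at_05(1.3, 0.0, 0.5)) # reject H0: beta = 0
print(reject_at_05(1.3, 1.0, 0.5)) # fail to reject H0: beta = 1 (only 0.6 se away)
```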

  7. Confidence intervals
  Since bj ∼ N(βj, σ²_bj),
  1 − α = P[ z_α/2 < (bj − βj) / s_bj < z_1−α/2 ] = P[ βj ∈ (bj ± z_α/2 s_bj) ]
  Why should we care about confidence intervals?
  ◮ The confidence interval completely captures the information in the data about the parameter.
  ◮ Center is your estimate
  ◮ Length is how sure you are about your estimate
  ◮ Any value outside would be rejected by a test!
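The last bullet — the duality between the confidence interval and the test — can be checked directly. A minimal Python sketch (illustrative only; the course uses R, and the estimate 0.94 with standard error 0.03 is hypothetical):

```python
def conf_int(b, se, z=1.96):
    """(1 - alpha) CI: estimate plus/minus z standard errors (z = 1.96 for 95%)."""
    return (b - z * se, b + z * se)

def rejected(value, b, se, z=1.96):
    """The test rejects a hypothesized value exactly when |z-statistic| exceeds z."""
    return abs((b - value) / se) > z

lo, hi = conf_int(0.94, 0.03)           # hypothetical estimate and standard error
outside = 1.0 < lo or 1.0 > hi          # is the value 1.0 outside the interval?
print((lo, hi))
print(rejected(1.0, 0.94, 0.03) == outside)  # duality: outside the CI <=> rejected
```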

  8. Real life or pretend?
  P[ β1 ∈ (b1 ± 2 σ_b1) ] = 95% ?
  or
  P[ β1 ∈ (b1 ± 2 σ_b1) ] = 0 or 1 ?
  [Figure: intervals from repeated samples scattered around the true β1]

  9. Level, Size, and p-values
  The p-value is P[ |Z| > |z_bj| ].
  ◮ Test with size/level = p-value almost rejects
  ◮ CI of level 1 − (p-value) just excludes the hypothesized value βj⁰
  [Figure: standard normal density with cutoffs z_α/2, z_1−α/2 and ±|z_bj|; area 1 − α in the center and p/2 in each tail — level α vs. p-value]

  10. Example: revisit the CAPM regression for the Windsor fund.
  Does Windsor have a non-zero intercept? (i.e., does it make/lose money independent of the market?)
  H0: β0 = 0    H1: β0 ≠ 0
  ◮ Recall: the intercept estimate b0 is the stock’s “alpha”
  > summary(windsor.reg) ## output abbreviated
                  Estimate Std. Error t value Pr(>|t|)
  (Intercept)     0.003647   0.001409   2.588   0.0105 *
  mfund$valmrkt   0.935717   0.029150  32.100   <2e-16 ***
  > 2*pnorm(-abs(0.003647/.001409))
  [1] 0.009643399
  We reject the null at α = .05; Windsor does have an “alpha” over the market.
  ◮ Why set α = .05? What about at α = 0.01?
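As a check, the p-value computation above can be reproduced outside R. This Python sketch (Python is used only for illustration; the course code is R) reuses the Estimate and Std. Error from the printed output, with math.erf standing in for R's pnorm:

```python
import math

def pnorm(z):
    """Standard normal CDF, the analogue of R's pnorm()."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Intercept estimate and standard error from the regression output above
b0, se0 = 0.003647, 0.001409
z = b0 / se0
p_value = 2 * pnorm(-abs(z))
print(round(p_value, 6))   # matches R's 0.009643399 to printed precision
```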

  11. Now let’s ask whether or not Windsor moves in a different way than the market (e.g., is it more conservative?).
  ◮ Recall that the estimate of the slope b1 is the “beta” of the stock.
  This is a rare case where the null hypothesis is not zero:
  H0: β1 = 1, Windsor is just the market (+ alpha).
  H1: β1 ≠ 1, Windsor softens or exaggerates market moves.
  This time, R’s output t / p values are not what we want (why?).
  > summary(windsor.reg) ## output abbreviated
                  Estimate Std. Error t value Pr(>|t|)
  (Intercept)     0.003647   0.001409   2.588   0.0105 *
  mfund$valmrkt   0.935717   0.029150  32.100   <2e-16 ***

  12. But we can get the appropriate values easily:
  ◮ Test and p-value:
  > b1 <- 0.935717; sb1 <- 0.029150
  > zb1 <- (b1 - 1)/sb1
  > zb1
  [1] -2.205249
  > 2*pnorm(-abs(zb1))
  [1] 0.02743665
  ◮ Confidence Interval
  > confint(windsor.reg, level=0.95)
                       2.5 %      97.5 %
  (Intercept)    0.000865657 0.006428105
  mfund$valmrkt  0.878193149 0.993240873
  Reject at α = .05, so Windsor softens market moves.
  ◮ What about other values of α?
  confint(windsor.reg, level=0.99)
  confint(windsor.reg, level=(1-2*pt(-abs(zb1), df=178)))
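The non-zero-null test above is the same arithmetic with 1 subtracted instead of 0. This Python sketch (illustration only; the slide's R code is authoritative) reproduces the z-statistic and p-value from the printed estimate and standard error:

```python
import math

def pnorm(z):
    """Standard normal CDF (R's pnorm)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Slope estimate and standard error from the regression output
b1, sb1 = 0.935717, 0.029150
zb1 = (b1 - 1.0) / sb1          # the null is beta1 = 1, not 0
p_value = 2 * pnorm(-abs(zb1))
print(round(zb1, 6), round(p_value, 6))  # matches R's -2.205249 and 0.02743665
```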

  13. Forecasting & Prediction Intervals
  The conditional forecasting problem:
  ◮ Given covariate Xf and sample data {Xi, Yi}, i = 1, ..., n, predict the “future” observation Yf.
  The solution is to use our LS fitted value: Ŷf = b0 + b1 Xf.
  ◮ That’s the easy bit.
  The hard (and very important!) part of forecasting is assessing uncertainty about our predictions.
  One method is to specify a prediction interval
  ◮ a range of Y values that are likely, given an X value.

  14. The least squares line is a prediction rule: Read Ŷ off the line for a new X.
  ◮ It’s not a perfect prediction: Ŷ is what we expect.
  [Figure: scatter plot of price vs. size with the fitted line; Ŷ is read off the line at a new X]

  15. If we use Ŷf, our prediction error has two pieces:
  ef = Yf − Ŷf = Yf − b0 − b1 Xf
  [Figure: at Xf, the gap between Yf and Ŷf splits into ε, around the true line E[Yf | Xf] = β0 + β1 Xf, and fit error, between the true line and the estimated line b0 + b1 Xf]

  16. We can decompose ef into two sources of error:
  ◮ Inherent idiosyncratic randomness (due to ε).
  ◮ Estimation error in the intercept and slope (i.e., discrepancy between our line and “the truth”).
  ef = Yf − Ŷf = (Yf − E[Yf | Xf]) + (E[Yf | Xf] − Ŷf)
     = εf + (E[Yf | Xf] − Ŷf)
     = εf + (β0 − b0) + (β1 − b1) Xf.
  The variance of our prediction error is thus
  var(ef) = var(εf) + var(E[Yf | Xf] − Ŷf) = σ² + var(Ŷf)

  17. From the sampling distributions derived earlier, var(Ŷf) is
  var(b0 + b1 Xf) = var(b0) + Xf² var(b1) + 2 Xf cov(b0, b1)
                  = σ² [ 1/n + (Xf − X̄)² / ((n − 1) s²_x) ].
  Replacing σ² with s² gives the standard error for Ŷf.
  And hence the variance of our predictive error is
  var(ef) = σ² [ 1 + 1/n + (Xf − X̄)² / ((n − 1) s²_x) ].
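The algebra collapsing the three variance terms into one can be verified numerically. This Python sketch fills in the standard SLR sampling-variance formulas var(b0) = σ²(1/n + X̄²/Sxx), var(b1) = σ²/Sxx, and cov(b0, b1) = −σ²X̄/Sxx with Sxx = (n − 1)s²_x (standard results assumed here, not derived on this slide), using made-up numbers:

```python
# Made-up values, for illustration only
sigma2 = 4.0          # error variance sigma^2
n = 15
xbar = 1.85           # sample mean of X
sx2 = 0.6             # sample variance of X
Sxx = (n - 1) * sx2   # (n - 1) * s_x^2
Xf = 3.0

# Standard SLR sampling-variance formulas (assumed, not shown on the slide)
var_b1 = sigma2 / Sxx
var_b0 = sigma2 * (1.0 / n + xbar**2 / Sxx)
cov_b0b1 = -sigma2 * xbar / Sxx

# First line of the slide: expand var(b0 + b1 * Xf) term by term
lhs = var_b0 + Xf**2 * var_b1 + 2 * Xf * cov_b0b1
# Second line: sigma^2 * (1/n + (Xf - xbar)^2 / ((n - 1) s_x^2))
rhs = sigma2 * (1.0 / n + (Xf - xbar)**2 / Sxx)
print(lhs, rhs)   # the two expressions agree
```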

  18. Putting it all together, we have that
  Yf ∼ N( Ŷf, σ² [ 1 + 1/n + (Xf − X̄)² / ((n − 1) s²_x) ] )
  A (1 − α)100% confidence/prediction interval for Yf is thus
  b0 + b1 Xf ± z_α/2 × s × sqrt( 1 + 1/n + (Xf − X̄)² / ((n − 1) s²_x) ).

  19. Looking closer at what we’ll call
  s_pred = s × sqrt( 1 + 1/n + (Xf − X̄)² / ((n − 1) s²_x) ) = sqrt( s² + s²_fit ).
  A large predictive error variance (high uncertainty) comes from
  ◮ Large s (i.e., large ε’s).
  ◮ Small n (not enough data).
  ◮ Small s_x (not enough observed spread in covariates).
  ◮ Large (Xf − X̄).
  The first three are familiar... what about the last one?
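The s_pred formula makes the last bullet concrete: prediction is most precise at Xf = X̄ and degrades as Xf moves away. A Python sketch with made-up inputs (illustration only; in R this is what predict() computes internally):

```python
import math

def s_pred(s, n, sx2, Xf, xbar):
    """Predictive standard error: s * sqrt(1 + 1/n + (Xf - xbar)^2 / ((n-1) s_x^2))."""
    return s * math.sqrt(1.0 + 1.0 / n + (Xf - xbar)**2 / ((n - 1) * sx2))

# Made-up inputs: residual scale 10, n = 15, covariate variance 0.6, mean 1.85
near = s_pred(s=10.0, n=15, sx2=0.6, Xf=1.85, xbar=1.85)  # at the mean of X
far  = s_pred(s=10.0, n=15, sx2=0.6, Xf=3.50, xbar=1.85)  # far from the mean
print(near < far)   # True: larger (Xf - xbar) means more predictive uncertainty
print(near > 10.0)  # True: s_pred always exceeds s, even at Xf = xbar
```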

  20. For Xf far from our X̄, the space between the true and estimated lines is magnified...
  [Figure: true line and estimated line crossing near the point of means (X̄, Ȳ); the gap between them is small where (Xf − X̄) is small and large where (Xf − X̄) is large]

  21. ⇒ The prediction (conf.) interval needs to widen away from X̄

  22. Returning to our housing data for an example...
  > Xf <- data.frame(size=c(mean(size), 2.5, max(size)))
  > cbind(Xf, predict(reg, newdata=Xf, interval="prediction"))
        size      fit       lwr      upr
  1 1.853333 104.4667  72.92080 136.0125
  2 2.500000 127.3496  95.18501 159.5142
  3 3.500000 162.7356 127.36982 198.1013
  ◮ interval="prediction" gives lwr and upr, otherwise we just get fit
  ◮ s_pred is not shown in this output

  23. We can get s_pred from the predict output.
  > p <- predict(reg, newdata=Xf, se.fit=TRUE)
  > s <- p$residual.scale
  > sfit <- p$se.fit
  > spred <- sqrt(s^2 + sfit^2)
  > b <- reg$coef
  > b[1] + b[2]*Xf[1,] + c(0, -1, 1)*qnorm(.975)*spred[1]
           [,1]     [,2]     [,3]
  [1,] 104.4667 75.84713 133.0862
  > b[1] + b[2]*Xf[1,] + c(0, -1, 1)*qt(.975, df=n-2)*spred[1]
  [1,] 104.4667 72.92080 136.0125
  ◮ Or, we can calculate it by hand [see R code].
  —————
  Notice that s_pred = sqrt(s² + s²_fit); you need to square before summing.
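The footnote's warning — square before summing — is a common slip worth seeing once. A Python sketch with hypothetical component values (in R these would come from predict(..., se.fit=TRUE)):

```python
import math

# Hypothetical component values, for illustration only
s = 14.0      # residual scale (p$residual.scale in R)
sfit = 4.0    # standard error of the fitted value (p$se.fit in R)

spred = math.sqrt(s**2 + sfit**2)   # correct: square first, sum, then take the root
wrong = s + sfit                    # NOT the same thing
print(spred, wrong)                 # spred is strictly smaller than s + sfit
```

Adding the standard errors directly always overstates the uncertainty; variances add for independent error sources, so the standard errors combine as a root-sum-of-squares.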

  24. Summary
  Uncertainty matters! Captured by the Sampling Distribution.
  ◮ Quantifies uncertainty from the data
  ◮ ...only within the model, assumed before we see data.
  ◮ Which factors matter for signal-to-noise?
  Reporting
  ◮ Confidence Interval: completely captures the information in the data about the parameter.
  ◮ Testing/p-value: only a yes/no answer. (Don’t abuse p-values)
