

slide-1
SLIDE 1

BUS41100 Applied Regression Analysis

Week 3: Finish SLR Inference Then Multiple Linear Regression

  • I. Confidence and Prediction Intervals
  • II. Polynomials, log transformation, categorical variables, interactions & main effects

Max H. Farrell
The University of Chicago Booth School of Business

slide-2
SLIDE 2

Quick Recap

  • I. We drew a line through a cloud of points

Ŷ = b0 + b1·X   and   Y − Ŷ = e

◮ It was a good line because:

  • 1. It minimized the SSE
  • 2. It extracted all linear information
  • 3. It implemented the model
  • II. The regression model helped us understand uncertainty

Y = β0 + β1·X + ε,   ε ∼ N(0, σ²)

◮ Sampling distribution: estimates change as data changes

    b1 ∼ N(β1, σb1²),   where   σb1² = σ² / [(n − 1)·sx²]

1

slide-3
SLIDE 3

Our work today

  • I. Finish SLR

◮ Put sampling distribution to work
◮ Communicable summaries of uncertainty

  • II. Multiple Linear Regression

Y = β0 + β1·X1 + β2·X2 + ε,   ε ∼ N(0, σ²)

◮ Everything carries over from SLR
◮ Interpretation requires one extra piece

2

slide-4
SLIDE 4

Summarizing the sampling distribution

Remember the two types of regression questions:

  • 1. Model
  • 2. Prediction

Y = β0 + β1·X + ε        Ŷ = b0 + b1·X        Y = b0 + b1·X + e

  • 1. Properties of βk

◮ Sign: Does Y go up when X goes up?
◮ Magnitude: By how much?

⇒ A confidence interval captures uncertainty about β

  • 2. Predicting Y

◮ Best guess for Y given (or “conditional on”) X.

⇒ A prediction interval captures uncertainty about Y

3

slide-5
SLIDE 5

Confidence Intervals and Testing

Suppose we think that the true βj is equal to some value βj⁰ (often 0). Does the data support that guess?

We can rephrase this in terms of competing hypotheses:

    (Null)          H0 : βj = βj⁰
    (Alternative)   H1 : βj ≠ βj⁰

Our hypothesis test will either reject or fail to reject the null hypothesis.
◮ If the hypothesis test rejects the null hypothesis, we have statistical support for our claim
◮ Gives only a “yes” or “no” answer!
◮ You choose the “probability” of false rejection: α

4

slide-6
SLIDE 6

We use bj for our test about βj.
◮ Reject H0 if bj is “far” from βj⁰; assume H0 when it is close
◮ What we really care about is: how many standard errors bj is away from βj⁰
◮ standard error = sb1, cf. σb1

The t-statistic for this test is

    zbj = (bj − βj⁰) / sbj,   which under H0  ∼ N(0, 1).

“Big” |zbj| makes our guess βj⁰ look silly ⇒ reject

◮ If H0 is true, then P[ |zbj| > 2 ] < 0.05 = α.  But:

    |zbj| = |bj − βj⁰| / sbj > 2   ⇔   βj⁰ ∉ (bj ± 2·sbj)

5

slide-7
SLIDE 7

Confidence intervals

Since bj ∼ N(βj, σbj²),

    1 − α = P[ zα/2 < (bj − βj)/sbj < z1−α/2 ] = P[ βj ∈ (bj ± zα/2·sbj) ]

Why should we care about confidence intervals?

◮ The confidence interval completely captures the information in the data about the parameter.
◮ Center is your estimate
◮ Length is how sure you are about your estimate
◮ Any value outside would be rejected by a test!

6
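To see the interval formula in action, here is a minimal R sketch that builds bj ± zα/2·sbj by hand; the estimate and standard error are illustrative placeholder numbers, and confint() on a fitted lm object returns the same kind of interval automatically.

bj  <- 0.936                                 ## hypothetical coefficient estimate
sbj <- 0.029                                 ## hypothetical standard error
bj + c(-1, 1) * qnorm(1 - 0.05/2) * sbj      ## 95% CI, roughly (0.879, 0.993)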

slide-8
SLIDE 8

Real life or pretend?

    P[ β1 ∈ (b1 ± 2·σb1) ] = 95%     or     P[ β1 ∈ (b1 ± 2·σb1) ] = 0 or 1 ?

[Figure: confidence intervals from many repeated samples, plotted against the true β1.]

7
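The distinction can be checked by simulation. A small sketch, with a made-up true model (all the numbers below are assumptions, not course data): across repeated samples about 95% of the intervals cover the true β1, but any one realized interval either covers it or it does not.

set.seed(41100)
beta0 <- 1; beta1 <- 2; sigma <- 1; n <- 50     ## made-up truth for the simulation
covered <- replicate(1000, {
  x <- rnorm(n)
  y <- beta0 + beta1*x + rnorm(n, sd=sigma)
  ci <- confint(lm(y ~ x))["x", ]               ## 95% CI for the slope in this sample
  ci[1] < beta1 & beta1 < ci[2]                 ## did this interval cover the true beta1?
})
mean(covered)                                   ## close to 0.95 across repeated samples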

slide-9
SLIDE 9

Level, Size, and p-values

The p-value is P[ |Z| > |zbj| ].
◮ A test with size/level equal to the p-value almost rejects
◮ A CI of level 1 − (p-value) just excludes the null value βj⁰

[Figure: standard Normal density with area 1 − α between zα/2 and z1−α/2, and tail areas p/2 beyond ±|zbj|.]

8

slide-10
SLIDE 10

Example: revisit the CAPM regression for the Windsor fund. Does Windsor have a non-zero intercept? (i.e., does it make/lose money independent of the market?)

    H0 : β0 = 0        H1 : β0 ≠ 0

◮ Recall: the intercept estimate b0 is the stock’s “alpha”

> summary(windsor.reg)  ## output abbreviated
                Estimate Std. Error t value Pr(>|t|)
(Intercept)     0.003647   0.001409   2.588   0.0105 *
mfund$valmrkt   0.935717   0.029150  32.100   <2e-16 ***
> 2*pnorm(-abs(0.003647/.001409))
[1] 0.009643399

We reject the null at α = .05; Windsor does have an “alpha” over the market.

◮ Why set α = .05? What about at α = 0.01?

9

slide-11
SLIDE 11

Now let’s ask whether or not Windsor moves in a different way than the market (e.g., is it more conservative?).
◮ Recall that the estimate of the slope b1 is the “beta” of the stock.

This is a rare case where the null hypothesis is not zero:

    H0 : β1 = 1,  Windsor is just the market (+ alpha).
    H1 : β1 ≠ 1,  Windsor softens or exaggerates market moves.

This time, R’s output t/p values are not what we want (why?).

> summary(windsor.reg)  ## output abbreviated
                Estimate Std. Error t value Pr(>|t|)
(Intercept)     0.003647   0.001409   2.588   0.0105 *
mfund$valmrkt   0.935717   0.029150  32.100   <2e-16 ***

10

slide-12
SLIDE 12

But we can get the appropriate values easily: ◮ Test and p-value:

> b1 <- 0.935717; sb1 <- 0.029150
> zb1 <- (b1 - 1)/sb1
> zb1
[1] -2.205249
> 2*pnorm(-abs(zb1))
[1] 0.02743665

◮ Confidence Interval

> confint(windsor.reg, level=0.95)
                     2.5 %      97.5 %
(Intercept)   0.000865657 0.006428105
mfund$valmrkt 0.878193149 0.993240873

Reject at α = .05, so Windsor softens market moves (it is more conservative than the market).

◮ What about other values of α?

confint(windsor.reg, level=0.99)
confint(windsor.reg, level=(1 - 2*pt(-abs(zb1), df=178)))

11

slide-13
SLIDE 13

Forecasting & Prediction Intervals

The conditional forecasting problem:
◮ Given a covariate value Xf and sample data {Xi, Yi}, i = 1, . . . , n, predict the “future” observation Yf.

The solution is to use our LS fitted value: Ŷf = b0 + b1·Xf.
◮ That’s the easy bit.

The hard (and very important!) part of forecasting is assessing uncertainty about our predictions. One method is to specify a prediction interval
◮ a range of Y values that are likely, given an X value.

12

slide-14
SLIDE 14

The least squares line is a prediction rule: read Ŷ off the line for a new X.
◮ It’s not a perfect prediction: Ŷ is what we expect.

[Figure: housing data, price vs. size with the fitted line; at a new X, the prediction Ŷ is read off the line.]

13

slide-15
SLIDE 15

If we use Ŷf, our prediction error has two pieces:

    ef = Yf − Ŷf = Yf − b0 − b1·Xf

[Figure: at Xf, the gap between Yf and the true line E[Yf|Xf] = β0 + β1·Xf is ε, and the gap between the true line and the fitted value Ŷf = b0 + b1·Xf is the fit error.]

14

slide-16
SLIDE 16

We can decompose ef into two sources of error:
◮ Inherent idiosyncratic randomness (due to ε).
◮ Estimation error in the intercept and slope (i.e., discrepancy between our line and “the truth”).

    ef = Yf − Ŷf = (Yf − E[Yf|Xf]) + (E[Yf|Xf] − Ŷf)
                 = εf + (E[Yf|Xf] − Ŷf)
                 = εf + (β0 − b0) + (β1 − b1)·Xf.

The variance of our prediction error is thus

    var(ef) = var(εf) + var(E[Yf|Xf] − Ŷf) = σ² + var(Ŷf)

15

slide-17
SLIDE 17

From the sampling distributions derived earlier, var(Ŷf) is

    var(b0 + b1·Xf) = var(b0) + Xf²·var(b1) + 2·Xf·cov(b0, b1)
                    = σ² · [ 1/n + (Xf − X̄)² / ((n − 1)·sx²) ].

Replacing σ² with s² gives the standard error for Ŷf. And hence the variance of our predictive error is

    var(ef) = σ² · [ 1 + 1/n + (Xf − X̄)² / ((n − 1)·sx²) ].

16
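As a sanity check on the formula, a minimal R sketch computing sfit by hand; it assumes the housing fit reg of price on size from the earlier example, and should agree with the se.fit value returned by predict().

s  <- summary(reg)$sigma                                    ## residual standard error, s
n  <- length(size)
Xf <- 2.5
s * sqrt(1/n + (Xf - mean(size))^2 / ((n - 1)*var(size)))   ## sfit at Xf
## compare with: predict(reg, data.frame(size=2.5), se.fit=TRUE)$se.fit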

slide-18
SLIDE 18

Putting it all together, we have that

    Ŷf ∼ N( Yf,  σ² · [ 1 + 1/n + (Xf − X̄)² / ((n − 1)·sx²) ] )

A (1 − α)·100% confidence/prediction interval for Yf is thus

    b0 + b1·Xf ± zα/2 × s·√[ 1 + 1/n + (Xf − X̄)² / ((n − 1)·sx²) ].

17

slide-19
SLIDE 19

Looking closer at what we’ll call

    spred = s·√[ 1 + 1/n + (Xf − X̄)² / ((n − 1)·sx²) ] = √( s² + sfit² ).

A large predictive error variance (high uncertainty) comes from
◮ Large s (i.e., large ε’s).
◮ Small n (not enough data).
◮ Small sx (not enough observed spread in covariates).
◮ Large (Xf − X̄).

The first three are familiar... what about the last one?

18

slide-20
SLIDE 20

For Xf far from our X̄, the space between lines is magnified ...

[Figure: estimated line vs. true line; both pass near the point of means (X̄, Ȳ), so the gap between them is small when (Xf − X̄) is small and magnified when (Xf − X̄) is large.]

19

slide-21
SLIDE 21

⇒ The prediction (conf.) interval needs to widen away from X̄

20

slide-22
SLIDE 22

Returning to our housing data for an example ...

> Xf <- data.frame(size=c(mean(size), 2.5, max(size)))
> cbind(Xf, predict(reg, newdata=Xf, interval="prediction"))
      size      fit       lwr      upr
1 1.853333 104.4667  72.92080 136.0125
2 2.500000 127.3496  95.18501 159.5142
3 3.500000 162.7356 127.36982 198.1013

◮ interval="prediction" gives lwr and upr, otherwise we just get fit
◮ spred is not shown in this output

21

slide-23
SLIDE 23

We can get spred from the predict output.

> p <- predict(reg, newdata=Xf, se.fit=TRUE)
> s <- p$residual.scale
> sfit <- p$se.fit
> spred <- sqrt(s^2 + sfit^2)
> b <- reg$coef
> b[1] + b[2]*Xf[1,] + c(0, -1, 1)*qnorm(.975)*spred[1]
         [,1]     [,2]     [,3]
[1,] 104.4667 75.84713 133.0862
> b[1] + b[2]*Xf[1,] + c(0, -1, 1)*qt(.975, df=n-2)*spred[1]
[1,] 104.4667 72.92080 136.0125

◮ Or, we can calculate it by hand [see R code].
—————
Notice that spred = √( s² + sfit² ); you need to square before summing.

22

slide-24
SLIDE 24

Summary

Uncertainty matters! Captured by the Sampling Distribution.
◮ Quantifies uncertainty from the data
◮ . . . only within the model, assumed before we see data.
◮ Which factors matter for signal-to-noise?

Reporting
◮ Confidence Interval: completely captures the information in the data about the parameter.
◮ Testing/p-value: only a yes/no answer. (Don’t abuse p-values)

23

slide-25
SLIDE 25

Multiple Linear Regression

24

slide-26
SLIDE 26

Multiple vs simple linear regression

Fundamental model is the same. Basic concepts and techniques translate directly from SLR.
◮ Individual parameter inference and estimation is the same, conditional on the rest of the model.
◮ We still use lm, summary, predict, etc.

The hardest part would be moving to matrix algebra to translate all of our equations. Luckily, R does all that for you.

25

slide-27
SLIDE 27

Polynomial regression

A nice bridge between SLR and MLR is polynomial regression. Still only one X variable, but we add powers of X:

    E[Y | X] = β0 + β1·X + β2·X² + · · · + βm·X^m

You can fit any mean function if m is big enough.
◮ Usually, m = 2 does the trick.

This is our first “multiple linear regression”!

26

slide-28
SLIDE 28

Example: telemarketing/call-center data.
◮ How does length of employment (months) relate to productivity (number of calls placed per day)?

> attach(telemkt <- read.csv("telemarketing.csv"))
> tele1 <- lm(calls ~ months)
> xgrid <- data.frame(months = 10:30)
> par(mfrow=c(1,2))
> plot(months, calls, pch=20, col=4)
> lines(xgrid$months, predict(tele1, newdata=xgrid))
> plot(months, tele1$residuals, pch=20, col=4)
> abline(h=0, lty=2)

27

slide-29
SLIDE 29
[Figure: left panel: calls vs. months with the fitted line from tele1; right panel: residuals vs. months, showing a clear curved pattern.]

It looks like there is a polynomial shape to the residuals.
◮ We are leaving some predictability on the table . . . just not linear predictability.

28

slide-30
SLIDE 30

Testing for nonlinearity

To see if you need more nonlinearity, try the regression which includes the next polynomial term, and see if it is significant. For example, to see if you need a quadratic term,
◮ run the regression E[Y | X] = β0 + β1·X + β2·X².
◮ If your test implies β2 ≠ 0, you need X² in your model.

Note: p-values are calculated “given the other β’s are nonzero”; i.e., conditional on X being in the model.

29

slide-31
SLIDE 31

Test for a quadratic term:

> months2 <- months^2
> tele2 <- lm(calls ~ months + months2)
> summary(tele2)  ## abbreviated output
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.140471    2.32263  -0.060    0.952
months       2.310202    0.25012   9.236 4.90e-08 ***
months2     -0.040118    0.00633  -6.335 7.47e-06 ***

The quadratic months2 term has a very significant t-value, so a better model is

    calls = β0 + β1·months + β2·months² + ε.

30
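As an aside, the same quadratic fit can be written directly in the lm formula, without creating months2 by hand; two equivalent sketches, both of which should give the same fit as tele2:

tele2b <- lm(calls ~ months + I(months^2))          ## I() protects the ^ inside a formula
tele2c <- lm(calls ~ poly(months, 2, raw=TRUE))     ## raw (not orthogonal) polynomial terms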

slide-32
SLIDE 32

Everything looks much better with the quadratic mean model.

> xgrid <- data.frame(months=10:30, months2=(10:30)^2)
> par(mfrow=c(1,2))
> plot(months, calls, pch=20, col=4)
> lines(xgrid$months, predict(tele2, newdata=xgrid))
> plot(months, tele2$residuals, pch=20, col=4)
> abline(h=0, lty=2)

[Figure: left panel: calls vs. months with the fitted quadratic curve; right panel: residuals vs. months, now patternless around zero.]

31

slide-33
SLIDE 33

A few words of caution

We can always add higher powers (cubic, etc.) if necessary.
◮ If you add a higher order term, the lower order term is kept regardless of its individual t-stat.

(see handout on website)

Be very careful about predicting outside the data range;
◮ the curve may do unintended things beyond the data.

Watch out for over-fitting.
◮ You can get a “perfect” fit with enough polynomial terms,
◮ but that doesn’t mean it will be any good for prediction or understanding.

32

slide-34
SLIDE 34

Beyond SLR

Many problems involve more than one independent variable or factor which affects the dependent or response variable.
◮ Multi-factor asset pricing models (beyond CAPM).
◮ Demand for a product given prices of competing brands, advertising, household attributes, etc.
◮ More than size to predict house price!

In SLR, the conditional mean of Y depends on X. The multiple linear regression (MLR) model extends this idea to include more than one independent variable.

33

slide-35
SLIDE 35

The MLR Model

The MLR model is the same as always, but with more covariates:

    Y | X1, . . . , Xd ∼ N( β0 + β1·X1 + · · · + βd·Xd, σ² )

Recall the key assumptions of our linear regression model:
(i) The conditional mean of Y is linear in the Xj variables.
(ii) The additive errors (deviations from the line)
    ◮ are Normally distributed
    ◮ independent from each other and all the Xj
    ◮ identically distributed (i.e., they have constant variance)

34

slide-36
SLIDE 36

Our interpretation of regression coefficients can be extended from the simple single covariate regression case:

    βj = ∂E[Y | X1, . . . , Xd] / ∂Xj

◮ Holding all other variables constant, βj is the average change in Y per unit change in Xj.
—————

∂ is from calculus and means “change in”

35

slide-37
SLIDE 37

If d = 2, we can plot the regression surface in 3D. Consider sales of a product as predicted by price of this product (P1) and the price of a competing product (P2).

◮ Everything on log scale ⇒ -1.0 & 1.1 are elasticities

36

slide-38
SLIDE 38

Obtaining these estimates in R is very easy:

> salesdata <- read.csv("sales.csv")
> attach(salesdata)
> salesMLR <- lm(Sales ~ P1 + P2)
> salesMLR

Call:
lm(formula = Sales ~ P1 + P2)

Coefficients:
(Intercept)           P1           P2
      1.003       -1.006        1.098

37

slide-39
SLIDE 39

Same Least Squares Principles

How do we estimate the MLR model parameters?
◮ fitted values: Ŷi = b0 + b1·X1i + b2·X2i + · · · + bd·Xdi
◮ residuals: ei = Yi − Ŷi
◮ residual variance: s² = (1/(n − p)) · Σᵢ eᵢ²,  where p = d + 1

Then find the best fitting plane, i.e., coefs b0, b1, b2, . . . , bd, by minimizing the sum of squared errors, s².

38
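For the curious, here is a minimal sketch of the matrix algebra R is doing behind lm(); it assumes the sales data from the earlier example, with columns named Sales, P1, and P2 as in the lm call.

X  <- cbind(1, salesdata$P1, salesdata$P2)   ## design matrix with an intercept column
y  <- salesdata$Sales
b  <- solve(t(X) %*% X, t(X) %*% y)          ## least squares: b = (X'X)^{-1} X'y
e  <- y - X %*% b                            ## residuals
s2 <- sum(e^2) / (nrow(X) - ncol(X))         ## residual variance, s^2 = SSE/(n - p)
cbind(b, coef(salesMLR))                     ## the two columns should match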

slide-40
SLIDE 40

Residuals in MLR

As in the SLR model, the residuals in multiple regression are purged of any linear relationship to the independent variables. We decompose Y into the part predicted by X and the part due to idiosyncratic error.

Y = Ŷ + e,     corr(Xj, e) = 0,     corr(Ŷ, e) = 0

These hold by construction, just like SLR.

39
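A quick check of this construction, assuming the salesMLR fit and salesdata from the earlier example; each correlation is zero up to rounding error.

cor(fitted(salesMLR), resid(salesMLR))   ## fitted values vs. residuals
cor(salesdata$P1, resid(salesMLR))       ## each covariate vs. residuals
cor(salesdata$P2, resid(salesMLR))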

slide-41
SLIDE 41

Inference for coefficients

As before in SLR, the LS linear coefficients are random (different for each sample) and correlated with each other. The sampling distribution for the whole vector b = [b0, b1, · · · , bd] is a multivariate Normal:

b ∼ Nd+1(β, Σb)

◮ With mean β = [β0, β1, · · · , βd]′ (unbiased, as before)
◮ Variance-covariance matrix Σb
◮ Same as last week:

    [ b0 ]         ( [ β0 ]    [ σb0²           cov(b0, b1) ] )
    [ b1 ]   ∼  N2 ( [ β1 ] ,  [ cov(b0, b1)    σb1²        ] )

40
slide-42
SLIDE 42

Coefficient covariance matrix

Σb = var(b): the p × p covariance matrix for the random vector b

    Σb = [ var(b0)        cov(b0, b1)     · · ·                           ]
         [ cov(b1, b0)    var(b1)                                         ]
         [     ⋮                ⋱                                         ]
         [                      var(bd−1)          cov(bd−1, bd)          ]
         [                      cov(bd, bd−1)      var(bd)                ]

◮ Variance decreases with n and var(X); increases with σ².
⇒ Standard errors are the square roots of the diagonal of Σb.

41
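In R, this matrix is available directly from a fitted model via vcov(); a short sketch using the salesMLR fit from before:

Sb <- vcov(salesMLR)    ## the p x p estimated covariance matrix of b
Sb
sqrt(diag(Sb))          ## standard errors, matching summary(salesMLR)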

slide-43
SLIDE 43

Inference for individual coefficients

Intervals and t-statistics are exactly the same as in SLR.
◮ A (1 − α)·100% C.I. for βj is bj ± zα/2·sbj.
◮ zbj = (bj − βj⁰)/sbj ∼ N(0, 1) is the number of standard errors between the LS estimate and the null value.

Intervals/testing via bj & sbj are one-at-a-time procedures:
◮ You are evaluating the jth coefficient conditional on the other X’s being in the model, but regardless of the values you’ve estimated for the other b’s.

42

slide-44
SLIDE 44

Conveniently, R’s summary gives you all the standard errors.

(or do it manually, see week3-Rcode.R)

> summary(salesMLR)  ## abbreviated output
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.002688   0.007445   134.7   <2e-16 ***
P1          -1.005900   0.009385  -107.2   <2e-16 ***
P2           1.097872   0.006425   170.9   <2e-16 ***

Residual standard error: 0.01453 on 97 degrees of freedom
Multiple R-squared: 0.998,     Adjusted R-squared: 0.9979
F-statistic: 2.392e+04 on 2 and 97 DF,  p-value: < 2.2e-16

43

slide-45
SLIDE 45

Forecasting in MLR

Prediction follows exactly the same methodology as in SLR. For new data xf = [X1,f, · · · , Xd,f]′,
◮ Ŷf = b0 + b1·X1,f + · · · + bd·Xd,f
◮ var[Yf | xf] = var(Ŷf) + var(εf) = sfit² + s² = spred²
◮ the (1 − α) level prediction interval is still Ŷf ± zα/2·spred.

44

slide-46
SLIDE 46

The syntax in R is also exactly the same as before:

> predict(salesMLR, data.frame(P1=1, P2=1),
+         interval="prediction", level=0.95)
       fit      lwr      upr
1 1.094661 1.064015 1.125306
> predict(salesMLR, data.frame(P1=1, P2=1),
+         se.fit=TRUE)$se.fit
[1] 0.005227347

45
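Just as in the SLR example, the interval can be rebuilt by hand from the pieces predict() returns; a sketch assuming the salesMLR fit above:

p     <- predict(salesMLR, data.frame(P1=1, P2=1), se.fit=TRUE)
s     <- p$residual.scale                      ## s: residual standard error
sfit  <- p$se.fit                              ## sfit
spred <- sqrt(s^2 + sfit^2)                    ## spred = sqrt(s^2 + sfit^2)
p$fit + c(-1, 1) * qt(.975, df=p$df) * spred   ## matches the lwr and upr shown above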

slide-47
SLIDE 47

Glossary and Equations

◮ LS Estimators: b1 = rxy·(sy/sx) = sxy/sx²  and  b0 = Ȳ − b1·X̄.
◮ Ŷi = b0 + b1·Xi is the ith fitted value.
◮ ei = Yi − Ŷi is the ith residual.
◮ σ̂, s: standard error of regression residuals (≈ σ = σε).
◮ sbj: standard error of regression coefficients:

    sb1 = √[ s² / ((n − 1)·sx²) ]          sb0 = s·√[ 1/n + X̄² / ((n − 1)·sx²) ]

46

slide-48
SLIDE 48

◮ α is the significance level (prob of type 1 error).
◮ zα/2 is the value such that, for Z ∼ N(0, 1), P[Z > −zα/2] = P[Z < zα/2] = α/2.
◮ zbj is the standardized coefficient: zbj = (bj − βj⁰)/sbj, which under H0 ∼ N(0, 1).
◮ The (1 − α)·100% confidence interval for βj is bj ± zα/2·sbj

47

slide-49
SLIDE 49

◮ Ŷf = b0 + b1·Xf is a forecast prediction, with

    se(Ŷf) = sfit = s·√[ 1/n + (Xf − X̄)² / ((n − 1)·sx²) ]

◮ The forecast residual is ef = Yf − Ŷf, and var(ef) = s² + sfit². That is, the predictive standard error is

    spred = s·√[ 1 + 1/n + (Xf − X̄)² / ((n − 1)·sx²) ],

and Ŷf ± zα/2·spred is the (1 − α)·100% prediction interval at Xf.

48

slide-50
SLIDE 50

Glossary and equations

MLR:
◮ Model: Y | X1, . . . , Xd ∼ N(β0 + β1·X1 + · · · + βd·Xd, σ²), independently across observations
◮ Prediction: Ŷi = b0 + b1·X1i + b2·X2i + · · · + bd·Xdi
◮ b ∼ Np(β, Σb)
◮ Interaction: Yi = β0 + β1·X1i + β2·X2i + β3·(X1i·X2i) + . . . + ε, so that
      ∂E[Y | X1, X2] / ∂X1 = β1 + β3·X2

49
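In R, an interaction and its main effects can be requested together in the formula. A generic sketch; the data frame df and the variables Y, X1, X2 here are placeholders, not the course data:

fit <- lm(Y ~ X1*X2, data=df)        ## X1*X2 expands to X1 + X2 + X1:X2
b   <- coef(fit)
b["X1"] + b["X1:X2"] * 2             ## partial effect of X1 when X2 = 2: b1 + b3*X2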

slide-51
SLIDE 51

Glossary and equations

R² and related terms
◮ SST = Σᵢ (Yi − Ȳ)²
◮ SSR = Σᵢ (Ŷi − Ȳ)²
◮ SSE = Σᵢ (Yi − Ŷi)²
◮ (Watch out: sometimes SSE is called SSR or RSS!)
◮ R² = SSR/SST = cor²(Ŷ, Y) = rŷy²

Elasticity is the slope in a log-log model: β1 ≈ d%Y / d%X.

(See handout on course website.)

50
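These identities are easy to verify on the sales fit; a short sketch assuming salesMLR and salesdata from earlier (all three numbers should agree with the R-squared reported by summary(salesMLR)):

y    <- salesdata$Sales
yhat <- fitted(salesMLR)
SST  <- sum((y - mean(y))^2)
SSR  <- sum((yhat - mean(y))^2)
SSE  <- sum((y - yhat)^2)
c(SSR/SST, 1 - SSE/SST, cor(y, yhat)^2)   ## three equivalent ways to get R^2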