

SLIDE 1

ST 430/514 Introduction to Regression Analysis/Statistics for Management and the Social Sciences II

Simple Linear Regression

Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates) x1, x2, . . . , xk.

1 / 20 Simple Linear Regression Introduction

SLIDE 2

The Straight-Line Probabilistic Model

Simplest case of a regression model: one independent variable (k = 1, x1 ≡ x); linear dependence. Model equation:

E(Y) = β0 + β1x,

or equivalently

Y = β0 + β1x + ε.


SLIDE 3

Interpreting the parameters: β0 is the intercept (so called because it is where the graph of y = β0 + β1x meets the y-axis, x = 0); β1 is the slope, that is, the change in E(Y) as x is changed to x + 1. Note: if β1 = 0, x has no effect on Y; that will often be an interesting hypothesis to test.


SLIDE 4

Advertising and Sales example: x = monthly advertising expenditure, in hundreds of dollars; y = monthly sales revenue, in thousands of dollars; β0 = expected revenue with no advertising; β1 = expected revenue increase per $100 increase in advertising, in thousands of dollars.

Sample data for five months:

Advertising (x): 1  2  3  4  5
Revenue (y):     1  1  2  2  4


SLIDE 5

What do these data tell about β0 and β1?

[Figure: scatterplot of revenue y against advertising x for the five sample months.]


SLIDE 6

We could try various values of β0 and β1. For given values of β0 and β1, we get predictions pi = β0 + β1xi, i = 1, 2, 3, 4, 5. The difference between the observed value yi and the prediction pi is the residual ri = yi − pi, i = 1, 2, 3, 4, 5. A good choice of β0 and β1 gives accurate predictions, and generally small residuals.


SLIDE 7

One candidate line (β0 = −0.1, β1 = 0.7):

[Figure: the same scatterplot with the candidate line y = −0.1 + 0.7x drawn through it.]
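As a quick sketch (mine, not the slides'; the variable names are made up), the predictions and residuals for this candidate line can be computed from the slide-4 data:

```python
# Sample data from slide 4: x = advertising (hundreds of $), y = revenue (thousands of $)
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

# Candidate line from slide 7
b0, b1 = -0.1, 0.7

# Predictions p_i and residuals r_i = y_i - p_i
p = [b0 + b1 * xi for xi in x]
r = [yi - pi for yi, pi in zip(y, p)]

# Sum of squared residuals S(b0, b1); here it comes out to about 1.10
sse = sum(ri ** 2 for ri in r)
print([round(pi, 2) for pi in p])  # predictions 0.6, 1.3, 2.0, 2.7, 3.4
print(round(sse, 2))
```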


SLIDE 8

Fitting the Model

How to measure the overall size of the residuals? Most common measure (but not the only possibility): the sum of squared residuals

Σ ri² = Σ (yi − pi)² = Σ {yi − (β0 + β1xi)}² = S(β0, β1).

The least squares line is the one with the smallest sum of squares. Note: the least squares line has the property that Σ ri = 0; Definition 3.1 (page 95) does not need to impose that as a constraint.


SLIDE 9

The least squares estimates of β0 and β1 are the coefficients of the least squares line. Some algebra shows that the least squares estimates are

β̂1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)² = (Σ xiyi − n x̄ ȳ) / (Σ xi² − n x̄²)

and

β̂0 = ȳ − β̂1 x̄.

With a little luck, you will never need to use these formulæ.
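To sanity-check these formulæ on the slide-4 data (a sketch with my own variable names), the estimates can be computed directly:

```python
x = [1, 2, 3, 4, 5]   # advertising, from slide 4
y = [1, 1, 2, 2, 4]   # revenue
n = len(x)

xbar = sum(x) / n     # 3.0
ybar = sum(y) / n     # 2.0

# beta1_hat = sum (xi - xbar)(yi - ybar) / sum (xi - xbar)^2
num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))   # 7.0
den = sum((xi - xbar) ** 2 for xi in x)                        # 10.0
beta1_hat = num / den                  # 0.7
beta0_hat = ybar - beta1_hat * xbar    # about -0.1

print(beta0_hat, beta1_hat)
```

For these data the least squares line is y = −0.1 + 0.7x, which is exactly the candidate line of slide 7.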


SLIDE 10

Other criteria

Why square the residuals? We could use least absolute deviations estimates, minimizing

S1(β0, β1) = Σ |yi − (β0 + β1xi)|.

Convenience: we have equations for the least squares estimates, but to find the least absolute deviations estimates we have to solve a linear programming problem. Optimality: least squares estimates are BLUE if the errors ε are uncorrelated with constant variance, and MVUE if additionally ε is normal.
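To illustrate the computational contrast (my sketch; a real solver would set this up as a linear program), S1 can be minimized by brute force over a coarse grid:

```python
x = [1, 2, 3, 4, 5]   # slide-4 data
y = [1, 1, 2, 2, 4]

def s1(b0, b1):
    """Sum of absolute deviations S1(b0, b1)."""
    return sum(abs(yi - (b0 + b1 * xi)) for xi, yi in zip(x, y))

# Brute-force grid search with step 0.01; crude, but it shows there is
# no closed-form expression to plug into.
best = min(
    ((i / 100, j / 100) for i in range(-200, 201) for j in range(-200, 201)),
    key=lambda b: s1(*b),
)
print(best, round(s1(*best), 4))  # minimum S1 here is 2.0; the minimizer is not unique
```

Unlike least squares, the least absolute deviations fit need not be unique: for these data, both y = −1 + x and y = 0.5 + 0.5x achieve S1 = 2.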


SLIDE 11

Model Assumptions

The least squares line gives point estimates of β0 and β1, and these estimates are always unbiased. To use the other forms of statistical inference (interval estimates such as confidence intervals, and hypothesis tests), we need some assumptions about the random errors ε.
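The unbiasedness claim can be checked by simulation (a sketch under assumed true values β0 = −0.1, β1 = 0.7 and normal errors; all names here are mine):

```python
import random

random.seed(1)
x = [1, 2, 3, 4, 5]
b0_true, b1_true = -0.1, 0.7    # assumed "true" parameters for the simulation
xbar = sum(x) / len(x)
den = sum((xi - xbar) ** 2 for xi in x)

def least_squares(y):
    """Least squares estimates via the slide-9 formulas."""
    ybar = sum(y) / len(y)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / den
    return ybar - b1 * xbar, b1

reps = 20000
sum_b0 = sum_b1 = 0.0
for _ in range(reps):
    # Generate Y = b0 + b1 x + eps with eps ~ N(0, 0.5^2)
    y = [b0_true + b1_true * xi + random.gauss(0, 0.5) for xi in x]
    b0_hat, b1_hat = least_squares(y)
    sum_b0 += b0_hat
    sum_b1 += b1_hat

# Averages over many replications land close to the true values
print(sum_b0 / reps, sum_b1 / reps)
```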


SLIDE 12

1. Zero mean: E(εi) = 0; as noted earlier, this is not really an assumption, but a consequence of the definition ε = Y − E(Y).

2. Constant variance: V(εi) = σ²; this is a nontrivial assumption, often violated in practice.

3. Normality: εi ∼ N(0, σ²); this is also a nontrivial assumption, always violated in practice, but sometimes a useful approximation.

4. Independence: εi and εj are statistically independent; another nontrivial assumption, often true in practice, but typically violated with time series and spatial data.


SLIDE 13

Notes: Assumptions 2 and 4 are the conditions under which least squares estimates are BLUE (Best Linear Unbiased Estimators); Assumptions 2, 3, and 4 are the conditions under which least squares estimates are MVUE (Minimum Variance Unbiased Estimators).


SLIDE 14

Estimating σ²

Recall that σ² is the variance of εi, which we have assumed to be the same for all i. That is, σ² = V(εi) = V[Yi − E(Yi)] = V[Yi − (β0 + β1xi)], i = 1, 2, . . . , n. We observe Yi = yi and xi; if we knew β0 and β1, we would estimate σ² by

(1/n) Σ {yi − (β0 + β1xi)}² = (1/n) S(β0, β1).


SLIDE 15

We do not know β0 and β1, but we have least squares estimates β̂0 and β̂1. So we could use S(β̂0, β̂1) as an approximation to S(β0, β1). But we know that S(β̂0, β̂1) < S(β0, β1), so (1/n) S(β̂0, β̂1) would be a biased estimate of σ².


SLIDE 16

We can show that, under Assumptions 2 and 4,

E[S(β̂0, β̂1)] = (n − 2)σ².

So

s² = S(β̂0, β̂1) / (n − 2) = (1/(n − 2)) Σ (yi − ŷi)²,

where ŷi = β̂0 + β̂1xi, is an unbiased estimate of σ². This is sometimes written

s² = Mean Square for Error = MSE = Sum of Squares for Error / degrees of freedom for Error = SSE / dfE.
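On the slide-4 data (a sketch; the fitted coefficients come from the slide-9 formulas), s² works out as follows:

```python
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)

# Least squares fit for these data (the slide-9 formulas give -0.1 and 0.7)
b0_hat, b1_hat = -0.1, 0.7

# Fitted values and the error sum of squares SSE
y_hat = [b0_hat + b1_hat * xi for xi in x]
sse = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))   # about 1.10

# s^2 = MSE = SSE / dfE with dfE = n - 2 = 3
s2 = sse / (n - 2)
print(round(s2, 4))   # about 0.3667
```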


SLIDE 17

Inferences about the line

We are often interested in the question of whether x has any effect on E(Y). Since E(Y) = β0 + β1x, the independent variable x has some effect whenever β1 ≠ 0. So we need to test the null hypothesis H0 : β1 = 0.


SLIDE 18

We also need to construct a confidence interval for β1, to indicate how precisely we know its value. For both purposes, we need the standard error

σ_β̂1 = σ / √SSxx, where SSxx = Σ (xi − x̄)².

As always, since σ is unknown, we replace it by its estimate s, to get the estimated standard error

σ̂_β̂1 = s / √SSxx.


SLIDE 19

A confidence interval for β1 is

β̂1 ± tα/2,n−2 × σ̂_β̂1.

Note that we use the t-distribution with n − 2 degrees of freedom, because that is the degrees of freedom associated with s². To test H0 : β1 = 0, we use the test statistic

t = β̂1 / σ̂_β̂1,

and reject H0 at the significance level α if |t| > tα/2,n−2.
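Putting slides 18 and 19 together on the slide-4 data (a sketch; the critical value 3.182 is the standard t-table entry for 3 degrees of freedom, two-sided α = 0.05):

```python
import math

x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
xbar = sum(x) / n

# Least squares fit and s from the earlier slides
b0_hat, b1_hat = -0.1, 0.7
y_hat = [b0_hat + b1_hat * xi for xi in x]
s = math.sqrt(sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat)) / (n - 2))

ss_xx = sum((xi - xbar) ** 2 for xi in x)   # SSxx = 10
se_b1 = s / math.sqrt(ss_xx)                # estimated standard error of b1_hat

t_crit = 3.182                   # t_{0.025, n-2} with n - 2 = 3 df, from a t table
t_stat = b1_hat / se_b1          # about 3.66

ci = (b1_hat - t_crit * se_b1, b1_hat + t_crit * se_b1)
print(round(t_stat, 2), [round(c, 3) for c in ci])
```

Since |t| ≈ 3.66 > 3.182, H0 : β1 = 0 is rejected at α = 0.05; equivalently, the 95% confidence interval (roughly 0.09 to 1.31) excludes 0.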


SLIDE 20

Compare the Confidence Interval and the Hypothesis Test

Note that we reject H0 at level α if and only if the corresponding 100(1 − α)% confidence interval does not include 0.
