STAT 213 Simple Linear Regression I Colin Reimer Dawson Oberlin - - PowerPoint PPT Presentation

slide-1
SLIDE 1

Outline Simple Linear Regression Model

STAT 213 Simple Linear Regression I

Colin Reimer Dawson

Oberlin College

5 October 2016

slide-2
SLIDE 2

Outline

Simple Linear Regression Model

slide-3
SLIDE 3

The Project

Find a relationship between a response variable (Y) and one or more predictor/explanatory variables, X1, ..., Xk.

Y = f(X) + ε

DATA = PATTERN + IDIOSYNCRASIES

  • One vs two means: Y quantitative, X categorical
  • Simple Linear Regression: Both quantitative (but still just one X)
slide-4
SLIDE 4

Examples

  • Y = Home price; X = Home size
  • Y = Exam score; X = Hours spent studying
  • Y = State % in poverty; X = State % with no health insurance
  • Y = SAT score; X = Family income

slide-5
SLIDE 5

The Simple Linear Model

Y = β0 + β1 · X + ε

aka

Response = Intercept + Slope · Predictor + Random Error

Standard form: Assume the ε are independent, with ε ∼ N(0, σε).

Parameters to estimate: β0, β1 and σε
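To make the model concrete, here is a minimal sketch that simulates data from the standard form. The function name `simulate_slm`, the predictor values, and the parameter values are all hypothetical, chosen only for illustration; σε is treated as a standard deviation.

```python
import random


def simulate_slm(xs, beta0, beta1, sigma, seed=0):
    """Return one simulated Y per X under Y = beta0 + beta1 * X + eps."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    # eps is drawn independently for each observation from a Normal
    # distribution with mean 0 and standard deviation sigma.
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical predictor values and parameters (illustration only).
xs = [30, 40, 50, 60, 70]
ys = simulate_slm(xs, beta0=-20.0, beta1=0.5, sigma=3.0)

# With sigma = 0 the random error vanishes and every point lies on the line.
ys_on_line = simulate_slm(xs, beta0=-20.0, beta1=0.5, sigma=0.0)
```

Rerunning with a different `seed` gives a different scatter around the same line, which is the "PATTERN + IDIOSYNCRASIES" idea in miniature.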

slide-6
SLIDE 6

SLM Visualized

slide-7
SLIDE 7

SLM With Data

slide-8
SLIDE 8

Presidential Approval and Re-election Margin

[Figure: scatterplot of Reelection Margin (%) versus Incumbent Approval (%)]

slide-9
SLIDE 9

Conditions for SLM

Pattern

  1. Mean Y at each X is a linear function of X: µY(X) = f(X) = β0 + β1X

Residuals

  2. Zero mean: Residuals centered at 0
  3. Constant variance: Same variability at all X (Homoskedasticity)
  4. Independence: No relationship among errors
  5. Normality (for standard form): At each X, Ys are Normally distributed

slide-10
SLIDE 10

Exploring violations of conditions

https://gallery.shinyapps.io/slr_diag/

slide-11
SLIDE 11

Re-election Margin: Two Models

[Two scatterplots of Reelection Margin (%) versus Incumbent Approval (%)]

Figure: Left: Constant Model Y = β0 + ε; Right: Best Fit Linear Model: Y = β0 + β1X + ε

slide-12
SLIDE 12

FIT: What parameters?

The Simple Linear Model

Y = β0 + β1 · X + ε

aka

Response = Intercept + Slope · Predictor + Random Error

Standard form: Assume the ε are independent, with ε ∼ N(0, σε).

Parameters to estimate: β0, β1 and σε

slide-13
SLIDE 13

Minimizing Sum of Squared Residuals

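  • From data, pick estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ to define an estimated $f(X)$ (can write $\hat{f}(X)$). This defines the prediction equation:

$$\hat{Y}_i = \hat{f}(X_i) = \hat{\beta}_0 + \hat{\beta}_1 X_i$$

  • If we want $\hat{f}(X_i)$ to represent mean $Y$ at $X_i$, choose $\hat{\beta}_0$ and $\hat{\beta}_1$ to minimize the sum of squared residuals:

$$SSR = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

  • How? Multivariable calculus gives us formulae:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$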
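The closed-form estimates can be computed directly from their definitions. The sketch below uses a hypothetical helper `fit_slr` and a small made-up data set (not the approval data from the slides).

```python
def fit_slr(xs, ys):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: sum of squared deviations of X from its mean.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up toy data (illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = fit_slr(xs, ys)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

For this toy data the means are 3 and 4, the cross-product sum is 6 and the squared-deviation sum is 10, so the slope is 0.6 and the intercept is 4 − 0.6 · 3 = 2.2.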

slide-14
SLIDE 14

Re-election Margin: Two Models

[Two scatterplots of Reelection Margin (%) versus Incumbent Approval (%)]

Figure: Left: Best fit Constant Model $Y = \bar{Y} + \hat{\varepsilon}$; Right: Best Fit Linear Model: $Y = \hat{\beta}_0 + \hat{\beta}_1 X + \hat{\varepsilon}$
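One way to compare the two models in the figure is by their sums of squared residuals. The sketch below does this on made-up toy data (not the approval data); the helper names `fit_slr` and `ssr` are hypothetical.

```python
def fit_slr(xs, ys):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def ssr(ys, fitted):
    """Sum of squared residuals between observed and fitted values."""
    return sum((y - f) ** 2 for y, f in zip(ys, fitted))


# Made-up toy data (illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

# Constant model: every prediction is the sample mean of Y.
y_bar = sum(ys) / len(ys)
ssr_constant = ssr(ys, [y_bar] * len(ys))

# Linear model: predictions come from the fitted line.
b0, b1 = fit_slr(xs, ys)
ssr_linear = ssr(ys, [b0 + b1 * x for x in xs])

print(f"SSR constant = {ssr_constant:.2f}, SSR linear = {ssr_linear:.2f}")
```

The constant model is the special case of the linear model with slope 0, so the least-squares line can never have a larger SSR than the constant fit on the same data.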

slide-15
SLIDE 15

Estimating σε

  • The standard estimate of the population standard deviation of residuals, $\sigma_\varepsilon$, is (almost) the sample standard deviation of the residuals:

$$\hat{\sigma}_\varepsilon = \sqrt{\frac{SSR}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$$

  • We usually have n − 1 in the denominator when computing sample variance. Why n − 2 here?
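A short sketch of the estimate, again on made-up toy data with a hypothetical helper name. The usual explanation for the n − 2 is that two parameters (intercept and slope) are estimated before the residuals are computed, so two degrees of freedom are used up, compared with one for a sample variance around a single estimated mean.

```python
import math


def residual_std_error(xs, ys):
    """Estimate sigma_eps as sqrt(SSR / (n - 2)) after a least-squares fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Sum of squared residuals around the fitted line.
    ssr = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # Divide by n - 2 because two parameters were estimated from the data.
    return math.sqrt(ssr / (n - 2))


# Made-up toy data (illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

sigma_hat = residual_std_error(xs, ys)
print(f"sigma_hat = {sigma_hat:.4f}")
```

Note that the function needs at least three observations; with n = 2 the line passes through both points and the denominator is zero.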

slide-16
SLIDE 16

ASSESS: Check conditions with residual plots

https://gallery.shinyapps.io/slr_diag/
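As a rough sketch of what goes into a residual plot, the code below computes fitted values and residuals for made-up toy data; the helper name `residual_table` is hypothetical. Plotting the residual column against the fitted (or X) column with any plotting tool gives the kind of display used to check the conditions.

```python
def residual_table(xs, ys):
    """Return (x, fitted, residual) triples from a least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    return list(zip(xs, fitted, residuals))


# Made-up toy data (illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

table = residual_table(xs, ys)
for x, fit, res in table:
    print(f"x = {x:4.1f}  fitted = {fit:5.2f}  residual = {res:5.2f}")
```

Least-squares residuals from a line with an intercept always sum to zero and are uncorrelated with X, so those two summaries cannot reveal a violation on their own. What the plot shows is local structure: curvature suggests the linearity condition fails, and a fan shape suggests non-constant variance.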