Welcome Back! EDUC 7610 Chapter 2: The Simple Regression Model - PowerPoint PPT Presentation



SLIDE 1

Welcome Back!

SLIDE 2

EDUC 7610 Chapter 2
The Simple Regression Model

Fall 2018, Tyson S. Barrett, PhD

$$y_i = \beta_0 + \beta_1 x_i + e_i$$

SLIDE 3

Let’s start with Scatterplots

  • Each point represents a single observation
  • The red line is the line of best fit
  • The line happens to go through each Conditional Mean
  • It goes through the mean at each value of x
  • E.g., when x = 1, the mean of y = 2.5 (the conditional mean of y at x = 1 is 2.5)

[Scatterplot: y vs. x, with the red line of best fit]

SLIDE 4

Conditional Means and Prediction

  • The open circles are where the Conditional Means are
  • In this case, all conditional means run along the line
  • When this happens (or approximately happens) we have linearity
  • The line is the linear model’s predicted level of y for each level of x

[Scatterplot: y vs. x, with conditional means shown as open circles along the line]

SLIDE 5

Why is that line the “best”?

That line is the line that minimizes the error between the predicted values and the observed values:

$$SS_{residual} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2$$

where $e_i = y_i - \hat{y}_i$, i.e., the “residual” or “error”

This approach is called Ordinary Least Squares (OLS) regression

[Scatterplot: y vs. x, with the fitted line and residuals]
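The least-squares property can be checked numerically. This is an illustrative sketch with simulated data (the variable names and the simulated slope/intercept are mine, not from the slides): the closed-form OLS estimates yield a smaller sum of squared residuals than any nearby line.

```python
import numpy as np

# Simulated data (illustrative values, not from the slides)
rng = np.random.default_rng(0)
x = rng.uniform(0, 6, 50)
y = 1.0 + 0.5 * x + rng.normal(0, 1, 50)

def ss_residual(b0, b1):
    """Sum of squared residuals for a candidate line y-hat = b0 + b1*x."""
    return np.sum((y - (b0 + b1 * x)) ** 2)

# OLS estimates via the closed-form solution
b1_ols = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0_ols = y.mean() - b1_ols * x.mean()

# Any perturbed line has a larger sum of squared residuals
assert ss_residual(b0_ols, b1_ols) < ss_residual(b0_ols + 0.1, b1_ols)
assert ss_residual(b0_ols, b1_ols) < ss_residual(b0_ols, b1_ols + 0.1)
```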

SLIDE 6

Features of the “Best” Line (Simple Regression)

Slope:
$$\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{Cov(X, Y)}{Var(X)}$$

where
$$Cov(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}, \qquad Var(X) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Intercept:
$$\beta_0 = \bar{y} - \beta_1 \bar{x}$$

The Line:
$$\hat{y}_i = \beta_0 + \beta_1 x_i$$
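The slope and intercept formulas above can be computed directly and cross-checked against a library fit. A minimal sketch with simulated data (values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 6, 100)
y = 2.0 + 0.8 * x + rng.normal(0, 1, 100)  # simulated data

# Slope: sum of cross-products over sum of squares
# (the n - 1 denominators of Cov and Var cancel in the ratio)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Intercept: forces the line through the point (x-bar, y-bar)
b0 = y.mean() - b1 * x.mean()

# Cross-check against NumPy's least-squares polynomial fit
b1_np, b0_np = np.polyfit(x, y, 1)   # degree 1: returns [slope, intercept]
assert np.allclose([b0, b1], [b0_np, b1_np])
```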

SLIDE 7

The “Best” Line and Correlation

$$\beta_1 = r_{xy} \frac{s_y}{s_x}$$

We unstandardized the $r_{xy}$ by $s_y / s_x$

  • $r_{xy}$ has no scale, but $\beta_1$ is in the units of the outcome
  • $r_{xy}$ is affected by the range of the variables measured
  • $\beta_1$ is only affected by variables that influence both X and Y, while $r_{xy}$ is affected by variables that only influence Y
  • $\beta_1$ is the effect of X on Y, while $r_{xy}$ is the relative importance of X on Y

SLIDE 8

We unstandardized the $r_{xy}$ by $s_y / s_x$

That is, $r_{xy}$ is the standardized version of $\beta_1$

If we standardize our variables before using regression, both $r_{xy}$ and $\beta_1$ are the same:

$$z_x = \frac{x_i - \bar{x}}{s_x}$$

Why?
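This relationship between the slope and the correlation can be verified numerically. A sketch using simulated data (illustrative values): the slope on z-scored variables equals $r_{xy}$, and rescaling by $s_y / s_x$ recovers the unstandardized $\beta_1$.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0, 2, 200)
y = 3.0 + 1.5 * x + rng.normal(0, 2, 200)  # simulated data

r = np.corrcoef(x, y)[0, 1]
sx, sy = x.std(ddof=1), y.std(ddof=1)

# Unstandardized slope: r rescaled into the outcome's units
b1 = r * sy / sx

# Standardize first, then regress: the slope is exactly r
zx = (x - x.mean()) / sx
zy = (y - y.mean()) / sy
b1_z = np.sum(zx * zy) / np.sum(zx ** 2)
assert np.isclose(b1_z, r)

# And b1 matches the usual Cov/Var formula
assert np.isclose(b1, np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1))
```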

SLIDE 9

  • $r_{xy}$ has a range of −1 to 1
  • $\beta_1$ is in the range of the outcome (approximately), often from −∞ to ∞
  • “For a one unit increase in X there is an associated increase of $\beta_1$ units in the outcome”
  • $r_{xy}$ has no scale, but $\beta_1$ is in the units of the outcome
SLIDE 10

  • The value of $\beta_1$ is not affected by the range of X (the significance is…)
  • $r_{xy}$ is affected by having a less-than-representative range of X
  • Why? $r_{xy}$ is affected by the range of the variables measured

SLIDE 11

$r_{xy}$ is affected by the range of the variables measured

[Two scatterplots of y vs. x: one with a restricted range of x, one with the full range]
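The range-restriction effect described here can be simulated. A sketch with simulated data (the true slope of 0.5 and the cut points are illustrative assumptions): restricting X shrinks the correlation while the slope estimate stays near its true value.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 6, 5000)
y = 1.0 + 0.5 * x + rng.normal(0, 1, 5000)  # true slope = 0.5

def slope_and_r(x, y):
    """OLS slope and Pearson correlation."""
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return b1, np.corrcoef(x, y)[0, 1]

b1_full, r_full = slope_and_r(x, y)

keep = (x > 2) & (x < 4)                  # restrict the range of X
b1_sub, r_sub = slope_and_r(x[keep], y[keep])

assert r_sub < r_full                     # correlation shrinks under restriction
assert abs(b1_sub - 0.5) < 0.15           # slope stays near the true value
```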

SLIDE 12

  • $r_{xy}$ is a measure of relative importance compared to other variables
  • If other variables are important, $r_{xy}$ will be relatively smaller
  • $\beta_1$ is a measure of the effect of X on Y and therefore shouldn’t change much based on the range of X
  • The standard error is affected though (we’ll discuss later)
  • $\beta_1$ is only affected by variables that influence both X and Y, while $r_{xy}$ is affected by variables that only influence Y
  • $\beta_1$ is the effect of X on Y, while $r_{xy}$ is the relative importance of X on Y

SLIDE 13

Back to Residuals

The estimate of $\beta_1$ depends on minimizing the residuals, so they are kind of a big deal:

$$SS_{residual} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2$$

[Scatterplot: y vs. x (centered), with the fitted line and residuals]

SLIDE 14

Back to Residuals

Our $y_i$ values can be separated into three parts:

$$y_i = \bar{y} + (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$$

  • $\bar{y}$: the same for everyone (a constant)
  • $(\hat{y}_i - \bar{y})$: explained component
  • $(y_i - \hat{y}_i)$: unexplained component (residuals)

[Scatterplot: y vs. x (centered), with the decomposition marked relative to the fitted line]
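This decomposition can be verified numerically, observation by observation and in summed squares. A sketch with simulated data (illustrative values):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(0, 1, 100)
y = 0.8 * x + rng.normal(0, 1, 100)  # simulated data

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

# Each observation splits into constant + explained + residual parts
parts = y.mean() + (y_hat - y.mean()) + (y - y_hat)
assert np.allclose(parts, y)

# The sums of squares split the same way: SS_total = SS_explained + SS_residual
ss_total = np.sum((y - y.mean()) ** 2)
ss_explained = np.sum((y_hat - y.mean()) ** 2)
ss_residual = np.sum((y - y_hat) ** 2)
assert np.isclose(ss_total, ss_explained + ss_residual)
```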

SLIDE 15

(Same decomposition as the previous slide, shown again on the plot)

SLIDE 16

Properties of the Residuals

  • 1. The mean is exactly zero.
  • 2. The correlation with X is exactly zero.
  • 3. The variance is:

$$Var(Y.X) = Var(Y)(1 - r_{xy}^2)$$

$$\frac{Var(Y.X)}{Var(Y)} = (1 - r_{xy}^2)$$

$(1 - r_{xy}^2)$ is the proportion of variance in Y not explained by X

SLIDE 17

Properties of the Residuals (continued)

$$\frac{Var(Y.X)}{Var(Y)} = (1 - r_{xy}^2)$$

  • $(1 - r_{xy}^2)$ is the proportion of variance in Y not explained by X
  • $r_{xy}^2$ is the proportion of variance in Y explained by X
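All three residual properties can be checked numerically. A sketch with simulated data (illustrative values; sample variances use the n − 1 divisor so the identity holds exactly):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(0, 1, 200)
y = 1.0 + 0.6 * x + rng.normal(0, 1, 200)  # simulated data

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
e = y - (b0 + b1 * x)                      # residuals

r2 = np.corrcoef(x, y)[0, 1] ** 2

assert np.isclose(e.mean(), 0)                   # 1. mean is exactly zero
assert np.isclose(np.corrcoef(x, e)[0, 1], 0)    # 2. uncorrelated with X
# 3. residual variance = Var(Y) * (1 - r^2)
assert np.isclose(np.var(e, ddof=1), np.var(y, ddof=1) * (1 - r2))
```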

SLIDE 18

Residuals tell us stuff

  • 1. Partial relationships, because the residual is what is remaining in Y after adjusting for X
  • 2. Residual analysis to detect anomalies
  • 3. Detect non-linearities
  • 4. Assess the homoskedasticity assumption
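Point 3 can be illustrated numerically. A sketch (my own simulated example, not from the slides): fit a straight line to truly quadratic data; the residuals are uncorrelated with x by construction, but their strong association with x² flags the missed curvature.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(-2, 2, 300)
y = x ** 2 + rng.normal(0, 0.3, 300)   # truly quadratic relationship

# Fit a straight line anyway
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
e = y - (b0 + b1 * x)

# OLS guarantees residuals are uncorrelated with x...
assert abs(np.corrcoef(x, e)[0, 1]) < 1e-8
# ...but they remain strongly related to x^2, revealing the non-linearity
assert abs(np.corrcoef(x ** 2, e)[0, 1]) > 0.5
```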

SLIDE 19