Welcome Back! EDUC 7610 Chapter 2: The Simple Regression Model
EDUC 7610 Chapter 2: The Simple Regression Model
Fall 2018, Tyson S. Barrett, PhD

Yᵢ = β₀ + β₁X₁ᵢ + εᵢ
Let’s start with Scatterplots
Each point represents a single observation
The red line is the line of best fit
The line happens to go through each Conditional Mean
- It goes through the mean at each value of x
- E.g., when x = 1, the mean of y = 2.5 (the conditional mean of y at x = 1 is 2.5)
[Scatterplot of y vs. x with the line of best fit]
Conditional Means and Prediction
The open circles are where the Conditional Means are
In this case, all conditional means run along the line
- When this happens (or approximately happens), we have linearity
The line is the linear model’s predicted level of y for each level of x
[Scatterplot of y vs. x with conditional means shown as open circles]
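A minimal Python sketch (not from the slides; the data are simulated for illustration) showing that with a linear relationship, the fitted line passes approximately through the conditional mean of y at each value of x:

```python
import numpy as np

# Hypothetical data: 20 observations of y at each of several x values
rng = np.random.default_rng(0)
x = np.repeat([1, 2, 3, 4, 5, 6], 20)
y = 1.0 + 0.5 * x + rng.normal(0, 0.3, size=x.size)  # truly linear relationship

# Conditional mean of y at each distinct value of x
cond_means = {v: y[x == v].mean() for v in np.unique(x)}

# Fitted least-squares line (np.polyfit returns [slope, intercept])
b1, b0 = np.polyfit(x, y, 1)

# With linearity, each conditional mean sits (approximately) on the line
for v, m in cond_means.items():
    print(f"x = {v}: conditional mean = {m:.2f}, line = {b0 + b1 * v:.2f}")
```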
That line is the line that minimizes the error between the predicted values and the observed values
[Scatterplot of y vs. x with the fitted line and residuals]
Why is that line the “best”?
SS_residuals = Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)² = Σᵢ₌₁ⁿ eᵢ²

i.e., eᵢ is the “residual” or “error”
This approach is called Ordinary Least Squares (OLS) regression
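A small Python sketch (not from the slides; simulated data) of what “least squares” means: the OLS line has a smaller sum of squared residuals than any other candidate line we try:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 6, 100)
y = 1.0 + 0.5 * x + rng.normal(0, 1, 100)

def ss_residuals(b0, b1):
    """Sum of squared residuals for the candidate line y-hat = b0 + b1*x."""
    return np.sum((y - (b0 + b1 * x)) ** 2)

# OLS estimates (np.polyfit minimizes the sum of squared residuals)
b1_ols, b0_ols = np.polyfit(x, y, 1)
ss_ols = ss_residuals(b0_ols, b1_ols)

# Every other line, including the true line (1.0, 0.5), does no better
for b0, b1 in [(1.0, 0.5), (0.0, 0.6), (2.0, 0.4)]:
    print(f"line ({b0}, {b1}): SS = {ss_residuals(b0, b1):.1f} vs OLS {ss_ols:.1f}")
```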
Features of the “Best” Line (Simple Regression)

Slope = β₁

β₁ = Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ) / Σᵢ₌₁ⁿ (Xᵢ − X̄)² = Cov(X, Y) / Var(X)

Intercept = β₀

β₀ = Ȳ − β₁X̄

where

Cov(X, Y) = Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ) / n
Var(X) = Σᵢ₌₁ⁿ (Xᵢ − X̄)² / n

The Line: Ŷᵢ = β₀ + β₁Xᵢ
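The formulas above can be checked directly in Python (a sketch with simulated data, not from the slides): the slope from Cov(X, Y)/Var(X) matches a least-squares fit exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(3, 1, 200)
y = 1.0 + 0.5 * x + rng.normal(0, 1, 200)

# Slope and intercept from the slide's formulas
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Equivalently: Cov(X, Y) / Var(X) (same n in numerator and denominator)
b1_cov = np.cov(x, y, ddof=0)[0, 1] / np.var(x)

# Matches numpy's least-squares fit
b1_np, b0_np = np.polyfit(x, y, 1)
print(b1, b1_cov, b1_np)
```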
The “Best” Line and Correlation

β₁ = r_XY (s_Y / s_X)

We unstandardized the r_XY by s_Y / s_X

r_XY has no scale, but β₁ is in the units of the outcome
r_XY is affected by the range of the variables measured
β₁ is only affected by variables that influence both X and Y, while r_XY is affected by variables that only influence Y
β₁ is the effect of X on Y, while r_XY is the relative importance of X on Y
We unstandardized the r_XY by s_Y / s_X
That is, r_XY is the standardized version of β₁
If we standardize our variables before using regression, both r_XY and β₁ are the same

Zᵢ = (Xᵢ − X̄) / s_X
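A quick Python check of this point (simulated data, not from the slides): β₁ equals r_XY · s_Y/s_X, and if both variables are z-scored first, the regression slope IS the correlation:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0, 2, 500)
y = 2.0 + 0.7 * x + rng.normal(0, 1, 500)

r_xy = np.corrcoef(x, y)[0, 1]
b1 = np.polyfit(x, y, 1)[0]

# beta1 = r_XY * (s_Y / s_X): "unstandardizing" r by the ratio of SDs
print(b1, r_xy * y.std() / x.std())

# Standardize both variables, then regress: the slope is r_XY itself
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()
b1_std = np.polyfit(zx, zy, 1)[0]
print(b1_std, r_xy)
```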
Why?

r_XY has a range of −1 to 1
β₁ is in the range of the outcome (approximately), often is from −∞ to ∞
“For a one unit increase in X there is an associated increase of β₁ units in the outcome”
r_XY has no scale but β₁ is in the units of the outcome
The value of β₁ is not affected by the range of X (the significance is…)
r_XY is affected by having a less-than-representative range of X

Why? r_XY is affected by the range of the variables measured
[Two scatterplots of y vs. x, one over the full range of x and one over a restricted range]
r_XY is a measure of relative importance compared to other variables
- If other variables are important, r_XY will be relatively smaller

β₁ is a measure of the effect of X on Y and therefore shouldn’t change much based on the range of X
- The standard error is affected though (we’ll discuss later)

β₁ is only affected by variables that influence both X and Y, while r_XY is affected by variables that only influence Y
β₁ is the effect of X on Y, while r_XY is the relative importance of X on Y
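This range-restriction point can be sketched with a small simulation in Python (hypothetical data, not from the slides): cutting X down to a narrow window barely moves β₁ but clearly shrinks r_XY:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 2000)
y = 1.0 + 0.5 * x + rng.normal(0, 1, 2000)

# Full range of X
b1_full = np.polyfit(x, y, 1)[0]
r_full = np.corrcoef(x, y)[0, 1]

# Restricted (less-than-representative) range of X
keep = (x > 4) & (x < 6)
b1_restr = np.polyfit(x[keep], y[keep], 1)[0]
r_restr = np.corrcoef(x[keep], y[keep])[0, 1]

print(f"slope: full {b1_full:.2f}, restricted {b1_restr:.2f}")  # both near 0.5
print(f"r:     full {r_full:.2f}, restricted {r_restr:.2f}")    # r shrinks
```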
The estimate of β₁ depends on minimizing the residuals, so they are kind of a big deal
Back to Residuals
SS_residuals = Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)² = Σᵢ₌₁ⁿ eᵢ²
[Scatterplot of y vs. x]
Back to Residuals

Our Yᵢ values can be separated into three parts:

Yᵢ = Ȳ + (Ŷᵢ − Ȳ) + (Yᵢ − Ŷᵢ)

- Ȳ: the same for everyone (a constant)
- Ŷᵢ − Ȳ: explained component
- Yᵢ − Ŷᵢ: unexplained component (residuals)
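The three-part decomposition above can be verified in Python (a sketch with simulated data, not from the slides): the parts reassemble Y exactly, and their sums of squares add up as well:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(0, 1, 300)
y = 2.0 + 1.5 * x + rng.normal(0, 1, 300)

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

constant = np.full_like(y, y.mean())   # the same for everyone
explained = y_hat - y.mean()           # explained component
residual = y - y_hat                   # unexplained component (residuals)

# The three parts reassemble Y exactly
print(np.allclose(constant + explained + residual, y))
```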
Properties of the Residuals

1. The mean is exactly zero.
2. The correlation with X is exactly zero.
3. The variance is:

Var(Y.X) = Var(Y)(1 − r²_XY)

Var(Y.X) / Var(Y) = 1 − r²_XY

The proportion of variance in Y not explained by X
r²_XY is the proportion of variance in Y explained by X
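All three properties can be confirmed numerically in Python (a sketch with simulated data, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(0, 1, 500)
y = 1.0 + 0.8 * x + rng.normal(0, 1, 500)

b1, b0 = np.polyfit(x, y, 1)
e = y - (b0 + b1 * x)                 # residuals
r_xy = np.corrcoef(x, y)[0, 1]

print(e.mean())                        # 1. mean of residuals: ~0
print(np.corrcoef(x, e)[0, 1])         # 2. correlation with X: ~0
print(np.var(e), np.var(y) * (1 - r_xy ** 2))  # 3. Var(Y.X) = Var(Y)(1 - r^2)
```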
The residuals are useful for:

1. Partial relationships, because the residual is what is remaining in Y after adjusting for X
2. Residual analysis to detect anomalies
3. Detecting non-linearities
4. Assessing the homoskedasticity assumption
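As one example of residual analysis (a Python sketch with simulated data, not from the slides): fitting a straight line to data that are actually curved leaves a pattern in the residuals, which flags the non-linearity even though the residuals stay uncorrelated with X itself:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(0, 4, 400)

# Data that are actually quadratic, fitted with a straight line
y = 1.0 + x ** 2 + rng.normal(0, 0.5, 400)
b1, b0 = np.polyfit(x, y, 1)
e = y - (b0 + b1 * x)

# Residuals are uncorrelated with x (always true for OLS), but they ARE
# correlated with x^2, revealing the missed curvature
print(np.corrcoef(x, e)[0, 1])       # ~0 by construction
print(np.corrcoef(x ** 2, e)[0, 1])  # clearly nonzero
```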