Lecture 9: Interactions, Quadratic terms and Splines



slide-1
SLIDE 1

Lecture 9: Interactions, Quadratic terms and Splines

Ani Manichaikul amanicha@jhsph.edu 30 April 2007

slide-2
SLIDE 2

Effect Modification

• The phenomenon in which the relationship between the primary predictor and the outcome varies across levels of another predictor
• We say the other predictor modifies the effect of the primary predictor on the outcome
• In linear regression, effect modification is coded by including an interaction term between the primary predictor and the other predictor

slide-3
SLIDE 3

Reminder: Nested models

• Parent model
  • contains one set of variables
• Extended model
  • adds one or more new variables to the parent model
  • one variable added: compare models with a t test
  • two or more variables added: compare models with an F test
• Return to the example of wage versus experience

slide-4
SLIDE 4

Model 1

• This model allows the average wage to differ for men and women, but the difference in average wage between men and women is always the same regardless of experience level.

$$E[\text{Wage}_i] = \hat\beta_0 + \hat\beta_1(\text{Experience}_i) + \hat\beta_2(\text{Gender}_i)$$

slide-5
SLIDE 5

Model 1

      Source |       SS       df       MS              Number of obs =     534
-------------+------------------------------           F(  2,   531) =   61.62
       Model |  2651.49936     2  1325.74968           Prob > F      =  0.0000
    Residual |  11425.1992   531  21.5163827           R-squared     =  0.1884
-------------+------------------------------           Adj R-squared =  0.1853
       Total |  14076.6985   533  26.4103162           Root MSE      =  4.6386

------------------------------------------------------------------------------
      wagehr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     educyrs |   .7512834   .0768225     9.78   0.000     .6003701    .9021966
      gender |  -2.124057   .4028322    -5.27   0.000    -2.915397   -1.332716
       _cons |   .2178312   1.036322     0.21   0.834    -1.817962    2.253624
------------------------------------------------------------------------------

slide-6
SLIDE 6

Model 2

• What is the interaction variable?

$$E[\text{Wage}_i] = \hat\beta_0 + \hat\beta_1(\text{Experience}_i) + \hat\beta_2(\text{Gender}_i) + \hat\beta_3(\text{Gender}_i \times \text{Experience}_i)$$

slide-7
SLIDE 7

Model 2: Creating the interaction variable

• gender:
  • 0 for men
  • 1 for women
• gender*experience
  = 0*experience = 0 for men
  = 1*experience = experience for women
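The product rule above can be sketched in a few lines of Python. The records below are hypothetical and serve only to show that the interaction variable is 0 for every man and equals experience for every woman.

```python
# Hypothetical records (not from the lecture data): (gender, experience in years).
# gender is coded 0 for men and 1 for women, as on the slide.
people = [(0, 5.0), (1, 5.0), (0, 12.0), (1, 0.0)]


def interaction(gender, experience):
    """Product term gender*experience: 0 for men, experience for women."""
    return gender * experience


# One interaction value per record, in the same order as `people`.
gender_exper = [interaction(g, x) for g, x in people]
print(gender_exper)
```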

slide-8
SLIDE 8

Model 2: output

. generate gender_educ = gender*educ
. reg wagehr educyrs gender gender_educ

      Source |       SS       df       MS              Number of obs =     534
-------------+------------------------------           F(  3,   530) =   41.50
       Model |  2677.43224     3  892.477414           Prob > F      =  0.0000
    Residual |  11399.2663   530  21.5080496           R-squared     =  0.1902
-------------+------------------------------           Adj R-squared =  0.1856
       Total |  14076.6985   533  26.4103162           Root MSE      =  4.6377

------------------------------------------------------------------------------
      wagehr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     educyrs |   .6831451   .0987423     6.92   0.000     .4891708    .8771194
      gender |   -4.37045   2.085057    -2.10   0.037    -8.466441   -.2744591
 gender_educ |   .1725303   .1571232     1.10   0.273    -.1361305     .481191
       _cons |   1.104571   1.313655     0.84   0.401    -1.476038    3.685181
------------------------------------------------------------------------------

slide-9
SLIDE 9

Model 2: Interpretation

• Equation for men:

$$E[\text{Wage}_i] = \hat\beta_0 + \hat\beta_1(\text{Experience}_i) = 1.10 + 0.68(\text{Experience}_i)$$

• Equation for women:

$$E[\text{Wage}_i] = (\hat\beta_0 + \hat\beta_2) + (\hat\beta_1 + \hat\beta_3)(\text{Experience}_i) = (1.10 - 4.37) + (0.68 + 0.17)(\text{Experience}_i)$$

• $\hat\beta_2$: change in mean wage for women vs. men with no experience
• $\hat\beta_3$: change in slope (of experience) for women vs. men

slide-10
SLIDE 10

Model 2: Predictions by gender, no experience

• Men with no experience

$$E[\text{Wage}_i] = 1.10 + 0.68(0) - 4.37(0) + 0.17(0 \times 0) = 1.10 = \hat\beta_0$$

• Women with no experience

$$E[\text{Wage}_i] = 1.10 + 0.68(0) - 4.37(1) + 0.17(1 \times 0) = 1.10 - 4.37 = \hat\beta_0 + \hat\beta_2$$

• $\hat\beta_2$ is the difference in mean wage between women and men with no experience

slide-11
SLIDE 11

Model 2: Predictions by gender, 1 year of experience

• Men with 1 year of experience

$$E[\text{Wage}_i] = 1.10 + 0.68(1) - 4.37(0) + 0.17(0 \times 1) = 1.10 + 0.68 = \hat\beta_0 + \hat\beta_1$$

• Women with 1 year of experience

$$E[\text{Wage}_i] = 1.10 + 0.68(1) - 4.37(1) + 0.17(1 \times 1) = 1.10 + 0.68 - 4.37 + 0.17 = \hat\beta_0 + \hat\beta_1 + \hat\beta_2 + \hat\beta_3$$

• $\hat\beta_2 + \hat\beta_3$ is the difference in mean wage between women and men with one year of experience

slide-12
SLIDE 12

Model 2: Predictions by gender, 2 years of experience

• Men with 2 years of experience

$$E[\text{Wage}_i] = 1.10 + 0.68(2) - 4.37(0) + 0.17(0 \times 2) = 1.10 + 0.68(2) = \hat\beta_0 + 2\hat\beta_1$$

• Women with 2 years of experience

$$E[\text{Wage}_i] = 1.10 + 0.68(2) - 4.37(1) + 0.17(1 \times 2) = 1.10 + 0.68(2) - 4.37 + 0.17(2) = \hat\beta_0 + 2\hat\beta_1 + \hat\beta_2 + 2\hat\beta_3$$

• $\hat\beta_2 + 2\hat\beta_3$ is the difference in mean wage between women and men with two years of experience
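The three prediction slides follow one pattern, which can be checked with a short Python sketch. It uses the rounded Model 2 estimates quoted on the slides (1.10, 0.68, -4.37, 0.17). The function names are illustrative, not part of the lecture.

```python
# Rounded Model 2 estimates as quoted on the slides.
B0, B1, B2, B3 = 1.10, 0.68, -4.37, 0.17


def predicted_wage(experience, gender):
    """Fitted mean wage; gender is 0 for men, 1 for women."""
    return B0 + B1 * experience + B2 * gender + B3 * gender * experience


def gap(experience):
    """Women minus men at a given experience level: B2 + B3*experience."""
    return predicted_wage(experience, 1) - predicted_wage(experience, 0)


for years in (0, 1, 2):
    print(years, round(predicted_wage(years, 0), 2),
          round(predicted_wage(years, 1), 2), round(gap(years), 2))
```

The gap changes by B3 for each additional year, which is what "the change in slope for women vs. men" means.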

slide-13
SLIDE 13

Model 2: Interpretation

• $\hat\beta_0$: the average wage for men with no experience
• $\hat\beta_1$: the difference in average wage for a one year increase in experience among men
• $\hat\beta_2$: the difference in average wage between women and men with no experience
• $\hat\beta_3$: the difference of the difference in average wage for a one year increase in experience between women and men
  • the change in slope between women and men
  • the slope for women is $\hat\beta_1 + \hat\beta_3$

slide-14
SLIDE 14

Compare to model 1

• In the parent model
  • $\hat\beta_1$ was the slope for both men and women
  • $\hat\beta_2$ was the difference between women and men at every experience level
• In the extended model (with interaction)
  • $\hat\beta_1$ is the slope for men
  • $\hat\beta_2$ is the difference between women and men at experience = 0
  • $\hat\beta_3$ is the change in slope (per year of experience) between men and women

slide-15
SLIDE 15

Is the change in slope statistically significant?

• Test model 1 vs. model 2
  • only 1 variable added
  • use the t test for that variable to compare models
• H0: $\beta_3 = 0$ in the population
• From the t statistic, p = 0.27
• Fail to reject H0
• Conclude that the interaction term is not needed, so prefer the simpler model 1
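As a rough check, the t statistic is the interaction coefficient divided by its standard error, both taken from the Model 2 output. The sketch below uses a standard normal approximation for the p-value, which is an assumption made here because the Python standard library has no t distribution. With 530 residual degrees of freedom the two are very close.

```python
from statistics import NormalDist

# Interaction coefficient and standard error from the Model 2 output.
coef, std_err = 0.1725303, 0.1571232

# t statistic for H0: beta3 = 0.
t_stat = coef / std_err

# Two-sided p-value using a normal approximation to the t distribution
# (an assumption; Stata reports the exact t-based p-value).
p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(round(t_stat, 2), round(p_approx, 2))
```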

slide-16
SLIDE 16

Model 3: Interaction of two binary predictors

• Model 2:
  • continuous X, binary X, their interaction
  • slope changes by group
• Model 3:
  • binary X, binary X, their interaction
  • difference in mean changes by group

slide-17
SLIDE 17

Model 3: output

      Source |       SS       df       MS              Number of obs =     534
-------------+------------------------------           F(  3,   530) =   13.94
       Model |  1029.58518     3  343.195059           Prob > F      =  0.0000
    Residual |  13047.1134   530   24.617195           R-squared     =  0.0731
-------------+------------------------------           Adj R-squared =  0.0679
       Total |  14076.6985   533  26.4103162           Root MSE      =  4.9616

------------------------------------------------------------------------------
      wagehr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |  -.0951139   .7350696    -0.13   0.897    -1.539121    1.348894
     married |   2.521311   .6121088     4.12   0.000     1.318854    3.723768
gender_mar~d |  -3.097184    .907319    -3.41   0.001    -4.879567   -1.314802
       _cons |   8.354752   .4936948    16.92   0.000     7.384914    9.324591
------------------------------------------------------------------------------

slide-18
SLIDE 18

Model 3: Creating the interaction variable

• gender:
  • 0 for men
  • 1 for women
• married:
  • 0 if unmarried
  • 1 if married
• gender*married
  = 0*0 = 0 for unmarried men
  = 1*0 = 0 for unmarried women
  = 0*1 = 0 for married men
  = 1*1 = 1 for married women

slide-19
SLIDE 19

Graph for Model 3

[Figure: mean hourly wage for unmarried men, unmarried women, married men and married women. Annotated differences: $\hat\beta_1$, $\hat\beta_2$, $\hat\beta_1 + \hat\beta_3$ and $\hat\beta_2 + \hat\beta_3$; $\hat\beta_3$ = difference of differences.]

slide-20
SLIDE 20

Model 3: Interpretation

• $\hat\beta_0$: the average wage for unmarried men
• $\hat\beta_1$: the difference in average wage between unmarried women and unmarried men
• $\hat\beta_1 + \hat\beta_3$: the difference in average wage between married women and married men
• $\hat\beta_3$: the difference of the difference in average wage between married women and married men and between unmarried women and unmarried men

slide-21
SLIDE 21

Model 3: Interpretation

• $\hat\beta_0$: the average wage for unmarried men
• $\hat\beta_2$: the difference in average wage between married men and unmarried men
• $\hat\beta_2 + \hat\beta_3$: the difference in average wage between married women and unmarried women
• $\hat\beta_3$: the difference of the difference in average wage between married women and unmarried women and between married men and unmarried men

slide-22
SLIDE 22

Model 3: conclusion

• The coefficient of the interaction variable is statistically significantly different from 0
  • (p = 0.001, CI: -4.9 to -1.3)
• The difference in mean hourly wage between women and men is greater for married people than for unmarried people.

  -or-

  The difference in mean hourly wage between married people and unmarried people is greater for men than for women.
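Both readings of the conclusion can be reproduced from the Model 3 coefficients. The sketch below rebuilds the four group means and the difference of differences. The helper name `mean_wage` is illustrative only.

```python
# Model 3 estimates from the output: _cons, gender, married, gender*married.
B0, B1, B2, B3 = 8.354752, -0.0951139, 2.521311, -3.097184


def mean_wage(gender, married):
    """Fitted mean hourly wage; gender 1 = women, married 1 = married."""
    return B0 + B1 * gender + B2 * married + B3 * gender * married


# Women minus men, within each marital status.
gap_unmarried = mean_wage(1, 0) - mean_wage(0, 0)  # equals B1
gap_married = mean_wage(1, 1) - mean_wage(0, 1)    # equals B1 + B3

# Difference of differences: equals the interaction coefficient B3.
diff_of_diffs = gap_married - gap_unmarried

print(round(gap_unmarried, 2), round(gap_married, 2), round(diff_of_diffs, 2))
```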

slide-23
SLIDE 23

Summary

• Interaction
  • interaction = var1*var2
  • the interaction variable changes the interpretation of the entire model
  • with interaction, the effect of one variable changes according to the level of the second variable
• Test for interaction by testing the new variable
  • if significant (p < α, 0 not in CI), keep it
  • if not significant, go back to the parent model without the interaction variable

slide-24
SLIDE 24

Flexibility in linear models

• In linear regression, we assume the outcome, Y, has a linear relationship with the predictors, X
• However, we have flexibility in defining the predictors
  • transform X, such as $X^2$ or $X^3$
  • use linear splines to fit "broken arrow" models

slide-25
SLIDE 25

Example: Hospital Expenditures ($$)

• The data are similar to an example from the book by Pagano and Gauvreau, Principles of Biostatistics
• Data:
  • Y: average hospital expenditure ($s) per admission
  • X1: average length of stay (days)
  • X2: average employee salary ($s)
  • n = 51: 50 U.S. states + DC

slide-26
SLIDE 26

Scientific Question

• How is expenditure per admission (Y) related to:
  • Length of stay (X1)
  • Employee salary (X2)

slide-27
SLIDE 27

Model

• We might formulate a multiple linear regression (MLR):
  1) $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
  2) $\varepsilon \sim N(0, \sigma^2)$

  where:
  • Y = expenditures per admission in $s
  • X1 = length of stay (LOS) in days
  • X2 = salary in $s

slide-28
SLIDE 28

Model: $E(Y \mid X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$

Parameter Interpretations:

• $\beta_0$: expected expenditure when LOS = 0 and salary = 0 (Need to center the model!)
• $\beta_1$: difference in expected expenditure ($s) for two states with the same average salary but LOS that differs by one day
• $\beta_2$: difference in expected expenditure ($s) per admission for two states with the same average LOS but salary that differs by one dollar

slide-29
SLIDE 29

Basic Model

      Source |       SS       df       MS              Number of obs =      51
-------------+------------------------------           F(  2,    48) =   46.08
       Model |  25555145.4     2  12777572.7           Prob > F      =  0.0000
    Residual |  13311254.7    48  277317.807           R-squared     =  0.6575
-------------+------------------------------           Adj R-squared =  0.6432
       Total |  38866400.2    50  777328.003           Root MSE      =  526.61

------------------------------------------------------------------------------
      expend |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         los |   313.5297   73.44155     4.27   0.000     165.8656    461.1938
      salary |    .333249   .0379306     8.79   0.000     .2569844    .4095137
       _cons |  -4662.343   808.7017    -5.77   0.000    -6288.346   -3036.339
------------------------------------------------------------------------------

slide-30
SLIDE 30

Check for curvature & other patterns of interest:

[Figure: added-variable plots (AVPlots) of e(expend | X) against e(los | X) and of e(expend | X) against e(salary | X); residual plots of standardized residuals against length of stay (days) and against salary ($).]

slide-31
SLIDE 31

Diagnosis

• The Alaskan outlier appears here, as well as some curvature in the salary relationship
• There appears to be a non-linear relationship between expenditures (Y) and salary (X2).
• How could we incorporate this in our model?
  • Define a new variable, salary2, and include it in the model:

slide-32
SLIDE 32

New Model

$$E(Y \mid X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_2^2$$

• Linear relationship with X1
• Quadratic relationship with X2

slide-33
SLIDE 33

Quadratic Term

• Expenditures are linearly related to length of stay, but have a quadratic relationship with salary.
• Define a new variable:
  salary2 = salary^2
  and include it in the regression.

slide-34
SLIDE 34

Model Output

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  3,    46) =  142.76
       Model |  17552265.1     3  5850755.03           Prob > F      =  0.0000
    Residual |  1885257.79    46  40983.8651           R-squared     =  0.9030
-------------+------------------------------           Adj R-squared =  0.8967
       Total |  19437522.9    49   396684.14           Root MSE      =  202.44

------------------------------------------------------------------------------
      expend |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         los |   441.9992   29.34269    15.06   0.000     382.9354     501.063
      salary |  -2.883287   .2929512    -9.84   0.000    -3.472967   -2.293607
     salary2 |   .0001002   9.58e-06    10.46   0.000     .0000809    .0001195
       _cons |   19724.65   2206.543     8.94   0.000     15283.11    24166.19
------------------------------------------------------------------------------

slide-35
SLIDE 35

Interpretations

• $\beta_0$: ???
• $\beta_1$: We estimate that expected expenditures per admission will be $442 higher (95% CI: $383 to $501) in a state whose average LOS is one day longer than another state with the same average employee salary
• $\beta_2$: ???
• $\beta_3$: ???
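The slide leaves the salary coefficients as open questions. One common way to read a quadratic term, offered here as an assumption rather than the lecture's own answer, is that the salary slope is not constant: it equals $\beta_2 + 2\beta_3 \cdot \text{salary}$, so neither coefficient has a stand-alone "per dollar" interpretation. A sketch with the fitted values:

```python
# Salary coefficients from the quadratic model output.
B_SALARY, B_SALARY2 = -2.883287, 0.0001002


def salary_slope(salary):
    """Derivative of B_SALARY*salary + B_SALARY2*salary**2 with respect to salary."""
    return B_SALARY + 2 * B_SALARY2 * salary


# Salary at which the fitted curve stops decreasing and starts increasing.
turning_point = -B_SALARY / (2 * B_SALARY2)

print(round(salary_slope(10000), 3), round(salary_slope(20000), 3),
      round(turning_point))
```

Under this reading, the fitted expenditure falls with salary below the turning point and rises above it, holding LOS fixed.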

slide-36
SLIDE 36

Inferences

• Is salary related to expenditures?
• Could test:
  • H0: $\beta_2 = 0$?
  • H0: $\beta_3 = 0$?
• But really want
  • H0: $\beta_2 = \beta_3 = 0$
  • overall test for salary

slide-37
SLIDE 37

Hospital Example

• Recall Model:

$$E(Y \mid X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_2^2$$

H0: $\beta_2 = \beta_3 = 0$

(Test by hand: need $SSE_E$, $SSE_F$)

slide-38
SLIDE 38

Full Model Results

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  3,    46) =  142.76
       Model |  17552265.1     3  5850755.03           Prob > F      =  0.0000
    Residual |  1885257.79    46  40983.8651           R-squared     =  0.9030
-------------+------------------------------           Adj R-squared =  0.8967
       Total |  19437522.9    49   396684.14           Root MSE      =  202.44

------------------------------------------------------------------------------
      expend |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         los |   441.9992   29.34269    15.06   0.000     382.9354     501.063
      salary |  -2.883287   .2929512    -9.84   0.000    -3.472967   -2.293607
     salary2 |   .0001002   9.58e-06    10.46   0.000     .0000809    .0001195
       _cons |   19724.65   2206.543     8.94   0.000     15283.11    24166.19
------------------------------------------------------------------------------

$SSE_F = 1885257.79$, $n - p - s - 1 = 50 - 1 - 2 - 1 = 46$
slide-39
SLIDE 39

Null Model Results

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  1,    48) =   47.04
       Model |  9621038.76     1  9621038.76           Prob > F      =  0.0000
    Residual |  9816484.12    48  204510.086           R-squared     =  0.4950
-------------+------------------------------           Adj R-squared =  0.4845
       Total |  19437522.9    49   396684.14           Root MSE      =  452.23

------------------------------------------------------------------------------
      expend |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         los |   443.3567   64.63975     6.86   0.000     313.3898    573.3236
       _cons |  -786.6091   490.4083    -1.60   0.115    -1772.641    199.4228
------------------------------------------------------------------------------

"Null" model: $E(Y \mid X) = \beta_0 + \beta_1 X_1$

$SSE_E = 9816484.12$, $s = 2$

slide-40
SLIDE 40

F-test Results

F-test:

$$F_{2,46} = \frac{(SSE_E - SSE_F)/s}{SSE_F/(n-p-s-1)} = \frac{7931226.3/2}{1885257.79/(50-1-2-1)} = 96.76$$

(p < 0.001; $F_{.05,2,46} = 3.2$)

Reject the null: conclude that the salary effects were statistically significant in the regression model
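The hand calculation can be verified in a few lines of Python. The two residual sums of squares come from the full and null model outputs. The variable names are illustrative.

```python
# Residual sums of squares from the two Stata outputs.
sse_null = 9816484.12   # null model: expend on los only
sse_full = 1885257.79   # full model: los, salary, salary2

s = 2                     # number of coefficients tested (salary, salary2)
df_full = 50 - 1 - 2 - 1  # n - p - s - 1 = 46

# Partial F statistic: drop in SSE per tested term, over the full-model MSE.
f_stat = ((sse_null - sse_full) / s) / (sse_full / df_full)

print(round(f_stat, 2))
```

The denominator is the full-model residual mean square (40983.87 in the output), so the statistic is the average drop in SSE per added term, measured in units of residual variance.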

slide-41
SLIDE 41

Linear Splines: set-up

• The broken arrow model
• Example:
  • A researcher tells you most Health Maintenance Organizations (HMOs) will usually pay for the first week of a hospital stay only
  • She expects expenditures to increase dramatically if LOS is longer than one week
• How should we set up the model?

slide-42
SLIDE 42

The researcher thought the LOS regression line should look like:

[Figure: broken arrow model, expenditures plotted against length of stay (days).]

slide-43
SLIDE 43

Defining a New Variable

• Similar to what we did in ANCOVA, we could define a new variable that lets us check whether the slope is indeed different when LOS is greater than 7.
• Idea: include a term

$$(\text{LOS}-7)^+ = \begin{cases} \text{LOS}-7 & \text{if LOS} > 7 \\ 0 & \text{if LOS} \le 7 \end{cases}$$

The spline allows you to change the magnitude of the slope!

slide-44
SLIDE 44

When to use a spline?

• When a continuous predictor is used, a typical regression equation assumes there is a straight-line relationship between X and Y in the population.
• If the relationship between X and Y is
  • a bent line
  • a curve

  adding a spline may more accurately model the relationship between X and Y

slide-45
SLIDE 45

Visualizing the Model

[Figure: broken arrow model, expenditures plotted against length of stay (days). Slope = $\beta_1$ up to 7 days; slope = $\beta_1 + \beta_2$ beyond 7 days.]

slide-46
SLIDE 46

The Model

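• Model:

$$E(\text{expenditures}) = \beta_0 + \beta_1 \text{LOS} + \beta_2 (\text{LOS}-7)^+$$

where:

$$(\text{LOS}-7)^+ = \begin{cases} \text{LOS}-7 & \text{if LOS} > 7 \\ 0 & \text{if LOS} \le 7 \end{cases}$$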

slide-47
SLIDE 47

Then:

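$$E(\text{expenditures} \mid \text{LOS} \le 7) = \beta_0 + \beta_1 \text{LOS}$$

$$E(\text{expenditures} \mid \text{LOS} > 7) = \beta_0 + \beta_1 \text{LOS} + \beta_2(\text{LOS} - 7) = (\beta_0 - 7\beta_2) + (\beta_1 + \beta_2)\text{LOS} = \beta_0^* + \beta_1^* \text{LOS}$$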

slide-48
SLIDE 48

New Model

$$E(Y \mid X) = \beta_0 + \beta_1 X_1 + \beta_2 (X_1 - 7)^+ + \beta_3 X_2 + \beta_4 X_2^2$$

• Broken arrow relationship with X1
• Quadratic relationship with X2

slide-49
SLIDE 49

Adding Spline to Quadratic

• Expenditures have a different linear relationship before and after a 7 day length of stay, and have a quadratic relationship with salary.
• We'll just define a new variable:
  los7 = (los-7)*(los>7)
  and include it in the regression.

slide-50
SLIDE 50

Results

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  4,    45) =  126.01
       Model |  17844348.0     4  4461087.00           Prob > F      =  0.0000
    Residual |  1593174.87    45  35403.8861           R-squared     =  0.9180
-------------+------------------------------           Adj R-squared =  0.9108
       Total |  19437522.9    49   396684.14           Root MSE      =  188.16

------------------------------------------------------------------------------
      expend |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         los |   212.5361   84.41545     2.52   0.015     42.51468    382.5576
        los7 |   347.7778   121.0805     2.87   0.006     103.9091    591.6465
      salary |  -3.143061   .2869069   -10.95   0.000    -3.720921   -2.565201
     salary2 |   .0001082   9.32e-06    11.60   0.000     .0000894    .0001269
       _cons |   23276.97   2394.892     9.72   0.000     18453.41    28100.53
------------------------------------------------------------------------------
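A small Python sketch of the spline term and the two LOS slopes it implies, using the `los` and `los7` coefficients from this output. The function name `hinge` is an illustrative choice, not something from the lecture.

```python
def hinge(los, knot=7.0):
    """Spline term (los - knot)+ : zero up to the knot, los - knot beyond it."""
    return max(los - knot, 0.0)


# LOS coefficients from the fitted model above.
B_LOS, B_LOS7 = 212.5361, 347.7778

slope_up_to_7 = B_LOS           # dollars per extra day when LOS <= 7
slope_after_7 = B_LOS + B_LOS7  # dollars per extra day when LOS > 7


def los_contribution(los):
    """Part of the fitted expenditure that depends on LOS (salary held fixed)."""
    return B_LOS * los + B_LOS7 * hinge(los)


print(hinge(5.0), hinge(9.0), round(slope_after_7, 2))
```

Because the hinge term is zero at the knot, the two line segments join at LOS = 7 rather than jumping.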

slide-51
SLIDE 51

Centering LOS in the expenditures model

• Y: average hospital expenditure ($s) per admission
• X1: average length of stay (days)
• X2: average employee salary ($1000s)

Centered Model:

$$E(Y \mid X) = \beta_0 + \beta_1 (X_1 - 7) + \beta_2 (X_1 - 7)^+ + \beta_3 (X_2 - 15) + \beta_4 (X_2 - 15)^2$$

slide-52
SLIDE 52

Final Model for Expenditures

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  4,    45) =  126.01
       Model |  17844345.3     4  4461086.31           Prob > F      =  0.0000
    Residual |  1593177.63    45  35403.9473           R-squared     =  0.9180
-------------+------------------------------           Adj R-squared =  0.9108
       Total |  19437522.9    49   396684.14           Root MSE      =  188.16

------------------------------------------------------------------------------
      expend |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        losc |   212.5366   84.41552     2.52   0.015       42.515    382.5582
       losc7 |   347.7772   121.0806     2.87   0.006     103.9083     591.646
        salc |   101.6865   19.69614     5.16   0.000     62.01645    141.3566
       salc2 |   108.1581   9.324742    11.60   0.000     89.37714    126.9391
       _cons |   1954.413   68.69979    28.45   0.000     1816.045    2092.782
------------------------------------------------------------------------------

$$E(Y \mid X) = 1954 + 213(X_1 - 7) + 348(X_1 - 7)^+ + 102(X_2 - 15) + 108(X_2 - 15)^2$$
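The fitted equation can be evaluated directly. The sketch below uses the rounded coefficients from the equation above, with LOS in days and salary in $1000s. The function name is illustrative.

```python
def expected_expenditure(los_days, salary_thousands):
    """Fitted mean expenditure per admission from the rounded final model."""
    los_c = los_days - 7           # LOS centered at 7 days
    sal_c = salary_thousands - 15  # salary centered at $15,000
    return (1954 + 213 * los_c + 348 * max(los_c, 0)
            + 102 * sal_c + 108 * sal_c ** 2)


# At the centering values the prediction is just the intercept.
print(expected_expenditure(7, 15))
# One day either side of the knot, salary held at $15,000.
print(expected_expenditure(6, 15), expected_expenditure(8, 15))
```

Centering is what makes the intercept interpretable: it is the expected expenditure for a state with a 7 day average LOS and a $15,000 average salary.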

slide-53
SLIDE 53

Back to modelling wages

[Figure: standardized residuals plotted against age.]

We removed an outlier, but do we still need a spline?

slide-54
SLIDE 54

How should we add the spline?

• Goal: let the regression line bend
• Model:

$$E(\text{Wage}_i) = \beta_0 + \beta_1(\text{age}-35) + \beta_2(\text{age}-35)^+$$

• What is $(\text{age}-35)^+$?
  • 0 if age < 35
  • (age-35) if age >= 35

slide-55
SLIDE 55

Fitted model with spline at 35

      Source |       SS       df       MS              Number of obs =     533
-------------+------------------------------           F(  2,   530) =   28.18
       Model |  1231.65577     2  615.827885           Prob > F      =  0.0000
    Residual |  11584.1395   530  21.8568669           R-squared     =  0.0961
-------------+------------------------------           Adj R-squared =  0.0927
       Total |  12815.7952   532  24.0898407           Root MSE      =  4.6751

------------------------------------------------------------------------------
      wagehr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    age_cent |   .3328909   .0470853     7.07   0.000     .2403943    .4253876
  age_spline |   -.374546   .0663082    -5.65   0.000     -.504805   -.2442869
       _cons |   10.45389   .3577241    29.22   0.000     9.751156    11.15662
------------------------------------------------------------------------------

slide-56
SLIDE 56

Fitted Graph (with spline)

[Figure: wage ($/hour) plotted against age, with the fitted spline line.]

slide-57
SLIDE 57

• Fitted model:

$$E(\text{Wage}_i) = 10.45 + 0.33(\text{age}-35) - 0.37(\text{age}-35)^+$$

• For a person under 35, the spline term is 0:

$$E(\text{Wage}_i) = 10.45 + 0.33(\text{age}-35)$$

• For a person 35 or older:

$$E(\text{Wage}_i) = 10.45 + 0.33(\text{age}-35) - 0.37(\text{age}-35) = 10.45 - 0.04(\text{age}-35)$$

$\hat\beta_1 + \hat\beta_2$ = new slope for those over 35
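The two cases can be checked numerically with the unrounded coefficients from the fitted model with a spline at 35. The function name `expected_wage` is illustrative.

```python
# Estimates from the output: _cons, age_cent, age_spline.
B0, B1, B2 = 10.45389, 0.3328909, -0.374546


def expected_wage(age):
    """Fitted mean hourly wage with a linear spline (knot) at age 35."""
    centered = age - 35
    return B0 + B1 * centered + B2 * max(centered, 0)


slope_under_35 = B1
slope_over_35 = B1 + B2  # about -0.04 dollars per hour per year

print(round(expected_wage(25), 2), round(expected_wage(35), 2),
      round(expected_wage(45), 2), round(slope_over_35, 2))
```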

slide-58
SLIDE 58

Interpretation

• $\beta_0$ is the average wage for people who are 35 years old
• $\beta_1$ is the change in average wage per additional year of age for those under 35
• $\beta_2$ is the difference in the change in average wage per additional year of age for those over age 35 as compared to those under age 35
• $\beta_2$ is the change in the slope for over 35 vs. under 35

slide-59
SLIDE 59

Better Interpretation

• The average wage for people who are 35 years old is $10.45/hour (95% CI: $9.75, $11.16)
• For each additional year of age, those under age 35 earn an average of $0.33 more per hour (95% CI: $0.24, $0.43)
• For each additional year of age, those over age 35 earn an average of $0.04 less per hour (95% CI: -$0.10, $0.01)

slide-60
SLIDE 60

Is the change in slope statistically significant?

• One variable was added to create the change in slope
• Compare nested models with a t test

. regress wagehr age_cent age_spline if sres_age<6

      Source |       SS       df       MS              Number of obs =     533
-------------+------------------------------           F(  2,   530) =   28.18
       Model |  1231.65577     2  615.827885           Prob > F      =  0.0000
    Residual |  11584.1395   530  21.8568669           R-squared     =  0.0961
-------------+------------------------------           Adj R-squared =  0.0927
       Total |  12815.7952   532  24.0898407           Root MSE      =  4.6751

------------------------------------------------------------------------------
      wagehr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    age_cent |   .3328909   .0470853     7.07   0.000     .2403943    .4253876
  age_spline |   -.374546   .0663082    -5.65   0.000     -.504805   -.2442869
       _cons |   10.45389   .3577241    29.22   0.000     9.751156    11.15662
------------------------------------------------------------------------------

• H0: spline is not needed (no change in slope in the population)
• p < 0.001 or CI does not include 0: reject H0
• Conclude slope differs for those over vs. under 35 in the population

slide-61
SLIDE 61

“L” – Linear relationship

• With the spline, there is no longer any pattern in the residuals
• After removing the one outlier, no others appear to stand out
slide-62
SLIDE 62

“I” - Independence

• We cannot check this by looking at the data

slide-63
SLIDE 63

“N” – Normality of the residuals

• The residuals are slightly skewed to positive values
  • the estimated regression coefficients are still correct
  • their confidence intervals may be misleading

slide-64
SLIDE 64

“E” – Equal variance of the residuals across X

• The vertical spread of the residuals may be smaller for those under 25 years of age
  • the estimated regression coefficients are still correct
  • their confidence intervals may be misleading

slide-65
SLIDE 65

Conclusion

• The increase in hourly wage with increasing age is statistically significant for those who recently entered the workforce (ages 18-35): for each additional year, these workers earn an average of 33 cents more per hour.
• However, this increase in wage with increasing age levels off for those over age 35, so that no appreciable increase in average wage is observed for those over age 35.
• One 21-year-old had much higher earnings ($44.50 per hour) than other young workers. This person's results were so unlike the rest of the sample that the observation was dropped from the analysis. It is possible that the data was incorrectly entered for this person, but we are unable to assess the data entry since the original completed surveys are unavailable.

slide-66
SLIDE 66

Splines

• Splines are used to allow the regression line to bend
  • the breakpoint is arbitrary and decided graphically
  • the actual slope above and below the breakpoint is usually of more interest than the coefficient for the spline (i.e., the change in slope)