STAT 401A - Statistical Methods for Research Workers: Case statistics




SLIDE 1

STAT 401A - Statistical Methods for Research Workers

Case statistics Jarad Niemi (Dr. J)

Iowa State University

last updated: November 17, 2014

Jarad Niemi (Iowa State) Case statistics November 17, 2014 1 / 9

SLIDE 2

Influential observations

Case statistics

Definition: Leverage ($h_i$) is a measure of the distance between an observation's explanatory variable values and the average of the explanatory variable values in the entire data set.

Rule of thumb: possible concern when leverage > 2p/n, where p is the number of regression coefficients and n is the number of observations.

Definition: Cook's distance ($D$) is a measure of the overall effect on the estimated regression coefficients when removing an observation.

Rule of thumb: concern when Cook's D ≈ 1 or larger.
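As a minimal sketch of how these two case statistics and the 2p/n rule of thumb might be computed in R: the data below are simulated purely for illustration (they are not the course data), and the variable names are hypothetical. hatvalues() and cooks.distance() are base R functions for fitted lm objects.

```r
# Simulated data (hypothetical; not the course data set)
set.seed(401)
n <- 30
x <- c(rnorm(n - 1), 6)          # last case sits far from the other x values
y <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)

h <- hatvalues(fit)              # leverage h_i
D <- cooks.distance(fit)         # Cook's distance D_i
p <- length(coef(fit))           # number of regression coefficients (2 here)

# Rule of thumb: leverage > 2p/n is a possible concern
high_leverage <- which(h > 2 * p / n)
print(high_leverage)
print(round(D[high_leverage], 3))
```

Leverage depends only on the explanatory variable values, so a case can be flagged here and still have a small Cook's distance if its response falls close to the fitted line.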

SLIDE 3

Influential observations Leverage and influence

Consider simple linear regression (point of interest is the open circle):

[Figure: 2 x 2 grid of scatterplots; rows are low/high leverage, columns are low/high influence]

Low leverage, low influence: Leverage = 0.05, Cook's D = 0
Low leverage, high influence: Leverage = 0.05, Cook's D = 0.36
High leverage, low influence: Leverage = 0.42, Cook's D = 0.05
High leverage, high influence: Leverage = 0.42, Cook's D = 4.11
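The four combinations of leverage and influence can be imitated with a small simulation. This is only a sketch under stated assumptions: the data, the helper names case_stats and on_line, and the positions of the added point are all hypothetical, so the numbers will not match the slide.

```r
set.seed(1)
x0 <- seq(1, 10, length.out = 20)
y0 <- 2 + 0.5 * x0 + rnorm(20, sd = 0.5)

# Add one extra point (x_new, y_new) and report its leverage and Cook's D
case_stats <- function(x_new, y_new) {
  x <- c(x0, x_new)
  y <- c(y0, y_new)
  fit <- lm(y ~ x)
  i <- length(x)
  c(leverage = unname(hatvalues(fit)[i]),
    cooks_D  = unname(cooks.distance(fit)[i]))
}

on_line <- function(x) 2 + 0.5 * x   # true line used to simulate the data

scenarios <- rbind(
  low_leverage_low_influence   = case_stats(5.5, on_line(5.5)),
  low_leverage_high_influence  = case_stats(5.5, on_line(5.5) + 4),
  high_leverage_low_influence  = case_stats(25,  on_line(25)),
  high_leverage_high_influence = case_stats(25,  on_line(25) + 4)
)
print(round(scenarios, 2))
```

Within each pair the leverage is identical because only the response of the added point changes; Cook's D grows when the point moves away from the trend, and grows far more when the point also has high leverage.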

SLIDE 4

Influential observations Residuals

Residuals

Residual (observed minus predicted):

$r_i = \hat{e}_i = Y_i - \hat{\mu}_i$

(Internally) studentized residual:

$\frac{r_i}{\mathrm{SD}(r_i)} = \frac{r_i}{\hat{\sigma}\sqrt{1 - h_i}}$

Externally studentized residual:

$\frac{r_i}{\hat{\sigma}_{(i)}\sqrt{1 - h_i}}$

where $\hat{\sigma}_{(i)}$ is the estimate of the standard deviation about the regression line from the fit that excludes observation i.

95% of studentized residuals should be within -2 and 2.
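The residual formulas above can be checked numerically. The sketch below uses simulated data (an assumption, not the SAT data) and compares hand-computed values with R's built-in rstandard() and rstudent(). The leave-one-out standard deviation is obtained from the standard identity SSE_(i) = SSE - r_i^2 / (1 - h_i) rather than by refitting the model n times.

```r
set.seed(2014)
n <- 25
x <- runif(n, 0, 10)
y <- 3 + 1.5 * x + rnorm(n, sd = 2)
fit <- lm(y ~ x)

r     <- residuals(fit)            # r_i = Y_i - mu_hat_i
h     <- hatvalues(fit)            # leverage h_i
p     <- length(coef(fit))
sigma <- sqrt(sum(r^2) / (n - p))  # sigma_hat from the full fit

# Internally studentized: r_i / (sigma_hat * sqrt(1 - h_i))
internal <- r / (sigma * sqrt(1 - h))

# sigma_hat_(i): residual SD from the fit that excludes observation i,
# via the leave-one-out identity (no refitting needed)
sigma_i  <- sqrt((sum(r^2) - r^2 / (1 - h)) / (n - p - 1))
external <- r / (sigma_i * sqrt(1 - h))

# Compare with R's built-in versions
print(all.equal(internal, rstandard(fit)))
print(all.equal(external, rstudent(fit)))

# Share of studentized residuals inside (-2, 2)
print(mean(abs(internal) < 2))
```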

SLIDE 5

Influential observations Residuals

SAT residuals after adjusting for % taking and median class rank:

[Figure: three panels plotted against case number: Residuals, Studentized residuals, and Externally studentized residuals]

SLIDE 6

Influential observations Residuals

DATA case1201;
  INFILE 'case1201.csv' DSD FIRSTOBS=2;
  INPUT state $ sat takers income years public expend rank;
  ltakers = log(takers);
  IF state='Alaska' THEN DELETE;
RUN;

PROC GLM DATA=case1201;
  MODEL sat = ltakers rank;
RUN;

SLIDE 7

Influential observations Residuals

SAS diagnostics:

SLIDE 8

Influential observations Residuals

mod = lm(SAT~log(Takers)+Rank, case1201)
opar = par(mfrow=c(2,3)); plot(mod, 1:6, ask=FALSE); par(opar)

[Figure: six diagnostic plots from plot(mod, 1:6): Residuals vs Fitted; Normal Q-Q; Scale-Location; Cook's distance; Residuals vs Leverage; Cook's dist vs Leverage. Observations 16, 48, and 50 are labeled in each panel.]

SLIDE 9

Influential observations Summary

Summary of case statistics

Leverage: observations that might be influential.

Cook's distance: observations that have a large overall influence on their own.

If an observation is influential, fit the model with and without it to determine the impact on the questions of interest.

Residuals: observations that are not fit accurately by the model.

Check out this app (on campus or VPN): http://shiny1.stat.iastate.edu/_Statistics/14-outlier/
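The "fit with and without" advice from the summary might look like the following in R. This is a sketch on simulated data with one deliberately unusual case; the data frame dat and the object names are hypothetical, not taken from the course materials.

```r
set.seed(9)
x <- c(runif(24, 0, 10), 20)              # case 25 has high leverage
y <- c(1 + 0.8 * x[1:24] + rnorm(24), 5)  # and sits well below the trend
dat <- data.frame(x = x, y = y)

fit_all <- lm(y ~ x, data = dat)
D <- cooks.distance(fit_all)
worst <- which.max(D)                     # case with the largest Cook's D

fit_drop <- lm(y ~ x, data = dat[-worst, ])

# Side-by-side coefficients: how much do the answers change?
print(round(cbind(with_case    = coef(fit_all),
                  without_case = coef(fit_drop)), 3))
```

If the conclusions about the questions of interest are the same under both fits, the case is of little practical concern; if they differ, it seems reasonable to report both fits.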
