checking model assumptions with regression diagnostics

  1. Checking model assumptions with regression diagnostics • Graeme L. Hickey, University of Liverpool • @graemeleehickey • www.glhickey.com • graeme.hickey@liverpool.ac.uk

  2. Conflicts of interest • None • Assistant Editor (Statistical Consultant) for EJCTS and ICVTS

  3. Question: who routinely checks model assumptions when analyzing data? (Raise your hand if the answer is yes.)

  4. Outline • Illustrate with multiple linear regression • Plethora of residuals and diagnostics for other model types • Focus is not on “what to do if you detect a problem”, but on “how to diagnose (potential) problems”

  5. My personal experience* • Reviewer for EJCTS and ICVTS for 5 years • Authors almost never report whether they assessed model assumptions • Example: only one paper submitted where authors considered sphericity in RM-ANOVA at first submission • Usually one or more comments are sent to authors regarding model assumptions * My views do not reflect those of the EJCTS, ICVTS, or of other statistical reviewers

  6. Linear regression modelling • Collect some data • $y_i$: the observed continuous outcome for subject $i$ (e.g. biomarker) • $x_{1i}, x_{2i}, \ldots, x_{pi}$: $p$ covariates (e.g. age, male, …) • Want to fit the model $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} + \varepsilon_i$ • Estimate the regression coefficients $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p$ • Report the coefficients and make inferences, e.g. report 95% CIs • But we do not stop there…
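
As a rough illustration of the fitting step on this slide, the sketch below estimates the coefficients by ordinary least squares with NumPy. The data are simulated, and the covariate names (`age`, `male`), the true coefficient values and the use of the normal quantile 1.96 for the intervals are all assumptions made for the example, not something taken from the talk.

```python
import numpy as np

# Synthetic data (assumed for illustration): outcome y, covariates age and male.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(40, 80, size=n)
male = rng.integers(0, 2, size=n).astype(float)
y = 2.0 + 0.5 * age + 3.0 * male + rng.normal(0.0, 1.0, size=n)

# Design matrix with a leading column of ones for the intercept beta_0.
X = np.column_stack([np.ones(n), age, male])

# Ordinary least squares estimate of (beta_0, beta_1, beta_2).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual variance estimate with n - (p + 1) degrees of freedom.
resid = y - X @ beta_hat
dof = n - X.shape[1]
sigma2_hat = resid @ resid / dof

# Standard errors from the diagonal of sigma^2 (X'X)^{-1}.
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))

# Approximate 95% CIs using the normal quantile 1.96 (a large-sample
# shortcut; a t quantile with `dof` degrees of freedom is more exact).
ci_lower = beta_hat - 1.96 * se
ci_upper = beta_hat + 1.96 * se

for name, b, lo, hi in zip(["intercept", "age", "male"], beta_hat, ci_lower, ci_upper):
    print(f"{name:>9}: {b:7.3f}  (95% CI {lo:7.3f} to {hi:7.3f})")
```

In practice a statistics package would report these quantities directly; the point here is only to make the estimate-then-infer sequence concrete.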

  7. Residuals • For a linear regression model, the residual for the $i$-th observation is $r_i = y_i - \hat{y}_i$ • where $\hat{y}_i$ is the predicted value given by $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + \cdots + \hat{\beta}_p x_{pi}$ • Lots of useful diagnostics are based on residuals
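
The residual definition can be worked through on a tiny example. The five data points below are made up for illustration, and a single covariate is used to keep the output readable.

```python
import numpy as np

# Small made-up data set with one covariate (values are illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta_hat   # predicted values
r = y - y_hat          # residuals r_i = y_i - y_hat_i

for xi, yi, fi, ri in zip(x, y, y_hat, r):
    print(f"x={xi:.0f}  y={yi:5.2f}  fitted={fi:5.2f}  residual={ri:6.3f}")

# With an intercept in the model, least-squares residuals sum to zero
# (up to floating-point error).
print("sum of residuals:", r.sum())
```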

  8. Linearity of functional form • Assumption: scatterplot of $(x_i, r_i)$ should not show any systematic trends • Trends imply that higher-order terms are required, e.g. quadratic, cubic, etc.

  9. [Figure, four panels. Fitted model $Y = \beta_0 + \beta_1 X + \varepsilon$: (A) Y against X, (B) Residual against X. Fitted model $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$: (C) Y against X, (D) Residual against X.]
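
The linearity check can be sketched numerically. The example assumes a quadratic truth (chosen here for illustration), fits both a straight line and a quadratic, and summarises the residual-against-X pattern by the mean residual in each third of the X range. The helper names are hypothetical, and a scatterplot remains the more informative diagnostic.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 20.0, 99)
# Assumed truth for illustration: a quadratic relationship plus noise.
y = 5.0 + 0.2 * x**2 + rng.normal(0.0, 2.0, size=x.size)

def ols_residuals(design, outcome):
    """Residuals from an ordinary least squares fit."""
    beta_hat, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return outcome - design @ beta_hat

ones = np.ones_like(x)
r_linear = ols_residuals(np.column_stack([ones, x]), y)
r_quadratic = ols_residuals(np.column_stack([ones, x, x**2]), y)

def mean_by_thirds(residuals):
    """Mean residual in the lower, middle and upper third of x (x is sorted)."""
    return [float(part.mean()) for part in np.array_split(residuals, 3)]

print("straight-line fit:", mean_by_thirds(r_linear))
print("quadratic fit:    ", mean_by_thirds(r_quadratic))
```

Under these assumptions the straight-line fit should leave positive mean residuals at both ends and a negative mean in the middle, while the quadratic fit should leave all three means near zero.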

  10. Homogeneity • We often assume that $\varepsilon_i \sim N(0, \sigma^2)$ • The assumption here is that the variance is constant, i.e. homogeneous • Estimates and predictions are robust to violation, but inferences (e.g. F-tests, confidence intervals) are not • We should not see any pattern in a scatterplot of $(\hat{y}_i, r_i)$ • Residuals should be symmetric about 0

  11. [Figure: Residual against Fitted value. (A) Homoscedastic residuals, (B) Heteroscedastic residuals.]
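
A minimal sketch of the residual-against-fitted-value check follows, using simulated data. The `spread_ratio` helper is hypothetical: it is an informal numerical stand-in for eyeballing the plot, not a formal test, and the form of heteroscedasticity (error SD proportional to x) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
x = rng.uniform(1.0, 25.0, size=n)
X = np.column_stack([np.ones(n), x])

def fitted_and_residuals(design, outcome):
    """Fitted values and residuals from an ordinary least squares fit."""
    beta_hat, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    fitted = design @ beta_hat
    return fitted, outcome - fitted

def spread_ratio(fitted, residuals):
    """SD of residuals in the upper half of fitted values over the lower half.
    Hypothetical helper: a rough stand-in for inspecting the scatterplot."""
    order = np.argsort(fitted)
    lower, upper = np.array_split(residuals[order], 2)
    return float(upper.std(ddof=1) / lower.std(ddof=1))

# Constant error variance.
y_homo = 1.0 + 2.0 * x + rng.normal(0.0, 3.0, size=n)
# Error SD grows with x (assumed form of heteroscedasticity).
y_hetero = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=n) * 0.3 * x

ratio_homo = spread_ratio(*fitted_and_residuals(X, y_homo))
ratio_hetero = spread_ratio(*fitted_and_residuals(X, y_hetero))
print(f"homoscedastic:   {ratio_homo:.2f}")
print(f"heteroscedastic: {ratio_hetero:.2f}")
```

A ratio near 1 is consistent with constant variance; a ratio well above 1 corresponds to the fan shape in a heteroscedastic residual plot.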

  12. Normality • If we want to make inferences, we generally assume $\varepsilon_i \sim N(0, \sigma^2)$ • Not always a critical assumption, e.g.: • Want to estimate the ‘best fit’ line • Want to make predictions • The sample size is quite large and the other assumptions are met • We can assess graphically using a Q-Q plot or histogram • Note: the assumption is about the errors, not the outcomes $y_i$

  13. [Figure: Q-Q plots (Sample Quantiles against Theoretical Quantiles) and histograms (Frequency against Residuals), shown for Normal residuals and for Skewed residuals.]
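
The coordinates of a normal Q-Q plot can be computed by hand, as sketched below. The two residual vectors are simulated stand-ins rather than residuals from a fitted model. The plotting positions $(i - 0.5)/n$ are one common convention among several, and the correlation summary is an extra added here for illustration, not something taken from the slides.

```python
import numpy as np
from statistics import NormalDist

def qq_points(residuals):
    """Return (theoretical, sample) quantiles for a normal Q-Q plot."""
    r = np.sort(np.asarray(residuals, dtype=float))
    n = r.size
    # Plotting positions (i - 0.5) / n; other conventions exist.
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(float(p)) for p in probs])
    return theoretical, r

def qq_correlation(residuals):
    """Correlation between theoretical and sample quantiles (1 = straight line)."""
    theoretical, sample = qq_points(residuals)
    return float(np.corrcoef(theoretical, sample)[0, 1])

rng = np.random.default_rng(3)
normal_resid = rng.normal(0.0, 2.0, size=200)        # stand-in: normal errors
skewed_resid = rng.exponential(3.0, size=200) - 3.0  # stand-in: right-skewed errors

corr_normal = qq_correlation(normal_resid)
corr_skewed = qq_correlation(skewed_resid)
print(f"normal residuals: Q-Q correlation {corr_normal:.3f}")
print(f"skewed residuals: Q-Q correlation {corr_skewed:.3f}")
```

Plotting the pairs returned by `qq_points` gives the Q-Q plot itself; points that bend away from a straight line suggest non-normal residuals.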

  14. Independence • We assume the errors are independent • Usually able to judge whether this assumption holds from the study design and analysis plan • E.g. with repeated measures, we should not treat each measurement as independent • If independence holds, plotting the residuals against time (or the order of the observations) should show no pattern

  15. [Figure: Residual against X. (A) Independent, (B) Non-independent.]
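
One way to put a number on "no pattern against observation order" is the lag-1 autocorrelation of the residuals, sketched below on simulated data. This summary statistic is a choice made for illustration and is not taken from the slides. The AR(1) error process and its coefficient of 0.8 are likewise assumptions.

```python
import numpy as np

def lag1_autocorrelation(residuals):
    """Correlation between consecutive residuals, taken in observation order."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    return float((r[1:] @ r[:-1]) / (r @ r))

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(0.0, 100.0, size=n)
X = np.column_stack([np.ones(n), x])

# Independent errors.
e_indep = rng.normal(0.0, 20.0, size=n)

# Serially correlated errors: AR(1) with coefficient 0.8 (assumed form).
innovations = rng.normal(0.0, 20.0, size=n)
e_ar1 = np.empty(n)
e_ar1[0] = innovations[0]
for t in range(1, n):
    e_ar1[t] = 0.8 * e_ar1[t - 1] + innovations[t]

def residuals_in_order(errors):
    """Fit y = b0 + b1 * x by least squares and return residuals in row order."""
    y = 10.0 + 0.5 * x + errors
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta_hat

acf_indep = lag1_autocorrelation(residuals_in_order(e_indep))
acf_ar1 = lag1_autocorrelation(residuals_in_order(e_ar1))
print(f"independent errors: lag-1 autocorrelation {acf_indep:.2f}")
print(f"AR(1) errors:       lag-1 autocorrelation {acf_ar1:.2f}")
```

A value near zero is consistent with independence, whereas a large positive value reflects residuals that drift in runs. As the slide notes, the study design is usually the first place to look.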
