regression diagnostics



  1. Regression Diagnostics: Introduction to Regression

  2. Why do we need to do all this? • The theory is based on assumptions • Diagnostics focus on the residuals and fitted values • They validate the model • They give us clues about how to change the model • Is the model appropriate? • Many statistical tests depend on these assumptions

  3. What shall we look at? • Calculate the residual for each case • Residual = observed value − predicted value • Standardise the residuals by dividing each by (approximately) its SD • There are different types of standardised residuals • We need to do a series of plots • Remember the model assumptions • Constant variance

  5. Standardised residuals • Should be – Of what size? – Independent – Normally distributed – Of constant variance – Unrelated to the fitted values – Unrelated to the independent variables

  6. Plots to compute • Normal probability plot of residuals • Look for large standardised residuals • Check their values • Plot residuals vs fitted/predicted values • Plot residuals vs each independent variable • Plot residuals against time (if that is appropriate)
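
As a rough illustration of the quantities behind these plots, the sketch below fits a simple regression, computes the residuals (observed minus predicted), and standardises them approximately by dividing by the residual standard deviation. The data, the function names `fit_line` and `approx_standardised_residuals`, and the "absolute value above 2" flag are assumptions made for illustration; none of them comes from the presentation.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def approx_standardised_residuals(x, y):
    """Return fitted values, residuals and residuals divided by the residual SD."""
    n = len(x)
    a, b = fit_line(x, y)
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # Residual SD with n - 2 degrees of freedom (two estimated coefficients).
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))
    return fitted, resid, [e / s for e in resid]


# Made-up data: roughly linear, with one unusually high response at x = 6.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 18.0, 14.1, 15.9]

fitted, resid, sres = approx_standardised_residuals(x, y)

# Flag cases whose approximate standardised residual exceeds 2 in absolute value.
flagged = [i for i, r in enumerate(sres) if abs(r) > 2]
print("flagged cases (0-based):", flagged)
```

The lists `fitted`, `resid` and `sres` are what would go on the axes of the plots listed above, for example `sres` against `fitted` or against `x`.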

  11. The regression equation is sqrtrooms = 0.200 + 1.90 sqrtcrews
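
A small worked example of using this equation, assuming that `sqrtrooms` and `sqrtcrews` are the square roots of the original rooms and crews variables (the slide does not define them). The function name `predict_rooms` is hypothetical, and squaring the fitted value is only a simple back-transformation to the original scale.

```python
import math


def predict_rooms(crews):
    """Predict on the square-root scale, then square to return to the original scale."""
    sqrt_rooms = 0.200 + 1.90 * math.sqrt(crews)
    return sqrt_rooms ** 2


# With 4 crews: sqrt(4) = 2, so sqrtrooms = 0.200 + 1.90 * 2 = 4.0, and rooms is about 16.
print(round(predict_rooms(4), 2))
```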

  12. Leverage • Measures the distance from an observation's x-values to the mean of the x-values • High-leverage points may influence the results • With p predictors and n observations, leverage is considered high when it exceeds 2(p+1)/n • Be careful here
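
A minimal sketch of the leverage calculation for a single predictor, where the leverage of case i is 1/n + (x_i − x̄)² / Σ(x_j − x̄)². The x-values are made up and the function name `leverages` is hypothetical; the cutoff is the 2(p+1)/n rule with p = 1.

```python
def leverages(x):
    """Leverage of each case in a simple (one-predictor) regression."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Made-up x-values with one point far from the rest.
x = [1, 2, 3, 4, 5, 6, 7, 20]
h = leverages(x)

p = 1                           # number of predictors
cutoff = 2 * (p + 1) / len(x)   # rule-of-thumb threshold for high leverage
high = [i for i, hi in enumerate(h) if hi > cutoff]

print("cutoff:", cutoff)
print("high-leverage cases (0-based):", high)
```

The leverages always add up to p + 1, which is why the cutoff is expressed as a multiple of the average leverage (p+1)/n.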

  13. Outliers and bad leverage points • Examine them and see whether they differ from the rest of the data • They can flag a problem with the model • Consider fitting another model

  14. Cook's distance • Measures the influence of an observation on the set of regression coefficients. Influential observations can be leverage points, outliers, or both. • Look for gaps • A function of leverage and standardised residuals • A suggested cutoff is 4/(n-2) • See what happens when you omit points

  15. Made-up data

      SRESID   Leverage   Cook's
       1.54    0.26       0.42
      -4.35    0.26       3.35
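
The values on slide 15 can be checked against the usual formula for Cook's distance in terms of the standardised residual r and the leverage h, D = r²/(p+1) × h/(1−h). The sketch below assumes one predictor (p = 1) and that SRESID is the internally standardised residual; the function name `cooks_distance` is hypothetical.

```python
def cooks_distance(sresid, leverage, p=1):
    """Cook's distance from a standardised residual and a leverage value."""
    return (sresid ** 2 / (p + 1)) * leverage / (1 - leverage)


# (SRESID, Leverage, Cook's distance as shown on the slide)
rows = [(1.54, 0.26, 0.42), (-4.35, 0.26, 3.35)]

for sresid, lev, shown in rows:
    computed = cooks_distance(sresid, lev)
    print(f"SRESID={sresid:6.2f}  leverage={lev:.2f}  "
          f"computed={computed:.2f}  slide={shown:.2f}")
```

The first row reproduces the slide value to two decimals. The second comes out slightly below 3.35, which is plausibly because the inputs on the slide are already rounded.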

  17. Results including all points

  19. And more: without the point x=20, y=10

  21. Without the point x=20, y=95

  22. DFITS • Measures the influence of each observation on the fitted values • Roughly the number of standard deviations by which the fitted value changes when the observation is removed from the data set and the model is refit.
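
A minimal sketch of this definition, computed literally by removing each observation and refitting: DFITS for case i is taken as (fitted value with all data − fitted value without case i) divided by s₍₋ᵢ₎ √hᵢ, where s₍₋ᵢ₎ is the residual SD of the refit and hᵢ is the leverage. The data are made up (only the point x=20, y=10 echoes the slides) and the function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def dfits(x, y):
    """DFITS for each case, obtained by deleting the case and refitting."""
    n = len(x)
    a, b = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    values = []
    for i in range(n):
        x_del = x[:i] + x[i + 1:]
        y_del = y[:i] + y[i + 1:]
        a_i, b_i = fit_line(x_del, y_del)
        # Residual SD of the refit: n - 1 cases, two estimated coefficients.
        sse_i = sum((yj - (a_i + b_i * xj)) ** 2 for xj, yj in zip(x_del, y_del))
        s_i = math.sqrt(sse_i / (n - 1 - 2))
        h_i = 1 / n + (x[i] - x_bar) ** 2 / sxx
        change = (a + b * x[i]) - (a_i + b_i * x[i])
        values.append(change / (s_i * math.sqrt(h_i)))
    return values


# Made-up data: seven points close to a line, plus the point x=20, y=10.
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [12, 21, 29, 41, 50, 61, 69, 10]

d = dfits(x, y)
most_influential = max(range(len(d)), key=lambda i: abs(d[i]))
print("case with largest |DFITS| (0-based):", most_influential)
```

Refitting n times is wasteful for large data sets, but it makes the "remove and refit" idea explicit, and it is also a direct way to see what happens when a point is omitted.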

  24. What model to fit? • Suppose we start with • Salaries = α + β₁ Experience + ε (linear model) • And look at residuals vs fitted values

  26. So what model should we fit? • Salaries = α + β₁ Experience + ε (linear model) • We should create a new variable, Experience*Experience, and add it to the model • Salaries = α + β₁ Experience + β₂ Experience*Experience + ε • Polynomial model
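
A sketch of this step in plain Python: fit the linear model, look at the sign pattern of its residuals, then add the squared term and refit. The salary figures are invented so that the true relationship is exactly quadratic, and the helper names `solve` and `least_squares` are hypothetical; in practice a statistics package would do the fitting.

```python
def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def least_squares(columns, y):
    """Least-squares coefficients via the normal equations (X'X) coef = X'y."""
    k = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up data: salary rises with experience but levels off.
experience = [float(e) for e in range(0, 21, 2)]
salaries = [30 + 4 * e - 0.1 * e * e for e in experience]
ones = [1.0] * len(experience)
squared = [e * e for e in experience]   # the new variable Experience*Experience

# Linear model: the residuals show a curved pattern (negative, positive, negative).
a1, b1 = least_squares([ones, experience], salaries)
linear_resid = [s - (a1 + b1 * e) for s, e in zip(salaries, experience)]

# Polynomial model: adding the squared term removes the curvature.
a2, b2, c2 = least_squares([ones, experience, squared], salaries)
poly_resid = [s - (a2 + b2 * e + c2 * e * e) for s, e in zip(salaries, experience)]

print("linear residual signs:", ["+" if r > 0 else "-" for r in linear_resid])
print("largest polynomial residual:", max(abs(r) for r in poly_resid))
```

A systematic run of signs in the linear residuals is the same curvature that shows up in a residuals vs fitted values plot.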

  29. Oregon Housing • Description • 76 single-family homes in Eugene, Oregon during 2005 • Estate agents have their methods of determining price • Seller wanted a method of determining asking price

  30. Variables • Price (thousands of $) • Floor size (thousands of sq ft) • Age of house • Number of bedrooms • Number of bathrooms • Garage size • School area • Lot size (coded 1 to 11), an interesting variable too

  31. Coding of lot size

      Category   Lot size
      1          0-3k
      2          3-5k
      3          5-7k
      4          7-10k
      5          10-15k
      6          15-20k
      7          20k-1 acre
      8          1-3 acres
      9          3-5 acres
      10         5-10 acres
      11         10-20 acres

      0-3k = 0-3,000 sq ft; 1 acre = 43,560 sq ft

  32. Model • We are going to focus on three variables: Price, Size and Age • Age is coded as (Year – 70)/10 • We are going to fit two models • Price = α + β₁*Size + β₂*Age + ε • Price = α + β₁*Size + β₂*Age + β₃*Age*Age + ε • First we draw some graphs

  34. First model

  35. Residuals vs Age

  36. Second model

  37. Article: Iain Pardoe (Lundquist College of Business, University of Oregon), "Modeling Home Prices Using Realtor Data", Journal of Statistics Education, Volume 16, Number 2 (2008), www.amstat.org/publications/jse/v16n2/datasets.pardoe.html

  38. Conclusion • Be sure to run diagnostics • Examine the plots • Check unusual points • Try out some changes

  39. Added-variable plots • Added-variable plots enable us to visually assess the effect of each predictor, having adjusted for the effects of the other predictors. • Suppose we have a response Y and two predictor variables X and Z • Regress Y on X and calculate the residuals (Set 1) • Regress Z on X and calculate the residuals (Set 2) • Plot Set 1 residuals vs Set 2 residuals

  40. And more… • Residuals from regressing Y on X = the part of Y not predicted by X • Residuals from regressing Z on X = the part of Z not predicted by X • The added-variable plot for predictor Z shows the part of Y that is not predicted by X against the part of Z that is not predicted by X
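
The two-step recipe for an added-variable plot can be sketched as follows. The data are made up so that Y = 1 + 2X + 3Z exactly, and the function names are hypothetical. A useful check, which is a standard property of least squares rather than something stated on the slides, is that the slope of Set 1 residuals on Set 2 residuals equals the coefficient of Z in the regression of Y on both X and Z.

```python
def simple_fit(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def residuals(x, y):
    """Residuals from regressing y on x: the part of y not predicted by x."""
    a, b = simple_fit(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Made-up predictors, and a response built as Y = 1 + 2X + 3Z with no noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z = [2.0, 1.0, 4.0, 3.0, 6.0, 8.0]
y = [1 + 2 * xi + 3 * zi for xi, zi in zip(x, z)]

set1 = residuals(x, y)   # part of Y not predicted by X
set2 = residuals(x, z)   # part of Z not predicted by X

# The added-variable plot for Z is set1 (vertical axis) against set2 (horizontal axis).
intercept, slope = simple_fit(set2, set1)
print("slope of the added-variable plot:", round(slope, 6))
```

With more than one other predictor, as in the restaurant example, each of the two regressions would include all of the remaining predictors rather than a single X.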

  41. Another dataset • Price = the price (in $US) of dinner (including one drink and a tip) • Food = customer rating of the food (out of 30) • Décor = customer rating of the décor (out of 30) • Service = customer rating of the service (out of 30) • East = 1 (0) if the restaurant is east (west) of Fifth Avenue

  42. Added-variable plots • For the Food variable • Regress Price on Décor, Service and East and calculate the residuals • Regress Food on Décor, Service and East and calculate the residuals • Plot the two sets of residuals against each other
