Regression Diagnostics: Introduction to Regression

Why do we need to do all this?
• The theory is based on assumptions
• Focuses on the residuals and fitted values
• Validate the model
• Gives us clues about how to change the model
• Is it appropriate?
• Lots of statistical tests are based on certain assumptions

What shall we look at?
• Calculate residuals for each case
• Residual = observed value − predicted value
• Standardise them by dividing by their SD (approx.)
• Different types of standardised residuals
• We need to do a series of plots
• Remember the model
• Constant variance
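As a minimal sketch of the residual calculation above, the following fits a simple linear regression by least squares and standardises the residuals two ways: the rough version (dividing by the residual SD) and the leverage-adjusted version. The data are made up for illustration and are not from the slides.

```python
import math

# Made-up illustrative data (an assumption, not from the slides)
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Least-squares estimates for y = intercept + slope * x
slope = sxy / sxx
intercept = ybar - slope * xbar

# Residual = observed value - predicted value
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Residual standard deviation with n - 2 degrees of freedom
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Rough standardisation: divide each residual by s
approx_std = [e / s for e in residuals]

# Leverage-adjusted standardisation: divide by s * sqrt(1 - h_i)
leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
std_resid = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

for xi, e, r in zip(x, residuals, std_resid):
    print(f"x={xi}  residual={e:+.3f}  standardised={r:+.3f}")
```

The two versions differ most for cases with high leverage, which is one reason several types of standardised residual exist.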

Standardised Residuals
• Should be
  – Size???
  – Independent
  – Normally distributed
  – Constant variance
  – Unrelated to the fitted values
  – Unrelated to the independent variables

Plots to produce
• Normal probability plot of residuals
• Look for large standardised residuals
• Check values
• Plot residuals vs fitted/predicted values
• Plot residuals vs each independent variable
• Plot residuals against time (if that is appropriate)
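A rough sketch of how the coordinates for a normal probability plot might be computed: the sorted standardised residuals are paired with theoretical normal quantiles. The plotting positions (i − 0.375)/(n + 0.25) are one common convention, and the |residual| > 2 threshold for "large" is a rule of thumb; both are assumptions here, as are the example values.

```python
from statistics import NormalDist

def normal_plot_points(std_resid):
    """Pair sorted residuals with theoretical standard normal quantiles."""
    n = len(std_resid)
    ordered = sorted(std_resid)
    # Plotting positions: one common convention (an assumption)
    probs = [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]
    theoretical = [NormalDist().inv_cdf(p) for p in probs]
    return list(zip(theoretical, ordered))

# Hypothetical standardised residuals, for illustration only
example = [-1.2, 0.3, 0.8, -0.4, 2.9, 0.1, -0.6]

points = normal_plot_points(example)
for q, r in points:
    print(f"theoretical={q:+.3f}  observed={r:+.3f}")

# Flag large standardised residuals (rule-of-thumb threshold, an assumption)
large = [r for r in example if abs(r) > 2]
print("large residuals:", large)
```

If the residuals are roughly normal, the plotted points should fall close to a straight line.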

The regression equation is: sqrtrooms = 0.200 + 1.90 sqrtcrews

Leverage
• Measures the distance from the x-values to the mean of the x-values
• May influence results
• p predictors
• High values for leverage: > 2(p+1)/n
• Be careful here
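A minimal sketch of the leverage calculation for a single predictor (p = 1), where the leverage of case i is 1/n + (x_i − x̄)²/Sxx, flagged against the 2(p+1)/n rule. The x-values are made up for illustration.

```python
def leverages(x):
    """Leverage of each case in a simple linear regression on x."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

# Made-up x-values with one point far from the rest (an assumption)
x = [1, 2, 3, 4, 5, 20]
p = 1  # number of predictors

h = leverages(x)
cutoff = 2 * (p + 1) / len(x)

flagged = [xi for xi, hi in zip(x, h) if hi > cutoff]

for xi, hi in zip(x, h):
    print(f"x={xi:>2}  leverage={hi:.3f}")
print(f"cutoff={cutoff:.3f}  flagged x-values={flagged}")
```

The leverages always sum to p + 1, so the cutoff is twice the average leverage.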

Outliers and bad leverage points
• Examine them and see if they are different
• They may flag a problem with the model
• Consider fitting another model

Cook's distance
• Measures the influence of an observation on the set of regression coefficients
• Influential observations can be leverage points, outliers, or both
• Look for gaps
• Function of leverage and standardised residuals
• Suggested cutoff is 4/(n-2)
• What happens when you omit points?

Made-up data

SRESID   Leverage   Cooks
  1.54     0.26      0.42
 -4.35     0.26      3.35
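To make the "function of leverage and standardised residuals" concrete, here is a small sketch using the standard formula D = (r²/(p+1)) · h/(1 − h), where r is the standardised residual and h the leverage. It assumes a simple regression (p = 1, so two coefficients), which the slides do not state explicitly.

```python
def cooks_distance(sresid, leverage, n_coef=2):
    """Cook's distance from a standardised residual and a leverage value.

    n_coef is the number of regression coefficients (p + 1);
    the default of 2 assumes a simple linear regression.
    """
    return (sresid ** 2 / n_coef) * leverage / (1 - leverage)

# (standardised residual, leverage) pairs from the made-up data above
rows = [(1.54, 0.26), (-4.35, 0.26)]

distances = [cooks_distance(r, h) for r, h in rows]
for (r, h), d in zip(rows, distances):
    print(f"SRESID={r:+.2f}  leverage={h:.2f}  Cook's D={d:.2f}")
```

The results come out close to the tabulated 0.42 and 3.35; the small gap in the second case is plausibly due to rounding of the displayed residual and leverage. With equal leverage, the much larger residual is what drives the second distance up.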

Results: including all points

And more: without the point x=20, y=10

Without the point x=20, y=95

DFITS
• Measures the influence of each observation on the fitted values
• Roughly the number of standard deviations that the fitted value changes when each observation is removed from the data set and the model is refit
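A sketch of DFITS (often written DFFITS) computed directly from the description above: delete each case in turn, refit, and scale the change in that case's fitted value by s(i)·√h_i, where s(i) is the residual SD from the refit. It assumes a simple linear regression and uses made-up data.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b * xbar, b

def dffits(x, y):
    """DFFITS for each case, by refitting with that case deleted."""
    n = len(x)
    a, b = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    out = []
    for i in range(n):
        h = 1 / n + (x[i] - xbar) ** 2 / sxx
        x_del, y_del = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        a_i, b_i = fit_line(x_del, y_del)
        sse_i = sum((yj - (a_i + b_i * xj)) ** 2
                    for xj, yj in zip(x_del, y_del))
        # n - 1 remaining cases, 2 coefficients: n - 3 degrees of freedom
        s_i = math.sqrt(sse_i / (n - 3))
        change = (a + b * x[i]) - (a_i + b_i * x[i])
        out.append(change / (s_i * math.sqrt(h)))
    return out

# Made-up data with one influential point (an assumption)
x = [1, 2, 3, 4, 5, 6, 20]
y = [3, 5, 4, 6, 7, 6, 40]

values = dffits(x, y)
for xi, yi, d in zip(x, y, values):
    print(f"x={xi:>2}  y={yi:>2}  DFFITS={d:+.2f}")
```

Refitting n times is the slow but transparent route; statistical packages use an equivalent closed form based on the leverage and the deleted residual.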

What model to fit?
• Suppose we start with
• Salaries = α + β₁ Experience + ε (linear model)
• And look at residuals vs fitted values
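A small sketch of what the residuals can look like when a straight line is fitted to curved data. The salary figures are generated from a made-up quadratic relationship (an assumption, not the data behind the slides), so the curvature shows up as a systematic pattern in the residuals.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b * xbar, b

# Made-up data: salary rises with experience but flattens out
experience = list(range(0, 11))
salary = [30 + 5 * e - 0.2 * e ** 2 for e in experience]

a, b = fit_line(experience, salary)
fitted = [a + b * e for e in experience]
resid = [s - f for s, f in zip(salary, fitted)]

for f, r in zip(fitted, resid):
    print(f"fitted={f:6.2f}  residual={r:+.2f}")
```

The residuals are negative at both ends and positive in the middle, which is the kind of curved pattern in a residuals vs fitted plot that suggests a missing squared term.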

So what model should we fit?
• Salaries = α + β₁ Experience + ε (linear model)
• We should create a new variable, Experience*Experience, and add it to the model
• Salaries = α + β₁ Experience + β₂ Experience*Experience + ε
• Polynomial (quadratic) model

Oregon Housing
• Description
• 76 single-family homes in Eugene, Oregon during 2005
• Estate agents have their methods of determining price
• Seller wanted a method of determining asking price

Variables
• Price (thousands of $)
• Floor size (thousands of sq ft)
• Age of house
• Number of bedrooms
• Number of bathrooms
• Garage size
• School area
• Lot size (coded 1 to 11), an interesting variable too

Coding of Lot size

Category   Lot size
1          0-3k
2          3-5k
3          5-7k
4          7-10k
5          10-15k
6          15-20k
7          20k-1 acre
8          1-3 acres
9          3-5 acres
10         5-10 acres
11         10-20 acres

0-3k = 0-3,000 sq ft; 1 acre = 43,560 sq ft

Model
• Going to focus on three variables: Price, Size and Age
• Age is coded as (Year built − 1970)/10
• Going to fit two models
• Price = α + β₁ Size + β₂ Age + ε
• Price = α + β₁ Size + β₂ Age + β₃ Age*Age + ε
• First we draw some graphs

First model

Residuals vs Age

Second model

Article
Modeling Home Prices Using Realtor Data
Iain Pardoe, Lundquist College of Business, University of Oregon
Journal of Statistics Education, Volume 16, Number 2 (2008)
www.amstat.org/publications/jse/v16n2/datasets.pardoe.html

Conclusion
• Be sure to run diagnostics
• Examine the plots
• Check funny points
• Try out some changes

Added variable plots
• Added-variable plots enable us to visually assess the effect of each predictor, having adjusted for the effects of the other predictors
• Y and two predictor variables X and Z
• Regress Y on X and calculate the residuals (Set 1)
• Regress Z on X and calculate the residuals (Set 2)
• Plot Set 1 residuals vs Set 2 residuals

And more…
• Residuals from Y and X = part of Y not predicted by X
• Residuals from Z and X = part of Z not predicted by X
• The added-variable plot for predictor variable Z shows the part of Y that is not predicted by X against the part of Z that is not predicted by X
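The two-regression recipe can be sketched as follows. The data are made up and noise-free (Y = 1 + 2X + 3Z exactly, an assumption chosen so the result is easy to check): the slope of the Set 1 residuals on the Set 2 residuals then recovers the coefficient of Z from the full model.

```python
def residuals_on(x, y):
    """Residuals from the least-squares regression of y on x (with intercept)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Made-up, noise-free data (an assumption): Y = 1 + 2X + 3Z
X = [1, 2, 3, 4, 5, 6]
Z = [2, 1, 4, 3, 6, 8]
Y = [1 + 2 * xi + 3 * zi for xi, zi in zip(X, Z)]

set1 = residuals_on(X, Y)  # part of Y not predicted by X
set2 = residuals_on(X, Z)  # part of Z not predicted by X

# Both residual sets have mean zero, so the fitted line passes through the origin
slope = sum(e1 * e2 for e1, e2 in zip(set1, set2)) / sum(e2 ** 2 for e2 in set2)

for e2, e1 in zip(set2, set1):
    print(f"Set 2={e2:+.3f}  Set 1={e1:+.3f}")
print(f"slope of added-variable plot: {slope:.3f}")
```

With real, noisy data the points scatter around that line, and the scatter shows how much Z adds once X is already in the model.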

Another dataset
• Price = the price (in $US) of dinner (including one drink and a tip)
• Food = customer rating of the food (out of 30)
• Décor = customer rating of the decor (out of 30)
• Service = customer rating of the service (out of 30)
• East = 1 (0) if the restaurant is east (west) of Fifth Avenue

Added variable plots
• For the Food variable
• Regress Price on Décor, Service and East, and calculate the residuals
• Regress Food on Décor, Service and East, and calculate the residuals
• Plot the two sets of residuals against each other
