Tyson S. Barrett, PhD
EDUC 7610 Chapter 16 and 17
Diagnostics
(Detecting Irregularities)
and miscellaneous stuff
This is one of the most important chapters
The regression results’ validity depends on whether the assumptions of the model hold or not
Assumptions of the model:
- 1. Linear relationship
- 2. Homoscedasticity of residuals
- 3. Normally-distributed residuals with mean 0
- 4. No omitted variables
- 5. Independence of residuals
- 6. Variance of X > 0
Violations usually occur because of extreme cases:
- High Leverage
- High Distance
- High Influence
Leverage
The atypicalness of a case’s pattern of values on the regressors in the model
A point with high leverage could be:
- A 55-year-old pregnant female
- A high-income individual receiving welfare assistance
In a general population, is being 55 strange by itself? What about being pregnant? Is having a high income strange by itself? What about being on welfare assistance?
We must consider the combination of the variables to know if it has high leverage
Measured with h_i, “case i’s hat value”
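For a single predictor, the hat value has a simple closed form, so the idea can be sketched directly. This is a minimal sketch with made-up data, using the standard simple-regression formula for h_i:

```python
# Hat values for simple linear regression (hypothetical data).
# With one predictor, case i's hat value is
#   h_i = 1/N + (x_i - x_bar)^2 / sum_j (x_j - x_bar)^2
# Each h_i lies between 1/N and 1; cases far from the mean of x
# (atypical regressor patterns) get larger hat values.

def hat_values(x):
    n = len(x)
    x_bar = sum(x) / n
    ss_x = sum((xj - x_bar) ** 2 for xj in x)
    return [1 / n + (xi - x_bar) ** 2 / ss_x for xi in x]

x = [1, 2, 3, 4, 20]   # 20 is far from the rest: high leverage
h = hat_values(x)
print(max(h))           # the extreme case has the largest hat value
```

A handy side fact: the hat values always sum to the number of estimated coefficients (here 2: intercept plus slope).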
Distance
How far case i’s Z_i value deviates from the predicted value, Ẑ_i

Often measured with the residual: f_i = Z_i − Ẑ_i

But an outlier pulls Ẑ_i toward itself, so we can adjust the residual using h_i:

f_i / (1 − h_i)
It turns out (with mathemagic) that h_i equals the proportion by which case i lowers its own residual by pulling the regression surface toward itself
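The adjustment f_i / (1 − h_i) has a nice interpretation: it equals the error you would get by deleting case i, refitting, and then predicting case i from the refit model (the "deleted residual"). A sketch with hypothetical data, using ordinary least squares for one predictor:

```python
# Sketch (hypothetical data): the adjusted residual f_i / (1 - h_i)
# equals the prediction error for case i when case i is excluded from the fit.

def ols(x, y):
    """Return (intercept, slope) for simple OLS."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b1 = (sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
          / sum((xi - xb) ** 2 for xi in x))
    return yb - b1 * xb, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 9.0]   # last case pulls the line toward itself
b0, b1 = ols(x, y)

i = 4                            # the extreme case
xb = sum(x) / len(x)
h_i = 1 / len(x) + (x[i] - xb) ** 2 / sum((xj - xb) ** 2 for xj in x)
f_i = y[i] - (b0 + b1 * x[i])    # ordinary residual
adjusted = f_i / (1 - h_i)       # leverage-corrected (deleted) residual

# Refit without case i and predict it:
b0d, b1d = ols(x[:i], y[:i])
deleted = y[i] - (b0d + b1d * x[i])
print(adjusted, deleted)         # the two quantities agree exactly
```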
Influence
The extent to which its inclusion changes the regression solution or some aspect of it
Which extreme point (A, B, or C) changes the solution the most?
Measured with Cook’s Distance:

D_i = ( Σ_j d_{ji}² ) / ( (k + 1) × MS_residual )

where d_{ji} is the change in the value of case j’s residual when case i is deleted from the model, k is the number of regressors (k + 1 counts the constant), and MS_residual is the error variance from the model
total influence vs. partial influence
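Cook’s Distance can be computed by brute force: delete each case, refit, and compare the fitted values. A sketch with hypothetical data (note that the change in case j’s fitted value equals minus the change in its residual, so the squared terms are the same either way):

```python
# Brute-force Cook's Distance for simple regression (hypothetical data).

def ols(x, y):
    """Return (intercept, slope) for simple OLS."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b1 = (sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
          / sum((xi - xb) ** 2 for xi in x))
    return yb - b1 * xb, b1

def cooks_d(x, y):
    n, k = len(x), 1                      # k = number of regressors
    b0, b1 = ols(x, y)
    fit = [b0 + b1 * xi for xi in x]
    ms_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fit)) / (n - k - 1)
    d = []
    for i in range(n):
        xs, ys = x[:i] + x[i+1:], y[:i] + y[i+1:]
        b0i, b1i = ols(xs, ys)
        fit_i = [b0i + b1i * xj for xj in x]   # fits with case i deleted
        d.append(sum((f - g) ** 2 for f, g in zip(fit, fit_i))
                 / ((k + 1) * ms_res))
    return d

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 9.0]   # last case: high leverage AND high distance
d = cooks_d(x, y)
print(d.index(max(d)))           # the extreme last case is the most influential
```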
Approaching Diagnostics
Diagnostic statistics are estimates of different features of the observations
Anything look weird in this data? Any extreme values?
Eye-balling a data set (especially larger ones) is not very productive
Approaching Diagnostics
Diagnostic statistics are estimates of different features of the observations
Instead, let’s use the diagnostics. Which case has a high distance?

Compare the regular residual with the residual computed after removing the case from the estimate of Ẑ_i (the deleted residual)
Approaching Diagnostics
Diagnostic statistics are estimates of different features of the observations
Which case has high leverage?

The hat value h_i measures leverage and ranges from 1/N to 1
Approaching Diagnostics
Diagnostic statistics are estimates of different features of the observations
Which case has high influence?

Cook’s Distance measures influence
Assumptions
Assumptions of the model:
- 1. Linear relationship
- 2. Homoscedasticity of residuals
- 3. Normally-distributed residuals with mean 0
- 4. No omitted variables
- 5. Independence of residuals
- 6. Variance of X > 0
The book provides four basic assumptions of regression; we make two implicit ones explicit below. Note: there are many “tests” for these assumptions, but the tests usually have assumptions of their own, so we won’t discuss them here
“No omitted variables” is largely theoretically based, but it can show up as weird residuals (maybe we are missing an effect); we’ll see this in the next slide. “Variance of X > 0” is easy to test: are there at least two distinct values in X?
Linear relationship: use a scatterplot of t-residuals against X. In the example, the residuals were not good until we added a quadratic effect (omitted-variable bias)
Homoscedasticity of residuals: use scatterplots of the residuals on X or on Y
Normally-distributed residuals: this one is somewhat tricky but important. The assumption is that the residuals at each value of x are normally distributed. Are more points close to the line than far away?
Normality is usually tested with a Q-Q plot: if the residuals are normal, the points fall along the diagonal.

[Normal Q-Q plot: standardized residuals from lm(y ~ x) against theoretical quantiles; cases 34, 81, and 60 flagged]
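The idea behind a Q-Q plot can be sketched numerically: sort the standardized residuals and pair them with the theoretical normal quantiles they should match. A minimal stdlib sketch with hypothetical residuals:

```python
from statistics import NormalDist, mean, stdev

# Hypothetical residuals from a fitted model:
resid = [-1.2, -0.6, -0.3, -0.1, 0.0, 0.2, 0.4, 0.5, 0.9, 1.4]

# Standardize and sort, then compute theoretical quantiles
# Phi^{-1}((i + 0.5) / n) -- the point pairs a Q-Q plot would draw.
m, s = mean(resid), stdev(resid)
z = sorted((r - m) / s for r in resid)
n = len(z)
theo = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

# If residuals are roughly normal, the pairs fall near the line y = x.
# A quick numeric proxy: the correlation between the two sequences.
zb, tb = mean(z), mean(theo)
r = (sum((a - zb) * (b - tb) for a, b in zip(z, theo))
     / (sum((a - zb) ** 2 for a in z)
        * sum((b - tb) ** 2 for b in theo)) ** 0.5)
print(round(r, 3))   # near 1 suggests approximate normality
```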
Independence of residuals is generally theoretical in nature:
- Are the observations (participants) independent?
- Are the observations connected in some way?
- Time-series data almost always violate this assumption because previous time points are correlated with current ones
- In many cases, we can use Multilevel Modeling here
Dealing with Irregularities

The book provides four basic ways to deal with irregularities; we add a fifth.

Correction
- Correct the error that led to the extreme value

Transformation
- Transform the outcome or predictors using a monotonic transformation (log, square root, etc.)

Elimination
- Remove the extreme value
- If you do this, I recommend reporting the results both with and without the extreme value in any publication

Robustification
- Use an alternative approach that is less sensitive to the extreme value

Generalized Linear Models
- A family of approaches that can assess categorical, ordinal, and otherwise strange outcomes
- Chapter 18 is about these and we’ll discuss them much more then
Robustification

Use an alternative approach that is less sensitive to the extreme value

Two main ways of robustifying:
- 1. Use an alternative way of estimating the coefficients
- 2. Use an alternative way of estimating the standard error (or, more generally, the uncertainty)

We’ll talk about the second
Robustification

Use an alternative approach that is less sensitive to the extreme value

Heteroscedasticity-Consistent Standard Errors
- Adjusts the SEs to be less sensitive to extreme values using sandwich estimators
- They are consistent (they get closer and closer to the right value as sample size increases)
- Many versions exist; HC3 and HC4 are best
- Can be used whether or not homoscedasticity holds

Bootstrapping
- Resamples from the sample with replacement to build an empirical distribution of the estimate
- Can obtain SEs, CIs, and p-values
- Works with any statistic

Permutation
- Randomly shuffles the data and re-runs the model many times
- The proportion of shuffled models that are as big or bigger than the original model gives us the p-value
- Works with any statistic
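Bootstrapping is easy to sketch by hand. The following is a hedged, stdlib-only illustration with hypothetical data (resample cases with replacement, re-estimate the slope, then read the SE and a percentile CI off the empirical distribution):

```python
import random
from statistics import stdev

def slope(x, y):
    """OLS slope for one predictor."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    return (sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
            / sum((xi - xb) ** 2 for xi in x))

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.3, 1.8, 3.4, 3.7, 5.2, 5.8, 7.1, 7.6]

rng = random.Random(7610)             # seeded for reproducibility
boot = []
while len(boot) < 2000:
    idx = [rng.randrange(len(x)) for _ in x]   # resample cases WITH replacement
    if len(set(idx)) > 1:                      # skip degenerate resamples (no X variance)
        boot.append(slope([x[i] for i in idx], [y[i] for i in idx]))

se = stdev(boot)                      # bootstrap standard error of the slope
s = sorted(boot)
lo, hi = s[int(0.025 * len(s))], s[int(0.975 * len(s))]   # rough 95% percentile CI
print(se, lo, hi)
```

A permutation test is the same loop with shuffling instead of resampling: shuffle y, refit, and count how often the shuffled slope is as big as the original.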
Remember:
No model is “correct” but some models are useful
Some Miscellaneous Stuff: Measurement Error, Power, Specification Error, Missing Data, Non-interval Outcomes

Measurement Error
Reliability
The proportion of a variable’s variability that is attributable to variability in the true scores. All measures have less than perfect reliability.
A weakness of regression but not something to worry about excessively
Random Measurement Error

Measurement error in Y:
- Increases the SE
- No bias in the coefficients (R² is biased, though)

Measurement error in X:
- Usually attenuates the coefficients
- Increases the SE

A weakness of regression but not something to worry about excessively
So should we leave a variable out if it has poor reliability?
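The attenuation from random error in X shows up clearly in a quick simulation. This is a sketch with made-up parameters: when the error variance equals the true X variance, reliability is .5 and the slope attenuates to about half its true value:

```python
import random

def slope(x, y):
    """OLS slope for one predictor."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    return (sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
            / sum((xi - xb) ** 2 for xi in x))

rng = random.Random(42)               # seeded for reproducibility
n = 5000
true_b = 2.0
x = [rng.gauss(0, 1) for _ in range(n)]
y = [true_b * xi + rng.gauss(0, 1) for xi in x]

# Add random error to X; error variance == true X variance -> reliability .5
x_noisy = [xi + rng.gauss(0, 1) for xi in x]

b_clean = slope(x, y)                 # roughly the true slope, 2.0
b_noisy = slope(x_noisy, y)           # attenuated toward zero, roughly 1.0
print(b_clean, b_noisy)
```

The same setup with error added to y instead of x would leave the slope unbiased but inflate its SE.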
Power
The probability of obtaining a statistically significant effect if in fact an effect actually exists
Anything that affects the SE affects the power
SE(b_j) = sqrt( MS_residual / ( N × Var(X_j) × Tol_j ) )

where MS_residual is the error variance from the model, Var(X_j) is the variance of regressor j, and Tol_j is its tolerance (1 minus the R² from regressing X_j on the other regressors)
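The SE relationship can be sanity-checked numerically in the one-predictor case (hypothetical data). With a single predictor the tolerance is 1, and using the denominator-N variance, N × Var(X) equals the usual sum of squares Σ(x_i − x̄)², so the formula reduces to the textbook simple-regression SE:

```python
# Numeric check of SE(b_1) = sqrt( MS_residual / (N * Var(X) * Tol) )
# for one predictor (hypothetical data; Tol = 1, Var uses denominator N).

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1]

n = len(x)
xb, yb = sum(x) / n, sum(y) / n
ss_x = sum((xi - xb) ** 2 for xi in x)
b1 = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / ss_x
b0 = yb - b1 * xb
ms_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

var_x = ss_x / n                      # variance with denominator N
tol = 1.0                             # tolerance; 1 with a single predictor
se_b1 = (ms_res / (n * var_x * tol)) ** 0.5
print(se_b1)                          # identical to sqrt(ms_res / ss_x)
```

Anything that shrinks this quantity (lower error variance, larger N, more spread in X, less collinearity) raises power.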
Specification Error
The most difficult aspect of regression may be specifying it correctly
Many issues discussed in this chapter can result from model misspecification (e.g., leaving out a quadratic effect)
Undercontrol vs. Overcontrol
Non-interval Outcomes
Likert scales and similar outcomes are common
How should we handle this?
Treat it as continuous? Use a transformation? Depends on distribution, but likely a GLM is better
(See Chapter 18)
Missing Data
Data are often missing throughout the sample that weren’t planned for
Three main ways to handle missing data:
- 1. Pairwise Deletion
- 2. Listwise Deletion
- 3. Imputation
Multiple Imputation is one of the best; a cousin, Full Information Maximum Likelihood, is also a top choice