

SLIDE 1

ST 430/514 Introduction to Regression Analysis/Statistics for Management and the Social Sciences II

Regression Pitfalls

Pitfall Noun: A hidden or unsuspected danger or difficulty. A covered pit used as a trap. Multiple regression is a widely used and powerful tool. It is also one of the most abused statistical techniques.

1 / 17 Some Regression Pitfalls Introduction

SLIDE 2

Observational versus Experimental Data

Recall: In some investigations, the independent variables x1, x2, . . . , xk can be controlled; that is, held at desired values. The resulting data are called experimental. In other cases, the independent variables cannot be controlled, and their values are simply observed. The resulting data are called observational.

SLIDE 3

Observational example “Cocaine Use During Pregnancy Linked To Development Problems” Two groups of new mothers, 218 used cocaine during pregnancy, 197 did not. IQ tests of infants at age 2 showed lower scores for children of users. “Correlation does not imply causation.”

SLIDE 4

The study does not show that cocaine use causes development problems. It does show association, which might be used in prediction. For instance, it could help identify children at high risk of having development problems.

SLIDE 5

Experimental example Animal-assisted therapy. 76 heart patients randomly assigned to three therapies: T: visit from a volunteer and a trained dog; V: visit from a volunteer only; C: no visit. Response y is decrease in anxiety.

SLIDE 6

Result: ȳT = 10.5, ȳV = 3.9, ȳC = 1.4. Model: E(Y) = β0 + β1x1 + β2x2, where x1 is the indicator variable for group T and x2 is the indicator variable for group V. The model-utility F-test shows significant differences among groups. Because of random assignment, the differences can be assumed to be caused by the treatments.
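The indicator-coded fit above can be reproduced with ordinary least squares. A minimal numpy sketch, using invented group data chosen so the group means match the slide (these are not the study's raw measurements):

```python
import numpy as np

# Invented illustration data: decrease in anxiety for three groups.
y_T = np.array([9.5, 10.5, 11.5])   # volunteer + dog
y_V = np.array([3.4, 3.9, 4.4])     # volunteer only
y_C = np.array([0.9, 1.4, 1.9])     # no visit
y = np.concatenate([y_T, y_V, y_C])

n = 3
x1 = np.repeat([1.0, 0.0, 0.0], n)  # indicator of group T
x2 = np.repeat([0.0, 1.0, 0.0], n)  # indicator of group V
X = np.column_stack([np.ones(3 * n), x1, x2])

# Least squares fit of E(Y) = b0 + b1*x1 + b2*x2.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[0] is the mean of group C; beta[1] and beta[2] are the
# T - C and V - C differences in group means.
```

With this coding, the fitted coefficients are exactly the group-mean contrasts quoted on the slide: 1.4, 9.1, and 2.5.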

SLIDE 7

Parameter Estimability

Recall: The normal equations X′X β̂ = X′y that define the least squares parameter estimates always have a solution. But if X′X is singular, they have many solutions. An individual parameter that is not uniquely estimated is called nonestimable.

SLIDE 8

Example: The animal-assisted therapy data. Suppose we tried to fit the model E(Y) = β0 + β1x1 + β2x2 + β3x3, where x3 is the third indicator variable, for group C. One solution is β̂0 = 0, β̂1 = ȳT = 10.5, β̂2 = ȳV = 3.9, β̂3 = ȳC = 1.4. The more usual solution is β̂0 = ȳC = 1.4, β̂1 = ȳT − ȳC = 9.1, β̂2 = ȳV − ȳC = 2.5, β̂3 = 0.

SLIDE 9

The estimates change from one solution to the next, so no individual parameter is estimable. The conventional fix is to leave out one indicator variable, or equivalently to constrain one parameter to be zero. Another possibility is to constrain β1 + β2 + β3 = 0, which is appealing in its symmetry but rarely used in practice. In more complex cases, estimability can be harder to diagnose.
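A small numerical sketch (numpy, with the slide's group means used as invented data) shows the problem directly: with all three indicators plus an intercept, X′X is singular, and different coefficient vectors give identical fitted values.

```python
import numpy as np

n = 3
# Use the slide's group means as invented responses (illustrative only).
y = np.array([10.5] * n + [3.9] * n + [1.4] * n)
g = np.repeat([0, 1, 2], n)
X = np.column_stack([np.ones(3 * n),
                     (g == 0).astype(float),   # x1: group T
                     (g == 1).astype(float),   # x2: group V
                     (g == 2).astype(float)])  # x3: group C

# X'X is singular: the three indicator columns sum to the intercept column,
# so the 4x4 matrix has rank 3.
XtX = X.T @ X

# Two different coefficient vectors from the slide, same fitted values:
b1 = np.array([0.0, 10.5, 3.9, 1.4])   # the beta0 = 0 solution
b2 = np.array([1.4, 9.1, 2.5, 0.0])    # the usual beta3 = 0 solution
```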

SLIDE 10

Multicollinearity

Two independent variables are orthogonal if their sample correlation coefficient is zero. If all pairs of independent variables are orthogonal, X′X is diagonal, and the normal equations are trivial to solve. In a controlled experiment, the variables are often orthogonal by design. If some pairs are far from orthogonal, the equations may be nearly singular.
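The orthogonal case can be sketched with a hypothetical 2×2 balanced design (numpy; the response values are invented). With centered, orthogonal columns, X′X is diagonal and each coefficient can be read off independently:

```python
import numpy as np

# A hypothetical 2x2 balanced design: centered, orthogonal predictors.
x1 = np.array([-1.0, -1.0, 1.0, 1.0])
x2 = np.array([-1.0, 1.0, -1.0, 1.0])
X = np.column_stack([np.ones(4), x1, x2])

XtX = X.T @ X
# All off-diagonal entries are zero, so the normal equations decouple:
# beta_j = (x_j'y) / (x_j'x_j) for each column separately.
y = np.array([2.0, 3.0, 5.0, 6.0])   # invented responses
beta = (X.T @ y) / np.diag(XtX)
```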

SLIDE 11

If X′X is nearly singular, its inverse (X′X)⁻¹ exists but has large entries. So the least squares estimates β̂ = (X′X)⁻¹X′y are very sensitive to small changes in y. That makes their standard errors large.
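A sketch of the effect with synthetic data (numpy): two nearly collinear predictors make (X′X)⁻¹ enormous and the system badly conditioned.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.001 * rng.normal(size=n)   # nearly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])

XtX = X.T @ X
inv = np.linalg.inv(XtX)
# Near-singularity shows up as huge entries in (X'X)^{-1} and a huge
# condition number; since Var(beta_hat) is proportional to the diagonal
# of (X'X)^{-1}, the standard errors of b1 and b2 blow up together.
```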

SLIDE 12

Example Carbon monoxide from cigarettes

cigar <- read.table("Text/Exercises&Examples/FTCCIGAR.txt", header = TRUE)
pairs(cigar)
cor(cigar)
summary(lm(CO ~ TAR, cigar))
summary(lm(CO ~ TAR + NICOTINE + WEIGHT, cigar))

The standard error of β̂TAR increases nearly five-fold when NICOTINE is added to the model. Note the negative coefficients for NICOTINE and WEIGHT, even though both are positively correlated with CO.

SLIDE 13

Multicollinearity is sometimes measured using the Variance Inflation Factor (VIF). For variable xi, the VIF is

VIFi = 1 / (1 − Ri²) ≥ 1,

where Ri² is the coefficient of determination in the regression of xi on the other independent variables {xj, j ≠ i}. VIFi is the factor by which the variance of β̂i is inflated, relative to the orthogonal case, when the other variables are included. VIFi = 1 if xi is orthogonal to the other independent variables.
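The definition translates directly into code. A sketch (numpy; synthetic predictors, and `vif` is a hypothetical helper name, not a library function):

```python
import numpy as np

def vif(X, i):
    # VIF of column i of predictor matrix X (columns = predictors, no
    # intercept column), from the R^2 of regressing x_i on the rest.
    xi = X[:, i]
    others = np.delete(X, i, axis=1)
    A = np.column_stack([np.ones(len(xi)), others])
    resid = xi - A @ np.linalg.lstsq(A, xi, rcond=None)[0]
    r2 = 1.0 - resid.var() / xi.var()
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + 0.5 * rng.normal(size=100)   # strongly correlated with x1
x3 = rng.normal(size=100)              # roughly independent of both
X = np.column_stack([x1, x2, x3])
```

On these synthetic data, the correlated columns x1 and x2 get a large VIF, while the nearly orthogonal x3 stays close to 1.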

SLIDE 14

Extrapolation

A regression model is an approximation to the complexities of the real world. If it fits the sample data well, it will usually give reliable predictions for a new context that is similar to those in the sample data. With several variables, deciding when a new context is too different for reliable prediction can be difficult, especially in the presence of multicollinearity.
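The multicollinearity case can be made concrete with a hypothetical numpy illustration: when predictors are strongly correlated, a new point can lie inside each variable's observed range yet still be far from the data cloud, so prediction there is hidden extrapolation.

```python
import numpy as np

# Hypothetical illustration data with two strongly tied predictors.
rng = np.random.default_rng(2)
x1 = rng.uniform(0.0, 10.0, size=200)
x2 = x1 + rng.normal(scale=0.5, size=200)   # x2 tracks x1 closely

new = np.array([9.0, 1.0])   # new context: x1 = 9, x2 = 1
in_marginal_ranges = (x1.min() <= new[0] <= x1.max()
                      and x2.min() <= new[1] <= x2.max())
# Every observed pair has x2 close to x1; the new point does not,
# so no sample point resembles it despite the in-range marginals.
largest_observed_gap = np.abs(x1 - x2).max()
new_gap = abs(new[0] - new[1])
```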

SLIDE 15

Transformation

In many problems, one or more of the variables (dependent and independent) may be measured and recorded in a form that is not the best from a modeling perspective. Linear transformations are usually pointless, as a linear model is essentially unchanged by them. Among nonlinear transformations, logarithms are the most widely useful, followed by powers of the variables.

SLIDE 16

The primary goal of transformation is to find a good approximation to the way E(Y) depends on x. Another goal is to make the variance of the random error ε = Y − E(Y) reasonably constant. Finally, if a transformation makes ε approximately normally distributed, that too is worth achieving.

SLIDE 17

Example 7.8 Impact of price of coffee on demand:

coffee <- read.table("Text/Exercises&Examples/COFFEE.txt", header = TRUE)
with(coffee, plot(PRICE, DEMAND))

Example 7.8 models Y (DEMAND) against p⁻¹, where p = PRICE. We could also consider log(Y) and log(p), as well as other powers of p.
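A sketch of fitting the p⁻¹ model (numpy; the demand figures are invented and follow an exact 1/price relationship so the fit recovers the coefficients, which is not the COFFEE data):

```python
import numpy as np

# Invented data: demand falls with price, exactly as b0 + b1/p here,
# so least squares recovers b0 = 200 and b1 = 900; real data would
# only do so approximately.
price = np.array([3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0])
demand = 200.0 + 900.0 / price

# Fit E(Y) = b0 + b1 * (1/p): transform the predictor, then fit linearly.
X = np.column_stack([np.ones_like(price), 1.0 / price])
beta, *_ = np.linalg.lstsq(X, demand, rcond=None)
```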
