M7S3 - Regression Thoughts
Professor Jarad Niemi
STAT 226 - Iowa State University
November 27, 2018
Professor Jarad Niemi (STAT226@ISU) M7S3 - Regression Thoughts November 27, 2018 1 / 21
Simple linear regression: Review
[Figure: plot of y against x]
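As a reminder of what is being fit, here is a minimal Python sketch of the least-squares line, assuming the usual simple linear regression estimates b1 = S_xy / S_xx and b0 = ȳ − b1·x̄. The data and the helper name `fit_line` are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
# Hypothetical example data (not the data plotted on the slide).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx            # slope
    b0 = y_bar - b1 * x_bar     # intercept
    return b0, b1


b0, b1 = fit_line(x, y)
print(f"predicted y = {b0:.2f} + {b1:.2f} x")  # predicted y = 2.20 + 0.60 x
```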
Properties: Coefficient of determination
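The coefficient of determination can be computed as R² = 1 − SSE/SST, the proportion of the variability in y explained by the fitted line; in simple linear regression it equals the square of the correlation r. This minimal Python sketch, on hypothetical data, computes both quantities and checks that they agree.

```python
import math

# Hypothetical example data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sums of squares and cross-products about the means.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Least-squares fit.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# R^2 from sums of squares.
fitted = [b0 + b1 * xi for xi in x]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error (residual) sum of squares
sst = s_yy                                              # total sum of squares
r_squared = 1 - sse / sst

# Sample correlation between x and y.
r = s_xy / math.sqrt(s_xx * s_yy)

print(round(r_squared, 4), round(r ** 2, 4))  # 0.6 0.6
```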
Properties: Not reversible
regress(y,x)
(Intercept) x
[1] -0.7355215
[1] -0.3862206
regress(x,y)
(Intercept) x
0.4915144
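The fit is not reversible because regressing y on x minimizes vertical distances to the line, while regressing x on y minimizes horizontal distances. The sketch below, on hypothetical data with a hand-written `fit_line` helper (not the `regress` function used on the slide), fits both directions and shows that rearranging the x-on-y line does not recover the y-on-x slope. The product of the two slopes equals r², so the two lines coincide only when |r| = 1.

```python
# Hypothetical example data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def fit_line(u, v):
    """Least-squares intercept and slope for regressing v on u."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    slope = s_uv / s_uu
    return v_bar - slope * u_bar, slope


_, slope_y_on_x = fit_line(x, y)   # y = b0 + b1 * x
_, slope_x_on_y = fit_line(y, x)   # x = c0 + c1 * y

# Solving x = c0 + c1 * y for y gives a line with slope 1 / c1.
implied_slope = 1 / slope_x_on_y

print(slope_y_on_x, implied_slope)     # 0.6 vs 1.0: two different lines
print(slope_y_on_x * slope_x_on_y)     # equals r^2 (0.6 for this data)
```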
Properties: Always through (x̄, ȳ)
[Figure: plot of y against x]
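Because the intercept is b0 = ȳ − b1·x̄, evaluating the fitted line at x̄ gives b0 + b1·x̄ = ȳ exactly, whatever the data. A short Python check on hypothetical data:

```python
import math

# Hypothetical example data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares slope and intercept.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# The fitted line evaluated at x_bar returns y_bar (up to rounding error).
prediction_at_mean = b0 + b1 * x_bar
print(math.isclose(prediction_at_mean, y_bar))  # True
```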
Properties: Residuals sum to zero
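The residuals e_i = y_i − ŷ_i of a least-squares line with an intercept always sum to zero, because Σ(y_i − b0 − b1·x_i) = n(ȳ − b0 − b1·x̄) = 0 once b0 = ȳ − b1·x̄. A minimal check on hypothetical data:

```python
# Hypothetical example data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares fit.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Residual = observed minus predicted.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print([round(e, 2) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(abs(sum(residuals)) < 1e-9)        # True: zero up to rounding error
```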
Properties: Residual plots
[Figure: residuals against x]
[Figure: residuals against predicted values]
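A residual plot puts the residuals on the vertical axis and either x or the predicted values on the horizontal axis. For a least-squares line the residuals have zero sample correlation with both, so any pattern that remains (curvature, a funnel shape) points to something the straight line missed rather than a leftover linear trend. Because ŷ is a linear function of x, the two plots show the same pattern, mirrored left to right when the slope is negative. This sketch, on hypothetical data, computes the pairs that would be plotted and checks the zero-correlation property; the plotting itself is left out.

```python
# Hypothetical example data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares fit.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]

# Points for the two residual plots.
residual_vs_x = list(zip(x, residuals))
residual_vs_predicted = list(zip(predicted, residuals))

# Least squares forces sum(e * x) = 0 and sum(e * yhat) = 0,
# so neither plot can show a leftover linear trend.
print(abs(sum(e * xi for e, xi in zip(residuals, x))) < 1e-9)          # True
print(abs(sum(e * yh for e, yh in zip(residuals, predicted))) < 1e-9)  # True
```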
Properties: Leverage and influence
[Figure: plots of y against x in panels labeled "high" and "low"]
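In simple linear regression the leverage of observation i is h_i = 1/n + (x_i − x̄)² / Σ_j (x_j − x̄)², so points whose x is far from x̄ have high leverage. A high-leverage point is influential when removing it changes the fitted line substantially. The Python sketch below uses hypothetical data with one extra point at x = 10: placed off the trend, it drags the slope from positive to negative (high leverage, high influence); placed exactly on the line fitted to the other points, it changes nothing (high leverage, low influence). The helper names `fit_line` and `leverages` are our own.

```python
# Hypothetical example data: five points near a trend, plus one far-out x value added below.
base_x = [1, 2, 3, 4, 5]
base_y = [2, 4, 5, 4, 5]


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def leverages(x):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / S_xx for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]


x = base_x + [10]
h = leverages(x)
print([round(hi, 3) for hi in h])  # the last point has by far the largest leverage

b0_base, b1_base = fit_line(base_x, base_y)  # fit without the extra point

# Off-trend y at x = 10: high leverage AND high influence.
_, b1_off = fit_line(x, base_y + [1])
# y exactly on the base line at x = 10: high leverage, low influence.
_, b1_on = fit_line(x, base_y + [b0_base + b1_base * 10])

print(round(b1_base, 3), round(b1_off, 3), round(b1_on, 3))
```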
Cautions: Correlation does not imply causation
From https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5402407/:

My attention was drawn to the recent article by Song et al. entitled “How jet lag impairs Major League Baseball performance” (1), not only by its slightly unusual subject but more importantly because I wondered how one could ever actually prove the effect of jet lag on baseball performance. ...Although I do not dispute the large amount of work involved and would be well-nigh incapable of judging the validity of the analyses performed, I must admit that I was taken aback by the way Song et al. (1) systematically present the correlations they identify as direct proof of causality between jet lag and the affected variables. It is actually quite remarkable to me that the word “correlation” does not appear even once in the paper, when this is actually what the authors have been looking at and, in my opinion, to be scientifically accurate, the title of the article should really read: “How jet lag correlates with impairments in Major League Baseball performance.” ...this tendency to amalgamate correlation with causality is apparently extremely frequent in this field of investigation... by the press and to attract the attention of many people, both scientists and nonscientists. Considering the current tendency to misinterpret scientific data, via the misuse of statistics in particular, I feel that a journal such as PNAS should aim to educate by example, and thus ought to enforce more rigor in the presentation of scientific articles regarding the difference between correlations and proven causality.
Cautions: Lurking variables
[Figure: ideal height against own height, with a single linear fit]
[Figure: ideal height against own height, with separate linear fits for females and males]
Linear Fit Females: predicted ideal height = 35.798818 + 0.5469203 × own height
Linear Fit Males: predicted ideal height = 34.971329 + 0.4484906 × own height
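To see what the lurking variable (sex) does here, the two fitted equations above can be evaluated at the same own height. This minimal Python sketch copies the coefficients from the slide and uses 66 as an arbitrary illustrative own height; the function name is hypothetical.

```python
# Fitted equations copied from the slide (females and males fit separately).
FEMALE = (35.798818, 0.5469203)  # (intercept, slope)
MALE = (34.971329, 0.4484906)


def predicted_ideal_height(own_height, coefs):
    """Evaluate intercept + slope * own_height for one group's fitted line."""
    intercept, slope = coefs
    return intercept + slope * own_height


own = 66  # arbitrary illustrative own height
female_pred = predicted_ideal_height(own, FEMALE)
male_pred = predicted_ideal_height(own, MALE)

# Same own height, predictions about 7.3 apart: a single line fit to
# everyone cannot capture this gap.
print(round(female_pred, 2), round(male_pred, 2), round(female_pred - male_pred, 2))
# 71.9 64.57 7.32
```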
Cautions: Correlations based on average data
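Correlations computed from group averages are typically stronger than the correlation among the individuals, because averaging removes the within-group scatter. The sketch below uses hypothetical data, three groups of three individuals, where the individual-level correlation is 0.85 but the correlation of the three group means is 1.

```python
import math

# Hypothetical individual-level data in three groups of three: (x values, y values).
groups = {
    "A": ([1, 2, 3], [3, 1, 2]),
    "B": ([4, 5, 6], [6, 4, 5]),
    "C": ([7, 8, 9], [9, 7, 8]),
}


def correlation(x, y):
    """Sample (Pearson) correlation coefficient."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Correlation among all nine individuals.
all_x = [xi for gx, _ in groups.values() for xi in gx]
all_y = [yi for _, gy in groups.values() for yi in gy]
r_individual = correlation(all_x, all_y)

# Correlation among the three group averages.
mean_x = [sum(gx) / len(gx) for gx, _ in groups.values()]
mean_y = [sum(gy) / len(gy) for _, gy in groups.values()]
r_average = correlation(mean_x, mean_y)

print(round(r_individual, 2), round(r_average, 2))  # 0.85 1.0
```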
Cautions: Extrapolation
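Extrapolation means using the fitted line at x values outside the range of the data, where nothing supports the assumption that the linear relationship still holds. For example, the female equation from the lurking-variables slide predicts an ideal height of about 35.8 at an own height of 0, which is meaningless. The sketch below flags such predictions; the observed range (60 to 80) is a hypothetical placeholder, not taken from the data.

```python
# Intercept and slope of the female fit from the lurking-variables slide.
INTERCEPT, SLOPE = 35.798818, 0.5469203

# Hypothetical observed range of own heights (placeholder values).
X_MIN, X_MAX = 60, 80


def predict(own_height):
    """Return (prediction, is_extrapolation) for the fitted line."""
    prediction = INTERCEPT + SLOPE * own_height
    is_extrapolation = not (X_MIN <= own_height <= X_MAX)
    return prediction, is_extrapolation


print(predict(66))  # inside the assumed range
print(predict(0))   # (35.798818, True): far outside the range
```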