 
Unit 6: Simple Linear Regression
Lecture 1: Introduction to SLR
Statistics 101
Nicole Dalzell
June 11, 2015
Outline
- Review: Murder Example
- Conditions for regression
- Types of outliers in linear regression
- Inference for linear regression
- Understanding regression output from software
- Hypothesis test (HT) for the slope
- Confidence interval (CI) for the slope
Review: Murder Example
CSI lives...

Study: maths formula predict how fast urban murder rates climb

“A team of mathematicians says it has come up with a formula for predicting the number of homicides in any given city using a set of urban metrics. According to the study, published in the journal PLOS ONE, all it takes is ten of these metrics measured against fluctuations in population size, to be able to predict the future of urban crime. ‘We show that well-defined average scaling laws with the population size emerge when investigating the relations between population and number of homicides as well as population and urban metrics,’ write the authors. Scaling laws dictate that when population size increases, so do other factors in a neat linear correlation.”

http://www.wired.co.uk/news/archive/2013-08/13/predicting-murders-brazil

Statistics 101 (Nicole Dalzell) U6 - L1: Introduction to SLR June 11, 2015 2 / 40
What are these magic metrics??
- child labour statistics
- female versus male population size
- gross domestic product
- GDP per capita
- literacy in those over 15
- average family income
- the number of sanitation facilities
- unemployment levels in over 16s
- population statistics
- the number of homicides
Guessing the correlation

Clicker question: Which of the following is the best guess for the correlation between annual murders per million and percentage living in poverty?

[Figure: scatterplot of annual murders per million (y-axis, 5 to 40) against % in poverty (x-axis, 14 to 26)]

(a) -1.52
(b) -0.63
(c) -0.12
(d) 0.84
(e) 0.02
Guessing the correlation

Clicker question: Which of the following is the best guess for the correlation between annual murders per million and population size?

[Figure: scatterplot of annual murders per million (y-axis, 5 to 40) against population (x-axis, 2e+06 to 8e+06)]

(a) -0.97
(b) -0.61
(c) -0.06
(d) 0.55
(e) 0.97
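As a companion to the guessing exercises, here is a minimal sketch of how a sample correlation coefficient can be computed by hand. The function name `pearson_r` and the data values are illustrative assumptions, not the data behind the scatterplots. Because the coefficient is a covariance divided by the product of two spreads, it always falls between -1 and 1, which is a useful sanity check on any guess.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up illustrative values, NOT the data plotted on the slides.
poverty = [14.0, 16.5, 18.0, 20.5, 22.0, 25.0]
murders = [6.0, 12.0, 11.0, 24.0, 27.0, 38.0]

r = pearson_r(poverty, murders)
print(round(r, 2))
```

The sign of `r` gives the direction of the linear association and its magnitude gives the strength. It says nothing about whether one variable causes the other.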
Spurious correlations

Remember: correlation does not always imply causation!

http://www.tylervigen.com/
Murder Rates and Poverty Rates

We want to explore the relationship between annual murders in a state and the poverty rate in the area.

What is our response variable? The annual murder count.
What is our explanatory variable? The poverty rate in the area.
Conditions for regression

- Linearity: residuals should be randomly scattered around 0 in the residuals plot. Important whether or not we do inference.
  [Figures: scatterplot of annual murders per million against % in poverty; "Murder Residual Plot" of residuals against Percent Poverty]
- Nearly normally distributed residuals: check with a histogram or normal probability plot of the residuals. Important for inference.
  [Figure: "Histogram of Murder Residuals", frequency against residuals]
- Constant variability of residuals (homoscedasticity): no fan shape in the residuals plot. Important for inference.
  [Figure: "Murder Residual Plot" of residuals against Percent Poverty]
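The conditions above are normally checked by eye from plots. As a rough numeric companion, the sketch below fits a line, computes the residuals, and summarizes them. The helper `fit_line` and the data are illustrative assumptions, and comparing the residual spread in the two halves of the data is only a crude stand-in for looking for a fan shape, not a formal test.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Least squares intercept b0 and slope b1 for regressing y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b1 * x_bar, b1

# Made-up illustrative values, NOT the slide data (sorted by poverty).
poverty = [14.0, 15.5, 17.0, 18.5, 20.0, 21.5, 23.0, 24.5]
murders = [5.0, 11.0, 12.0, 19.0, 20.0, 27.0, 28.0, 35.0]

b0, b1 = fit_line(poverty, murders)
residuals = [y - (b0 + b1 * x) for x, y in zip(poverty, murders)]

# Linearity: residuals should scatter around 0 with no visible pattern.
# With an intercept in the model, their mean is 0 up to rounding error.
print(abs(mean(residuals)) < 1e-9)

# Constant variability: compare residual spread for low vs. high poverty.
# Very different spreads would hint at a fan shape.
half = len(residuals) // 2
spread_low = pstdev(residuals[:half])
spread_high = pstdev(residuals[half:])
print(round(spread_low, 2), round(spread_high, 2))
```

Normality of the residuals is not checked here. A histogram or normal probability plot of `residuals` remains the usual tool for that condition.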
Linear Regression: Least Squares Line

Residuals are the leftovers from the model fit, calculated as the difference between the observed and predicted y:

$e_i = y_i - \hat{y}_i$

The least squares line is the line that minimizes the sum of squared residuals, $\sum_i e_i^2$.

Population data: $\hat{y} = \beta_0 + \beta_1 x$
Sample data: $\hat{y} = b_0 + b_1 x$

[Figure: scatterplot of annual murders per million against % in poverty]
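To make the least squares idea concrete, the sketch below computes $b_0$ and $b_1$ from the standard closed-form expressions (slope equals the sum of cross-deviations divided by the sum of squared deviations of x, and the line passes through the point of means). It then confirms that nudging either coefficient increases the sum of squared residuals. The function names and the data are illustrative assumptions, not the slide data.

```python
def sse(b0, b1, xs, ys):
    """Sum of squared residuals e_i = y_i - (b0 + b1 * x_i)."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Closed-form least squares estimates (b0, b1) for y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1

# Made-up illustrative values, NOT the slide data.
xs = [14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0]
ys = [7.0, 10.0, 18.0, 20.0, 29.0, 30.0, 38.0]

b0, b1 = least_squares(xs, ys)
best = sse(b0, b1, xs, ys)
print(round(b0, 2), round(b1, 2), round(best, 2))

# Any other line, here a few small perturbations, has a larger SSE.
for db0, db1 in [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert sse(b0 + db0, b1 + db1, xs, ys) > best
```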
Clicker question: What is the interpretation of the slope?

$\widehat{\text{murders}} = -29.91 + 2.56 \times \text{poverty}$

(a) Each additional percentage in those living in poverty increases number of annual murders per million by 2.56.
(b) For each percentage increase in those living in poverty, the number of annual murders per million is expected to be higher by 2.56 on average.
(c) For each percentage increase in those living in poverty, the number of annual murders per million is expected to be lower by 29.91 on average.
(d) For each percentage increase in annual murders per million, the percentage of those living in poverty is expected to be higher by 2.56 on average.
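One way to see what the slope does is to plug two poverty values one percentage point apart into the fitted line and compare the predictions. The sketch below uses the coefficients from the equation above. The function name `predicted_murders` and the poverty values 20 and 21 are illustrative choices within the plotted range.

```python
def predicted_murders(poverty_pct, b0=-29.91, b1=2.56):
    """Predicted annual murders per million for a given % in poverty."""
    return b0 + b1 * poverty_pct

at_20 = predicted_murders(20.0)
at_21 = predicted_murders(21.0)

print(round(at_20, 2))          # 21.29
print(round(at_21 - at_20, 2))  # 2.56, the slope
```

The difference between the two predictions equals the slope no matter which starting value is used. The predictions describe an average association in the data and do not show that poverty causes the change.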