Statistics 101, Unit 6 - Lecture 1: Introduction to Simple Linear Regression (SLR)
Nicole Dalzell
June 11, 2015
Outline:
1. Review: Murder Example; Conditions for regression
2. Types of outliers in linear regression
3. Inference for linear regression: understanding regression output from software; HT for the slope; CI for the slope
Review: Murder Example
CSI lives...
Study: maths formula predict how fast urban murder rates climb “A team of mathematicians says it has come up with a formula for predicting the number of homicides in any given city using a set of urban metrics. According to the study, published in the journal PLOS ONE, all it takes is ten of these metrics measured against fluctuations in population size, to be able to predict the future of urban crime. ”We show that well-defined average scaling laws with the population size emerge when investigating the relations between population and number of homicides as well as population and urban metrics,” write the authors. Scaling laws dictate that when population size increases, so do other factors in a neat linear correlation.”
http://www.wired.co.uk/news/archive/2013-08/13/predicting-murders-brazil

Statistics 101 (Nicole Dalzell), U6 - L1: Introduction to SLR, June 11, 2015
What are these magic metrics??
- child labour statistics
- female versus male population size
- gross domestic product (GDP) per capita
- literacy in those over 15
- average family income
- the number of sanitation facilities
- unemployment levels in over-16s
- population statistics
- the number of homicides
Guessing the correlation

Clicker question: Which of the following is the best guess for the correlation between annual murders per million and percentage living in poverty?
(a) -1.52  (b) -0.63  (c) -0.12  (d) 0.84  (e) 0.02

[Scatterplot: % in poverty (x) vs. annual murders per million (y)]
Answer: (d) 0.84. The relationship in the scatterplot is positive, and later in the lecture R² ≈ 0.705, so R ≈ 0.84.
Guessing the correlation

Clicker question: Which of the following is the best guess for the correlation between annual murders per million and population size?
(a) -0.97  (b) -0.61  (c) -0.06  (d) 0.55  (e) 0.97

[Scatterplot: population (x) vs. annual murders per million (y)]
Spurious correlations
Remember: correlation does not always imply causation! http://www.tylervigen.com/
Murder Rates and Poverty Rates

We want to explore the relationship between annual murders in a state and the poverty rate in the area.
What is our response variable? The annual murder count.
What is our explanatory variable? The poverty rate in the area.
Conditions for regression

Linearity → randomly scattered residuals around 0 in the residual plot (important regardless of whether we are doing inference).

[Scatterplot: % in poverty vs. annual murders per million, with the corresponding "Murder Residual Plot": residuals vs. percent poverty]
Nearly normally distributed residuals → check a histogram or normal probability plot of the residuals (important for inference).

[Histogram of murder residuals]
Constant variability of residuals (homoscedasticity) → no fan shape in the residual plot (important for inference).

["Murder Residual Plot": residuals vs. percent poverty]
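These checks are usually done visually, but the linearity condition has a numeric counterpart: least squares forces the residuals to average zero and to carry no linear trend in the predictor. A minimal Python sketch (the lecture itself uses R; the data here are simulated stand-ins, not the course dataset):

```python
import numpy as np

# Simulated stand-in for the murder data (NOT the course's actual values).
rng = np.random.default_rng(0)
poverty = rng.uniform(14, 26, size=20)                     # % in poverty
murders = -29.91 + 2.56 * poverty + rng.normal(0, 5, size=20)

# Fit the least squares line; np.polyfit returns (slope, intercept).
b1, b0 = np.polyfit(poverty, murders, 1)
residuals = murders - (b0 + b1 * poverty)

# Linearity check, numerically: residuals center on 0 and are
# uncorrelated with the predictor (both hold by construction of
# least squares, up to floating point error).
print(abs(residuals.mean()) < 1e-8)                        # True
print(abs(np.corrcoef(poverty, residuals)[0, 1]) < 1e-8)   # True
# Normality and constant variability are still best judged from a
# histogram / normal probability plot and the residual plot itself.
```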
Linear Regression: Least Squares Line

Population data: ŷ = β0 + β1x
Sample data: ŷ = b0 + b1x

[Scatterplot: % in poverty vs. annual murders per million, with the least squares line]
Linear Regression: Least Squares Line

Residuals are the leftovers from the model fit, calculated as the difference between the observed and predicted y:

ei = yi − ŷi

The least squares line minimizes the sum of the squared residuals.
Population data: ŷ = β0 + β1x
Sample data: ŷ = b0 + b1x

[Scatterplot: % in poverty vs. annual murders per million, with the least squares line]
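The definitions above can be turned into a few lines of code. A hedged Python sketch (toy numbers, not the course data) that computes b1 and b0 from the usual closed-form least squares formulas and confirms two facts stated here: the residuals ei = yi − ŷi sum to zero, and the least squares slope beats any perturbed slope on squared residuals:

```python
import numpy as np

# Illustrative values (not the actual course dataset).
x = np.array([14.0, 16.5, 18.0, 20.2, 22.1, 24.0, 26.3])
y = np.array([5.0, 9.0, 12.0, 18.0, 24.0, 30.0, 37.0])

# Closed-form least squares estimates: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar.
x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

# Residuals: e_i = y_i - yhat_i; least squares makes them sum to zero.
y_hat = b0 + b1 * x
e = y - y_hat
print(abs(e.sum()) < 1e-9)  # True

# Perturbing the slope away from b1 can only increase the sum of
# squared residuals, since (b0, b1) is the unique minimizer.
worse = np.sum((y - (b0 + (b1 + 0.1) * x)) ** 2)
print(np.sum(e ** 2) < worse)  # True
```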
Clicker question: What is the interpretation of the slope?

murders-hat = −29.91 + 2.56 × poverty

(a) Each additional percentage in those living in poverty increases the number of annual murders per million by 2.56.
(b) For each percentage increase in those living in poverty, the number of annual murders per million is expected to be higher by 2.56, on average.
(c) For each percentage increase in those living in poverty, the number of annual murders per million is expected to be lower by 29.91, on average.
(d) For each percentage increase in annual murders per million, the percentage of those living in poverty is expected to be higher by 2.56, on average.
Answer: (b).
Clicker question: Suppose you want to predict the annual murder count (per million) for a series of districts that were not included in the dataset. For which of the following districts would you be uncomfortable with your prediction? A district where % in poverty =
(a) 5%  (b) 15%  (c) 20%  (d) 26%  (e) 40%

[Scatterplot: % in poverty vs. annual murders per million]
Answer: predicting outside the observed range of the data (roughly 14% to 26% in poverty) is extrapolation, so predictions at values like 5% or 40% would be unreliable.
R² assesses model fit (the higher, the better)

R²: the percentage of variability in y explained by the model.

For single-predictor regression, R² is the square of the correlation coefficient, R:

cor(murder$annual_murders_per_mil, murder$perc_pov)^2
[1] 0.7052275

For all regressions, R² = SSreg / SStot:

m1 = lm(annual_murders_per_mil ~ perc_pov, data = murder)
anova(m1)
Analysis of Variance Table
Response: annual_murders_per_mil
          Df  Sum Sq Mean Sq F value    Pr(>F)
perc_pov   1 1308.34 1308.34  43.064 3.638e-06 ***
Residuals 18  546.86   30.38

R² = explained variability / total variability = SSreg / SStot = 1308.34 / (1308.34 + 546.86) = 1308.34 / 1855.2 ≈ 0.71
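The identity used above can be checked numerically. An illustrative Python sketch (invented data; the lecture computes the same quantities in R) verifying that SSreg / SStot equals the squared correlation in simple regression, and that SStot decomposes as SSreg + SSres:

```python
import numpy as np

# Illustrative data (not the course's murder dataset).
x = np.array([14.0, 16.0, 18.5, 20.0, 22.5, 24.0, 25.5, 26.0])
y = np.array([6.0, 11.0, 10.0, 19.0, 21.0, 29.0, 28.0, 36.0])

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

ss_tot = np.sum((y - y.mean()) ** 2)       # total variability
ss_reg = np.sum((y_hat - y.mean()) ** 2)   # explained variability
ss_res = np.sum((y - y_hat) ** 2)          # leftover variability

r_squared = ss_reg / ss_tot
r = np.corrcoef(x, y)[0, 1]

# In simple linear regression the two definitions agree: R^2 = r^2,
# and the total sum of squares splits cleanly into SSreg + SSres.
print(abs(r_squared - r ** 2) < 1e-10)          # True
print(abs(ss_tot - (ss_reg + ss_res)) < 1e-8)   # True
```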
Clicker question: R² for the regression model predicting annual murders per million based on percentage living in poverty is roughly 71%. Which of the following is the correct interpretation of this value?

(a) 71% of the variability in percentage living in poverty is explained by the model.
(b) 84% of the variability in the murder rates is explained by the model, i.e. percentage living in poverty.
(c) 71% of the variability in the murder rates is explained by the model, i.e. percentage living in poverty.
(d) 71% of the time percentage living in poverty predicts murder rates accurately.

[Scatterplot: % in poverty vs. annual murders per million]
Answer: (c).
Types of outliers in linear regression
Outliers
Because this red point is vertically distant from the rest of the data, it is called an outlier. Notice that including the outlier in the regression does not change the line very much.

[Scatterplot: % unemployed vs. annual murders per million, with regression lines fit with and without the outlier]
Leverage Points
Because this red point is horizontally distant from the rest of the data, it is called a leverage point. Including the point in the regression changes the line a bit, but not drastically: b1 = 7.08 vs. b1 = 6.485.
[Scatterplot: % unemployed vs. annual murders per million, with regression lines fit with and without the leverage point]
High Leverage Points
Because this red point has a drastic impact on the line, it is called an influential point: b1 = 7.08 vs. b1 = 2.903.
[Scatterplot: % unemployed vs. annual murders per million, with regression lines fit with and without the influential point]
Outliers and Leverage Points
http://mih5.github.io/statapps/linear_regression/linear_regression.html
Some terminology

- Outliers are points that fall away from the cloud of points.
- Outliers that fall horizontally away from the center of the cloud are called leverage points.
- High leverage points that actually influence the slope of the regression line are called influential points.
- To determine whether a point is influential, visualize the regression line with and without the point. If the slope changes considerably, the point is influential; if not, it is not.
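The with-and-without comparison described above can be sketched in code: refit the line after adding a suspect point and compare slopes. A Python illustration with invented data:

```python
import numpy as np

# Base cloud of points with a clear linear trend (invented values).
x = np.array([5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5])
y = 7.0 * x + np.array([1.2, -0.8, 0.5, -1.1, 0.9, -0.3, 1.0, -0.6, 0.2, -1.0])

def slope(xs, ys):
    """Least squares slope of ys on xs."""
    return np.polyfit(xs, ys, 1)[0]

base = slope(x, y)

# An outlier in y only, near the center of the x range: barely moves
# the slope (little leverage).
x_out = np.append(x, 7.2)
y_out = np.append(y, 7.0 * 7.2 + 25.0)

# A high leverage point far out in x that also defies the trend:
# drags the slope substantially (influential).
x_lev = np.append(x, 20.0)
y_lev = np.append(y, 20.0)

print(abs(slope(x_out, y_out) - base) < 1.0)   # True: outlier, not influential
print(abs(slope(x_lev, y_lev) - base) > 2.0)   # True: influential point
```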
Types of outliers

Clicker question: Which of the below best describes the outlier?
(a) influential  (b) leverage  (c) none of the above  (d) there are no outliers

[Scatterplot with its residual plot; the outlier is marked]
Does this outlier influence the slope of the regression line? Not really...
[Scatterplot with its residual plot; the outlier is marked]
Influential points
Data are available on the log of the surface temperature and the log of the light intensity of 47 stars in the star cluster CYG OB1.
[Scatterplot: log(temp) vs. log(light intensity), with regression lines fit with and without the outliers]
Inference for linear regression HT for the slope
The Slope Estimate
Recall that b1 is an estimate of the true slope β1. We could find β1 if we had data on the whole population, but since we only have a sample, we must estimate it. This means that even if the computed slope seems interesting and relevant, the true slope could still be zero; we need inference to decide.
Testing for the slope

            Estimate Std. Error t value  Pr(>|t|)
(Intercept)  -29.901      7.789  -3.839    0.0000
perc_pov       2.559      0.390   6.562  3.64e-06

We always use a t-test in inference for regression.
Remember: test statistic T = (point estimate − null value) / SE.
- The point estimate b1 is the observed slope.
- SE_b1 is the standard error associated with the slope.
- Degrees of freedom associated with the slope: df = n − 2, where n is the sample size. Remember: we lose 1 degree of freedom for each parameter we estimate, and in simple linear regression we estimate 2 parameters, β0 and β1.
Testing for the slope (cont.)

            Estimate Std. Error t value  Pr(>|t|)
(Intercept)  -29.901      7.789  -3.839    0.0000
perc_pov       2.559      0.390   6.562  3.64e-06

T = (2.559 − 0) / 0.390 = 6.56
df = 20 − 2 = 18
p-value = P(|T| > 6.56) ≈ 3.64e-06
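The arithmetic on this slide can be reproduced directly from the table. A minimal Python check (the lecture's own computations are in R); the slope and standard error are the values printed in the output above:

```python
# Values taken from the regression output on the slide.
b1 = 2.559      # estimated slope for perc_pov
se_b1 = 0.390   # its standard error
n = 20          # sample size (df = 18 implies n - 2 = 18)

t_stat = (b1 - 0) / se_b1   # test statistic against H0: beta1 = 0
df = n - 2

print(round(t_stat, 2), df)   # 6.56 18
# The software's two-sided p-value for this T with 18 df is 3.64e-06,
# far below any conventional significance level.
```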
Murder data...

Do these data provide convincing evidence that there is a statistically significant relationship between % poverty and murder counts?

            Estimate Std. Error t value  Pr(>|t|)
(Intercept)  -29.901      7.789  -3.839    0.0000
perc_pov       2.559      0.390   6.562  3.64e-06

Yes; the p-value for perc_pov is low, indicating that the data provide convincing evidence that the slope parameter is different from 0.
How reliable is this p-value if these areas were not randomly selected? Not very...
Nature or nurture?
In 1966 Cyril Burt published a paper called “The genetic determination of differences in intelligence: A study of monozygotic twins reared apart?” The data consist of IQ scores for [an assumed random sample of] 27 identical twins, one raised by foster parents, the other by the biological parents.
[Scatterplot: biological twin IQ (x) vs. foster twin IQ (y); R = 0.882]
Clicker question: Which of the following is false?

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  9.20760    9.29990   0.990    0.332
bioIQ        0.90144    0.09633   9.358  1.2e-09

Residual standard error: 7.729 on 25 degrees of freedom
Multiple R-squared: 0.7779, Adjusted R-squared: 0.769
F-statistic: 87.56 on 1 and 25 DF, p-value: 1.204e-09

(a) For each 10 point increase in the biological twin's IQ, we would expect the foster twin's IQ to increase on average by 9 points.
(b) Roughly 78% of the foster twins' IQs can be accurately predicted by the model.
(c) The linear model is fosterIQ-hat = 9.2 + 0.9 × bioIQ.
(d) Foster twins with higher than average IQs have biological twins with higher than average IQs as well.
Answer: (b) is false. R² means 78% of the variability in foster IQ is explained by the model, not that the model predicts accurately 78% of the time.
Testing for the slope

Clicker question: Assuming that these 27 twins comprise a representative sample of all twins separated at birth, we would like to test whether these data provide convincing evidence that the IQ of the biological twin is a significant predictor of the IQ of the foster twin. What are the appropriate hypotheses?
(a) H0: b0 = 0; HA: b0 ≠ 0
(b) H0: β1 = 0; HA: β1 ≠ 0
(c) H0: b1 = 0; HA: b1 ≠ 0
(d) H0: β0 = 0; HA: β0 ≠ 0
Answer: (b). Hypotheses are about the population parameter β1, not the sample estimate b1.
Inference for linear regression HT for the slope
Testing for the slope (cont.)
Estimate
- Std. Error
t value Pr(>|t|) (Intercept) 9.2076 9.2999 0.99 0.3316 bioIQ 0.9014 0.0963 9.36 0.0000
T =
0.9014 − 0 0.0963
= 9.36 df = 27 − 2 = 25 p − value = P(|T| > 9.36) < 0.01
Statistics 101 (Nicole Dalzell) U6 - L1: Introduction to SLR June 11, 2015 39 / 40
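The test statistic above can be reproduced directly from the two reported numbers, a sketch using only the slope estimate and its standard error from the software output:

```python
# Values taken from the regression output above
b1 = 0.9014      # slope estimate for bioIQ
se_b1 = 0.0963   # standard error of the slope
n = 27           # number of twin pairs

T = (b1 - 0) / se_b1   # null value is 0 under H0: beta1 = 0
df = n - 2

# The two-sided critical value for alpha = 0.01 at 25 df is about 2.787,
# so |T| = 9.36 is far past it, which implies p-value < 0.01.
print(round(T, 2), df)  # → 9.36 25
```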
Inference for linear regression CI for the slope
Confidence interval for the slope

Clicker question: Remember that a confidence interval is calculated as point estimate ± ME, and that the degrees of freedom associated with the slope in a simple linear regression is n − 2. Which of the below is the correct 95% confidence interval for the slope parameter? Note that the model is based on observations from 27 twins, and that t⋆25 = 2.06.

              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)     9.2076      9.2999     0.99    0.3316
bioIQ           0.9014      0.0963     9.36    0.0000

(a) 9.2076 ± 2.06 × 9.2999
(b) 0.9014 ± 2.06 × 0.0963
(c) 0.9014 ± 1.96 × 0.0963
(d) 9.2076 ± 2.06 × 0.0963

n = 27, df = 27 − 2 = 25
95% confidence: t⋆25 = 2.06
0.9014 ± 2.06 × 0.0963 = (0.7, 1.1)

Statistics 101 (Nicole Dalzell) U6 - L1: Introduction to SLR June 11, 2015 40 / 40
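The interval in answer (b) follows mechanically from point estimate ± t⋆ × SE; a minimal sketch using the slide's numbers:

```python
# Values taken from the slide
b1 = 0.9014      # slope estimate for bioIQ
se_b1 = 0.0963   # standard error of the slope
t_star = 2.06    # t* for 25 df at 95% confidence

me = t_star * se_b1             # margin of error
lower, upper = b1 - me, b1 + me
print(round(lower, 1), round(upper, 1))  # → 0.7 1.1
```

Note that t⋆ with 25 degrees of freedom (2.06), not the normal value 1.96, is the correct multiplier here, which is what rules out choice (c).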