 
Unit 7: Multiple Linear Regression
Lecture 1: Introduction to MLR
Statistics 101, Nicole Dalzell, June 15, 2015
1. Announcements (Recap)
2. SLR: Categorical Predictors
3. Many variables in a model
4. Adjusted $R^2$
5. Collinearity and parsimony
Statistics 101, U7 - L1: Multiple Linear Regression, Nicole Dalzell
Announcements
Office hours today from 2-3 PM in Old Chem 211A.
Problem Set 8 due tomorrow.
Lab due tomorrow by 5 PM on Sakai.
Recap: % College graduate vs. % Hispanic in LA
What can you say about the relationship between % college graduates and % Hispanic in a sample of 100 zip code areas in LA?
[Figure: two maps of LA zip code areas, "Education: College graduate" and "Race/Ethnicity: Hispanic", each shaded on a 0.0 to 1.0 scale; the legend marks freeways and areas with no data.]
Recap: % College educated vs. % Hispanic in LA
What can you say about the relationship between % college graduates and % Hispanic in a sample of 100 zip code areas in LA?
[Figure: scatterplot of % college graduate (y-axis, 0% to 100%) against % Hispanic (x-axis, 0% to 100%).]
Recap: % College educated vs. % Hispanic in LA - linear model
Participation question: Which of the below is the best interpretation of the slope?

              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)     0.7290      0.0308    23.68    0.0000
%Hispanic      -0.7527      0.0501   -15.01    0.0000

(a) A 1% increase in Hispanic residents in a zip code area in LA is associated with a 75% decrease in % of college grads.
(b) A 1% increase in Hispanic residents in a zip code area in LA is associated with a 0.75% decrease in % of college grads.
(c) An additional 1% of Hispanic residents decreases the % of college graduates in a zip code area in LA by 0.75%.
(d) In zip code areas with no Hispanic residents, % of college graduates is expected to be 75%.
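The slope can also be read off numerically by comparing two predictions from the fitted line. The following is a minimal sketch, assuming both variables are recorded as proportions between 0 and 1 (an assumption suggested by the 0.0 to 1.0 map scales and the intercept of 0.7290); the helper name `predict_college_grad` and the example values 0.30 and 0.31 are hypothetical.

```python
# Coefficients copied from the regression output on the slide.
INTERCEPT = 0.7290
SLOPE = -0.7527

def predict_college_grad(prop_hispanic):
    """Predicted proportion of college graduates (hypothetical helper)."""
    return INTERCEPT + SLOPE * prop_hispanic

# Two zip code areas that differ by one percentage point (0.01) in % Hispanic.
low = predict_college_grad(0.30)
high = predict_college_grad(0.31)

# Convert the difference in proportions to percentage points.
change_in_points = (high - low) * 100

print(f"Predicted change: {change_in_points:.2f} percentage points")
```

Under these assumptions, the predicted % of college graduates is about 0.75 percentage points lower in the area with one more percentage point of Hispanic residents. This describes an association in the fitted model, not a causal effect.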
Recap: % College educated vs. % Hispanic in LA - linear model
Do these data provide convincing evidence that there is a statistically significant relationship between % Hispanic and % college graduates in zip code areas in LA?

              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)     0.7290      0.0308    23.68    0.0000
hispanic       -0.7527      0.0501   -15.01    0.0000

Yes, the p-value for % Hispanic is low, indicating that the data provide convincing evidence that the slope parameter is different from 0.
How reliable is this p-value if these zip code areas are not randomly selected? Not very...
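When the null value is 0, the "t value" column of such output is the estimate divided by its standard error. As a rough check, the sketch below recomputes that column from the rounded numbers shown on the slide; the dictionary layout and variable names are illustrative only, and small gaps from the reported values are expected because the displayed estimates and standard errors are themselves rounded.

```python
# (estimate, standard error, reported t value) copied from the slide's output.
rows = {
    "(Intercept)": (0.7290, 0.0308, 23.68),
    "hispanic": (-0.7527, 0.0501, -15.01),
}

t_values = {}
for name, (estimate, std_error, reported_t) in rows.items():
    # With a null value of 0 the statistic is simply estimate / SE.
    t_values[name] = (estimate - 0) / std_error
    print(f"{name:12s} computed t = {t_values[name]:7.2f}, reported t = {reported_t:7.2f}")
```

The recomputed statistics agree with the reported ones to within rounding, and a slope statistic this far from 0 is what produces the very small p-value in the last column.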
Recap
Inference for the slope for a SLR model (only one explanatory variable):
Hypothesis test: $T = \dfrac{b_1 - \text{null value}}{SE_{b_1}}$, with $df = n - 2$
Confidence interval: $b_1 \pm t^\star_{df = n - 2} \, SE_{b_1}$
The null value is often 0 since we are usually checking for any relationship between the explanatory and the response variable.
The regression output gives $b_1$, $SE_{b_1}$, and the two-tailed p-value for the t-test for the slope where the null value is 0.
We rarely do inference on the intercept, so we’ll be focusing on the estimates and inference for the slope.
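The two formulas above can be applied to the LA zip code output from earlier in the recap. This is a minimal sketch, not the course's own code: the helper `slope_inference` is hypothetical, and the critical value 1.984 is an assumed t-table lookup (approximately the 97.5th percentile of the t distribution with 98 degrees of freedom) for a 95% interval.

```python
def slope_inference(b1, se_b1, t_star, null_value=0.0):
    """Return the t statistic and confidence interval for a slope (hypothetical helper)."""
    t_stat = (b1 - null_value) / se_b1
    margin = t_star * se_b1
    return t_stat, (b1 - margin, b1 + margin)

n = 100            # zip code areas in the sample
df = n - 2         # degrees of freedom for SLR
T_STAR_95 = 1.984  # assumed: approximate t-table value for df = 98, 95% confidence

t_stat, (lower, upper) = slope_inference(b1=-0.7527, se_b1=0.0501, t_star=T_STAR_95)
print(f"df = {df}, T = {t_stat:.2f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

With these inputs the interval runs from roughly -0.85 to -0.65 and does not contain 0, which is consistent with the small two-tailed p-value in the regression output.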
Recap: Caution
Always be aware of the type of data you’re working with: random sample, non-random sample, or population.
Statistical inference, and the resulting p-values, are meaningless when you already have population data.
If you have a sample that is non-random (biased), the results will be unreliable.
The ultimate goal is to have independent observations, and you know how to check for those by now.
SLR: Categorical Predictors
Dinosaur Weight
What relationship do you see between the weight of dinosaurs and the type of dinosaur?
[Figure: "Dinosaur Weight by Type", weight in kg (axis from 0 to 80,000) for Ornithischian and Saurischian dinosaurs.]