STAT 113: EXAM 2 PRACTICE PROBLEMS
SOLUTION
Inference Foundations.

Parameters and Statistics. State whether the quantity described is a parameter or statistic and give the correct notation. (These are exercises 3.1-3.5 from the text.)
(1) Average household income for all houses in the US, using data from the US Census
(2) Correlation between height and weight for players on the 2010 Brazil World Cup team, using data from all 23 players on the roster
(3) Proportion of people who use an electric toothbrush, using data from a sample of 300 adults
(4) Proportion of registered voters in a county who voted in the last election, using data from the county voting records
(5) Average number of television sets per household in North Carolina, using data from a sample of 1000 households

Sampling Distributions.
(6) (Exercise B.10 from the text) The GRE (Graduate Record Exam) is like the SAT exam except it is used for application to graduate school instead of college. The mean GRE scores for all examinees tested between July 1, 2006, and June 20, 2009, are as follows: Verbal 456, Quantitative 590, Analytic Writing 3.8. If we consider the population to be all people who took the test during this time period, are these parameters or statistics? What notation would be appropriate for each of them? Suppose we take 1000 different random samples, each of size n = 50, from each of the three exam types and record the mean score for each sample. Where would the distribution of sample means be centered for each type of exam?
Date: November 18, 2015.
These are parameters, since they are computed for all examinees in the time period in question, which is defined to be the population. We could write, for example, $\mu_V = 456$, $\mu_Q = 590$, $\mu_{AW} = 3.8$. The distribution of sample means for verbal would be centered at the population mean for verbal (456). Likewise for the other tests: the quantitative sample means would be centered at 590 and the analytic writing sample means at 3.8.

Figure 1 contains sample proportions of one value of a binary response variable based on many random samples of size n = 35 from a population. The next six questions refer to this figure.

Figure 1. Sample Proportions from Samples of Size n = 35

(7) What does one dot on the sampling distribution represent?
Each dot comes from a sample of size 35 and represents the sample proportion for that sample.
(8) Estimate the population proportion from the dotplot.
Since sample proportions from many random samples center around the population parameter, we can infer that the population parameter is near the center of the distribution of sample proportions. That is, it is roughly 0.63 to 0.65.
(9) Estimate the standard error of the proportions.
The standard error is the standard deviation of the distribution of sample proportions. Visually inspecting the distribution, the points that are about one standard deviation from the mean in either direction appear to be at roughly 0.56 and 0.72, which represents a span of two standard deviations. One standard deviation is therefore about (0.72 − 0.56)/2 = 0.08.
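The dotplot estimates in (8) and (9) can be checked by simulation. The sketch below is a minimal illustration, not part of the original solution: it assumes the population proportion is 0.64 (a value read off the dotplot, not given in the problem), draws many random samples of size 35, and reports the center and standard deviation of the resulting sample proportions. It also repeats the simulation with n = 70, the situation in problems (11) and (12). The helper name `simulate_phats` is hypothetical.

```python
import random
import statistics

P_ASSUMED = 0.64  # assumption: population proportion read off the dotplot in (8)


def simulate_phats(p, n, reps=10_000, seed=1):
    """Return the sample proportions from `reps` random samples of size n,
    drawn from a population in which a fraction p of cases are "successes"."""
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    phats = []
    for _ in range(reps):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        phats.append(successes / n)
    return phats


for n in (35, 70):
    phats = simulate_phats(P_ASSUMED, n)
    center = statistics.mean(phats)
    se = statistics.stdev(phats)  # standard error = SD of the sample proportions
    formula = (P_ASSUMED * (1 - P_ASSUMED) / n) ** 0.5
    print(f"n = {n}: center = {center:.3f}, SE = {se:.3f}, "
          f"sqrt(p(1-p)/n) = {formula:.3f}")
```

With n = 35 the simulated standard error comes out close to 0.08, in line with the visual estimate in (9). With n = 70 the center stays near 0.64 while the spread shrinks by roughly a factor of √2.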
(10) For each of the following sample proportions, indicate whether it is (a) Reasonably likely to occur for a sample of this size, (b) Unusual but might occur occasionally, or (c) Extremely unlikely to occur.
(i) $\hat{p} = 0.45$: This is an unusual value to get, though it seems to occur occasionally.
(ii) $\hat{p} = 0.98$: There are no values this extreme in the sample proportions in the plot. It is a full standard deviation past the largest value observed, so it is extremely unlikely to occur.
(iii) $\hat{p} = 0.65$: This is right near the center of the distribution and occurs relatively frequently.
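One way to make the verbal categories in (10) concrete is to measure how many standard errors each value lies from the center. The sketch below assumes a center of 0.64 and a standard error of 0.08 (the dotplot estimates from (8) and (9)) and uses the common rule-of-thumb cutoffs of 2 and 3 standard errors; those cutoffs are an assumption, not something stated in the problem. The function name `classify` is hypothetical.

```python
CENTER = 0.64  # assumption: center estimated from the dotplot in (8)
SE = 0.08      # assumption: standard error estimated in (9)


def classify(p_hat, center=CENTER, se=SE):
    """Return (z, label): how many SEs p_hat lies from the center, and a rough category."""
    z = (p_hat - center) / se
    if abs(z) < 2:
        label = "reasonably likely"
    elif abs(z) < 3:
        label = "unusual"
    else:
        label = "extremely unlikely"
    return z, label


for p_hat in (0.45, 0.98, 0.65):
    z, label = classify(p_hat)
    print(f"p-hat = {p_hat:.2f}: z = {z:+.2f} -> {label}")
```

Under these assumptions the three labels agree with the answers given for (i), (ii) and (iii).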
(11) If samples of size n = 70 had been used instead of n = 35, which of the following would be true?
(a) The sample statistics would be centered at a larger proportion.
(b) The sample statistics would be centered at roughly the same proportion.
(c) The sample statistics would be centered at a smaller proportion.
The sample size does not affect the center of the sampling distribution, so the sample statistics would still be centered at roughly the same value: the answer is (b).
(12) If samples of size n = 70 had been used instead of n = 35, which of the following would be true?
(a) The sample statistics would have more variability.
(b) The variability in the sample statistics would be about the same.
(c) The sample statistics would have less variability.
As the sample size goes up, the variability in the sample statistics goes down. So doubling the sample size would result in less variability in the sample statistics (the standard error shrinks by roughly a factor of √2): the answer is (c).

Confidence Intervals.
(13) (B.11) A recent national telephone survey reports that 57% of those surveyed think violent movies lead to more violence in society. The survey included a random sample of 1000 American adults and reports: “The margin of sampling error is ± 3 percentage points with a 95% level of confidence.”
(i) Define the relevant population and parameter. Based on the data given, what is the best point estimate for this parameter?
(ii) Find and interpret a 95% confidence interval for the parameter defined in (i).
(14) (B.16) In a random sample of 450,000 U.S. adults, the proportion of people who say they exercised at some point in the past 30 days is $\hat{p} = 0.726$ with a standard error of 0.0007. Find and interpret a 95% CI for the proportion of U.S. adults who have exercised in the last 30 days.
A 95% confidence interval is given by $\hat{p} \pm 2 \cdot SE = 0.726 \pm 2(0.0007) = 0.726 \pm 0.0014$, which gives an interval from 0.7246 to 0.7274. We are 95% confident that the proportion of all US adults who have exercised at some point in the last 30 days is between 0.7246 and 0.7274. The confidence interval is very narrow because the sample size (450,000) is so large.
(15) (modified from 3.65) Identify whether each of the following samples is a valid bootstrap sample from this original sample: 17, 10, 15, 21, 13, 18. If it could not be obtained, explain why not.
(i) 10, 12, 17, 18, 20, 21
(ii) 10, 15, 17
(iii) 18, 13, 21, 17, 15, 13, 10
(iv) 15, 10, 21, 24, 15, 10
(v) 13, 10, 21, 10, 18, 17
Bootstrap samples are the same size as the original sample (here, 6 values), so (ii) and (iii) are not valid. Samples (i) and (iv) are not valid because they contain one or more values not in the original sample (bootstrap samples are drawn with replacement from the original sample, so only values that appear in it can occur, possibly more than once). Sample (v) is valid.
(16) (modified from 3.69) Figure 2 represents a bootstrap distribution of sample correlations.
Give a point estimate for the population correlation, and estimate a 95% confidence interval two ways: (i) by first estimating the standard error, and (ii) directly from the appropriate quantiles of the bootstrap distribution.
(17) (B.29) Given a specific sample to estimate a specific parameter from a population, what are the expected similarities and differences in the corresponding sampling distribution (using the given sample size) and bootstrap distribution (using the given sample)? In particular, for each aspect of a distribution listed below, indicate whether the values for the two distributions (sampling distribution and bootstrap distribution) are expected to be approximately the same or different. If they are different, explain how.
(i) The shape of the distribution
Figure 2. A Bootstrap Distribution of Sample Correlations

(ii) The center of the distribution
(iii) The spread of the distribution
(iv) What one value (or dot) in the distribution represents
(v) The information needed in order to create the distribution

Hypothesis Testing.
(17) (modified from 4.21-4.25) The ICUAdmissions dataset contains information about a sample of patients admitted to a hospital Intensive Care Unit (ICU). For each of the research questions below, define any relevant parameters and state the appropriate null and alternative hypotheses.
(i) Is there evidence that mean heart rate is higher in male ICU patients than in female ICU patients?
(ii) Is there a difference in the proportion who receive CPR based on whether the patient’s race is white or black?
(iii) Is there a positive linear association between systolic blood pressure and heart rate?
(iv) Is either gender over-represented in patients admitted to the ICU, or is the gender breakdown about equal?
(v) Is the average age of ICU patients at this hospital greater than 50?
(18) (modified from B.7) How much of an effect does your roommate have on your grades? In particular, does it matter whether your roommate brings a videogame to college? A study examining this
question looked at n = 210 students entering Berea College as first-year students in the Fall of 2001 who were randomly assigned a roommate. The explanatory variable is whether or not the roommate brought a videogame to college, and the response variable is grade point average (GPA) for the first semester.
(i) In conducting a test to see whether GPA is lower on average for students whose roommate brings a videogame to campus, define the parameter(s) of interest and state the null and alternative hypotheses.
(ii) The P-value for the test above is 0.036. What is the conclusion at a 5% significance level?
(iii) We are interested in seeing how large the roommate effect is on GPA. A 90% confidence interval for $\mu_v - \mu_n$ is $(-0.315, -0.015)$, where $\mu_v$ is the average GPA for first-year students whose roommate brings a videogame to college and $\mu_n$ is the average GPA for first-year students whose roommate does not bring a videogame to college. Explain how you can tell just from the confidence interval which group has a higher average GPA. Interpret the confidence interval in terms of roommates, videogames, and GPA.
(19) (modified from 4.37) When getting voters to support a candidate in an election, is there a difference between a recorded phone call from the candidate or a flyer about the candidate sent through the mail? A sample of 500 voters is randomly divided into two groups of 250 each, with one group getting the phone call and one group getting the flyer. The voters are then contacted to see if they plan to vote for the candidate in question. We wish to see if there is evidence that the proportions of support are different between the two methods of campaigning.
(i) Define the relevant parameter(s) and state the null and alternative hypotheses.
(ii) Possible sample results are shown in the following table. Compute the two sample proportions: $\hat{p}_c$, the proportion of voters getting the phone call who say they will vote for the candidate, and $\hat{p}_f$, the proportion of voters getting the flyer who say they will vote for the candidate. Is there a difference in the sample proportions?

Sample A      Will Vote for Candidate    Will Not Vote for Candidate
Phone Call    152                        98
Flyer         145                        105
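The computation in (19)(ii) is a pair of row proportions from the two-way table. A minimal sketch, applied to the Sample A counts above (the function name `row_proportions` is hypothetical):

```python
def row_proportions(call_yes, call_no, flyer_yes, flyer_no):
    """Return (p_hat_c, p_hat_f, difference) from the counts in a 2x2 table."""
    p_hat_c = call_yes / (call_yes + call_no)     # phone-call group
    p_hat_f = flyer_yes / (flyer_yes + flyer_no)  # flyer group
    return p_hat_c, p_hat_f, p_hat_c - p_hat_f


p_c, p_f, diff = row_proportions(152, 98, 145, 105)
print(f"p_hat_c = {p_c:.3f}, p_hat_f = {p_f:.3f}, difference = {diff:.3f}")
# prints: p_hat_c = 0.608, p_hat_f = 0.580, difference = 0.028
```

The same function applies unchanged to the counts in the later tables for parts (iii) and (iv).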
(iii) A different set of possible sample results is shown in the table below. Compute the same two sample proportions for this table. Which of the two samples seems to offer stronger evidence of a difference in effectiveness between the two campaign methods? Explain your reasoning.

Sample B      Will Vote for Candidate    Will Not Vote for Candidate
Phone Call    188                        62
Flyer         120                        130

(iv) Suppose 5000 voters, rather than 500 voters, were sampled, yielding the following counts. Note that the proportions are the same as in the first table. Which of the two samples (this one
or the first one) seems to offer stronger evidence of a difference in effectiveness between the two campaign methods? Explain your reasoning.

Sample C      Will Vote for Candidate    Will Not Vote for Candidate
Phone Call    1520                       980
Flyer         1450                       1050

(20) (modified from 4.52) If a restaurant chain finds significant evidence that the mean arsenic level in chicken is above 80, the chain will stop using that supplier of chicken meat. The hypotheses are
$H_0: \mu = 80$
$H_1: \mu > 80$
where $\mu$ represents the mean arsenic level in all chicken meat from that supplier. Samples from two different suppliers are analyzed, and the resulting P-values are given:
Sample from Supplier A: P-value is 0.0003
Sample from Supplier B: P-value is 0.3500
(i) Interpret each P-value in terms of the probability of the results happening by random chance.
The P-values indicate the probability that we would get a sample mean arsenic level as large as or larger than the one actually obtained, assuming that the population (or “long run”) mean level of arsenic is actually 80. If the true mean were 80, a sample mean at least as large as Supplier A’s would occur by random chance only about 3 times in 10,000 samples, while one at least as large as Supplier B’s would occur about 35% of the time.
(ii) Which P-value shows stronger evidence for the alternative hypothesis? What does this mean in terms of arsenic and chicken?
The P-value for supplier A constitutes much stronger evidence for the alternative. Hence there is much stronger evidence that the chicken from supplier A has a mean arsenic level above 80.
(iii) Which supplier, A or B, should the chain get chickens from in order to avoid too high a level of arsenic?
There is evidence that the chicken from supplier A has a mean arsenic level above 80. There is no significant evidence of the same for supplier B. Although we did not formally test whether the arsenic from supplier A was higher than that from supplier B, it is reasonable to prefer supplier B based on the results of these two tests.
(21) (4.61) Using the definition of a P-value, explain why the area in the tail of a randomization distribution is used to compute a P-value.
(22) (B.38 and B.47) The Centers for Disease Control and Prevention (CDC) conducted a randomized trial in South Africa designed to test the effectiveness of an inexpensive wipe to be used during childbirth to prevent infections. Half of the mothers were randomly assigned to have their birth canal wiped with a wipe treated with a drug called chlorhexidine before giving birth, and the other half to get wiped with a sterile wipe (a placebo). The response variable is whether or not the newborns develop an infection. The CDC hopes to find out whether there is evidence that babies delivered by the women getting the treated wipe are less likely to develop an infection.
(i) Define the relevant parameter(s) and state the null and alternative hypotheses.
We are testing a hypothesis about the difference between two population proportions: $p_{wipe}$, the proportion of babies in the population who would develop an infection if the mother’s birth canal is wiped with chlorhexidine, and $p_{placebo}$, the population proportion of babies who would develop an infection if a placebo is used. The relevant parameter is therefore $p_{wipe} - p_{placebo}$, and the hypotheses are
$H_0: p_{wipe} - p_{placebo} = 0$
$H_a: p_{wipe} - p_{placebo} < 0$
(ii) What is/are the sample statistic(s) to be used to test this claim?
We can define $\hat{p}_{wipe}$ and $\hat{p}_{placebo}$ as the proportion of infections in each sample group. The sample statistic we use is therefore $\hat{p}_{wipe} - \hat{p}_{placebo}$.
(iii) If the results are statistically significant, what would that imply about the wipes and infections?
Since this is a randomized experiment, if the results are statistically significant, we can make a causal conclusion and say that the chlorhexidine wipe actually reduced the likelihood of an infection. (We could still be wrong, in which case we would be making a Type I error.)
(iv) If the results are not statistically significant, what would that imply about the wipes and infections?
If the difference is not statistically significant, we cannot conclude that chlorhexidine made any difference. However, we also cannot conclude that it
didn’t. It is possible that we did not have enough data to detect a small difference. Without knowing the actual difference in population proportions, we cannot say how likely it is that we are making a Type II error.
(v) What does it mean to make a Type I error in this situation?
(vi) What does it mean to make a Type II error in this situation?
(vii) In which of the following two situations should we select a smaller significance level:
- The drug chlorhexidine is very safe and known to have very few side effects.
- The drug chlorhexidine is relatively new and may have potentially harmful side effects for the mother and newborn child.
(viii) The P-value for the data in this study is 0.32. What is the conclusion of the test?
(ix) Does this conclusion mean that the treated wipes do not help prevent infections? Explain.
(23) (modified from B.52-B.56) For each situation described below, describe how you might physically create one randomization sample and compute one randomization statistic (without using the computer) from a given sample. Be explicit enough that a classmate could follow your instructions (even if it might take a very long time).
(i) Testing to see if there is evidence that the proportion of people who smoke is greater for males than for females.
The null hypothesis for this difference in proportions test is that the proportions are the same. To create the randomization samples, we match the null hypothesis. In this situation, that means gender doesn’t matter in smoking outcomes, so one way to match this is to randomly scramble the yes/no responses from the original sample to the smoking question and assign them to the original subjects. Compute the difference in the proportion of smokers between the two genders in this simulated sample to get the randomization statistic. Other methods, for example sampling with replacement from the pooled data to simulate new samples of male and female responses, are also acceptable.
(ii) Testing to see if there is evidence that a correlation between height and salary is significant (that is, different than zero).
The null hypothesis for this correlation test is that the correlation is zero. To create the randomization samples, we match the
null hypothesis. In this situation, that means height and salary are completely unrelated. We might randomly scramble the height values and assign them to the original subjects/salaries. Compute the correlation, r, between those random heights and the actual salaries.
(iii) Testing to see if there is evidence that the percentage of a population who watch the Home Shopping Network is less than 20%.
The null hypothesis for this test for a single proportion is that the population proportion is 0.20. To create the randomization samples, we match the null hypothesis. In this situation, that means we might randomly sample (with replacement) from a set that has 2 “yes” and 8 “no” values, where “yes” represents a person who watches the Home Shopping Network. Use the same sample size as the original sample and compute $\hat{p}$, the proportion of yes responses in the simulated sample.
(iv) Testing to see if average sales are higher in stores where customers are approached by salespeople than in stores where they aren’t.
The null hypothesis for this difference in means test is that the means of the two groups are the same. To create the randomization samples, we match the null hypothesis. In this situation, that means whether or not a customer is approached has no effect on sales. We might randomly scramble the labels for type of store (“approach” or “not approach”) and assign them to the actual sales values. Compute the difference in the mean sales, $\bar{x}_a - \bar{x}_{na}$, between the stores assigned to the “approach” group and those randomly put in the “not approach” group. Other methods, for example sampling with replacement from the pooled sales values to simulate new samples of “approach” and “not approach” sales, are also acceptable.
(v) Testing to see if there is evidence that the mean time spent studying per week is different between first-year students and upperclass students.
The null hypothesis for this difference in means test is that the means of the two groups are the same. To create the randomization samples, we match the null hypothesis. In this situation, that means study time does not depend on whether the person is a first-year student or an upperclass student. We might randomly scramble the labels for type of student (“FY” or “Upper”) and assign them to the actual study time values. Compute the difference in the mean study time, $\bar{x}_{fy} - \bar{x}_u$, between the students assigned to the “first-year” group and those randomly put in the “upperclass” group. Other methods, for example shifting the two original samples to a common mean and sampling (with replacement)
from the respective shifted values to form two new samples, are also acceptable.

Theory-Based Tests and Intervals.
(24) (modified from 6.5) If many random samples of size 300 are drawn from a population in which 8% of cases have a particular value:
(i) Find the mean and standard deviation of the sample proportions.
(ii) Draw a curve showing the shape of the sampling distribution, including at least three values on the horizontal axis.
(25) (6.9) The 2010 US Census reports that, of all the nation’s occupied housing units, 65.1% are owned by the occupants and 34.9% are rented. If we take random samples of 50 occupied housing units and compute the sample proportion that are owned for each sample, what will be the mean and standard deviation of the distribution of sample proportions?
(26) (modified from C.25-C.30) In the following, a standardized test statistic is given for a hypothesis test involving proportions (using the standard normal distribution) or means (using the t-distribution and assuming a relatively large sample size). Without using any technology or tables, in each case decide (a) whether the P-value is likely to be greater than 0.1, between 0.05 and 0.1, between 0.01 and 0.05, or less than 0.01, and (b) whether the conclusion of the test is likely to be “Reject $H_0$” or “Do not reject $H_0$” at the 0.1, 0.05 and 0.01 significance levels. If it makes a difference whether the test is one- or two-tailed, indicate that.
(i) z = 2.8
(ii) z = 8.3
(iii) z = 0.54
(iv) z = 2.01
(v) t = 12.2
(vi) t = 0.83
(27) (modified from 6.29) Give an approximate 95% confidence interval for the proportion of the population in Category A given that 23% of a sample of 400 are in Category A.
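Problems (24), (25) and (27) all rest on the standard error of a sample proportion, $\sqrt{p(1-p)/n}$. The sketch below is a minimal illustration: for (24) and (25) it uses the population proportion, and for (27) it plugs in $\hat{p}$ and uses the same $\hat{p} \pm 2 \cdot SE$ rule as in (14) (a 1.96 multiplier would give nearly the same interval). The function name `se_proportion` is hypothetical.

```python
import math


def se_proportion(p, n):
    """Standard error (standard deviation) of a sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)


# (24): population proportion 0.08, samples of size 300
print(f"(24) mean = 0.08, SD = {se_proportion(0.08, 300):.4f}")

# (25): 65.1% owned, samples of size 50
print(f"(25) mean = 0.651, SD = {se_proportion(0.651, 50):.4f}")

# (27): p-hat = 0.23 from a sample of 400; approximate 95% CI = p-hat +/- 2 * SE
p_hat, n = 0.23, 400
se = se_proportion(p_hat, n)
lower, upper = p_hat - 2 * se, p_hat + 2 * se
print(f"(27) SE = {se:.4f}, approximate 95% CI = ({lower:.3f}, {upper:.3f})")
```

The mean of the sample proportions is always the population proportion itself; only the spread depends on n.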
(28) (modified from 6.39(c)) In a survey of 2255 randomly selected US adults (age 18 or older), 1787 of them use the Internet regularly. Of
the Internet users, 1054 use a social networking site. Find and interpret an (approximate) 95% confidence interval for the proportion of all US adults who use a social networking site. Use the confidence interval to estimate whether it is plausible that 50% of all US adults in 2011 use a social networking site.
(29) (modified from 6.65) The percent of US adults who know their neighbors appears to be trending up. A survey of 2255 randomly selected US adults conducted in November 2010 found that 51% said they know all or most of their neighbors. (The result in a similar survey conducted in 2008 was 40%.) Does the 2010 survey provide evidence that more than half of US adults know most or all of their neighbors? Use an appropriate hypothesis test and give all details.
(30) (modified from 6.85) The Boston Marathon is the world’s oldest annual marathon, held every year since 1897. In 2011, 23,879 runners finished the race, with a mean time for all runners of 3:49:54 (about 230 minutes) and standard deviation 0:37:56 (about 38 minutes). Find the mean and standard deviation (in minutes) of the distribution of sample means if we take random samples of Boston Marathon finishers of size:
(i) n = 10
(ii) n = 100
(iii) n = 1000
and comment on the effect of the sample size on the center and variability of the distribution of sample means.
(31) Rank the margins of error that would be associated with the following standard errors and confidence levels from smallest to largest.
(i) A 90% confidence level and a standard error of 0.05
(ii) A 95% confidence level and a standard error of 0.05
(iii) A 99% confidence level and a standard error of 0.05
The margin of error increases with the confidence level when the standard error is held constant, because a higher confidence level requires a multiplier farther out in the tails. So, from smallest to largest, we have (i), then (ii), then (iii).
(32) Rank the margins of error that would be associated with the following standard errors from smallest to largest, assuming a confidence level of 95%.
(i) A standard error for a proportion of 0.05
(ii) A standard error for a proportion of 0.10
(iii) A standard error for a proportion of 0.15
The margin of error increases with the standard error when the confidence level is held constant. So, from smallest to largest, we have (i), then (ii), then (iii).
(33) Rank the margins of error that would be associated with a 95% confidence interval for the following sample sizes, from smallest to largest, assuming a sample proportion of $\hat{p} = 0.2$.
(i) A sample size of n = 10
(ii) A sample size of n = 20
(iii) A sample size of n = 30
The standard error decreases as the sample size increases, and therefore the margin of error decreases as the sample size increases (for a fixed confidence level and sample proportion). So, from smallest to largest, we have (iii), then (ii), then (i).
(34) Rank the 95% margins of error that would be associated with the following standard errors from smallest to largest. (Hint: think about what distribution we would use to model the “standardized” bootstrap distribution in each case.)
(i) A standard error of 0.05 for a proportion.
(ii) An estimated standard error of 0.05 for a mean based on 6 observations.
(iii) An estimated standard error of 0.05 for a mean based on 15 observations.
This one is trickier. The confidence level and the standard error are held constant here; the only difference is what distribution we use to model the bootstrap distribution. For (i) we can use a standard Normal since we are working with proportions. For (ii) and (iii) we need to use t-distributions (with 5 and 14 degrees of freedom, respectively), since the standard error is estimated based on the sample standard deviation. The t-distribution has fatter tails, and hence 0.025 and 0.975 quantiles that are farther away from 0, compared to the Normal; but as the degrees of freedom increase, the distribution approaches a standard Normal. So from smallest to largest, we have (i), then (iii), then (ii).
(35) Rank the P-values associated with the following observed test statistics and null sampling distribution pairs from smallest to largest, assuming a left-tailed test in all cases.
(i) $z_{obs} = -1.50$, and the null sampling distribution is modeled by a Normal.
(ii) $t_{obs} = -1.50$, and the null sampling distribution is modeled by a t-distribution with 5 degrees of freedom.
(iii) $t_{obs} = -1.50$, and the null sampling distribution is modeled by a t-distribution with 10 degrees of freedom.
(iv) $z_{obs} = 1.50$, and the null sampling distribution is modeled by a Normal.
(v) $t_{obs} = 1.50$, and the null sampling distribution is modeled by a t-distribution with 5 degrees of freedom.
(vi) $t_{obs} = 1.50$, and the null sampling distribution is modeled by a t-distribution with 10 degrees of freedom.
This one requires a lot of pieces to be put together. As in the previous question, one key point is that the t-distribution has fatter tails than the Normal, but approaches a standard Normal as the degrees of freedom increase.
For the negative test statistics (i)-(iii), the P-value is the left-tail area below −1.50. Fatter tails mean more area there, so the standard Normal yields the smallest P-value, followed by the t-distribution with 10 degrees of freedom, followed by the t-distribution with 5 degrees of freedom: (i) < (iii) < (ii).
For the positive test statistics (iv)-(vi), the P-value is the area below 1.50, which equals one minus the right-tail area above 1.50. Fatter tails make that right-tail area larger and hence the P-value smaller, so the order reverses: (v) < (vi) < (iv).
To compare the first three to the second three, note that every distribution is centered at 0. The left-tail area below a point to the left of center is less than 0.5, while the left-tail area below a point to the right of center is greater than 0.5. So overall we have (i) < (iii) < (ii) < (v) < (vi) < (iv).
(36) (modified from C.51) In a study designed to examine the effect of the color red on how attractive men perceive women to be, men were randomly divided into two groups and were asked to rate the attractiveness of women on a scale of 1 (not at all attractive) to 9 (extremely attractive). One group of men was shown pictures of women on a white background and the other group was shown the same pictures of women on a red background. The results are shown in the table below. Test to see if men rate women as significantly more attractive (on average) when a red background is used rather than a white background. Show all details and clearly state your conclusion.

Color    n     $\bar{x}$    $s$
Red      15    7.2          0.6
White    12    6.1          0.4

This is a hypothesis test for a difference in means. Using $\mu_R$ for the average rating with a red background and $\mu_W$ for the average rating
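with a white background, the hypotheses for the test are:
$H_0: \mu_R - \mu_W = 0 \quad (\mu_R = \mu_W)$
$H_a: \mu_R - \mu_W > 0 \quad (\mu_R > \mu_W)$
The relevant statistic for this test is $\bar{x}_R - \bar{x}_W$, where $\bar{x}_R$ represents the mean rating in the sample with the red background and $\bar{x}_W$ represents the mean rating in the sample with the white background. The relevant null parameter is zero, since from the null hypothesis we have $\mu_R - \mu_W = 0$. The t test statistic is:
$$t = \frac{\text{Sample statistic} - \text{Null parameter}}{SE} = \frac{(\bar{x}_R - \bar{x}_W) - 0}{\sqrt{\frac{s_R^2}{n_R} + \frac{s_W^2}{n_W}}} = \frac{7.2 - 6.1}{\sqrt{\frac{0.6^2}{15} + \frac{0.4^2}{12}}} = 5.69$$
If the test statistic had a standard Normal distribution under $H_0$ rather than a t-distribution, the P-value would be extremely small, and so we would resoundingly reject $H_0$. We use a t-distribution with 11 degrees of freedom (the smaller sample size minus one, 12 − 1), so the P-value will be somewhat larger than the Normal one. Even so, t = 5.69 is far beyond the one-tailed critical values for 11 degrees of freedom (about 1.80 at the 0.05 level and 2.72 at the 0.01 level), so the P-value is well under 0.01. We therefore reject $H_0$: there is strong evidence that, on average, men rate women as more attractive when the pictures use a red background rather than a white one.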
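The t-distribution facts used in (34), (35) and (36) can be checked numerically. Python's standard library provides a Normal distribution (`statistics.NormalDist`) but no t-distribution, so the sketch below computes the t CDF itself by integrating the t density with Simpson's rule and inverts it by bisection. This numerical approach and the helper names `t_pdf`, `t_cdf` and `t_quantile` are illustrative assumptions, not part of the course material; a library routine such as `scipy.stats.t` would do the same job.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), using symmetry and Simpson's rule on [0, |x|] (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(q, df, lo=0.0, hi=50.0):
    """Value t with P(T <= t) = q, found by bisection (assumes 0.5 <= q < 1)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


normal = NormalDist()  # standard Normal

# (34): 95% margins of error for SE = 0.05 (the t cases use df = n - 1)
margins = {
    "(i) Normal": 0.05 * normal.inv_cdf(0.975),
    "(ii) t, df = 5": 0.05 * t_quantile(0.975, 5),
    "(iii) t, df = 14": 0.05 * t_quantile(0.975, 14),
}

# (35): left-tailed P-values, P(statistic <= observed value)
p_values = {
    "(i)": normal.cdf(-1.5),
    "(ii)": t_cdf(-1.5, 5),
    "(iii)": t_cdf(-1.5, 10),
    "(iv)": normal.cdf(1.5),
    "(v)": t_cdf(1.5, 5),
    "(vi)": t_cdf(1.5, 10),
}

# (36): t statistic for a difference in means, df = smaller n minus 1
se_diff = math.sqrt(0.6**2 / 15 + 0.4**2 / 12)
t_obs = (7.2 - 6.1) / se_diff
p_value_36 = 1 - t_cdf(t_obs, min(15, 12) - 1)  # right-tailed test

for label, moe in sorted(margins.items(), key=lambda kv: kv[1]):
    print(f"(34) {label}: margin of error = {moe:.4f}")
print("(35) order:", " < ".join(sorted(p_values, key=p_values.get)))
print(f"(36) t = {t_obs:.2f}, one-tailed P-value = {p_value_36:.6f}")
```

For (35), the computed left-tail areas for the positive statistics come out largest for the Normal and smallest for the t-distribution with 5 degrees of freedom, because a fatter right tail leaves less area to the left of 1.50.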