
STAT 113: FINAL EXAM PRACTICE PROBLEMS

COLIN REIMER DAWSON, FALL 2015

Research Design / Describing Samples.

(1) The following measures can be used to describe distributions (either population or sample distributions). For each one, describe conceptually (without mathematical notation, and without simply describing how to calculate it) and as concisely as possible what information it captures.
    (a) The mean
    (b) The median
    (c) The range
    (d) The interquartile range (IQR)
    (e) The variance
    (f) The standard deviation

(2) Describe what it means for a measure to be robust/resistant (two terms for the same thing). For each of the measures above, indicate whether or not it is relatively robust/resistant. What considerations go into choosing whether or not to use a robust/resistant measure?

(3) (Modified/abridged from A.3) In a study investigating how students use their laptop computers in class, researchers recruited 45 students at one university in the Northeast who regularly take their laptops to class. On average, the students cycled through 65 active windows per lecture, with one student averaging 174 active windows per lecture. They found that, on average, 62% of the windows students open in class are completely unrelated to the class, and students had distracting windows open and active 42% of the time, on average. The study included a measure of how each student performed on a test of the relevant material. Not surprisingly, the study finds that the students who spent more time on distracting websites generally had lower test scores.

Date: December 14, 2015.


    (a) Identify the cases and sample size for this study.
    (b) Is this an experiment or an observational study?
    (c) From the description given, what variables are recorded for each case? Identify each as categorical or quantitative.
    (d) What graph is most appropriate to display the data about number of active windows open per lecture if we want to quickly determine whether the maximum value (174) is an outlier?
    (e) The last sentence of the paragraph describes an association. Identify a graph and a statistic that could be used to display and quantify this association, respectively.
    (f) From the information given, can we conclude that students who allocate their cognitive resources to distracting sites during class get lower grades because of it? Why or why not?

(4) (Modified from A.27) The number of consecutive frost-free days in a year is called the growing season. A farmer considering moving to a new region finds that the median growing season for the area for the last 50 years is 275 days while the mean growing season is 240 days.
    (a) Explain how it is possible for the mean to be so much lower than the median, and describe the distribution of the growing season lengths in this area for the last 50 years.
    (b) Sketch either a possible histogram or a possible density curve for the shape of this distribution. Label the mean and median on the horizontal axis.

Inference Foundations. Study Exam 2 and the practice problems for Exam 2.

Inference for Correlation and Regression.

(1) (Modified from D.46) Is depression a possible factor in students missing classes? A study analyzed relationships among various variables pertaining to a population of college students. Two of those variables are DepressionScore, scores on a standard depression scale with higher numbers indicating greater depression, and ClassesMissed, the number of classes missed during the semester. Computer output is shown below for a linear regression model used to predict the number of classes missed based on the depression score.


Coefficients:
                  Estimate  Std. Error  t-value   P-value
(Intercept)        1.77712     0.26714    6.652  1.79e-10
DepressionScore    0.08312     0.03368    2.468    0.0142

Residual standard error: 3.208 on 251 degrees of freedom
Multiple R-squared: 0.0237

    (a) Interpret the slope of the regression line in the context of depression and missed classes.
    (b) Based on the output above, what can we conclude about the relationship between these variables in the population?
    (c) Interpret R² in the context of depression and missed classes. (What does it tell us about the relationship?)

(2) (Modified from D.50 and D.51) We can use data from a sample of NBA basketball games to construct a regression model to predict points in a season for a player based on the number of free throws made. For our sample data, the number of free throws made in a season ranges from 16 to 594, while the number of points ranges from 104 to 2161. For the information in (a) and (b), interpret the confidence and prediction interval given in the context of free throws and points scored per season. Make a specific statement about what the value of 95% means in each case.
    (a) The predicted number of points made for a player who makes 100 free throws in a season is 710.8 points, with a 95% confidence interval of 675.7 to 745.8 points. The prediction interval at the same free throw number is 340.7 to 1080.8 points.
    (b) The predicted number of points made for a player who makes 400 free throws in a season is 1613.6 points, with a 95% confidence interval of 1559.3 to 1667.9 points. The prediction interval at the same free throw number is 1241.2 to 1986.0 points.
    (c) Use the information above to find the slope of the regression line.
    (d) How do you expect the width of the confidence interval for a player who makes 20 free throws in a season to compare to the intervals given in (a) and (b)? Why?

Goodness of Fit and Association Tests for Categorical Variables.

(1) An Ipsos/Reuters poll conducted between Dec. 5th and 9th of this year asked a random sample of 494 adult Americans identifying as members of the Republican party who their preferred presidential candidate was. Donald Trump was the choice of 183 respondents,


Ben Carson was chosen by 64, Marco Rubio by 59 and Ted Cruz by 54. A total of 104 respondents identified one of the other candidates, and 30 were undecided.
    (a) Set aside the undecided respondents and those who identified a candidate outside the top four. Can we conclude that the proportion of the population from which the respondents were selected who prefer Trump is higher than the combined proportion who prefer one of Carson, Rubio and Cruz? Use a chi-square test and show all details.
    (b) Setting aside the Trump voters as well, can we conclude that Carson, Rubio and Cruz are not equally preferred by the population from which the respondents were selected? Use a chi-square statistic and show all details.

(2) On November 15-18, 2012 Gallup conducted a survey of 1,015 randomly selected U.S. adults. They were asked whether they planned to go shopping on “Black Friday” (the day after Thanksgiving). The results, broken down by sex (as self-reported by the participants), are summarized in the following two-way table.

             Shopping Plans?
Sex        Yes     No    Total
M           82    433      515
F          100    400      500
Total      182    833     1015

    (a) Compute the expected cell count for the Male/Yes Shopping cell, to two decimal places.
    (b) The appropriate chi-square distribution for this test has 1 degree of freedom ((R − 1)(C − 1) = (2 − 1)(2 − 1) = 1). Explain why the test has 1 degree of freedom.
    (c) Here is some computer output for

        H0: Planning to shop on the Friday after Thanksgiving is unrelated to sex
        H1: Sex and planning to shop on the Friday after Thanksgiving are related

        Chi-Square = 2.866, DF = 1, P-Value = 0.090

        What is the test conclusion at the 5% significance level? Do you reject H0? Why or why not?
    (d) Describe a different approach that could have been used to test these same hypotheses, instead of the chi-square test. Without doing any calculations, what P-value would you expect to get if you did the test this other way?
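For checking work on the Black Friday problem, the arithmetic behind the two-way table can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original problem set: the variable names are made up, and the P-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal. The last few lines compute a pooled two-proportion z statistic, one possible alternative approach for part (d), so its square can be compared with the chi-square statistic.

```python
import math

# Observed counts from the two-way table (rows: M, F; columns: Yes, No).
observed = [[82, 433], [100, 400]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

# Pooled two-proportion z statistic for the same hypotheses (assumed alternative).
p_m = observed[0][0] / row_totals[0]
p_f = observed[1][0] / row_totals[1]
p_pool = col_totals[0] / n
se = math.sqrt(p_pool * (1 - p_pool) * (1 / row_totals[0] + 1 / row_totals[1]))
z = (p_m - p_f) / se

print(f"expected M/Yes = {expected[0][0]:.2f}")
print(f"chi-square = {chi2:.3f}, p-value = {p_value:.3f}")
print(f"z = {z:.3f}, z^2 = {z ** 2:.3f}")
```

The printed chi-square statistic and P-value should agree with the computer output quoted in part (c), and z squared should equal the chi-square statistic up to rounding.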


Comparing Multiple Means.

(1) (Modified from 8.15) A recent study examined the impact of a mother's voice on stress levels in young girls. The study included 68 girls ages 7 to 12 who reported good relationships with their mothers. Each girl gave a speech and then solved mental arithmetic problems in front of strangers. Cortisol levels in saliva were measured for all girls and were high, indicating that the girls felt a high level of stress from these activities. (Cortisol is a stress hormone and higher levels indicate greater stress.) After the stress-inducing activities, the girls were randomly divided into four equal-sized groups: one group talked to their mothers in person, one group talked to their mothers on the phone, one group sent and received text messages with their mothers, and one group had no contact with their mothers. Cortisol levels were measured before and after the interaction with mothers and the change in the cortisol level was recorded for each girl.
    (a) What are the two main variables in this study? Identify each as categorical or quantitative.
    (b) Is this an experiment or an observational study?
    (c) The researchers are testing to see if there is a difference in the change in cortisol level depending on the type of interaction with mom. What are the null and alternative hypotheses? Define any parameters used.
    (d) How many degrees of freedom are there for estimating between-groups variance? How many for estimating within-groups (residual) variance?
    (e) Explain how the amount of variability in cortisol levels within each group affects the F-statistic and the P-value of the test.

(2) (Modified from 8.17) Studies have shown that heating the scrotum by just 1 degree Celsius can reduce sperm count and sperm quality, with long-term consequences. Exercise 2.101 on page 87 introduces a study indicating that males sitting with a laptop on their laps have increased scrotal temperatures. Does a lap pad help reduce the temperature increase? Does sitting with legs apart help? The study investigated three conditions: legs together and a laptop computer on the lap, legs apart and a laptop computer on the lap, and legs together with a lap pad under the laptop computer. Scrotal temperature increase over a 60-minute session was measured in degrees Celsius, and the summary statistics are given below.


Condition        n    Mean   Std. Dev.
Legs together   29    2.31   0.96
Lap pad         29    2.18   0.69
Legs apart      29    1.41   0.66

    (a) Suppose the temperature increase values within each condition are approximately normally distributed. Do the data appear to satisfy the conditions for an F-test?
    (b) Use the fact that the total sum of squared deviations (SSTotal) of the temperature increase scores from the overall mean across all groups is 66.9 and the sum of squared deviations of temperature increase scores from their respective group means (SSWithin) is 53.2 to test whether there is a difference in mean temperature increase between the three conditions. Show all details of the test, including an ANOVA table. (Hint: Recall the relationship between SSWithin, SSBetween and SSTotal.)

Practical Integration.

(1) In their 1968 paper “Bystander intervention in emergencies: Diffusion of responsibility” (J. Personal and Social Psych., 8:377-383), Darley and Latané reported the amount of time it took subjects to summon help for a person in trouble. Each subject in the first group thought that he/she was the only one listening to the person. Group 2 subjects thought that there was one other person listening. Group 3 subjects thought that four other persons were listening. The original variable, time, was transformed into the variable speed, where speed = 100(1/time). This was done so that the assumptions for ANOVA (normal data with each population having the same SD) would be more nearly satisfied. The rest of this question concerns the variable speed. Group 1 had a sample average speed of .87. For group 2 the sample average was .72, while group 3 had a sample average of .51. The sample sizes were 13, 26, and 13.
    (a) Suppose you just wanted to compare Group 1 to Group 2. What kind of analysis would you conduct?
    (b) Suppose you wanted to compare all three groups at once. What kind of analysis would you conduct?

For each of the following scenarios, explain how you would analyze the data. You do not need to do the calculations, but identify the relevant parameter(s). In cases where you would do a hypothesis test, state the hypotheses and the test statistic, and explain the process of computing a P-value. In cases where you would want to construct a confidence interval, identify the process of finding a margin of error. (These need not be mutually exclusive.)


(2) In a random sample of 27 green “holiday M&Ms,” there are three patterns: a candle, two bells, or a Christmas tree. In the sample there were 8 M&Ms with trees, 9 with candles, and 10 with bells. Are these numbers consistent with the claim that the three patterns are equally likely?

(3) (Taken from Journal of Advertising, January 1984:40-44, via Devore and Peck, Statistics, p. 341.) A survey of 154 residents of Washington, DC, found that 58 felt that the use of subliminal advertising was acceptable. (Among other things, it might be of interest to test the claim that at least half of all DC residents find subliminal advertising acceptable.)

(4) A sample of 15 patients was randomly split into two groups as part of a double-blind experiment to compare two pain relievers. The 7 patients in the first group were given Demerol and reported the following numbers of hours of pain relief:

        2 6 4 13 5 8 4

    The 8 patients in the second group were given an unnamed experimental drug and reported the following numbers of hours of pain relief:

        8 1 4 2 2 1 3

(5) Medical researchers in Italy were interested in whether the use of condoms reduces the risk of HIV infection. They studied heterosexual couples in which one (and only one) partner was HIV positive at the onset of the study. Among 171 couples who always used condoms, 3 partners became infected with HIV during the study (which lasted 3 years). Among 55 couples who did not always use condoms, 8 partners became infected.

(6) (From Consumer Reports, June 1986, pp. 366-367; via Moore and McCabe, Introduction to the Practice of Statistics.) Calories (X) and milligrams of sodium (Y) were measured for a sample of 17 beef hot dogs. The correlation between X and Y is .887. The regression equation is Ŷ = −228.3 + 4.00X and the SE of the coefficient of X is .4922.

(7) Joseph Bresee et al. (“Hepatitis C virus infection associated with administration of intravenous immune globulin,” J. Amer. Med. Assoc., (1996) 276:1563-1567) studied persons who had received intravenous immune globulin (IGIV) to see if they had developed infections of hepatitis C virus (HCV). In part of their analysis, they considered doses of Gammagard (an IGIV product) received by 210 patients.


They divided the patients into 4 groups according to the number of doses of “Gammagard made from unscreened or first-generation anti-HCV-screened plasma.” Here are the data:

              HCV Infection?
Doses       Yes     No    Total
0-3           4     44       48
4-20          2     43       45
21-65         7     50       57
> 65         10     41       51
Total        23    178      201
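If a chi-square test of association is the analysis chosen for this table, one thing worth checking first is whether every expected cell count is large enough. The following Python sketch is an illustration only: the helper name `chi_square_association` is hypothetical, and the choice of test is an assumption rather than something the problem states. It computes the expected counts, the test statistic, and the degrees of freedom for a general table with R rows and C columns.

```python
# Observed counts (rows: dose groups 0-3, 4-20, 21-65, > 65; columns: Yes, No).
observed = [[4, 44], [2, 43], [7, 50], [10, 41]]


def chi_square_association(table):
    """Return (statistic, degrees of freedom, expected counts) for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: (row total * column total) / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


stat, df, expected = chi_square_association(observed)
smallest_expected = min(min(row) for row in expected)

print(f"chi-square = {stat:.2f} on {df} degrees of freedom")
print(f"smallest expected count = {smallest_expected:.2f}")
```

The smallest expected count is the number to compare against the usual rule of thumb (at least 5 in every cell) before relying on the chi-square approximation. The P-value would then come from the upper tail of the chi-square distribution with the printed degrees of freedom.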