Midterm II Review Sta 101 - Fall 2018

Sep 01, 2023


Midterm II Review

Sta 101 - Fall 2018

Duke University, Department of Statistical Science

Dr. Abrahamsen

Slides posted at https://stat.duke.edu/courses/Fall18/sta101.002

Announcements

▶ Today’s office hours changed to 2 - 3pm
▶ Office hours Wednesday 2 - 3pm
▶ No office hours on Thursday

1

Midterm 2

▶ When: Thursday, Nov 8 - In class
▶ What to bring:

– Scientific calculator (graphing calculator ok, no phones!)
– Cheat sheet (can be typed)

▶ Provided: Z, t and χ2 tables

2

Exam Format

▶ Covers HT from Unit 3, Unit 4 and Unit 5
▶ 3 written questions - 60 pts
▶ 5 T/F questions - 2 pts each
▶ 10 multiple choice questions - 3 pts each

3


What should you know?

4

Unit 4.1 - Inference for Numerical Variables

▶ Two mean testing problems

– Independent means
– Paired (dependent) means

▶ Conditions

– Independence
– Approximate Normality

5

All other details of the inferential framework are the same...

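HT: test statistic = (point estimate − null) / SE

CI: point estimate ± critical value × SE

One mean: $df = n - 1$

HT: $H_0: \mu = \mu_0$, $T_{df} = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$

CI: $\bar{x} \pm t^\star_{df} \dfrac{s}{\sqrt{n}}$

Paired means: $df = n_{diff} - 1$

HT: $H_0: \mu_{diff} = 0$, $T_{df} = \dfrac{\bar{x}_{diff} - 0}{s_{diff} / \sqrt{n_{diff}}}$

CI: $\bar{x}_{diff} \pm t^\star_{df} \dfrac{s_{diff}}{\sqrt{n_{diff}}}$

Independent means: $df = \min(n_1 - 1, n_2 - 1)$

HT: $H_0: \mu_1 - \mu_2 = 0$, $T_{df} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

CI: $\bar{x}_1 - \bar{x}_2 \pm t^\star_{df} \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

6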
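As a rough illustration of the independent-means formulas, the sketch below computes the standard error, T statistic, conservative df, and a confidence interval. The function name, the summary statistics, and the critical value `t_star` (a t-table lookup for df = 24 at 95% confidence) are all made up for illustration and are not from the course materials.

```python
from math import sqrt


def independent_means_t(xbar1, s1, n1, xbar2, s2, n2, null=0.0):
    """Return (T statistic, SE, df) for H0: mu1 - mu2 = null."""
    # SE of the difference between two independent sample means
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    # test statistic = (point estimate - null) / SE
    t_stat = (xbar1 - xbar2 - null) / se
    # Conservative degrees of freedom used in this course
    df = min(n1 - 1, n2 - 1)
    return t_stat, se, df


# Hypothetical summary statistics (not from any data set in the slides)
t_stat, se, df = independent_means_t(xbar1=10, s1=2, n1=25, xbar2=9, s2=3, n2=36)

# Assumption: t* = 2.064 read from a t table for df = 24, 95% confidence
t_star = 2.064
point_estimate = 10 - 9
ci = (point_estimate - t_star * se, point_estimate + t_star * se)

print(f"T = {t_stat:.3f}, SE = {se:.3f}, df = {df}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The one-mean and paired-means cases follow the same pattern, with the standard error swapped for $s/\sqrt{n}$ or $s_{diff}/\sqrt{n_{diff}}$.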

Clicker question

A study examining the relationship between weights of school children and absences found a 95% confidence interval for the difference between the average number of days missed by overweight and non-overweight children ($\mu_{overweight} - \mu_{non\text{-}overweight}$) to be 1.3 days to 2.8 days. According to this interval, we are 95% confident that overweight children on average miss

1. 1.3 days fewer to 2.8 days more
2. 1.3 to 2.8 days more
3. 1.3 to 2.8 days fewer
4. 1.3 days more to 2.8 days fewer

than non-overweight children.

7


Unit 4.2 - Bootstrapping

▶ Bootstrapping works as follows:

(1) take a bootstrap sample - a random sample taken with replacement from the original sample, of the same size as the original sample

(2) calculate the bootstrap statistic - a statistic such as mean, median, proportion, etc. computed on the bootstrap sample

(3) repeat steps (1) and (2) many times to create a bootstrap distribution - a distribution of bootstrap statistics

▶ The XX% bootstrap confidence interval can be estimated by

– the cutoff values for the middle XX% of the bootstrap distribution, OR
– point estimate ± $t^\star SE_{boot}$
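The three bootstrap steps and the percentile method can be sketched in a few lines of Python. This is only a minimal illustration: the function names, the seed, the number of resamples, and the `scores` data are made up for the example and are not the course data.

```python
import random
import statistics


def bootstrap_distribution(sample, stat=statistics.median, n_boot=1000, seed=0):
    """Return a list of bootstrap statistics computed from `sample`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    boot_stats = []
    for _ in range(n_boot):
        # Step (1): resample with replacement, same size as the original
        resample = rng.choices(sample, k=n)
        # Step (2): compute the statistic on the bootstrap sample
        boot_stats.append(stat(resample))
    # Step (3): the collected statistics form the bootstrap distribution
    return boot_stats


def percentile_interval(boot_stats, level=0.90):
    """Cutoff values for the middle `level` share of the distribution."""
    ordered = sorted(boot_stats)
    tail = (1 - level) / 2
    lo_idx = int(tail * len(ordered))
    hi_idx = int((1 - tail) * len(ordered)) - 1
    return ordered[lo_idx], ordered[hi_idx]


# Hypothetical scores, invented for illustration only
scores = [12, 25, 31, 35, 38, 40, 41, 42, 43, 43,
          44, 46, 48, 50, 52, 55, 58, 61, 67, 80]

boot = bootstrap_distribution(scores)
low, high = percentile_interval(boot, level=0.90)
se_boot = statistics.stdev(boot)  # bootstrap standard error

print(f"90% percentile interval: ({low}, {high}), SE_boot = {se_boot:.2f}")
```

The standard error method would instead use `se_boot` in point estimate ± $t^\star SE_{boot}$, with $t^\star$ taken from the t table.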

8

Bootstrap interval, standard error

For a random sample of 20 Horror movies, the dot plot below shows the distribution of 100 bootstrap medians of the Rotten Tomatoes audience scores. The median of the original sample is 43.5 and the bootstrap standard error is 4.88. Estimate the 90% bootstrap confidence interval for the median RT score of horror movies using the standard error method.

[Figure: dot plot of the 100 bootstrap medians; x-axis "bootstrap medians", ticks from 35 to 55]
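One way to set up the standard error method for this question is shown below. It assumes df = n − 1 = 19 and a critical value of about 1.73 read from the t table for 90% confidence; neither value is stated on the slide.

```latex
\begin{aligned}
\text{point estimate} \pm t^\star_{19} \times SE_{boot}
  &= 43.5 \pm 1.73 \times 4.88 \\
  &\approx 43.5 \pm 8.44 \\
  &= (35.06,\ 51.94)
\end{aligned}
```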

9

Unit 4.3: Power

                      Decision
                      fail to reject H0    reject H0
Truth    H0 true      1 − α                Type 1 Error, α
         HA true      Type 2 Error, β      Power, 1 − β

▶ Type 1 error is rejecting H0 when you shouldn’t have, and the probability of doing so is α (significance level)

▶ Type 2 error is failing to reject H0 when you should have, and the probability of doing so is β (a little more complicated to calculate)

▶ Power of a test is the probability of correctly rejecting H0, and the probability of doing so is 1 − β

▶ In hypothesis testing, we want to keep α and β low, but there are inherent trade-offs.

10

Example - Medical history surveys

A medical research group is recruiting people to complete short surveys about their medical history. For example, one survey asks for information on a person’s family history in regards to cancer. Another survey asks about what topics were discussed during the person’s last visit to a hospital. So far, people complete an average of 4 surveys, with a standard deviation of 2.2 surveys. The research group wants to try a new interface that they think will encourage new enrollees to complete more surveys, where they will randomize a total of 300 enrollees to either get the new interface or the current interface (equally distributed between the two groups). What is the power of the test that can detect an increase of 0.5 surveys per enrollee for the new interface compared to the old interface?

Assume that the new interface does not affect the standard deviation of completed surveys, and α = 0.05.

11


Calculating power

The preceding question can be rephrased as: How likely is it that we can reject a null hypothesis of $H_0: \mu_{new} - \mu_{current} = 0$ if the new interface results in an increase of 0.5 surveys per enrollee, on average?

Let’s break this down into two simpler problems:

1. Problem 1: Which values of $(\bar{x}_{new} - \bar{x}_{current})$ represent sufficient evidence to reject this $H_0$?

2. Problem 2: What is the probability that we would reject this $H_0$ if $\bar{x}_{new} - \bar{x}_{current}$ had come from a distribution with $\mu_{new} - \mu_{current} = 0.5$, i.e. what is the probability that we can obtain such an observed difference from this distribution?

12

Problem 1

Which values of $(\bar{x}_{new} - \bar{x}_{current})$ represent sufficient evidence to reject $H_0$?

$H_0: \mu_{new} - \mu_{current} = 0$
$H_A: \mu_{new} - \mu_{current} > 0$
$n_{new} = n_{current} = 150$

13

Problem 1 - cont.

Clicker question

What is the lowest t-score that will allow us to reject the null hypothesis in favor of the alternative?

$H_0: \mu_{new} - \mu_{current} = 0$
$H_A: \mu_{new} - \mu_{current} > 0$
$n_{new} = n_{current} = 150$, $\alpha = 0.05$

(a) 1.65 (b) 1.66 (c) 1.96 (d) 1.98 (e) 2.63

[Figure: t distribution with an upper-tail area of 0.05 shaded; cutoff labeled t* = ?]

14

Problem 1 - cont.

Clicker question

Which values of $(\bar{x}_{new} - \bar{x}_{current})$ represent sufficient evidence to reject $H_0$?

$H_0: \mu_{new} - \mu_{current} = 0$
$H_A: \mu_{new} - \mu_{current} > 0$
$n_{new} = n_{current} = 150$, $\alpha = 0.05$, $s_{new} = s_{current} = 2.2$

(a) $\bar{x}_{new} - \bar{x}_{current} < -0.42$

(b) $\bar{x}_{new} - \bar{x}_{current} > -0.42$

(c) $\bar{x}_{new} - \bar{x}_{current} < 0.42$

(d) $\bar{x}_{new} - \bar{x}_{current} > 0.42$

(e) $\bar{x}_{new} - \bar{x}_{current} > 1.66$

15


Problem 2

Clicker question

What is the probability that we would reject this $H_0$ if $\bar{x}_{new} - \bar{x}_{current}$ had come from a distribution with $\mu_{new} - \mu_{current} = 0.5$, i.e. what is the probability that we can obtain such an observed difference from this distribution?

$H_0: \mu_{new} - \mu_{current} = 0$
$H_A: \mu_{new} - \mu_{current} > 0$
$n_{new} = n_{current} = 150$, $\alpha = 0.05$, $s_{new} = s_{current} = 2.2$

(a) 5% (b) 38% (c) 62% (d) 80% (e) 95%

16

Problem 2 - cont.

Clicker question

What is β, the Type 2 error rate?

(a) 5% (b) 38% (c) 62% (d) 80% (e) 95%

17
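The two-step power calculation for the survey example can be checked numerically. The sketch below is one possible way to do it, not necessarily how it was worked in class: it assumes a critical value of t* ≈ 1.66 from a t table and uses a normal approximation for the sampling distribution in Problem 2. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values stated in the example
s = 2.2        # standard deviation, assumed equal in both groups
n = 150        # enrollees per group (300 split equally)
effect = 0.5   # increase we want to be able to detect

# Assumption: t* ~ 1.66 is a t-table lookup for a one-sided test at
# alpha = 0.05 with df = min(n - 1, n - 1) = 149 (nearest table row)
t_star = 1.66

# Standard error of the difference between two independent means
se = sqrt(s**2 / n + s**2 / n)

# Problem 1: smallest observed difference that leads to rejecting H0
cutoff = t_star * se

# Problem 2: probability of exceeding the cutoff when the true difference
# equals `effect`; a normal approximation is used since n is large
power = 1 - NormalDist(mu=effect, sigma=se).cdf(cutoff)
beta = 1 - power  # Type 2 error rate

print(f"SE = {se:.3f}, cutoff = {cutoff:.2f}")
print(f"power = {power:.2f}, beta = {beta:.2f}")
```

Under these assumptions the cutoff comes out near 0.42, the power near 62%, and β near 38%.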

Unit 4.4: Analysis of Variance (ANOVA)

▶ ANOVA tests for some difference in means of many different groups

▶ Conditions

1. Independence:

(a) within group: sampled observations must be independent
(b) between group: groups must be independent of each other

2. Approximate normality: distribution should be nearly normal within each group

3. Equal variance: groups should have roughly equal variability

18

ANOVA tests for some difference in means of many different groups

Null hypothesis: H0 : µplacebo = µpurple = µbrown = . . . = µpeach = µorange.

Clicker question

Which of the following is a correct statement of the alternative hypothesis?

(a) For any two groups, including the placebo group, no two group means are the same.

(b) For any two groups, not including the placebo group, no two group means are the same.

(c) Amongst the jelly bean groups, there are at least two groups that have different group means from each other.

(d) Amongst all groups, there are at least two groups that have different group means from each other.

19


F-statistic:

$$F = \frac{SSG / (k - 1)}{SSE / (n - k)} = \frac{MSG}{MSE}$$

k: # of groups; n: # of obs.

                  Df       Sum Sq     Mean Sq    F value    Pr(>F)
Between groups    k − 1    SSG        MSG        Fobs       pobs
Within groups     n − k    SSE        MSE
Total             n − 1    SSG+SSE

Note: The F distribution is defined by two dfs: dfG = k − 1 and dfE = n − k.

The p-value will be given on the exam; compare it with the standard α level.

20

To identify which means are different, use t-tests and the Bonferroni correction

▶ If the ANOVA yields a significant result, the next natural question is: “Which means are different?”

▶ Use t-tests comparing each pair of means to each other,

– with a common variance (MSE from the ANOVA table) instead of each group’s variances in the calculation of the standard error,
– and with a common degrees of freedom (dfE from the ANOVA table)

▶ Compare resulting p-values to a modified significance level $\alpha^\star = \alpha / K$, where $K = \frac{k(k-1)}{2}$ is the total number of pairwise tests

21

To identify which means are different, use t-tests and the Bonferroni correction

You will not be asked to perform the actual tests, but you should know:

▶ How to compute the adjusted Bonferroni significance level α⋆.
▶ How to compute the standard error for this test.
▶ The associated degrees of freedom for the test statistic.

22

Unit 4.4: ANOVA

Application Exercise 4.4

             Df     Sum Sq    Mean Sq    F       p-value
Rank         2      1.59      0.795      2.74    0.066
Residuals    460    135.07    0.29
Total        462    136.66

What percent of the total variability in evaluation scores is explained by instructor rank?
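The quantities in this ANOVA table can be rebuilt from the sums of squares and degrees of freedom. The short sketch below does that and computes the share of variability explained as SSG / SST; the variable names are illustrative choices.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table
ssg, df_g = 1.59, 2        # between groups (Rank)
sse, df_e = 135.07, 460    # within groups (Residuals)

sst = ssg + sse            # total sum of squares
msg = ssg / df_g           # mean square between groups
mse = sse / df_e           # mean square error
f_stat = msg / mse         # F statistic

# Share of total variability explained by the grouping variable
pct_explained = 100 * ssg / sst

print(f"SST = {sst:.2f}, MSG = {msg:.3f}, MSE = {mse:.3f}")
print(f"F = {f_stat:.2f}, explained = {pct_explained:.2f}%")
```

Using the unrounded MSE gives an F statistic of about 2.71; the tabulated 2.74 matches 0.795 / 0.29, so it appears to use the rounded MSE. Rank explains roughly 1.16% of the total variability.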

23


Unit 4.4: ANOVA

Application Exercise 4.4

             Df     Sum Sq    Mean Sq    F       p-value
Rank         2      1.59      0.795      2.74    0.066
Residuals    460    135.07    0.29
Total        462    136.66

What significance level should be used for a pair-wise post hoc test comparing the evaluation scores of teaching professors and tenured professors?
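A possible worked setup for the Bonferroni adjustment is shown below. It reads k = 3 groups from the Rank row (df = k − 1 = 2) and assumes the usual α = 0.05, which the slide does not state.

```latex
\begin{aligned}
K &= \frac{k(k-1)}{2} = \frac{3 \times 2}{2} = 3 \\
\alpha^\star &= \frac{\alpha}{K} = \frac{0.05}{3} \approx 0.0167
\end{aligned}
```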

24

Unit 5.1: Inference for a Single Proportion

Distribution of $\hat{p}$

Central limit theorem for proportions: Sample proportions will be nearly normally distributed with mean equal to the population proportion, p, and standard error equal to $\sqrt{\frac{p(1-p)}{n}}$.

$$\hat{p} \sim N\left(\text{mean} = p,\ SE = \sqrt{\frac{p(1-p)}{n}}\right)$$

Conditions:

▶ Independence: Random sample/assignment + 10% rule
▶ At least 10 successes and failures

25

Unit 5.1: Inference for a Single Proportion

HT vs. CI for a proportion

▶ Success-failure condition:

– CI: At least 10 observed successes and failures
– HT: At least 10 expected successes and failures, calculated using the null value

▶ Standard error:

– CI: calculate using observed sample proportion: $SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
– HT: calculate using the null value: $SE = \sqrt{\frac{p_0(1-p_0)}{n}}$

26

Recap on simulation methods

If the S-F condition is not met

▶ HT: Randomization test – simulate under the assumption that H0 is true, then find the p-value as the proportion of simulations where the simulated $\hat{p}$ is at least as extreme as the one observed.

▶ CI: Bootstrap interval – resample with replacement from the original sample, and construct the interval using the percentile or standard error method.

27


Randomization Test

Clicker question

A report on your local TV station says that 60% of the city’s residents support using limited city funds to hire and train more police officers. A second local news station has picked up this story, and they claim that certainly less than 60% of residents support the additional hiring and training. In order to test this claim the second news station takes a random sample of 100 residents and finds that 57 of them (57%) support the use of limited funds to hire additional police officers.

28

Clicker question

Which of the following is the correct set-up for calculating the p-value for this test?

(a) Roll a 10-sided die (outcomes 1-10) 100 times and record the proportion of times you get a 6 or lower. Repeat this many times, and calculate the proportion of simulations where the sample proportion is 57% or less.

(b) Roll a 10-sided die (outcomes 1-10) 100 times and record the proportion of times you get a 6 or lower. Repeat this many times, and calculate the proportion of simulations where the sample proportion is 60% or less.

(c) In a bag place 100 chips, 57 red and 43 blue. Randomly sample 100 chips, with replacement, and record the proportion of red chips in the sample. Repeat this many times, and calculate the proportion of samples where 57% or more of the chips are red.

(d) Randomly sample 100 residents of a nearby city, and record how many of them support the hiring and training of additional police officers. Repeat this many times and calculate the proportion of samples where at least 57% of the residents support additional hiring and training.
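A randomization test of this kind simulates sample proportions under H0 (p = 0.60) and counts how often the result is at least as extreme as the observed 57%. The sketch below mirrors the die-rolling set-up; the seed and the number of simulations are arbitrary choices made for illustration.

```python
import random

rng = random.Random(101)   # fixed seed so the sketch is reproducible

n = 100             # sample size
obs_count = 57      # observed number of supporters (57%)
n_sims = 10_000     # number of simulated samples (arbitrary choice)


def simulate_count():
    """Number of 'supporters' in one simulated sample under H0: p = 0.60."""
    # On a 10-sided die, a roll of 6 or lower has probability 0.6
    return sum(1 for _ in range(n) if rng.randint(1, 10) <= 6)


sim_counts = [simulate_count() for _ in range(n_sims)]

# HA: p < 0.60, so "at least as extreme" means 57 supporters or fewer
p_value = sum(1 for c in sim_counts if c <= obs_count) / n_sims

print(f"simulated p-value = {p_value:.3f}")
```

The simulated p-value varies slightly with the seed. Because the simulation is run under the null value (60%) and the tail is measured at the observed value (57%), it corresponds to the set-up in option (a).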

29

Unit 5.2: Inference for Two Proportions

CLT also describes the distribution of $\hat{p}_1 - \hat{p}_2$

$$(\hat{p}_1 - \hat{p}_2) \sim N\left(\text{mean} = (p_1 - p_2),\ SE = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}\right)$$

Conditions:

▶ Independence: Random sample/assignment + 10% rule
▶ Sample size / skew: At least 10 successes and failures

30

Unit 5.2: Inference for Two Proportions

For HT where $H_0: p_1 = p_2$, pool!

As with working with a single proportion,

▶ When doing a HT where $H_0: p_1 = p_2$ (almost always for HT), use expected counts / proportions for S-F condition and calculation of the standard error.

▶ Otherwise use observed counts / proportions for S-F condition and calculation of the standard error.

Expected proportion of success for both groups when $H_0: p_1 = p_2$ is defined as the pooled proportion:

$$\hat{p}_{pool} = \frac{\text{total successes}}{\text{total sample size}} = \frac{suc_1 + suc_2}{n_1 + n_2}$$
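To make the pooling step concrete, the sketch below computes the pooled proportion, checks the success-failure condition with expected counts, and forms the Z statistic for $H_0: p_1 = p_2$. The function name and the counts are hypothetical, chosen only for illustration.

```python
from math import sqrt


def pooled_two_prop_z(suc1, n1, suc2, n2):
    """Return (p_pool, SE, Z, S-F condition met?) for H0: p1 = p2."""
    p1_hat, p2_hat = suc1 / n1, suc2 / n2
    # Pooled proportion: total successes / total sample size
    p_pool = (suc1 + suc2) / (n1 + n2)
    # S-F condition for HT uses expected counts based on p_pool
    expected = [n1 * p_pool, n1 * (1 - p_pool), n2 * p_pool, n2 * (1 - p_pool)]
    sf_ok = all(count >= 10 for count in expected)
    # Standard error for HT also uses p_pool in both terms
    se = sqrt(p_pool * (1 - p_pool) / n1 + p_pool * (1 - p_pool) / n2)
    z = (p1_hat - p2_hat) / se
    return p_pool, se, z, sf_ok


# Hypothetical counts: 45 of 100 successes vs. 30 of 100 successes
p_pool, se, z, sf_ok = pooled_two_prop_z(suc1=45, n1=100, suc2=30, n2=100)

print(f"p_pool = {p_pool:.3f}, SE = {se:.4f}, Z = {z:.2f}, S-F ok: {sf_ok}")
```

For a confidence interval the same structure applies, but with each group's observed proportion in place of `p_pool`.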

31


Summary

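Type | Parameter | Estimator | SE | Sampling Dist.
One mean | $\mu$ | $\bar{x}$ | $s/\sqrt{n}$ | $t_{n-1}$
Two means, paired data | $\mu_{diff}$ | $\bar{x}_{diff}$ | $s_{diff}/\sqrt{n_{diff}}$ | $t_{n_{diff}-1}$
Two means, independent | $\mu_1 - \mu_2$ | $\bar{x}_1 - \bar{x}_2$ | $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ | $t_{df}$; for df use $\min\{n_1 - 1, n_2 - 1\}$
One prop | $p$ | $\hat{p}$ | C.I.: $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$; H.T.: $\sqrt{\frac{p_0(1-p_0)}{n}}$ | $Z$
Two prop | $p_1 - p_2$ | $\hat{p}_1 - \hat{p}_2$ | C.I.: $\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$; H.T.: $\sqrt{\frac{\hat{p}_{pool}(1-\hat{p}_{pool})}{n_1} + \frac{\hat{p}_{pool}(1-\hat{p}_{pool})}{n_2}}$ | $Z$

HT: test statistic = (point estimate − null) / SE

CI: point estimate ± critical value × SE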

32

Unit 5.3: χ2 Tests

Categorical data with more than 2 levels → χ2

▶ one variable: χ2 test of goodness of fit, no CI
▶ two variables: χ2 test of independence, no CI

Conditions for χ2 testing

1. Independence: In addition to what we previously discussed for independence, each case that contributes a count to the table must be independent of all the other cases in the table.

2. Sample size / distribution: Each cell must have at least 5 expected cases.

33

The χ2 statistic

χ2 statistic: When dealing with counts and investigating how far the observed counts are from the expected counts, we use a new test statistic called the chi-square (χ2) statistic:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where k = total number of cells

Important points:

▶ Use counts (not proportions) in the calculation of the test statistic, even though we’re truly interested in the proportions for inference

▶ Expected counts are calculated assuming the null hypothesis is true

34

The χ2 distribution

The χ2 distribution has just one parameter, degrees of freedom (df), which influences the shape, center, and spread of the distribution.

▶ For χ2 GOF test: df = k − 1
▶ For χ2 independence test: df = (R − 1) × (C − 1)

[Figure: χ2 density curves for degrees of freedom 2, 4, and 9; x-axis ticks from 5 to 25]

35


Example: χ2 Tests for Independence

Example: Does money make people happy? (Data from GSS)

Family Income    Not too Happy    Pretty Happy    Very Happy    Total
Above average    26               233             164           423
Average          117              473             293           883
Below average    172              383             132           687
Total            315              1089            589           1993

We want to test if there is an association between money and happiness.

Assumptions:

▶ SRS (OK since GSS is considered a SRS)
▶ The expected cell count ≥ 5 for all cells.

36

Example: χ2 Tests for Independence

Hypothesis Testing:

H0 : Happiness is independent of family income
HA : Happiness is associated with family income

Test Statistic: $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$

p-value: Computed from χ2 table

df = (# rows − 1)(# columns − 1)

Income & Happiness Example: df = (3 − 1) × (3 − 1) = 4.

37

Example: χ2 Tests for Independence

Computing the Test Statistic:

▶ Observed counts - given
▶ Expected counts for each cell:

Expected = (Row total × Column total) / Total

Expected Counts:

Income       Not too Happy     Pretty Happy       Very Happy        Total
Above Avg    423×315/1993      423×1089/1993      423×589/1993      423
Average      883×315/1993      883×1089/1993      883×589/1993      883
Below Avg    687×315/1993      687×1089/1993      687×589/1993      687
Total        315               1089               589               1993

38

Example: χ2 Tests for Independence

Computing the Test Statistic:

▶ Observed counts - given
▶ Expected counts for each cell:

Expected = (Row total × Column total) / Total

Expected Counts:

Income       Not too Happy    Pretty Happy    Very Happy    Total
Above Avg    66.86            231.13          125.01        423
Average      139.56           482.48          260.96        883
Below Avg    108.58           375.39          203.03        687
Total        315              1089            589           1993

39


Example: χ2 Tests for Independence

Contribution to Test Statistic for Each Cell: $\frac{(\text{observed} - \text{expected})^2}{\text{expected}}$

Income       Not too Happy                        Pretty Happy                         Very Happy
Above Avg    (26 − 66.86)² / 66.86 = 24.97        (233 − 231.13)² / 231.13 = 0.02      (164 − 125.01)² / 125.01 = 12.16
Average      (117 − 139.56)² / 139.56 = 3.65      (473 − 482.48)² / 482.48 = 0.186     (293 − 260.96)² / 260.96 = 3.93
Below Avg    (172 − 108.58)² / 108.58 = 37.04     (383 − 375.39)² / 375.39 = 0.15      (132 − 203.03)² / 203.03 = 24.85

Test Statistic: Add up all values in the table

$$\chi^2_{calc} = 24.97 + 0.02 + 12.16 + 3.65 + 0.186 + 3.93 + 37.04 + 0.15 + 24.85 = 106.96$$
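The expected counts and the test statistic for the income and happiness table can be reproduced with a short script. This is only a sketch of the arithmetic shown on the slides; the variable names are illustrative.

```python
# Observed counts from the GSS income and happiness table
observed = [
    [26, 233, 164],    # above average income
    [117, 473, 293],   # average income
    [172, 383, 132],   # below average income
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count = row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum of (observed - expected)^2 / expected over all cells
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# df = (# rows - 1) * (# columns - 1)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print(f"chi-square = {chi_sq:.2f}, df = {df}")
```

Because the script keeps the expected counts unrounded, the result can differ from the hand calculation in the last decimal place.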

40

Example: χ2 Tests for Independence

p-value: From the χ2 table with df = (3 − 1) × (3 − 1) = 4, p-value ≈ 0

Conclusion: Reject H0 at all conventional α-levels and conclude that there is an association between Happiness and Income.

41

Example: χ2 Tests for Independence

To see what type of relationship there is between Happiness and Income, compute the residuals.

residuals = observed − expected

Residuals

Income       Not too Happy     Pretty Happy      Very Happy
Above Avg    26 − 66.86        233 − 231.13      164 − 125.01
Average      117 − 139.56      473 − 482.48      293 − 260.96
Below Avg    172 − 108.58      383 − 375.39      132 − 203.03

Above Average Income: We observe fewer Not too Happy people than expected and more Very Happy people than expected.

Below Average Income: We observe more Not too Happy people than expected and fewer Very Happy people than expected.

42

Example: χ2 Tests for Independence

Conclusion from Residuals: We see that less income is associated with lower levels of happiness, higher income with greater happiness. HOWEVER, we can NOT say money makes you happy (no causal effect).

43