SLIDE 1

Unit 3: Foundations for inference Lecture 3: Decision errors, significance levels, sample size, power, and bootstrapping

Statistics 101

Thomas Leininger

June 3, 2013

SLIDE 2

Decision errors

1. Decision errors
   Type 1 and Type 2 errors
   Error rates & power
   Power

2. Bootstrapping

3. Randomization testing

Statistics 101 U3 - L4: Decision errors, significance levels, sample size, and power Thomas Leininger

SLIDE 4

Decision errors Type 1 and Type 2 errors

Decision errors

Hypothesis tests are not flawless. In the court system innocent people are sometimes wrongly convicted and the guilty sometimes walk free. Similarly, we can make a wrong decision in statistical hypothesis tests as well. The difference is that we have the tools necessary to quantify how often we make errors in statistics.

Statistics 101 (Thomas Leininger) U3 - L4: Decision errors, significance levels, sample size, and power June 3, 2013 2 / 23

SLIDE 11

Decision errors (cont.)

There are two competing hypotheses: the null and the alternative. In a hypothesis test, we make a decision about which might be true, but our choice might be incorrect.

                     Decision
                     fail to reject H0    reject H0
Truth    H0 true     correct decision     Type 1 Error
         HA true     Type 2 Error         correct decision

A Type 1 Error is rejecting the null hypothesis when H0 is true.

A Type 2 Error is failing to reject the null hypothesis when HA is true.

We (almost) never know if H0 or HA is true, but we need to consider all possibilities.

SLIDE 16

Hypothesis test as a trial

If we again think of a hypothesis test as a criminal trial, then it makes sense to frame the verdict in terms of the null and alternative hypotheses:

H0: Defendant is innocent
HA: Defendant is guilty

Which type of error is being committed in the following circumstances?

Declaring the defendant innocent when they are actually guilty: Type 2 error
Declaring the defendant guilty when they are actually innocent: Type 1 error

Which error do you think is the worse error to make?

“better that ten guilty persons escape than that one innocent suffer” – William Blackstone

SLIDE 17

Decision errors Error rates & power

SLIDE 21

Type 1 error rate

As a general rule we reject H0 when the p-value is less than 0.05, i.e. we use a significance level of 0.05, α = 0.05.

This means that, for those cases where H0 is actually true, we do not want to incorrectly reject it more than 5% of those times.

In other words, when using a 5% significance level there is about a 5% chance of making a Type 1 error if the null hypothesis is true.

P(Type 1 error | H0 true) = α

This is why we prefer small values of α: increasing α increases the Type 1 error rate.
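A quick way to see that the Type 1 error rate is about α is to simulate many studies in which H0 really is true and count how often a one-sided z-test rejects. The sketch below is only an illustration: the null mean, standard deviation, sample size, number of trials, and the function name are all assumed values, not part of the lecture.

```python
import random
from statistics import NormalDist

def type1_error_rate(alpha=0.05, mu0=0.0, sigma=1.0, n=30, trials=5000, seed=1):
    """Fraction of simulated studies that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # one-sided critical value
    se = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        # Data are generated under H0 (true mean equals mu0).
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / se
        if z > z_crit:
            rejections += 1
    return rejections / trials

rate = type1_error_rate()
print(f"simulated Type 1 error rate: {rate:.3f}")
```

The simulated rate should land near 0.05; rerunning with a larger alpha should raise it accordingly.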

SLIDE 26

Filling in the table...

                     Decision
                     fail to reject H0    reject H0
Truth    H0 true     1 − α                Type 1 Error, α
         HA true     Type 2 Error, β      Power, 1 − β

Type 1 error is rejecting H0 when you shouldn’t have, and the probability of doing so is α (significance level).

Type 2 error is failing to reject H0 when you should have, and the probability of doing so is β (a little more complicated to calculate).

Power of a test is the probability of correctly rejecting H0, and the probability of doing so is 1 − β.

In hypothesis testing, we want to keep α and β low, but there are inherent trade-offs.

SLIDE 27

A quick example

In a cancer screening, what happens if we conclude a patient has cancer and they do in fact have cancer? What if they didn’t have cancer (but we concluded that they did)? What if the patient has cancer but we conclude that they do not have cancer?

SLIDE 28

Type 2 error rate

If the alternative hypothesis is actually true, what is the chance that we make a Type 2 Error, i.e. we fail to reject the null hypothesis even when we should reject it?

The answer is not obvious.
If the true population average is very close to the null hypothesis value, it will be difficult to detect a difference (and reject H0).
If the true population average is very different from the null hypothesis value, it will be easier to detect a difference.
Clearly, β depends on the effect size (δ).
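To make the dependence of β on δ concrete, here is a minimal sketch for a one-sided z-test (H0: µ = µ0 vs HA: µ > µ0). The standard error of 2.5 and the list of effect sizes are assumed purely for illustration, and the function name is hypothetical.

```python
from statistics import NormalDist

def type2_error_rate(delta, se, alpha=0.05):
    """beta for a one-sided z-test when the true mean is mu0 + delta."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # We fail to reject when Z <= z_crit; under the alternative,
    # Z is centred at delta / se instead of 0.
    return NormalDist().cdf(z_crit - delta / se)

for delta in (0.5, 2, 5, 10):
    print(f"delta = {delta:>4}: beta = {type2_error_rate(delta, se=2.5):.3f}")
```

Small effects leave β close to 1 − α, while large effects drive β toward 0.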

SLIDE 29

Decision errors Power

SLIDE 32

Example - Blood Pressure

Blood pressure oscillates with the beating of the heart, and the systolic pressure is defined as the peak pressure when a person is at rest. The average systolic blood pressure for people in the U.S. is about 130 mmHg with a standard deviation of about 25 mmHg.

We are interested in finding out if the average blood pressure of employees at a certain company is greater than the national average, so we collect a random sample of 100 employees and measure their systolic blood pressure. What are the hypotheses?

H0: µ = 130
HA: µ > 130

We’ll start with a very specific question: “What is the power of this hypothesis test to correctly detect an increase of 2 mmHg in average blood pressure?”

SLIDE 35

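Problem 1

Which values of $\bar{x}$ represent sufficient evidence to reject H0?

(Remember H0: µ = 130, HA: µ > 130)

With n = 100 and a standard deviation of 25, the standard error is 25/√100 = 2.5.

$$P(Z > z) < 0.05 \Rightarrow z > 1.65$$

$$\frac{\bar{x} - \mu}{s / \sqrt{n}} > 1.65$$

$$\bar{x} > 130 + 1.65 \times 2.5$$

$$\bar{x} > 134.125$$

[Figure: null distribution centered at 130, with the upper tail of area 0.05 shaded above 134.125]

Any $\bar{x}$ > 134.125 would be sufficient to reject H0 at the 5% significance level.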
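The cutoff can be reproduced in a few lines. This sketch uses the rounded critical value 1.65 from the slide and, for comparison, the more precise value returned by Python's `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

mu0, sigma, n, alpha = 130, 25, 100, 0.05
se = sigma / n ** 0.5                      # 25 / 10 = 2.5

# Cutoff with the rounded table value used on the slide.
cutoff_slide = mu0 + 1.65 * se

# Cutoff with the critical value computed from the normal quantile function.
z_exact = NormalDist().inv_cdf(1 - alpha)  # about 1.645
cutoff_exact = mu0 + z_exact * se

print(f"cutoff with z = 1.65:  {cutoff_slide:.3f}")
print(f"cutoff with exact z:   {cutoff_exact:.3f}")
```

The two cutoffs differ only in the second decimal place, so the rounded value is adequate for hand calculation.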

SLIDE 39

Problem 2

What is the probability that we would reject H0 if $\bar{x}$ came from N(mean = 132, SE = 2.5)?

This is the same as finding the area above $\bar{x}$ = 134.125 if $\bar{x}$ came from N(132, 2.5).

$$Z = \frac{134.125 - 132}{2.5} = 0.85$$

$$P(Z > 0.85) = 1 - 0.8023 = 0.1977$$

[Figure: normal curve centered at 132, with area 0.8023 below 134.125 and area 0.1977 above it]

The probability of rejecting H0: µ = 130, if the true average systolic blood pressure of employees at this company is 132 mmHg, is 0.1977, which is the power of this test. Therefore, β = 0.8023 for this test.
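The same calculation can be checked numerically. This is a small sketch using `statistics.NormalDist` from the Python standard library, with the cutoff and the alternative distribution taken from the slides.

```python
from statistics import NormalDist

cutoff = 134.125
# Sampling distribution of x-bar if the true mean is 132 (SE = 2.5).
alt_dist = NormalDist(mu=132, sigma=2.5)

beta = alt_dist.cdf(cutoff)   # P(fail to reject H0 | true mean is 132)
power = 1 - beta              # P(reject H0 | true mean is 132)

print(f"power = {power:.4f}, beta = {beta:.4f}")
```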

SLIDE 44

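Putting it all together

[Figure: null distribution and power distribution of systolic blood pressure (axis from 120 to 140). The cutoff is at 134.125; the area of 0.05 under the null distribution above the cutoff is the Type 1 error rate, and the area under the power distribution above the cutoff is the power.]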

SLIDE 49

Achieving desired power

There are several ways to increase power (and hence decrease the Type 2 error rate):

1. Increase the sample size.
2. Decrease the standard deviation of the sample, which essentially has the same effect as increasing the sample size (it will decrease the standard error).
3. Increase α, which will make it more likely to reject H0 (but note that this has the side effect of increasing the Type 1 error rate).
4. Consider a larger effect size δ.
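Each of these four levers can be tried out with a small power function for the one-sided z-test. The sketch below starts from the blood pressure setup (δ = 2, σ = 25, n = 100, α = 0.05); the alternative values plugged in for each lever are arbitrary choices for illustration, and `power` is a hypothetical helper name.

```python
from statistics import NormalDist

def power(delta, sigma, n, alpha=0.05):
    """Power of a one-sided z-test to detect a true increase of delta."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    se = sigma / n ** 0.5
    return 1 - NormalDist().cdf(z_crit - delta / se)

baseline = power(delta=2, sigma=25, n=100)
scenarios = {
    "baseline":           baseline,
    "larger n (400)":     power(delta=2, sigma=25, n=400),
    "smaller sigma (15)": power(delta=2, sigma=15, n=100),
    "larger alpha (0.10)": power(delta=2, sigma=25, n=100, alpha=0.10),
    "larger delta (4)":   power(delta=4, sigma=25, n=100),
}
for label, value in scenarios.items():
    print(f"{label:<20} power = {value:.3f}")
```

Every modified scenario should report a higher power than the baseline.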

SLIDE 53

Choosing sample size for a particular margin of error

If I want to predict the proportion of US voters who approve of President Obama and I want to have a margin of error of 2% or less, how many people do I need to sample?

1. Given desired error level m, we need $m \ge ME = z^\star \frac{\sigma}{\sqrt{n}}$.
2. To get $m \ge z^\star \frac{\sigma}{\sqrt{n}}$, I need $n \ge \left( \frac{z^\star \sigma}{m} \right)^2$.

Note: This requires an estimate of σ.
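As a rough illustration of the formula, the sketch below plugs in numbers for the approval question. Two things are assumed here that the slide does not state: a 95% confidence level, and the conservative choice σ = √(0.5 × 0.5) = 0.5 for a proportion when no prior estimate is available. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_margin(m, sigma, confidence=0.95):
    """Smallest n satisfying n >= (z* sigma / m)^2."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z_star * sigma / m) ** 2)

# Assumption: worst-case standard deviation for a proportion (p = 0.5).
sigma_worst = math.sqrt(0.5 * 0.5)

n_needed = sample_size_for_margin(m=0.02, sigma=sigma_worst)
print(n_needed)
```

Under these assumptions roughly 2,400 respondents are needed; a better estimate of σ (for example from an earlier poll) would lower that number.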

SLIDE 54

Bootstrapping

SLIDE 55

Rent in Durham

A random sample of 10 housing units was chosen on http://raleigh.craigslist.org after subsetting posts with the keyword “durham”. The dot plot below shows the distribution of the rents of these apartments. Can we apply the methods we have learned so far to construct a confidence interval using these data? Why or why not?

[Figure: dot plot of rent, axis from 600 to 1800]

SLIDE 56

Bootstrapping

An alternative approach to constructing confidence intervals is bootstrapping. This term comes from the phrase “pulling oneself up by one’s bootstraps”, which is a metaphor for accomplishing an impossible task without any outside help. In this case the impossible task is estimating a population parameter, and we’ll accomplish it using data from only the given sample.

SLIDE 57

Bootstrapping works as follows:

(1) take a bootstrap sample - a random sample taken with replacement from the original sample, of the same size as the original sample
(2) calculate the bootstrap statistic - a statistic such as mean, median, proportion, etc. computed on the bootstrap samples
(3) repeat steps (1) and (2) many times to create a bootstrap distribution - a distribution of bootstrap statistics

The 95% bootstrap confidence interval is estimated by the cutoff values for the middle 95% of the bootstrap distribution.
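The three steps can be sketched in a few lines of Python. The rent values below are made up for illustration (the actual sample is not listed in these slides), and `bootstrap_ci` is a hypothetical helper, not a library function.

```python
import random
from statistics import mean

def bootstrap_ci(sample, stat=mean, level=0.95, reps=5000, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    n = len(sample)
    # Steps (1)-(3): resample with replacement, compute the statistic, repeat.
    boot_stats = sorted(stat(rng.choices(sample, k=n)) for _ in range(reps))
    tail = (1 - level) / 2
    lower = boot_stats[int(tail * reps)]
    upper = boot_stats[int((1 - tail) * reps) - 1]
    return lower, upper

# Hypothetical rents for 10 apartments (not the data from the lecture).
rents = [650, 800, 875, 950, 1000, 1100, 1250, 1400, 1575, 1800]

lo, hi = bootstrap_ci(rents)
print(f"95% bootstrap CI for the mean rent: ({lo:.1f}, {hi:.1f})")
```

Passing `stat=median` (from the `statistics` module) would give an interval for the median instead.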

SLIDE 59

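Rent in Durham - bootstrap interval

The dot plot below shows the distribution of means of 100 bootstrap samples from the original sample. Estimate the 90% bootstrap confidence interval based on this bootstrap distribution.

[Figure: dot plot of bootstrap means, axis from 900 to 1400, with cutoffs marked at 1013.9 and 1354.3]

The middle 90% of the bootstrap distribution lies between 1013.9 and 1354.3, so the 90% bootstrap confidence interval is approximately (1013.9, 1354.3).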

SLIDE 60

Randomization testing

SLIDE 61

Randomization testing for a mean

We can also use a simulation method to conduct the same test. This is very similar to bootstrapping, i.e. we randomly sample with replacement from the sample, but this time we shift the bootstrap distribution to be centered at the null value. The p-value is then defined as the proportion of simulations that yield a sample mean at least as favorable to the alternative hypothesis as the observed sample mean.
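A minimal sketch of this shifted-bootstrap test for a one-sided alternative (HA: µ > null value) is shown below. The sample values are hypothetical, as is the function name; only the procedure follows the description above.

```python
import random
from statistics import mean

def randomization_pvalue(sample, null_value, reps=5000, seed=0):
    """One-sided p-value (HA: mu > null_value) from a shifted bootstrap distribution."""
    rng = random.Random(seed)
    n = len(sample)
    observed = mean(sample)
    shift = null_value - observed   # recentres the bootstrap distribution at the null value
    at_least_as_extreme = 0
    for _ in range(reps):
        sim_mean = mean(rng.choices(sample, k=n)) + shift
        if sim_mean >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / reps

# Hypothetical rents (not the data from the lecture).
rents = [650, 800, 875, 950, 1000, 1100, 1250, 1400, 1575, 1800]

p_value = randomization_pvalue(rents, null_value=854)
print(f"p-value = {p_value:.4f}")
```

For a "less than" alternative the comparison would flip to `sim_mean <= observed`.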

SLIDE 62

Rent in Durham - randomization testing

According to rentjungle.com the average rent for an apartment in Durham is $854. Your random sample had a mean of $1143.2. Does this sample provide convincing evidence that the article’s estimate is an underestimate?

H0: µ = $854
HA: µ > $854

p-value: proportion of simulations where the simulated sample mean is at least as extreme as the one observed. → 3 / 100 = 0.03

[Figure: dot plot of randomization means, axis from 600 to 1200]

SLIDE 63

Extra Notes - Calculating Power

1. Begin by picking a meaningful effect size δ and a significance level α.
2. Calculate the range of values for the point estimate beyond which you would reject H0 at the chosen α level.
3. Calculate the probability of observing a value from the preceding step if the sample was derived from a population where

$$\bar{x} \sim N(\mu_{H_0} + \delta, SE)$$
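These steps can also be checked by simulation rather than with the normal table. The sketch below draws sample means directly from the alternative sampling distribution for the blood pressure example (µ0 = 130, δ = 2, SE = 2.5, critical value 1.65) and counts how often they fall beyond the cutoff; the number of trials, the seed, and the function name are arbitrary choices for illustration.

```python
import random

def simulated_power(mu0=130, delta=2, se=2.5, z_crit=1.65, trials=20_000, seed=0):
    """Monte Carlo estimate of power for a one-sided z-test."""
    rng = random.Random(seed)
    cutoff = mu0 + z_crit * se   # step 2: rejection region is x-bar > cutoff
    # Step 3: draw x-bar from N(mu0 + delta, SE) and count rejections.
    rejections = sum(rng.gauss(mu0 + delta, se) > cutoff for _ in range(trials))
    return rejections / trials

estimate = simulated_power()
print(f"simulated power: {estimate:.3f}")
```

The estimate should be close to the value 0.1977 obtained analytically for this example.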

SLIDE 67

Example - Using power to determine sample size

Going back to the blood pressure example, how large a sample would you need if you wanted 90% power to detect a 4 mmHg increase in average blood pressure for the hypothesis that the population average is greater than 130 mmHg at α = 0.05?

Given: H0: µ = 130, HA: µ > 130, α = 0.05, β = 0.10, σ = 25, δ = 4

Step 1: Determine the cutoff. In order to reject H0 at α = 0.05, we need a sample mean that will yield a Z score of at least 1.65.

$$\bar{x} > 130 + 1.65 \frac{25}{\sqrt{n}}$$

Step 2: Set the probability of obtaining the above $\bar{x}$ if the true population is centered at 130 + 4 = 134 to the desired power, and solve for n.

$$P\left( \bar{x} > 130 + 1.65 \frac{25}{\sqrt{n}} \right) = 0.9$$

$$P\left( Z > \frac{130 + 1.65 \frac{25}{\sqrt{n}} - 134}{\frac{25}{\sqrt{n}}} \right) = P\left( Z > 1.65 - 4 \frac{\sqrt{n}}{25} \right) = 0.9$$
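One way to finish the algebra by hand, using the standard normal table value P(Z > −1.28) ≈ 0.9 (rounded to two decimals, in the same spirit as the 1.65 above):

```latex
\begin{align*}
1.65 - 4\,\frac{\sqrt{n}}{25} &= -1.28 \\
\sqrt{n} &= \frac{(1.65 + 1.28) \times 25}{4} = 18.3125 \\
n &= 18.3125^2 \approx 335.3
\end{align*}
```

Since n must be a whole number and power increases with n, we round up to n = 336.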

SLIDE 72

Example - Using power to determine sample size (cont.)

You can either directly solve for n, or use computation to calculate power for various n and determine the sample size that yields the desired power:

[Figure: power (0.2 to 1.0) plotted against sample size n (200 to 1000)]

For n = 336, power = 0.9002; therefore we need 336 subjects in our sample to achieve the desired level of power for the given circumstances.
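The computational route can be sketched as a simple search over n. The code below uses the rounded critical value 1.65 from the slides so that it matches the hand calculation; the function name is illustrative.

```python
from statistics import NormalDist

def power_for_n(n, delta=4, sigma=25, z_crit=1.65):
    """Power of the one-sided z-test for sample size n: P(Z > z_crit - delta * sqrt(n) / sigma)."""
    return 1 - NormalDist().cdf(z_crit - delta * n ** 0.5 / sigma)

# Increase n until the desired 90% power is reached.
n = 1
while power_for_n(n) < 0.9:
    n += 1

print(n, round(power_for_n(n), 4))   # prints: 336 0.9002
```

Using the unrounded critical value (about 1.645) instead of 1.65 would give a slightly smaller n.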
