Unit 3: Foundations for inference
Lecture 3: Decision errors, significance levels, sample size, and power
Statistics 101
Thomas Leininger
May 31, 2013
Visualization of the day
The Flesch/Flesch-Kincaid readability tests are designed to indicate comprehension difficulty when reading a passage of contemporary academic English.
http://www.guardian.co.uk/world/interactive/2013/feb/12/state-of-the-union-reading-level
Video of the day
2013 is the International Year of Statistics
https://www.youtube.com/watch?feature=player_embedded&v=nTBZuQR7dRc
1. Two-sided hypothesis testing with p-values
2. Significance level vs. confidence level
3. Statistical vs. Practical Significance
Two-sided hypothesis testing with p-values
From yesterday:
A poll by the National Sleep Foundation found that college students average about 7 hours of sleep per night. A sample of 169 Duke students yielded an average of 6.88 hours, with a standard deviation of 0.94 hours. Assuming that this is a random sample representative of all Duke students (bit of a leap of faith?), a hypothesis test was conducted to evaluate if Duke students on average sleep less than 7 hours per night. The p-value for this hypothesis test is 0.0485. Which of the following is correct?
If the research question was “Do the data provide convincing evidence that the average amount of sleep Duke students get per night is different than the national average?”, the alternative hypothesis would be different.
First scenario (Duke students lower than US average)
H0: µ = 7, HA: µ < 7
Second scenario (Duke students different than US average)
H0: µ = 7, HA: µ ≠ 7
Hence the p-value would change as well:
[Figure: sampling distribution of the sample mean with 6.88, 7.00, and 7.12 marked]
p-value = 0.0485 × 2 = 0.097
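The doubling can be checked numerically. Below is a minimal sketch in Python, assuming the normal model used in this lecture and the standard library's `statistics.NormalDist`; the variable names are illustrative and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist

# Duke sleep example from the slide
x_bar, s, n, mu_0 = 6.88, 0.94, 169, 7.0

se = s / sqrt(n)            # standard error of the sample mean
z = (x_bar - mu_0) / se     # test statistic

std_normal = NormalDist()   # standard normal: mean 0, sd 1
p_one_sided = std_normal.cdf(z)            # HA: mu < 7, lower tail only
p_two_sided = 2 * std_normal.cdf(-abs(z))  # HA: mu != 7, both tails

print(f"Z = {z:.2f}")
print(f"one-sided p-value = {p_one_sided:.4f}")
print(f"two-sided p-value = {p_two_sided:.4f}")
```

Because the normal curve is symmetric, the area above 7.12 equals the area below 6.88, which is why the two-sided p-value is exactly twice the one-sided one.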
Recap: Hypothesis testing framework
1. Set the hypotheses.
2. Check assumptions and conditions.
3. Calculate a test statistic and a p-value.
4. Make a decision, and interpret it in context of the research question.
Recap: Hypothesis testing for a population mean
1. Set the hypotheses
   H0: µ = null value
   HA: µ < or > or ≠ null value
2. Check assumptions and conditions
   Independence: random sample/assignment, 10% condition when sampling without replacement
   Normality: nearly normal population or n ≥ 30, no extreme skew
3. Calculate a test statistic and a p-value (draw a picture!)
   Z = (x̄ − µ) / SE, where SE = s / √n
4. Make a decision, and interpret it in context of the research question
   If p-value < α, reject H0; the data provide evidence for HA
   If p-value > α, do not reject H0; the data do not provide evidence for HA
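Steps 1, 3, and 4 above can be sketched as a small Python function. This is an illustration under stated assumptions, not code from the course: the function name `z_test_mean` and the labels "less", "greater", and "two-sided" are hypothetical choices, and step 2 (checking independence and normality) still has to be done by hand.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(x_bar, s, n, null_value, alternative, alpha=0.05):
    """Z test for a population mean.

    alternative: "less", "greater", or "two-sided" (labels are an
    assumption of this sketch, chosen to match HA: <, >, or !=).
    Returns (z, p_value, reject_h0).
    """
    se = s / sqrt(n)                    # SE = s / sqrt(n)
    z = (x_bar - null_value) / se       # Z = (x_bar - mu) / SE
    std_normal = NormalDist()
    if alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * std_normal.cdf(-abs(z))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p_value, p_value < alpha


# Duke sleep example with HA: mu < 7
z, p_value, reject = z_test_mean(6.88, 0.94, 169, 7, "less")
print(f"Z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Calling the same function with `"two-sided"` on the same data gives a p-value above 0.05, so the decision depends on which alternative was set in step 1.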
Significance level vs. confidence level
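Two sided
[Figure: standard normal curve with area 0.95 between −1.96 and 1.96 and area 0.025 in each tail]
Two sided HT with α = 0.05 is equivalent to 95% confidence interval.
One sided
[Figure: standard normal curve with cutoff at 1.65, middle area 0.9 and area 0.05 in each tail]
One sided HT with α = 0.05 is equivalent to 90% confidence interval.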
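The cutoffs 1.96 and 1.65 on the two figures can be recovered from the standard normal distribution. A short sketch, assuming Python's `statistics.NormalDist`; the helper name `critical_value` is made up for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()


def critical_value(confidence_level):
    """z* such that the middle `confidence_level` of the curve lies in (-z*, z*)."""
    tail = (1 - confidence_level) / 2   # area left in each tail
    return std_normal.inv_cdf(1 - tail)


z_95 = critical_value(0.95)   # 0.025 in each tail
z_90 = critical_value(0.90)   # 0.05 in each tail

print(f"95% confidence: z* = {z_95:.3f}")
print(f"90% confidence: z* = {z_90:.3f}")
```

The 90% cutoff is about 1.645, which the slide rounds to 1.65. It is the same cutoff a one sided test with α = 0.05 uses, since that test puts all 0.05 in a single tail.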
Agreement of CI and HT
Confidence intervals and hypothesis tests agree, as long as the two methods use equivalent levels of significance / confidence.
A two sided hypothesis test with threshold of α is equivalent to a confidence interval with CL = 1 − α.
A one sided hypothesis test with threshold of α is equivalent to a confidence interval with CL = 1 − (2 × α).
If H0 is rejected, a confidence interval that agrees with the result of the hypothesis test should not include the null value.
If H0 is not rejected, a confidence interval that agrees with the result of the hypothesis test should include the null value.
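The agreement can be illustrated with the Duke sleep numbers from earlier in the lecture. The sketch below is an assumption-laden example (normal model, Python's `statistics.NormalDist`, helper name `confidence_interval` invented here): it pairs the two sided test with a 95% interval and the one sided test with a 90% interval, then compares each decision with whether the interval contains 7.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()
x_bar, s, n, null_value, alpha = 6.88, 0.94, 169, 7.0, 0.05
se = s / sqrt(n)
z = (x_bar - null_value) / se


def confidence_interval(confidence_level):
    """Point estimate plus or minus z* times SE."""
    z_star = std_normal.inv_cdf(1 - (1 - confidence_level) / 2)
    return x_bar - z_star * se, x_bar + z_star * se


# Two sided test at alpha pairs with CL = 1 - alpha
p_two_sided = 2 * std_normal.cdf(-abs(z))
ci_95 = confidence_interval(1 - alpha)

# One sided test (HA: mu < 7) at alpha pairs with CL = 1 - 2 * alpha
p_one_sided = std_normal.cdf(z)
ci_90 = confidence_interval(1 - 2 * alpha)

for label, p, (lo, hi) in [("two sided", p_two_sided, ci_95),
                           ("one sided", p_one_sided, ci_90)]:
    reject = p < alpha
    contains_null = lo <= null_value <= hi
    print(f"{label}: p = {p:.4f}, reject H0 = {reject}, "
          f"CI = ({lo:.3f}, {hi:.3f}), contains 7 = {contains_null}")
```

In both rows, rejecting H0 goes together with an interval that excludes 7, and failing to reject goes together with an interval that includes it.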
Question
A 95% confidence interval for the average waiting time at an emergency room is (128 minutes, 147 minutes). Which of the following is false?
(a) A hypothesis test of HA: µ ≠ 120 min at α = 0.05 is equivalent to this CI.
(b) A hypothesis test of HA: µ > 120 min at α = 0.025 is equivalent to this CI.
(c) This interval does not support the claim that the average wait time is 120 minutes.
(d) The claim that the average wait time is 120 minutes would not be rejected using a 90% confidence interval.
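One way to reason about option (d) is to rebuild the 90% interval from the 95% one. The sketch below assumes the interval was computed as point estimate ± z* × SE under a normal model, which the question does not state explicitly; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()
lower_95, upper_95 = 128.0, 147.0

center = (lower_95 + upper_95) / 2           # point estimate
margin_95 = (upper_95 - lower_95) / 2        # margin of error = z* x SE
se = margin_95 / std_normal.inv_cdf(0.975)   # divide out z* for 95%

z_star_90 = std_normal.inv_cdf(0.95)         # z* for 90%
lower_90 = center - z_star_90 * se
upper_90 = center + z_star_90 * se

print(f"implied SE = {se:.2f} minutes")
print(f"90% CI = ({lower_90:.1f}, {upper_90:.1f}) minutes")
```

Under that assumption the 90% interval is narrower than the 95% interval and sits inside it, so a value that falls outside the 95% interval also falls outside the 90% one.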
Statistical vs. Practical Significance
Sample Size
Question
All else held equal, will p-value be lower if n = 100 or n = 10,000?
(a) n = 100
(b) n = 10,000
Suppose x̄ = 50, s = 2, H0: µ = 49.5, and HA: µ > 49.5.
Z (n = 100) = (50 − 49.5) / (2 / √100) = (50 − 49.5) / (2 / 10) = 0.5 / 0.2 = 2.5, p-value = 0.0062
Z (n = 10,000) = (50 − 49.5) / (2 / √10000) = (50 − 49.5) / (2 / 100) = 0.5 / 0.02 = 25, p-value ≈ 0
As n increases: SE ↓, Z ↑, p-value ↓
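The same arithmetic can be run for both sample sizes in a few lines of Python. This is a sketch using the standard library's `statistics.NormalDist` and an upper-tail p-value; the `results` dictionary is only there to hold the numbers for comparison.

```python
from math import sqrt
from statistics import NormalDist

x_bar, s, null_value = 50, 2, 49.5
std_normal = NormalDist()

results = {}
for n in (100, 10_000):
    se = s / sqrt(n)                    # shrinks as n grows
    z = (x_bar - null_value) / se       # grows as SE shrinks
    p_value = 1 - std_normal.cdf(z)     # upper tail area
    results[n] = (se, z, p_value)
    print(f"n = {n}: SE = {se:.2f}, Z = {z:.1f}, p-value = {p_value:.4f}")
```

The observed difference of 0.5 is identical in both cases; only the standard error changes.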
Statistical vs. Practical Significance
Real differences between the point estimate and null value are easier to detect with larger samples.
However, very large samples will result in statistical significance even for tiny differences between the sample mean and the null value (effect size), even when the difference is not practically significant.
This is especially important to research: if we conduct a study, we want to focus on finding meaningful results (we want observed differences to be real, but also large enough to matter).
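To see how sample size alone can produce significance, one can ask how large n must be before a given difference is flagged by a one sided Z test. The sketch below is illustrative only: it reuses s = 2 from the sample size example, and the function name `min_n_for_significance` and the chosen differences are assumptions of this example.

```python
from math import ceil, sqrt
from statistics import NormalDist


def min_n_for_significance(difference, s, alpha=0.05):
    """Smallest n at which a one sided Z test rejects H0 for this difference."""
    z_star = NormalDist().inv_cdf(1 - alpha)
    # Reject when difference / (s / sqrt(n)) > z*, i.e. n > (z* * s / difference)^2
    return ceil((z_star * s / difference) ** 2)


s = 2
for difference in (0.5, 0.05, 0.005):
    n = min_n_for_significance(difference, s)
    z = difference / (s / sqrt(n))
    print(f"difference = {difference}: n = {n} gives Z = {z:.3f}")
```

Each tenfold drop in the difference raises the required sample size roughly a hundredfold, so with enough data even a difference too small to matter in practice will reach statistical significance.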
The role of a statistician is not just in the analysis of data, but also in planning and design of a study.
“To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.” – R.A. Fisher