Analytic Approximations Normal Distributions Hypothesis Tests Using a Normal Confidence Intervals Using a Normal
STAT 113: Inference Using Normal Approximations
Colin Reimer Dawson
Oberlin College
November 4, 2020
Outline
- Analytic Approximations
- Normal Distributions
- Hypothesis Tests Using a Normal
- Confidence Intervals Using a Normal
P-value = Proportion of Randomized Sample Statistics
Figure: Randomization distribution for the proportion of heads in 500 coin flips, highlighting the one-tailed P-value testing H1 : p > 0.5 for an observation of 270 heads. (Axes: proportion of heads in 500 flips vs. number of simulated datasets; shading marks datasets with at least 270 heads.)
Confidence Level ≈ Proportion of Bootstrap Samples
Figure: Bootstrap distribution for mean mercury level in fish in Florida lakes (from the FloridaLakes dataset). The middle 95% is highlighted, illustrating a 95% confidence interval. (Axes: mean mercury level in 50 observations, in parts per million, vs. number of bootstrap datasets.)
So what’s with all these bell shapes?
- Q: Why are so many distributions “bell-shaped”?
- A: The Central Limit Theorem
- One of the most important results in probability
For sufficiently large datasets, sampling distributions of sample means have approximately a Normal (bell-shaped) distribution.
Corollaries of the Central Limit Theorem
This theorem, together with some other properties of Normal distributions, implies that, for large enough datasets:
- Sampling distributions of means are approximately Normal
- Sampling distributions of proportions are approximately Normal
- Sampling distributions of differences of means are approximately Normal
- Sampling distributions of differences of proportions are approximately Normal
- Sampling distributions of regression slopes are approximately Normal (when regression conditions are met)
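As an illustrative sketch of the first corollary (not part of the original slides), the simulation below draws many datasets from a strongly right-skewed exponential population and records each dataset's mean. The sample size, number of repetitions, and seed are arbitrary assumptions chosen for illustration, and only base R functions are used.

```r
## Illustrative sketch: sample means from a skewed population look Normal.
## n, n_sims, and the seed are arbitrary choices, not from the slides.
set.seed(113)
n      <- 100    # observations per simulated dataset
n_sims <- 5000   # number of simulated datasets

## Exponential(rate = 1) population: mean 1, sd 1, strongly right-skewed
sample_means <- replicate(n_sims, mean(rexp(n, rate = 1)))

## The CLT suggests the means should be roughly N(1, 1 / sqrt(n))
theory_se <- 1 / sqrt(n)
center    <- mean(sample_means)   # should be close to 1
spread    <- sd(sample_means)     # should be close to theory_se

## Share of simulated means within 2 standard errors of the population mean;
## a Normal distribution would put about 95% there
within_2se <- mean(abs(sample_means - 1) <= 2 * theory_se)
```

Plotting `hist(sample_means)` should show a roughly bell-shaped histogram even though the population itself is far from bell-shaped.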
Density Functions
Figure: Densities of babies’ birth weights (Nolan and Speed, 2000). (Axes: birth weight in oz vs. density.)
Proportion = Area Under the Density Curve
Figure: Approximating the birth weight distribution using a Normal. The shaded area is the proportion of the distribution at or above 148 oz (shaded = 0.06 of total). (Axes: birth weight in oz vs. density.)
Normal Distributions
Normal distributions are completely specified by their mean (µ) and their standard deviation (σ). We can write N(0, 1) as shorthand for a Normal with mean 0 and standard deviation 1.
Figure: Density curves for N(0, 1), N(2, 1), N(0, 0.5), and N(−4, 0.3). (Axes: some variable vs. density.)
Area Under Normal Curve
Figure: Normal density curve. (Axes: some variable vs. density.)
Can we work out these areas without simulation?
StatKey to the Rescue!
R Works Too
library(mosaic)
## Area under the curve to the right of 1.5
## First argument: the cutoff value
## mean, sd: the mean and standard deviation of the Normal
## lower.tail = TRUE/FALSE; area to the left (TRUE) or right (FALSE)
xpnorm(1.5, mean = 0, sd = 1, lower.tail = FALSE)
## Creates a plot showing the z-score of the cutoff (here z = 1.5)
## and returns the proportion under the curve on the specified side
[1] 0.0668072
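For readers without the mosaic package, the same tail area can be obtained with base R's pnorm(), which returns the number but draws no plot. The following is a minimal sketch; the variable names are chosen here for illustration and do not appear in the slides.

```r
## Base R counterpart of the xpnorm() call above (no plot is drawn).
## pnorm() lives in the stats package, which R loads by default.
p_right <- pnorm(1.5, mean = 0, sd = 1, lower.tail = FALSE)

## The same area computed as the complement of the left-hand area
p_right_alt <- 1 - pnorm(1.5, mean = 0, sd = 1)

round(p_right, 7)
```

Both approaches give the area to the right of the cutoff; lower.tail = FALSE simply saves the subtraction from 1.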
Self-Check
Find the specified tail proportions for a Normal distribution (use either StatKey or R, or use both to confirm your results!)
- 1. The proportion of cases above 62 in a N(50,10) distribution
- 2. The proportion of cases below 8 in a N(10,2) distribution.
Finding the Cutoff if We Know the Tail Proportion
Finding Cutoffs in R ...
## xqnorm() takes the tail proportion and gives the cutoff
## First argument: the desired tail proportion
## mean, sd: mean and standard deviation of the Normal
## lower.tail: (optional) set to FALSE if specifying
## right-hand proportion
xqnorm(0.05, mean = 29.11, sd = 0.93) # Note the q instead of p
## Makes a plot marking the z-score of the cutoff (here z = −1.64),
## and returns the actual cutoff value
[1] 27.58029
Finding a Cutoff on the Right
## We set lower.tail = FALSE if the 0.05 should be on the right
## (i.e., on the upper tail instead of the lower one)
xqnorm(0.05, mean = 29.11, sd = 0.93, lower.tail = FALSE)
## (Plot: Normal density with the cutoff marked at z = 1.64)
[1] 30.63971
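A small sketch of why the lower.tail argument works this way: leaving 5% in the right tail is the same as leaving 95% to the left, so the two calls below should agree. It uses base R's qnorm() rather than mosaic's xqnorm(), and the variable names are assumptions made for illustration.

```r
## Two equivalent ways to ask for the cutoff with 5% of the area to its right
mu <- 29.11   # mean of the Normal (from the slide)
se <- 0.93    # standard deviation of the Normal (from the slide)

upper_from_right_tail <- qnorm(0.05, mean = mu, sd = se, lower.tail = FALSE)
upper_from_left_area  <- qnorm(0.95, mean = mu, sd = se)

## The matching left-tail cutoff, for comparison
lower_cutoff <- qnorm(0.05, mean = mu, sd = se)

## Because the Normal is symmetric, both cutoffs sit equally far from the mean
distances <- c(mu - lower_cutoff, upper_from_right_tail - mu)
```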
Self-Check
Find the following quantities:
- 3. The 35th percentile of a N(100,25) distribution (Hint: 35% of the distribution is to the left of the answer)
- 4. The 95th percentile of a N(10,4) distribution (Hint: 5% of the distribution is to the right of the answer)
Self-Check
Weights of newborn babies in one community are normally distributed with a mean of 120 ounces and a standard deviation of 19.2 ounces.
- 5. What percent of newborns in this community weigh 90 ounces or less?
- 6. What percent of newborns weigh 176 ounces or more?
- 7. If a newborn baby is at the 15th percentile for weight, what is the baby’s weight in ounces?
Approximating a Randomization Distribution
The 1971 draft had a correlation of 0.014 between birthday (as a day from 1 to 366) and draft position. Using a randomization distribution, we estimate that the standard error of the correlation is 0.053. Find the P-value for the null hypothesis that the draft was random vs. the alternative that it was biased toward early or late birthdays by approximating the randomization distribution of the correlation with a Normal curve.
Steps:
- 1. What mean and standard deviation for the Normal?
- 2. What cutoff to use? Which side of the cutoff?
- 3. Set the cutoff to 0.014, find the area to the right under a N(0, 0.053) curve (then double it since the test is two-tailed).
StatKey
Draft Lottery: R Solution
## Proportion of hypothetical datasets with correlation >= 0.014
## if H0 is true.
## Mean is 0 since H0 says rho = 0
## SD is 0.053 since this is the standard error given
## lower.tail = FALSE since we want the proportion of correlations
## at or *above* 0.014
P.right <- xpnorm(0.014, mean = 0, sd = 0.053, lower.tail = FALSE)
## (Plot: Normal density with the cutoff marked at z = 0.26)
## The two-tailed P-value is double the one-tail value since
## the distribution is symmetric
2 * P.right
[1] 0.7916636
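The z-score that xpnorm() marks on its plot can also be computed by hand, which makes the link to the standard Normal explicit. The sketch below uses base R's pnorm(); the variable names are illustrative assumptions, while the numbers come from the draft lottery example.

```r
## Standardize the observed correlation, then use the N(0, 1) curve
r_obs     <- 0.014   # observed sample correlation
null_mean <- 0       # H0 says rho = 0
se        <- 0.053   # standard error from the randomization distribution

## z-score: how many standard errors the statistic is from the null value
z <- (r_obs - null_mean) / se

## Two-tailed P-value: area beyond |z| in both tails of the standard Normal
p_two_tailed <- 2 * pnorm(abs(z), lower.tail = FALSE)
```

Using abs(z) keeps the calculation valid whether the observed statistic falls above or below the null value.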
Draft Lottery: Conclusions
- If the lottery were random, we would expect a sample correlation with an absolute value of 0.014 or higher about 79% of the time we did the draft.
- Since this is well above any reasonable significance level, we do not have significant evidence to reject the hypothesis that the draft was random.
Self-Check
In a dataset of 120 soccer matches played in the Football Association (FA) premier league in Great Britain, the home team won 70 times. Let’s examine whether this data provides evidence of home field advantage.
- 8. Create a randomization distribution of simulated sample proportions of games won by the home team (you can use the do() * rflip() construction in R we used in Lab 7, or use StatKey) and use it to find a two-tailed P-value.
- 9. Use a normal distribution as a substitute for the randomization distribution. What should the mean be?
- 10. What should the standard deviation be?
- 11. Calculate a two-tailed P-value using the Normal model.
Compare the answer from the normal distribution to what you found from the randomization distribution. Are the results similar?
- We have a dataset of 500 Atlanta commute times.
- The (sample) mean is 29.11 minutes.
- Using bootstrapping we estimate that the standard error is 0.93.
Find a 90% confidence interval by approximating the bootstrap distribution with a Normal.
Steps:
- 1. What mean and standard deviation for the Normal replacement for the bootstrap distribution?
- 2. What percentiles of the distribution do we need for a 90% confidence level?
- 3. Consult a N(29.11, 0.93) curve and set the area in the tails to 5% each to get the 5th and 95th percentiles for a 90% interval.
StatKey...
Left-hand endpoint in R ...
xqnorm(0.05, mean = 29.11, sd = 0.93) # Note the q instead of p
## (Plot: Normal density with the cutoff marked at z = −1.64)
[1] 27.58029
And the Right-hand Endpoint...
xqnorm(0.95, mean = 29.11, sd = 0.93) # Note the q instead of p
## (Plot: Normal density with the cutoff marked at z = 1.64)
[1] 30.63971
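Both endpoints can be obtained in one step, and the result matches the familiar "statistic ± z* × SE" form. The sketch below is one way to write this with base R's qnorm(); the variable names and the conf_level parameter are assumptions introduced for illustration.

```r
## 90% confidence interval from a Normal approximation, two equivalent ways
xbar       <- 29.11  # sample mean commute time (minutes)
se         <- 0.93   # bootstrap standard error
conf_level <- 0.90   # desired confidence level

## Area left in each tail (5% for a 90% interval)
tail_area <- (1 - conf_level) / 2

## Way 1: percentiles of the N(xbar, se) curve
ci_percentile <- qnorm(c(tail_area, 1 - tail_area), mean = xbar, sd = se)

## Way 2: statistic plus or minus z* standard errors,
## where z* is the matching percentile of N(0, 1)
z_star     <- qnorm(1 - tail_area)
ci_formula <- xbar + c(-1, 1) * z_star * se
```

Changing conf_level (for example to 0.95) adjusts both tail percentiles together, which avoids recomputing them by hand.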
Self-Check
- 12. Create a bootstrap distribution of sample means using the data on prices (in $1,000s) of used Mustang cars in MustangPrice (this is in Lock5Data if you want to use R, or you can use StatKey), and find a 94% confidence interval for the population mean price using the appropriate percentiles of the bootstrap distribution.
- 13. Consider a normal distribution substitute for the bootstrap distribution. Find the percentiles of this Normal distribution that correspond to those you used for the bootstrap distribution to get an alternate 94% confidence interval.
Compare the answer from the normal distribution to what you found from the bootstrap distribution. Are the results similar?