[PPT] - STAT 113 Inference Using Normal Approximations Colin Reimer Dawson PowerPoint Presentation

SLIDE 1

Analytic Approximations Normal Distributions Hypothesis Tests Using a Normal Confidence Intervals Using a Normal

STAT 113 Inference Using Normal Approximations

Colin Reimer Dawson

Oberlin College

November 4, 2020

SLIDE 3

Outline

Analytic Approximations
Normal Distributions
Hypothesis Tests Using a Normal
Confidence Intervals Using a Normal

SLIDE 4

P-value = Proportion of Randomized Sample Statistics

[Plot: histogram with x-axis "Proportion of Heads in 500 Flips" and y-axis "Number of Simulated Datasets", shaded by AtLeast270Heads (FALSE/TRUE)]

Figure: Randomization distribution for the number of heads in 500 coin flips, highlighting the one-tailed P-value testing H1 : p > 0.5 for an observation of 270 heads.
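The idea behind this figure can be sketched in base R. This is only an illustrative sketch, not the code that produced the slide: it uses `rbinom()` instead of the course's mosaic tools, and the seed and the number of simulated datasets (`n_sims`) are arbitrary assumptions.

```r
set.seed(113)      # arbitrary seed so the sketch is reproducible
n_flips <- 500     # flips per simulated dataset
n_sims  <- 10000   # number of simulated datasets (an assumed value)

# Simulate the number of heads in each dataset under H0: p = 0.5
heads <- rbinom(n_sims, size = n_flips, prob = 0.5)

# One-tailed P-value: proportion of simulated datasets with at least 270 heads
p_value <- mean(heads >= 270)
p_value
```

The P-value is simply the fraction of the simulated statistics that land at or beyond the observed one, which is the shaded part of the histogram.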

SLIDE 5

Confidence Level ≈ Proportion of Bootstrap Samples

[Plot: histogram with x-axis "Mean Mercury Level in 50 Observations (parts per million)" and y-axis "Number of Bootstrap Datasets", shaded by InInterval (FALSE/TRUE)]

Figure: Bootstrap distribution for mean mercury level in fish in Florida Lakes (from FloridaLakes dataset). The middle 95% is highlighted, illustrating a 95% confidence interval.
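A percentile bootstrap interval of this kind can be sketched in base R. The data below are a hypothetical stand-in generated with `rgamma()`, not the FloridaLakes measurements, and the seed and number of bootstrap datasets (`n_boot`) are assumptions chosen for illustration.

```r
set.seed(113)   # arbitrary seed so the sketch is reproducible

# Hypothetical stand-in for 50 mercury measurements (NOT the FloridaLakes data)
mercury <- rgamma(50, shape = 2.5, rate = 5)

n_boot <- 5000  # number of bootstrap datasets (an assumed value)

# Resample the data with replacement and record the mean each time
boot_means <- replicate(n_boot, mean(sample(mercury, replace = TRUE)))

# The middle 95% of the bootstrap distribution gives a 95% confidence interval
ci <- quantile(boot_means, probs = c(0.025, 0.975))
ci
```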

SLIDE 11

So what’s with all these bell shapes?

Q: Why are so many distributions “bell-shaped”?
A: The Central Limit Theorem
One of the most important results in probability

For sufficiently large datasets, sampling distributions of sample means have approximately a Normal (bell-shaped) distribution.
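A small simulation can make the theorem concrete. The sketch below is an illustration under assumed settings (an Exponential population with rate 1, datasets of size 100, 5000 repetitions); none of these values come from the slides.

```r
set.seed(113)    # arbitrary seed so the sketch is reproducible
n      <- 100    # size of each dataset (an assumed value)
n_sims <- 5000   # number of simulated datasets (an assumed value)

# Each dataset comes from a right-skewed Exponential(1) population,
# which has mean 1 and standard deviation 1
sample_means <- replicate(n_sims, mean(rexp(n, rate = 1)))

# The sample means should center near 1 with spread near 1 / sqrt(n) = 0.1
center <- mean(sample_means)
spread <- sd(sample_means)

# For a Normal distribution, about 68% of values fall within 1 SD of the mean
within_1sd <- mean(abs(sample_means - 1) < 1 / sqrt(n))
c(center = center, spread = spread, within_1sd = within_1sd)
```

Even though individual observations are strongly skewed, the proportion of sample means within one standard error of the population mean should come out close to the Normal value of about 0.68.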

SLIDE 16

Corollaries of the Central Limit Theorem

This theorem, together with some other properties of Normal distributions, implies that, for large enough datasets:

Sampling distributions of means are approximately Normal
Sampling distributions of proportions are approximately Normal
Sampling distributions of differences of means are approximately Normal
Sampling distributions of differences of proportions are approximately Normal
Sampling distributions of regression slopes are approximately Normal (when regression conditions are met)

SLIDE 17

Density Functions

[Plot: density curve with x-axis "Birth Weight (oz)", ranging from 48 to 192, and y-axis "Density"]

Figure: Densities of Babies’ Birth Weights (Nolan and Speed, 2000)

SLIDE 18

Proportion = Area Under the Density Curve

[Plot: Normal density curve with x-axis "Birthweight in oz", ranging from 48 to 192, and y-axis "Density"; the shaded area is 0.06 of the total]

Figure: Approximating birth weight distribution using a Normal. Shaded area is the proportion of the distribution at or above 148 oz.

SLIDE 20

Normal Distributions

Normal distributions are completely specified by their mean (µ) and their standard deviation (σ). We can write N(0, 1) as shorthand for a Normal with mean 0 and standard deviation 1.

[Plot: density curves for N(0, 1), N(2, 1), N(0, 0.5), and N(−4, 0.3); x-axis "Some Variable", from −6 to 6, y-axis "Density"]

SLIDE 21

Area Under Normal Curve

[Plot: standard Normal density with shaded areas; x-axis "Some Variable", from −3.5 to 3.5, y-axis "Density"]

Can we work out these areas without simulation?

SLIDE 22

StatKey to the Rescue!

SLIDE 23

R Works Too

library(mosaic)
## Area under the curve to the right of 1.5
## First argument: the cutoff value
## mean, sd: the mean and standard deviation of the Normal
## lower.tail = TRUE/FALSE; area to the left (TRUE) or right (FALSE)
xpnorm(1.5, mean = 0, sd = 1, lower.tail = FALSE)

[Plot: N(0, 1) density with the cutoff marked at z = 1.5; x-axis "x", y-axis "density"]

[1] 0.0668072
## Creates a plot showing the z-score of the cutoff
## and returns the proportion under the curve on the specified side
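If the mosaic package is not available, the same area can be computed with base R's `pnorm()`, which returns the proportion without drawing a plot. The sketch below assumes only base R; the variable name `p_right` is introduced here for illustration.

```r
# Area to the right of 1.5 under a N(0, 1) curve, with no plot
p_right <- pnorm(1.5, mean = 0, sd = 1, lower.tail = FALSE)
p_right

# Same area, computed as 1 minus the area to the left
p_right_alt <- 1 - pnorm(1.5, mean = 0, sd = 1)
p_right_alt
```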

SLIDE 24

Self-Check

Find the specified tail proportions for a Normal distribution (use either StatKey or R, or use both to confirm your results!)

1. The proportion of cases above 62 in a N(50, 10) distribution.
2. The proportion of cases below 8 in a N(10, 2) distribution.

SLIDE 25

Finding the Cutoff if We Know the Tail Proportion

SLIDE 26

Finding Cutoffs in R ...

## xqnorm() takes the tail proportion and gives the cutoff
## First argument: the desired tail proportion
## mean, sd: mean and standard deviation of the Normal
## lower.tail: (optional) set to FALSE if specifying
## right-hand proportion
xqnorm(0.05, mean = 29.11, sd = 0.93) # Note the q instead of p

[Plot: N(29.11, 0.93) density with the cutoff marked at z = −1.64; x-axis "x", y-axis "density"]

[1] 27.58029
## Makes a plot marking the z-score of the cutoff,
## and returns the actual cutoff value

SLIDE 27

Finding a Cutoff on the Right

## We set lower.tail = FALSE if the 0.05 should be on the right
## (i.e., on the upper tail instead of the lower one)
xqnorm(0.05, mean = 29.11, sd = 0.93, lower.tail = FALSE)

[Plot: N(29.11, 0.93) density with the cutoff marked at z = 1.64; x-axis "x", y-axis "density"]

[1] 30.63971
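The z-scores shown on these plots connect the cutoffs to the standard Normal: a cutoff equals the mean plus the z-score times the standard deviation. The sketch below uses base R's `qnorm()` rather than `xqnorm()`; the variable names are introduced here for illustration.

```r
mu    <- 29.11
sigma <- 0.93

# Cutoffs with 5% of the distribution to the left and to the right
left_cut  <- qnorm(0.05, mean = mu, sd = sigma)
right_cut <- qnorm(0.05, mean = mu, sd = sigma, lower.tail = FALSE)

# z-score with 5% of a N(0, 1) distribution to its left (about -1.64)
z <- qnorm(0.05)

# The same cutoffs rebuilt from the z-score: mean + z * sd and mean - z * sd
c(left_cut, mu + z * sigma)
c(right_cut, mu - z * sigma)
```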

SLIDE 28

Self-Check

Find the following quantities

3. The 35th percentile of a N(100, 25) distribution (Hint: 35% of the distribution is to the left of the answer)
4. The 95th percentile of a N(10, 4) distribution (Hint: 5% of the distribution is to the right of the answer)

SLIDE 29

Self-Check

Weights of newborn babies in one community are normally distributed with a mean of 120 ounces and a standard deviation of 19.2 ounces.

5. What percent of newborns in this community weigh 90 ounces or less?
6. What percent of newborns weigh 176 ounces or more?
7. If a newborn baby is at the 15th percentile for weight, what is the baby’s weight in ounces?

SLIDE 34

Approximating a Randomization Distribution

The 1971 draft had a correlation of 0.014 between birthday (as a day from 1 to 366) and draft position. Using a randomization distribution, we estimate that the standard error of the correlation is 0.053. Find the P-value for the null hypothesis that the draft was random vs the alternative that it was biased toward early or late birthdays by approximating the randomization distribution of the correlation with a Normal curve.

Steps:

1. What mean and standard deviation for the Normal?
2. What cutoff to use? Which side of the cutoff?
3. Set the cutoff to 0.014, find the area to the right under a N(0, 0.053) curve (then double it since the test is two-tailed).

SLIDE 35

StatKey

SLIDE 36

Draft Lottery: R Solution

## Proportion of hypothetical datasets with correlation >= 0.014
## if H0 is true.
## Mean is 0 since H0 says rho = 0
## SD is 0.053 since this is the standard error given
## lower.tail = FALSE since we want the proportion of correlations
## at or above 0.014
P.right <- xpnorm(0.014, mean = 0, sd = 0.053, lower.tail = FALSE)

[Plot: N(0, 0.053) density with the cutoff marked at z = 0.26; x-axis "x", y-axis "density"]

## The two-tailed P-value is double the one-tail value since
## the distribution is symmetric
2 * P.right
[1] 0.7916636
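The same P-value can be reached by standardizing first, which shows where the z = 0.26 on the plot comes from. This is a sketch in base R using `pnorm()` rather than `xpnorm()`; the variable names are introduced here for illustration.

```r
r_obs <- 0.014   # observed correlation
se    <- 0.053   # standard error from the randomization distribution

# z-score: how many standard errors the observed correlation is from 0,
# the value of the correlation under H0
z <- (r_obs - 0) / se
z

# Two-tailed P-value from the standard Normal, N(0, 1)
p_two_tailed <- 2 * pnorm(abs(z), lower.tail = FALSE)
p_two_tailed
```

Using `abs(z)` means the calculation works unchanged if the observed statistic falls below the null value.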

SLIDE 38

Draft Lottery: Conclusions

If the lottery were random, we would expect a sample correlation with an absolute value of 0.014 or higher about 79% of the times we ran the draft.

Since this is well above any reasonable significance level, we do not have significant evidence to reject the hypothesis that the draft was random.

SLIDE 39

Self-Check

In a dataset of 120 soccer matches played in the Football Association (FA) premier league in Great Britain, the home team won 70 times. Let’s examine whether this data provides evidence of home field advantage.

8. Create a randomization distribution of simulated sample proportions of games won by the home team (you can use the do() * rflip() construction in R we used in Lab 7, or use StatKey) and use it to find a two-tailed P-value.
9. Use a normal distribution as a substitute for the randomization distribution. What should the mean be?
10. What should the standard deviation be?
11. Calculate a two-tailed P-value using the Normal model.

Compare the answer from the normal distribution to what you found from the randomization distribution. Are the results similar?

SLIDE 44

We have a dataset of 500 Atlanta commute times.
The (sample) mean is 29.11 minutes.
Using bootstrapping we estimate that the standard error is 0.93.

Find a 90% confidence interval by approximating the bootstrap distribution with a Normal.

Steps:

1. What mean and standard deviation for the Normal replacement for the bootstrap distribution?
2. What percentiles of the distribution do we need for a 90% confidence level?
3. Consult a N(29.11, 0.93) curve and set the area in the tails to 5% each to get the 5th and 95th percentiles for a 90% interval.

SLIDE 45

StatKey...

SLIDE 46

Left-hand endpoint in R ...

xqnorm(0.05, mean = 29.11, sd = 0.93) # Note the q instead of p

[Plot: N(29.11, 0.93) density with the cutoff marked at z = −1.64; x-axis "x", y-axis "density"]

[1] 27.58029

SLIDE 47

And the Right-hand Endpoint...

xqnorm(0.95, mean = 29.11, sd = 0.93) # Note the q instead of p

[Plot: N(29.11, 0.93) density with the cutoff marked at z = 1.64; x-axis "x", y-axis "density"]

[1] 30.63971
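Both endpoints can be computed in one step, and the same interval can be written as the estimate plus or minus a multiple of the standard error. The sketch below uses base R's `qnorm()` rather than `xqnorm()`; the variable names (`est`, `se`, `level`, `tail_area`, `z_star`) are introduced here for illustration.

```r
est   <- 29.11   # sample mean commute time (minutes)
se    <- 0.93    # bootstrap standard error
level <- 0.90    # confidence level

# Area left in each tail: 5% for a 90% interval
tail_area <- (1 - level) / 2

# 5th and 95th percentiles of the N(29.11, 0.93) curve
ci <- qnorm(c(tail_area, 1 - tail_area), mean = est, sd = se)
ci

# Equivalent form: estimate +/- z* times SE, where z* is about 1.64
z_star <- qnorm(1 - tail_area)
ci_alt <- est + c(-1, 1) * z_star * se
ci_alt
```

Changing `level` to another value, such as 0.95, adjusts the tail areas and the multiplier automatically.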

SLIDE 48

Self-Check

12. Create a bootstrap distribution of sample means using the data on prices (in $1,000’s) of used Mustang cars in MustangPrice (this is in Lock5Data if you want to use R, or you can use StatKey), and find a 94% confidence interval for the population mean price using the appropriate percentiles of the bootstrap distribution.
13. Consider a normal distribution substitute for the bootstrap distribution. Find the percentiles of this Normal distribution that correspond to those you used for the bootstrap distribution to get an alternate 94% confidence interval.

Compare the answer from the normal distribution to what you found from the bootstrap distribution. Are the results similar?

SLIDE 49

Summary

P-values Using a Normal

If we can approximate a randomization distribution with a Normal, we can compute P-values.

Confidence Intervals Using a Normal

If we can approximate a bootstrap distribution with a Normal, we can construct a confidence interval.

The Missing Piece (For Later)

We need to know what standard error to use, since this sets the standard deviation of our Normal replacement for randomization/bootstrap distributions.