

SLIDE 1

Lecture 3: The Normal Distribution and Statistical Inference

Ani Manichaikul amanicha@jhsph.edu 19 April 2007

1 / 62

SLIDE 2

A Review and Some Connections

The Normal Distribution The Central Limit Theorem Estimates of means and proportions: uses and properties Confidence intervals and Hypothesis tests

2 / 62

SLIDE 3

The Normal Distribution

Probability distribution for continuous data
Under certain conditions, can be used to approximate binomial probabilities:

np > 5 and n(1 − p) > 5

Characterized by a symmetric bell-shaped curve (Gaussian curve)

Symmetric about its mean µ

3 / 62

SLIDE 4

Normal Distribution

Takes on values between −∞ and +∞
Mean = Median = Mode
Area under curve equals 1
Parameters:

µ = mean
σ = standard deviation

4 / 62

SLIDE 5

Normal Distribution

[Figure: normal density curve, symmetric about µ, extending from −∞ to +∞]

Notation for a Normal random variable: X ∼ N(µ, σ²)

5 / 62

SLIDE 6

Formula: Normal Distribution

The normal probability distribution is given by:

f(x) = (1 / (σ√(2π))) · e^(−(x − µ)² / (2σ²)),  −∞ < x < +∞

π ≈ 3.14 and e ≈ 2.72 are mathematical constants; µ and σ are the mean and SD parameters of the distribution

6 / 62

SLIDE 7

Standard Normal

The standard normal distribution has parameters µ = 0 and σ = 1. Its density function is written as:

f(x) = (1 / √(2π)) · e^(−x² / 2),  −∞ < x < +∞

We typically use the letter Z to denote a standard normal random variable (Z ∼ N(0, 1)). If X ∼ N(µ, σ²), then

(X − µ) / σ ∼ N(0, 1)

7 / 62

SLIDE 8

68-95-99.7 Rule I

68% of density is within one standard deviation of the mean

8 / 62

SLIDE 9

68-95-99.7 Rule II

95% of density is within two standard deviations of the mean

9 / 62

SLIDE 10

68-95-99.7 Rule III

99.7% of density is within three standard deviations of the mean
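The three rule slides above can be verified numerically. A minimal sketch using Python's standard-library `statistics.NormalDist` (the standard normal is used here, but the rule holds for any µ and σ):

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)   # standard normal

for k in (1, 2, 3):
    # P(mu - k*sigma < X < mu + k*sigma) = Phi(k) - Phi(-k)
    prob = Z.cdf(k) - Z.cdf(-k)
    print(f"within {k} SD: {prob:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```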

10 / 62

SLIDE 11

Different Means

[Figure: three normal density curves centered at µ1, µ2, µ3]

Three normal distributions with different means µ1 < µ2 < µ3

11 / 62

SLIDE 12

Different Standard Deviations

[Figure: three normal density curves with spreads σ1, σ2, σ3]

Three normal distributions with different standard deviations σ1 < σ2 < σ3

12 / 62

SLIDE 13

Standard Normal

[Figure: standard normal density with µ = 0 and σ = 1, shown over the range −4 to 4]

13 / 62

SLIDE 14

Example: Birthweights I

Birthweights (in grams) of infants in a population

14 / 62

SLIDE 15

Example: Birthweights II

Continuous data
Mean = Median = Mode = 3000 = µ
Standard deviation = 1000 = σ
The area under the curve represents the probability (proportion) of infants with birthweights between certain values

15 / 62

SLIDE 16

Normal Probabilities

16 / 62

SLIDE 17

Calculating Probabilities

Equivalent to finding area under the curve
Continuous distribution, so we cannot use sums to find probabilities
Performing the integration is not necessary since tables and computers are available

17 / 62

SLIDE 18

Z Tables

18 / 62

SLIDE 19

Normal Table

19 / 62

SLIDE 20

Looking up z=2.22

20 / 62

SLIDE 21

Looking up z=-0.67

21 / 62

SLIDE 22

Example: Birthweights

22 / 62

SLIDE 23

Question I

What is the probability of an infant weighing more than 5000 g?

P(X > 5000) = P((X − µ)/σ > (5000 − 3000)/1000) = P(Z > 2) = 0.0228

23 / 62

SLIDE 24

Question II

What is the probability of an infant weighing between 2500 and 4000 g?

P(2500 < X < 4000) = P((2500 − 3000)/1000 < (X − µ)/σ < (4000 − 3000)/1000)
= P(−0.5 < Z < 1)
= 1 − P(Z > 1) − P(Z < −0.5)
= 1 − 0.1587 − 0.3085 = 0.5328

24 / 62

SLIDE 25

Question III

What is the probability of an infant weighing less than 3500 g?

P(X < 3500) = P((X − µ)/σ < (3500 − 3000)/1000) = P(Z < 0.5) = 1 − P(Z > 0.5) = 1 − 0.3085 = 0.6915
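All three questions can be reproduced without Z tables; a sketch using `statistics.NormalDist` with the slide's parameters µ = 3000 and σ = 1000:

```python
from statistics import NormalDist

X = NormalDist(mu=3000, sigma=1000)   # birthweight model X ~ N(3000, 1000^2)

p_over_5000 = 1 - X.cdf(5000)            # P(X > 5000) = P(Z > 2)
p_2500_4000 = X.cdf(4000) - X.cdf(2500)  # P(2500 < X < 4000)
p_under_3500 = X.cdf(3500)               # P(X < 3500) = P(Z < 0.5)

print(round(p_over_5000, 4))    # 0.0228
print(round(p_2500_4000, 4))    # 0.5328
print(round(p_under_3500, 4))   # 0.6915
```

The standardization step (X − µ)/σ happens inside `cdf`; the printed values match the table-based answers on the slides.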

25 / 62

SLIDE 26

Statistical Inference

Populations and samples Sampling distributions

26 / 62

SLIDE 27

Definitions

Statistical inference is “the attempt to reach a conclusion concerning all members of a class from observations of only some of them.” (Runes 1959)
A population is a collection of observations
A parameter is a numerical descriptor of a population
A sample is a part or subset of a population
A statistic is a numerical descriptor of the sample

27 / 62

SLIDE 28

Population

Population size = N
µ = mean, a measure of center
σ² = variance, a measure of dispersion
σ = standard deviation

28 / 62

SLIDE 29

Sample Estimates

Sample size = n
X̄ = sample mean
s² = sample variance
s = sample standard deviation
Population: parameters; Sample: statistics

29 / 62

SLIDE 30

Estimating µ

Usually µ is unknown and we would like to estimate it. We use X̄ to estimate µ. We know the sampling distribution of X̄.

30 / 62

SLIDE 31

Sampling Distribution

The distribution of all possible values of some statistic, computed from samples of the same size randomly drawn from the same population, is called the sampling distribution of that statistic

31 / 62

SLIDE 32

Sampling Distribution of X̄

When sampling from a normally distributed population, X̄ will be normally distributed. The mean of the distribution of X̄ is equal to the true mean µ of the population from which the samples were drawn. The variance of the distribution is σ²/n, where σ² is the variance of the population and n is the sample size. We can write:

X̄ ∼ N(µ, σ²/n)

When sampling is from a population whose distribution is not normal and the sample size is large, use the Central Limit Theorem

32 / 62

SLIDE 33

The Central Limit Theorem (CLT)

Given a population of any distribution with mean µ and variance σ², the sampling distribution of X̄, computed from samples of size n from this population, will be approximately N(µ, σ²/n) when the sample size is large. In general, this applies when n ≥ 25. The approximation of normality becomes better as n increases.
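A quick simulation illustrates the theorem. This is a hedged sketch: the exponential population and the constants (n = 50, 2000 replicates, mean 2) are illustrative choices, not from the slides.

```python
import random
from statistics import mean

# Sketch: draw many samples of size n from a skewed (exponential)
# population with mean mu = 2 and check that the sample means cluster
# near mu with variance close to sigma^2 / n (for an exponential,
# sigma equals the mean, so sigma^2 / n = 4 / 50 = 0.08).
random.seed(0)
mu, n, reps = 2.0, 50, 2000

xbars = [mean(random.expovariate(1 / mu) for _ in range(n)) for _ in range(reps)]

grand_mean = mean(xbars)  # should be close to mu = 2
var_xbar = sum((x - grand_mean) ** 2 for x in xbars) / (reps - 1)
print(grand_mean, var_xbar)  # approximately 2.0 and 0.08
```

Plotting a histogram of `xbars` would show the familiar bell shape even though the underlying population is strongly skewed.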

33 / 62

SLIDE 34

What about for Binomial RVs? I

First, recall that a Binomial variable is just the sum of n Bernoulli variables:

Sn = X1 + X2 + · · · + Xn

Notation:

Sn ∼ Binomial(n, p)
Xi ∼ Bernoulli(p) = Binomial(1, p) for i = 1, . . . , n

34 / 62

SLIDE 35

What about for Binomial RVs? II

In this case, we want to estimate p by p̂, where

p̂ = Sn/n = (X1 + · · · + Xn)/n = X̄

p̂ is just a sample mean! So we can use the Central Limit Theorem when n is large

35 / 62

SLIDE 36

Binomial CLT

For a Bernoulli variable

µ = mean = p
σ² = variance = p(1 − p)

X̄ ≈ N(µ, σ²/n) as before. Equivalently,

p̂ ≈ N(p, p(1 − p)/n)
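As a sketch of this approximation (the values n = 100 and p = 0.3 are hypothetical, chosen so that np and n(1 − p) both exceed 5), p̂'s distribution can be approximated directly:

```python
from math import sqrt
from statistics import NormalDist

# Normal approximation to p-hat: p-hat ~ N(p, p(1-p)/n) approximately.
n, p = 100, 0.3
se = sqrt(p * (1 - p) / n)          # SD of p-hat, about 0.046
approx = NormalDist(mu=p, sigma=se)

# Approximate P(p-hat <= 0.25), i.e. 25 or fewer successes out of 100
print(round(approx.cdf(0.25), 4))
```

A continuity correction (evaluating at 0.255 rather than 0.25) would sharpen the approximation, but the slides use the uncorrected form.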

36 / 62

SLIDE 37

Notation I

Often we are interested in detecting a difference between two populations:
Differences in average income by neighborhood
Differences in disease cure rates by age

37 / 62

SLIDE 38

Notation II

Population 1: size = N1, mean = µ1, standard deviation = σ1
Population 2: size = N2, mean = µ2, standard deviation = σ2
Samples of size n1 from Population 1: mean = µ_X̄1 = µ1, standard deviation = σ1/√n1 = σ_X̄1
Samples of size n2 from Population 2: mean = µ_X̄2 = µ2, standard deviation = σ2/√n2 = σ_X̄2

38 / 62

SLIDE 39

Notation III

Now by the CLT, for large n:

X̄1 ∼ N(µ1, σ1²/n1)
X̄2 ∼ N(µ2, σ2²/n2)

and

X̄1 − X̄2 ≈ N(µ1 − µ2, σ1²/n1 + σ2²/n2)

39 / 62

SLIDE 40

Difference in proportions?

We’re done if the underlying variable is continuous. What if the underlying variable is Binomial? Then

X̄1 − X̄2 ≈ N(µ1 − µ2, σ1²/n1 + σ2²/n2)

is replaced by:

p̂1 − p̂2 ≈ N(p1 − p2, p1(1 − p1)/n1 + p2(1 − p2)/n2)

40 / 62

SLIDE 41

Sampling Distributions

Statistic      Mean        Variance
X̄             µ           σ²/n
X̄1 − X̄2      µ1 − µ2     σ1²/n1 + σ2²/n2
p̂             p           pq/n
np̂            np          npq
p̂1 − p̂2      p1 − p2     p1q1/n1 + p2q2/n2

(where q = 1 − p, q1 = 1 − p1, q2 = 1 − p2)

41 / 62

SLIDE 42

Statistical inference

Two methods

Estimation Hypothesis testing

Both make use of sampling distributions Remember to use CLT

42 / 62

SLIDE 43

Estimation

Point estimation:
An estimator of a population parameter: a statistic (e.g. x̄, p̂)
An estimate of a population parameter: the value of the estimator for a particular sample
Interval estimation:
A point estimate plus an interval that expresses the uncertainty or variability associated with the estimate

43 / 62

SLIDE 44

Hypothesis Testing

Given the observed data, do we reject or accept a pre-specified null hypothesis in favor of an alternative? “Significance testing”

44 / 62

SLIDE 45

Point Estimation

X̄ is a point estimator of µ
X̄1 − X̄2 is a point estimator of µ1 − µ2
p̂ is a point estimator of p
p̂1 − p̂2 is a point estimator of p1 − p2
We know the sampling distribution of these statistics, e.g.

X̄ ∼ N(µ_X̄ = µ, σ_X̄ = σ/√n)

If σ is not known, we can use s, the sample standard deviation, as a point estimator of σ

45 / 62

SLIDE 46

Interval Estimation

100(1 − α)% confidence interval: estimate ± (tabled value of z or t) · (standard error). Plugging in the values, we get

X̄ ± z(α/2) · σ_X̄ = [L, U]

46 / 62

SLIDE 47

Confidence Interval

We are saying that

P(−z(α/2) ≤ Z ≤ z(α/2)) = 1 − α
P(−z(α/2) ≤ (X̄ − µ)/σ_X̄ ≤ z(α/2)) = 1 − α
P(−z(α/2) · σ_X̄ ≤ X̄ − µ ≤ z(α/2) · σ_X̄) = 1 − α

After some algebra:

P(X̄ − z(α/2) · σ_X̄ ≤ µ ≤ X̄ + z(α/2) · σ_X̄) = 1 − α
P(L ≤ µ ≤ U) = 1 − α

47 / 62

SLIDE 48

CI for mean

A confidence interval for µ is given by the interval estimate

X̄ ± z(α/2) · σ_X̄

when the population variance σ² is known
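This interval is easy to compute; a minimal sketch, with hypothetical numbers (n = 25 birthweights, x̄ = 3100, σ = 1000 assumed known):

```python
from math import sqrt
from statistics import NormalDist

n, xbar, sigma = 25, 3100.0, 1000.0   # hypothetical sample summary

z = NormalDist().inv_cdf(0.975)       # z(alpha/2) for alpha = 0.05, ~1.96
se = sigma / sqrt(n)                  # sigma_xbar, the SE of the mean
L, U = xbar - z * se, xbar + z * se
print(round(L, 1), round(U, 1))       # prints 2708.0 3492.0
```

The interval width 2 · z(α/2) · σ/√n shrinks like 1/√n: quadrupling the sample size halves the width.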

48 / 62

SLIDE 49

Interpretation

Before the data are observed, the probability is at least (1 − α) that [L, U] will contain µ, the population parameter. In repeated sampling from a normally distributed population, 100(1 − α)% of all intervals of the form above will include the population mean µ. After the data are observed, the constructed interval [L, U] either contains the true mean or it does not (no probability involved anymore).

49 / 62

SLIDE 50

Known Variance

Sampling from a normally distributed population with known variance (σ² known). Confidence interval:

X̄ ± z(α/2) · σ_X̄

What if σ² is unknown?

50 / 62

SLIDE 51

The t-distribution

[Figure: t densities with df = 2, 5, and 20]

t = (X̄ − µ) / (s/√n)

51 / 62

SLIDE 52

Use Sample Variance I

Sampling from a normally distributed population with population variance unknown. We can make use of the sample variance s². Now we construct the confidence interval as:

X̄ ± z(α/2) · s_X̄ when n is “large”
X̄ ± t(α/2, n−1) · s_X̄ when n is “small”

52 / 62

SLIDE 53

Use Sample Variance II

Estimate σ² with s². Here, s_X̄ = s/√n and t(α/2) has n − 1 degrees of freedom. The distribution of X̄ is not quite normal, so we need the t-distribution
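A sketch of the small-n t-interval with hypothetical data. The critical value t(0.025, 9) ≈ 2.262 is hardcoded here; in practice a t table (or `scipy.stats.t.ppf`) would supply it for the relevant degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical birthweight sample, n = 10
data = [2900, 3100, 3300, 2800, 3500, 3000, 3200, 2700, 3400, 3100]
n = len(data)
xbar, s = mean(data), stdev(data)   # stdev uses the n - 1 denominator
t_crit = 2.262                      # t(0.025, n-1) for n - 1 = 9 df

se = s / sqrt(n)                    # s_xbar = s / sqrt(n)
L, U = xbar - t_crit * se, xbar + t_crit * se
print(round(xbar, 1), round(L, 1), round(U, 1))
```

With the same data, using 1.96 in place of 2.262 would give a deceptively narrow interval; the heavier t tails pay for estimating σ with s.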

53 / 62

SLIDE 54

Properties of the t-distribution

Mean = median = mode = 0
Symmetric about the mean
t ranges from −∞ to +∞
Family of distributions determined by n − 1, the degrees of freedom
The t distribution approaches the normal distribution as n − 1 approaches ∞

54 / 62

SLIDE 55

Comparing t with normal

[Figure: standard normal density compared with a t density with df = 2; the t curve has heavier tails]

55 / 62

SLIDE 56

Confidence intervals for means

Population     Sample   Population             95% Confidence
Distribution   Size     Variance               Interval
Normal         Any      σ² known               X̄ ± 1.96 σ/√n
               Any      σ² unknown, use s²     X̄ ± t(0.025, n−1) s/√n
Not Normal/    Large    σ² known               X̄ ± 1.96 σ/√n
Unknown        Large    σ² unknown, use s²     X̄ ± 1.96 s/√n
               Small    Any                    Non-parametric methods
Binomial       Large                           p̂ ± 1.96 √(p̂(1 − p̂)/n)
               Small                           Exact methods

56 / 62

SLIDE 57

Confidence Intervals for Differences in Means

This is a bit tricky. Recall that formulas for CIs for a single mean depend on:

whether or not σ² is known
the sample size

For a difference in means, the formula for a CI depends on:

whether or not the variances are assumed to be equal, when variances are unknown
sample sizes in each group

57 / 62

SLIDE 58

Equal Variances I

When variances are assumed to be equal, the standard error of the difference is estimated by:

√(s_p²/n1 + s_p²/n2)

Here, s_p² is the pooled variance

58 / 62

SLIDE 59

Equal Variances II

s_p² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)

where df = n1 + n2 − 2. Recall, n1 is the size of sample 1, and n2 is the size of sample 2
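A minimal sketch of the pooled calculation. The sample summaries (n1 = 12, s1² = 9.0; n2 = 15, s2² = 12.0) are hypothetical:

```python
from math import sqrt

n1, s2_1 = 12, 9.0    # hypothetical group 1: size and sample variance
n2, s2_2 = 15, 12.0   # hypothetical group 2

# Pooled variance: a df-weighted average of the two sample variances
s2_p = ((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / (n1 + n2 - 2)
df = n1 + n2 - 2

se_diff = sqrt(s2_p / n1 + s2_p / n2)  # SE of (xbar1 - xbar2)
print(round(s2_p, 3), df, round(se_diff, 3))
```

Note that s_p² always lands between s1² and s2², closer to the variance from the larger sample.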

59 / 62

SLIDE 60

Unequal Variances

When variances are assumed to be unequal, the standard error of the difference is estimated by:

√(s1²/n1 + s2²/n2)

Here, df = ν, where

ν = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]
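The degrees-of-freedom formula above (the Satterthwaite approximation) is mechanical to apply; a sketch with hypothetical summaries (n1 = 10, s1² = 4.0; n2 = 20, s2² = 16.0):

```python
# Satterthwaite degrees of freedom for the unequal-variance case
n1, s2_1 = 10, 4.0    # hypothetical group 1: size and sample variance
n2, s2_2 = 20, 16.0   # hypothetical group 2

v1, v2 = s2_1 / n1, s2_2 / n2   # per-group variance of the sample mean
nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
print(round(nu, 2))
```

ν is generally not an integer (software interpolates or rounds down for tables), and it always falls between min(n1, n2) − 1 and n1 + n2 − 2.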

60 / 62

SLIDE 61

Confidence intervals for difference of means

Population     Sample   Population            95% Confidence
Distribution   Size     Variances             Interval
Normal         Any      known                 (X̄1 − X̄2) ± 1.96 √(σ1²/n1 + σ2²/n2)
               Any      unknown, σ1² = σ2²    (X̄1 − X̄2) ± t(0.025, n1+n2−2) √(s_p²/n1 + s_p²/n2)
               Any      unknown, σ1² ≠ σ2²    (X̄1 − X̄2) ± t(0.025, ν) √(s1²/n1 + s2²/n2)
Not Normal/    Large    known                 (X̄1 − X̄2) ± 1.96 √(σ1²/n1 + σ2²/n2)
Unknown        Large    unknown, σ1² = σ2²    (X̄1 − X̄2) ± 1.96 √(s_p²/n1 + s_p²/n2)
               Large    unknown, σ1² ≠ σ2²    (X̄1 − X̄2) ± 1.96 √(s1²/n1 + s2²/n2)
               Small    Any                   Non-parametric methods

61 / 62

SLIDE 62

Confidence intervals for difference of proportions

Population     Sample   95% Confidence
Distribution   Size     Interval
Binomial       Large    (p̂1 − p̂2) ± 1.96 √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)
               Small    Exact methods
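The large-sample interval in the table above can be sketched directly; the counts here are hypothetical (40/200 cured in group 1, 25/250 in group 2):

```python
from math import sqrt

x1, n1 = 40, 200   # hypothetical successes / sample size, group 1
x2, n2 = 25, 250   # hypothetical successes / sample size, group 2

p1, p2 = x1 / n1, x2 / n2
# SE of (p1-hat - p2-hat), each proportion contributing p(1-p)/n
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
L, U = (p1 - p2) - 1.96 * se, (p1 - p2) + 1.96 * se
print(round(L, 4), round(U, 4))
```

With these counts the interval excludes 0, so at the 5% level the two cure rates would be judged different; this is the CI/test duality the lecture builds toward.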

62 / 62