

SLIDE 1

Interval Estimation

Edwin Leuven

SLIDE 2

Interval estimation

While an estimator may be unbiased or consistent, a given estimate will never equal the true value:

◮ this point estimate does not give a sense of "closeness", while
◮ the variance estimate does not give a sense of location

We could try to combine both in statements like "We are confident that θ lies somewhere between … and …", where we would like to

1. give a specific interval, and
2. be precise about how confident we are

This is the aim of a confidence interval (CI), which is a particular type of probability interval.

SLIDE 3

Interval estimation

Formally we define a confidence interval as follows:

Pr(L̂ < θ < Û) = 1 − α

where we construct estimates of a lower bound L̂ and an upper bound Û such that the interval [L̂, Û] covers the parameter of interest with probability 1 − α.

We call 1 − α the confidence level.

CIs are random intervals because they differ across random samples. The confidence level is thus a probability relative to the sampling distribution!

SLIDE 4

Probability intervals

We will consider probability intervals Pr(a < X < b) for a continuous r.v. X with density f(x) and support on the real line. This equals

Pr(a < X < b) = ∫_a^b f(x) dx = ∫_{−∞}^b f(x) dx − ∫_{−∞}^a f(x) dx = F(b) − F(a)

Note that since X is continuous, Pr(X = x) = 0 and

Pr(a < X < b) = Pr(a ≤ X < b) = Pr(a < X ≤ b) = Pr(a ≤ X ≤ b)

SLIDE 5

Probability intervals (X ∼ χ2(3))

[Figure: density of X ∼ χ²(3); the shaded area between a and b equals Pr(a < X < b) = F(b) − F(a)]

SLIDE 6

Probability intervals

We often need to compute either

◮ p = F(a) ≡ Pr(X ≤ a), or
◮ a = F⁻¹(p)

In R we can do this using the pxxx and qxxx functions, e.g. for the normal distribution:

pnorm(1.96)
## [1] 0.9750021
qnorm(0.975)
## [1] 1.959964
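As a sketch, the same pxxx idea recovers the shaded-area probability from the earlier χ²(3) figure; the endpoints a = 1 and b = 6 here are illustrative choices, not from the slides.

```r
# Pr(a < X < b) = F(b) - F(a) for X ~ chi-squared(3),
# computed with the CDF function pchisq
a <- 1; b <- 6
p <- pchisq(b, df = 3) - pchisq(a, df = 3)
p
```

The result can be cross-checked by numerically integrating the density dchisq over (a, b).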

SLIDE 7

What is the probability that our estimator is close to θ?

We can write this as:

Pr(|θ̂ − θ| < ε) = Pr(θ − ε < θ̂ < θ + ε)

[Figure: sampling density of θ̂, with the area between θ − ε and θ + ε shaded]

SLIDE 8

Interval estimation

The probability that our estimator is no further than ε from θ equals

Pr(θ − ε < θ̂ < θ + ε)

which is the probability that the r.v. θ̂ is in a fixed interval with unknown boundaries.

Note though that we can rewrite this as follows:

Pr(θ − ε < θ̂ < θ + ε) = Pr(θ̂ − ε < θ < θ̂ + ε)

which is the probability that the random interval (θ̂ − ε, θ̂ + ε) covers the fixed number θ.

How do we construct such intervals and compute their corresponding confidence levels?

SLIDE 9

CI for the mean – Normal data, variance known

Let X ∼ N(µ, σ²); then X̄ ∼ N(µ, σ²/n).

Now consider taking a random sample of size n; then

Pr(µ − ε < X̄ < µ + ε) = Pr(−ε/(σ/√n) < (X̄ − µ)/(σ/√n) < ε/(σ/√n))
                      = Φ(ε/(σ/√n)) − Φ(−ε/(σ/√n))
                      = 2Φ(ε/(σ/√n)) − 1 = 1 − α

For a given confidence level 1 − α we get the following ε:

ε = z_{1−α/2} · σ/√n, where z_{1−α/2} = Φ⁻¹(1 − α/2)

SLIDE 10

CI for the mean – Normal data, variance known

[Figure: standard normal CDF Φ(x), with quantiles z_{α/2} and z_{1−α/2} cutting off probability mass α/2 in each tail]

SLIDE 11

CI for the mean – Normal data, variance known

Since Pr(µ − ε < X̄ < µ + ε) = Pr(X̄ − ε < µ < X̄ + ε), the following is a (1 − α)100% CI:

(X̄ − z_{1−α/2} · σ/√n, X̄ + z_{1−α/2} · σ/√n)

For a given sample, µ will be either inside or outside this interval. But before drawing the sample, there is a (1 − α)100% chance that an interval constructed this way will cover the true parameter µ.

SLIDE 12

CI for the mean – Normal data, variance known

For example, if we set the confidence level at 1 − α = 0.90, then

z_{1−0.10/2} = Φ⁻¹(0.95) = −Φ⁻¹(0.05) ≈ 1.645

qnorm(.95)
## [1] 1.6448536

With n = 10 random draws from X ∼ N(µ, 1):

mean(rnorm(10))
## [1] -0.38315741

we get the following 90% confidence interval:

(−0.38 − 1.645 · 1/√10, −0.38 + 1.645 · 1/√10) ≈ (−0.90, 0.14)

SLIDE 13

CI for the mean – Normal data, variance known

We know that we need to cover the true parameter 90% of the time:

n = 10; nrep = 1e5; z = qnorm(0.95)
cover = rep(FALSE, nrep)
for(i in 1:nrep) {
  x = rnorm(n, 0, 1)
  m = mean(x); se = 1 / sqrt(n)
  ci0 = m - z * se; ci1 = m + z * se
  cover[i] = ci0 < 0 & 0 < ci1
}
mean(cover)
## [1] 0.90217

SLIDE 14

90% CI for the mean, n = 10

[Figure: 50 simulated CIs plotted against sample number; intervals that miss the true mean stand out]

SLIDE 15

90% CI for the mean, n = 40

[Figure: 50 simulated CIs plotted against sample number; intervals that miss the true mean stand out]

SLIDE 16

90% CI for the mean, n = 160

[Figure: 50 simulated CIs plotted against sample number; intervals that miss the true mean stand out]

SLIDE 17

90% CI for the mean, n = 10

[Figure: 50 simulated CIs plotted against sample number; intervals that miss the true mean stand out]

SLIDE 18

95% CI for the mean, n = 10

[Figure: 50 simulated CIs plotted against sample number; intervals that miss the true mean stand out]

SLIDE 19

99% CI for the mean, n = 10

[Figure: 50 simulated CIs plotted against sample number; intervals that miss the true mean stand out]

SLIDE 20

Computing sample size

Suppose you plan to collect data and want to know the sample size needed to achieve a certain level of confidence in your interval estimate.

Since ε = z_{1−α/2} · σ/√n, solving for n we obtain

n = (z_{1−α/2} · σ / ε)²

Note that we need a larger sample (n increases) if

◮ we require greater precision (ε decreases)
◮ we want to be more confident (α decreases)
◮ there is more dispersion in the population (σ increases)
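The formula above can be sketched as a small helper in R; the function name and the example values are illustrative, not from the slides, and the result is rounded up since n must be an integer.

```r
# n = (z_{1-alpha/2} * sigma / eps)^2, rounded up to an integer
ci_sample_size <- function(eps, sigma, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  ceiling((z * sigma / eps)^2)
}
ci_sample_size(eps = 0.5, sigma = 2)                # 95% confidence
ci_sample_size(eps = 0.5, sigma = 2, alpha = 0.10)  # 90%: smaller n suffices
```

Lowering the confidence level shrinks the required n, while a larger σ or a smaller ε raises it, matching the bullets above.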

SLIDE 21

CI for the Variance – Normal data

The sample variance

S² = 1/(n − 1) · Σ_{i=1}^n (X_i − X̄)²

is our estimator for the population variance. When the X_i follow a normal distribution, then (n − 1)S²/σ² follows a so-called Chi-squared distribution with n − 1 degrees of freedom:

Chi-squared distribution
If Z_i ∼ N(0, 1) and V = Σ_{i=1}^k Z_i², then V ∼ χ²(k), where k are the degrees of freedom. E[V] = k and Var(V) = 2k.

SLIDE 22

CI for the Variance – χ2 distribution

[Figure: densities of χ²(1), χ²(2), χ²(3) and χ²(9)]

SLIDE 23

CI for the Variance – Normal data

Because the Chi-squared distribution is asymmetric, we need to make sure that we set the boundaries of the CI such that we have α/2 probability mass on each side:

Pr(c_{n−1, .025} < (n − 1)S²/σ² < c_{n−1, .975}) = 0.95

where the quantile c_{n−1, p} can be computed in R using qchisq(p, k). We can rewrite the above as

Pr((n − 1)S²/c_{n−1, .975} < σ² < (n − 1)S²/c_{n−1, .025}) = 0.95
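Applied to a single sample, the interval looks as follows; the seed and the parameters of the simulated sample are illustrative choices, not from the slides.

```r
# 95% CI for sigma^2 from one normal sample of size n
set.seed(1)                        # illustrative seed
n  <- 30
x  <- rnorm(n, mean = 0, sd = 2)   # true variance is 4
s2 <- var(x)
ci <- c((n - 1) * s2 / qchisq(.975, n - 1),
        (n - 1) * s2 / qchisq(.025, n - 1))
ci
```

Note that, unlike the CI for the mean, this interval is not symmetric around the point estimate s2, reflecting the asymmetry of the χ² distribution.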

SLIDE 24

CI for the Variance – Normal data

[Figure: sampling density of S²]

SLIDE 25

CI for the Variance – Normal data

If α = .05 we know that we need to cover the true parameter 95% of the time:

n = 4; nrep = 1e5
cover = rep(FALSE, nrep)
for(i in 1:nrep) {
  v = var(rnorm(n, 1, 10))
  ci0 = (n - 1) * v / qchisq(.975, n - 1)
  ci1 = (n - 1) * v / qchisq(.025, n - 1)
  cover[i] = ci0 < 100 & 100 < ci1
}
mean(cover)
## [1] 0.94955

SLIDE 26

CI for the mean – Normal data, variance unknown

We considered X ∼ N(µ, σ²) and assumed we knew σ². In practice we probably don't know σ², but we have an estimator

S² = 1/(n − 1) · Σ_{i=1}^n (X_i − X̄)²

A simple solution is to replace σ with S:

(x̄ − z_{1−α/2} · S/√n, x̄ + z_{1−α/2} · S/√n)

But how does this work?

SLIDE 27

CI for the mean – Normal data, variance unknown

If α = .1 we know that we need to cover the true parameter 90% of the time:

n = 10; nrep = 1e5; z = qnorm(.95)
cover = rep(FALSE, nrep)
for(i in 1:nrep) {
  x = rnorm(n, 1, 10)
  m = mean(x); se = sd(x) / sqrt(n)
  ci0 = m - z * se; ci1 = m + z * se
  cover[i] = ci0 < 1 & 1 < ci1
}
mean(cover)
## [1] 0.86477

SLIDE 28

Student’s t-distribution

It turns out that Z = (X̄ − µ)/(S/√n) does not follow a Normal but a so-called t-distribution with n − 1 degrees of freedom.

t distribution
If Z ∼ N(0, 1) and V ∼ χ²(k), Z and V are independent, and T = Z/√(V/k), then T ∼ t(k), where k are the degrees of freedom. E[T] = 0 (for k > 1) and Var(T) = k/(k − 2) (for k > 2).

SLIDE 29

Student’s t-distribution

[Figure: densities of N(0, 1) = t(∞), t(4) and t(1); the t densities have heavier tails]

SLIDE 30

Student’s t-distribution

Quantiles t_{1−α/2}(k) compared with the Normal quantiles; the t quantiles shrink toward the Normal ones as k grows (standard table values):

k (degrees of freedom)   t_{0.95}(k)   t_{0.975}(k)   t_{0.995}(k)
2                          2.920         4.303          9.925
5                          2.015         2.571          4.032
10                         1.812         2.228          3.169
20                         1.725         2.086          2.845
50                         1.676         2.009          2.678
∞ (Normal)                 1.645         1.960          2.576

SLIDE 31

CI for the mean – Normal data, variance unknown

We know that we need to cover the true parameter 95% of the time:

n = 10; nrep = 1e5; z = qt(.975, n - 1)
cover = rep(FALSE, nrep)
for(i in 1:nrep) {
  x = rnorm(n, 1, 10)
  m = mean(x); se = sd(x) / sqrt(n)
  ci0 = m - z * se; ci1 = m + z * se
  cover[i] = ci0 < 1 & 1 < ci1
}
mean(cover)
## [1] 0.95102

SLIDE 32

Computing sample size – unknown variance

We saw that

n = z_{1−α/2}² · σ² / ε²

This means that without knowing the population variance σ² we cannot set the sample size.

When X is Bernoulli we know that Var(X) = p(1 − p); while this depends on the unknown p, we do know that 0 ≤ p(1 − p) ≤ 0.25. This means that

n = z_{1−α/2}² · σ² / ε² ≤ z_{1−α/2}² · 0.25 / ε²
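As a worked example of the worst-case bound, take a desired half-width of ε = 0.03 at 95% confidence (an illustrative ε, not from the slides):

```r
# Worst-case sample size for a proportion, using p(1 - p) <= 0.25
eps   <- 0.03
z     <- qnorm(.975)
n_max <- ceiling(z^2 * 0.25 / eps^2)
n_max                            # on the order of a thousand observations
```

This bound is conservative: if p is known to be far from 0.5, p(1 − p) is smaller and a smaller sample suffices.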

SLIDE 33

CI for the mean – Non Normal data

Up until now we assumed that our data came from a Normal distribution. It turns out that as long as our sample is large enough, the Normal distribution is a good approximation thanks to the Central Limit Theorem.

Central Limit Theorem
Let X_1, …, X_n be i.i.d. random variables with E[X_i] = µ and Var(X_i) = σ² < ∞; then (X̄ − µ)/(σ/√n) → N(0, 1) in distribution.

Consider the sampling distribution of X̄ when X ∼ χ²(3).
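The CLT claim can be checked by simulation: standardized means of χ²(3) draws should behave like N(0, 1) draws. The sample sizes and seed below are illustrative choices, not from the slides.

```r
# Standardized means of chi-squared(3) samples vs. N(0, 1)
set.seed(3)                        # illustrative seed
n <- 200; nrep <- 1e4
mu <- 3; sigma2 <- 6               # E[X] and Var(X) for X ~ chi-squared(3)
z  <- replicate(nrep, (mean(rchisq(n, df = 3)) - mu) / sqrt(sigma2 / n))
c(mean(z), var(z))                 # close to 0 and 1
mean(abs(z) < qnorm(.975))         # close to 0.95
```

Even though each X_i is strongly skewed, the standardized mean is already close to normal at this n, which is what justifies normal-based CIs for non-normal data.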

SLIDE 34

CI for the mean – Non Normal data

[Figure: sampling distribution of X̄ for X ∼ χ²(3), centered at E[X]]

SLIDE 35

CI’s for statistics other than the mean

Beyond the scope of this course, but some pointers:

◮ when sampling distributions are known, use these, otherwise:

CIs when n is large
◮ use the bootstrap

CIs when n is small
◮ rely on non-parametric or permutation tests

SLIDE 36

Summary

Confidence intervals (CIs) are random intervals that cover the true parameter with a given probability. We call this probability the confidence level; 0.95 is commonly used.

Pay attention to the interpretation!

◮ Before drawing the sample, there is a (1 − α)100% chance that an interval constructed this way will cover the true parameter.

We saw how to construct CIs for the mean. For a given confidence level we can set CI widths by choosing the appropriate sample sizes.