SLIDE 1

Gov 2000: 4. Sums, Means, and Limit Theorems

Matthew Blackwell

Fall 2016

1 / 60

SLIDE 2
  • 1. Sums and Means of Random Variables
  • 2. Useful Inequalities
  • 3. Law of Large Numbers
  • 4. Central Limit Theorem
  • 5. More Exotic CLTs*
  • 6. Wrap-up

2 / 60

SLIDE 3

Where are we? Where are we going?

  • Probability: a formal way to quantify uncertain outcomes/random variables.
  • Last week: how to work with multiple r.v.s at the same time.
  • This week: applying those ideas to study large random samples.

3 / 60

SLIDE 4

Large random samples

  • In real data, we will have a set of n measurements on a variable: X_1, X_2, …, X_n
  • Or we might have a set of n measurements on two variables: (X_1, Y_1), (X_2, Y_2), …, (X_n, Y_n)
  • Empirical analyses: sums or means of these n measurements
    ▶ Almost all statistical procedures involve a sum/mean.
    ▶ What are the properties of these sums and means?
    ▶ Can they tell us anything about the distribution of X_i?
  • Asymptotics: what can we learn as n gets big?

4 / 60

SLIDE 5

1/ Sums and Means of Random Variables

5 / 60

SLIDE 6

Sums and means are random variables

  • If X_1 and X_2 are r.v.s, then X_1 + X_2 is a r.v.
    ▶ It has a mean E[X_1 + X_2] and a variance V[X_1 + X_2]
  • The sample mean is a function of sums, so it is a r.v. too:

    X̄ = (X_1 + X_2)/2

6 / 60

SLIDE 7

Distribution of sums/means

            X_1   X_2   X_1 + X_2    X̄
  draw 1     20    71       91      45.5
  draw 2     12    66       78      39
  draw 3     59    75      134      67
  draw 4      3    58       61      30.5
    ⋮         ⋮     ⋮        ⋮        ⋮

  • distribution of the sum
  • distribution of the mean

7 / 60

SLIDE 8

Independent and identical r.v.s

  • We often will work with independent and identically distributed r.v.s, X_1, …, X_n
    ▶ Random sample of n respondents on a survey question.
    ▶ Written "i.i.d."
  • Independent: X_i ⊥ X_j for all i ≠ j
  • Identically distributed: f_{X_i}(x) is the same for all i
    ▶ E[X_i] = μ for all i
    ▶ V[X_i] = σ² for all i

8 / 60

SLIDE 9

Distribution of the sample mean

  • Sample mean of i.i.d. r.v.s: X̄_n = (1/n) Σ_{i=1}^n X_i
  • X̄_n is a random variable; what is its distribution?
    ▶ What is the expectation of this distribution, E[X̄_n]?
    ▶ What is the variance of this distribution, V[X̄_n]?
    ▶ What is the p.d.f. of the distribution?
  • How do they relate to the expectation and variance of X_1, …, X_n?

9 / 60

SLIDE 10

Properties of the sample mean

Mean and variance of the sample mean

Suppose that X_1, …, X_n are i.i.d. r.v.s with E[X_i] = μ and V[X_i] = σ². Then:

  E[X̄_n] = μ    V[X̄_n] = σ²/n

  • Key insights:
    ▶ The sample mean gets the right answer on average.
    ▶ The variance of X̄_n depends on the variance of X_i and on the sample size.
    ▶ Neither depends on the (full) distribution of X_i!
  • Standard error of the sample mean: √V[X̄_n] = σ/√n
  • You'll prove both of these facts in this week's HW.

10 / 60
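These two facts are easy to check by simulation. A minimal sketch (my own example, not from the slides), drawing X_i ∼ Uniform(0, 1) so that μ = 0.5 and σ² = 1/12:

```r
## Simulation check of E[Xbar_n] = mu and V[Xbar_n] = sigma^2 / n.
## X_i ~ Uniform(0, 1), so mu = 0.5 and sigma^2 = 1/12.
set.seed(42)
n <- 25
nsims <- 100000
xbars <- replicate(nsims, mean(runif(n)))

mean(xbars)  # close to mu = 0.5
var(xbars)   # close to sigma^2 / n = (1/12) / 25
```

Any distribution with a finite variance would work here; the uniform just makes μ and σ² easy to compute by hand.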

SLIDE 11

2/ Useful Inequalities

11 / 60

SLIDE 12

Why inequalities?

  • The behavior of r.v.s depends on their distribution, but we often don't know (or don't want to assume) a distribution.
  • Today, we'll discuss results that hold for r.v.s with any distribution, subject to some restrictions like finite variance.
  • Why study these?
    ▶ Build toward massively important results like the LLN.
    ▶ Inequalities are used regularly throughout statistics.
    ▶ Gives us some practice with proofs/analytic reasoning.

12 / 60

SLIDE 13

Markov Inequality

Suppose that X is a r.v. such that P(X ≥ 0) = 1. Then, for every real number t > 0,

  P(X ≥ t) ≤ E[X]/t.

  • For instance, if we know that E[X] = 1, then P(X ≥ 100) ≤ 0.01
  • Once we know the mean of a r.v., it limits how much probability can be in the tail.

13 / 60
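A quick simulation makes the bound concrete (my own example, not on the slides): for X ∼ Exponential(rate = 1), E[X] = 1, so Markov says P(X ≥ t) ≤ 1/t for every t > 0.

```r
## Markov's inequality in action: X ~ Exponential(rate = 1), E[X] = 1,
## so the bound is P(X >= t) <= 1/t.
set.seed(123)
x <- rexp(100000, rate = 1)
t <- 3
mean(x >= t)  # empirical tail probability, about exp(-3) = 0.0498
1 / t         # Markov bound, about 0.333 -- valid but loose
```

The bound holds, but is far from tight: knowing only the mean, Markov must cover the worst case over all nonnegative distributions.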

SLIDE 14

Markov Inequality Proof

  • For discrete X:

    E[X] = Σ_x x f_X(x) = Σ_{x<t} x f_X(x) + Σ_{x≥t} x f_X(x)

  • Because X is nonnegative, E[X] ≥ Σ_{x≥t} x f_X(x)
  • Since x ≥ t in that sum, Σ_{x≥t} x f_X(x) ≥ Σ_{x≥t} t f_X(x)
  • But this is just Σ_{x≥t} t f_X(x) = t Σ_{x≥t} f_X(x) = t P(X ≥ t)
  • Implies E[X] ≥ t P(X ≥ t); dividing both sides by t gives P(X ≥ t) ≤ E[X]/t

14 / 60

SLIDE 15

Chebyshev Inequality

Suppose that X is a r.v. for which V[X] < ∞. Then, for every real number t > 0,

  P(|X − E[X]| ≥ t) ≤ V[X]/t².

  • The variance places limits on how likely an observation is to be far from its mean.

15 / 60

SLIDE 16

Proof of Chebyshev

  • Let Z = (X − E[X])²
    ▶ Z is nonnegative: P(Z ≥ 0) = 1
    ▶ E[Z] = E[(X − E[X])²] = V[X] (definition of variance)
  • Note that |X − E[X]| ≥ t exactly when Z ≥ t², because both sides are nonnegative and squaring preserves the ordering.
  • Thus, P(|X − E[X]| ≥ t) = P(Z ≥ t²)
  • Apply Markov's inequality:

    P(|X − E[X]| ≥ t) = P(Z ≥ t²) ≤ E[Z]/t² = V[X]/t²

16 / 60
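As with Markov, Chebyshev is valid but conservative. A small check (my own example, not on the slides): for X ∼ Uniform(0, 1), E[X] = 1/2 and V[X] = 1/12, so the bound is P(|X − 1/2| ≥ t) ≤ 1/(12t²).

```r
## Chebyshev's inequality in action: X ~ Uniform(0, 1), so
## P(|X - 1/2| >= t) <= (1/12) / t^2.
set.seed(7)
x <- runif(100000)
t <- 0.4
mean(abs(x - 0.5) >= t)  # true probability is 0.2
(1/12) / t^2             # Chebyshev bound, about 0.52
```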

SLIDE 17

Application: planning a survey

  • Suppose we want to estimate the proportion of voters who will vote for Donald Trump, p, from a random sample of size n.
    ▶ X_1, X_2, …, X_n indicate each respondent's voting intention for Trump.
    ▶ By our earlier calculation, E[X̄_n] = p and V[X̄_n] = σ²/n
    ▶ Since each X_i is a Bernoulli r.v., we have σ² = p(1 − p)
  • What does n need to be to have at least 0.95 probability that X̄_n is within 0.02 of the true p?
    ▶ How can we guarantee a margin of error of ±2 percentage points?

17 / 60

SLIDE 18

Application: planning a survey

  • What does n have to be so that

    P(|X̄_n − p| ≤ 0.02) ≥ 0.95 ⟺ P(|X̄_n − p| ≥ 0.02) ≤ 0.05

  • Applying Chebyshev:

    P(|X̄_n − p| ≥ 0.02) ≤ V[X̄_n]/0.02² = p(1 − p)/(0.0004n)

  • We don't know V[X_i] = p(1 − p), but:
    ▶ It is conservative to use the largest possible variance.
    ▶ It can't be bigger than p(1 − p) ≤ (1/2)·(1/2) = 1/4

    P(|X̄_n − p| ≥ 0.02) ≤ p(1 − p)/(0.0004n) ≤ 1/(0.0016n)

  • We want this probability to be bounded by 0.05, so we need 1/(0.0016n) ≤ 0.05, which gives us n ≥ 12,500!

18 / 60
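The last step's arithmetic can be checked in one line (my own check; `moe` and `alpha` are names I introduce here for the margin of error and the allowed tail probability):

```r
## Chebyshev sample-size bound: worst-case variance 1/4, margin of
## error 0.02, tail probability 0.05, so n >= (1/4) / (moe^2 * alpha).
moe <- 0.02
alpha <- 0.05
n_cheb <- (1/4) / (moe^2 * alpha)
n_cheb  # 12500
```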

SLIDE 19

Application: planning a survey

  • Do we really need n ≥ 12,500 to get a margin of error of ±2 percentage points?
  • No! Chebyshev provides a bound that is guaranteed to hold, but the actual probabilities are much smaller.
    ▶ We're also using the "worst-case" variance of 0.25.
  • Let's simulate 1000 samples of size n = 12,500 with p = 0.4 and show the distribution of the means.
    ▶ What proportion of these are within 0.02 of p?

19 / 60

SLIDE 20

Application: planning a survey

nsims <- 1000
holder <- rep(NA, times = nsims)
for (i in 1:nsims) {
  this.samp <- rbinom(n = 12500, size = 1, prob = 0.4)
  holder[i] <- mean(this.samp)
}
mean(abs(holder - 0.4) > 0.02)
## [1] 0

[Figure: density of X̄_n − p across the 1000 simulated samples]

20 / 60

SLIDE 21

3/ Law of Large Numbers

21 / 60

SLIDE 22

Current knowledge

  • For i.i.d. r.v.s X_1, …, X_n with E[X_i] = μ and V[X_i] = σ², we know that:
    ▶ Expectation is E[X̄_n] = E[X_i] = μ
    ▶ Variance is V[X̄_n] = σ²/n, where σ² = V[X_i]
    ▶ Some bounds on tail probabilities from Chebyshev.
    ▶ None of these rely on a specific distribution for X_i!
  • Can we say more about the distribution of the sample mean?
  • Yes, but we need to think about how X̄_n changes as n gets big.

22 / 60

SLIDE 23

Sequence of sample means

  • What can we say about the sample mean as n gets large?
  • Need to think about sequences of sample means with increasing n:

    X̄_1 = X_1
    X̄_2 = (1/2)·(X_1 + X_2)
    X̄_3 = (1/3)·(X_1 + X_2 + X_3)
    X̄_4 = (1/4)·(X_1 + X_2 + X_3 + X_4)
    X̄_5 = (1/5)·(X_1 + X_2 + X_3 + X_4 + X_5)
    ⋮
    X̄_n = (1/n)·(X_1 + X_2 + X_3 + X_4 + X_5 + ⋯ + X_n)

  • Note: this is a sequence of random variables!

23 / 60

SLIDE 24

Convergence in Probability

A sequence of random variables Z_1, Z_2, … is said to converge in probability to a value a if, for every ε > 0,

  P(|Z_n − a| > ε) → 0 as n → ∞.

We write this Z_n →p a.

  • Basically: the probability that Z_n lies outside any (teeny, tiny) interval around a approaches 0 as n → ∞
  • Wooldridge writes plim(Z_n) = a if Z_n →p a.

24 / 60

SLIDE 25

Law of large numbers

Theorem: Weak Law of Large Numbers

Let X_1, …, X_n be i.i.d. draws from a distribution with mean μ and finite variance σ², and let X̄_n = (1/n) Σ_{i=1}^n X_i. Then,

  X̄_n →p μ.

  • Intuition: the probability of X̄_n being "far away" from μ goes to 0 as n gets big.
    ▶ The distribution of X̄_n "collapses" onto μ.
  • No assumptions about the distribution of X_i beyond i.i.d. and finite variance!

25 / 60

SLIDE 26

LLN proof

  • Proof: by Chebyshev and the properties of probabilities, for any ε > 0 we have

    0 ≤ P(|X̄_n − μ| ≥ ε) ≤ V[X̄_n]/ε² = σ²/(nε²)

  • As n → ∞, we know that σ²/(nε²) → 0, which by the sandwich theorem implies

    lim_{n→∞} P(|X̄_n − μ| > ε) = 0

26 / 60

SLIDE 27

LLN by simulation in R

  • Draw different sample sizes from the Exponential distribution with rate 0.5
  • ⇒ E[X_i] = 2

nsims <- 10000
holder <- matrix(NA, nrow = nsims, ncol = 6)
for (i in 1:nsims) {
  s5 <- rexp(n = 5, rate = 0.5)
  s15 <- rexp(n = 15, rate = 0.5)
  s30 <- rexp(n = 30, rate = 0.5)
  s100 <- rexp(n = 100, rate = 0.5)
  s1000 <- rexp(n = 1000, rate = 0.5)
  s10000 <- rexp(n = 10000, rate = 0.5)
  holder[i, 1] <- mean(s5)
  holder[i, 2] <- mean(s15)
  holder[i, 3] <- mean(s30)
  holder[i, 4] <- mean(s100)
  holder[i, 5] <- mean(s1000)
  holder[i, 6] <- mean(s10000)
}

27 / 60

SLIDE 28

LLN in action

[Figure: density of X̄_15 across simulations, centered near E[X_i] = 2]

  • Distribution of X̄_15

28 / 60

SLIDE 29

LLN in action

[Figure: density of X̄_30 across simulations]

  • Distribution of X̄_30

29 / 60

SLIDE 30

LLN in action

[Figure: density of X̄_100 across simulations]

  • Distribution of X̄_100

30 / 60

SLIDE 31

LLN in action

[Figure: density of X̄_1000 across simulations, collapsing onto 2]

  • Distribution of X̄_1000

31 / 60

SLIDE 32

Properties of convergence in probability

  • 1. If X_n →p c, then g(X_n) →p g(c) for any continuous function g.
  • 2. If X_n →p a and Z_n →p b, then
    ▶ X_n + Z_n →p a + b
    ▶ X_n·Z_n →p ab
    ▶ X_n/Z_n →p a/b if b > 0
  • Thus, by the LLN:
    ▶ (X̄_n)² →p μ²
    ▶ log(X̄_n) →p log(μ)

32 / 60
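A quick numeric illustration of the continuous-function property combined with the LLN (my own example, not on the slides): with X_i ∼ Exponential(rate = 0.5), μ = 2, so for a large n the transformed sample mean should be near the transformed limit.

```r
## Continuous mapping + LLN: xbar ->p mu = 2, so xbar^2 ->p 4 and
## log(xbar) ->p log(2). X_i ~ Exponential(rate = 0.5).
set.seed(2138)
x <- rexp(n = 100000, rate = 0.5)
xbar <- mean(x)
xbar^2     # close to mu^2 = 4
log(xbar)  # close to log(2), about 0.693
```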

SLIDE 33

4/ Central Limit Theorem

33 / 60

SLIDE 34

Current knowledge

  • For i.i.d. r.v.s X_1, …, X_n with E[X_i] = μ and V[X_i] = σ², we know that:
    ▶ E[X̄_n] = μ and V[X̄_n] = σ²/n
    ▶ X̄_n converges in probability to μ as n gets big
    ▶ Chebyshev provides some bounds on probabilities.
    ▶ Still no distributional assumptions about X_i!
  • Can we say more?
    ▶ Can we approximate P(a < X̄_n < b)?
    ▶ What family of distributions (Binomial, Uniform, Gamma, etc.)?
  • Again, need to analyze when n is large.

34 / 60

SLIDE 35

Convergence in Distribution

Let Z_1, Z_2, … be a sequence of r.v.s and, for n = 1, 2, …, let F_n(x) be the c.d.f. of Z_n. Then Z_1, Z_2, … is said to converge in distribution to a r.v. W with c.d.f. F_W if

  lim_{n→∞} F_n(x) = F_W(x),

which we write as Z_n →d W.

  • Basically: when n is big, the distribution of Z_n is very similar to the distribution of W
  • We use c.d.f.s here to avoid messy details with discrete vs. continuous r.v.s
  • If X_n →p X, then X_n →d X

35 / 60

SLIDE 36

Standardizing an r.v.

  • Common to standardize a r.v. by subtracting its expectation and dividing by its standard deviation:

    Z = (X − E[X]) / √V[X]

  • Possible to show that for any X, we have (try to prove these to yourself):
    ▶ E[Z] = 0
    ▶ V[Z] = 1
  • Sometimes called a z-score.

36 / 60
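A small numeric check of those two facts (my own example, not from the slides), standardizing with the true mean and standard deviation:

```r
## Standardizing: X ~ N(10, sd = 2), so Z = (X - 10) / 2 should have
## mean approximately 0 and variance approximately 1.
set.seed(60)
x <- rnorm(100000, mean = 10, sd = 2)
z <- (x - 10) / 2
mean(z)  # close to 0
var(z)   # close to 1
```

The same calculation with any other distribution (swap in `rexp` or `rbinom` with its true mean and sd) gives the same result: standardization fixes the first two moments but does not change the shape of the distribution.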

SLIDE 37

Central Limit Theorem

Let X_1, …, X_n be i.i.d. r.v.s from a distribution with mean μ and variance 0 < σ² < ∞. Then,

  (X̄_n − μ) / (σ/√n) →d N(0, 1).

  • Distribution free! We don't have to make specific assumptions about the distribution of X_i
  • Implies that X̄_n is approximately N(μ, σ²/n) in large samples
    ▶ ⇒ easy approximations to probability statements about X̄_n when n is big!

37 / 60

SLIDE 38

CLT by simulation in R

set.seed(2138)
nsims <- 10000
holder2 <- matrix(NA, nrow = nsims, ncol = 6)
for (i in 1:nsims) {
  s5 <- rbinom(n = 5, size = 1, prob = 0.25)
  s15 <- rbinom(n = 15, size = 1, prob = 0.25)
  s30 <- rbinom(n = 30, size = 1, prob = 0.25)
  s100 <- rbinom(n = 100, size = 1, prob = 0.25)
  s1000 <- rbinom(n = 1000, size = 1, prob = 0.25)
  s10000 <- rbinom(n = 10000, size = 1, prob = 0.25)
  holder2[i, 1] <- mean(s5)
  holder2[i, 2] <- mean(s15)
  holder2[i, 3] <- mean(s30)
  holder2[i, 4] <- mean(s100)
  holder2[i, 5] <- mean(s1000)
  holder2[i, 6] <- mean(s10000)
}

38 / 60

SLIDE 39

CLT in action

[Figure: density of (X̄_5 − μ)/(σ/√5), with the N(0, 1) curve for reference]

  • Distribution of (X̄_5 − μ)/(σ/√5)

39 / 60

SLIDE 40

CLT in action

[Figure: density of (X̄_15 − μ)/(σ/√15), with the N(0, 1) curve for reference]

  • Distribution of (X̄_15 − μ)/(σ/√15)

40 / 60

SLIDE 41

CLT in action

[Figure: density of (X̄_30 − μ)/(σ/√30), with the N(0, 1) curve for reference]

  • Distribution of (X̄_30 − μ)/(σ/√30)

41 / 60

SLIDE 42

CLT in action

[Figure: density of (X̄_100 − μ)/(σ/√100), with the N(0, 1) curve for reference]

  • Distribution of (X̄_100 − μ)/(σ/√100)

42 / 60

SLIDE 43

CLT in action

[Figure: density of (X̄_10000 − μ)/(σ/√10000), nearly identical to N(0, 1)]

  • Distribution of (X̄_10000 − μ)/(σ/√10000)

43 / 60

SLIDE 44

Empirical Rule for the Normal Distribution

[Figure: standard normal density]

  • If Z ∼ N(0, 1), then the following are roughly true:

44 / 60

SLIDE 45

Empirical Rule for the Normal Distribution

[Figure: standard normal density with the region from -1 to 1 shaded, area 0.68]

  • If Z ∼ N(0, 1), then the following are roughly true:
  • Roughly 68% of the distribution of Z is between -1 and 1.

45 / 60

SLIDE 46

Empirical Rule for the Normal Distribution

[Figure: standard normal density with the region from -2 to 2 shaded, area 0.95]

  • If Z ∼ N(0, 1), then the following are roughly true:
  • Roughly 68% of the distribution of Z is between -1 and 1.
  • Roughly 95% of the distribution of Z is between -2 and 2.

46 / 60

SLIDE 47

Empirical Rule for the Normal Distribution

[Figure: standard normal density with the region from -3 to 3 shaded, area 0.997]

  • If Z ∼ N(0, 1), then the following are roughly true:
  • Roughly 68% of the distribution of Z is between -1 and 1.
  • Roughly 95% of the distribution of Z is between -2 and 2.
  • Roughly 99.7% of the distribution of Z is between -3 and 3.

47 / 60

SLIDE 48

Simulating the empirical rule

  • Actual probability that Z ∼ N(0, 1) is between −2 and 2:

pnorm(2) - pnorm(-2)
## [1] 0.9545

  • Simulated probability that (X̄_n − μ)/(σ/√n) is between −2 and 2:
    ▶ n = 15 ⇒ 0.9683
    ▶ n = 30 ⇒ 0.9666
    ▶ n = 100 ⇒ 0.9523
    ▶ n = 1000 ⇒ 0.9551
    ▶ n = 10000 ⇒ 0.9546
  • The quality of the approximation depends on the underlying distribution of the X_i
    ▶ Obviously, if X_i ∼ N(0, 1), it will be perfect even with n = 1

48 / 60

SLIDE 49

Slutsky's Theorem

  • Let X_1, X_2, … converge in distribution to some r.v. X
  • Let Y_1, Y_2, … converge in probability to some number c
  • Slutsky's Theorem gives the following results:
  • 1. X_nY_n converges in distribution to cX
  • 2. X_n + Y_n converges in distribution to X + c
  • Extremely useful when trying to figure out the large-sample distribution of an estimator.

49 / 60
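The classic use case is replacing the unknown σ with the sample standard deviation s: since s converges in probability to σ, Slutsky says √n(X̄_n − μ)/s has the same N(0, 1) limit as √n(X̄_n − μ)/σ. A simulation sketch of this (my own example, not from the slides):

```r
## Slutsky in action: sqrt(n) * (xbar - mu) / sd(x) is approximately
## N(0, 1) even though sd(x) is estimated. X_i ~ Exponential(rate = 0.5),
## so mu = 2 and sigma = 2.
set.seed(2138)
nsims <- 2000
n <- 500
tstats <- replicate(nsims, {
  x <- rexp(n, rate = 0.5)
  sqrt(n) * (mean(x) - 2) / sd(x)  # sd(x) ->p sigma = 2
})
mean(abs(tstats) <= 2)  # close to pnorm(2) - pnorm(-2), about 0.95
```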

SLIDE 50

Application: planning a survey

  • Trump example: we want the probability of being within 0.02 of the true p to be 95%.
  • ⇒ we want n such that:

    P(|X̄_n − p| > 0.02) ≤ 0.05

  • By the CLT, if n is large, then

    X̄_n − p ≈ N(0, σ²/n)

  • We know σ² ≤ 1/4, so to be conservative:
    ▶ X̄_n − p ≈ N(0, 1/(4n))
    ▶ Standardizing ⇒ Z = (X̄_n − p)/(1/√(4n)) = 2√n(X̄_n − p) ≈ N(0, 1)
  • Easier to work with the standardized r.v.:

    P(|X̄_n − p| > 0.02) ≤ 0.05 ⟺ P(|Z| > 0.02 × 2√n) ≤ 0.05

50 / 60

SLIDE 51

Application: planning a survey

  • We want:

    P(|Z| > 0.04√n) ≤ 0.05
    P(Z < −0.04√n) + P(Z > 0.04√n) ≤ 0.05

  • The standard normal is symmetric around 0, so:
    ▶ Upper tail probabilities = lower tail probabilities
    ▶ P(Z < −0.04√n) = P(Z > 0.04√n)
  • This allows us to simplify:

    2 × P(Z < −0.04√n) ≤ 0.05
    P(Z < −0.04√n) ≤ 0.025

  • To solve for n, we need to know the q such that P(Z ≤ q) = 0.025
    ▶ The inverse of the c.d.f. is called the quantile function: q = F⁻¹(0.025)
    ▶ q = F⁻¹(p) is the (smallest) value of the r.v. such that P(X ≤ q) = F(q) ≥ p

51 / 60

SLIDE 52

Application: planning a survey

  • We can use the qnorm() function in R:

qnorm(0.025, mean = 0, sd = 1)
## [1] -1.96

  • If −0.04√n ≤ −1.96, then P(Z < −0.04√n) ≤ 0.025
  • So we need −0.04√n ≤ −1.96, or n ≥ 2401
  • Much lower than the 12,500 from Chebyshev.

52 / 60
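The same calculation generalizes to any margin of error. A small helper function (my own sketch; `clt_sample_size`, `moe`, and `alpha` are names I introduce here, not from the slides), using the conservative variance p(1 − p) ≤ 1/4:

```r
## CLT-based sample size: need 2 * sqrt(n) * moe >= z_{1 - alpha/2},
## i.e. n >= (z / (2 * moe))^2 under the worst-case variance 1/4.
clt_sample_size <- function(moe, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)  # 1.96 for alpha = 0.05
  ceiling((z / (2 * moe))^2)
}
clt_sample_size(moe = 0.02)  # 2401
clt_sample_size(moe = 0.01)  # 9604: halving the MOE quadruples n
```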

SLIDE 53

Application: planning a survey

nsims <- 1000
holder <- rep(NA, times = nsims)
for (i in 1:nsims) {
  this.samp <- rbinom(n = 2401, size = 1, prob = 0.4)
  holder[i] <- mean(this.samp)
}
mean(abs(holder - 0.4) > 0.02)
## [1] 0.052

[Figure: density of X̄_n − p across the 1000 simulated samples]

53 / 60

SLIDE 54

5/ More Exotic CLTs*

54 / 60

SLIDE 55

CLT for non-iid r.v.s

  • What if we don't have i.i.d. r.v.s? Does the CLT still apply?
  • Let X_1, X_2, … be independent (but not identically distributed) with means E[X_i] = μ_i and variances V[X_i] = σ_i².
  • Scaled and centered:

    Z_n = (Σ_{i=1}^n X_i − Σ_{i=1}^n μ_i) / (Σ_{i=1}^n σ_i²)^{1/2}

    ▶ No need to divide by n, because there are already n entries in the sum Σ_{i=1}^n μ_i
  • Easy to show that E[Z_n] = 0 and V[Z_n] = 1. Does the CLT apply?

55 / 60

SLIDE 56

Liapounov CLT

Suppose that the r.v.s X_1, X_2, … are independent and that E[|X_i − μ_i|³] < ∞ for i = 1, 2, …. Also suppose that

  lim_{n→∞} Σ_{i=1}^n E[|X_i − μ_i|³] / (Σ_{i=1}^n σ_i²)^{3/2} = 0.

Then,

  Z_n = (Σ_{i=1}^n X_i − Σ_{i=1}^n μ_i) / (Σ_{i=1}^n σ_i²)^{1/2} →d N(0, 1)

  • Key condition: there isn't one r.v. in the sequence that is "too big" and could dominate the sum

56 / 60
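A simulation sketch of this independent-but-not-identical case (my own example, not from the slides): Bernoulli draws whose success probabilities differ across i, standardized exactly as in the theorem.

```r
## Independent, non-identical X_i ~ Bernoulli(p_i) with varying p_i:
## the standardized sum should still be approximately N(0, 1).
set.seed(56)
n <- 1000
p <- runif(n, min = 0.2, max = 0.8)  # a different mean for each X_i
nsims <- 5000
z <- replicate(nsims, {
  x <- rbinom(n, size = 1, prob = p)
  (sum(x) - sum(p)) / sqrt(sum(p * (1 - p)))
})
mean(abs(z) <= 2)  # close to the normal value, about 0.9545
```

No single term here can dominate the sum (every variance is between 0.16 and 0.25), so the Liapounov condition is comfortably satisfied.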

SLIDE 57

CLT for dependent sequences

  • We have shown the CLT for i.i.d. and for independent r.v.s. What about dependent sequences?
  • A CLT can still hold for a dependent sequence X_1, X_2, ….
    ▶ What does a dependent sequence mean? Cov[X_i, X_j] ≠ 0 for some i ≠ j
  • Key condition for a dependent CLT: the r.v.s aren't "too correlated"
  • Overall conditions for a CLT to hold: the sum/mean of many, not-too-correlated, not-too-big r.v.s

57 / 60

SLIDE 58

6/ Wrap-up

58 / 60

SLIDE 59

Limitations of asymptotics

  • These results are practically and theoretically very useful.
  • But remember that they are approximations.
  • We don't live in asymptopia: n is always finite.
  • Asymptotics often give reasonable answers, but you can check with simulations.

59 / 60

SLIDE 60

Review

  • Sums and means of r.v.s are themselves r.v.s
  • Learned about the distribution of the sample mean of i.i.d. r.v.s:
    ▶ Expectation E[X̄_n] = μ
    ▶ Variance V[X̄_n] = σ²/n
    ▶ Converges in probability to the true mean (LLN)
    ▶ Converges in distribution to a normal distribution (CLT)
  • Ahead: generalizing these ideas to arbitrary estimators of parameters.

60 / 60