Gov 2000: 4. Sums, Means, and Limit Theorems
Matthew Blackwell, Fall 2016


  1. Gov 2000: 4. Sums, Means, and Limit Theorems. Matthew Blackwell, Fall 2016.

  2. Outline:
     1. Sums and Means of Random Variables
     2. Useful Inequalities
     3. Law of Large Numbers
     4. Central Limit Theorem
     5. More Exotic CLTs*
     6. Wrap-up

  3. Where are we? Where are we going?
     • Probability: formal way to quantify uncertain outcomes/random variables.
     • Last week: how to work with multiple r.v.s at the same time.
     • This week: applying those ideas to study large random samples.

  4. Large random samples
     • In real data, we will have a set of 𝑛 measurements on a variable: 𝑋₁, 𝑋₂, …, 𝑋ₙ
     • Or we might have a set of 𝑛 measurements on two variables: (𝑋₁, 𝑌₁), (𝑋₂, 𝑌₂), …, (𝑋ₙ, 𝑌ₙ)
     • Empirical analyses: sums or means of these 𝑛 measurements
       ▶ Almost all statistical procedures involve a sum/mean.
       ▶ What are the properties of these sums and means?
       ▶ Can they tell us anything about the distribution of 𝑋ᵢ?
     • Asymptotics: what can we learn as 𝑛 gets big?

  5. 1/ Sums and Means of Random Variables

  6. Sums and means are random variables
     • If 𝑋₁ and 𝑋₂ are r.v.s, then 𝑋₁ + 𝑋₂ is a r.v.
       ▶ Has a mean 𝔼[𝑋₁ + 𝑋₂] and a variance 𝕍[𝑋₁ + 𝑋₂]
     • The sample mean is a function of sums and so it is a r.v. too: 𝑋̄ = (𝑋₁ + 𝑋₂)/2

  7. Distribution of sums/means

              𝑋₁    𝑋₂    𝑋₁ + 𝑋₂    𝑋̄
     draw 1   20    71       91      45.5
     draw 2   12    66       78      39
     draw 3   75    59      134      67
     draw 4    3    58       61      30.5
       ⋮       ⋮     ⋮        ⋮        ⋮

     Across repeated draws, the 𝑋₁ + 𝑋₂ column traces out the distribution of the sum and the 𝑋̄ column traces out the distribution of the mean.
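
One way to see these two distributions concretely is to simulate repeated draws; a minimal R sketch (the discrete uniform draws on 1 to 100 are an illustrative assumption, not part of the slides):

    ## Simulate repeated draws of two r.v.s and track the sum and the mean
    ## (discrete uniform on 1:100 is an arbitrary illustrative choice)
    n.draws <- 10000
    x1 <- sample(1:100, size = n.draws, replace = TRUE)
    x2 <- sample(1:100, size = n.draws, replace = TRUE)
    sums <- x1 + x2         # draws from the distribution of the sum
    means <- (x1 + x2) / 2  # draws from the distribution of the mean
    hist(means)             # the sample mean has its own distribution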

  8. Independent and identical r.v.s
     • We often will work with independent and identically distributed r.v.s, 𝑋₁, …, 𝑋ₙ
       ▶ Random sample of 𝑛 respondents on a survey question.
       ▶ Written “i.i.d.”
     • Independent: 𝑋ᵢ ⟂⟂ 𝑋ⱼ for all 𝑖 ≠ 𝑗
     • Identically distributed: 𝑓_{𝑋ᵢ}(𝑥) is the same for all 𝑖
       ▶ 𝔼[𝑋ᵢ] = 𝜇 for all 𝑖
       ▶ 𝕍[𝑋ᵢ] = 𝜎² for all 𝑖

  9. Distribution of the sample mean
     • Sample mean of i.i.d. r.v.s: 𝑋̄ₙ = (1/𝑛) ∑ᵢ₌₁ⁿ 𝑋ᵢ
     • 𝑋̄ₙ is a random variable; what is its distribution?
       ▶ What is the expectation of this distribution, 𝔼[𝑋̄ₙ]?
       ▶ What is the variance of this distribution, 𝕍[𝑋̄ₙ]?
       ▶ What is the p.d.f. of the distribution?
     • How do they relate to the expectation and variance of 𝑋₁, …, 𝑋ₙ?

  10. Properties of the sample mean

     Mean and variance of the sample mean: Suppose that 𝑋₁, …, 𝑋ₙ are i.i.d. r.v.s with 𝔼[𝑋ᵢ] = 𝜇 and 𝕍[𝑋ᵢ] = 𝜎². Then:

         𝔼[𝑋̄ₙ] = 𝜇        𝕍[𝑋̄ₙ] = 𝜎²/𝑛

     • Key insights:
       ▶ The sample mean gets the right answer on average.
       ▶ The variance of 𝑋̄ₙ depends on the variance of 𝑋ᵢ and the sample size.
       ▶ Not dependent on the (full) distribution of 𝑋ᵢ!
     • Standard error of the sample mean: √𝕍[𝑋̄ₙ] = 𝜎/√𝑛
     • You’ll prove both of these facts in this week’s HW.
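
Both facts are easy to check by simulation; a minimal sketch, assuming an arbitrary Normal(2, 3²) distribution for the 𝑋ᵢ:

    ## Check E[X̄_n] = mu and V[X̄_n] = sigma^2 / n by simulation
    ## (the Normal(2, 3^2) distribution is an arbitrary choice)
    mu <- 2; sigma <- 3; n <- 50
    xbars <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))
    mean(xbars)  # close to mu = 2
    var(xbars)   # close to sigma^2 / n = 9 / 50 = 0.18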

  11. 2/ Useful Inequalities

  12. Why inequalities?
     • The behavior of r.v.s depends on their distribution, but we often don’t know (or don’t want to assume) a distribution.
     • Today, we’ll discuss results for r.v.s with any distribution, subject to some restrictions like finite variance.
     • Why study these?
       ▶ Build toward massively important results like the LLN.
       ▶ Inequalities are used regularly throughout statistics.
       ▶ Gives us some practice with proofs/analytic reasoning.

  13. Markov Inequality

     Markov Inequality: Suppose that 𝑋 is a r.v. such that ℙ(𝑋 ≥ 0) = 1. Then, for every real number 𝑡 > 0,

         ℙ(𝑋 ≥ 𝑡) ≤ 𝔼[𝑋]/𝑡.

     • For instance, if we know that 𝔼[𝑋] = 1, then ℙ(𝑋 ≥ 100) ≤ 0.01.
     • Once we know the mean of a r.v., it limits how much probability can be in the tail.
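
To see how conservative the bound can be, here is a quick simulation sketch; the Exponential(1) distribution (nonnegative, with 𝔼[𝑋] = 1) is an illustrative assumption:

    ## Actual tail probability vs. the Markov bound for a nonnegative
    ## r.v. with E[X] = 1 (Exponential(1) is an arbitrary choice)
    x <- rexp(100000, rate = 1)
    t <- 3
    mean(x >= t)  # actual: about exp(-3) = 0.05
    1 / t         # Markov bound E[X] / t = 0.33, valid but loose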

  14. Markov Inequality Proof
     • For discrete 𝑋:
         𝔼[𝑋] = ∑ₓ 𝑥 𝑓_𝑋(𝑥) = ∑_{𝑥<𝑡} 𝑥 𝑓_𝑋(𝑥) + ∑_{𝑥≥𝑡} 𝑥 𝑓_𝑋(𝑥)
     • Because 𝑋 is nonnegative, 𝔼[𝑋] ≥ ∑_{𝑥≥𝑡} 𝑥 𝑓_𝑋(𝑥)
     • Since 𝑥 ≥ 𝑡 in this sum, ∑_{𝑥≥𝑡} 𝑥 𝑓_𝑋(𝑥) ≥ ∑_{𝑥≥𝑡} 𝑡 𝑓_𝑋(𝑥)
     • But this is just ∑_{𝑥≥𝑡} 𝑡 𝑓_𝑋(𝑥) = 𝑡 ∑_{𝑥≥𝑡} 𝑓_𝑋(𝑥) = 𝑡 ℙ(𝑋 ≥ 𝑡)
     • Implies 𝔼[𝑋] ≥ 𝑡 ℙ(𝑋 ≥ 𝑡), and dividing both sides by 𝑡 gives the result.

  15. Chebyshev Inequality

     Chebyshev Inequality: Suppose that 𝑋 is a r.v. for which 𝕍[𝑋] < ∞. Then, for every real number 𝑡 > 0,

         ℙ(|𝑋 − 𝔼[𝑋]| ≥ 𝑡) ≤ 𝕍[𝑋]/𝑡².

     • The variance places limits on how far an observation can be from its mean.
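
A quick simulation sketch again shows the bound holds but is loose; the standard normal here is an illustrative assumption:

    ## Actual deviation probability vs. the Chebyshev bound
    ## for a Normal(0, 1) r.v. (an arbitrary illustrative choice)
    x <- rnorm(100000)
    t <- 2
    mean(abs(x) >= t)  # actual: about 0.046
    1 / t^2            # Chebyshev bound V[X] / t^2 = 0.25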

  16. Proof of Chebyshev
     • Let 𝑌 = (𝑋 − 𝔼[𝑋])²
       ▶ ⇝ ℙ(𝑌 ≥ 0) = 1 (nonnegative)
       ▶ 𝔼[𝑌] = 𝔼[(𝑋 − 𝔼[𝑋])²] = 𝕍[𝑋] (definition of variance)
     • Note that if |𝑋 − 𝔼[𝑋]| ≥ 𝑡 then 𝑌 ≥ 𝑡², because we just squared both sides.
     • Thus, ℙ(|𝑋 − 𝔼[𝑋]| ≥ 𝑡) = ℙ(𝑌 ≥ 𝑡²)
     • Apply Markov’s inequality: ℙ(|𝑋 − 𝔼[𝑋]| ≥ 𝑡) = ℙ(𝑌 ≥ 𝑡²) ≤ 𝔼[𝑌]/𝑡² = 𝕍[𝑋]/𝑡²

  17. Application: planning a survey
     • Suppose we want to estimate the proportion of voters who will vote for Donald Trump, 𝑝, from a random sample of size 𝑛.
       ▶ 𝑋₁, 𝑋₂, …, 𝑋ₙ indicate voting intention for Trump for each respondent.
       ▶ By our earlier calculation, 𝔼[𝑋̄ₙ] = 𝑝 and 𝕍[𝑋̄ₙ] = 𝜎²/𝑛.
       ▶ Since each 𝑋ᵢ is a Bernoulli r.v., we have 𝜎² = 𝑝(1 − 𝑝).
     • What does 𝑛 need to be to have at least 0.95 probability that 𝑋̄ₙ is within 0.02 of the true 𝑝?
       ▶ How to guarantee a margin of error of ±2 percentage points?

  18. Application: planning a survey
     • What does 𝑛 have to be so that ℙ(|𝑋̄ₙ − 𝑝| ≤ 0.02) ≥ 0.95 ⟺ ℙ(|𝑋̄ₙ − 𝑝| ≥ 0.02) ≤ 0.05?
     • Applying Chebyshev:
         ℙ(|𝑋̄ₙ − 𝑝| ≥ 0.02) ≤ 𝕍[𝑋̄ₙ]/0.02² = 𝑝(1 − 𝑝)/0.0004𝑛
     • We don’t know 𝕍[𝑋ᵢ] = 𝑝(1 − 𝑝), but:
       ▶ It is conservative to use the largest possible variance.
       ▶ It can’t be bigger than 𝑝(1 − 𝑝) ≤ (1/2) ⋅ (1/2) = 1/4, so
         ℙ(|𝑋̄ₙ − 𝑝| ≥ 0.02) ≤ 𝑝(1 − 𝑝)/0.0004𝑛 ≤ 1/0.0016𝑛
     • We want this probability to be bounded by 0.05, so we need 1/0.0016𝑛 ≤ 0.05, which gives us 𝑛 ≥ 12,500!
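
The same arithmetic in R, using the slide’s margin of error and probability target:

    ## Solve the worst-case Chebyshev bound 0.25 / (moe^2 * n) <= alpha for n
    moe <- 0.02        # desired margin of error
    alpha <- 0.05      # allowed probability of missing it
    worst.var <- 0.25  # worst-case Bernoulli variance p(1 - p) at p = 1/2
    worst.var / (moe^2 * alpha)  # minimum n: 12500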

  19. Application: planning a survey
     • Do we really need 𝑛 ≥ 12,500 to get a margin of error of ±2 percentage points?
     • No! Chebyshev provides a bound that is guaranteed to hold, but actual probabilities are much smaller.
       ▶ We’re also using the “worst-case” variance of 0.25.
     • Let’s simulate 1000 samples of size 𝑛 = 12,500 with 𝑝 = 0.4 and show the distribution of the means.
       ▶ What proportion of these are within 0.02 of 𝑝?

  20. Application: planning a survey

     nsims <- 1000
     holder <- rep(NA, times = nsims)
     for (i in 1:nsims) {
       this.samp <- rbinom(n = 12500, size = 1, prob = 0.4)
       holder[i] <- mean(this.samp)
     }
     mean(abs(holder - 0.4) > 0.02)
     ## [1] 0

     [Figure: histogram (density scale) of the 1000 simulated values of 𝑋̄ₙ − 𝑝, with the x-axis running from −0.03 to 0.03.]

  21. 3/ Law of Large Numbers

  22. Current knowledge
     • For i.i.d. r.v.s, 𝑋₁, …, 𝑋ₙ, with 𝔼[𝑋ᵢ] = 𝜇 and 𝕍[𝑋ᵢ] = 𝜎², we know that:
       ▶ The expectation is 𝔼[𝑋̄ₙ] = 𝔼[𝑋ᵢ] = 𝜇
       ▶ The variance is 𝕍[𝑋̄ₙ] = 𝜎²/𝑛 where 𝜎² = 𝕍[𝑋ᵢ]
       ▶ Some bounds on tail probabilities from Chebyshev.
       ▶ None of these rely on a specific distribution for 𝑋ᵢ!
     • Can we say more about the distribution of the sample mean?
     • Yes, but we need to think about how 𝑋̄ₙ changes as 𝑛 gets big.

  23. Sequence of sample means
     • What can we say about the sample mean as 𝑛 gets large?
     • Need to think about sequences of sample means with increasing 𝑛:
         𝑋̄₁ = 𝑋₁
         𝑋̄₂ = (1/2) ⋅ (𝑋₁ + 𝑋₂)
         𝑋̄₃ = (1/3) ⋅ (𝑋₁ + 𝑋₂ + 𝑋₃)
         𝑋̄₄ = (1/4) ⋅ (𝑋₁ + 𝑋₂ + 𝑋₃ + 𝑋₄)
         𝑋̄₅ = (1/5) ⋅ (𝑋₁ + 𝑋₂ + 𝑋₃ + 𝑋₄ + 𝑋₅)
         ⋮
         𝑋̄ₙ = (1/𝑛) ⋅ (𝑋₁ + 𝑋₂ + 𝑋₃ + 𝑋₄ + 𝑋₅ + ⋯ + 𝑋ₙ)
     • Note: this is a sequence of random variables!
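
One way to build intuition is to compute this whole sequence from a single set of draws; a minimal R sketch (the Bernoulli(0.4) draws are an illustrative assumption):

    ## Running sequence of sample means X̄_1, X̄_2, ..., X̄_n from one
    ## set of draws (Bernoulli(0.4) is an arbitrary choice)
    n <- 5000
    x <- rbinom(n, size = 1, prob = 0.4)
    xbar.seq <- cumsum(x) / (1:n)  # X̄_k for k = 1, ..., n
    plot(xbar.seq, type = "l")     # wanders early, settles near 0.4
    abline(h = 0.4, lty = 2)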

  24. Convergence in Probability

     Convergence in probability: A sequence of random variables, 𝑍₁, 𝑍₂, …, is said to converge in probability to a value 𝑏 if, for every 𝜀 > 0,

         ℙ(|𝑍ₙ − 𝑏| > 𝜀) → 0

     as 𝑛 → ∞. We write this 𝑍ₙ →ᵖ 𝑏.

     • Basically: the probability that 𝑍ₙ lies outside any (teeny, tiny) interval around 𝑏 approaches 0 as 𝑛 → ∞.
     • Wooldridge writes plim(𝑍ₙ) = 𝑏 if 𝑍ₙ →ᵖ 𝑏.
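
These probabilities can be approximated by simulation for a few values of 𝑛; a sketch, assuming Bernoulli(0.4) draws and 𝜀 = 0.01:

    ## Approximate P(|X̄_n - mu| > eps) for growing n
    ## (Bernoulli(0.4) and eps = 0.01 are arbitrary choices)
    eps <- 0.01
    p.outside <- sapply(c(100, 1000, 10000), function(n) {
      xbars <- replicate(2000, mean(rbinom(n, size = 1, prob = 0.4)))
      mean(abs(xbars - 0.4) > eps)
    })
    p.outside  # shrinks toward 0 as n grows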

  25. Law of large numbers

     Theorem: Weak Law of Large Numbers. Let 𝑋₁, …, 𝑋ₙ be i.i.d. draws from a distribution with mean 𝜇 and finite variance 𝜎². Let 𝑋̄ₙ = (1/𝑛) ∑ᵢ₌₁ⁿ 𝑋ᵢ. Then 𝑋̄ₙ →ᵖ 𝜇.

     • Intuition: the probability of 𝑋̄ₙ being “far away” from 𝜇 goes to 0 as 𝑛 gets big.
       ▶ The distribution of 𝑋̄ₙ “collapses” on 𝜇.
     • No assumptions about the distribution of 𝑋ᵢ beyond i.i.d. and a finite variance!
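
That last point is worth seeing directly: the collapse does not depend on the shape of the distribution. A sketch using heavily skewed Exponential(1) draws (an illustrative choice with 𝜇 = 1):

    ## The LLN needs no distributional shape: running means of skewed
    ## Exponential(1) draws (mean 1) still collapse onto mu = 1
    n <- 10000
    x <- rexp(n, rate = 1)
    plot(cumsum(x) / (1:n), type = "l", ylab = "running mean")
    abline(h = 1, lty = 2)  # the true mean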
