SLIDE 1

Gov 2000: 4. Sums, Means, and Limit Theorems

Matthew Blackwell

Fall 2016

1 / 60

SLIDE 2
  • 1. Sums and Means of Random Variables
  • 2. Useful Inequalities
  • 3. Law of Large Numbers
  • 4. Central Limit Theorem
  • 5. More Exotic CLTs*
  • 6. Wrap-up

2 / 60

SLIDE 3

Where are we? Where are we going?

  • Probability: a formal way to quantify uncertain outcomes/random variables.
  • Last week: how to work with multiple r.v.s at the same time.
  • This week: applying those ideas to study large random samples.

3 / 60

SLIDE 4

Large random samples

  • In real data, we will have a set of n measurements on a variable: X_1, X_2, …, X_n
  • Or we might have a set of n measurements on two variables: (X_1, Y_1), (X_2, Y_2), …, (X_n, Y_n)
  • Empirical analyses: sums or means of these n measurements
    ▶ Almost all statistical procedures involve a sum/mean.
    ▶ What are the properties of these sums and means?
    ▶ Can they tell us anything about the distribution of X_i?
  • Asymptotics: what can we learn as n gets big?

4 / 60

SLIDE 5

1/ Sums and Means of Random Variables

5 / 60

SLIDE 6

Sums and means are random variables

  • If X_1 and X_2 are r.v.s, then X_1 + X_2 is a r.v.
    ▶ It has a mean E[X_1 + X_2] and a variance V[X_1 + X_2]
  • The sample mean is a function of sums, so it is a r.v. too:

    X̄ = (X_1 + X_2)/2

6 / 60

SLIDE 7

Distribution of sums/means

            X_1   X_2   X_1 + X_2    X̄
  draw 1     20    71       91      45.5
  draw 2     12    66       78      39
  draw 3     59    75      134      67
  draw 4      3    58       61      30.5
    ⋮         ⋮     ⋮        ⋮        ⋮

  • distribution of the sum
  • distribution of the mean

7 / 60

SLIDE 8

Independent and identical r.v.s

  • We often will work with independent and identically distributed r.v.s, X_1, …, X_n
    ▶ Random sample of n respondents on a survey question.
    ▶ Written "i.i.d."
  • Independent: X_i ⊥ X_j for all i ≠ j
  • Identically distributed: f_{X_i}(x) is the same for all i
    ▶ E[X_i] = μ for all i
    ▶ V[X_i] = σ² for all i

8 / 60

SLIDE 9

Distribution of the sample mean

  • Sample mean of i.i.d. r.v.s: X̄_n = (1/n) Σ_{i=1}^n X_i
  • X̄_n is a random variable; what is its distribution?
    ▶ What is the expectation of this distribution, E[X̄_n]?
    ▶ What is the variance of this distribution, V[X̄_n]?
    ▶ What is the p.d.f. of the distribution?
  • How do they relate to the expectation and variance of X_1, …, X_n?

9 / 60

SLIDE 10

Properties of the sample mean

Mean and variance of the sample mean

Suppose that X_1, …, X_n are i.i.d. r.v.s with E[X_i] = μ and V[X_i] = σ². Then:

  E[X̄_n] = μ    V[X̄_n] = σ²/n

  • Key insights:
    ▶ The sample mean gets the right answer on average.
    ▶ The variance of X̄_n depends on the variance of X_i and on the sample size.
    ▶ Neither depends on the (full) distribution of X_i!
  • Standard error of the sample mean: √V[X̄_n] = σ/√n
  • You'll prove both of these facts in this week's HW.

10 / 60
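These two facts are easy to check by simulation. A minimal sketch (my own example, not from the slides), drawing X_i ∼ Uniform(0, 1) so that μ = 0.5 and σ² = 1/12:

```r
## Simulation check of E[Xbar_n] = mu and V[Xbar_n] = sigma^2 / n.
## X_i ~ Uniform(0, 1), so mu = 0.5 and sigma^2 = 1/12.
set.seed(42)
n <- 25
nsims <- 100000
xbars <- replicate(nsims, mean(runif(n)))

mean(xbars)  # close to mu = 0.5
var(xbars)   # close to sigma^2 / n = (1/12) / 25
```

Any distribution with a finite variance would work here; the uniform just makes μ and σ² easy to compute by hand.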

SLIDE 11

2/ Useful Inequalities

11 / 60

SLIDE 12

Why inequalities?

  • The behavior of r.v.s depends on their distribution, but we often don't know (or don't want to assume) a distribution.
  • Today, we'll discuss results that hold for r.v.s with any distribution, subject to some restrictions like finite variance.
  • Why study these?
    ▶ Build toward massively important results like the LLN.
    ▶ Inequalities are used regularly throughout statistics.
    ▶ Gives us some practice with proofs/analytic reasoning.

12 / 60

SLIDE 13

Markov Inequality

Suppose that X is a r.v. such that P(X ≥ 0) = 1. Then, for every real number t > 0,

  P(X ≥ t) ≤ E[X]/t.

  • For instance, if we know that E[X] = 1, then P(X ≥ 100) ≤ 0.01
  • Once we know the mean of a r.v., it limits how much probability can be in the tail.

13 / 60
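A quick simulation makes the bound concrete (my own example, not on the slides): for X ∼ Exponential(rate = 1), E[X] = 1, so Markov says P(X ≥ t) ≤ 1/t for every t > 0.

```r
## Markov's inequality in action: X ~ Exponential(rate = 1), E[X] = 1,
## so the bound is P(X >= t) <= 1/t.
set.seed(123)
x <- rexp(100000, rate = 1)
t <- 3
mean(x >= t)  # empirical tail probability, about exp(-3) = 0.0498
1 / t         # Markov bound, about 0.333 -- valid but loose
```

The bound holds, but is far from tight: knowing only the mean, Markov must cover the worst case over all nonnegative distributions.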

SLIDE 14

Markov Inequality Proof

  • For discrete X:

    E[X] = Σ_x x f_X(x) = Σ_{x<t} x f_X(x) + Σ_{x≥t} x f_X(x)

  • Because X is nonnegative, E[X] ≥ Σ_{x≥t} x f_X(x)
  • Since x ≥ t in that sum, Σ_{x≥t} x f_X(x) ≥ Σ_{x≥t} t f_X(x)
  • But this is just Σ_{x≥t} t f_X(x) = t Σ_{x≥t} f_X(x) = t P(X ≥ t)
  • Implies E[X] ≥ t P(X ≥ t); dividing both sides by t gives P(X ≥ t) ≤ E[X]/t

14 / 60

SLIDE 15

Chebyshev Inequality

Suppose that X is a r.v. for which V[X] < ∞. Then, for every real number t > 0,

  P(|X − E[X]| ≥ t) ≤ V[X]/t².

  • The variance places limits on how likely an observation is to be far from its mean.

15 / 60

SLIDE 16

Proof of Chebyshev

  • Let Z = (X − E[X])²
    ▶ Z is nonnegative: P(Z ≥ 0) = 1
    ▶ E[Z] = E[(X − E[X])²] = V[X] (definition of variance)
  • Note that |X − E[X]| ≥ t exactly when Z ≥ t², because both sides are nonnegative and squaring preserves the ordering.
  • Thus, P(|X − E[X]| ≥ t) = P(Z ≥ t²)
  • Apply Markov's inequality:

    P(|X − E[X]| ≥ t) = P(Z ≥ t²) ≤ E[Z]/t² = V[X]/t²

16 / 60
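As with Markov, Chebyshev is valid but conservative. A small check (my own example, not on the slides): for X ∼ Uniform(0, 1), E[X] = 1/2 and V[X] = 1/12, so the bound is P(|X − 1/2| ≥ t) ≤ 1/(12t²).

```r
## Chebyshev's inequality in action: X ~ Uniform(0, 1), so
## P(|X - 1/2| >= t) <= (1/12) / t^2.
set.seed(7)
x <- runif(100000)
t <- 0.4
mean(abs(x - 0.5) >= t)  # true probability is 0.2
(1/12) / t^2             # Chebyshev bound, about 0.52
```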

SLIDE 17

Application: planning a survey

  • Suppose we want to estimate the proportion of voters who will vote for Donald Trump, p, from a random sample of size n.
    ▶ X_1, X_2, …, X_n indicate each respondent's voting intention for Trump.
    ▶ By our earlier calculation, E[X̄_n] = p and V[X̄_n] = σ²/n
    ▶ Since each X_i is a Bernoulli r.v., we have σ² = p(1 − p)
  • What does n need to be to have at least 0.95 probability that X̄_n is within 0.02 of the true p?
    ▶ How can we guarantee a margin of error of ±2 percentage points?

17 / 60

SLIDE 18

Application: planning a survey

  • What does n have to be so that

    P(|X̄_n − p| ≤ 0.02) ≥ 0.95 ⟺ P(|X̄_n − p| ≥ 0.02) ≤ 0.05

  • Applying Chebyshev:

    P(|X̄_n − p| ≥ 0.02) ≤ V[X̄_n]/0.02² = p(1 − p)/(0.0004n)

  • We don't know V[X_i] = p(1 − p), but:
    ▶ It is conservative to use the largest possible variance.
    ▶ It can't be bigger than p(1 − p) ≤ (1/2)·(1/2) = 1/4

    P(|X̄_n − p| ≥ 0.02) ≤ p(1 − p)/(0.0004n) ≤ 1/(0.0016n)

  • We want this probability to be bounded by 0.05, so we need 1/(0.0016n) ≤ 0.05, which gives us n ≥ 12,500!

18 / 60
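The last step's arithmetic can be checked in one line (my own check; `moe` and `alpha` are names I introduce here for the margin of error and the allowed tail probability):

```r
## Chebyshev sample-size bound: worst-case variance 1/4, margin of
## error 0.02, tail probability 0.05, so n >= (1/4) / (moe^2 * alpha).
moe <- 0.02
alpha <- 0.05
n_cheb <- (1/4) / (moe^2 * alpha)
n_cheb  # 12500
```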

SLIDE 19

Application: planning a survey

  • Do we really need n ≥ 12,500 to get a margin of error of ±2 percentage points?
  • No! Chebyshev provides a bound that is guaranteed to hold, but the actual probabilities are much smaller.
    ▶ We're also using the "worst-case" variance of 0.25.
  • Let's simulate 1000 samples of size n = 12,500 with p = 0.4 and show the distribution of the means.
    ▶ What proportion of these are within 0.02 of p?

19 / 60

SLIDE 20

Application: planning a survey

nsims <- 1000
holder <- rep(NA, times = nsims)
for (i in 1:nsims) {
  this.samp <- rbinom(n = 12500, size = 1, prob = 0.4)
  holder[i] <- mean(this.samp)
}
mean(abs(holder - 0.4) > 0.02)
## [1] 0

[Figure: density of X̄_n − p across the 1000 simulated samples]

20 / 60

SLIDE 21

3/ Law of Large Numbers

21 / 60

SLIDE 22

Current knowledge

  • For i.i.d. r.v.s X_1, …, X_n with E[X_i] = μ and V[X_i] = σ², we know that:
    ▶ Expectation is E[X̄_n] = E[X_i] = μ
    ▶ Variance is V[X̄_n] = σ²/n, where σ² = V[X_i]
    ▶ Some bounds on tail probabilities from Chebyshev.
    ▶ None of these rely on a specific distribution for X_i!
  • Can we say more about the distribution of the sample mean?
  • Yes, but we need to think about how X̄_n changes as n gets big.

22 / 60

SLIDE 23

Sequence of sample means

  • What can we say about the sample mean as n gets large?
  • Need to think about sequences of sample means with increasing n:

    X̄_1 = X_1
    X̄_2 = (1/2)·(X_1 + X_2)
    X̄_3 = (1/3)·(X_1 + X_2 + X_3)
    X̄_4 = (1/4)·(X_1 + X_2 + X_3 + X_4)
    X̄_5 = (1/5)·(X_1 + X_2 + X_3 + X_4 + X_5)
    ⋮
    X̄_n = (1/n)·(X_1 + X_2 + X_3 + X_4 + X_5 + ⋯ + X_n)

  • Note: this is a sequence of random variables!

23 / 60

SLIDE 24

Convergence in Probability

A sequence of random variables Z_1, Z_2, … is said to converge in probability to a value a if, for every ε > 0,

  P(|Z_n − a| > ε) → 0 as n → ∞.

We write this Z_n →p a.

  • Basically: the probability that Z_n lies outside any (teeny, tiny) interval around a approaches 0 as n → ∞
  • Wooldridge writes plim(Z_n) = a if Z_n →p a.

24 / 60

SLIDE 25

Law of large numbers

Theorem: Weak Law of Large Numbers

Let X_1, …, X_n be i.i.d. draws from a distribution with mean μ and finite variance σ², and let X̄_n = (1/n) Σ_{i=1}^n X_i. Then,

  X̄_n →p μ.

  • Intuition: the probability of X̄_n being "far away" from μ goes to 0 as n gets big.
    ▶ The distribution of X̄_n "collapses" onto μ.
  • No assumptions about the distribution of X_i beyond i.i.d. and finite variance!

25 / 60

SLIDE 26

LLN proof

  • Proof: by Chebyshev and the properties of probabilities, for any ε > 0 we have

    0 ≤ P(|X̄_n − μ| ≥ ε) ≤ V[X̄_n]/ε² = σ²/(nε²)

  • As n → ∞, we know that σ²/(nε²) → 0, which by the sandwich theorem implies

    lim_{n→∞} P(|X̄_n − μ| > ε) = 0

26 / 60

SLIDE 27

LLN by simulation in R

  • Draw different sample sizes from the Exponential distribution with rate 0.5
  • ⇒ E[X_i] = 2

nsims <- 10000
holder <- matrix(NA, nrow = nsims, ncol = 6)
for (i in 1:nsims) {
  s5 <- rexp(n = 5, rate = 0.5)
  s15 <- rexp(n = 15, rate = 0.5)
  s30 <- rexp(n = 30, rate = 0.5)
  s100 <- rexp(n = 100, rate = 0.5)
  s1000 <- rexp(n = 1000, rate = 0.5)
  s10000 <- rexp(n = 10000, rate = 0.5)
  holder[i, 1] <- mean(s5)
  holder[i, 2] <- mean(s15)
  holder[i, 3] <- mean(s30)
  holder[i, 4] <- mean(s100)
  holder[i, 5] <- mean(s1000)
  holder[i, 6] <- mean(s10000)
}

27 / 60

SLIDE 28

LLN in action

[Figure: density of X̄_15 across simulations, centered near E[X_i] = 2]

  • Distribution of X̄_15

28 / 60

SLIDE 29

LLN in action

[Figure: density of X̄_30 across simulations]

  • Distribution of X̄_30

29 / 60

SLIDE 30

LLN in action

[Figure: density of X̄_100 across simulations]

  • Distribution of X̄_100

30 / 60

SLIDE 31

LLN in action

[Figure: density of X̄_1000 across simulations, collapsing onto 2]

  • Distribution of X̄_1000

31 / 60

SLIDE 32

Properties of convergence in probability

  • 1. If X_n →p c, then g(X_n) →p g(c) for any continuous function g.
  • 2. If X_n →p a and Z_n →p b, then
    ▶ X_n + Z_n →p a + b
    ▶ X_n·Z_n →p ab
    ▶ X_n/Z_n →p a/b if b > 0
  • Thus, by the LLN:
    ▶ (X̄_n)² →p μ²
    ▶ log(X̄_n) →p log(μ)

32 / 60
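A quick numeric illustration of the continuous-function property combined with the LLN (my own example, not on the slides): with X_i ∼ Exponential(rate = 0.5), μ = 2, so for a large n the transformed sample mean should be near the transformed limit.

```r
## Continuous mapping + LLN: xbar ->p mu = 2, so xbar^2 ->p 4 and
## log(xbar) ->p log(2). X_i ~ Exponential(rate = 0.5).
set.seed(2138)
x <- rexp(n = 100000, rate = 0.5)
xbar <- mean(x)
xbar^2     # close to mu^2 = 4
log(xbar)  # close to log(2), about 0.693
```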

SLIDE 33

4/ Central Limit Theorem

33 / 60

SLIDE 34

Current knowledge

  • For i.i.d. r.v.s X_1, …, X_n with E[X_i] = μ and V[X_i] = σ², we know that:
    ▶ E[X̄_n] = μ and V[X̄_n] = σ²/n
    ▶ X̄_n converges in probability to μ as n gets big
    ▶ Chebyshev provides some bounds on probabilities.
    ▶ Still no distributional assumptions about X_i!
  • Can we say more?
    ▶ Can we approximate P(a < X̄_n < b)?
    ▶ What family of distributions (Binomial, Uniform, Gamma, etc.)?
  • Again, need to analyze when n is large.

34 / 60

SLIDE 35

Convergence in Distribution

Let Z_1, Z_2, … be a sequence of r.v.s and, for n = 1, 2, …, let F_n(x) be the c.d.f. of Z_n. Then Z_1, Z_2, … is said to converge in distribution to a r.v. W with c.d.f. F_W if

  lim_{n→∞} F_n(x) = F_W(x),

which we write as Z_n →d W.

  • Basically: when n is big, the distribution of Z_n is very similar to the distribution of W
  • We use c.d.f.s here to avoid messy details with discrete vs. continuous r.v.s
  • If X_n →p X, then X_n →d X

35 / 60

SLIDE 36

Standardizing an r.v.

  • Common to standardize a r.v. by subtracting its expectation and dividing by its standard deviation:

    Z = (X − E[X]) / √V[X]

  • Possible to show that for any X, we have (try to prove these to yourself):
    ▶ E[Z] = 0
    ▶ V[Z] = 1
  • Sometimes called a z-score.

36 / 60
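A small numeric check of those two facts (my own example, not from the slides), standardizing with the true mean and standard deviation:

```r
## Standardizing: X ~ N(10, sd = 2), so Z = (X - 10) / 2 should have
## mean approximately 0 and variance approximately 1.
set.seed(60)
x <- rnorm(100000, mean = 10, sd = 2)
z <- (x - 10) / 2
mean(z)  # close to 0
var(z)   # close to 1
```

The same calculation with any other distribution (swap in `rexp` or `rbinom` with its true mean and sd) gives the same result: standardization fixes the first two moments but does not change the shape of the distribution.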

SLIDE 37

Central Limit Theorem

Let X_1, …, X_n be i.i.d. r.v.s from a distribution with mean μ and variance 0 < σ² < ∞. Then,

  (X̄_n − μ) / (σ/√n) →d N(0, 1).

  • Distribution free! We don't have to make specific assumptions about the distribution of X_i
  • Implies that X̄_n is approximately N(μ, σ²/n) in large samples
    ▶ ⇒ easy approximations to probability statements about X̄_n when n is big!

37 / 60

SLIDE 38

CLT by simulation in R

set.seed(2138)
nsims <- 10000
holder2 <- matrix(NA, nrow = nsims, ncol = 6)
for (i in 1:nsims) {
  s5 <- rbinom(n = 5, size = 1, prob = 0.25)
  s15 <- rbinom(n = 15, size = 1, prob = 0.25)
  s30 <- rbinom(n = 30, size = 1, prob = 0.25)
  s100 <- rbinom(n = 100, size = 1, prob = 0.25)
  s1000 <- rbinom(n = 1000, size = 1, prob = 0.25)
  s10000 <- rbinom(n = 10000, size = 1, prob = 0.25)
  holder2[i, 1] <- mean(s5)
  holder2[i, 2] <- mean(s15)
  holder2[i, 3] <- mean(s30)
  holder2[i, 4] <- mean(s100)
  holder2[i, 5] <- mean(s1000)
  holder2[i, 6] <- mean(s10000)
}

38 / 60

SLIDE 39

CLT in action

[Figure: density of (X̄_5 − μ)/(σ/√5), with the N(0, 1) curve for reference]

  • Distribution of (X̄_5 − μ)/(σ/√5)

39 / 60

SLIDE 40

CLT in action

[Figure: density of (X̄_15 − μ)/(σ/√15), with the N(0, 1) curve for reference]

  • Distribution of (X̄_15 − μ)/(σ/√15)

40 / 60

SLIDE 41

CLT in action

[Figure: density of (X̄_30 − μ)/(σ/√30), with the N(0, 1) curve for reference]

  • Distribution of (X̄_30 − μ)/(σ/√30)

41 / 60

SLIDE 42

CLT in action

[Figure: density of (X̄_100 − μ)/(σ/√100), with the N(0, 1) curve for reference]

  • Distribution of (X̄_100 − μ)/(σ/√100)

42 / 60

SLIDE 43

CLT in action

[Figure: density of (X̄_10000 − μ)/(σ/√10000), nearly identical to N(0, 1)]

  • Distribution of (X̄_10000 − μ)/(σ/√10000)

43 / 60

SLIDE 44

Empirical Rule for the Normal Distribution

[Figure: standard normal density]

  • If Z ∼ N(0, 1), then the following are roughly true:

44 / 60

SLIDE 45

Empirical Rule for the Normal Distribution

[Figure: standard normal density with the region from -1 to 1 shaded, area 0.68]

  • If Z ∼ N(0, 1), then the following are roughly true:
  • Roughly 68% of the distribution of Z is between -1 and 1.

45 / 60

SLIDE 46

Empirical Rule for the Normal Distribution

[Figure: standard normal density with the region from -2 to 2 shaded, area 0.95]

  • If Z ∼ N(0, 1), then the following are roughly true:
  • Roughly 68% of the distribution of Z is between -1 and 1.
  • Roughly 95% of the distribution of Z is between -2 and 2.

46 / 60

SLIDE 47

Empirical Rule for the Normal Distribution

[Figure: standard normal density with the region from -3 to 3 shaded, area 0.997]

  • If Z ∼ N(0, 1), then the following are roughly true:
  • Roughly 68% of the distribution of Z is between -1 and 1.
  • Roughly 95% of the distribution of Z is between -2 and 2.
  • Roughly 99.7% of the distribution of Z is between -3 and 3.

47 / 60

SLIDE 48

Simulating the empirical rule

  • Actual probability that Z ∼ N(0, 1) is between −2 and 2:

pnorm(2) - pnorm(-2)
## [1] 0.9545

  • Simulated probability that (X̄_n − μ)/(σ/√n) is between −2 and 2:
    ▶ n = 15 ⇒ 0.9683
    ▶ n = 30 ⇒ 0.9666
    ▶ n = 100 ⇒ 0.9523
    ▶ n = 1000 ⇒ 0.9551
    ▶ n = 10000 ⇒ 0.9546
  • The quality of the approximation depends on the underlying distribution of the X_i
    ▶ Obviously, if X_i ∼ N(0, 1), it will be perfect even with n = 1

48 / 60

SLIDE 49

Slutsky's Theorem

  • Let X_1, X_2, … converge in distribution to some r.v. X
  • Let Y_1, Y_2, … converge in probability to some number c
  • Slutsky's Theorem gives the following results:
  • 1. X_nY_n converges in distribution to cX
  • 2. X_n + Y_n converges in distribution to X + c
  • Extremely useful when trying to figure out the large-sample distribution of an estimator.

49 / 60
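The classic use case is replacing the unknown σ with the sample standard deviation s: since s converges in probability to σ, Slutsky says √n(X̄_n − μ)/s has the same N(0, 1) limit as √n(X̄_n − μ)/σ. A simulation sketch of this (my own example, not from the slides):

```r
## Slutsky in action: sqrt(n) * (xbar - mu) / sd(x) is approximately
## N(0, 1) even though sd(x) is estimated. X_i ~ Exponential(rate = 0.5),
## so mu = 2 and sigma = 2.
set.seed(2138)
nsims <- 2000
n <- 500
tstats <- replicate(nsims, {
  x <- rexp(n, rate = 0.5)
  sqrt(n) * (mean(x) - 2) / sd(x)  # sd(x) ->p sigma = 2
})
mean(abs(tstats) <= 2)  # close to pnorm(2) - pnorm(-2), about 0.95
```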

SLIDE 50

Application: planning a survey

  • Trump example: we want the probability of being within 0.02 of the true p to be 95%.
  • ⇒ we want n such that:

    P(|X̄_n − p| > 0.02) ≤ 0.05

  • By the CLT, if n is large, then

    X̄_n − p ≈ N(0, σ²/n)

  • We know σ² ≤ 1/4, so to be conservative:
    ▶ X̄_n − p ≈ N(0, 1/(4n))
    ▶ Standardizing ⇒ Z = (X̄_n − p)/(1/√(4n)) = 2√n(X̄_n − p) ≈ N(0, 1)
  • Easier to work with the standardized r.v.:

    P(|X̄_n − p| > 0.02) ≤ 0.05 ⟺ P(|Z| > 0.02 × 2√n) ≤ 0.05

50 / 60

SLIDE 51

Application: planning a survey

  • We want:

    P(|Z| > 0.04√n) ≤ 0.05
    P(Z < −0.04√n) + P(Z > 0.04√n) ≤ 0.05

  • The standard normal is symmetric around 0, so:
    ▶ Upper tail probabilities = lower tail probabilities
    ▶ P(Z < −0.04√n) = P(Z > 0.04√n)
  • This allows us to simplify:

    2 × P(Z < −0.04√n) ≤ 0.05
    P(Z < −0.04√n) ≤ 0.025

  • To solve for n, we need to know the q such that P(Z ≤ q) = 0.025
    ▶ The inverse of the c.d.f. is called the quantile function: q = F⁻¹(0.025)
    ▶ q = F⁻¹(p) is the (smallest) value of the r.v. such that P(X ≤ q) = F(q) ≥ p

51 / 60

SLIDE 52

Application: planning a survey

  • We can use the qnorm() function in R:

qnorm(0.025, mean = 0, sd = 1)
## [1] -1.96

  • If −0.04√n ≤ −1.96, then P(Z < −0.04√n) ≤ 0.025
  • So we need −0.04√n ≤ −1.96, or n ≥ 2401
  • Much lower than the 12,500 from Chebyshev.

52 / 60
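The same calculation generalizes to any margin of error. A small helper function (my own sketch; `clt_sample_size`, `moe`, and `alpha` are names I introduce here, not from the slides), using the conservative variance p(1 − p) ≤ 1/4:

```r
## CLT-based sample size: need 2 * sqrt(n) * moe >= z_{1 - alpha/2},
## i.e. n >= (z / (2 * moe))^2 under the worst-case variance 1/4.
clt_sample_size <- function(moe, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)  # 1.96 for alpha = 0.05
  ceiling((z / (2 * moe))^2)
}
clt_sample_size(moe = 0.02)  # 2401
clt_sample_size(moe = 0.01)  # 9604: halving the MOE quadruples n
```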

SLIDE 53

Application: planning a survey

nsims <- 1000
holder <- rep(NA, times = nsims)
for (i in 1:nsims) {
  this.samp <- rbinom(n = 2401, size = 1, prob = 0.4)
  holder[i] <- mean(this.samp)
}
mean(abs(holder - 0.4) > 0.02)
## [1] 0.052

[Figure: density of X̄_n − p across the 1000 simulated samples]

53 / 60

SLIDE 54

5/ More Exotic CLTs*

54 / 60

SLIDE 55

CLT for non-iid r.v.s

  • What if we don't have i.i.d. r.v.s? Does the CLT still apply?
  • Let X_1, X_2, … be independent (but not identically distributed) with means E[X_i] = μ_i and variances V[X_i] = σ_i².
  • Scaled and centered:

    Z_n = (Σ_{i=1}^n X_i − Σ_{i=1}^n μ_i) / (Σ_{i=1}^n σ_i²)^{1/2}

    ▶ No need to divide by n, because there are already n entries in the sum Σ_{i=1}^n μ_i
  • Easy to show that E[Z_n] = 0 and V[Z_n] = 1. Does the CLT apply?

55 / 60

SLIDE 56

Liapounov CLT

Suppose that the r.v.s X_1, X_2, … are independent and that E[|X_i − μ_i|³] < ∞ for i = 1, 2, …. Also suppose that

  lim_{n→∞} Σ_{i=1}^n E[|X_i − μ_i|³] / (Σ_{i=1}^n σ_i²)^{3/2} = 0.

Then,

  Z_n = (Σ_{i=1}^n X_i − Σ_{i=1}^n μ_i) / (Σ_{i=1}^n σ_i²)^{1/2} →d N(0, 1)

  • Key condition: there isn't one r.v. in the sequence that is "too big" and could dominate the sum

56 / 60
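A simulation sketch of this independent-but-not-identical case (my own example, not from the slides): Bernoulli draws whose success probabilities differ across i, standardized exactly as in the theorem.

```r
## Independent, non-identical X_i ~ Bernoulli(p_i) with varying p_i:
## the standardized sum should still be approximately N(0, 1).
set.seed(56)
n <- 1000
p <- runif(n, min = 0.2, max = 0.8)  # a different mean for each X_i
nsims <- 5000
z <- replicate(nsims, {
  x <- rbinom(n, size = 1, prob = p)
  (sum(x) - sum(p)) / sqrt(sum(p * (1 - p)))
})
mean(abs(z) <= 2)  # close to the normal value, about 0.9545
```

No single term here can dominate the sum (every variance is between 0.16 and 0.25), so the Liapounov condition is comfortably satisfied.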

SLIDE 57

CLT for dependent sequences

  • We have shown the CLT for i.i.d. and for independent r.v.s. What about dependent sequences?
  • A CLT can still hold for a dependent sequence X_1, X_2, ….
    ▶ What does a dependent sequence mean? Cov[X_i, X_j] ≠ 0 for some i ≠ j
  • Key condition for a dependent CLT: the r.v.s aren't "too correlated"
  • Overall conditions for a CLT to hold: the sum/mean of many, not-too-correlated, not-too-big r.v.s

57 / 60

SLIDE 58

6/ Wrap-up

58 / 60

SLIDE 59

Limitations of asymptotics

  • These results are practically and theoretically very useful.
  • But remember that they are approximations.
  • We don't live in asymptopia: n is always finite.
  • Asymptotics often give reasonable answers, but you can check with simulations.

59 / 60

SLIDE 60

Review

  • Sums and means of r.v.s are themselves r.v.s
  • Learned about the distribution of the sample mean of i.i.d. r.v.s:
    ▶ Expectation E[X̄_n] = μ
    ▶ Variance V[X̄_n] = σ²/n
    ▶ Converges in probability to the true mean (LLN)
    ▶ Converges in distribution to a normal distribution (CLT)
  • Ahead: generalizing these ideas to arbitrary estimators of parameters.

60 / 60