  1. Gov 2000: 5. Estimation and Statistical Inference. Matthew Blackwell, Fall 2016.

  2. Outline:
     1. Point Estimation
     2. Properties of Estimators
     3. Interval Estimation
     4. Where Do Estimators Come From?*
     5. Wrap up

  3. Housekeeping

     • This Thursday, 10/6: HW 3 due, HW 4 goes out.
     • Next Thursday, 10/13: HW 4 due, HW 5 goes out.
     • Thursday, 10/20: HW 5 due, Midterm available.
     • Midterm:
       ▶ Check-out exam: you have 8 hours to complete it once you check it out.
       ▶ Answers must be typeset, as usual.
       ▶ You should have more than enough time.
       ▶ We'll post practice midterms in advance.
     • Evaluations: we'll be fielding an anonymous survey about the course this week.

  4. Where are we? Where are we going?

     • Last few weeks: probability, learning how to think about r.v.s
     • Now: how to estimate features of underlying distributions with real data
     • Build on last week: if the sample mean will be "close" to μ, can we use it as a best guess for μ?

  5. 1/ Point Estimation

  6. Motivating example

     • Gerber, Green, and Larimer (APSR, 2008)

  7. Motivating example

     load("../data/gerber_green_larimer.RData")
     ## turn turnout variable into a numeric
     social$voted <- 1 * (social$voted == "Yes")
     neigh.mean <- mean(social$voted[social$treatment == "Neighbors"])
     neigh.mean
     ## [1] 0.378
     contr.mean <- mean(social$voted[social$treatment == "Civic Duty"])
     contr.mean
     ## [1] 0.315
     neigh.mean - contr.mean
     ## [1] 0.0634

     • Is this difference "real"? Is it big?

  8. Why study estimators?

     • Goal 1: Inference
       ▶ What is our best guess about some quantity of interest?
       ▶ What are a set of plausible values of the quantity of interest?
     • Goal 2: Compare estimators
       ▶ In an experiment, use the simple difference in sample means (Ȳ − X̄).
       ▶ Or the post-stratification estimator, where we estimate the difference among two subsets of the data (male and female, for instance) and then take the weighted average of the two, where a is the share of women:
         (Ȳ_f − X̄_f)a + (Ȳ_m − X̄_m)(1 − a)
       ▶ Which (if either) is better? How would we know?
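Both estimators under Goal 2 take only a few lines of R. Below is a minimal sketch on simulated data; the variable names (female, treat, y.out) and the data-generating probabilities are hypothetical, not from the slides:

     ## difference-in-means vs. post-stratification on simulated data
     set.seed(2138)
     n <- 1000
     female <- rbinom(n, size = 1, prob = 0.55)  ## a = share of women
     treat <- rbinom(n, size = 1, prob = 0.5)
     y.out <- rbinom(n, size = 1, prob = 0.3 + 0.05 * treat + 0.1 * female)

     ## simple difference in sample means
     mean(y.out[treat == 1]) - mean(y.out[treat == 0])

     ## post-stratification: within-group differences, weighted by group shares
     a <- mean(female)
     d.f <- mean(y.out[treat == 1 & female == 1]) -
       mean(y.out[treat == 0 & female == 1])
     d.m <- mean(y.out[treat == 1 & female == 0]) -
       mean(y.out[treat == 0 & female == 0])
     d.f * a + d.m * (1 - a)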

  9. Samples from the population

     • Our focus: Y_1, …, Y_n are i.i.d. draws from f(y)
       ▶ e.g.: Y_i = 1 if citizen i votes, Y_i = 0 otherwise.
       ▶ i.i.d. can be justified through random sampling from a population.
       ▶ f(y) is often called the population distribution.
     • Statistical inference or learning is using data to infer f(y).

  10. Point estimation

     • Point estimation: providing a single "best guess" as to the value of some fixed, unknown quantity of interest, θ.
       ▶ θ is a feature of the population distribution, f(y)
       ▶ Also called: estimands, parameters.
     • Examples of quantities of interest:
       ▶ μ = E[Y_i]: the mean (turnout rate in the population).
       ▶ σ² = V[Y_i]: the variance.
       ▶ μ_y − μ_x = E[Y] − E[X]: the difference in mean turnout between two groups.
       ▶ r(x) = E[Y|X = x]: the conditional expectation function (regression).
     • These are the things we want to learn about.
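Each estimand above has a natural sample analog. Below is a minimal sketch in R, assuming the social data frame from the motivating example has been loaded and voted recoded to 0/1 as on slide 7:

     ## sample analogs of the quantities of interest
     mean(social$voted)  ## estimate of the mean, mu
     var(social$voted)   ## estimate of the variance, sigma^2
     ## difference in mean turnout between two groups
     mean(social$voted[social$treatment == "Neighbors"]) -
       mean(social$voted[social$treatment == "Civic Duty"])
     ## conditional expectation function, here via regression on a factor
     lm(voted ~ treatment, data = social)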

  11. Estimators

     • An estimator, θ̂_n, of some parameter θ is a function of the sample: θ̂_n = h(Y_1, …, Y_n).
     • θ̂_n is an r.v. because it is a function of r.v.s.
       ▶ ⇝ θ̂_n has a distribution.
       ▶ {θ̂_1, θ̂_2, …} is a sequence of r.v.s, so we can think about convergence in probability/distribution.
     • An estimate is one particular realization of the estimator/r.v.

  12. Examples of estimators

     • For the population expectation, μ, we have many different possible estimators:
       ▶ θ̂_n = Ȳ_n, the sample mean
       ▶ θ̂_n = Y_1, just use the first observation
       ▶ θ̂_n = max(Y_1, …, Y_n)
       ▶ θ̂_n = 3, always guess 3
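Each of these is literally a function of the sample, so each takes one line of R. A minimal sketch on a simulated Bernoulli sample (the sample itself is illustrative, not from the slides):

     ## four estimators of the population mean, applied to one sample
     y <- rbinom(n = 100, size = 1, prob = 0.4)
     mean(y)  ## the sample mean
     y[1]     ## just use the first observation
     max(y)   ## the sample maximum
     3        ## always guess 3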

  13. Understanding check

     • Question: why is the following statement wrong: "My estimate was the sample mean and my estimator was 0.38"?

  14. The three distributions

     • Population distribution: the data-generating process
       ▶ Bernoulli in the case of the social pressure/voter turnout example
     • Empirical distribution: Y_1, …, Y_n
       ▶ the series of 1s and 0s in the sample
     • Sampling distribution: the distribution of the estimator over repeated samples from the population distribution
       ▶ the 0.38 sample mean in the "Neighbors" group is one draw from this distribution

  15. Sampling distribution, in pictures

     [Diagram: the population distribution f(y) generates repeated samples {Y_1^1, …, Y_n^1}, {Y_1^2, …, Y_n^2}, …, {Y_1^m, …, Y_n^m}; feeding each sample to the estimator θ̂_n yields estimates θ̂^1, θ̂^2, …, θ̂^m, whose distribution is the sampling distribution.]

  16. Sampling distribution

     my.samp <- rbinom(n = 10, size = 1, prob = 0.4)
     ## now we take the mean of one sample, which is one
     ## draw from the **sampling distribution**
     mean(my.samp)
     ## [1] 0.2

     ## let's take another draw from the population dist
     my.samp.2 <- rbinom(n = 10, size = 1, prob = 0.4)
     ## Let's feed this sample to the sample mean
     ## estimator to get another estimate, which is
     ## another draw from the sampling distribution
     mean(my.samp.2)
     ## [1] 0.4

  17. Sampling distribution by simulation

     • Let's generate 10,000 draws from the sampling distribution of the sample mean here when n = 100.

     nsims <- 10000
     mean.holder <- rep(NA, times = nsims)
     first.holder <- rep(NA, times = nsims)
     for (i in 1:nsims) {
       my.samp <- rbinom(n = 100, size = 1, prob = 0.4)
       mean.holder[i] <- mean(my.samp)  ## sample mean
       first.holder[i] <- my.samp[1]    ## first obs
     }

  18. Sampling distribution versus population distribution

     [Figure: overlaid histograms of the 10,000 simulated draws. The population distribution piles up at 0 and 1, while the sampling distribution of the sample mean concentrates tightly around 0.4. Frequency axis from 0 to 5000; x-axis from 0.0 to 1.0.]
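A figure like this could be reproduced from the simulation on slide 17, since first.holder contains draws from the population distribution and mean.holder contains draws from the sampling distribution. The plotting choices below are guesses, not the original code:

     ## overlay the sampling distribution on the population distribution
     hist(first.holder, breaks = seq(0, 1, by = 0.05), col = "grey",
          xlab = "Estimate", main = "")
     hist(mean.holder, breaks = seq(0, 1, by = 0.05), col = "black",
          add = TRUE)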

  19. Question

     • True or false: the sampling distribution refers to the distribution of θ.

  20. 2/ Properties of Estimators

  21. Properties of estimators

     • We only get one draw from the sampling distribution, θ̂_n.
     • Want to use estimators whose distribution is "close" to the true value.
     • There are two ways we evaluate estimators:
       ▶ Finite sample: the properties of its sampling distribution for a fixed sample size n.
       ▶ Large sample: the properties of the sampling distribution as we let n → ∞.

  22. Running example

     • Two independent random samples (treatment/control):
       ▶ Y_1, …, Y_{n_y} are i.i.d. with mean μ_y and variance σ²_y
       ▶ X_1, …, X_{n_x} are i.i.d. with mean μ_x and variance σ²_x
       ▶ Overall sample size n = n_y + n_x
     • Parameter is the population difference in means, which is the treatment effect of the social pressure mailer: μ_y − μ_x
     • Estimator is the difference in sample means: D̂_n = Ȳ_{n_y} − X̄_{n_x}

  23. Finite-sample properties

     Let θ̂_n be an estimator of θ. Then we have the following definitions:
     • bias[θ̂_n] = E[θ̂_n] − θ; θ̂_n is unbiased if bias[θ̂_n] = 0.
       ▶ Last week: X̄_n is unbiased for μ since E[X̄_n] = μ
     • Sampling variance is V[θ̂_n].
       ▶ Example: V[X̄_n] = σ²/n
     • Standard error is se[θ̂_n] = √V[θ̂_n].
       ▶ Example: se[X̄_n] = σ/√n
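These definitions are easy to check by simulation. A minimal sketch, assuming a Bernoulli(0.4) population as on the earlier slides, so that μ = 0.4 and σ² = 0.4 × 0.6:

     ## simulate the sampling distribution of the sample mean, n = 100
     nsims <- 10000
     xbars <- replicate(nsims, mean(rbinom(n = 100, size = 1, prob = 0.4)))
     mean(xbars) - 0.4      ## bias: should be near 0 (unbiased)
     sd(xbars)              ## simulated standard error
     sqrt(0.4 * 0.6 / 100)  ## theoretical se = sigma/sqrt(n), about 0.049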

  24. Diff-in-means finite-sample properties

     • Unbiasedness, from unbiasedness of the sample means:
       E[Ȳ_{n_y} − X̄_{n_x}] = E[Ȳ_{n_y}] − E[X̄_{n_x}] = μ_y − μ_x
     • Sampling variance, by independence of the samples:
       V[Ȳ_{n_y} − X̄_{n_x}] = V[Ȳ_{n_y}] + V[X̄_{n_x}] = σ²_y/n_y + σ²_x/n_x
     • Standard error:
       se[D̂_n] = √(σ²_y/n_y + σ²_x/n_x)
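A plug-in version of this standard error substitutes the sample variances for the unknown σ²_y and σ²_x. A minimal sketch for the motivating example, assuming the social data frame from slide 7:

     ## plug-in standard error for the Neighbors vs. Civic Duty difference
     y <- social$voted[social$treatment == "Neighbors"]
     x <- social$voted[social$treatment == "Civic Duty"]
     mean(y) - mean(x)  ## the difference-in-means estimate
     sqrt(var(y) / length(y) + var(x) / length(x))  ## its standard error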

  25. Mean squared error

     • Mean squared error or MSE is MSE = E[(θ̂_n − θ)²]
     • The MSE assesses the quality of an estimator.
       ▶ How big are (squared) deviations from the true parameter?
       ▶ Ideally, this would be as low as possible!
     • Useful decomposition result: MSE = bias[θ̂_n]² + V[θ̂_n]
     • ⇝ for unbiased estimators, MSE is the sampling variance.
     • Might accept some bias for large reductions in variance for lower overall MSE.
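The tradeoff in the last bullet shows up in a simulation. A minimal sketch comparing the unbiased sample mean to a hypothetical estimator that shrinks it toward 0.5, again with a Bernoulli(0.4) population:

     ## MSE comparison: sample mean vs. shrinkage toward 0.5
     nsims <- 10000
     est1 <- replicate(nsims, mean(rbinom(n = 25, size = 1, prob = 0.4)))
     est2 <- 0.5 * est1 + 0.5 * 0.5  ## biased, but lower variance
     mean((est1 - 0.4)^2)  ## MSE of the unbiased sample mean
     mean((est2 - 0.4)^2)  ## smaller here: bias^2 + variance is lower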

  26. Consistency

     • An estimator is consistent if θ̂_n → θ in probability.
       ▶ Distribution of θ̂_n collapses on θ as n → ∞.
       ▶ WLLN: X̄_n is consistent for μ.
       ▶ Inconsistent estimators are bad bad bad: more data gives worse answers!
     • Theorem: if bias[θ̂_n] → 0 and se[θ̂_n] → 0 as n → ∞, then θ̂_n is consistent.
     • Example: difference in means.
       ▶ D̂_n is unbiased with V[D̂_n] = σ²_y/n_y + σ²_x/n_x
       ▶ ⇝ D̂_n is consistent since V[D̂_n] → 0
     • NB: unbiasedness does not imply consistency, nor vice versa.
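Consistency, and its failure, also shows up in simulation. A minimal sketch comparing the sample mean to the first-observation estimator from slide 12 as n grows:

     ## the sample mean collapses on mu = 0.4; the first observation does not
     for (n in c(10, 100, 1000, 10000)) {
       y <- rbinom(n = n, size = 1, prob = 0.4)
       cat("n =", n, "sample mean:", mean(y), "first obs:", y[1], "\n")
     }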
