Gov 2000: 5. Estimation and Statistical Inference
Matthew Blackwell
Fall 2016
1 / 56
1. Point Estimation
2. Properties of Estimators
3. Interval Estimation
4. Where Do Estimators Come From?*
5. Wrap up
2 / 56
Housekeeping
▶ Check-out exam: you have 8 hours to complete it once you check it out.
▶ Answers must be typeset, as usual.
▶ You should have more than enough time.
▶ We’ll post practice midterms in advance.
course this week.
3 / 56
with real data.
can use it as a best guess for 𝜈?
4 / 56
5 / 56
6 / 56
load("../data/gerber_green_larimer.RData")
## turn turnout variable into a numeric
social$voted <- 1 * (social$voted == "Yes")
neigh.mean <- mean(social$voted[social$treatment == "Neighbors"])
neigh.mean
## [1] 0.378
contr.mean <- mean(social$voted[social$treatment == "Civic Duty"])
contr.mean
## [1] 0.315
neigh.mean - contr.mean
## [1] 0.0634
7 / 56
▶ What is our best guess about some quantity of interest?
▶ What are a set of plausible values of the quantity of interest?
▶ In an experiment, use the simple difference in sample means (𝑍̄ − 𝑌̄)?
▶ Or the post-stratification estimator, where we estimate the difference among two subsets of the data (male and female, for instance) and then take the weighted average (sketched in code below):
(𝑍̄𝑔 − 𝑌̄𝑔)𝑎 + (𝑍̄𝑛 − 𝑌̄𝑛)(1 − 𝑎)
▶ Which (if either) is better? How would we know?
8 / 56
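As a rough illustration of the post-stratification idea (not from the original slides), the estimator could be computed along these lines; the social$sex variable and its coding are assumptions here, not something shown in the deck:

## hedged sketch: assumes social$sex exists and is coded "male"/"female"
neigh <- social$treatment == "Neighbors"
civic <- social$treatment == "Civic Duty"
male <- social$sex == "male"
a <- mean(male[neigh | civic])  ## weight: share of the sample that is male
diff.male <- mean(social$voted[neigh & male]) - mean(social$voted[civic & male])
diff.fem <- mean(social$voted[neigh & !male]) - mean(social$voted[civic & !male])
diff.male * a + diff.fem * (1 - a)  ## post-stratification estimate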
▶ e.g.: 𝑍𝑗 = 1 if citizen 𝑗 votes, 𝑍𝑗 = 0 otherwise.
▶ i.i.d. can be justified through random sampling from a population.
▶ 𝑔(𝑧) is often called the population distribution
9 / 56
▶ We want to learn the value of some fixed, unknown quantity of interest, 𝜄.
▶ 𝜄 is a feature of the population distribution, 𝑔(𝑧)
▶ Also called: estimands, parameters.
▶ 𝜈 = 𝔽[𝑍𝑗]: the mean (turnout rate in the population).
▶ 𝜏² = 𝕎[𝑍𝑗]: the variance.
▶ 𝜈𝑧 − 𝜈𝑦 = 𝔽[𝑍] − 𝔽[𝑌]: the difference in mean turnout between two groups.
▶ 𝑠(𝑦) = 𝔽[𝑍|𝑌 = 𝑦]: the conditional expectation function (regression).
10 / 56
Estimator
An estimator 𝜄̂𝑜 of some parameter 𝜄 is a function of the sample: 𝜄̂𝑜 = ℎ(𝑍1, … , 𝑍𝑜).
▶ 𝜄̂𝑜 is an r.v. because it is a function of r.v.s.
▶ ⇝ 𝜄̂𝑜 has a distribution.
▶ {𝜄̂1, 𝜄̂2, …} is a sequence of r.v.s, so we can think about convergence in probability/distribution.
11 / 56
▶ Some possible estimators of 𝜈 (compared in the sketch below):
▶ 𝜄̂𝑜 = 𝑍̄𝑜, the sample mean
▶ 𝜄̂𝑜 = 𝑍1, just use the first observation
▶ 𝜄̂𝑜 = max(𝑍1, … , 𝑍𝑜)
▶ 𝜄̂𝑜 = 3, always guess 3
12 / 56
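To make these candidates concrete, here is a small simulated example (mine, not from the slides) that applies each rule to one i.i.d. Bernoulli(0.4) sample:

## one simulated sample of 100 Bernoulli(0.4) draws
samp <- rbinom(n = 100, size = 1, prob = 0.4)
mean(samp)   ## the sample mean
samp[1]      ## just the first observation
max(samp)    ## the sample maximum
3            ## always guess 3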
▶ What’s wrong with saying “my estimate was the sample mean and my estimator was 0.38”?
13 / 56
▶ Bernoulli in the case of the social pressure/voter turnout example)
▶ series of 1s and 0s in the sample
▶ repeated samples from the population distribution
▶ the 0.38 sample mean in the “Neighbors” group is one draw from this distribution
14 / 56
𝑔(𝑧): population distribution    𝜄̂𝑜: estimator
sample 1: {𝑍1, … , 𝑍𝑜} ⇝ estimate 𝜄̂𝑜 (draw 1)
sample 2: {𝑍1, … , 𝑍𝑜} ⇝ estimate 𝜄̂𝑜 (draw 2)
⋮
sample 𝑙: {𝑍1, … , 𝑍𝑜} ⇝ estimate 𝜄̂𝑜 (draw 𝑙)
The collection of estimates across the 𝑙 repeated samples is the sampling distribution.
15 / 56
## now we take the mean of one sample, which is one
## draw from the **sampling distribution**
my.samp <- rbinom(n = 10, size = 1, prob = 0.4)
mean(my.samp)
## [1] 0.2
## let's take another draw from the population dist
my.samp.2 <- rbinom(n = 10, size = 1, prob = 0.4)
## Let's feed this sample to the sample mean
## estimator to get another estimate, which is
## another draw from the sampling distribution
mean(my.samp.2)
## [1] 0.4
16 / 56
Let’s approximate the sampling distribution of the sample mean here when 𝑜 = 100.
nsims <- 10000
mean.holder <- rep(NA, times = nsims)
first.holder <- rep(NA, times = nsims)  ## also initialize the holder for the first obs
for (i in 1:nsims) {
  my.samp <- rbinom(n = 100, size = 1, prob = 0.4)
  mean.holder[i] <- mean(my.samp) ## sample mean
  first.holder[i] <- my.samp[1] ## first obs
}
17 / 56
[Figure: two histograms, “Population Distribution” and “Sampling Distribution”; y-axis: Frequency]
18 / 56
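A sketch of how a figure along these lines could be drawn from the simulation above; the plotting choices are my assumptions, not the original code:

par(mfrow = c(1, 2))
## population distribution: Bernoulli(0.4)
barplot(c(0.6, 0.4), names.arg = c(0, 1), main = "Population Distribution")
## sampling distribution of the sample mean across the 10,000 simulated samples
hist(mean.holder, main = "Sampling Distribution", xlab = "Sample mean")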
Question: The sampling distribution refers to the distribution of 𝜄. True or false?
19 / 56
20 / 56
𝜄̂𝑜.
true value.
▶ Finite sample: the properties of its sampling distribution for a fixed sample size 𝑜.
▶ Large sample: the properties of the sampling distribution as we let 𝑜 → ∞.
21 / 56
▶ 𝑍1, … , 𝑍𝑜𝑧 are i.i.d. with mean 𝜈𝑧 and variance 𝜏²𝑧
▶ 𝑌1, … , 𝑌𝑜𝑦 are i.i.d. with mean 𝜈𝑦 and variance 𝜏²𝑦
▶ Overall sample size 𝑜 = 𝑜𝑧 + 𝑜𝑦
▶ Quantity of interest: the treatment effect of the social pressure mailer, 𝜈𝑧 − 𝜈𝑦
▶ Estimator: the difference in sample means, 𝐸̂𝑜 = 𝑍̄𝑜𝑧 − 𝑌̄𝑜𝑦
22 / 56
Let 𝜄̂𝑜 be an estimator of 𝜄. Then we have the following definitions:
▶ Bias: bias[𝜄̂𝑜] = 𝔽[𝜄̂𝑜] − 𝜄
▶ 𝜄̂𝑜 is unbiased if bias[𝜄̂𝑜] = 0
▶ Last week: 𝑌̄𝑜 is unbiased for 𝜈 since 𝔽[𝑌̄𝑜] = 𝜈
▶ Sampling variance: 𝕎[𝜄̂𝑜]
▶ Example: 𝕎[𝑌̄𝑜] = 𝜏²/𝑜
▶ Standard error: se[𝜄̂𝑜] = √𝕎[𝜄̂𝑜]
▶ Example: se[𝑌̄𝑜] = 𝜏/√𝑜 (checked by simulation in the sketch below)
23 / 56
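A quick simulation check of these formulas for the sample mean. This is a sketch with an assumed Bernoulli(0.4) population, so 𝜈 = 0.4 and 𝜏² = 0.4 × 0.6 = 0.24:

nsims <- 10000
xbar <- rep(NA, times = nsims)
for (i in 1:nsims) {
  xbar[i] <- mean(rbinom(n = 100, size = 1, prob = 0.4))
}
mean(xbar) - 0.4  ## bias: should be near 0
var(xbar)         ## sampling variance: should be near 0.24 / 100
sd(xbar)          ## standard error: should be near sqrt(0.24 / 100)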
𝔽[𝑍̄𝑜𝑧 − 𝑌̄𝑜𝑦] = 𝔽[𝑍̄𝑜𝑧] − 𝔽[𝑌̄𝑜𝑦] = 𝜈𝑧 − 𝜈𝑦
𝕎[𝑍̄𝑜𝑧 − 𝑌̄𝑜𝑦] = 𝕎[𝑍̄𝑜𝑧] + 𝕎[𝑌̄𝑜𝑦] = 𝜏²𝑧/𝑜𝑧 + 𝜏²𝑦/𝑜𝑦
se[𝐸̂𝑜] = √(𝜏²𝑧/𝑜𝑧 + 𝜏²𝑦/𝑜𝑦)
24 / 56
MSE = 𝔽[(𝜄̂𝑜 − 𝜄)²]
▶ How big are (squared) deviations from the true parameter?
▶ Ideally, this would be as low as possible!
▶ The MSE decomposes into squared bias plus sampling variance:
MSE = bias[𝜄̂𝑜]² + 𝕎[𝜄̂𝑜]
▶ ⇝ an estimator with a little bias but a much smaller variance can have lower overall MSE (see the sketch below).
25 / 56
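A sketch of that tradeoff, comparing the sample mean to a shrunken version of it. The shrinkage estimator is an invented example for illustration, not one from the slides:

nsims <- 10000
est1 <- est2 <- rep(NA, times = nsims)
for (i in 1:nsims) {
  samp <- rbinom(n = 20, size = 1, prob = 0.4)
  est1[i] <- mean(samp)        ## unbiased sample mean
  est2[i] <- 0.9 * mean(samp)  ## biased (shrunken) estimator
}
mean((est1 - 0.4)^2)  ## MSE of the sample mean
mean((est2 - 0.4)^2)  ## MSE of the shrunken estimator: lower here, since the
                      ## small bias is more than offset by the smaller variance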
▶ 𝜄̂𝑜 is consistent for 𝜄 if 𝜄̂𝑜 →𝑞 𝜄 (convergence in probability).
▶ Distribution of 𝜄̂𝑜 collapses on 𝜄 as 𝑜 → ∞ (see the simulation sketch below).
▶ WLLN: 𝑌̄𝑜 is consistent for 𝜈.
▶ Inconsistent estimators are bad bad bad: more data gives worse answers!
▶ If bias[𝜄̂𝑜] → 0 and se[𝜄̂𝑜] → 0 as 𝑜 → ∞, then 𝜄̂𝑜 is consistent.
▶ 𝐸̂𝑜 is unbiased with 𝕎[𝐸̂𝑜] = 𝜏²𝑧/𝑜𝑧 + 𝜏²𝑦/𝑜𝑦
▶ ⇝ 𝐸̂𝑜 is consistent since 𝕎[𝐸̂𝑜] → 0
26 / 56
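A small simulation (a sketch, not from the slides) showing the sampling distribution of the sample mean tightening around the truth as 𝑜 grows:

nsims <- 1000
for (n in c(10, 100, 1000, 10000)) {
  xbar <- replicate(nsims, mean(rbinom(n = n, size = 1, prob = 0.4)))
  cat("n =", n, " sd of the sample mean =", round(sd(xbar), 4), "\n")
}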
▶ Example (unbiased but not consistent): 𝜄̂ᵍ𝑜 = 𝑍1.
▶ Unbiased because 𝔽[𝜄̂ᵍ𝑜] = 𝔽[𝑍1] = 𝜈𝑧
▶ Not consistent: 𝜄̂ᵍ𝑜 is constant in 𝑜 so its distribution never collapses.
▶ Said differently: the variance of 𝜄̂ᵍ𝑜 never shrinks.
▶ Example (biased but consistent): (𝑜/(𝑜 − 1)) 𝑍̄𝑜 = (1/(𝑜 − 1)) ∑𝑗 𝑍𝑗
▶ Bias: 𝔽[(𝑜/(𝑜 − 1)) 𝑍̄𝑜] − 𝜈𝑧 = (1/(𝑜 − 1)) 𝜈𝑧
▶ Consistent because bias and se → 0 as 𝑜 → ∞ (contrasted in the sketch below).
27 / 56
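A sketch contrasting the two examples with an assumed Bernoulli(0.4) population: the first-observation estimator stays just as noisy as 𝑜 grows, while the rescaled sample mean has bias that vanishes:

nsims <- 5000
for (n in c(10, 1000)) {
  first.obs <- scaled.mean <- rep(NA, times = nsims)
  for (i in 1:nsims) {
    samp <- rbinom(n = n, size = 1, prob = 0.4)
    first.obs[i] <- samp[1]                       ## unbiased, not consistent
    scaled.mean[i] <- (n / (n - 1)) * mean(samp)  ## biased, consistent
  }
  cat("n =", n,
      " var of first obs:", round(var(first.obs), 3),
      " bias of rescaled mean:", round(mean(scaled.mean) - 0.4, 4), "\n")
}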
▶ 𝜄̂𝑜 is asymptotically normal if (𝜄̂𝑜 − 𝜄)/se[𝜄̂𝑜] →𝑒 𝑂(0, 1) (convergence in distribution).
▶ Allows us to approximate the probability of 𝜄̂𝑜 being far away from 𝜄 in large samples.
▶ Usually justified by some version of the Central Limit Theorem.
▶ CLT: 𝑌̄𝑜 is asymptotically normal (illustrated in the sketch below)
▶ For the difference in means: (𝐸̂𝑜 − (𝜈𝑧 − 𝜈𝑦)) / √(𝜏²𝑧/𝑜𝑧 + 𝜏²𝑦/𝑜𝑦) →𝑒 𝑂(0, 1)
28 / 56
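A sketch of the CLT at work: standardized sample means from a skewed Bernoulli(0.1) population (my choice, not the slides') behave approximately like a standard normal once 𝑜 is moderately large:

nsims <- 10000
n <- 100
z <- rep(NA, times = nsims)
for (i in 1:nsims) {
  samp <- rbinom(n = n, size = 1, prob = 0.1)       ## skewed population
  z[i] <- (mean(samp) - 0.1) / sqrt(0.1 * 0.9 / n)  ## standardized sample mean
}
mean(abs(z) > 1.96)  ## should be roughly 0.05 if the normal approximation holds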
▶ Problem: we do not know the true se[𝜄̂𝑜]?!
▶ Solution: plug in an estimated standard error, ŝe[𝜄̂𝑜]!
▶ If 𝜄̂𝑜 is asymptotically normal and ŝe[𝜄̂𝑜] →𝑞 se[𝜄̂𝑜], then:
(𝜄̂𝑜 − 𝜄) / ŝe[𝜄̂𝑜] →𝑒 𝑂(0, 1)
▶ ⇝ the normal approximation still holds with the estimated standard error in large samples.
29 / 56
▶ 𝕎[𝐸̂𝑜] = 𝜏²𝑧/𝑜𝑧 + 𝜏²𝑦/𝑜𝑦
▶ Need to estimate these dang unknown population variances, 𝜏²𝑧 and 𝜏²𝑦.
▶ Sample variance: 𝑇²𝑧 = (1/(𝑜𝑧 − 1)) ∑𝑗 (𝑍𝑗 − 𝑍̄𝑜𝑧)²
▶ Consistent for the population variance: 𝑇²𝑧 →𝑞 𝜏²𝑧
▶ Plug-in estimator of the sampling variance:
𝕎̂[𝐸̂𝑜] = 𝑇²𝑧/𝑜𝑧 + 𝑇²𝑦/𝑜𝑦 →𝑞 𝜏²𝑧/𝑜𝑧 + 𝜏²𝑦/𝑜𝑦 = 𝕎[𝐸̂𝑜]
30 / 56
▶ Since 𝕎̂[𝐸̂𝑜] →𝑞 𝕎[𝐸̂𝑜], we also have ŝe[𝐸̂𝑜] = √𝕎̂[𝐸̂𝑜] →𝑞 se[𝐸̂𝑜]
▶ Challenge question: prove this.
▶ Since 𝐸̂𝑜 is asymptotically normal and ŝe[𝐸̂𝑜] is consistent, we know that:
(𝐸̂𝑜 − (𝜈𝑧 − 𝜈𝑦)) / √(𝑇²𝑧/𝑜𝑧 + 𝑇²𝑦/𝑜𝑦) →𝑒 𝑂(0, 1)
▶ ⇝ lets us approximate how far 𝐸̂𝑜 will be from the truth!
31 / 56
32 / 56
truth with some fixed probability
▶ An interval estimate for, say, 𝜈𝑧 − 𝜈𝑦 consists of two bounds within which we expect 𝜈𝑧 − 𝜈𝑦 to reside: 𝑏 ≤ 𝜈𝑧 − 𝜈𝑦 ≤ 𝑐
▶ We derive these from the distributional properties of estimators. The ideas extend to all estimators, including regression.
33 / 56
Confidence interval
A 100(1 − 𝛽)% confidence interval for a population parameter 𝜄 is an interval 𝐷𝑜 = (𝑏, 𝑐), where 𝑏 = 𝑏(𝑍1, … , 𝑍𝑜) and 𝑐 = 𝑐(𝑍1, … , 𝑍𝑜) are functions of the data such that ℙ(𝑏 ≤ 𝜄 ≤ 𝑐) ≥ 1 − 𝛽.
▶ ⇝ in repeated samples, the interval contains 𝜄 at least 100(1 − 𝛽)% of the time.
▶ An estimator just like 𝑌̄𝑜, but with two values.
▶ The realized interval computed from a particular sample is the interval estimate.
34 / 56
▶ Let ŝe = √(𝑇²𝑧/𝑜𝑧 + 𝑇²𝑦/𝑜𝑦), so that:
(𝐸̂𝑜 − (𝜈𝑧 − 𝜈𝑦)) / ŝe →𝑒 𝑂(0, 1)
▶ We want an interval (𝑏, 𝑐) for (𝜈𝑧 − 𝜈𝑦) such that: ℙ(𝑏 ≤ (𝜈𝑧 − 𝜈𝑦) ≤ 𝑐) = 0.95
▶ ⇝ across repeated samples, 95% of the time the truth will be between these two bounds.
▶ From the normal approximation: ℙ(−1.96 ≤ (𝐸̂𝑜 − (𝜈𝑧 − 𝜈𝑦)) / ŝe ≤ 1.96) ≈ 0.95
35 / 56
0.95 ≈ ℙ(−1.96 ≤ (𝐸̂𝑜 − (𝜈𝑧 − 𝜈𝑦)) / ŝe ≤ 1.96)
     = ℙ(−1.96 × ŝe ≤ 𝐸̂𝑜 − (𝜈𝑧 − 𝜈𝑦) ≤ 1.96 × ŝe)
     = ℙ(−𝐸̂𝑜 − 1.96 × ŝe ≤ −(𝜈𝑧 − 𝜈𝑦) ≤ −𝐸̂𝑜 + 1.96 × ŝe)
     = ℙ(𝐸̂𝑜 − 1.96 × ŝe ≤ (𝜈𝑧 − 𝜈𝑦) ≤ 𝐸̂𝑜 + 1.96 × ŝe)
▶ ⇝ lower bound: 𝐸̂𝑜 − 1.96 × ŝe
▶ ⇝ upper bound: 𝐸̂𝑜 + 1.96 × ŝe
▶ Usually written as 𝐸̂𝑜 ± 1.96 × ŝe
36 / 56
neigh_var <- var(social$voted[social$treatment == "Neighbors"])
neigh_n <- 38201
civic_var <- var(social$voted[social$treatment == "Civic Duty"])
civic_n <- 38218
se_diff <- sqrt(neigh_var / neigh_n + civic_var / civic_n)
## lower bound
(0.378 - 0.315) - 1.96 * se_diff
## [1] 0.0563
## upper bound
(0.378 - 0.315) + 1.96 * se_diff
## [1] 0.0697
37 / 56
▶ A common misinterpretation of the confidence interval is the following:
▶ “I calculated a 95% confidence interval of [0.05, 0.13], which means that there is a 95% chance that the true difference in means is in that interval.”
▶ This is WRONG.
▶ The true difference in means is not random; it is fixed.
▶ It is either in the interval or it isn’t; there’s no room for probability at all.
▶ What is random is the interval itself, 𝐸̂𝑜 ± 1.96 × ŝe[𝐸̂𝑜]. This is what varies from sample to sample.
▶ ⇝ across repeated samples, 95% of the time the constructed confidence interval will contain the true value.
38 / 56
▶ 95% confidence interval for the mean: 𝑍̄𝑜 ± 1.96 × ŝe[𝑍̄𝑜] ⇝ 𝑍̄𝑜 ± 1.96 × 𝑇𝑜/√𝑜
set.seed(2143)
sims <- 10000
cover <- rep(0, times = sims)
low.bound <- up.bound <- rep(NA, times = sims)
for (i in 1:sims) {
  draws <- rnorm(500, mean = 1, sd = sqrt(10))
  low.bound[i] <- mean(draws) - sd(draws) / sqrt(500) * 1.96
  up.bound[i] <- mean(draws) + sd(draws) / sqrt(500) * 1.96
  if (low.bound[i] < 1 & up.bound[i] > 1) {
    cover[i] <- 1
  }
}
mean(cover)
## [1] 0.95
39 / 56
[Figures, slides 40 to 44: each trial’s estimate and confidence interval from the simulation plotted against the true value of 1; x-axis: Trial, y-axis: Estimate (0.6 to 1.4)]
▶ Roughly 95% of the calculated confidence intervals contain the true value.
44 / 56
▶ Let 𝜄̂𝑜 be an asymptotically normal estimator for 𝜄.
▶ Any asymptotically normal estimator! 𝑌̄𝑜, 𝐸̂𝑜, or whatever!
▶ Then a 100(1 − 𝛽)% confidence interval is 𝜄̂𝑜 ± 𝑨𝛽/2 × ŝe[𝜄̂𝑜]
▶ where the critical value 𝑨𝛽/2 satisfies ℙ(−𝑨𝛽/2 ≤ (𝜄̂𝑜 − 𝜄)/ŝe[𝜄̂𝑜] ≤ 𝑨𝛽/2) = (1 − 𝛽)
45 / 56
[Figure: standard normal density, dnorm(x), with area 0.95 between −1.96 and 1.96; z = 1.96]
▶ 𝑨𝛽/2 are the values such that for 𝑎 ∼ 𝑂(0, 1): ℙ(−𝑨𝛽/2 ≤ 𝑎 ≤ 𝑨𝛽/2) = 1 − 𝛽
▶ ⇝ put the remaining 𝛽 probability in the two tails.
▶ For a 95% interval, 𝛽 = 0.05, so we want the 𝑨 values that put 0.025 (2.5%) in each of the tails.
46 / 56
ℙ({𝑎 < −𝑨𝛽/2} ∪ {𝑎 > 𝑨𝛽/2}) = 𝛽
ℙ(𝑎 < −𝑨𝛽/2) + ℙ(𝑎 > 𝑨𝛽/2) = 𝛽 (additivity)
2 × ℙ(𝑎 > 𝑨𝛽/2) = 𝛽 (symmetry)
ℙ(𝑎 < 𝑨𝛽/2) = 1 − 𝛽/2
[Figure: standard normal density, dnorm(x), with area 0.975 below z = ?]
47 / 56
▶ ⇝ 𝑨𝛽/2 is the standard normal quantile function evaluated at 1 − 𝛽/2!
▶ In R, qnorm() gives the critical value for any confidence interval (90% in this case):
qnorm(0.95)
## [1] 1.64
▶ ⇝ 90% confidence interval: 𝜄̂𝑜 ± 1.64 × ŝe[𝜄̂𝑜] (see the helper sketched below)
48 / 56
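A general-purpose helper along these lines (my sketch, not from the slides) wraps this recipe up for any confidence level; it reuses the se_diff computed on the earlier slide and the 0.0634 point estimate:

## estimate +/- z_{beta/2} * se, for any confidence level
asymp.ci <- function(estimate, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  c(lower = estimate - z * se, upper = estimate + z * se)
}
asymp.ci(0.0634, se_diff)               ## 95% interval for the difference in means
asymp.ci(0.0634, se_diff, level = 0.9)  ## 90% interval is narrower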
▶ What happens when we increase our confidence, from say 95% to 99%? Do confidence intervals get wider or narrower?
49 / 56
50 / 56
▶ A parametric model is a set 𝔾 of distributions that could have possibly generated the data.
▶ Each distribution in the set is indexed by a finite number of parameters.
▶ Bernoulli distribution: 𝔾 = {𝑔(𝑧; 𝑞) = 𝑞^𝑧 (1 − 𝑞)^(1−𝑧) ∶ 0 ≤ 𝑞 ≤ 1}
▶ Normal distribution: 𝔾 = {𝑔(𝑧; 𝜈, 𝜏²) = (1/(𝜏√(2π))) exp{−(𝑧 − 𝜈)²/(2𝜏²)} ∶ 𝜈 ∈ ℝ, 𝜏² > 0}
▶ Basis of maximum likelihood, Bayesian inference, etc.
▶ ⇝ if our choice of model is wrong, our inferences might be wrong
51 / 56
▶ A nonparametric model is a set of distributions that cannot be indexed by a finite set of parameters.
▶ All distributions with finite mean: 𝔾 = {𝑔(𝑧) ∶ 𝔽[𝑍] < ∞}
▶ All distributions with finite mean and variance: 𝔾 = {𝑔(𝑧) ∶ 𝔽[𝑍] < ∞, 𝕎[𝑍] < ∞}
52 / 56
▶ Parametric approach: maximum likelihood or the method of moments.
▶ Derive estimators from the assumed p.m.f./p.d.f. 𝑔(𝑧).
▶ Gov 2001 and beyond.
▶ Quantities of interest are usually made up of expectations: 𝔽[𝑟(𝑍)] for some function 𝑟(⋅)
▶ Analogy principle: replace any population expectations, 𝔽[𝑟(𝑍)], with sample means, (1/𝑜) ∑𝑗 𝑟(𝑍𝑗)
53 / 56
𝜈 = 𝔽[𝑍𝑗] ⇝ 𝜈̂ = (1/𝑜) ∑𝑗 𝑍𝑗
𝜏² = 𝔽[(𝑍𝑗 − 𝔽[𝑍𝑗])²] ⇝ (1/𝑜) ∑𝑗 (𝑍𝑗 − 𝑍̄)²
Cov[𝑌𝑗, 𝑍𝑗] = 𝔽[(𝑌𝑗 − 𝔽[𝑌𝑗])(𝑍𝑗 − 𝔽[𝑍𝑗])] ⇝ (1/𝑜) ∑𝑗 (𝑌𝑗 − 𝑌̄)(𝑍𝑗 − 𝑍̄)
(sums run over 𝑗 = 1, … , 𝑜; see the R sketch below)
54 / 56
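A sketch of the analogy principle in R: each population expectation is replaced by the corresponding sample average. These are the 1/𝑜 versions written on the slide, not R's default 1/(𝑜 − 1) versions in var() and cov(); the function names are my own:

plugin.mean <- function(z) mean(z)
plugin.var  <- function(z) mean((z - mean(z))^2)
plugin.cov  <- function(y, z) mean((y - mean(y)) * (z - mean(z)))

## check on simulated data where the truth is known
z <- rnorm(1000, mean = 2, sd = 3)
y <- z + rnorm(1000)
plugin.mean(z)   ## close to 2
plugin.var(z)    ## close to 9
plugin.cov(y, z) ## close to Cov[Y, Z] = 9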
55 / 56
▶ Point estimates, standard errors, and confidence intervals can be built for almost any parameter.
▶ These core ideas of estimation and inference will be with you for almost any statistical procedure moving forward.
56 / 56