Gov 2000: 5. Estimation and Statistical Inference
Matthew Blackwell
Fall 2016
1 / 56
1. Point Estimation
2. Properties of Estimators
3. Interval Estimation
4. Where Do Estimators Come From?*
5. Wrap up
2 / 56
Housekeeping
▶ Check-out exam: you have 8 hours to complete it once you check it out.
▶ Answers must be typeset, as usual.
▶ You should have more than enough time.
▶ We’ll post practice midterms in advance.
course this week.
3 / 56
with real data.
can use it as a best guess for 𝜈?
4 / 56
5 / 56
6 / 56
load("../data/gerber_green_larimer.RData")
## turn turnout variable into a numeric
social$voted <- 1 * (social$voted == "Yes")
neigh.mean <- mean(social$voted[social$treatment == "Neighbors"])
neigh.mean
## [1] 0.378
contr.mean <- mean(social$voted[social$treatment == "Civic Duty"])
contr.mean
## [1] 0.315
neigh.mean - contr.mean
## [1] 0.0634
7 / 56
▶ What is our best guess about some quantity of interest?
▶ What are a set of plausible values of the quantity of interest?
▶ In an experiment, use the simple difference in sample means (𝑍̄ − 𝑌̄)?
▶ Or the post-stratification estimator, where we estimate the difference among two subsets of the data (male and female, for instance) and then take the weighted average (sketched in code below):
(𝑍̄𝑔 − 𝑌̄𝑔)𝑎 + (𝑍̄𝑛 − 𝑌̄𝑛)(1 − 𝑎)
▶ Which (if either) is better? How would we know?
8 / 56
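As a rough illustration of the post-stratification idea (not from the original slides), the estimator could be computed along these lines; the social$sex variable and its coding are assumptions here, not something shown in the deck:

## hedged sketch: assumes social$sex exists and is coded "male"/"female"
neigh <- social$treatment == "Neighbors"
civic <- social$treatment == "Civic Duty"
male <- social$sex == "male"
a <- mean(male[neigh | civic])  ## weight: share of the sample that is male
diff.male <- mean(social$voted[neigh & male]) - mean(social$voted[civic & male])
diff.fem <- mean(social$voted[neigh & !male]) - mean(social$voted[civic & !male])
diff.male * a + diff.fem * (1 - a)  ## post-stratification estimate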
▶ e.g.: 𝑍𝑗 = 1 if citizen 𝑗 votes, 𝑍𝑗 = 0 otherwise.
▶ i.i.d. can be justified through random sampling from a population.
▶ 𝑔(𝑧) is often called the population distribution
9 / 56
▶ We want to learn the value of some fixed, unknown quantity of interest, 𝜄.
▶ 𝜄 is a feature of the population distribution, 𝑔(𝑧)
▶ Also called: estimands, parameters.
▶ 𝜈 = 𝔽[𝑍𝑗]: the mean (turnout rate in the population).
▶ 𝜏² = 𝕎[𝑍𝑗]: the variance.
▶ 𝜈𝑧 − 𝜈𝑦 = 𝔽[𝑍] − 𝔽[𝑌]: the difference in mean turnout between two groups.
▶ 𝑠(𝑦) = 𝔽[𝑍|𝑌 = 𝑦]: the conditional expectation function (regression).
10 / 56
Estimator
An estimator 𝜄̂𝑜 of some parameter 𝜄 is a function of the sample: 𝜄̂𝑜 = ℎ(𝑍1, … , 𝑍𝑜).
▶ 𝜄̂𝑜 is an r.v. because it is a function of r.v.s.
▶ ⇝ 𝜄̂𝑜 has a distribution.
▶ {𝜄̂1, 𝜄̂2, …} is a sequence of r.v.s, so we can think about convergence in probability/distribution.
11 / 56
▶ Some possible estimators of 𝜈 (compared in the sketch below):
▶ 𝜄̂𝑜 = 𝑍̄𝑜, the sample mean
▶ 𝜄̂𝑜 = 𝑍1, just use the first observation
▶ 𝜄̂𝑜 = max(𝑍1, … , 𝑍𝑜)
▶ 𝜄̂𝑜 = 3, always guess 3
12 / 56
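To make these candidates concrete, here is a small simulated example (mine, not from the slides) that applies each rule to one i.i.d. Bernoulli(0.4) sample:

## one simulated sample of 100 Bernoulli(0.4) draws
samp <- rbinom(n = 100, size = 1, prob = 0.4)
mean(samp)   ## the sample mean
samp[1]      ## just the first observation
max(samp)    ## the sample maximum
3            ## always guess 3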
▶ What’s wrong with saying “my estimate was the sample mean and my estimator was 0.38”?
13 / 56
▶ Bernoulli in the case of the social pressure/voter turnout example)
▶ series of 1s and 0s in the sample
▶ repeated samples from the population distribution
▶ the 0.38 sample mean in the “Neighbors” group is one draw from this distribution
14 / 56
𝑔(𝑧): population distribution    𝜄̂𝑜: estimator
sample 1: {𝑍1, … , 𝑍𝑜} ⇝ estimate 𝜄̂𝑜 (draw 1)
sample 2: {𝑍1, … , 𝑍𝑜} ⇝ estimate 𝜄̂𝑜 (draw 2)
⋮
sample 𝑙: {𝑍1, … , 𝑍𝑜} ⇝ estimate 𝜄̂𝑜 (draw 𝑙)
The collection of estimates across the 𝑙 repeated samples is the sampling distribution.
15 / 56
## now we take the mean of one sample, which is one
## draw from the **sampling distribution**
my.samp <- rbinom(n = 10, size = 1, prob = 0.4)
mean(my.samp)
## [1] 0.2
## let's take another draw from the population dist
my.samp.2 <- rbinom(n = 10, size = 1, prob = 0.4)
## Let's feed this sample to the sample mean
## estimator to get another estimate, which is
## another draw from the sampling distribution
mean(my.samp.2)
## [1] 0.4
16 / 56
Let’s approximate the sampling distribution of the sample mean here when 𝑜 = 100.
nsims <- 10000
mean.holder <- rep(NA, times = nsims)
first.holder <- rep(NA, times = nsims)  ## also initialize the holder for the first obs
for (i in 1:nsims) {
  my.samp <- rbinom(n = 100, size = 1, prob = 0.4)
  mean.holder[i] <- mean(my.samp) ## sample mean
  first.holder[i] <- my.samp[1] ## first obs
}
17 / 56
[Figure: two histograms, “Population Distribution” and “Sampling Distribution”; y-axis: Frequency]
18 / 56
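A sketch of how a figure along these lines could be drawn from the simulation above; the plotting choices are my assumptions, not the original code:

par(mfrow = c(1, 2))
## population distribution: Bernoulli(0.4)
barplot(c(0.6, 0.4), names.arg = c(0, 1), main = "Population Distribution")
## sampling distribution of the sample mean across the 10,000 simulated samples
hist(mean.holder, main = "Sampling Distribution", xlab = "Sample mean")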
Question: The sampling distribution refers to the distribution of 𝜄. True or false?
19 / 56
20 / 56
𝜄̂𝑜.
true value.
▶ Finite sample: the properties of its sampling distribution for a fixed sample size 𝑜.
▶ Large sample: the properties of the sampling distribution as we let 𝑜 → ∞.
21 / 56
▶ 𝑍1, … , 𝑍𝑜𝑧 are i.i.d. with mean 𝜈𝑧 and variance 𝜏²𝑧
▶ 𝑌1, … , 𝑌𝑜𝑦 are i.i.d. with mean 𝜈𝑦 and variance 𝜏²𝑦
▶ Overall sample size 𝑜 = 𝑜𝑧 + 𝑜𝑦
▶ Quantity of interest: the treatment effect of the social pressure mailer, 𝜈𝑧 − 𝜈𝑦
▶ Estimator: the difference in sample means, 𝐸̂𝑜 = 𝑍̄𝑜𝑧 − 𝑌̄𝑜𝑦
22 / 56
Let 𝜄̂𝑜 be an estimator of 𝜄. Then we have the following definitions:
▶ Bias: bias[𝜄̂𝑜] = 𝔽[𝜄̂𝑜] − 𝜄
▶ 𝜄̂𝑜 is unbiased if bias[𝜄̂𝑜] = 0
▶ Last week: 𝑌̄𝑜 is unbiased for 𝜈 since 𝔽[𝑌̄𝑜] = 𝜈
▶ Sampling variance: 𝕎[𝜄̂𝑜]
▶ Example: 𝕎[𝑌̄𝑜] = 𝜏²/𝑜
▶ Standard error: se[𝜄̂𝑜] = √𝕎[𝜄̂𝑜]
▶ Example: se[𝑌̄𝑜] = 𝜏/√𝑜 (checked by simulation in the sketch below)
23 / 56
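A quick simulation check of these formulas for the sample mean. This is a sketch with an assumed Bernoulli(0.4) population, so 𝜈 = 0.4 and 𝜏² = 0.4 × 0.6 = 0.24:

nsims <- 10000
xbar <- rep(NA, times = nsims)
for (i in 1:nsims) {
  xbar[i] <- mean(rbinom(n = 100, size = 1, prob = 0.4))
}
mean(xbar) - 0.4  ## bias: should be near 0
var(xbar)         ## sampling variance: should be near 0.24 / 100
sd(xbar)          ## standard error: should be near sqrt(0.24 / 100)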
𝔽[𝑍̄𝑜𝑧 − 𝑌̄𝑜𝑦] = 𝔽[𝑍̄𝑜𝑧] − 𝔽[𝑌̄𝑜𝑦] = 𝜈𝑧 − 𝜈𝑦
𝕎[𝑍̄𝑜𝑧 − 𝑌̄𝑜𝑦] = 𝕎[𝑍̄𝑜𝑧] + 𝕎[𝑌̄𝑜𝑦] = 𝜏²𝑧/𝑜𝑧 + 𝜏²𝑦/𝑜𝑦
se[𝐸̂𝑜] = √(𝜏²𝑧/𝑜𝑧 + 𝜏²𝑦/𝑜𝑦)
24 / 56
MSE = 𝔽[(𝜄̂𝑜 − 𝜄)²]
▶ How big are (squared) deviations from the true parameter?
▶ Ideally, this would be as low as possible!
▶ The MSE decomposes into squared bias plus sampling variance:
MSE = bias[𝜄̂𝑜]² + 𝕎[𝜄̂𝑜]
▶ ⇝ an estimator with a little bias but a much smaller variance can have lower overall MSE (see the sketch below).
25 / 56
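A sketch of that tradeoff, comparing the sample mean to a shrunken version of it. The shrinkage estimator is an invented example for illustration, not one from the slides:

nsims <- 10000
est1 <- est2 <- rep(NA, times = nsims)
for (i in 1:nsims) {
  samp <- rbinom(n = 20, size = 1, prob = 0.4)
  est1[i] <- mean(samp)        ## unbiased sample mean
  est2[i] <- 0.9 * mean(samp)  ## biased (shrunken) estimator
}
mean((est1 - 0.4)^2)  ## MSE of the sample mean
mean((est2 - 0.4)^2)  ## MSE of the shrunken estimator: lower here, since the
                      ## small bias is more than offset by the smaller variance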
▶ 𝜄̂𝑜 is consistent for 𝜄 if 𝜄̂𝑜 →𝑞 𝜄 (convergence in probability).
▶ Distribution of 𝜄̂𝑜 collapses on 𝜄 as 𝑜 → ∞ (see the simulation sketch below).
▶ WLLN: 𝑌̄𝑜 is consistent for 𝜈.
▶ Inconsistent estimators are bad bad bad: more data gives worse answers!
▶ If bias[𝜄̂𝑜] → 0 and se[𝜄̂𝑜] → 0 as 𝑜 → ∞, then 𝜄̂𝑜 is consistent.
▶ 𝐸̂𝑜 is unbiased with 𝕎[𝐸̂𝑜] = 𝜏²𝑧/𝑜𝑧 + 𝜏²𝑦/𝑜𝑦
▶ ⇝ 𝐸̂𝑜 is consistent since 𝕎[𝐸̂𝑜] → 0
26 / 56
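A small simulation (a sketch, not from the slides) showing the sampling distribution of the sample mean tightening around the truth as 𝑜 grows:

nsims <- 1000
for (n in c(10, 100, 1000, 10000)) {
  xbar <- replicate(nsims, mean(rbinom(n = n, size = 1, prob = 0.4)))
  cat("n =", n, " sd of the sample mean =", round(sd(xbar), 4), "\n")
}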
▶ Example (unbiased but not consistent): 𝜄̂ᵍ𝑜 = 𝑍1.
▶ Unbiased because 𝔽[𝜄̂ᵍ𝑜] = 𝔽[𝑍1] = 𝜈𝑧
▶ Not consistent: 𝜄̂ᵍ𝑜 is constant in 𝑜 so its distribution never collapses.
▶ Said differently: the variance of 𝜄̂ᵍ𝑜 never shrinks.
▶ Example (biased but consistent): (𝑜/(𝑜 − 1)) 𝑍̄𝑜 = (1/(𝑜 − 1)) ∑𝑗 𝑍𝑗
▶ Bias: 𝔽[(𝑜/(𝑜 − 1)) 𝑍̄𝑜] − 𝜈𝑧 = (1/(𝑜 − 1)) 𝜈𝑧
▶ Consistent because bias and se → 0 as 𝑜 → ∞ (contrasted in the sketch below).
27 / 56
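A sketch contrasting the two examples with an assumed Bernoulli(0.4) population: the first-observation estimator stays just as noisy as 𝑜 grows, while the rescaled sample mean has bias that vanishes:

nsims <- 5000
for (n in c(10, 1000)) {
  first.obs <- scaled.mean <- rep(NA, times = nsims)
  for (i in 1:nsims) {
    samp <- rbinom(n = n, size = 1, prob = 0.4)
    first.obs[i] <- samp[1]                       ## unbiased, not consistent
    scaled.mean[i] <- (n / (n - 1)) * mean(samp)  ## biased, consistent
  }
  cat("n =", n,
      " var of first obs:", round(var(first.obs), 3),
      " bias of rescaled mean:", round(mean(scaled.mean) - 0.4, 4), "\n")
}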
▶ 𝜄̂𝑜 is asymptotically normal if (𝜄̂𝑜 − 𝜄)/se[𝜄̂𝑜] →𝑒 𝑂(0, 1) (convergence in distribution).
▶ Allows us to approximate the probability of 𝜄̂𝑜 being far away from 𝜄 in large samples.
▶ Usually justified by some version of the Central Limit Theorem.
▶ CLT: 𝑌̄𝑜 is asymptotically normal (illustrated in the sketch below)
▶ For the difference in means: (𝐸̂𝑜 − (𝜈𝑧 − 𝜈𝑦)) / √(𝜏²𝑧/𝑜𝑧 + 𝜏²𝑦/𝑜𝑦) →𝑒 𝑂(0, 1)
28 / 56
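A sketch of the CLT at work: standardized sample means from a skewed Bernoulli(0.1) population (my choice, not the slides') behave approximately like a standard normal once 𝑜 is moderately large:

nsims <- 10000
n <- 100
z <- rep(NA, times = nsims)
for (i in 1:nsims) {
  samp <- rbinom(n = n, size = 1, prob = 0.1)       ## skewed population
  z[i] <- (mean(samp) - 0.1) / sqrt(0.1 * 0.9 / n)  ## standardized sample mean
}
mean(abs(z) > 1.96)  ## should be roughly 0.05 if the normal approximation holds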
▶ Problem: we do not know the true se[𝜄̂𝑜]?!
▶ Solution: plug in an estimated standard error, ŝe[𝜄̂𝑜]!
▶ If 𝜄̂𝑜 is asymptotically normal and ŝe[𝜄̂𝑜] →𝑞 se[𝜄̂𝑜], then:
(𝜄̂𝑜 − 𝜄) / ŝe[𝜄̂𝑜] →𝑒 𝑂(0, 1)
▶ ⇝ the normal approximation still holds with the estimated standard error in large samples.
29 / 56
▶ 𝕎[𝐸̂𝑜] = 𝜏²𝑧/𝑜𝑧 + 𝜏²𝑦/𝑜𝑦
▶ Need to estimate these dang unknown population variances, 𝜏²𝑧 and 𝜏²𝑦.
▶ Sample variance: 𝑇²𝑧 = (1/(𝑜𝑧 − 1)) ∑𝑗 (𝑍𝑗 − 𝑍̄𝑜𝑧)²
▶ Consistent for the population variance: 𝑇²𝑧 →𝑞 𝜏²𝑧
▶ Plug-in estimator of the sampling variance:
𝕎̂[𝐸̂𝑜] = 𝑇²𝑧/𝑜𝑧 + 𝑇²𝑦/𝑜𝑦 →𝑞 𝜏²𝑧/𝑜𝑧 + 𝜏²𝑦/𝑜𝑦 = 𝕎[𝐸̂𝑜]
30 / 56
▶ Since 𝕎̂[𝐸̂𝑜] →𝑞 𝕎[𝐸̂𝑜], we also have ŝe[𝐸̂𝑜] = √𝕎̂[𝐸̂𝑜] →𝑞 se[𝐸̂𝑜]
▶ Challenge question: prove this.
▶ Since 𝐸̂𝑜 is asymptotically normal and ŝe[𝐸̂𝑜] is consistent, we know that:
(𝐸̂𝑜 − (𝜈𝑧 − 𝜈𝑦)) / √(𝑇²𝑧/𝑜𝑧 + 𝑇²𝑦/𝑜𝑦) →𝑒 𝑂(0, 1)
▶ ⇝ lets us approximate how far 𝐸̂𝑜 will be from the truth!
31 / 56
32 / 56
truth with some fixed probability
▶ An interval estimate for, say, 𝜈𝑧 − 𝜈𝑦 consists of two bounds within which we expect 𝜈𝑧 − 𝜈𝑦 to reside: 𝑏 ≤ 𝜈𝑧 − 𝜈𝑦 ≤ 𝑐
▶ We derive these from the distributional properties of estimators. The ideas extend to all estimators, including regression.
33 / 56
Confidence interval
A 100(1 − 𝛽)% confidence interval for a population parameter 𝜄 is an interval 𝐷𝑜 = (𝑏, 𝑐), where 𝑏 = 𝑏(𝑍1, … , 𝑍𝑜) and 𝑐 = 𝑐(𝑍1, … , 𝑍𝑜) are functions of the data such that ℙ(𝑏 ≤ 𝜄 ≤ 𝑐) ≥ 1 − 𝛽.
▶ ⇝ in repeated samples, the interval contains 𝜄 at least 100(1 − 𝛽)% of the time.
▶ An estimator just like 𝑌̄𝑜, but with two values.
▶ The realized interval computed from a particular sample is the interval estimate.
34 / 56
▶ Let ŝe = √(𝑇²𝑧/𝑜𝑧 + 𝑇²𝑦/𝑜𝑦), so that:
(𝐸̂𝑜 − (𝜈𝑧 − 𝜈𝑦)) / ŝe →𝑒 𝑂(0, 1)
▶ We want an interval (𝑏, 𝑐) for (𝜈𝑧 − 𝜈𝑦) such that: ℙ(𝑏 ≤ (𝜈𝑧 − 𝜈𝑦) ≤ 𝑐) = 0.95
▶ ⇝ across repeated samples, 95% of the time the truth will be between these two bounds.
▶ From the normal approximation: ℙ(−1.96 ≤ (𝐸̂𝑜 − (𝜈𝑧 − 𝜈𝑦)) / ŝe ≤ 1.96) ≈ 0.95
35 / 56
0.95 ≈ ℙ(−1.96 ≤ (𝐸̂𝑜 − (𝜈𝑧 − 𝜈𝑦)) / ŝe ≤ 1.96)
     = ℙ(−1.96 × ŝe ≤ 𝐸̂𝑜 − (𝜈𝑧 − 𝜈𝑦) ≤ 1.96 × ŝe)
     = ℙ(−𝐸̂𝑜 − 1.96 × ŝe ≤ −(𝜈𝑧 − 𝜈𝑦) ≤ −𝐸̂𝑜 + 1.96 × ŝe)
     = ℙ(𝐸̂𝑜 − 1.96 × ŝe ≤ (𝜈𝑧 − 𝜈𝑦) ≤ 𝐸̂𝑜 + 1.96 × ŝe)
▶ ⇝ lower bound: 𝐸̂𝑜 − 1.96 × ŝe
▶ ⇝ upper bound: 𝐸̂𝑜 + 1.96 × ŝe
▶ Usually written as 𝐸̂𝑜 ± 1.96 × ŝe
36 / 56
neigh_var <- var(social$voted[social$treatment == "Neighbors"])
neigh_n <- 38201
civic_var <- var(social$voted[social$treatment == "Civic Duty"])
civic_n <- 38218
se_diff <- sqrt(neigh_var / neigh_n + civic_var / civic_n)
## lower bound
(0.378 - 0.315) - 1.96 * se_diff
## [1] 0.0563
## upper bound
(0.378 - 0.315) + 1.96 * se_diff
## [1] 0.0697
37 / 56
▶ A common misinterpretation of the confidence interval is the following:
▶ “I calculated a 95% confidence interval of [0.05, 0.13], which means that there is a 95% chance that the true difference in means is in that interval.”
▶ This is WRONG.
▶ The true difference in means is not random; it is fixed.
▶ It is either in the interval or it isn’t; there’s no room for probability at all.
▶ What is random is the interval itself, 𝐸̂𝑜 ± 1.96 × ŝe[𝐸̂𝑜]. This is what varies from sample to sample.
▶ ⇝ across repeated samples, 95% of the time the constructed confidence interval will contain the true value.
38 / 56
▶ 95% confidence interval for the mean: 𝑍̄𝑜 ± 1.96 × ŝe[𝑍̄𝑜] ⇝ 𝑍̄𝑜 ± 1.96 × 𝑇𝑜/√𝑜
set.seed(2143)
sims <- 10000
cover <- rep(0, times = sims)
low.bound <- up.bound <- rep(NA, times = sims)
for (i in 1:sims) {
  draws <- rnorm(500, mean = 1, sd = sqrt(10))
  low.bound[i] <- mean(draws) - sd(draws) / sqrt(500) * 1.96
  up.bound[i] <- mean(draws) + sd(draws) / sqrt(500) * 1.96
  if (low.bound[i] < 1 & up.bound[i] > 1) {
    cover[i] <- 1
  }
}
mean(cover)
## [1] 0.95
39 / 56
[Figures, slides 40 to 44: each trial’s estimate and confidence interval from the simulation plotted against the true value of 1; x-axis: Trial, y-axis: Estimate (0.6 to 1.4)]
▶ Roughly 95% of the calculated confidence intervals contain the true value.
44 / 56
▶ Let 𝜄̂𝑜 be an asymptotically normal estimator for 𝜄.
▶ Any asymptotically normal estimator! 𝑌̄𝑜, 𝐸̂𝑜, or whatever!
▶ Then a 100(1 − 𝛽)% confidence interval is 𝜄̂𝑜 ± 𝑨𝛽/2 × ŝe[𝜄̂𝑜]
▶ where the critical value 𝑨𝛽/2 satisfies ℙ(−𝑨𝛽/2 ≤ (𝜄̂𝑜 − 𝜄)/ŝe[𝜄̂𝑜] ≤ 𝑨𝛽/2) = (1 − 𝛽)
45 / 56
[Figure: standard normal density, dnorm(x), with area 0.95 between −1.96 and 1.96; z = 1.96]
▶ 𝑨𝛽/2 are the values such that for 𝑎 ∼ 𝑂(0, 1): ℙ(−𝑨𝛽/2 ≤ 𝑎 ≤ 𝑨𝛽/2) = 1 − 𝛽
▶ ⇝ put the remaining 𝛽 probability in the two tails.
▶ For a 95% interval, 𝛽 = 0.05, so we want the 𝑨 values that put 0.025 (2.5%) in each of the tails.
46 / 56
ℙ({𝑎 < −𝑨𝛽/2} ∪ {𝑎 > 𝑨𝛽/2}) = 𝛽
ℙ(𝑎 < −𝑨𝛽/2) + ℙ(𝑎 > 𝑨𝛽/2) = 𝛽 (additivity)
2 × ℙ(𝑎 > 𝑨𝛽/2) = 𝛽 (symmetry)
ℙ(𝑎 < 𝑨𝛽/2) = 1 − 𝛽/2
[Figure: standard normal density, dnorm(x), with area 0.975 below z = ?]
47 / 56
▶ ⇝ 𝑨𝛽/2 is the standard normal quantile function evaluated at 1 − 𝛽/2!
▶ In R, qnorm() gives the critical value for any confidence interval (90% in this case):
qnorm(0.95)
## [1] 1.64
▶ ⇝ 90% confidence interval: 𝜄̂𝑜 ± 1.64 × ŝe[𝜄̂𝑜] (see the helper sketched below)
48 / 56
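A general-purpose helper along these lines (my sketch, not from the slides) wraps this recipe up for any confidence level; it reuses the se_diff computed on the earlier slide and the 0.0634 point estimate:

## estimate +/- z_{beta/2} * se, for any confidence level
asymp.ci <- function(estimate, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  c(lower = estimate - z * se, upper = estimate + z * se)
}
asymp.ci(0.0634, se_diff)               ## 95% interval for the difference in means
asymp.ci(0.0634, se_diff, level = 0.9)  ## 90% interval is narrower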
▶ What happens when we increase our confidence, from say 95% to 99%? Do confidence intervals get wider or narrower?
49 / 56
50 / 56
▶ A parametric model is a set 𝔾 of distributions that could have possibly generated the data.
▶ Each distribution in the set is indexed by a finite number of parameters.
▶ Bernoulli distribution: 𝔾 = {𝑔(𝑧; 𝑞) = 𝑞^𝑧 (1 − 𝑞)^(1−𝑧) ∶ 0 ≤ 𝑞 ≤ 1}
▶ Normal distribution: 𝔾 = {𝑔(𝑧; 𝜈, 𝜏²) = (1/(𝜏√(2π))) exp{−(𝑧 − 𝜈)²/(2𝜏²)} ∶ 𝜈 ∈ ℝ, 𝜏² > 0}
▶ Basis of maximum likelihood, Bayesian inference, etc.
▶ ⇝ if our choice of model is wrong, our inferences might be wrong
51 / 56
▶ A nonparametric model is a set of distributions that cannot be indexed by a finite set of parameters.
▶ All distributions with finite mean: 𝔾 = {𝑔(𝑧) ∶ 𝔽[𝑍] < ∞}
▶ All distributions with finite mean and variance: 𝔾 = {𝑔(𝑧) ∶ 𝔽[𝑍] < ∞, 𝕎[𝑍] < ∞}
52 / 56
▶ Parametric approach: maximum likelihood or the method of moments.
▶ Derive estimators from the assumed p.m.f./p.d.f. 𝑔(𝑧).
▶ Gov 2001 and beyond.
▶ Quantities of interest are usually made up of expectations: 𝔽[𝑟(𝑍)] for some function 𝑟(⋅)
▶ Analogy principle: replace any population expectations, 𝔽[𝑟(𝑍)], with sample means, (1/𝑜) ∑𝑗 𝑟(𝑍𝑗)
53 / 56
𝜈 = 𝔽[𝑍𝑗] ⇝ 𝜈̂ = (1/𝑜) ∑𝑗 𝑍𝑗
𝜏² = 𝔽[(𝑍𝑗 − 𝔽[𝑍𝑗])²] ⇝ (1/𝑜) ∑𝑗 (𝑍𝑗 − 𝑍̄)²
Cov[𝑌𝑗, 𝑍𝑗] = 𝔽[(𝑌𝑗 − 𝔽[𝑌𝑗])(𝑍𝑗 − 𝔽[𝑍𝑗])] ⇝ (1/𝑜) ∑𝑗 (𝑌𝑗 − 𝑌̄)(𝑍𝑗 − 𝑍̄)
(sums run over 𝑗 = 1, … , 𝑜; see the R sketch below)
54 / 56
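A sketch of the analogy principle in R: each population expectation is replaced by the corresponding sample average. These are the 1/𝑜 versions written on the slide, not R's default 1/(𝑜 − 1) versions in var() and cov(); the function names are my own:

plugin.mean <- function(z) mean(z)
plugin.var  <- function(z) mean((z - mean(z))^2)
plugin.cov  <- function(y, z) mean((y - mean(y)) * (z - mean(z)))

## check on simulated data where the truth is known
z <- rnorm(1000, mean = 2, sd = 3)
y <- z + rnorm(1000)
plugin.mean(z)   ## close to 2
plugin.var(z)    ## close to 9
plugin.cov(y, z) ## close to Cov[Y, Z] = 9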
55 / 56
▶ Point estimates, standard errors, and confidence intervals can be built for almost any parameter.
▶ These core ideas of estimation and inference will be with you for almost any statistical procedure moving forward.
56 / 56