SLIDE 1

Gov 2000: 5. Estimation and Statistical Inference

Matthew Blackwell

Fall 2016


SLIDE 2
  • 1. Point Estimation
  • 2. Properties of Estimators
  • 3. Interval Estimation
  • 4. Where Do Estimators Come From?*
  • 5. Wrap up


SLIDE 3

Housekeeping

  • This Thursday, 10/6: HW 3 due, HW 4 goes out.
  • Next Thursday, 10/13: HW 4 due, HW 5 goes out.
  • Thursday, 10/20: HW 5 due, Midterm available.
  • Midterm:
▶ Check-out exam: you have 8 hours to complete it once you check it out.
▶ Answers must be typeset, as usual.
▶ You should have more than enough time.
▶ We’ll post practice midterms in advance.
  • Evaluations: we’ll be fielding an anonymous survey about the course this week.

SLIDE 4

Where are we? Where are we going?

  • Last few weeks: probability, learning how to think about r.v.s
  • Now: how to estimate features of underlying distributions with real data.
  • Build on last week: if the sample mean will be “close” to μ, can we use it as our best guess for μ?

SLIDE 5

1/ Point Estimation


SLIDE 6

Motivating example

  • Gerber, Green, and Larimer (APSR, 2008)


SLIDE 7

Motivating Example

load("../data/gerber_green_larimer.RData")
## turn turnout variable into a numeric
social$voted <- 1 * (social$voted == "Yes")
neigh.mean <- mean(social$voted[social$treatment == "Neighbors"])
neigh.mean
## [1] 0.378
contr.mean <- mean(social$voted[social$treatment == "Civic Duty"])
contr.mean
## [1] 0.315
neigh.mean - contr.mean
## [1] 0.0634

  • Is this difference “real”? Is it big?

SLIDE 8

Why study estimators?

  • Goal 1: Inference
▶ What is our best guess about some quantity of interest?
▶ What is a set of plausible values of the quantity of interest?
  • Goal 2: Compare estimators
▶ In an experiment, do we use the simple difference in sample means (Ȳ − X̄)?
▶ Or the post-stratification estimator, where we estimate the difference in means within two subsets of the data (male and female, for instance) and then take the weighted average of the two (w is the share of women):

(Ȳ_w − X̄_w)w + (Ȳ_m − X̄_m)(1 − w)

▶ Which (if either) is better? How would we know?

SLIDE 9

Samples from the population

  • Our focus: Y₁, …, Yₙ are i.i.d. draws from f(y)
▶ e.g.: Yᵢ = 1 if citizen i votes, Yᵢ = 0 otherwise.
▶ i.i.d. can be justified through random sampling from a population.
▶ f(y) is often called the population distribution
  • Statistical inference or learning is using data to infer f(y).

SLIDE 10

Point estimation

  • Point estimation: providing a single “best guess” as to the value of some fixed, unknown quantity of interest, θ.
▶ θ is a feature of the population distribution, f(y)
▶ Also called: estimands, parameters.
  • Examples of quantities of interest:
▶ μ = 𝔼[Yᵢ]: the mean (turnout rate in the population).
▶ σ² = 𝕍[Yᵢ]: the variance.
▶ μ_y − μ_x = 𝔼[Y] − 𝔼[X]: the difference in mean turnout between two groups.
▶ r(x) = 𝔼[Y | X = x]: the conditional expectation function (regression).
  • These are the things we want to learn about.

SLIDE 11

Estimators

Estimator

An estimator, θ̂ₙ, of some parameter θ, is a function of the sample: θ̂ₙ = h(Y₁, …, Yₙ).

  • θ̂ₙ is a r.v. because it is a function of r.v.s.
▶ ⇝ θ̂ₙ has a distribution.
▶ {θ̂₁, θ̂₂, …} is a sequence of r.v.s, so we can think about convergence in probability/distribution.
  • An estimate is one particular realization of the estimator/r.v.

SLIDE 12

Examples of Estimators

  • For the population expectation, μ, we have many different possible estimators:

θ̂ₙ = Ȳₙ (the sample mean)
θ̂ₙ = Y₁ (just use the first observation)
θ̂ₙ = max(Y₁, …, Yₙ)
θ̂ₙ = 3 (always guess 3)
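To make the comparison concrete, here is a small simulation sketch (my code, not the slides'): draw repeated Bernoulli(0.4) samples, so the true mean is 0.4, and see where each of the four estimators lands on average and how much it varies.

nsims <- 10000
n <- 100
ests <- matrix(NA, nrow = nsims, ncol = 4)
colnames(ests) <- c("sample.mean", "first.obs", "max", "always.3")
for (i in 1:nsims) {
  y <- rbinom(n = n, size = 1, prob = 0.4)  ## one sample from the population
  ests[i, ] <- c(mean(y), y[1], max(y), 3)  ## the four estimates
}
colMeans(ests)       ## center of each sampling distribution
apply(ests, 2, var)  ## spread of each sampling distribution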

SLIDE 13

Understanding check

  • Question Why is the following statement wrong: “My estimate was the sample mean and my estimator was 0.38”?

SLIDE 14

The three distributions

  • Population Distribution: the data-generating process
▶ (Bernoulli in the case of the social pressure/voter turnout example)
  • Empirical distribution: Y₁, …, Yₙ
▶ a series of 1s and 0s in the sample
  • Sampling distribution: distribution of the estimator over repeated samples from the population distribution
▶ the 0.38 sample mean in the “Neighbors” group is one draw from this distribution

SLIDE 15

Sampling distribution, in pictures

[Diagram: the population distribution f(y) is sampled repeatedly. Each sample {Y₁ⁱ, …, Yₙⁱ}, i = 1, …, k, is fed to the estimator θ̂ₙ to produce one estimate θ̂ₙⁱ, and the distribution of the estimates θ̂ₙ¹, …, θ̂ₙᵏ is the sampling distribution.]

SLIDE 16

Sampling distribution

## now we take the mean of one sample, which is one
## draw from the **sampling distribution**
my.samp <- rbinom(n = 10, size = 1, prob = 0.4)
mean(my.samp)
## [1] 0.2
## let's take another draw from the population dist
my.samp.2 <- rbinom(n = 10, size = 1, prob = 0.4)
## Let's feed this sample to the sample mean
## estimator to get another estimate, which is
## another draw from the sampling distribution
mean(my.samp.2)
## [1] 0.4

SLIDE 17

Sampling distribution by simulation

  • Let’s generate 10,000 draws from the sampling distribution of the sample mean here when n = 100.

nsims <- 10000
mean.holder <- rep(NA, times = nsims)
first.holder <- rep(NA, times = nsims)
for (i in 1:nsims) {
  my.samp <- rbinom(n = 100, size = 1, prob = 0.4)
  mean.holder[i] <- mean(my.samp) ## sample mean
  first.holder[i] <- my.samp[1]   ## first obs
}

SLIDE 18

Sampling distribution versus population distribution

[Histogram: the population distribution, with frequency spikes at 0 and 1, next to the sampling distribution of the sample mean, tightly concentrated around 0.4.]
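A sketch of how a comparison plot like this could be drawn (my code, not the original slide's; it reuses mean.holder and first.holder from the previous slide as the two distributions):

hist(first.holder, breaks = 20, col = "grey", xlim = c(0, 1),
     main = "", xlab = "Estimate")          ## population draws: 0s and 1s
hist(mean.holder, breaks = 20, add = TRUE)  ## sample means: piled near 0.4
legend("topright", fill = c("grey", "white"),
       legend = c("Population Distribution", "Sampling Distribution"))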

SLIDE 19

Question The sampling distribution refers to the distribution of θ, true or false.

SLIDE 20

2/ Properties of Estimators


SLIDE 21

Properties of estimators

  • We only get one draw from the sampling distribution, θ̂ₙ.
  • Want to use estimators whose distribution is “close” to the true value.
  • There are two ways we evaluate estimators:
▶ Finite sample: the properties of its sampling distribution for a fixed sample size n.
▶ Large sample: the properties of the sampling distribution as we let n → ∞.

SLIDE 22

Running example

  • Two independent random samples (treatment/control):
▶ Y₁, …, Y_{n_y} are i.i.d. with mean μ_y and variance σ²_y
▶ X₁, …, X_{n_x} are i.i.d. with mean μ_x and variance σ²_x
▶ Overall sample size n = n_y + n_x
  • Parameter is the population difference in means, which is the treatment effect of the social pressure mailer: μ_y − μ_x
  • Estimator is the difference in sample means:

D̂ₙ = Ȳ_{n_y} − X̄_{n_x}

SLIDE 23

Finite-sample properties

Let θ̂ₙ be an estimator of θ. Then we have the following definitions:

  • bias[θ̂ₙ] = 𝔼[θ̂ₙ] − θ
▶ θ̂ₙ is unbiased if bias[θ̂ₙ] = 0
▶ Last week: Ȳₙ is unbiased for μ since 𝔼[Ȳₙ] = μ
  • Sampling variance is 𝕍[θ̂ₙ].
▶ Example: 𝕍[Ȳₙ] = σ²/n
  • Standard error is se[θ̂ₙ] = √𝕍[θ̂ₙ]
▶ Example: se[Ȳₙ] = σ/√n
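These finite-sample properties are easy to check by simulation. A quick sketch (mine, not slide code), using the Bernoulli(0.4) population from earlier so that μ = 0.4 and σ/√n = √(0.4 × 0.6/100):

nsims <- 10000
means <- replicate(nsims, mean(rbinom(n = 100, size = 1, prob = 0.4)))
mean(means)  ## close to the true mean 0.4: unbiasedness
sd(means)    ## close to sqrt(0.4 * 0.6 / 100) = 0.049: the standard error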

SLIDE 24

Diff-in-means finite-sample properties

  • Unbiasedness, from the unbiasedness of the sample means:

𝔼[Ȳ_{n_y} − X̄_{n_x}] = 𝔼[Ȳ_{n_y}] − 𝔼[X̄_{n_x}] = μ_y − μ_x

  • Sampling variance, by independence of the samples:

𝕍[Ȳ_{n_y} − X̄_{n_x}] = 𝕍[Ȳ_{n_y}] + 𝕍[X̄_{n_x}] = σ²_y/n_y + σ²_x/n_x

  • Standard error:

se[D̂ₙ] = √(σ²_y/n_y + σ²_x/n_x)

SLIDE 25

Mean squared error

  • Mean squared error or MSE is

MSE = 𝔼[(θ̂ₙ − θ)²]

  • The MSE assesses the quality of an estimator.
▶ How big are the (squared) deviations from the true parameter?
▶ Ideally, this would be as low as possible!
  • Useful decomposition result:

MSE = bias[θ̂ₙ]² + 𝕍[θ̂ₙ]

  • ⇝ for unbiased estimators, MSE is the sampling variance.
  • Might accept some bias for large reductions in variance for lower overall MSE.
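The decomposition is straightforward to verify numerically. A sketch (my code, not from the slides) for the sample mean of a Bernoulli(0.4) sample, whose bias is essentially zero, so the MSE is almost entirely sampling variance:

nsims <- 10000
ests <- replicate(nsims, mean(rbinom(n = 100, size = 1, prob = 0.4)))
mean((ests - 0.4)^2)              ## simulated MSE
(mean(ests) - 0.4)^2 + var(ests)  ## bias^2 + variance: nearly identical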

SLIDE 26

Consistency

  • An estimator is consistent if θ̂ₙ →ᵖ θ.
▶ The distribution of θ̂ₙ collapses on θ as n → ∞.
▶ WLLN: Ȳₙ is consistent for μ.
▶ Inconsistent estimators are bad bad bad: more data gives worse answers!
  • Theorem: If bias[θ̂ₙ] → 0 and se[θ̂ₙ] → 0 as n → ∞, then θ̂ₙ is consistent.
  • Example: difference-in-means.
▶ D̂ₙ is unbiased with 𝕍[D̂ₙ] = σ²_y/n_y + σ²_x/n_x
▶ ⇝ D̂ₙ is consistent since 𝕍[D̂ₙ] → 0
  • NB: Unbiasedness does not imply consistency, nor vice versa.

SLIDE 27

Unbiased versus consistent

  • Unbiased, not consistent: the “first observation” estimator, θ̂ᶠₙ = Y₁.
▶ Unbiased because 𝔼[θ̂ᶠₙ] = 𝔼[Y₁] = μ_y
▶ Not consistent: θ̂ᶠₙ is constant in n, so its distribution never collapses.
▶ Said differently: the variance of θ̂ᶠₙ never shrinks.
  • Consistent, but biased: the sample mean with n replaced by n − 1:

(n/(n − 1)) Ȳₙ = (1/(n − 1)) ∑ᵢ₌₁ⁿ Yᵢ

▶ Bias: 𝔼[(n/(n − 1)) Ȳₙ] − μ_y = μ_y/(n − 1)
▶ Consistent because bias and se → 0 as n → ∞.
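A simulation sketch (mine) of the contrast: the sample mean's sampling distribution collapses as n grows, while the first-observation estimator's spread never shrinks.

sd(replicate(2000, mean(rbinom(n = 10, size = 1, prob = 0.4))))   ## about 0.15
sd(replicate(2000, mean(rbinom(n = 1000, size = 1, prob = 0.4)))) ## about 0.015
sd(replicate(2000, rbinom(n = 1, size = 1, prob = 0.4)))          ## stays about 0.49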

SLIDE 28

Asymptotic normality

  • An estimator is asymptotically normal if

(θ̂ₙ − θ)/se[θ̂ₙ] →ᵈ N(0, 1)

▶ Allows us to approximate the probability of θ̂ₙ being far away from θ in large samples.
  • Many, many, many estimators will be asymptotically normal by some version of the Central Limit Theorem.
▶ CLT: Ȳₙ is asymptotically normal
  • By an extension of the CLT for independent samples:

(D̂ₙ − (μ_y − μ_x)) / √(σ²_y/n_y + σ²_x/n_x) →ᵈ N(0, 1)
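One way to see asymptotic normality in action (a sketch of mine, not the slides'): standardize the sample mean across repeated samples and check how often it lands within ±1.96, the central 95% range of N(0, 1).

z <- replicate(10000, {
  y <- rbinom(n = 100, size = 1, prob = 0.4)
  (mean(y) - 0.4) / sqrt(0.4 * 0.6 / 100)  ## (estimate - truth) / se
})
mean(abs(z) <= 1.96)  ## close to 0.95, as the normal approximation predicts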

SLIDE 29

Help, I don’t know the SE

  • But we don’t know se[θ̂ₙ]?!
  • ⇝ plug in a consistent estimator ŝe[θ̂ₙ]!
  • If θ̂ₙ is asymptotically normal and ŝe[θ̂ₙ] →ᵖ se[θ̂ₙ], then:

(θ̂ₙ − θ)/ŝe[θ̂ₙ] →ᵈ N(0, 1)

  • Using the true vs. estimated standard error doesn’t matter in large samples.

SLIDE 30

Estimating the Sampling Variance/Standard Error

  • Diff-in-means variance: 𝕍[D̂ₙ] = σ²_y/n_y + σ²_x/n_x
▶ Need to estimate these dang unknown population variances, σ²_y and σ²_x.
  • Use the sample variances: S²_y = (1/(n_y − 1)) ∑ᵢ₌₁^{n_y} (Yᵢ − Ȳ_{n_y})²
▶ Consistent for the population variance: S²_y →ᵖ σ²_y
  • Estimated diff-in-means variance is consistent:

𝕍̂[D̂ₙ] = S²_y/n_y + S²_x/n_x →ᵖ σ²_y/n_y + σ²_x/n_x = 𝕍[D̂ₙ]

SLIDE 31

Putting it all together

  • If 𝕍̂[D̂ₙ] →ᵖ 𝕍[D̂ₙ], then ŝe[D̂ₙ] = √𝕍̂[D̂ₙ] →ᵖ se[D̂ₙ]
▶ Challenge question: prove this.
  • Since we know D̂ₙ is asymptotically normal and ŝe[D̂ₙ] is consistent, we know that:

(D̂ₙ − (μ_y − μ_x)) / √(S²_y/n_y + S²_x/n_x) →ᵈ N(0, 1)

  • Now we can make approximate probability statements about how far D̂ₙ will be from the truth!

SLIDE 32

3/ Interval Estimation


SLIDE 33

Interval estimation - what and why?

  • Ȳ − X̄ is our best guess about μ_y − μ_x
  • But ℙ(Ȳ − X̄ = μ_y − μ_x) = 0!
  • Alternative: produce a range of values that will contain the truth with some fixed probability
  • An interval estimate of the population difference in means, μ_y − μ_x, consists of two bounds within which we expect μ_y − μ_x to reside: a ≤ μ_y − μ_x ≤ b
  • How can we possibly figure out such an interval? We’ll rely on the distributional properties of estimators. The ideas extend to all estimators, including regression.

SLIDE 34

What is a confidence interval?

Confidence interval

A 100(1 − α)% confidence interval for a population parameter θ is an interval Cₙ = (a, b), where a = a(Y₁, …, Yₙ) and b = b(Y₁, …, Yₙ) are functions of the data such that ℙ(a ≤ θ ≤ b) ≥ 1 − α.

  • The random interval (a, b) will bound θ 100(1 − α)% of the time.
▶ An estimator just like Ȳₙ, but with two values.
  • 1 − α is the coverage of the confidence interval.
  • Extremely useful way to represent our uncertainty about our estimate.

SLIDE 35

Deriving a probabilistic bound

  • Let ŝe = √(S²_y/n_y + S²_x/n_x), so that:

(D̂ₙ − (μ_y − μ_x))/ŝe →ᵈ N(0, 1)

  • Because of the CLT, we can use this to derive a confidence interval for (μ_y − μ_x) such that: ℙ(a ≤ (μ_y − μ_x) ≤ b) = 0.95
  • We want to find values so that in 95% of random samples, (μ_y − μ_x) will be between these two bounds.
  • Use the following fact. For large n:

ℙ(−1.96 ≤ (D̂ₙ − (μ_y − μ_x))/ŝe ≤ 1.96) ≈ 0.95

SLIDE 36

Deriving the interval

  • Let’s work backwards to derive the confidence interval:

0.95 ≈ ℙ(−1.96 ≤ (D̂ₙ − (μ_y − μ_x))/ŝe ≤ 1.96)
     = ℙ(−1.96 × ŝe ≤ D̂ₙ − (μ_y − μ_x) ≤ 1.96 × ŝe)
     = ℙ(−D̂ₙ − 1.96 × ŝe ≤ −(μ_y − μ_x) ≤ −D̂ₙ + 1.96 × ŝe)
     = ℙ(D̂ₙ − 1.96 × ŝe ≤ (μ_y − μ_x) ≤ D̂ₙ + 1.96 × ŝe)

  • Lower bound: D̂ₙ − 1.96 × ŝe
  • Upper bound: D̂ₙ + 1.96 × ŝe
▶ Usually written as D̂ₙ ± 1.96 × ŝe
  • Bounds are random! Not (μ_y − μ_x)!

SLIDE 37

CI for social pressure effect

neigh_var <- var(social$voted[social$treatment == "Neighbors"])
neigh_n <- 38201
civic_var <- var(social$voted[social$treatment == "Civic Duty"])
civic_n <- 38218
se_diff <- sqrt(neigh_var/neigh_n + civic_var/civic_n)
## lower bound
(0.378 - 0.315) - 1.96 * se_diff
## [1] 0.0563
## upper bound
(0.378 - 0.315) + 1.96 * se_diff
## [1] 0.0697

SLIDE 38

Interpreting the confidence interval

  • Caution! An often recited, but incorrect, interpretation of a confidence interval is the following:
▶ “I calculated a 95% confidence interval of [0.05, 0.13], which means that there is a 95% chance that the true difference in means is in that interval.”
▶ This is WRONG.
  • The true value of the population difference in means, μ_y − μ_x, is fixed.
▶ It is either in the interval or it isn’t—there’s no room for probability at all.
  • The randomness is in the interval: D̂ₙ ± 1.96 × ŝe[D̂ₙ]. This is what varies from sample to sample.
  • Correct interpretation: across 95% of random samples, the constructed confidence interval will contain the true value.

SLIDE 39

Confidence interval simulation

  • Draw samples of size 500 (pretty big) from N(1, 10)
  • Calculate confidence intervals for the sample mean:

Ȳₙ ± 1.96 × ŝe[Ȳₙ] ⇝ Ȳₙ ± 1.96 × Sₙ/√n

set.seed(2143)
sims <- 10000
cover <- rep(0, times = sims)
low.bound <- up.bound <- rep(NA, times = sims)
for (i in 1:sims) {
  draws <- rnorm(500, mean = 1, sd = sqrt(10))
  low.bound[i] <- mean(draws) - sd(draws)/sqrt(500) * 1.96
  up.bound[i] <- mean(draws) + sd(draws)/sqrt(500) * 1.96
  if (low.bound[i] < 1 & up.bound[i] > 1) {
    cover[i] <- 1
  }
}
mean(cover)
## [1] 0.95

SLIDE 40

Plotting the CIs

[Plot, built up over several slides: the estimated 95% confidence intervals plotted by simulation trial, with estimates on a horizontal axis running from 0.6 to 1.4 and the true mean at 1.]


SLIDE 44

Plotting the CIs

[Plot: the same confidence intervals, now for 100 trials.]

  • You can see that in these 100 samples, exactly 95 of the calculated confidence intervals contain the true value.

SLIDE 45

More general confidence intervals

  • Let θ̂ₙ be an asymptotically normal estimator for θ.
▶ Any asymptotically normal estimator! Ȳₙ, D̂ₙ, or whatever!
  • A general formula for a 100(1 − α)% confidence interval is:

θ̂ₙ ± z_{α/2} × ŝe[θ̂ₙ]

  • z_{α/2} comes from a similar derivation as earlier:

ℙ(−z_{α/2} ≤ (θ̂ₙ − θ)/ŝe[θ̂ₙ] ≤ z_{α/2}) = 1 − α

  • Remember! Asymptotics are approximations!
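Since the formula is the same for any asymptotically normal estimator, it is easy to wrap in a helper function. A minimal sketch (mine, not from the slides; the function name ci and the example numbers, chosen to roughly match the social pressure CI from SLIDE 37, are illustrative):

ci <- function(est, se, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)  ## z_{alpha/2}; qnorm() is covered a few slides ahead
  c(lower = est - z * se, upper = est + z * se)
}
ci(est = 0.063, se = 0.0034)  ## close to the (0.0563, 0.0697) interval above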

SLIDE 46

Finding the z values

[Plot: standard normal density with z = −1.96 and z = 1.96 marking off the central 0.95 of the probability.]

  • How do we figure out what z_{α/2} will be? Need to find the values such that for Z ∼ N(0, 1):

ℙ(−z_{α/2} ≤ Z ≤ z_{α/2}) = 1 − α

  • Intuitively, we want the z values that put α/2 in each of the tails.
  • For example, with α = 0.05 for a 95% confidence interval, we want the z values that put 0.025 (2.5%) in each of the tails.

SLIDE 47

Putting it in the tails

  • How to get the z values? Put α probability in the tails:

ℙ({Z < −z_{α/2}} ∪ {Z > z_{α/2}}) = α
ℙ(Z < −z_{α/2}) + ℙ(Z > z_{α/2}) = α   (additivity)
2 × ℙ(Z > z_{α/2}) = α   (symmetry)
ℙ(Z < z_{α/2}) = 1 − α/2

  • Find the z-value that puts probability 1 − α/2 below it:

[Plot: standard normal density with probability 0.975 below z = ?]

SLIDE 48

Calculating z-values in R

  • Inverse of the CDF (quantile) of the standard Normal evaluated at 1 − α/2!
  • Procedure for a 90% confidence interval:
  • 1. Choose a value α (0.1 for example) for a 100(1 − α)% confidence interval (90% in this case)
  • 2. Convert this to 1 − α/2 (0.95 in this case)
  • 3. Plug this value into qnorm() to find z_{α/2}:

qnorm(0.95)
## [1] 1.64

  • 90% CI:

θ̂ₙ ± 1.64 × ŝe[θ̂ₙ]

SLIDE 49

Question

  • Question What happens to the size of the confidence interval when we increase our confidence, from say 95% to 99%? Do confidence intervals get wider or shorter?
  • Answer Wider!
  • Decreasing α ⇝ increases 1 − α/2 ⇝ increases z_{α/2}
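You can check this directly with qnorm() from the previous slide (a quick sketch of mine):

qnorm(1 - 0.10/2)  ## 90% CI: z = 1.64
qnorm(1 - 0.05/2)  ## 95% CI: z = 1.96
qnorm(1 - 0.01/2)  ## 99% CI: z = 2.58, so the interval widens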

SLIDE 50

4/ Where Do Estimators Come From?*


SLIDE 51

Statistical models

  • A statistical model, 𝔽, is a set of distributions we will consider that could have possibly generated the data.
  • A parametric model is a set that can be parameterized by a finite number of parameters.
▶ Bernoulli distribution:

𝔽 = {f(y; p) = pʸ(1 − p)¹⁻ʸ : 0 ≤ p ≤ 1}

▶ Normal distribution:

𝔽 = {f(y; μ, σ²) = (1/(σ√2π)) exp{−(y − μ)²/(2σ²)} : μ ∈ ℝ, σ² > 0}

  • Pros: easy to work with, and explicit answers often exist
▶ Basis of maximum likelihood, Bayesian inference, etc.
  • Cons: inferences are model dependent
▶ ⇝ if our choice of model is wrong, our inferences might be wrong

SLIDE 52

Nonparametric models

  • A nonparametric model is a set that cannot be parameterized by a finite set of parameters.
▶ All distributions with finite mean:

𝔽 = {f(y) : 𝔼[Y] < ∞}

▶ All distributions with finite mean and variance:

𝔽 = {f(y) : 𝔼[Y] < ∞, 𝕍[Y] < ∞}

  • Pros: no modeling assumptions beyond what we need.
  • Cons: can be difficult to work with and difficult to interpret.

SLIDE 53

Where do estimators come from?

  • Parametric models: maximum likelihood, Bayesian estimation, method of moments.
▶ Derive estimators from the assumed p.m.f./p.d.f. f(y).
▶ Gov 2001 and beyond.
  • Nonparametric models: plug-in estimation/analogy principle.
▶ Quantities of interest are usually made up of expectations: 𝔼[g(Y)] for some function g()
▶ Analogy principle: replace any population expectations, 𝔼[g(Y)], with sample means, (1/n) ∑ᵢ₌₁ⁿ g(Yᵢ)

SLIDE 54

Plug-in estimators, examples

  • Expectation:

μ = 𝔼[Yᵢ] ⇝ μ̂ = (1/n) ∑ᵢ₌₁ⁿ Yᵢ

  • Variance:

σ² = 𝔼[(Yᵢ − 𝔼[Yᵢ])²] ⇝ (1/n) ∑ᵢ₌₁ⁿ (Yᵢ − Ȳ)²

  • Covariance:

Cov[Xᵢ, Yᵢ] = 𝔼[(Xᵢ − 𝔼[Xᵢ])(Yᵢ − 𝔼[Yᵢ])] ⇝ (1/n) ∑ᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ)
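In R, the analogy principle boils down to calling mean() on the appropriately transformed sample. A quick sketch (mine, with simulated x and y purely for illustration):

set.seed(2139)
x <- rnorm(1000)
y <- 2 * x + rnorm(1000)
mean(y)                              ## plug-in estimate of E[Y]
mean((y - mean(y))^2)                ## plug-in variance (divides by n, not n - 1)
mean((x - mean(x)) * (y - mean(y)))  ## plug-in covariance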

SLIDE 55

5/ Wrap up


SLIDE 56

Wrap up

  • Generalized the discussion of sample means to any estimator of any parameter.
  • Unbiasedness, consistency, confidence intervals, etc. will be with you for almost any statistical procedure moving forward.
  • These properties give us an expectation about how far away our estimates will be from the truth.
  • Next time: Testing hypotheses about parameters