CSE 312 Autumn 2012: More on parameter estimation (Bias and Confidence Intervals)


SLIDE 1

CSE 312

Autumn 2012

More on parameter estimation – Bias; and Confidence Intervals

SLIDE 2

Bias

SLIDE 3

Likelihood Function

P(HHTHH | θ): probability of HHTHH, given P(H) = θ, i.e., θ⁴(1−θ):

  θ      θ⁴(1−θ)
  0.2    0.0013
  0.5    0.0313
  0.8    0.0819
  0.95   0.0407

[Figure: plot of P(HHTHH | θ) vs. θ on [0, 1], with the maximum near θ = 0.8]

Recall
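The table and the location of the maximum can be reproduced with a short script (a sketch; the grid resolution is an arbitrary choice):

```python
# Likelihood of observing HHTHH as a function of theta = P(H).
# HHTHH has 4 heads and 1 tail, so L(theta) = theta^4 * (1 - theta).

def likelihood(theta):
    return theta**4 * (1 - theta)

# Reproduce the table from the slide.
for theta in (0.2, 0.5, 0.8, 0.95):
    print(f"theta = {theta}: L = {likelihood(theta):.4f}")

# A coarse grid search locates the maximum at theta = 0.8,
# matching the closed-form MLE 4/5 (4 heads out of 5 flips).
grid = [i / 1000 for i in range(1001)]
theta_hat = max(grid, key=likelihood)
print("argmax over grid:", theta_hat)
```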

SLIDE 4

Example 1 (recall)

n coin flips, x1, x2, ..., xn; n0 tails, n1 heads, n0 + n1 = n; θ = probability of heads

L(x1, ..., xn | θ) = θ^n1 (1−θ)^n0. Setting dL/dθ = 0 gives θ̂ = n1/n.
(Also verify it's a max, not a min, and not better on the boundary.)

Observed fraction of successes in the sample is the MLE of the success probability in the population.
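A minimal numerical check of this result, on a hypothetical sample of 10 flips:

```python
# Example 1 sketch: the MLE for P(heads) from n coin flips.
# L(theta) = theta^n1 * (1-theta)^n0 is maximized at theta = n1/n;
# we confirm by grid search on made-up data.

flips = "HHTHHTHTHH"          # hypothetical sample: n = 10 flips
n1 = flips.count("H")         # 7 heads
n0 = flips.count("T")         # 3 tails
n = n1 + n0

mle = n1 / n                  # closed-form MLE: observed fraction of heads
print("closed-form MLE:", mle)

def L(theta):
    return theta**n1 * (1 - theta)**n0

grid = [i / 1000 for i in range(1001)]
assert max(grid, key=L) == mle   # grid argmax agrees with n1/n
```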

SLIDE 5

(Un-)Bias

A desirable property: an estimator Y of a parameter θ is an unbiased estimator if E[Y] = θ.

For the coin example above, the MLE is unbiased:
Y = fraction of heads = (Σ1≤i≤n Xi)/n  (Xi = indicator for heads in the ith trial),
so E[Y] = (Σ1≤i≤n E[Xi])/n = nθ/n = θ, by linearity of expectation.
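A quick Monte Carlo sanity check of this: averaging Y over many simulated experiments should approach θ (seed, θ, and sample sizes below are arbitrary illustrative choices):

```python
import random

# Estimate E[Y] for Y = fraction of heads in n flips with P(H) = theta.
# By linearity of expectation, E[Y] = theta; the simulation should agree.
random.seed(0)
theta = 0.3
n = 20
trials = 20000

total = 0.0
for _ in range(trials):
    y = sum(random.random() < theta for _ in range(n)) / n
    total += y

print("E[Y] estimate:", total / trials)   # should be close to theta = 0.3
```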

SLIDE 6

Are all unbiased estimators equally good?

No! E.g., "Ignore all but the 1st flip; if it was H, let Y' = 1; else Y' = 0."
Exercise: show this is unbiased.
Exercise: if the observed data has at least one H and at least one T, what is the likelihood of the data given the model with θ = Y'?

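The first exercise can be checked by simulation; this sketch (with arbitrary parameters) shows Y' averages to θ but, being always 0 or 1, is extremely variable compared with the sample fraction:

```python
import random

# The "first flip only" estimator Y' from the slide: unbiased, since
# E[Y'] = P(first flip is H) = theta, but its variance is theta*(1-theta),
# which no amount of extra data reduces.
random.seed(1)
theta = 0.3
trials = 20000

yprime = [1 if random.random() < theta else 0 for _ in range(trials)]
mean = sum(yprime) / trials
var = sum((y - mean) ** 2 for y in yprime) / trials

print("E[Y'] estimate:", mean)     # close to theta: unbiased
print("Var(Y') estimate:", var)    # close to theta*(1-theta) = 0.21
```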

SLIDE 7

Ex 3:

xi ∼ N(µ, σ²), with µ and σ² both unknown

[Figure: likelihood surface of L(x1, ..., xn | θ1, θ2)]

Sample mean is the MLE of the population mean, again.

In general, a problem like this results in 2 equations in 2 unknowns. Easy in this case, since θ2 drops out of the ∂/∂θ1 = 0 equation.

Recall

SLIDE 8

Ex. 3, (cont.)  (recall)

ln L(x1, x2, ..., xn | θ1, θ2) = Σ1≤i≤n [ −(1/2) ln(2πθ2) − (xi − θ1)²/(2θ2) ]

∂/∂θ2 ln L(x1, x2, ..., xn | θ1, θ2) = Σ1≤i≤n [ −1/(2θ2) + (xi − θ1)²/(2θ2²) ] = 0

⟹ θ̂2 = (Σ1≤i≤n (xi − θ̂1)²)/n = s̄²

Sample variance is the MLE of the population variance.
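A numerical sanity check of these closed forms, on arbitrary made-up data: the log-likelihood should only decrease if either parameter is nudged away from (sample mean, sample variance).

```python
import math

# For normal data, the log-likelihood is maximized at
# theta1 = sample mean, theta2 = sum((xi - theta1)^2) / n.
xs = [2.1, 3.4, 1.8, 2.9, 3.3, 2.5]   # illustrative data
n = len(xs)

theta1 = sum(xs) / n
theta2 = sum((x - theta1) ** 2 for x in xs) / n   # MLE variance (divide by n)

def log_L(t1, t2):
    return sum(-0.5 * math.log(2 * math.pi * t2) - (x - t1) ** 2 / (2 * t2)
               for x in xs)

# Nudging either parameter away from the MLE can only lower the likelihood.
best = log_L(theta1, theta2)
for dt in (-0.01, 0.01):
    assert log_L(theta1 + dt, theta2) <= best
    assert log_L(theta1, theta2 + dt) <= best
```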

SLIDE 9

Ex. 3, (cont.)

Bias? If Y = (Σ1≤i≤n Xi)/n is the sample mean, then E[Y] = (Σ1≤i≤n E[Xi])/n = nμ/n = μ, so the MLE is an unbiased estimator of the population mean.

Similarly, (Σ1≤i≤n (Xi − μ)²)/n is an unbiased estimator of σ². Unfortunately, if μ is unknown and estimated from the same data as above, θ̂2 = (Σ1≤i≤n (Xi − θ̂1)²)/n is a consistent, but biased, estimate of the population variance. (An example of overfitting.) Roughly, lim n→∞ θ̂2 = correct. Unbiased estimate (B&T p467): s² = (Σ1≤i≤n (Xi − θ̂1)²)/(n − 1).

One moral: MLE is a great idea, but not a magic bullet.
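A simulation illustrating the bias (parameters are arbitrary illustrative choices): the divide-by-n MLE averages to about (n−1)/n · σ², while the divide-by-(n−1) version averages to about σ².

```python
import random

# Compare the MLE variance (divide by n) with the unbiased estimate
# (divide by n-1) over many simulated normal samples.
random.seed(2)
mu, sigma2, n, trials = 0.0, 4.0, 5, 20000

mle_sum = unbiased_sum = 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    mle_sum += ss / n            # biased: E = (n-1)/n * sigma2 = 3.2
    unbiased_sum += ss / (n - 1) # unbiased: E = sigma2 = 4.0

print("E[MLE variance] ~", mle_sum / trials)
print("E[unbiased variance] ~", unbiased_sum / trials)
```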

SLIDE 10

More on Bias of θ̂2

Biased? Yes. Why? As an extreme, think about n = 1: then θ̂2 = 0; probably an underestimate!

Also, consider n = 2. Then θ̂1 is exactly between the two sample points, the position that exactly minimizes the expression for θ̂2. Any other choices for θ1, θ2 make the likelihood of the observed data slightly lower. But it's actually pretty unlikely that two sample points would be chosen exactly equidistant from, and on opposite sides of, the mean, so the MLE θ̂2 systematically underestimates θ2. (But not by much, and the bias shrinks with sample size.)

SLIDE 11

Confidence Intervals

SLIDE 12

A Problem With Point Estimates

Think again about estimating the mean of a normal distribution. Sample X1, X2, ..., Xn. We showed the sample mean Yn = (Σ1≤i≤n Xi)/n is an unbiased (and consistent) estimator of the population mean. But with probability 1, it's wrong!

Can we say anything about how wrong? E.g., could I find a value Δ s.t. I'm 95% confident that the true mean is within ±Δ of my estimate?


SLIDE 13

Yn = (Σ1≤i≤n Xi)/n is a random variable; it has a mean and a variance.

Assuming the Xi's are i.i.d. normal with mean μ and variance σ²:
Var(Yn) = Var((Σ1≤i≤n Xi)/n) = (1/n²) Σ1≤i≤n Var(Xi) = (1/n²)(nσ²) = σ²/n

So Pr((√n)|Yn − μ|/σ ≥ z) = 2(1 − Φ(z)) for z > 0. E.g., Pr((√n)|Yn − μ|/σ < 1.96) ≈ 95%.

I.e., the true μ is within ±1.96σ/√n of the estimate ~95% of the time.

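A coverage simulation for this interval (μ, σ, n, and seed are arbitrary illustrative choices): over many samples, the true mean should land inside Yn ± 1.96σ/√n roughly 95% of the time.

```python
import random

# Check the 95% coverage of the interval Yn +/- 1.96 * sigma / sqrt(n)
# when sigma is known and the data are i.i.d. normal.
random.seed(3)
mu, sigma, n, trials = 10.0, 2.0, 25, 20000

halfwidth = 1.96 * sigma / n ** 0.5
hits = 0
for _ in range(trials):
    yn = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    if abs(yn - mu) < halfwidth:
        hits += 1

print("coverage:", hits / trials)   # close to 0.95
```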