CSE 312 Autumn 2012: More on parameter estimation (Bias and Confidence Intervals)


SLIDE 1

CSE 312

Autumn 2012

More on parameter estimation – Bias; and Confidence Intervals

SLIDE 2

Bias

SLIDE 3

Likelihood Function

P(HHTHH | θ): probability of HHTHH, given P(H) = θ, i.e., θ⁴(1−θ):

  θ      θ⁴(1−θ)
  0.2    0.0013
  0.5    0.0313
  0.8    0.0819
  0.95   0.0407

[Figure: plot of P(HHTHH | θ) vs. θ on [0, 1], with the maximum near θ = 0.8]

Recall
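The table and the location of the maximum can be reproduced with a short script (a sketch; the grid resolution is an arbitrary choice):

```python
# Likelihood of observing HHTHH as a function of theta = P(H).
# HHTHH has 4 heads and 1 tail, so L(theta) = theta^4 * (1 - theta).

def likelihood(theta):
    return theta**4 * (1 - theta)

# Reproduce the table from the slide.
for theta in (0.2, 0.5, 0.8, 0.95):
    print(f"theta = {theta}: L = {likelihood(theta):.4f}")

# A coarse grid search locates the maximum at theta = 0.8,
# matching the closed-form MLE 4/5 (4 heads out of 5 flips).
grid = [i / 1000 for i in range(1001)]
theta_hat = max(grid, key=likelihood)
print("argmax over grid:", theta_hat)
```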

SLIDE 4

Example 1 (recall)

n coin flips, x1, x2, ..., xn; n0 tails, n1 heads, n0 + n1 = n; θ = probability of heads

L(x1, ..., xn | θ) = θ^n1 (1−θ)^n0. Setting dL/dθ = 0 gives θ̂ = n1/n.
(Also verify it's a max, not a min, and not better on the boundary.)

Observed fraction of successes in the sample is the MLE of the success probability in the population.
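A minimal numerical check of this result, on a hypothetical sample of 10 flips:

```python
# Example 1 sketch: the MLE for P(heads) from n coin flips.
# L(theta) = theta^n1 * (1-theta)^n0 is maximized at theta = n1/n;
# we confirm by grid search on made-up data.

flips = "HHTHHTHTHH"          # hypothetical sample: n = 10 flips
n1 = flips.count("H")         # 7 heads
n0 = flips.count("T")         # 3 tails
n = n1 + n0

mle = n1 / n                  # closed-form MLE: observed fraction of heads
print("closed-form MLE:", mle)

def L(theta):
    return theta**n1 * (1 - theta)**n0

grid = [i / 1000 for i in range(1001)]
assert max(grid, key=L) == mle   # grid argmax agrees with n1/n
```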

SLIDE 5

(Un-)Bias

A desirable property: an estimator Y of a parameter θ is an unbiased estimator if E[Y] = θ.

For the coin example above, the MLE is unbiased:
Y = fraction of heads = (Σ1≤i≤n Xi)/n  (Xi = indicator for heads in the ith trial),
so E[Y] = (Σ1≤i≤n E[Xi])/n = nθ/n = θ, by linearity of expectation.
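A quick Monte Carlo sanity check of this: averaging Y over many simulated experiments should approach θ (seed, θ, and sample sizes below are arbitrary illustrative choices):

```python
import random

# Estimate E[Y] for Y = fraction of heads in n flips with P(H) = theta.
# By linearity of expectation, E[Y] = theta; the simulation should agree.
random.seed(0)
theta = 0.3
n = 20
trials = 20000

total = 0.0
for _ in range(trials):
    y = sum(random.random() < theta for _ in range(n)) / n
    total += y

print("E[Y] estimate:", total / trials)   # should be close to theta = 0.3
```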

SLIDE 6

Are all unbiased estimators equally good?

No! E.g., "Ignore all but the 1st flip; if it was H, let Y' = 1; else Y' = 0."
Exercise: show this is unbiased.
Exercise: if the observed data has at least one H and at least one T, what is the likelihood of the data given the model with θ = Y'?

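The first exercise can be checked by simulation; this sketch (with arbitrary parameters) shows Y' averages to θ but, being always 0 or 1, is extremely variable compared with the sample fraction:

```python
import random

# The "first flip only" estimator Y' from the slide: unbiased, since
# E[Y'] = P(first flip is H) = theta, but its variance is theta*(1-theta),
# which no amount of extra data reduces.
random.seed(1)
theta = 0.3
trials = 20000

yprime = [1 if random.random() < theta else 0 for _ in range(trials)]
mean = sum(yprime) / trials
var = sum((y - mean) ** 2 for y in yprime) / trials

print("E[Y'] estimate:", mean)     # close to theta: unbiased
print("Var(Y') estimate:", var)    # close to theta*(1-theta) = 0.21
```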

SLIDE 7

Ex 3:

xi ∼ N(µ, σ²), with µ and σ² both unknown

[Figure: likelihood surface of L(x1, ..., xn | θ1, θ2)]

Sample mean is the MLE of the population mean, again.

In general, a problem like this results in 2 equations in 2 unknowns. Easy in this case, since θ2 drops out of the ∂/∂θ1 = 0 equation.

Recall

SLIDE 8

Ex. 3, (cont.)  (recall)

ln L(x1, x2, ..., xn | θ1, θ2) = Σ1≤i≤n [ −(1/2) ln(2πθ2) − (xi − θ1)²/(2θ2) ]

∂/∂θ2 ln L(x1, x2, ..., xn | θ1, θ2) = Σ1≤i≤n [ −1/(2θ2) + (xi − θ1)²/(2θ2²) ] = 0

⟹ θ̂2 = (Σ1≤i≤n (xi − θ̂1)²)/n = s̄²

Sample variance is the MLE of the population variance.
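A numerical sanity check of these closed forms, on arbitrary made-up data: the log-likelihood should only decrease if either parameter is nudged away from (sample mean, sample variance).

```python
import math

# For normal data, the log-likelihood is maximized at
# theta1 = sample mean, theta2 = sum((xi - theta1)^2) / n.
xs = [2.1, 3.4, 1.8, 2.9, 3.3, 2.5]   # illustrative data
n = len(xs)

theta1 = sum(xs) / n
theta2 = sum((x - theta1) ** 2 for x in xs) / n   # MLE variance (divide by n)

def log_L(t1, t2):
    return sum(-0.5 * math.log(2 * math.pi * t2) - (x - t1) ** 2 / (2 * t2)
               for x in xs)

# Nudging either parameter away from the MLE can only lower the likelihood.
best = log_L(theta1, theta2)
for dt in (-0.01, 0.01):
    assert log_L(theta1 + dt, theta2) <= best
    assert log_L(theta1, theta2 + dt) <= best
```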

SLIDE 9

Ex. 3, (cont.)

Bias? If Y = (Σ1≤i≤n Xi)/n is the sample mean, then E[Y] = (Σ1≤i≤n E[Xi])/n = nμ/n = μ, so the MLE is an unbiased estimator of the population mean.

Similarly, (Σ1≤i≤n (Xi − μ)²)/n is an unbiased estimator of σ². Unfortunately, if μ is unknown and estimated from the same data as above, θ̂2 = (Σ1≤i≤n (Xi − θ̂1)²)/n is a consistent, but biased, estimate of the population variance. (An example of overfitting.) Roughly, lim n→∞ θ̂2 = correct. Unbiased estimate (B&T p467): s² = (Σ1≤i≤n (Xi − θ̂1)²)/(n − 1).

One moral: MLE is a great idea, but not a magic bullet.
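A simulation illustrating the bias (parameters are arbitrary illustrative choices): the divide-by-n MLE averages to about (n−1)/n · σ², while the divide-by-(n−1) version averages to about σ².

```python
import random

# Compare the MLE variance (divide by n) with the unbiased estimate
# (divide by n-1) over many simulated normal samples.
random.seed(2)
mu, sigma2, n, trials = 0.0, 4.0, 5, 20000

mle_sum = unbiased_sum = 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    mle_sum += ss / n            # biased: E = (n-1)/n * sigma2 = 3.2
    unbiased_sum += ss / (n - 1) # unbiased: E = sigma2 = 4.0

print("E[MLE variance] ~", mle_sum / trials)
print("E[unbiased variance] ~", unbiased_sum / trials)
```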

SLIDE 10

More on Bias of θ̂2

Biased? Yes. Why? As an extreme, think about n = 1: then θ̂2 = 0; probably an underestimate!

Also, consider n = 2. Then θ̂1 is exactly between the two sample points, the position that exactly minimizes the expression for θ̂2. Any other choices for θ1, θ2 make the likelihood of the observed data slightly lower. But it's actually pretty unlikely that two sample points would be chosen exactly equidistant from, and on opposite sides of, the mean, so the MLE θ̂2 systematically underestimates θ2. (But not by much, and the bias shrinks with sample size.)

SLIDE 11

Confidence Intervals

SLIDE 12

A Problem With Point Estimates

Think again about estimating the mean of a normal distribution. Sample X1, X2, ..., Xn. We showed the sample mean Yn = (Σ1≤i≤n Xi)/n is an unbiased (and consistent) estimator of the population mean. But with probability 1, it's wrong!

Can we say anything about how wrong? E.g., could I find a value Δ s.t. I'm 95% confident that the true mean is within ±Δ of my estimate?


SLIDE 13

Yn = (Σ1≤i≤n Xi)/n is a random variable; it has a mean and a variance.

Assuming the Xi's are i.i.d. normal with mean μ and variance σ²:
Var(Yn) = Var((Σ1≤i≤n Xi)/n) = (1/n²) Σ1≤i≤n Var(Xi) = (1/n²)(nσ²) = σ²/n

So Pr((√n)|Yn − μ|/σ ≥ z) = 2(1 − Φ(z)) for z > 0. E.g., Pr((√n)|Yn − μ|/σ < 1.96) ≈ 95%.

I.e., the true μ is within ±1.96σ/√n of the estimate ~95% of the time.

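A coverage simulation for this interval (μ, σ, n, and seed are arbitrary illustrative choices): over many samples, the true mean should land inside Yn ± 1.96σ/√n roughly 95% of the time.

```python
import random

# Check the 95% coverage of the interval Yn +/- 1.96 * sigma / sqrt(n)
# when sigma is known and the data are i.i.d. normal.
random.seed(3)
mu, sigma, n, trials = 10.0, 2.0, 25, 20000

halfwidth = 1.96 * sigma / n ** 0.5
hits = 0
for _ in range(trials):
    yn = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    if abs(yn - mu) < halfwidth:
        hits += 1

print("coverage:", hits / trials)   # close to 0.95
```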