CSE 312 – Foundations of Computing II, Lecture 24: Biased Estimation (Stefano Tessaro) – PowerPoint Presentation


SLIDE 1

CSE 312

Foundations of Computing II

Lecture 24: Biased Estimation

Stefano Tessaro

tessaro@cs.washington.edu

SLIDE 2

Parameter Estimation – Workflow

Distribution ℙ(x | θ) → independent samples x₁, …, xₙ from ℙ(x | θ) → Algorithm → parameter estimate θ̂

θ = unknown parameter

Maximum Likelihood Estimation (MLE). Given data x₁, …, xₙ, find θ̂ = θ̂(x₁, …, xₙ) (“the MLE”) such that ℒ(x₁, …, xₙ | θ̂) is maximized!

SLIDE 3

Likelihood – Continuous Case

  • Definition. The likelihood of independent observations x₁, …, xₙ is

ℒ(x₁, …, xₙ | θ) = ∏ᵢ₌₁ⁿ f(xᵢ | θ)
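This product can be checked numerically; below is a minimal Python sketch (not part of the slides) for the Gaussian case with known variance σ² = 1, where the density f(x | μ) plays the role of f(x | θ). The sample values in `xs` are arbitrary assumptions for the demo.

```python
import math

def gaussian_pdf(x, mu, sigma2=1.0):
    # f(x | mu): Gaussian density with known variance sigma^2
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def likelihood(xs, mu):
    # L(x_1, ..., x_n | mu) = product of f(x_i | mu)
    prod = 1.0
    for x in xs:
        prod *= gaussian_pdf(x, mu)
    return prod

def log_likelihood(xs, mu):
    # ln L: a sum, numerically safer than a long product
    return sum(math.log(gaussian_pdf(x, mu)) for x in xs)

xs = [0.5, 1.2, 0.9]
# The likelihood is larger near the sample mean than at a distant mu
print(likelihood(xs, sum(xs) / len(xs)) > likelihood(xs, 0.0))
```

For more than a handful of observations the product underflows toward 0, which is why the following slides work with ln ℒ instead.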

SLIDE 4

Example – Gaussian Parameters

Normal outcomes x₁, …, xₙ, known variance σ² = 1

ℒ(x₁, …, xₙ | μ) = ∏ᵢ₌₁ⁿ (1/√(2π)) · e^(−(xᵢ−μ)²/2)

Goal: MLE for μ = expectation

ln ℒ(x₁, …, xₙ | μ) = −(n/2) · ln(2π) − ∑ᵢ₌₁ⁿ (xᵢ−μ)²/2

SLIDE 5

Example – Gaussian Parameters

Goal: estimate μ = expectation

ln ℒ(x₁, …, xₙ | μ) = −(n/2) · ln(2π) − ∑ᵢ₌₁ⁿ (xᵢ−μ)²/2

(d/dμ) ln ℒ(x₁, …, xₙ | μ) = ∑ᵢ₌₁ⁿ (xᵢ − μ) = ∑ᵢ₌₁ⁿ xᵢ − nμ = 0

Note: (d/dμ) (xᵢ−μ)²/2 = (1/2) · 2 · (xᵢ−μ) · (−1) = μ − xᵢ

μ̂ = (∑ᵢ₌₁ⁿ xᵢ) / n

In other words, the MLE is the population mean of the data.
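As a sanity check (a sketch, not from the lecture), one can draw samples and confirm that the sample mean beats nearby candidates in log-likelihood; the true mean 2.0 and sample size are arbitrary assumptions.

```python
import random

random.seed(0)
n = 1000
true_mu = 2.0  # assumed ground truth for the demo
xs = [random.gauss(true_mu, 1.0) for _ in range(n)]

# MLE for the mean of a Gaussian with known variance: mu_hat = (1/n) * sum(x_i)
mu_hat = sum(xs) / n

def log_likelihood(mu):
    # ln L up to an additive constant: -(1/2) * sum (x_i - mu)^2
    return -0.5 * sum((x - mu) ** 2 for x in xs)

# mu_hat maximizes the log-likelihood, so perturbing it can only hurt
assert all(log_likelihood(mu_hat) >= log_likelihood(mu_hat + d) for d in (-0.1, 0.1))
print(round(mu_hat, 2))
```

With n = 1000 the estimate lands close to the assumed true mean 2.0.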

SLIDE 6

[Figure: Gaussian density curve]

n samples x₁, …, xₙ ∈ ℝ from Gaussian 𝒩(μ, σ²). Most likely μ and σ²?
SLIDE 7

Two-parameter optimization

Normal outcomes x₁, …, xₙ

Goal: estimate θ₁ = μ = expectation and θ₂ = σ² = variance

ℒ(x₁, …, xₙ | θ₁, θ₂) = ∏ᵢ₌₁ⁿ (1/√(2πθ₂)) · e^(−(xᵢ−θ₁)²/(2θ₂))

ln ℒ(x₁, …, xₙ | θ₁, θ₂) = −(n/2) · ln(2πθ₂) − ∑ᵢ₌₁ⁿ (xᵢ−θ₁)²/(2θ₂)

SLIDE 8

Two-parameter estimation

ln ℒ(x₁, …, xₙ | θ₁, θ₂) = −(n/2) · ln(2πθ₂) − ∑ᵢ₌₁ⁿ (xᵢ−θ₁)²/(2θ₂)

We need to find a solution (θ̂₁, θ̂₂) to

(∂/∂θ₁) ln ℒ(x₁, …, xₙ | θ₁, θ₂) = 0
(∂/∂θ₂) ln ℒ(x₁, …, xₙ | θ₁, θ₂) = 0

SLIDE 9

MLE for Expectation

ln ℒ(x₁, …, xₙ | θ₁, θ₂) = −(n/2) · ln(2πθ₂) − ∑ᵢ₌₁ⁿ (xᵢ−θ₁)²/(2θ₂)

(∂/∂θ₁) ln ℒ(x₁, …, xₙ | θ₁, θ₂) = (1/θ₂) ∑ᵢ₌₁ⁿ (xᵢ − θ₁) = 0

θ̂₁ = (∑ᵢ₌₁ⁿ xᵢ) / n

In other words, the MLE of the expectation is (again) the population mean of the data, regardless of θ₂. What about the variance?

SLIDE 10

MLE for Variance

ln ℒ(x₁, …, xₙ | θ̂₁, θ₂) = −(n/2) · ln(2πθ₂) − ∑ᵢ₌₁ⁿ (xᵢ−θ̂₁)²/(2θ₂)
  = −(n/2) · ln(2π) − (n/2) · ln θ₂ − (1/(2θ₂)) ∑ᵢ₌₁ⁿ (xᵢ−θ̂₁)²

(∂/∂θ₂) ln ℒ(x₁, …, xₙ | θ̂₁, θ₂) = −n/(2θ₂) + (1/(2θ₂²)) ∑ᵢ₌₁ⁿ (xᵢ−θ̂₁)² = 0

θ̂₂ = (1/n) ∑ᵢ₌₁ⁿ (xᵢ − θ̂₁)²

In other words, the MLE of the variance is the population variance of the data.
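A quick numerical check of both MLE formulas (a sketch; the true parameters μ = 0 and σ² = 4 are assumptions for the demo):

```python
import random

random.seed(1)
xs = [random.gauss(0.0, 2.0) for _ in range(10_000)]  # true mean 0, variance 2^2 = 4
n = len(xs)

theta1_hat = sum(xs) / n                                 # MLE for the mean
theta2_hat = sum((x - theta1_hat) ** 2 for x in xs) / n  # MLE for the variance: divide by n

print(theta1_hat, theta2_hat)  # close to 0 and 4
```

Note the division by n, not n − 1; the distinction is exactly what the bias discussion below is about.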

SLIDE 11

So far

  • We have implicitly assumed that MLE estimators are always good.
  • But why is that really the case?

– Next: A natural property not always satisfied by MLE
– And why MLE is nonetheless “good”

SLIDE 12

When is an estimator good?

  • Definition. An estimator Θ̂ₙ is unbiased if for all n ≥ 1,

𝔼[Θ̂ₙ] = θ.

Distribution ℙ(x | θ) → samples X₁, …, Xₙ from ℙ(x | θ) → Algorithm → parameter estimate Θ̂ₙ

θ = unknown parameter

SLIDE 13

Example – Coin Flips

Coin-flip outcomes x₁, …, xₙ, with n_H heads, n_T tails

Recall: θ̂ = n_H / n

  • Fact. θ̂ is unbiased

Let Y₁, …, Yₙ be s.t. Yᵢ = 1 iff xᵢ = H (and 0 otherwise). In particular, ℙ(Yᵢ = 1) = θ.

Θ̂ = (1/n) ∑ᵢ₌₁ⁿ Yᵢ

𝔼[Θ̂] = (1/n) ∑ᵢ₌₁ⁿ 𝔼[Yᵢ] = (1/n) · n · θ = θ
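The computation above can be mirrored by simulation (a sketch; θ = 0.3 and n = 50 are arbitrary assumptions): averaging θ̂ = n_H/n over many repeated experiments should land near θ.

```python
import random

random.seed(42)
theta = 0.3   # assumed true heads probability
n = 50        # flips per experiment
trials = 20_000

# theta_hat = n_H / n for each experiment; unbiasedness says E[theta_hat] = theta
estimates = []
for _ in range(trials):
    n_heads = sum(1 for _ in range(n) if random.random() < theta)
    estimates.append(n_heads / n)

avg = sum(estimates) / trials
print(round(avg, 3))  # close to theta = 0.3
```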

SLIDE 14

Notes

  • Unbiasedness is not the ultimate goal either

– Consider the estimator which sets θ̂ = 1 if the first coin toss is heads, and θ̂ = 0 otherwise – regardless of the number of samples.
– ℙ(Θ̂ₙ = 1) = θ
– 𝔼[Θ̂ₙ] = θ

  • Generally, we would like instead ℙ(Θ̂ₙ ≈ θ) with high probability as n → ∞.

– Will discuss this on Monday.
– Unbiasedness is a step towards this.
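A small simulation (not from the slides; θ = 0.3 is an assumed value) makes the point concrete: this estimator is unbiased on average, yet no single estimate is ever close to θ.

```python
import random

random.seed(7)
theta = 0.3   # assumed true heads probability
trials = 20_000

def silly_estimate(samples):
    # Theta_n = 1 if the FIRST toss is heads, else 0 -- ignores all other samples
    return 1.0 if samples[0] else 0.0

ests = []
for _ in range(trials):
    samples = [random.random() < theta for _ in range(10)]
    ests.append(silly_estimate(samples))

avg = sum(ests) / trials                                   # ~ theta: unbiased
frac_far = sum(1 for e in ests if abs(e - theta) > 0.1) / trials
print(round(avg, 3), frac_far)  # avg near 0.3, yet EVERY estimate is far from theta
```

Every individual estimate is 0 or 1, so none is within 0.1 of θ = 0.3, no matter how many samples are available.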

SLIDE 15

Example – Gaussian

Normal outcomes X₁, …, Xₙ iid according to 𝒩(μ, σ²)

Θ̂₁ = (∑ᵢ₌₁ⁿ Xᵢ) / n

Θ̂₂ = (1/n) ∑ᵢ₌₁ⁿ (Xᵢ − Θ̂₁)²

SLIDE 16

Example – Gaussian

Normal outcomes X₁, …, Xₙ iid according to 𝒩(μ, σ²)

Θ̂₁ = (∑ᵢ₌₁ⁿ Xᵢ) / n

𝔼[Θ̂₁] = (∑ᵢ₌₁ⁿ 𝔼[Xᵢ]) / n = n · μ / n = μ

Therefore: Unbiased!

SLIDE 17

Example – Gaussian

Normal outcomes X₁, …, Xₙ iid according to 𝒩(μ, σ²)

Θ̂₂ = (1/n) ∑ᵢ₌₁ⁿ (Xᵢ − Θ̂₁)²

Example: n = 1. Then Θ̂₁ = X₁/1 = X₁, so Θ̂₂ = (1/1) · (X₁ − X₁)² = 0.

Assume: σ² > 0. Then 𝔼[Θ̂₂] = 0 ≠ σ². Therefore: Biased!

However,

Θ̂₂ = (1/(n−1)) ∑ᵢ₌₁ⁿ (Xᵢ − Θ̂₁)²

is Unbiased!

Next time: Unbiased estimator proof + more intuition + confidence intervals

SLIDE 18

Example – Consistency

Normal outcomes X₁, …, Xₙ iid according to 𝒩(μ, σ²). Assume: σ² > 0.

Θ̂₂ = (1/n) ∑ᵢ₌₁ⁿ (Xᵢ − Θ̂₁)²  — Population variance – Biased!

Θ̂₂ = (1/(n−1)) ∑ᵢ₌₁ⁿ (Xᵢ − Θ̂₁)²  — Sample variance – Unbiased!

The population-variance Θ̂₂ converges to the same value as the sample-variance Θ̂₂, i.e., σ², as n → ∞.

The population-variance Θ̂₂ is “consistent”.
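The bias of the 1/n estimator is easy to see by simulation (a sketch; μ = 0, σ² = 1, and n = 5 are assumed values): for Gaussians, the population variance has expectation (n−1)/n · σ², while the sample variance averages to σ².

```python
import random

random.seed(3)
mu, sigma = 0.0, 1.0  # assumed true parameters; sigma^2 = 1
n = 5                  # small n makes the bias visible
trials = 50_000

pop_var_estimates = []   # divide by n   (MLE, biased)
samp_var_estimates = []  # divide by n-1 (sample variance, unbiased)
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    pop_var_estimates.append(ss / n)
    samp_var_estimates.append(ss / (n - 1))

print(round(sum(pop_var_estimates) / trials, 3))   # ~ (n-1)/n * sigma^2 = 0.8
print(round(sum(samp_var_estimates) / trials, 3))  # ~ sigma^2 = 1.0
```

As n grows, the factor (n−1)/n tends to 1, which is the consistency claim on this slide.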

SLIDE 19

Consistent Estimators & MLE

  • Definition. An estimator is unbiased if 𝔼[Θ̂ₙ] = θ for all n ≥ 1.

Distribution ℙ(x | θ) → samples X₁, …, Xₙ from ℙ(x | θ) → Algorithm → parameter estimate Θ̂ₙ

θ = unknown parameter

  • Definition. An estimator is consistent if lim(n→∞) 𝔼[Θ̂ₙ] = θ.

  • Theorem. MLE estimators are consistent.

(But not necessarily unbiased)