Maximum Likelihood Estimation - Daphne Koller - PowerPoint PPT Presentation



SLIDE 1

Probabilistic Graphical Models
Learning: Parameter Estimation

Maximum Likelihood Estimation

Daphne Koller

SLIDE 2

Biased Coin Example

  • P is a Bernoulli distribution: P(X=1) = θ, P(X=0) = 1−θ
  • Data x[1], …, x[M] sampled IID from P:
    – Tosses are independent of each other
    – Tosses are sampled from the same distribution (identically distributed)
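As an illustrative sketch of this setup (the bias 0.7, the sample size, and the seed below are arbitrary choices for illustration, not from the slides), the tosses can be simulated as IID draws from a Bernoulli distribution:

```python
import random

def sample_bernoulli(theta, M, seed=0):
    """Draw M IID tosses from Bernoulli(theta): 1 = heads, 0 = tails."""
    rng = random.Random(seed)
    return [1 if rng.random() < theta else 0 for _ in range(M)]

# Each toss ignores the others (independent) and uses the same theta
# (identically distributed) -- exactly the IID assumption above.
data = sample_bernoulli(theta=0.7, M=10)
```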

SLIDE 3

IID as a PGM

[Plate model: shared parameter θ; observed variables X[1], …, X[M] inside a plate over the data instances m]

P(x[m] | θ) = θ       if x[m] = 1
              1 − θ   if x[m] = 0

SLIDE 4

Maximum Likelihood Estimation

  • Goal: find θ ∈ [0,1] that predicts D well
  • Prediction quality = likelihood of D given θ:

L(θ : D) = P(D | θ) = ∏_{m=1}^{M} P(x[m] | θ)

Example: L(θ : ⟨H, T, T, H, H⟩)

[Plot: likelihood L(θ : D) as a function of θ, for θ from 0 to 1]
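To make the likelihood concrete, here is a small sketch (hypothetical code, not from the slides) that evaluates L(θ : ⟨H, T, T, H, H⟩) on a grid of θ values; the product peaks at θ = 0.6, the fraction of heads:

```python
def likelihood(theta, tosses):
    """L(theta : D) = product over m of P(x[m] | theta)."""
    L = 1.0
    for x in tosses:
        L *= theta if x == 'H' else (1 - theta)
    return L

tosses = ['H', 'T', 'T', 'H', 'H']        # the dataset from the slide
grid = [i / 100 for i in range(101)]      # theta = 0.00, 0.01, ..., 1.00
best = max(grid, key=lambda t: likelihood(t, tosses))
# best == 0.6, i.e. 3 heads out of 5 tosses
```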

SLIDE 5

Maximum Likelihood Estimator

  • Observations: M_H heads and M_T tails
  • Find θ maximizing the likelihood

L(θ : M_H, M_T) = θ^M_H (1 − θ)^M_T

  • Equivalent to maximizing the log-likelihood

ℓ(θ : M_H, M_T) = M_H log θ + M_T log(1 − θ)

  • Differentiating the log-likelihood and solving for θ:

θ̂ = M_H / (M_H + M_T)
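The closed form θ̂ = M_H / (M_H + M_T) can be checked numerically; this sketch (with arbitrary counts M_H = 30, M_T = 70) compares a grid search over the log-likelihood against the closed form:

```python
import math

def log_likelihood(theta, MH, MT):
    """l(theta : MH, MT) = MH*log(theta) + MT*log(1 - theta)."""
    return MH * math.log(theta) + MT * math.log(1 - theta)

MH, MT = 30, 70
closed_form = MH / (MH + MT)                 # 0.3
grid = [i / 1000 for i in range(1, 1000)]    # open interval, avoids log(0)
numeric = max(grid, key=lambda t: log_likelihood(t, MH, MT))
# numeric agrees with closed_form to grid resolution
```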

SLIDE 6

Sufficient Statistics

  • For computing θ̂ in the coin-toss example, we only needed M_H and M_T, since

L(θ : D) = θ^M_H (1 − θ)^M_T

  • M_H and M_T are sufficient statistics

SLIDE 7

Sufficient Statistics

  • A function s(D) from instances to a vector in ℝ^k is a sufficient statistic if, for any two datasets D and D′ and any θ ∈ Θ:

∑_{x[i] ∈ D} s(x[i]) = ∑_{x[i] ∈ D′} s(x[i])  ⇒  L(θ : D) = L(θ : D′)

[Diagram: many datasets mapping to the same sufficient statistics]

SLIDE 8

Sufficient Statistic for Multinomial

  • For a dataset D over a variable X with k values, the sufficient statistics are counts ⟨M_1, …, M_k⟩, where M_i is the number of times that X[m] = x^i in D:

L(θ : D) = ∏_{i=1}^{k} θ_i^M_i

  • The sufficient statistic s(x) is a tuple of dimension k:
    – s(x^i) = (0, …, 0, 1, 0, …, 0), with the 1 in position i
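A small sketch (with hypothetical data, not from the slides) of both views: the per-instance one-hot statistic s(x), and the counts ⟨M_1, …, M_k⟩ that summing s over the dataset recovers:

```python
from collections import Counter

def s(x, values):
    """One-hot sufficient statistic: s(x^i) has a 1 in position i."""
    return [1 if v == x else 0 for v in values]

def counts(data, values):
    """Sufficient statistics <M_1, ..., M_k> for a multinomial."""
    c = Counter(data)
    return [c[v] for v in values]

values = ['a', 'b', 'c']
data = ['a', 'b', 'a', 'c', 'a', 'b']
M = counts(data, values)                                # [3, 2, 1]
# Summing the one-hot statistics over D recovers the counts:
summed = [sum(col) for col in zip(*(s(x, values) for x in data))]
```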

SLIDE 9

Sufficient Statistic for Gaussian

  • Gaussian distribution: if P(X) ~ N(μ, σ²), then

p(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²))

  • Rewrite as

p(x) = (1 / (√(2π) σ)) exp(−x² / (2σ²) + xμ / σ² − μ² / (2σ²))

  • Sufficient statistics for the Gaussian: s(x) = ⟨1, x, x²⟩

SLIDE 10

Maximum Likelihood Estimation

  • MLE Principle: choose θ to maximize L(θ : D)
  • Multinomial MLE:

θ̂_i = M_i / ∑_i M_i

  • Gaussian MLE:

μ̂ = (1/M) ∑_m x[m]

σ̂ = √( (1/M) ∑_m (x[m] − μ̂)² )
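Both Gaussian closed forms can be computed from the sufficient statistics ⟨1, x, x²⟩ alone, without revisiting the data; a sketch with arbitrary example data:

```python
import math

def gaussian_mle(xs):
    """Gaussian MLE from the sufficient statistics (M, sum x, sum x^2)."""
    M = len(xs)
    s1 = sum(xs)                    # sum of x
    s2 = sum(x * x for x in xs)     # sum of x^2
    mu = s1 / M
    # (1/M) * sum_m (x[m] - mu)^2 equals s2/M - mu^2
    sigma = math.sqrt(s2 / M - mu * mu)
    return mu, sigma

mu, sigma = gaussian_mle([1.0, 2.0, 3.0, 4.0])   # mu = 2.5
```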

SLIDE 11

Summary

  • Maximum likelihood estimation is a simple principle for parameter selection given D
  • The likelihood function is uniquely determined by sufficient statistics that summarize D
  • The MLE has a closed-form solution for many parametric distributions