Probabilistic Graphical Models
Learning: Parameter Estimation
Maximum Likelihood Estimation
Daphne Koller
Biased Coin Example
P is a Bernoulli distribution: P(X=1) = θ, P(X=0) = 1-θ
The data instances x[1], ..., x[M] are sampled IID from P
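[Figure: plate model with parameter θ as the parent of X inside a plate over the data instances m, and its unrolled form with θ as the parent of X[1], ..., X[M]]

\[
P(x[m] \mid \theta) =
\begin{cases}
\theta & x[m] = x^1 \\
1 - \theta & x[m] = x^0
\end{cases}
\]

\[
L(\theta : D) = P(D \mid \theta) = \prod_{m=1}^{M} P(x[m] \mid \theta)
\]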
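As a small illustration of the product form of the likelihood, the following sketch evaluates it on a toy dataset. The function name `bernoulli_likelihood` and the 0/1 encoding (1 for x^1, 0 for x^0) are assumptions made for this example, not part of the slides.

```python
def bernoulli_likelihood(data, theta):
    """Return L(theta : D) = prod_m P(x[m] | theta) for 0/1 data."""
    likelihood = 1.0
    for x in data:
        # Each instance contributes theta if x == 1, else 1 - theta.
        likelihood *= theta if x == 1 else 1.0 - theta
    return likelihood


# Hypothetical dataset: H, T, T, H, H encoded as 1, 0, 0, 1, 1.
tosses = [1, 0, 0, 1, 1]
value_at_half = bernoulli_likelihood(tosses, 0.5)
print(value_at_half)
```

Because the instances are IID, the order of the tosses does not affect the result; only the number of ones and zeros matters.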
[Figure: the likelihood L(θ : ⟨H, T, T, H, H⟩) plotted against θ from 0 to 1]
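The curve in the figure can be reproduced numerically. The sketch below evaluates the likelihood of the sequence H, T, T, H, H on a grid of θ values and reports the grid point with the largest value. The grid resolution and the helper name `sequence_likelihood` are choices made for this illustration.

```python
def sequence_likelihood(sequence, theta):
    """Likelihood of a string of 'H'/'T' tosses under P(H) = theta."""
    value = 1.0
    for toss in sequence:
        value *= theta if toss == "H" else 1.0 - theta
    return value


sequence = "HTTHH"
# Grid of theta values 0.00, 0.01, ..., 1.00.
grid = [i / 100 for i in range(101)]
curve = [sequence_likelihood(sequence, t) for t in grid]

# Grid point where the likelihood is largest.
best_theta = grid[curve.index(max(curve))]
print(best_theta)
```

A grid search like this only approximates the maximizer to the grid spacing; it is meant to show the shape of the curve, not to replace the closed-form estimate.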
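With $M_H$ heads and $M_T$ tails, the likelihood is

\[
L(\theta : M_H, M_T) = \theta^{M_H} (1 - \theta)^{M_T}
\]

Maximizing the likelihood is equivalent to maximizing the log-likelihood

\[
l(\theta : M_H, M_T) = M_H \log \theta + M_T \log (1 - \theta)
\]

Differentiating the log-likelihood and solving for θ gives the maximum likelihood estimate

\[
\hat{\theta} = \frac{M_H}{M_H + M_T}
\]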
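The step from the log-likelihood to the estimate can be written out explicitly. This is the standard calculus argument, sketched here under the assumptions that 0 < θ < 1 and that both counts are positive:

```latex
\begin{align*}
\frac{\partial}{\partial \theta}\, l(\theta : M_H, M_T)
  &= \frac{M_H}{\theta} - \frac{M_T}{1 - \theta} = 0 \\
\Rightarrow\quad M_H (1 - \theta) &= M_T\, \theta \\
\Rightarrow\quad \hat{\theta} &= \frac{M_H}{M_H + M_T} \\[4pt]
\frac{\partial^2}{\partial \theta^2}\, l(\theta : M_H, M_T)
  &= -\frac{M_H}{\theta^2} - \frac{M_T}{(1 - \theta)^2} < 0
\end{align*}
```

The negative second derivative confirms that the stationary point is a maximum. For the sequence H, T, T, H, H the estimate is 3/5 = 0.6.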
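For the coin toss example, the likelihood depends on the data only through $M_H$ and $M_T$:

\[
L(\theta : D) = \theta^{M_H} (1 - \theta)^{M_T}
\]

A function s from instances to a vector in $\mathbb{R}^k$ is a sufficient statistic if for any two datasets D and D' and any θ ∈ Θ we have

\[
\sum_{x[i] \in D} s(x[i]) = \sum_{x[i] \in D'} s(x[i])
\;\Rightarrow\;
L(\theta : D) = L(\theta : D')
\]

[Figure: datasets mapped to their sufficient statistics]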
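To make the definition concrete, the sketch below compares two coin datasets that differ in order but share the same counts. The 'H'/'T' encoding and the function names are assumptions for this example. The check samples only a few θ values, so it illustrates the property rather than proving it.

```python
def counts(data):
    """Sufficient statistics (M_H, M_T) of a sequence of 'H'/'T' tosses."""
    return data.count("H"), data.count("T")


def likelihood(data, theta):
    """L(theta : D) computed instance by instance."""
    value = 1.0
    for toss in data:
        value *= theta if toss == "H" else 1.0 - theta
    return value


d1 = "HTTHH"
d2 = "HHHTT"  # same counts as d1, different order

same_statistics = counts(d1) == counts(d2)
# Compare the two likelihoods at a few sample values of theta.
same_likelihood = all(
    abs(likelihood(d1, t) - likelihood(d2, t)) < 1e-12
    for t in (0.1, 0.3, 0.5, 0.9)
)
print(same_statistics, same_likelihood)
```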
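For a multinomial over a variable X with k values $x^1, \ldots, x^k$, the sufficient statistics are the counts $\langle M_1, \ldots, M_k \rangle$, where $M_i$ is the number of times that $X[m] = x^i$ in D.

\[
L(\theta : D) = \prod_{i=1}^{k} \theta_i^{M_i}
\]

The statistic of a single instance is a one-hot vector of dimension k: $s(x^i) = (0, \ldots, 0, 1, 0, \ldots, 0)$, with the 1 in position i.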
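A minimal sketch of the multinomial case follows, assuming values are encoded as indices 0, ..., k-1 (the slides index from 1) and using illustrative function names. Summing the one-hot vectors over the data yields the counts, and the likelihood then follows from the counts alone.

```python
def one_hot(index, k):
    """s(x^i): a length-k tuple with a 1 in position `index` (0-based)."""
    return tuple(1 if j == index else 0 for j in range(k))


def count_vector(data, k):
    """Sum the one-hot statistics over the dataset to get the counts."""
    totals = [0] * k
    for x in data:
        for j, bit in enumerate(one_hot(x, k)):
            totals[j] += bit
    return totals


def multinomial_likelihood(theta, m_counts):
    """L(theta : D) = prod_i theta_i ** M_i."""
    value = 1.0
    for theta_i, m_i in zip(theta, m_counts):
        value *= theta_i ** m_i
    return value


data = [0, 2, 1, 0, 0, 2]          # hypothetical dataset with k = 3 values
m_counts = count_vector(data, 3)   # [3, 1, 2]
lik = multinomial_likelihood([0.5, 0.25, 0.25], m_counts)
print(m_counts, lik)
```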
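If $P(X) \sim \mathcal{N}(\mu, \sigma^2)$, then

\[
p(X) = \frac{1}{\sqrt{2\pi}\,\sigma}
\exp\left( -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right)
\]

Expanding the square, this can be rewritten as

\[
p(X) = \frac{1}{\sqrt{2\pi}\,\sigma}
\exp\left( -x^2 \frac{1}{2\sigma^2} + x \frac{\mu}{\sigma^2} - \frac{\mu^2}{2\sigma^2} \right)
\]

The data enter only through 1, x, and $x^2$, so $s(x) = \langle 1, x, x^2 \rangle$ is a sufficient statistic for the Gaussian.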
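The expanded form suggests that a Gaussian log-likelihood can be computed from the sums of 1, x, and x² alone. The following sketch checks this numerically on made-up data; the function names and the sample values are assumptions for illustration.

```python
import math


def gaussian_loglik_direct(data, mu, sigma):
    """Sum of log N(x | mu, sigma^2) over the data, term by term."""
    return sum(
        -math.log(math.sqrt(2 * math.pi) * sigma)
        - 0.5 * ((x - mu) / sigma) ** 2
        for x in data
    )


def gaussian_loglik_from_stats(stats, mu, sigma):
    """Same quantity using only the sums (M, sum of x, sum of x^2)."""
    m, sum_x, sum_x2 = stats
    return (
        -m * math.log(math.sqrt(2 * math.pi) * sigma)
        - sum_x2 / (2 * sigma**2)
        + sum_x * mu / sigma**2
        - m * mu**2 / (2 * sigma**2)
    )


data = [1.2, 0.7, 2.5, 1.9]  # hypothetical observations
stats = (len(data), sum(data), sum(x * x for x in data))

direct = gaussian_loglik_direct(data, mu=1.5, sigma=0.8)
via_stats = gaussian_loglik_from_stats(stats, mu=1.5, sigma=0.8)
print(abs(direct - via_stats) < 1e-9)
```

Once the three sums are stored, the log-likelihood can be evaluated for any (μ, σ) without revisiting the individual data points.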
Multinomial:

\[
\hat{\theta}_i = \frac{M_i}{\sum_{j=1}^{k} M_j}
\]

Gaussian:

\[
\hat{\mu} = \frac{1}{M} \sum_{m} x[m]
\qquad
\hat{\sigma} = \sqrt{\frac{1}{M} \sum_{m} \left( x[m] - \hat{\mu} \right)^2}
\]
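These estimators can be computed directly from the sufficient statistics. The sketch below is one way to do so in plain Python on made-up data. The function names are illustrative, and the Gaussian estimate uses the 1/M (population) form rather than the 1/(M-1) sample variance.

```python
import math
import statistics


def multinomial_mle(m_counts):
    """theta_i = M_i / sum_j M_j."""
    total = sum(m_counts)
    return [m_i / total for m_i in m_counts]


def gaussian_mle(data):
    """Return (mu_hat, sigma_hat) from the sums M, sum of x, sum of x^2."""
    m = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    mu_hat = sum_x / m
    # (1/M) * sum (x - mu_hat)^2 equals sum_x2 / M - mu_hat^2.
    # max() guards against a tiny negative value caused by rounding.
    variance = max(sum_x2 / m - mu_hat**2, 0.0)
    return mu_hat, math.sqrt(variance)


theta_hat = multinomial_mle([3, 1, 2])
print(theta_hat)

data = [1.2, 0.7, 2.5, 1.9]  # hypothetical observations
mu_hat, sigma_hat = gaussian_mle(data)
# statistics.pstdev is the population standard deviation (divides by M).
print(abs(sigma_hat - statistics.pstdev(data)) < 1e-9)
```

Computing the variance as a difference of two sums can lose precision when the mean is large relative to the spread; for such data, a two-pass computition around the estimated mean is the safer choice.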