Graphical Models (Lecture 1 - Introduction)
Tibério Caetano
tiberiocaetano.com Statistical Machine Learning Group NICTA Canberra
LLSS, Canberra, 2009
Material
Many good books:
- Chris Bishop's "Pattern Recognition and Machine Learning" (the Graphical Models chapter is available from his webpage in PDF format, as are all the figures – many are used in these slides!)
- Judea Pearl's "Probabilistic Reasoning in Intelligent Systems"
- Steffen Lauritzen's "Graphical Models"
- ...
Unpublished material:
- Michael Jordan's unpublished book "An Introduction to Probabilistic Graphical Models"
- Koller and Friedman's unpublished book "Structured Probabilistic Models"
Videos:
- Sam Roweis' videos on videolectures.net (excellent!)
The Gaussian density, and the probability mass within one standard deviation of the mean:
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \qquad \int_{\mu-\sigma}^{\mu+\sigma} p(x)\,dx \approx 0.68$$
$$p(x_1) = \sum_{x_2,\dots,x_N} p(x_1, \dots, x_N)$$
Graphical Models (Lecture 2 - Basics)
Tibério Caetano, tiberiocaetano.com, Statistical Machine Learning Group, NICTA Canberra
Notation: $x_{\tilde{A}}$ denotes the random vector comprised of all variables other than those in $A$ (we write $\tilde{A}$ for the complement of $A$).
The marginal is obtained by summing over the complement:
$$p(x_A) = \sum_{\tilde{x}_{\tilde{A}} \in \mathcal{X}_{\tilde{A}}} p(x_A, \tilde{x}_{\tilde{A}})$$
Computing the marginal $p(x_2)$ naively:
$$p(x_2) = \sum_{x_1,x_3} p(x_1, x_2, x_3), \qquad O(|\mathcal{X}_1||\mathcal{X}_2||\mathcal{X}_3|)$$
Exploiting the factorization $p(x_1, x_2, x_3) = p(x_1|x_2)p(x_2|x_3)p(x_3)$:
$$p(x_2) = \sum_{x_3} p(x_2|x_3)p(x_3) \sum_{x_1} p(x_1|x_2), \qquad O(|\mathcal{X}_2||\mathcal{X}_3|)$$
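The saving from pushing the sums inside the product can be checked numerically. A minimal NumPy sketch, with made-up state-space sizes and random conditional tables (not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
K1, K2, K3 = 3, 4, 5  # illustrative state-space sizes |X1|, |X2|, |X3|

# Random tables for the factorization p(x1|x2) p(x2|x3) p(x3)
p3 = rng.random(K3); p3 /= p3.sum()
p2g3 = rng.random((K2, K3)); p2g3 /= p2g3.sum(axis=0)   # p(x2|x3), columns normalized
p1g2 = rng.random((K1, K2)); p1g2 /= p1g2.sum(axis=0)   # p(x1|x2)

# Naive: materialize the joint, then sum over x1 and x3 -- O(|X1||X2||X3|)
joint = p1g2[:, :, None] * p2g3[None, :, :] * p3[None, None, :]
p2_naive = joint.sum(axis=(0, 2))

# Elimination: sum_{x1} p(x1|x2) = 1, so p(x2) = sum_{x3} p(x2|x3) p(x3) -- O(|X2||X3|)
p2_elim = p2g3 @ p3

assert np.allclose(p2_naive, p2_elim)
```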
The chain rule
$$p(x) = \prod_{i=1}^{N} p(x_i \mid x_{<i})$$
holds for any ordering $\pi$:
$$p(x) = \prod_{i=1}^{N} p(x_{\pi_i} \mid x_{<\pi_i})$$
Conditional-independence assumptions reduce each conditioning set to the parents:
$$p(x) = \prod_{i=1}^{N} p(x_{\pi_i} \mid x_{\mathrm{pa}(\pi_i)})$$
Directed Graphical Models:
[figure: a directed graph over X1, X2, X3, X4, X5, X6]
$$p(x) = \prod_i p(x_i \mid x_{\mathrm{pa}(i)})$$
Recall: $p(x) = \prod_{i=1}^{N} p(x_i \mid x_{<i})$, and for any ordering $\pi$: $p(x) = \prod_{i=1}^{N} p(x_{\pi_i} \mid x_{<\pi_i})$.
Graphical Models (Lecture 3 - Bayesian Networks)
Tibério Caetano, tiberiocaetano.com, Statistical Machine Learning Group, NICTA Canberra
$$p(x) = \prod_i p(x_i \mid x_{<i}) \;\to\; \prod_i p(x_i \mid x_{\mathrm{pa}(i)}) \;\to\; \prod_i p(x_{\pi_i} \mid x_{\mathrm{pa}(\pi_i)})$$
for an ordering $\pi$ consistent with the graph (each node appearing after its parents).
$$p(x) = \prod_{i=1}^{N} p(x_{\pi_i} \mid x_{<\pi_i})$$
a → c → b (head-to-tail)
$$\sum_c p(a,b,c) = \sum_c p(a)p(c|a)p(b|c) = p(a)\sum_c p(b|c)p(c|a) = p(a)p(b|a) \neq p(a)p(b) \text{ in general}$$
so $a$ and $b$ are marginally dependent.
a → c → b, now conditioning on $c$:
$$p(a,b \mid c) = \frac{p(a)p(c|a)p(b|c)}{p(c)} = p(a|c)\,p(b|c)$$
so $a \perp b \mid c$.
a → c → b [figure]: the step $p(a)p(c|a)/p(c) = p(a|c)$ above is just Bayes' rule.
a ← c → b (tail-to-tail)
$$\sum_c p(a,b,c) = \sum_c p(c)p(a|c)p(b|c) \neq p(a)p(b) \text{ in general}$$
a ← c → b, now conditioning on $c$:
$$p(a,b \mid c) = \frac{p(c)p(a|c)p(b|c)}{p(c)} = p(a|c)\,p(b|c)$$
so $a \perp b \mid c$.
a → c ← b (head-to-head)
$$\sum_c p(a,b,c) = \sum_c p(a)p(b)p(c \mid a,b) = p(a)p(b)$$
so $a$ and $b$ are marginally independent.
a → c ← b, now conditioning on $c$:
$$p(a,b \mid c) = \frac{p(a)p(b)p(c \mid a,b)}{p(c)} \neq p(a|c)\,p(b|c) \text{ in general}$$
so conditioning on $c$ makes $a$ and $b$ dependent ("explaining away").
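Both halves of the head-to-head behaviour can be verified numerically. A small NumPy sketch with hypothetical binary tables (a noisy XOR for $p(c \mid a,b)$, chosen for illustration):

```python
import numpy as np

# a -> c <- b with binary variables (hypothetical numbers)
pa = np.array([0.5, 0.5])
pb = np.array([0.5, 0.5])
# p(c=1 | a, b): high when a != b (a noisy XOR)
pc1 = np.array([[0.1, 0.9],
                [0.9, 0.1]])
pcab = np.stack([1 - pc1, pc1])          # p(c|a,b), indexed [c, a, b]

joint = pa[None, :, None] * pb[None, None, :] * pcab   # p(c, a, b)

# Marginally, a and b ARE independent: summing over c recovers p(a)p(b)
pab = joint.sum(axis=0)
assert np.allclose(pab, np.outer(pa, pb))

# Conditioned on c = 1 they are NOT: p(a,b|c) != p(a|c)p(b|c)
pab_c1 = joint[1] / joint[1].sum()
pa_c1, pb_c1 = pab_c1.sum(axis=1), pab_c1.sum(axis=0)
assert not np.allclose(pab_c1, np.outer(pa_c1, pb_c1))   # "explaining away"
```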
[figure: d-separation example over nodes a, b, c, e, f, shown twice with different observed sets]
Directed Graphical Models:
[figure: a directed graph over X1, X2, X3, X4, X5, X6]
Graphical Models (Lecture 4 - Markov Random Fields)
Tibério Caetano, tiberiocaetano.com, Statistical Machine Learning Group, NICTA Canberra
[figure: node sets A and B separated by C in an undirected graph]
[figure: an undirected graph over x1, x2, x3, x4]
$Z$ is the partition function: the normalization constant ensuring $\sum_x p(x) = 1$.
Möbius Inversion: for $F, G : \mathcal{P}(S) \to \mathbb{R}$,
$$F(A) = \sum_{B : B \subseteq A} G(B)\ \ \forall A \quad\Longleftrightarrow\quad G(B) = \sum_{C : C \subseteq B} (-1)^{|B|-|C|} F(C)\ \ \forall B.$$
Define $F = \phi = \log p$ and compute the inner sum for the case where $B$ is not a clique (i.e. $\exists X_1, X_2 \in B$ not connected in $B$). Then the conditional independence (CI) $\phi(X_1, C, X_2) + \phi(C) = \phi(C, X_1) + \phi(C, X_2)$ holds, and grouping the subsets of $B$ by whether they contain $X_1$, $X_2$, both, or neither:
$$\sum_{C \subseteq B \setminus \{X_1, X_2\}} \Big[ (-1)^{|B|-|C|}\phi(C) + (-1)^{|B|-|C \cup X_1|}\phi(C, X_1) + (-1)^{|B|-|C \cup X_2|}\phi(C, X_2) + (-1)^{|B|-|X_1 \cup C \cup X_2|}\phi(X_1, C, X_2) \Big]$$
$$= \sum_{C \subseteq B \setminus \{X_1, X_2\}} (-1)^{|B|-|C|}\big[\phi(X_1, C, X_2) + \phi(C) - \phi(C, X_1) - \phi(C, X_2)\big] = 0.$$
Hence $G(B) = 0$ for every non-clique $B$: $\log p$ decomposes over the cliques.
[figure: graph over nodes P, U, D]
$$p(x_i \mid x_{V \setminus i}) = p(x_i \mid x_A)$$
where $A$ is the set of neighbours of $i$ (its Markov blanket).
Graphical Models (Lecture 5 - Inference)
Tibério Caetano, tiberiocaetano.com, Statistical Machine Learning Group, NICTA Canberra
Recap: directed models factorize as $p(x) = \prod_i p(x_i \mid x_{\mathrm{pa}(i)})$; undirected models as $p(x) = \frac{1}{Z}\prod_C \psi_C(x_C)$.
$$p(x) = \frac{1}{Z}\prod_{i=1}^{N-1}\psi(x_i, x_{i+1}) \quad \text{(Exercise: which graph is this?)}$$
$$p(x_1) = \sum_{x_2,\dots,x_N} \frac{1}{Z}\prod_{i=1}^{N-1}\psi(x_i, x_{i+1})$$
Pushing each sum as far inside the product as possible:
$$p(x_1) = \frac{1}{Z}\sum_{x_2}\psi(x_1, x_2)\sum_{x_3}\psi(x_2, x_3)\cdots\sum_{x_N}\psi(x_{N-1}, x_N)$$
Cost: $O(\prod_{i=1}^{N}|\mathcal{X}_i|)$ vs. $O(\sum_{i=1}^{N-1}|\mathcal{X}_i||\mathcal{X}_{i+1}|)$.
Compute $p(x_1)$ with elimination order $(6, 5, 4, 3, 2)$ [figures: the graph over $X_1,\dots,X_6$ with nodes eliminated one at a time]:
$$p(x_1) = Z^{-1}\sum_{x_2,\dots,x_6} \psi(x_1,x_2)\psi(x_1,x_3)\psi(x_3,x_5)\psi(x_2,x_5,x_6)\psi(x_2,x_4)$$
$$p(x_1) = Z^{-1}\sum_{x_2}\psi(x_1,x_2)\sum_{x_3}\psi(x_1,x_3)\sum_{x_4}\psi(x_2,x_4)\underbrace{\sum_{x_5}\psi(x_3,x_5)\underbrace{\sum_{x_6}\psi(x_2,x_5,x_6)}_{m_6(x_2,x_5)}}_{m_5(x_2,x_3)}$$
$$p(x_1) = Z^{-1}\sum_{x_2}\psi(x_1,x_2)\sum_{x_3}\psi(x_1,x_3)m_5(x_2,x_3)\underbrace{\sum_{x_4}\psi(x_2,x_4)}_{m_4(x_2)}$$
$$p(x_1) = Z^{-1}\underbrace{\sum_{x_2}\psi(x_1,x_2)\,m_4(x_2)\underbrace{\sum_{x_3}\psi(x_1,x_3)\,m_5(x_2,x_3)}_{m_3(x_1,x_2)}}_{m_2(x_1)}$$
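The elimination above can be verified against brute force. A NumPy sketch with random potentials over the same five factors (the state size $K=3$ is arbitrary, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
K = 3  # each variable takes K states (illustrative)

psi12 = rng.random((K, K))
psi13 = rng.random((K, K))
psi35 = rng.random((K, K))
psi256 = rng.random((K, K, K))
psi24 = rng.random((K, K))

# Brute force: unnormalized joint over (x1..x6), then marginalize
p = np.einsum('ab,ac,ce,bef,bd->abcdef', psi12, psi13, psi35, psi256, psi24)
p1_brute = p.sum(axis=(1, 2, 3, 4, 5)) / p.sum()

# Elimination order (6,5,4,3,2), building the messages from the slide
m6 = psi256.sum(axis=2)                       # m6(x2,x5)
m5 = np.einsum('ce,be->bc', psi35, m6)        # m5(x2,x3) = sum_x5 psi(x3,x5) m6(x2,x5)
m4 = psi24.sum(axis=1)                        # m4(x2)
m3 = np.einsum('ac,bc->ab', psi13, m5)        # m3(x1,x2) = sum_x3 psi(x1,x3) m5(x2,x3)
m2 = np.einsum('ab,b,ab->a', psi12, m4, m3)   # m2(x1)
p1_elim = m2 / m2.sum()

assert np.allclose(p1_brute, p1_elim)
```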
[figure: chain $x_1,\dots,x_{n-1},x_n,x_{n+1},\dots,x_N$ with messages $\mu_\alpha(x_{n-1})$, $\mu_\alpha(x_n)$, $\mu_\beta(x_n)$, $\mu_\beta(x_{n+1})$]
$$p(x_n) = \sum_{x_{<n},\,x_{>n}} \frac{1}{Z}\prod_{i=1}^{N-1}\psi(x_i,x_{i+1}) = \frac{1}{Z}\underbrace{\Big[\sum_{x_{<n}}\prod_{i=1}^{n-1}\psi(x_i,x_{i+1})\Big]}_{\mu_\alpha(x_n),\;O(\sum_{i=1}^{n-1}|\mathcal{X}_i||\mathcal{X}_{i+1}|)}\underbrace{\Big[\sum_{x_{>n}}\prod_{i=n}^{N-1}\psi(x_i,x_{i+1})\Big]}_{\mu_\beta(x_n),\;O(\sum_{i=n}^{N-1}|\mathcal{X}_i||\mathcal{X}_{i+1}|)}$$
$$m_{i-1}(x_i) = \sum_{x_{i-1}}\psi(x_{i-1}, x_i)\,m_{i-2}(x_{i-1}), \qquad m_{i+1}(x_i) = \sum_{x_{i+1}}\psi(x_i, x_{i+1})\,m_{i+2}(x_{i+1})$$
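These two recursions can be run in a few lines. A NumPy sketch on a chain with random pairwise potentials (the sizes $N=6$, $K=4$ are illustrative), checking all single-node marginals against brute force:

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 6, 4
psis = [rng.random((K, K)) for _ in range(N - 1)]   # psi(x_i, x_{i+1})

# Forward (alpha) and backward (beta) messages
alpha = [np.ones(K)]
for psi in psis:
    alpha.append(alpha[-1] @ psi)      # mu_a(x_{i+1}) = sum_{x_i} mu_a(x_i) psi(x_i, x_{i+1})
beta = [np.ones(K)]
for psi in reversed(psis):
    beta.append(psi @ beta[-1])        # mu_b(x_i) = sum_{x_{i+1}} psi(x_i, x_{i+1}) mu_b(x_{i+1})
beta = beta[::-1]

Z = alpha[-1].sum()
marginals = [a * b / Z for a, b in zip(alpha, beta)]

# Check against the full joint
joint = np.einsum('ab,bc,cd,de,ef->abcdef', *psis)
for n in range(N):
    axes = tuple(i for i in range(N) if i != n)
    assert np.allclose(marginals[n], joint.sum(axis=axes) / joint.sum())
```

One forward and one backward sweep give every marginal at once, rather than re-running elimination per node.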
On a general tree, the sum-product messages and marginals are
$$m_j(x_i) = \sum_{x_j}\psi(x_j, x_i)\prod_{k:\,k\sim j,\;k\neq i} m_k(x_j), \qquad p(x_i) \propto \prod_{j:\,j\sim i} m_j(x_i)$$
We want $\tilde{x} = \operatorname{argmax}_x p(x)$. Can we simply compute $p(x_i)$ for all $i$, take $x^*_i = \operatorname{argmax}_{x_i} p(x_i)$, and then return $(x^*_1, x^*_2, \dots, x^*_N)$?
No: it may happen that $p(x^*_1, x^*_2) = 0$ (where $x^*_i = \operatorname{argmax}_{x_i} p(x_i)$).
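A tiny numeric counterexample makes this concrete; the joint below is hypothetical, chosen so the marginal maximizers land on a zero-probability configuration:

```python
import numpy as np

# Joint over x1, x2 in {0,1,2} (made-up numbers)
p = np.array([[0.0, 0.3, 0.3],
              [0.3, 0.0, 0.0],
              [0.1, 0.0, 0.0]])

p1, p2 = p.sum(axis=1), p.sum(axis=0)
x1_star, x2_star = p1.argmax(), p2.argmax()     # maximize each marginal separately
assert (x1_star, x2_star) == (0, 0)
assert p[x1_star, x2_star] == 0.0               # ...yet that configuration has probability 0

x_map = np.unravel_index(p.argmax(), p.shape)   # a true MAP configuration
assert p[x_map] == 0.3
```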
Instead, run the same recursions with $\sum$ replaced by $\max$ (max-product) to obtain max-marginals $p^*_i$, and set $x^*_i = \operatorname{argmax}_{x_i} p^*_i$.
[figure: the graph over X1, X2, X3, X4, X5, X6]
[figure: the graph over X1, ..., X6] How to compute $p(x_1 \mid \bar{x}_6)$?
Clamp $x_6$ to its observed value $\bar{x}_6$ with a delta function and eliminate as before:
$$p(x_1, \bar{x}_6) = \frac{1}{Z}\sum_{x_2,\dots,x_6}\psi(x_1,x_2)\psi(x_1,x_3)\psi(x_2,x_4)\psi(x_3,x_5)\psi(x_2,x_5,x_6)\,\delta(x_6,\bar{x}_6)$$
Eliminating $x_6, x_5, x_4, x_3, x_2$ in turn produces messages $m_6(x_2,x_5)$, $m_5(x_2,x_3)$, $m_4(x_2)$, $m_3(x_1,x_2)$, and finally $m_2(x_1) \propto p(x_1, \bar{x}_6)$.
$$p(x_1 \mid \bar{x}_6) = \frac{p(x_1, \bar{x}_6)}{\sum_{x_1} p(x_1, \bar{x}_6)}$$
[figure: the graph over X1, ..., X6] What if now we want to compute $p(x_3 \mid \bar{x}_6)$?
The same strategy with elimination order $(6, 5, 4, 2, 1)$:
$$p(x_3, \bar{x}_6) = \frac{1}{Z}\sum_{x_1,x_2,x_4,x_5,x_6}\psi(x_1,x_2)\psi(x_1,x_3)\psi(x_2,x_4)\psi(x_3,x_5)\psi(x_2,x_5,x_6)\,\delta(x_6,\bar{x}_6)$$
yields a new sequence of messages ending in $m_1(x_3) \propto p(x_3, \bar{x}_6)$.
Writing the two derivations side by side shows that several of the intermediate sums are computed identically in both. Repeated computations!! The messages needed for $p(x_1 \mid \bar{x}_6)$ and for $p(x_3 \mid \bar{x}_6)$ overlap. How to avoid that?
[figure: graph over nodes A, B, C, D, E, F, before and after transformation]
(2) Create a Junction Tree
[figure: the graph over $X_1,\dots,X_6$ and its junction tree]
Clique nodes: $\{X_1,X_2,X_3\}$, $\{X_2,X_3,X_5\}$, $\{X_2,X_5,X_6\}$, $\{X_2,X_4\}$; separators: $\{X_2,X_3\}$, $\{X_2,X_5\}$, $\{X_2\}$.
(3) Initialize clique potentials (nodes and separators)
[junction tree: $\{X_1,X_2,X_3\}$ - $[X_2,X_3]$ - $\{X_2,X_3,X_5\}$ - $[X_2,X_5]$ - $\{X_2,X_5,X_6\}$ - $[X_2]$ - $\{X_2,X_4\}$]
Clique potentials $\Psi_{1,2,3}$, $\Psi_{2,3,5}$, $\Psi_{2,5,6}$, $\Psi_{2,4}$: directly introduced from the model.
Separator potentials: initialized to 1: $\Phi(S_{2,3}) = 1$, $\Phi(S_{2,5}) = 1$, $\Phi(S_2) = 1$.
(4) Message passing
[junction tree as above]
Forward pass:
$$\Phi^*_{2,3} = \sum_{x_1}\Psi_{1,2,3}, \qquad \Psi^*_{2,3,5} = \frac{\Phi^*_{2,3}}{\Phi_{2,3}}\,\Psi_{2,3,5}$$
$$\Phi^*_{2,5} = \sum_{x_3}\Psi^*_{2,3,5}, \qquad \Psi^*_{2,5,6} = \frac{\Phi^*_{2,5}}{\Phi_{2,5}}\,\Psi_{2,5,6}$$
$$\Phi^*_{2} = \sum_{x_5,x_6}\Psi^*_{2,5,6}, \qquad \Psi^*_{2,4} = \frac{\Phi^*_{2}}{\Phi_{2}}\,\Psi_{2,4}$$
Backward pass:
$$\Phi^{**}_{2} = \sum_{x_4}\Psi^*_{2,4}, \qquad \Psi^{**}_{2,5,6} = \frac{\Phi^{**}_{2}}{\Phi^{*}_{2}}\,\Psi^*_{2,5,6}$$
$$\Phi^{**}_{2,5} = \sum_{x_6}\Psi^{**}_{2,5,6}, \qquad \Psi^{**}_{2,3,5} = \frac{\Phi^{**}_{2,5}}{\Phi^{*}_{2,5}}\,\Psi^*_{2,3,5}$$
$$\Phi^{**}_{2,3} = \sum_{x_5}\Psi^{**}_{2,3,5}, \qquad \Psi^{**}_{1,2,3} = \frac{\Phi^{**}_{2,3}}{\Phi^{*}_{2,3}}\,\Psi_{1,2,3}$$
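The two passes above can be sketched directly in NumPy on the same clique chain. The clique potentials here are random placeholders (not the ones from the example); the check is that after one forward and one backward pass, every clique holds the unnormalized marginal of the joint defined by the initial potentials:

```python
import numpy as np

rng = np.random.default_rng(4)
K = 2
# Clique potentials on the junction tree chain
# {X1,X2,X3} -[X2,X3]- {X2,X3,X5} -[X2,X5]- {X2,X5,X6} -[X2]- {X2,X4}
P123 = rng.random((K, K, K))   # indexed [x1, x2, x3]
P235 = rng.random((K, K, K))   # [x2, x3, x5]
P256 = rng.random((K, K, K))   # [x2, x5, x6]
P24  = rng.random((K, K))      # [x2, x4]
init = [P123.copy(), P235.copy(), P256.copy(), P24.copy()]
S23, S25, S2 = np.ones((K, K)), np.ones((K, K)), np.ones(K)  # separators start at 1

# Forward pass: marginalize onto the separator, rescale the next clique
new = P123.sum(axis=0);      P235 *= (new / S23)[:, :, None]; S23 = new
new = P235.sum(axis=1);      P256 *= (new / S25)[:, :, None]; S25 = new
new = P256.sum(axis=(1, 2)); P24  *= (new / S2)[:, None];     S2  = new
# Backward pass
new = P24.sum(axis=1);       P256 *= (new / S2)[:, None, None]; S2  = new
new = P256.sum(axis=2);      P235 *= (new / S25)[:, None, :];   S25 = new
new = P235.sum(axis=2);      P123 *= (new / S23)[None, :, :];   S23 = new

# Every clique is now consistent: it equals the unnormalized clique marginal
joint = np.einsum('abc,bce,bef,bd->abcdef', *init)
assert np.allclose(P123, joint.sum(axis=(3, 4, 5)))
assert np.allclose(P235, joint.sum(axis=(0, 3, 5)))
assert np.allclose(P256, joint.sum(axis=(0, 2, 3)))
assert np.allclose(P24,  joint.sum(axis=(0, 2, 4, 5)))
```

Any conditional such as $p(x_1 \mid \bar{x}_6)$ or $p(x_3 \mid \bar{x}_6)$ can now be read off a single clique, with the shared messages computed only once.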
Graphical Models (Lecture 6 - Learning)
Tibério Caetano, tiberiocaetano.com, Statistical Machine Learning Group, NICTA Canberra
Marginals of $p(x;\theta)$; conditional distributions; MAP configurations; etc.
$$p(x;\theta) = \exp\Big(\sum_{s\in S}\log f_s(x_s;\theta_s) - g(\theta)\Big), \qquad g(\theta) = \log Z = \log\sum_x \exp\Big(\sum_{s\in S}\log f_s(x_s;\theta_s)\Big)$$
For IID data $x^1,\dots,x^m$:
$$\prod_i p(x^i;\theta) = \prod_i \exp\Big(\sum_s \log f_s(x^i_s;\theta_s) - g(\theta)\Big)$$
so the log-likelihood is
$$\ell(\theta) = \sum_{i=1}^{m}\sum_s \log f_s(x^i_s;\theta_s) - m\,g(\theta)$$
$$\ell(\theta) = \sum_{i=1}^{m}\sum_s \log f_s(x^i_s;\theta_s) - m\,g(\theta)$$
If each factor is locally normalized, $\sum_{x_s} f_s(x_s;\theta_s) = 1\ \forall s$, the problem decouples: maximize
$$\sum_{i=1}^{m}\sum_s \log f_s(x^i_s;\theta_s) + \sum_{s'}\lambda_{s'}\Big(1 - \sum_{x_{s'}} f_{s'}(x_{s'};\theta_{s'})\Big)$$
with one Lagrange multiplier $\lambda_{s'}$ per normalization constraint.
Setting the derivative to zero for each factor:
$$\frac{\partial}{\partial\theta_s}\Big[\sum_{i=1}^{m}\log f_s(x^i_s;\theta_s)\Big] = 0, \quad \forall s$$
where $g(\theta) = \log\sum_x \exp\langle\Phi(x),\theta\rangle$ is the log-partition function
$$p(x;\theta) = \exp\Big(\sum_{s\in S}\log f_s(x_s;\theta_s) - g(\theta)\Big) = \exp\Big(\sum_s \langle\Phi_s(x_s),\theta_s\rangle - g(\theta)\Big)$$
$$\ell(\theta) = \sum_{i=1}^{m}\sum_s \langle\Phi_s(x^i_s),\theta_s\rangle - m\,g(\theta)$$
Setting $\nabla_{\theta_s}\ell = 0$ and using $\nabla_{\theta_s} g(\theta) = \mathbb{E}_{p(x;\theta)}[\Phi_s(x_s)]$ gives
$$\mathbb{E}_{p(x;\theta)}[\Phi_s(x_s)] = \frac{1}{m}\sum_{i=1}^{m}\Phi_s(x^i_s)$$
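The identity $\nabla g(\theta) = \mathbb{E}_{p}[\Phi(x)]$ driving this condition is easy to verify numerically. A sketch for the simplest case, a single discrete variable with one-hot sufficient statistics (the four-state setup is illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
theta = rng.normal(size=4)      # natural parameters; Phi(x) = one-hot(x)
eps = 1e-6

def g(th):                      # log-partition: g(theta) = log sum_x exp(<Phi(x), theta>)
    return np.log(np.exp(th).sum())

p = np.exp(theta - g(theta))    # model distribution p(x; theta)

# Finite-difference gradient of g agrees with E_p[Phi(x)], which is p itself here
grad = np.array([(g(theta + eps * e) - g(theta - eps * e)) / (2 * eps)
                 for e in np.eye(4)])
assert np.allclose(grad, p, atol=1e-5)
```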
$$\mathbb{E}_{p(x;\theta)}[\Phi_s(x_s)] = \frac{1}{m}\sum_{i=1}^{m}\Phi_s(x^i_s)$$
Moment matching: at the ML solution, the model's expected sufficient statistics equal their empirical averages.
Decomposable models: $p(x_V) = \prod_v \theta_v(x_{\phi_v})$; the likelihood of the data is $\prod_n p(x_{V,n} \mid \theta)$.
$$\log p(X;\theta) = \log\prod_n p(x_{V,n} \mid \theta) = \sum_n \log\prod_{x_V} p(x_V;\theta)^{\delta(x_V, x_{V,n})} = \sum_n\sum_{x_V}\delta(x_V, x_{V,n})\log p(x_V;\theta)$$
$$= \sum_{x_V} m(x_V)\log p(x_V;\theta) = \sum_{x_V} m(x_V)\log\prod_v \theta_v(x_{\phi_v}) = \sum_{x_V} m(x_V)\sum_v \log\theta_v(x_{\phi_v})$$
$$= \sum_v\sum_{x_{\phi_v}}\Big[\sum_{x_{V\setminus\phi_v}} m(x_V)\Big]\log\theta_v(x_{\phi_v}) = \sum_v\sum_{x_{\phi_v}} m(x_{\phi_v})\log\theta_v(x_{\phi_v})$$
where $m(x_V)$ counts how often configuration $x_V$ occurs in the data.
$$\hat{\theta}_{v'}(x_{\phi_{v'}}) = \frac{m(x_{\phi_{v'}})}{m(x_{\mathrm{pa}(v')})}$$
(Matches intuition: the empirical conditional probability of each node given its parents.)
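This count-ratio estimate is trivial to compute. A NumPy sketch for a two-node network $X_1 \to X_2$ on hypothetical samples:

```python
import numpy as np

# Toy data for a two-node network X1 -> X2 (made-up samples)
data = [(0, 0), (0, 1), (0, 1), (1, 1), (1, 1), (1, 0), (0, 0), (1, 1)]

K = 2
m12 = np.zeros((K, K))          # family counts m(x1, x2)
for x1, x2 in data:
    m12[x1, x2] += 1
m1 = m12.sum(axis=1)            # parent counts m(x1)

theta_2g1 = m12 / m1[:, None]   # ML estimate p(x2 | x1) = m(x1, x2) / m(x1)
assert np.allclose(theta_2g1.sum(axis=1), 1.0)   # each row is a valid distribution
```

For these samples the estimate is $\hat{p}(x_2 \mid x_1=0) = (0.5, 0.5)$ and $\hat{p}(x_2 \mid x_1=1) = (0.25, 0.75)$.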
[figure: chain $X_1 - X_2 - X_3$]
$$p(x) = \frac{1}{Z}\,\Psi_{1,2}(x_1, x_2)\,\Psi_{2,3}(x_2, x_3)$$
Assume we observe N instances of this model. For IID sampling, the sufficient statistics are the empirical marginals $\tilde{p}(x_1, x_2)$ and $\tilde{p}(x_2, x_3)$. How do we estimate $\Psi_{1,2}(x_1, x_2)$ and $\Psi_{2,3}(x_2, x_3)$ from the sufficient statistics?
Let's make a guess:
$$\hat{p}_{ML}(x_1, x_2, x_3) = \frac{\tilde{p}(x_1, x_2)\,\tilde{p}(x_2, x_3)}{\tilde{p}(x_2)}$$
i.e. $\hat{\Psi}_{1,2}(x_1, x_2) = \tilde{p}(x_1, x_2)$ and $\hat{\Psi}_{2,3}(x_2, x_3) = \tilde{p}(x_2, x_3)/\tilde{p}(x_2)$. We can verify that our "guess" is good, because:
$$\hat{p}_{ML}(x_1, x_2) = \sum_{x_3}\frac{\tilde{p}(x_1, x_2)\,\tilde{p}(x_2, x_3)}{\tilde{p}(x_2)} = \tilde{p}(x_1, x_2)$$
$$\hat{p}_{ML}(x_2, x_3) = \sum_{x_1}\frac{\tilde{p}(x_1, x_2)\,\tilde{p}(x_2, x_3)}{\tilde{p}(x_2)} = \tilde{p}(x_2, x_3)$$
The general recipe is:
(1) For every maximal clique C, set the clique potential to its empirical marginal.
(2) For every intersection S between maximal cliques, associate an empirical marginal with that intersection and divide it into the potential of ONE of the cliques that form the intersection.
This will give ML estimates for decomposable Graphical Models.
A graph is complete if E contains all pairs of distinct elements of V. A graph G = (V, E) is decomposable if either it is complete, or there exist sets A, B, C such that:
(a) A, B and C are disjoint, (b) A and C are non-empty, (c) B is complete, (d) B separates A and C in G, and (e) A ∪ B and B ∪ C are decomposable.
For non-decomposable models an iterative procedure must be used: Iterative Proportional Fitting (IPF):
$$\Psi^{(t+1)}_C(x_C) = \Psi^{(t)}_C(x_C)\,\frac{\tilde{p}(x_C)}{p^{(t)}(x_C)}$$
where it can be shown that:
$$p^{(t+1)}(x_C) = \tilde{p}(x_C)$$
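The IPF update is a few lines per clique. A NumPy sketch on the chain model above, with a random "empirical" distribution standing in for observed data; the check is that the fitted model reproduces both clique marginals:

```python
import numpy as np

rng = np.random.default_rng(5)
K = 2
# A hypothetical "empirical" joint; its clique marginals are the fitting targets
pt123 = rng.random((K, K, K)); pt123 /= pt123.sum()
pt12, pt23 = pt123.sum(axis=2), pt123.sum(axis=0)

Psi12, Psi23 = np.ones((K, K)), np.ones((K, K))
for _ in range(50):
    joint = Psi12[:, :, None] * Psi23[None, :, :]
    p = joint / joint.sum()
    Psi12 *= pt12 / p.sum(axis=2)       # IPF update for clique {1,2}
    joint = Psi12[:, :, None] * Psi23[None, :, :]
    p = joint / joint.sum()
    Psi23 *= pt23 / p.sum(axis=0)       # IPF update for clique {2,3}

p = Psi12[:, :, None] * Psi23[None, :, :]; p /= p.sum()
assert np.allclose(p.sum(axis=2), pt12)   # model matches both clique marginals
assert np.allclose(p.sum(axis=0), pt23)
```

After each clique update, the model marginal on that clique equals the empirical one, exactly as stated above.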
How to estimate the potentials when there are unobserved variables?
[figure: chain $X_1 - X_2 - X_3$]
Answer: the EM algorithm
Denote the observed variables by X and the hidden variables by Z. [figure: chain $X_1 - X_2 - X_3$] If we knew Z, the problem would reduce to maximizing the complete log-likelihood:
$$\ell_c(x, z; \theta) = \log p(x, z \mid \theta)$$
However, we don't observe Z, so the probability of the data X is
$$\ell(x;\theta) = \log p(x \mid \theta) = \log\sum_z p(x, z \mid \theta)$$
which is the incomplete log-likelihood. This is the quantity we really want to maximize. Note that now the logarithm cannot transform the product into a sum, since it is "blocked" by the sum over Z, and the optimization does not "decouple".
The basic idea of the EM algorithm is: given that Z is not observed, we may try to optimize an "averaged" version, over all possible values of Z, of the complete log-likelihood. We do that through an "averaging distribution" q:
$$\langle \ell_c(x, z; \theta) \rangle_q = \sum_z q(z \mid x, \theta)\,\log p(x, z \mid \theta)$$
and obtain the expected complete log-likelihood. The hope then is that maximizing this should at least improve the current estimate for the parameters (so that iteration would eventually maximize the log-likelihood).
In order to present the algorithm, we first note that:
$$\ell(x;\theta) = \log p(x \mid \theta) = \log\sum_z p(x, z \mid \theta) = \log\sum_z q(z \mid x)\,\frac{p(x, z \mid \theta)}{q(z \mid x)} \;\ge\; \sum_z q(z \mid x)\log\frac{p(x, z \mid \theta)}{q(z \mid x)} =: L(q, \theta)$$
where L is the auxiliary function (the inequality is Jensen's). The EM algorithm is coordinate ascent on L.
E step: $q^{(t+1)} = \operatorname{argmax}_q L(q, \theta^{(t)})$
M step: $\theta^{(t+1)} = \operatorname{argmax}_\theta L(q^{(t+1)}, \theta)$
Note that the "M step" is equivalent to maximizing the expected complete log-likelihood:
$$L(q, \theta) = \sum_z q(z \mid x)\log\frac{p(x, z \mid \theta)}{q(z \mid x)} = \sum_z q(z \mid x)\log p(x, z \mid \theta) - \sum_z q(z \mid x)\log q(z \mid x) = \langle \ell_c(x, z; \theta) \rangle_q - \sum_z q(z \mid x)\log q(z \mid x)$$
because the second term does not depend on θ.
The general solution to the "E step" turns out to be
$$q^{(t+1)}(z \mid x) = p(z \mid x, \theta^{(t)})$$
because this choice makes the bound tight:
$$L\big(p(z \mid x, \theta^{(t)}), \theta^{(t)}\big) = \sum_z p(z \mid x, \theta^{(t)})\log\frac{p(x, z \mid \theta^{(t)})}{p(z \mid x, \theta^{(t)})} = \sum_z p(z \mid x, \theta^{(t)})\log p(x \mid \theta^{(t)}) = \ell(x; \theta^{(t)})$$
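Tightness of the bound at the posterior, and the bound property for any other q, can both be checked numerically. A NumPy sketch on a toy model with one binary observed variable and a hidden variable with three states (random tables, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
Kz = 3
# A toy model p(x, z | theta): prior p(z) and likelihood p(x|z), random tables
pz = rng.random(Kz); pz /= pz.sum()
pxgz = rng.random((2, Kz)); pxgz /= pxgz.sum(axis=0)

x = 1                                  # one observation
pxz = pxgz[x] * pz                     # p(x, z | theta), a vector over z
log_lik = np.log(pxz.sum())            # incomplete log-likelihood log p(x | theta)

def L(q):                              # L(q, theta) = sum_z q(z) log[p(x,z)/q(z)]
    return np.sum(q * (np.log(pxz) - np.log(q)))

q_post = pxz / pxz.sum()               # E step: q(z) = p(z | x, theta)
assert np.isclose(L(q_post), log_lik)  # bound is tight at the posterior

q_other = np.ones(Kz) / Kz             # any other q only lower-bounds the log-likelihood
assert L(q_other) <= log_lik + 1e-12
```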