[PPT] - Markov Networks (Alan Ritter), PowerPoint Presentation

SLIDE 1

Markov Networks

Alan Ritter

SLIDE 2

Markov Networks

Undirected graphical models

[Figure: undirected graph over the nodes Smoking, Cancer, Asthma, Cough]

Potential functions defined over cliques

Smoking   Cancer   Φ(S,C)
False     False    4.5
False     True     4.5
True      False    2.7
True      True     4.5

$$P(x) = \frac{1}{Z} \prod_c \Phi_c(x_c)$$

$$Z = \sum_x \prod_c \Phi_c(x_c)$$
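As a minimal sketch of the product-of-potentials definition, the snippet below normalizes the Smoking/Cancer table from this slide. It assumes, purely for illustration, that {Smoking, Cancer} is the only clique (the slide gives no potentials for Asthma or Cough); the names `phi_sc`, `partition_function` and `joint_probability` are hypothetical.

```python
from itertools import product

# Potential table Phi(S, C) copied from the slide; keys are (smoking, cancer).
phi_sc = {
    (False, False): 4.5,
    (False, True): 4.5,
    (True, False): 2.7,
    (True, True): 4.5,
}


def partition_function(potential):
    """Z: sum of the unnormalized score over every joint assignment."""
    return sum(potential[s, c] for s, c in product([False, True], repeat=2))


def joint_probability(potential, smoking, cancer):
    """P(S, C) = Phi(S, C) / Z, assuming a single clique {S, C}."""
    return potential[smoking, cancer] / partition_function(potential)


Z = partition_function(phi_sc)
p_smoker_no_cancer = joint_probability(phi_sc, True, False)
```

Under this single-clique assumption the potentials are only relative scores: the entry 2.7 becomes a probability only after dividing by Z.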

SLIDE 3

Undirected Graphical Models: Motivation

Terminology:

– Directed graphical models = Bayesian Networks
– Undirected graphical models = Markov Networks

We just learned about DGMs (Bayes Nets)

For some domains, being forced to choose a direction for the edges is awkward.

Example: consider modeling an image

– Assumption: neighboring pixels are correlated
– We could create a DAG model w/ 2D topology

SLIDE 4

2D Bayesian Network

[Figure: nodes X1 through X20 laid out on a 2D grid and connected by directed edges]

SLIDE 5

Markov Random Field (Markov Network)

[Figure: nodes X1 through X20 laid out on a 2D grid and connected by undirected edges]

SLIDE 6

UGMs (Markov Nets) vs DGMs (Bayes Nets)

Advantages
1. Symmetric
   More natural for certain domains (e.g. spatial or relational data)
2. Discriminative UGMs (a.k.a. Conditional Random Fields) work better than discriminative DGMs

Disadvantages
1. Parameters are less interpretable and less modular
2. Parameter estimation is computationally more expensive

SLIDE 7

Conditional Independence Properties

Much simpler than Bayesian Networks

– No d-separation, v-structures, etc.

UGMs define CI via simple graph separation

E.g. if we remove all the evidence nodes from the graph, are there any paths connecting A and B?

$$X_A \perp_G X_B \mid X_E \iff E \text{ separates } A \text{ from } B \text{ in } G$$
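The separation test can be sketched as a breadth-first search that is not allowed to step onto evidence nodes. This is an illustrative sketch only: the function name `is_separated`, the adjacency-dict representation and the four-node example graph are assumptions, not part of the slides.

```python
from collections import deque


def is_separated(adjacency, a_nodes, b_nodes, evidence):
    """Return True if every path from a_nodes to b_nodes passes through evidence."""
    blocked = set(evidence)
    targets = set(b_nodes) - blocked
    start = [n for n in a_nodes if n not in blocked]
    visited = set(start)
    queue = deque(start)
    while queue:
        node = queue.popleft()
        if node in targets:
            return False  # found a path that avoids all evidence nodes
        for neighbor in adjacency[node]:
            if neighbor not in blocked and neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return True


# Hypothetical graph: A and B are connected through C and through D.
graph = {
    "A": {"C", "D"},
    "B": {"C", "D"},
    "C": {"A", "B"},
    "D": {"A", "B"},
}

blocked_by_c_only = is_separated(graph, {"A"}, {"B"}, {"C"})
blocked_by_c_and_d = is_separated(graph, {"A"}, {"B"}, {"C", "D"})
```

In this hypothetical graph, observing C alone leaves the path A, D, B open, so A and B are separated only when both C and D are in the evidence set.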

SLIDE 8

Markov Blanket

Also simple

– Markov blanket of a node is just the set of its immediate neighbors
– Don’t need to worry about co-parents

SLIDE 9

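Independence Properties

[Figure: undirected graph over nodes 1 through 7]

Pairwise: $1 \perp 7 \mid \text{rest}$
Local: $1 \perp \text{rest} \mid 2, 3$
Global: $1, 2 \perp 6, 7 \mid 3, 4, 5$

$G \Rightarrow L \Rightarrow P$, and $P \Rightarrow G$ when $p(x) > 0$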

SLIDE 10

Converting a Bayesian Network to a Markov Network

Tempting:

– Just drop directionality of the edges
– But this is clearly incorrect (v-structure)
– Introduces incorrect CI statements

Solution:

– Add edges between “unmarried” parents
– This process is called moralization
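Moralization can be sketched in a few lines: keep every edge without its direction, then connect every pair of parents that share a child. The representation (a dict from node to parent list), the function name `moralize` and the three-node example are assumptions made for illustration.

```python
from itertools import combinations


def moralize(parents):
    """Return the undirected edge set of the moral graph of a DAG.

    `parents` maps each node to the list of its parents.
    """
    edges = set()
    for child, pars in parents.items():
        # Keep every original edge, dropping its direction.
        for p in pars:
            edges.add(frozenset((p, child)))
        # "Marry" every pair of parents that share this child.
        for p, q in combinations(pars, 2):
            edges.add(frozenset((p, q)))
    return edges


# Hypothetical v-structure A -> C <- B.
v_structure = {"A": [], "B": [], "C": ["A", "B"]}
moral_edges = moralize(v_structure)
```

For the hypothetical v-structure, the moral graph gains the edge between A and B, so the marginal independence of A and B that the DAG encoded is no longer visible in the undirected graph.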

SLIDE 11

Example: moralization

[Figure: DGM over nodes 1 through 7 and its moralized UGM]

Unfortunately, this loses some CI information

– Example: $4 \perp 5 \mid 2$

SLIDE 12

Directed vs. Undirected GMs

Q: which has more “expressive power”?

Recall:

– G is an I-map of P if: $I(G) \subseteq I(P)$

Now define:

– G is a perfect I-map of P if: $I(G) = I(P)$

Graph can represent all (and only) CIs in P

Bayesian Networks and Markov Networks are perfect maps for different sets of distributions

SLIDE 13

[Figure: Venn diagram. Probabilistic Models contain Graphical Models; within these, the Directed and Undirected families overlap in the Chordal graphs]

SLIDE 14

Parameterization

No topological ordering on undirected graph

Can’t use the chain rule of probability to represent P(y)

Instead we will use potential functions:

– Associate a potential function $\psi_c(y_c \mid \theta_c)$ with each maximal clique in the graph
– A potential can be any non-negative function

Joint distribution is defined to be proportional to the product of clique potentials

SLIDE 15

Parameterization (cont’d)

Joint distribution is defined to be proportional to the product of clique potentials

Any positive distribution whose CI properties can be represented by a UGM can be represented this way.

SLIDE 16

Hammersley-Clifford Theorem

A positive distribution P(Y) > 0 satisfies the CI properties of an undirected graph G iff P can be represented as a product of factors, one per maximal clique

$$P(y \mid \theta) = \frac{1}{Z(\theta)} \prod_{c \in C} \psi_c(y_c \mid \theta_c)$$

Z is the partition function

$$Z(\theta) = \sum_y \prod_{c \in C} \psi_c(y_c \mid \theta_c)$$

SLIDE 17

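Example

If P satisfies the conditional independence assumptions of this graph, we can write

[Figure: undirected graph over nodes 1 through 5]

$$P(y \mid \theta) = \frac{1}{Z(\theta)} \psi_{123}(y_1, y_2, y_3)\, \psi_{234}(y_2, y_3, y_4)\, \psi_{35}(y_3, y_5)$$

$$Z(\theta) = \sum_y \psi_{123}(y_1, y_2, y_3)\, \psi_{234}(y_2, y_3, y_4)\, \psi_{35}(y_3, y_5)$$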
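To make the factorization concrete, here is a brute-force sketch for the five-node example with cliques {1, 2, 3}, {2, 3, 4} and {3, 5}. The slides do not specify the potentials, so the binary variables and the `agreement_potential` function are hypothetical choices made only so the code runs.

```python
from itertools import product

# Maximal cliques of the example graph, as tuples of 0-based variable indices.
CLIQUES = [(0, 1, 2), (1, 2, 3), (2, 4)]


def agreement_potential(values):
    """Hypothetical potential: 2.0 when all clique members agree, else 1.0."""
    return 2.0 if len(set(values)) == 1 else 1.0


def unnormalized(y):
    """Product of clique potentials for one joint assignment y."""
    score = 1.0
    for clique in CLIQUES:
        score *= agreement_potential(tuple(y[i] for i in clique))
    return score


def partition_function(num_vars=5):
    """Brute-force Z: sum the unnormalized score over all binary assignments."""
    return sum(unnormalized(y) for y in product([0, 1], repeat=num_vars))


def probability(y):
    """P(y) = (1 / Z) * product of clique potentials."""
    return unnormalized(y) / partition_function()
```

Enumerating all assignments is only feasible for tiny models; the number of terms in Z grows exponentially with the number of variables, which is one reason parameter estimation in UGMs is expensive.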

SLIDE 18

Pairwise MRF

Potentials don’t need to correspond to maximal cliques

We can also restrict the parameterization to edges (or any other cliques)

Pairwise MRF:

[Figure: undirected graph over nodes 1 through 5]

$$P(y \mid \theta) \propto \psi_{12}(y_1, y_2)\, \psi_{13}(y_1, y_3)\, \psi_{23}(y_2, y_3)\, \psi_{24}(y_2, y_4)\, \psi_{34}(y_3, y_4)\, \psi_{35}(y_3, y_5)$$

SLIDE 19

Representing Potential Functions

Can represent as tables, like the CPTs we used for Bayesian Networks (DGMs)

– But potentials are not probabilities
– They represent relative “compatibility” between various assignments

SLIDE 20

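Representing Potential Functions

More general approach:

– Represent the log potentials as a linear function of the parameters
– Log-linear (maximum entropy) models

$$\log P(y \mid \theta) = \sum_c \phi_c(y_c)^T \theta_c - \log Z(\theta)$$

where $\phi_c(y_c)$ is the feature vector of clique c, so that $\log \psi_c(y_c \mid \theta_c) = \phi_c(y_c)^T \theta_c$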

SLIDE 21

Log-Linear Models

l Log-‑linear ¡model: ¡ Weight of Feature i Feature i

⎩ ⎨ ⎧ ∨ ¬ =

therwise

Cancer Smoking if 1 ) Cancer Smoking, (

1

f

51 .

1 =

w Cancer ¡ Cough ¡ Asthma ¡ Smoking ¡ ⎟ ⎠ ⎞ ⎜ ⎝ ⎛ =

∑

i i i

x f w Z x P ) ( exp 1 ) (
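A small sketch of the log-linear form, using the single feature and weight given on this slide. It assumes, for illustration only, that the model contains just Smoking and Cancer and just this one feature; the names `f1`, `unnormalized` and `probability` are hypothetical.

```python
import math
from itertools import product

W1 = 1.5  # weight of feature 1, as given on the slide


def f1(smoking, cancer):
    """Feature 1: 1 if (not Smoking) or Cancer, else 0."""
    return 1.0 if (not smoking) or cancer else 0.0


def unnormalized(smoking, cancer):
    """exp(sum_i w_i f_i(x)), here with the single feature f1."""
    return math.exp(W1 * f1(smoking, cancer))


# Partition function over the four joint assignments of (Smoking, Cancer).
Z = sum(unnormalized(s, c) for s, c in product([False, True], repeat=2))


def probability(smoking, cancer):
    """P(x) = (1 / Z) * exp(sum_i w_i f_i(x))."""
    return unnormalized(smoking, cancer) / Z
```

With a positive weight, the one assignment that violates the feature (a smoker without cancer) receives a lower probability than each of the other three, but it is not ruled out.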


Log-linear models can represent table CPTs
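One way to see this, sketched under the assumption that every table entry is strictly positive: use one indicator feature per table row and set its weight to the log of that row's value. The example reuses the Smoking/Cancer potential table from slide 2; the names `weights`, `indicator` and `log_linear_potential` are hypothetical.

```python
import math

# Potential table Phi(S, C) from slide 2; keys are (smoking, cancer).
table = {
    (False, False): 4.5,
    (False, True): 4.5,
    (True, False): 2.7,
    (True, True): 4.5,
}

# One indicator feature per table row, with weight log(table entry).
# This requires every entry to be strictly positive.
weights = {assignment: math.log(value) for assignment, value in table.items()}


def indicator(assignment, x):
    """Feature that fires only when x equals the given assignment."""
    return 1.0 if x == assignment else 0.0


def log_linear_potential(x):
    """exp(sum_i w_i f_i(x)) using the indicator features above."""
    return math.exp(sum(w * indicator(a, x) for a, w in weights.items()))
```

Exactly one indicator fires for any assignment, so the exponentiated sum returns the corresponding table entry (up to floating-point rounding).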