Probabilistic Graphical Models
Loopy BP and Bethe Free Energy
Siamak Ravanbakhsh Fall 2019
Learning objectives:
  loopy belief propagation
  its variational derivation: the Bethe approximation
message passing on a clique tree ($C_i$: cluster/clique, $S_{i,j}$: sepset)

$$\delta_{i \to j}(S_{i,j}) = \sum_{C_i - S_{i,j}} \psi_i(C_i) \prod_{k \in Nb_i - j} \delta_{k \to i}(S_{i,k})$$

cluster marginals from the beliefs:

$$p_i(C_i) \propto \beta_i(C_i) = \psi_i(C_i) \prod_{k \in Nb_i} \delta_{k \to i}(S_{i,k})$$
[Figure: pairwise model over $x_1, \ldots, x_6$ with potentials $\phi_{i,j}(x_i, x_j)$, and a clique tree for it]
what are the sepsets?
[Figure: a different valid clique tree for the same model]
check for the running intersection property
BP on a pairwise tree model with potentials $\phi_{i,j}(x_i, x_j)$: messages pass from the leaves towards a root, then back to the leaves

$$\delta_{i \to j}(x_j) = \sum_{x_i} \phi_{i,j}(x_i, x_j) \prod_{k \in Nb_i - j} \delta_{k \to i}(x_i)$$

node marginals:

$$p_i(x_i) \propto \prod_{k \in Nb_i} \delta_{k \to i}(x_i)$$

edge marginals:

$$p_{i,j}(x_i, x_j) \propto \phi_{i,j}(x_i, x_j) \prod_{k \in Nb_i - j} \delta_{k \to i}(x_i) \prod_{k \in Nb_j - i} \delta_{k \to j}(x_j)$$
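The message and marginal formulas above can be checked numerically. The following is a minimal sketch, assuming a small random tree with 6 variables and domain size 3; the function names, the edge list, and the random potentials are illustrative choices, not part of the slides. The memoized recursion plays the role of the leaves-to-root-to-leaves schedule.

```python
import itertools
import numpy as np

def tree_bp_marginals(n, d, edges, phi):
    """Sum-product on a tree-structured pairwise model.

    n: number of variables, d: domain size,
    edges: list of (i, j) pairs forming a tree,
    phi[(i, j)]: array of shape (d, d) indexed as [x_i, x_j].
    """
    nbrs = {i: [] for i in range(n)}
    pot = {}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
        pot[(i, j)] = phi[(i, j)]
        pot[(j, i)] = phi[(i, j)].T  # same potential, indexed [x_j, x_i]

    msgs = {}

    def message(i, j):
        # delta_{i->j}(x_j) = sum_{x_i} phi_ij(x_i, x_j) prod_{k in Nb_i - j} delta_{k->i}(x_i)
        if (i, j) not in msgs:
            incoming = np.ones(d)
            for k in nbrs[i]:
                if k != j:
                    incoming = incoming * message(k, i)
            msgs[(i, j)] = incoming @ pot[(i, j)]
        return msgs[(i, j)]

    marginals = []
    for i in range(n):
        # p_i(x_i) is proportional to the product of all incoming messages
        belief = np.ones(d)
        for k in nbrs[i]:
            belief = belief * message(k, i)
        marginals.append(belief / belief.sum())
    return marginals

def brute_force_marginals(n, d, edges, phi):
    """Exact node marginals by enumerating all d**n joint states."""
    joint = np.zeros((d,) * n)
    for x in itertools.product(range(d), repeat=n):
        joint[x] = np.prod([phi[(i, j)][x[i], x[j]] for i, j in edges])
    joint /= joint.sum()
    return [joint.sum(axis=tuple(a for a in range(n) if a != i)) for i in range(n)]

rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (1, 3), (3, 4), (3, 5)]  # an arbitrary tree
phi = {e: rng.uniform(0.5, 2.0, size=(3, 3)) for e in edges}
bp = tree_bp_marginals(6, 3, edges, phi)
exact = brute_force_marginals(6, 3, edges, phi)
print("largest gap between BP and exact marginals:",
      max(np.abs(b - e).max() for b, e in zip(bp, exact)))
```

On a tree the gap should be at the level of floating-point error, since BP is exact there.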
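the joint distribution

$$p(x) = \frac{1}{Z} \prod_{i,j \in E} \phi_{i,j}(x_i, x_j) \qquad (*)$$

can be written in terms of its marginals:

$$p(x) = \frac{\prod_{i,j \in E} p_{i,j}(x_i, x_j)}{\prod_i p_i(x_i)^{|Nb_i| - 1}}$$

the denominator adjusts for double-counting; substitute the marginals using BP messages to get (*)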
variational view: approximate

$$p(x) = \frac{1}{Z} \prod_{i,j \in E} \phi_{i,j}(x_i, x_j)$$

with a distribution written in terms of its marginals,

$$q(x) = \frac{\prod_{i,j \in E} q_{i,j}(x_i, x_j)}{\prod_i q_i(x_i)^{|Nb_i| - 1}}$$

by minimizing

$$D(q \| p) = -H(q) - E_q\Big[\sum_{i,j \in E} \ln \phi_{i,j}(x_i, x_j)\Big] + \ln(Z)$$

the first two terms are the variational free energy; ignore $\ln(Z)$: it does not depend on q

in general, both entropy and energy involve summation over exponentially many terms
with q written in terms of its marginals, both terms simplify

energy:

$$E_q\Big[\sum_{i,j \in E} \ln \phi_{i,j}(x_i, x_j)\Big] = \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j)$$

entropy (follows from the decomposition of q):

$$H(q) = \sum_{i,j \in E} H(q_{i,j}) - \sum_i (|Nb_i| - 1) H(q_i)$$
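The entropy decomposition can be checked numerically on a small tree. This is a sketch under assumed inputs: the tree, the domain size, the random potentials, and the helper names are all illustrative. It builds the exact joint by enumeration, reads off the node and edge marginals, and compares the exact entropy with the sum over edges minus the degree-weighted sum over nodes.

```python
import itertools
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability table of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def pairwise_joint(n, d, edges, phi):
    """Exact joint of a pairwise model by enumerating all d**n states."""
    joint = np.zeros((d,) * n)
    for x in itertools.product(range(d), repeat=n):
        joint[x] = np.prod([phi[(i, j)][x[i], x[j]] for i, j in edges])
    return joint / joint.sum()

def bethe_entropy(joint, edges):
    """sum_{(i,j) in E} H(p_ij) - sum_i (|Nb_i| - 1) H(p_i), using exact marginals."""
    n = joint.ndim

    def marginal(keep):
        drop = tuple(a for a in range(n) if a not in keep)
        return joint.sum(axis=drop)

    degree = [0] * n
    total = 0.0
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
        total += entropy(marginal((i, j)))
    for i in range(n):
        total -= (degree[i] - 1) * entropy(marginal((i,)))
    return total

rng = np.random.default_rng(1)
tree_edges = [(0, 1), (1, 2), (1, 3), (3, 4)]  # an arbitrary tree
phi = {e: rng.uniform(0.5, 2.0, size=(2, 2)) for e in tree_edges}
joint = pairwise_joint(5, 2, tree_edges, phi)
print("exact entropy:", entropy(joint))
print("Bethe entropy:", bethe_entropy(joint, tree_edges))
```

The two numbers should agree up to rounding on a tree; on a graph with loops the same formula is only an approximation.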
the unknowns are now the marginals $q_{i,j}, q_i$, which have to be locally consistent:

$$\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j)$$

a real distribution with these marginals should exist (marginal polytope); for tree graphical models this local consistency is enough
the resulting optimization problem:

$$\arg\max_{\{q\}} \; \sum_{i,j \in E} H(q_{i,j}) - \sum_i (|Nb_i| - 1) H(q_i) + \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j)$$

subject to

$$\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j) \quad \forall i,j \in E, \; x_j$$

$$q_{i,j}(x_i, x_j) \geq 0 \quad \forall i,j \in E, \; x_i, x_j$$

$$\sum_{x_i} q_i(x_i) = 1 \quad \forall i$$
BP messages are the (exponential form of the) Lagrange multipliers
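A sketch of why this holds, not a full proof. The multiplier name $\lambda_{j \to i}(x_i)$ is introduced here for the constraint $\sum_{x_j} q_{i,j}(x_i, x_j) = q_i(x_i)$; normalization multipliers are absorbed into $\propto$, the non-negativity constraints are assumed inactive, and $|Nb_i| > 1$ is assumed in the second step.

```latex
\begin{align*}
% stationarity of the Lagrangian with respect to q_{i,j}(x_i, x_j)
& -\ln q_{i,j}(x_i, x_j) - 1 + \ln \phi_{i,j}(x_i, x_j)
  + \lambda_{j \to i}(x_i) + \lambda_{i \to j}(x_j) = 0 \\
& \Rightarrow\; q_{i,j}(x_i, x_j) \propto \phi_{i,j}(x_i, x_j)
  \exp\big(\lambda_{j \to i}(x_i) + \lambda_{i \to j}(x_j)\big) \\[4pt]
% stationarity with respect to q_i(x_i)
& (|Nb_i| - 1)\big(\ln q_i(x_i) + 1\big) - \sum_{j \in Nb_i} \lambda_{j \to i}(x_i) + \text{const} = 0 \\
& \Rightarrow\; q_i(x_i) \propto
  \exp\Big(\tfrac{1}{|Nb_i| - 1} \sum_{j \in Nb_i} \lambda_{j \to i}(x_i)\Big) \\[4pt]
% change of variables from multipliers to messages
& \lambda_{j \to i}(x_i) = \ln \prod_{k \in Nb_i - j} \delta_{k \to i}(x_i) \\
& \Rightarrow\; q_i(x_i) \propto \prod_{k \in Nb_i} \delta_{k \to i}(x_i), \qquad
  q_{i,j}(x_i, x_j) \propto \phi_{i,j}(x_i, x_j)
  \prod_{k \in Nb_i - j} \delta_{k \to i}(x_i) \prod_{k \in Nb_j - i} \delta_{k \to j}(x_j)
\end{align*}
```

Imposing $\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j)$ on these two expressions and cancelling the common factor $\prod_{k \in Nb_j - i} \delta_{k \to j}(x_j)$ leaves $\delta_{i \to j}(x_j) \propto \sum_{x_i} \phi_{i,j}(x_i, x_j) \prod_{k \in Nb_i - j} \delta_{k \to i}(x_i)$, which is the BP update.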
loopy BP update:

$$\delta_{i \to j}(x_j) \propto \sum_{x_i} \phi_{i,j}(x_i, x_j) \prod_{k \in Nb_i - j} \delta_{k \to i}(x_i)$$

($\propto$: normalize the message for numerical stability)

on a graph with loops, the resulting estimate

$$\hat{p}_i(x_i) \propto \prod_{k \in Nb_i} \delta_{k \to i}(x_i)$$

is not (proportional to) the exact marginal $p(x_i)$
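A minimal sketch of the normalized loopy update, assuming a parallel (all messages at once) schedule, uniform initial messages, and a stopping tolerance; these choices, the function names, and the 4-cycle example with random potentials are assumptions for illustration. Brute-force marginals are computed alongside so the approximation error can be inspected.

```python
import itertools
import numpy as np

def loopy_bp(n, d, edges, phi, iters=200, tol=1e-10):
    """Loopy BP on a pairwise model; returns the estimates p_hat_i."""
    nbrs = {i: [] for i in range(n)}
    pot = {}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
        pot[(i, j)] = phi[(i, j)]
        pot[(j, i)] = phi[(i, j)].T  # same potential, indexed [x_j, x_i]

    msgs = {e: np.full(d, 1.0 / d) for e in pot}  # uniform initialization
    for _ in range(iters):
        new = {}
        for (i, j) in pot:
            incoming = np.ones(d)
            for k in nbrs[i]:
                if k != j:
                    incoming *= msgs[(k, i)]
            m = incoming @ pot[(i, j)]      # sum over x_i
            new[(i, j)] = m / m.sum()       # normalize for numerical stability
        change = max(np.abs(new[e] - msgs[e]).max() for e in pot)
        msgs = new
        if change < tol:
            break

    beliefs = []
    for i in range(n):
        b = np.ones(d)
        for k in nbrs[i]:
            b *= msgs[(k, i)]
        beliefs.append(b / b.sum())
    return beliefs

def exact_marginals(n, d, edges, phi):
    """Exact node marginals by enumerating all d**n joint states."""
    joint = np.zeros((d,) * n)
    for x in itertools.product(range(d), repeat=n):
        joint[x] = np.prod([phi[(i, j)][x[i], x[j]] for i, j in edges])
    joint /= joint.sum()
    return [joint.sum(axis=tuple(a for a in range(n) if a != i)) for i in range(n)]

rng = np.random.default_rng(2)
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a single loop
phi = {e: rng.uniform(0.5, 2.0, size=(2, 2)) for e in cycle}
beliefs = loopy_bp(4, 2, cycle, phi)
exact = exact_marginals(4, 2, cycle, phi)
for i in range(4):
    print(i, "loopy BP:", beliefs[i], "exact:", exact[i])
```

Running the same function on a tree reproduces the exact marginals, which is a quick sanity check for an implementation.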
BP on a factor graph

[Figure: factor graph with variable nodes 1, ..., 5 and factor nodes, including {1,2,4} and {3,5}]

$$p(x) = \frac{1}{Z} \prod_I \phi_I(x_I)$$

where each factor index I is a subset of variables, $I \subseteq \{1, \ldots, N\}$

variable-to-factor message:

$$\delta_{i \to I}(x_i) \propto \prod_{J \mid i \in J, J \neq I} \delta_{J \to i}(x_i)$$

factor-to-variable message:

$$\delta_{I \to i}(x_i) \propto \sum_{x_{I - i}} \phi_I(x_I) \prod_{j \in I - i} \delta_{j \to I}(x_j)$$

marginal estimate:

$$\hat{p}_i(x_i) \propto \prod_{J \mid i \in J} \delta_{J \to i}(x_i)$$
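The two factor-graph updates can be sketched as follows. The scopes (0, 1, 3) and (2, 4) are the slide's {1,2,4} and {3,5} with zero-based indices; the extra pairwise factor (1, 2), the random tables, the flooding schedule, and the function names are assumptions added for illustration. With these three factors the factor graph is a tree, so the result can be checked against enumeration.

```python
import itertools
import numpy as np

def factor_graph_bp(n, d, factors, iters=50):
    """factors: list of (scope, table); scope is a tuple of variable indices,
    table has shape (d,) * len(scope). Returns the estimates p_hat_i."""
    var_to_fac = {(i, a): np.full(d, 1.0 / d)
                  for a, (scope, _) in enumerate(factors) for i in scope}
    fac_to_var = {(a, i): np.full(d, 1.0 / d)
                  for a, (scope, _) in enumerate(factors) for i in scope}

    for _ in range(iters):
        # factor -> variable: multiply in the other variables' messages, sum them out
        for a, (scope, table) in enumerate(factors):
            for pos, i in enumerate(scope):
                t = table
                for q, j in enumerate(scope):
                    if j != i:
                        shape = [1] * len(scope)
                        shape[q] = d
                        t = t * var_to_fac[(j, a)].reshape(shape)
                m = t.sum(axis=tuple(q for q in range(len(scope)) if q != pos))
                fac_to_var[(a, i)] = m / m.sum()
        # variable -> factor: product of messages from all other factors
        for (i, a) in var_to_fac:
            m = np.ones(d)
            for (b, k), msg in fac_to_var.items():
                if k == i and b != a:
                    m = m * msg
            var_to_fac[(i, a)] = m / m.sum()

    beliefs = []
    for i in range(n):
        b = np.ones(d)
        for (a, k), msg in fac_to_var.items():
            if k == i:
                b = b * msg
        beliefs.append(b / b.sum())
    return beliefs

def brute_force(n, d, factors):
    """Exact node marginals by enumerating all d**n joint states."""
    joint = np.zeros((d,) * n)
    for x in itertools.product(range(d), repeat=n):
        joint[x] = np.prod([table[tuple(x[i] for i in scope)]
                            for scope, table in factors])
    joint /= joint.sum()
    return [joint.sum(axis=tuple(a for a in range(n) if a != i)) for i in range(n)]

rng = np.random.default_rng(3)
scopes = [(0, 1, 3), (2, 4), (1, 2)]  # last scope is a hypothetical extra factor
factors = [(s, rng.uniform(0.5, 2.0, size=(2,) * len(s))) for s in scopes]
beliefs = factor_graph_bp(5, 2, factors)
exact = brute_force(5, 2, factors)
print("largest gap:", max(np.abs(b - e).max() for b, e in zip(beliefs, exact)))
```

The dictionary scans in the variable-to-factor step are quadratic in the number of messages; a neighbor index would be the natural replacement in anything beyond a toy example.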
https://graph-tool.skewed.de
www.jianxiongxiao.com
application: low-density parity check (LDPC) codes

a codeword $x_1, \ldots, x_n$ is observed as $y_1, \ldots, y_n$ through a noisy channel:

$$p(y_i = 1 \mid x_i = 1) = p(y_i = 0 \mid x_i = 0) = 1 - \epsilon$$

parity-check factors:

$$\phi_{stu}(x_s, x_t, x_u) = \begin{cases} 1 & \text{if } x_s \oplus x_t \oplus x_u = 1 \\ 0 & \text{otherwise} \end{cases}$$

posterior over codewords:

$$p(x \mid y) \propto \prod_{s,t,u} \phi_{stu}(x_s, x_t, x_u) \prod_{i=1}^{n} p(y_i \mid x_i)$$

(image: Wainwright & Jordan)

decoding: recover $x^*$ bit by bit, choosing each $x_i^*$ from the posterior marginal $p(x_i \mid y) \; \forall i$
Bethe free energy on a graph with loops: the same objective over $q_{i,j}, q_i$

entropy term (exact on trees, an approximation with loops):

$$\sum_{i,j \in E} H(q_{i,j}) - \sum_i (|Nb_i| - 1) H(q_i)$$

energy term:

$$\sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j)$$

the local consistency constraints define the local polytope

$$L : \Big\{ q_{i,j}, q_i \;\Big|\; \sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j) \;\; \forall i,j \in E, \; x_j \Big\}$$

$[q_1, \ldots, q_n, q_{1,3}, \ldots, q_{m,n}]$: locally consistent marginals, a point in L
$[p_1, \ldots, p_n, p_{1,3}, \ldots, p_{m,n}]$: marginals of a real distribution, a point in the marginal polytope
cluster graph:
  clusters are not necessarily max-cliques
  running intersection property
  family-preserving property
  $S_{i,j} \subseteq C_i \cap C_j$ (instead of = in a clique tree)

$$p(x) \propto \frac{\prod_i \hat{p}_i(C_i)}{\prod_{i,j} \hat{p}_{i,j}(S_{i,j})}$$

($\propto$ instead of = in a clique tree)

[Figure: a factor graph over A, B, C, D, E, F]
[Figure: the corresponding cluster graph (the same BP updates)]
[Figure: an improved cluster graph (better entropy approximation + marginal constraint)]
loopy BP tends to do well on:
  locally tree-like graphs
  dense graphs with weak interactions

[Figure: 11 x 11 Ising grid]

damped update, with damping parameter $\alpha$:

$$\delta^{(t+1)}_{i \to I}(x_i) \propto (1 - \alpha) \, \delta^{(t)}_{i \to I}(x_i) + \alpha \prod_{J \mid i \in J, J \neq I} \delta^{(t)}_{J \to i}(x_i)$$
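A sketch of damping in code. For brevity it applies the same convex combination to the pairwise message update rather than to the factor-graph form written above, and it normalizes the proposed message before mixing so the result stays normalized; both are choices made here, as are the 3 x 3 grid (small enough to enumerate, unlike 11 x 11), the random potentials, and the value of alpha.

```python
import itertools
import numpy as np

def damped_loopy_bp(n, d, edges, phi, alpha=0.5, iters=200):
    """Pairwise loopy BP where each message moves a fraction alpha
    towards its undamped update (alpha = 1 is plain loopy BP)."""
    nbrs = {i: [] for i in range(n)}
    pot = {}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
        pot[(i, j)] = phi[(i, j)]
        pot[(j, i)] = phi[(i, j)].T  # same potential, indexed [x_j, x_i]

    msgs = {e: np.full(d, 1.0 / d) for e in pot}
    for _ in range(iters):
        new = {}
        for (i, j) in pot:
            incoming = np.ones(d)
            for k in nbrs[i]:
                if k != j:
                    incoming = incoming * msgs[(k, i)]
            proposal = incoming @ pot[(i, j)]
            proposal = proposal / proposal.sum()
            # convex combination of the old message and the proposed one
            new[(i, j)] = (1 - alpha) * msgs[(i, j)] + alpha * proposal
        msgs = new

    beliefs = []
    for i in range(n):
        b = np.ones(d)
        for k in nbrs[i]:
            b = b * msgs[(k, i)]
        beliefs.append(b / b.sum())
    return beliefs

def exact_marginals(n, d, edges, phi):
    """Exact node marginals by enumerating all d**n joint states."""
    joint = np.zeros((d,) * n)
    for x in itertools.product(range(d), repeat=n):
        joint[x] = np.prod([phi[(i, j)][x[i], x[j]] for i, j in edges])
    joint /= joint.sum()
    return [joint.sum(axis=tuple(a for a in range(n) if a != i)) for i in range(n)]

def grid_edges(rows, cols):
    """Edges of a rows x cols grid with nodes numbered row by row."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                edges.append((r * cols + c, r * cols + c + 1))
            if r + 1 < rows:
                edges.append((r * cols + c, (r + 1) * cols + c))
    return edges

rng = np.random.default_rng(4)
edges = grid_edges(3, 3)
phi = {e: rng.uniform(0.5, 2.0, size=(2, 2)) for e in edges}
beliefs = damped_loopy_bp(9, 2, edges, phi, alpha=0.5)
exact = exact_marginals(9, 2, edges, phi)
print("largest gap on the 3 x 3 grid:",
      max(np.abs(b - e).max() for b, e in zip(beliefs, exact)))
```

Smaller alpha means more cautious steps, which trades speed for stability when the undamped updates oscillate.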
cost of updating all messages once on a factor graph

variable-to-factor messages, from each var to all neighbors:

$$\delta_{i \to I}(x_i) \propto \prod_{J \mid i \in J, J \neq I} \delta_{J \to i}(x_i)$$

cost: $n \, d \, \Delta_{\max}^2$
  n: number of vars
  d: domain size (2 for binary)
  $\Delta_{\max}$: max neighbours

factor-to-variable messages:

$$\delta_{I \to i}(x_i) \propto \sum_{x_{I - i}} \phi_I(x_I) \prod_{j \in I - i} \delta_{j \to I}(x_j)$$

cost: $m \, d^{|Scope_{\max}|} \, |Scope_{\max}|$
  m: number of factors
  $|Scope_{\max}|$: max number of vars in a factor
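To get a feel for the two counts, a small helper can evaluate them for given sizes. The function name and the example numbers are hypothetical, and the counts are the upper bounds stated above, not measured operation counts.

```python
def bp_iteration_cost(n, m, d, max_degree, max_scope):
    """Upper bounds on the work for one full round of message updates.

    n: number of variables, m: number of factors, d: domain size,
    max_degree: max number of factors a variable appears in,
    max_scope: max number of variables in a factor.
    """
    var_to_factor = n * d * max_degree ** 2
    factor_to_var = m * d ** max_scope * max_scope
    return var_to_factor, factor_to_var

# Binary variables, 1000 variables and 500 factors, each factor over 6 variables,
# each variable in at most 3 factors (all numbers chosen for illustration).
v2f, f2v = bp_iteration_cost(n=1000, m=500, d=2, max_degree=3, max_scope=6)
print("variable-to-factor:", v2f)
print("factor-to-variable:", f2v)
```

The factor-to-variable term grows exponentially in the scope size, so large factors dominate the cost unless they have structure that allows a cheaper update.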