SLIDE 1

Probabilistic Graphical Models

Loopy BP and Bethe Free Energy

Siamak Ravanbakhsh, Fall 2019

SLIDE 2

Learning objective

loopy belief propagation and its variational derivation: the Bethe approximation

SLIDE 3

So far...

exact inference: variable elimination, equivalent to belief propagation (BP) in a clique tree

SLIDE 4

So far...

exact inference: variable elimination, equivalent to belief propagation (BP) in a clique tree

This lecture: what if exact inference is too expensive (i.e., the tree-width is large)? We can continue to use BP anyway: loopy BP. Why is this a good idea? We answer using the variational interpretation.

SLIDE 5

Recap: BP in clique trees

sum-product BP message update:

δ_{i→j}(S_{i,j}) = ∑_{C_i ∖ S_{i,j}} ψ_i(C_i) ∏_{k ∈ Nb_i ∖ {j}} δ_{k→i}(S_{i,k})

messages pass from the leaves towards the root, then back to the leaves

(S_{i,j}: sepset; C_i: cluster/clique)

SLIDE 6

Recap: BP in clique trees

sum-product BP message update:

δ_{i→j}(S_{i,j}) = ∑_{C_i ∖ S_{i,j}} ψ_i(C_i) ∏_{k ∈ Nb_i ∖ {j}} δ_{k→i}(S_{i,k})

messages pass from the leaves towards the root, then back to the leaves; marginal (belief) for each cluster:

p(C_i) ∝ β_i(C_i) = ψ_i(C_i) ∏_{k ∈ Nb_i} δ_{k→i}(S_{i,k})

(S_{i,j}: sepset; C_i: cluster/clique)

SLIDE 7

Clique-tree for tree structures

pairwise potentials ϕ_{i,j}(x_i, x_j); tree-width = 1

[figure: a tree over x_1, …, x_6 and its clique-tree]

  • one cluster per factor
  • one possible clique-tree

what are the sepsets?

SLIDE 8

Clique-tree for tree structures

pairwise potentials ϕ_{i,j}(x_i, x_j); tree-width = 1

[figure: a tree over x_1, …, x_6 and a different valid clique-tree]

  • one cluster per factor
  • one possible clique-tree

what are the sepsets? a different valid clique-tree: check the running intersection property

SLIDE 9

BP for tree structures

pairwise potentials ϕ_{i,j}(x_i, x_j); message update:

δ_{i→j}(x_j) = ∑_{x_i} ϕ_{i,j}(x_i, x_j) ∏_{k ∈ Nb_i ∖ {j}} δ_{k→i}(x_i)

messages pass from the leaves towards a root, then back to the leaves

  • one cluster per factor

SLIDE 10

BP for tree structures

pairwise potentials ϕ_{i,j}(x_i, x_j); message update:

δ_{i→j}(x_j) = ∑_{x_i} ϕ_{i,j}(x_i, x_j) ∏_{k ∈ Nb_i ∖ {j}} δ_{k→i}(x_i)

marginal (belief) for each cluster:

p(x_i) ∝ ∏_{k ∈ Nb_i} δ_{k→i}(x_i)

p(x_i, x_j) ∝ ϕ_{i,j}(x_i, x_j) ∏_{k ∈ Nb_i ∖ {j}} δ_{k→i}(x_i) ∏_{k ∈ Nb_j ∖ {i}} δ_{k→j}(x_j)
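These tree-BP equations translate directly into code. A minimal NumPy sketch (the tree, the random potentials, and all names are illustrative, not from the slides): sum-product messages are swept until they stabilize, and the resulting beliefs are checked against brute-force marginals.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 2  # four binary variables
edges = [(0, 1), (1, 2), (1, 3)]  # a small tree
phi = {e: rng.uniform(0.5, 2.0, size=(d, d)) for e in edges}

nbrs = {i: [] for i in range(n)}
for i, j in edges:
    nbrs[i].append(j)
    nbrs[j].append(i)

def pot(i, j):
    """phi_{i,j}(x_i, x_j), with axes ordered (x_i, x_j)."""
    return phi[(i, j)] if (i, j) in phi else phi[(j, i)].T

# delta[(i, j)] is the message from i to j, a vector over x_j
delta = {(i, j): np.ones(d) for i in nbrs for j in nbrs[i]}
for _ in range(n):  # enough sequential sweeps to converge exactly on a tree
    for i, j in delta:
        inc = np.ones(d)
        for k in nbrs[i]:
            if k != j:
                inc *= delta[(k, i)]
        msg = pot(i, j).T @ inc  # sum over x_i of phi(x_i, x_j) * incoming(x_i)
        delta[(i, j)] = msg / msg.sum()  # normalize for stability

def bp_marginal(i):
    b = np.ones(d)
    for k in nbrs[i]:
        b *= delta[(k, i)]
    return b / b.sum()

def brute_marginal(i):
    p = np.zeros(d)
    for x in itertools.product(range(d), repeat=n):
        w = np.prod([phi[e][x[e[0]], x[e[1]]] for e in edges])
        p[x[i]] += w
    return p / p.sum()

for i in range(n):  # on a tree, BP beliefs are the exact marginals
    assert np.allclose(bp_marginal(i), brute_marginal(i))
```

The final loop checks the slide's claim that BP is exact on trees.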

SLIDE 11

BP for tree structures: reparametrization

the graphical model represents

p(x) = (1/Z) ∏_{(i,j) ∈ E} ϕ_{i,j}(x_i, x_j)

written in terms of marginals:

p(x) = ∏_{(i,j) ∈ E} p_{i,j}(x_i, x_j) / ∏_i p_i(x_i)^{|Nb_i| − 1}   (*)

why is this correct? the denominator adjusts for double-counting; substitute the marginals using BP messages to get (*)

  • one cluster per factor
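The identity (*) can be verified numerically on a toy tree: enumerate the exact joint and its marginals, then check the reparametrization at every assignment. A minimal sketch (model and names are illustrative):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, d = 4, 2
edges = [(0, 1), (1, 2), (1, 3)]  # a tree
phi = {e: rng.uniform(0.5, 2.0, size=(d, d)) for e in edges}
deg = {i: sum(i in e for e in edges) for i in range(n)}  # |Nb_i|

# exact joint p(x) by enumeration
joint, Z = {}, 0.0
for x in itertools.product(range(d), repeat=n):
    w = np.prod([phi[(a, b)][x[a], x[b]] for a, b in edges])
    joint[x] = w
    Z += w
joint = {x: w / Z for x, w in joint.items()}

# exact single and pairwise marginals
p1 = {i: np.zeros(d) for i in range(n)}
p2 = {e: np.zeros((d, d)) for e in edges}
for x, w in joint.items():
    for i in range(n):
        p1[i][x[i]] += w
    for a, b in edges:
        p2[(a, b)][x[a], x[b]] += w

# reparametrization: p(x) = prod_edges p_ij / prod_i p_i^(|Nb_i| - 1)
for x, w in joint.items():
    num = np.prod([p2[(a, b)][x[a], x[b]] for a, b in edges])
    den = np.prod([p1[i][x[i]] ** (deg[i] - 1) for i in range(n)])
    assert np.isclose(num / den, w)
```

The assertion holds at every assignment, as (*) promises for tree-structured models.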

SLIDE 12

Variational interpretation

BP as I-projection: write q in terms of the marginals of interest and solve

arg min_q D(q ∥ p)

where

p(x) = (1/Z) ∏_{(i,j) ∈ E} ϕ_{i,j}(x_i, x_j)

q(x) = ∏_{(i,j) ∈ E} q_{i,j}(x_i, x_j) / ∏_i q_i(x_i)^{|Nb_i| − 1}

the minimization gives us the marginals q_{i,j}, q_i

SLIDE 13

Variational free energy

D(q ∥ p) = ∑_x q(x) (ln q(x) − ln p(x)) = −H(q) − ∑_{i,j} E_q[ln ϕ_{i,j}(x_i, x_j)] + ln Z

the ln Z term does not depend on q and can be ignored, so the I-projection is equivalent to

arg max_q  H(q) + ∑_{i,j} E_q[ln ϕ_{i,j}(x_i, x_j)]

this objective is the variational free energy; since D(q ∥ p) ≥ 0, it is a lower bound on ln Z

SLIDE 14

Simplifying the free energy

arg min_q D(q ∥ p) ≡ arg max_q  H(q) + ∑_{i,j} E_q[ln ϕ_{i,j}(x_i, x_j)]

with p(x) = (1/Z) ∏_{(i,j) ∈ E} ϕ_{i,j}(x_i, x_j) and q(x) = ∏_{(i,j) ∈ E} q_{i,j}(x_i, x_j) / ∏_i q_i(x_i)^{|Nb_i| − 1}

so far we have not used the decomposed form of q; both the entropy and the energy involve sums over exponentially many terms

SLIDE 15

Simplifying the free energy

arg min_q D(q ∥ p), with the decomposed forms of p and q above

SLIDE 16

Simplifying the free energy

≡ arg max_q  H(q) + ∑_{i,j} E_q[ln ϕ_{i,j}(x_i, x_j)]

SLIDE 17

Simplifying the free energy

the energy term decomposes over edge marginals:

∑_{i,j} E_q[ln ϕ_{i,j}(x_i, x_j)] = ∑_{(i,j) ∈ E} ∑_{x_i, x_j} q_{i,j}(x_i, x_j) ln ϕ_{i,j}(x_i, x_j)

SLIDE 18

Simplifying the free energy

the entropy term follows from the decomposition of q:

H(q) = ∑_{(i,j) ∈ E} H(q_{i,j}) − ∑_i (|Nb_i| − 1) H(q_i)

SLIDE 19

Variational interpretation: marginal constraints

arg max_q  H(q) + ∑_{i,j} E_q[ln ϕ_{i,j}(x_i, x_j)]

the marginals q_{i,j}, q_i should be "valid":

∑_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j)   ∀ (i,j) ∈ E, x_j

a real distribution with these marginals should exist (the marginal polytope); for tree graphical models this local consistency is enough

the objective in terms of the marginals:

∑_{(i,j) ∈ E} H(q_{i,j}) − ∑_i (|Nb_i| − 1) H(q_i) + ∑_{(i,j) ∈ E} ∑_{x_i, x_j} q_{i,j}(x_i, x_j) ln ϕ_{i,j}(x_i, x_j)
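A useful sanity check of this decomposed objective: on a tree, plugging in the exact marginals recovers ln Z. A minimal sketch (toy model, illustrative names) that evaluates the entropy and energy terms from enumerated marginals:

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n, d = 4, 2
edges = [(0, 1), (1, 2), (1, 3)]  # a tree
phi = {e: rng.uniform(0.5, 2.0, size=(d, d)) for e in edges}
deg = {i: sum(i in e for e in edges) for i in range(n)}

# exact marginals and Z by enumeration
Z = 0.0
p1 = {i: np.zeros(d) for i in range(n)}
p2 = {e: np.zeros((d, d)) for e in edges}
for x in itertools.product(range(d), repeat=n):
    w = np.prod([phi[e][x[e[0]], x[e[1]]] for e in edges])
    Z += w
    for i in range(n):
        p1[i][x[i]] += w
    for e in edges:
        p2[e][x[e[0]], x[e[1]]] += w
for i in p1:
    p1[i] /= Z
for e in p2:
    p2[e] /= Z

def H(p):
    """Entropy of a (strictly positive) discrete distribution."""
    p = p.ravel()
    return -np.sum(p * np.log(p))

# decomposed objective: edge entropies - overcounted node entropies + energy
bethe = (sum(H(p2[e]) for e in edges)
         - sum((deg[i] - 1) * H(p1[i]) for i in range(n))
         + sum(np.sum(p2[e] * np.log(phi[e])) for e in edges))

# on a tree, the objective at the true marginals equals ln Z
assert np.isclose(bethe, np.log(Z))
```

This is exactly why maximizing the objective over valid marginals recovers both the marginals and ln Z in the tree case.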

SLIDE 20

Variational derivation of BP

arg max_{q}  ∑_{(i,j) ∈ E} H(q_{i,j}) − ∑_i (|Nb_i| − 1) H(q_i) + ∑_{(i,j) ∈ E} ∑_{x_i, x_j} q_{i,j}(x_i, x_j) ln ϕ_{i,j}(x_i, x_j)

SLIDE 21

Variational derivation of BP

the same objective, maximized over locally consistent marginal distributions:

∑_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j)   ∀ (i,j) ∈ E, x_j
q_{i,j}(x_i, x_j) ≥ 0   ∀ (i,j) ∈ E, x_i, x_j
∑_{x_i} q_i(x_i) = 1   ∀ i

SLIDE 22

Variational derivation of BP

arg max_{q}  ∑_{(i,j) ∈ E} H(q_{i,j}) − ∑_i (|Nb_i| − 1) H(q_i) + ∑_{(i,j) ∈ E} ∑_{x_i, x_j} q_{i,j}(x_i, x_j) ln ϕ_{i,j}(x_i, x_j)

subject to locally consistent marginal distributions:

∑_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j)   ∀ (i,j) ∈ E, x_j
q_{i,j}(x_i, x_j) ≥ 0   ∀ (i,j) ∈ E, x_i, x_j
∑_{x_i} q_i(x_i) = 1   ∀ i

the BP update is derived from the "fixed points" of the Lagrangian; BP messages are the (exponential form of the) Lagrange multipliers

SLIDE 23

What happens if there are loops?

we can still apply the BP update:

δ_{i→j}(x_j) ∝ ∑_{x_i} ϕ_{i,j}(x_i, x_j) ∏_{k ∈ Nb_i ∖ {j}} δ_{k→i}(x_i)

(proportionality: normalize the message for numerical stability)

SLIDE 24

What happens if there are loops?

update the messages synchronously or sequentially

SLIDE 25

What happens if there are loops?

the updates may not converge (oscillating behavior)

SLIDE 26

What happens if there are loops?

even when convergent, loopy BP only gives an approximation:

p̂(x_i) ∝ ∏_{k ∈ Nb_i} δ_{k→i}(x_i)

is not (proportional to) the exact marginal p(x_i)

SLIDE 27

Loopy BP on factor graphs

p(x) = (1/Z) ∏_I ϕ_I(x_I), where each I ⊆ {1, …, N} is a subset of variables

[figure: factor graph with variable nodes x_1, …, x_5 and factor nodes ϕ_{1,2,4}, ϕ_{3,5}]

SLIDE 28

Loopy BP on factor graphs

variable-to-factor message:

δ_{i→I}(x_i) ∝ ∏_{J : i ∈ J, J ≠ I} δ_{J→i}(x_i)

SLIDE 29

Loopy BP on factor graphs

factor-to-variable message:

δ_{I→i}(x_i) ∝ ∑_{x_{I ∖ {i}}} ϕ_I(x_I) ∏_{j ∈ I ∖ {i}} δ_{j→I}(x_j)

SLIDE 30

Loopy BP on factor graphs

after convergence, the approximate marginals are

p̂(x_i) ∝ ∏_{J : i ∈ J} δ_{J→i}(x_i)
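The two message types map naturally onto NumPy tensor operations. A minimal sketch using the slide's example scopes {1,2,4} and {3,5} (random factor tables; all names are illustrative). This particular factor graph happens to have no loops, so the loopy updates reproduce the exact marginals, which the brute-force check confirms.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
n, d = 5, 2
# scopes follow the slide's example: factors over {x1,x2,x4} and {x3,x5} (0-indexed)
scopes = [(0, 1, 3), (2, 4)]
factors = [rng.uniform(0.5, 2.0, size=(d,) * len(s)) for s in scopes]

v2f = {(i, I): np.ones(d) / d for I, s in enumerate(scopes) for i in s}
f2v = {(I, i): np.ones(d) / d for I, s in enumerate(scopes) for i in s}

for _ in range(20):
    # factor-to-variable: multiply in incoming messages, sum out all vars but x_i
    for I, s in enumerate(scopes):
        for ax, i in enumerate(s):
            t = factors[I].copy()
            for ax2, j in enumerate(s):
                if j != i:
                    shape = [1] * len(s)
                    shape[ax2] = d
                    t = t * v2f[(j, I)].reshape(shape)
            msg = t.sum(axis=tuple(a for a in range(len(s)) if a != ax))
            f2v[(I, i)] = msg / msg.sum()
    # variable-to-factor: product of messages from all the *other* factors
    for i, I in v2f:
        m = np.ones(d)
        for J, s in enumerate(scopes):
            if J != I and i in s:
                m *= f2v[(J, i)]
        v2f[(i, I)] = m / m.sum()

def bp_marginal(i):
    b = np.ones(d)
    for I, s in enumerate(scopes):
        if i in s:
            b *= f2v[(I, i)]
    return b / b.sum()

# exact marginals by enumeration
Z, exact = 0.0, {i: np.zeros(d) for i in range(n)}
for x in itertools.product(range(d), repeat=n):
    w = 1.0
    for I, s in enumerate(scopes):
        w *= factors[I][tuple(x[j] for j in s)]
    Z += w
    for i in range(n):
        exact[i][x[i]] += w
for i in exact:
    exact[i] /= Z

for i in range(n):  # exact here because this factor graph has no loops
    assert np.allclose(bp_marginal(i), exact[i])
```

On graphs with loops the same update loop runs unchanged; only the exactness guarantee is lost.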

SLIDE 31

(Loopy) BP has found many applications

social network analysis: stochastic block modelling
machine learning: clustering, tensor factorization
vision: inpainting & denoising, stereo matching
NLP and bioinformatics: Viterbi algorithm
combinatorial optimization

(images: https://graph-tool.skewed.de, www.jianxiongxiao.com)
SLIDE 32

Application: LDPC (low-density parity check) coding using BP

the bits x_1, …, x_n are sent through a noisy channel; y_1, …, y_n are observed:

p(y_i = 1 ∣ x_i = 1) = p(y_i = 0 ∣ x_i = 0) = 1 − ϵ

SLIDE 33

Application: LDPC coding using BP

the message satisfies parity constraints:

ϕ_{stu}(x_s, x_t, x_u) = 1 if x_s ⊕ x_t ⊕ x_u = 1, and 0 otherwise
SLIDE 34

Application: LDPC coding using BP

joint distribution over the unobserved message:

p(x ∣ y) ∝ ∏_{s,t,u} ϕ(x_s, x_t, x_u) ∏_{i=1}^{n} [ (1 − ϵ) I(x_i = y_i) + ϵ I(x_i ≠ y_i) ]

(image: Wainwright & Jordan)

SLIDE 35

Application: LDPC coding using BP

inference problems: the most likely joint assignment

x* = arg max_x p(x ∣ y)

SLIDE 36

Application: LDPC coding using BP

inference problems:
  • most likely joint assignment: x* = arg max_x p(x ∣ y)
  • max-marginals: x_i* = arg max_{x_i} p(x_i ∣ y)
  • marginals p(x_i ∣ y) ∀ i, calculated using loopy BP
SLIDE 37

Application: LDPC coding using BP

p(x ∣ y) ∝ ∏_{s,t,u} ϕ(x_s, x_t, x_u) ∏_{i=1}^{n} [ (1 − ϵ) I(x_i = y_i) + ϵ I(x_i ≠ y_i) ]

(image: Wainwright & Jordan)

SLIDE 38

Application: LDPC coding using BP

inference problems:
  • most likely joint assignment: x* = arg max_x p(x ∣ y)
  • max-marginals: x_i* = arg max_{x_i} p(x_i ∣ y)
  • marginals p(x_i ∣ y) ∀ i, calculated using loopy BP
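The model above can be exercised on a toy code. A hedged sketch: the 5-bit code with two overlapping odd-parity checks, the value of ϵ, and all names are invented for illustration; at this size the posterior is enumerated exactly, whereas real LDPC decoding would run loopy BP on the factor graph. Bitwise marginal decoding corrects a single flipped bit.

```python
import itertools
import numpy as np

eps = 0.1  # channel flip probability

def parity_ok(xs):
    # the slide's parity factor: phi(x_s, x_t, x_u) = 1 iff x_s ^ x_t ^ x_u = 1
    return (xs[0] ^ xs[1] ^ xs[2]) == 1

# two overlapping parity checks over 5 bits (a toy "low-density" code)
checks = [(0, 1, 2), (2, 3, 4)]
n = 5

def posterior(y):
    """Unnormalized p(x | y) over all 2^n assignments."""
    post = {}
    for x in itertools.product((0, 1), repeat=n):
        w = 1.0
        for c in checks:
            w *= 1.0 if parity_ok([x[i] for i in c]) else 0.0
        for xi, yi in zip(x, y):
            w *= (1 - eps) if xi == yi else eps
        post[x] = w
    return post

x_sent = (1, 1, 1, 1, 1)  # satisfies both odd-parity checks
y = (1, 1, 0, 1, 1)       # bit 2 flipped by the channel

post = posterior(y)
# bitwise (max-)marginal decoding
marg = np.zeros((n, 2))
for x, w in post.items():
    for i in range(n):
        marg[i, x[i]] += w
decoded = tuple(int(np.argmax(marg[i])) for i in range(n))
assert decoded == x_sent  # the flipped bit is corrected
```

The parity factors zero out all non-codewords, so the posterior mass concentrates on codewords near y; at realistic block lengths the marginals `marg` would come from loopy BP instead of enumeration.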

SLIDE 39

Loops and variational interpretation

arg max_q  H(q) + ∑_{i,j} E_q[ln ϕ_{i,j}(x_i, x_j)]

with the decomposed objective

∑_{(i,j) ∈ E} H(q_{i,j}) − ∑_i (|Nb_i| − 1) H(q_i) + ∑_{(i,j) ∈ E} ∑_{x_i, x_j} q_{i,j}(x_i, x_j) ln ϕ_{i,j}(x_i, x_j)

SLIDE 40

Loops and variational interpretation

with loops, the entropy term is no longer exact; this is the Bethe approximation to the entropy, and it is generally no longer convex (multiple fixed points)

SLIDE 41

Loops and variational interpretation

arg max_q  H(q) + ∑_{i,j} E_q[ln ϕ_{i,j}(x_i, x_j)]

subject to the local consistency constraints L:

∑_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j)   ∀ (i,j) ∈ E, x_j

SLIDE 42

Loops and variational interpretation

with loops, the entropy term is no longer exact, and the local consistency constraints L are inadequate: locally consistent q_{i,j}, q_i may not be the marginals of any joint distribution

SLIDE 43

Loops and variational interpretation

locally consistent pseudo-marginals [q_1, …, q_n, q_{1,3}, …, q_{m,n}] may not correspond to any vector of true marginals [p_1, …, p_n, p_{1,3}, …, p_{m,n}]

SLIDE 44

Variations on BP

  • the entropy term is not exact anymore: use improved entropy approximations (e.g., region-based, convex)
  • local consistency constraints are inadequate: use tighter constraints (e.g., marginal consistency over larger clusters)

SLIDE 45

Variations on BP: cluster-graph

a cluster-graph generalizes the clique-tree:
  • clusters are not necessarily max-cliques
  • running intersection property
  • family-preserving property
  • sepsets satisfy S_{i,j} ⊆ C_i ∩ C_j (instead of = in a clique-tree)

SLIDE 46

Variations on BP: cluster-graph

similar reparametrization:

p(x) ∝ ∏_i p̂_i(C_i) / ∏_{i,j} p̂_{i,j}(S_{i,j})

SLIDE 47

Variations on BP: cluster-graph

[figure: a factor-graph over variables A, B, C, D, E, F]

SLIDE 48

Variations on BP: cluster-graph

[figure: the corresponding cluster-graph (the same BP updates)]

SLIDE 49

Variations on BP: cluster-graph

[figure: an improved cluster-graph (better entropy approximation + marginal constraints)]

SLIDE 50

BP in practice

works well on:
  • locally tree-like graphs
  • dense graphs with weak interactions

[figure: 11 × 11 Ising grid]

SLIDE 51

BP in practice

sequential update works better than parallel update; convergence is improved by damping (smoothing) the update:

δ_{i→I}^{(t+1)}(x_i) ∝ (1 − α) δ_{i→I}^{(t)}(x_i) + α ∏_{J : i ∈ J, J ≠ I} δ_{J→i}^{(t)}(x_i)

[figure: 11 × 11 Ising grid]
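The damped update is a one-line change to the message loop. A minimal sketch on a pairwise loop (the graph, potentials, and names are illustrative; the damping is applied to normalized messages, one of several reasonable conventions):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 4, 2
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a single loop
phi = {e: rng.uniform(0.5, 2.0, size=(d, d)) for e in edges}

nbrs = {i: [] for i in range(n)}
for i, j in edges:
    nbrs[i].append(j)
    nbrs[j].append(i)

def pot(i, j):
    return phi[(i, j)] if (i, j) in phi else phi[(j, i)].T

def run_bp(alpha, sweeps=3000, tol=1e-9):
    """Damped parallel BP: new = (1 - alpha) * old + alpha * raw_update."""
    delta = {(i, j): np.ones(d) / d for i in nbrs for j in nbrs[i]}
    diff = np.inf
    for _ in range(sweeps):
        new = {}
        for i, j in delta:
            inc = np.ones(d)
            for k in nbrs[i]:
                if k != j:
                    inc *= delta[(k, i)]
            msg = pot(i, j).T @ inc
            msg /= msg.sum()
            new[(i, j)] = (1 - alpha) * delta[(i, j)] + alpha * msg
        diff = max(np.abs(new[e] - delta[e]).max() for e in delta)
        delta = new
        if diff < tol:
            break
    return diff  # final message change; small means converged

# damped parallel updates converge on this instance
assert run_bp(alpha=0.5) < 1e-9
```

α = 1 recovers the undamped update; smaller α trades speed for stability on graphs where the raw parallel update oscillates.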

SLIDE 52

Summary

belief propagation: efficient deterministic inference; exact in a clique-tree (= variable elimination, an application of the distributive law)

SLIDE 53

Summary

optimization perspective: KL-divergence minimization with an approximate objective (Bethe free energy) and approximate constraints

SLIDE 54

Summary

loopy BP works well in (cluster) graphs with loops (large tree-width)

SLIDE 55

bonus slides

SLIDE 56

Loopy BP on factor graphs: complexity

variable-to-factor message:

δ_{i→I}(x_i) ∝ ∏_{J : i ∈ J, J ≠ I} δ_{J→i}(x_i)

sending messages from every variable to all of its neighbors costs O(n d Δ_max²)
(n: number of variables; d: domain size, 2 for binary; Δ_max: maximum number of neighbours)

SLIDE 57

Loopy BP on factor graphs: complexity

factor-to-variable message:

δ_{I→i}(x_i) ∝ ∑_{x_{I ∖ {i}}} ϕ_I(x_I) ∏_{j ∈ I ∖ {i}} δ_{j→I}(x_j)

sending messages from every factor to all of its variables costs O(m |Scope_max| d^{|Scope_max|})
(m: number of factors; |Scope_max|: maximum number of variables in a factor)

SLIDE 58

Loopy BP on factor graphs: complexity

one round of loopy BP therefore costs O(n d Δ_max²) for the variable-to-factor messages plus O(m |Scope_max| d^{|Scope_max|}) for the factor-to-variable messages