


School of Computer Science
Probabilistic Graphical Models (10-708)
Lecture 5, Sep 31, 2007
Eric Xing

The Belief Propagation (Sum-Product) Algorithm
Reading: J-Chap 4

[Figure: a signaling-pathway network (Receptors A and B, Kinases C, D, E, TF F, Genes G and H) modeled as a graphical model over X1, ..., X8.]

From Elimination to Belief Propagation

Recall that the induced dependency during marginalization is captured in elimination cliques:
  • Summation <-> elimination
  • Intermediate term <-> elimination clique
  • Can this lead to a generic inference algorithm?

[Figure: the sequence of elimination cliques produced on the example graph over A–H.]


Tree GMs

  • Undirected tree: a unique path between any pair of nodes
  • Directed tree: all nodes except the root have exactly one parent
  • Polytree: nodes can have multiple parents (we will come back to this later)


Equivalence of directed and undirected trees

  • Any undirected tree can be converted to a directed tree by choosing a root node and directing all edges away from it
  • A directed tree and the corresponding undirected tree make the same conditional independence assertions
  • Parameterizations are essentially the same:
  • Undirected tree: p(x) = (1/Z) ∏_{i∈V} ψ(x_i) ∏_{(i,j)∈E} ψ(x_i, x_j)
  • Directed tree: p(x) = p(x_r) ∏_{(i,j)∈E} p(x_j | x_i), where r is the root
  • Equivalence: take ψ(x_r) = p(x_r), ψ(x_i, x_j) = p(x_j | x_i), and all other potentials equal to 1; then Z = 1
  • Evidence: ?


From elimination to message passing

  • Recall the ELIMINATION algorithm:
  • Choose an ordering Z in which the query node f is the final node
  • Place all potentials on an active list
  • Eliminate node i by removing all potentials containing i, taking the sum of their product over xi
  • Place the resultant factor back on the list
  • For a TREE graph:
  • Choose the query node f as the root of the tree
  • View the tree as a directed tree with edges pointing away from f
  • Use an elimination ordering based on a depth-first traversal
  • Elimination of each node can be considered as message passing (or Belief Propagation) directly along the tree branches, rather than on some transformed graph; thus, we can use the tree itself as a data structure to do general inference!


The elimination algorithm

Procedure Initialize (G, Z)
  1. Let Z1, ..., Zk be an ordering of Z such that Zi ≺ Zj iff i < j
  2. Initialize F with the full set of factors

Procedure Evidence (E)
  1. for each i ∈ I_E, F = F ∪ {δ(Ei, ei)}

Procedure Sum-Product-Eliminate-Var (
    F,  // Set of factors
    Z   // Variable to be eliminated
  )
  1. F′ ← {φ ∈ F : Z ∈ Scope[φ]}
  2. F′′ ← F − F′
  3. ψ ← ∏_{φ ∈ F′} φ
  4. τ ← Σ_Z ψ
  5. return F′′ ∪ {τ}

Procedure Sum-Product-Variable-Elimination (F, Z, ≺)
  1. for i = 1, ..., k: F ← Sum-Product-Eliminate-Var(F, Zi)
  2. φ∗ ← ∏_{φ ∈ F} φ
  3. return φ∗
  4. Normalization (φ∗)

Procedure Normalization (φ∗)
  1. P(X|E) = φ∗(X) / Σ_x φ∗(x)
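The procedures above can be sketched in Python for discrete (here binary) variables. The `Factor` class and its table representation are illustrative choices of mine, not from the slides; the elimination and normalization steps follow the pseudocode directly.

```python
# A minimal sketch of sum-product variable elimination over discrete
# factors. Each factor stores its scope and a table mapping joint
# assignments (tuples of 0/1) to values; all variables are binary here.
import itertools

class Factor:
    def __init__(self, variables, table):
        self.variables = list(variables)   # scope of the factor
        self.table = table                 # dict: assignment tuple -> float

    @staticmethod
    def product(factors):
        # Multiply a list of factors over the union of their scopes.
        scope = []
        for f in factors:
            for v in f.variables:
                if v not in scope:
                    scope.append(v)
        table = {}
        for assign in itertools.product([0, 1], repeat=len(scope)):
            a = dict(zip(scope, assign))
            val = 1.0
            for f in factors:
                val *= f.table[tuple(a[v] for v in f.variables)]
            table[assign] = val
        return Factor(scope, table)

    def marginalize(self, var):
        # Sum out one variable: tau(rest) = sum_var psi(var, rest).
        idx = self.variables.index(var)
        scope = self.variables[:idx] + self.variables[idx + 1:]
        table = {}
        for assign, val in self.table.items():
            rest = assign[:idx] + assign[idx + 1:]
            table[rest] = table.get(rest, 0.0) + val
        return Factor(scope, table)

def sum_product_eliminate_var(factors, z):
    # F' = factors mentioning z; combine them, sum out z, return F'' u {tau}.
    f_prime = [f for f in factors if z in f.variables]
    f_rest = [f for f in factors if z not in f.variables]
    tau = Factor.product(f_prime).marginalize(z)
    return f_rest + [tau]

def variable_elimination(factors, order):
    # Eliminate in order, combine remaining factors, then normalize.
    for z in order:
        factors = sum_product_eliminate_var(factors, z)
    phi = Factor.product(factors)
    total = sum(phi.table.values())
    return {k: v / total for k, v in phi.table.items()}
```

For example, with fa = P(A) and fb = P(B|A), eliminating A yields the marginal P(B).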


Message passing for trees

[Figure: a tree fragment with the query node f at the root, a node i, its child j, and j's children k and l.]

Let m_ji(x_i) denote the factor resulting from eliminating the variables below j, up to and including j; it is a function of x_i:

m_ji(x_i) = Σ_{x_j} ψ(x_j) ψ(x_i, x_j) m_kj(x_j) m_lj(x_j)

This is reminiscent of a message sent from j to i: m_ji(x_i) represents a "belief" about x_i from x_j!

Elimination on trees is equivalent to message passing along tree branches!


Computing P(X1)

[Figure: a tree with root X1, child X2, and X2's children X3 and X4; messages m32(x2), m42(x2), m21(x1) flow toward X1.]

The message passing protocol:
  • A node can send a message to its neighbors when (and only when) it has received messages from all its other neighbors.

Computing node marginals:
  • Naïve approach: consider each node as the root and execute the message passing algorithm


Computing P(X2)

[Figure: the same tree; messages m32(x2), m42(x2), m12(x2) flow toward X2.]

The same message passing protocol applies.


Computing P(X3)

[Figure: the same tree; messages m12(x2), m42(x2), m23(x3) flow toward X3.]

The same message passing protocol applies.


Computing node marginals

Naïve approach:
  • Complexity: NC
  • N is the number of nodes
  • C is the complexity of one complete message passing sweep

Alternative: a dynamic programming approach
  • 2-Pass algorithm (next slide)
  • Complexity: 2C!

A two-pass algorithm

[Figure: the same tree; a collect pass toward X1 computes m32(X2), m42(X2), m21(X1), then a distribute pass computes m12(X2), m23(X3), m24(X4).]

The message passing protocol:
  • A node can send a message to its neighbors when (and only when) it has received messages from all its other neighbors.
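The two-pass schedule on the slides' 4-node tree can be sketched as follows. The pairwise potential values are illustrative numbers of my own, not from the slides; the message update and the schedule (collect toward X1, then distribute) follow the slides.

```python
# Two-pass (collect/distribute) belief propagation on the tree
# X1 - X2, with X3 and X4 hanging off X2. All variables are binary.
import numpy as np

edges = [(1, 2), (2, 3), (2, 4)]
# Pairwise potentials psi[(i, j)][xi, xj]; illustrative values.
psi = {(1, 2): np.array([[1.0, 0.5], [0.5, 1.0]]),
       (2, 3): np.array([[1.0, 2.0], [2.0, 1.0]]),
       (2, 4): np.array([[3.0, 1.0], [1.0, 3.0]])}

neighbors = {i: set() for i in (1, 2, 3, 4)}
for i, j in edges:
    neighbors[i].add(j)
    neighbors[j].add(i)

def pairwise(i, j):
    # Each potential is stored once per edge; transpose for reverse access.
    return psi[(i, j)] if (i, j) in psi else psi[(j, i)].T

msgs = {}  # (sender, receiver) -> message vector over the receiver's variable

def send(j, i):
    # m_{j->i}(x_i) = sum_{x_j} psi(x_i, x_j) * prod of messages into j, except from i
    incoming = np.ones(2)
    for k in neighbors[j] - {i}:
        incoming *= msgs[(k, j)]
    msgs[(j, i)] = pairwise(i, j) @ incoming

# Collect pass toward root X1, then distribute pass away from X1: 2C total.
for j, i in [(3, 2), (4, 2), (2, 1)]:
    send(j, i)
for j, i in [(1, 2), (2, 3), (2, 4)]:
    send(j, i)

def marginal(i):
    # Node marginal: normalized product of all incoming messages.
    b = np.ones(2)
    for k in neighbors[i]:
        b *= msgs[(k, i)]
    return b / b.sum()
```

After the two passes, `marginal(i)` is available for every node from the cached messages, so all singleton marginals cost two sweeps rather than one sweep per root.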


Belief Propagation (SP-algorithm): Sequential implementation


Belief Propagation (SP-algorithm): Parallel synchronous implementation

  • For a node of degree d, whenever messages have arrived on any subset of d−1 edges, compute the message for the remaining edge and send it!
  • A pair of messages is computed for each edge, one for each direction
  • All incoming messages are eventually computed for each node


Correctness of BP on tree

Collollary: the synchronous implementation is "non-blocking" Thm: The Message Passage Guarantees obtaining all

marginals in the tree

What about non-tree?


Another view of SP: Factor Graphs

Example 1

[Figure: a BN over X1, ..., X5 with factors P(X1), P(X2), P(X3|X1,X2), P(X5|X1,X3), P(X4|X2,X3), redrawn as a factor graph with factor nodes fa(X1), fb(X2), fc(X3,X1,X2), fd(X5,X1,X3), fe(X4,X2,X3).]


Factor Graphs

Example 2: ψ(x1,x2,x3) = fa(x1,x2) fb(x2,x3) fc(x3,x1)
Example 3: ψ(x1,x2,x3) = fa(x1,x2,x3)

[Figure: the triangle over X1, X2, X3 drawn as a factor graph with three pairwise factors fa, fb, fc (Example 2), and with a single joint factor fa (Example 3).]


Factor Tree

A factor graph is a factor tree if the undirected graph obtained by ignoring the distinction between variable nodes and factor nodes is an undirected tree.

[Figure: ψ(x1,x2,x3) = fa(x1,x2,x3) drawn as a factor tree: the factor node fa connected to variable nodes X1, X2, X3.]
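The definition above can be checked mechanically: ignore the variable/factor distinction and test whether the resulting undirected graph is a tree (connected with |E| = |V| − 1). The helper below is my own illustrative sketch, not from the slides.

```python
# Check the factor-tree property of a factor graph.
from collections import deque

def is_factor_tree(variables, factors):
    """variables: list of variable names.
    factors: dict mapping factor name -> list of variable names in its scope."""
    nodes = list(variables) + list(factors)
    edges = [(f, v) for f, scope in factors.items() for v in scope]
    # A tree on n nodes has exactly n - 1 edges...
    if len(edges) != len(nodes) - 1:
        return False
    # ...and is connected: check reachability by breadth-first search.
    adj = {n: [] for n in nodes}
    for f, v in edges:
        adj[f].append(v)
        adj[v].append(f)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        n = queue.popleft()
        for m in adj[n]:
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return len(seen) == len(nodes)
```

On the slides' examples: the single joint factor fa(x1,x2,x3) gives a factor tree, while the triangle with three pairwise factors does not (it contains a cycle).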


Message Passing on a Factor Tree

[Figure: a variable node xi with neighboring factor nodes f1, fs, f3; a factor node fs with neighboring variable nodes xj, xi, xk.]

Two kinds of messages:
1. ν: from variables to factors, ν_is(x_i) = ∏_{t ∈ N(i)\s} µ_ti(x_i)
2. µ: from factors to variables, µ_si(x_i) = Σ_{x_{N(s)\i}} f_s(x_{N(s)}) ∏_{j ∈ N(s)\i} ν_js(x_j)


Message Passing on a Factor Tree, cont'd

Marginal probability of nodes:

P(x_i) ∝ ∏_{s ∈ N(i)} µ_si(x_i) ∝ ν_is(x_i) µ_si(x_i)   for any s ∈ N(i)

Message passing protocol:
  • A node can send a message to a neighboring node only when it has received messages from all its other neighbors.
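The two message types can be traced by hand on a tiny factor tree: variables x1, x2, x3 (binary) with factors fa(x1, x2) and fb(x2, x3). The factor tables are illustrative numbers of mine, not from the slides; the ν and µ updates and the marginal rule follow the equations above.

```python
# Sum-product on a small factor tree: x1 - fa - x2 - fb - x3.
import numpy as np

fa = np.array([[2.0, 1.0], [1.0, 2.0]])   # fa[x1, x2]
fb = np.array([[1.0, 3.0], [2.0, 1.0]])   # fb[x2, x3]

# nu: variable -> factor, mu: factor -> variable.
# Leaf variables x1 and x3 send uniform nu messages first.
nu_1a = np.ones(2)
nu_3b = np.ones(2)

# mu_{a->2}(x2) = sum_{x1} fa(x1, x2) nu_{1->a}(x1)
mu_a2 = fa.T @ nu_1a
# mu_{b->2}(x2) = sum_{x3} fb(x2, x3) nu_{3->b}(x3)
mu_b2 = fb @ nu_3b

# nu from x2: product of incoming mu messages except from the target factor.
nu_2a = mu_b2
nu_2b = mu_a2

# mu_{a->1}(x1) = sum_{x2} fa(x1, x2) nu_{2->a}(x2), similarly mu_{b->3}.
mu_a1 = fa @ nu_2a
mu_b3 = fb.T @ nu_2b

# Marginals: normalized product of all incoming mu messages at each variable.
p1 = mu_a1 / mu_a1.sum()
p2 = (mu_a2 * mu_b2) / (mu_a2 * mu_b2).sum()
p3 = mu_b3 / mu_b3.sum()
```

Note that every message obeys the protocol: each is sent only after the sender has heard from all its other neighbors.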


BP on a Factor Tree

[Figure: a factor tree over X1, X2, X3 with singleton factors fa, fb, fc and pairwise factors fd, fe; the messages µa1, µb2, µc3, ν1d, ν3e, µd2, µe2, ν2d, ν2e, ν2b, µd1, µe3, ν1a, ν3c illustrate a full schedule.]


Why factor graph?

Tree-like graphs to factor trees:

[Figure: a tree-like graph over X1, ..., X6 converted into a factor tree.]


Poly-trees to factor trees:

[Figure: a polytree over X1, ..., X5 converted into a factor tree.]


Why factor graph?

  • Because the factor graph turns tree-like graphs into factor trees,
  • and trees are a data structure that guarantees the correctness of BP!

Max-product algorithm: computing MAP probabilities

Replace the sum in each message with a max:

m^max_ji(x_i) = max_{x_j} ψ(x_j) ψ(x_i, x_j) ∏_{k ∈ N(j)\i} m^max_kj(x_j)


Max-product algorithm: computing MAP configurations using a final bookkeeping backward pass
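The bookkeeping backward pass can be sketched on a 3-node chain x1 − x2 − x3. The potential values are illustrative numbers of mine, not from the slides; the structure (max messages forward, recorded argmaxes decoded backward) is the algorithm described above.

```python
# Max-product on a binary chain x1 - x2 - x3, with backtracking to
# recover the MAP configuration.
import numpy as np

psi12 = np.array([[4.0, 1.0], [1.0, 3.0]])  # psi12[x1, x2]
psi23 = np.array([[2.0, 5.0], [1.0, 1.0]])  # psi23[x2, x3]

# Forward pass toward the root x1: messages carry a max instead of a sum,
# and we record the argmax for each receiving value (the "bookkeeping").
# m_{3->2}(x2) = max_{x3} psi23(x2, x3)
m32 = psi23.max(axis=1)
back3 = psi23.argmax(axis=1)        # best x3 for each x2

# m_{2->1}(x1) = max_{x2} psi12(x1, x2) * m_{3->2}(x2)
scores = psi12 * m32[None, :]       # scores[x1, x2]
m21 = scores.max(axis=1)
back2 = scores.argmax(axis=1)       # best x2 for each x1

# Backward bookkeeping pass: decode the MAP assignment from the root down.
x1 = int(np.argmax(m21))
x2 = int(back2[x1])
x3 = int(back3[x2])
map_value = m21[x1]                 # unnormalized probability of the MAP state
```

The forward pass alone gives only the MAP value; it is the stored argmax tables that let the backward pass recover the maximizing configuration itself.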


Summary

Sum-Product algorithm computes singleton marginal probabilities on:
  • Trees
  • Tree-like graphs
  • Poly-trees

Maximum a posteriori configurations can be computed by replacing sum with max in the sum-product algorithm:
  • Extra bookkeeping required


Inference on general GM

  • Now, what if the GM is not a tree-like graph?
  • Can we still directly run the message-passing protocol along its edges?
  • For non-trees, we do not have the guarantee that message passing will be consistent!
  • Then what? Construct a graph data structure from P that has a tree structure, and run message passing on it!

The junction tree algorithm!


Elimination Cliques

Recall that the induced dependency during marginalization is captured in elimination cliques:
  • Summation <-> elimination
  • Intermediate term <-> elimination clique
  • Can this lead to a generic inference algorithm?

[Figure: the sequence of elimination cliques on the example graph over A–H, repeated from earlier.]


A Clique Tree

[Figure: the elimination cliques of the example graph over A–H arranged as a clique tree, with messages m_h, m_g, m_e, m_f, m_b, m_c, m_d passed between neighboring cliques.]

m_e(a, c, d) = Σ_e p(e | c, d) m_g(e) m_f(e, a)


From Elimination to Message Passing

  • Elimination ≡ message passing on a clique tree
  • Messages can be reused

[Figure: the elimination steps on the example graph over A–H shown alongside the corresponding clique-tree messages m_h, m_g, m_e, m_f, m_b, m_c, m_d.]

m_e(a, c, d) = Σ_e p(e | c, d) m_g(e) m_f(e, a)

From Elimination to Message Passing

  • Elimination ≡ message passing on a clique tree
  • Another query ...
  • Messages m_f and m_h are reused; the others need to be recomputed

[Figure: the same clique tree with messages m_c, m_b, m_g, m_e, m_d, m_f, m_h for a different query node.]