The Belief Propagation (Sum-Product) Algorithm
School of Computer Science
Probabilistic Graphical Models (10-708), Lecture 5, Sep 31, 2007
Eric Xing
Reading: J-Chap 4

  1. From Elimination to Belief Propagation
     (Figure: a signaling-pathway network with nodes X1..X8: Receptor A, Receptor B, Kinase C, Kinase D, Kinase E, TF F, Gene G, Gene H)
     - Recall that induced dependency during marginalization is captured in elimination cliques:
       - Summation <-> elimination
       - Intermediate term <-> elimination clique
     (Figure: elimination on a graph over nodes A-H, showing the elimination cliques produced at each step)
     - Can this lead to a generic inference algorithm?

  2. Tree GMs
     - Directed tree: all nodes except the root have exactly one parent
     - Undirected tree: a unique path exists between any pair of nodes
     - Poly-tree: nodes can have multiple parents (we will come back to this later)

     Equivalence of directed and undirected trees
     - Any undirected tree can be converted to a directed tree by choosing a root node and directing all edges away from it
     - A directed tree and the corresponding undirected tree make the same conditional independence assertions
     - Parameterizations are essentially the same:
       - Undirected tree: p(x) = (1/Z) prod_{i in V} psi(x_i) prod_{(i,j) in E} psi(x_i, x_j)
       - Directed tree: p(x) = p(x_r) prod_{(i,j) in E} p(x_j | x_i)
       - Equivalence: set psi(x_r) = p(x_r), psi(x_i, x_j) = p(x_j | x_i), all other node potentials to 1; then Z = 1
     - Evidence: ?
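The equivalence above can be checked numerically. The sketch below (hypothetical potentials, binary variables) starts from an undirected 3-node chain, roots it at X1, reads the directed CPDs off the joint's marginals, and verifies that the directed parameterization reproduces the same joint:

```python
import itertools

# Hypothetical positive edge potentials on the undirected chain X1 - X2 - X3.
psi12 = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 2.0, (1, 1): 1.5}
psi23 = {(0, 0): 0.3, (0, 1): 1.2, (1, 0): 0.7, (1, 1): 1.0}
states = [0, 1]

# Undirected parameterization: p(x) = (1/Z) psi(x1,x2) psi(x2,x3).
Z = sum(psi12[(a, b)] * psi23[(b, c)]
        for a, b, c in itertools.product(states, repeat=3))

def p(a, b, c):
    return psi12[(a, b)] * psi23[(b, c)] / Z

# Root the tree at X1 and read the directed CPDs off the joint's marginals.
p1 = {a: sum(p(a, b, c) for b in states for c in states) for a in states}
p2 = {b: sum(p(a, b, c) for a in states for c in states) for b in states}
p2g1 = {(b, a): sum(p(a, b, c) for c in states) / p1[a]
        for a in states for b in states}                  # p(x2 | x1)
p3g2 = {(c, b): sum(p(a, b, c) for a in states) / p2[b]
        for b in states for c in states}                  # p(x3 | x2)

# Same conditional independencies, same joint: p(x) = p(x1) p(x2|x1) p(x3|x2).
for a, b, c in itertools.product(states, repeat=3):
    assert abs(p(a, b, c) - p1[a] * p2g1[(b, a)] * p3g2[(c, b)]) < 1e-12
```

The final loop is exactly the equivalence claim: because X1 is independent of X3 given X2 in the chain, the rooted CPD product recovers the undirected joint term by term.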

  3. From elimination to message passing
     Recall the ELIMINATION algorithm:
     - Choose an ordering Z in which the query node f is the final node
     - Place all potentials on an active list
     - Eliminate node i by removing all potentials containing i, taking the sum/product over x_i
     - Place the resultant factor back on the list
     For a TREE graph:
     - Choose the query node f as the root of the tree
     - View the tree as a directed tree with edges pointing toward f
     - Use an elimination ordering based on a depth-first traversal
     - Elimination of each node can then be viewed as message passing (or belief propagation) directly along tree branches, rather than on some transformed graph
     - Thus, we can use the tree itself as a data structure to do general inference!

     The elimination algorithm

     Procedure Initialize(G, Z)
       1. Let Z_1, ..., Z_k be an ordering of Z such that Z_i ≺ Z_j iff i < j
       2. Initialize F with the full set of factors

     Procedure Evidence(E)
       1. for each i in I_E, F = F ∪ {δ(E_i, e_i)}

     Procedure Sum-Product-Eliminate-Var(F, Z)   // F: set of factors, Z: variable to be eliminated
       1. F' ← {φ ∈ F : Z ∈ Scope[φ]}
       2. F'' ← F − F'
       3. ψ ← ∏_{φ ∈ F'} φ
       4. τ ← Σ_Z ψ
       5. return F'' ∪ {τ}

     Procedure Sum-Product-Variable-Elimination(F, Z, ≺)
       1. for i = 1, ..., k
       2.   F ← Sum-Product-Eliminate-Var(F, Z_i)
       3. φ* ← ∏_{φ ∈ F} φ
       4. return φ*
     followed by Normalization(φ*)

     Procedure Normalization(φ*)
       1. P(X | E) = φ*(X) / Σ_x φ*(X)
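The procedures above can be sketched in Python for small table factors. This is an illustrative dict-based representation (binary variables; helper names like `eliminate_var` are ours, not from the slides):

```python
import itertools

# A factor is (scope, table): scope is a tuple of variable names; table maps an
# assignment tuple (aligned with scope) to a nonnegative real.

def multiply(f, g):
    (fs, ft), (gs, gt) = f, g
    scope = fs + tuple(v for v in gs if v not in fs)
    table = {}
    for assign in itertools.product([0, 1], repeat=len(scope)):
        env = dict(zip(scope, assign))
        table[assign] = (ft[tuple(env[v] for v in fs)] *
                         gt[tuple(env[v] for v in gs)])
    return scope, table

def sum_out(f, var):
    fs, ft = f
    scope = tuple(v for v in fs if v != var)
    table = {}
    for assign, val in ft.items():
        env = dict(zip(fs, assign))
        key = tuple(env[v] for v in scope)
        table[key] = table.get(key, 0.0) + val
    return scope, table

def eliminate_var(factors, var):
    # F' = factors mentioning var; psi = their product; tau = sum over var.
    rel = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    psi = rel[0]
    for f in rel[1:]:
        psi = multiply(psi, f)
    return rest + [sum_out(psi, var)]

def variable_elimination(factors, order):
    for var in order:
        factors = eliminate_var(factors, var)
    phi = factors[0]
    for f in factors[1:]:
        phi = multiply(phi, f)
    total = sum(phi[1].values())          # Normalization step
    return phi[0], {k: v / total for k, v in phi[1].items()}

# Query P(a) in the chain a -> b -> c by eliminating c, then b:
fa = (('a',), {(0,): 0.6, (1,): 0.4})                                    # p(a)
fb = (('a', 'b'), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8})  # p(b|a)
fc = (('b', 'c'), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.5})  # p(c|b)
scope, marg = variable_elimination([fa, fb, fc], ['c', 'b'])
# Since the CPDs already sum to 1 over the eliminated variables,
# the result is simply p(a): {(0,): 0.6, (1,): 0.4}.
```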

  4. Message passing for trees
     - Let m_ji(x_i) denote the factor resulting from eliminating all variables below j, up to i; it is a function of x_i:
         m_ji(x_i) = Σ_{x_j} psi(x_j) psi(x_i, x_j) ∏_{k in N(j)\{i}} m_kj(x_j)
     - This is reminiscent of a message sent from j to i
     - m_ji(x_i) represents a "belief" about x_i coming from x_j!
     (Figure: a tree rooted at f, with node i above children j, k, l)
     - Elimination on trees is equivalent to message passing along tree branches!

  5. The message passing protocol:
     - A node can send a message to a neighbor when (and only when) it has received messages from all its other neighbors.
     - Computing node marginals, naive approach: treat each node in turn as the root and execute the message-passing algorithm.
       - Computing P(X1): messages m_21(x_1), m_32(x_2), m_42(x_2) flow toward root X1
       - Computing P(X2): messages m_12(x_2), m_32(x_2), m_42(x_2) flow toward root X2
     (Tree used in the examples: X2 is connected to X1, X3, and X4)

  6. The message passing protocol (cont'd):
     - Computing P(X3): messages m_12(x_2), m_23(x_3), m_42(x_2) flow toward root X3

     Computing node marginals
     - Naive approach complexity: NC, where
       - N is the number of nodes
       - C is the cost of one complete message-passing sweep
     - Alternative dynamic-programming approach: a 2-pass algorithm (next slide)
       - Complexity: 2C!

  7. The message passing protocol:
     - A two-pass algorithm on the tree (X2 connected to X1, X3, X4):
       - Upward pass toward the root X1: m_32(X2), m_42(X2), then m_21(X1)
       - Downward pass from the root: m_12(X2), then m_23(X3) and m_24(X4)

     Belief Propagation (SP-algorithm): sequential implementation
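The two-pass schedule can be sketched as follows for the slide's tree (X2 linked to X1, X3, X4). Potentials are hypothetical and all variables binary:

```python
# Two-pass sum-product on the tree: X2 is connected to X1, X3, X4.
nbrs = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}
psi = {1: [0.5, 0.5], 2: [0.7, 0.3], 3: [0.4, 0.6], 4: [0.9, 0.1]}
edge = {(1, 2): [[1.0, 0.5], [0.5, 1.0]]}      # psi(x_i, x_j), hypothetical
edge[(2, 3)] = edge[(2, 4)] = edge[(1, 2)]     # same potential on every edge

def edge_pot(i, j, xi, xj):
    return edge[(i, j)][xi][xj] if (i, j) in edge else edge[(j, i)][xj][xi]

msgs = {}   # msgs[(j, i)] = message from j to i, a table over x_i

def send(j, i):
    # m_{j->i}(x_i) = sum_{x_j} psi(x_j) psi(x_i,x_j) prod_{k in N(j)\{i}} m_{k->j}(x_j)
    m = []
    for xi in (0, 1):
        total = 0.0
        for xj in (0, 1):
            v = psi[j][xj] * edge_pot(i, j, xi, xj)
            for k in nbrs[j]:
                if k != i:
                    v *= msgs[(k, j)][xj]
            total += v
        m.append(total)
    msgs[(j, i)] = m

def collect(i, parent):                # pass 1: leaves toward the root
    for j in nbrs[i]:
        if j != parent:
            collect(j, i)
    if parent is not None:
        send(i, parent)

def distribute(i, parent):             # pass 2: root back toward the leaves
    for j in nbrs[i]:
        if j != parent:
            send(i, j)
            distribute(j, i)

collect(1, None)
distribute(1, None)

def marginal(i):                       # P(x_i) from node potential and incoming messages
    b = [psi[i][x] for x in (0, 1)]
    for j in nbrs[i]:
        b = [b[x] * msgs[(j, i)][x] for x in (0, 1)]
    z = sum(b)
    return [v / z for v in b]
```

After the two passes, every edge carries both messages, so `marginal(i)` is available for all four nodes from one collect and one distribute sweep (cost 2C instead of NC).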

  8. Belief Propagation (SP-algorithm): parallel synchronous implementation
     - For a node of degree d: whenever messages have arrived on any subset of d-1 edges, compute the message for the remaining edge and send it!
     - A pair of messages is computed for each edge, one in each direction
     - All incoming messages are eventually computed for every node

     Correctness of BP on trees
     - Corollary: the synchronous implementation is "non-blocking"
     - Theorem: message passing guarantees obtaining all marginals in the tree
     - What about non-trees?
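A minimal sketch of the parallel synchronous schedule on a 3-node chain (hypothetical potentials, binary variables): all messages start at 1 and are recomputed in parallel each round; on a tree, they become exact after at most diameter-many rounds, which is the "non-blocking" behavior the corollary refers to:

```python
# Synchronous ("flooding") BP on the chain X1 - X2 - X3.
nbrs = {1: [2], 2: [1, 3], 3: [2]}
psi = {1: [0.6, 0.4], 2: [0.5, 0.5], 3: [0.2, 0.8]}
edge = [[2.0, 1.0], [1.0, 2.0]]       # psi(x_i, x_j), same on both edges

msgs = {(j, i): [1.0, 1.0] for j in nbrs for i in nbrs[j]}
for _ in range(2):                     # diameter of the chain = 2 rounds
    new = {}
    for (j, i) in msgs:
        m = []
        for xi in (0, 1):
            total = 0.0
            for xj in (0, 1):
                v = psi[j][xj] * edge[xi][xj]
                for k in nbrs[j]:
                    if k != i:
                        v *= msgs[(k, j)][xj]
                total += v
            m.append(total)
        new[(j, i)] = m
    msgs = new                         # synchronous update: swap all messages at once

def marginal(i):
    b = [psi[i][x] for x in (0, 1)]
    for j in nbrs[i]:
        b = [b[x] * msgs[(j, i)][x] for x in (0, 1)]
    z = sum(b)
    return [v / z for v in b]
```

Leaf-to-center messages are exact after round 1; the center-to-leaf messages, which depend on them, become exact in round 2, matching the protocol's "d-1 edges received" condition.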

  9. Another view of SP: Factor Graphs
     - Example 1: a Bayesian network and its factor graph over X1..X5
         P(X1) P(X2) P(X3|X1,X2) P(X5|X1,X3) P(X4|X2,X3)
         = f_a(X1) f_b(X2) f_c(X3,X1,X2) f_d(X5,X1,X3) f_e(X4,X2,X3)
     - Example 2: an undirected model with pairwise potentials
         ψ(x_1, x_2, x_3) = f_a(x_1, x_2) f_b(x_2, x_3) f_c(x_3, x_1)
     - Example 3: a single joint potential
         ψ(x_1, x_2, x_3) = f_a(x_1, x_2, x_3)

  10. Factor Trees
     - A factor graph is a factor tree if the undirected graph obtained by ignoring the distinction between variable nodes and factor nodes is an undirected tree
       (e.g., ψ(x_1, x_2, x_3) = f_a(x_1, x_2, x_3) yields a factor tree)

     Message passing on a factor tree
     - Two kinds of messages:
       1. ν, from variables to factors: ν_is(x_i) = ∏_{t in N(i)\{s}} µ_ti(x_i)
       2. µ, from factors to variables: µ_si(x_i) = Σ_{x_{N(s)\{i}}} f_s(x_{N(s)}) ∏_{j in N(s)\{i}} ν_js(x_j)
     (Figure: a variable node x_i with neighboring factor f_s; a factor node f_s with neighboring variables x_j, x_k)

  11. Message passing on a factor tree, cont'd
     - Message passing protocol: a node can send a message to a neighboring node only when it has received messages from all its other neighbors
     - Marginal probability of a node:
         P(x_i) ∝ ∏_{s in N(i)} µ_si(x_i) ∝ ν_is(x_i) µ_si(x_i)   for any s in N(i)

     BP on a factor tree (example)
     - Variables X1, X2, X3 with singleton factors f_a, f_b, f_c attached to X1, X2, X3, and pairwise factors f_d on (X1, X2) and f_e on (X2, X3)
     - Messages exchanged: ν_1a, µ_a1, ν_1d, µ_d1, ν_2b, µ_b2, ν_2d, µ_d2, ν_2e, µ_e2, ν_3c, µ_c3, ν_3e, µ_e3
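The ν/µ updates can be sketched for the slide's factor-tree example (singleton factors f_a, f_b, f_c and pairwise factors f_d, f_e; the tables are hypothetical, binary variables):

```python
import itertools

# Factor tree: f_a - X1 - f_d - X2 - f_e - X3 - f_c, with f_b attached to X2.
factors = {
    'a': (('x1',), {(0,): 0.6, (1,): 0.4}),
    'b': (('x2',), {(0,): 0.5, (1,): 0.5}),
    'c': (('x3',), {(0,): 0.3, (1,): 0.7}),
    'd': (('x1', 'x2'), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}),
    'e': (('x2', 'x3'), {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 3.0, (1, 1): 1.0}),
}
var_nbrs = {'x1': ['a', 'd'], 'x2': ['b', 'd', 'e'], 'x3': ['c', 'e']}

def nu(i, s):
    # nu_{i->s}(x_i) = product of factor-to-variable messages from N(i) \ {s}
    out = [1.0, 1.0]
    for t in var_nbrs[i]:
        if t != s:
            m = mu(t, i)
            out = [out[x] * m[x] for x in (0, 1)]
    return out

def mu(s, i):
    # mu_{s->i}(x_i) = sum over the factor's other variables of f_s times
    # the incoming variable-to-factor messages
    scope, table = factors[s]
    others = [v for v in scope if v != i]
    incoming = {v: nu(v, s) for v in others}
    out = []
    for xi in (0, 1):
        total = 0.0
        for assign in itertools.product((0, 1), repeat=len(others)):
            env = dict(zip(others, assign))
            env[i] = xi
            val = table[tuple(env[v] for v in scope)]
            for v in others:
                val *= incoming[v][env[v]]
            total += val
        out.append(total)
    return out

def marginal(i):
    # P(x_i) proportional to the product of all incoming mu messages
    b = [1.0, 1.0]
    for s in var_nbrs[i]:
        m = mu(s, i)
        b = [b[x] * m[x] for x in (0, 1)]
    z = sum(b)
    return [v / z for v in b]
```

Because the graph is a factor tree, the mutual recursion between `nu` and `mu` bottoms out at the leaf factors, which is the message-passing protocol in recursive form.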

  12. Why factor graphs?
     - Tree-like graphs to factor trees: a graph over X1..X6 whose cliques hang together in a tree becomes a factor tree (one factor node per clique)
     - Poly-trees to factor trees: a poly-tree over X1..X5 becomes a factor tree (one factor node per family)

  13. Why factor graphs?
     - Because converting to a factor graph turns tree-like graphs into factor trees, and trees are a data structure that guarantees the correctness of BP!

     Max-product algorithm: computing MAP probabilities
     (Figure: the same tree rooted at f, with node i above children j, k, l)

  14. Max-product algorithm: computing MAP configurations using a final bookkeeping backward pass

     Summary
     - The sum-product algorithm computes singleton marginal probabilities on:
       - Trees
       - Tree-like graphs
       - Poly-trees
     - Maximum a posteriori configurations can be computed by replacing sum with max in the sum-product algorithm
       - Extra bookkeeping is required
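A sketch of max-product with the bookkeeping backward pass, on a 3-node chain with hypothetical potentials: the upward pass replaces sums with maxima and records each argmax, and the backward pass reads off the MAP configuration:

```python
# Max-product on the chain X1 - X2 - X3 (binary variables), rooted at X1.
psi = {1: [0.6, 0.4], 2: [0.5, 0.5], 3: [0.2, 0.8]}
edge = [[2.0, 1.0], [1.0, 2.0]]       # psi(x_i, x_j), shared by both edges

back = {}   # bookkeeping: back[(j, i)][x_i] = maximizing value of x_j

def max_msg(j, i, incoming):
    # m_{j->i}(x_i) = max_{x_j} psi(x_j) psi(x_i, x_j) * incoming(x_j)
    m, arg = [], []
    for xi in (0, 1):
        vals = [psi[j][xj] * edge[xi][xj] * incoming[xj] for xj in (0, 1)]
        m.append(max(vals))
        arg.append(vals.index(max(vals)))
    back[(j, i)] = arg
    return m

# Upward (collect) pass toward the root X1.
m32 = max_msg(3, 2, [1.0, 1.0])
m21 = max_msg(2, 1, m32)

# Decide at the root, then backtrack through the stored argmaxes.
root_vals = [psi[1][x] * m21[x] for x in (0, 1)]
x1 = root_vals.index(max(root_vals))
x2 = back[(2, 1)][x1]
x3 = back[(3, 2)][x2]
map_config = (x1, x2, x3)             # jointly most probable assignment
```

Note that `max(root_vals)` is the (unnormalized) probability of the MAP configuration itself, which is why the backward pass, not the per-node maxima, must be used to recover a jointly consistent assignment.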

  15. Inference on general GMs
     - Now, what if the GM is not a tree-like graph?
       - Can we still directly run the message-passing protocol along its edges?
       - For non-trees, we do not have the guarantee that message passing will be consistent!
     - Then what?
       - Construct a graph data structure from P that has a tree structure, and run message passing on it!
       - This is the junction tree algorithm

     Elimination cliques
     - Recall that induced dependency during marginalization is captured in elimination cliques:
       - Summation <-> elimination
       - Intermediate term <-> elimination clique
     (Figure: elimination on the graph over nodes A-H, as before)
     - Can this lead to a generic inference algorithm?

  16. A clique tree
     (Figure: clique tree for the A-H example, with messages m_b, m_c, m_d, m_e, m_f, m_g, m_h passed between elimination cliques)
     - Example message:
         m_e(a, c, d) = Σ_e p(e | c, d) m_g(e) m_f(a, e)

     From elimination to message passing
     - Elimination ≡ message passing on a clique tree
     (Figure: the sequence of graphs produced by eliminating H, G, F, E, D, C, B from the A-H example, and the corresponding clique tree with the same message m_e(a, c, d))
     - Messages can be reused
