Sum-Product: Message Passing (Belief Propagation)
Probabilistic Graphical Models, Sharif University of Technology, Spring 2018, Soleymani
All single-node marginals
If we need the full set of marginals, repeating the elimination algorithm for each individual variable is wasteful: it does not share intermediate terms.
Message-passing algorithms on graphs share these intermediate terms (the messages). Examples are sum-product and junction tree: upon convergence of the algorithms, we obtain marginal probabilities for all cliques of the original graph.
Tree
Sum-product works only on trees (and, as we will see, also on tree-like graphs).
Directed tree: every node except the root has exactly one parent.
Undirected tree: there is a unique path between any pair of nodes.
Parameterization
Consider a tree 𝒰 = (𝒲, ℰ) with potential functions 𝜚(𝑦𝑗) and 𝜚(𝑦𝑗, 𝑦𝑘):

𝑄(𝒚) = (1/𝑎) ∏_{𝑗∈𝒲} 𝜚(𝑦𝑗) ∏_{(𝑗,𝑘)∈ℰ} 𝜚(𝑦𝑗, 𝑦𝑘)

In directed trees with root 𝑠, we can set 𝜚(𝑦𝑠) = 𝑄(𝑦𝑠), 𝜚(𝑦𝑗) = 1 for all 𝑗 ≠ 𝑠, and 𝜚(𝑦𝑗, 𝑦𝑘) = 𝑄(𝑦𝑘|𝑦𝑗), where 𝑦𝑗 is the parent of 𝑦𝑘. Then 𝑎 = 1 and

𝑄(𝒚) = 𝑄(𝑦𝑠) ∏_{(𝑗,𝑘)∈ℰ} 𝑄(𝑦𝑘|𝑦𝑗)

When we have evidence 𝑦𝑗 = 𝑦̄𝑗 on a variable, we replace 𝑦𝑗 by 𝑦̄𝑗 in all factors in which it appears.
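As a small sketch of this parameterization (the 3-node chain and its numbers are hypothetical, not from the slides): when the root's singleton potential is 𝑄(𝑦𝑠), the edge potentials are the conditionals, and all other singleton potentials are 1, the product of potentials already sums to one, so the normalizer is 𝑎 = 1.

```python
# Hypothetical binary chain y1 -> y2 -> y3.
# Root singleton potential rho(y1) = Q(y1); edge potentials rho(yj, yk) = Q(yk|yj);
# all other singleton potentials equal 1, so the normalizer a = 1.
p_y1 = [0.6, 0.4]
p_y2_given_y1 = [[0.7, 0.3], [0.2, 0.8]]   # rows indexed by y1
p_y3_given_y2 = [[0.5, 0.5], [0.9, 0.1]]   # rows indexed by y2

def joint(y1, y2, y3):
    # Q(y) = (1/a) * product of all potentials, with a = 1 for this choice
    return p_y1[y1] * p_y2_given_y1[y1][y2] * p_y3_given_y2[y2][y3]

# Summing the unnormalized product over all assignments gives exactly 1.
total = sum(joint(a, b, c) for a in range(2) for b in range(2) for c in range(2))
```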
Sum-product: elimination view
Query node 𝑠; elimination order: inverse of a topological order.
Starts from the leaves and generates elimination cliques of size at most two.
Elimination of each node can be considered as message passing (or belief propagation): elimination on trees is equivalent to message passing along tree branches.
Instead of eliminating a node, we preserve the node and compute a message from it to its parent. This message is equivalent to the factor resulting from the elimination of that node and all of the nodes in its subtree.
Messages
A node can send a message to one of its neighbors when (and only when) it has received messages from all its other neighbors.
(Figure: the message that 𝑘 sends to 𝑗, passed toward the root.)
Messages and marginal distribution
Message that 𝑌𝑘 sends to 𝑌𝑗:

𝑛𝑘𝑗(𝑦𝑗) = Σ_{𝑦𝑘} 𝜚(𝑦𝑘) 𝜚(𝑦𝑗, 𝑦𝑘) ∏_{𝑙∈𝒪(𝑘)∖𝑗} 𝑛𝑙𝑘(𝑦𝑘)

The message is a function of only 𝑦𝑗. Marginal at the query node 𝑠:

𝑞(𝑦𝑠) ∝ 𝜚(𝑦𝑠) ∏_{𝑙∈𝒪(𝑠)} 𝑛𝑙𝑠(𝑦𝑠)
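A minimal Python sketch of this recursion (the 4-node tree matches the following example, but the potential values are hypothetical): each message multiplies the sender's singleton potential, the edge potential, and the messages from the sender's other neighbors, then sums out the sender.

```python
from math import prod
import itertools

K = 2                                              # binary variables
nbrs = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}      # tree: edges (1,2), (2,3), (2,4)
edges = [(1, 2), (2, 3), (2, 4)]

# Hypothetical potentials
rho1 = {1: [0.6, 0.4], 2: [1.0, 1.0], 3: [0.5, 0.5], 4: [0.3, 0.7]}
rho2 = {(1, 2): [[1.0, 0.5], [0.5, 2.0]],
        (2, 3): [[2.0, 1.0], [1.0, 1.0]],
        (2, 4): [[1.0, 3.0], [1.0, 1.0]]}

def edge_pot(j, k, yj, yk):
    # rho(y_j, y_k), looked up regardless of the edge's orientation in the table
    return rho2[(j, k)][yj][yk] if (j, k) in rho2 else rho2[(k, j)][yk][yj]

def message(k, j):
    # n_kj(y_j) = sum_{y_k} rho(y_k) rho(y_j, y_k) prod_{l in N(k)\{j}} n_lk(y_k)
    inc = [message(l, k) for l in nbrs[k] if l != j]
    return [sum(rho1[k][yk] * edge_pot(j, k, yj, yk) * prod(m[yk] for m in inc)
                for yk in range(K))
            for yj in range(K)]

def marginal(s):
    # q(y_s) proportional to rho(y_s) * product of incoming messages
    un = [rho1[s][ys] * prod(message(l, s)[ys] for l in nbrs[s]) for ys in range(K)]
    z = sum(un)
    return [u / z for u in un]
```

Because each recursive call excludes the receiving neighbor, the recursion terminates at the leaves, exactly mirroring elimination from the leaves inward.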
Messages and marginal: Example
Compute 𝑞(𝑦1):

𝑞(𝑦1) ∝ 𝜚(𝑦1) 𝑛21(𝑦1)
𝑛21(𝑦1) = Σ_{𝑦2} 𝜚(𝑦2) 𝜚(𝑦1, 𝑦2) 𝑛32(𝑦2) 𝑛42(𝑦2)
𝑛32(𝑦2) = Σ_{𝑦3} 𝜚(𝑦3) 𝜚(𝑦2, 𝑦3)
𝑛42(𝑦2) = Σ_{𝑦4} 𝜚(𝑦4) 𝜚(𝑦2, 𝑦4)

𝑛21(𝑦1) is the product of the remaining factors after eliminating all variables except 𝑦1.
Messages and marginal: Example
Compute 𝑞(𝑦2):

𝑞(𝑦2) ∝ 𝜚(𝑦2) 𝑛12(𝑦2) 𝑛32(𝑦2) 𝑛42(𝑦2)
𝑛12(𝑦2) = Σ_{𝑦1} 𝜚(𝑦1) 𝜚(𝑦1, 𝑦2)
𝑛32(𝑦2) = Σ_{𝑦3} 𝜚(𝑦3) 𝜚(𝑦2, 𝑦3)
𝑛42(𝑦2) = Σ_{𝑦4} 𝜚(𝑦4) 𝜚(𝑦2, 𝑦4)
Messages on a tree
Messages can be reused to find the marginals of different query variables: messages on the tree provide a data structure for caching computations.
For example, in a tree over 𝑌1, …, 𝑌5 we need 𝑛32(𝑦2) to find both 𝑄(𝑌1) and 𝑄(𝑌2).
From elimination to message passing
Recall the ELIMINATION algorithm:
Choose an ordering 𝑍 in which the query node 𝑓 is the final node.
Place all potentials on an active list.
Eliminate node 𝑖 by removing all potentials containing it and summing over 𝑥𝑖.
Place the resultant factor back on the list.

For a TREE graph:
Choose the query node 𝑓 as the root of the tree.
View the tree as a directed tree with edges pointing from 𝑓 towards the leaves.
Use an elimination ordering based on the reverse topological order.
Elimination of each node can be considered as message passing directly along tree branches.

Thus, we can use the tree itself as a data structure to do general inference!
This slide has been adapted from Eric Xing, PGM 10-708, CMU.
Computing all node marginals
We can cover all possible elimination orders (generating only elimination cliques of size two) by computing all possible messages, of which there are only 2|ℰ|.
To allow every node to be the root, we thus need just 2|ℰ| messages, and these messages can be reused, instead of running the elimination algorithm once per query node.
Dynamic programming approach: a 2-pass algorithm that saves and reuses messages. After the two passes, a pair of messages (one for each direction) has been computed for each edge.
Messages required to compute all node marginals
Computing node marginals

Naïve approach:
Complexity: 𝑁 × 𝐶, where 𝑁 is the number of nodes and 𝐶 is the complexity of a complete message passing.
Alternative dynamic programming approach (2-pass algorithm):
Complexity: 2𝐶!
A two-pass message-passing schedule
Arbitrarily pick a node as the root.
First pass: starts at the leaves and proceeds inward; each node passes a message to its parent. It continues until the root has obtained messages from all of its adjoining nodes.
Second pass: starts at the root and passes the messages back out; messages are passed in the reverse direction. It continues until all leaves have received their messages.
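The two passes can be sketched as follows (the 3-node chain and its potentials are hypothetical; node 2 plays the root). Each directed edge carries exactly one cached message, 2|ℰ| in total, and every node marginal is then read off locally:

```python
from math import prod

K = 2                                    # binary variables
nbrs = {1: [2], 2: [1, 3], 3: [2]}       # chain 1 - 2 - 3
# Hypothetical potentials
rho1 = {1: [0.4, 0.6], 2: [1.0, 1.0], 3: [0.8, 0.2]}
rho2 = {(1, 2): [[2.0, 1.0], [1.0, 3.0]], (2, 3): [[1.0, 0.5], [0.5, 1.0]]}

def edge_pot(j, k, yj, yk):
    # rho(y_j, y_k), looked up regardless of the edge's orientation in the table
    return rho2[(j, k)][yj][yk] if (j, k) in rho2 else rho2[(k, j)][yk][yj]

msgs = {}  # (k, j) -> n_kj; one entry per directed edge, 2|E| in total

def send(k, j):
    msgs[(k, j)] = [sum(rho1[k][yk] * edge_pot(j, k, yj, yk) *
                        prod(msgs[(l, k)][yk] for l in nbrs[k] if l != j)
                        for yk in range(K))
                    for yj in range(K)]

# First pass: leaves toward the chosen root (node 2)
send(1, 2)
send(3, 2)
# Second pass: root back toward the leaves
send(2, 1)
send(2, 3)

def marginal(s):
    # Every node's marginal is now available from its cached incoming messages.
    un = [rho1[s][ys] * prod(msgs[(l, s)][ys] for l in nbrs[s]) for ys in range(K)]
    z = sum(un)
    return [u / z for u in un]
```

Note that the second-pass message `send(2, 1)` reuses the cached first-pass message from node 3, which is exactly the sharing of intermediate terms that repeated elimination misses.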
Asynchronous two-pass message-passing
First pass: upward. Second pass: downward.
Sum-product algorithm: example
(Figure: the message 𝑛21(𝑦1) passed along the tree.)
Parallel (synchronous) message-passing
For a node of degree 𝑑, whenever messages have arrived on any subset of 𝑑 − 1 edges, compute the message for the remaining edge and send it!
A pair of messages is computed for each edge, one for each direction, and all incoming messages are eventually computed for each node.
Parallel message-passing
Message-passing protocol: a node can send a message to a neighboring node when and only when it has received messages from all of its other neighbors.
Correctness of parallel message passing on trees: the synchronous implementation is "non-blocking".
Theorem: this message-passing protocol is guaranteed to obtain all marginals in the tree.
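A sketch of the schedule this protocol induces (the 5-node tree is hypothetical, and only the send order is traced, not numeric messages): in each synchronous round, every directed edge whose sender has already heard from all of its other neighbors fires.

```python
# Hypothetical 5-node tree: edges Y1-Y2, Y2-Y3, Y2-Y4, Y4-Y5
nbrs = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2, 5], 5: [4]}
received = {n: set() for n in nbrs}   # which neighbors each node has heard from
sent = set()                          # directed edges already used
rounds = []                           # which messages fire in each synchronous round

while len(sent) < 2 * 4:              # a pair of messages per edge: 2|E| = 8
    ready = [(k, j) for k in nbrs for j in nbrs[k]
             if (k, j) not in sent and set(nbrs[k]) - {j} <= received[k]]
    for k, j in ready:                # all ready messages fire "in parallel"
        sent.add((k, j))
        received[j].add(k)
    rounds.append(ready)
```

In the first round only the leaves can send; the schedule then sweeps inward and back out, terminating with exactly one message per direction on every edge, as the theorem guarantees.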
Parallel message passing: Example
Tree-like graphs
The sum-product message-passing idea can also be extended to work in tree-like graphs (e.g., polytrees).
Although the moralized undirected graphs resulting from polytrees are not trees, the corresponding factor graphs are trees.
(Figure: a polytree, in which nodes can have multiple parents; its moralized graph; and its factor graph.)
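As a tiny check of this claim (the polytree 𝑌1 → 𝑌3 ← 𝑌2 and the tree test are illustrative sketches): moralization marries the parents 𝑌1 and 𝑌2, closing a cycle, while the factor graph, with one factor node per CPD, remains a tree.

```python
# Hypothetical polytree: Y1 -> Y3 <- Y2 (Y3 has two parents)
def is_tree(nodes, edges):
    # A connected graph is a tree iff |E| = |V| - 1 (connectivity assumed here)
    return len(edges) == len(nodes) - 1

# Moralization marries the parents 1 and 2, closing the triangle {1, 2, 3}
moralized_edges = {(1, 3), (2, 3), (1, 2)}

# Factor graph: one factor node per CPD: f1 = Q(y1), f2 = Q(y2), f3 = Q(y3|y1,y2)
fg_nodes = [1, 2, 3, 'f1', 'f2', 'f3']
fg_edges = {(1, 'f1'), (2, 'f2'), (1, 'f3'), (2, 'f3'), (3, 'f3')}
```

Since the factor graph is a tree, sum-product message passing between variable and factor nodes remains exact on the polytree.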