Junction-tree algorithm
Probabilistic Graphical Models
Sharif University of Technology, Spring 2018
Soleymani
Inference on general GM
2
Now, what if the GM is not a tree-like graph? Can we still directly run the message-passing protocol along its edges?
For non-trees, we do not have the guarantee that message-passing will be consistent!
Solution: construct a graph data structure from P that has a tree structure, and run message-passing on it. This is the junction-tree algorithm.
Junction-tree algorithm: a general approach
3
The junction-tree algorithm, as opposed to sum-product on trees, can be applied to general graphs.
The junction-tree algorithm, as opposed to the elimination algorithm, is not "query-oriented":
it enables us to record and reuse the intermediate factors to respond to multiple queries simultaneously.
Upon convergence of the algorithm, we obtain marginal probabilities for all cliques of the original graph.
Example: variable elimination and cluster tree
4
Elimination order: 𝑌6, 𝑌5, 𝑌4, 𝑌3, 𝑌2
[Figure: the original graph over 𝑌1–𝑌6 and its moralized graph]
Example: elimination cliques
5
Elimination order: 𝑌6, 𝑌5, 𝑌4, 𝑌3, 𝑌2
[Figure: the elimination cliques formed at each step of eliminating 𝑌6, 𝑌5, 𝑌4, 𝑌3, 𝑌2]
Example: clique tree obtained by VE
6
Elimination order: 𝑌6, 𝑌5, 𝑌4, 𝑌3, 𝑌2
[Figure: the graph over 𝑌1–𝑌6 and the clique tree produced by VE]
The clique tree contains the cliques (fully connected subsets) generated as elimination executes.
The cluster graph induced by an execution of VE is necessarily a tree:
indeed, after an elimination, the corresponding elimination clique cannot reappear.
Example: clique tree obtained by VE (cont.)
7
[Figure: the same clique tree as above, with the maximal cliques highlighted]
Example: Elimination ≡ message passing on a clique tree
8
This slide has been adapted from Eric Xing, PGM 10-708, CMU.
Computation reuse
9
For another query, messages $n_g$ and $n_h$ are reused; the other messages need to be recomputed.
This slide has been adapted from Eric Xing, PGM 10-708, CMU.
Cluster tree
10
A cluster tree is a singly connected graph (i.e., there is exactly one path between each pair of nodes) in which the nodes are clusters of variables of an underlying graph.
A separator set is defined for each linked pair of clusters; it contains the variables in the intersection of the two clusters.
[Figure: clusters {𝑌𝐵, 𝑌𝐶} and {𝑌𝐶, 𝑌𝐷} linked through the separator set {𝑌𝐶}]
Junction tree property
11
Junction tree property: if a variable appears in two cliques in the tree, it must appear in all cliques on the path connecting them.
For every pair of cliques $D_j$ and $D_k$, all cliques on the path between $D_j$ and $D_k$ contain $T_{jk} = D_j \cap D_k$.
This is also called the running intersection property.
A cluster tree that satisfies the running intersection property is called a junction tree.
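To make the running intersection property concrete, here is a minimal Python sketch (the clique sets and tree edges below are illustrative, chosen to resemble the later example) that checks whether a given cluster tree is a junction tree: for every variable, the cliques containing it must form a connected subtree.

```python
from collections import deque

def is_junction_tree(cliques, edges):
    """cliques: dict mapping clique id -> set of variables.
    edges: list of (id, id) pairs forming a tree over the cliques.
    Returns True iff the running intersection property holds."""
    neighbors = {c: [] for c in cliques}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    variables = set().union(*cliques.values())
    for v in variables:
        holding = [c for c in cliques if v in cliques[c]]
        # BFS restricted to cliques containing v; RIP holds iff they are connected.
        seen, queue = {holding[0]}, deque([holding[0]])
        while queue:
            c = queue.popleft()
            for nb in neighbors[c]:
                if v in cliques[nb] and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if seen != set(holding):
            return False
    return True

# Hypothetical cluster tree over the example variables:
cliques = {0: {"Y1", "Y2", "Y3"}, 1: {"Y2", "Y3", "Y5"}, 2: {"Y2", "Y5", "Y6"}, 3: {"Y2", "Y4"}}
print(is_junction_tree(cliques, [(0, 1), (1, 2), (1, 3)]))   # True for this tree
```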
Clique tree usefulness
12
A clique tree provides a structure for caching computations:
multiple queries can be performed much more efficiently than performing VE for each one separately.
It dictates a partial order over the operations performed on factors, so as to reach a better computational complexity.
Theorem
13
The tree induced by a variable elimination algorithm satisfies the running intersection property.
Proof:
Let $D$ and $D'$ be two clusters that contain $Y$, and let $D_Y$ be the cluster where $Y$ is eliminated. We prove that $Y$ must be present in every clique on the path between $D$ and $D_Y$ (and similarly on the path between $D_Y$ and $D'$).
Idea: the computation at $D_Y$ must happen later than the computation at $D$ or $D'$.
Clique trees from variable elimination
14
Each clique in the clique tree induced by VE is also a clique in the induced graph, and vice versa.
However, for inference we can reduce the clique tree to contain only the maximal cliques of the induced graph.
[Figure: the elimination cliques over 𝑌1–𝑌6 and the reduced clique tree over the maximal cliques]
Triangulated graphs
15
What class of graphs have a junction tree?
A triangulated (or chordal) graph contains no cycle of four or more nodes without a chord.
Triangulation is the necessary and sufficient condition for a graph to have a junction tree:
only triangulated graphs have the property that their cluster trees are junction trees.
Elimination algorithm: triangulation
16
Every graph induced by variable elimination is chordal.
For any chordal graph, there is an elimination ordering that does not add any fill edges.
In general, finding the best triangulation is NP-hard, but some good heuristics exist.
[Figure: the moralized graph over 𝑌1–𝑌6 and the induced (triangulated) graph]
Building a junction tree
17
Different junction trees are obtained for different triangulations,
i.e., from different elimination orders (and different maximum spanning trees).
Complexity of junction tree algorithms
The time and space complexity is dominated by the size of the largest clique in the junction tree (exponential in the size of the largest clique).
Finding the junction tree with the smallest cliques is an NP-hard problem.
Similarly, finding the optimum ordering in the elimination algorithm is NP-hard, but for many graphs an optimum or near-optimum ordering can often be found heuristically.
Junction-tree construction
18
Construct the undirected (moralized) graph.
Triangulate the graph:
e.g., find the graph induced by VE with a specified elimination order of nodes.
Find the set of maximal elimination cliques of the triangulated graph.
Build a weighted, complete graph 𝐻 over these maximal cliques:
weight each edge between cliques $D_j$ and $D_k$ by $|D_j \cap D_k|$.
Find a maximum spanning tree of 𝐻 as a junction tree.
A cluster tree is a junction tree iff it is a maximum spanning tree of this weighted clique graph (a sketch of the construction is given below).
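As an illustration of these steps, here is a minimal pure-Python sketch (the adjacency structure, elimination order, and function names are assumptions for illustration, not taken verbatim from the slides): it triangulates by simulating elimination, keeps the maximal elimination cliques, and extracts a maximum spanning tree of the clique graph weighted by separator size.

```python
from itertools import combinations

def triangulate(adjacency, order):
    """Simulate variable elimination on an undirected graph: connect the remaining
    neighbors of each eliminated node (fill edges) and record the elimination cliques."""
    adj = {v: set(nbrs) for v, nbrs in adjacency.items()}
    cliques, eliminated = [], set()
    for v in order:
        nbrs = {u for u in adj[v] if u not in eliminated}
        cliques.append(frozenset(nbrs | {v}))
        for a, b in combinations(nbrs, 2):          # fill edges among remaining neighbors
            adj[a].add(b)
            adj[b].add(a)
        eliminated.add(v)
    return cliques

def maximal_cliques(cliques):
    cliques = set(cliques)
    return [c for c in cliques if not any(c < d for d in cliques)]

def junction_tree(cliques):
    """Kruskal maximum spanning tree on the complete clique graph,
    with edge weight |D_j ∩ D_k| (the separator size)."""
    edges = sorted(combinations(cliques, 2), key=lambda e: len(e[0] & e[1]), reverse=True)
    parent = {c: c for c in cliques}
    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c
    tree = []
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            tree.append((a, b, a & b))              # (clique, clique, separator)
    return tree

# A moralized graph consistent with the running example (an assumption for illustration).
adjacency = {
    "Y1": {"Y2", "Y3"}, "Y2": {"Y1", "Y3", "Y4", "Y5", "Y6"},
    "Y3": {"Y1", "Y2", "Y5"}, "Y4": {"Y2"},
    "Y5": {"Y2", "Y3", "Y6"}, "Y6": {"Y2", "Y5"},
}
cliques = maximal_cliques(triangulate(adjacency, ["Y6", "Y5", "Y4", "Y3", "Y2", "Y1"]))
for a, b, sep in junction_tree(cliques):
    print(sorted(a), "--", sorted(sep), "--", sorted(b))
```

For this assumed graph, the sketch should recover the maximal cliques of the running example and print three junction-tree edges with separators {𝑌2, 𝑌3}, {𝑌2, 𝑌5}, and {𝑌2}.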
Junction tree construction: Example
19
[Figure: the maximal cliques {𝑌1, 𝑌2, 𝑌3}, {𝑌2, 𝑌3, 𝑌5}, {𝑌2, 𝑌5, 𝑌6}, {𝑌2, 𝑌4}; the weighted clique graph with separator sets {𝑌2, 𝑌3}, {𝑌2, 𝑌5}, {𝑌2}; and the maximum spanning tree chosen as the junction tree]
Junction tree algorithm
20
Given a factorized probability distribution 𝑄 with Markov network 𝐼, build a junction tree 𝑈 based on 𝐼.
For each clique, find the marginal probability over the variables in that clique. Two variants:
Message passing, sum-product (Shafer-Shenoy algorithm):
run a message-passing algorithm on the junction tree constructed according to the distribution.
Belief update, local consistency preservation (Hugin algorithm):
rescaling (update) equations.
Junction tree algorithm: inference
21
The junction tree inference algorithm is message passing on a junction tree structure.
Each clique starts with a set of initial factors.
Each clique sends one message to each neighbor, according to a schedule.
Finally, for each clique, the marginal over its variables is computed.
Junction tree algorithm: inference
22
The junction tree inference algorithm is message passing on a junction tree structure.
Each clique starts with a set of initial factors:
we assign each factor in the distribution 𝑄 to one and only one clique in 𝑈 whose variable set contains the scope of the factor, and set
$\omega_j = \prod_{\varrho \in G_j} \varrho$
where $G_j$ denotes the set of factors assigned to clique $D_j$ (a small sketch of this assignment follows below).
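A minimal sketch of this assignment step (the factor names and scopes below are hypothetical; only scopes are shown and the multiplication of tables is omitted):

```python
# Hypothetical cliques of a junction tree U and factor scopes of Q.
cliques = {0: {"Y1", "Y2", "Y3"}, 1: {"Y2", "Y3", "Y5"}, 2: {"Y2", "Y5", "Y6"}, 3: {"Y2", "Y4"}}
factor_scopes = {"f1": {"Y1"}, "f2": {"Y1", "Y2"}, "f3": {"Y2", "Y3"},
                 "f4": {"Y3", "Y5"}, "f5": {"Y2", "Y5", "Y6"}, "f6": {"Y2", "Y4"}}

# Assign each factor to exactly one clique whose variable set covers the factor's scope;
# omega_j is then the product of the tables of the factors in assignment[j].
assignment = {j: [] for j in cliques}
for name, scope in factor_scopes.items():
    j = next(j for j, c in cliques.items() if scope <= c)   # first covering clique
    assignment[j].append(name)
print(assignment)   # {0: ['f1', 'f2', 'f3'], 1: ['f4'], 2: ['f5'], 3: ['f6']}
```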
Junction tree algorithm: inference
23
The junction tree inference algorithm is message passing on a junction tree structure.
Each clique starts with a set of initial factors.
Each clique sends one message to each neighbor, according to a schedule:
each clique multiplies the incoming messages with its potential, sums out one or more variables, and sends the result as its outgoing message:
$n_{jk}(T_{jk}) = \sum_{D_j \setminus T_{jk}} \omega_j \prod_{l \in \mathcal{O}(j) \setminus \{k\}} n_{lj}(T_{lj})$
[Figure: clique $D_j$ sends the message $n_{jk}(T_{jk})$ over the separator $T_{jk}$ to its neighbor $D_k$]
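To make the message equation concrete, here is a minimal numpy sketch of a single message (the clique, the separator, and the random tables are illustrative assumptions; all variables are binary and axes are ordered as listed):

```python
import numpy as np

# Clique D_j = (Y2, Y3, Y5) with separator T_jk = (Y2, Y5) toward neighbor D_k.
omega_j = np.random.rand(2, 2, 2)      # initial potential omega_j(Y2, Y3, Y5)
n_lj = np.random.rand(2, 2)            # incoming message n_lj(Y2, Y3) from the other neighbor l

belief = omega_j * n_lj[:, :, None]    # omega_j * prod_{l != k} n_lj, broadcast over the Y5 axis
n_jk = belief.sum(axis=1)              # sum out D_j \ T_jk = {Y3}; result is a table over (Y2, Y5)
print(n_jk.shape)                      # (2, 2)
```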
Separation set
24
Theorem 1: In a clique tree induced by a variable elimination algorithm, let $n_{jk}$ be the message that $D_j$ sends to the neighboring cluster $D_k$; then the scope of this message is $T_{jk} = D_j \cap D_k$.
Theorem 2: A cluster tree satisfies the running intersection property if and only if, for every separation set $T_{jk}$, $W_{\prec(j,k)}$ and $W_{\succ(j,k)}$ are separated in $I$ given $T_{jk}$.
$W_{\prec(j,k)}$: the set of all variables in the scopes of all cliques on the $D_j$ side of the edge $(j, k)$.
Junction tree algorithm: inference
25
The junction tree inference algorithm is message passing on a junction tree structure.
Each clique starts with a set of initial factors.
Each clique sends one message to each neighbor, according to a schedule.
Finally, for each clique, the marginal over its variables is computed: after message passing, combining the clique's potential with the messages received from its neighbors gives
$Q(D_s) \propto \omega_s \prod_{l \in \mathcal{O}(s)} n_{ls}(T_{ls})$
i.e., the marginal on a clique is the (normalized) product of its initial potential and the messages from its neighbors.
Junction-tree message passing: Shafer-Shenoy algorithm
26
$n_{jk}(T_{jk}) = \sum_{D_j \setminus T_{jk}} \omega_j \prod_{l \in \mathcal{O}(j) \setminus \{k\}} n_{lj}(T_{lj})$
$Q(D_s) \propto \omega_s \prod_{l \in \mathcal{O}(s)} n_{ls}(T_{ls})$
$Q(T_{jk}) \propto n_{jk}(T_{jk}) \, n_{kj}(T_{jk})$
$\omega_j = \prod_{\varrho \in G_j} \varrho$, where $G_j$ denotes the set of factors assigned to clique $D_j$
[Figure: neighboring cliques $D_j$ and $D_k$ exchanging the messages $n_{jk}(T_{jk})$ and $n_{kj}(T_{jk})$ across the separator $T_{jk}$]
Junction-tree algorithm: correctness
27
If $Y$ is eliminated when a message is sent from $D_j$ to a neighboring clique $D_k$ such that $Y \in D_j$ and $Y \notin D_k$, then $Y$ does not appear in the tree on the $D_k$ side of the edge $(j, k)$ after elimination (by the junction-tree property).
Notation:
$W_{\prec(j,k)}$: the set of all variables in the scopes of all cliques on the $D_j$ side of the edge $(j, k)$.
$G_{\prec(j,k)}$: the set of factors assigned to the cliques on the $D_j$ side of the edge $(j, k)$.
$G_j$: the set of factors assigned to clique $D_j$.
Junction-tree algorithm: correctness
28
Induction on the length of the path from the leaves.
Base step: leaves.
Inductive step: we want to show that
$n_{j \to k}(T_{jk}) = \sum_{D_j \setminus T_{jk}} \omega_j \prod_{l \in \mathcal{O}(j) \setminus \{k\}} n_{lj}(T_{lj}) = \sum_{W_{\prec(j,k)}} \prod_{\varrho \in G_{\preceq(j,k)}} \varrho$
Let $D_{j_1}, \ldots, D_{j_n}$ be the neighbors of $D_j$ other than $D_k$. Since $W_{\prec(j,k)}$ is the disjoint union of $W_{\prec(j_1,j)}, \ldots, W_{\prec(j_n,j)}$ and $D_j \setminus T_{jk}$, and using the inductive hypothesis for the incoming messages:
$\sum_{W_{\prec(j,k)}} \prod_{\varrho \in G_{\preceq(j,k)}} \varrho = \sum_{D_j \setminus T_{jk}} \sum_{W_{\prec(j_1,j)}} \cdots \sum_{W_{\prec(j_n,j)}} \Big( \prod_{\varrho \in G_{\prec(j_1,j)}} \varrho \Big) \cdots \Big( \prod_{\varrho \in G_{\prec(j_n,j)}} \varrho \Big) \prod_{\varrho \in G_j} \varrho$
$= \sum_{D_j \setminus T_{jk}} \Big( \prod_{\varrho \in G_j} \varrho \Big) \Big( \sum_{W_{\prec(j_1,j)}} \prod_{\varrho \in G_{\prec(j_1,j)}} \varrho \Big) \cdots \Big( \sum_{W_{\prec(j_n,j)}} \prod_{\varrho \in G_{\prec(j_n,j)}} \varrho \Big)$
$= \sum_{D_j \setminus T_{jk}} \omega_j \times n_{j_1 \to j} \times \cdots \times n_{j_n \to j}$
[Figure: cliques $D_{j_1}, \ldots, D_{j_n}$, the neighbors of $D_j$ other than $D_k$, send their messages to $D_j$]
Message passing schedule
29
A two-pass message-passing schedule: arbitrarily pick a node as the root.
First pass: starts at the leaves and proceeds inward;
each node passes a message to its parent;
this continues until the root has obtained messages from all of its adjoining nodes.
Second pass: starts at the root and passes the messages back out;
messages are passed in the reverse direction;
this continues until all leaves have received their messages.
In this approach we maintain both the initial clique potentials and the messages (a sketch of the two-pass schedule follows below).
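Here is a minimal sketch of that protocol as a recursive collect/distribute pass over the junction tree's adjacency list; `compute_message` is a hypothetical callback that would implement the Shafer-Shenoy message equation above, given the messages already received:

```python
def two_pass_schedule(neighbors, root, compute_message):
    """neighbors: dict mapping clique id -> list of neighboring clique ids (a tree).
    compute_message(j, k, messages): returns the message n_{jk}, using the messages
    already received by j from its other neighbors."""
    messages = {}                                   # (sender, receiver) -> message

    def collect(j, parent):                         # first pass: leaves -> root
        for k in neighbors[j]:
            if k != parent:
                collect(k, j)
                messages[(k, j)] = compute_message(k, j, messages)

    def distribute(j, parent):                      # second pass: root -> leaves
        for k in neighbors[j]:
            if k != parent:
                messages[(j, k)] = compute_message(j, k, messages)
                distribute(k, j)

    collect(root, None)
    distribute(root, None)
    return messages                                 # two messages per edge, one per direction
```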
Junction tree algorithm, belief-update perspective: Hugin algorithm
30
We intend to find two sets of potential functions:
Clique potentials: on each clique $\boldsymbol{Y}_D$, we define a potential function or belief $\omega(\boldsymbol{Y}_D)$ that is proportional to the marginal probability on that clique: $\omega(\boldsymbol{Y}_D) \propto Q(\boldsymbol{Y}_D)$.
Separator potentials: on each separator set $\boldsymbol{Y}_T$, we define a potential function or belief $\varrho(\boldsymbol{Y}_T)$ that is proportional to the marginal probability on $\boldsymbol{Y}_T$: $\varrho(\boldsymbol{Y}_T) \propto Q(\boldsymbol{Y}_T)$.
This enables us to obtain a local representation of marginal probabilities in cliques.
[Figure: cliques $W$ and $X$ with potentials $\omega_W$ and $\omega_X$, linked by the separator $T$ with potential $\varrho_T$]
Extended representation of joint probability
31
We intend to find an extended representation
$Q(\boldsymbol{Y}) \propto \dfrac{\prod_D \omega_D(\boldsymbol{Y}_D)}{\prod_T \varrho_T(\boldsymbol{Y}_T)}$
where the global representation $\prod_D \omega_D(\boldsymbol{Y}_D) \big/ \prod_T \varrho_T(\boldsymbol{Y}_T)$ corresponds to the joint probability,
and the local representations $\omega_D(\boldsymbol{Y}_D)$ and $\varrho_T(\boldsymbol{Y}_T)$ correspond to marginal probabilities.
Consistency
32
Consistency: since the potentials are required to represent marginal probabilities, they must give the same marginals for the variables that they have in common.
Consistency is a necessary and sufficient condition for the inference algorithm to find potentials that are marginals.
We first introduce local consistency:
$\varrho_{T_{jk}} = \sum_{D_j \setminus T_{jk}} \omega_j = \sum_{D_k \setminus T_{jk}} \omega_k$
Local consistency (updates): Message passing with division
33
Updating $X$ based on $W$ (passing information from $W$ to $X$):
$\varrho_T^{*} = \sum_{W \setminus T} \omega_W$  (marginalization)
$\omega_X^{*} = \dfrac{\varrho_T^{*}}{\varrho_T}\,\omega_X$  (rescaling)
$\omega_W^{*} = \omega_W$
Updating $W$ based on $X$ (passing information from $X$ to $W$):
$\varrho_T^{**} = \sum_{X \setminus T} \omega_X^{*}$
$\omega_W^{**} = \dfrac{\varrho_T^{**}}{\varrho_T^{*}}\,\omega_W^{*}$
$\omega_X^{**} = \omega_X^{*}$
[Figure: cliques $W$ and $X$ with separator $T$; the first update marginalizes $\omega_W$ onto $T$ and rescales $\omega_X$, the second does the reverse]
The separator potentials have been initialized to unity.
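A minimal numpy sketch of these two updates (the cliques, shapes, and random numbers are illustrative assumptions): clique $W$ over $(A, B)$, clique $X$ over $(B, C)$, separator $T = \{B\}$, all variables binary.

```python
import numpy as np

omega_W = np.random.rand(2, 2)          # omega_W(A, B)
omega_X = np.random.rand(2, 2)          # omega_X(B, C)
rho_T   = np.ones(2)                    # separator potential over B, initialized to unity

# Update X based on W: marginalize omega_W onto the separator, then rescale omega_X.
rho_T_star = omega_W.sum(axis=0)                     # sum over A, giving a table over B
omega_X   *= (rho_T_star / rho_T)[:, None]           # rescale, broadcasting over the C axis
rho_T      = rho_T_star

# Update W based on X: the reverse pass.
rho_T_2star = omega_X.sum(axis=1)                    # sum over C, giving a table over B
omega_W    *= (rho_T_2star / rho_T)[None, :]         # rescale, broadcasting over the A axis
rho_T       = rho_T_2star

# After both updates the two cliques agree on their marginal over B (local consistency):
print(np.allclose(omega_W.sum(axis=0), omega_X.sum(axis=1)))   # True
```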
Properties of updates
34
Invariant joint: after these updates, the joint probability remains unchanged:
$\dfrac{\omega_X^{*}\, \omega_W^{*}}{\varrho_T^{*}} = \dfrac{\omega_X\, \omega_W}{\varrho_T}$
Local consistency: $\omega_W^{**}$ and $\omega_X^{**}$ are consistent with respect to $T$:
$Q(\boldsymbol{Y}_T) = \varrho_T^{**} = \sum_{X \setminus T} \omega_X^{**} = \sum_{W \setminus T} \omega_W^{**}$
since
$\sum_{X \setminus T} \omega_X^{**} = \sum_{X \setminus T} \omega_X^{*} = \varrho_T^{**}$
and
$\sum_{W \setminus T} \omega_W^{**} = \sum_{W \setminus T} \dfrac{\varrho_T^{**}}{\varrho_T^{*}}\, \omega_W^{*} = \dfrac{\varrho_T^{**}}{\varrho_T^{*}} \sum_{W \setminus T} \omega_W^{*} = \dfrac{\varrho_T^{**}}{\varrho_T^{*}}\, \varrho_T^{*} = \varrho_T^{**}$
Global consistency
35
To ensure global consistency from enforcing local consistency, the variables that appear in two different cliques must also appear in all of the cliques on the path between them.
Junction-tree property: local consistency ⇒ global consistency.
Thus, it suffices to arrange the cliques into a junction tree and only enforce local consistency between neighboring cliques,
instead of enforcing consistency on all pairs of overlapping cliques.
Information flows along the path to ensure consistency of the overlapping cliques.
Message-passing in a clique-tree
36
We indeed perform local updates (to maintain local consistency) when we have multiple overlapping cliques.
We need a protocol that constrains the order in which updates are performed:
updates are arranged such that local consistency between a clique and its neighbors is not ruined by subsequent updates between the clique and other neighbors.
Message-passing protocol: a clique can send a message to a neighboring clique only when it has received messages from all of its other neighbors.
Correctness of junction-tree algorithm
37
Theorem: Let $Q(\boldsymbol{y}, \boldsymbol{f})$ be represented by the clique potentials $\omega_D$ and separator potentials $\varrho_T$ of a junction tree (i.e., $Q(\boldsymbol{y}, \boldsymbol{f}) \propto \prod_D \omega_D \big/ \prod_T \varrho_T$). When the junction tree algorithm terminates, the clique potentials and the separator potentials are proportional to local marginals.
Hugin algorithm: summary
38
Compilation:
directed model → undirected moralized graph;
graph triangulation;
creating a junction tree using the maximum spanning tree approach.
Initialization & evidence:
each potential of the original graph (possibly sliced by the evidence) is multiplied onto exactly one clique of the junction tree; separators are initialized to unity.
Propagation of probabilities:
propagation of the probabilities by applying the update equations according to a schedule:
$\varrho_T^{*} = \sum_{W \setminus T} \omega_W, \qquad \omega_X^{*} = \dfrac{\varrho_T^{*}}{\varrho_T}\,\omega_X$
When the algorithm terminates, the clique potentials and separator potentials are proportional to marginal probabilities.
Normalize the clique potentials to obtain conditional probabilities on the corresponding cliques.
A calibrated clique tree as a distribution
39
$Q(D_j) = \omega_j \prod_{l \in \mathcal{O}(j)} n_{lj}(T_{lj})$
$Q(T_{jk}) = \sum_{D_j \setminus T_{jk}} Q(D_j) = n_{jk}(T_{jk})\, n_{kj}(T_{jk})$
$Q(\boldsymbol{Y}) = \dfrac{\prod_{j \in W_U} Q(D_j)}{\prod_{(j,k) \in F_U} Q(T_{jk})}$
where $W_U$ and $F_U$ denote the cliques and the separator edges of the junction tree $U$.
Junction tree algorithm: summary
40
A generic exact inference algorithm for any graphical model.
Results in marginal probabilities of all cliques, by a message-passing algorithm on a junction tree constructed from the original graph.
Can answer multiple queries in a single run.
The time and space complexity of this algorithm is exponential w.r.t. the size of the largest clique.