Junction-tree algorithm, Probabilistic Graphical Models, Sharif University of Technology (PowerPoint PPT Presentation)


SLIDE 1

Junction-tree algorithm

Probabilistic Graphical Models Sharif University of Technology Spring 2018 Soleymani

SLIDE 2

Inference on general GM

• Now, what if the GM is not a tree-like graph? Can we still directly run the message-passing protocol along its edges?

• For non-trees, we do not have the guarantee that message-passing will be consistent!

• Construct a graph data structure from 𝑃 that has a tree structure, and run message-passing on it!

  • Junction tree algorithm

SLIDE 3

Junction-tree algorithm: a general approach

• Junction trees, as opposed to sum-product on trees, can be applied to general graphs.

• The junction tree algorithm, as opposed to the elimination algorithm, is not "query-oriented":

  • it enables us to record and reuse the intermediate factors to respond to multiple queries simultaneously.

• Upon convergence of the algorithm, we obtain marginal probabilities for all cliques of the original graph.

SLIDE 4

Example: variable elimination and cluster tree

• Elimination order: 𝑌6, 𝑌5, 𝑌4, 𝑌3, 𝑌2

[Figure: moralized graph over 𝑌1–𝑌6]

SLIDE 5

Example: elimination cliques

• Elimination order: 𝑌6, 𝑌5, 𝑌4, 𝑌3, 𝑌2

[Figure: the sequence of elimination cliques over 𝑌1–𝑌6, one graph per elimination step]

SLIDE 6

Example: clique tree obtained by VE

[Figure: clique tree over 𝑌1–𝑌6; elimination order: 𝑌6, 𝑌5, 𝑌4, 𝑌3, 𝑌2]

• The clique tree contains the cliques (fully connected subsets) generated as elimination executes.

• The cluster graph induced by an execution of VE is necessarily a tree:

  • indeed, after an elimination, the corresponding elimination clique will not reappear.

SLIDE 7

Example: clique tree obtained by VE

[Figure: clique tree over 𝑌1–𝑌6, with the maximal cliques highlighted; elimination order: 𝑌6, 𝑌5, 𝑌4, 𝑌3, 𝑌2]

• The clique tree contains the cliques (fully connected subsets) generated as elimination executes.

• The graph induced by an execution of VE is necessarily a tree:

  • indeed, after an elimination, the corresponding elimination clique will not reappear.

SLIDE 8

Example: Elimination ≡ message passing on a clique tree

This slide has been adapted from Eric Xing, PGM 10708, CMU.

SLIDE 9

Computation reuse

• Another query...

• Messages 𝑛𝑔 and 𝑛ℎ are reused; the others need to be recomputed.

This slide has been adapted from Eric Xing, PGM 10708, CMU.

SLIDE 10

Cluster tree

• A cluster tree is a singly connected graph (i.e., exactly one path between each pair of nodes) in which the nodes are the clusters of an underlying graph.

• A separator set is defined for each linked pair of clusters; it contains the variables in the intersection of the two clusters.

[Figure: clusters {𝑌𝐵, 𝑌𝐶} and {𝑌𝐶, 𝑌𝐷} linked through the separator set {𝑌𝐶}]

SLIDE 11

Junction tree property

• Junction tree property: if a variable appears in two cliques in the tree, it must appear in all cliques on the path connecting them.

  • For every pair of cliques 𝐷𝑗 and 𝐷𝑘, all cliques on the path between 𝐷𝑗 and 𝐷𝑘 contain 𝑇𝑗𝑘 = 𝐷𝑗 ∩ 𝐷𝑘.

• This is also called the running intersection property.

• A cluster tree that satisfies the running intersection property is called a junction tree.
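The running intersection property can be checked mechanically. Below is a minimal sketch in Python; the function name and the example cliques over 𝑌1..𝑌6 are my own, chosen to echo the running example:

```python
from itertools import combinations

def satisfies_running_intersection(cliques, edges):
    """Check the junction-tree (running intersection) property:
    for every pair of cliques, every clique on the tree path
    between them must contain their intersection.

    cliques: dict mapping node id -> frozenset of variables
    edges:   list of (id, id) pairs forming a tree over the ids
    """
    # adjacency list of the cluster tree
    adj = {j: set() for j in cliques}
    for j, k in edges:
        adj[j].add(k)
        adj[k].add(j)

    def path(start, goal):
        # the unique path in a tree, found by DFS
        stack = [(start, [start])]
        while stack:
            node, p = stack.pop()
            if node == goal:
                return p
            for nxt in adj[node]:
                if nxt not in p:
                    stack.append((nxt, p + [nxt]))

    for j, k in combinations(cliques, 2):
        shared = cliques[j] & cliques[k]
        for node in path(j, k):
            if not shared <= cliques[node]:
                return False
    return True

# Cluster trees over the running example's cliques
cliques = {
    1: frozenset({"Y1", "Y2", "Y3"}),
    2: frozenset({"Y2", "Y3", "Y5"}),
    3: frozenset({"Y2", "Y5", "Y6"}),
    4: frozenset({"Y2", "Y4"}),
}
good_edges = [(1, 2), (2, 3), (2, 4)]   # a junction tree
bad_edges = [(1, 3), (3, 2), (3, 4)]    # Y3 is missing from clique 3 on the 1-2 path

print(satisfies_running_intersection(cliques, good_edges))  # True
print(satisfies_running_intersection(cliques, bad_edges))   # False
```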

SLIDE 12

Clique tree usefulness

• A clique tree provides a structure for caching computations:

  • multiple queries can be performed much more efficiently than performing VE for each one separately.

• It dictates a partial order over the operations that are performed on factors, to reach a better computational complexity.

SLIDE 13

Theorem

• The tree induced by a variable elimination algorithm satisfies the running intersection property.

• Proof:

  • Let 𝐷 and 𝐷′ be two clusters that contain 𝑌, and let 𝐷𝑌 be the cluster where 𝑌 is eliminated. We will prove that 𝑌 must be present in every clique on the path between 𝐷 and 𝐷𝑌 (and similarly on the path between 𝐷𝑌 and 𝐷′).

  • Idea: the computation at 𝐷𝑌 must happen later than the computation at 𝐷 or 𝐷′.

SLIDE 14

Clique trees from variable elimination

• Each clique in the clique tree induced by VE is also a clique in the induced graph, and vice versa.

• However, for inference we can reduce the clique tree to contain only the maximal cliques of the induced graph.

[Figure: the clique tree over 𝑌1–𝑌6 reduced to its maximal cliques]

SLIDE 15

Triangulated graphs

• What class of graphs have a junction tree?

• A triangulated (or chordal) graph contains no cycles of four or more nodes in which there is no chord.

• Triangulation is the necessary and sufficient condition for a graph to have a junction tree:

  • only triangulated graphs have the property that their cluster trees are junction trees.
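A standard way to test chordality, sketched below, is maximum cardinality search (MCS): visit vertices by the number of already-visited neighbors, then check that the reverse visit order is a perfect elimination ordering. The function and the toy 4-cycle example are illustrative, not from the slides:

```python
def is_chordal(adj):
    """Chordality test: run maximum cardinality search (MCS), then
    check that the reverse MCS order is a perfect elimination ordering.
    adj: dict vertex -> set of neighbors (undirected)."""
    # --- maximum cardinality search ---
    weight = {v: 0 for v in adj}
    order = []
    unvisited = set(adj)
    while unvisited:
        v = max(unvisited, key=lambda u: weight[u])
        order.append(v)
        unvisited.remove(v)
        for u in adj[v] & unvisited:
            weight[u] += 1
    # elimination order = reverse of the MCS visit order
    peo = list(reversed(order))
    pos = {v: i for i, v in enumerate(peo)}
    # --- perfect elimination check ---
    for v in peo:
        later = {u for u in adj[v] if pos[u] > pos[v]}
        if later:
            u = min(later, key=lambda x: pos[x])
            # the remaining later neighbors must all touch u
            if not (later - {u}) <= adj[u]:
                return False
    return True

# 4-cycle Y1-Y2-Y3-Y4-Y1: not chordal; adding a chord makes it chordal
square = {"Y1": {"Y2", "Y4"}, "Y2": {"Y1", "Y3"},
          "Y3": {"Y2", "Y4"}, "Y4": {"Y1", "Y3"}}
print(is_chordal(square))  # False
square["Y1"].add("Y3"); square["Y3"].add("Y1")  # add chord Y1-Y3
print(is_chordal(square))  # True
```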

SLIDE 16

Elimination algorithm: triangulation

• Every induced graph (by variable elimination) is chordal.

• For any chordal graph, there is an elimination ordering that does not add any fill edges.

• In general, finding the best triangulation is NP-hard, but some good heuristics exist.

[Figure: moralized graph and induced (triangulated) graph over 𝑌1–𝑌6]

SLIDE 17

Building a junction tree

• Different junction trees are obtained for different triangulations:

  • obtained from different elimination orders (and different maximum spanning trees).

• Complexity of junction tree algorithms:

  • the time and space complexity is dominated by the size of the largest clique in the junction tree (exponential in the size of the largest clique).

• Finding the junction tree with the smallest cliques is an NP-hard problem:

  • finding the optimum ordering in the elimination algorithm is NP-hard,

  • but for many graphs an optimum or near-optimum ordering can often be found heuristically.
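One such heuristic is greedy min-fill: repeatedly eliminate the node whose elimination would add the fewest fill edges. A minimal sketch (the edge set below is an assumed moralized graph in the spirit of the running example, not taken from the slides' figure):

```python
from itertools import combinations

def min_fill_order(adj):
    """Greedy min-fill elimination-ordering heuristic: at each step,
    eliminate the node whose elimination adds the fewest fill edges.
    adj: dict node -> set of neighbors; the input graph is copied.
    Returns (ordering, induced_width), where induced_width is the
    largest neighbor set seen at elimination time."""
    adj = {v: set(ns) for v, ns in adj.items()}
    order, width = [], 0
    while adj:
        def fill_cost(v):
            # number of missing edges among v's current neighbors
            return sum(1 for a, b in combinations(adj[v], 2)
                       if b not in adj[a])
        v = min(adj, key=fill_cost)
        order.append(v)
        width = max(width, len(adj[v]))
        # connect v's neighbors (the fill edges), then remove v
        for a, b in combinations(adj[v], 2):
            adj[a].add(b)
            adj[b].add(a)
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return order, width

# An assumed moralized graph over Y1..Y6
graph = {
    "Y1": {"Y2", "Y3"},
    "Y2": {"Y1", "Y3", "Y4", "Y5", "Y6"},
    "Y3": {"Y1", "Y2", "Y5"},
    "Y4": {"Y2"},
    "Y5": {"Y2", "Y3", "Y6"},
    "Y6": {"Y2", "Y5"},
}
order, width = min_fill_order(graph)
print(order, width)
```

Since this example graph is already chordal, min-fill always finds a zero-fill (simplicial) vertex, and the induced width equals the largest clique size minus one.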

SLIDE 18

Junction-tree construction

• Construct the undirected graph.

• Triangulate the graph:

  • e.g., find an induced graph resulting from VE with a specified elimination order of nodes.

• Find the set of maximal elimination cliques of the triangulated graph.

• Build a weighted, complete graph 𝐻 over these maximal cliques:

  • weight each edge between cliques 𝐷𝑗 and 𝐷𝑘 as |𝐷𝑗 ∩ 𝐷𝑘|.

• Find a maximal spanning tree of 𝐻 as a junction tree:

  • a cluster tree is a junction tree iff it is a maximal spanning tree.
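The spanning-tree step above can be sketched directly: weight every clique pair by intersection size and run Kruskal's algorithm with a union-find for a maximum-weight spanning tree. The function name and the clique list are illustrative:

```python
from itertools import combinations

def junction_tree(cliques):
    """Build a junction tree from a list of maximal cliques:
    weight every pair of cliques by |intersection| and take a
    maximum-weight spanning tree (Kruskal with a union-find).
    cliques: list of frozensets of variables.
    Returns a list of tree edges (i, j) indexing into `cliques`."""
    n = len(cliques)
    # candidate edges, heaviest first
    edges = sorted(
        ((len(cliques[i] & cliques[j]), i, j)
         for i, j in combinations(range(n), 2)),
        reverse=True)
    parent = list(range(n))

    def find(x):
        # union-find root with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Maximal cliques of the triangulated running example
cliques = [frozenset({"Y1", "Y2", "Y3"}),
           frozenset({"Y2", "Y3", "Y5"}),
           frozenset({"Y2", "Y5", "Y6"}),
           frozenset({"Y2", "Y4"})]
print(junction_tree(cliques))
```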

SLIDE 19

Junction tree construction: Example

[Figure: candidate cluster trees over the maximal cliques {𝑌1, 𝑌2, 𝑌3}, {𝑌2, 𝑌3, 𝑌5}, {𝑌2, 𝑌5, 𝑌6}, {𝑌2, 𝑌4}, with edges labeled by the separator sets {𝑌2, 𝑌3}, {𝑌2, 𝑌5}, and {𝑌2}]

SLIDE 20

Junction tree algorithm

• Given a factorized probability distribution 𝑄 with Markov network 𝐼, build a junction tree 𝑈 based on 𝐼.

• For each clique, find the marginal probability over the variables in that clique:

  • Message-passing sum-product (Shafer-Shenoy algorithm):

    • run a message-passing algorithm on the junction tree constructed according to the distribution.

  • Belief update: local consistency preservation (Hugin algorithm):

    • rescaling (update) equations.

SLIDE 21

Junction tree algorithm: inference

• The junction tree inference algorithm is message passing on a junction tree structure:

  • Each clique starts with a set of initial factors.

  • Each clique sends one message to each neighbor, in a schedule.

  • Finally, for each clique, the marginal over its variables is computed.

SLIDE 22

Junction tree algorithm: inference

• The junction tree inference algorithm is message passing on a junction tree structure:

  • Each clique starts with a set of initial factors.

    • We assign each factor in the distribution 𝑄 to one and only one clique in 𝑈, such that the scope of the factor is a subset of the variables in that clique:

𝜔𝑗 = ∏_{𝜚∈𝐺𝑗} 𝜚,    where 𝐺𝑗 denotes the set of factors assigned to clique 𝐷𝑗

SLIDE 23

Junction tree algorithm: inference

• The junction tree inference algorithm is message passing on a junction tree structure:

  • Each clique starts with a set of initial factors.

  • Each clique sends one message to each neighbor, in a schedule:

    • each clique multiplies the incoming messages and its own potential, sums out over one or more variables, and sends an outgoing message:

𝑛𝑗𝑘(𝑇𝑗𝑘) = Σ_{𝐷𝑗∖𝑇𝑗𝑘} 𝜔𝑗 ∏_{𝑙∈𝒪(𝑗)∖{𝑘}} 𝑛𝑙𝑗(𝑇𝑙𝑗)

[Figure: clique 𝐷𝑗 sending message 𝑛𝑗𝑘(𝑇𝑗𝑘) across separator 𝑇𝑗𝑘 to clique 𝐷𝑘]
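The message computation can be sketched with a tiny factor representation over binary variables; the representation and all names below are my own, not from the slides:

```python
from itertools import product

# A factor is (vars, table): vars is a tuple of names, table maps each
# assignment (a tuple of 0/1 values, aligned with vars) to a number.

def multiply(f, g):
    """Pointwise product of two factors over the union of their scopes."""
    fv, ft = f
    gv, gt = g
    vs = tuple(sorted(set(fv) | set(gv)))
    table = {}
    for assign in product((0, 1), repeat=len(vs)):
        a = dict(zip(vs, assign))
        table[assign] = (ft[tuple(a[v] for v in fv)] *
                         gt[tuple(a[v] for v in gv)])
    return vs, table

def marginalize(f, keep):
    """Sum out all variables of f not in `keep`."""
    fv, ft = f
    vs = tuple(sorted(keep))
    table = {a: 0.0 for a in product((0, 1), repeat=len(vs))}
    for assign, val in ft.items():
        a = dict(zip(fv, assign))
        table[tuple(a[v] for v in vs)] += val
    return vs, table

def message(omega_j, incoming, separator):
    """Shafer-Shenoy message n_jk: multiply the clique potential with
    all incoming messages except the one from k, then sum out the
    variables not in the separator T_jk."""
    f = omega_j
    for m in incoming:
        f = multiply(f, m)
    return marginalize(f, separator)

# Clique D_j over (B, C), one incoming message over (B), separator {C}
omega = (("B", "C"), {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0})
n_in = (("B",), {(0,): 0.5, (1,): 2.0})
n_out = message(omega, [n_in], {"C"})
print(n_out)  # (('C',), {(0,): 6.5, (1,): 9.0})
```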

SLIDE 24

Separation set

• Theorem 1: In a clique tree induced by a variable elimination algorithm, let 𝑛𝑗𝑘 be the message that 𝐷𝑗 sends to the neighboring cluster 𝐷𝑘; then the scope of this message is 𝑇𝑗𝑘 = 𝐷𝑗 ∩ 𝐷𝑘.

• Theorem 2: A cluster tree satisfies the running intersection property if and only if, for every separation set 𝑇𝑗𝑘, 𝑊≺(𝑗,𝑘) and 𝑊≻(𝑗,𝑘) are separated in 𝐼 given 𝑇𝑗𝑘.

  • 𝑊≺(𝑗,𝑘): the set of all variables in the scopes of all cliques on the 𝐷𝑗 side of the edge (𝑗, 𝑘).

SLIDE 25

Junction tree algorithm: inference

• The junction tree inference algorithm is message passing on a junction tree structure:

  • Each clique starts with a set of initial factors.

  • Each clique sends one message to each neighbor, in a schedule.

  • Finally, for each clique, the marginal over its variables is computed:

    • after message passing, for each clique, we can compute the marginal over its variables by combining its potential with the messages received from its neighbors:

𝑄(𝐷𝑠) ∝ 𝜔𝑠 ∏_{𝑙∈𝒪(𝑠)} 𝑛𝑙𝑠(𝑇𝑙𝑠)

The marginal on a clique is the product of the initial potential and the messages from its neighbors.

SLIDE 26

Junction-tree message passing: Shafer-Shenoy algorithm

𝑛𝑗𝑘(𝑇𝑗𝑘) = Σ_{𝐷𝑗∖𝑇𝑗𝑘} 𝜔𝑗 ∏_{𝑙∈𝒪(𝑗)∖{𝑘}} 𝑛𝑙𝑗(𝑇𝑙𝑗)

𝑄(𝐷𝑠) ∝ 𝜔𝑠 ∏_{𝑙∈𝒪(𝑠)} 𝑛𝑙𝑠(𝑇𝑙𝑠)

𝑄(𝑇𝑗𝑘) ∝ 𝑛𝑗𝑘(𝑇𝑗𝑘) 𝑛𝑘𝑗(𝑇𝑗𝑘)

𝜔𝑗 = ∏_{𝜚∈𝐺𝑗} 𝜚,    where 𝐺𝑗 denotes the set of factors assigned to clique 𝐷𝑗

[Figure: cliques 𝐷𝑗 and 𝐷𝑘 exchanging messages 𝑛𝑗𝑘(𝑇𝑗𝑘) and 𝑛𝑘𝑗(𝑇𝑗𝑘) across separator 𝑇𝑗𝑘]

SLIDE 27

Junction-tree algorithm: correctness

• If 𝑌 is eliminated when a message is sent from 𝐷𝑗 to a neighboring 𝐷𝑘 such that 𝑌 ∈ 𝐷𝑗 and 𝑌 ∉ 𝐷𝑘, then 𝑌 does not appear in the tree on the 𝐷𝑘 side of the edge (𝑗, 𝑘) after elimination (according to the junction-tree property).

Notation:

  • 𝑊≺(𝑗,𝑘): the set of all variables in the scopes of all cliques on the 𝐷𝑗 side of the edge (𝑗, 𝑘)

  • 𝐺≺(𝑗,𝑘): the set of factors in the cliques on the 𝐷𝑗 side of the edge (𝑗, 𝑘)

  • 𝐺𝑗: the set of factors in the clique 𝐷𝑗

SLIDE 28

Junction-tree algorithm: correctness

• Induction on the length of the path from the leaves:

  • Base step: leaves.

  • Inductive step:

𝑛𝑗→𝑘(𝑇𝑗𝑘) = Σ_{𝐷𝑗∖𝑇𝑗𝑘} 𝜔𝑗 ∏_{𝑙∈𝒪(𝑗)∖{𝑘}} 𝑛𝑙𝑗(𝑇𝑙𝑗)

  = Σ_{𝐷𝑗∖𝑇𝑗𝑘} (∏_{𝜚∈𝐺𝑗} 𝜚) (Σ_{𝑊≺(𝑗1,𝑗)} ∏_{𝜚∈𝐺≺(𝑗1,𝑗)} 𝜚) ⋯ (Σ_{𝑊≺(𝑗𝑛,𝑗)} ∏_{𝜚∈𝐺≺(𝑗𝑛,𝑗)} 𝜚)

  = Σ_{𝐷𝑗∖𝑇𝑗𝑘} 𝜔𝑗 × 𝑛𝑗1→𝑗 × ⋯ × 𝑛𝑗𝑛→𝑗

  = Σ_{𝑊≺(𝑗,𝑘)} ∏_{𝜚∈𝐺≼(𝑗,𝑘)} 𝜚

since 𝑊≺(𝑗,𝑘) is the disjoint union of the 𝑊≺(𝑗𝑙,𝑗) for 𝑙 = 1, …, 𝑛 and 𝐷𝑗∖𝑇𝑗𝑘.

[Figure: clique 𝐷𝑗 with incoming neighbors 𝐷𝑗1, …, 𝐷𝑗𝑛 and outgoing edge to 𝐷𝑘]

SLIDE 29

Message passing schedule

• A two-pass message-passing schedule: arbitrarily pick a node as the root.

  • First pass: start at the leaves and proceed inward:

    • each node passes a message to its parent;

    • this continues until the root has obtained messages from all of its adjoining nodes.

  • Second pass: start at the root and pass the messages back out:

    • messages are passed in the reverse direction;

    • this continues until all leaves have received their messages.

• We maintain the initial clique potentials and the messages in this approach.
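The two-pass schedule is purely structural and can be computed from the tree alone. A minimal sketch (the function name and node labels are illustrative):

```python
def two_pass_schedule(adj, root):
    """Compute the message order for the two-pass schedule:
    first pass leaves -> root (each node messages its parent),
    second pass root -> leaves (the same edges, reversed).
    adj: dict node -> set of neighbors in the junction tree.
    Returns a list of (sender, receiver) pairs."""
    # iterative DFS from the root to get a parent pointer per node
    parent, order, stack = {root: None}, [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for u in adj[v]:
            if u not in parent:
                parent[u] = v
                stack.append(u)
    # inward pass: children message parents, deepest nodes first
    inward = [(v, parent[v]) for v in reversed(order)
              if parent[v] is not None]
    # outward pass: parents message children, root first
    outward = [(parent[v], v) for v in order if parent[v] is not None]
    return inward + outward

# Star-shaped junction tree: 2 is adjacent to 1, 3, and 4; root = 1
sched = two_pass_schedule({1: {2}, 2: {1, 3, 4}, 3: {2}, 4: {2}}, 1)
print(sched)
```

Every message respects the protocol: a clique sends along an edge only after receiving from all of its other neighbors.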

SLIDE 30

Junction tree algorithm Belief update perspective: Hugin algorithm

• We intend to find two sets of potential functions:

  • Clique potentials: on each clique 𝒀𝐷, we define a potential function (or belief) 𝜔(𝒀𝐷) that is proportional to the marginal probability on that clique: 𝜔(𝒀𝐷) ∝ 𝑄(𝒀𝐷).

  • Separator potentials: on each separator set 𝒀𝑇, we define a potential function (or belief) 𝜚(𝒀𝑇) that is proportional to the marginal probability on 𝒀𝑇: 𝜚(𝒀𝑇) ∝ 𝑄(𝒀𝑇).

• This enables us to obtain a local representation of marginal probabilities in cliques.

[Figure: cliques 𝑊 and 𝑋 with potentials 𝜔𝑊 and 𝜔𝑋, linked by separator 𝑇 with potential 𝜚𝑇]

SLIDE 31

Extended representation of joint probability

• We intend to find an extended representation:

𝑄(𝒀) ∝ ∏_𝐷 𝜔𝐷(𝒀𝐷) / ∏_𝑇 𝜚𝑇(𝒀𝑇)

  • where the global representation ∏_𝐷 𝜔𝐷(𝒀𝐷) / ∏_𝑇 𝜚𝑇(𝒀𝑇) corresponds to the joint probability,

  • and the local representations 𝜔𝐷(𝒀𝐷) and 𝜚𝑇(𝒀𝑇) correspond to marginal probabilities.

SLIDE 32

Consistency

• Consistency: since the potentials are required to represent marginal probabilities, they must give the same marginals for the nodes that they have in common.

• Consistency is a necessary and sufficient condition for the inference algorithm to find potentials that are marginals.

• We will first introduce local consistency:

𝜚𝑇𝑗𝑘 = Σ_{𝐷𝑗∖𝑇𝑗𝑘} 𝜔𝑗 = Σ_{𝐷𝑘∖𝑇𝑗𝑘} 𝜔𝑘

SLIDE 33

Local consistency (updates): Message passing with division

• Updating 𝑋 based on 𝑊 (passing information from 𝑊 to 𝑋):

𝜚𝑇* = Σ_{𝑊∖𝑇} 𝜔𝑊    (marginalization)

𝜔𝑋* = (𝜚𝑇* / 𝜚𝑇) 𝜔𝑋    (rescaling)

𝜔𝑊* = 𝜔𝑊

• Updating 𝑊 based on 𝑋 (passing information from 𝑋 to 𝑊):

𝜚𝑇** = Σ_{𝑋∖𝑇} 𝜔𝑋*    (marginalization)

𝜔𝑊** = (𝜚𝑇** / 𝜚𝑇*) 𝜔𝑊*    (rescaling)

𝜔𝑋** = 𝜔𝑋*

The separator potentials have been initialized to unity.

[Figure: cliques 𝑊 and 𝑋 with separator 𝑇, before and after the two updates]
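A single Hugin update (marginalize onto the separator, then rescale by the ratio of new to old separator potential) can be sketched on cliques 𝑊 = (𝐴, 𝐵) and 𝑋 = (𝐵, 𝐶) with separator 𝑇 = (𝐵). The table representation and all names are my own:

```python
from itertools import product

def marg(table, keep_idx):
    """Sum a table over all positions except those in keep_idx."""
    out = {}
    for assign, val in table.items():
        key = tuple(assign[i] for i in keep_idx)
        out[key] = out.get(key, 0.0) + val
    return out

def hugin_update(omega_w, omega_x, rho_t, sep_idx_w, sep_idx_x):
    """One Hugin update passing information from W to X:
    marginalize W onto the separator, then rescale X by the ratio
    of the new to the old separator potential.
    sep_idx_w / sep_idx_x: positions of the separator variables
    inside W's / X's assignment tuples."""
    rho_new = marg(omega_w, sep_idx_w)
    omega_x_new = {
        assign: val * rho_new[tuple(assign[i] for i in sep_idx_x)]
                    / rho_t[tuple(assign[i] for i in sep_idx_x)]
        for assign, val in omega_x.items()
    }
    return omega_x_new, rho_new

# W over (A, B), X over (B, C); separator potential initialized to unity
omega_w = {(a, b): [[2.0, 1.0], [3.0, 4.0]][a][b]
           for a, b in product((0, 1), repeat=2)}
omega_x = {(b, c): 1.0 for b, c in product((0, 1), repeat=2)}
rho_t = {(0,): 1.0, (1,): 1.0}

omega_x, rho_t = hugin_update(omega_w, omega_x, rho_t,
                              sep_idx_w=(1,), sep_idx_x=(0,))
print(rho_t)    # the marginal of omega_w over B
print(omega_x)
```

The reverse update (from 𝑋 back to 𝑊) uses the same function with the roles of the two cliques swapped.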

SLIDE 34

Properties of updates

• Invariant joint: after these updates, the joint probability remains unchanged:

(𝜔𝑋* 𝜔𝑊*) / 𝜚𝑇* = (𝜔𝑋 𝜔𝑊) / 𝜚𝑇

• Local consistency: 𝜔𝑊** and 𝜔𝑋** are consistent with respect to 𝑇:

𝑄(𝒀𝑇) = 𝜚𝑇** = Σ_{𝑋∖𝑇} 𝜔𝑋** = Σ_{𝑊∖𝑇} 𝜔𝑊**

since

Σ_{𝑋∖𝑇} 𝜔𝑋** = Σ_{𝑋∖𝑇} 𝜔𝑋* = 𝜚𝑇**

and

Σ_{𝑊∖𝑇} 𝜔𝑊** = Σ_{𝑊∖𝑇} (𝜚𝑇** / 𝜚𝑇*) 𝜔𝑊* = (𝜚𝑇** / 𝜚𝑇*) Σ_{𝑊∖𝑇} 𝜔𝑊* = (𝜚𝑇** / 𝜚𝑇*) 𝜚𝑇* = 𝜚𝑇**
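The invariant-joint property can be verified numerically on the same tiny 𝑊 = (𝐴, 𝐵), 𝑋 = (𝐵, 𝐶), 𝑇 = (𝐵) setup; the potential values below are arbitrary illustrative numbers:

```python
from itertools import product

# After the update rho* = sum_A omega_W and omega_X* = (rho*/rho) omega_X,
# the ratio omega_W * omega_X* / rho* equals omega_W * omega_X / rho
# for every joint assignment (A, B, C).

omega_w = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 3.0, (1, 1): 4.0}  # over (A, B)
omega_x = {(0, 0): 0.5, (0, 1): 1.5, (1, 0): 2.0, (1, 1): 1.0}  # over (B, C)
rho_t = {0: 1.0, 1: 1.0}                                         # over B, unity

rho_star = {b: omega_w[(0, b)] + omega_w[(1, b)] for b in (0, 1)}
omega_x_star = {(b, c): omega_x[(b, c)] * rho_star[b] / rho_t[b]
                for b, c in product((0, 1), repeat=2)}

for a, b, c in product((0, 1), repeat=3):
    before = omega_w[(a, b)] * omega_x[(b, c)] / rho_t[b]
    after = omega_w[(a, b)] * omega_x_star[(b, c)] / rho_star[b]
    assert abs(before - after) < 1e-12
print("joint invariant under the Hugin update")
```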

SLIDE 35

Global consistency

• To ensure global consistency by enforcing local consistency, the variables that appear in two different cliques must also appear in all of the cliques on the path between them.

• Junction-tree property: local consistency ⇒ global consistency.

• Thus, it suffices to arrange the cliques into a junction tree and enforce local consistency only on neighboring cliques:

  • instead of enforcing consistency on all pairs of overlapping cliques,

  • the information flows along the path to ensure consistency of the overlapping cliques.

SLIDE 36

Message-passing in a clique-tree

• We indeed perform local updates (to maintain local consistency) when we have multiple overlapping cliques.

• We need a protocol that constrains the order in which updates are performed:

  • updates are arranged such that local consistency between a clique and its neighbors is not ruined by subsequent updates between the clique and other neighbors.

• Message-passing protocol: a clique can send a message to a neighboring clique only when it has received messages from all of its other neighbors.
SLIDE 37

Correctness of junction-tree algorithm

• Theorem: Let 𝑄(𝒚, 𝒇) be represented by the clique potentials 𝜔𝐷 and separator potentials 𝜚𝑇 of a junction tree (i.e., 𝑄(𝒚, 𝒇) ∝ ∏_𝐷 𝜔𝐷 / ∏_𝑇 𝜚𝑇). When the junction tree algorithm terminates, the clique potentials and the separator potentials are proportional to local marginals.

SLIDE 38

Hugin algorithm: summary

• Compilation:

  • directed model → undirected moralized graph;

  • graph triangulation;

  • creating a junction tree using the maximum spanning tree approach.

• Initialization & evidence:

  • each potential of the original graph (possibly sliced) is multiplied onto exactly one clique of the junction tree; separators are initialized to unity.

• Propagation of probabilities:

  • propagate the probabilities by applying the update equations according to a schedule:

𝜚𝑇* = Σ_{𝑊∖𝑇} 𝜔𝑊,    𝜔𝑋* = (𝜚𝑇* / 𝜚𝑇) 𝜔𝑋

  • when the algorithm terminates, the clique potentials and separator potentials are proportional to marginal probabilities.

• Normalize the clique potentials to obtain conditional probabilities on the corresponding cliques.

SLIDE 39

A calibrated clique tree as a distribution

𝑄(𝐷𝑗) = 𝜔𝑗 ∏_{𝑙∈𝒪(𝑗)} 𝑛𝑙𝑗(𝑇𝑙𝑗)

𝑄(𝑇𝑗𝑘) = Σ_{𝐷𝑗∖𝑇𝑗𝑘} 𝑄(𝐷𝑗) = 𝑛𝑗𝑘(𝑇𝑗𝑘) 𝑛𝑘𝑗(𝑇𝑗𝑘)

𝑄(𝒀) = ∏_{𝑗∈𝑊𝑈} 𝑄(𝐷𝑗) / ∏_{(𝑗,𝑘)∈𝐹𝑈} 𝑄(𝑇𝑗𝑘)

SLIDE 40

Junction tree algorithm: summary

• A generic exact inference algorithm for any graphical model.

• Results in marginal probabilities of all cliques via a message-passing algorithm on a junction tree constructed from the original graph.

• Can solve multiple queries in a single run.

• The time and space complexity of this algorithm is exponential w.r.t. the size of the largest clique.

SLIDE 41

References

• D. Koller and N. Friedman, "Probabilistic Graphical Models: Principles and Techniques", Chapter 10.

• M. I. Jordan, "An Introduction to Probabilistic Graphical Models", Chapter 17.