SLIDE 1

Undirected Graphical Models

Michael Gutmann

Probabilistic Modelling and Reasoning (INFR11134) School of Informatics, University of Edinburgh

Spring semester 2018

SLIDE 2

Recap

◮ The number of free parameters in probabilistic models

increases with the number of random variables involved.

◮ Making statistical independence assumptions reduces the

number of free parameters that need to be specified.

◮ Starting with the chain rule and an ordering of the random

variables, we used statistical independencies to simplify the representation.

◮ We thus obtained a factorisation in terms of a product of

conditional pdfs that we visualised as a DAG.

◮ In turn, we used DAGs to define sets of distributions

(“directed graphical models”).

◮ We discussed independence properties satisfied by the

distributions, d-separation, and the equivalence to the factorisation.

SLIDE 3

The directionality in directed graphical models

◮ So far we mainly exploited the property

x ⊥⊥ y | z ⇔ p(y|x, z) = p(y|z)

◮ But when working with p(y|x, z) we impose an ordering or directionality from x and z to y.

◮ Directionality matters in directed graphical models.
(Figure: two DAGs over x, y, z that differ only in the direction of their edges.)

◮ In some cases, directionality is natural, but in others we do not want to choose one direction over another.

◮ We now discuss how to represent independencies in a symmetric manner without assuming a directionality or ordering of the variables.

SLIDE 4

Program

  • 1. Representing probability distributions without imposing a

directionality between the random variables

  • 2. Undirected graphs, separation, and statistical independencies
  • 3. Definition of undirected graphical models
  • 4. Further independencies in undirected graphical models

SLIDE 5

Program

  • 1. Representing probability distributions without imposing a directionality between the random variables
     - Factorisation and statistical independence
     - Gibbs distributions
     - Visualising Gibbs distributions with undirected graphs
     - Conditioning corresponds to removing nodes and edges from the graph

  • 2. Undirected graphs, separation, and statistical independencies
  • 3. Definition of undirected graphical models
  • 4. Further independencies in undirected graphical models

SLIDE 6

Further characterisation of statistical independence

◮ From tutorials: for non-negative functions a(x, z), b(y, z):

x ⊥⊥ y | z ⇔ p(x, y, z) = a(x, z) b(y, z)

◮ More general version of p(x, y, z) = p(x|z) p(y|z) p(z).

◮ No directionality or ordering of the variables is imposed.

◮ Unconditional version: for non-negative functions a(x), b(y):

x ⊥⊥ y ⇔ p(x, y) = a(x) b(y)

◮ The important point is the factorisation of p(x, y, z) into two factors:

◮ if the factors share a variable z, then we have conditional independence,

◮ if not, we have unconditional independence.

SLIDE 7

Further characterisation of statistical independence

◮ Since p(x, y, z) must sum (integrate) to one, we must have

∑_{x,y,z} a(x, z) b(y, z) = 1

◮ The normalisation condition is often ensured by re-defining a(x, z) b(y, z):

p(x, y, z) = (1/Z) φA(x, z) φB(y, z),   Z = ∑_{x,y,z} φA(x, z) φB(y, z)

◮ Z: normalisation constant (related to the partition function, see later).

◮ φi: factors (also called potential functions).
They do not generally correspond to (conditional) probabilities; they measure “compatibility”, “agreement”, or “affinity”.

SLIDE 8

What does it mean?

x ⊥⊥ y | z ⇔ p(x, y, z) = (1/Z) φA(x, z) φB(y, z)

“⇒” If we want our model to satisfy x ⊥⊥ y | z, we should write the pdf (pmf) as p(x, y, z) ∝ φA(x, z) φB(y, z).

“⇐” If the pdf (pmf) can be written as p(x, y, z) ∝ φA(x, z) φB(y, z), then we have x ⊥⊥ y | z.

An equivalent statement holds for the unconditional version.

SLIDE 9

Example

Consider p(x1, x2, x3, x4) ∝ φ1(x1, x2) φ2(x2, x3) φ3(x4). What independencies does p satisfy?

◮ We can write

p(x1, x2, x3, x4) ∝ [φ1(x1, x2) φ2(x2, x3)] [φ3(x4)] = φ̃1(x1, x2, x3) φ3(x4),   with φ̃1(x1, x2, x3) = φ1(x1, x2) φ2(x2, x3)

so that x4 ⊥⊥ x1, x2, x3.

◮ Integrating out x4 gives

p(x1, x2, x3) = ∫ p(x1, x2, x3, x4) dx4 ∝ φ1(x1, x2) φ2(x2, x3)

so that x1 ⊥⊥ x3 | x2.
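As a sanity check, both independencies can be verified numerically for binary variables by brute force. The following minimal Python sketch uses arbitrary made-up factor values; only the factorisation structure matters.

    import numpy as np

    rng = np.random.default_rng(0)

    # Arbitrary non-negative factors over binary variables (values are made up).
    phi1 = rng.random((2, 2))   # phi1(x1, x2)
    phi2 = rng.random((2, 2))   # phi2(x2, x3)
    phi3 = rng.random(2)        # phi3(x4)

    # Unnormalised joint p(x1, x2, x3, x4), then normalise by Z.
    p_tilde = np.einsum('ab,bc,d->abcd', phi1, phi2, phi3)
    p = p_tilde / p_tilde.sum()

    # x4 independent of (x1, x2, x3): joint equals product of marginals.
    p_123 = p.sum(axis=3)                        # p(x1, x2, x3)
    p_4 = p.sum(axis=(0, 1, 2))                  # p(x4)
    assert np.allclose(p, p_123[..., None] * p_4)

    # x1 independent of x3 given x2: p(x1, x3 | x2) = p(x1 | x2) p(x3 | x2).
    p_2 = p_123.sum(axis=(0, 2))                 # p(x2)
    p_13_given_2 = p_123 / p_2[None, :, None]
    p_1_given_2 = p_123.sum(axis=2) / p_2[None, :]
    p_3_given_2 = p_123.sum(axis=0) / p_2[:, None]
    assert np.allclose(p_13_given_2, p_1_given_2[:, :, None] * p_3_given_2[None, :, :])
    print("both independencies hold")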

SLIDE 10

Gibbs distributions

◮ The example is a special case of a class of pdfs/pmfs that factorise as

p(x1, . . . , xd) = (1/Z) ∏_c φc(Xc)

◮ Xc ⊆ {x1, . . . , xd}

◮ φc are non-negative factors (potential functions).
They do not generally correspond to (conditional) probabilities; they measure “compatibility”, “agreement”, or “affinity”.

◮ Z is a normalising constant so that p(x1, . . . , xd) integrates (sums) to one.

◮ Known as Gibbs (or Boltzmann) distributions.

◮ p̃(x1, . . . , xd) = ∏_c φc(Xc) is an example of an unnormalised model: p̃ ≥ 0 but it does not necessarily integrate (sum) to one.

SLIDE 11

Energy-based model

◮ With φc(Xc) = exp(−Ec(Xc)), we have equivalently

p(x1, . . . , xd) = (1/Z) exp(−∑_c Ec(Xc))

◮ ∑_c Ec(Xc) is the energy of the configuration (x1, . . . , xd).

low energy ⇔ high probability
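To make the energy view concrete, here is a tiny illustrative Python sketch (all energy values are invented): the configurations with the lowest total energy receive the highest probability under the Boltzmann form above.

    import numpy as np
    from itertools import product

    # Invented pairwise energies E1(x1, x2) and E2(x2, x3) over binary variables.
    E1 = np.array([[0.0, 2.0], [2.0, 0.0]])   # low energy when x1 == x2
    E2 = np.array([[0.0, 1.0], [1.0, 0.0]])   # low energy when x2 == x3

    configs = list(product([0, 1], repeat=3))
    p_tilde = np.array([np.exp(-(E1[a, b] + E2[b, c])) for a, b, c in configs])
    p = p_tilde / p_tilde.sum()               # divide by the partition function Z

    for cfg, prob in zip(configs, p):
        print(cfg, round(prob, 3))            # (0,0,0) and (1,1,1) are most probable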

SLIDE 12

Example

Other examples of Gibbs distributions:

p(x1, . . . , x6) ∝ φ1(x1, x2, x4) φ2(x2, x3, x4) φ3(x3, x5) φ4(x3, x6)

p(x1, . . . , x6) ∝ φ1(x1, x2) φ2(x2, x3) φ3(x2, x5) φ4(x1, x4) φ5(x4, x5) φ6(x5, x6) φ7(x3, x6)

What are their independencies?

◮ In principle, the independencies follow from

x ⊥ ⊥ y | z ⇐ ⇒ p(x, y, z) ∝ φA(x, z)φB(y, z) with appropriately defined factors φA and φB.

◮ But the mathematical manipulations of grouping together

factors and integrating variables out become unwieldy. Let us use graphs to better see what’s going on.

SLIDE 13

Visualising Gibbs distributions with undirected graphs

p(x1, . . . , xd) ∝ ∏_c φc(Xc)

◮ Draw a node for each xi.

◮ For each factor φc: draw an undirected edge between all xi and xj that belong to Xc.

◮ This results in a fully-connected subgraph for all xi that are part of the same factor (such a subgraph is called a clique).

Example: graph for p(x1, . . . , x6) ∝ φ1(x1, x2, x4) φ2(x2, x3, x4) φ3(x3, x5) φ4(x3, x6)
(Figure: undirected graph over x1, . . . , x6 with edges forming the cliques {x1, x2, x4}, {x2, x3, x4}, {x3, x5}, {x3, x6}.)
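The construction is mechanical, so it is easy to script. A minimal sketch using only the Python standard library, with the factor scopes of the example above:

    from itertools import combinations

    # Scopes X_c of the factors in the example p(x1, ..., x6).
    factor_scopes = [("x1", "x2", "x4"), ("x2", "x3", "x4"), ("x3", "x5"), ("x3", "x6")]

    nodes = sorted({v for scope in factor_scopes for v in scope})
    edges = set()
    for scope in factor_scopes:
        # Every pair of variables sharing a factor gets an undirected edge,
        # so each scope becomes a fully connected subgraph (a clique).
        edges.update(frozenset(pair) for pair in combinations(scope, 2))

    print(nodes)
    print(sorted(tuple(sorted(e)) for e in edges))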

SLIDE 14

Effect of conditioning

Let p(x1, . . . , x6) ∝ φ1(x1, x2, x4)φ2(x2, x3, x4)φ3(x3, x5)φ4(x3, x6).

◮ What is p(x1, x2, x4, x5, x6 | x3 = α)?

◮ By definition,

p(x1, x2, x4, x5, x6 | x3 = α)
= p(x1, x2, x3 = α, x4, x5, x6) / ∫ p(x1, x2, x3 = α, x4, x5, x6) dx1 dx2 dx4 dx5 dx6
= φ1(x1, x2, x4) φ2(x2, α, x4) φ3(α, x5) φ4(α, x6) / ∫ φ1(x1, x2, x4) φ2(x2, α, x4) φ3(α, x5) φ4(α, x6) dx1 dx2 dx4 dx5 dx6
= (1/Z(α)) φ1(x1, x2, x4) φ2^α(x2, x4) φ3^α(x5) φ4^α(x6)

◮ This is a Gibbs distribution with derived factors φi^α of reduced domain and a new normalisation “constant” Z(α).

◮ Note that Z(α) depends on the conditioning value α.
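For discrete variables, "plug in α and renormalise" amounts to slicing the unnormalised joint array and dividing by its sum. A minimal Python sketch with invented binary factors (only the factor structure matches the example):

    import numpy as np

    rng = np.random.default_rng(1)
    phi1 = rng.random((2, 2, 2))   # phi1(x1, x2, x4)
    phi2 = rng.random((2, 2, 2))   # phi2(x2, x3, x4)
    phi3 = rng.random((2, 2))      # phi3(x3, x5)
    phi4 = rng.random((2, 2))      # phi4(x3, x6)

    # Unnormalised joint with axes ordered (x1, x2, x3, x4, x5, x6).
    p_tilde = np.einsum('abd,bcd,ce,cf->abcdef', phi1, phi2, phi3, phi4)

    alpha = 1                                      # conditioning value for x3
    slice_alpha = p_tilde[:, :, alpha, :, :, :]    # plug in x3 = alpha
    cond = slice_alpha / slice_alpha.sum()         # divide by Z(alpha)

    # Same result via the derived, reduced-domain factors phi_i^alpha.
    cond2 = np.einsum('abd,bd,e,f->abdef',
                      phi1, phi2[:, alpha, :], phi3[alpha], phi4[alpha])
    assert np.allclose(cond, cond2 / cond2.sum())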

SLIDE 15

Effect of conditioning

Let p(x1, . . . , x6) ∝ φ1(x1, x2, x4)φ2(x2, x3, x4)φ3(x3, x5)φ4(x3, x6).

◮ The conditional p(x1, x2, x4, x5, x6 | x3 = α) is

(1/Z(α)) φ1(x1, x2, x4) φ2^α(x2, x4) φ3^α(x5) φ4^α(x6)

◮ Conditioning on variables removes the corresponding nodes and connecting edges from the undirected graph.

(Figure: the graph from before with node x3 and its edges removed, leaving x1, x2, x4, x5, x6.)

SLIDE 16

Program

  • 1. Representing probability distributions without imposing a directionality between the random variables
     - Factorisation and statistical independence
     - Gibbs distributions
     - Visualising Gibbs distributions with undirected graphs
     - Conditioning corresponds to removing nodes and edges from the graph

  • 2. Undirected graphs, separation, and statistical independencies
  • 3. Definition of undirected graphical models
  • 4. Further independencies in undirected graphical models

SLIDE 17

Program

  • 1. Representing probability distributions without imposing a

directionality between the random variables

  • 2. Undirected graphs, separation, and statistical independencies
     - Separation in undirected graphs
     - Statistical independencies from graph separation
     - Global Markov property
     - I-map

  • 3. Definition of undirected graphical models
  • 4. Further independencies in undirected graphical models

SLIDE 18

Relating graph properties to independencies

◮ Consider p(x1, x2, x3, x4) ∝ φ1(x1, x2) φ2(x2, x3) φ3(x4) from before.

◮ We have seen:
x4 ⊥⊥ x1, x2, x3
x1 ⊥⊥ x3 | x2

◮ Graph:
(Figure: chain graph x1 - x2 - x3, with x4 as an isolated node.)

◮ In the graph, x4 is separated from x1, x2, x3.
Starting at x4, we cannot reach x1, x2, or x3 (and vice versa). In other words, all trails from x4 to x1, x2, x3 are “blocked”.

◮ In the graph, x1 and x3 are separated by x2. In other words, all trails from x1 to x3 are blocked by x2
(when removing x2 from the graph, we cannot reach x3 from x1 and vice versa).

SLIDE 19

Relating graph properties to independencies

◮ Example:

p(x1, . . . , x6) ∝ φ1(x1, x2, x4) φ2(x2, x3, x4) φ3(x3, x5) φ4(x3, x6)

◮ Graph:
(Figure: the undirected graph from before over x1, . . . , x6.)

◮ x3 separates {x1, x2, x4} and {x5, x6}.
In other words, x3 blocks all trails from {x1, x2, x4} to {x5, x6}.

◮ Do we have x1, x2, x4 ⊥⊥ x5, x6 | x3?

SLIDE 20

Relating graph properties to independencies

p(x) ∝ φ1(x1, x2, x4) φ2(x2, x3, x4) φ3(x3, x5) φ4(x3, x6)

◮ Do we have x1, x2, x4 ⊥⊥ x5, x6 | x3?

◮ Group the factors:

p(x) ∝ [φ1(x1, x2, x4) φ2(x2, x3, x4)] [φ3(x3, x5) φ4(x3, x6)] = φA(x1, x2, x4, x3) φB(x5, x6, x3)

◮ This takes the form

p(x) ∝ φA(x, z) φB(y, z)   with x ≡ (x1, x2, x4), y ≡ (x5, x6), z ≡ x3

◮ Hence x1, x2, x4 ⊥⊥ x5, x6 | x3 does indeed hold.

SLIDE 21

Separation in undirected graphs

Let X, Y , Z be three disjoint sets of nodes in an undirected graph.

◮ X and Y are separated by Z if every trail from any node in X to any node in Y passes through at least one node of Z.

◮ In other words:
all trails from X to Y are blocked by Z;
removing Z from the graph leaves X and Y disconnected.

◮ Nodes act as valves: open by default, but closed when they are part of Z.

(Figure: node sets X and Y separated by the set Z.)
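Graph separation is easy to check programmatically: delete the nodes in Z, then test whether any node of X can still reach a node of Y. A minimal breadth-first-search sketch (Python standard library only), applied to the running example from the previous slides:

    from collections import deque

    def separated(edges, X, Y, Z):
        """True if Z blocks every trail between X and Y in the undirected graph."""
        Z = set(Z)
        # Adjacency list of the graph with the nodes in Z removed.
        adj = {}
        for u, v in edges:
            if u not in Z and v not in Z:
                adj.setdefault(u, set()).add(v)
                adj.setdefault(v, set()).add(u)
        # BFS from X; if we ever reach a node of Y, X and Y are not separated.
        frontier = deque(x for x in X if x not in Z)
        visited = set(frontier)
        while frontier:
            node = frontier.popleft()
            if node in Y:
                return False
            for nb in adj.get(node, ()):
                if nb not in visited:
                    visited.add(nb)
                    frontier.append(nb)
        return True

    # Graph of p ∝ φ1(x1,x2,x4) φ2(x2,x3,x4) φ3(x3,x5) φ4(x3,x6).
    edges = [("x1", "x2"), ("x1", "x4"), ("x2", "x4"), ("x2", "x3"),
             ("x3", "x4"), ("x3", "x5"), ("x3", "x6")]
    print(separated(edges, {"x1", "x2", "x4"}, {"x5", "x6"}, {"x3"}))  # True
    print(separated(edges, {"x1"}, {"x5"}, {"x2"}))                    # False: trail via x4 and x3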

SLIDE 22

Statistical independencies from graph separation

Assume that p(x1, . . . , xd) ∝ ∏_c φc(Xc), with Xc ⊂ {x1, . . . , xd}, can be visualised as the graph below.

Do we have x1, x2 ⊥⊥ y1, y2 | z1, z2, z3?

(Figure: undirected graph over x1, x2, z1, z2, z3, y1, y2.)

SLIDE 23

Statistical independencies from graph separation

Assume that p(x1, . . . , xd) ∝ ∏_c φc(Xc), with Xc ⊂ {x1, . . . , xd}, can be visualised as the graph below.

Do we have x ⊥⊥ y | z1, z2, z3?

(Figure: undirected graph over u, x1, x2, z1, z2, z3, y1, y2, with the groups x and y indicated.)

SLIDE 24

Statistical independencies from graph separation

◮ With z = (z1, z2, z3), every node xi belongs to one of the groups x, y, z, or u.

◮ We thus have p(x1, . . . , xd) = p(x, y, z, u), and we can group the factors φc together so that

p(x, y, z, u) ∝ φ1(x, z) φ2(y, z) φ3(u, z)

(Figure: the same graph with the groups x, y, z, and u indicated.)

SLIDE 25

Statistical independencies from graph separation

◮ Integrating (summing) out u gives

p(x, y, z) = ∑_u p(x, y, z, u)                        (1)
           ∝ ∑_u φ1(x, z) φ2(y, z) φ3(u, z)            (2)
           ∝ φ1(x, z) φ2(y, z) ∑_u φ3(u, z)            (3)   (distributive law)
           ∝ φ1(x, z) φ2(y, z) φ̃(z)                    (4)
           ∝ φA(x, z) φB(y, z)                          (5)

◮ And p(x, y, z) ∝ φA(x, z) φB(y, z) means x ⊥⊥ y | z.
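The step from (2) to (3) is just the distributive law, which can be confirmed numerically. A tiny Python sketch with invented binary factors for the groups x, y, z, u:

    import numpy as np

    rng = np.random.default_rng(2)
    phi1 = rng.random((2, 2))   # phi1(x, z)
    phi2 = rng.random((2, 2))   # phi2(y, z)
    phi3 = rng.random((2, 2))   # phi3(u, z)

    # Left-hand side: multiply the factors over (x, y, z, u), then sum out u.
    lhs = np.einsum('xz,yz,uz->xyz', phi1, phi2, phi3)

    # Right-hand side: pull phi1 and phi2 out of the sum over u.
    phi_tilde = phi3.sum(axis=0)                 # phi~(z) = sum_u phi3(u, z)
    rhs = np.einsum('xz,yz,z->xyz', phi1, phi2, phi_tilde)

    assert np.allclose(lhs, rhs)   # p(x,y,z) ∝ phiA(x,z) phiB(y,z), hence x ⊥⊥ y | z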

SLIDE 26

Statistical independencies from graph separation

Assume that p(x1, . . . , xd) ∝ ∏_c φc(Xc), with Xc ⊂ {x1, . . . , xd}, can be visualised as the graph below.

We have shown that if x and y are separated by z, then x ⊥⊥ y | z.

(Figure: graph over x1, x2, z1, z2, z3, y1, y2, with the groups x and y indicated.)

SLIDE 27

Statistical independencies from graph separation

Assume that p(x1, . . . , xd) ∝ ∏_c φc(Xc), with Xc ⊂ {x1, . . . , xd}, can be visualised as the graph below.

So do we have x1, x2 ⊥⊥ y1, y2 | z1, z2, z3?

(Figure: graph over x1, x2, z1, z2, z3, y1, y2, with the groups x and y indicated.)

SLIDE 28

Statistical independencies from graph separation

◮ From tutorial: x ⊥⊥ {y, w} | z implies x ⊥⊥ y | z.

◮ Hence x ⊥⊥ y | z1, z2, z3 implies x1, x2 ⊥⊥ y1, y2 | z1, z2, z3.

(Figure: graph over x1, x2, z1, z2, z3, y1, y2, with the groups x and y indicated.)

SLIDE 29

Summary

Theorem: Let G be the undirected graph for p(x1, . . . , xd) ∝ ∏_c φc(Xc), and let X, Y , Z be three disjoint subsets of {x1, . . . , xd}. If, in the graph, X and Y are separated by Z, then X ⊥⊥ Y | Z.

◮ Important because:

  • 1. the theorem allows us to read out (conditional) independencies

from the undirected graph

  • 2. the theorem shows that graph separation does not indicate

false independence relations. (“Soundness” of the independence assertions.)

◮ We say that p(x1, . . . , xd) satisfies the global Markov property

relative to G.

SLIDE 30

Converse

Theorem: If X and Y are not separated by Z in the graph, then X ⊥⊥ Y | Z fails to hold in some probability distributions that factorise according to the graph.

Optional, for those interested: a proof sketch can be found in Section 4.3.1.2 of Probabilistic Graphical Models by Koller and Friedman.

Remark: The theorem implies that for some specific factors we may have X ⊥⊥ Y | Z even though X and Y are not separated by Z. The separation criterion only allows us to decide about independence and not about dependence. It is not “complete”.

SLIDE 31

I-map

(as before for directed graphical models)

◮ A graph is said to be an independency map (I-map) for a set

of independencies I if the independencies asserted by the

graph are part of I.

◮ For an undirected graph H, let I(H) be all the independencies

that we can derive via graph separation.

◮ Denote the independencies that a distribution p satisfies by

I(p).

◮ The previous results on graph separation can thus be written

as I(H) ⊆ I(p) for all p that factorise over H

◮ As before, we generally do not have I(H) = I(p). If we have

equality, the graph is said to be a perfect map (P-map) for I(p).

SLIDE 32

Example

◮ p(x1, . . . , x6) ∝ φ1(x1, x2, x4) φ2(x2, x3, x4) φ3(x3, x5) φ4(x3, x6)

◮ Graph:
(Figure: the undirected graph from before over x1, . . . , x6.)

◮ Example independencies:
x1 ⊥⊥ {x3, x5, x6} | x2, x4
x2 ⊥⊥ x6 | x3
x5 ⊥⊥ x6 | x3

◮ But x3 ⊥⊥ x1 holds for some distributions that factorise over the graph, even though graph separation does not assert it.

SLIDE 33

Summary

  • 1. Representing probability distributions without imposing a directionality between the random variables
     - Factorisation and statistical independence
     - Gibbs distributions
     - Visualising Gibbs distributions with undirected graphs
     - Conditioning corresponds to removing nodes and edges from the graph

  • 2. Undirected graphs, separation, and statistical independencies
     - Separation in undirected graphs
     - Statistical independencies from graph separation
     - Global Markov property
     - I-map

SLIDE 34

Program

  • 1. Representing probability distributions without imposing a

directionality between the random variables

  • 2. Undirected graphs, separation, and statistical independencies
  • 3. Definition of undirected graphical models

     - Via factorisation according to the graph
     - Undirected graphical models satisfy the global Markov property

  • 4. Further independencies in undirected graphical models

SLIDE 35

Undirected graphical models

◮ We started with a pdf/pmf in the form of a Gibbs distribution and associated an undirected graph with it.

◮ We now go the other way around and start with an undirected graph.

◮ Definition: An undirected graphical model based on an undirected graph with d nodes and associated random variables xi is the set of pdfs/pmfs that factorise as

p(x1, . . . , xd) = (1/Z) ∏_c φc(Xc)

where Z is the normalisation constant, φc(Xc) ≥ 0, and the Xc correspond to the maximal cliques in the graph.

◮ Such p(x1, . . . , xd) are said to factorise according to the graph.

SLIDE 36

Remarks

◮ The undirected graphical model corresponds to a set of

probability distributions. This is because we left the actual definition of the factors φc(Xc) unspecified.

◮ Other names for an undirected graphical model: Markov

network (MN), Markov random field (MRF)

◮ By definition, all p(x1, . . . , xd) defined by the graph satisfy the

global Markov property relative to the graph.

◮ Since the graph is an I-map, we can use graph separation to

determine independencies that hold for all distributions that factorise according to the graph.

◮ The Xc correspond to maximal cliques in the graph.

Maximal clique: a set of fully connected nodes (clique) that is not contained in another clique.

SLIDE 37

Why maximal cliques?

◮ The mapping from Gibbs distribution to graph is many-to-one.
We may obtain the same graph for different Gibbs distributions, e.g.

p(x) ∝ φ1(x1, x2, x4) φ2(x2, x3, x4) φ3(x3, x5) φ4(x3, x6)
p(x) ∝ φ̃1(x1, x2) φ̃2(x1, x4) φ̃3(x2, x4) φ̃4(x2, x3) φ̃5(x3, x4) φ̃6(x3, x5) φ̃7(x3, x6)

(Figure: both factorisations produce the same undirected graph over x1, . . . , x6.)

◮ By using maximal cliques, we take a conservative approach and do not make additional assumptions on the factorisation.
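If the graph library networkx is available, the maximal cliques used in the definition can be enumerated directly. A minimal sketch for the running example (edge list as on the earlier slide):

    import networkx as nx

    # Undirected graph of the running example.
    G = nx.Graph([("x1", "x2"), ("x1", "x4"), ("x2", "x4"), ("x2", "x3"),
                  ("x3", "x4"), ("x3", "x5"), ("x3", "x6")])

    # nx.find_cliques enumerates the maximal cliques of an undirected graph.
    maximal_cliques = sorted(sorted(c) for c in nx.find_cliques(G))
    print(maximal_cliques)
    # [['x1', 'x2', 'x4'], ['x2', 'x3', 'x4'], ['x3', 'x5'], ['x3', 'x6']]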

SLIDE 38

Example (pair-wise Markov network)

Graph:
(Figure: grid graph with top row x1, x2, x3 and bottom row x4, x5, x6.)

Random variables: x1, . . . , x6
Maximal cliques: all pairs of neighbours
{x1, x2}, {x2, x3}, {x4, x5}, {x5, x6}, {x1, x4}, {x2, x5}, {x3, x6}

All models defined by the graph factorise as

p(x) ∝ φ1(x1, x2) φ2(x2, x3) φ3(x4, x5) φ4(x5, x6) φ5(x1, x4) φ6(x2, x5) φ7(x3, x6)

This is an example of a pairwise Markov network.

SLIDE 39

Example (pair-wise Markov network)

Graph:
(Figure: the same grid graph over x1, . . . , x6.)

Some independencies from the global Markov property:

x1, x4 ⊥⊥ x3, x6 | x2, x5

x1 ⊥⊥ x3, x5, x6 | x2, x4   (everything except x1 and its neighbours ne(x1) = {x2, x4}, given ne(x1))

x1 ⊥⊥ x6 | x2, x3, x4, x5   (given all variables other than x1 and x6)

The last two are examples of the “local Markov property” and the “pairwise Markov property” relative to the undirected graph.

SLIDE 40

Program

  • 1. Representing probability distributions without imposing a

directionality between the random variables

  • 2. Undirected graphs, separation, and statistical independencies
  • 3. Definition of undirected graphical models

     - Via factorisation according to the graph
     - Undirected graphical models satisfy the global Markov property

  • 4. Further independencies in undirected graphical models

SLIDE 41

Program

  • 1. Representing probability distributions without imposing a

directionality between the random variables

  • 2. Undirected graphs, separation, and statistical independencies
  • 3. Definition of undirected graphical models
  • 4. Further independencies in undirected graphical models

     - Local Markov property
     - Pairwise Markov property
     - Equivalence between factorisation and Markov properties for positive distributions
     - Markov blanket

SLIDE 42

Local Markov property

Denote the set of all nodes by X and the neighbours of a node α by ne(α).

◮ A probability distribution is said to satisfy the local Markov property relative to an undirected graph if

α ⊥⊥ X \ (α ∪ ne(α)) | ne(α)   for all nodes α ∈ X

◮ If p satisfies the global Markov property, then it satisfies the local Markov property. This is because ne(α) blocks all trails from α to the remaining nodes.

(Figure: a node α whose neighbours ne(α) separate it from the rest of the graph.)

SLIDE 43

Pairwise Markov property

Denote the set of all nodes by X.

◮ A probability distribution is said to satisfy the pairwise Markov property relative to an undirected graph if

α ⊥⊥ β | X \ {α, β}   for all non-neighbouring α, β ∈ X

◮ If p satisfies the local Markov property, then it satisfies the pairwise Markov property.

(Figure: two non-neighbouring nodes α and β.)

SLIDE 44

Summary

Let p be a pdf/pmf defined by the undirected graph G.

p factorises according to G
⇓
p satisfies the global Markov property
⇓
p satisfies the local Markov property
⇓
p satisfies the pairwise Markov property

SLIDE 45

Do we have an equivalence?

◮ In directed graphical models, we had an equivalence of
- factorisation,
- ordered Markov property,
- local directed Markov property, and
- global directed Markov property.

◮ Do we have a similar equivalence for undirected graphical models? Yes, under a very mild condition.

SLIDE 46

Intersection property

◮ The intersection property holds for all distributions with p(x) > 0 for all values of x in its domain.

◮ This excludes deterministic relationships between the variables.

◮ Intersection property: Let A, B, C, D be sets of random variables. If A ⊥⊥ B | (C ∪ D) and A ⊥⊥ C | (B ∪ D), then A ⊥⊥ (B ∪ C) | D.

(Figure: node sets A, B, C, D.)

SLIDE 47

From pairwise to global Markov property and factorisation

◮ Let p(x1, . . . , xd) be a pdf/pmf that satisfies the intersection

property for all disjoint subsets A, B, C, D of {x1, . . . , xd}. This holds if p always takes positive values (“positive distributions”).

◮ If p satisfies the pairwise Markov property with respect to an

undirected graph G then

◮ p satisfies the global Markov property with respect to G, and ◮ p factorises according to G.

◮ Hence: equivalence of factorisation and the global, local, and

pairwise Markov properties for positive distributions.

◮ This equivalence is known as the Hammersley-Clifford theorem.

◮ Important e.g. for learning, because prior knowledge may come in the form of conditional independencies (the graph), which we can incorporate by working with Gibbs distributions that factorise accordingly.

SLIDE 48

Summary of equivalences

For positive distributions, the following are equivalent:

Factorisation: p(x1, . . . , xd) = (1/Z) ∏_c φc(Xc), with φc(Xc) > 0

Pairwise Markov property: α ⊥⊥ β | {x1, . . . , xd} \ {α, β}

Local Markov property: α ⊥⊥ {x1, . . . , xd} \ (α ∪ ne(α)) | ne(α)

Global Markov property: all independencies from graph separation

Broadly speaking, the graph serves two related purposes:

  • 1. it tells us how distributions factorise
  • 2. it represents the independence assumptions made

SLIDE 49

Markov blanket

What is the minimal set of variables such that knowing their values makes x independent from the rest?

From the local Markov property, it is the set of neighbours, MB(x) = ne(x):

x ⊥⊥ {all variables} \ (x ∪ ne(x)) | ne(x)

(Figure: a node x together with its Markov blanket ne(x).)
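In an undirected graph the Markov blanket is simply the neighbour set, so it can be read directly off the edge list. A minimal Python sketch (standard library; edges from the grid example):

    def markov_blanket(edges, node):
        """Neighbours of `node` in an undirected graph given as an edge list."""
        return {v for u, v in edges if u == node} | {u for u, v in edges if v == node}

    # Grid graph of the pairwise Markov network example.
    edges = [("x1", "x2"), ("x2", "x3"), ("x4", "x5"), ("x5", "x6"),
             ("x1", "x4"), ("x2", "x5"), ("x3", "x6")]
    print(markov_blanket(edges, "x2"))   # {'x1', 'x3', 'x5'}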

SLIDE 50

Program

  • 1. Representing probability distributions without imposing a

directionality between the random variables

  • 2. Undirected graphs, separation, and statistical independencies
  • 3. Definition of undirected graphical models
  • 4. Further independencies in undirected graphical models

     - Local Markov property
     - Pairwise Markov property
     - Equivalence between factorisation and Markov properties for positive distributions
     - Markov blanket

SLIDE 51

Program recap

  • 1. Representing probability distributions without imposing a directionality between the random variables
     - Factorisation and statistical independence
     - Gibbs distributions
     - Visualising Gibbs distributions with undirected graphs
     - Conditioning corresponds to removing nodes and edges from the graph

  • 2. Undirected graphs, separation, and statistical independencies
     - Separation in undirected graphs
     - Statistical independencies from graph separation
     - Global Markov property
     - I-map

  • 3. Definition of undirected graphical models
     - Via factorisation according to the graph
     - Undirected graphical models satisfy the global Markov property

  • 4. Further independencies in undirected graphical models
     - Local Markov property
     - Pairwise Markov property
     - Equivalence between factorisation and Markov properties for positive distributions
     - Markov blanket
