

SLIDE 1

Bayesian Networks

Graphical Models

Siamak Ravanbakhsh

Fall 2019

SLIDE 2

Previously on Probabilistic Graphical Models

Probability distributions and density functions, random variables, Bayes' rule, conditional independence, expectation and variance.

SLIDE 3

Learning objectives

What is a Bayesian network?
  • factorization
  • conditional independencies: how to read them from the graph
  • equivalence class of Bayesian networks
  • how are they related?

SLIDE 4

Representing distributions

Given a number of random variables X_1, …, X_n, how do we represent P(X_1, …, X_n)? The number of parameters is exponential in n (curse of dimensionality), so we need to leverage some structure in P.

SLIDES 5-7

Independence & representation

For discrete domains with Val(X_i) = {1, …, D} for all i, the full tabular representation

P(X_1 = x_1, …, X_n = x_n) = θ_{i_1, …, i_n}

is exponential in n: O(D^n) parameters.

Assuming independence, X_i ⊥ X_j ∀ i, j, gives a linear-sized representation:

P(X_1 = x_1^d, …, X_n = x_n^d) = ∏_i P(X_i = x_i^d) = ∏_i θ_{i,d}

where d indexes a particular assignment in the discrete domain. However, the independence assumption is too restrictive.

SLIDES 8-10

Using the chain rule

Pick an ordering of the variables and write

P(X) = P(X_1) P(X_2 ∣ X_1) ⋯ P(X_n ∣ X_1, …, X_{n-1})

then parameterize each term, P(X_1), P(X_2 ∣ X_1), …, P(X_n ∣ X_1, …, X_{n-1}). Does this compress the representation?

  • original #params: D^n − 1
  • new #params: (D − 1) + (D^2 − D) + … + (D^n − D^{n-1}) = D^n − 1

so the chain rule alone does not compress anything.

SLIDES 11-12

Using the chain rule

P(X) = P(X_1) P(X_2 ∣ X_1) ⋯ P(X_n ∣ X_1, …, X_{n-1})

Simplify the conditionals for a flexible compression of P: a Bayesian network!

SLIDES 13-14

Chain rule: simplification

P(X) = P(X_1) P(X_2 ∣ X_1) P(X_3 ∣ X_1, X_2) ⋯ P(X_n ∣ X_1, …, X_{n-1})

An extreme form of simplification:

P(X) = P(X_1) P(X_2 ∣ X_1) P(X_3 ∣ X_1) ⋯ P(X_n ∣ X_1)

#params: (D − 1) + (n − 1)(D^2 − D), i.e. O(nD^2) instead of O(D^n).
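As a quick numeric sanity check of these counts, here is a small sketch in Python; the values D = 3 and n = 10 are arbitrary choices for illustration.

def full_table_params(D, n):
    # a full joint table over n variables with D values each, minus 1 for normalization
    return D**n - 1

def chain_rule_params(D, n):
    # P(X_1) P(X_2|X_1) ... P(X_n|X_1..X_{n-1}) without any simplification
    return sum(D**(i - 1) * (D - 1) for i in range(1, n + 1))

def star_params(D, n):
    # the extreme simplification: every X_i (i > 1) conditioned on X_1 only
    return (D - 1) + (n - 1) * (D**2 - D)

D, n = 3, 10
print(full_table_params(D, n))   # 59048
print(chain_rule_params(D, n))   # 59048: the chain rule alone does not help
print(star_params(D, n))         # 56: O(n D^2)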

SLIDES 15-16

Idiot Bayes ... or naive Bayes

P(class, X) = P(class) P(X_2 ∣ class) P(X_3 ∣ class) ⋯ P(X_n ∣ class)

Independence assumption: X_i ⊥ X_{-i} ∣ class.

For classification (use Bayes' rule):

P(class ∣ X) ∝ P(class) P(X_2 ∣ class) P(X_3 ∣ class) ⋯ P(X_n ∣ class)

Example: medical diagnosis (what if two symptoms are correlated?)
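To make the classification rule concrete, here is a minimal naive Bayes sketch in Python; the symptom names and every probability below are hypothetical placeholders, not numbers from the lecture.

priors = {"flu": 0.1, "cold": 0.9}                       # P(class)
likelihoods = {                                          # P(symptom present | class)
    "flu":  {"fever": 0.8, "cough": 0.7, "headache": 0.6},
    "cold": {"fever": 0.2, "cough": 0.6, "headache": 0.3},
}

def posterior(symptoms):
    # P(class | X) ∝ P(class) * prod_i P(X_i | class), then normalize
    scores = {}
    for c in priors:
        p = priors[c]
        for s, present in symptoms.items():
            p_s = likelihoods[c][s]
            p *= p_s if present else (1 - p_s)
        scores[c] = p
    z = sum(scores.values())
    return {c: v / z for c, v in scores.items()}

print(posterior({"fever": True, "cough": True, "headache": False}))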

SLIDES 17-18

Simplifying the chain rule: general case

P(X) = P(X_1) P(X_2 ∣ X_1) ⋯ P(X_n ∣ X_1, …, X_{n-1})

Simplify the full conditionals:

P(X) = ∏_i P(X_i ∣ Pa_{X_i})

and represent it using a Directed Acyclic Graph (DAG): a Bayesian network, with the variables in a topological ordering.

SLIDES 19-22

DAG: identification

Identifying a DAG: does it have a topological ordering? Is there no directed path from a node to itself?

Example: is this a DAG? One topological ordering: G, A, B, D, C, E, F; another: A, B, C, G, D, E, F. How about this one?
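Checking whether a directed graph is a DAG is easy to automate; the sketch below runs Kahn's algorithm on a made-up edge set (the actual graphs drawn on the slides are not reproduced here).

from collections import deque

def topological_order(nodes, edges):
    # Kahn's algorithm: returns a topological ordering, or None if there is a directed cycle
    indeg = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for u, v in edges:
        children[u].append(v)
        indeg[v] += 1
    queue = deque(v for v in nodes if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in children[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order if len(order) == len(nodes) else None   # None: not a DAG

edges = [("G", "A"), ("A", "B"), ("A", "D"), ("B", "C"), ("D", "C"), ("C", "E"), ("E", "F")]
print(topological_order("ABCDEFG", edges))                 # e.g. ['G', 'A', 'B', 'D', 'C', 'E', 'F']
print(topological_order("XY", [("X", "Y"), ("Y", "X")]))   # None: a 2-cycle is not a DAG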

SLIDES 23-25

Bayesian network (BN): running example

P(I, D, G, S, L) = P(I) P(D) P(G ∣ I, D) P(S ∣ I) P(L ∣ G)

[figure: the student network, with Difficulty (D) and Intelligence (I) as parents of Grade (G, values A/B/C), Intelligence as parent of SAT score (S), Grade as parent of Letter (L), and a Conditional Probability Table (CPT) attached to each node]

For a particular joint assignment, reading each factor off the corresponding CPT:

P(i, d, g, s, l) = P(i) P(d) P(g ∣ i, d) P(s ∣ i) P(l ∣ g) = .7 × .6 × .08 × .8 × .4 ≈ .01

SLIDES 26-28

Intuition for reasoning in a BN

Answering probabilistic queries: P(Y = y ∣ E = e)?, where E = e is the evidence. For example,

P(L = l¹ ∣ S = s¹) = P(L = l¹, S = s¹) / P(S = s¹)

P(S = s¹) = ∑_{d,i,g,l} P(d, i, g, s¹, l)

This is an inference problem; how to calculate it efficiently comes later.
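To make this query concrete, here is a brute-force sketch for the running example. The network structure follows the factorization above, but every CPT entry below is a hypothetical placeholder, so the printed numbers are only illustrative.

from itertools import product

P_I = {0: 0.7, 1: 0.3}                                    # P(I)
P_D = {0: 0.6, 1: 0.4}                                    # P(D)
P_G = {(0, 0): {1: 0.30, 2: 0.40, 3: 0.30},               # P(G | I, D), G in {1 (A), 2 (B), 3 (C)}
       (0, 1): {1: 0.05, 2: 0.25, 3: 0.70},
       (1, 0): {1: 0.90, 2: 0.08, 3: 0.02},
       (1, 1): {1: 0.50, 2: 0.30, 3: 0.20}}
P_S = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.20, 1: 0.80}}      # P(S | I)
P_L = {1: {0: 0.10, 1: 0.90}, 2: {0: 0.40, 1: 0.60},      # P(L | G)
       3: {0: 0.99, 1: 0.01}}

def joint(i, d, g, s, l):
    # P(i, d, g, s, l) read directly off the factorization
    return P_I[i] * P_D[d] * P_G[(i, d)][g] * P_S[i][s] * P_L[g][l]

# one full assignment, as a product of CPT entries:
print(joint(1, 0, 2, 1, 0))

# P(L = 1 | S = 1) by brute-force enumeration over the remaining variables:
num = sum(joint(i, d, g, 1, 1) for i, d, g in product((0, 1), (0, 1), (1, 2, 3)))
den = sum(joint(i, d, g, 1, l) for i, d, g, l in product((0, 1), (0, 1), (1, 2, 3), (0, 1)))
print(num / den)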

SLIDE 29

Intuition for reasoning in a BN

Causal reasoning (top-down): the marginal prior of getting a good letter, versus the marginal posterior given low intelligence ... and an easy exam:

P(l¹) ≈ .50    P(l¹ ∣ i⁰) ≈ .389    P(l¹ ∣ i⁰, d⁰) ≈ .52

SLIDES 30-31

Intuition for reasoning in a BN: explaining away (v-structure)

Evidential reasoning (bottom-up): the (marginal) prior of high intelligence, versus the (marginal) posterior given a bad letter ... and a bad grade:

P(i¹) ≈ .30    P(i¹ ∣ l⁰) ≈ .14    P(i¹ ∣ l⁰, g³) ≈ .08

A difficult exam explains away the grade:

P(i¹ ∣ l⁰, g³, d¹) ≈ .11

SLIDE 32

DAG: semantics

Associating P with a DAG:
  • factorization of the joint probability: P(X) = ∏_i P(X_i ∣ Pa_{X_i})
  • conditional independencies in P, read from the DAG

SLIDE 33

Bayesian networks: factorization

In general: P(X) = ∏_i P(X_i ∣ Pa_{X_i})

Running example: P(I, D, G, S, L) = P(I) P(D) P(G ∣ I, D) P(S ∣ I) P(L ∣ G)

SLIDES 34-41

Bayesian networks: conditional independencies

L ⊥ D, I, S ∣ G: the quality of the letter (L) only depends on the grade (G).

How about the following assertions? Why, and can they be read from the graph?
  • D ⊥ S ?
  • D ⊥ S ∣ I ?
  • D ⊥ S ∣ L ?

SLIDE 42

Conditional independencies (CI): notation

  • I(P): the set of all CIs of the distribution P
  • Iℓ(G): the set of local CIs from the graph (DAG)
  • I(G): the set of all (global) CIs from the graph

SLIDE 43

Local conditional independencies (CIs)

For any node X_i:  X_i ⊥ NonDescendants_{X_i} ∣ Parents_{X_i}

For the graph G of the running example:

Iℓ(G) = { D ⊥ I, S;  I ⊥ D;  G ⊥ S ∣ I, D;  S ⊥ G, L, D ∣ I;  L ⊥ D, I, S ∣ G }

SLIDE 44

Local CIs from factorization

Use the factorized form P(X) = ∏_i P(X_i ∣ Pa_{X_i}) to show

P(X_i, NonDesc_{X_i} ∣ Pa_{X_i}) = P(X_i ∣ Pa_{X_i}) P(NonDesc_{X_i} ∣ Pa_{X_i})

which means X_i ⊥ NonDesc_{X_i} ∣ Pa_{X_i} for all X_i.

SLIDES 45-49

Local CIs from factorization: example

Given P(D, I, G, S, L) = P(D) P(I) P(G ∣ D, I) P(S ∣ I) P(L ∣ G), show S ⊥ G ∣ I:

P(G, S ∣ I) = ∑_{d,l} P(D, I, G, S, L) / ∑_{d,g,s,l} P(D, I, G, S, L)

= ∑_{d,l} P(D) P(I) P(G ∣ D, I) P(S ∣ I) P(L ∣ G) / ∑_{d,g,s,l} P(D) P(I) P(G ∣ D, I) P(S ∣ I) P(L ∣ G)

= [ P(I) P(S ∣ I) ∑_{d,l} P(D) P(G ∣ D, I) P(L ∣ G) ] / [ P(I) ∑_{d,g,s,l} P(D) P(G ∣ D, I) P(S ∣ I) P(L ∣ G) ]

= P(S ∣ I) ∑_{d,l} P(D) P(G ∣ D, I) P(L ∣ G)   (the remaining denominator sum equals 1)

= P(S ∣ I) P(G ∣ I)
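The same conclusion can be checked numerically: draw arbitrary (random but normalized) CPTs, build the joint from the factorization, and verify P(G, S ∣ I) = P(G ∣ I) P(S ∣ I). A small sketch:

import random
from itertools import product

random.seed(0)

def random_cpt(parent_vals, child_vals):
    # one normalized distribution over child_vals for every parent assignment
    cpt = {}
    for pa in parent_vals:
        w = [random.random() for _ in child_vals]
        z = sum(w)
        cpt[pa] = {v: wi / z for v, wi in zip(child_vals, w)}
    return cpt

D_vals, I_vals, G_vals, S_vals, L_vals = (0, 1), (0, 1), (1, 2, 3), (0, 1), (0, 1)
P_D = random_cpt([()], D_vals)[()]
P_I = random_cpt([()], I_vals)[()]
P_G = random_cpt(list(product(D_vals, I_vals)), G_vals)   # P(G | D, I)
P_S = random_cpt(I_vals, S_vals)                          # P(S | I)
P_L = random_cpt(G_vals, L_vals)                          # P(L | G)

def joint(d, i, g, s, l):
    return P_D[d] * P_I[i] * P_G[(d, i)][g] * P_S[i][s] * P_L[g][l]

for i in I_vals:
    p_i = sum(joint(d, i, g, s, l) for d, g, s, l in product(D_vals, G_vals, S_vals, L_vals))
    for g, s in product(G_vals, S_vals):
        p_gs_i = sum(joint(d, i, g, s, l) for d, l in product(D_vals, L_vals)) / p_i
        p_g_i = sum(joint(d, i, g, s2, l) for d, s2, l in product(D_vals, S_vals, L_vals)) / p_i
        p_s_i = sum(joint(d, i, g2, s, l) for d, g2, l in product(D_vals, G_vals, L_vals)) / p_i
        assert abs(p_gs_i - p_g_i * p_s_i) < 1e-12
print("S ⊥ G | I holds for these randomly generated CPTs")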

SLIDE 50

Factorization from local CIs

From the local CIs Iℓ(G) = { X_i ⊥ NonDesc_{X_i} ∣ Pa_{X_i} ∀ i }:
  • find a topological ordering X_{i_1}, …, X_{i_n} (parents before children)
  • use the chain rule: P(X) = P(X_{i_1}) ∏_{j=2}^{n} P(X_{i_j} ∣ X_{i_1}, …, X_{i_{j-1}})
  • simplify using the local CIs: P(X) = P(X_{i_1}) ∏_{j=2}^{n} P(X_{i_j} ∣ Pa_{X_{i_j}})

SLIDE 51

Factorization from local CIs: example

Local CIs: Iℓ(G) = { (D ⊥ I, S), (I ⊥ D), (G ⊥ S ∣ I), (S ⊥ G, L, D ∣ I), (L ⊥ D, I, S ∣ G) }

A topological ordering: D, I, G, L, S. Use the chain rule:

P(D, I, G, S, L) = P(D) P(I ∣ D) P(G ∣ D, I) P(L ∣ D, I, G) P(S ∣ D, I, G, L)

then simplify using Iℓ(G):

P(D, I, G, S, L) = P(D) P(I) P(G ∣ D, I) P(L ∣ G) P(S ∣ I)

SLIDES 52-54

Factorization ⇔ local CIs

P factorizes according to G, i.e. P(X) = ∏_i P(X_i ∣ Pa_{X_i})
⇔ Iℓ(G) holds in P, i.e. Iℓ(G) ⊆ I(P).

In this case G is an I-map for P: it does not mislead us about independencies in P.

SLIDES 55-57

Perfect map (P-map)

Which graph G to use for P? Perfect map: I(G) = I(P).

P may not have a P-map in the form of a BN. Example:

p(x, y, z) = 1/12 if x ⊕ y ⊕ z = 0,  1/6 if x ⊕ y ⊕ z = 1

(X ⊥ Y), (Y ⊥ Z), (X ⊥ Z) ∈ I(P), but (X ⊥ Y ∣ Z), (Y ⊥ Z ∣ X), (X ⊥ Z ∣ Y) ∉ I(P)
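A quick numeric check of this example (binary variables, ⊕ read as XOR):

from itertools import product

p = {(x, y, z): (1/12 if x ^ y ^ z == 0 else 1/6) for x, y, z in product((0, 1), repeat=3)}

def marg(keep):
    # marginal over the coordinates listed in `keep` (indices into (x, y, z))
    out = {}
    for xyz, pr in p.items():
        key = tuple(xyz[i] for i in keep)
        out[key] = out.get(key, 0.0) + pr
    return out

p_xy, p_x, p_y = marg((0, 1)), marg((0,)), marg((1,))
print(all(abs(p_xy[(a, b)] - p_x[(a,)] * p_y[(b,)]) < 1e-12
          for a, b in product((0, 1), repeat=2)))           # True: X ⊥ Y marginally

# ... but conditioning on Y couples X and Z:
p_z_given_y  = {(c, b): marg((2, 1))[(c, b)] / p_y[(b,)] for b, c in product((0, 1), repeat=2)}
p_x_given_y  = {(a, b): p_xy[(a, b)] / p_y[(b,)] for a, b in product((0, 1), repeat=2)}
p_xz_given_y = {(a, c, b): p[(a, b, c)] / p_y[(b,)] for a, b, c in product((0, 1), repeat=3)}
print(all(abs(p_xz_given_y[(a, c, b)] - p_x_given_y[(a, b)] * p_z_given_y[(c, b)]) < 1e-12
          for a, b, c in product((0, 1), repeat=3)))        # False: X ⊥ Z | Y does not hold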

SLIDE 58

Summary so far

  • A Bayes-net is a simplification of the chain rule, P(X) = ∏_i P(X_i ∣ Pa_{X_i}), represented using a DAG; naive Bayes is a special case.
  • The local conditional independencies Iℓ = { X_i ⊥ NonDesc_{X_i} ∣ Pa_{X_i} ∀ i } hold in a Bayes-net, and in turn imply a Bayes-net factorization.
  • Note: the motivation is not just compressed representation, but faster inference and learning as well.

SLIDES 59-61

Global CIs from the graph

For any subsets of variables X, Y and Z, we can ask: X ⊥ Y ∣ Z? The global CIs are the set of all such CIs that follow from the graph, and relate to the factorized form of P through

Iℓ(G) ⊆ I(G) ⊆ I(P)

Algorithm: directed separation (d-separation). Example: C ⊥ D ∣ B, F ?

SLIDES 62-63

Three canonical settings for three random variables

  • 1. causal / evidence trail: X → Y → Z

P(X, Y, Z) = P(X) P(Y ∣ X) P(Z ∣ Y)

Conditional independence: P(Z ∣ X, Y) = P(X, Y, Z) / P(X, Y) = P(X) P(Y ∣ X) P(Z ∣ Y) / (P(X) P(Y ∣ X)) = P(Z ∣ Y), so X ⊥ Z ∣ Y.

Marginal independence: P(X, Z) ≠ P(X) P(Z) in general.
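A quick numeric illustration of this first setting, with randomly chosen binary CPTs (a sketch; none of the numbers come from the slides):

import random
from itertools import product

random.seed(1)

def coin():
    q = random.random()
    return {0: q, 1: 1 - q}

# a chain X -> Y -> Z over binary variables, all CPT entries random placeholders
pX = coin()
pY_given_X = {x: coin() for x in (0, 1)}
pZ_given_Y = {y: coin() for y in (0, 1)}

def joint(x, y, z):
    return pX[x] * pY_given_X[x][y] * pZ_given_Y[y][z]

# X ⊥ Z | Y: P(x, z | y) = P(x | y) P(z | y) for all x, y, z
for y in (0, 1):
    p_y = sum(joint(x, y, z) for x, z in product((0, 1), repeat=2))
    for x, z in product((0, 1), repeat=2):
        p_xz_y = joint(x, y, z) / p_y
        p_x_y = sum(joint(x, y, zz) for zz in (0, 1)) / p_y
        p_z_y = sum(joint(xx, y, z) for xx in (0, 1)) / p_y
        assert abs(p_xz_y - p_x_y * p_z_y) < 1e-12

# ... but X and Z are not independent marginally (for generic CPTs)
p_xz = sum(joint(0, y, 0) for y in (0, 1))
p_x = sum(joint(0, y, z) for y, z in product((0, 1), repeat=2))
p_z = sum(joint(x, y, 0) for x, y in product((0, 1), repeat=2))
print(abs(p_xz - p_x * p_z))   # typically nonzero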

SLIDES 64-65

Three canonical settings

  • 2. common cause: X ← Y → Z

P(X, Y, Z) = P(Y) P(X ∣ Y) P(Z ∣ Y)

Conditional independence: P(X, Z ∣ Y) = P(X, Y, Z) / P(Y) = P(X ∣ Y) P(Z ∣ Y), so X ⊥ Z ∣ Y.

Marginal independence: P(X, Z) ≠ P(X) P(Z) in general.

SLIDES 66-68

Three canonical settings

  • 3. common effect: X → Y ← Z (a.k.a. collider, v-structure)

P(X, Y, Z) = P(X) P(Z) P(Y ∣ X, Z)

Conditional independence: P(X, Z ∣ Y) = P(X, Y, Z) / P(Y) ≠ P(X ∣ Y) P(Z ∣ Y) in general.

Marginal independence: P(X, Z) = ∑_Y P(X, Y, Z) = P(X) P(Z) ∑_Y P(Y ∣ X, Z) = P(X) P(Z).

Even observing a descendant W of Y makes X, Z dependent: P(X, Z ∣ W) ≠ P(X ∣ W) P(Z ∣ W).
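A tiny numeric illustration of explaining away, using a made-up deterministic collider (not an example from the lecture): X and Z are independent fair coins and Y = X OR Z.

from itertools import product

def joint(x, y, z):
    # P(x) P(z) P(y | x, z) with fair coins and a deterministic OR for Y
    return 0.25 if y == (x | z) else 0.0

# marginally, X ⊥ Z:
p_xz = {(x, z): sum(joint(x, y, z) for y in (0, 1)) for x, z in product((0, 1), repeat=2)}
print(all(abs(pr - 0.25) < 1e-12 for pr in p_xz.values()))   # True

# conditioned on Y = 1 they become dependent: learning Z = 1 "explains away" X
p_y1 = sum(joint(x, 1, z) for x, z in product((0, 1), repeat=2))
p_x1_given_y1 = sum(joint(1, 1, z) for z in (0, 1)) / p_y1
p_x1_given_y1_z1 = joint(1, 1, 1) / sum(joint(x, 1, 1) for x in (0, 1))
print(p_x1_given_y1, p_x1_given_y1_z1)   # 2/3 vs 1/2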

SLIDES 69-71

Putting the three cases together

So far: X ⊥ Y ∣ Z in the three-node cases. In general, consider all paths between variables in X and Y.

[figure: a small DAG over X_1, X_2, Y_1, Z_1, Z_2]

Example: X_1, X_2 ⊥ Y_1 ∣ Z_1, Z_2 ?  Had we not observed Z_1, we would have (X_1, X_2 ⊥ Y_1 ∣ Z_2) ∈ I(G).

SLIDE 72

D-separation

X ⊥ Y ∣ Z ? (a.k.a. the Bayes-Ball algorithm): Z is shaded; see whether at least one ball from X reaches Y.

image from: https://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html

SLIDE 73

D-separation: algorithm

input: graph G and X, Y, Z
output: X ⊥ Y ∣ Z ?

  • mark the variables in Z and all of their ancestors in G
  • breadth-first search starting from X; stop any trail that reaches a blocked node
  • a node is blocked if it is the unmarked middle of a collider (v-structure), or if it is in Z and not a collider
  • is a node in Y reached? If so, X and Y are not d-separated given Z.

Linear time complexity.
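The path-based view of d-separation is easy to prototype. The sketch below enumerates every undirected path between the query nodes and checks whether it is blocked; this is exponential in the worst case, so it is only an illustration for small graphs (the linear-time marking procedure above is the practical algorithm). The edge list at the bottom is the student network from the running example.

def descendants(node, children):
    out, stack = set(), [node]
    while stack:
        n = stack.pop()
        for c in children.get(n, ()):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def d_separated(edges, X, Y, Z):
    # True iff every path between X and Y is blocked given the observed set Z
    parents, children, nodes = {}, {}, set()
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
        children.setdefault(u, set()).add(v)
        nodes |= {u, v}
    nbrs = {n: parents.get(n, set()) | children.get(n, set()) for n in nodes}

    def active(path):
        for a, b, c in zip(path, path[1:], path[2:]):
            is_collider = a in parents.get(b, set()) and c in parents.get(b, set())
            if is_collider:
                if b not in Z and not (descendants(b, children) & Z):
                    return False       # inactive collider blocks the path
            elif b in Z:
                return False           # observed non-collider blocks the path
        return True

    def paths(src, dst, seen):
        if src == dst:
            yield [src]
            return
        for n in nbrs[src] - seen:
            for rest in paths(n, dst, seen | {n}):
                yield [src] + rest

    return not any(active(p)
                   for x in X for y in Y
                   for p in paths(x, y, {x}) if len(p) >= 2)

# the student network: D -> G <- I, I -> S, G -> L
student = [("D", "G"), ("I", "G"), ("I", "S"), ("G", "L")]
for x, y, z in [("G", "S", set()), ("D", "L", {"G"}), ("D", "S", set()), ("D", "S", {"G"})]:
    print(x, "⊥", y, "|", z or "∅", ":", d_separated(student, {x}, {y}, z))
# e.g. D and S come out d-separated marginally, but not once G is observed (explaining away)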

SLIDES 74-82

D-separation quiz

  • G ⊥ S ∣ ∅ ?
  • D ⊥ L ∣ G ?
  • D ⊥ I, S ∣ ∅ ?
  • D, L ⊥ S ∣ I, G ?

SLIDE 83

Summary

Graph and distribution are combined through:
  • factorization of the distribution according to the graph G: P(X) = ∏_i P(X_i ∣ Pa_{X_i})
  • conditional independencies of the distribution inferred from the graph:
    local CIs: X_i ⊥ NonDescendants_{X_i} ∣ Parents_{X_i}
    global CIs: d-separation

SLIDE 84

Summary

Factorization of the distribution, local conditional independencies, and global conditional independencies identify the same family of distributions.

SLIDE 85

Bonus slides

SLIDES 86-91

Equivalence class of DAGs

Two DAGs are I-equivalent if I(G) = I(G′); P then factorizes on both of these graphs.

From the d-separation algorithm, a sufficient condition is: same undirected skeleton and same v-structures.

It is not necessary: two DAGs can have different v-structures yet I(G) = I(G′) = ∅; here the v-structures are irrelevant for I-equivalence because the parents are connected (moral parents!).

The characterization: I(G) = I(G′) ⇔ same undirected skeleton and same immoralities (v-structures whose parents are not connected). Example query: X ⊥ Y ∣ Z ?
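This characterization is easy to check in code. A small sketch; the three-node graphs at the end are the standard chain, reversed chain, fork, and collider:

from itertools import combinations

def skeleton(edges):
    return {frozenset(e) for e in edges}

def immoralities(edges):
    # v-structures a -> c <- b whose parents a, b are not adjacent ("unmarried")
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    skel = skeleton(edges)
    out = set()
    for c, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                out.add((a, b, c))
    return out

def i_equivalent(g1, g2):
    return skeleton(g1) == skeleton(g2) and immoralities(g1) == immoralities(g2)

chain     = [("X", "Y"), ("Y", "Z")]
rev_chain = [("Y", "X"), ("Z", "Y")]
fork      = [("Y", "X"), ("Y", "Z")]
collider  = [("X", "Y"), ("Z", "Y")]
print(i_equivalent(chain, rev_chain), i_equivalent(chain, fork), i_equivalent(chain, collider))
# True True False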

SLIDES 92-94

I-Equivalence quiz

Do these DAGs (over X, Y, Z, W) have the same set of CIs? No! (Consider X ⊥ Z ∣ W.)

SLIDE 95

Minimal I-map

Which graph G to use for P? G is a minimal I-map for P if:
  • G is an I-map for P: I(G) ⊆ I(P)
  • removing any edge destroys this property

Example: P(X, Y, Z, W) = P(X ∣ Y, Z) P(W) P(Y ∣ Z) P(Z)

[figure: four DAGs over X, Y, Z, W labelled "I-MAP", "min. I-MAP", "min. I-MAP", and "NOT an I-MAP"]

SLIDES 96-98

Minimal I-map from CIs

Which graph G to use for P?

input: I(P) or an oracle; an ordering X_1, …, X_n
output: a minimal I-map G

for i = 1 … n:
  find a minimal U ⊆ {X_1, …, X_{i-1}} such that (X_i ⊥ {X_1, …, X_{i-1}} − U ∣ U)
  set Pa_{X_i} = U

(cf. the local CI X_i ⊥ NonDesc_{X_i} ∣ Pa_{X_i})

Different orderings give different graphs. Example (student network): D, I, S, G, L (a topological ordering); L, S, G, I, D; L, D, S, I, G.
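A sketch of this construction in Python. The CI oracle is stubbed out by hand and encodes a single independence, X ⊥ Z ∣ Y (as in a chain X → Y → Z); a real implementation would query I(P) or run CI tests on data.

from itertools import combinations

def minimal_imap(order, independent):
    # for each X_i, pick a minimal U ⊆ predecessors with X_i ⊥ (predecessors − U) | U
    parents = {}
    for i, xi in enumerate(order):
        pred = order[:i]
        for k in range(len(pred) + 1):        # increasing size, so the first hit is minimal
            found = None
            for U in combinations(pred, k):
                rest = [v for v in pred if v not in U]
                if independent(xi, rest, set(U)):
                    found = set(U)
                    break
            if found is not None:
                parents[xi] = found
                break
    return parents

def oracle(xi, rest, cond):
    # toy oracle: the only non-trivial CI is X ⊥ Z | Y
    if not rest:
        return True
    return {xi, *rest} == {"X", "Z"} and "Y" in cond

print(minimal_imap(["X", "Y", "Z"], oracle))   # {'X': set(), 'Y': {'X'}, 'Z': {'Y'}}
print(minimal_imap(["X", "Z", "Y"], oracle))   # a denser minimal I-map: Y gets both X and Z as parents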

SLIDE 99

Perfect map (P-map)

Which graph G to use for P? The graphs obtained from the orderings D, I, S, G, L; L, S, G, I, D; L, D, S, I, G above are all minimal I-maps: I(G) ⊆ I(P). A perfect map satisfies I(G) = I(P).

SLIDES 100-101

Perfect map (P-map)

Which graph G to use for P? Perfect map: I(G) = I(P).

P may not have a P-map in the form of a BN. Example:

P(x, y, z) = 1/12 if x ⊕ y ⊕ z = 0,  1/6 if x ⊕ y ⊕ z = 1

(X ⊥ Y), (Y ⊥ Z), (X ⊥ Z) ∈ I(P), but (X ⊥ Y ∣ Z), (Y ⊥ Z ∣ X), (X ⊥ Z ∣ Y) ∉ I(P)

SLIDES 102-103

Perfect map (P-map)

If P has a P-map, is it unique? Example:

I(P) = { (X ⊥ Y, Z ∣ ∅), (X ⊥ Y ∣ Z), (X ⊥ Z ∣ Y) }

[figure: two I-equivalent DAGs over X, Y, Z]

The P-map is unique up to I-equivalence. How to find P-maps? Discussed when we cover learning BNs.

SLIDE 104

Summary

Factorization of the distribution, local CIs, and global CIs identify the same family of distributions. This family can be represented using an equivalence class of graphs: alternative factorizations and different local CIs, but the same global CIs.