Preliminaries (PowerPoint PPT Presentation)


slide-1
SLIDE 1

Chapter 2:

Preliminaries

32 / 384

slide-2
SLIDE 2

Random variables Let V = {V1, . . . , Vn}, n ≥ 1, be a set of random variables. Each variable Vi ∈ V can take on one of m ≥ 2 values; for now we consider 2-valued variables:

  • Vi = true, denoted by vi;
  • Vi = false, denoted by ¬vi (or by v̄i).

The set V spans a Boolean Algebra of logical propositions V:

  • T(rue), F(alse) ∈ V;
  • for all variables Vi ∈ V we have that vi ∈ V;
  • for all x ∈ V we have that ¬x ∈ V;
  • for all x, y ∈ V we have that x ∧ y ∈ V and x ∨ y ∈ V.

The elements of V obey the usual rules of propositional logic.

33 / 384

slide-3
SLIDE 3

The joint probability distribution

Definition: Let V be the Boolean Algebra of propositions spanned by a set of random variables V. Let Pr : V → [0, 1] be a function such that

  • Pr is positive: for each x ∈ V we have that Pr(x) ≥ 0 and, more specifically, Pr(F) = 0;
  • Pr is normed: Pr(T) = 1;
  • Pr is additive: for each x, y ∈ V with x ∧ y ≡ F, we have that Pr(x ∨ y) = Pr(x) + Pr(y).

The function Pr is a joint probability distribution on V; the function value Pr(x) is the probability of x.

34 / 384
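The three axioms above can be checked mechanically on a small example. The following sketch (all numbers are invented, not from the slides) represents a joint distribution over two binary variables as probabilities of "worlds", with a proposition modelled as the set of worlds where it holds:

```python
# A toy joint distribution over two binary variables V1, V2, stored as
# probabilities of the four "worlds" (v1, v2); the numbers are invented.
pr_world = {
    (True, True): 0.2, (True, False): 0.3,
    (False, True): 0.1, (False, False): 0.4,
}

def pr(event):
    """Pr of a proposition, given as the set of worlds where it holds."""
    return sum(pr_world[w] for w in event)

ALL = set(pr_world)   # the proposition T (holds in every world)
FALSE = set()         # the proposition F (holds in no world)

assert pr(FALSE) == 0                  # positive: Pr(F) = 0
assert abs(pr(ALL) - 1.0) < 1e-12      # normed: Pr(T) = 1

v1 = {w for w in ALL if w[0]}          # proposition V1 = true
not_v1 = ALL - v1                      # ¬v1; note v1 ∧ ¬v1 ≡ F
# additive on mutually exclusive propositions:
assert abs(pr(v1 | not_v1) - (pr(v1) + pr(not_v1))) < 1e-12
```

Disjunction becomes set union, conjunction set intersection, and negation set complement; the axioms then follow from ordinary summation over worlds.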

slide-4
SLIDE 4

Independence of propositions

Definition: Let V be the Boolean Algebra of propositions spanned by a set of random variables V. Let Pr be a joint probability distribution on V.

Two propositions x, y ∈ V are called independent in Pr if

  Pr(x ∧ y) = Pr(x) · Pr(y)

The propositions x, y ∈ V are called conditionally independent given the proposition z ∈ V if we have that

  Pr(x ∧ y | z) = Pr(x | z) · Pr(y | z)

35 / 384
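Both definitions can be verified numerically. In this sketch (the distribution is an assumed toy example, not from the slides) x and y are marginally dependent yet conditionally independent given z:

```python
# Worlds are triples (x, y, z). Given z, x and y are i.i.d. with bias 0.8;
# given ¬z, i.i.d. with bias 0.3; Pr(z) = 0.5. All numbers are invented.
from itertools import product

def bias(z):
    return 0.8 if z else 0.3

pr_world = {}
for xv, yv, zv in product([True, False], repeat=3):
    px = bias(zv) if xv else 1 - bias(zv)
    py = bias(zv) if yv else 1 - bias(zv)
    pr_world[(xv, yv, zv)] = 0.5 * px * py

def pr(pred):
    return sum(p for w, p in pr_world.items() if pred(w))

def cond(pred, given):
    return pr(lambda w: pred(w) and given(w)) / pr(given)

def x(w): return w[0]
def y(w): return w[1]
def z(w): return w[2]

# x and y are dependent in Pr: Pr(x ∧ y) differs from Pr(x)·Pr(y) ...
assert abs(pr(lambda w: x(w) and y(w)) - pr(x) * pr(y)) > 0.01
# ... but conditionally independent given z:
assert abs(cond(lambda w: x(w) and y(w), z) - cond(x, z) * cond(y, z)) < 1e-9
```

Here Pr(x ∧ y) = 0.365 while Pr(x)·Pr(y) = 0.3025, so the unconditional equality fails even though the conditional one holds for z.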

slide-5
SLIDE 5

The two notions of independence (1)

Consider two propositions x, y ∈ V such that x and y are independent¹:

[area diagram: regions x and y in the unit square]

Can z ∈ V exist such that x and y are dependent given z? Yes:

[area diagram: regions x, y and z in the unit square]

¹The square has area 1, representing the total probability mass.

36 / 384

slide-6
SLIDE 6

The two notions of independence (2)

Consider two propositions x, y ∈ V such that x and y are dependent:

[area diagram: regions x and y in the unit square]

Can z ∈ V exist such that x and y are conditionally independent given z? Yes:

[area diagram: regions x, y and z in the unit square]

37 / 384

slide-7
SLIDE 7

Configurations

Let V be a set of random variables and let W ⊆ V.

  • a configuration cW of W is a conjunction of value assignments to the variables from W;
  • convention: c∅ = T;
  • w is used to denote a specific configuration of W;
  • W also indicates all possible configurations of the set W (notation abuse!): W is then considered to be a template that can be filled in with any configuration cW.

Example: Let W = {V1, V3, V7}. W = V1 ∧ V3 ∧ V7 denotes a configuration template: filling in values for the Vi results in proper propositions/configurations. Some configurations cW of W are:

  V1 = true ∧ V3 = true ∧ V7 = false
  v1 ∧ ¬v3 ∧ v7
  ¬v1 ∧ v3 ∧ ¬v7

38 / 384
slide-8
SLIDE 8

Conventions and notation

In the remainder of this course, for distributions on V:

  • rather than talking about propositions x ∈ V spanned by V, we refer to configurations cV of V:

                                     Set (bold faced)   Singleton
      Variables/templates (capital)  V                  V
      Values/configurations          cV, v              cV, v

  • conjunctions are often left implicit: e.g. v1 v2 denotes v1 ∧ v2;
  • note the following differences (!):

      probabilities:      Pr(cV), Pr(cV), Pr(v), Pr(v), Pr(v | cE)
      distributions:      Pr(V), Pr(V), Pr(V | e)
      distribution sets:  Pr(V | E), Pr(V | E)

39 / 384

slide-9
SLIDE 9

Independence of variables

Definition: Let V be a set of random variables and let X, Y, Z ⊆ V. Let Pr be a joint distribution on V. The set of variables X is called conditionally independent of the set Y given the set Z in Pr, if we have that

  Pr(X | Y ∧ Z) = Pr(X | Z)

Remarks:

  • the expression Pr(X | Y ∧ Z) = Pr(X | Z) represents that Pr(cX | cY ∧ cZ) = Pr(cX | cZ) holds for all configurations cX, cY and cZ of X, Y and Z;
  • Pr(X | Y ∧ Z) = Pr(X | Z) ⇒ Pr(X ∧ Y | Z) = Pr(X | Z) · Pr(Y | Z) (what about ⇐?).

40 / 384
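The set-level definition quantifies over all configurations, and a brute-force check can mirror that directly. A minimal sketch, assuming a toy distribution over three binary variables that factorises along a chain V1 → V2 → V3 (all numbers invented), so that I({V1}, {V2}, {V3}) holds:

```python
# Brute-force check of Pr(cX | cY ∧ cZ) = Pr(cX | cZ) for all configurations.
# Variable names, the chain shape and the CPT numbers are assumptions.
from itertools import product

VARS = ["V1", "V2", "V3"]

def pr_world(w):
    v1, v2, v3 = w
    p1 = 0.6 if v1 else 0.4
    p2 = (0.9 if v2 else 0.1) if v1 else (0.2 if v2 else 0.8)
    p3 = (0.7 if v3 else 0.3) if v2 else (0.5 if v3 else 0.5)
    return p1 * p2 * p3

WORLDS = list(product([True, False], repeat=3))

def pr(config):
    """Probability of a configuration, given as a dict var -> value."""
    idx = {v: i for i, v in enumerate(VARS)}
    return sum(pr_world(w) for w in WORLDS
               if all(w[idx[v]] == val for v, val in config.items()))

def independent(X, Y, Z):
    """True iff Pr(cX | cY ∧ cZ) = Pr(cX | cZ) for all configurations
    (X, Y, Z are disjoint lists of variable names)."""
    for vals in product([True, False], repeat=len(X) + len(Y) + len(Z)):
        cfg = dict(zip(X + Y + Z, vals))
        cx = {v: cfg[v] for v in X}
        cyz = {v: cfg[v] for v in Y + Z}
        cz = {v: cfg[v] for v in Z}
        if abs(pr({**cx, **cyz}) / pr(cyz) - pr({**cx, **cz}) / pr(cz)) > 1e-9:
            return False
    return True

assert independent(["V1"], ["V3"], ["V2"])   # I({V1}, {V2}, {V3})
assert not independent(["V1"], ["V3"], [])   # ¬I({V1}, ∅, {V3})
```

Note the triple order: the call `independent(X, Y, Z)` checks the statement written I(X, Z, Y) on the slides, i.e. X and Y independent given Z.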

slide-10
SLIDE 10

Chapter 3:

Independences and Graphical Representations

41 / 384

slide-11
SLIDE 11

A qualitative notion of independence

Observation: People are capable of making statements about independences among variables without having to perform numerical calculations.

Conclusion: In human reasoning behaviour, the qualitative notion of independence is more fundamental than the quantitative notion of independence.
43 / 384

slide-12
SLIDE 12

The (probabilistic) independence relation of a joint distribution

Definition: Let V be a set of random variables and let Pr be a joint probability distribution on V.

The independence relation IPr of Pr is a set IPr ⊆ P(V) × P(V) × P(V), defined for all X, Y, Z ⊆ V by

  (X, Z, Y) ∈ IPr if and only if Pr(X | Y ∧ Z) = Pr(X | Z)

Remarks:

  • (X, Z, Y) ∈ IPr will be written as IPr(X, Z, Y); (X, Z, Y) ∉ IPr will be written as ¬IPr(X, Z, Y);
  • a statement IPr(X, Z, Y) is called an independence statement for the joint distribution Pr.

44 / 384

slide-13
SLIDE 13

Properties of IPr: symmetry

Lemma: IPr(X, Z, Y) if and only if IPr(Y, Z, X)

Proof:

  IPr(X, Z, Y)
    ⇔ Pr(X | Y ∧ Z) = Pr(X | Z)
    ⇔ Pr(X ∧ Y ∧ Z) / Pr(Y ∧ Z) = Pr(X ∧ Z) / Pr(Z)
    ⇔ Pr(X ∧ Y ∧ Z) / Pr(X ∧ Z) = Pr(Y ∧ Z) / Pr(Z)
    ⇔ Pr(Y | X ∧ Z) = Pr(Y | Z)
    ⇔ IPr(Y, Z, X)

45 / 384
slide-14
SLIDE 14

Properties of IPr: decomposition

Lemma: IPr(X, Z, Y ∪ W) ⇒ IPr(X, Z, Y) ∧ IPr(X, Z, W)

Proof (sketch): (Note: cY∪W = cY ∧ cW!) Suppose that Pr(X | Y ∧ W ∧ Z) = Pr(X | Z). Then, by definition,

  Pr(X ∧ Y ∧ W ∧ Z) = Pr(Y ∧ W ∧ Z) · Pr(X ∧ Z) / Pr(Z)

For Pr(X | Y ∧ Z) we find that

  Pr(X | Y ∧ Z) = Pr(X ∧ Y ∧ Z) / Pr(Y ∧ Z)
                = Σ_cW Pr(X ∧ Y ∧ Z ∧ cW) / Pr(Y ∧ Z)
                = Pr(X ∧ Z) / Pr(Z)
                = Pr(X | Z)

46 / 384
slide-15
SLIDE 15

Properties of IPr: weak union, contraction

Lemma:

  • if IPr(X, Z, Y ∪ W) then IPr(X, Z ∪ W, Y) (weak union);
  • if IPr(X, Z, W) and IPr(X, Z ∪ W, Y) then IPr(X, Z, Y ∪ W) (contraction);
  • (for strictly positive Pr also the intersection property holds; see syllabus).

Proof: left as exercise 3.1. What about ⇐?

47 / 384

slide-16
SLIDE 16

The definition of the independence relation

[diagram: a joint distribution Pr yields its independence relation IPr, which has the properties symmetry, decomposition, weak union and contraction; an abstract independence relation I is defined by taking these four properties as axioms]

48 / 384

slide-17
SLIDE 17

The (qualitative) independence relation I Definition:

Let V be a set of random variables and let X, Y , Z, W ⊆ V .

An independence relation I on V is a ternary relation I ⊆ P(V ) × P(V ) × P(V ) that satisfies the following properties:

  • if I(X, Z, Y ) then I(Y , Z, X);
  • if I(X, Z, Y ∪ W ) then I(X, Z, Y ) and I(X, Z, W );
  • if I(X, Z, Y ∪ W ) then I(X, Z ∪ W , Y );
  • if I(X, Z, W ) and I(X, Z ∪ W , Y ) then I(X, Z, Y ∪ W ).

The first property is called the symmetry axiom; the second is called the decomposition axiom; the third is referred to as the weak union axiom; the last one is called contraction.

49 / 384

slide-18
SLIDE 18

An example

Lemma: Let I be an independence relation on a set of random variables V. We have that

  if I(X, Z, Y) and I(X ∪ Z, Y, W) then I(X, Z, W)

for all X, Y, Z, W ⊆ V.

Proof: We observe that

  I(X ∪ Z, Y, W) ⇒(symmetry) I(W, Y, X ∪ Z) ⇒(weak union) I(W, Y ∪ Z, X) ⇒(symmetry) I(X, Y ∪ Z, W)

From I(X, Z, Y), I(X, Y ∪ Z, W) and the contraction axiom we have that I(X, Z, W ∪ Y); decomposition now gives I(X, Z, W).

50 / 384
slide-19
SLIDE 19

Representing independences

Different ways exist of representing an independence relation:

  • all independence statements of the relation are explicitly stated;
  • only the independence statements of a suitable subset of the relation are explicitly stated; all other statements are implicitly represented by means of the axioms;
  • the independence relation is coded in a graph;
  • . . .

51 / 384

slide-20
SLIDE 20

An example

Consider V = {V1, V2, V3, V4} and the independence relation I on V:

  I({V1}, ∅, {V4})            I({V2}, ∅, {V1})            I({V4}, {V1}, {V2})
  I({V2}, ∅, {V4})            I({V1, V4}, ∅, {V2})        I({V4}, {V1}, {V3})
  I({V3}, ∅, {V4})            I({V2, V4}, ∅, {V1})        I({V4}, {V1}, {V2, V3})
  I({V4}, ∅, {V1})            I({V2}, ∅, {V1, V4})        I({V1}, {V2}, {V4})
  I({V4}, ∅, {V2})            I({V1}, ∅, {V2, V4})        I({V3}, {V2}, {V4})
  I({V4}, ∅, {V3})            I({V2}, {V1}, {V4})         I({V1, V3}, {V2}, {V4})
  I({V1, V2}, ∅, {V4})        I({V3}, {V1}, {V4})         I({V4}, {V2}, {V1})
  I({V1, V3}, ∅, {V4})        I({V2, V3}, {V1}, {V4})     I({V4}, {V2}, {V3})
  I({V2, V3}, ∅, {V4})        I({V4}, {V1, V2}, {V3})     I({V4}, {V2}, {V1, V3})
  I({V4}, ∅, {V1, V2})        I({V2}, {V1, V3}, {V4})     I({V1}, {V3}, {V4})
  I({V4}, ∅, {V1, V3})        I({V4}, {V1, V3}, {V2})     I({V2}, {V3}, {V4})
  I({V4}, ∅, {V2, V3})        I({V1}, {V2, V3}, {V4})     I({V1, V2}, {V3}, {V4})
  I({V1, V2, V3}, ∅, {V4})    I({V4}, {V2, V3}, {V1})     I({V1}, {V4}, {V2})
  I({V4}, ∅, {V1, V2, V3})    I({V4}, {V3}, {V1, V2})     I({V2}, {V4}, {V1})
  I({V1}, ∅, {V2})            I({V4}, {V3}, {V1})         I({V3}, {V1, V2}, {V4})

52 / 384

slide-21
SLIDE 21

The representation of an independence relation in an undirected graph

Consider an independence relation I and an undirected graph. The global idea is:

  • represent each variable Vi by a node Vi in the graph, and vice versa;
  • code the independence statements of I by means of missing edges.

53 / 384

slide-22
SLIDE 22

The separation criterion: introduction

Definition: Let G = (VG, EG) be an undirected graph with edges EG and nodes VG = {V1, . . . , Vn}, n > 1. Let s be a path in G from a node Vi to a node Vj. The path s is blocked by a set of nodes Z ⊆ VG if at least one node from Z is on the path s. If s is not blocked by Z, the path is called active given Z.

54 / 384

slide-23
SLIDE 23

The separation criterion

Definition: Let G = (VG, EG) be an undirected graph. Let X, Y, Z ⊆ VG be sets of nodes in G. The set Z separates the set X from the set Y in G (notation: X | Z | Y G) if every simple path in G from a node in X to a node in Y is blocked by Z.

Remarks:

  • the above notion is known as the separation criterion for undirected graphs;
  • if there is no path between the nodes in X and Y in a graph G, then X | ∅ | Y G.

55 / 384
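The separation criterion reduces to plain reachability: Z separates X from Y iff no node of Y can be reached from X without entering Z. A sketch, assuming an adjacency-dict graph representation (the example graph is an assumption, not the one on the next slide):

```python
# Undirected separation as breadth-first search that never enters Z.
from collections import deque

def separated(graph, X, Z, Y):
    """True iff Z blocks every path from X to Y in the undirected graph."""
    seen = set(X)
    frontier = deque(x for x in X if x not in Z)
    while frontier:
        node = frontier.popleft()
        for nb in graph.get(node, ()):
            if nb in Z or nb in seen:
                continue            # paths through Z are blocked
            if nb in Y:
                return False        # active path from X to Y found
            seen.add(nb)
            frontier.append(nb)
    return True

# An assumed example graph: the cycle V1 - V2 - V4 - V3 - V1.
g = {"V1": ["V2", "V3"], "V2": ["V1", "V4"],
     "V3": ["V1", "V4"], "V4": ["V2", "V3"]}

assert separated(g, {"V1"}, {"V2", "V3"}, {"V4"})   # {V2, V3} separates V1 from V4
assert not separated(g, {"V1"}, {"V2"}, {"V4"})     # the path V1 - V3 - V4 is active
```

Checking reachability while avoiding Z is equivalent to checking that every simple path is blocked, since any active path yields an avoiding walk and vice versa.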

slide-24
SLIDE 24

An example

[undirected graph over V1, . . . , V7]

Which of the following separation statements are valid?

  a) {V1} | {V2} | {V3, V6}G
  b) {V4} | {V2, V5} | {V6}G
  c) {V4} | {V1, V2, V5} | {V6}G
  d) {V1} | {V4} | {V5}G
  e) {V1, V5, V6} | ∅ | {V7}G
  f) {V2} | {V5} | {V7}G
  g) {V1} | {V5} | {V2}G

56 / 384

slide-25
SLIDE 25

Independence relations and undirected graphs

Definition: Let I be an independence relation on a set of random variables V. Let G = (VG, EG) be an undirected graph with VG = V.

  • graph G is called a dependency map (D-map) for I if for all X, Y, Z ⊆ V we have: if I(X, Z, Y) then X | Z | Y G;
  • graph G is called an independency map (I-map) for I if for all X, Y, Z ⊆ V we have: if X | Z | Y G then I(X, Z, Y);
  • graph G is called a perfect map (P-map) for I if G is both a dependency map and an independency map for I.

57 / 384

slide-26
SLIDE 26

Undirected D-maps: what can they tell?

Let I be an independence relation and G an undirected graph. Consider a D-map G for I; then:

  V1 and V2 neighbours ⇒ ¬{V1} | Z | {V2}G, hence ¬I({V1}, Z, {V2}): V1, V2 dependent;
  V1 and V2 non-neighbours ⇒ {V1} | Z | {V2}G, but ??: dependent, independent, or conditionally independent.

Note: statements hold for all Z ⊆ VG \ ({V1} ∪ {V2})!

58 / 384

slide-27
SLIDE 27

An example

Consider the independence relation I on V = {V1, . . . , V4}, defined by I({V1}, {V2, V3}, {V4}) and I({V2}, {V1, V4}, {V3}).

Which of the following undirected graphs are examples of D-maps for I?

[four candidate undirected graphs over V1, . . . , V4]

59 / 384

slide-28
SLIDE 28

Undirected I-maps: what can they tell?

Let I be an independence relation and G an undirected graph. Consider an I-map G for I; then:

  V1 and V2 non-neighbours ⇒ {V1} | Z | {V2}G, hence I({V1}, Z, {V2}): V1, V2 (conditionally) independent;
  V1 and V2 neighbours ⇒ ¬{V1} | Z | {V2}G, but ??: dependent, independent, or conditionally independent.

Note: statements hold for all Z ⊆ VG \ ({V1} ∪ {V2})!

60 / 384

slide-29
SLIDE 29

An example

Consider the independence relation I on V = {V1, . . . , V4}, defined by I({V1}, {V2, V3}, {V4}) and I({V2}, {V1, V4}, {V3}).

Which of the following undirected graphs are examples of I-maps for I?

[four candidate undirected graphs over V1, . . . , V4]

61 / 384

slide-30
SLIDE 30

Properties of I

Let I be an independence relation on a set of random variables V.

Lemma: Every independence relation I has an undirected D-map.
Proof: The undirected graph G = (V, ∅) is a D-map for I.

Lemma: Every independence relation I has an undirected I-map.
Proof: The undirected graph G′ = (V, V × V) is an I-map for I.

62 / 384
slide-31
SLIDE 31

An example

Consider the independence relation I on V = {V1, . . . , V4}, defined by I({V1}, {V2, V3}, {V4}) and I({V2}, {V1, V4}, {V3}). The following undirected graph is a perfect map for I:

[undirected graph over V1, . . . , V4]

Is this P-map for I unique? Does every I have a P-map?

63 / 384

slide-32
SLIDE 32

An example

Consider an experiment with two coins and a bell: the bell sounds iff the two coins have the same outcome after a toss. Consider:

  • variable C1: the outcome of tossing coin one;
  • variable C2: the outcome of tossing coin two;
  • variable B: whether or not the bell sounds;
  • independence relation I for this experiment.

We have, among others, that

  I({C1}, ∅, {C2})    ¬I({C1}, {B}, {C2})
  I({C1}, ∅, {B})     ¬I({C1}, {C2}, {B})
  I({C2}, ∅, {B})     ¬I({C2}, {C1}, {B})

This independence relation is an example of an independence relation with an induced dependency.

64 / 384
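The coin-and-bell statements above can be verified by brute force. A sketch, assuming fair coins and worlds (c1, c2, b) with b = (c1 == c2):

```python
# Verify the coin-and-bell (in)dependences numerically.
from itertools import product

# Each consistent world (c1, c2, c1 == c2) has probability 1/4.
pr_world = {(c1, c2, c1 == c2): 0.25
            for c1, c2 in product([True, False], repeat=2)}

def pr(pred):
    return sum(p for w, p in pr_world.items() if pred(w))

def indep(i, j, k=None):
    """Variables at world-positions i, j independent (conditionally on k)."""
    contexts = [lambda w: True] if k is None else \
               [lambda w, g=g: w[k] == g for g in (True, False)]
    for cz in contexts:
        pz = pr(cz)
        for vi, vj in product([True, False], repeat=2):
            joint = pr(lambda w: w[i] == vi and w[j] == vj and cz(w)) / pz
            marg = (pr(lambda w: w[i] == vi and cz(w)) / pz) * \
                   (pr(lambda w: w[j] == vj and cz(w)) / pz)
            if abs(joint - marg) > 1e-9:
                return False
    return True

C1, C2, B = 0, 1, 2
assert indep(C1, C2)          # I({C1}, ∅, {C2})
assert indep(C1, B)           # I({C1}, ∅, {B})
assert indep(C2, B)           # I({C2}, ∅, {B})
assert not indep(C1, C2, B)   # ¬I({C1}, {B}, {C2}): induced dependency
```

Observing B induces a dependency: given that the bell sounded, knowing C1 fixes C2 completely, even though the coins are tossed independently.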

slide-33
SLIDE 33

An example

Reconsider the experiment with the two coins and the bell.

  • the following graph is a D-map for the independence relation I of this experiment:

    [undirected graph over C1, C2, B]

  • the following graph is an I-map for I:

    [undirected graph over C1, C2, B]

  • Does I have a perfect map?

65 / 384

slide-34
SLIDE 34

The representation of an independence relation in a directed graph

Consider an independence relation I and a directed graph G. The global idea is:

  • represent each variable Vi of I by a node Vi in G, and vice versa;
  • code the independence statements of I by means of missing arcs in the graph;
  • use the direction of the arcs to represent induced dependencies.

66 / 384

slide-35
SLIDE 35

Introduction

The formalism of the directed graph is more expressive than the formalism of the undirected graph:

[one undirected graph over V1, V2, V3] vs. [three directed graphs over V1, V2, V3]

67 / 384

slide-36
SLIDE 36

Causality?

Consider the following examples:

[three example graphs with variables: age, length, reading; weather, harvest, grain price; burglary, alarm, earthquake]

68 / 384

slide-37
SLIDE 37

Introduction, continued

We aim to represent the following (in)dependences with directed graphs:

  • I({V2}, ∅, {V3}) and ¬I({V2}, {V1}, {V3}):

    [DAG over V1, V2, V3]

  • I({V2}, {V1}, {V3}) and ¬I({V2}, ∅, {V3}):

    [DAG over V1, V2, V3]

  • I({V2}, {V1}, {V3}) and ¬I({V2}, ∅, {V3}):

    [DAG over V1, V2, V3]

69 / 384

slide-38
SLIDE 38

The d-separation criterion: introduction

Definition: Let G = (VG, AG) be an acyclic directed graph (DAG), and let s be a chain in G between Vi and Vj ∈ VG. Chain s is blocked (or: in-active) by a set Z ⊆ VG if s contains a node W for which one of the following holds:

  • W ∈ Z and W has at most one incoming arc on chain s:

    [diagrams: chain and fork configurations through W, with W ∈ Z]

  • σ∗(W) ∩ Z = ∅ and W has two incoming arcs on chain s:

    [diagram: head-to-head configuration at W]

(Here σ∗(W) denotes W together with its descendants in G.)
70 / 384

slide-39
SLIDE 39

An example

Consider the following DAG and some of its chains:

[DAG over V1, . . . , V7]

  1) V4, V2, V5 (from V4 to V5)
  2) V1, V2, V5, V6, V7 (from V1 to V7)
  3) V3, V4, V6, V5 (from V3 to V5)
  4) V2, V4 (from V2 to V4)

Which of these chains is blocked by which of the following sets? {V2}, {V5}, {V2, V5}, {V4}, {V6}, {V4, V6}

71 / 384

slide-40
SLIDE 40

The d-separation criterion

Definition: Let G = (VG, AG) be an acyclic directed graph. Let X, Y, Z ⊆ VG be sets of nodes in G.

The set Z d-separates X from Y in G (notation: X | Z | Y dG) if every simple chain in G from a node in X to a node in Y is blocked by Z.

Remarks:

  • the above notion is known as the d-separation criterion;
  • X | ∅ | Y dG indicates that all chains between X and Y, if any, contain a head-to-head node;
  • if X and Y are not d-separated by Z, we say that they are d-connected given Z.

72 / 384

slide-41
SLIDE 41

An example

Consider the following DAG and d-separation statements:

[DAG over V1, . . . , V5]

  a) {V1} | {V2, V3} | {V5} dG
  b) {V1} | {V4} | {V5} dG
  c) {V2} | {V1} | {V3} dG
  d) {V2} | {V1, V5} | {V3} dG
  e) {V2} | ∅ | {V3} dG
  f) {V1} | {V3, V4} | {V2} dG

Which d-separation statements are valid in the graph?

73 / 384

slide-42
SLIDE 42

Bayes-Ball for determining d-separation

Determine whether X | Z | Y dG by dropping bouncing balls at X and following the 10 rules of Bayes-Ball:

  • Z is shaded;
  • a chain is active until a ball travelling along it meets a stop;
  • any node visited by a Bayes ball cannot be in Y.
74 / 384
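A Bayes-Ball-style check can be sketched as reachability over (node, direction) pairs; the head-to-head rule uses the fact that a collider is active iff it or one of its descendants is in Z, i.e. iff it lies in Z or among the ancestors of Z. The dict-of-parents representation and the example DAG are assumptions:

```python
# d-separation as ball-bouncing reachability (in the spirit of Bayes-Ball).
def d_separated(parents, X, Z, Y):
    children = {v: set() for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].add(v)
    anc_z = set()                     # strict ancestors of Z
    stack = list(Z)
    while stack:
        v = stack.pop()
        for p in parents[v]:
            if p not in anc_z:
                anc_z.add(p)
                stack.append(p)
    # 'up' = ball arrived from a child, 'down' = ball arrived from a parent
    visited, stack = set(), [(x, "up") for x in X]
    while stack:
        v, d = stack.pop()
        if (v, d) in visited:
            continue
        visited.add((v, d))
        if v not in Z and v in Y:
            return False                              # active chain reaches Y
        if d == "up" and v not in Z:
            stack += [(p, "up") for p in parents[v]]  # pass through upwards
            stack += [(c, "down") for c in children[v]]
        elif d == "down":
            if v not in Z:                            # chain node passes through
                stack += [(c, "down") for c in children[v]]
            if v in Z or v in anc_z:                  # active head-to-head node
                stack += [(p, "up") for p in parents[v]]
    return True

# Assumed example: V1 -> V3 <- V2, V3 -> V4.
dag = {"V1": [], "V2": [], "V3": ["V1", "V2"], "V4": ["V3"]}
assert d_separated(dag, {"V1"}, set(), {"V2"})       # collider V3 blocks
assert not d_separated(dag, {"V1"}, {"V3"}, {"V2"})  # observed collider activates
assert not d_separated(dag, {"V1"}, {"V4"}, {"V2"})  # observed descendant activates
```

Balls start at X travelling "up" (as if arriving from a fictitious child), so they may initially move to both parents and children; if no ball ever lands on a node of Y outside Z, the statement X | Z | Y dG holds.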

slide-43
SLIDE 43

Independence relations and directed graphs

Definition: Let I be an independence relation on a set of random variables V. Let G = (VG, AG) be an acyclic directed graph with VG = V.

  • the graph G is called a (directed) dependency map (D-map) for I if for every X, Y, Z ⊆ V we have that: if I(X, Z, Y) then X | Z | Y dG;
  • the graph G is called a (directed) independency map (I-map) for I if for every X, Y, Z ⊆ V we have that: if X | Z | Y dG then I(X, Z, Y);
  • the graph G is called a (directed) perfect map (P-map) for I if G is both a dependency map and an independency map for I.

75 / 384

slide-44
SLIDE 44

Directed D-maps: what can they tell?

Let I be an independence relation and G a DAG. Consider a D-map G for I; then:

  V1 and V2 neighbours ⇒ ¬{V1} | Z | {V2} dG, hence ¬I({V1}, Z, {V2}): V1, V2 dependent;
  V1 and V2 non-neighbours ⇒ {V1} | Z | {V2} dG, but ??: dependent, independent, conditionally dependent (Z ≠ ∅), or conditionally independent (Z ≠ ∅).

Note: statements hold for all Z ⊆ VG \ ({V1} ∪ {V2})!

76 / 384

slide-45
SLIDE 45

An example

Consider the independence relation I on V = {V1, . . . , V4} defined by I({V1}, ∅, {V2}) and I({V1, V2}, {V3}, {V4}).

Which of the following DAGs are D-maps for I?

[four candidate DAGs over V1, . . . , V4]

77 / 384

slide-46
SLIDE 46

Directed I-maps

Let I be an independence relation and G a DAG. Consider an I-map G for I; then:

  V1 and V2 non-neighbours ⇒ {V1} | Z | {V2} dG, hence I({V1}, Z, {V2}): V1, V2 (conditionally) independent, or conditionally dependent (= induced);
  V1 and V2 neighbours ⇒ ¬{V1} | Z | {V2} dG, but ??: dependent, independent, conditionally dependent, or conditionally independent.

Note: statements hold for all Z ⊆ VG \ ({V1} ∪ {V2})!

78 / 384

slide-47
SLIDE 47

An example

Consider the independence relation I on V = {V1, . . . , V4} defined by I({V1}, ∅, {V2}) and I({V1, V2}, {V3}, {V4}).

Which of the following DAGs are I-maps for I?

[four candidate DAGs over V1, . . . , V4]

79 / 384

slide-48
SLIDE 48

An example

Consider the independence relation I on V = {V1, . . . , V4} defined by I({V1}, ∅, {V2}) and I({V1, V2}, {V3}, {V4}). The following DAG is a perfect map for I:

[DAG over V1, . . . , V4]

Is this P-map for I unique?

80 / 384

slide-49
SLIDE 49

An example

Consider the independence relation I on V = {V1, . . . , V4} defined by I({V1}, {V2, V3}, {V4}) and I({V2}, {V1, V4}, {V3}). The relation I does not have a directed perfect map. Consider for example the following DAG G:

[DAG over V1, . . . , V4]

In graph G we have that {V1} | {V2, V3} | {V4} dG, but also that {V2} | {V1} | {V3} dG!

81 / 384

slide-50
SLIDE 50

Independence relations and their graphical representation

[Venn diagram over the space of independence relations: the relations representable by directed acyclic graphs and those representable by undirected graphs overlap in the graph-isomorph relations]

(Graph-isomorph: independence relation with perfect map.)

82 / 384

slide-51
SLIDE 51

An I-map or a D-map?

Reconsider the independence relation I on V = {V1, . . . , V4} defined by I({V1}, {V2, V3}, {V4}) and I({V2}, {V1, V4}, {V3}).

Compare the following two representations of independence relation I:

[DAG over V1, . . . , V4] (a D-map)   and   [DAG over V1, . . . , V4] (an I-map)

83 / 384

slide-52
SLIDE 52

Recall what we were looking for. . .

  • Compact representation of the independence relation of Pr;
  • Factorise the joint more efficiently than with the chain rule → store (conditional) distributions involving fewer variables:

      Pr(V) = Pr(Vn | Vn−1 ∧ . . . ∧ V1) · . . . · Pr(V2 | V1) · Pr(V1)   (chain rule)
            = . . .
            = Pr(Vn) · . . . · Pr(V2) · Pr(V1)   (assuming mutual independence among all Vi)

  • Pr(X ∧ Y) = Pr(X) · Pr(Y) is mathematically correct only if X is truly independent of Y.

84 / 384
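The storage argument can be made concrete by counting table entries for binary variables: the explicit joint needs 2**n - 1 numbers, full mutual independence needs only n, and a factorisation with small parent sets sits in between. A sketch with assumed parent-set sizes:

```python
# Parameter counts for three ways of storing a distribution over n binary
# variables; parents_sizes lists |parents(Vi)| for each variable.
def table_sizes(n, parents_sizes):
    full = 2 ** n - 1                             # explicit joint distribution
    mutual_indep = n                              # Pr(V1), ..., Pr(Vn)
    network = sum(2 ** k for k in parents_sizes)  # one number per parent
                                                  # configuration per variable
    return full, mutual_indep, network

# Assumed example: 10 binary variables, each with at most 2 parents.
full, indep, net = table_sizes(10, [0, 1, 1, 2, 2, 2, 2, 2, 2, 2])
assert (full, indep, net) == (1023, 10, 33)
```

Even this tiny example shows the gap: 1023 numbers for the raw joint versus 33 for a factorisation that exploits the independences, without assuming full mutual independence.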

slide-53
SLIDE 53

A minimal I-map Definition:

Let I be an independence relation on a set of random variables V . Let G = (V G, AG) be a graph with V G = V .

The graph G is called a minimal I-map for I if the following conditions hold:

  • G is an I-map for I, and
  • no proper subgraph of G is an I-map for I.

85 / 384

slide-54
SLIDE 54

An example Consider the independence relation I on V = {V1, . . . , V4} defined by I({V1}, {V2, V3}, {V4}) and I({V2}, {V1, V4}, {V3}) The following DAG is a minimal I-map for I:

V1 V2 V3 V4

Is this minimal I-map for I unique ?

86 / 384

slide-55
SLIDE 55

Directed or undirected? (I)

Directed and undirected I-maps are related.

Definition: The moral graph of a DAG G = (VG, AG) is the undirected graph obtained as follows:

  • for each Vk ∈ VG add an edge between each pair of unconnected parents Vi, Vj ∈ ρG(Vk);
  • drop the directions of all arcs.

Definition: A graph is triangulated or chordal if any loop of length ≥ 4 contains a shortcut.

Proposition: Let I be an independence relation over V. Consider graphs G = (VG, AG) and G′ = (V, EG′). Then:

  G is an I-map for I  ⇒ (moralisation + drop directions)  G′ is an I-map for I
  G is an I-map for I  ⇐ (triangulation + add directions)  G′ is an I-map for I

87 / 384
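Moralisation itself is a two-step construction: marry unconnected parents, then drop directions. A sketch, assuming a dict-of-parents DAG representation:

```python
# Moral graph of a DAG: undirected edges as frozensets of node pairs.
from itertools import combinations

def moral_graph(parents):
    edges = set()
    for child, ps in parents.items():
        for p in ps:                          # drop the direction of each arc
            edges.add(frozenset((p, child)))
        for p, q in combinations(ps, 2):      # marry each pair of parents
            edges.add(frozenset((p, q)))
    return edges

# Assumed example: V1 -> V3 <- V2, V3 -> V4.
dag = {"V1": [], "V2": [], "V3": ["V1", "V2"], "V4": ["V3"]}
g = moral_graph(dag)
assert frozenset(("V1", "V2")) in g           # the parents of V3 got married
assert len(g) == 4                            # V1-V3, V2-V3, V1-V2, V3-V4
```

Using frozensets makes the "unconnected parents" check unnecessary: adding an edge that already exists is a no-op in a set.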

slide-56
SLIDE 56

Directed or undirected? (II)

Consider independence relation IPr over V and graph G with V = VG. Consider the following properties (partly proven later):

  • Let G be a directed acyclic graph. Then G is a directed I-map of IPr ⇔ Pr can be written as

      Pr(V) = ∏_Vi Pr(Vi | ρG(Vi))

  • Let G be an undirected graph. Then G is an undirected I-map of IPr ⇔ Pr can be written as

      Pr(V) = K · ∏_Ci Φ(Ci)

    for some normalisation factor K. ← what's the meaning of these clique potentials?!?

88 / 384
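The directed factorisation can be illustrated on the smallest possible network. A sketch, assuming a two-node DAG V1 → V2 with invented CPT numbers; multiplying the conditional tables yields a function that sums to 1 over all worlds, i.e. a joint distribution:

```python
# Pr(V) = Pr(V2 | rho_G(V2)) * Pr(V1) for the assumed DAG V1 -> V2,
# with invented CPTs: Pr(v1) = 0.6, Pr(v2 | v1) = 0.9, Pr(v2 | ~v1) = 0.2.
from itertools import product

def pr_v1(v1):
    return 0.6 if v1 else 0.4

def pr_v2_given(v2, v1):
    p = 0.9 if v1 else 0.2
    return p if v2 else 1 - p

def joint(v1, v2):
    return pr_v1(v1) * pr_v2_given(v2, v1)    # product over the two CPTs

total = sum(joint(a, b) for a, b in product([True, False], repeat=2))
assert abs(total - 1.0) < 1e-12               # the CPT product is a distribution
assert abs(joint(True, True) - 0.54) < 1e-9   # 0.6 * 0.9
```

The undirected analogue replaces the CPTs by clique potentials Φ(Ci), which need not be probabilities themselves; the constant K then rescales the product so that it sums to 1.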