Chapter 2:
Preliminaries
32 / 384
Random variables
Let V = {V1, . . . , Vn}, n ≥ 1, be a set of random variables. Each variable Vi ∈ V can take on one of m ≥ 2 values; for now we consider 2-valued variables: Vi = true or Vi = false.
The set V spans a Boolean algebra V of logical propositions; the elements of V obey the usual rules of propositional logic.
33 / 384
The joint probability distribution
Definition: Let V be the Boolean algebra of propositions spanned by a set of random variables V. A function Pr: V → [0, 1] is such that
- Pr(T) = 1 for the tautology T and, more specifically, Pr(F) = 0 for the contradiction F;
- for all mutually exclusive x, y ∈ V we have that Pr(x ∨ y) = Pr(x) + Pr(y).
The function Pr is a joint probability distribution on V; the function value Pr(x) is the probability of x.
34 / 384
Independence of propositions Definition:
Let V be the Boolean Algebra of propositions spanned by a set of random variables V . Let Pr be a joint probability distribution on V .
Two propositions x, y ∈ V are called independent in Pr if
Pr(x ∧ y) = Pr(x) · Pr(y)
The propositions x, y ∈ V are called conditionally independent given the proposition z ∈ V if we have that
Pr(x ∧ y | z) = Pr(x | z) · Pr(y | z)
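Both notions can be checked directly against a joint distribution. A minimal Python sketch, using a made-up joint over three 2-valued variables chosen so that a and b are dependent but conditionally independent given c (all numbers are illustrative assumptions, not from the course):

```python
from itertools import product

# A joint probability distribution over three 2-valued variables a, b, c,
# stored as Pr[(a, b, c)]. The numbers are made up: a and b are
# conditionally independent given c, but dependent marginally.
Pr = {}
for a, b, c in product([True, False], repeat=3):
    pc = 0.5
    pa = 0.8 if c else 0.2        # Pr(a = true | c)
    pb = 0.8 if c else 0.2        # Pr(b = true | c)
    Pr[(a, b, c)] = pc * (pa if a else 1 - pa) * (pb if b else 1 - pb)

def prob(event):
    """Pr(x) for a proposition x, given as a predicate on worlds."""
    return sum(p for world, p in Pr.items() if event(*world))

# independence: Pr(x ∧ y) = Pr(x) · Pr(y)?
lhs = prob(lambda a, b, c: a and b)
rhs = prob(lambda a, b, c: a) * prob(lambda a, b, c: b)
print(abs(lhs - rhs) < 1e-12)      # False: a and b are dependent

# conditional independence given c: Pr(x ∧ y | z) = Pr(x | z) · Pr(y | z)?
pz = prob(lambda a, b, c: c)
lhs = prob(lambda a, b, c: a and b and c) / pz
rhs = (prob(lambda a, b, c: a and c) / pz) * (prob(lambda a, b, c: b and c) / pz)
print(abs(lhs - rhs) < 1e-12)      # True
```

This also previews a point made later with the two-coins example: the two notions of independence are genuinely different.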
35 / 384
The two notions of independence (1)
independent¹:
[figure: regions x and y drawn in the unit square]
Can z ∈ V exist such that x and y are dependent given z?
[figure: regions x, y and z drawn in the unit square]
¹ The square has area 1, representing the total probability mass.
36 / 384
The two notions of independence (2)
dependent:
[figure: overlapping regions x and y drawn in the unit square]
Can z ∈ V exist such that x and y are conditionally independent given z?
[figure: regions x, y and z drawn in the unit square]
37 / 384
Configurations
Let V be a set of random variables and let W ⊆ V.
A configuration cW of W is a conjunction of value assignments to the variables from W;
(notation abuse!): W is then considered to be a template that can be filled in with any configuration cW.
Example: Let W = {V1, V3, V7}. W = V1 ∧ V3 ∧ V7 denotes a configuration template: filling in values for the Vi results in proper propositions/configurations. Some configurations cW of W are:
V1 = true ∧ V3 = true ∧ V7 = false
v1 ∧ ¬v3 ∧ v7
¬v1 ∧ v3 ∧ ¬v7
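The template view is mechanical: a set of k 2-valued variables has 2^k configurations, obtained by filling in each variable with one of its values. A small Python illustration (variable names as in the example above):

```python
from itertools import product

# A configuration of W = {V1, V3, V7} is a conjunction of value
# assignments, one per variable; for 2-valued variables, W has
# 2 ** len(W) configurations.
W = ["V1", "V3", "V7"]
configurations = [
    " ∧ ".join(f"{v} = {value}" for v, value in zip(W, values))
    for values in product(["true", "false"], repeat=len(W))
]
print(len(configurations))   # 8
print(configurations[0])     # V1 = true ∧ V3 = true ∧ V7 = true
```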
Conventions and notation In the remainder of this course, for distributions on V :
we refer to configurations cV of V.
Notation (boldface marks a set, plain face a singleton):
- variables/templates (capital): V
- values/configurations: cV, v
- probabilities: Pr(cV), Pr(v), Pr(v | cE)
- distributions: Pr(V), Pr(V | e)
- distribution sets: Pr(V | E)
39 / 384
Independence of variables
Definition: Let V be a set of random variables and let X, Y, Z ⊆ V. Let Pr be a joint distribution on V. The set of variables X is called conditionally independent of the set Y given the set Z in Pr, if we have that
Pr(X | Y ∧ Z) = Pr(X | Z)
Remarks:
- the statement is shorthand: Pr(cX | cY ∧ cZ) = Pr(cX | cZ) holds for all configurations cX, cY and cZ of X, Y and Z;
- conditional independence implies that Pr(X ∧ Y | Z) = Pr(X | Z) · Pr(Y | Z) (what about ⇐?).
40 / 384
Chapter 3:
41 / 384
A qualitative notion of independence Observation: People are capable of making statements about independences among variables without having to perform numerical calculations. Conclusion: In human reasoning behaviour, the qualitative notion of independence is more fundamental than the quantitative notion.
43 / 384
The (probabilistic) independence relation of a joint distribution Definition:
Let V be a set of random variables and let Pr be a joint probability distribution on V .
The independence relation IPr of Pr is a set IPr ⊆ P(V) × P(V) × P(V), defined for all X, Y, Z ⊆ V by
(X, Z, Y) ∈ IPr if and only if Pr(X | Y ∧ Z) = Pr(X | Z)
Remarks:
- (X, Z, Y) ∉ IPr will be written as ¬IPr(X, Z, Y);
- each triple in IPr corresponds to an independence statement for the joint distribution Pr.
44 / 384
Properties of IPr: symmetry
Lemma: IPr(X, Z, Y) if and only if IPr(Y, Z, X)
Proof:
IPr(X, Z, Y) ⇔ Pr(X | Y ∧ Z) = Pr(X | Z)
⇔ Pr(X ∧ Y ∧ Z) / Pr(Y ∧ Z) = Pr(X ∧ Z) / Pr(Z)
⇔ Pr(X ∧ Y ∧ Z) / Pr(X ∧ Z) = Pr(Y ∧ Z) / Pr(Z)
⇔ Pr(Y | X ∧ Z) = Pr(Y | Z)
⇔ IPr(Y, Z, X)
Properties of IPr: decomposition
Lemma: IPr(X, Z, Y ∪ W) ⇒ IPr(X, Z, Y) ∧ IPr(X, Z, W)
Proof: (sketch) (Note: cY∪W = cY ∧ cW!)
Suppose that Pr(X | Y ∧ W ∧ Z) = Pr(X | Z). Then, by definition,
Pr(X ∧ Y ∧ W ∧ Z) = Pr(Y ∧ W ∧ Z) · Pr(X ∧ Z) / Pr(Z)
For Pr(X | Y ∧ Z) we find, summing the equality above over all configurations of W, that
Pr(X | Y ∧ Z) = Pr(X ∧ Y ∧ Z) / Pr(Y ∧ Z) = Pr(Y ∧ Z) · Pr(X ∧ Z) / (Pr(Z) · Pr(Y ∧ Z)) = Pr(X ∧ Z) / Pr(Z) = Pr(X | Z)
Properties of IPr: weak union, contraction
Lemma:
- IPr(X, Z, Y ∪ W) ⇒ IPr(X, Z ∪ W, Y) (weak union)
- IPr(X, Z, Y) ∧ IPr(X, Z ∪ Y, W) ⇒ IPr(X, Z, Y ∪ W) (contraction)
(for details see syllabus) Proof: left as exercise 3.1. What about ⇐?
47 / 384
The definition of the independence relation
[diagram: a joint distribution Pr induces its independence relation IPr, with properties symmetry, decomposition, weak union and contraction; an abstract independence relation I takes these same properties as axioms]
48 / 384
The (qualitative) independence relation I Definition:
Let V be a set of random variables and let X, Y , Z, W ⊆ V .
An independence relation I on V is a ternary relation I ⊆ P(V) × P(V) × P(V) that satisfies the following properties:
1. I(X, Z, Y) if and only if I(Y, Z, X)
2. I(X, Z, Y ∪ W) ⇒ I(X, Z, Y) ∧ I(X, Z, W)
3. I(X, Z, Y ∪ W) ⇒ I(X, Z ∪ W, Y)
4. I(X, Z, Y) ∧ I(X, Z ∪ Y, W) ⇒ I(X, Z, Y ∪ W)
The first property is called the symmetry axiom; the second is called the decomposition axiom; the third is referred to as the weak union axiom; the last one is called contraction.
49 / 384
An example Lemma:
Let I be an independence relation on a set of random variables V . We have that
if I(X, Z, Y) and I(X ∪ Z, Y, W) then I(X, Z, W)
for all X, Y, Z, W ⊆ V.
Proof: We observe that
I(X ∪ Z, Y, W) ⇒ (symmetry) I(W, Y, X ∪ Z) ⇒ (weak union) I(W, Y ∪ Z, X) ⇒ (symmetry) I(X, Y ∪ Z, W)
From I(X, Z, Y), I(X, Y ∪ Z, W) and the contraction axiom we have that I(X, Z, W ∪ Y); decomposition now gives I(X, Z, W).
Representing independences
Different ways exist of representing an independence relation:
- all statements in the relation are explicitly stated;
- a subset of statements in the relation are explicitly stated — all other statements are implicitly represented by means of the axioms;
51 / 384
An example Consider V = {V1, V2, V3, V4} and independence relation I on V :
I({V1}, ∅, {V4}) I({V2}, ∅, {V1}) I({V4}, {V1}, {V2}) I({V2}, ∅, {V4}) I({V1, V4}, ∅, {V2}) I({V4}, {V1}, {V3}) I({V3}, ∅, {V4}) I({V2, V4}, ∅, {V1}) I({V4}, {V1}, {V2, V3}) I({V4}, ∅, {V1}) I({V2}, ∅, {V1, V4}) I({V1}, {V2}, {V4}) I({V4}, ∅, {V2}) I({V1}, ∅, {V2, V4}) I({V3}, {V2}, {V4}) I({V4}, ∅, {V3}) I({V2}, {V1}, {V4}) I({V1, V3}, {V2}, {V4}) I({V1, V2}, ∅, {V4}) I({V3}, {V1}, {V4}) I({V4}, {V2}, {V1}) I({V1, V3}, ∅, {V4}) I({V2, V3}, {V1}, {V4}) I({V4}, {V2}, {V3}) I({V2, V3}, ∅, {V4}) I({V4}, {V1, V2}, {V3}) I({V4}, {V2}, {V1, V3}) I({V4}, ∅, {V1, V2}) I({V2}, {V1, V3}, {V4}) I({V1}, {V3}, {V4}) I({V4}, ∅, {V1, V3}) I({V4}, {V1, V3}, {V2}) I({V2}, {V3}, {V4}) I({V4}, ∅, {V2, V3}) I({V1}, {V2, V3}, {V4}) I({V1, V2}, {V3}, {V4}) I({V1, V2, V3}, ∅, {V4}) I({V4}, {V2, V3}, {V1}) I({V1}, {V4}, {V2}) I({V4}, ∅, {V1, V2, V3}) I({V4}, {V3}, {V1, V2}) I({V2}, {V4}, {V1}) I({V1}, ∅, {V2}) I({V4}, {V3}, {V1}) I({V3}, {V1, V2}, {V4})
52 / 384
The representation of an independence relation in an undirected graph
Consider an independence relation I and an undirected graph; the global idea is:
- represent the variables from V by nodes;
- capture the (in)dependences between the variables by the absence or presence of edges.
53 / 384
The separation criterion: introduction Definition: Let G = (V G, EG) be an undirected graph with edges EG and nodes V G = {V1, . . . , Vn}, n > 1. Let s be a path in G from a node Vi to a node Vj. The path s is blocked by a set of nodes Z ⊆ V G, if at least one node from Z is on the path s. If s is not blocked by Z, the path is called active given Z.
54 / 384
The separation criterion
Definition: Let G = (VG, EG) be an undirected graph. Let X, Y, Z ⊆ VG be sets of nodes in G. The set Z separates the set X from the set Y in G — notation: X | Z | Y G — if every simple path in G from a node in X to a node in Y is blocked by Z.
Remarks:
- the separation criterion applies to undirected graphs;
- if no path exists between X and Y in G, then X | ∅ | Y G.
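Separation reduces to plain graph reachability: X | Z | Y G holds exactly when no node of Y can be reached from X once the nodes in Z are deleted. A Python sketch on a hypothetical four-node cycle (the graph is an assumption for illustration, not the one from the slides):

```python
from collections import deque

def separated(edges, X, Y, Z):
    """X | Z | Y in an undirected graph: every path from X to Y is
    blocked by Z, i.e. Y is unreachable from X once Z is removed."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    X, Y, Z = set(X), set(Y), set(Z)
    seen, queue = X - Z, deque(X - Z)
    while queue:
        n = queue.popleft()
        if n in Y:
            return False          # an active (unblocked) path exists
        for m in nbrs.get(n, set()) - Z - seen:
            seen.add(m)
            queue.append(m)
    return True

# hypothetical graph: the cycle V1 − V2 − V4 − V3 − V1
edges = [("V1", "V2"), ("V2", "V4"), ("V4", "V3"), ("V3", "V1")]
print(separated(edges, {"V1"}, {"V4"}, {"V2", "V3"}))  # True
print(separated(edges, {"V1"}, {"V4"}, {"V2"}))        # False: V1 − V3 − V4 is active
```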
55 / 384
An example
[figure: undirected graph over V1, . . . , V7]
Which of the following separation statements are valid?
a) {V1} | {V2} | {V3, V6} G
b) {V4} | {V2, V5} | {V6} G
c) {V4} | {V1, V2, V5} | {V6} G
d) {V1} | {V4} | {V5} G
e) {V1, V5, V6} | ∅ | {V7} G
f) {V2} | {V5} | {V7} G
g) {V1} | {V5} | {V2} G
56 / 384
Independence relations and undirected graphs
Definition: Let I be an independence relation on a set of random variables V. Let G = (VG, EG) be an undirected graph with VG = V.
- The graph G is called a dependency map (D-map) for I if for all X, Y, Z ⊆ V we have: if I(X, Z, Y) then X | Z | Y G;
- the graph G is called an independency map (I-map) for I if for all X, Y, Z ⊆ V we have: if X | Z | Y G then I(X, Z, Y);
- the graph G is called a perfect map (P-map) for I if G is both a dependency map and an independency map for I.
57 / 384
Undirected D-maps: what can they tell?
Let I be an independence relation and G an undirected graph.
Consider a D-map for I, then:
- V1 and V2 neighbours ⇒ V1, V2 dependent:
  ¬{V1} | Z | {V2} G, hence ¬I({V1}, Z, {V2})
- V1 and V2 non-neighbours ⇒ ??
  {V1} | Z | {V2} G leaves open: dependent, independent, conditionally independent
Note: statements hold for all Z ⊆ VG \ ({V1} ∪ {V2})!
58 / 384
An example Consider the independence relation I on V = {V1, . . . , V4}, defined by I({V1}, {V2, V3}, {V4}) and I({V2}, {V1, V4}, {V3}) Which of the following undirected graphs are examples of D-maps for I ?
[figure: four candidate undirected graphs over V1, . . . , V4]
59 / 384
Undirected I-maps: what can they tell?
Let I be an independence relation and G an undirected graph.
Consider an I-map for I, then:
- V1 and V2 non-neighbours ⇒ V1, V2 (cond.) independent:
  {V1} | Z | {V2} G, hence I({V1}, Z, {V2})
- V1 and V2 neighbours ⇒ ??
  ¬{V1} | Z | {V2} G leaves open: dependent, independent, conditionally independent
Note: statements hold for all Z ⊆ VG \ ({V1} ∪ {V2})!
60 / 384
An example Consider the independence relation I on V = {V1, . . . , V4}, defined by I({V1}, {V2, V3}, {V4}) and I({V2}, {V1, V4}, {V3}) Which of the following undirected graphs are examples of I-maps for I ?
V1 V2 V3 V4 V1 V2 V3 V4 V1 V2 V3 V4 V1 V2 V3 V4
61 / 384
Properties of I
Let I be an independence relation on a set of random variables V.
Lemma: Every independence relation I has an undirected D-map.
Proof: The edgeless graph G = (V, ∅) is a D-map for I.
Lemma: Every independence relation I has an undirected I-map.
Proof: The complete graph G′ = (V, V × V) is an I-map for I.
An example Consider the independence relation I on V = {V1, . . . , V4}, defined by I({V1}, {V2, V3}, {V4}) and I({V2}, {V1, V4}, {V3}) The following undirected graph is a perfect map for I:
[figure: the undirected cycle V1 − V2 − V4 − V3 − V1]
Is this P-map for I unique? Does every I have a P-map?
63 / 384
An example
Consider an experiment with two coins and a bell: the bell sounds iff the two coins have the same outcome after a toss. Consider:
- variable C1: the outcome of tossing coin one;
- variable C2: the outcome of tossing coin two;
- variable B: whether or not the bell sounds;
- independence relation I for this experiment.
We have, among others, that
I({C1}, ∅, {C2})   ¬I({C1}, {B}, {C2})
I({C1}, ∅, {B})    ¬I({C1}, {C2}, {B})
I({C2}, ∅, {B})    ¬I({C2}, {C1}, {B})
This independence relation is an example of an independence relation with an induced dependency.
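The statements above can be verified numerically. A Python sketch that builds the joint distribution of the experiment (assuming fair, independently tossed coins) and tests one marginal independence and one induced dependency:

```python
from itertools import product

# Joint distribution for the two-coins-and-a-bell experiment: fair,
# independent coins; the bell sounds iff the outcomes are equal.
Pr = {}
for c1, c2 in product(["heads", "tails"], repeat=2):
    b = (c1 == c2)                      # bell is deterministic given coins
    Pr[(c1, c2, b)] = 0.25

def p(pred):
    return sum(q for w, q in Pr.items() if pred(*w))

# I({C1}, ∅, {C2}): the coins are marginally independent
lhs = p(lambda c1, c2, b: c1 == "heads" and c2 == "heads")
rhs = p(lambda c1, c2, b: c1 == "heads") * p(lambda c1, c2, b: c2 == "heads")
print(lhs == rhs)                       # True: 0.25 == 0.5 * 0.5

# ¬I({C1}, {B}, {C2}): dependent once the bell is observed
pb = p(lambda c1, c2, b: b)
lhs = p(lambda c1, c2, b: c1 == "heads" and c2 == "heads" and b) / pb
rhs = (p(lambda c1, c2, b: c1 == "heads" and b) / pb) * \
      (p(lambda c1, c2, b: c2 == "heads" and b) / pb)
print(lhs == rhs)                       # False: 0.5 != 0.25
```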
64 / 384
An example
Reconsider the experiment with the two coins and the bell. Candidate undirected graphs for the independence relation I of this experiment:
[figure: two undirected graphs over C1, C2, B]
Neither graph can capture both the marginal independences and the induced dependency.
65 / 384
The representation of an independence relation in a directed graph
Consider an independence relation I and a directed graph G; the global idea is:
- represent the variables from V by nodes;
- capture the (in)dependences between the variables by the absence or presence of arcs in the graph;
- in addition, capture induced dependencies.
66 / 384
Introduction The formalism of the directed graph is more expressive than the formalism of the undirected graph:
[figure: undirected graph over V1, V2, V3]
vs.
[figures: three DAGs over V1, V2, V3 with differently oriented arcs]
67 / 384
Causality?
Consider the following examples:
[figures: example DAGs over age/length/reading, weather/harvest/grain price, and burglary/earthquake/alarm]
68 / 384
Introduction, continued We aim to represent the following (in)dependences with directed graphs:
[figures: three DAGs over V1, V2, V3]
69 / 384
The d-separation criterion: introduction
Definition: Let G = (VG, AG) be an acyclic directed graph (DAG), and let s be a chain in G between Vi and Vj ∈ VG. Chain s is blocked (or: in-active) by a set Z ⊆ VG if s contains a node W for which one of the following holds:
- W ∈ Z and the arcs of s meet head-to-tail at W (serial connection);
- W ∈ Z and the arcs of s meet tail-to-tail at W (diverging connection);
- the arcs of s meet head-to-head at W (converging connection), and neither W nor any of its descendants is in Z.
70 / 384
An example Consider the following DAG and some of its chains:
[figure: DAG over V1, . . . , V7]
1) V4, V2, V5 — from V4 to V5
2) V1, V2, V5, V6, V7 — from V1 to V7
3) V3, V4, V6, V5 — from V3 to V5
4) V2, V4 — from V2 to V4
Which of these chains is blocked by which of the following sets?
{V2}, {V5}, {V2, V5}, {V4}, {V6}, {V4, V6}
71 / 384
The d-separation criterion Definition:
Let G = (V G, AG) be an acyclic directed graph. Let X, Y , Z ⊆ V G be sets of nodes in G.
The set Z d-separates X from Y in G — notation: X | Z | Y dG — if every simple chain in G from a node in X to a node in Y is blocked by Z.
Remarks:
- X | ∅ | Y dG indicates that all chains between X and Y, if any, contain a head-to-head node;
- if X and Y are not d-separated by Z, they are called d-connected given Z.
72 / 384
An example Consider the following DAG and d-separation statements:
[figure: DAG over V1, . . . , V5]
a) {V1} | {V2, V3} | {V5} dG
b) {V1} | {V4} | {V5} dG
c) {V2} | {V1} | {V3} dG
d) {V2} | {V1, V5} | {V3} dG
e) {V2} | ∅ | {V3} dG
f) {V1} | {V3, V4} | {V2} dG
Which d-separation statements are valid in the graph?
73 / 384
Bayes-Ball for determining d-separation
Determine if X | Z | Y dG by dropping bouncing balls at X and following the 10 rules of Bayes-Ball:
[figure: the Bayes-Ball pass/bounce/stop rules]
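An alternative to Bayes-Ball for deciding d-separation programmatically is the moral-ancestral-graph method: restrict the DAG to the ancestors of X ∪ Y ∪ Z, marry co-parents, drop directions, remove Z, and test reachability. A Python sketch, with a hypothetical DAG as input (the parent map below is an assumption for illustration):

```python
from collections import deque

def ancestors(dag, nodes):
    """All ancestors of `nodes` in the DAG, including the nodes themselves.
    `dag` maps each node to the set of its parents."""
    result, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in result:
            result.add(n)
            stack.extend(dag.get(n, ()))
    return result

def d_separated(dag, X, Y, Z):
    """Check X | Z | Y dG via the moral ancestral graph: restrict to the
    ancestors of X ∪ Y ∪ Z, marry co-parents, drop directions, delete Z,
    and test whether X and Y are disconnected."""
    X, Y, Z = set(X), set(Y), set(Z)
    anc = ancestors(dag, X | Y | Z)
    nbrs = {n: set() for n in anc}
    for child in anc:
        parents = [p for p in dag.get(child, ()) if p in anc]
        for p in parents:                      # parent-child edges
            nbrs[p].add(child); nbrs[child].add(p)
        for i, p in enumerate(parents):        # marry co-parents
            for q in parents[i + 1:]:
                nbrs[p].add(q); nbrs[q].add(p)
    seen, queue = set(X) - Z, deque(set(X) - Z)
    while queue:                               # BFS from X avoiding Z
        n = queue.popleft()
        if n in Y:
            return False
        for m in nbrs[n] - Z - seen:
            seen.add(m); queue.append(m)
    return True

# hypothetical DAG given as a parent map: V1 → V2, V1 → V3, V2 → V4, V3 → V4
dag = {"V2": {"V1"}, "V3": {"V1"}, "V4": {"V2", "V3"}}
print(d_separated(dag, {"V2"}, {"V3"}, {"V1"}))          # True
print(d_separated(dag, {"V2"}, {"V3"}, {"V1", "V4"}))    # False: V4 is head-to-head
```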
74 / 384
Independence relations and directed graphs Definition:
Let I be an independence relation on a set of random variables V . Let G = (V G, AG) be an acyclic directed graph with V G = V .
The graph G is called a dependency map (D-map) for I if for every X, Y, Z ⊆ V we have that: if I(X, Z, Y) then X | Z | Y dG;
the graph G is called an independency map (I-map) for I if for every X, Y, Z ⊆ V we have that: if X | Z | Y dG then I(X, Z, Y);
the graph G is called a perfect map (P-map) for I if G is both a dependency map and an independency map for I.
75 / 384
Directed D-maps: what can they tell?
Let I be an independence relation and G a DAG.
Consider a D-map for I, then:
- V1 and V2 neighbours ⇒ V1, V2 dependent:
  ¬{V1} | Z | {V2} dG, hence ¬I({V1}, Z, {V2})
- V1 and V2 non-neighbours ⇒ ??
  {V1} | Z | {V2} dG leaves open: dependent, independent, conditionally dependent (Z ≠ ∅), conditionally independent (Z ≠ ∅)
Note: statements hold for all Z ⊆ VG \ ({V1} ∪ {V2})!
76 / 384
An example Consider the independence relation I on V = {V1, . . . , V4} defined by I({V1}, ∅, {V2}) and I({V1, V2}, {V3}, {V4}) Which of the following DAGs are D-maps for I ?
[figure: four candidate DAGs over V1, . . . , V4]
77 / 384
Directed I-maps
Let I be an independence relation and G a DAG.
Consider an I-map for I, then:
- V1 and V2 non-neighbours ⇒ V1, V2 (cond.) independent:
  {V1} | Z | {V2} dG, hence I({V1}, Z, {V2})
- V1 and V2 neighbours ⇒ ??
  ¬{V1} | Z | {V2} dG leaves open: dependent, independent, conditionally dependent, conditionally independent
Note: statements hold for all Z ⊆ VG \ ({V1} ∪ {V2})!
78 / 384
An example Consider the independence relation I on V = {V1, . . . , V4} defined by I({V1}, ∅, {V2}) and I({V1, V2}, {V3}, {V4}) Which of the following DAGs are I-maps for I ?
[figure: four candidate DAGs over V1, . . . , V4]
79 / 384
An example Consider the independence relation I on V = {V1, . . . , V4} defined by I({V1}, ∅, {V2}) and I({V1, V2}, {V3}, {V4}) The following DAG is a perfect map for I:
[figure: DAG with arcs V1 → V3, V2 → V3 and V3 → V4]
Is this P-map for I unique ?
80 / 384
An example Consider the independence relation I on V = {V1, . . . , V4} defined by I({V1}, {V2, V3}, {V4}) and I({V2}, {V1, V4}, {V3}) The relation I does not have a directed perfect map. Consider for example the following DAG G:
[figure: DAG over V1, . . . , V4]
In graph G we have that {V1} | {V2, V3} | {V4} dG, but also that {V2} | {V1} | {V3} dG!
81 / 384
Independence relations and their graphical representation
[diagram: the set of independence relations, with the relations representable by directed acyclic graphs and those representable by undirected graphs as overlapping graph-isomorph subsets]
(Graph-isomorph: independence relation with perfect map.)
82 / 384
An I-map or a D-map?
Reconsider the independence relation I on V = {V1, . . . , V4} defined by I({V1}, {V2, V3}, {V4}) and I({V2}, {V1, V4}, {V3}).
Compare the following two representations of independence relation I:
[figure: an undirected graph over V1, . . . , V4] — a D-map — and [figure: another undirected graph over V1, . . . , V4] — an I-map
83 / 384
Recall what we were looking for. . .
We want to write a joint distribution Pr(V) in terms of (conditional) distributions involving fewer variables:
Pr(V) = Pr(Vn | Vn−1 ∧ . . . ∧ V1) · . . . · Pr(V2 | V1) · Pr(V1) (chain rule)
= . . .
= Pr(Vn) · . . . · Pr(V2) · Pr(V1) (assuming mutual independence among all Vi)
We may only simplify Pr(X | Y ∧ Z) to Pr(X | Z) where X is truly independent of Y given Z.
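The chain rule itself holds for every joint distribution; only the subsequent dropping of conditioning variables requires independence. A Python sketch that verifies the three-variable chain rule numerically on a randomly generated joint distribution:

```python
from itertools import product
import random

# Numerical check of the chain rule on a random joint distribution over
# three 2-valued variables:
# Pr(V) = Pr(V3 | V2 ∧ V1) · Pr(V2 | V1) · Pr(V1).
random.seed(0)
weights = {w: random.random() for w in product([0, 1], repeat=3)}
total = sum(weights.values())
Pr = {w: p / total for w, p in weights.items()}

def marg(fixed):
    """Marginal probability of a partial assignment {index: value}."""
    return sum(p for w, p in Pr.items()
               if all(w[i] == v for i, v in fixed.items()))

for v1, v2, v3 in product([0, 1], repeat=3):
    chain = (marg({0: v1, 1: v2, 2: v3}) / marg({0: v1, 1: v2})
             * marg({0: v1, 1: v2}) / marg({0: v1})
             * marg({0: v1}))
    assert abs(chain - Pr[(v1, v2, v3)]) < 1e-12
print("chain rule verified")
```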
84 / 384
A minimal I-map Definition:
Let I be an independence relation on a set of random variables V . Let G = (V G, AG) be a graph with V G = V .
The graph G is called a minimal I-map for I if the following conditions hold:
- G is an I-map for I;
- removal of an arbitrary arc from G results in a graph that is no longer an I-map for I.
85 / 384
An example Consider the independence relation I on V = {V1, . . . , V4} defined by I({V1}, {V2, V3}, {V4}) and I({V2}, {V1, V4}, {V3}) The following DAG is a minimal I-map for I:
[figure: DAG over V1, . . . , V4]
Is this minimal I-map for I unique?
86 / 384
Directed or undirected? (I)
Directed and undirected I-maps are related.
Definition: The moral graph of a DAG G = (VG, AG) is the undirected graph obtained as follows:
- add an edge between each pair of unconnected parents Vi, Vj ∈ ρG(Vk) of every node Vk;
- subsequently drop the directions of all arcs.
Definition: A graph is triangulated or chordal if any loop of length ≥ 4 contains a shortcut. Proposition: Let I be an independence relation over V . Consider graphs G = (V G, AG) and G′ = (V , EG′). Then,
G is an I-map for I ⇒ (moralisation + drop directions) G′ is an I-map for I
G is an I-map for I ⇐ (triangulation + add directions) G′ is an I-map for I
87 / 384
Directed or undirected? (II)
Consider independence relation IPr over V and graph G with V = VG. Consider the following properties (partly proven later):
- G is a directed acyclic I-map of IPr ⇐⇒ Pr can be written as Pr(V) = ∏i Pr(Vi | ρG(Vi))
- G is an undirected I-map of IPr ⇐⇒ Pr can be written as Pr(V) = K · ∏i Φ(Ci) for some normalisation factor K ←− what's the meaning of these clique potentials?!?
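The directed factorisation can be illustrated concretely: pick a DAG, attach a conditional probability table Pr(Vi | ρG(Vi)) to each node, and the product automatically defines a proper joint distribution. A Python sketch with a hypothetical DAG V1 → V3 ← V2, V3 → V4 and made-up tables:

```python
from itertools import product

# A distribution defined by a directed I-map: Pr(V) = ∏ Pr(Vi | ρG(Vi)).
# Hypothetical DAG V1 → V3 ← V2, V3 → V4 with made-up CPTs.
p_v1 = {1: 0.6, 0: 0.4}                       # Pr(V1)
p_v2 = {1: 0.3, 0: 0.7}                       # Pr(V2)
p_v3 = {(1, 1): 0.9, (1, 0): 0.5,
        (0, 1): 0.4, (0, 0): 0.1}             # Pr(V3 = 1 | V1, V2)
p_v4 = {1: 0.8, 0: 0.2}                       # Pr(V4 = 1 | V3)

def joint(v1, v2, v3, v4):
    """Pr(V1 ∧ V2 ∧ V3 ∧ V4) as the product of the node CPTs."""
    p3 = p_v3[(v1, v2)] if v3 else 1 - p_v3[(v1, v2)]
    p4 = p_v4[v3] if v4 else 1 - p_v4[v3]
    return p_v1[v1] * p_v2[v2] * p3 * p4

# the factorisation yields a proper joint distribution: it sums to 1
total = sum(joint(*w) for w in product([0, 1], repeat=4))
print(abs(total - 1.0) < 1e-12)   # True
```

No normalisation factor is needed here; that is exactly what distinguishes the directed factorisation from the undirected clique-potential form, where the constant K must be computed.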
88 / 384