21. Independence in PGMs; Example PGMs
COMP90051 Statistical Machine Learning, Semester 2, 2017
Lecturer: Trevor Cohn

Independence
* PGMs encode assumptions of statistical independence between random variables
* Node table: Pr(child | parents)
* Child directly depends on parents
[Figure: example directed graph over nodes S, T, L]
Graph encodes:
Pr(X1, X2, …, Xk) = ∏_{i=1..k} Pr(Xi | Xj ∈ parents(Xi))
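A minimal sketch of this factorisation in code. Not from the lecture: the three-node chain X1 → X2 → X3 and all CPT values below are made up for illustration.

```python
# A minimal sketch (hypothetical chain X1 -> X2 -> X3; CPT values made up):
# the joint is just the product of each node's conditional given its parents.

p_x1 = {0: 0.7, 1: 0.3}                                   # Pr(X1)
p_x2_given_x1 = {(0, 0): 0.9, (0, 1): 0.1,
                 (1, 0): 0.4, (1, 1): 0.6}                # Pr(X2 | X1)
p_x3_given_x2 = {(0, 0): 0.8, (0, 1): 0.2,
                 (1, 0): 0.5, (1, 1): 0.5}                # Pr(X3 | X2)

def joint(x1, x2, x3):
    """Pr(x1, x2, x3) = Pr(x1) Pr(x2 | x1) Pr(x3 | x2)."""
    return p_x1[x1] * p_x2_given_x1[(x1, x2)] * p_x3_given_x2[(x2, x3)]

# Sanity check: a valid factorisation sums to 1 over all assignments.
print(sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)))
```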
* Marginal independence: P(X, Y) = P(X) P(Y)
* Conditional independence: P(X, Y | Z) = P(X | Z) P(Y | Z)
* RVs in set A are independent of RVs in set B when given the values of RVs in C; written A ⊥ B | C
* Symmetric: can swap roles of A and B
* A ⊥ B denotes marginal independence, the special case C = ∅
* Caveat: marginal independence does not imply conditional independence, nor vice versa (a numeric test is sketched below)
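A small sketch of how such a test looks numerically, using a hypothetical joint table over binary X and Y (values made up so that the table is exactly a product of marginals):

```python
# Testing marginal independence by comparing P(X, Y) against P(X) P(Y).
from itertools import product

joint = {(0, 0): 0.28, (0, 1): 0.42, (1, 0): 0.12, (1, 1): 0.18}

def marginal(index):
    """Sum the joint over the other variable."""
    m = {0: 0.0, 1: 0.0}
    for assignment, p in joint.items():
        m[assignment[index]] += p
    return m

px, py = marginal(0), marginal(1)
print(all(abs(joint[(x, y)] - px[x] * py[y]) < 1e-12
          for x, y in product((0, 1), repeat=2)))  # True: X ⊥ Y
```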
* X ⊥ Y? Yes, since P(X, Y) = P(X) P(Y)
[Figure: graphs over X, Y and over X, Y, Z]
* X ⊥ Z? No, since P(X, Z) = Σ_y P(X) P(y) P(Z | X, y) does not factorise
* X ⊥ Y? Yes, since P(X, Y) = Σ_z P(X) P(Y) P(z | X, Y) = P(X) P(Y) (numeric check below)
[Figure: head-to-head graph X → Z ← Y]
Marginal independence denoted Xβ₯Y
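A sketch of the head-to-head case by enumeration; all CPT values are hypothetical:

```python
# Head-to-head graph X -> Z <- Y (made-up CPTs): X and Y come out
# marginally independent, while X and Z do not.
p_x = {0: 0.6, 1: 0.4}
p_y = {0: 0.7, 1: 0.3}
p_z1 = {(0, 0): 0.1, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 0.3}  # Pr(Z=1 | x, y)

def joint(x, y, z):
    pz = p_z1[(x, y)] if z == 1 else 1 - p_z1[(x, y)]
    return p_x[x] * p_y[y] * pz

# X ⊥ Y: summing out Z leaves P(X) P(Y).
p_xy = sum(joint(1, 1, z) for z in (0, 1))
print(abs(p_xy - p_x[1] * p_y[1]) < 1e-12)     # True (0.12 both ways)

# X and Z are dependent: P(X=1, Z=1) != P(X=1) P(Z=1).
p_xz = sum(joint(1, y, 1) for y in (0, 1))
p_z = sum(joint(x, y, 1) for x in (0, 1) for y in (0, 1))
print(p_xz, p_x[1] * p_z)                      # 0.26 vs ~0.178
```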
[Figure: head-to-tail chain X → Z → Y]
* X ⊥ Y? P(X, Y) = Σ_z P(X) P(z | X) P(Y | z) … No
[Figure: tail-to-tail fork X ← Z → Y]
* X ⊥ Y? P(X, Y) = Σ_z P(z) P(X | z) P(Y | z) … No
(a numeric check for the chain case is sketched below)
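A sketch of the chain case with made-up CPT values:

```python
# Chain X -> Z -> Y (made-up CPTs): summing out Z does not factorise
# the joint, so X and Y are dependent in general.
p_x = {0: 0.5, 1: 0.5}
p_z1_given_x = {0: 0.9, 1: 0.2}   # Pr(Z=1 | x)
p_y1_given_z = {0: 0.1, 1: 0.7}   # Pr(Y=1 | z)

def p_xy(x, y):
    """P(X=x, Y=y) = sum_z P(x) P(z | x) P(y | z)."""
    total = 0.0
    for z in (0, 1):
        pz = p_z1_given_x[x] if z == 1 else 1 - p_z1_given_x[x]
        py = p_y1_given_z[z] if y == 1 else 1 - p_y1_given_z[z]
        total += p_x[x] * pz * py
    return total

p_x1 = p_xy(1, 0) + p_xy(1, 1)                 # P(X=1)
p_y1 = p_xy(0, 1) + p_xy(1, 1)                 # P(Y=1)
print(p_xy(1, 1), p_x1 * p_y1)                 # 0.11 vs 0.215: dependent
```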
* However, must account for edge directions
* Relates (loosely) to causality: if edges encode causal links, can X affect (cause) Y?
* No edges between X and Y, in any direction ⇒ independent
* Intervening node with incoming edges from X and Y (aka head-to-head) ⇒ independent
* Head-to-tail, tail-to-tail ⇒ not (necessarily) independent
* Test by trying to show P(X,Y|Z) = P(X|Z) P(Y|Z).
[Figure: the three canonical three-node graphs over X, Y, Z: fork, chain, collider]
Tail-to-tail (X ← Z → Y):
P(X, Y | Z) = P(Z) P(X | Z) P(Y | Z) / P(Z) = P(X | Z) P(Y | Z)
Head-to-tail (X → Z → Y):
P(X, Y | Z) = P(X) P(Z | X) P(Y | Z) / P(Z) = P(X | Z) P(Z) P(Y | Z) / P(Z) = P(X | Z) P(Y | Z)
Both satisfy the test, so X ⊥ Y | Z in each case.
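A numeric confirmation of the head-to-tail derivation, reusing the earlier made-up chain CPTs:

```python
# Confirming X ⊥ Y | Z for the chain X -> Z -> Y, matching the algebra above.
p_x = {0: 0.5, 1: 0.5}
p_z1_given_x = {0: 0.9, 1: 0.2}   # Pr(Z=1 | x)
p_y1_given_z = {0: 0.1, 1: 0.7}   # Pr(Y=1 | z)

def joint(x, y, z):
    pz = p_z1_given_x[x] if z == 1 else 1 - p_z1_given_x[x]
    py = p_y1_given_z[z] if y == 1 else 1 - p_y1_given_z[z]
    return p_x[x] * pz * py

z = 1
p_z = sum(joint(x, y, z) for x in (0, 1) for y in (0, 1))
p_xy_z = joint(1, 1, z) / p_z                            # P(X=1, Y=1 | Z=1)
p_x_z = sum(joint(1, y, z) for y in (0, 1)) / p_z        # P(X=1 | Z=1)
p_y_z = sum(joint(x, 1, z) for x in (0, 1)) / p_z        # P(Y=1 | Z=1)
print(abs(p_xy_z - p_x_z * p_y_z) < 1e-12)               # True
```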
* Cannot factorise P(X, Y | Z) for the last canonical graph (head-to-head, X → Z ← Y)
* E.g., X and Y are binary coin flips, and Z is whether they land the same side up. Given Z, X and Y become completely dependent (deterministic); see the sketch below.
* A.k.a. Berkson's paradox
* N.b., marginal independence ⇏ conditional independence!
[Figure: head-to-head graph X → Z ← Y]
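The coin example from the slide, checked by enumeration:

```python
# X, Y are fair flips, Z = [X == Y].
from itertools import product

outcomes = [(x, y, int(x == y)) for x, y in product((0, 1), repeat=2)]
# each of the four (x, y) outcomes has probability 1/4

# Marginally: P(X=1, Y=1) = 1/4 = P(X=1) P(Y=1), so X ⊥ Y.
print(sum(0.25 for x, y, z in outcomes if x == 1 and y == 1))  # 0.25

# Given Z=1 the pair is deterministic: Y must equal X.
print([(x, y) for x, y, z in outcomes if z == 1])  # [(0, 0), (1, 1)]
```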
* P(A=1 | W=1) = 0.004
* P(A=1 | D=1, W=1) = 0.003
* P(A=1 | D=0, W=1) = 0.005
* Observing D=1 "explains away" W=1, lowering the posterior on A (verified in the sketch below)
A   D   P(W=1 | A, D)
0   0   0.1
0   1   0.3
1   0   0.5
1   1   0.8

A   Prob            D   Prob
0   0.999           0   0.9
1   0.001           1   0.1

[Figure: graph A → W ← D]
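Recomputing the quoted posteriors directly from the tables above:

```python
# Exact inference by enumeration on A -> W <- D.
p_a = {0: 0.999, 1: 0.001}
p_d = {0: 0.9, 1: 0.1}
p_w1 = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.8}  # Pr(W=1 | a, d)

def joint(a, d, w):
    pw = p_w1[(a, d)] if w == 1 else 1 - p_w1[(a, d)]
    return p_a[a] * p_d[d] * pw

# P(A=1 | W=1): marginalise D, normalise over A.
num = sum(joint(1, d, 1) for d in (0, 1))
den = sum(joint(a, d, 1) for a in (0, 1) for d in (0, 1))
print(num / den)                      # ≈ 0.0044

# P(A=1 | D=d, W=1) for d = 1 and d = 0.
for d in (1, 0):
    print(joint(1, d, 1) / sum(joint(a, d, 1) for a in (0, 1)))
    # ≈ 0.0027 (d=1) and ≈ 0.0050 (d=0)
```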
* Attempt to factorise, to test whether A ⊥ D | G (see the sketch after the derivation below)
[Figure: graph A → W ← D, with W → G]
P(A, D | G) ∝ Σ_W P(A) P(D) P(W | A, D) P(G | W) = P(A) P(D) P(G | A, D)
This does not factorise into separate functions of A and D given G, so A and D are not conditionally independent given G.
[Figure: reduced graph A → G ← D, after summing out W]
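The same test by enumeration. P(A), P(D) and P(W | A, D) come from the tables above; the lecture does not give P(G | W), so the values used here are hypothetical:

```python
# Testing A ⊥ D | G on A -> W <- D, W -> G.
p_a = {0: 0.999, 1: 0.001}
p_d = {0: 0.9, 1: 0.1}
p_w1 = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.8}
p_g1 = {0: 0.05, 1: 0.9}  # hypothetical Pr(G=1 | w), for illustration only

def joint(a, d, w, g):
    pw = p_w1[(a, d)] if w == 1 else 1 - p_w1[(a, d)]
    pg = p_g1[w] if g == 1 else 1 - p_g1[w]
    return p_a[a] * p_d[d] * pw * pg

g = 1
p_g = sum(joint(a, d, w, g) for a in (0, 1) for d in (0, 1) for w in (0, 1))
p_ad = sum(joint(1, 1, w, g) for w in (0, 1)) / p_g          # P(A=1, D=1 | G=1)
p_a_g = sum(joint(1, d, w, g) for d in (0, 1) for w in (0, 1)) / p_g
p_d_g = sum(joint(a, 1, w, g) for a in (0, 1) for w in (0, 1)) / p_g
print(p_ad, p_a_g * p_d_g)  # unequal, so A and D are dependent given G
```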
* Marginal independence relates (loosely) to causality: if edges encode causal links, can X affect (cause or be caused by) Y?
* Conditional independence is less intuitive
* Based on paths separating nodes, i.e., do they contain nodes with head-to-head, head-to-tail or tail-to-tail links?
* Can all [undirected!] paths connecting two nodes be blocked, given the conditioning nodes? If so, an independence relation holds.
[Figure: example directed PGM over nodes CTL, FG, GRL, FA, AS]
* Understand which independence assumptions are being made, not just the obvious ones
* Informs the trade-off between expressiveness and complexity
* Computation of conditional / marginal distributions must respect the in/dependences between RVs
* Affects the complexity (space, time) of inference
* Which conditioning variables can be safely dropped from P(Xj | X1, X2, …, Xj-1, Xj+1, …, Xn)?
* Answer: everything outside the Markov blanket of Xj, i.e., its parents, its children, and its children's other parents
Undirected PGM:
* Edges undirected
* Each node a r.v.
* Each clique C has a "factor" ψ_C(X_j : j ∈ C) ≥ 0
* Joint ∝ product of factors

Directed PGM:
* Edges directed
* Each node a r.v.
* Each node has a conditional Pr(X_i | X_j ∈ parents(X_i))
* Joint = product of conditionals

Key difference = normalisation
* Clique: a set of fully connected nodes (e.g., A-D, C-D, C-D-F)
* Maximal clique: largest cliques in graph (not C-D, since it sits inside C-D-F)
[Figure: undirected graph over A, B, C, D, E, F]
P(a, b, c, d, e, f) = (1/Z) ψ1(a, b) ψ2(b, c) ψ3(a, d) ψ4(d, c, f) ψ5(d, e)
Z = Σ_{a,b,c,d,e,f} ψ1(a, b) ψ2(b, c) ψ3(a, d) ψ4(d, c, f) ψ5(d, e)
where each ψ is a positive function and Z is the normalising "partition" function (brute-force sketch below)
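A brute-force sketch of Z for this factorisation, assuming binary variables; the factor tables are made up, since any non-negative values work:

```python
# Computing the partition function by exhaustive summation.
from itertools import product

psi1 = lambda a, b: [[2.0, 1.0], [1.0, 3.0]][a][b]
psi2 = lambda b, c: [[1.0, 4.0], [2.0, 1.0]][b][c]
psi3 = lambda a, d: [[3.0, 1.0], [1.0, 2.0]][a][d]
psi4 = lambda d, c, f: 1.0 + d + 2 * c * f   # any positive function works
psi5 = lambda d, e: [[1.0, 2.0], [2.0, 1.0]][d][e]

def unnorm(a, b, c, d, e, f):
    """Unnormalised product of clique factors."""
    return psi1(a, b) * psi2(b, c) * psi3(a, d) * psi4(d, c, f) * psi5(d, e)

Z = sum(unnorm(*xs) for xs in product((0, 1), repeat=6))

def p(*xs):
    """Normalised joint: product of factors divided by the partition function."""
    return unnorm(*xs) / Z

print(sum(p(*xs) for xs in product((0, 1), repeat=6)))  # 1.0
```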
* Conditional independence relations correspond to graph connectivity
* If all paths between nodes in set X and nodes in set Y pass through nodes in set Z, then X ⊥ Y | Z (a connectivity check is sketched below)
[Figure: the same undirected graph over A, B, C, D, E, F]
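A sketch of this separation test as pure graph reachability, using edges consistent with the example cliques {A,B}, {B,C}, {A,D}, {C,D,F}, {D,E}:

```python
# U-PGM separation: X ⊥ Y | Z holds if deleting Z's nodes leaves no
# path from X to Y.
from collections import deque

edges = [("A", "B"), ("B", "C"), ("A", "D"), ("C", "D"),
         ("C", "F"), ("D", "F"), ("D", "E")]

def separated(x, y, z, edges):
    """True iff every path from x to y passes through a node in z."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, queue = {x}, deque([x])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen and v not in z:   # never pass through z
                seen.add(v)
                queue.append(v)
    return y not in seen

print(separated("A", "F", {"C", "D"}, edges))  # True:  A ⊥ F | {C, D}
print(separated("A", "F", {"C"}, edges))       # False: path A-D-F remains
```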
* Each conditional probability term is included in one factor function ψ_c
* Clique structure links groups of variables, i.e., {{X_i} ∪ X_{π_i}, ∀i} (each node together with its parents)
* Normalisation term trivial, Z = 1
P(X1, X2, …, Xk) = ∏_{i=1..k} Pr(Xi | X_{π_i})
(see the conversion sketch below)
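A sketch of the conversion on the earlier hypothetical chain X1 → X2 → X3: each CPT becomes the factor of the clique {node} ∪ parents, and the partition function comes out as exactly 1.

```python
# Directed -> undirected: copy each conditional into a clique factor.
from itertools import product

p_x1 = {0: 0.7, 1: 0.3}
p_x2_given_x1 = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6}
p_x3_given_x2 = {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.5, (1, 1): 0.5}

# One factor per clique, values taken straight from the conditionals.
psi1 = lambda x1: p_x1[x1]
psi2 = lambda x1, x2: p_x2_given_x1[(x1, x2)]
psi3 = lambda x2, x3: p_x3_given_x2[(x2, x3)]

Z = sum(psi1(a) * psi2(a, b) * psi3(b, c)
        for a, b, c in product((0, 1), repeat=3))
print(Z)  # 1.0: normalisation is trivial because each factor is a CPT
```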
[Figure: the example PGM over CTL, FG, GRL, FA, AS, in directed and undirected form]
Pros:
* Generalisation of D-PGM
* Simpler means of modelling, without the need for per-factor normalisation
* General inference algorithms use the U-PGM representation (supporting both types of PGM)

Cons:
* (Slightly) weaker independence
* Calculating the global normalisation term Z is intractable in general (but tractable for chains/trees, e.g., CRFs)
* Marginal vs conditional independence
* Explaining away, Markov blanket
* Undirected PGMs and their relation to directed PGMs