CS480/680 Winter 2020 Zahra Sheikhbahaee
CS480/680 Machine Learning Lecture 8: January 30th, 2020
Graphical Models Zahra Sheikhbahaee
University of Waterloo
Outline: Graphical Model, Bayesian Network, Conditional Independence
Sum rule:
$$p(x) = \sum_{y} p(x, y)$$
Product rule:
$$p(x, y) = p(x \mid y)\, p(y)$$
From these we have Bayes' theorem:
$$p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}$$
The normalization factor is
$$p(x) = \int p(x \mid y)\, p(y)\, dy$$
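A graphical model depicts the structure of the distributions in a probabilistic model. The main goal of a graphical model is to show the conditional independence properties of probability distributions.
Each node corresponds to a random variable $X$ and carries the probability of that random variable, $p(X)$.
A directed arc from node $X$ to node $Y$ indicates that $X$ has a direct influence on $Y$. This influence is specified by the conditional probability $p(Y \mid X)$. The graph is directed and acyclic.
The joint distribution over all of the random variables can be decomposed into a product of factors, each depending only on a subset of the variables.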
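Example: Tracey leaves her house and realises that her grass is wet.
Rain causes the grass to get wet. The random variables are binary; they are either true or false.
$p(R) = 0.4$
$p(W \mid R) = 0.9$, $p(\neg W \mid R) = 0.1$
$p(W \mid \neg R) = 0.2$ (the grass can get wet without rain, for example when a sprinkler is used)
$p(\neg W \mid \neg R) = 0.8$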
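Example: Rain causes the grass to get wet.
$$p(W) = \sum_{R} p(R, W) = p(W \mid R)\, p(R) + p(W \mid \neg R)\, p(\neg R) = 0.9 \times 0.4 + 0.2 \times 0.6 = 0.48$$
If we knew that it rained, the probability of wet grass would be 0.9; if we knew that it did not, it would be as low as 0.2; not knowing whether it rained or not, the probability is 0.48.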
Bayes' rule lets us invert the dependencies and make a diagnosis.
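As a minimal sketch of the diagnostic direction, the snippet below inverts the rain/wet-grass dependency with Bayes' rule. It uses the numbers from the example above ($p(R) = 0.4$, $p(W \mid R) = 0.9$, $p(W \mid \neg R) = 0.2$); the variable names are illustrative only.

```python
# Values from the rain / wet-grass example.
p_R = 0.4              # prior p(R)
p_W_given_R = 0.9      # p(W | R)
p_W_given_notR = 0.2   # p(W | ~R)

# Sum rule: marginal probability of wet grass.
p_W = p_W_given_R * p_R + p_W_given_notR * (1 - p_R)

# Bayes' rule: probability that it rained, given that the grass is wet.
p_R_given_W = p_W_given_R * p_R / p_W

print(f"p(W)   = {p_W:.2f}")
print(f"p(R|W) = {p_R_given_W:.2f}")
```

Observing wet grass raises the belief in rain from the prior 0.4 to 0.75.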
$R \in \{0, 1\}$ ($R = 1$ means that it has been raining, and 0 otherwise).
$S \in \{0, 1\}$ ($S = 1$ means that Tracey has forgotten to turn off the sprinkler, and 0 otherwise).
$J \in \{0, 1\}$ ($J = 1$ means that Jack's grass is wet, and 0 otherwise).
$T \in \{0, 1\}$ ($T = 1$ means that Tracey's grass is wet, and 0 otherwise).
A model of Tracey's world corresponds to a probability distribution on the joint set of the variables of interest, $p(T, J, R, S)$. There are $2^4 = 16$ states. Applying the product rule (the definition of conditional probability) repeatedly, we have:
$$
\begin{aligned}
p(T, J, R, S) &= p(T \mid J, R, S)\, p(J, R, S) \\
&= p(T \mid J, R, S)\, p(J \mid R, S)\, p(R, S) \\
&= p(T \mid J, R, S)\, p(J \mid R, S)\, p(R \mid S)\, p(S)
\end{aligned}
$$
The conditional independence assumptions are
$$p(T \mid J, R, S) = p(T \mid R, S), \qquad p(J \mid R, S) = p(J \mid R), \qquad p(R \mid S) = p(R)$$
so that
$$p(T, J, R, S) = p(T \mid R, S)\, p(J \mid R)\, p(R)\, p(S)$$
We need to specify only $4 + 2 + 1 + 1 = 8$ values.
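The factorization can be checked numerically. The sketch below builds the joint $p(T, J, R, S)$ from its four factors, using the conditional probability values given in the slide's tables (with $p(T=1 \mid R=0, S=0) = 0$, the value used in the slide's worked computation). The helper names `bern` and `joint` are illustrative, not part of the lecture.

```python
from itertools import product

# CPT values from the slide's tables.
p_R1 = 0.2
p_S1 = 0.1
p_J1_given_R = {1: 1.0, 0: 0.2}
p_T1_given_RS = {(1, 1): 1.0, (1, 0): 1.0, (0, 1): 0.9, (0, 0): 0.0}

def bern(p_one, value):
    """Probability of a binary value, given p(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one

def joint(t, j, r, s):
    """p(T, J, R, S) = p(T | R, S) p(J | R) p(R) p(S)."""
    return (bern(p_T1_given_RS[(r, s)], t)
            * bern(p_J1_given_R[r], j)
            * bern(p_R1, r)
            * bern(p_S1, s))

# Enumerate all joint states and check that the factors define a distribution.
states = list(product([0, 1], repeat=4))
total = sum(joint(*state) for state in states)

# One free parameter per parent configuration of each node.
n_params = len(p_T1_given_RS) + len(p_J1_given_R) + 1 + 1

print(len(states), round(total, 10), n_params)
```

The 16 joint probabilities sum to 1 even though only 8 numbers were specified, which is the saving the factorization buys.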
The probability that the sprinkler was on, given that Tracey's grass is wet:
$$
\begin{aligned}
p(S=1 \mid T=1) &= \frac{p(S=1, T=1)}{p(T=1)} = \frac{\sum_{J,R} p(T=1, J, R, S=1)}{\sum_{J,R,S} p(T=1, J, R, S)} \\
&= \frac{\sum_{J,R} p(T=1 \mid R, S=1)\, p(J \mid R)\, p(R)\, p(S=1)}{\sum_{J,R,S} p(T=1 \mid R, S)\, p(J \mid R)\, p(R)\, p(S)} \\
&= \frac{\sum_{R} p(T=1 \mid R, S=1)\, p(R)\, p(S=1)}{\sum_{R,S} p(T=1 \mid R, S)\, p(R)\, p(S)} \\
&= \frac{p(T=1 \mid R=1, S=1)\, p(R=1)\, p(S=1) + p(T=1 \mid R=0, S=1)\, p(R=0)\, p(S=1)}{\sum_{R,S} p(T=1 \mid R, S)\, p(R)\, p(S)} \\
&= \frac{0.1\,(0.9 \times 0.8 + 1 \times 0.2)}{0.1\,(0.9 \times 0.8 + 1 \times 0.2) + 0.9\,(0 \times 0.8 + 1 \times 0.2)} = 0.3382
\end{aligned}
$$
The belief that the sprinkler is on increases above the prior probability 0.1, due to the fact that the grass is wet.
What is the probability that the sprinkler was on, given that Tracey's grass is wet and that Jack's grass is also wet?
$$
\begin{aligned}
p(S=1 \mid T=1, J=1) &= \frac{p(S=1, J=1, T=1)}{p(T=1, J=1)} = \frac{\sum_{R} p(T=1, J=1, R, S=1)}{\sum_{R,S} p(T=1, J=1, R, S)} \\
&= \frac{\sum_{R} p(T=1 \mid R, S=1)\, p(J=1 \mid R)\, p(R)\, p(S=1)}{\sum_{R,S} p(T=1 \mid R, S)\, p(J=1 \mid R)\, p(R)\, p(S)} \\
&= \frac{1 \times 1 \times 0.2 \times 0.1 + 0.9 \times 0.2 \times 0.8 \times 0.1}{0.8 \times 0.2\,(0 \times 0.9 + 0.9 \times 0.1) + 0.2 \times 1\,(1 \times 0.9 + 1 \times 0.1)} \\
&= \frac{0.0344}{0.2144} = 0.1604
\end{aligned}
$$
The probability that the sprinkler is on, given the extra evidence that Jack's grass is wet, is lower than the probability that the sprinkler is on given only that Tracey's grass is wet.
Conditional probability tables:

p(R = 1) = 0.2
p(S = 1) = 0.1

p(J = 1 | R):
  R = 1:  1
  R = 0:  0.2

p(T = 1 | R, S):
  R = 1, S = 1:  1
  R = 1, S = 0:  1
  R = 0, S = 1:  0.9
  R = 0, S = 0:  0
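Both posterior probabilities can be reproduced by brute-force enumeration of the joint. This is a minimal sketch, assuming the table values above and $p(T=1 \mid R=0, S=0) = 0$; `prob` is a hypothetical helper that sums the joint over every state consistent with the evidence.

```python
from itertools import product

p_R1, p_S1 = 0.2, 0.1
p_J1_given_R = {1: 1.0, 0: 0.2}
p_T1_given_RS = {(1, 1): 1.0, (1, 0): 1.0, (0, 1): 0.9, (0, 0): 0.0}

def bern(p_one, value):
    """Probability of a binary value, given p(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one

def joint(t, j, r, s):
    """p(T, J, R, S) = p(T | R, S) p(J | R) p(R) p(S)."""
    return (bern(p_T1_given_RS[(r, s)], t) * bern(p_J1_given_R[r], j)
            * bern(p_R1, r) * bern(p_S1, s))

def prob(**evidence):
    """Marginal probability of the evidence: sum the joint over all other variables."""
    total = 0.0
    for t, j, r, s in product([0, 1], repeat=4):
        state = {"T": t, "J": j, "R": r, "S": s}
        if all(state[name] == value for name, value in evidence.items()):
            total += joint(t, j, r, s)
    return total

# Conditional probability = joint marginal / evidence marginal.
p_S_given_T = prob(S=1, T=1) / prob(T=1)
p_S_given_TJ = prob(S=1, T=1, J=1) / prob(T=1, J=1)

print(f"p(S=1 | T=1)      = {p_S_given_T:.4f}")
print(f"p(S=1 | T=1, J=1) = {p_S_given_TJ:.4f}")
```

Enumeration costs time exponential in the number of variables, so it only serves as a check on small networks like this one.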
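If there is a link from node $A$ to node $B$, then $A$ is called a parent of $B$, and $B$ is a child of $A$. In addition, the parents of $A$ are ancestors of $B$.
Each variable $x_i$ is conditioned only on its parents, $\mathrm{parents}_{x_i}$:
$$p(x_1, x_2, \ldots, x_N) = \prod_{i=1}^{N} p(x_i \mid \mathrm{parents}_{x_i})$$
Example: when, rushing home, you hear a radio report of an earthquake, your degree of confidence (i.e. belief) that there was a burglary diminishes.
$$p(E, B, R, A, C) = p(E)\, p(B)\, p(R \mid E)\, p(A \mid E, B)\, p(C \mid A)$$
where $E$ = Earthquake, $B$ = Burglary, $R$ = Radio, $A$ = Alarm and $C$ = Call.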
[Figure: Bayesian network with edges Earthquake → Radio, Earthquake → Alarm, Burglary → Alarm, Alarm → Call]
The node call is conditionally independent of the nodes burglary and earthquake (ancestors) given the node alarm (parent). The node call is a descendant of the nodes alarm and earthquake.
burglary and earthquake.
The conditional independence assumptions reduce the dimensionality of the network from the full joint probability table, with $2^5 - 1 = 31$ parameters, to $1 + 1 + 4 + 2 + 2 = 10$ parameters.
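The parameter count can be reproduced with a short sketch, assuming all five variables are binary. The `parents` dictionary is read off the factorization $p(E)\,p(B)\,p(R \mid E)\,p(A \mid E, B)\,p(C \mid A)$; the names are illustrative.

```python
# Parents of each binary node in the earthquake / burglary network.
parents = {
    "E": [],          # Earthquake
    "B": [],          # Burglary
    "R": ["E"],       # Radio
    "A": ["E", "B"],  # Alarm
    "C": ["A"],       # Call
}

# Full joint table over n binary variables: 2^n entries, minus one
# because the entries must sum to 1.
full_table = 2 ** len(parents) - 1

# Each binary node needs one free parameter per configuration of its parents.
per_node = {node: 2 ** len(pa) for node, pa in parents.items()}
factored = sum(per_node.values())

print(full_table, per_node, factored)
```

The gap widens quickly with more variables: the full table grows as $2^n$, while the factored form grows with the size of the largest parent set.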
The conditional probability table for the node alarm, p(A | E, B); ¬ means "not".

E    B    | p(A | E, B)     p(¬A | E, B)
e    b    | p(A | e, b)     p(¬A | e, b)
e    ¬b   | p(A | e, ¬b)    p(¬A | e, ¬b)
¬e   b    | p(A | ¬e, b)    p(¬A | ¬e, b)
¬e   ¬b   | p(A | ¬e, ¬b)   p(¬A | ¬e, ¬b)
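Independence: variables $A$ and $B$ are independent if knowing the state of one gives no information about the other:
$$p(A) = p(A \mid B)$$
or, equivalently,
$$p(A, B) = p(A)\, p(B)$$
In this case we write $A \perp B$.
Let $C$ be the parent of two nodes $A$ and $B$.
Conditional independence: variables $A$ and $B$ are conditionally independent given all states of variable $C$ if
$$p(A, B \mid C) = p(A \mid C)\, p(B \mid C)$$
This is written as $A \perp B \mid C$. For the tail-to-tail graph it follows from the factorization $p(A, B, C) = p(C)\, p(A \mid C)\, p(B \mid C)$:
$$p(A, B \mid C) = \frac{p(A, B, C)}{p(C)} = \frac{p(C)\, p(A \mid C)\, p(B \mid C)}{p(C)} = p(A \mid C)\, p(B \mid C)$$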
[Figure: tail-to-tail connection, A ← C → B]
$$p(x, y) = p(x)\, p(y \mid x)$$
$$p(x, y) = p(y)\, p(x \mid y)$$
$$p(x, y) = p(x)\, p(y)$$
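Here $C$ = cloudy, $R$ = rain and $S$ = sprinkler.
$$p(C \mid R) = \frac{p(R \mid C)\, p(C)}{p(R)} = \frac{p(R \mid C)\, p(C)}{\sum_{C} p(R, C)} = \frac{p(R \mid C)\, p(C)}{p(R \mid C)\, p(C) + p(R \mid \neg C)\, p(\neg C)} = \frac{0.8 \times 0.5}{0.8 \times 0.5 + 0.1 \times 0.5} = 0.89$$
Knowing that it rained increases the probability that the weather is cloudy from the prior 0.5 to 0.89.
$$
\begin{aligned}
p(R \mid S) &= \sum_{C} p(R, C \mid S) = p(R \mid C)\, p(C \mid S) + p(R \mid \neg C)\, p(\neg C \mid S) \\
&= p(R \mid C)\, \frac{p(S \mid C)\, p(C)}{p(S)} + p(R \mid \neg C)\, \frac{p(S \mid \neg C)\, p(\neg C)}{p(S)} = 0.22
\end{aligned}
$$
When $C$ is unobserved, $R$ and $S$ are dependent through the different states of cloudy weather.
Once $C$ is observed, $R$ and $S$ become conditionally independent.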
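The two tail-to-tail results can be checked with a short sketch. The values $p(C) = 0.5$, $p(R \mid C) = 0.8$ and $p(R \mid \neg C) = 0.1$ come from the computation above. The sprinkler table is not shown on the slide, so $p(S \mid C) = 0.1$ and $p(S \mid \neg C) = 0.5$ are assumptions, chosen because they reproduce the stated result of 0.22.

```python
# Values used on the slide.
p_C = 0.5
p_R_given_C = {True: 0.8, False: 0.1}
# ASSUMED (not shown on the slide): sprinkler CPT that reproduces p(R|S) = 0.22.
p_S_given_C = {True: 0.1, False: 0.5}

def prior(c):
    """p(C = c) for the binary variable cloudy."""
    return p_C if c else 1 - p_C

# Diagnostic direction: p(C | R) by Bayes' rule.
p_R = sum(p_R_given_C[c] * prior(c) for c in (True, False))
p_C_given_R = p_R_given_C[True] * p_C / p_R

# p(R | S) = sum_C p(R | C) p(C | S), using that R and S are independent given C.
p_S = sum(p_S_given_C[c] * prior(c) for c in (True, False))
p_R_given_S = sum(
    p_R_given_C[c] * p_S_given_C[c] * prior(c) / p_S for c in (True, False)
)

print(f"p(C|R) = {p_C_given_R:.2f}")
print(f"p(R)   = {p_R:.2f}")
print(f"p(R|S) = {p_R_given_S:.2f}")
```

Under these assumed values, learning that the sprinkler is on lowers the probability of rain from 0.45 to 0.22, which shows that $R$ and $S$ are dependent while $C$ is unobserved.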
We see here that $A$ and $C$ are independent given $B$: knowing $B$ tells us everything about $C$; knowing the state of $A$ does not add any extra knowledge about $C$.
$$p(A, B, C) = p(A)\, p(B \mid A)\, p(C \mid B)$$
Writing the joint this way implies independence:
$$p(C \mid A, B) = \frac{p(A, B, C)}{p(A, B)} = \frac{p(A)\, p(B \mid A)\, p(C \mid B)}{p(A)\, p(B \mid A)} = p(C \mid B)$$
If $B$ is unobserved, then such a path connects $A$ and $C$ and renders them dependent. But if $B$ is observed, then this observation blocks the path, and $A$ and $C$ become conditionally independent.
[Figure: head-to-tail connection, A → B → C]
$$p(R) = p(R \mid C)\, p(C) + p(R \mid \neg C)\, p(\neg C) = 0.38$$
$$p(W) = p(W \mid R)\, p(R) + p(W \mid \neg R)\, p(\neg R) = 0.47$$
What is the probability of the grass being wet given that the weather is cloudy?
$$p(W \mid C) = p(W \mid R)\, p(R \mid C) + p(W \mid \neg R)\, p(\neg R \mid C) = 0.76$$
If the grass is wet, what is the probability that the weather was cloudy that day?
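A minimal sketch of the head-to-tail chain cloudy → rain → wet grass follows. It takes $p(W \mid R) = 0.9$ and $p(W \mid \neg R) = 0.2$ from the earlier rain example and assumes $p(R \mid C) = 0.8$ and $p(R \mid \neg C) = 0.1$ carry over from the tail-to-tail example. The prior $p(C) = 0.4$ is not stated on the slide; it is inferred here because it is the value that yields $p(R) = 0.38$.

```python
# p(C) = 0.4 is INFERRED: it is the prior that gives p(R) = 0.38.
p_C = 0.4
p_R_given_C = {True: 0.8, False: 0.1}
p_W_given_R = {True: 0.9, False: 0.2}

# Marginals along the chain C -> R -> W.
p_R = p_R_given_C[True] * p_C + p_R_given_C[False] * (1 - p_C)
p_W = p_W_given_R[True] * p_R + p_W_given_R[False] * (1 - p_R)

# Causal direction: p(W | C), summing over the intermediate node R.
p_W_given_C = (p_W_given_R[True] * p_R_given_C[True]
               + p_W_given_R[False] * (1 - p_R_given_C[True]))

# Diagnostic direction: p(C | W) by Bayes' rule.
p_C_given_W = p_W_given_C * p_C / p_W

print(f"p(R)   = {p_R:.2f}")
print(f"p(W)   = {p_W:.2f}")
print(f"p(W|C) = {p_W_given_C:.2f}")
print(f"p(C|W) = {p_C_given_W:.2f}")
```

Under these assumptions, observing wet grass raises the probability of cloudy weather from 0.4 to about 0.65.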
[Figure: head-to-head connection]
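Marginalizing over the joint:
$$
\begin{aligned}
p(W) &= \sum_{R,S} p(W, R, S) \\
&= p(W \mid R, S)\, p(R, S) + p(W \mid \neg R, S)\, p(\neg R, S) + p(W \mid R, \neg S)\, p(R, \neg S) + p(W \mid \neg R, \neg S)\, p(\neg R, \neg S) \\
&= p(W \mid R, S)\, p(R)\, p(S) + p(W \mid \neg R, S)\, p(\neg R)\, p(S) + p(W \mid R, \neg S)\, p(R)\, p(\neg S) + p(W \mid \neg R, \neg S)\, p(\neg R)\, p(\neg S) \\
&= 0.52
\end{aligned}
$$
If we know that the sprinkler is on, we can check how this affects the probability of the grass being wet:
$$
\begin{aligned}
p(W \mid S) &= \sum_{R} p(W, R \mid S) = p(W \mid R, S)\, p(R \mid S) + p(W \mid \neg R, S)\, p(\neg R \mid S) \\
&= p(W \mid R, S)\, p(R) + p(W \mid \neg R, S)\, p(\neg R) = 0.92
\end{aligned}
$$
Now $p(W \mid S) > p(W)$.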
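The head-to-head computation can be sketched as below. The slide does not list the conditional probability table for wet grass given rain and sprinkler, so every number in this block is an assumption, chosen because together the values reproduce the stated results 0.52 and 0.92.

```python
from itertools import product

# ASSUMED values (not shown on the slide); they reproduce 0.52 and 0.92.
p_R, p_S = 0.4, 0.2
p_W_given_RS = {
    (True, True): 0.95,
    (True, False): 0.90,
    (False, True): 0.90,
    (False, False): 0.10,
}

def prior(p_true, value):
    """Probability of a binary value, given p(value = True)."""
    return p_true if value else 1 - p_true

# p(W): sum over both parents; R and S are marginally independent.
p_W = sum(
    p_W_given_RS[(r, s)] * prior(p_R, r) * prior(p_S, s)
    for r, s in product((True, False), repeat=2)
)

# p(W | S): sum over R only, using p(R | S) = p(R).
p_W_given_S = sum(p_W_given_RS[(r, True)] * prior(p_R, r) for r in (True, False))

print(f"p(W)   = {p_W:.2f}")
print(f"p(W|S) = {p_W_given_S:.2f}")
```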
wet?
d-separation is a criterion for deciding, from a given graph, whether a set X of variables is independent of another set Y, given a third set Z.
A path is said to be blocked w.r.t. $C$ if it passes through a node $v$ such that either
a) the arrows are head-tail or tail-tail and $v \in C$, or
b) the arrows are head-head and $v \notin C$ and none of the descendants of $v$ are in $C$.
$A$ is d-separated from $B$ by $C$ if all paths from vertex $A$ to vertex $B$ are blocked w.r.t. $C$.
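The blocking rules translate fairly directly into code. The sketch below is a naive checker for single vertices `a` and `b`: it enumerates every simple path and applies rules (a) and (b) to each interior node. It is meant only as an illustration on small graphs, since path enumeration is exponential in general; the function names are hypothetical, and the test graph is the earthquake/burglary network (E → R, E → A, B → A, A → C).

```python
def descendants(children, node):
    """All nodes reachable from `node` by following directed edges."""
    seen, stack = set(), list(children.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, []))
    return seen

def simple_paths(neighbours, start, goal, path=None):
    """Yield every simple path from start to goal, ignoring edge direction."""
    path = [start] if path is None else path
    if start == goal:
        yield path
        return
    for nxt in neighbours.get(start, ()):
        if nxt not in path:
            yield from simple_paths(neighbours, nxt, goal, path + [nxt])

def d_separated(edges, a, b, given):
    """True if every path between a and b is blocked w.r.t. the set `given`."""
    children, neighbours = {}, {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        neighbours.setdefault(parent, set()).add(child)
        neighbours.setdefault(child, set()).add(parent)
    edge_set = set(edges)
    for path in simple_paths(neighbours, a, b):
        blocked = False
        for prev, node, nxt in zip(path, path[1:], path[2:]):
            head_head = (prev, node) in edge_set and (nxt, node) in edge_set
            if head_head:
                # Rule (b): blocks unless the node or one of its descendants is observed.
                if node not in given and not (descendants(children, node) & given):
                    blocked = True
            elif node in given:
                # Rule (a): an observed head-tail or tail-tail node blocks the path.
                blocked = True
        if not blocked:
            return False
    return True

edges = [("E", "R"), ("E", "A"), ("B", "A"), ("A", "C")]
print(d_separated(edges, "B", "E", set()))   # head-head at A, nothing observed
print(d_separated(edges, "B", "E", {"C"}))   # descendant of A observed
print(d_separated(edges, "C", "E", {"A"}))   # head-tail at A, observed
```

On this graph, burglary and earthquake are d-separated when nothing is observed, become connected once the call (a descendant of the alarm) is observed, and call is d-separated from earthquake given the alarm.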
[Figure: tail-tail, head-tail and head-head connections]
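Combining the two subgraphs:
$$
\begin{aligned}
p(W \mid C) &= \sum_{R,S} p(W, R, S \mid C) \\
&= p(W, R, S \mid C) + p(W, \neg R, S \mid C) + p(W, R, \neg S \mid C) + p(W, \neg R, \neg S \mid C) \\
&= p(W \mid R, S, C)\, p(R, S \mid C) + p(W \mid \neg R, S, C)\, p(\neg R, S \mid C) \\
&\quad + p(W \mid R, \neg S, C)\, p(R, \neg S \mid C) + p(W \mid \neg R, \neg S, C)\, p(\neg R, \neg S \mid C) \\
&= p(W \mid R, S)\, p(R \mid C)\, p(S \mid C) + p(W \mid \neg R, S)\, p(\neg R \mid C)\, p(S \mid C) \\
&\quad + p(W \mid R, \neg S)\, p(R \mid C)\, p(\neg S \mid C) + p(W \mid \neg R, \neg S)\, p(\neg R \mid C)\, p(\neg S \mid C)
\end{aligned}
$$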
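[Figure: example Bayesian network with five nodes: two root nodes, one node conditioned on both roots, and two nodes with a single parent each]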
Graphical models are a way to represent conditional independence statements.