CS480/680 Machine Learning, Lecture 8 (January 30th, 2020): Graphical Models

SLIDE 1

CS480/680 Winter 2020 Zahra Sheikhbahaee

CS480/680 Machine Learning Lecture 8: January 30th, 2020

Graphical Models Zahra Sheikhbahaee

University of Waterloo

SLIDE 2

Outline

  • Graphical Model
  • Bayesian Network
  • Conditional Independence
  • NaΓ―ve Bayes

SLIDE 3

Review: Probability Theory

  • Sum rule (marginal distributions):

q(y) = Ξ£_z q(y, z)

  • Product rule:

q(y, z) = q(y|z) q(z)

From these we obtain Bayes' theorem:

q(z|y) = q(y|z) q(z) / q(y)

where the normalization factor is q(y) = ∫ q(y|z) q(z) dz.

SLIDE 4

Graphical Models

  • Graphical Models (GMs) are depictions of the independence/dependence relationships among the distributions in a probabilistic model. Their main purpose is to make the conditional independence properties of probability distributions explicit.
  • A GM is a framework for representing, reasoning about, and learning complex problems.
  • A graph comprises nodes connected by links (edges).
  • Each node corresponds to a random variable, Y, and carries the probability of that random variable, Q(Y).
  • A directed edge from node Y to node Z indicates that Y has a direct influence on Z; this influence is specified by the conditional probability Q(Z|Y), and the directed edges form a directed acyclic graph.
  • The graph captures the way in which the joint distribution over all of the random variables can be decomposed into a product of factors, each depending only on a subset of the variables.
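The factor-product idea can be sketched directly in code. A minimal Python sketch for a two-node network Y β†’ Z, using the rain/wet-grass numbers that appear on the following slides (the dictionary layout is my own):

```python
# Two-node Bayesian network Y -> Z: the joint factorizes as
# Q(Y, Z) = Q(Y) * Q(Z | Y), one local factor per node.
Q_Y = {1: 0.4, 0: 0.6}                      # Q(Y): e.g. rain / no rain
Q_Z_given_Y = {1: {1: 0.9, 0: 0.1},         # Q(Z | Y=1)
               0: {1: 0.2, 0: 0.8}}         # Q(Z | Y=0)

def joint(y, z):
    """Q(Y=y, Z=z) as a product of the local factors."""
    return Q_Y[y] * Q_Z_given_Y[y][z]

# Any valid factorization sums to 1 over all joint states.
total = sum(joint(y, z) for y in (0, 1) for z in (0, 1))
```

Each node stores only its local conditional table; the joint is never materialized as one big table.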

SLIDE 5

Bayesian Networks


Example: Tracey leaves her house and realises that her grass is wet. Rain causes the grass to get wet. The random variables are binary: they are either true or false.

  • The probability of rain during a day: Q(S) = 0.4
  • The chance that the grass gets wet when it rains: Q(X|S) = 0.9
  • The chance that the grass stays dry even though it rains: Q(~X|S) = 0.1
  • The probability that the grass gets wet without rain, e.g. when a sprinkler is used: Q(X|~S) = 0.2
  • The probability that the grass is not wet given that it does not rain: Q(~X|~S) = 0.8

SLIDE 6

Bayesian Networks


Example: Rain causes the grass to get wet.

  • The joint distribution: Q(S, X) = Q(S) Q(X|S)
  • The individual (marginal) probability of wet grass is computed by summing over the possible values of its parent node:

Q(X) = Ξ£_S Q(S, X) = Q(X|S) Q(S) + Q(X|~S) Q(~S) = 0.9Γ—0.4 + 0.2Γ—0.6 = 0.48

  • If we knew that it rained, the probability of wet grass would be 0.9; if we knew for sure that it did not, it would be as low as 0.2; not knowing whether it rained, the probability is 0.48.
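The marginalization above is easy to check in code; a minimal sketch (variable names are mine):

```python
# Sum rule on the rain -> wet-grass pair:
# Q(X=1) = Ξ£_S Q(X=1 | S) Q(S) = 0.9*0.4 + 0.2*0.6
Q_S = {1: 0.4, 0: 0.6}              # Q(S): rained / did not rain
Q_X1_given_S = {1: 0.9, 0: 0.2}     # Q(X=1 | S): grass wet given rain state

Q_X1 = sum(Q_X1_given_S[s] * Q_S[s] for s in (0, 1))
```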

SLIDE 7

Bayesian Networks


Example: Rain causes the grass to get wet:

Bayes' rule lets us invert the dependency and perform diagnosis.

  • If we know that the grass is wet, the probability that it rained is

Q(S|X) = Q(X|S) Q(S) / Q(X) = 0.9Γ—0.4 / 0.48 = 0.75

  • Knowing that the grass is wet increased the probability of rain from 0.4 to 0.75.
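The same numbers give the diagnostic direction via Bayes' rule; a small sketch:

```python
# Bayes' rule inverts the edge: Q(S=1 | X=1) = Q(X=1 | S=1) Q(S=1) / Q(X=1).
Q_S = {1: 0.4, 0: 0.6}
Q_X1_given_S = {1: 0.9, 0: 0.2}

Q_X1 = sum(Q_X1_given_S[s] * Q_S[s] for s in (0, 1))   # normalization, 0.48
Q_S1_given_X1 = Q_X1_given_S[1] * Q_S[1] / Q_X1        # 0.36 / 0.48
```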
SLIDE 8

Bayesian Network

  • S ∈ {0, 1} (S = 1 means that it has been raining, and 0 otherwise).
  • T ∈ {0, 1} (T = 1 means that Tracey has forgotten to turn off the sprinkler, and 0 otherwise).
  • K ∈ {0, 1} (K = 1 means that Jack's grass is wet, and 0 otherwise).
  • U ∈ {0, 1} (U = 1 means that Tracey's grass is wet, and 0 otherwise).

A model of Tracey's world corresponds to a probability distribution on the joint set of the variables of interest, q(U, K, S, T). There are 2⁴ = 16 states. Using the product rule repeatedly:

Q(U, K, S, T) = Q(U|K, S, T) Q(K, S, T) = Q(U|K, S, T) Q(K|S, T) Q(S, T) = Q(U|K, S, T) Q(K|S, T) Q(S|T) Q(T)

Assuming the conditional independencies

Q(U|K, S, T) = Q(U|S, T),   Q(K|S, T) = Q(K|S),   Q(S|T) = Q(S)

the joint factorizes as

Q(U, K, S, T) = Q(U|S, T) Q(K|S) Q(S) Q(T)

We need to specify only 4 + 2 + 1 + 1 = 8 values.


Conditional probability tables (from the figure, rewritten in the text's variable names):

Q(S=1) = 0.2    Q(T=1) = 0.1
Q(K=1|S=1) = 1    Q(K=1|S=0) = 0.2
Q(U=1|S=1, T=1) = 1    Q(U=1|S=1, T=0) = 1    Q(U=1|S=0, T=1) = 0.9    Q(U=1|S=0, T=0) = 0

SLIDE 9

Bayesian Network

  • What is the probability that the sprinkler was on overnight, given that Tracey's grass is wet?

Q(T=1|U=1) = Q(T=1, U=1) / Q(U=1)
= Σ_{K,S} Q(U=1, K, S, T=1) / Σ_{K,S,T} Q(U=1, K, S, T)
= Σ_{K,S} Q(U=1|S, T=1) Q(K|S) Q(S) Q(T=1) / Σ_{K,S,T} Q(U=1|S, T) Q(K|S) Q(S) Q(T)
= Σ_S Q(U=1|S, T=1) Q(S) Q(T=1) / Σ_{S,T} Q(U=1|S, T) Q(S) Q(T)
= 0.1Γ—(0.9Γ—0.8 + 1Γ—0.2) / [0.1Γ—(0.9Γ—0.8 + 1Γ—0.2) + 0.9Γ—(0Γ—0.8 + 1Γ—0.2)] = 0.3382

The belief that the sprinkler is on rises above its prior probability 0.1, due to the fact that the grass is wet.

  • What is the probability that Tracey's sprinkler was on overnight, given that her grass is wet and that Jack's grass is also wet?

q(T=1|U=1, K=1) = Q(T=1, K=1, U=1) / Q(U=1, K=1)
= Σ_S Q(T=1, K=1, U=1, S) / Σ_{S,T} Q(U=1, K=1, S, T)
= Σ_S Q(U=1|S, T=1) Q(K=1|S) Q(S) Q(T=1) / Σ_{S,T} Q(U=1|S, T) Q(K=1|S) Q(S) Q(T)
= (1Γ—1Γ—0.2Γ—0.1 + 0.9Γ—0.2Γ—0.8Γ—0.1) / [0.2Γ—0.8Γ—(0Γ—0.9 + 0.9Γ—0.1) + 1Γ—0.2Γ—(1Γ—0.9 + 1Γ—0.1)]
= 0.0344 / 0.2144 = 0.1604

The probability that the sprinkler is on, given the extra evidence that Jack's grass is also wet, is lower than when we only know that Tracey's grass is wet (0.16 versus 0.34): the wet grass is now better explained by rain.
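Both queries can be verified by brute-force enumeration of the 16 joint states, using the CPT values from the figure (the dictionary layout and function names are mine):

```python
from itertools import product

# CPTs for the network Q(U,K,S,T) = Q(U|S,T) Q(K|S) Q(S) Q(T):
# S: rain, T: sprinkler left on, K: Jack's grass wet, U: Tracey's grass wet.
Q_S = {1: 0.2, 0: 0.8}
Q_T = {1: 0.1, 0: 0.9}
Q_K1_given_S = {1: 1.0, 0: 0.2}
Q_U1_given_ST = {(1, 1): 1.0, (1, 0): 1.0, (0, 1): 0.9, (0, 0): 0.0}

def joint(u, k, s, t):
    pu = Q_U1_given_ST[(s, t)] if u else 1 - Q_U1_given_ST[(s, t)]
    pk = Q_K1_given_S[s] if k else 1 - Q_K1_given_S[s]
    return pu * pk * Q_S[s] * Q_T[t]

def prob_sprinkler(evidence):
    """Q(T=1 | evidence) by summing the joint over all 16 states."""
    num = den = 0.0
    for u, k, s, t in product((0, 1), repeat=4):
        state = {"U": u, "K": k, "S": s, "T": t}
        if any(state[v] != val for v, val in evidence.items()):
            continue
        p = joint(u, k, s, t)
        den += p
        if t == 1:
            num += p
    return num / den

p_wet = prob_sprinkler({"U": 1})            # matches the 0.3382 above
p_both = prob_sprinkler({"U": 1, "K": 1})   # matches the 0.1604 above
```

Enumeration scales exponentially in the number of variables, but for four binary nodes it is an exact check of the hand derivation.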


Conditional probability tables (from the figure, rewritten in the text's variable names):

Q(S=1) = 0.2    Q(T=1) = 0.1
Q(K=1|S=1) = 1    Q(K=1|S=0) = 0.2
Q(U=1|S=1, T=1) = 1    Q(U=1|S=1, T=0) = 1    Q(U=1|S=0, T=1) = 0.9    Q(U=1|S=0, T=0) = 0

SLIDE 10

Bayesian Network

  • If there is an arrow from node B to another node C, B is called a parent of C and C is a child of B. The parents of B, and so on up the graph, are ancestors of C.
  • The set of parent nodes of a node y_i is denoted parents(y_i). The joint distribution factorizes as

Q(y_1, y_2, …, y_n) = ∏_{i=1}^{n} Q(y_i | parents(y_i))

  • If the alarm rings, your neighbour may call you at work to let you know. If, on your rushed way home, you hear a radio report of an earthquake, your degree of confidence (i.e. belief) that there was a burglary diminishes. With F = earthquake, C = burglary, S = radio report, B = alarm, and D = call (matching the figure),

Q(F, C, S, B, D) = Q(F) Q(C) Q(S|F) Q(B|F, C) Q(D|B)


[Figure: Bayesian network with nodes Earthquake, Burglary, Radio, Alarm, and Call]

SLIDE 11

Bayesian Network

  • The node call (child) is independent of burglary and earthquake (ancestors) given the node alarm (parent). The node call is a descendant of the nodes alarm and earthquake.
  • Given alarm, call is conditionally independent of burglary and earthquake.
  • Using conditional independence reduces the number of parameters from the full joint probability table, 2⁡ βˆ’ 1 = 31, to 1 + 1 + 4 + 2 + 2 = 10.
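The parameter count can be reproduced from the parent sets; a small Python sketch (node names follow the figure; each binary node with k binary parents contributes 2^k free parameters, one Q(node = 1 | parent configuration) per configuration):

```python
# Parameter count of the alarm network versus the full joint over 5 binary vars.
parents = {
    "Earthquake": [],
    "Burglary":   [],
    "Radio":      ["Earthquake"],
    "Alarm":      ["Earthquake", "Burglary"],
    "Call":       ["Alarm"],
}

bn_params = sum(2 ** len(ps) for ps in parents.values())   # 1 + 1 + 2 + 4 + 2
full_joint_params = 2 ** len(parents) - 1                  # full table, minus normalization
```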


[Figure: Bayesian network with nodes Earthquake, Burglary, Radio, Alarm, and Call]

The conditional probability table for the node alarm (Β¬ means "not"):

f, c:     Q(B|f, c),   Q(¬B|f, c)
f, ¬c:    Q(B|f, ¬c),  Q(¬B|f, ¬c)
¬f, c:    Q(B|¬f, c),  Q(¬B|¬f, c)
¬f, ¬c:   Q(B|¬f, ¬c), Q(¬B|¬f, ¬c)

SLIDE 12

Independence

  • Two sets of variables B and C are independent iff

Q(B) = Q(B|C)

  • or, equivalently,

Q(B, C) = Q(B) Q(C)

In this case we write B ∐ C.

Conditional independence: let D be the parent of two nodes B and C. B and C are conditionally independent given all states of D if

Q(B, C|D) = Q(B|D) Q(C|D)

written B ∐ C | D. For the tail-to-tail graph this follows from the factorization:

Q(B, C|D) = Q(B, C, D) / Q(D) = Q(D) Q(B|D) Q(C|D) / Q(D) = Q(B|D) Q(C|D)
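The tail-to-tail identity can be checked numerically; the CPT values below are illustrative assumptions, not from the slides:

```python
# Tail-to-tail: D is a parent of both B and C, Q(B,C,D) = Q(D) Q(B|D) Q(C|D).
Q_D = {1: 0.5, 0: 0.5}
Q_B1_given_D = {1: 0.8, 0: 0.1}   # assumed Q(B=1 | D)
Q_C1_given_D = {1: 0.7, 0: 0.3}   # assumed Q(C=1 | D)

def joint(b, c, d):
    pb = Q_B1_given_D[d] if b else 1 - Q_B1_given_D[d]
    pc = Q_C1_given_D[d] if c else 1 - Q_C1_given_D[d]
    return Q_D[d] * pb * pc

# Conditioned on D, B and C factorize: Q(B,C|D) = Q(B|D) Q(C|D).
factorizes = all(
    abs(joint(1, 1, d) / Q_D[d] - Q_B1_given_D[d] * Q_C1_given_D[d]) < 1e-12
    for d in (0, 1)
)

# Marginally, however, B and C are dependent: Q(B,C) != Q(B) Q(C).
p_b = sum(joint(1, c, d) for c in (0, 1) for d in (0, 1))
p_c = sum(joint(b, 1, d) for b in (0, 1) for d in (0, 1))
p_bc = sum(joint(1, 1, d) for d in (0, 1))
```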


[Figure: tail-to-tail connection]

The three possible two-variable factorizations: Q(y, z) = Q(y) Q(z|y); Q(y, z) = Q(z) Q(y|z); and, under independence, Q(y, z) = Q(y) Q(z).

SLIDE 13

Independence

  • Knowing that it rained, we can invert the dependency and infer the cause (D: cloudy, S: rain):

Q(D|S) = Q(S|D) Q(D) / Q(S) = Q(S|D) Q(D) / Σ_D Q(S, D) = Q(S|D) Q(D) / [Q(S|D) Q(D) + Q(S|¬D) Q(¬D)] = 0.8Γ—0.5 / (0.8Γ—0.5 + 0.1Γ—0.5) = 0.89

Knowing that it rained increased the probability that the weather is cloudy.

  • If we know that the sprinkler (T) is on:

Q(S|T) = Σ_D Q(S, D|T) = Q(S|D) Q(D|T) + Q(S|¬D) Q(¬D|T) = Q(S|D) Q(T|D) Q(D) / Q(T) + Q(S|¬D) Q(T|¬D) Q(¬D) / Q(T) = 0.22

  • Knowing that the sprinkler is on decreases the probability that it rained, because sprinkler and rain occur under different states of cloudy weather.
  • When D is unobserved, the presence of this path causes B and C to be dependent. However, when D is observed and we condition on it, the conditioned node blocks the path from B to C, and B and C become conditionally independent.


[Figure: tail-to-tail connection]

SLIDE 14

Independence

  • Conditional independence: three events may be connected serially, B β†’ C β†’ D. Here B and D are independent given C: knowing C tells D everything; knowing the state of B adds no extra knowledge about D.

Q(B, C, D) = Q(B) Q(C|B) Q(D|C)

Writing the joint this way implies the independence:

Q(D|B, C) = Q(B, C, D) / Q(B, C) = Q(B) Q(C|B) Q(D|C) / [Q(B) Q(C|B)] = Q(D|C)

If C is unobserved, then this path connects B and D and renders them dependent. But if C is observed, then the observation blocks the path, and B and D become conditionally independent.
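The chain identity can also be verified numerically; the CPT values below are illustrative assumptions:

```python
# Head-to-tail chain B -> C -> D: Q(B,C,D) = Q(B) Q(C|B) Q(D|C).
Q_B = {1: 0.3, 0: 0.7}
Q_C1_given_B = {1: 0.9, 0: 0.2}   # assumed Q(C=1 | B)
Q_D1_given_C = {1: 0.6, 0: 0.1}   # assumed Q(D=1 | C)

def joint(b, c, d):
    pc = Q_C1_given_B[b] if c else 1 - Q_C1_given_B[b]
    pd = Q_D1_given_C[c] if d else 1 - Q_D1_given_C[c]
    return Q_B[b] * pc * pd

# Q(D=1 | B, C) equals Q(D=1 | C) for every value of B:
# observing C blocks the path from B to D.
collapses = all(
    abs(joint(b, c, 1) / (joint(b, c, 0) + joint(b, c, 1)) - Q_D1_given_C[c]) < 1e-12
    for b in (0, 1) for c in (0, 1)
)
```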


[Figure: head-to-tail connection]

SLIDE 15

Independence

  • We can propagate information along the chain. If we do not know the state of cloudy (D):

Q(S) = Q(S|D) Q(D) + Q(S|¬D) Q(¬D) = 0.38
Q(X) = Q(X|S) Q(S) + Q(X|¬S) Q(¬S) = 0.47

What is the probability of the grass being wet given that the weather is cloudy?

Q(X|D) = Q(X|S) Q(S|D) + Q(X|¬S) Q(¬S|D) = 0.76

  • Knowing that the weather is cloudy increased the probability of wet grass.
  • We were travelling and, on our return, see that our grass is wet; what is the probability that the weather was cloudy that day?

[Figure: head-to-tail connection]

SLIDE 16

Independence

  • Suppose there are two parents B and C of a single node D. The joint density is written as

Q(B, C, D) = Q(B) Q(C) Q(D|B, C)

Here B and C are independent, Q(B, C) = Q(B) Q(C); however, they become dependent when D is known.

  • The path between B and C is blocked, and they are separated, when D is not observed; when D is observed, the path is no longer blocked, and B and C are no longer independent.
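This "explaining away" behaviour of a head-to-head node can be demonstrated numerically; the CPT values below are illustrative assumptions:

```python
# Head-to-head (collider): B and C are parents of D,
# Q(B,C,D) = Q(B) Q(C) Q(D|B,C).
Q_B = {1: 0.4, 0: 0.6}
Q_C = {1: 0.5, 0: 0.5}
Q_D1_given_BC = {(1, 1): 0.99, (1, 0): 0.9, (0, 1): 0.9, (0, 0): 0.01}  # assumed

def joint(b, c, d):
    pd = Q_D1_given_BC[(b, c)] if d else 1 - Q_D1_given_BC[(b, c)]
    return Q_B[b] * Q_C[c] * pd

# Marginally independent: Q(B=1, C=1) equals Q(B=1) Q(C=1).
p_bc = sum(joint(1, 1, d) for d in (0, 1))

# Conditioning on D couples B and C.
p_d = sum(joint(b, c, 1) for b in (0, 1) for c in (0, 1))
p_b_given_d = sum(joint(1, c, 1) for c in (0, 1)) / p_d
p_c_given_d = sum(joint(b, 1, 1) for b in (0, 1)) / p_d
p_bc_given_d = joint(1, 1, 1) / p_d
```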


[Figure: head-to-head connection]

SLIDE 17

Independence

  • The probability of the grass being wet is calculated by marginalizing over the joint:

Q(X) = Σ_{S,T} Q(X, S, T)
= Q(X|S, T) Q(S, T) + Q(X|¬S, T) Q(¬S, T) + Q(X|S, ¬T) Q(S, ¬T) + Q(X|¬S, ¬T) Q(¬S, ¬T)
= Q(X|S, T) Q(S) Q(T) + Q(X|¬S, T) Q(¬S) Q(T) + Q(X|S, ¬T) Q(S) Q(¬T) + Q(X|¬S, ¬T) Q(¬S) Q(¬T) = 0.52

If we know that the sprinkler is on, we can check how that affects the probability of the grass being wet:

Q(X|T) = Σ_S Q(X, S|T) = Q(X|S, T) Q(S|T) + Q(X|¬S, T) Q(¬S|T) = Q(X|S, T) Q(S) + Q(X|¬S, T) Q(¬S) = 0.92

Now Q(X|T) > Q(X).

  • Now, what is the probability that the sprinkler is on, given that the grass is wet?

[Figure: head-to-head connection]

SLIDE 18

Independence

  • d-separation is a criterion for deciding, from a given causal graph, whether a set X of variables is independent of another set Y given a third set Z.
  • Definition: an undirected path between two vertices is blocked w.r.t. D if it passes through a node M such that either (a) the arrows meet head-to-tail or tail-to-tail at M and M ∈ D, or (b) the arrows meet head-to-head at M, and neither M nor any descendant of M is in D.
  • Definition: B and C are d-separated by D if all paths from a vertex of B to a vertex of C are blocked w.r.t. D.
  • Theorem: if B and C are d-separated by D, then B ∐ C | D.

[Figure: tail-to-tail, head-to-tail, and head-to-head connections]

SLIDE 19

Independence

  • We can calculate the probability of having wet grass given that it is cloudy by merging the two subgraphs:

Q(X|D) = Σ_{S,T} Q(X, S, T|D)
= Q(X, S, T|D) + Q(X, ¬S, T|D) + Q(X, S, ¬T|D) + Q(X, ¬S, ¬T|D)
= Q(X|S, T, D) Q(S, T|D) + Q(X|¬S, T, D) Q(¬S, T|D) + Q(X|S, ¬T, D) Q(S, ¬T|D) + Q(X|¬S, ¬T, D) Q(¬S, ¬T|D)
= Q(X|S, T) Q(S|D) Q(T|D) + Q(X|¬S, T) Q(¬S|D) Q(T|D) + Q(X|S, ¬T) Q(S|D) Q(¬T|D) + Q(X|¬S, ¬T) Q(¬S|D) Q(¬T|D)


SLIDE 20

Causal Structure

  • A flu causes sinus inflammation.
  • Allergies also cause sinus inflammation.
  • Sinus inflammation causes a runny nose.
  • Sinus inflammation causes headaches.

Writing G for flu, B for allergies, T for sinus inflammation, and S, I for the two symptoms (runny nose and headache), the joint factorizes as

Q(G, B, T, S, I) = Q(G) Q(B) Q(T|G, B) Q(S|T) Q(I|T)


[Figure: the network's local factors Q(G), Q(B), Q(T|B, G), Q(S|T), Q(I|T)]
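The factorized joint for this network is easy to sanity-check in code. All CPT numbers below are illustrative assumptions, not values from the lecture:

```python
from itertools import product

# Q(G,B,T,S,I) = Q(G) Q(B) Q(T|G,B) Q(S|T) Q(I|T)
# (flu, allergy, sinus inflammation, runny nose, headache).
Q_G1 = 0.1                                                   # assumed Q(flu)
Q_B1 = 0.2                                                   # assumed Q(allergy)
Q_T1_given_GB = {(1, 1): 0.95, (1, 0): 0.8, (0, 1): 0.7, (0, 0): 0.05}
Q_S1_given_T = {1: 0.9, 0: 0.1}
Q_I1_given_T = {1: 0.6, 0: 0.05}

def bern(p1, x):
    """Probability of a binary variable taking value x when Q(x=1) = p1."""
    return p1 if x else 1 - p1

def joint(g, b, t, s, i):
    return (bern(Q_G1, g) * bern(Q_B1, b) * bern(Q_T1_given_GB[(g, b)], t)
            * bern(Q_S1_given_T[t], s) * bern(Q_I1_given_T[t], i))

# A valid factorization defines a distribution: it sums to 1 over all 32 states.
total = sum(joint(*x) for x in product((0, 1), repeat=5))

# Parameter count: 1 + 1 + 4 + 2 + 2 = 10 instead of 2**5 - 1 = 31.
n_params = 1 + 1 + len(Q_T1_given_GB) + len(Q_S1_given_T) + len(Q_I1_given_T)
```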

SLIDE 21

NaΓ―ve Bayes Model

  • Class variable D
  • Evidence variables Y = y_1, y_2, …, y_n
  • Assumption: the features are conditionally independent given the class, (y_i βŠ₯ y_j) | D for all y_i, y_j ∈ Y with i β‰  j:

Q(D, Y) = Q(D) ∏_{i=1}^{n} Q(y_i|D)


Graphical models are ways to represent conditional independence statements pictorially. This provides a compact way to define joint probability distributions.
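The naΓ―ve Bayes factorization above turns directly into a classifier; a minimal sketch in which the class names, feature names, and all probabilities are illustrative assumptions:

```python
# NaΓ―ve Bayes: Q(D, Y) = Q(D) Ξ _i Q(y_i | D); prediction picks the class
# maximizing the posterior over classes.
Q_D = {"spam": 0.3, "ham": 0.7}
Q_y1_given_D = {                        # assumed Q(feature present | class)
    "offer":   {"spam": 0.8,  "ham": 0.1},
    "meeting": {"spam": 0.05, "ham": 0.4},
}

def posterior(features):
    """Normalized Q(D | features) for binary (present/absent) features."""
    scores = {}
    for d, prior in Q_D.items():
        p = prior
        for feat, present in features.items():
            q = Q_y1_given_D[feat][d]
            p *= q if present else 1 - q
        scores[d] = p
    z = sum(scores.values())            # normalization over classes
    return {d: p / z for d, p in scores.items()}

post = posterior({"offer": True, "meeting": False})
label = max(post, key=post.get)
```

The conditional-independence assumption keeps the parameter count linear in the number of features, which is exactly the saving the factorization formula promises.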