CS480/680 Machine Learning, Lecture 8 (January 30th, 2020): Graphical Models

SLIDE 1

CS480/680 Winter 2020 Zahra Sheikhbahaee

CS480/680 Machine Learning Lecture 8: January 30th, 2020

Graphical Models Zahra Sheikhbahaee

University of Waterloo

SLIDE 2

Outline

  • Graphical Model
  • Bayesian Network
  • Conditional Independence
  • NaΓ―ve Bayes

SLIDE 3

Review: Probability Theory

  • Sum rule (marginal distributions):

q(y) = Ξ£_z q(y, z)

  • Product rule:

q(y, z) = q(y|z) q(z)

From these we obtain Bayes' theorem:

q(z|y) = q(y|z) q(z) / q(y)

where the normalization factor is q(y) = ∫ q(y|z) q(z) dz.

SLIDE 4

Graphical Models

  • Graphical Models (GMs) are depictions of the independence/dependence relationships among the distributions in a probabilistic model. Their main purpose is to make the conditional independence properties of probability distributions explicit.
  • A GM is a framework for representing, reasoning about, and learning complex problems.
  • A graph comprises nodes connected by links (edges).
  • Each node corresponds to a random variable, Y, and carries the probability of that random variable, Q(Y).
  • A directed edge from node Y to node Z indicates that Y has a direct influence on Z; this influence is specified by the conditional probability Q(Z|Y), and the directed edges form a directed acyclic graph.
  • The graph captures the way in which the joint distribution over all of the random variables can be decomposed into a product of factors, each depending only on a subset of the variables.
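The factor-product idea can be sketched directly in code. A minimal Python sketch for a two-node network Y β†’ Z, using the rain/wet-grass numbers that appear on the following slides (the dictionary layout is my own):

```python
# Two-node Bayesian network Y -> Z: the joint factorizes as
# Q(Y, Z) = Q(Y) * Q(Z | Y), one local factor per node.
Q_Y = {1: 0.4, 0: 0.6}                      # Q(Y): e.g. rain / no rain
Q_Z_given_Y = {1: {1: 0.9, 0: 0.1},         # Q(Z | Y=1)
               0: {1: 0.2, 0: 0.8}}         # Q(Z | Y=0)

def joint(y, z):
    """Q(Y=y, Z=z) as a product of the local factors."""
    return Q_Y[y] * Q_Z_given_Y[y][z]

# Any valid factorization sums to 1 over all joint states.
total = sum(joint(y, z) for y in (0, 1) for z in (0, 1))
```

Each node stores only its local conditional table; the joint is never materialized as one big table.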

SLIDE 5

Bayesian Networks


Example: Tracey leaves her house and realises that her grass is wet. Rain causes the grass to get wet. The random variables are binary: they are either true or false.

  • The probability of rain during a day: Q(S) = 0.4
  • The chance that the grass gets wet when it rains: Q(X|S) = 0.9
  • The chance that the grass stays dry even though it rains: Q(~X|S) = 0.1
  • The probability that the grass gets wet without rain, e.g. when a sprinkler is used: Q(X|~S) = 0.2
  • The probability that the grass is not wet given that it does not rain: Q(~X|~S) = 0.8

SLIDE 6

Bayesian Networks


Example: Rain causes the grass to get wet.

  • The joint distribution: Q(S, X) = Q(S) Q(X|S)
  • The individual (marginal) probability of wet grass is computed by summing over the possible values of its parent node:

Q(X) = Ξ£_S Q(S, X) = Q(X|S) Q(S) + Q(X|~S) Q(~S) = 0.9Γ—0.4 + 0.2Γ—0.6 = 0.48

  • If we knew that it rained, the probability of wet grass would be 0.9; if we knew for sure that it did not, it would be as low as 0.2; not knowing whether it rained, the probability is 0.48.
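The marginalization above is easy to check in code; a minimal sketch (variable names are mine):

```python
# Sum rule on the rain -> wet-grass pair:
# Q(X=1) = Ξ£_S Q(X=1 | S) Q(S) = 0.9*0.4 + 0.2*0.6
Q_S = {1: 0.4, 0: 0.6}              # Q(S): rained / did not rain
Q_X1_given_S = {1: 0.9, 0: 0.2}     # Q(X=1 | S): grass wet given rain state

Q_X1 = sum(Q_X1_given_S[s] * Q_S[s] for s in (0, 1))
```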

SLIDE 7

Bayesian Networks


Example: Rain causes the grass to get wet:

Bayes' rule lets us invert the dependency and perform diagnosis.

  • If we know that the grass is wet, the probability that it rained is

Q(S|X) = Q(X|S) Q(S) / Q(X) = 0.9Γ—0.4 / 0.48 = 0.75

  • Knowing that the grass is wet increased the probability of rain from 0.4 to 0.75.
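The same numbers give the diagnostic direction via Bayes' rule; a small sketch:

```python
# Bayes' rule inverts the edge: Q(S=1 | X=1) = Q(X=1 | S=1) Q(S=1) / Q(X=1).
Q_S = {1: 0.4, 0: 0.6}
Q_X1_given_S = {1: 0.9, 0: 0.2}

Q_X1 = sum(Q_X1_given_S[s] * Q_S[s] for s in (0, 1))   # normalization, 0.48
Q_S1_given_X1 = Q_X1_given_S[1] * Q_S[1] / Q_X1        # 0.36 / 0.48
```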
SLIDE 8

Bayesian Network

  • S ∈ {0, 1} (S = 1 means that it has been raining, and 0 otherwise).
  • T ∈ {0, 1} (T = 1 means that Tracey has forgotten to turn off the sprinkler, and 0 otherwise).
  • K ∈ {0, 1} (K = 1 means that Jack's grass is wet, and 0 otherwise).
  • U ∈ {0, 1} (U = 1 means that Tracey's grass is wet, and 0 otherwise).

A model of Tracey's world corresponds to a probability distribution on the joint set of the variables of interest, q(U, K, S, T). There are 2⁴ = 16 states. Using the product rule repeatedly:

Q(U, K, S, T) = Q(U|K, S, T) Q(K, S, T) = Q(U|K, S, T) Q(K|S, T) Q(S, T) = Q(U|K, S, T) Q(K|S, T) Q(S|T) Q(T)

Assuming the conditional independencies

Q(U|K, S, T) = Q(U|S, T),   Q(K|S, T) = Q(K|S),   Q(S|T) = Q(S)

the joint factorizes as

Q(U, K, S, T) = Q(U|S, T) Q(K|S) Q(S) Q(T)

We need to specify only 4 + 2 + 1 + 1 = 8 values.


Conditional probability tables (from the figure, rewritten in the text's variable names):

Q(S=1) = 0.2    Q(T=1) = 0.1
Q(K=1|S=1) = 1    Q(K=1|S=0) = 0.2
Q(U=1|S=1, T=1) = 1    Q(U=1|S=1, T=0) = 1    Q(U=1|S=0, T=1) = 0.9    Q(U=1|S=0, T=0) = 0

SLIDE 9

Bayesian Network

  • What is the probability that the sprinkler was on overnight, given that Tracey's grass is wet?

Q(T=1|U=1) = Q(T=1, U=1) / Q(U=1)
= Σ_{K,S} Q(U=1, K, S, T=1) / Σ_{K,S,T} Q(U=1, K, S, T)
= Σ_{K,S} Q(U=1|S, T=1) Q(K|S) Q(S) Q(T=1) / Σ_{K,S,T} Q(U=1|S, T) Q(K|S) Q(S) Q(T)
= Σ_S Q(U=1|S, T=1) Q(S) Q(T=1) / Σ_{S,T} Q(U=1|S, T) Q(S) Q(T)
= 0.1Γ—(0.9Γ—0.8 + 1Γ—0.2) / [0.1Γ—(0.9Γ—0.8 + 1Γ—0.2) + 0.9Γ—(0Γ—0.8 + 1Γ—0.2)] = 0.3382

The belief that the sprinkler is on rises above its prior probability 0.1, due to the fact that the grass is wet.

  • What is the probability that Tracey's sprinkler was on overnight, given that her grass is wet and that Jack's grass is also wet?

q(T=1|U=1, K=1) = Q(T=1, K=1, U=1) / Q(U=1, K=1)
= Σ_S Q(T=1, K=1, U=1, S) / Σ_{S,T} Q(U=1, K=1, S, T)
= Σ_S Q(U=1|S, T=1) Q(K=1|S) Q(S) Q(T=1) / Σ_{S,T} Q(U=1|S, T) Q(K=1|S) Q(S) Q(T)
= (1Γ—1Γ—0.2Γ—0.1 + 0.9Γ—0.2Γ—0.8Γ—0.1) / [0.2Γ—0.8Γ—(0Γ—0.9 + 0.9Γ—0.1) + 1Γ—0.2Γ—(1Γ—0.9 + 1Γ—0.1)]
= 0.0344 / 0.2144 = 0.1604

The probability that the sprinkler is on, given the extra evidence that Jack's grass is also wet, is lower than when we only know that Tracey's grass is wet (0.16 versus 0.34): the wet grass is now better explained by rain.
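Both queries can be verified by brute-force enumeration of the 16 joint states, using the CPT values from the figure (the dictionary layout and function names are mine):

```python
from itertools import product

# CPTs for the network Q(U,K,S,T) = Q(U|S,T) Q(K|S) Q(S) Q(T):
# S: rain, T: sprinkler left on, K: Jack's grass wet, U: Tracey's grass wet.
Q_S = {1: 0.2, 0: 0.8}
Q_T = {1: 0.1, 0: 0.9}
Q_K1_given_S = {1: 1.0, 0: 0.2}
Q_U1_given_ST = {(1, 1): 1.0, (1, 0): 1.0, (0, 1): 0.9, (0, 0): 0.0}

def joint(u, k, s, t):
    pu = Q_U1_given_ST[(s, t)] if u else 1 - Q_U1_given_ST[(s, t)]
    pk = Q_K1_given_S[s] if k else 1 - Q_K1_given_S[s]
    return pu * pk * Q_S[s] * Q_T[t]

def prob_sprinkler(evidence):
    """Q(T=1 | evidence) by summing the joint over all 16 states."""
    num = den = 0.0
    for u, k, s, t in product((0, 1), repeat=4):
        state = {"U": u, "K": k, "S": s, "T": t}
        if any(state[v] != val for v, val in evidence.items()):
            continue
        p = joint(u, k, s, t)
        den += p
        if t == 1:
            num += p
    return num / den

p_wet = prob_sprinkler({"U": 1})            # matches the 0.3382 above
p_both = prob_sprinkler({"U": 1, "K": 1})   # matches the 0.1604 above
```

Enumeration scales exponentially in the number of variables, but for four binary nodes it is an exact check of the hand derivation.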


Conditional probability tables (from the figure, rewritten in the text's variable names):

Q(S=1) = 0.2    Q(T=1) = 0.1
Q(K=1|S=1) = 1    Q(K=1|S=0) = 0.2
Q(U=1|S=1, T=1) = 1    Q(U=1|S=1, T=0) = 1    Q(U=1|S=0, T=1) = 0.9    Q(U=1|S=0, T=0) = 0

SLIDE 10

Bayesian Network

  • If there is an arrow from node B to another node C, B is called a parent of C and C is a child of B. The parents of B, and so on up the graph, are ancestors of C.
  • The set of parent nodes of a node y_i is denoted parents(y_i). The joint distribution factorizes as

Q(y_1, y_2, …, y_n) = ∏_{i=1}^{n} Q(y_i | parents(y_i))

  • If the alarm rings, your neighbour may call you at work to let you know. If, on your rushed way home, you hear a radio report of an earthquake, your degree of confidence (i.e. belief) that there was a burglary diminishes. With F = earthquake, C = burglary, S = radio report, B = alarm, and D = call (matching the figure),

Q(F, C, S, B, D) = Q(F) Q(C) Q(S|F) Q(B|F, C) Q(D|B)


[Figure: Bayesian network with nodes Earthquake, Burglary, Radio, Alarm, and Call]

SLIDE 11

Bayesian Network

  • The node call (child) is independent of burglary and earthquake (ancestors) given the node alarm (parent). The node call is a descendant of the nodes alarm and earthquake.
  • Given alarm, call is conditionally independent of burglary and earthquake.
  • Using conditional independence reduces the number of parameters from the full joint probability table, 2⁡ βˆ’ 1 = 31, to 1 + 1 + 4 + 2 + 2 = 10.
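The parameter count can be reproduced from the parent sets; a small Python sketch (node names follow the figure; each binary node with k binary parents contributes 2^k free parameters, one Q(node = 1 | parent configuration) per configuration):

```python
# Parameter count of the alarm network versus the full joint over 5 binary vars.
parents = {
    "Earthquake": [],
    "Burglary":   [],
    "Radio":      ["Earthquake"],
    "Alarm":      ["Earthquake", "Burglary"],
    "Call":       ["Alarm"],
}

bn_params = sum(2 ** len(ps) for ps in parents.values())   # 1 + 1 + 2 + 4 + 2
full_joint_params = 2 ** len(parents) - 1                  # full table, minus normalization
```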


[Figure: Bayesian network with nodes Earthquake, Burglary, Radio, Alarm, and Call]

The conditional probability table for the node alarm (Β¬ means "not"):

f, c:     Q(B|f, c),   Q(¬B|f, c)
f, ¬c:    Q(B|f, ¬c),  Q(¬B|f, ¬c)
¬f, c:    Q(B|¬f, c),  Q(¬B|¬f, c)
¬f, ¬c:   Q(B|¬f, ¬c), Q(¬B|¬f, ¬c)

SLIDE 12

Independence

  • Two sets of variables B and C are independent iff

Q(B) = Q(B|C)

  • or, equivalently,

Q(B, C) = Q(B) Q(C)

In this case we write B ∐ C.

Conditional independence: let D be the parent of two nodes B and C. B and C are conditionally independent given all states of D if

Q(B, C|D) = Q(B|D) Q(C|D)

written B ∐ C | D. For the tail-to-tail graph this follows from the factorization:

Q(B, C|D) = Q(B, C, D) / Q(D) = Q(D) Q(B|D) Q(C|D) / Q(D) = Q(B|D) Q(C|D)
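The tail-to-tail identity can be checked numerically; the CPT values below are illustrative assumptions, not from the slides:

```python
# Tail-to-tail: D is a parent of both B and C, Q(B,C,D) = Q(D) Q(B|D) Q(C|D).
Q_D = {1: 0.5, 0: 0.5}
Q_B1_given_D = {1: 0.8, 0: 0.1}   # assumed Q(B=1 | D)
Q_C1_given_D = {1: 0.7, 0: 0.3}   # assumed Q(C=1 | D)

def joint(b, c, d):
    pb = Q_B1_given_D[d] if b else 1 - Q_B1_given_D[d]
    pc = Q_C1_given_D[d] if c else 1 - Q_C1_given_D[d]
    return Q_D[d] * pb * pc

# Conditioned on D, B and C factorize: Q(B,C|D) = Q(B|D) Q(C|D).
factorizes = all(
    abs(joint(1, 1, d) / Q_D[d] - Q_B1_given_D[d] * Q_C1_given_D[d]) < 1e-12
    for d in (0, 1)
)

# Marginally, however, B and C are dependent: Q(B,C) != Q(B) Q(C).
p_b = sum(joint(1, c, d) for c in (0, 1) for d in (0, 1))
p_c = sum(joint(b, 1, d) for b in (0, 1) for d in (0, 1))
p_bc = sum(joint(1, 1, d) for d in (0, 1))
```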


[Figure: tail-to-tail connection]

The three possible two-variable factorizations: Q(y, z) = Q(y) Q(z|y); Q(y, z) = Q(z) Q(y|z); and, under independence, Q(y, z) = Q(y) Q(z).

SLIDE 13

Independence

  • Knowing that it rained, we can invert the dependency and infer the cause (D: cloudy, S: rain):

Q(D|S) = Q(S|D) Q(D) / Q(S) = Q(S|D) Q(D) / Σ_D Q(S, D) = Q(S|D) Q(D) / [Q(S|D) Q(D) + Q(S|¬D) Q(¬D)] = 0.8Γ—0.5 / (0.8Γ—0.5 + 0.1Γ—0.5) = 0.89

Knowing that it rained increased the probability that the weather is cloudy.

  • If we know that the sprinkler (T) is on:

Q(S|T) = Σ_D Q(S, D|T) = Q(S|D) Q(D|T) + Q(S|¬D) Q(¬D|T) = Q(S|D) Q(T|D) Q(D) / Q(T) + Q(S|¬D) Q(T|¬D) Q(¬D) / Q(T) = 0.22

  • Knowing that the sprinkler is on decreases the probability that it rained, because sprinkler and rain occur under different states of cloudy weather.
  • When D is unobserved, the presence of this path causes B and C to be dependent. However, when D is observed and we condition on it, the conditioned node blocks the path from B to C, and B and C become conditionally independent.


[Figure: tail-to-tail connection]

SLIDE 14

Independence

  • Conditional independence: three events may be connected serially, B β†’ C β†’ D. Here B and D are independent given C: knowing C tells D everything; knowing the state of B adds no extra knowledge about D.

Q(B, C, D) = Q(B) Q(C|B) Q(D|C)

Writing the joint this way implies the independence:

Q(D|B, C) = Q(B, C, D) / Q(B, C) = Q(B) Q(C|B) Q(D|C) / [Q(B) Q(C|B)] = Q(D|C)

If C is unobserved, then this path connects B and D and renders them dependent. But if C is observed, then the observation blocks the path, and B and D become conditionally independent.
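The chain identity can also be verified numerically; the CPT values below are illustrative assumptions:

```python
# Head-to-tail chain B -> C -> D: Q(B,C,D) = Q(B) Q(C|B) Q(D|C).
Q_B = {1: 0.3, 0: 0.7}
Q_C1_given_B = {1: 0.9, 0: 0.2}   # assumed Q(C=1 | B)
Q_D1_given_C = {1: 0.6, 0: 0.1}   # assumed Q(D=1 | C)

def joint(b, c, d):
    pc = Q_C1_given_B[b] if c else 1 - Q_C1_given_B[b]
    pd = Q_D1_given_C[c] if d else 1 - Q_D1_given_C[c]
    return Q_B[b] * pc * pd

# Q(D=1 | B, C) equals Q(D=1 | C) for every value of B:
# observing C blocks the path from B to D.
collapses = all(
    abs(joint(b, c, 1) / (joint(b, c, 0) + joint(b, c, 1)) - Q_D1_given_C[c]) < 1e-12
    for b in (0, 1) for c in (0, 1)
)
```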


[Figure: head-to-tail connection]

SLIDE 15

Independence

  • We can propagate information along the chain. If we do not know the state of cloudy (D):

Q(S) = Q(S|D) Q(D) + Q(S|¬D) Q(¬D) = 0.38
Q(X) = Q(X|S) Q(S) + Q(X|¬S) Q(¬S) = 0.47

What is the probability of the grass being wet given that the weather is cloudy?

Q(X|D) = Q(X|S) Q(S|D) + Q(X|¬S) Q(¬S|D) = 0.76

  • Knowing that the weather is cloudy increased the probability of wet grass.
  • We were travelling and, on our return, see that our grass is wet; what is the probability that the weather was cloudy that day?

[Figure: head-to-tail connection]

SLIDE 16

Independence

  • Suppose there are two parents B and C of a single node D. The joint density is written as

Q(B, C, D) = Q(B) Q(C) Q(D|B, C)

Here B and C are independent, Q(B, C) = Q(B) Q(C); however, they become dependent when D is known.

  • The path between B and C is blocked, and they are separated, when D is not observed; when D is observed, the path is no longer blocked, and B and C are no longer independent.
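This "explaining away" behaviour of a head-to-head node can be demonstrated numerically; the CPT values below are illustrative assumptions:

```python
# Head-to-head (collider): B and C are parents of D,
# Q(B,C,D) = Q(B) Q(C) Q(D|B,C).
Q_B = {1: 0.4, 0: 0.6}
Q_C = {1: 0.5, 0: 0.5}
Q_D1_given_BC = {(1, 1): 0.99, (1, 0): 0.9, (0, 1): 0.9, (0, 0): 0.01}  # assumed

def joint(b, c, d):
    pd = Q_D1_given_BC[(b, c)] if d else 1 - Q_D1_given_BC[(b, c)]
    return Q_B[b] * Q_C[c] * pd

# Marginally independent: Q(B=1, C=1) equals Q(B=1) Q(C=1).
p_bc = sum(joint(1, 1, d) for d in (0, 1))

# Conditioning on D couples B and C.
p_d = sum(joint(b, c, 1) for b in (0, 1) for c in (0, 1))
p_b_given_d = sum(joint(1, c, 1) for c in (0, 1)) / p_d
p_c_given_d = sum(joint(b, 1, 1) for b in (0, 1)) / p_d
p_bc_given_d = joint(1, 1, 1) / p_d
```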


[Figure: head-to-head connection]

SLIDE 17

Independence

  • The probability of the grass being wet is calculated by marginalizing over the joint:

Q(X) = Σ_{S,T} Q(X, S, T)
= Q(X|S, T) Q(S, T) + Q(X|¬S, T) Q(¬S, T) + Q(X|S, ¬T) Q(S, ¬T) + Q(X|¬S, ¬T) Q(¬S, ¬T)
= Q(X|S, T) Q(S) Q(T) + Q(X|¬S, T) Q(¬S) Q(T) + Q(X|S, ¬T) Q(S) Q(¬T) + Q(X|¬S, ¬T) Q(¬S) Q(¬T) = 0.52

If we know that the sprinkler is on, we can check how that affects the probability of the grass being wet:

Q(X|T) = Σ_S Q(X, S|T) = Q(X|S, T) Q(S|T) + Q(X|¬S, T) Q(¬S|T) = Q(X|S, T) Q(S) + Q(X|¬S, T) Q(¬S) = 0.92

Now Q(X|T) > Q(X).

  • Now, what is the probability that the sprinkler is on, given that the grass is wet?

[Figure: head-to-head connection]

SLIDE 18

Independence

  • d-separation is a criterion for deciding, from a given causal graph, whether a set X of variables is independent of another set Y given a third set Z.
  • Definition: an undirected path between two vertices is blocked w.r.t. D if it passes through a node M such that either (a) the arrows meet head-to-tail or tail-to-tail at M and M ∈ D, or (b) the arrows meet head-to-head at M, and neither M nor any descendant of M is in D.
  • Definition: B and C are d-separated by D if all paths from a vertex of B to a vertex of C are blocked w.r.t. D.
  • Theorem: if B and C are d-separated by D, then B ∐ C | D.

[Figure: tail-to-tail, head-to-tail, and head-to-head connections]

SLIDE 19

Independence

  • We can calculate the probability of having wet grass given that it is cloudy by merging the two subgraphs:

Q(X|D) = Σ_{S,T} Q(X, S, T|D)
= Q(X, S, T|D) + Q(X, ¬S, T|D) + Q(X, S, ¬T|D) + Q(X, ¬S, ¬T|D)
= Q(X|S, T, D) Q(S, T|D) + Q(X|¬S, T, D) Q(¬S, T|D) + Q(X|S, ¬T, D) Q(S, ¬T|D) + Q(X|¬S, ¬T, D) Q(¬S, ¬T|D)
= Q(X|S, T) Q(S|D) Q(T|D) + Q(X|¬S, T) Q(¬S|D) Q(T|D) + Q(X|S, ¬T) Q(S|D) Q(¬T|D) + Q(X|¬S, ¬T) Q(¬S|D) Q(¬T|D)


SLIDE 20

Causal Structure

  • A flu causes sinus inflammation.
  • Allergies also cause sinus inflammation.
  • Sinus inflammation causes a runny nose.
  • Sinus inflammation causes headaches.

Writing G for flu, B for allergies, T for sinus inflammation, and S, I for the two symptoms (runny nose and headache), the joint factorizes as

Q(G, B, T, S, I) = Q(G) Q(B) Q(T|G, B) Q(S|T) Q(I|T)


[Figure: the network's local factors Q(G), Q(B), Q(T|B, G), Q(S|T), Q(I|T)]
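The factorized joint for this network is easy to sanity-check in code. All CPT numbers below are illustrative assumptions, not values from the lecture:

```python
from itertools import product

# Q(G,B,T,S,I) = Q(G) Q(B) Q(T|G,B) Q(S|T) Q(I|T)
# (flu, allergy, sinus inflammation, runny nose, headache).
Q_G1 = 0.1                                                   # assumed Q(flu)
Q_B1 = 0.2                                                   # assumed Q(allergy)
Q_T1_given_GB = {(1, 1): 0.95, (1, 0): 0.8, (0, 1): 0.7, (0, 0): 0.05}
Q_S1_given_T = {1: 0.9, 0: 0.1}
Q_I1_given_T = {1: 0.6, 0: 0.05}

def bern(p1, x):
    """Probability of a binary variable taking value x when Q(x=1) = p1."""
    return p1 if x else 1 - p1

def joint(g, b, t, s, i):
    return (bern(Q_G1, g) * bern(Q_B1, b) * bern(Q_T1_given_GB[(g, b)], t)
            * bern(Q_S1_given_T[t], s) * bern(Q_I1_given_T[t], i))

# A valid factorization defines a distribution: it sums to 1 over all 32 states.
total = sum(joint(*x) for x in product((0, 1), repeat=5))

# Parameter count: 1 + 1 + 4 + 2 + 2 = 10 instead of 2**5 - 1 = 31.
n_params = 1 + 1 + len(Q_T1_given_GB) + len(Q_S1_given_T) + len(Q_I1_given_T)
```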

SLIDE 21

NaΓ―ve Bayes Model

  • Class variable D
  • Evidence variables Y = y_1, y_2, …, y_n
  • Assumption: the features are conditionally independent given the class, (y_i βŠ₯ y_j) | D for all y_i, y_j ∈ Y with i β‰  j:

Q(D, Y) = Q(D) ∏_{i=1}^{n} Q(y_i|D)


Graphical models are ways to represent conditional independence statements pictorially. This provides a compact way to define joint probability distributions.
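The naΓ―ve Bayes factorization above turns directly into a classifier; a minimal sketch in which the class names, feature names, and all probabilities are illustrative assumptions:

```python
# NaΓ―ve Bayes: Q(D, Y) = Q(D) Ξ _i Q(y_i | D); prediction picks the class
# maximizing the posterior over classes.
Q_D = {"spam": 0.3, "ham": 0.7}
Q_y1_given_D = {                        # assumed Q(feature present | class)
    "offer":   {"spam": 0.8,  "ham": 0.1},
    "meeting": {"spam": 0.05, "ham": 0.4},
}

def posterior(features):
    """Normalized Q(D | features) for binary (present/absent) features."""
    scores = {}
    for d, prior in Q_D.items():
        p = prior
        for feat, present in features.items():
            q = Q_y1_given_D[feat][d]
            p *= q if present else 1 - q
        scores[d] = p
    z = sum(scores.values())            # normalization over classes
    return {d: p / z for d, p in scores.items()}

post = posterior({"offer": True, "meeting": False})
label = max(post, key=post.get)
```

The conditional-independence assumption keeps the parameter count linear in the number of features, which is exactly the saving the factorization formula promises.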