SLIDE 1

Lecture 19

Conditional Independence, Bayesian networks intro

SLIDE 2

Announcement

  • Assignment 4 will be out next week.
  • Due Friday Dec 1 (you can still use late days if you have any left)

SLIDE 3

Lecture Overview

  • Recap lecture 18
  • Marginal Independence
  • Conditional Independence
  • Bayesian Networks Introduction

SLIDE 4

Probability Distributions

Consider the case where possible worlds are simply assignments to one random variable.

Example: X represents a female adult's height in Canada with domain {short, normal, tall}, based on some definition of these terms:

  • P(height = short) = 0.2
  • P(height = normal) = 0.5
  • P(height = tall) = 0.3

Definition (probability distribution): A probability distribution P on a random variable X is a function dom(X) → [0,1] such that Σx P(X=x) = 1
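To make the definition concrete, here is a minimal Python sketch of the height example; the dictionary encoding and variable names are just one possible choice, not part of the slides:

```python
import math

# The height distribution from the example: dom(X) -> [0,1].
P_height = {"short": 0.2, "normal": 0.5, "tall": 0.3}

# The two defining conditions of a probability distribution:
# every P(X=x) lies in [0,1], and the values sum to 1.
assert all(0.0 <= p <= 1.0 for p in P_height.values())
assert math.isclose(sum(P_height.values()), 1.0)
```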

4

slide-5
SLIDE 5

Joint Probability Distribution (JPD)

  • Joint probability distribution over random variables X1, …, Xn:
  • a probability distribution over the joint random variable <X1, …, Xn>

with domain dom(X1) × … × dom(Xn) (the Cartesian product)

  • Think of a joint distribution over n variables as the table of

the corresponding possible worlds

Weather   Temperature   µ(w)
sunny     hot           0.10
sunny     mild          0.20
sunny     cold          0.10
cloudy    hot           0.05
cloudy    mild          0.35
cloudy    cold          0.20

  • There is a column (dimension) for each variable, and one for the

probability

  • Each row corresponds to an assignment

X1= x1, …, Xn= xn and its probability P(X1= x1, … ,Xn= xn)

  • We can also write P(X1=x1 ∧ … ∧ Xn=xn)
  • The sum of probabilities across the whole table is 1.

{Weather, Temperature} example from before
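A joint distribution can be encoded the same way, with one entry per possible world (row of the table). A small sketch using the {Weather, Temperature} numbers above; the tuple-keyed dict is an assumed encoding, not from the slides:

```python
# JPD over <Weather, Temperature>: keys are joint assignments (rows).
joint = {
    ("sunny", "hot"): 0.10, ("sunny", "mild"): 0.20, ("sunny", "cold"): 0.10,
    ("cloudy", "hot"): 0.05, ("cloudy", "mild"): 0.35, ("cloudy", "cold"): 0.20,
}

# The sum of probabilities across the whole table is 1.
assert abs(sum(joint.values()) - 1.0) < 1e-9
```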

SLIDE 6

Recap: Conditioning

  • Conditioning: revise beliefs based on new observations
  • We need to integrate two sources of knowledge:
  • Prior probability distribution P(X): all background knowledge
  • New evidence e
  • Combine the two to form a posterior probability distribution
  • The conditional probability P(h|e)

SLIDE 7

Recap: Conditional probability

Possible world   Weather   Temperature   µ(w)
w1               sunny     hot           0.10
w2               sunny     mild          0.20
w3               sunny     cold          0.10
w4               cloudy    hot           0.05
w5               cloudy    mild          0.35
w6               cloudy    cold          0.20

T      P(T|W=sunny)
hot    0.10/0.40 = 0.25
mild   0.20/0.40 = 0.50
cold   0.10/0.40 = 0.25


The distribution P(T|W=sunny) is computed from the JPD
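A sketch of this conditioning step in Python, reusing the {Weather, Temperature} JPD (encoding assumed as before): keep the rows consistent with the evidence and renormalize by P(W=sunny).

```python
joint = {
    ("sunny", "hot"): 0.10, ("sunny", "mild"): 0.20, ("sunny", "cold"): 0.10,
    ("cloudy", "hot"): 0.05, ("cloudy", "mild"): 0.35, ("cloudy", "cold"): 0.20,
}

# P(W=sunny): total probability of the worlds consistent with the evidence.
p_sunny = sum(p for (w, t), p in joint.items() if w == "sunny")  # 0.40

# P(T | W=sunny): renormalize the consistent rows.
P_T_given_sunny = {t: p / p_sunny for (w, t), p in joint.items() if w == "sunny"}
print(P_T_given_sunny)  # {'hot': 0.25, 'mild': 0.5, 'cold': 0.25}, matching the table
```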

SLIDE 8

Recap: Inference by Enumeration

  • Great, we can compute arbitrary probabilities now!
  • Given
  • Prior joint probability distribution (JPD) on set of variables X
  • specific values e for the evidence variables E (subset of X)
  • We want to compute
  • posterior joint distribution of query variables Y (a subset of X) given evidence e

  • Step 1: Condition to get distribution P(X|e)
  • Step 2: Marginalize to get distribution P(Y|e)

SLIDE 9

Inference by Enumeration: example

  • Given P(W,C,T) as JPD below, and evidence e : “Wind=yes”
  • What is the probability that it is cold? I.e., P(T= cold | W=yes)
  • Step 1: condition to get distribution P(C, T| W=yes)

Cloudy C   Temperature T   P(C, T | W=yes)
no         hot             0.04/0.43 ≈ 0.10
no         mild            0.09/0.43 ≈ 0.21
no         cold            0.07/0.43 ≈ 0.16
yes        hot             0.01/0.43 ≈ 0.02
yes        mild            0.10/0.43 ≈ 0.23
yes        cold            0.12/0.43 ≈ 0.28

Windy W   Cloudy C   Temperature T   P(W, C, T)
yes       no         hot             0.04
yes       no         mild            0.09
yes       no         cold            0.07
yes       yes        hot             0.01
yes       yes        mild            0.10
yes       yes        cold            0.12
no        no         hot             0.06
no        no         mild            0.11
no        no         cold            0.03
no        yes        hot             0.04
no        yes        mild            0.25
no        yes        cold            0.08

P(C=c ∧ T=t | W=yes) = P(C=c ∧ T=t ∧ W=yes) / P(W=yes)

SLIDE 10

Inference by Enumeration: example

  • Given P(W,C,T) as JPD in previous slide, and evidence e : “Wind=yes”
  • What is the probability that it is cold? I.e., P(T=cold | W=yes)
  • Step 2: marginalize over Cloudy to get distribution P(T | W=yes)

Cloudy C   Temperature T   P(C, T | W=yes)
no         hot             0.10
no         mild            0.21
no         cold            0.16
yes        hot             0.02
yes        mild            0.23
yes        cold            0.28

Temperature T   P(T | W=yes)
hot             0.10 + 0.02 = 0.12
mild            0.21 + 0.23 = 0.44
cold            0.16 + 0.28 = 0.44

  • This is a probability distribution: it defines the probability of all the possible values of Temperature (three here), given the observed value for Windy (yes).
  • Because this is a probability distribution, the sum of all its values is 1.

P(T=cold | W=yes) is a specific entry of the probability distribution for P(T | W=yes)
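The whole two-step procedure above fits in a few lines of Python. A sketch of inference by enumeration for P(T | W=yes) on the P(W, C, T) table from the previous slide (the dict encoding is assumed):

```python
# P(W, C, T), keyed by (windy, cloudy, temperature).
pwct = {
    ("yes", "no", "hot"): 0.04, ("yes", "no", "mild"): 0.09, ("yes", "no", "cold"): 0.07,
    ("yes", "yes", "hot"): 0.01, ("yes", "yes", "mild"): 0.10, ("yes", "yes", "cold"): 0.12,
    ("no", "no", "hot"): 0.06, ("no", "no", "mild"): 0.11, ("no", "no", "cold"): 0.03,
    ("no", "yes", "hot"): 0.04, ("no", "yes", "mild"): 0.25, ("no", "yes", "cold"): 0.08,
}

# Step 1: condition on W=yes (filter consistent rows, renormalize).
p_w_yes = sum(p for (w, c, t), p in pwct.items() if w == "yes")  # 0.43
cond = {(c, t): p / p_w_yes for (w, c, t), p in pwct.items() if w == "yes"}

# Step 2: marginalize over Cloudy to get P(T | W=yes).
p_t = {}
for (c, t), p in cond.items():
    p_t[t] = p_t.get(t, 0.0) + p
print(p_t)  # ~{'hot': 0.12, 'mild': 0.44, 'cold': 0.44}; P(T=cold | W=yes) ~ 0.44
```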

SLIDE 11

Conditional Probability among Random Variables

P(X | Y) = P(Temperature | Weather) = P(Temperature ∧ Weather) / P(Weather)

P(X | Y) = P(X, Y) / P(Y) expresses the conditional probability of each possible value for X given each possible value for Y

             T = hot         T = cold
W = sunny    P(hot|sunny)    P(cold|sunny)
W = cloudy   P(hot|cloudy)   P(cold|cloudy)

Which of the following is true?
  A. The probabilities in each row should sum to 1
  B. The probabilities in each column should sum to 1
  C. Both of the above
  D. None of the above

Example: Temperature = {hot, cold}; Weather = {sunny, cloudy}; P(Temperature | Weather)

SLIDE 12

Conditional Probability among Random Variables

P(X | Y) = P(Temperature | Weather) = P(Temperature ∧ Weather) / P(Weather)

P(X | Y) = P(X, Y) / P(Y) expresses the conditional probability of each possible value for X given each possible value for Y

             T = hot         T = cold
W = sunny    P(hot|sunny)    P(cold|sunny)
W = cloudy   P(hot|cloudy)   P(cold|cloudy)

Example: Temperature = {hot, cold}; Weather = {sunny, cloudy}; P(Temperature | Weather) contains P(T | Weather = sunny) and P(T | Weather = cloudy)

These are two JPDs!

A. The probabilities in each row should sum to 1

SLIDE 13

Recap: Inference by Enumeration

  • Great, we can compute arbitrary probabilities now!
  • Given
  • Prior joint probability distribution (JPD) on set of variables X
  • specific values e for the evidence variables E (subset of X)
  • We want to compute
  • posterior joint distribution of query variables Y (a subset of X) given evidence e

  • Step 1: Condition to get distribution P(X|e)
  • Step 2: Marginalize to get distribution P(Y|e)

Generally applicable, but memory-heavy and slow. We will see a better way to do probabilistic inference.

SLIDE 14

Bayes rule and Chain Rule

Bayes rule: P(h|e) = P(e|h) × P(h) / P(e); example: P(fire | alarm) = P(alarm | fire) × P(fire) / P(alarm)

SLIDE 15

Bayes rule and Chain Rule

SLIDE 16

Product Rule

  • By definition, we know that: P(f2 | f1) = P(f2 ∧ f1) / P(f1)
  • We can rewrite this to P(f2 ∧ f1) = P(f2 | f1) × P(f1)
  • In general: P(fn ∧ … ∧ f1) = P(fn | fn-1 ∧ … ∧ f1) × P(fn-1 ∧ … ∧ f1)

SLIDE 17

Chain Rule

Theorem (Chain Rule): P(f1 ∧ … ∧ fn) = ∏i=1..n P(fi | fi-1 ∧ … ∧ f1)

SLIDE 18

Chain Rule example

P(A,B,C,D) = P(D|A,B,C) × P(A,B,C)
           = P(D|A,B,C) × P(C|A,B) × P(A,B)
           = P(D|A,B,C) × P(C|A,B) × P(B|A) × P(A)
           = P(A) × P(B|A) × P(C|A,B) × P(D|A,B,C)

P(f1 ∧ … ∧ fn) = ∏i=1..n P(fi | fi-1 ∧ … ∧ f1)
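The chain rule can be checked numerically on any JPD. A sketch that verifies P(w,c,t) = P(w) × P(c|w) × P(t|w,c) on the P(W, C, T) table from the inference example (the helper function and encoding are assumptions of this sketch):

```python
pwct = {
    ("yes", "no", "hot"): 0.04, ("yes", "no", "mild"): 0.09, ("yes", "no", "cold"): 0.07,
    ("yes", "yes", "hot"): 0.01, ("yes", "yes", "mild"): 0.10, ("yes", "yes", "cold"): 0.12,
    ("no", "no", "hot"): 0.06, ("no", "no", "mild"): 0.11, ("no", "no", "cold"): 0.03,
    ("no", "yes", "hot"): 0.04, ("no", "yes", "mild"): 0.25, ("no", "yes", "cold"): 0.08,
}

def marg(prefix):
    """Sum of all rows whose first len(prefix) entries match prefix."""
    return sum(p for key, p in pwct.items() if key[:len(prefix)] == prefix)

for (w, c, t), p in pwct.items():
    p_w = marg((w,))                   # P(W=w)
    p_c_given_w = marg((w, c)) / p_w   # P(C=c | W=w)
    p_t_given_wc = p / marg((w, c))    # P(T=t | W=w, C=c)
    assert abs(p - p_w * p_c_given_w * p_t_given_wc) < 1e-9
```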

SLIDE 19

Chain Rule

  • Allows representing a Joint Probability Distribution (JPD) as the product of conditional probability distributions

Theorem (Chain Rule): P(f1 ∧ … ∧ fn) = ∏i=1..n P(fi | fi-1 ∧ … ∧ f1)

SLIDE 20

Why does the chain rule help us?

We will see how, under specific circumstances (variable independence), this rule helps gain compactness

  • We can represent the JPD as a product of marginal

distributions

  • We can simplify some terms when the variables

involved are marginally independent or conditionally independent

SLIDE 21

Lecture Overview

  • Recap lecture 18
  • Marginal Independence
  • Conditional Independence
  • Bayesian Networks Introduction

SLIDE 22

Marginal Independence

  • Intuitively: if X ╨ Y, then
  • learning that Y=y does not change your belief in X
  • and this is true for all values y that Y could take
  • For example, weather is marginally independent of the result of a coin toss

SLIDE 23

Examples for marginal independence

  • Is Temperature marginally

independent of Weather (see previous example)?

Weather W   Temperature T   P(W,T)
sunny       hot             0.10
sunny       mild            0.20
sunny       cold            0.10
cloudy      hot             0.05
cloudy      mild            0.35
cloudy      cold            0.20

SLIDE 24

T      P(T|W=sunny)
hot    0.25
mild   0.50
cold   0.25

T      P(T)
hot    0.15
mild   0.55
cold   0.30

Weather W   Temperature T   P(W,T)
sunny       hot             0.10
sunny       mild            0.20
sunny       cold            0.10
cloudy      hot             0.05
cloudy      mild            0.35
cloudy      cold            0.20

  • Is Temperature marginally independent of Weather (see previous example)?
  A. yes
  B. no
  C. It depends on the value of T
  D. It depends on the value of W
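Numerically, marginal independence would require P(T=t | W=w) = P(T=t) for every t and w. A sketch that checks this for the tables above (encodings assumed as before):

```python
joint = {
    ("sunny", "hot"): 0.10, ("sunny", "mild"): 0.20, ("sunny", "cold"): 0.10,
    ("cloudy", "hot"): 0.05, ("cloudy", "mild"): 0.35, ("cloudy", "cold"): 0.20,
}

# Marginal P(T): sum out Weather.
P_T = {}
for (w, t), p in joint.items():
    P_T[t] = P_T.get(t, 0.0) + p
print(P_T)  # {'hot': 0.15, 'mild': 0.55, 'cold': 0.30}

# Conditional P(T | W=sunny).
p_sunny = sum(p for (w, t), p in joint.items() if w == "sunny")
P_T_sunny = {t: p / p_sunny for (w, t), p in joint.items() if w == "sunny"}
print(P_T_sunny)  # {'hot': 0.25, 'mild': 0.5, 'cold': 0.25}

# The two distributions differ, so T and W are NOT marginally independent (answer B).
```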
SLIDE 25

Examples for marginal independence

Is Weather marginally independent of temperature?

  • No. We saw before that knowing the Weather changes our belief about the Temperature (and vice versa)

  • E.g. P(hot) = 0.15

P(hot|sunny) = 0.25

T      P(T|W=sunny)
hot    0.25
mild   0.50
cold   0.25

T      P(T)
hot    0.15
mild   0.55
cold   0.30

SLIDE 26

Examples for marginal independence

Is Weather marginally independent of Temperature?

  • We could have answered this question even without having the

relevant probability distributions.

  • Meteorological knowledge tells us that the weather influences the temperature, so information on what the weather is like should change one's belief on the temperature
  • In fact, for knowledge representation purposes, the evaluation of independence among variables will generally need to be made without numbers, based on pre-existing domain knowledge or assumptions

SLIDE 27

Examples for marginal independence

  • Intuitively (without numbers):
  • Boolean random variable “Canucks win the Stanley Cup this season”
  • Numerical random variable "Canucks' revenue last season"
  • Are the two marginally independent?
  A. yes
  B. no
  C. It depends on the value of Canucks Win SC
  D. It depends on the value of Canucks Revenue

SLIDE 28

Examples for marginal independence

  • Intuitively (without numbers):
  • Boolean random variable "Canucks win the Stanley Cup this season"
  • Numerical random variable "Canucks' revenue last season"

  • Are the two marginally independent?

No! Without revenue they cannot afford to keep their best players

SLIDE 29

Exploiting marginal independence

Recall the product rule: P(X=x ∧ Y=y) = P(X=x | Y=y) × P(Y=y)

If X and Y are marginally independent, P(X=x | Y=y) = P(X=x)

Thus we have P(X=x ∧ Y=y) = P(X=x) × P(Y=y)

In distribution form: P(X,Y) = P(X) × P(Y)
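Under marginal independence the JPD never has to be stored explicitly: it can be rebuilt from the marginals. A sketch with a coin toss and the weather (the independence is the slides' earlier example; the coin and weather numbers here are illustrative assumptions):

```python
P_coin = {"heads": 0.5, "tails": 0.5}
P_weather = {"sunny": 0.4, "cloudy": 0.6}  # illustrative marginal

# P(X, Y) = P(X) * P(Y): the full table is implied by the two marginals.
P_joint = {(c, w): pc * pw
           for c, pc in P_coin.items()
           for w, pw in P_weather.items()}
assert abs(sum(P_joint.values()) - 1.0) < 1e-9
print(P_joint[("heads", "sunny")])  # 0.5 * 0.4 = 0.2
```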

SLIDE 30

Exploiting marginal independence

SLIDE 31

Exploiting marginal independence

  A. 2^n
  B. 2n
  C. 2+n
  D. n^2

SLIDE 32

Exploiting marginal independence

Exponentially fewer than the JPD!

SLIDE 33

A   B   C   D   P(A,B,C,D)
T   T   T   T
T   T   T   F
T   T   F   T
T   T   F   F
T   F   T   T
T   F   T   F
T   F   F   T
T   F   F   F
F   T   T   T
F   T   T   F
F   T   F   T
F   T   F   F
F   F   T   T
F   F   T   F
F   F   F   T
F   F   F   F

Given the binary variables A, B, C, D, to specify P(A,B,C,D) one needs the JPD above.

To specify P(A) × P(B) × P(C) × P(D) one needs only the four distributions below:

A   P(A)        B   P(B)        C   P(C)        D   P(D)
T               T               T               T
F               F               F               F
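The storage difference is easy to compute. A small sketch comparing the JPD table size with the total size of the one-variable tables for n independent Boolean variables:

```python
n = 4  # binary variables A, B, C, D
jpd_entries = 2 ** n       # 16 rows in the table above
marginal_entries = 2 * n   # 4 tables of 2 entries each
print(jpd_entries, marginal_entries)  # 16 vs 8: exponentially fewer as n grows
```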

SLIDE 34

Lecture Overview

  • Recap lecture 18
  • Marginal Independence
  • Conditional Independence
  • Bayesian Networks Introduction

SLIDE 35

Conditional Independence

  • Intuitively: if X and Y are conditionally independent given Z,

then

  • learning that Y=y does not change your belief in X

when we already know Z=z

  • and this is true for all values y that Y could take

and all values z that Z could take

SLIDE 36

Example for Conditional Independence

  • Whether light l1 is lit (Lit-l1) and the position of switch s2 (Up-s2) are not marginally independent
  • The position of the switch determines whether there is power in the wire w0 connected to the light
  • However, whether light l1 is lit is conditionally independent from the position of switch s2 given whether there is power in wire w0 (Power-w0)
  • Once we know Power-w0, learning values for Up-s2 does not change our beliefs about Lit-l1
  • I.e., Lit-l1 is conditionally independent of Up-s2 given Power-w0

[Diagrams: Lit-l1 — Up-s2; Up-s2 → Power-w0 → Lit-l1]

SLIDE 37

Another example of conditionally but not marginally independent variables

  • ExamGrade and AssignmentGrade are not marginally

independent

  • Students who do well on one typically do well on the other, and vice versa

  • But, conditional on UnderstoodMaterial, they are independent
  • Variable UnderstoodMaterial is a common cause of variables

ExamGrade and AssignmentGrade

  • Knowing UnderstoodMaterial shields any information we could get from AssignmentGrade about ExamGrade (and vice versa)

[Diagrams: AssignmentGrade — ExamGrade; UnderstoodMaterial → AssignmentGrade, UnderstoodMaterial → ExamGrade]

SLIDE 38

Example: marginally but not conditionally independent

Two variables can be marginally but not conditionally independent

  • "Smoking At Sensor" S: resident smokes cigarette next to fire sensor
  • "Fire" F: there is a fire somewhere in the building
  • "Alarm" A: the fire alarm rings
  • S and F are marginally independent: learning S=true or S=false does not change your belief in F, and vice versa
  • But they are not conditionally independent given Alarm: they are alternative causes for the alarm ringing, so evidence on one of the two causes reduces the belief in the other if the alarm rings
  • E.g., if the alarm rings and you learn S=true, your belief in F decreases

[Diagram: Smoking At Sensor → Alarm ← Fire]
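This "explaining away" pattern can be reproduced numerically. A minimal sketch, where all the CPT numbers are made up purely for illustration; only the structure (S and F with independent priors, A depending on both) comes from the slide:

```python
P_S = {True: 0.2, False: 0.8}    # illustrative prior for Smoking At Sensor
P_F = {True: 0.01, False: 0.99}  # illustrative prior for Fire
P_A = {(True, True): 0.99, (True, False): 0.90,   # illustrative P(A=True | S, F)
       (False, True): 0.95, (False, False): 0.01}

# Joint P(S, F, A) = P(S) * P(F) * P(A | S, F).
joint_sfa = {(s, f, a): P_S[s] * P_F[f] * (P_A[s, f] if a else 1 - P_A[s, f])
             for s in (True, False) for f in (True, False) for a in (True, False)}

def p_fire(alarm, smoking=None):
    """P(F=True | A=alarm [, S=smoking]) by enumeration over the joint."""
    rows = {k: p for k, p in joint_sfa.items()
            if k[2] == alarm and (smoking is None or k[0] == smoking)}
    return sum(p for k, p in rows.items() if k[1]) / sum(rows.values())

print(p_fire(alarm=True))                # ~0.049: the alarm raises belief in fire
print(p_fire(alarm=True, smoking=True))  # ~0.011: S=true "explains away" the alarm
```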

SLIDE 39

Conditional vs. Marginal Independence

Two variables can be conditionally but not marginally independent

  • ExamGrade and AssignmentGrade
  • ExamGrade and AssignmentGrade given UnderstoodMaterial
  • Lit-l1 and Up-s2
  • Lit-l1 and Up-s2 given Power_w0

Marginally but not conditionally independent

  • SmokingAtSensor and Fire
  • SmokingAtSensor and Fire given Alarm

Both marginally and conditionally independent

  • CanucksWinStanleyCup and Lit_l1
  • CanucksWinStanleyCup and Lit_l1 given Power_w0

Neither marginally nor conditionally independent

  • Temperature and Cloudiness
  • Temperature and Cloudiness given Wind


SLIDE 40

Exploiting Conditional Independence

  • Example 1: Boolean variables A,B,C
  • C is conditionally independent of A given B
  • We can then rewrite P(C | A,B) as P( )

SLIDE 41

Exploiting Conditional Independence

  • Example 1: Boolean variables A,B,C
  • C is conditionally independent of A given B
  • We can then rewrite P(C | A,B) as P(C|B)

SLIDE 42

Exploiting Conditional Independence

Example 2: Boolean variables A,B,C,D

  • D is conditionally independent of both A and B given C

We can rewrite P(D | A,B,C) as P( )

SLIDE 43

Exploiting Conditional Independence

Example 2: Boolean variables A,B,C,D

  • D is conditionally independent of both A and B given C

We can rewrite P(D | A,B,C) as P(D|C)

  • P(D|C) is much simpler to specify than P(D | A,B,C)!
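The saving is concrete for Boolean variables: a conditional probability table needs one distribution per assignment to the conditioning variables. A quick sketch of the counts (these numbers are worked out in the next slides):

```python
# Distributions needed to specify P(D | parents) for Boolean variables:
print(2 ** 3)  # P(D | A,B,C): 8 distributions, one per assignment to A,B,C
print(2 ** 1)  # P(D | C): 2 distributions, one per value of C
```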

SLIDE 44

If A, B, C, D are Boolean variables, P(D | A,B,C) is given by the following table:

A   B   C   P(D=T|A,B,C)   P(D=F|A,B,C)
T   T   T
T   T   F
T   F   T
T   F   F
F   T   T
F   T   F
F   F   T
F   F   F

P(D|C) is given by the following table:

C   P(D=T|C)
T
F


How many probability distributions does this table represent?

  A. 2
  B. 4
  C. 8
  D. 1

SLIDE 45

If A, B, C, D are Boolean variables, P(D | A,B,C) is given by the following table:

A   B   C   P(D=T|A,B,C)   P(D=F|A,B,C)
T   T   T
T   T   F
T   F   T
T   F   F
F   T   T
F   T   F
F   F   T
F   F   F

P(D|C) is given by the following table:

C   P(D=T|C)   P(D=F|C)
T
F


How many probability distributions does this table represent?

8 – each row represents the probability distribution for D given the values that A, B and C take in that row

SLIDE 46

If A, B, C, D are Boolean variables, P(D | A,B,C) is given by the following table:

A   B   C   P(D=T|A,B,C)   P(D=F|A,B,C)
T   T   T
T   T   F
T   F   T
T   F   F
F   T   T
F   T   F
F   F   T
F   F   F

P(D|C) is given by the following table:

C   P(D=T|C)   P(D=F|C)
T
F


2 – each row represents the probability distribution for D given the value that C takes in that row

SLIDE 47

Putting It All Together

  • Given the JPD P(A,B,C,D), we can apply the chain rule to get:

P(A,B,C,D) = P(A) × P(B|A) × P(C|A,B) × P(D|A,B,C)

  • If D is conditionally independent of A and B given C, we can rewrite the above as:

P(A,B,C,D) = P(A) × P(B|A) × P(C|A,B) × P(D|C)

  • The chain rule allows us to write the JPD as a product of conditional distributions
  • Conditional independence allows us to write them more compactly
  • Under independence we gain compactness (fewer/smaller distributions to deal with)
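Counting the numbers needed makes the compactness gain explicit. A sketch for the Boolean example above (each distribution over a Boolean variable needs one free number):

```python
full_jpd = 2 ** 4 - 1        # P(A,B,C,D) directly: 15 free numbers
chain_rule = 1 + 2 + 4 + 8   # P(A), P(B|A), P(C|A,B), P(D|A,B,C): still 15
with_ci = 1 + 2 + 4 + 2      # P(A), P(B|A), P(C|A,B), P(D|C): only 9
print(full_jpd, chain_rule, with_ci)
```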
SLIDE 48

Learning Goals For Probability so far
  • Define and give examples of random variables, their domains and probability

distributions

  • Calculate the probability of a proposition f given µ(w) for the set of possible worlds
  • Define a joint probability distribution (JPD)
  • Marginalize over specific variables to compute distributions over any subset of

the variables

  • Given a JPD
  • Marginalize over specific variables
  • Compute distributions over any subset of the variables
  • Apply the formula to compute conditional probability P(h|e)
  • Use inference by enumeration
  • to compute joint posterior probability distributions over any subset of variables

given evidence

  • Derive and use Bayes Rule
  • Derive the Chain Rule
  • Define and use marginal independence
  • Define and use conditional independence

SLIDE 49

Bayesian (or Belief) Networks

  • Bayesian networks and their extensions are

Representation & Reasoning systems explicitly defined to exploit independence in probabilistic reasoning

SLIDE 50

Lecture Overview

  • Recap lecture 18
  • Marginal Independence
  • Conditional Independence
  • Bayesian Networks Introduction


FINALLY!

SLIDE 51

Bayesian Network Motivation

  • We want a representation and reasoning system that is

based on conditional (and marginal) independence

  • Compact yet expressive representation
  • Efficient reasoning procedures
  • Bayesian (Belief) Networks are such a representation
  • Named after Thomas Bayes (ca. 1702 –1761)
  • Term coined in 1985 by Judea Pearl (1936 – )
  • Their invention changed the primary focus of AI from logic to probability!

In 2012 Pearl received the very prestigious ACM Turing Award for his contributions to Artificial Intelligence!

SLIDE 52

Bayesian Networks: Intuition

  • A graphical representation for a joint probability distribution
  • Nodes are random variables
  • Directed edges between nodes reflect dependence
  • Some informal examples:

[Diagrams: UnderstoodMaterial → {AssignmentGrade, ExamGrade}; {SmokingAtSensor, Fire} → Alarm; Up-s2 → Power-w0 → Lit-l1]

SLIDE 53

Belief (or Bayesian) networks

Def. A Belief network consists of:

  • a directed, acyclic graph (DAG) where each node is associated with a random variable Xi
  • a domain for each variable Xi
  • a set of conditional probability distributions for each node Xi given its parents Pa(Xi) in the graph: P(Xi | Pa(Xi))
  • The parents Pa(Xi) of a variable Xi are those variables Xi directly depends on
  • A Bayesian network is a compact representation of the JPD for a set of variables (X1, …, Xn):

P(X1, …, Xn) = ∏i=1..n P(Xi | Pa(Xi))
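A minimal sketch of this definition in Python, using the Fire/Smoking/Alarm structure from earlier with illustrative (made-up) CPT numbers; the joint probability of a full assignment is the product ∏ P(Xi | Pa(Xi)):

```python
# Each node: (list of parents, CPT mapping parent values -> P(node=True)).
network = {
    "S": ([], {(): 0.2}),    # Smoking At Sensor (illustrative prior)
    "F": ([], {(): 0.01}),   # Fire (illustrative prior)
    "A": (["S", "F"], {(True, True): 0.99, (True, False): 0.90,
                       (False, True): 0.95, (False, False): 0.01}),
}

def joint_prob(assignment):
    """P(X1,...,Xn) as the product over nodes of P(Xi | Pa(Xi))."""
    prob = 1.0
    for var, (parents, cpt) in network.items():
        p_true = cpt[tuple(assignment[p] for p in parents)]
        prob *= p_true if assignment[var] else 1.0 - p_true
    return prob

print(joint_prob({"S": False, "F": True, "A": True}))  # 0.8 * 0.01 * 0.95 = 0.0076
```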

SLIDE 54

Bayesian Networks: Definition

  • Discrete Bayesian networks:
  • Domain of each variable is finite
  • Conditional probability distribution is a conditional probability table
  • We will assume this discrete case

But everything we say about independence (marginal & conditional) carries over to the continuous case

Def. A Belief network consists of:

  • a directed, acyclic graph (DAG) where each node is associated with a random variable Xi
  • a domain for each variable Xi
  • a set of conditional probability distributions for each node Xi given its parents Pa(Xi) in the graph: P(Xi | Pa(Xi))