
SLIDE 1

CS 440/ECE448 Lecture 19: Bayes Net Inference

Mark Hasegawa-Johnson, 3/2019; modified by Julia Hockenmaier, 3/2019. Including slides by Svetlana Lazebnik, 11/2016.

SLIDE 2

CS440/ECE448 Lecture 19: Bayesian Networks and Bayes Net Inference

Slides by Svetlana Lazebnik, 10/2016; modified by Mark Hasegawa-Johnson, 3/2019, and Julia Hockenmaier, 3/2019.

SLIDE 3

Today’s lecture

  • Bayesian Networks (Bayes Nets)
  • A graphical representation of probabilistic models
  • Capture conditional (in)dependencies between random variables
  • Inference and Learning in Bayes Nets
  • Inference = Reasoning
  • Learning = Parameter estimation


SLIDE 4

Review: Bayesian inference

A general scenario:

  • Query variables: X
  • Evidence (observed) variables and their values: E = e

Inference problem: answer questions about the query variables given the evidence variables. This can be done using the posterior distribution P(X | E = e). An example of a useful question: which value of X is true? More formally: what value of X has the least probability of being wrong? Answer: MPE = MAP (argmin P(error) = argmax P(X = x | E = e)).
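As a tiny illustration of this decision rule (the posterior values below are invented, not from the slides), picking the MAP value is just an argmax over the posterior table:

```python
# Minimal sketch: the MAP decision from a posterior table.
# The posterior values here are made up for illustration.
posterior = {"cavity": 0.6, "gum_disease": 0.3, "healthy": 0.1}  # P(X=x | E=e)

# MPE = MAP: the value of X that minimizes P(error) is the one
# that maximizes the posterior P(X=x | E=e).
x_map = max(posterior, key=posterior.get)
print(x_map)  # -> "cavity"
```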

SLIDE 5

Today: What if P(X,E) is complicated?

  • Very, very common problem: P(X,E) is complicated because both X and E depend on some hidden variable Y
  • SOLUTION:
  • Represent the dependencies as a graph
  • When your algorithm performs inference, make sure it does so in the order of the graph
  • FORMALISM: Bayesian Network
SLIDE 6

Bayesian Inference with Hidden Variables

  • A general scenario:
  • Query variables: X
  • Evidence (observed) variables and their values: E = e
  • Hidden (unobserved) variables: Y
  • Inference problem: answer questions about the query variables given the evidence variables
  • This can be done using the posterior distribution P(X | E = e)
  • In turn, the posterior needs to be derived from the full joint P(X, E, Y)
  • Bayesian networks are a tool for representing joint probability distributions efficiently

P(X | E = e) = P(X, e) / P(e) ∝ Σy P(X, e, y)

SLIDE 7

Bayesian Networks

SLIDE 8

Bayesian networks

  • More commonly called graphical models
  • A way to depict conditional independence relationships between random variables
  • A compact specification of full joint distributions
SLIDE 9

Independence

  • Random variables X and Y are independent (X⊥Y) if P(X,Y) = P(X) × P(Y)

NB: since X and Y are random variables (not individual events), P(X,Y) = P(X) × P(Y) is an abbreviation for: ∀x∀y P(X=x, Y=y) = P(X=x) × P(Y=y)

  • X and Y are conditionally independent given Z (X⊥Y | Z) if P(X,Y | Z) = P(X | Z) × P(Y | Z)

The value of X depends on the value of Z, and the value of Y depends on the value of Z, so X and Y are not independent.
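A quick numerical check of the definition (the joint table below is made up for illustration): two variables are independent exactly when the joint equals the outer product of its marginals.

```python
import numpy as np

# Minimal sketch: test P(X,Y) = P(X)P(Y) for two binary variables,
# given their joint distribution as a 2x2 table (values made up).
joint = np.array([[0.08, 0.32],   # rows: X=0, X=1
                  [0.12, 0.48]])  # cols: Y=0, Y=1

p_x = joint.sum(axis=1)  # marginal P(X)
p_y = joint.sum(axis=0)  # marginal P(Y)

# Independent iff the joint is the outer product of the marginals.
print(np.allclose(joint, np.outer(p_x, p_y)))  # True for this table
```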


SLIDE 10

Bayesian networks

  • Insight: (conditional) independence assumptions are essential for probabilistic modeling
  • Bayes Net: a directed graph that represents the joint distribution of a number of random variables
  • Nodes = random variables
  • Directed edges = dependencies


SLIDE 11

Bayesian networks

  • Nodes: random variables
  • Edges: dependencies
  • An edge from one variable (parent) to another (child) indicates direct influence (conditional probabilities)
  • Edges must form a directed, acyclic graph
  • Each node is conditioned on its parents: P(X | Parents(X)). These conditional distributions are the parameters of the network.
  • Each node is conditionally independent of its non-descendants given its parents

Example: we have four random variables. Weather is independent of Cavity, Toothache, and Catch; Toothache and Catch both depend on Cavity.

SLIDE 12

Conditional independence and the joint distribution

  • Key property: each node is conditionally independent of its non-descendants given its parents
  • Suppose the nodes X1, …, Xn are sorted in topological order of the graph (i.e., if Xi is a parent of Xj, then i < j)
  • To get the joint distribution P(X1, …, Xn), use the chain rule (step 1 below) and then take advantage of the independencies (step 2)

P(X1, …, Xn) = ∏i=1..n P(Xi | X1, …, Xi−1)    (1: chain rule)
             = ∏i=1..n P(Xi | Parents(Xi))    (2: conditional independence)

SLIDE 13

The joint probability distribution

P(j, m, a, ¬b,¬e) = P(¬b) P(¬e) P(a|¬b,¬e) P(j|a) P(m|a)

P(X1, …, Xn) = ∏i=1..n P(Xi | Parents(Xi))

SLIDE 14

Example: N independent coin flips

  • Complete independence: no interactions: P(X1, …, Xn) = P(X1) P(X2) ⋯ P(Xn)

[Figure: nodes X1, X2, …, Xn with no edges between them]

SLIDE 15

Conditional probability distributions

  • To specify the full joint distribution, we need to specify a conditional distribution for each node given its parents: P(X | Parents(X))

[Figure: parents Z1, …, Zn with a common child X, parameterized by P(X | Z1, …, Zn)]

SLIDE 16

Naïve Bayes document model

  • Random variables:
  • X: document class
  • W1, …, Wn: words in the document
  • Dependencies: P(X), P(W1 | X), …, P(Wn | X)

[Figure: root X with children W1, W2, …, Wn]
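A minimal sketch of this model in code, with made-up parameters for two classes and three binary word variables; the posterior over the class follows directly from the factorization P(X) ∏i P(Wi | X):

```python
import numpy as np

# Naive Bayes sketch: P(X) * prod_i P(W_i | X), with invented parameters
# for two classes and three binary word features.
p_x = np.array([0.5, 0.5])                 # P(X): class prior
p_w_given_x = np.array([[0.8, 0.1],        # P(W_i=1 | X) for each word i;
                        [0.3, 0.6],        # columns are the two classes
                        [0.2, 0.7]])

def posterior(words):
    """P(X | W_1..W_n) for an observed binary word vector."""
    # Likelihood: prod_i P(W_i = w_i | X), via the Bayes net factorization
    likelihood = np.where(np.array(words)[:, None] == 1,
                          p_w_given_x, 1 - p_w_given_x).prod(axis=0)
    joint = p_x * likelihood
    return joint / joint.sum()             # normalize over the classes

print(posterior([1, 0, 0]))  # a document containing only word 1
```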

SLIDE 17

Example: Los Angeles Burglar Alarm

  • I have a burglar alarm that is sometimes set off by minor earthquakes. My two neighbors, John and Mary, promised to call me at work if they hear the alarm.
  • Example inference task: suppose Mary calls and John doesn't call. What is the probability of a burglary?
  • What are the random variables?
  • Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
  • What are the direct influence relationships?
  • A burglar can set the alarm off
  • An earthquake can set the alarm off
  • The alarm can cause Mary to call
  • The alarm can cause John to call
SLIDE 18

Example: Burglar Alarm

P(B) P(E) P(A|B,E) P(M|A) P(J|A)

  • A "model" is a complete specification of the dependencies.
  • The conditional probability tables are the model parameters.

SLIDE 19

Example: Burglar Alarm

SLIDE 20

Outline

  • Review: Bayesian inference
  • Bayesian network: graph semantics
  • The Los Angeles burglar alarm example
  • Conditional independence ≠ Independence
  • Constructing a Bayesian network: Structure learning
  • Constructing a Bayesian network: Hire an expert
SLIDE 21

Independence

  • By saying that X1 and X2 are independent, we mean that P(X1, X2) = P(X1) P(X2)
  • X1 and X2 are independent if and only if they have no common ancestors
  • Example: independent coin flips
  • Another example: Weather is independent of all other variables in this model.

[Figure: nodes X1, X2, …, Xn with no edges between them]

SLIDE 22

Conditional independence

  • By saying that W1 and W2 are conditionally independent given X, we mean that P(W1, W2 | X) = P(W1 | X) P(W2 | X)
  • W1 and W2 are conditionally independent given X if and only if they have no common ancestors other than the ancestors of X.
  • Example: naïve Bayes model:

[Figure: root X with children W1, W2, …, Wn]

SLIDE 23

Common cause: Conditionally Independent. Common effect: Independent.

Common cause (X ← Y → Z):
  • Are X and Z independent? No: P(X, Z) = Σy P(X|y) P(Z|y) P(y), while P(X) P(Z) = [Σy P(X|y) P(y)] [Σy P(Z|y) P(y)]
  • Are they conditionally independent given Y? Yes: P(X, Z | Y) = P(X|Y) P(Z|Y)

Common effect (X → Y ← Z):
  • Are X and Z independent? Yes: P(X, Z) = P(X) P(Z)
  • Are they conditionally independent given Y? No: P(X, Z | Y) = P(Y | X, Z) P(X) P(Z) / P(Y) ≠ P(X|Y) P(Z|Y)

Conditional independence ≠ Independence

SLIDE 24

Common cause: Conditionally Independent. Common effect: Independent.

Common cause:
  • Are X and Z independent? No. Knowing X tells you about Y, which tells you about Z.
  • Are they conditionally independent given Y? Yes. If you already know Y, then X gives you no useful information about Z.

Common effect:
  • Are X and Z independent? Yes. Knowing X tells you nothing about Z.
  • Are they conditionally independent given Y? No. If Y is true, then either X or Z must be true. Knowing that X is false means Z must be true. We say that X "explains away" Z.

Conditional independence ≠ Independence

SLIDE 25

Conditional independence ≠ Independence

Being conditionally independent given X does NOT mean that W1 and W2 are independent. Quite the opposite. For example:

  • The document topic, X, can be either "sports" or "pets", equally probable.
  • W1=1 if the document contains the word "food," otherwise W1=0.
  • W2=1 if the document contains the word "dog," otherwise W2=0.
  • Suppose you don't know X, but you know that W2=1 (the document has the word "dog"). Does that change your estimate of P(W1=1)?

[Figure: root X with children W1, W2, …, Wn]

SLIDE 26

Conditional independence

Another example: causal chain (X → Y → Z)

  • X and Z are conditionally independent given Y, because they have no common ancestors other than the ancestors of Y.
  • Being conditionally independent given Y does NOT mean that X and Z are independent. Quite the opposite. For example, suppose P(X) = 0.5, P(Y|X) = 0.8, P(Y|¬X) = 0.1, P(Z|Y) = 0.7, and P(Z|¬Y) = 0.4. Then we can calculate that P(Z|X) = 0.64, but P(Z) = 0.535.
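These numbers can be checked mechanically; the short script below reproduces P(Z|X) = 0.64 and P(Z) = 0.535:

```python
# Verifying the causal-chain numbers: X -> Y -> Z with
# P(X)=0.5, P(Y|X)=0.8, P(Y|~X)=0.1, P(Z|Y)=0.7, P(Z|~Y)=0.4.
P_x = 0.5
P_y_x, P_y_nx = 0.8, 0.1
P_z_y, P_z_ny = 0.7, 0.4

P_z_given_x = P_z_y * P_y_x + P_z_ny * (1 - P_y_x)   # = 0.64
P_y = P_y_x * P_x + P_y_nx * (1 - P_x)               # = 0.45
P_z = P_z_y * P_y + P_z_ny * (1 - P_y)               # = 0.535
print(P_z_given_x, P_z)  # 0.64 vs 0.535, so X and Z are dependent
```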

SLIDE 27

Outline

  • Review: Bayesian inference
  • Bayesian network: graph semantics
  • The Los Angeles burglar alarm example
  • Conditional independence ≠ Independence
  • Constructing a Bayesian network: Structure learning
  • Constructing a Bayesian network: Hire an expert
SLIDE 28

Constructing a Bayes Network: Two Methods

  • 1. "Structure Learning," a.k.a. "Analysis of Causality":
  1. Suppose you know the variables, but you don't know which variables depend on which others. You can learn this from data.
  2. This is an exciting new area of research in statistics, where it goes by the name of "analysis of causality."
  3. … but it's almost always harder than method #2. You should know how to do this in very simple examples (like the Los Angeles burglar alarm), but you don't need to know how to do this in the general case.

  • 2. "Hire an Expert":
  1. Find somebody who knows how to solve the problem.
  2. Get her to tell you what the important variables are, and which variables depend on which others.
  3. THIS IS ALMOST ALWAYS THE BEST WAY.

SLIDE 29

Constructing Bayesian networks: Structure Learning

  • 1. Choose an ordering of variables X1, …, Xn
  • 2. For i = 1 to n:
  • add Xi to the network
  • Check your training data. If there is any variable among X1, …, Xi−1 that CHANGES the probability of Xi=1, then add that variable to the set Parents(Xi), so that P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi−1) (see the sketch after this list)
  • 3. Repeat the above steps for every possible ordering (complexity: n!).
  • 4. Choose the graph that has the smallest number of edges.
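Below is a toy sketch of this procedure for one ordering. Instead of raw training data it uses an exact joint distribution as an "infinite data" oracle; the helper names and the three-variable demo chain are ours, not the slides':

```python
import itertools

def prob(joint, fixed):
    """P(X_j = v for all (j, v) in fixed), by summing the joint table."""
    return sum(p for a, p in joint.items()
               if all(a[j] == v for j, v in fixed.items()))

def cond(joint, i, given):
    """P(X_i = 1 | given), or None if the context has zero probability."""
    z = prob(joint, given)
    return prob(joint, {**given, i: 1}) / z if z > 0 else None

def agrees(joint, i, subset, preds):
    """Does P(X_i | subset) equal P(X_i | all predecessors) everywhere?"""
    for vals in itertools.product([0, 1], repeat=len(preds)):
        full = dict(zip(preds, vals))
        p_full = cond(joint, i, full)
        if p_full is None:
            continue  # zero-probability context; nothing to match
        p_sub = cond(joint, i, {j: v for j, v in full.items() if j in subset})
        if abs(p_full - p_sub) > 1e-9:
            return False
    return True

def parents(joint, order):
    """Smallest predecessor set per variable, as in step 2 of the slide."""
    out = {}
    for k, i in enumerate(order):
        preds = order[:k]
        out[i] = next(s for r in range(len(preds) + 1)
                      for s in itertools.combinations(preds, r)
                      if agrees(joint, i, s, preds))
    return out

# Demo: chain X0 -> X1 -> X2, so ordering (0,1,2) should give X2 parent {X1}.
P1 = {0: 0.2, 1: 0.9}   # P(X1=1 | X0)
P2 = {0: 0.4, 1: 0.7}   # P(X2=1 | X1)
joint = {(a, b, c): (0.3 if a else 0.7) *
         (P1[a] if b else 1 - P1[a]) * (P2[b] if c else 1 - P2[b])
         for a, b, c in itertools.product([0, 1], repeat=3)}
print(parents(joint, [0, 1, 2]))  # {0: (), 1: (0,), 2: (1,)}
```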

SLIDES 30–37

Example

  • Suppose we choose the ordering M, J, A, B, E

(Figures omitted: the network is built up one variable at a time in this order, asking at each step which of the already-added variables must be parents.)

SLIDE 38

Example contd.

  • Deciding conditional independence is hard in noncausal directions
  • The causal direction seems much more natural
  • Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed (vs. 1 + 1 + 4 + 2 + 2 = 10 for the causal ordering)

SLIDE 39

Why store it in causal order? A: Saves memory

  • Suppose we have a Boolean variable Xi with k Boolean parents. How many rows does its conditional probability table have?
  • 2^k rows for all the combinations of parent values
  • Each row requires one number for P(Xi = true | parent values)
  • If each variable has no more than k parents, how many numbers does the complete network require?
  • O(n · 2^k) numbers, vs. O(2^n) for the full joint distribution
  • How many numbers for the burglary network?

1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31)
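The arithmetic on this slide is easy to check in a couple of lines; the parent sets are read off the burglary network:

```python
# One number per row of each CPT, one row per combination of parent values.
parent_sets = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

numbers = sum(2 ** len(p) for p in parent_sets.values())
print(numbers)                    # 1 + 1 + 4 + 2 + 2 = 10
print(2 ** len(parent_sets) - 1)  # 31 numbers for the full joint over 5 Booleans
```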

SLIDE 40

Outline

  • Review: Bayesian inference
  • Bayesian network: graph semantics
  • The Los Angeles burglar alarm example
  • Conditional independence ≠ Independence
  • Constructing a Bayesian network: Structure learning
  • Constructing a Bayesian network: Hire an expert
SLIDE 41

A more realistic Bayes Network: Car diagnosis

  • Initial observation: car won’t start
  • Orange: “broken, so fix it” nodes
  • Green: testable evidence
  • Gray: “hidden variables” to ensure sparse structure, reduce parameters
SLIDE 42

Car insurance

SLIDE 43

In research literature…

Karen Sachs, Omar Perez, Dana Pe'er, Douglas A. Lauffenburger, and Garry P. Nolan, "Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data," Science 308(5721):523, 22 April 2005.

SLIDE 44

In research literature…

Describing Visual Scenes Using Transformed Objects and Parts

  • E. Sudderth, A. Torralba, W. T. Freeman, and A. Willsky. International Journal of Computer Vision, No. 1-3, May 2008, pp. 291-330.

SLIDE 45

In research literature…

Audiovisual Speech Recognition with Articulator Positions as Hidden Variables Mark Hasegawa-Johnson, Karen Livescu, Partha Lal and Kate Saenko International Congress on Phonetic Sciences 1719:299-302, 2007

SLIDE 46

In research literature…

Detecting interaction links in a collaborating group using manually annotated data

  • S. Mathur, M.S. Poole, F. Pena-Mora, M. Hasegawa-Johnson, N. Contractor. Social Networks, doi:10.1016/j.socnet.2012.04.002

SLIDE 47

In research literature…

Detecting interaction links in a collaborating group using manually annotated data

  • S. Mathur, M.S. Poole, F. Pena-Mora, M. Hasegawa-Johnson, N. Contractor. Social Networks, doi:10.1016/j.socnet.2012.04.002

  • Link: Lij = 1 if #i is listening to #j.
  • Indirect: Iij = 1 if #i and #j are both listening to the same person.
  • Speaking: Si = 1 if the i'th person is speaking.
  • Gaze: Gij = 1 if #i is looking at #j.
  • Neighborhood: Nij = 1 if they're near one another.

SLIDE 48

Summary

  • Bayesian networks provide a natural representation for (causally induced) conditional independence
  • Topology + conditional probability tables
  • Generally easy for domain experts to construct
SLIDE 49

CS 440/ECE448 Lecture 19: Bayes Net Inference

Mark Hasegawa-Johnson, 3/2019; modified by Julia Hockenmaier, 3/2019. Including slides by Svetlana Lazebnik, 11/2016.

SLIDE 50

Bayes Net Inference and Learning

SLIDE 51

Bayes Network Inference & Learning

A Bayes net is a memory-efficient model of dependencies among a set of random variables.

Inference problem: answer questions about the query variables X given the evidence variables and their values E = e, as well as some unobserved (hidden) variables Y.

  • We want to know the posterior distribution P(X | E = e)
  • The posterior can be derived from the full joint P(X, E, Y)
  • How do we make this computationally efficient?

Learning problem: given some training examples, how do we estimate the parameters of the model?

  • Parameters = P(variable | parents), for each variable in the net
SLIDE 52

Outline

  • Inference Examples
  • Inference Algorithms
  • Trees: Sum-product algorithm
  • Poly-trees: Junction tree algorithm
  • Graphs: No polynomial-time algorithm
  • Parameter Learning
SLIDE 53

Practice example 1

  • Variables: Cloudy, Sprinkler, Rain, Wet Grass
SLIDE 54

Practice example 1

  • Given that the grass is wet, what is the probability that it has rained?

P(r | w) = P(r, w) / P(w)
         = [ Σc,s P(c, s, r, w) ] / [ Σc,s,r P(c, s, r, w) ]
         = [ Σc,s P(c) P(s|c) P(r|c) P(w|r,s) ] / [ Σc,s,r P(c) P(s|c) P(r|c) P(w|r,s) ]
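A sketch of this computation by brute-force enumeration. The CPT values are the classic textbook numbers for the Cloudy/Sprinkler/Rain/WetGrass network; the slide's own tables are in the figure, so treat them as an assumption:

```python
import itertools

# Inference by enumeration for the C/S/R/W net; CPT values are the
# classic textbook ones (an assumption, since the slide's tables are
# in the lost figure).
P_c = 0.5
P_s = {1: 0.1, 0: 0.5}                      # P(S=1 | C)
P_r = {1: 0.8, 0: 0.2}                      # P(R=1 | C)
P_w = {(1, 1): 0.99, (1, 0): 0.90,          # P(W=1 | S, R)
       (0, 1): 0.90, (0, 0): 0.00}

def joint(c, s, r, w):
    """P(c, s, r, w) = P(c) P(s|c) P(r|c) P(w|s,r)."""
    p = P_c if c else 1 - P_c
    p *= P_s[c] if s else 1 - P_s[c]
    p *= P_r[c] if r else 1 - P_r[c]
    p *= P_w[(s, r)] if w else 1 - P_w[(s, r)]
    return p

num = sum(joint(c, s, 1, 1) for c, s in itertools.product([0, 1], repeat=2))
den = sum(joint(c, s, r, 1) for c, s, r in itertools.product([0, 1], repeat=3))
print(num / den)  # P(R=1 | W=1), roughly 0.708 with these numbers
```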


SLIDE 57

Practice Example #2

  • Suppose you have an observation, for example, "John called" (J=1)
  • You want to know: was there a burglary?
  • You need

P(B=1 | J=1) = P(B=1, J=1) / Σb P(B=b, J=1)

  • So you need to compute the table P(B,J) for all possible settings of (B,J)

SLIDE 58

Bayes Net Inference: The Hard Way

  • 1. P(B, E, A, J, M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A)
  • 2. P(B, J) = Σe Σa Σm P(B, E, A, J, M)

Exponential complexity (#P-hard, actually): N variables, each of which has K possible values ⇒ O(K^N) time complexity

SLIDE 59

Is there an easier way?

  • Tree-structured Bayes nets: the sum-product algorithm
  • Quadratic complexity, O(NK²)
  • Polytrees: the junction tree algorithm
  • Pseudo-polynomial complexity, O(NK^M), for M < N
  • Arbitrary Bayes nets: #P complete, O(K^N)
  • The SAT problem is a Bayes net!
SLIDE 60
  • 1. Tree-Structured Bayes Nets
  • Suppose these are all binary variables.
  • We observe E=1
  • We want to find P(H=1|E=1)
  • Means that we need to find both P(H=0,E=1) and P(H=1,E=1), because

P(H=1 | E=1) = P(H=1, E=1) / Σh P(H=h, E=1)

SLIDE 61

The Sum-Product Algorithm (Belief Propagation)

  • Find the only undirected path from the evidence variable to the query variable (E-D-B-F-G-I-H)
  • Find the directed root of this path, P(F)
  • Find the joint probabilities of root and evidence: P(F=0,E=1) and P(F=1,E=1)
  • Find the joint probabilities of query and evidence: P(H=0,E=1) and P(H=1,E=1)
  • Find the conditional probability P(H=1|E=1)
SLIDE 62

The Sum-Product Algorithm

Starting with the root P(F), we find P(E,F) by alternating product steps and sum steps:

  • 1. Product: P(B,D,F) = P(F) P(B|F) P(D|B)
  • 2. Sum: P(D,F) = Σb=0..1 P(B=b, D, F)
  • 3. Product: P(D,E,F) = P(D,F) P(E|D)
  • 4. Sum: P(E,F) = Σd=0..1 P(D=d, E, F)

SLIDE 63

The Sum-Product Algorithm

Starting with the root P(E,F), we find P(E,H) by alternating product steps and sum steps:

  • 1. Product: P(E,F,G) = P(E,F) P(G|F)
  • 2. Sum: P(E,G) = Σf=0..1 P(E, F=f, G)
  • 3. Product: P(E,G,I) = P(E,G) P(I|G)
  • 4. Sum: P(E,I) = Σg=0..1 P(E, G=g, I)
  • 5. Product: P(E,H,I) = P(E,I) P(H|I)
  • 6. Sum: P(E,H) = Σi=0..1 P(E, H, I=i)
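The steps on slides 62-63 map directly onto array products and sums. The sketch below uses random placeholder CPTs (the real tables are in the slide figures) and checks the result against brute-force enumeration:

```python
import numpy as np

# Sum-product on the path E-D-B-F-G-I-H with root F; all variables binary.
rng = np.random.default_rng(0)

def cpt(*shape):
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)   # normalize over the child

P_F = cpt(2)                                            # P(F)
P_B_F, P_D_B, P_E_D = cpt(2, 2), cpt(2, 2), cpt(2, 2)   # F->B->D->E
P_G_F, P_I_G, P_H_I = cpt(2, 2), cpt(2, 2), cpt(2, 2)   # F->G->I->H

# Slide 62: from the root P(F) to P(E,F)
P_BDF = np.einsum('f,fb,bd->bdf', P_F, P_B_F, P_D_B)  # product
P_DF  = P_BDF.sum(axis=0)                             # sum over b
P_DEF = np.einsum('df,de->def', P_DF, P_E_D)          # product
P_EF  = P_DEF.sum(axis=0)                             # sum over d

# Slide 63: from P(E,F) to P(E,H)
P_EFG = np.einsum('ef,fg->efg', P_EF, P_G_F)
P_EG  = P_EFG.sum(axis=1)                             # sum over f
P_EGI = np.einsum('eg,gi->egi', P_EG, P_I_G)
P_EI  = P_EGI.sum(axis=1)                             # sum over g
P_EHI = np.einsum('ei,ih->ehi', P_EI, P_H_I)
P_EH  = P_EHI.sum(axis=2)                             # sum over i

print(P_EH[1, 1] / P_EH[1, :].sum())                  # P(H=1 | E=1)

# Sanity check against brute-force enumeration of the full joint:
direct = np.einsum('f,fb,bd,de,fg,gi,ih->eh',
                   P_F, P_B_F, P_D_B, P_E_D, P_G_F, P_I_G, P_H_I)
assert np.allclose(P_EH, direct)
```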

SLIDE 64
Time Complexity of Belief Propagation

  • Each product step generates a table with 3 variables
  • Each sum step reduces that to a table with 2 variables
  • If each variable has K values, and if there are N variables on the path from evidence to query, then the time complexity is O(NK²)

SLIDE 65

Time Complexity of Bayes Net Inference

  • Tree-structured Bayes nets: the sum-product algorithm
  • Quadratic complexity, O(NK²)
  • Polytrees: the junction tree algorithm
  • Pseudo-polynomial complexity, O(NK^M), for M < N
  • Arbitrary Bayes nets: #P complete, O(K^N)
  • The SAT problem is a Bayes net!
SLIDE 66
  • 2. The Junction Tree Algorithm
  • a. Moralize the graph (identify each variable’s Markov blanket)
  • b. Triangulate the graph (eliminate undirected cycles)
  • c. Create the junction tree (form cliques)
  • d. Run the sum-product algorithm on the junction tree
SLIDE 67

2.a. Markov Blanket

  • Suppose there is a Bayes net with variables A,B,C,D,E,F,G,H
  • The "Markov blanket" of variable F is D,E,G if P(F|A,B,C,D,E,G,H) = P(F|D,E,G)

SLIDE 68

2.a. Markov Blanket

  • Suppose there is a Bayes net with variables A,B,C,D,E,F,G,H
  • The "Markov blanket" of variable F is D,E,G if P(F|A,B,C,D,E,G,H) = P(F|D,E,G)

[Figure: the network over A–H]

SLIDE 69

2.a. Markov Blanket

  • The "Markov blanket" of variable F is D,E,G if P(F|A,B,C,D,E,G,H) = P(F|D,E,G)
  • How can we prove that?
  • P(A,…,H) = P(A)P(B|A) …
  • Which of those terms include F?

[Figure: the network over A–H]

SLIDE 70

2.a. Markov Blanket

  • Which of those terms include F?
  • Only these two: P(F|D) and P(G|E,F)

[Figure: the network over A–H]

SLIDE 71

2.a. Markov Blanket

The Markov Blanket of variable F includes only its immediate family members:

  • Its parent, D
  • Its child, G
  • The other parent of its child, E

because P(F|A,B,C,D,E,G,H) = P(F|D,E,G)

[Figure: the network over A–H]

SLIDE 72

2.a. Moralization

"Moralization" =

  • 1. If two variables have a child together, force them to get married (connect them with an undirected edge).
  • 2. Get rid of the arrows (not necessary any more).

Result: Markov blanket = the set of variables to which a variable is connected.

[Figure: the moralized graph over A–H]

SLIDE 73

2.b. Triangulation

Triangulation = draw edges so that there is no unbroken cycle of length > 3. There are usually many different ways to do this. For example, here's one:

[Figure: one triangulation of the moralized graph over A–H]

SLIDE 74

2.c. Form Cliques

Clique = a group of variables, all of whom are members of each other's immediate family. Junction Tree = a tree in which

  • Each node is a clique from the original graph,
  • Each edge is an "intersection set," naming the variables that overlap between the two cliques.

Cliques: AB, BCD, CDF, CEF, EFG, GH. Intersection sets: B, CD, CF, EF, G.

SLIDE 75

2.d. Sum-Product

Suppose we need P(B,G):

  • 1. Product: P(B,C,D,F) = P(B) P(C|B) P(D|B) P(F|D)
  • 2. Sum: P(B,C,F) = Σd P(B, C, D=d, F)
  • 3. Product: P(B,C,E,F) = P(B,C,F) P(E|C)
  • 4. Sum: P(B,E,F) = Σc P(B, C=c, E, F)
  • 5. Product: P(B,E,F,G) = P(B,E,F) P(G|E,F)
  • 6. Sum: P(B,G) = Σe Σf P(B, E=e, F=f, G)

Complexity: O(NK^M), where N = # cliques, K = # values for each variable, M = 1 + # variables in the largest clique
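For comparison, the same marginal can be obtained with a single einsum contraction, where the contraction order plays the role of the clique-by-clique schedule. The CPTs are random placeholders, and the edge set (A→B, B→C, B→D, C→E, D→F, {E,F}→G, G→H) is read off the figure, so treat it as an assumption:

```python
import numpy as np

# P(B,G) by summing the factored joint of the A-H network in one call;
# numpy chooses the elimination order.
rng = np.random.default_rng(1)

def cpt(*shape):
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)   # normalize over the child

P_A, P_B_A, P_C_B, P_D_B = cpt(2), cpt(2, 2), cpt(2, 2), cpt(2, 2)
P_E_C, P_F_D, P_G_EF, P_H_G = cpt(2, 2), cpt(2, 2), cpt(2, 2, 2), cpt(2, 2)

# Sum out a, c, d, e, f, h from the product of all the CPTs.
P_BG = np.einsum('a,ab,bc,bd,ce,df,efg,gh->bg',
                 P_A, P_B_A, P_C_B, P_D_B, P_E_C, P_F_D, P_G_EF, P_H_G)
print(P_BG)            # 2x2 table over (B, G); entries sum to 1
```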

SLIDE 76

Junction Tree: Sample Test Question

Consider the burglar alarm example.

  • a. Moralize this graph
  • b. Is it already triangulated? If not, triangulate it.
  • c. Draw the junction tree
SLIDE 77

Solution

[Figure: the burglar alarm network over B, E, A, J, M]

SLIDE 78

Solution

  • a. Moralize this graph

[Figure: the moralized graph; A's parents B and E are now connected]

SLIDE 79

Solution

  • b. Is it already triangulated?

Answer: yes. There is no unbroken cycle of length > 3.

[Figure: the moralized graph]

SLIDE 80

Solution

  • c. Draw the junction tree

Cliques: ABE, AJ, AM. Intersection sets: A (between ABE and AJ) and A (between ABE and AM).

SLIDE 81

Time Complexity of Bayes Net Inference

  • Tree-structured Bayes nets: the sum-product algorithm
  • Quadratic complexity, O(NK²)
  • Polytrees: the junction tree algorithm
  • Pseudo-polynomial complexity, O(NK^M), for M < N
  • Arbitrary Bayes nets: #P complete, O(K^N)
  • The SAT problem is a Bayes net!
SLIDE 83

Bayesian network inference

  • In full generality, NP-hard
  • More precisely, #P-hard: equivalent to counting satisfying assignments
  • We can reduce satisfiability to Bayesian network inference
  • Decision problem: is P(Y) > 0?
  • G. Cooper, 1990

Y = (U1 ∨U2 ∨U3)∧(¬U1 ∨¬U2 ∨U3)∧(U2 ∨¬U3 ∨U4)

[Figure: the network with inputs U1–U4, clause nodes C1, C2, C3, and AND nodes D1, D2, Y]

SLIDE 84

Bayesian network inference

P(U1,U2,U3,U4, C1,C2,C3, D1,D2, Y) = P(U1) P(U2) P(U3) P(U4) × P(C1|U1,U2,U3) P(C2|U1,U2,U3) P(C3|U2,U3,U4) × P(D1|C1) P(D2|D1,C2) P(Y|D2,C3)
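A sketch of why the reduction works: with fair coins on the Ui and deterministic clause/AND nodes, P(Y=1) counts the satisfying assignments, so P(Y) > 0 iff the formula is satisfiable:

```python
import itertools

# Each U_i is a fair coin; C_j is a deterministic clause node; D_1, D_2, Y
# are deterministic ANDs, exactly as in the factorization above.
def clause1(u1, u2, u3, u4): return u1 or u2 or u3
def clause2(u1, u2, u3, u4): return (not u1) or (not u2) or u3
def clause3(u1, u2, u3, u4): return u2 or (not u3) or u4

p_y = 0.0
for u in itertools.product([False, True], repeat=4):
    c1, c2, c3 = clause1(*u), clause2(*u), clause3(*u)
    d1 = c1            # D1 = C1
    d2 = d1 and c2     # D2 = D1 AND C2
    y = d2 and c3      # Y  = D2 AND C3
    p_y += (1 / 16) * y
print(p_y > 0, p_y)    # satisfiable iff P(Y=1) > 0
```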

SLIDE 85

Bayesian network inference

Why can’t we use the junction tree algorithm to efficiently compute Pr(Y)?

SLIDE 86

Bayesian network inference

Why can't we use the junction tree algorithm to efficiently compute Pr(Y)? Answer: after we moralize and triangulate, the size of the largest clique (U2, U3, C1, C2, C3) is M ≈ N, the same order of magnitude as the original problem.
SLIDE 87

Time Complexity of Bayes Net Inference

  • Tree-structured Bayes nets: the sum-product algorithm
  • Quadratic complexity, O(NK²)
  • Polytrees: the junction tree algorithm
  • Pseudo-polynomial complexity, O(NK^M), for M < N
  • Arbitrary Bayes nets: #P complete, O(K^N)
  • The SAT problem is a Bayes net!
SLIDE 88

Parameter learning

  • Inference problem: given values of the evidence variables E = e, answer questions about the query variables X using the posterior P(X | E = e)
  • Learning problem: estimate the parameters of the probabilistic model P(X | E) given a training sample {(x1,e1), …, (xn,en)}

SLIDE 89

Parameter learning: complete data

  • Suppose we know the network structure (but not the parameters), and have a training set of complete observations

Training set:

Sample | C | S | R | W
1 | T | F | T | T
2 | F | T | F | T
3 | T | F | F | F
4 | T | T | T | T
5 | F | T | F | T
6 | T | F | T | F
… | … | … | … | …

[Figure: the C/S/R/W network, with every CPT entry marked "?"]

SLIDE 90

Parameter learning

  • Suppose we know the network structure (but not the parameters), and have a training set of complete observations
  • Example:

P(S=T | C=T) = (# samples with S=T and C=T) / (# samples with C=T) = 1/4

Training set:

Sample | C | S | R | W
1 | T | F | T | T
2 | F | T | F | T
3 | T | F | F | F
4 | T | T | T | T
5 | F | T | F | T
6 | T | F | T | F
… | … | … | … | …
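The same count in code, using the six fully observed rows shown above:

```python
# Counting estimate of P(S=T | C=T) from the training-set table.
samples = [  # (C, S, R, W) for rows 1-6
    ("T", "F", "T", "T"), ("F", "T", "F", "T"), ("T", "F", "F", "F"),
    ("T", "T", "T", "T"), ("F", "T", "F", "T"), ("T", "F", "T", "F"),
]
n_c = sum(1 for c, s, r, w in samples if c == "T")
n_sc = sum(1 for c, s, r, w in samples if c == "T" and s == "T")
print(n_sc, "/", n_c)  # 1 / 4, matching the slide
```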

SLIDE 91

Parameter learning

  • Suppose we know the network structure (but not the parameters), and have a training set of complete observations
  • P(X | Parents(X)) is given by the observed frequencies of the different values of X for each combination of parent values

SLIDE 92

Parameter learning: missing data

  • Suppose we know the network structure (but not the parameters), and have a training set, but the training set is missing some observations.

Training set:

Sample | C | S | R | W
1 | ? | F | T | T
2 | ? | T | F | T
3 | ? | F | F | F
4 | ? | T | T | T
5 | ? | T | F | T
6 | ? | F | T | F
… | … | … | … | …

[Figure: the C/S/R/W network, with every CPT entry marked "?"]

SLIDE 93

Missing data: the EM algorithm

  • The EM algorithm ("Expectation Maximization") starts with an initial guess for each parameter value.
  • We try to improve the initial guess, using the algorithm on the next two slides:
  • E-step
  • M-step

[Figure: every CPT entry initialized to 0.5; the C column of the training set is still missing]

SLIDE 94

Missing data: the EM algorithm

  • E-Step (Expectation): Given the model parameters, replace each of the missing numbers with a probability (a number between 0 and 1) using

P(C=1 | S, R, W) = P(C=1, S, R, W) / [ P(C=1, S, R, W) + P(C=0, S, R, W) ]

[Figure: with the all-0.5 parameters, each missing C value is replaced by the estimate 0.5]
SLIDE 95

Missing data: the EM algorithm

  • M-Step (Maximization): Given the missing-data estimates, replace each of the missing model parameters using

P(Variable=T | Parents=value) = E[# times Variable=T, Parents=value] / E[# times Parents=value]

[Figure: the updated parameters; the C, S, and R tables stay at 0.5, while the P(W|S,R) entries become 1.0, 1.0, 0.5, 0.0]

SLIDE 96

Missing data: the EM algorithm

  • Iterate back and forth between E-step and M-step until the model converges.

[Figure: the same parameters and training set as on the previous slide]
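A compact sketch of the whole loop for this table (C hidden; S, R, W observed). Two details worth noting: W's parents S and R are observed, so P(W|S,R) needs no EM and is omitted below, and the E-step posterior P(C|S,R,W) reduces to P(C|S,R) because W is conditionally independent of C given its parents. The slightly asymmetric initial P(S|C) is our choice, since a perfectly uniform start is a fixed point of EM:

```python
import numpy as np

# EM for the hidden variable C, using the six samples from the table.
data = np.array([[0, 1, 1], [1, 0, 1], [0, 0, 0],   # (S, R, W) per sample
                 [1, 1, 1], [1, 0, 1], [0, 1, 0]])

p_c = 0.5                          # P(C=1)
p_s = np.array([0.4, 0.6])         # P(S=1 | C=c), indexed by c (tie-breaking init)
p_r = np.array([0.5, 0.5])         # P(R=1 | C=c)

for _ in range(100):
    # E-step: q[n] = P(C=1 | S_n, R_n), the soft completion of column C
    def lik(c):
        ps = np.where(data[:, 0] == 1, p_s[c], 1 - p_s[c])
        pr = np.where(data[:, 1] == 1, p_r[c], 1 - p_r[c])
        prior = p_c if c == 1 else 1 - p_c
        return prior * ps * pr
    q = lik(1) / (lik(1) + lik(0))
    # M-step: expected counts in place of hard counts
    p_c = q.mean()
    p_s = np.array([((1 - q) * data[:, 0]).sum() / (1 - q).sum(),
                    (q * data[:, 0]).sum() / q.sum()])
    p_r = np.array([((1 - q) * data[:, 1]).sum() / (1 - q).sum(),
                    (q * data[:, 1]).sum() / q.sum()])

print(p_c, p_s, p_r)   # a local optimum; EM does not guarantee the global one
```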

SLIDE 97

Summary: Bayesian networks

  • Structure
  • Parameters
  • Inference
  • Learning