Bayesian Networks
George Konidaris, gdk@cs.brown.edu
Fall 2019


SLIDE 1

Bayesian Networks

George Konidaris gdk@cs.brown.edu

Fall 2019

SLIDE 2

Recall

Joint distributions:

  • P(X1, …, Xn).
  • All you (statistically) need to know about X1 … Xn.
  • From it you can infer P(X1), P(X1 | Xs), etc.

Raining  Cold   Prob.
True     True   0.3
True     False  0.1
False    True   0.4
False    False  0.2
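Both kinds of inference fall out of the joint table by summing rows. A minimal Python sketch over the table above (values from the slide):

```python
# Joint distribution P(Raining, Cold), keyed by (raining, cold).
joint = {
    (True, True): 0.3,
    (True, False): 0.1,
    (False, True): 0.4,
    (False, False): 0.2,
}

# Marginalize: P(Raining=r) = sum over Cold of P(r, c).
p_raining = {r: sum(p for (r2, c), p in joint.items() if r2 == r)
             for r in (True, False)}

# Condition: P(Raining=True | Cold=True) = P(True, True) / P(Cold=True).
p_cold_true = sum(p for (r, c), p in joint.items() if c)
p_rain_given_cold = joint[(True, True)] / p_cold_true
```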

SLIDE 3

Joint Distributions Are Useful

Classification

  • P(X1 | X2 … Xn)

Co-occurrence

  • P(Xa, Xb)

Rare event detection

  • P(X1, …, Xn)

In P(X1 | X2 … Xn), X1 is the thing you want to know and X2 … Xn are the things you know; P(Xa, Xb) asks how likely these two things are together.

SLIDE 4

Independence

If independent, can break JPD into separate tables.

Raining  Prob.        Cold   Prob.
True     0.6     ×    True   0.75
False    0.4          False  0.25

Raining  Cold   Prob.
True     True   0.45
True     False  0.15
False    True   0.3
False    False  0.1

P(A, B) = P(A)P(B)
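The factorization is easy to verify: every entry of the joint table above is the product of the two marginal entries. A quick sketch (values from the slide):

```python
p_rain = {True: 0.6, False: 0.4}    # P(Raining)
p_cold = {True: 0.75, False: 0.25}  # P(Cold)

# Under independence, the joint is just the product of the marginals.
joint = {(r, c): p_rain[r] * p_cold[c]
         for r in (True, False) for c in (True, False)}
```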

SLIDE 5

Conditional Independence

A and B are conditionally independent given C if:

  • P(A | B, C) = P(A | C)
  • P(A, B | C) = P(A | C) P(B | C)

(Recall independence: P(A, B) = P(A)P(B).)

This means that, if we know C, we can treat A and B as if they were independent. A and B might not be independent otherwise!

SLIDE 6

Example

Consider 3 RVs:

  • Temperature
  • Humidity
  • Season

Temperature and humidity are not independent. But, they might be, given the season: the season explains both, and they become independent of each other.
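A small numerical sketch makes this concrete (the probabilities below are illustrative assumptions, not from the slides): temperature and humidity factor once the season is fixed, but not marginally.

```python
# Assumed numbers for a Season -> Temperature, Season -> Humidity network.
p_season = {"summer": 0.5, "winter": 0.5}
p_hot = {"summer": 0.9, "winter": 0.1}    # P(Temp=hot | Season)
p_humid = {"summer": 0.8, "winter": 0.2}  # P(Humidity=humid | Season)

def joint(s, hot, humid):
    """P(Season, Temp, Humidity) implied by the network."""
    pt = p_hot[s] if hot else 1 - p_hot[s]
    ph = p_humid[s] if humid else 1 - p_humid[s]
    return p_season[s] * pt * ph

# Conditionally independent given the season:
p_both_given_summer = joint("summer", True, True) / p_season["summer"]

# But not marginally independent:
p_both = sum(joint(s, True, True) for s in p_season)
p_hot_marg = sum(joint(s, True, hu) for s in p_season for hu in (True, False))
p_humid_marg = sum(joint(s, t, True) for s in p_season for t in (True, False))
```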

SLIDE 7

Bayes Nets

A particular type of graphical model:

  • A directed, acyclic graph.
  • A node for each RV.

Given its parents, each RV is independent of its non-descendants.

(Graph: S → T, S → H.)

SLIDE 8

Bayes Net

The JPD decomposes:

P(x1, ..., xn) = ∏i P(xi | parents(xi))

So for each node, store a conditional probability table (CPT): P(xi | parents(xi)).

(Graph: S → T, S → H.)
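The decomposition is a few lines of code. A minimal sketch, with an assumed CPT layout and illustrative numbers for the S → T, S → H network (the values are not from the slides):

```python
# A Bayes net as (parents, CPTs); the joint is the product of CPT entries.
# Assumed convention: cpt[var][(value, parent_values)] = probability.
parents = {"S": (), "T": ("S",), "H": ("S",)}
cpt = {
    "S": {(True, ()): 0.3, (False, ()): 0.7},
    "T": {(True, (True,)): 0.8, (False, (True,)): 0.2,
          (True, (False,)): 0.4, (False, (False,)): 0.6},
    "H": {(True, (True,)): 0.7, (False, (True,)): 0.3,
          (True, (False,)): 0.2, (False, (False,)): 0.8},
}

def prob(assignment):
    """P(x1, ..., xn) = product over i of P(xi | parents(xi))."""
    p = 1.0
    for var, pars in parents.items():
        par_vals = tuple(assignment[q] for q in pars)
        p *= cpt[var][(assignment[var], par_vals)]
    return p
```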

SLIDE 9

CPTs

Conditional Probability Table

  • Probability distribution over variable given parents.
  • One distribution per setting of parents.

X      Y      Z      P
True   True   True   0.7
False  True   True   0.3
True   True   False  0.2
False  True   False  0.8
True   False  True   0.5
False  False  True   0.5
True   False  False  0.4
False  False  False  0.6

X is the variable of interest; Y and Z are the conditioning variables; each pair of rows with the same (Y, Z) setting is one distribution over X (and sums to 1).

SLIDE 10

Example

Suppose we know:

  • The flu causes sinus inflammation.
  • Allergies cause sinus inflammation.
  • Sinus inflammation causes a runny nose.
  • Sinus inflammation causes headaches.
SLIDE 11

Example

(Graph: Flu → Sinus, Allergy → Sinus, Sinus → Nose, Sinus → Headache.)

SLIDE 12

Example

(Graph: Flu → Sinus, Allergy → Sinus, Sinus → Nose, Sinus → Headache.)

Flu    P
True   0.6
False  0.4

Allergy  P
True     0.2
False    0.8

Sinus  Flu    Allergy  P
True   True   True     0.9
False  True   True     0.1
True   True   False    0.6
False  True   False    0.4
True   False  False    0.2
False  False  False    0.8
True   False  True     0.4
False  False  True     0.6

Full joint: 32 (31 independent) entries; the CPTs here need only 1 + 1 + 4 + 2 + 2 = 10.

Nose   Sinus  P
True   True   0.8
False  True   0.2
True   False  0.3
False  False  0.7

Headache  Sinus  P
True      True   0.6
False     True   0.4
True      False  0.5
False     False  0.5
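These CPTs translate directly into code: the product of their entries reconstructs any joint probability. A sketch using the slide's values:

```python
# CPTs for the flu network, values from the slide.
p_flu = 0.6                      # P(Flu=True)
p_allergy = 0.2                  # P(Allergy=True)
p_sinus = {                      # P(Sinus=True | Flu, Allergy)
    (True, True): 0.9, (True, False): 0.6,
    (False, True): 0.4, (False, False): 0.2,
}
p_nose = {True: 0.8, False: 0.3}  # P(Nose=True | Sinus)
p_head = {True: 0.6, False: 0.5}  # P(Headache=True | Sinus)

def joint(f, a, s, n, h):
    """P(F, A, S, N, H) as the product of the CPT entries."""
    p = (p_flu if f else 1 - p_flu) * (p_allergy if a else 1 - p_allergy)
    p *= p_sinus[(f, a)] if s else 1 - p_sinus[(f, a)]
    p *= p_nose[s] if n else 1 - p_nose[s]
    p *= p_head[s] if h else 1 - p_head[s]
    return p
```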

SLIDE 13

Uses

Things you can do with a Bayes Net:

  • Inference: given the values of some variables, compute the posterior over others.
  • (Might be intractable: NP-hard in general.)
  • Learning: fill in the CPTs from data.
  • Structure learning: fill in the edges themselves.

Generally:

  • Often few parents.
  • Inference cost often reasonable.
  • Can include domain knowledge.
SLIDE 14

Inference

(Graph: Flu → Sinus, Allergy → Sinus, Sinus → Nose, Sinus → Headache.)

What is: P(f | h)?

SLIDE 15

Inference

Given A compute P(B | A).

(Graph: Flu → Sinus, Allergy → Sinus, Sinus → Nose, Sinus → Headache.)

SLIDE 16

Inference

(Graph: Flu → Sinus, Allergy → Sinus, Sinus → Nose, Sinus → Headache.)

What is: P(F=True | H=True)?

SLIDE 17

Inference

P(f | h) = P(f, h) / P(h)
         = Σ_{S,A,N} P(f, h, S, A, N) / Σ_{S,A,N,F} P(h, S, A, N, F)

Identity (marginalization):

P(a) = Σ_{B=T,F} P(a, B)
P(a) = Σ_{B=T,F} Σ_{C=T,F} P(a, B, C)

SLIDE 18

Inference

We know from the definition of a Bayes net:

P(f | h) = P(f, h) / P(h)
         = Σ_{S,A,N} P(f, h, S, A, N) / Σ_{S,A,N,F} P(h, S, A, N, F)

P(h) = Σ_{S,A,N,F} P(h, S, A, N, F)
     = Σ_{S,A,N,F} P(h|S) P(N|S) P(S|A,F) P(F) P(A)
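The sums above can be checked by brute-force enumeration over all 32 assignments, using the CPTs from slide 12:

```python
# CPTs from slide 12.
p_flu, p_allergy = 0.6, 0.2
p_sinus = {(True, True): 0.9, (True, False): 0.6,
           (False, True): 0.4, (False, False): 0.2}
p_nose = {True: 0.8, False: 0.3}
p_head = {True: 0.6, False: 0.5}

def joint(f, a, s, n, h):
    """P(F, A, S, N, H) as the product of the CPT entries."""
    p = (p_flu if f else 1 - p_flu) * (p_allergy if a else 1 - p_allergy)
    p *= p_sinus[(f, a)] if s else 1 - p_sinus[(f, a)]
    p *= p_nose[s] if n else 1 - p_nose[s]
    p *= p_head[s] if h else 1 - p_head[s]
    return p

V = (True, False)
# Numerator: sum out S, A, N with F=True and H=True fixed.
p_fh = sum(joint(True, a, s, n, True) for a in V for s in V for n in V)
# Denominator: additionally sum out F.
p_h = sum(joint(f, a, s, n, True) for f in V for a in V for s in V for n in V)
p_f_given_h = p_fh / p_h
```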

SLIDE 19

Variable Elimination

So we have:

P(h) = Σ_{S,A,N,F} P(h|S) P(N|S) P(S|A,F) P(F) P(A)

… we can eliminate variables one at a time (distributive law):

P(h) = Σ_{S,N} P(h|S) P(N|S) Σ_{A,F} P(S|A,F) P(F) P(A)

P(h) = Σ_S P(h|S) Σ_N P(N|S) Σ_{A,F} P(S|A,F) P(F) P(A)
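Evaluating the innermost sums first is the payoff; note also that Σ_N P(N|S) = 1, so the N sum disappears entirely. A sketch of the resulting computation (values from slide 12):

```python
# CPTs from slide 12.
p_flu, p_allergy = 0.6, 0.2
p_sinus = {(True, True): 0.9, (True, False): 0.6,
           (False, True): 0.4, (False, False): 0.2}
p_head = {True: 0.6, False: 0.5}   # P(Headache=True | Sinus)

def g(s):
    """Innermost factor: g(S) = sum over A, F of P(S | A, F) P(F) P(A)."""
    total = 0.0
    for f in (True, False):
        for a in (True, False):
            p_s = p_sinus[(f, a)] if s else 1 - p_sinus[(f, a)]
            total += (p_s * (p_flu if f else 1 - p_flu)
                          * (p_allergy if a else 1 - p_allergy))
    return total

# P(h) = sum over S of P(h | S) * (sum over N of P(N | S)) * g(S);
# the N sum equals 1, so it drops out.
p_h = sum(p_head[s] * g(s) for s in (True, False))
```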

SLIDE 20

Variable Elimination

P(h) = Σ_S P(h|S) Σ_N P(N|S) Σ_{A,F} P(S|A,F) P(F) P(A)

Expanding the outer sum over S (sinus = true, sinus = false):

P(h) = 0.6 × Σ_N P(N|S=True) Σ_{A,F} P(S=True|A,F) P(F) P(A)
     + 0.5 × Σ_N P(N|S=False) Σ_{A,F} P(S=False|A,F) P(F) P(A)

Headache  Sinus  P
True      True   0.6
False     True   0.4
True      False  0.5
False     False  0.5

SLIDE 21

Variable Elimination

P(h) = Σ_S P(h|S) Σ_N P(N|S) Σ_{A,F} P(S|A,F) P(F) P(A)

Expanding the sums over N:

P(h) = 0.6 × [0.8 × Σ_{A,F} P(S=True|A,F) P(F) P(A) + 0.2 × Σ_{A,F} P(S=True|A,F) P(F) P(A)]
     + 0.5 × [0.3 × Σ_{A,F} P(S=False|A,F) P(F) P(A) + 0.7 × Σ_{A,F} P(S=False|A,F) P(F) P(A)]

Nose   Sinus  P
True   True   0.8
False  True   0.2
True   False  0.3
False  False  0.7

SLIDE 22

Variable Elimination

Downsides:

  • How to simplify? (Choosing a good elimination order is hard in general.)
  • Computational complexity (worst-case exponential).
  • Hard to parallelize.
SLIDE 23

Alternative

Sampling approaches

  • Based on drawing random numbers
  • Computationally expensive, but easy to code!
  • Easy to parallelize
SLIDE 24

Sampling

What’s a sample? From a distribution: (figure: a scatter of points drawn from the distribution.)

From a CPT:

Flu    P
True   0.6
False  0.4

Draw a uniform random number between 0 and 1: if it falls below 0.6, the sample is F=True; otherwise F=False. A sequence of samples might look like:

F=True, F=True, F=False, F=False, F=True

SLIDE 25

Generative Models

How do we sample from a Bayes Net? A Bayes Net is known as a generative model: it describes a generative process for the data.

  • Each variable is generated by a distribution.
  • Describes the structure of that generation.
  • Can generate more data.

Natural way to include domain knowledge via causality.

SLIDE 26

Sampling the Joint

Algorithm for generating samples drawn from the joint distribution:

  • For each node with no parents: draw a sample from its marginal distribution.
  • Condition its children on the sampled value (this removes the edge).
  • Repeat until every node has a value.

This results in an artificial data set; to get probability values, literally just count.
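This procedure is often called ancestral (or forward) sampling. A sketch for the flu network (CPT values from slide 12), estimating a probability by literally just counting:

```python
import random

random.seed(0)

# CPTs from slide 12.
p_flu, p_allergy = 0.6, 0.2
p_sinus = {(True, True): 0.9, (True, False): 0.6,
           (False, True): 0.4, (False, False): 0.2}
p_nose = {True: 0.8, False: 0.3}
p_head = {True: 0.6, False: 0.5}

def sample():
    f = random.random() < p_flu            # roots first...
    a = random.random() < p_allergy
    s = random.random() < p_sinus[(f, a)]  # ...then children, conditioned
    n = random.random() < p_nose[s]        # on the sampled parent values.
    h = random.random() < p_head[s]
    return f, a, s, n, h

data = [sample() for _ in range(100_000)]
# Probability values: literally just count.
est_p_h = sum(h for *_, h in data) / len(data)
```

The estimate converges to the exact P(h) = 0.5492 computed by variable elimination.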

SLIDE 27

Generative Models

(Graph: Flu → Sinus, Allergy → Sinus, Sinus → Nose, Sinus → Headache.)

Flu    P
True   0.6
False  0.4

Allergy  P
True     0.2
False    0.8

Sinus  Flu    Allergy  P
True   True   True     0.9
False  True   True     0.1
True   True   False    0.6
False  True   False    0.4
True   False  False    0.2
False  False  False    0.8
True   False  True     0.4
False  False  True     0.6

Nose   Sinus  P
True   True   0.8
False  True   0.2
True   False  0.3
False  False  0.7

Headache  Sinus  P
True      True   0.6
False     True   0.4
True      False  0.5
False     False  0.5

SLIDE 28

Generative Models

(Flu has been sampled; remaining graph: Allergy → Sinus, Sinus → Nose, Sinus → Headache.)

Allergy  P
True     0.2
False    0.8

Sinus  Flu    Allergy  P
True   True   True     0.9
False  True   True     0.1
True   True   False    0.6
False  True   False    0.4
True   False  False    0.2
False  False  False    0.8
True   False  True     0.4
False  False  True     0.6

Flu = True

Nose   Sinus  P
True   True   0.8
False  True   0.2
True   False  0.3
False  False  0.7

Headache  Sinus  P
True      True   0.6
False     True   0.4
True      False  0.5
False     False  0.5

SLIDE 29

Generative Models

(Flu and Allergy have been sampled; remaining graph: Sinus → Nose, Sinus → Headache.)

Sinus  Flu    Allergy  P
True   True   True     0.9
False  True   True     0.1
True   True   False    0.6
False  True   False    0.4
True   False  False    0.2
False  False  False    0.8
True   False  True     0.4
False  False  True     0.6

Flu = True, Allergy = False

Nose   Sinus  P
True   True   0.8
False  True   0.2
True   False  0.3
False  False  0.7

Headache  Sinus  P
True      True   0.6
False     True   0.4
True      False  0.5
False     False  0.5

SLIDE 30

Generative Models

(Remaining nodes: Nose and Headache. The finished sample:)

Flu = True, Allergy = False, Sinus = True, Nose = True, Headache = False

Nose   Sinus  P
True   True   0.8
False  True   0.2
True   False  0.3
False  False  0.7

Headache  Sinus  P
True      True   0.6
False     True   0.4
True      False  0.5
False     False  0.5

SLIDE 31

Sampling the Conditional

What if we want to know P(A | B)? We could use the previous procedure, and just divide the data up based on B. What if we want P(A | b)?

  • Could do the same, just use data with B=b.
  • Throw away the rest of the data.
  • Rejection sampling.
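Rejection sampling can be sketched for the flu network (CPT values from slide 12): sample from the joint as before, keep only the samples with Headache = True, and count how often Flu = True among them. (Nose is omitted since it is neither queried nor observed.)

```python
import random

random.seed(0)

# CPTs from slide 12.
p_flu, p_allergy = 0.6, 0.2
p_sinus = {(True, True): 0.9, (True, False): 0.6,
           (False, True): 0.4, (False, False): 0.2}
p_head = {True: 0.6, False: 0.5}

kept = []
for _ in range(100_000):
    f = random.random() < p_flu
    a = random.random() < p_allergy
    s = random.random() < p_sinus[(f, a)]
    h = random.random() < p_head[s]
    if h:                       # keep only samples matching evidence H=True
        kept.append(f)
# Estimate P(F=True | H=True) by counting among the accepted samples.
est = sum(kept) / len(kept)
```

Note how much data is thrown away: only about 55% of samples are accepted here, and the rate gets far worse as the evidence gets rarer.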
SLIDE 32

Sampling the Conditional

What if b is uncommon? What if b involves many variables? Importance sampling:

  • Bias the sampling process to get more “hits”.
  • New distribution, Q.
  • Use a reweighting trick to unbias the probabilities.
  • Multiply by P/Q to get probability of sample.
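One standard instance of this idea for Bayes nets is likelihood weighting: clamp the evidence H = True instead of sampling it, and weight each sample by P(H = True | S), which is exactly the P/Q correction. A sketch (CPT values from slide 12):

```python
import random

random.seed(1)

# CPTs from slide 12.
p_flu, p_allergy = 0.6, 0.2
p_sinus = {(True, True): 0.9, (True, False): 0.6,
           (False, True): 0.4, (False, False): 0.2}
p_head = {True: 0.6, False: 0.5}   # P(Headache=True | Sinus)

num = den = 0.0
for _ in range(100_000):
    f = random.random() < p_flu
    a = random.random() < p_allergy
    s = random.random() < p_sinus[(f, a)]
    w = p_head[s]   # P/Q weight: evidence H=True is clamped, not sampled
    den += w
    num += w * f
est = num / den     # estimates P(F=True | H=True)
```

Every sample is used (none are rejected); each just counts with weight w instead of weight 1.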
SLIDE 33

Sampling

Properties of sampling:

  • Slow.
  • Always works: estimates converge as the number of samples grows.
  • Always applicable.
  • Easy to parallelize.
  • Computers are getting faster.
SLIDE 34

Independence

What does this look like with a Bayes Net?

Raining  Prob.        Cold   Prob.
True     0.6     ×    True   0.75
False    0.4          False  0.25

Raining  Cold   Prob.
True     True   0.45
True     False  0.15
False    True   0.3
False    False  0.1

(Graph: two nodes, Raining and Cold, with no edge between them.)

SLIDE 35

Naive Bayes

(Graph: S → W1, S → W2, S → W3, …, S → Wn.)

CPTs: P(S), P(W1|S), P(W2|S), P(W3|S), …, P(Wn|S).

SLIDE 36

Spam Filter (Naive Bayes)

(Graph: S → W1, S → W2, S → W3, …, S → Wn.)

CPTs: P(S), P(W1|S), P(W2|S), P(W3|S), …, P(Wn|S).

Want P(S | W1 … Wn).

SLIDE 37

Naive Bayes

P(S | W1, ..., Wn) = P(W1, ..., Wn | S) P(S) / P(W1, ..., Wn)   (Bayes' rule)

P(W1, ..., Wn | S) = ∏i P(Wi | S)   (from the Bayes Net)
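Putting the two equations together: multiply the class prior by the per-word likelihoods and normalize, where the normalizer P(W1, …, Wn) is just the sum over both classes. The word probabilities below are illustrative assumptions, not from the slides:

```python
# Assumed numbers for a tiny two-word spam filter.
p_spam = 0.4                                  # P(S=spam)
p_word = {                                    # P(word present | class)
    "viagra": {"spam": 0.30, "ham": 0.01},
    "meeting": {"spam": 0.05, "ham": 0.20},
}

def posterior(present):
    """P(class | W1..Wn) via Bayes' rule and the naive factorization."""
    score = {"spam": p_spam, "ham": 1 - p_spam}
    for cls in score:
        for word, on in present.items():
            p = p_word[word][cls]
            score[cls] *= p if on else 1 - p  # product of P(Wi | S)
    z = sum(score.values())                   # P(W1, ..., Wn)
    return {cls: v / z for cls, v in score.items()}

post = posterior({"viagra": True, "meeting": False})
```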

SLIDE 38

Bayes Nets

Bayes Nets are a type of representation, not an algorithm: multiple inference algorithms exist, and you can choose between them. (AI researchers talk about models more than algorithms.)

Potentially very compressed but exact:

  • Requires careful construction!

vs. an approximate representation:

  • Hope you’re not too wrong!

Many, many applications in all areas.