Probabilistic Graphical Models
Lecture 2: Bayesian Networks (Representation)
CS/CNS/EE 155, Andreas Krause
Announcements
Will meet in Steele 102 for now
Still looking for another 1-2 TAs..
Homework 1 will be out soon. Start early!! ☺
Multivariate distributions
Instead of a single random variable, have a random vector X(ω) = [X1(ω),…,Xn(ω)]
Specify P(X1=x1,…,Xn=xn)
Suppose all Xi are Bernoulli variables. How many parameters do we need to specify? One probability per joint assignment: 2^n − 1 (the last one is fixed by normalization).
Marginal distributions
Suppose we have joint distribution P(X1,…,Xn)
Then the marginal is obtained by summing out the other variables:
P(Xi = xi) = Σ_{x1,…,xi−1,xi+1,…,xn} P(x1,…,xi−1, xi, xi+1,…,xn)
If all Xi binary: how many terms? 2^(n−1)
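To make this concrete, here is a minimal Python sketch (the table layout and names are my own, not from the slides): the joint is stored as a dictionary with one entry per assignment, and computing a marginal sums out all the other variables, 2^(n−1) terms when every variable is binary.

```python
import itertools

# Joint over n binary variables as a table:
# joint[(x1, ..., xn)] = P(X1 = x1, ..., Xn = xn)
n = 3
joint = {x: 1.0 / 2**n for x in itertools.product((0, 1), repeat=n)}  # toy uniform joint

def marginal(joint, i, xi):
    """P(Xi = xi): sum the joint over all assignments to the other
    variables -- 2^(n-1) terms when all n variables are binary."""
    return sum(p for x, p in joint.items() if x[i] == xi)

print(marginal(joint, 0, 1))  # 0.5 for the uniform toy joint
```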
Rules for random variables
Chain rule: P(X1,…,Xn) = P(X1) P(X2 | X1) ⋯ P(Xn | X1,…,Xn−1)
Bayes’ rule: P(X | Y) = P(Y | X) P(X) / P(Y)
Key concept: Conditional independence
Events α, β conditionally independent given γ if P(α ∩ β | γ) = P(α | γ) P(β | γ)
Random variables X and Y cond. indep. given Z if for all x ∈ Val(X), y ∈ Val(Y), z ∈ Val(Z):
P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z)
If P(Y = y | Z = z) > 0, that’s equivalent to P(X = x | Z = z, Y = y) = P(X = x | Z = z)
Similarly for sets of random variables X, Y, Z
We write: P ⊨ (X ⊥ Y | Z)
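As a sanity check on the definition, a small brute-force tester (my own sketch; it assumes the joint over (X, Y, Z) is given as a dictionary keyed by (x, y, z) triples):

```python
import itertools

def cond_indep(joint, tol=1e-9):
    """Check X ⊥ Y | Z on a joint table joint[(x, y, z)] = P(x, y, z)
    by testing P(x, y | z) = P(x | z) * P(y | z) for all x, y, z."""
    xs = {x for x, _, _ in joint}
    ys = {y for _, y, _ in joint}
    zs = {z for _, _, z in joint}
    for z in zs:
        pz = sum(p for (_, _, zz), p in joint.items() if zz == z)
        if pz == 0:
            continue  # nothing to check when P(Z = z) = 0
        for x, y in itertools.product(xs, ys):
            pxy = joint.get((x, y, z), 0.0)
            px = sum(p for (xx, _, zz), p in joint.items() if (xx, zz) == (x, z))
            py = sum(p for (_, yy, zz), p in joint.items() if (yy, zz) == (y, z))
            if abs(pxy / pz - (px / pz) * (py / pz)) > tol:
                return False
    return True

# X, Y independent fair coins, Z = X XOR Y: X ⊥ Y holds, but not X ⊥ Y | Z
joint = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
print(cond_indep(joint))  # False: given Z, X determines Y
```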
Why is conditional independence useful?
P(X1,…,Xn) = P(X1) P(X2 | X1) ⋯ P(Xn | X1,…,Xn−1)
How many parameters? For binary variables, 2^n − 1 in general.
Now suppose X1,…,Xi−1 ⊥ Xi+1,…,Xn | Xi for all i (a Markov chain)
Then P(X1,…,Xn) = P(X1) P(X2 | X1) ⋯ P(Xn | Xn−1)
How many parameters? Only 1 + 2(n−1) = 2n − 1 for binary variables.
Can we compute P(Xn) more efficiently? Yes: propagate marginals along the chain, P(Xi+1) = Σ_{xi} P(Xi+1 | xi) P(xi), as in the sketch below.
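A minimal sketch of that efficient computation (my own illustration; it assumes a binary chain X1 → X2 → … → Xn given by P(X1) and one transition table per edge):

```python
def marginal_last(p_x1, transitions):
    """P(Xn) for a binary chain X1 -> X2 -> ... -> Xn via
    P(X_{i+1} = b) = sum_a P(X_{i+1} = b | X_i = a) * P(X_i = a).
    p_x1: [P(X1 = 0), P(X1 = 1)]
    transitions: list of tables t with t[a][b] = P(X_{i+1} = b | X_i = a).
    Cost is O(n), versus the 2^(n-1)-term sum over the full joint."""
    p = p_x1
    for t in transitions:
        p = [sum(p[a] * t[a][b] for a in (0, 1)) for b in (0, 1)]
    return p

# 10-variable chain that keeps its state with probability 0.7 at each step
keep = [[0.7, 0.3], [0.3, 0.7]]
print(marginal_last([1.0, 0.0], [keep] * 9))
```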
Properties of Conditional Independence
Symmetry
X ⊥ Y | Z ⇒ Y ⊥ X | Z
Decomposition
X ⊥ Y,W | Z ⇒ X ⊥ Y | Z
Contraction
(X ⊥ Y | Z) ∧ (X ⊥ W | Y,Z) ⇒ X ⊥ Y,W | Z
Weak union
X ⊥ Y,W | Z ⇒ X ⊥ Y | Z,W
Intersection
(X ⊥ Y | Z,W) ∧ (X ⊥ W | Y,Z) ⇒ X ⊥ Y,W | Z
Holds only if the distribution is positive, i.e., P > 0
Key questions
How do we specify distributions that satisfy particular independence properties? Representation
How can we exploit independence properties for efficient computation? Inference
How can we identify independence properties present in data? Learning
Will now see an example: Bayesian networks
Key idea
Conditional parameterization (instead of joint parameterization)
For each RV Xi, specify P(Xi | XA) for a set XA of RVs
Then use the chain rule to get a joint parameterization
Have to be careful to guarantee a legal distribution…
Example: 2 variables
By the chain rule, P(X,Y) = P(X) P(Y | X). For binary X, Y this takes 1 + 2 = 3 parameters: one for P(X = 1), and one for P(Y = 1 | X = x) for each value of x.
Example: 3 variables
By the chain rule, P(X,Y,Z) = P(X) P(Y | X) P(Z | X,Y); for binary variables this takes 1 + 2 + 4 = 7 parameters. If additionally Z ⊥ Y | X, then P(Z | X,Y) = P(Z | X) and 1 + 2 + 2 = 5 parameters suffice.
Example: Naïve Bayes models
Class variable Y
Evidence variables X1,…,Xn
Assume that XA ⊥ XB | Y for all disjoint subsets XA, XB of {X1,…,Xn}
Conditional parameterization:
Specify P(Y)
Specify P(Xi | Y) for each i
Joint distribution: P(Y, X1,…,Xn) = P(Y) ∏i P(Xi | Y)
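A minimal sketch of this parameterization (my own toy model and numbers, not from the lecture): compute the joint P(y, x1,…,xn) = P(y) ∏i P(xi | y) for each class y, then normalize to get the posterior over Y.

```python
def nb_posterior(p_y, p_x_given_y, x):
    """Posterior P(Y = y | x1, ..., xn) in a naive Bayes model.
    p_y: dict y -> P(Y = y)
    p_x_given_y: list of dicts, p_x_given_y[i][y][xi] = P(X_i = xi | Y = y)."""
    joint = {}
    for y, py in p_y.items():
        p = py
        for i, xi in enumerate(x):
            p *= p_x_given_y[i][y][xi]  # P(y) * prod_i P(x_i | y)
        joint[y] = p
    z = sum(joint.values())  # evidence P(x1, ..., xn)
    return {y: p / z for y, p in joint.items()}

# Toy spam model: class Y in {spam, ham}, two binary evidence variables
p_y = {"spam": 0.4, "ham": 0.6}
cpds = [{"spam": {0: 0.2, 1: 0.8}, "ham": {0: 0.9, 1: 0.1}},
        {"spam": {0: 0.5, 1: 0.5}, "ham": {0: 0.7, 1: 0.3}}]
print(nb_posterior(p_y, cpds, (1, 0)))
```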
Today: Bayesian networks
Compact representation of distributions over large numbers of variables
(Often) allows efficient exact inference (computing marginals, etc.)
Example: HailFinder has 56 variables with ~3 states each, so the full joint table has ~3^56 ≈ 10^26 entries; working with it directly would take > 10,000 years on top supercomputers (demo: JavaBayes applet)
Causal parametrization
Graph with directed edges from (immediate) causes to (immediate) effects
Example: Earthquake and Burglary can each cause the Alarm to go off; the Alarm can cause JohnCalls and MaryCalls
Bayesian networks
A Bayesian network structure is a directed, acyclic graph G, where each vertex s of G is interpreted as a random variable Xs (with unspecified distribution)
A Bayesian network (G,P) consists of
a BN structure G and ..
..a set of conditional probability distributions (CPTs) P(Xs | PaXs), where PaXs are the parents of node Xs, such that (G,P) defines the joint distribution
P(X1,…,Xn) = ∏s P(Xs | PaXs)
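As an illustration of this definition, a sketch on the alarm network from the causal-parameterization slide; the graph encoding is my own and the CPT values are made-up toy numbers, not from the lecture:

```python
# Each node: (parents, CPT mapping parent values -> P(node = True)).
# CPT numbers are illustrative toy values, not from the lecture.
network = {
    "Burglary":   ((), {(): 0.001}),
    "Earthquake": ((), {(): 0.002}),
    "Alarm":      (("Burglary", "Earthquake"),
                   {(True, True): 0.95, (True, False): 0.94,
                    (False, True): 0.29, (False, False): 0.001}),
    "JohnCalls":  (("Alarm",), {(True,): 0.90, (False,): 0.05}),
    "MaryCalls":  (("Alarm",), {(True,): 0.70, (False,): 0.01}),
}

def joint_prob(network, assignment):
    """P(x1, ..., xn) = prod_s P(x_s | pa_s): one CPT factor per node."""
    p = 1.0
    for var, (parents, cpt) in network.items():
        p_true = cpt[tuple(assignment[pa] for pa in parents)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

print(joint_prob(network, {"Burglary": False, "Earthquake": False,
                           "Alarm": True, "JohnCalls": True, "MaryCalls": True}))
```

Note the parameter count: 1 + 1 + 4 + 2 + 2 = 10 numbers instead of the 2^5 − 1 = 31 a full joint table would need.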
Bayesian networks
Can every probability distribution be described by a BN?
Yes: pick any ordering, apply the chain rule P(X1,…,Xn) = ∏i P(Xi | X1,…,Xi−1), and use the fully connected DAG with edges Xj → Xi for all j < i. The interesting question is whether a compact BN exists.
Representing the world using BNs
Setting: a true distribution P′ with conditional independences I(P′), to be represented by a Bayes net (G,P) with independences I(P)
Want to make sure that I(P) ⊆ I(P′)
Need to understand the CI properties of a BN (G,P)
Which kind of CI does a BN imply?
(Worked out on the example network: Earthquake, Burglary, Alarm, JohnCalls, MaryCalls)
Local Markov Assumption
Each BN structure G is associated with the following conditional independence assumptions:
X ⊥ NonDescendants_X | Pa_X
We write Iloc(G) for these conditional independences
Suppose (G,P) is a Bayesian network representing P. Does it hold that Iloc(G) ⊆ I(P)?
If this holds, we say G is an I-map for P.
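The assumptions Iloc(G) can be read off mechanically from the graph; a small sketch of that (my own code, assuming the DAG is given as a parents-list dictionary):

```python
def local_markov(graph):
    """I_loc(G): for each node X, the assumption
    X ⊥ NonDescendants_X | Pa_X, read off a DAG given as
    graph[node] = list of parents of node."""
    def descendants(node):
        children = {c for c, ps in graph.items() if node in ps}
        out = set(children)
        for c in children:
            out |= descendants(c)
        return out

    stmts = []
    for x, parents in graph.items():
        nondesc = set(graph) - {x} - descendants(x) - set(parents)
        stmts.append((x, sorted(nondesc), sorted(parents)))
    return stmts  # (X, nondescendants, parents) triples

# Chain X1 -> X2 -> X3: the only nontrivial assumption is X3 ⊥ X1 | X2
chain = {"X1": [], "X2": ["X1"], "X3": ["X2"]}
for x, nd, pa in local_markov(chain):
    print(f"{x} ⊥ {nd} | {pa}")
```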
Factorization Theorem
Theorem: Iloc(G) ⊆ I(P), i.e., G is an I-map of P (independence map)
if and only if
the true distribution P can be represented exactly as a Bayes net (G,P), i.e., P(X1,…,Xn) = ∏s P(Xs | PaXs)
Proof: I-Map to factorization
E.g., for the alarm network, the chain rule gives P(E,B,A,J,M) = P(E) P(B | E) P(A | E,B) P(J | E,B,A) P(M | E,B,A,J); the local Markov assumptions B ⊥ E, J ⊥ E,B | A, and M ⊥ E,B,J | A reduce this to P(E) P(B) P(A | E,B) P(J | A) P(M | A), which is exactly the BN factorization.
The general case
Pick a topological ordering of the variables (parents before children) and expand P by the chain rule in that order; each Xi is then preceded only by non-descendants, so the local Markov assumption lets us drop every non-parent from the conditioning set, leaving P(X1,…,Xn) = ∏s P(Xs | PaXs).
Defining a Bayes Net
Given random variables and known conditional independences:
Pick an ordering X1,…,Xn of the variables
For each Xi:
Find a minimal subset A ⊆ {X1,…,Xi−1} such that Xi ⊥ ({X1,…,Xi−1} \ A) | A
Specify / learn the CPD P(Xi | A)
Ordering matters a lot for compactness of representation! More on this later in the course. (A brute-force sketch of the minimal-subset search follows below.)
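Here is that brute-force sketch for a single variable (my own code; it assumes the full joint is available as a table, which is unrealistic in practice but makes the definition concrete):

```python
import itertools

def indep(joint, i, rest, cond, tol=1e-9):
    """Test X_i ⊥ X_rest | X_cond on a full joint table
    joint[assignment_tuple] = P(assignment)."""
    def marg(fixed):  # probability of a partial assignment {index: value}
        return sum(p for x, p in joint.items()
                   if all(x[k] == v for k, v in fixed.items()))
    for x in joint:
        c = {k: x[k] for k in cond}
        pc = marg(c)
        if pc == 0:
            continue
        r = {k: x[k] for k in rest}
        lhs = marg({i: x[i], **r, **c}) / pc
        rhs = (marg({i: x[i], **c}) / pc) * (marg({**r, **c}) / pc)
        if abs(lhs - rhs) > tol:
            return False
    return True

def minimal_parents(joint, i):
    """Smallest A ⊆ {X_0,...,X_{i-1}} with X_i ⊥ ({X_0..X_{i-1}} \\ A) | A,
    by brute force over subsets (exponential -- a sketch, not practical)."""
    preds = range(i)
    for size in range(i + 1):
        for a in itertools.combinations(preds, size):
            rest = [k for k in preds if k not in a]
            if indep(joint, i, rest, list(a)):
                return list(a)

# Markov chain X0 -> X1 -> X2: the minimal parent set of X2 is {X1}
joint = {(x0, x1, x2): 0.5 * (0.7 if x1 == x0 else 0.3) * (0.7 if x2 == x1 else 0.3)
         for x0 in (0, 1) for x1 in (0, 1) for x2 in (0, 1)}
print(minimal_parents(joint, 2))  # [1]
```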
Adding edges doesn’t hurt
Theorem: Let G be an I-map for P, and let G’ be derived from G by adding an edge. Then G’ is an I-map of P. (G’ is strictly more expressive than G.)
Proof sketch: adding an edge enlarges one parent set and removes nodes from non-descendant sets, so each local Markov assumption of G’ is implied by the corresponding assumption of G via weak union and decomposition; hence Iloc(G’) ⊆ I(P).
Additional conditional independencies
A BN specifies a joint distribution through a conditional parameterization that satisfies the Local Markov Property
But we also talked about additional properties of CI:
Weak union, intersection, contraction, …
Which additional CI does a particular BN specify?
All CI that can be derived through algebraic operations (applying these properties to the local Markov assumptions)
What you need to know
Bayesian networks
Local Markov property
I-maps
Factorization Theorem