Probabilistic Graphical Models: Bayesian Networks Li Xiong
Slide credits: Page (Wisconsin) CS760, Zhu (Wisconsin) KDD ’12 tutorial
Outline:
– Graphical models
– Bayesian networks: definition
– Bayesian networks: inference
– Bayesian networks: learning
Overview
There are two envelopes: one holds a red ball (worth $100) and a black ball; the other holds two black balls. You randomly picked an envelope and randomly took out a ball; it was black. At this point, you are given the option to switch envelopes. Should you?
Random variables: E ∈ {1, 0} (which envelope you hold), B ∈ {r, b} (the color of the ball you drew).
P(E = 1) = P(E = 0) = 1/2
P(B = r | E = 1) = 1/2, P(B = r | E = 0) = 0
We ask: P(E = 1 | B = b).

P(E = 1 | B = b) = P(B = b | E = 1) P(E = 1) / P(B = b) = (1/2 × 1/2) / (3/4) = 1/3

So you should switch: the other envelope holds the red ball with probability 2/3.
The graphical model: E → B
Overview
A graphical model represents a joint distribution over random variables, e.g. (x1, …, xn−1) a feature vector and xn ≡ y the class label. Typical tasks:
– Inference: compute p(XQ | XE) where XQ ∪ XE ⊆ {x1, …, xn}
– e.g. classification: Q = {n}, E = {1, …, n − 1}; by the definition of conditional probability,
p(xn | x1, …, xn−1) = p(x1, …, xn−1, xn) / Σ_v p(x1, …, xn−1, xn = v)
– Learning: estimate the model from i.i.d. samples X(1), …, X(m), where X(i) = (x1(i), …, xn(i))
Overview
– Naive storage of the joint distribution is exponential (2^n entries for n binary r.v.s) and hard to interpret (conditional independences are not explicit)
– We often can't afford to do inference by brute force
Definitions
A graphical model is a probabilistic model whose independence structure is expressed by a graph. Just because a diagram has nodes and edges doesn't mean it's a graphical model. These are not graphical models: neural networks, decision trees, network flow diagrams, HMM templates.
– Bayesian networks: directed graphs
– Markov networks: undirected graphs
Bayesian Networks: Definition
– Nodes are random variables
– Directed edges between nodes reflect dependence
[Figures: two small example networks over {Understood Material, Assignment Grade, Exam Grade} and {Fire, Smoking At Sensor, Alarm}]
A Bayesian network consists of a directed acyclic graph and a set of conditional probability distributions (CPDs), one per node, representing P(X | Parents(X)):
– each node denotes a random variable
– each edge from X to Y represents that X directly influences Y
– formally: each variable X is independent of its non-descendants given its parents
Definitions: Directed Graphical Models
– Two r.v.s A, B are independent if P(A, B) = P(A)P(B), or equivalently P(A | B) = P(A)
– Two r.v.s A, B are conditionally independent given C if P(A, B | C) = P(A | C)P(B | C), or equivalently P(A | B, C) = P(A | C)
Any joint probability distribution can be expressed via the chain rule:

P(X1, …, Xn) = P(X1) ∏_{i=2}^{n} P(Xi | X1, …, Xi−1)

Using the conditional independences encoded by the graph, the joint can be expressed as

P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Parents(Xi))
Example: the alarm network.
– B = a burglary occurs at your house
– E = an earthquake occurs at your house
– A = the alarm goes off
– J = John calls to report the alarm
– M = Mary calls to report the alarm
A typical query: P(B | M, J)?
[Network: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls]
P(A | B, E):
B E | P(A = t) P(A = f)
t t | 0.95  0.05
t f | 0.94  0.06
f t | 0.29  0.71
f f | 0.001 0.999

P(B): t 0.001, f 0.999
P(E): t 0.001, f 0.999

P(J | A):
A | P(J = t) P(J = f)
t | 0.9   0.1
f | 0.05  0.95

P(M | A):
A | P(M = t) P(M = f)
t | 0.7   0.3
f | 0.01  0.99
P(B, E, A, J, M) = P(B) P(E) P(A | B, E) P(J | A) P(M | A)
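To make the factorization concrete, here is a minimal Python sketch (mine, not from the slides) that stores the CPTs above and evaluates the joint probability of one complete assignment; any of the 2^5 = 32 joint entries can be computed on demand this way.

    # CPTs from the tables above: probability that each variable is true,
    # keyed by its parents' values.
    P_B, P_E = 0.001, 0.001
    P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}
    P_J = {1: 0.90, 0: 0.05}
    P_M = {1: 0.70, 0: 0.01}

    def bern(p_true, v):
        """P(X = v) for binary v when P(X = 1) = p_true."""
        return p_true if v else 1.0 - p_true

    def joint(b, e, a, j, m):
        """P(B,E,A,J,M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A)."""
        return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
                * bern(P_J[a], j) * bern(P_M[a], m))

    print(joint(1, 0, 1, 1, 1))  # P(b,¬e,a,j,m) = 0.001*0.999*0.94*0.9*0.7 ≈ 0.00059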
Advantages of the Bayesian network representation:
– Compactness: the full joint for the 5-variable alarm example has 2^5 = 32 parameters. For a larger example network of 10 binary variables (figure not shown): how many parameters does a full representation of the joint distribution have? 2^10 = 1024. How many does the graph-structured representation have? 2 + 4 + 4 + 4 + 4 + 4 + 4 + 4 + 8 + 4 = 42, one CPT per node.
– A Bayes net represents dependencies among variables only where they exist.
– The compact representation can greatly reduce the complexity of inference.
Bayesian Networks: Inference
The inference task. Given: values for some variables in the network (evidence) and a set of query variables. Do: compute the posterior distribution over the query variables.
– Variables that are neither evidence nor query are hidden (other) variables.
– Any set can be the evidence variables and any set can be the query variables.
Special case: naive Bayes. Derive the maximum posterior under the independence assumption (a simplified network in which the class C is the only parent of each attribute xk):

P(Ci | X) = P(X | Ci) P(Ci) / P(X)

P(X | Ci) = ∏_{k=1}^{n} P(xk | Ci) = P(x1 | Ci) × P(x2 | Ci) × … × P(xn | Ci)
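A small illustration of this computation (a sketch with made-up class priors and per-attribute likelihood tables, not values from the slides): the posterior is the prior times the per-attribute likelihoods, normalized by P(X).

    # Hypothetical two-class, two-attribute example:
    # posterior(Ci) ∝ P(Ci) * prod_k P(xk | Ci).
    priors = {"C1": 0.6, "C2": 0.4}
    likelihoods = {  # one table P(xk = value | class) per attribute
        "C1": [{"a": 0.7, "b": 0.3}, {"a": 0.2, "b": 0.8}],
        "C2": [{"a": 0.4, "b": 0.6}, {"a": 0.5, "b": 0.5}],
    }

    def naive_bayes_posterior(x):
        scores = {}
        for c, prior in priors.items():
            score = prior
            for k, value in enumerate(x):
                score *= likelihoods[c][k][value]
            scores[c] = score
        z = sum(scores.values())  # the normalizer P(X)
        return {c: s / z for c, s in scores.items()}

    print(naive_bayes_posterior(["a", "b"]))  # {'C1': ~0.81, 'C2': ~0.19}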
Inference: Exact Inference
Let X = (XQ, XE, XO) for query, evidence, and other variables, and infer P(XQ | XE). By definition,

P(XQ | XE) = P(XQ, XE) / P(XE) = Σ_{XO} P(XQ, XE, XO) / Σ_{XQ, XO} P(XQ, XE, XO)
Example: “probability the house is being burglarized given that John and Mary both called,” i.e. P(b | j, m). The numerator is

P(b, j, m) = P(b) Σ_e Σ_a P(e) P(a | b, e) P(j | a) P(m | a)

summing over the possible values for the E and A variables: (e, a), (e, ¬a), (¬e, a), (¬e, ¬a). With the CPTs above:

P(b, j, m) = 0.001 × (0.001 × 0.95 × 0.9 × 0.7 + 0.001 × 0.05 × 0.05 × 0.01 + 0.999 × 0.94 × 0.9 × 0.7 + 0.999 × 0.06 × 0.05 × 0.01) ≈ 0.00059
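The same computation in a short Python sketch (mine, not the slides'): enumerate the hidden variables E and A, sum the joint, and normalize over B to get the posterior.

    from itertools import product

    # CPTs and joint() exactly as in the earlier factorization sketch.
    P_B, P_E = 0.001, 0.001
    P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}
    P_J = {1: 0.90, 0: 0.05}
    P_M = {1: 0.70, 0: 0.01}

    def bern(p_true, v):
        return p_true if v else 1.0 - p_true

    def joint(b, e, a, j, m):
        return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
                * bern(P_J[a], j) * bern(P_M[a], m))

    def p_b_jm(b):
        """P(B = b, j, m): sum out the hidden variables E and A."""
        return sum(joint(b, e, a, 1, 1) for e, a in product((0, 1), repeat=2))

    num, den = p_b_jm(1), p_b_jm(1) + p_b_jm(0)
    print(num)        # ≈ 0.00059, matching the hand computation above
    print(num / den)  # posterior P(b | j, m) ≈ 0.31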
Computational issue: exact inference requires summing an exponential number of terms; with k variables in XO, each taking r values, there are r^k terms in the sum.
Bayesian Networks: Approximate Inference
Inference: Markov Chain Monte Carlo

Network parameters for this example (from the figure): P(B) = 0.001, P(E) = 0.002; P(A | B, E) = 0.95, P(A | B, ¬E) = 0.94, P(A | ¬B, E) = 0.29, P(A | ¬B, ¬E) = 0.001; P(J | A) = 0.9, P(J | ¬A) = 0.05; P(M | A) = 0.7, P(M | ¬A) = 0.01
To generate a sample X = (B, E, A, J, M):
– Sample B ∼ Ber(0.001): draw r ∼ U(0, 1); if r < 0.001 then B = 1, else B = 0
– Sample E ∼ Ber(0.002)
– If B = 1 and E = 1, sample A ∼ Ber(0.95), and so on for the other parent configurations
– If A = 1, sample J ∼ Ber(0.9), else J ∼ Ber(0.05)
– If A = 1, sample M ∼ Ber(0.7), else M ∼ Ber(0.01)
Given the inference task P(B = 1 | E = 1, M = 1): throw away all samples except those with (E = 1, M = 1). Then

P(B = 1 | E = 1, M = 1) ≈ (1/m) Σ_{i=1}^{m} 1(B(i) = 1)

where m is the number of surviving samples.
Issue: this can be highly inefficient (note that P(E = 1) is tiny); few samples agree with the evidence.
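A runnable sketch of both steps (my code, not the tutorial's): forward sampling in topological order, then rejection against the evidence. With P(E = 1) = 0.002, the rejection rate makes the inefficiency obvious.

    import random

    # CPTs used in the MCMC slides (note P(E) = 0.002 here).
    P_B, P_E = 0.001, 0.002
    P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}
    P_J = {1: 0.90, 0: 0.05}
    P_M = {1: 0.70, 0: 0.01}

    def flip(p):
        """Bernoulli(p) draw via a uniform r ~ U(0,1), as described above."""
        return 1 if random.random() < p else 0

    def forward_sample():
        """Sample (B, E, A, J, M) parents-first."""
        b, e = flip(P_B), flip(P_E)
        a = flip(P_A[(b, e)])
        return b, e, a, flip(P_J[a]), flip(P_M[a])

    # Estimate P(B=1 | E=1, M=1): keep only samples matching the evidence.
    kept = hits = 0
    for _ in range(1_000_000):
        b, e, a, j, m = forward_sample()
        if e == 1 and m == 1:
            kept += 1
            hits += b
    print(kept, "surviving samples; estimate:", hits / kept if kept else None)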
Inference: MCMC with Gibbs sampling
Initialization: fix the evidence variables (E = 1, M = 1); randomly set the other variables, e.g. X(0) = (B = 0, E = 1, A = 0, J = 0, M = 1).
At each step, pick a non-evidence variable xi and resample it: xi ∼ P(xi | X−i), which is equivalent to xi ∼ P(xi | MarkovBlanket(xi)), where

P(xi | MarkovBlanket(xi)) ∝ P(xi | Pa(xi)) ∏_{y ∈ C(xi)} P(y | Pa(y))

and Pa(x) are the parents of x, C(x) the children of x.

Example: B ∼ P(B | E = 1, A = 0, J = 0, M = 1) ∝ P(B | E = 1, A = 0) ∝ P(B) P(A = 0 | B, E = 1)
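Here is the B update from the example as a minimal Python sketch (an illustration of mine, not the tutorial's code): B's Markov blanket is {E, A} (its child A and A's other parent E), so we resample B in proportion to P(B) P(a | B, e).

    import random

    P_B = 0.001
    P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}

    def bern(p_true, v):
        return p_true if v else 1.0 - p_true

    def resample_B(e, a):
        """Gibbs update B ~ P(B | e, a) ∝ P(B) P(a | B, e)."""
        w1 = P_B * bern(P_A[(1, e)], a)        # unnormalized weight for B = 1
        w0 = (1 - P_B) * bern(P_A[(0, e)], a)  # unnormalized weight for B = 0
        return 1 if random.random() < w1 / (w0 + w1) else 0

    # Slide example: the current state has E = 1, A = 0, so
    # P(B=1 | MB) = 0.001*0.05 / (0.001*0.05 + 0.999*0.71) ≈ 7e-5.
    print(resample_B(e=1, a=0))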
Suppose the resampled value is B = 1; the new state is X(1) = (B = 1, E = 1, A = 0, J = 0, M = 1). Continue resampling the non-evidence variables in turn, and estimate P(B = 1 | E = 1, M = 1) as the fraction of samples with B = 1.
Why this works: the sampler simulates a Markov chain whose stationary distribution π satisfies π = Tπ. Each Gibbs update has transition kernel Ti((X−i, x′i) | (X−i, xi)) = p(x′i | X−i), and the chain's stationary distribution is the desired p(XQ | XE).
Bayesian Networks: Learning
The parameter learning task. Given: a set of training instances and the graph structure of a BN. Do: infer the parameters of the CPDs.

B E A J M
f f f t f
f t f f f
f f t f t
…

The learning task with partial data. Given: a set of training instances with some unobservable data and the graph structure of a BN. Do: infer the missing data values (and the parameters of the CPDs too).

B E A J M
f f ? t f
f t ? f f
f f ? f t
…

The structure learning task. Given: a set of training instances. Do: infer the graph structure of a BN (and perhaps the parameters of the CPDs too).
Bayesian Networks: Parameter Learning
Maximum likelihood estimation (MLE):
– given a model structure (e.g. a Bayes net graph) and a set of data D
– set the model parameters θ to maximize P(D | θ)
i.e. make the data D look as likely as possible under the model P(D | θ).
Example: consider trying to estimate the parameter θ (probability of heads) of a biased coin from a sequence of flips x1, …, xn, where xi = 1 for heads and xi = 0 for tails. The likelihood function for θ is given by

L(θ) = P(x1, …, xn | θ) = ∏_{i=1}^{n} θ^{xi} (1 − θ)^{1−xi} = θ^{Σ xi} (1 − θ)^{n − Σ xi}

For h heads in n flips, the MLE is θ̂ = h/n.
MLE in a Bayes net: consider estimating the CPD parameters for B and J in the alarm network given the following data set:

B E A J M
f f f t f
f t f f f
f f f t t
t f f f t
f f t t f
f f t f t
f f t t t
f f t t t

P(b) = 1/8 = 0.125, P(¬b) = 7/8 = 0.875
P(j | a) = 3/4 = 0.75, P(¬j | a) = 1/4 = 0.25
P(j | ¬a) = 2/4 = 0.5, P(¬j | ¬a) = 2/4 = 0.5
Suppose instead our data set was this one, with no burglaries observed:

B E A J M
f f f t f
f t f f f
f f f t t
f f f f t
f f t t f
f f t f t
f f t t t
f f t t t

Now P(b) = 0/8 = 0 and P(¬b) = 8/8 = 1. Do we really want to set this to 0?
Laplace estimates: instead of estimating parameters strictly from the data, we could start with some prior belief for each value v:

P(X = x) = (nx + 1) / Σ_{v ∈ Values(X)} (nv + 1)

where nv is the number of occurrences of value v; the added 1s act as pseudocounts.

A more general form, m-estimates:

P(X = x) = (nx + px m) / ((Σ_{v ∈ Values(X)} nv) + m)

where px is the prior probability of value x and m is the number of “virtual” instances.
Using the same data set (which has no instances with B = t), let's estimate the parameters for B using m = 4 and pb = 0.25:

P(b) = (0 + 0.25 × 4) / (8 + 4) = 1/12 ≈ 0.08
P(¬b) = (8 + 0.75 × 4) / (8 + 4) = 11/12 ≈ 0.92
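The m-estimate is easy to express in code; a minimal sketch (the function name is mine):

    def m_estimate(counts, m, priors):
        """P(X = x) = (n_x + p_x * m) / (sum_v n_v + m), per value x."""
        n = sum(counts.values())
        return {x: (counts[x] + priors[x] * m) / (n + m) for x in counts}

    # The slide's example: 8 instances, none with B = t, m = 4, p_b = 0.25.
    print(m_estimate({"t": 0, "f": 8}, m=4, priors={"t": 0.25, "f": 0.75}))
    # -> {'t': 0.0833..., 'f': 0.9166...}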
Parameter Learning and Inference with Partial Data (EM)
Given: a set of training instances with some unobservable data (here, A is never observed) and the graph structure of a BN. Do: infer the missing data values (and the parameters of the CPDs).

B E A J M
f f ? t f
f t ? f f
f f ? f t
…
The Expectation-Maximization (EM) approach.
Given: the network structure and initial estimated probability parameters.
Repeat until convergence:
– E-step: compute the expectation over the missing values, using the current parameters
– M-step: compute/update the maximum likelihood (MLE) or maximum posterior probability (MAP, maximum a posteriori) parameters, using the expected values
Example: suppose we're given the following initial BN and training set (A is unobserved).

Initial parameters:
P(B) = 0.1, P(E) = 0.2
P(A | B, E): B E = t t → 0.9; t f → 0.6; f t → 0.3; f f → 0.2
P(J | A): A = t → 0.9; A = f → 0.2
P(M | A): A = t → 0.8; A = f → 0.1

B E A J M
f f ? f f
f f ? t f
t f ? t t
f f ? f t
f t ? t f
f f ? f t
t t ? t t
f f ? f f
f f ? t f
f f ? f t
E-step: for each instance, compute P(a | b, e, j, m) under the current parameters. For example:

P(a | ¬b, ¬e, ¬j, ¬m) = P(¬b, ¬e, a, ¬j, ¬m) / [P(¬b, ¬e, a, ¬j, ¬m) + P(¬b, ¬e, ¬a, ¬j, ¬m)]
= (0.9 × 0.8 × 0.2 × 0.1 × 0.2) / (0.9 × 0.8 × 0.2 × 0.1 × 0.2 + 0.9 × 0.8 × 0.8 × 0.8 × 0.9)
= 0.00288 / 0.4176 = 0.0069

P(a | ¬b, ¬e, j, ¬m) = (0.9 × 0.8 × 0.2 × 0.9 × 0.2) / 0.1296 = 0.02592 / 0.1296 = 0.2

P(a | b, ¬e, j, m) = (0.1 × 0.8 × 0.6 × 0.9 × 0.8) / 0.0352 = 0.03456 / 0.0352 = 0.98
Filling in the expected values for A:

B E  A                      J M
f f  t: 0.0069, f: 0.9931   f f
f f  t: 0.2,    f: 0.8      t f
t f  t: 0.98,   f: 0.02     t t
f f  t: 0.2,    f: 0.8      f t
f t  t: 0.3,    f: 0.7      t f
f f  t: 0.2,    f: 0.8      f t
t t  t: 0.997,  f: 0.003    t t
f f  t: 0.0069, f: 0.9931   f f
f f  t: 0.2,    f: 0.8      t f
f f  t: 0.2,    f: 0.8      f t
M-step: re-estimate the probabilities using expected counts, e.g.

P(a | b, e) = E[#(a, b, e)] / E[#(b, e)]

P(a | b, e) = 0.997 / 1 = 0.997
P(a | b, ¬e) = 0.98 / 1 = 0.98
P(a | ¬b, e) = 0.3 / 1 = 0.3
P(a | ¬b, ¬e) = (0.0069 + 0.2 + 0.2 + 0.2 + 0.0069 + 0.2 + 0.2) / 7 ≈ 0.145

Updated CPT:
B E | P(A)
t t | 0.997
t f | 0.98
f t | 0.3
f f | 0.145
Re-estimate the probabilities for P(J | A) and P(M | A) in the same way.
Alternate E-steps (expectation over missing values) and M-steps (MLE or MAP parameter updates), starting from the initial estimated probability parameters, until the parameters converge.
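A compact sketch of one EM iteration's key pieces for this example (my code; the data and initial parameters are the ones above): the E-step posterior for A, and the expected-count ratio the M-step uses for P(a | ¬b, ¬e).

    # One E-step for the alarm network with A unobserved.
    P_B, P_E = 0.1, 0.2
    P_A = {(1, 1): 0.9, (1, 0): 0.6, (0, 1): 0.3, (0, 0): 0.2}
    P_J = {1: 0.9, 0: 0.2}
    P_M = {1: 0.8, 0: 0.1}

    def bern(p_true, v):
        return p_true if v else 1.0 - p_true

    def posterior_a(b, e, j, m):
        """P(A = 1 | b, e, j, m) under the current parameters."""
        w = [bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
             * bern(P_J[a], j) * bern(P_M[a], m) for a in (0, 1)]
        return w[1] / (w[0] + w[1])

    data = [(0,0,0,0), (0,0,1,0), (1,0,1,1), (0,0,0,1), (0,1,1,0),
            (0,0,0,1), (1,1,1,1), (0,0,0,0), (0,0,1,0), (0,0,0,1)]  # (B,E,J,M)

    # Expected counts: P(a | ¬b, ¬e) = E[#(a, ¬b, ¬e)] / E[#(¬b, ¬e)]
    num = sum(posterior_a(b, e, j, m) for b, e, j, m in data if (b, e) == (0, 0))
    den = sum(1 for b, e, j, m in data if (b, e) == (0, 0))
    print(round(posterior_a(0, 0, 0, 0), 4))  # 0.0069, as in the worked example
    print(round(num / den, 3))                # ≈ 0.145, as in the M-step above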
Bayesian Networks: Structure Learning
How can we learn the graph structure? Two common strategies, detailed below:
– search a very restricted space of possible structures (e.g. networks with tree DAGs)
– use heuristic search (e.g. sparse candidate)
The Chow-Liu algorithm (Chow & Liu, 1968) learns a BN with a tree structure that maximizes the likelihood of the training data:
1. compute the weight I(Xi; Xj) of each possible edge (Xi, Xj) as the mutual information under the joint empirical distribution,
I(X; Y) = Σ_{x ∈ values(X)} Σ_{y ∈ values(Y)} P(x, y) log2 [P(x, y) / (P(x) P(y))]
2. find a maximum-weight spanning tree (MST) over these edge weights
3. assign edge directions in the MST (pick any root and point edges away from it)
A spanning tree is an acyclic subgraph (a tree) that connects all vertices in a graph; a maximum-weight spanning tree is one whose summed edge weights are maximal.
[Figure: example graph over vertices A–G with fractional edge weights (1/5, 1/6, 1/7, 1/8, 1/9, 1/11, 1/15, …)]
Prim's algorithm:
given: graph with vertices V and edges E
Vnew ← { v }, where v is an arbitrary vertex from V
Enew ← { }
repeat until Vnew = V {
    choose an edge (u, v) in E with max weight, where u is in Vnew and v is not
    add v to Vnew and (u, v) to Enew
}
return Vnew and Enew, which represent an MST
Kruskal's algorithm:
given: graph with vertices V and edges E
Enew ← { }
for each (u, v) in E, ordered by weight (from high to low) {
    remove (u, v) from E
    if adding (u, v) to Enew does not create a cycle
        add (u, v) to Enew
}
return V and Enew, which represent an MST
[Figure: the resulting maximum-weight spanning tree on the example graph]
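Putting the pieces together, a small Python sketch of Chow-Liu (my illustration, not the slides' code): empirical mutual information as edge weights, then Prim's algorithm for the maximum-weight spanning tree.

    import numpy as np
    from itertools import combinations

    def mutual_information(x, y):
        """Empirical I(X;Y) in bits for two 0/1 arrays."""
        mi = 0.0
        for a in (0, 1):
            for b in (0, 1):
                p_xy = np.mean((x == a) & (y == b))
                p_x, p_y = np.mean(x == a), np.mean(y == b)
                if p_xy > 0:
                    mi += p_xy * np.log2(p_xy / (p_x * p_y))
        return mi

    def chow_liu_edges(data):
        """Edge weights = pairwise MI; max-weight spanning tree via Prim's."""
        n_vars = data.shape[1]
        w = {(i, j): mutual_information(data[:, i], data[:, j])
             for i, j in combinations(range(n_vars), 2)}
        in_tree, edges = {0}, []
        while len(in_tree) < n_vars:
            u, v = max((e for e in w if (e[0] in in_tree) != (e[1] in in_tree)),
                       key=w.get)
            edges.append((u, v))
            in_tree.update((u, v))
        return edges  # undirected; point edges away from any chosen root

    rng = np.random.default_rng(0)
    x0 = rng.integers(0, 2, 500)
    x1 = np.where(rng.random(500) < 0.9, x0, 1 - x0)  # strongly tied to x0
    x2 = rng.integers(0, 2, 500)                      # independent noise
    print(chow_liu_edges(np.column_stack([x0, x1, x2])))  # (0, 1) plus one weak edge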
Why does Chow-Liu find the tree that maximizes the data likelihood?
– Why can we represent the data likelihood as a sum of I(X;Y) terms?
– Why can we pick any direction for the edges in the tree?

The log-likelihood of the data D given a directed graph G with best-fit parameters θG is

log2 P(D | G, θG) = Σ_{d ∈ D} Σ_i log2 P(xi(d) | Pa(Xi))
= |D| Σ_i Σ_{values(Xi, Pa(Xi))} P(Xi, Pa(Xi)) log2 P(Xi | Pa(Xi))

(since summing over all examples is equivalent to computing an average under the empirical distribution, times |D|). Writing P(Xi | Pa(Xi)) = P(Xi, Pa(Xi)) / P(Pa(Xi)) and adding and subtracting log2 P(Xi) gives

log2 P(D | G, θG) = |D| Σ_i (I(Xi; Pa(Xi)) − H(Xi))

We're interested in finding the graph G that maximizes this. The entropy terms H(Xi) don't depend on the graph structure, so it suffices to maximize Σ_i I(Xi; Pa(Xi)). If we assume a tree, one node has no parents and all others have exactly one, so the sum becomes Σ_{(Xi, Xj) ∈ edges} I(Xi; Xj): exactly the total weight of the spanning tree. Edge directions don't matter for the likelihood, because mutual information is symmetric.
Beyond trees: instead of restricting the space to tree DAGs, we can search the general space of structures with heuristic search (e.g. the sparse candidate algorithm).
Heuristic search over network structures: each state in the search space is a candidate net structure. We need
– state transition operators
– a scoring function for states
– a search algorithm (how to move through the state space)
Given the current network at some stage of the search, we can apply one of three operators (illustrated in the slides on a four-node network over A, B, C, D), subject to keeping the graph acyclic:
– add an edge
– delete an edge
– reverse an edge
If the score is the data log-likelihood, then the score can be decomposed as a sum of per-node terms (and so can some other scores we'll see later):

score(G : D) = Σ_i score(Xi, Pa(Xi) : D)

This decomposability lets us
– score a network by summing terms over the nodes in the network
– efficiently score changes in a local search procedure (only nodes whose parent sets changed need rescoring)
Greedy hill-climbing structure search:
given: data set D, initial network B0
i = 0
Bbest ← B0
while stopping criteria not met {
    for each possible operator application a {
        Bnew ← apply(a, Bi)
        if score(Bnew) > score(Bbest)
            Bbest ← Bnew
    }
    ++i
    Bi ← Bbest
}
return Bi
The sparse candidate algorithm [Friedman et al., UAI 1999]:
given: data set D, initial network B0, parameter k
i = 0
repeat {
    ++i
    // restrict step
    select for each variable Xj a set Cj(i) (|Cj(i)| ≤ k) of candidate parents
    // maximize step
    find network Bi maximizing score among networks where ∀Xj, Parents(Xj) ⊆ Cj(i)
} until convergence
return Bi
One way to perform the restrict step: compute the mutual information between pairs of variables and select the top k candidate parents for each variable, where

I(X; Y) = Σ_{x,y} P(x, y) log2 [P(x, y) / (P(x) P(y))]
The maximize step is itself a structure search (e.g. greedy hill-climbing with the add-edge, delete-edge, and reverse-edge operators), restricted to the candidate parent sets.

Scoring structures: can we find the highest-scoring network simply by maximizing the likelihood of the data, arg max_G P(D | G, θG)? In general no: adding an edge never decreases the maximum likelihood, so fully connected networks always win and we overfit. If we restrict the space of structures (e.g. to a tree), then maybe.
Instead, score candidate structures with the log-likelihood minus a complexity penalty:
– Akaike Information Criterion (AIC): score_AIC(G : D) = log P(D | θ̂G, G) − |θG|
– Bayesian Information Criterion (BIC): score_BIC(G : D) = log P(D | θ̂G, G) − (log m / 2) |θG|
where θ̂G are the maximum likelihood parameters, |θG| is the number of free parameters of G, and m is the number of training instances.
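As a sketch of how such a penalized, decomposable score might be computed for binary variables (my code; it uses the BIC penalty of (log m)/2 per free parameter):

    import numpy as np
    from collections import defaultdict

    def bic_score(data, parents):
        """Decomposable BIC for binary variables: one term per node.

        data: (m, n) 0/1 array; parents: dict node -> tuple of parent columns.
        """
        m = data.shape[0]
        score = 0.0
        for x, pa in parents.items():
            groups = defaultdict(list)  # instances grouped by parent configuration
            for row in data:
                groups[tuple(row[list(pa)])].append(row[x])
            for vals in groups.values():
                for c in (sum(vals), len(vals) - sum(vals)):  # counts of 1s, 0s
                    if c > 0:
                        score += c * np.log(c / len(vals))  # log-likelihood at MLE
            score -= (np.log(m) / 2) * 2 ** len(pa)  # one free param per config
        return score

    # Adding the edge 0 -> 1 pays off on correlated data, despite the penalty.
    rng = np.random.default_rng(1)
    x0 = rng.integers(0, 2, 200)
    x1 = np.where(rng.random(200) < 0.9, x0, 1 - x0)
    data = np.column_stack([x0, x1])
    print(bic_score(data, {0: (), 1: ()}))    # independent structure
    print(bic_score(data, {0: (), 1: (0,)}))  # with edge 0 -> 1 (higher)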