  1. Probabilistic Graphical Models: Bayesian Networks. Li Xiong. Slide credits: Page (Wisconsin) CS760, Zhu (Wisconsin) KDD ’12 tutorial

  2. Outline • Graphical models • Bayesian networks - definition • Bayesian networks - inference • Bayesian networks - learning

  3. Overview: the envelope quiz • There are two envelopes: one has a red ball (worth $100) and a black ball, the other has two black balls • You randomly picked an envelope and randomly took out a ball; it was black • At this point, you are given the option to switch envelopes. Should you?

  4. Overview: the envelope quiz. Random variables E ∈ {1, 0} (E = 1 means you hold the red-ball envelope), B ∈ {r, b} (the color of the drawn ball). P(E = 1) = P(E = 0) = 1/2; P(B = r | E = 1) = 1/2, P(B = r | E = 0) = 0. We ask: P(E = 1 | B = b)

  5. Overview: the envelope quiz. Random variables E ∈ {1, 0}, B ∈ {r, b}; P(E = 1) = P(E = 0) = 1/2; P(B = r | E = 1) = 1/2, P(B = r | E = 0) = 0. We ask: P(E = 1 | B = b). By Bayes' rule, P(E = 1 | B = b) = P(B = b | E = 1) P(E = 1) / P(B = b) = (1/2 × 1/2) / (3/4) = 1/3. Since 1/3 < 1/2, the other envelope is now more likely to hold the red ball, so you should switch. The graphical model: E → B
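A quick numeric check of this Bayes'-rule computation, as a minimal Python sketch (the variable names are my own):

```python
# Envelope quiz: P(E = 1 | B = b) by Bayes' rule.
p_e1 = 0.5                    # P(E = 1): either envelope equally likely
p_black_given_e1 = 0.5        # P(B = b | E = 1): one black ball of two
p_black_given_e0 = 1.0        # P(B = b | E = 0): both balls are black

# Law of total probability: P(B = b) = 1/2 * 1/2 + 1 * 1/2 = 3/4
p_black = p_black_given_e1 * p_e1 + p_black_given_e0 * (1 - p_e1)

# Bayes' rule: (1/2 * 1/2) / (3/4) = 1/3
print(p_black_given_e1 * p_e1 / p_black)  # 0.333...
```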

  6. Overview: reasoning with uncertainty • A set of random variables x1, . . . , xn; e.g. xn ≡ y is the class label and (x1, . . . , xn−1) is a feature vector • Inference: given the joint distribution p(x1, . . . , xn), compute p(XQ | XE) where XQ ∪ XE ⊆ {x1 . . . xn}; e.g. for Q = {n}, E = {1 . . . n − 1}, by the definition of conditional probability, p(xn | x1, . . . , xn−1) = p(x1, . . . , xn−1, xn) / Σv p(x1, . . . , xn−1, xn = v) • Learning: estimate p(x1, . . . , xn) from training data X(1), . . . , X(N), where X(i) = (x1(i), . . . , xn(i))
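As a concrete (if naive) baseline, inference can be done by brute force on an explicit joint table. A minimal sketch, assuming binary variables and a joint stored as a dict from assignment tuples to probabilities (all names are illustrative):

```python
from itertools import product

def query(joint, n, q_idx, evidence):
    """Brute-force P(X_q | evidence) over a full joint table.

    joint: dict mapping each n-tuple of 0/1 values to its probability.
    evidence: dict mapping variable index -> observed value.
    """
    num = {0: 0.0, 1: 0.0}
    for assign in product([0, 1], repeat=n):
        if all(assign[i] == v for i, v in evidence.items()):
            num[assign[q_idx]] += joint[assign]
    z = num[0] + num[1]              # = P(evidence)
    return {v: p / z for v, p in num.items()}
```

The loop visits all 2^n assignments, which is exactly the exponential cost the next slide complains about.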

  7. Overview: it is difficult to reason with uncertainty • The joint distribution p(x1, . . . , xn): exponential naive storage (2^n entries for binary r.v.s), and hard to interpret (conditional independence is not explicit) • Inference p(XQ | XE): often can't afford to do it by brute force • If p(x1, . . . , xn) is not given, estimating it from data: often can't afford to do it by brute force • Graphical models: efficient representation, inference, and learning on p(x1, . . . , xn), exactly or approximately

  8. Definitions: graphical-model-nots. Graphical models are the study of probabilistic models. Just because a model has nodes and edges doesn't make it a graphical model. These are not graphical models: a neural network, a decision tree, a network flow, an HMM template

  9. Graphical Models • Bayesian networks – directed • Markov networks – undirected

  10. Outline • Graphical models • Bayesian networks - definition • Bayesian networks - inference • Bayesian networks - learning

  11. Bayesian Networks: Intuition • A graphical representation for a joint probability distribution • Nodes are random variables • Directed edges between nodes reflect dependence • Some informal examples (shown as small network diagrams): Smoking At Sensor and Fire → Alarm; Understood Material → Exam Grade and Assignment Grade

  12. Bayesian networks • A BN consists of a Directed Acyclic Graph (DAG) and a set of conditional probability distributions • In the DAG: each node denotes a random variable; each edge from X to Y represents that X directly influences Y; formally, each variable X is independent of its non-descendants given its parents • Each node X has a conditional probability distribution (CPD) representing P(X | Parents(X))

  13. Definitions: directed graphical models, conditional independence. Two r.v.s A, B are independent if P(A, B) = P(A)P(B), or equivalently P(A | B) = P(A). Two r.v.s A, B are conditionally independent given C if P(A, B | C) = P(A | C)P(B | C), or equivalently P(A | B, C) = P(A | C)
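A small numeric sanity check of the second definition, sketched in Python with a made-up joint over three binary variables that factorizes as P(A | C) P(B | C) P(C):

```python
from itertools import product

p_c = {0: 0.6, 1: 0.4}
p_a = {0: 0.2, 1: 0.7}            # P(A = 1 | C = c)
p_b = {0: 0.5, 1: 0.9}            # P(B = 1 | C = c)

# Build the joint P(A, B, C) so that A and B are independent given C.
joint = {
    (a, b, c): (p_a[c] if a else 1 - p_a[c])
               * (p_b[c] if b else 1 - p_b[c]) * p_c[c]
    for a, b, c in product([0, 1], repeat=3)
}

c = 1
p_ab_c = joint[(1, 1, c)] / p_c[c]                       # P(A=1, B=1 | C=c)
p_a_c = sum(joint[(1, b, c)] for b in (0, 1)) / p_c[c]   # P(A=1 | C=c)
p_b_c = sum(joint[(a, 1, c)] for a in (0, 1)) / p_c[c]   # P(B=1 | C=c)
assert abs(p_ab_c - p_a_c * p_b_c) < 1e-12               # definition holds
```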

  14. Bayesian networks • Using the chain rule, a joint probability distribution can be expressed as P(X1, …, Xn) = P(X1) ∏i=2..n P(Xi | X1, …, Xi−1) • A BN provides a compact representation of a joint probability distribution: P(X1, …, Xn) = ∏i=1..n P(Xi | Parents(Xi))
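This factorization is mechanical to evaluate. A minimal sketch, assuming binary variables and a network stored as a dict node → (parent list, CPT), where each CPT maps a tuple of parent values to P(node = 1 | parents) (this format is my own, not from the slides):

```python
def joint_prob(network, assignment):
    """P(X1, ..., Xn) = prod_i P(Xi | Parents(Xi)) for one full
    assignment {node: 0 or 1}."""
    p = 1.0
    for node, (parents, cpt) in network.items():
        p1 = cpt[tuple(assignment[pa] for pa in parents)]  # P(node=1 | parents)
        p *= p1 if assignment[node] == 1 else 1.0 - p1
    return p
```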

  15. Bayesian network example • Consider the following 5 binary random variables: B = a burglary occurs at your house; E = an earthquake occurs at your house; A = the alarm goes off; J = John calls to report the alarm; M = Mary calls to report the alarm • Suppose we want to answer queries like: what is P(B | M, J)?

  16. Bayesian network example. Structure: Burglary → Alarm ← Earthquake, with Alarm → JohnCalls and Alarm → MaryCalls. CPTs (each row gives P = t / P = f):
P(B): 0.001 / 0.999
P(E): 0.001 / 0.999
P(A | B, E): B=t, E=t: 0.95 / 0.05; B=t, E=f: 0.94 / 0.06; B=f, E=t: 0.29 / 0.71; B=f, E=f: 0.001 / 0.999
P(J | A): A=t: 0.9 / 0.1; A=f: 0.05 / 0.95
P(M | A): A=t: 0.7 / 0.3; A=f: 0.01 / 0.99
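These CPTs drop straight into the dict format sketched on the previous slide; each entry gives P(node = 1 | parents), with 1 = true:

```python
alarm_net = {
    "B": ([], {(): 0.001}),
    "E": ([], {(): 0.001}),
    "A": (["B", "E"], {(1, 1): 0.95, (1, 0): 0.94,
                       (0, 1): 0.29, (0, 0): 0.001}),
    "J": (["A"], {(1,): 0.90, (0,): 0.05}),
    "M": (["A"], {(1,): 0.70, (0,): 0.01}),
}
```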

  17. Bayesian networks. For the alarm network (Burglary, Earthquake → Alarm → JohnCalls, MaryCalls): P(B, E, A, J, M) = P(B) × P(E) × P(A | B, E) × P(J | A) × P(M | A) • A standard table representation of the joint distribution for the Alarm example has 2^5 = 32 parameters • The BN representation of this distribution has 20 parameters
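For instance, scoring one full assignment with the joint_prob sketch and the alarm_net CPTs above:

```python
# P(B=t, E=f, A=t, J=t, M=t) = 0.001 * 0.999 * 0.94 * 0.9 * 0.7
event = {"B": 1, "E": 0, "A": 1, "J": 1, "M": 1}
print(joint_prob(alarm_net, event))  # ~0.000592
```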

  18. Bayesian networks • Consider a case with 10 binary random variables • How many parameters does a BN with the given graph structure have? Summing the per-node CPT sizes (2, 4, 4, 4, 4, 4, 4, 4, 8, 4) gives 42 • How many parameters does the standard table representation of the joint distribution have? 2^10 = 1024
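A binary node with k parents needs a CPT with 2^k rows of 2 entries, i.e. 2^(k+1) parameters, which is where the per-node counts come from. A quick check (the exact wiring of the slide's graph is not recoverable here, so the parent counts below are chosen to match the listed CPT sizes):

```python
parent_counts = [0, 1, 1, 1, 1, 1, 1, 1, 2, 1]   # 10 nodes; one has 2 parents
print(sum(2 ** (k + 1) for k in parent_counts))  # 42 BN parameters
print(2 ** 10)                                   # 1024 for the full joint table
```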

  19. Advantages of the Bayesian network representation • Captures independence and conditional independence where they exist • Encodes the relevant portion of the full joint among variables where dependencies exist • Uses a graphical representation which lends insight into the complexity of inference

  20. Bayesian Networks • Graphical models • Bayesian networks - definition • Bayesian networks - inference (exact inference, approximate inference) • Bayesian networks - learning (parameter learning, network learning)

  21. The inference task in Bayesian networks. Given: values for some variables in the network (the evidence) and a set of query variables. Do: compute the posterior distribution over the query variables • Variables that are neither evidence variables nor query variables are the "other" (hidden) variables • The BN representation is flexible enough that any set can be the evidence variables and any set can be the query variables

  22. Recall the naïve Bayesian classifier • Derive the maximum posterior: P(Ci | X) = P(X | Ci) P(Ci) / P(X) • Independence assumption: P(X | Ci) = ∏k=1..n P(xk | Ci) = P(x1 | Ci) × P(x2 | Ci) × … × P(xn | Ci) • This corresponds to a simplified network: the class node is the sole parent of every feature node
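A minimal sketch of that simplified network used as a classifier (the class priors and per-feature CPDs below are made-up placeholders):

```python
p_c = {0: 0.6, 1: 0.4}                # class priors P(C = c)
p_x_given_c = [{0: 0.3, 1: 0.8},      # p_x_given_c[k][c] = P(x_k = 1 | C = c)
               {0: 0.5, 1: 0.1}]

def posterior(x):
    """P(C | x) ∝ P(C) * prod_k P(x_k | C), then normalize."""
    scores = {}
    for c, prior in p_c.items():
        s = prior
        for k, xk in enumerate(x):
            pk = p_x_given_c[k][c]
            s *= pk if xk else 1 - pk
        scores[c] = s
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

print(posterior([1, 0]))   # posterior over the two classes
```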

  23. Inference: exact inference by enumeration. Let X = (XQ, XE, XO) for query, evidence, and other variables. Infer P(XQ | XE). By definition, P(XQ | XE) = P(XQ, XE) / P(XE) = ΣXO P(XQ, XE, XO) / ΣXQ,XO P(XQ, XE, XO)

  24. Inference by enumeration example • Let a denote A = true, and ¬a denote A = false • Suppose we're given the query P(b | j, m): "probability the house is being burglarized given that John and Mary both called" • From the graph structure we can first compute P(b, j, m) = Σe Σa P(b) P(e) P(a | b, e) P(j | a) P(m | a), a sum over the possible values of the E and A variables (e, ¬e, a, ¬a)

  25. Inference by enumeration example. P(b, j, m) = Σe Σa P(b) P(e) P(a | b, e) P(j | a) P(m | a) = P(b) Σe P(e) Σa P(a | b, e) P(j | a) P(m | a) = 0.001 × (0.001 × 0.95 × 0.9 × 0.7 [e, a] + 0.001 × 0.05 × 0.05 × 0.01 [e, ¬a] + 0.999 × 0.94 × 0.9 × 0.7 [¬e, a] + 0.999 × 0.06 × 0.05 × 0.01 [¬e, ¬a]) ≈ 0.00059
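The same sum can be run mechanically against the alarm_net CPTs defined earlier, by summing joint_prob over all completions of a partial assignment (a sketch, not the slides' own code):

```python
from itertools import product

def enumerate_prob(network, partial):
    """Marginal probability of a partial assignment, by brute-force
    summation of joint_prob over the unassigned (hidden) variables."""
    hidden = [v for v in network if v not in partial]
    total = 0.0
    for values in product([0, 1], repeat=len(hidden)):
        total += joint_prob(network, dict(partial, **dict(zip(hidden, values))))
    return total

p_bjm = enumerate_prob(alarm_net, {"B": 1, "J": 1, "M": 1})
print(p_bjm)  # ~0.00059, dominated by the (¬e, a) term
```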

  26. Inference by enumeration example • Now do the equivalent calculation for P(¬b, j, m) • and determine P(b | j, m) = P(b, j, m) / P(j, m) = P(b, j, m) / (P(b, j, m) + P(¬b, j, m))
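Continuing the sketch:

```python
p_nbjm = enumerate_prob(alarm_net, {"B": 0, "J": 1, "M": 1})
print(p_bjm / (p_bjm + p_nbjm))  # P(b | j, m) ≈ 0.31 with these CPTs
```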

  27. Inference: exact inference by enumeration. Let X = (XQ, XE, XO) for query, evidence, and other variables. Infer P(XQ | XE). By definition, P(XQ | XE) = P(XQ, XE) / P(XE) = ΣXO P(XQ, XE, XO) / ΣXQ,XO P(XQ, XE, XO). Computational issue: this sums an exponential number of terms; with k variables in XO each taking r values, there are r^k terms

  28. Bayesian Networks • Graphical models • Bayesian networks - definition • Bayesian networks - inference (exact inference, approximate inference) • Bayesian networks - learning (parameter learning, network learning)

  29. Approximate (Monte Carlo) inference in Bayes nets • Basic idea: repeatedly generate data samples according to the distribution represented by the Bayes net • Estimate the probability P(XQ | XE) from those samples (diagram: the alarm network B, E → A → J, M)
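A minimal rejection-sampling sketch of this idea for the alarm network: draw each variable from its CPD given its already-sampled parents (ancestral sampling), keep only samples that agree with the evidence, and average the query variable. It reuses alarm_net from above and assumes the dict's insertion order is topological, which it is here:

```python
import random

def sample(network):
    """One ancestral sample: each node drawn given its sampled parents."""
    a = {}
    for node, (parents, cpt) in network.items():
        p1 = cpt[tuple(a[pa] for pa in parents)]
        a[node] = 1 if random.random() < p1 else 0
    return a

def rejection_sample(network, query, evidence, n=500_000):
    kept = hits = 0
    for _ in range(n):
        s = sample(network)
        if all(s[k] == v for k, v in evidence.items()):
            kept += 1
            hits += s[query]
    return hits / kept if kept else float("nan")

# Noisy estimate of P(B | J=t, M=t): the evidence (j, m) is rare, so most
# samples are rejected; expect roughly 0.31 given enough samples.
print(rejection_sample(alarm_net, "B", {"J": 1, "M": 1}))
```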
