Inference in Bayesian Networks


1. Lecture 7: Inference in Bayesian Networks
Marco Chiarandini, Department of Mathematics & Computer Science, University of Southern Denmark
Slides by Stuart Russell and Peter Norvig

2. Inference in BN — Course Overview
✔ Introduction
  ✔ Artificial Intelligence
  ✔ Intelligent Agents
✔ Search
  ✔ Uninformed Search
  ✔ Heuristic Search
Uncertain knowledge and Reasoning
  ✔ Probability and Bayesian approach
  Bayesian Networks
  Hidden Markov Chains
  Kalman Filters
Learning
  Supervised Learning: Bayesian Networks, Neural Networks
  Unsupervised Learning: EM Algorithm
  Reinforcement Learning
Games and Adversarial Search
  Minimax search and Alpha-beta pruning
  Multiagent search
Knowledge representation and Reasoning
  Propositional logic
  First order logic
  Inference
Planning

3. Inference in BN — Bayesian networks, Resume
Encode local conditional independences: Pr(X_i | X_{-i}) = Pr(X_i | Parents(X_i))
Thus the global semantics simplifies to the joint probability factorization:
Pr(X_1, …, X_n) = Π_{i=1}^{n} Pr(X_i | X_1, …, X_{i-1})   (chain rule)
               = Π_{i=1}^{n} Pr(X_i | Parents(X_i))       (by construction)
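As a concrete instance of this factorization, consider the burglary network used in the enumeration slides below (variables B, E, A, J, M). Writing the product out gives

Pr(B, E, A, J, M) = Pr(B) Pr(E) Pr(A | B, E) Pr(J | A) Pr(M | A)

which is exactly the product of CPT entries that the enumeration and variable elimination queries below manipulate.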

4. Inference in BN — Outline
1. Inference in BN

5. Inference in BN — Inference tasks
Simple queries: compute the posterior marginal Pr(X_i | E = e),
  e.g., P(NoGas | Gauge = empty, Lights = on, Starts = false)
Conjunctive queries: Pr(X_i, X_j | E = e) = Pr(X_i | E = e) Pr(X_j | X_i, E = e)
Optimal decisions: decision networks include utility information; probabilistic inference is required for P(outcome | action, evidence)
Value of information: which evidence to seek next?
Sensitivity analysis: which probability values are most critical?
Explanation: why do I need a new starter motor?

6. Inference in BN — Inference by enumeration
Sum out variables from the joint without actually constructing its explicit representation.
Simple query on the burglary network (variables B, E, A, J, M):
Pr(B | j, m) = Pr(B, j, m) / P(j, m) = α Pr(B, j, m) = α Σ_e Σ_a Pr(B, e, a, j, m)
Rewrite full joint entries using products of CPT entries:
Pr(B | j, m) = α Σ_e Σ_a Pr(B) P(e) Pr(a | B, e) P(j | a) P(m | a)
            = α Pr(B) Σ_e P(e) Σ_a Pr(a | B, e) P(j | a) P(m | a)
Recursive depth-first enumeration: O(n) space, O(d^n) time

7. Inference in BN — Enumeration algorithm

function Enumeration-Ask(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, observed values for variables E
          bn, a Bayesian network with variables {X} ∪ E ∪ Y
  Q(X) ← a distribution over X, initially empty
  for each value x_i of X do
    Q(x_i) ← Enumerate-All(bn.Vars, e ∪ {X = x_i})
  return Normalize(Q(X))

function Enumerate-All(vars, e) returns a real number
  if Empty?(vars) then return 1.0
  Y ← First(vars)
  if Y has value y in e
    then return P(y | parents(Y)) × Enumerate-All(Rest(vars), e)
    else return Σ_y P(y | parents(Y)) × Enumerate-All(Rest(vars), e ∪ {Y = y})
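For concreteness, here is a minimal Python sketch of this pseudocode. It is not from the slides: it assumes Boolean variables and a network represented as a dict mapping each variable name to a pair (parents, cpt), with variables listed in topological order and cpt mapping a tuple of parent values to P(var = true | parents).

    # Sketch of Enumeration-Ask / Enumerate-All for Boolean variables.
    # Assumed representation: bn is a dict {var: (parents, cpt)} in topological
    # order, where cpt maps a tuple of parent values to P(var = True | parents).

    def prob(var, value, event, bn):
        """CPT lookup: P(var = value | values of parents(var) in event)."""
        parents, cpt = bn[var]
        p_true = cpt[tuple(event[p] for p in parents)]
        return p_true if value else 1.0 - p_true

    def enumerate_all(variables, event, bn):
        """Sum over all unassigned variables of the product of CPT entries."""
        if not variables:
            return 1.0
        first, rest = variables[0], variables[1:]
        if first in event:
            return prob(first, event[first], event, bn) * enumerate_all(rest, event, bn)
        return sum(prob(first, v, {**event, first: v}, bn) *
                   enumerate_all(rest, {**event, first: v}, bn)
                   for v in (True, False))

    def enumeration_ask(X, evidence, bn):
        """Posterior distribution P(X | evidence), computed by full enumeration."""
        variables = list(bn)
        q = {v: enumerate_all(variables, {**evidence, X: v}, bn) for v in (True, False)}
        norm = sum(q.values())
        return {v: p / norm for v, p in q.items()}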

8. Inference in BN — Evaluation tree
[Figure: evaluation tree for the query Pr(b | j, m), branching over the values of E and A; the branches multiply the CPT entries P(b), P(e), P(a | b, e), P(j | a), P(m | a).]
Enumeration is inefficient: repeated computation, e.g., it computes P(j | a) P(m | a) for each value of e.

9. Inference in BN — Inference by variable elimination
Variable elimination: carry out summations right-to-left, storing intermediate results (factors) to avoid recomputation.
Pr(B | j, m) = α Pr(B) Σ_e P(e) Σ_a Pr(a | B, e) P(j | a) P(m | a)   (one factor per variable: B, E, A, J, M)
            = α Pr(B) Σ_e P(e) Σ_a Pr(a | B, e) P(j | a) f_M(a)
            = α Pr(B) Σ_e P(e) Σ_a Pr(a | B, e) f_J(a) f_M(a)
            = α Pr(B) Σ_e P(e) Σ_a f_A(a, b, e) f_J(a) f_M(a)
            = α Pr(B) Σ_e P(e) f_ĀJM(b, e)   (sum out A)
            = α Pr(B) f_ĒĀJM(b)              (sum out E)
            = α f_B(b) × f_ĒĀJM(b)

10. Inference in BN — Variable elimination: basic operations
Summing out a variable from a product of factors:
1. Move any constant factors outside the summation:
   Σ_x f_1 × ⋯ × f_k = f_1 × ⋯ × f_i Σ_x f_{i+1} × ⋯ × f_k = f_1 × ⋯ × f_i × f_X̄
   assuming f_1, …, f_i do not depend on X.
2. Add up submatrices in the pointwise product of the remaining factors.
Pointwise product of f_1 and f_2:
   f_1(x_1, …, x_j, y_1, …, y_k) × f_2(y_1, …, y_k, z_1, …, z_l) = f(x_1, …, x_j, y_1, …, y_k, z_1, …, z_l)
E.g., f_1(a, b) × f_2(b, c) = f(a, b, c)
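These two operations are easy to prototype. The sketch below is an illustration, not code from the course: it assumes a factor is represented as a pair (vars, table), where vars is a tuple of Boolean variable names and table maps a tuple of their values to a number.

    from itertools import product

    def pointwise_product(f1, f2):
        """f1(x.., y..) * f2(y.., z..) = f(x.., y.., z..), joining on shared variables."""
        vars1, t1 = f1
        vars2, t2 = f2
        out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
        table = {}
        for values in product((True, False), repeat=len(out_vars)):
            assignment = dict(zip(out_vars, values))
            table[values] = (t1[tuple(assignment[v] for v in vars1)] *
                             t2[tuple(assignment[v] for v in vars2)])
        return out_vars, table

    def sum_out(var, factor):
        """Sum the factor over both values of var, yielding a smaller factor."""
        vars_, table = factor
        idx = vars_.index(var)
        out_vars = vars_[:idx] + vars_[idx + 1:]
        out_table = {}
        for values, p in table.items():
            key = values[:idx] + values[idx + 1:]
            out_table[key] = out_table.get(key, 0.0) + p
        return out_vars, out_table

For example, pointwise_product((('a', 'b'), t_ab), (('b', 'c'), t_bc)) produces a factor over (a, b, c), as in the slide's f_1(a, b) × f_2(b, c) = f(a, b, c).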

11. Inference in BN — Irrelevant variables
Consider the query P(JohnCalls | Burglary = true) on the burglary network (variables B, E, A, J, M):
P(J | b) = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(J | a) Σ_m P(m | a)
The sum over m is identically 1; M is irrelevant to the query.
Theorem: Y is irrelevant unless Y ∈ Ancestors({X} ∪ E).
Here X = JohnCalls, E = {Burglary}, and Ancestors({X} ∪ E) = {Alarm, Earthquake}, so MaryCalls is irrelevant.

12. Inference in BN — Irrelevant variables contd.
Defn: moral graph of a DAG Bayes net: marry all parents and drop arrows.
Defn: A is m-separated from B by C iff A and B are separated by C in the moral graph.
Theorem: Y is irrelevant if it is m-separated from X by E.
For P(JohnCalls | Alarm = true), both Burglary and Earthquake are irrelevant.
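This test is straightforward to implement. The sketch below is an illustration, not code from the course: it assumes the network is given as a dict mapping each variable to the list of its parents, and it applies the slide's definition directly (build the moral graph, then check graph separation with a breadth-first search).

    from collections import deque

    def moral_graph(parents):
        """Undirected moral graph: link each node to its parents and marry co-parents."""
        adj = {v: set() for v in parents}
        for child, ps in parents.items():
            for p in ps:
                adj[child].add(p)
                adj[p].add(child)
            for i, p in enumerate(ps):          # marry every pair of parents
                for q in ps[i + 1:]:
                    adj[p].add(q)
                    adj[q].add(p)
        return adj

    def m_separated(x, y, evidence, parents):
        """True iff every path from x to y in the moral graph passes through evidence."""
        adj = moral_graph(parents)
        blocked = set(evidence)
        frontier, seen = deque([x]), {x}
        while frontier:
            node = frontier.popleft()
            if node == y:
                return False
            for nbr in adj[node]:
                if nbr not in seen and nbr not in blocked:
                    seen.add(nbr)
                    frontier.append(nbr)
        return True

On the burglary network (with a hypothetical parents dict for B, E, A, J, M), m_separated('Burglary', 'JohnCalls', {'Alarm'}, parents) would return True, matching the slide's example.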

13. Inference in BN — Complexity of exact inference
Singly connected networks (or polytrees):
– any two nodes are connected by at most one (undirected) path
– time and space cost (with variable elimination) are O(d^k n)
– hence time and space cost are linear in n when the number of parents k is bounded by a constant
Multiply connected networks:
– 3SAT can be reduced to exact inference ⟹ NP-hard
– equivalent to counting 3SAT models ⟹ #P-complete
Proof of this in one of the exercises for Thursday.

14. Inference in BN — Inference by stochastic simulation
Basic idea:
– draw N samples from a sampling distribution S
– compute an approximate posterior probability P̂
– show this converges to the true probability P
[Figure: a coin, P = 0.5]
Outline:
– Sampling from an empty network
– Rejection sampling: reject samples disagreeing with evidence
– Likelihood weighting: use evidence to weight samples
– Markov chain Monte Carlo (MCMC): sample from a stochastic process whose stationary distribution is the true posterior

15. Inference in BN — Sampling from an empty network

function Prior-Sample(bn) returns an event sampled from bn
  inputs: bn, a belief network specifying joint distribution Pr(X_1, …, X_n)
  x ← an event with n elements
  for i = 1 to n do
    x_i ← a random sample from Pr(X_i | Parents(X_i)), given the values of Parents(X_i) in x
  return x

Ancestor sampling
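A minimal Python sketch of Prior-Sample, under the same assumed representation as the enumeration sketch above (a dict var -> (parents, cpt) in topological order, Boolean variables):

    import random

    def prior_sample(bn):
        """Draw one complete event from the prior distribution defined by the network."""
        event = {}
        for var, (parents, cpt) in bn.items():
            p_true = cpt[tuple(event[p] for p in parents)]   # parents already sampled
            event[var] = random.random() < p_true
        return event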

16. Inference in BN — Example
Sprinkler network: Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → Wet Grass, Rain → Wet Grass

P(C) = .50

C | P(S|C)        C | P(R|C)
T | .10           T | .80
F | .50           F | .20

S R | P(W|S,R)
T T | .99
T F | .90
F T | .90
F F | .01
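For reference, this network written in the dict representation assumed by the sketches above (the variable names and the representation are mine, not the slides'):

    # Sprinkler network CPTs as on this slide; keys of each CPT are tuples of
    # parent values in the listed order, values are P(variable = True | parents).
    sprinkler_bn = {
        'Cloudy':    ((), {(): 0.50}),
        'Sprinkler': (('Cloudy',), {(True,): 0.10, (False,): 0.50}),
        'Rain':      (('Cloudy',), {(True,): 0.80, (False,): 0.20}),
        'WetGrass':  (('Sprinkler', 'Rain'),
                      {(True, True): 0.99, (True, False): 0.90,
                       (False, True): 0.90, (False, False): 0.01}),
    }

    # e.g. one ancestral sample, and the exact posterior P(Rain | Sprinkler = true):
    # sample = prior_sample(sprinkler_bn)
    # posterior = enumeration_ask('Rain', {'Sprinkler': True}, sprinkler_bn)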

17. Inference in BN — Sampling from an empty network contd.
Probability that Prior-Sample generates a particular event:
S_PS(x_1 … x_n) = Π_{i=1}^{n} P(x_i | parents(X_i)) = P(x_1 … x_n)
i.e., the true prior probability.
E.g., S_PS(t, f, t, t) = 0.5 × 0.9 × 0.8 × 0.9 = 0.324 = P(t, f, t, t)
Proof: Let N_PS(x_1 … x_n) be the number of samples generated for event x_1, …, x_n. Then
lim_{N→∞} P̂(x_1, …, x_n) = lim_{N→∞} N_PS(x_1, …, x_n) / N
                          = S_PS(x_1, …, x_n)
                          = Π_{i=1}^{n} P(x_i | parents(X_i))
                          = P(x_1 … x_n)
That is, estimates derived from Prior-Sample are consistent.
Shorthand: P̂(x_1, …, x_n) ≈ P(x_1 … x_n)

18. Inference in BN — Rejection sampling
P̂r(X | e) is estimated from samples agreeing with e.

function Rejection-Sampling(X, e, bn, N) returns an estimate of P(X | e)
  local variables: N, a vector of counts over X, initially zero
  for j = 1 to N do
    x ← Prior-Sample(bn)
    if x is consistent with e then
      N[x] ← N[x] + 1   where x is the value of X in x
  return Normalize(N[X])

E.g., estimate Pr(Rain | Sprinkler = true) using 100 samples.
27 samples have Sprinkler = true; of these, 8 have Rain = true and 19 have Rain = false.
P̂r(Rain | Sprinkler = true) = Normalize(⟨8, 19⟩) = ⟨0.296, 0.704⟩
Similar to a basic real-world empirical estimation procedure.
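A matching Python sketch, reusing prior_sample and the network representation assumed earlier (again an illustration, not the course's code):

    def rejection_sampling(X, evidence, bn, n_samples):
        """Estimate P(X | evidence) from the prior samples consistent with the evidence."""
        counts = {True: 0, False: 0}
        for _ in range(n_samples):
            sample = prior_sample(bn)
            if all(sample[var] == val for var, val in evidence.items()):
                counts[sample[X]] += 1
        total = sum(counts.values())
        return {v: c / total for v, c in counts.items()} if total else counts

    # e.g. rejection_sampling('Rain', {'Sprinkler': True}, sprinkler_bn, 100)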

19. Inference in BN — Analysis of rejection sampling
Rejection sampling returns consistent posterior estimates.
Proof:
P̂r(X | e) = α N_PS(X, e)            (algorithm defn.)
           = N_PS(X, e) / N_PS(e)    (normalized by N_PS(e))
           ≈ Pr(X, e) / P(e)         (property of Prior-Sample)
           = Pr(X | e)               (defn. of conditional probability)
Problem: hopelessly expensive if P(e) is small.
P(e) drops off exponentially with the number of evidence variables!
