

  1. Inference in Bayesian networks (Chapter 14.4–5)

  2. Outline
     ♦ Exact inference by enumeration
     ♦ Exact inference by variable elimination
     ♦ Approximate inference by stochastic simulation
     ♦ Approximate inference by Markov chain Monte Carlo

  3. Inference tasks
     Simple queries: compute posterior marginal P(X_i | E = e)
       e.g., P(NoGas | Gauge = empty, Lights = on, Starts = false)
     Conjunctive queries: P(X_i, X_j | E = e) = P(X_i | E = e) P(X_j | X_i, E = e)
     Optimal decisions: decision networks include utility information;
       probabilistic inference required for P(outcome | action, evidence)
     Value of information: which evidence to seek next?
     Sensitivity analysis: which probability values are most critical?
     Explanation: why do I need a new starter motor?

  4. Inference by enumeration
     Slightly intelligent way to sum out variables from the joint
     without actually constructing its explicit representation
     Simple query on the burglary network (B, E → A → J, M):
       P(B | j, m) = P(B, j, m) / P(j, m)
                   = α P(B, j, m)
                   = α Σ_e Σ_a P(B, e, a, j, m)
     Rewrite full joint entries using product of CPT entries:
       P(B | j, m) = α Σ_e Σ_a P(B) P(e) P(a | B, e) P(j | a) P(m | a)
                   = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) P(m | a)
     Recursive depth-first enumeration: O(n) space, O(d^n) time
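     The summation above is easy to check mechanically. Below is a minimal runnable
     Python sketch (not from the slides): each CPT stores P(var = true | parents),
     and P(B | j, m) is computed by brute force over the full joint. The CPT numbers
     are the standard AIMA burglary-network values; only the b = true entries appear
     on the evaluation-tree slide.

     VARS = ['B', 'E', 'A', 'J', 'M']                 # topological order
     PARENTS = {'B': [], 'E': [], 'A': ['B', 'E'], 'J': ['A'], 'M': ['A']}
     CPT = {                                          # P(var = True | parents)
         'B': {(): 0.001},
         'E': {(): 0.002},
         'A': {(True, True): 0.95, (True, False): 0.94,
               (False, True): 0.29, (False, False): 0.001},
         'J': {(True,): 0.90, (False,): 0.05},
         'M': {(True,): 0.70, (False,): 0.01},
     }

     def prob(var, value, assign):
         """CPT lookup: P(var = value | parent values taken from assign)."""
         p_true = CPT[var][tuple(assign[p] for p in PARENTS[var])]
         return p_true if value else 1.0 - p_true

     def joint(assign):
         """Full joint as the product of CPT entries."""
         p = 1.0
         for var in VARS:
             p *= prob(var, assign[var], assign)
         return p

     # P(B | j, m) = alpha * sum_e sum_a P(B, e, a, j, m)
     Q = {b: sum(joint({'B': b, 'E': e, 'A': a, 'J': True, 'M': True})
                 for e in (True, False) for a in (True, False))
          for b in (True, False)}
     alpha = 1.0 / sum(Q.values())
     print({b: alpha * q for b, q in Q.items()})      # ≈ {True: 0.284, False: 0.716}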

  5. Enumeration algorithm

     function Enumeration-Ask(X, e, bn) returns a distribution over X
        inputs: X, the query variable
                e, observed values for variables E
                bn, a Bayesian network with variables {X} ∪ E ∪ Y
        Q(X) ← a distribution over X, initially empty
        for each value x_i of X do
            extend e with value x_i for X
            Q(x_i) ← Enumerate-All(Vars[bn], e)
        return Normalize(Q(X))

     function Enumerate-All(vars, e) returns a real number
        if Empty?(vars) then return 1.0
        Y ← First(vars)
        if Y has value y in e
           then return P(y | Pa(Y)) × Enumerate-All(Rest(vars), e)
           else return Σ_y P(y | Pa(Y)) × Enumerate-All(Rest(vars), e_y)
                where e_y is e extended with Y = y
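     A direct Python transcription of this pseudocode, offered as a sketch; it
     reuses VARS, PARENTS and prob from the snippet under slide 4, so the two
     pieces run together as one file.

     def enumerate_all(variables, e):
         """Recursive depth-first summation over the remaining variables."""
         if not variables:
             return 1.0
         Y, rest = variables[0], variables[1:]
         if Y in e:                                   # Y already has a value in e
             return prob(Y, e[Y], e) * enumerate_all(rest, e)
         return sum(prob(Y, y, e) * enumerate_all(rest, {**e, Y: y})
                    for y in (True, False))           # sum out Y

     def enumeration_ask(X, e):
         Q = {x: enumerate_all(VARS, {**e, X: x}) for x in (True, False)}
         alpha = 1.0 / sum(Q.values())                # Normalize
         return {x: alpha * q for x, q in Q.items()}

     print(enumeration_ask('B', {'J': True, 'M': True}))   # same ≈ ⟨0.284, 0.716⟩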

  6. Evaluation tree
     [Tree figure: depth-first expansion along the b = true branch; P(b) = .001,
      then P(e) = .002 / P(¬e) = .998, then P(a | b, e) = .95, P(¬a | b, e) = .05,
      P(a | b, ¬e) = .94, P(¬a | b, ¬e) = .06, with leaves P(j | a) = .90,
      P(j | ¬a) = .05 and P(m | a) = .70, P(m | ¬a) = .01]
     Enumeration is inefficient: repeated computation,
       e.g., computes P(j | a) P(m | a) for each value of e

  7. Inference by variable elimination
     Variable elimination: carry out summations right-to-left, storing
     intermediate results (factors) to avoid recomputation
       P(B | j, m)
         = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) P(m | a)
             (factors labelled B, E, A, J, M)
         = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) f_M(a)
         = α P(B) Σ_e P(e) Σ_a P(a | B, e) f_J(a) f_M(a)
         = α P(B) Σ_e P(e) Σ_a f_A(a, b, e) f_J(a) f_M(a)
         = α P(B) Σ_e P(e) f_ĀJM(b, e)       (sum out A)
         = α P(B) f_ĒĀJM(b)                  (sum out E)
         = α f_B(b) × f_ĒĀJM(b)

  8. Variable elimination: basic operations
     Summing out a variable from a product of factors:
       move any constant factors outside the summation,
       add up submatrices in pointwise product of remaining factors:
         Σ_x f_1 × · · · × f_k = f_1 × · · · × f_i × Σ_x f_{i+1} × · · · × f_k
                               = f_1 × · · · × f_i × f_X̄
       assuming f_1, . . . , f_i do not depend on X
     Pointwise product of factors f_1 and f_2:
       f_1(x_1, ..., x_j, y_1, ..., y_k) × f_2(y_1, ..., y_k, z_1, ..., z_l)
         = f(x_1, ..., x_j, y_1, ..., y_k, z_1, ..., z_l)
     E.g., f_1(a, b) × f_2(b, c) = f(a, b, c)
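     A minimal sketch of these two operations, assuming Boolean variables and a
     factor represented as a variable list plus a table from value tuples to
     numbers (the representation is my choice, not the slides'):

     import itertools

     class Factor:
         def __init__(self, variables, table):
             self.variables = list(variables)     # e.g. ['A', 'B']
             self.table = dict(table)             # {(a_val, b_val): number}

         def _lookup(self, assign):
             return self.table[tuple(assign[v] for v in self.variables)]

         def __mul__(self, other):
             """Pointwise product: union of variables, multiply matching rows."""
             vs = self.variables + [v for v in other.variables
                                    if v not in self.variables]
             table = {}
             for vals in itertools.product((True, False), repeat=len(vs)):
                 assign = dict(zip(vs, vals))
                 table[vals] = self._lookup(assign) * other._lookup(assign)
             return Factor(vs, table)

         def sum_out(self, var):
             """Add up the rows that differ only in var."""
             rest = [v for v in self.variables if v != var]
             table = {}
             for vals, p in self.table.items():
                 assign = dict(zip(self.variables, vals))
                 key = tuple(assign[v] for v in rest)
                 table[key] = table.get(key, 0.0) + p
             return Factor(rest, table)

     # E.g., f1(a, b) × f2(b, c) = f(a, b, c), then sum out b:
     f1 = Factor(['A', 'B'], {(True, True): .3, (True, False): .7,
                              (False, True): .9, (False, False): .1})
     f2 = Factor(['B', 'C'], {(True, True): .2, (True, False): .8,
                              (False, True): .6, (False, False): .4})
     f3 = (f1 * f2).sum_out('B')                  # a factor over A and C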

  9. Variable elimination algorithm

     function Elimination-Ask(X, e, bn) returns a distribution over X
        inputs: X, the query variable
                e, evidence specified as an event
                bn, a belief network specifying joint distribution P(X_1, ..., X_n)
        factors ← [ ]; vars ← Reverse(Vars[bn])
        for each var in vars do
            factors ← [Make-Factor(var, e) | factors]
            if var is a hidden variable then factors ← Sum-Out(var, factors)
        return Normalize(Pointwise-Product(factors))
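     The loop can be sketched in Python by combining the Factor class from
     slide 8 with the network tables (VARS, PARENTS, prob) from slide 4;
     make_factor here is my own helper, restricting the CPT by the evidence:

     def make_factor(var, e):
         """CPT of var as a Factor over its unobserved variables,
         with evidence values from e fixed."""
         vs = [v for v in [var] + PARENTS[var] if v not in e]
         table = {}
         for vals in itertools.product((True, False), repeat=len(vs)):
             assign = {**e, **dict(zip(vs, vals))}
             table[vals] = prob(var, assign[var], assign)
         return Factor(vs, table)

     def pointwise_product(factors):
         result = factors[0]
         for f in factors[1:]:
             result = result * f
         return result

     def elimination_ask(X, e):
         factors = []
         for var in reversed(VARS):               # right-to-left
             factors.append(make_factor(var, e))
             if var != X and var not in e:        # hidden variable: sum it out
                 factors = [pointwise_product(factors).sum_out(var)]
         result = pointwise_product(factors)      # a factor over X alone
         alpha = 1.0 / sum(result.table.values())
         return {vals[0]: alpha * p for vals, p in result.table.items()}

     print(elimination_ask('B', {'J': True, 'M': True}))   # ≈ ⟨0.284, 0.716⟩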

  10. Irrelevant variables
     Consider the query P(JohnCalls | Burglary = true):
       P(J | b) = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(J | a) Σ_m P(m | a)
     Sum over m is identically 1; M is irrelevant to the query
     Thm 1: Y is irrelevant unless Y ∈ Ancestors({X} ∪ E)
     Here, X = JohnCalls, E = {Burglary}, and
       Ancestors({X} ∪ E) = {Alarm, Earthquake}
     so MaryCalls is irrelevant
     (Compare this to backward chaining from the query in Horn clause KBs)
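     Thm 1 suggests a simple preprocessing step: prune everything outside
     Ancestors({X} ∪ E) before inference. A sketch, using the PARENTS dict
     from the slide-4 snippet:

     def ancestors(nodes, parents):
         """All strict ancestors of the given set of nodes."""
         seen, frontier = set(), list(nodes)
         while frontier:
             for p in parents[frontier.pop()]:
                 if p not in seen:
                     seen.add(p)
                     frontier.append(p)
         return seen

     def relevant(X, E, parents):
         return {X, *E} | ancestors({X, *E}, parents)

     print(relevant('J', {'B'}, PARENTS))   # {'J', 'B', 'A', 'E'}; M is pruned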

  11. Irrelevant variables contd.
     Defn: moral graph of a Bayes net: marry all parents and drop arrows
     Defn: A is m-separated from B by C iff separated by C in the moral graph
     Thm 2: Y is irrelevant if m-separated from X by E
     For P(JohnCalls | Alarm = true), both Burglary and Earthquake are irrelevant
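     A sketch of both definitions: build the moral graph by marrying parents and
     dropping arrows, then test m-separation as plain graph separation with the
     conditioning set blocked.

     from itertools import combinations

     def moral_graph(parents):
         """Undirected edges: parent-child links plus married parents."""
         edges = set()
         for child, ps in parents.items():
             edges.update(frozenset((p, child)) for p in ps)
             edges.update(frozenset(pair) for pair in combinations(ps, 2))
         return edges

     def m_separated(a, b, c, parents):
         """True iff every moral-graph path from a to b passes through c."""
         adj = {}
         for edge in moral_graph(parents):
             u, v = tuple(edge)
             adj.setdefault(u, set()).add(v)
             adj.setdefault(v, set()).add(u)
         seen, frontier = {a} | set(c), [a]
         while frontier:
             for v in adj.get(frontier.pop(), ()):
                 if v == b:
                     return False
                 if v not in seen:
                     seen.add(v)
                     frontier.append(v)
         return True

     # With PARENTS from the slide-4 snippet (note the married pair B–E):
     print(m_separated('B', 'J', {'A'}, PARENTS))   # True: Burglary irrelevant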

  12. Complexity of exact inference
     Singly connected networks (or polytrees):
       – any two nodes are connected by at most one (undirected) path
       – time and space cost of variable elimination are O(d^k n)
     Multiply connected networks:
       – can reduce 3SAT to exact inference ⇒ NP-hard
       – equivalent to counting 3SAT models ⇒ #P-complete
     [Figure: a 3-CNF formula over variables A, B, C, D, each with prior 0.5,
      encoded as a Bayes net with one node per clause (e.g., A ∨ B ∨ C)
      feeding an AND node]

  13. Inference by stochastic simulation
     Basic idea:
       1) Draw N samples from a sampling distribution S
       2) Compute an approximate posterior probability P̂
       3) Show this converges to the true probability P
     [Figure: a coin with P(heads) = 0.5]
     Outline:
       – Sampling from an empty network
       – Rejection sampling: reject samples disagreeing with evidence
       – Likelihood weighting: use evidence to weight samples
       – Markov chain Monte Carlo (MCMC): sample from a stochastic process
         whose stationary distribution is the true posterior

  14. Sampling from an empty network

     function Prior-Sample(bn) returns an event sampled from bn
        inputs: bn, a belief network specifying joint distribution P(X_1, ..., X_n)
        x ← an event with n elements
        for i = 1 to n do
            x_i ← a random sample from P(X_i | parents(X_i))
                  given the values of Parents(X_i) in x
        return x
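     A runnable sketch of Prior-Sample, hard-coded to the sprinkler network
     and CPT values shown on the Example slides below:

     import random

     def prior_sample():
         """Sample each variable in topological order, given its parents."""
         c = random.random() < 0.50                       # P(C)
         s = random.random() < (0.10 if c else 0.50)      # P(S | C)
         r = random.random() < (0.80 if c else 0.20)      # P(R | C)
         p_w = {(True, True): 0.99, (True, False): 0.90,
                (False, True): 0.90, (False, False): 0.01}[(s, r)]
         w = random.random() < p_w                        # P(W | S, R)
         return {'Cloudy': c, 'Sprinkler': s, 'Rain': r, 'WetGrass': w}

     print(prior_sample())   # e.g. {'Cloudy': True, 'Sprinkler': False, ...}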

  15.–21. Example (one slide per step of a single Prior-Sample run on the
     sprinkler network: Cloudy → {Sprinkler, Rain} → WetGrass)

     P(C) = .50

     C | P(S | C)        C | P(R | C)
     T |   .10           T |   .80
     F |   .50           F |   .20

     S R | P(W | S, R)
     T T |   .99
     T F |   .90
     F T |   .90
     F F |   .01

  22. Sampling from an empty network contd.
     Probability that Prior-Sample generates a particular event:
       S_PS(x_1 ... x_n) = Π_{i=1}^n P(x_i | parents(X_i)) = P(x_1 ... x_n)
     i.e., the true prior probability
     E.g., S_PS(t, f, t, t) = 0.5 × 0.9 × 0.8 × 0.9 = 0.324 = P(t, f, t, t)
     Let N_PS(x_1 ... x_n) be the number of samples generated for event x_1, ..., x_n
     Then we have
       lim_{N→∞} P̂(x_1, ..., x_n) = lim_{N→∞} N_PS(x_1, ..., x_n) / N
                                   = S_PS(x_1, ..., x_n)
                                   = P(x_1 ... x_n)
     That is, estimates derived from Prior-Sample are consistent
     Shorthand: P̂(x_1, ..., x_n) ≈ P(x_1 ... x_n)
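     Consistency is easy to see empirically. A quick check, reusing prior_sample
     from the slide-14 sketch, that the frequency of the event (c, ¬s, r, w)
     approaches S_PS = 0.324:

     N, hits = 100_000, 0
     for _ in range(N):
         x = prior_sample()
         if x['Cloudy'] and not x['Sprinkler'] and x['Rain'] and x['WetGrass']:
             hits += 1
     print(hits / N)   # ≈ 0.324 for large N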

  23. Rejection sampling
     P̂(X | e) estimated from samples agreeing with e

     function Rejection-Sampling(X, e, bn, N) returns an estimate of P(X | e)
        local variables: N, a vector of counts over X, initially zero
        for j = 1 to N do
            x ← Prior-Sample(bn)
            if x is consistent with e then
                N[x] ← N[x] + 1 where x is the value of X in x
        return Normalize(N[X])

     E.g., estimate P(Rain | Sprinkler = true) using 100 samples
       27 samples have Sprinkler = true
       Of these, 8 have Rain = true and 19 have Rain = false
       P̂(Rain | Sprinkler = true) = Normalize(⟨8, 19⟩) = ⟨0.296, 0.704⟩
     Similar to a basic real-world empirical estimation procedure
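     A sketch of Rejection-Sampling on the same network, again reusing
     prior_sample from the slide-14 snippet; note how many samples get thrown
     away when the evidence is improbable, which is the method's main weakness:

     def rejection_sampling(query, evidence, n):
         counts = {True: 0, False: 0}
         for _ in range(n):
             x = prior_sample()
             if all(x[var] == val for var, val in evidence.items()):
                 counts[x[query]] += 1            # keep only consistent samples
         total = sum(counts.values()) or 1        # avoid dividing by zero
         return {v: c / total for v, c in counts.items()}

     print(rejection_sampling('Rain', {'Sprinkler': True}, 10_000))
     # ≈ {True: 0.30, False: 0.70}; only about 30% of samples survive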
