Inference in Bayesian networks
Chapter 14.4–5



Outline

♦ Exact inference by enumeration
♦ Approximate inference by stochastic simulation

Inference tasks

Simple queries: compute posterior marginal P(X_i | E = e)
    e.g., P(NoGas | Gauge = empty, Lights = on, Starts = false)

Conjunctive queries: P(X_i, X_j | E = e) = P(X_i | E = e) P(X_j | X_i, E = e)

Optimal decisions: decision networks include utility information;
    probabilistic inference required for P(outcome | action, evidence)

Value of information: which evidence to seek next?

Sensitivity analysis: which probability values are most critical?

Explanation: why do I need a new starter motor?

Inference by enumeration

Slightly intelligent way to sum out variables from the joint without actually
constructing its explicit representation.

Simple query on the burglary network:

[Figure: burglary network. B (Burglary) and E (Earthquake) are parents of
A (Alarm); A is the parent of J (JohnCalls) and M (MaryCalls).]

    P(B | j, m) = P(B, j, m) / P(j, m)
                = α P(B, j, m)
                = α Σ_e Σ_a P(B, e, a, j, m)

Rewrite full joint entries using products of CPT entries:

    P(B | j, m) = α Σ_e Σ_a P(B) P(e) P(a | B, e) P(j | a) P(m | a)
                = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) P(m | a)

Recursive depth-first enumeration: O(n) space, O(d^n) time.

Enumeration algorithm

    function Enumeration-Ask(X, e, bn) returns a distribution over X
        inputs: X, the query variable
                e, observed values for variables E
                bn, a Bayesian network with variables {X} ∪ E ∪ Y
        Q(X) ← a distribution over X, initially empty
        for each value x_i of X do
            extend e with value x_i for X
            Q(x_i) ← Enumerate-All(Vars[bn], e)
        return Normalize(Q(X))

    function Enumerate-All(vars, e) returns a real number
        if Empty?(vars) then return 1.0
        Y ← First(vars)
        if Y has value y in e
            then return P(y | Pa(Y)) × Enumerate-All(Rest(vars), e)
            else return Σ_y P(y | Pa(Y)) × Enumerate-All(Rest(vars), e_y)
                 where e_y is e extended with Y = y

Complexity of exact inference

Multiply connected networks:
– can reduce 3SAT to exact inference ⇒ NP-hard
– equivalent to counting 3SAT models ⇒ #P-complete

[Figure: a 3SAT instance encoded as a network: root variables A, B, C, D with
prior 0.5 each, one node per clause (1, 2, 3), and an AND node for their
conjunction.]
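The enumeration algorithm above translates almost line for line into Python.
The following is a minimal sketch, not the slides' code: it assumes Boolean
variables and a hand-rolled encoding in which each variable maps to its parent
list and a CPT keyed by parent-value tuples; the numbers are the textbook's
burglary-network CPTs.

    burglary_net = {
        # var: (parents, {parent values: P(var = True)})
        "B": ([], {(): 0.001}),                          # Burglary
        "E": ([], {(): 0.002}),                          # Earthquake
        "A": (["B", "E"], {(True, True): 0.95, (True, False): 0.94,
                           (False, True): 0.29, (False, False): 0.001}),
        "J": (["A"], {(True,): 0.90, (False,): 0.05}),   # JohnCalls
        "M": (["A"], {(True,): 0.70, (False,): 0.01}),   # MaryCalls
    }
    BURGLARY_ORDER = ["B", "E", "A", "J", "M"]           # parents before children

    def cpt_prob(net, var, value, event):
        """P(var = value | parents(var)), read off the CPT."""
        parents, cpt = net[var]
        p_true = cpt[tuple(event[p] for p in parents)]
        return p_true if value else 1.0 - p_true

    def enumerate_all(net, order, event):
        """Depth-first sum of full-joint entries consistent with `event`."""
        if not order:
            return 1.0
        y, rest = order[0], order[1:]
        if y in event:
            return cpt_prob(net, y, event[y], event) * enumerate_all(net, rest, event)
        return sum(cpt_prob(net, y, v, {**event, y: v}) *
                   enumerate_all(net, rest, {**event, y: v})
                   for v in (True, False))

    def enumeration_ask(net, order, x, evidence):
        """Normalized posterior P(x | evidence)."""
        q = {v: enumerate_all(net, order, {**evidence, x: v}) for v in (True, False)}
        z = sum(q.values())
        return {v: p / z for v, p in q.items()}

    # P(B | j, m) ≈ {True: 0.284, False: 0.716}, the book's answer.
    print(enumeration_ask(burglary_net, BURGLARY_ORDER, "B", {"J": True, "M": True}))

Because the order lists parents before children, every CPT lookup finds its
parents already assigned in the partial event.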

Inference by stochastic simulation

Basic idea:
1) Draw N samples from a sampling distribution S
2) Compute an approximate posterior probability P̂
3) Show this converges to the true probability P

[Figure: a coin with P(heads) = 0.5 illustrates sampling.]

Outline:
– Sampling from an empty network
– Rejection sampling: reject samples disagreeing with evidence
– Likelihood weighting: use evidence to weight samples

Sampling from an empty network

    function Prior-Sample(bn) returns an event sampled from bn
        inputs: bn, a belief network specifying joint distribution P(X_1, ..., X_n)
        x ← an event with n elements
        for i = 1 to n do
            x_i ← a random sample from P(X_i | parents(X_i))
                  given the values of Parents(X_i) in x
        return x

Example

The running example is the sprinkler network:

[Figure: Cloudy is the parent of Sprinkler and Rain; Sprinkler and Rain are
the parents of WetGrass.]

    P(C) = 0.50

    C | P(S|C)        C | P(R|C)
    T |   0.10        T |   0.80
    F |   0.50        F |   0.20

    S R | P(W|S,R)
    T T |   0.99
    T F |   0.90
    F T |   0.90
    F F |   0.01

Successive slides trace Prior-Sample through this network, drawing
Cloudy = true, then Sprinkler = false, then Rain = true, then WetGrass = true,
i.e., the event (t, f, t, t).
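Prior-Sample admits an equally short sketch in the same illustrative encoding
as the enumeration example (Boolean variables, CPTs keyed by parent-value
tuples); the probabilities below are exactly the tables above.

    import random

    sprinkler_net = {
        "Cloudy":    ([], {(): 0.50}),
        "Sprinkler": (["Cloudy"], {(True,): 0.10, (False,): 0.50}),
        "Rain":      (["Cloudy"], {(True,): 0.80, (False,): 0.20}),
        "WetGrass":  (["Sprinkler", "Rain"],
                      {(True, True): 0.99, (True, False): 0.90,
                       (False, True): 0.90, (False, False): 0.01}),
    }
    SPRINKLER_ORDER = ["Cloudy", "Sprinkler", "Rain", "WetGrass"]  # parents first

    def prior_sample(net, order):
        """Sample every variable in topological order, given its parents."""
        event = {}
        for var in order:
            parents, cpt = net[var]
            p_true = cpt[tuple(event[p] for p in parents)]
            event[var] = random.random() < p_true
        return event

    # One event from the prior, e.g. {'Cloudy': True, 'Sprinkler': False, ...}
    print(prior_sample(sprinkler_net, SPRINKLER_ORDER))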

Sampling from an empty network contd.

Probability that PriorSample generates a particular event:

    S_PS(x_1 ... x_n) = Π_{i=1..n} P(x_i | parents(X_i)) = P(x_1 ... x_n)

i.e., the true prior probability.

E.g., S_PS(t, f, t, t) = 0.5 × 0.9 × 0.8 × 0.9 = 0.324 = P(t, f, t, t)

Let N_PS(x_1 ... x_n) be the number of samples generated for event
x_1, ..., x_n. Then we have

    lim_{N→∞} P̂(x_1, ..., x_n) = lim_{N→∞} N_PS(x_1, ..., x_n)/N
                                 = S_PS(x_1, ..., x_n)
                                 = P(x_1 ... x_n)

That is, estimates derived from PriorSample are consistent.

Shorthand: P̂(x_1, ..., x_n) ≈ P(x_1 ... x_n)

Rejection sampling

P̂(X | e) estimated from samples agreeing with e.

    function Rejection-Sampling(X, e, bn, N) returns an estimate of P(X | e)
        local variables: N, a vector of counts over X, initially zero
        for j = 1 to N do
            x ← Prior-Sample(bn)
            if x is consistent with e then
                N[x] ← N[x] + 1 where x is the value of X in x
        return Normalize(N[X])

E.g., estimate P(Rain | Sprinkler = true) using 100 samples.
27 samples have Sprinkler = true.
Of these, 8 have Rain = true and 19 have Rain = false.

    P̂(Rain | Sprinkler = true) = Normalize(⟨8, 19⟩) = ⟨0.296, 0.704⟩

Similar to a basic real-world empirical estimation procedure.

Analysis of rejection sampling

    P̂(X | e) = α N_PS(X, e)           (algorithm defn.)
              = N_PS(X, e) / N_PS(e)   (normalized by N_PS(e))
              ≈ P(X, e) / P(e)         (property of PriorSample)
              = P(X | e)               (defn. of conditional probability)

Hence rejection sampling returns consistent posterior estimates.

Problem: hopelessly expensive if P(e) is small.
P(e) drops off exponentially with the number of evidence variables!
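A sketch of rejection sampling, reusing prior_sample, sprinkler_net, and
SPRINKLER_ORDER from the Prior-Sample sketch above:

    def rejection_sampling(net, order, x, evidence, n):
        """Estimate P(x | evidence) from prior samples consistent with evidence."""
        counts = {True: 0, False: 0}
        for _ in range(n):
            event = prior_sample(net, order)
            if all(event[var] == val for var, val in evidence.items()):
                counts[event[x]] += 1
        total = sum(counts.values())   # can be 0 when P(evidence) is tiny!
        return {v: c / total for v, c in counts.items()}

    # The slide's query; the exact answer is <0.3, 0.7>, so with enough
    # samples the estimate settles near 0.3.
    print(rejection_sampling(sprinkler_net, SPRINKLER_ORDER,
                             "Rain", {"Sprinkler": True}, 10_000))

The possible division by zero is the P(e)-is-small problem in miniature: when
few or no samples survive the consistency test, the estimate is useless.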

Likelihood weighting

Idea: fix evidence variables, sample only nonevidence variables,
and weight each sample by the likelihood it accords the evidence.

    function Likelihood-Weighting(X, e, bn, N) returns an estimate of P(X | e)
        local variables: W, a vector of weighted counts over X, initially zero
        for j = 1 to N do
            x, w ← Weighted-Sample(bn, e)
            W[x] ← W[x] + w where x is the value of X in x
        return Normalize(W[X])

    function Weighted-Sample(bn, e) returns an event and a weight
        x ← an event with n elements; w ← 1
        for i = 1 to n do
            if X_i has a value x_i in e
                then w ← w × P(X_i = x_i | parents(X_i))
                else x_i ← a random sample from P(X_i | parents(X_i))
        return x, w
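Both functions in Python, again reusing sprinkler_net, SPRINKLER_ORDER, and
the random import from the earlier sketches (a sketch under the same
Boolean-variable assumptions, not the slides' code):

    def weighted_sample(net, order, evidence):
        """Fix evidence variables, sample the rest, accumulate the weight."""
        event, w = {}, 1.0
        for var in order:
            parents, cpt = net[var]
            p_true = cpt[tuple(event[p] for p in parents)]
            if var in evidence:
                event[var] = evidence[var]
                w *= p_true if evidence[var] else 1.0 - p_true
            else:
                event[var] = random.random() < p_true
        return event, w

    def likelihood_weighting(net, order, x, evidence, n):
        """Estimate P(x | evidence) from weighted counts (never rejects)."""
        totals = {True: 0.0, False: 0.0}
        for _ in range(n):
            event, w = weighted_sample(net, order, evidence)
            totals[event[x]] += w
        z = sum(totals.values())
        return {v: t / z for v, t in totals.items()}

    # The example's query; exact enumeration gives ≈ <0.320, 0.680>.
    print(likelihood_weighting(sprinkler_net, SPRINKLER_ORDER, "Rain",
                               {"Sprinkler": True, "WetGrass": True}, 10_000))

Unlike rejection sampling, every sample contributes, which is why LW remains
usable when evidence-consistent events are rare.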

Likelihood weighting example

Query: P(Rain | Sprinkler = true, WetGrass = true) on the sprinkler network.
Successive slides trace one call of Weighted-Sample:
– start with w = 1.0
– sample Cloudy from ⟨0.5, 0.5⟩ → true; w unchanged
– Sprinkler is evidence: w ← w × P(Sprinkler = true | Cloudy = true) = 1.0 × 0.1
– sample Rain from P(Rain | Cloudy = true) = ⟨0.8, 0.2⟩ → true
– WetGrass is evidence:
  w ← w × P(WetGrass = true | Sprinkler = true, Rain = true)
    = 1.0 × 0.1 × 0.99 = 0.099

The sample (t, t, t, t) is therefore counted with weight 0.099.

Likelihood weighting analysis

Sampling probability for WeightedSample is

    S_WS(z, e) = Π_{i=1..l} P(z_i | parents(Z_i))

Note: it pays attention to evidence in ancestors only,
⇒ somewhere "in between" the prior and the posterior distribution.

Weight for a given sample z, e is

    w(z, e) = Π_{i=1..m} P(e_i | parents(E_i))

Weighted sampling probability is

    S_WS(z, e) w(z, e) = Π_{i=1..l} P(z_i | parents(Z_i)) Π_{i=1..m} P(e_i | parents(E_i))
                       = P(z, e)   (by the standard global semantics of the network)

Hence likelihood weighting returns consistent estimates,
but performance still degrades with many evidence variables,
because a few samples carry nearly all the total weight.

Summary

Exact inference by enumeration:
– NP-hard on general graphs

Approximate inference by likelihood weighting (LW):
– LW does poorly when there is lots of (downstream) evidence
– LW is generally insensitive to topology
– convergence can be very slow with probabilities close to 1 or 0
– can handle arbitrary combinations of discrete and continuous variables
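As a closing numeric check, the identity S_WS(z, e) · w(z, e) = P(z, e) from
the analysis above can be verified on the sample traced in the example
(Cloudy = t, Sprinkler = t, Rain = t, WetGrass = t, with Sprinkler and
WetGrass as evidence):

    # Sampled part vs. weight vs. full joint entry by the chain rule.
    s_ws = 0.5 * 0.8                  # P(c) * P(r | c)
    w    = 0.1 * 0.99                 # P(s | c) * P(w | s, r)
    p_ze = 0.5 * 0.1 * 0.8 * 0.99     # P(c, s, r, w)
    assert abs(s_ws * w - p_ze) < 1e-12
    print(s_ws * w)                   # 0.0396 on both sides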
