
Bayesian Belief Network: 14.4 Inference (Decision Theoretic Agents)



  1. Bayesian Belief Network: Inference (RN, Chapter 14.4)

  2. Decision Theoretic Agents
     - Introduction to Probability [Ch13]
     - Belief networks [Ch14]
       - Introduction [Ch14.1-14.2]
       - Bayesian Net Inference [Ch14.4] (Bucket Elimination)
     - Dynamic Belief Networks [Ch15]
     - Single Decision [Ch16]
     - Sequential Decisions [Ch17]
     - Game Theory [Ch17.6-17.7]

  3. Types of Reasoning
     Typical case: P( QueryVar | EvidenceVars = vals )
     Eg: P( +Burglary | +JohnCalls, ¬MaryCalls )
     - Diagnostic: from effect to (possible) causes
       P( +Burglary | +JohnCalls ) = 0.016
     - Causal: from cause to effects
       P( +JohnCalls | +Burglary ) = 0.86
     - InterCausal: between causes of a common effect
       P( +Burglary | +Alarm ) = 0.376
       P( +Burglary | +Alarm, +Earthquake ) = 0.003
       Earthquake EXPLAINS the alarm, and so Earthquake EXPLAINS AWAY burglary
     - Mixed: combinations of the above
       P( +Alarm | +JohnCalls, ¬Earthquake ) = 0.03

  4. Approaches to Belief Assessment
     - Exact, Guaranteed
       - PolyTree Algorithm
       - Inherent complexity . . .
       - Clustering Approach
       - Bucket Elimination
       - CutSet Approach
     - Approximate, Guaranteed
       - Algorithm Modification
       - Value Merging
       - Node Merging
       - Arc Removal
     - Approximate, Probabilistic
       - Logic Sampling
       - Likelihood Sampling

  5. Inherent Complexity
     Worst case (reduction from 3-CNF clauses, e.g. A∨B∨C, C∨D∨¬A, B∨C∨¬D):
     - NP-hard to get the exact answer (#P-complete)
     - NP-hard to get the answer within 0.5
     - Cannot get relative error within 2^(n^(1-ε)) unless P = NP
     - Cannot stochastically approximate 1 bit, unless P = RP
     Efficient algorithm . . .
     - for "PolyTree": poly time
       (≤ 1 path between any two nodes)
     - if CPtable "bounded": sub-exp time wrt λ = M/m,
       where M = largest CPtable entry, m = smallest

  6. Exact Inference: Re-arrange Sums
     P( A=a ) = ∑_b P( A=a, B=b )

     P( +b, +j, +m )
       = ∑_e ∑_a P( +b, E=e, A=a, +j, +m )
       = ∑_e ∑_a P(+b) P(e) P(a|+b,e) P(+j|a) P(+m|a)
       = P(+b) ∑_e P(e) ∑_a P(a|+b,e) P(+j|a) P(+m|a)
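As a concrete check, here is a minimal Python sketch (not from the slides) that evaluates the re-arranged sum with the burglary-network CPT values appearing in the factor tables below; the 0.94 entry is the complement of the 0.06 shown there.

```python
# CPTs of the burglary network (True/False for +/-).
P_B = {True: 0.001, False: 0.999}                 # P(B = b)
P_E = {True: 0.002, False: 0.998}                 # P(E = e)
P_A1 = {(True, True): 0.95, (True, False): 0.94,  # P(A = 1 | B, E)
        (False, True): 0.29, (False, False): 0.001}
P_J1 = {True: 0.90, False: 0.05}                  # P(J = 1 | A = a)
P_M1 = {True: 0.70, False: 0.01}                  # P(M = 1 | A = a)

def p_a(a, b, e):
    """P(A = a | B = b, E = e)."""
    return P_A1[(b, e)] if a else 1.0 - P_A1[(b, e)]

# P(+b, +j, +m) = P(+b) * sum_e P(e) * sum_a P(a|+b,e) P(+j|a) P(+m|a)
total = P_B[True] * sum(
    P_E[e] * sum(p_a(a, True, e) * P_J1[a] * P_M1[a]
                 for a in (True, False))
    for e in (True, False))
print(total)  # prints roughly 0.00059224
```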

  7. Still Duplicated Computation!
     P( +b, +j, +m ) = P(+b) ∑_e P(e) ∑_a P(a|+b,e) P(+j|a) P(+m|a)
     - Enumeration is inefficient: repeated computation
       e.g. it computes P(+j|a) P(+m|a) once for each value of E: { +e, -e }
     - Better to have a DAG of the computation: re-use COMMON SUBEXPRESSIONS!

  8. Bucket-Elimination: Set-up
     Network: A → B, A → C, {B, C} → D
     Given:
     - specific structure
     - specific CPtable entries:

         θ_{A=1} = 0.4   θ_{A=0} = 0.6

         a   θ_{B=1|A=a}   θ_{B=0|A=a}
         1   0.325         0.675
         0   0.440         0.550

         a   θ_{C=1|A=a}   θ_{C=0|A=a}
         1   0.200         0.800
         0   0.367         0.633

         b c   θ_{D=1|B=b,C=c}   θ_{D=0|B=b,C=c}
         1 1   0.300             0.700
         1 0   0.333             0.667
         0 1   0.250             0.750
         0 0   0.450             0.550

     - Fixed ordering over variables: π₀ = ⟨A, B, C, D⟩
     Create |Vars|+1 buckets: b_{}, b_A, b_B, b_C, b_D
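A small sketch of this set-up step in Python, under the assumption that each CPtable is tagged with its variable scope; the placement rule (each factor goes into the bucket of its highest-ranked variable under π₀) is the one the slides use, but the dict-based representation is an illustration, not the course's code.

```python
# Fixed ordering pi_0 = <A, B, C, D>.
order = ['A', 'B', 'C', 'D']
rank = {v: i for i, v in enumerate(order)}

# One (scope, name) pair per CPtable of the network; the numeric
# tables themselves don't matter for the set-up step.
cpts = [({'A'}, 'theta_A'),
        ({'A', 'B'}, 'theta_B|A'),
        ({'A', 'C'}, 'theta_C|A'),
        ({'B', 'C', 'D'}, 'theta_D|B,C')]

# |Vars| + 1 buckets: one per variable, plus the "empty" bucket b_{}.
buckets = {v: [] for v in order}
buckets['{}'] = []

for scope, name in cpts:
    if scope:   # into the bucket of the highest-ranked variable in scope
        buckets[max(scope, key=rank.get)].append(name)
    else:       # constant factors (possible after evidence) go to b_{}
        buckets['{}'].append(name)

print(buckets)
# {'A': ['theta_A'], 'B': ['theta_B|A'], 'C': ['theta_C|A'],
#  'D': ['theta_D|B,C'], '{}': []}
```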

  9. The CPtables as functions (query: -b, +j, +m)

     f_B(b) = λ⟨B⟩:          b=0: 0.999    b=1: 0.001
     f_E(e) = λ⟨E⟩:          e=0: 0.998    e=1: 0.002
     f_A(a,e,b) = λ⟨A,E,B⟩:  (1,1,1): 0.95   (1,1,0): 0.29   ...
                             (0,0,1): 0.06   (0,0,0): 0.999
     f_J(j,a) = λ⟨J,A⟩:      (1,1): 0.90   (1,0): 0.05
                             (0,1): 0.10   (0,0): 0.95
     f_M(m,a) = λ⟨M,A⟩:      (1,1): 0.70   (1,0): 0.01
                             (0,1): 0.30   (0,0): 0.99

 10. Same tables, viewed as functions of only the non-evidence variables
     (evidence: -b, +j, +m):

     f_{-b}()       = λ⟨⟩ . θ_{B=0}                  (the b=0 row of f_B)
     f_E(e)         = λ⟨E⟩ . θ_{E=e}
     f_{A,-b}(a,e)  = λ⟨A,E⟩ . θ_{A=a | E=e, B=0}    (the b=0 rows of f_A)
     f_{+j}(a)      = λ⟨A⟩ . θ_{J=1 | A=a}           (the j=1 rows of f_J)
     f_{+m}(a)      = λ⟨A⟩ . θ_{M=1 | A=a}           (the m=1 rows of f_M)

 11. The restricted tables, and their placement into buckets:

     f_{-b}() = 0.999
     f_E(e):          e=0: 0.998    e=1: 0.002
     f_{A,-b}(a,e):   (1,1): 0.29   ...   (0,0): 0.999
     f_{+j}(a):       a=1: 0.90     a=0: 0.05
     f_{+m}(a):       a=1: 0.70     a=0: 0.01

     Buckets:
       b_{}:  f_{{},1}()   = θ_{-b}
       b_B:   -
       b_E:   f_{E,1}(e)   = θ_e
       b_A:   f_{A,1}(a,e) = θ_{a|-b,e},  f_{A,2}(a) = θ_{+j|a},  f_{A,3}(a) = θ_{+m|a}
       b_J:   -
       b_M:   -
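A sketch of the restriction step these tables illustrate; the dict-from-assignment-tuples factor representation is an assumption for illustration. The four rows the slide elides from f_A are filled in as the complements of the rows it shows.

```python
# A factor is (scope tuple, dict from value tuples to numbers).
# f_A(a, e, b) = P(a | b, e); the slide shows four of the eight rows,
# the others are their complements.
f_A = (('A', 'E', 'B'),
       {(1, 1, 1): 0.95, (1, 1, 0): 0.29,
        (0, 1, 1): 0.05, (0, 1, 0): 0.71,
        (1, 0, 1): 0.94, (1, 0, 0): 0.001,
        (0, 0, 1): 0.06, (0, 0, 0): 0.999})

def restrict(factor, var, val):
    """Instantiate var = val: keep agreeing rows, drop the column."""
    scope, table = factor
    i = scope.index(var)
    return (scope[:i] + scope[i + 1:],
            {key[:i] + key[i + 1:]: p
             for key, p in table.items() if key[i] == val})

f_A_nb = restrict(f_A, 'B', 0)   # f_{A,-b}(a, e)
# (('A', 'E'), {(1, 1): 0.29, (0, 1): 0.71, (1, 0): 0.001, (0, 0): 0.999})
```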

 12. “Variable Elimination”: Factors
     P( -b, +j, +m ) = P(-b) ∑_e P(e) ∑_a P(a|-b,e) P(+j|a) P(+m|a)
     Network: B → A ← E,  A → J,  A → M
     - Store intermediate results (factors) to avoid recomputation
     - Factor for M: 2-element vector
     - Factor for J: 2-element vector
     - Factor for A: 4-element vector

 13. BE Alg, con’t
     Process buckets, from highest to lowest:
       g_X := elim_X [ f_{X,1} ⋈ f_{X,2} ⋈ … ⋈ f_{X,k} ]
       g_X is a function of ∪_i Vars(f_{X,i}) − {X}
     Let “Y” be the highest-index variable in g_X; store g_X into b_Y.
     Process b_A:
       g_A(e) = elim_A [ f_{A,1} ⋈ f_{A,2} ⋈ f_{A,3} ]
       Highest remaining index is E, so add it to b_E as
       f_{E,2}(e) = elim_A [ f_{A,1} ⋈ f_{A,2} ⋈ f_{A,3} ]
     Buckets now:
       b_{}:  f_{{},1}()   = θ_{-b}
       b_B:   -
       b_E:   f_{E,1}(e)   = θ_e,  f_{E,2}(e) = elim_A [ f_{A,1} ⋈ f_{A,2} ⋈ f_{A,3} ]
       b_A:   f_{A,1}(a,e) = θ_{a|-b,e},  f_{A,2}(a) = θ_{+j|a},  f_{A,3}(a) = θ_{+m|a}
       b_J:   -
       b_M:   -
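This step uses two primitives: ⋈ (pointwise product over the union of the two scopes) and elim (sum out one variable). A minimal sketch of both, for binary variables, in the same factor representation as above:

```python
from itertools import product

def join(f, g):
    """Pointwise product f ⋈ g over the union of the two scopes
    (binary variables assumed)."""
    (fs, ft), (gs, gt) = f, g
    scope = fs + tuple(v for v in gs if v not in fs)
    table = {}
    for vals in product((0, 1), repeat=len(scope)):
        asg = dict(zip(scope, vals))
        table[vals] = (ft[tuple(asg[v] for v in fs)] *
                       gt[tuple(asg[v] for v in gs)])
    return (scope, table)

def elim(factor, var):
    """Sum out var: g(rest) = sum over v of f(rest, var=v)."""
    scope, table = factor
    i = scope.index(var)
    out = {}
    for key, p in table.items():
        k = key[:i] + key[i + 1:]
        out[k] = out.get(k, 0.0) + p
    return (scope[:i] + scope[i + 1:], out)

# Processing b_A:  f_E2 = elim(join(join(f_A1, f_A2), f_A3), 'A')
```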

 14. BE Alg, con’t
     Process b_E:
       g_E() = elim_E [ f_{E,1} ⋈ f_{E,2} ]
       No variables remain, so store the result into b_{} as
       f_{{},2}() = elim_E [ f_{E,1} ⋈ f_{E,2} ]
     Buckets now:
       b_{}:  f_{{},1}()   = θ_{-b},  f_{{},2}() = elim_E [ f_{E,1} ⋈ f_{E,2} ]
       b_B:   -
       b_E:   f_{E,1}(e)   = θ_e,  f_{E,2}(e) = …
       b_A:   f_{A,1}(a,e) = θ_{a|-b,e},  f_{A,2}(a) = θ_{+j|a},  f_{A,3}(a) = θ_{+m|a}
       b_J:   -
       b_M:   -

 15. BE Alg, con’t
     Process b_{}:
       g_{}() = [ f_{{},1} ⋈ f_{{},2} ]
       (no variable left to eliminate)
     Return g_{} = f_{{},1} ⋈ f_{{},2} = P( -b, +j, +m )

 16. Bucket Elimination Algorithm
     Given:
     - Belief Net BN = ⟨N, A, C⟩ (nodes, arcs, CPtables)
     - Order of nodes π = ⟨X_1, …, X_|N|⟩
     - Evidence (nodes {E_i} ⊂ N, values {e_i})
     - (Single) query node X ∈ N
     Compute P(X | E_1 = e_1, …), by computing P(X = x, E_1 = e_1, …) ∀ x
     - Step 1: Initialize |N| + 1 “buckets”
       . . . bucket b_i for variable X_i
       Each “instantiated form of a CPtable” is a function of variables;
       store it in the bucket with the highest index
     - Step 2: Process each bucket
       . . . from highest index down, to eliminate the associated variable
     - Step 3: Read off the answer
       . . . in the “top” bucket, b_{}
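Putting the pieces together: a compact sketch of the whole procedure, reusing the restrict / join / elim helpers from the earlier snippets and assuming binary variables throughout. This is an illustration of the algorithm on the slide, not the course's reference implementation.

```python
def bucket_elimination(cpts, order, evidence):
    """Unnormalized P(X = x, E = e) for every x, by bucket elimination.

    cpts:     list of (scope, table) factors, one per CPtable
    order:    elimination order with the query variable FIRST,
              e.g. ('B', 'E', 'A', 'J', 'M') for query B
    evidence: e.g. {'J': 1, 'M': 1}
    """
    rank = {v: i for i, v in enumerate(order)}
    buckets = {v: [] for v in order}
    b_empty = []                              # the "top" bucket b_{}

    def place(f):
        if f[0]:                              # highest-ranked var in scope
            buckets[max(f[0], key=rank.get)].append(f)
        else:
            b_empty.append(f)

    # Step 1: instantiate evidence in every CPtable, then bucket it.
    for f in cpts:
        for var, val in evidence.items():
            if var in f[0]:
                f = restrict(f, var, val)
        place(f)

    # Step 2: process buckets from the highest index down to the query.
    for var in reversed(order[1:]):
        fs = buckets[var]
        if fs:
            g = fs[0]
            for f in fs[1:]:
                g = join(g, f)
            place(elim(g, var))               # lands in a lower bucket

    # Step 3: read off the answer from the query bucket and b_{}.
    g = buckets[order[0]][0]
    for f in buckets[order[0]][1:]:
        g = join(g, f)
    c = 1.0
    for _, table in b_empty:                  # constants multiply in
        c *= table[()]
    return {key[0]: c * p for key, p in g[1].items()}

# Hypothetical usage, with all five burglary CPtables encoded as
# factors in the same style as f_A above:
#   p = bucket_elimination([f_B, f_E, f_A, f_J, f_M],
#                          ('B', 'E', 'A', 'J', 'M'), {'J': 1, 'M': 1})
#   P(+b | +j, +m) = p[1] / (p[0] + p[1])    (about 0.284)
```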

 17. Remove “Dead Variables”
     P( +b, +j )
       = ∑_e ∑_a ∑_m P( +b, E=e, A=a, +j, M=m )
       = ∑_e ∑_a ∑_m P(+b) P(E=e) P(a|+b,e) P(+j|a) P(m|a)
       = P(+b) ∑_e P(e) ∑_a P(a|+b,e) P(+j|a) ∑_m P(m|a)
     - Note: for any A=a, ∑_m P( M=m | a ) = 1
       ⇒ can remove this node!
     - In general: need to keep only nodes ABOVE the query and evidence nodes
       (remove any nodes below)
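The ∑_m P(M=m|a) = 1 observation is easy to verify numerically with the elim helper from above: summing a barren leaf out of its CPtable leaves the all-ones factor, which can simply be dropped.

```python
# f_M(m, a) = P(M = m | A = a), as in the tables above.
f_M = (('M', 'A'),
       {(1, 1): 0.70, (1, 0): 0.01, (0, 1): 0.30, (0, 0): 0.99})

print(elim(f_M, 'M'))
# (('A',), {(1,): 1.0, (0,): 1.0})   constant 1: M was a dead variable
```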

 18. Approaches to Belief Assessment
     - Exact, Guaranteed
       - PolyTree Algorithm
       - Inherent complexity . . .
       - Clustering Approach
       - Bucket Elimination
       - CutSet Approach
     - Approximate, Guaranteed
       - Algorithm Modification
       - Value Merging
       - Node Merging
       - Arc Removal
     - Approximate, Probabilistic
       - Logic Sampling
       - Likelihood Sampling

 19. Logic Sampling
     What is P( WG = + )?
     - Get a data sample
     - Of 5 tuples, 2 have WG = +
       Set P( WG = + ) = 2/5
     But … how to generate the examples?
     - Uniformly?? No!
       E.g., what is P( +a, -b ) when P( +b | +a ) = 1.0 and P( +b | -a ) = 0.0?
     - Sample based on the distribution!

 20. Example of Logic Sampling
     - To get the value of “Cloudy”: flip a 0.5-coin
       Assume “Cloudy = True”
     - To get the value of “Sprinkler”: flip a 0.1-coin
       (as Cloudy = True, P( +s | +c ) = 0.10)
       Assume “Sprinkler = False”
     - To get the value of “Rain”: flip a 0.8-coin
       (as Cloudy = True, P( +r | +c ) = 0.8)
       Assume “Rain = True”
     - To get the value of “WetGrass”: flip a 0.9-coin
       (as Sprinkler = F, Rain = T, P( +w | ¬s, +r ) = 0.9)
       Assume “WetGrass = True”
     This trial yields the tuple ⟨+c, ¬s, +r, +w⟩.
     On other trials, get other results, as the coin-flips come out differently.
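A sketch of one such trial in Python. The four CPT entries quoted on the slide (0.5, 0.1, 0.8, 0.9) are used as given; the remaining rows are assumptions, filled in with the standard textbook sprinkler-network values.

```python
import random

P_c = 0.5                                         # P(+c)
P_s = {True: 0.10, False: 0.50}                   # P(+s | C)   (0.50 assumed)
P_r = {True: 0.80, False: 0.20}                   # P(+r | C)   (0.20 assumed)
P_w = {(True, True): 0.99, (True, False): 0.90,   # P(+w | S, R)
       (False, True): 0.90, (False, False): 0.00} # (0.99, 0.00 assumed)

def prior_sample():
    """One trial: sample each node given its already-sampled parents."""
    c = random.random() < P_c          # flip the 0.5-coin for Cloudy
    s = random.random() < P_s[c]       # e.g. the 0.1-coin when +c
    r = random.random() < P_r[c]       # e.g. the 0.8-coin when +c
    w = random.random() < P_w[(s, r)]  # e.g. the 0.9-coin for (-s, +r)
    return c, s, r, w

samples = [prior_sample() for _ in range(1000)]
print(sum(w for _, _, _, w in samples) / len(samples))  # estimate of P(WG = +)
```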

 21. Stochastic Approximation 1: Logic Sampling
     To estimate P( X | E = e ):
     - Produce random instances from the BN with PriorSample
     - Note: if an instance has E ≠ e, just ignore it
     - Estimate P( X | E = e ) from the instances that remain
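A sketch of the full estimator, reusing prior_sample from the previous snippet: draw instances, ignore those whose evidence variables disagree with e, and estimate from the rest. The query P(Rain | WetGrass = +) is a hypothetical example, not one from the slides.

```python
def estimate(n, query_idx, evidence):
    """Rejection ("logic") sampling, using prior_sample from above.

    evidence maps tuple positions to required values, e.g. {3: True} for +w.
    """
    kept = hits = 0
    for _ in range(n):
        tup = prior_sample()
        if any(tup[i] != v for i, v in evidence.items()):
            continue                       # E != e: just ignore the instance
        kept += 1
        hits += tup[query_idx]
    return hits / kept if kept else None   # None if everything was rejected

# Positions: 0 = Cloudy, 1 = Sprinkler, 2 = Rain, 3 = WetGrass.
print(estimate(10_000, 2, {3: True}))   # P(+r | +w), roughly 0.7
```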

 22. Aside: Flipping A Coin
     Consider flipping a (fair) coin m times
     … expect to observe ≈ 0.5·m heads
     - Could have a “bad run” ... suggesting the coin is not fair
     - How (un)likely is it to observe ≥ 55% heads? (10% more than expected)
     - ... as a function of m: what’s the probability of
       (1) m = 100:     ≥ 55 heads
       (2) m = 500:     ≥ 275 heads
       (3) m = 1,000:   ≥ 550 heads
       (4) m = 10,000:  ≥ 5,500 heads ?

 23. Using Chernoff Bounds
     X_i’s are iid … for now, with μ = 0.5
     Pr( S_m > 0.55 ) < e^(−2·m·(0.05)²)
     - m = 100     ⇒ < 0.6
     - m = 500     ⇒ < 0.08
     - m = 1,000   ⇒ < 0.007
     - m = 10,000  ⇒ < 10⁻²²
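The four numbers follow from the bound e^(−2·m·ε²) with ε = 0.05; a quick check:

```python
import math

for m in (100, 500, 1_000, 10_000):
    # Hoeffding/Chernoff: Pr(S_m > mu + eps) < exp(-2 * m * eps**2)
    print(m, math.exp(-2 * m * 0.05 ** 2))
# 100 -> 0.61, 500 -> 0.082, 1000 -> 0.0067, 10000 -> 1.9e-22
```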
