Inference in Bayesian Networks


  1. Inference in Bayesian Networks. CE417: Introduction to Artificial Intelligence, Sharif University of Technology, Spring 2019, Soleymani. Slides are based on Klein and Abbeel, CS188, UC Berkeley.

  2. Bayes' Nets
     - Representation
     - Conditional independences
     - Probabilistic inference
       - Enumeration (exact, exponential complexity)
       - Variable elimination (exact, worst-case exponential complexity, often better)
       - Probabilistic inference is NP-complete
       - Sampling (approximate)
     - Learning Bayes' Nets from data

  3. Recap: Bayes' Net Representation
     - A directed, acyclic graph, one node per random variable
     - A conditional probability table (CPT) for each node: a collection of
       distributions over X, one for each combination of parents' values
     - Bayes' nets implicitly encode joint distributions as a product of
       local conditional distributions
     - To see what probability a BN gives to a full assignment, multiply
       all the relevant conditionals together:
       P(x1, x2, ..., xn) = ∏_i P(xi | parents(Xi))

  4. Example: Alarm Network
     Burglary (B) and Earthquake (E) are parents of Alarm (A); JohnCalls (J)
     and MaryCalls (M) are children of A.

       B    P(B)          E    P(E)
       +b   0.001         +e   0.002
       -b   0.999         -e   0.998

       B    E    A    P(A|B,E)
       +b   +e   +a   0.95
       +b   +e   -a   0.05
       +b   -e   +a   0.94
       +b   -e   -a   0.06
       -b   +e   +a   0.29
       -b   +e   -a   0.71
       -b   -e   +a   0.001
       -b   -e   -a   0.999

       A    J    P(J|A)        A    M    P(M|A)
       +a   +j   0.9           +a   +m   0.7
       +a   -j   0.1           +a   -m   0.3
       -a   +j   0.05          -a   +m   0.01
       -a   -j   0.95          -a   -m   0.99

     [Demo: BN Applet]
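     As a check on the representation, multiplying the relevant conditionals
     gives the probability of one full assignment (a worked example using the
     CPTs on this slide):

       P(+b, -e, +a, +j, +m) = P(+b) P(-e) P(+a | +b, -e) P(+j | +a) P(+m | +a)
                             = 0.001 × 0.998 × 0.94 × 0.9 × 0.7 ≈ 5.9 × 10^-4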

  5. Video of Demo BN Applet

  6. Example: Alarm Network
     (The same network with its CPTs P(B), P(E), P(A|B,E), P(J|A), and
     P(M|A), as tabulated on slide 4, arranged around the graph.)


  8. Bayes' Nets
     - Representation
     - Conditional independences
     - Probabilistic inference
       - Enumeration (exact, exponential complexity)
       - Variable elimination (exact, worst-case exponential complexity, often better)
       - Inference is NP-complete
       - Sampling (approximate)
     - Learning Bayes' Nets from data

  9. Inference
     - Inference: calculating some useful quantity from a joint probability
       distribution
     - Examples:
       - Posterior probability: P(Q | E1 = e1, ..., Ek = ek)
       - Most likely explanation: argmax_q P(Q = q | E1 = e1, ...)

  10. Inference by Enumeration
      - General case:
        - Evidence variables:  E1 ... Ek = e1 ... ek
        - Query* variable:     Q
        - Hidden variables:    H1 ... Hr
        (together: all the variables)
      - We want: P(Q | e1 ... ek)
      - Step 1: Select the entries consistent with the evidence
      - Step 2: Sum out H to get the joint of the query and evidence
      - Step 3: Normalize: multiply by 1/Z, where Z = P(e1 ... ek)
      * Works fine with multiple query variables, too
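      A minimal Python sketch of these three steps, assuming the full joint is
      stored as a dict from assignment tuples to probabilities (the function
      and variable names are illustrative, not from the slides):

      ```python
      def enumerate_posterior(joint, var_names, query_var, evidence):
          """Compute P(query_var | evidence) by select, sum out, normalize."""
          totals = {}
          for assignment, p in joint.items():
              row = dict(zip(var_names, assignment))
              # Step 1: select only the entries consistent with the evidence.
              if any(row[v] != val for v, val in evidence.items()):
                  continue
              # Step 2: sum out the hidden variables by accumulating
              # probability mass per query value.
              totals[row[query_var]] = totals.get(row[query_var], 0.0) + p
          # Step 3: normalize by Z = P(evidence).
          z = sum(totals.values())
          return {q: p / z for q, p in totals.items()}

      # e.g. the weather joint used later in the Factor Zoo slides:
      joint = {('hot', 'sun'): 0.4, ('hot', 'rain'): 0.1,
               ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3}
      print(enumerate_posterior(joint, ('T', 'W'), 'W', {'T': 'cold'}))
      # {'sun': 0.4, 'rain': 0.6} -- matches P(W | T=cold) on slide 16
      ```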

  11. Inference by Enumeration in Bayes' Net
      - Given unlimited time, inference in BNs is easy
      - Reminder of inference by enumeration by example (network: B, E
        parents of A; J, M children of A):

        P(B | +j, +m) ∝ P(B, +j, +m)
                      = Σ_{e,a} P(B, e, a, +j, +m)
                      = Σ_{e,a} P(B) P(e) P(a | B, e) P(+j | a) P(+m | a)
                      =   P(B) P(+e) P(+a | B, +e) P(+j | +a) P(+m | +a)
                        + P(B) P(+e) P(-a | B, +e) P(+j | -a) P(+m | -a)
                        + P(B) P(-e) P(+a | B, -e) P(+j | +a) P(+m | +a)
                        + P(B) P(-e) P(-a | B, -e) P(+j | -a) P(+m | -a)
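      The same sum carried out numerically, using the CPTs from slide 4
      (a sketch of ours, with True/False standing in for +/-; the dict
      layout is an assumption, not CS188 code):

      ```python
      import itertools

      P_B = {True: 0.001, False: 0.999}
      P_E = {True: 0.002, False: 0.998}
      P_A = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}   # P(+a | B, E)
      P_J = {True: 0.9, False: 0.05}                        # P(+j | A)
      P_M = {True: 0.7, False: 0.01}                        # P(+m | A)

      def p_a(a, b, e):
          return P_A[(b, e)] if a else 1 - P_A[(b, e)]

      posterior = {}
      for b in (True, False):
          # Sum out the hidden variables E and A.
          posterior[b] = sum(P_B[b] * P_E[e] * p_a(a, b, e) * P_J[a] * P_M[a]
                             for e, a in itertools.product((True, False),
                                                           repeat=2))
      z = sum(posterior.values())
      print({b: p / z for b, p in posterior.items()})
      # P(+b | +j, +m) ≈ 0.284
      ```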

  12. Burglary example: full joint probability

        P(b | j, ¬m) = P(j, ¬m, b) / P(j, ¬m)
                     = Σ_e Σ_a P(j, ¬m, b, e, a)
                       / Σ_B Σ_e Σ_a P(j, ¬m, B, e, a)
                     = Σ_e Σ_a P(j | a) P(¬m | a) P(a | b, e) P(b) P(e)
                       / Σ_B Σ_e Σ_a P(j | a) P(¬m | a) P(a | B, e) P(B) P(e)

      Short-hands: j: JohnCalls = True; ¬b: Burglary = False; ...

  13. Inference by Enumeration?
      P(Antilock | observed variables) = ?

  14. Factor Zoo

  15. Factor Zoo I
      - Joint distribution: P(X, Y)
        - Entries P(x, y) for all x, y
        - Sums to 1

        T     W     P
        hot   sun   0.4
        hot   rain  0.1
        cold  sun   0.2
        cold  rain  0.3

      - Selected joint: P(x, Y)
        - A slice of the joint distribution
        - Entries P(x, y) for fixed x, all y
        - Sums to P(x)

        T     W     P
        cold  sun   0.2
        cold  rain  0.3

      - Number of capitals = dimensionality of the table

  16. Factor Zoo II
      - Single conditional: P(Y | x)
        - Entries P(y | x) for fixed x, all y
        - Sums to 1

        T     W     P
        cold  sun   0.4
        cold  rain  0.6

      - Family of conditionals: P(X | Y)
        - Multiple conditionals
        - Entries P(x | y) for all x, y
        - Sums to |Y|

        T     W     P
        hot   sun   0.8
        hot   rain  0.2
        cold  sun   0.4
        cold  rain  0.6

  17. Factor Zoo III
      - Specified family: P(y | X)
        - Entries P(y | x) for fixed y, but for all x
        - Sums to ... who knows!

        T     W     P
        hot   rain  0.2
        cold  rain  0.6

  18. Factor Zoo Summary
      - In general, when we write P(Y1 ... YN | X1 ... XM)
      - It is a "factor," a multi-dimensional array
      - Its values are P(y1 ... yN | x1 ... xM)
      - Any assigned (= lower-case) X or Y is a dimension missing (selected)
        from the array
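      To make the factor operations on the next slides concrete, here is one
      illustrative way to store a factor in Python: a tuple of variable names
      plus a dict from value assignments to numbers. The class and field
      names are our assumptions, not the slides' or CS188's code:

      ```python
      class Factor:
          def __init__(self, variables, table):
              self.variables = tuple(variables)   # e.g. ('T', 'W')
              self.table = dict(table)            # e.g. {('hot', 'sun'): 0.4, ...}

      # The joint P(T, W) from Factor Zoo I:
      joint = Factor(('T', 'W'), {('hot', 'sun'): 0.4, ('hot', 'rain'): 0.1,
                                  ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3})
      ```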

  19. Example: Traffic Domain
      - Random variables:
        - R: Raining
        - T: Traffic
        - L: Late for class!
      - The chain R -> T -> L has CPTs:

        R    P(R)        R    T    P(T|R)       T    L    P(L|T)
        +r   0.1         +r   +t   0.8          +t   +l   0.3
        -r   0.9         +r   -t   0.2          +t   -l   0.7
                         -r   +t   0.1          -t   +l   0.1
                         -r   -t   0.9          -t   -l   0.9

        P(L) = ?
             = Σ_{r,t} P(r, t, L)
             = Σ_{r,t} P(r) P(t | r) P(L | t)

  20. Inference by Enumeration: Procedural Outline
      - Track objects called factors
      - Initial factors are local CPTs (one per node):

        R    P(R)       R    T    P(T|R)       T    L    P(L|T)
        +r   0.1        +r   +t   0.8          +t   +l   0.3
        -r   0.9        +r   -t   0.2          +t   -l   0.7
                        -r   +t   0.1          -t   +l   0.1
                        -r   -t   0.9          -t   -l   0.9

      - Any known values are selected
      - E.g. if we know L = +l, the initial factors are:

        R    P(R)       R    T    P(T|R)       T    L    P(+l|T)
        +r   0.1        +r   +t   0.8          +t   +l   0.3
        -r   0.9        +r   -t   0.2          -t   +l   0.1
                        -r   +t   0.1
                        -r   -t   0.9

      - Procedure: Join all factors, then eliminate all hidden variables

  21. Operation 1: Join Factors
      - First basic operation: joining factors
      - Combining factors:
        - Just like a database join
        - Get all factors over the joining variable
        - Build a new factor over the union of the variables involved
      - Computation for each entry: pointwise products
      - Example: Join on R

        R    P(R)    x   R    T    P(T|R)   =   R    T    P(R,T)
        +r   0.1         +r   +t   0.8          +r   +t   0.08
        -r   0.9         +r   -t   0.2          +r   -t   0.02
                         -r   +t   0.1          -r   +t   0.09
                         -r   -t   0.9          -r   -t   0.81
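      A sketch of join for the Factor class from slide 18. The helper's
      signature, including the explicit domains argument mapping each
      variable to its values, is our assumption, not the slides':

      ```python
      import itertools

      def join(f1, f2, domains):
          """Pointwise product over the union of the two factors' variables."""
          variables = f1.variables + tuple(v for v in f2.variables
                                           if v not in f1.variables)
          table = {}
          for assignment in itertools.product(*(domains[v] for v in variables)):
              row = dict(zip(variables, assignment))
              # Look up each factor's entry for the matching sub-assignment.
              key1 = tuple(row[v] for v in f1.variables)
              key2 = tuple(row[v] for v in f2.variables)
              table[assignment] = f1.table[key1] * f2.table[key2]
          return Factor(variables, table)
      ```

      Joining P(R) with P(T|R) over domains {'R': ('+r', '-r'),
      'T': ('+t', '-t')} reproduces the P(R, T) table above.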

  22. Example: Multiple Joins

  23. Example: Multiple Joins
      Starting from P(R), P(T|R), P(L|T):
      - Join on R gives P(R, T) (the table from the previous slide);
        P(L|T) is unchanged.
      - Join on T then gives the full joint P(R, T, L):

        R    T    L    P(R,T,L)
        +r   +t   +l   0.024
        +r   +t   -l   0.056
        +r   -t   +l   0.002
        +r   -t   -l   0.018
        -r   +t   +l   0.027
        -r   +t   -l   0.063
        -r   -t   +l   0.081
        -r   -t   -l   0.729

  24. Operation 2: Eliminate
      - Second basic operation: marginalization
      - Take a factor and sum out a variable
        - Shrinks a factor to a smaller one
        - A projection operation
      - Example: summing R out of P(R, T) gives P(T):

        R    T    P(R,T)         T    P(T)
        +r   +t   0.08           +t   0.17
        +r   -t   0.02           -t   0.83
        -r   +t   0.09
        -r   -t   0.81
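      A sketch of elimination for the same Factor class (names again ours):
      sum a variable out, projecting onto the remaining variables.

      ```python
      def eliminate(f, var):
          """Marginalize var out of factor f, shrinking it by one dimension."""
          i = f.variables.index(var)
          remaining = f.variables[:i] + f.variables[i + 1:]
          table = {}
          for assignment, p in f.table.items():
              # Drop var's slot and accumulate mass on the reduced key.
              key = assignment[:i] + assignment[i + 1:]
              table[key] = table.get(key, 0.0) + p
          return Factor(remaining, table)
      ```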

  25. Multiple Elimination
      Sum out R, then sum out T:

        R    T    L    P(R,T,L)       T    L    P(T,L)       L    P(L)
        +r   +t   +l   0.024          +t   +l   0.051        +l   0.134
        +r   +t   -l   0.056          +t   -l   0.119        -l   0.866
        +r   -t   +l   0.002          -t   +l   0.083
        +r   -t   -l   0.018          -t   -l   0.747
        -r   +t   +l   0.027
        -r   +t   -l   0.063
        -r   -t   +l   0.081
        -r   -t   -l   0.729

  26. Thus Far: Multiple Join, Multiple Eliminate (= Inference by Enumeration)

  27. Inference by Enumeration vs. Variable Elimination
      - Why is inference by enumeration so slow?
        - You join up the whole joint distribution before you sum out the
          hidden variables
      - Idea: interleave joining and marginalizing!
        - Called "Variable Elimination"
        - Still NP-hard, but usually much faster than inference by
          enumeration
      - First we'll need some new notation: factors

  28. Traffic Domain
      P(L) = ?

      - Inference by Enumeration:
        P(L) = Σ_t Σ_r P(L | t) P(r) P(t | r)
        - Join on r, join on t, eliminate r, eliminate t

      - Variable Elimination:
        P(L) = Σ_t P(L | t) Σ_r P(r) P(t | r)
        - Join on r, eliminate r, join on t, eliminate t

  29. Marginalizing Early (= Variable Elimination)

  30. Marginalizing Early! (aka VE)
      Pipeline: Join R -> Sum out R -> Join T -> Sum out T
      (P(L|T) is carried along unchanged until the join on T.)

      - Join R: P(R) x P(T|R) -> P(R, T)

        R    T    P(R,T)
        +r   +t   0.08
        +r   -t   0.02
        -r   +t   0.09
        -r   -t   0.81

      - Sum out R:

        T    P(T)
        +t   0.17
        -t   0.83

      - Join T: P(T) x P(L|T) -> P(T, L)

        T    L    P(T,L)
        +t   +l   0.051
        +t   -l   0.119
        -t   +l   0.083
        -t   -l   0.747

      - Sum out T:

        L    P(L)
        +l   0.134
        -l   0.866
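      Chaining the join and eliminate sketches from slides 21 and 24
      reproduces this slide's numbers (assuming the Factor class and helpers
      defined above):

      ```python
      domains = {'R': ('+r', '-r'), 'T': ('+t', '-t'), 'L': ('+l', '-l')}
      p_r = Factor(('R',), {('+r',): 0.1, ('-r',): 0.9})
      p_t_r = Factor(('R', 'T'), {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
                                  ('-r', '+t'): 0.1, ('-r', '-t'): 0.9})
      p_l_t = Factor(('T', 'L'), {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
                                  ('-t', '+l'): 0.1, ('-t', '-l'): 0.9})

      p_t = eliminate(join(p_r, p_t_r, domains), 'R')   # P(T): +t 0.17, -t 0.83
      p_l = eliminate(join(p_t, p_l_t, domains), 'T')   # P(L): +l 0.134, -l 0.866
      print(p_l.table)
      ```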

  31. Evidence
      - If evidence, start with factors that select that evidence
      - No evidence uses these initial factors:

        R    P(R)       R    T    P(T|R)       T    L    P(L|T)
        +r   0.1        +r   +t   0.8          +t   +l   0.3
        -r   0.9        +r   -t   0.2          +t   -l   0.7
                        -r   +t   0.1          -t   +l   0.1
                        -r   -t   0.9          -t   -l   0.9

      - Computing P(L | +r), the initial factors become:

        R    P(R)       R    T    P(T|R)       T    L    P(L|T)
        +r   0.1        +r   +t   0.8          +t   +l   0.3
                        +r   -t   0.2          +t   -l   0.7
                                               -t   +l   0.1
                                               -t   -l   0.9

      - We eliminate all vars other than query + evidence
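      A sketch of evidence selection for the same Factor class (again our
      illustrative code, not the slides'): keep only the rows consistent
      with the observed value, then run variable elimination as before.

      ```python
      def select(f, var, value):
          """Restrict factor f to rows where var == value (if f mentions var)."""
          if var not in f.variables:
              return f
          i = f.variables.index(var)
          table = {a: p for a, p in f.table.items() if a[i] == value}
          return Factor(f.variables, table)

      # For P(L | +r), using the traffic factors from slide 30:
      factors = [select(f, 'R', '+r') for f in (p_r, p_t_r, p_l_t)]
      domains['R'] = ('+r',)   # the evidence shrinks R's domain for later joins
      p_t = eliminate(join(factors[0], factors[1], domains), 'R')
      p_l = eliminate(join(p_t, factors[2], domains), 'T')
      # p_l.table holds the unnormalized P(+r, L); normalize to get P(L | +r).
      ```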
