15-381/781 Bayesian Nets & Probabilistic Inference


  1. 15-381/781 Bayesian Nets & Probabilistic Inference. Emma Brunskill (this time), Ariel Procaccia. With thanks to Dan Klein (Berkeley), Percy Liang (Stanford), and past 15-381 instructors for some slide content, and to Russell & Norvig.

  2. What You Should Know
  • Define probabilistic inference
  • How to define a Bayes Net given a real example
  • How the joint can be used to answer any query
  • Complexity of exact inference
  • Approximate inference (direct sampling, likelihood weighting, Gibbs)
    o Be able to implement and run each algorithm
    o Compare the benefits and limitations of each

  3. Bayesian Network
  • Compact representation of the joint distribution
  • Conditional independence relationships are explicit
    o Each variable is conditionally independent of all its non-descendants in the graph given the value of its parents

  4. Joint Distribution Example
  • Variables: Cloudy, Sprinkler, Rain, WetGrass
  • Domain of each variable: 2 values (true or false)
  • The joint encodes the probability of all combinations of variables & values, e.g. P(Cloudy=false & Sprinkler=true & Rain=false & WetGrass=true)
  • Joint table (one row per assignment; only the first four entries are given on the slide, the remaining 12 are shown as #):
    +c +s +r +w  .01
    +c +s +r -w  .01
    +c +s -r +w  .05
    +c +s -r -w  .10
    +c -s .. ..   #
    ...
    -c -s -r -w   #

  5. Joint as Product of Conditionals (Chain Rule)
  P(Cloudy, Sprinkler, Rain, WetGrass)
    = P(WetGrass | Cloudy, Sprinkler, Rain) * P(Rain | Cloudy, Sprinkler) * P(Sprinkler | Cloudy) * P(Cloudy)
  (each entry of the joint table on the previous slide factors this way)
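
  For reference, this is the general chain rule specialized to the four variables above (the general statement is added here, it is not on the slide):
    P(X_1, ..., X_n) = Π_{i=1..n} P(X_i | X_1, ..., X_{i-1})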

  6. Joint as Product of Conditionals
  P(Cloudy, Sprinkler, Rain, WetGrass)
    = P(WetGrass | Cloudy, Sprinkler, Rain) * P(Rain | Cloudy, Sprinkler) * P(Sprinkler | Cloudy) * P(Cloudy)
  (figure: a graph over Cloudy, Sprinkler, Rain, WetGrass with an edge for every conditioning variable in the factorization above)
  … but there may be additional conditional independencies

  7. What if some variables are conditionally independent?
  (figure: two graphs over Cloudy, Sprinkler, Rain, WetGrass - the full chain-rule graph and the sparser Bayes net with edges Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → WetGrass, Rain → WetGrass)
  The Bayes net structure explicitly shows any conditional independencies

  8. Conditional Independencies
  Bayes net: Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → WetGrass, Rain → WetGrass, with CPTs:
  P(Cloudy):             +c .5    -c .5
  P(Sprinkler | Cloudy): +c +s .1   +c -s .9   -c +s .5   -c -s .5
  P(Rain | Cloudy):      +c +r .8   +c -r .2   -c +r .2   -c -r .8
  P(WetGrass | Sprinkler, Rain):
    +s +r +w .99   +s +r -w .01
    +s -r +w .90   +s -r -w .10
    -s +r +w .90   -s +r -w .10
    -s -r +w 0     -s -r -w 1.0
  These CPTs replace the full 16-row joint table.
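
  Since these CPTs fully determine the joint, any "#" entry in the slide-4 table can be filled in by multiplying the four factors. A minimal Python sketch (the code and names are my own illustration, not from the course):

  ```python
  # The sprinkler network's CPTs as dictionaries, and the factored joint
  # P(c, s, r, w) = P(c) * P(s|c) * P(r|c) * P(w|s,r).
  P_C = {'+c': 0.5, '-c': 0.5}
  P_S_given_C = {('+c', '+s'): 0.1, ('+c', '-s'): 0.9,
                 ('-c', '+s'): 0.5, ('-c', '-s'): 0.5}
  P_R_given_C = {('+c', '+r'): 0.8, ('+c', '-r'): 0.2,
                 ('-c', '+r'): 0.2, ('-c', '-r'): 0.8}
  P_W_given_SR = {('+s', '+r', '+w'): 0.99, ('+s', '+r', '-w'): 0.01,
                  ('+s', '-r', '+w'): 0.90, ('+s', '-r', '-w'): 0.10,
                  ('-s', '+r', '+w'): 0.90, ('-s', '+r', '-w'): 0.10,
                  ('-s', '-r', '+w'): 0.00, ('-s', '-r', '-w'): 1.00}

  def joint(c, s, r, w):
      """One entry of the joint, assembled from the CPTs via the factorization."""
      return P_C[c] * P_S_given_C[(c, s)] * P_R_given_C[(c, r)] * P_W_given_SR[(s, r, w)]

  # Fills in one of the '#' entries from the joint table on slide 4:
  print(joint('-c', '+s', '-r', '+w'))  # 0.5 * 0.5 * 0.8 * 0.9 = 0.18
  ```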

  9. Bayesian Network
  • Compact representation of the joint distribution
  • Conditional independence relationships explicit
  • Still represents the joint, so it can be used to answer any probabilistic query

  10. Probabilistic Inference
  • Compute the probability of a query variable (or variables) taking on a value (or set of values) given some evidence
  • Pr[Q | E_1 = e_1, ..., E_k = e_k]

  11. Using the Joint To Answer Queries
  • The joint distribution is sufficient to answer any probabilistic inference question involving the variables it describes
  • Can take a Bayes Net, construct the full joint, and then look up the entries where the evidence variables take on the specified values
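
  As a concrete illustration of looking up and normalizing joint entries, here is a hedged sketch (the helper name `query_from_joint` and the tiny two-variable joint are mine, not from the slides; its numbers are made up purely to show the mechanics):

  ```python
  # Given the full joint as a dict mapping assignments to probabilities, answer
  # P(query_var | evidence) by summing matching entries and normalizing.
  def query_from_joint(joint, query_var, evidence):
      dist = {}
      for assignment, p in joint.items():
          a = dict(assignment)
          if all(a[var] == val for var, val in evidence.items()):
              dist[a[query_var]] = dist.get(a[query_var], 0.0) + p
      total = sum(dist.values())
      return {val: p / total for val, p in dist.items()}

  # Tiny two-variable joint (hypothetical numbers, not from the lecture):
  joint = {
      (('Rain', '+r'), ('WetGrass', '+w')): 0.20,
      (('Rain', '+r'), ('WetGrass', '-w')): 0.05,
      (('Rain', '-r'), ('WetGrass', '+w')): 0.15,
      (('Rain', '-r'), ('WetGrass', '-w')): 0.60,
  }
  print(query_from_joint(joint, 'Rain', {'WetGrass': '+w'}))
  # {'+r': 0.571..., '-r': 0.428...}
  ```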

  12. But Constructing the Joint is Expensive & Exact Inference is NP-Hard

  13. Solution: Approximate Inference
  • Use samples to approximate the posterior distribution Pr[Q | E_1 = e_1, ..., E_k = e_k]
  • Last time
    o Direct sampling
    o Likelihood weighting
  • Today
    o Gibbs sampling

  14. Poll: Which Algorithm?
  • Evidence: Cloudy=+c, Rain=+r
  • Query variable: Sprinkler
  • P(Sprinkler | Cloudy=+c, Rain=+r)
  • Samples
    o +c,+s,+r,-w
    o +c,-s,-r,-w
    o +c,+s,-r,+w
    o +c,-s,+r,-w
  • What algorithm could've generated these samples?
    1) Direct sampling  2) Likelihood weighting  3) Both  4) No clue
  (figure: the Cloudy → Sprinkler/Rain → WetGrass network)

  15. Direct Sampling Recap
  Algorithm:
  1. Create a topological order of the variables in the Bayes Net

  16. Topological Order
  • Any ordering of a directed acyclic graph in which a node can only appear after all of its ancestors in the graph
  • E.g., for the sprinkler network:
    o Cloudy, Sprinkler, Rain, WetGrass
    o Cloudy, Rain, Sprinkler, WetGrass
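
  A short sketch of producing such an ordering (my own illustration using Kahn's algorithm; the function name is hypothetical):

  ```python
  # Topological order of a Bayes net given as {node: list_of_parents}.
  from collections import deque

  def topological_order(parents):
      children = {v: [] for v in parents}
      indegree = {v: len(ps) for v, ps in parents.items()}
      for v, ps in parents.items():
          for p in ps:
              children[p].append(v)
      ready = deque(v for v, d in indegree.items() if d == 0)  # nodes with no parents
      order = []
      while ready:
          v = ready.popleft()
          order.append(v)
          for c in children[v]:
              indegree[c] -= 1
              if indegree[c] == 0:
                  ready.append(c)
      return order

  sprinkler_net = {'Cloudy': [], 'Sprinkler': ['Cloudy'],
                   'Rain': ['Cloudy'], 'WetGrass': ['Sprinkler', 'Rain']}
  print(topological_order(sprinkler_net))  # e.g. ['Cloudy', 'Sprinkler', 'Rain', 'WetGrass']
  ```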

  17. Direct Sampling Recap
  Algorithm:
  1. Create a topological order of the variables in the Bayes Net
  2. Sample each variable conditioned on the values of its parents
  3. Use samples that match the evidence variable values to estimate the probability of the query variable, e.g.
     P(Sprinkler=+s | Cloudy=+c, Rain=+r) ≈ (# samples with +s, +c, +r) / (# samples with +c, +r)
  • Consistent in the limit of infinite samples
  • Inefficient (why?)
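
  A minimal sketch of this on the sprinkler network (my own code, not the course's; function names are hypothetical). It also shows why the method is inefficient: samples inconsistent with the evidence are simply thrown away.

  ```python
  # Direct sampling with rejection: sample each variable given its parents in
  # topological order, keep only samples consistent with the evidence, and count.
  import random

  CPTS = {  # P(var = '+' | values of its parents in the partial sample s)
      'Cloudy':    lambda s: 0.5,
      'Sprinkler': lambda s: 0.1 if s['Cloudy'] == '+' else 0.5,
      'Rain':      lambda s: 0.8 if s['Cloudy'] == '+' else 0.2,
      'WetGrass':  lambda s: {'++': 0.99, '+-': 0.90, '-+': 0.90, '--': 0.0}[s['Sprinkler'] + s['Rain']],
  }
  ORDER = ['Cloudy', 'Sprinkler', 'Rain', 'WetGrass']  # a topological order

  def direct_sample():
      s = {}
      for var in ORDER:
          s[var] = '+' if random.random() < CPTS[var](s) else '-'
      return s

  def rejection_query(query_var, evidence, n=100_000):
      counts = {'+': 0, '-': 0}
      for _ in range(n):
          s = direct_sample()
          if all(s[var] == val for var, val in evidence.items()):
              counts[s[query_var]] += 1
      total = counts['+'] + counts['-']
      return counts['+'] / total if total else None

  # Estimate P(Sprinkler=+s | Cloudy=+c, Rain=+r); most samples are discarded.
  print(rejection_query('Sprinkler', {'Cloudy': '+', 'Rain': '+'}))
  ```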

  18. Consistency
  • In the limit of infinite samples, the estimated Pr[Q | E_1 = e_1, ..., E_k = e_k] will converge to the true posterior probability
  • Desirable property (otherwise the estimate always has some error)

  19. Likelihood Weighting Recap
  1. Create an array TotalWeights
     1. Initialize each array element to 0
  2. For j = 1:N
     1. w_tmp = 1
     2. Set the evidence variables in sample z = <z_1, ..., z_n> to their observed values
     3. For each variable z_i in topological order
        1. If z_i is an evidence variable
           1. w_tmp = w_tmp * P(Z_i = e_i | Parents(Z_i) = z(Parents(Z_i)))
        2. Else
           1. Sample z_i conditioned on the values of its parents
     4. Update the weight of the resulting sample
        1. TotalWeights[z] = TotalWeights[z] + w_tmp
  3. Use the weights to compute the probability of the query variable, e.g.
     P(Sprinkler=+s | Cloudy=+c, Rain=+r) ≈ Σ_{c,r,w} TotalWeights(+s,c,r,w) / Σ_{s,c,r,w} TotalWeights(s,c,r,w)
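
  The same network with likelihood weighting, as a hedged sketch (again my own code, not the course's reference implementation): evidence variables are clamped and folded into the sample weight rather than rejected.

  ```python
  import random
  from collections import defaultdict

  CPTS = {  # P(var = '+' | values of its parents in the partial sample s)
      'Cloudy':    lambda s: 0.5,
      'Sprinkler': lambda s: 0.1 if s['Cloudy'] == '+' else 0.5,
      'Rain':      lambda s: 0.8 if s['Cloudy'] == '+' else 0.2,
      'WetGrass':  lambda s: {'++': 0.99, '+-': 0.90, '-+': 0.90, '--': 0.0}[s['Sprinkler'] + s['Rain']],
  }
  ORDER = ['Cloudy', 'Sprinkler', 'Rain', 'WetGrass']

  def likelihood_weighting(query_var, evidence, n=100_000):
      total_weights = defaultdict(float)  # query value -> accumulated weight
      for _ in range(n):
          s, w = {}, 1.0
          for var in ORDER:
              p_true = CPTS[var](s)
              if var in evidence:
                  s[var] = evidence[var]                        # clamp to observed value
                  w *= p_true if s[var] == '+' else 1 - p_true  # multiply in its likelihood
              else:
                  s[var] = '+' if random.random() < p_true else '-'
          total_weights[s[query_var]] += w
      z = sum(total_weights.values())
      return {val: wt / z for val, wt in total_weights.items()}

  # Every sample is used (no rejection):
  print(likelihood_weighting('Sprinkler', {'Cloudy': '+', 'Rain': '+'}))
  ```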

  20. LW Consistency
  • Probability of getting a sample (z, e), where z is a set of values for the non-evidence variables and e is the set of values of the evidence variables:
    S_WS(z, e) = Π_i P(z_i | parents(Z_i))   (the sampling distribution for a weighted sample, WS)
  • Is this the true posterior distribution P(z | e)? No. Why?
    o It doesn't consider evidence that is not an ancestor of the sampled variable
    o The weights fix this!

  21. Weighted Probability
  • Samples each non-evidence variable z_i according to
    S_WS(z, e) = Π_i P(z_i | parents(Z_i))
  • Weight of a sample is
    w(z, e) = Π_j P(e_j | parents(E_j))
  • Weighted probability of a sample is
    S_WS(z, e) · w(z, e) = Π_i P(z_i | parents(Z_i)) · Π_j P(e_j | parents(E_j)) = P(z, e)
    (from the chain rule & conditional independence)

  22. Does Likelihood Weighting Produce Consistent Estimates? Yes
  Notation: X is the query var(s), E the evidence var(s), Y the remaining (non-query, non-evidence) vars
  P(X = x | e) = P(X = x, e) / P(e) ∝ P(X = x, e)
  The likelihood-weighting estimate is
    P(X = x | e) ∝ Σ_y N_WS(x, y, e) · w(x, y, e)
      where N_WS(x, y, e) = # of samples with query variables = x, non-query = y, evidence = e
    ≈ Σ_y n · S_WS(x, y, e) · w(x, y, e)   (as the # of samples n → infinity)
    = n · Σ_y P(x, y, e)
    = n · P(x, e)  ∝ P(X = x, e)

  23. Example
  • When sampling S and R, the evidence W=t is ignored
    o We get samples with S=f and R=f even though the evidence rules this out
  • The weight makes up for this difference
    o The weight of such a sample would be 0 (since P(W=t | S=f, R=f) = 0)
  • If we have 100 samples with R=t and total weight 1, and 400 samples with R=f and total weight 2, what is the estimate of P(R=t)?
    o 1 / (1 + 2) = 1/3

  24. Limitations of Likelihood Weighting
  • Poor performance if evidence variables occur late in the ordering
  • Why? The evidence is not being used to influence the samples!
  • Yields samples with low weights

  25. Markov Chain Monte Carlo Methods
  • Prior methods generate each new sample from scratch
  • MCMC generates each new sample by making a random change to the preceding sample
  • Can view the algorithm as being in a particular state (an assignment of values to each variable)

  26. Review: Markov Blanket
  • The Markov blanket of a variable:
    o Parents
    o Children
    o Children's parents
  • A variable is conditionally independent of all other nodes given its Markov blanket
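
  A small sketch of reading the Markov blanket off the graph structure (illustrative code, not from the slides):

  ```python
  # Markov blanket = parents + children + children's other parents,
  # for a net given as {node: list_of_parents}.
  def markov_blanket(parents, node):
      children = [v for v, ps in parents.items() if node in ps]
      coparents = {p for c in children for p in parents[c]}
      return (set(parents[node]) | set(children) | coparents) - {node}

  sprinkler_net = {'Cloudy': [], 'Sprinkler': ['Cloudy'],
                   'Rain': ['Cloudy'], 'WetGrass': ['Sprinkler', 'Rain']}
  print(markov_blanket(sprinkler_net, 'Cloudy'))  # {'Sprinkler', 'Rain'}
  print(markov_blanket(sprinkler_net, 'Rain'))    # {'Cloudy', 'WetGrass', 'Sprinkler'}
  ```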

  27. Gibbs Sampling: Compute P(X | e)
  (figure: the Gibbs sampling algorithm from Russell & Norvig, where mb(Z_i) denotes the Markov blanket of Z_i)
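
  A hedged sketch of Gibbs sampling on the sprinkler network (my own code, not the Russell & Norvig pseudocode verbatim): each non-evidence variable is repeatedly resampled from its distribution conditioned on its Markov blanket, and the visited states are counted.

  ```python
  # P(var | Markov blanket) ∝ P(var | parents) * Π over children of P(child | its parents).
  import random
  from collections import defaultdict

  PARENTS = {'Cloudy': [], 'Sprinkler': ['Cloudy'], 'Rain': ['Cloudy'],
             'WetGrass': ['Sprinkler', 'Rain']}
  CPTS = {
      'Cloudy':    lambda s: 0.5,
      'Sprinkler': lambda s: 0.1 if s['Cloudy'] == '+' else 0.5,
      'Rain':      lambda s: 0.8 if s['Cloudy'] == '+' else 0.2,
      'WetGrass':  lambda s: {'++': 0.99, '+-': 0.90, '-+': 0.90, '--': 0.0}[s['Sprinkler'] + s['Rain']],
  }

  def prob(var, s):
      """P(var = s[var] | its parents), read off the CPT for the current state s."""
      p_true = CPTS[var](s)
      return p_true if s[var] == '+' else 1 - p_true

  def gibbs(query_var, evidence, n=100_000):
      non_evidence = [v for v in PARENTS if v not in evidence]
      state = dict(evidence)
      for v in non_evidence:                       # random initial assignment
          state[v] = random.choice(['+', '-'])
      children = {v: [c for c, ps in PARENTS.items() if v in ps] for v in PARENTS}
      counts = defaultdict(int)
      for _ in range(n):
          for v in non_evidence:
              # unnormalized P(v = val | Markov blanket of v)
              weights = {}
              for val in ['+', '-']:
                  state[v] = val
                  weights[val] = prob(v, state)
                  for c in children[v]:
                      weights[val] *= prob(c, state)
              z = weights['+'] + weights['-']
              state[v] = '+' if random.random() < weights['+'] / z else '-'
          counts[state[query_var]] += 1
      total = sum(counts.values())
      return {val: c / total for val, c in counts.items()}

  print(gibbs('Rain', {'Sprinkler': '+', 'WetGrass': '+'}))
  ```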

  28. Gibbs Sampling Example
  • Want P(R | S=t, W=t)
  • Non-evidence variables are C & R
  • Initialize them randomly: C=t and R=f
  • Initial state (C, S, R, W) = [t, t, f, t]
  • Sample C given the current values of its Markov blanket
  (CPTs as on slide 8: P(C), P(S|C), P(R|C), P(W|S,R))
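
  For this first resampling step, C's Markov blanket is {Sprinkler, Rain}, so (a worked step filled in here using the slide-8 CPTs; it is not spelled out on the slide) P(C | +s, -r) ∝ P(C) · P(+s | C) · P(-r | C):
    +c: 0.5 · 0.1 · 0.2 = 0.01
    -c: 0.5 · 0.5 · 0.8 = 0.20
  so C is resampled to +c with probability 0.01 / (0.01 + 0.20) ≈ 0.048.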
