CS 4100: Artificial Intelligence
Bayes' Nets: Sampling
Jan-Willem van de Meent, Northeastern University
[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]


  1. Bayes' Net Representation
     • A directed, acyclic graph, one node per random variable
     • A conditional probability table (CPT) for each node
       • A collection of distributions over X, one for each possible assignment to the parent variables
     • Bayes' nets implicitly encode joint distributions
       • As a product of local conditional distributions
       • To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together: P(x_1, x_2, …, x_n) = ∏_i P(x_i | Parents(X_i))
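     As a minimal Python sketch of that product rule: the tiny two-node net A → B and its CPT values below are hypothetical (not from the slides), chosen only to show "multiply the relevant conditionals".

     ```python
     # Hypothetical CPTs for a tiny two-node net A -> B (illustrative values only).
     P_A = {'+a': 0.6, '-a': 0.4}
     P_B_given_A = {('+a', '+b'): 0.9, ('+a', '-b'): 0.1,
                    ('-a', '+b'): 0.2, ('-a', '-b'): 0.8}

     def joint(a, b):
         # P(a, b) = P(a) * P(b | a): the product of the local conditionals.
         return P_A[a] * P_B_given_A[(a, b)]

     print(joint('+a', '-b'))  # 0.6 * 0.1 = 0.06
     ```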

  2. Variable Elimination
     • Interleave joining and marginalizing
     • d^k entries computed for a factor over k variables with domain sizes d
     • Ordering of elimination of hidden variables can affect the size of the factors generated
     • Worst case: running time exponential in the size of the Bayes' net

     Approximate Inference: Sampling

  3. Sampling
     • Sampling is a lot like repeated simulation
     • Why sample?
       • Predicting the weather, basketball games, …
       • Reinforcement Learning: can approximate (q-)values even when you don't know the transition function
       • Inference: getting a sample is faster than computing the right answer (e.g. with variable elimination)
     • Basic idea
       • Draw N samples from a sampling distribution S
       • Compute an approximate posterior probability
       • Show this converges to the true probability P

     Sampling: Example
     • Sampling from a given distribution P(C):
         C      P(C)
         red    0.6
         green  0.1
         blue   0.3
     • Step 1: Get a sample u from the uniform distribution over [0, 1), e.g. random() in Python
     • Step 2: Convert this sample u into an outcome for the given distribution by associating each target outcome with a sub-interval of [0, 1) whose size equals the probability of that outcome (here the cut points are 0, 0.6, 0.7, 1.0)
     • If random() returns u = 0.83, then our sample is C = blue
     • E.g., after sampling 8 times:
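     A minimal sketch of this two-step procedure in Python, using the P(C) table from the slide (the function name is illustrative, not from the slides):

     ```python
     import random

     # The P(C) table from the slide: red 0.6, green 0.1, blue 0.3.
     P_C = [('red', 0.6), ('green', 0.1), ('blue', 0.3)]

     def sample_from(dist):
         # Step 1: draw u from the uniform distribution over [0, 1).
         u = random.random()
         # Step 2: walk the cumulative sub-intervals [0, 0.6), [0.6, 0.7), [0.7, 1.0)
         # and return the outcome whose sub-interval contains u.
         cumulative = 0.0
         for outcome, p in dist:
             cumulative += p
             if u < cumulative:
                 return outcome
         return dist[-1][0]  # guard against floating-point round-off

     # If u happened to be 0.83 it would fall in [0.7, 1.0), giving C = blue.
     print(sample_from(P_C))
     ```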

  4. Sampling in Bayes' Nets
     • Prior Sampling
     • Rejection Sampling
     • Likelihood Weighting
     • Gibbs Sampling

     Prior Sampling

  5. Prior Sampling
     Network: Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → WetGrass, Rain → WetGrass

     P(C):        +c 0.5 | -c 0.5
     P(S | C):    +c: +s 0.1, -s 0.9 | -c: +s 0.5, -s 0.5
     P(R | C):    +c: +r 0.8, -r 0.2 | -c: +r 0.2, -r 0.8
     P(W | S, R): +s, +r: +w 0.99, -w 0.01 | +s, -r: +w 0.90, -w 0.10
                  -s, +r: +w 0.90, -w 0.10 | -s, -r: +w 0.01, -w 0.99

     Samples: +c, -s, +r, +w; -c, +s, -r, +w; …

     Prior Sampling (algorithm)
     • For i = 1, 2, …, n
       • Sample x_i from P(X_i | Parents(X_i))
     • Return (x_1, x_2, …, x_n)
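     A minimal Python sketch of prior sampling for this network, assuming the CPT values shown above (the helper names flip and prior_sample are illustrative):

     ```python
     import random

     def flip(p):
         # Return True (the '+' outcome) with probability p.
         return random.random() < p

     def prior_sample():
         # Sample every variable in topological order from P(X_i | Parents(X_i)),
         # using the CPT values from the slide.
         c = flip(0.5)                  # P(+c) = 0.5
         s = flip(0.1 if c else 0.5)    # P(+s | c)
         r = flip(0.8 if c else 0.2)    # P(+r | c)
         if s and r:
             p_w = 0.99                 # P(+w | +s, +r)
         elif s or r:
             p_w = 0.90                 # P(+w | +s, -r) = P(+w | -s, +r)
         else:
             p_w = 0.01                 # P(+w | -s, -r)
         w = flip(p_w)
         return c, s, r, w

     print(prior_sample())  # e.g. (True, False, True, True), i.e. +c, -s, +r, +w
     ```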

  6. Prior Sampling
     • This process generates samples with probability
         S_PS(x_1, …, x_n) = ∏_i P(x_i | Parents(X_i)) = P(x_1, …, x_n)
       i.e., the BN's joint probability
     • Let N_PS(x_1, …, x_n) be the number of samples of an event. Then
         lim_{N→∞} N_PS(x_1, …, x_n) / N = P(x_1, …, x_n)
     • i.e., the sampling procedure is consistent* (*different from consistent heuristic, or arc consistency)

     Example
     • We'll draw a batch of samples from the BN (C → S, C → R, S → W, R → W):
         +c, -s, +r, +w
         +c, +s, +r, +w
         -c, +s, +r, -w
         +c, -s, +r, +w
         -c, -s, -r, +w
     • If we want to know P(W)
       • Count outcomes: <+w: 4, -w: 1>
       • Normalize to get P(W) = <+w: 0.8, -w: 0.2>
       • Estimate will get closer to the true distribution with more samples
       • Can estimate anything else, too
       • What about P(C | +w)? P(C | +r, +w)? P(C | -r, -w)?
       • Fast: can use fewer samples if less time (what's the drawback?)
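     The count-and-normalize step from the example, as a short self-contained Python sketch over the five samples listed on the slide:

     ```python
     from collections import Counter

     # The five samples drawn on the slide, as (C, S, R, W) tuples.
     samples = [('+c', '-s', '+r', '+w'),
                ('+c', '+s', '+r', '+w'),
                ('-c', '+s', '+r', '-w'),
                ('+c', '-s', '+r', '+w'),
                ('-c', '-s', '-r', '+w')]

     # Count the W outcomes and normalize to estimate P(W).
     counts = Counter(w for _, _, _, w in samples)        # {'+w': 4, '-w': 1}
     total = sum(counts.values())
     estimate = {w: n / total for w, n in counts.items()}
     print(estimate)                                      # {'+w': 0.8, '-w': 0.2}
     ```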

  7. Rejection Sampling
     • Let's say we want P(C)
       • No point keeping all samples around
       • Just tally counts of C as we go
     • Let's say we want P(C | +s)
       • Same idea: tally C outcomes, but ignore (reject) samples which don't have S = +s
       • This is called rejection sampling
       • It is also consistent for conditional probabilities (i.e., correct in the limit of large N)
     • Samples (evidence S = +s):
         +c, -s, +r, +w   (reject)
         +c, +s, +r, +w
         -c, +s, +r, -w
         +c, -s, +r, +w   (reject)
         -c, -s, -r, +w   (reject)

  8. Rejection Sampling
     • Input: evidence assignments
     • For i = 1, 2, …, n
       • Sample x_i from P(X_i | Parents(X_i))
       • If x_i is not consistent with the evidence
         • Reject: return (no sample is generated in this cycle)
     • Return (x_1, x_2, …, x_n)

     Likelihood Weighting
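     A minimal Python sketch of rejection sampling for the Cloudy/Sprinkler/Rain/WetGrass network with evidence S = +s, assuming the CPT values from the earlier slide (the helper names are illustrative):

     ```python
     import random

     def flip(p):
         return random.random() < p

     def rejection_sample():
         # Sample variables in topological order and reject (return None) as soon
         # as a sampled value contradicts the evidence S = +s.
         c = flip(0.5)                  # P(+c)
         s = flip(0.1 if c else 0.5)    # P(+s | c)
         if not s:
             return None                # reject: no sample generated this cycle
         r = flip(0.8 if c else 0.2)    # P(+r | c)
         w = flip(0.99 if r else 0.90)  # with S = +s: P(+w | +s, r)
         return c, s, r, w

     # Estimate P(+c | +s) by tallying C over the accepted samples only.
     accepted = [x for x in (rejection_sample() for _ in range(10000)) if x is not None]
     print(sum(c for c, _, _, _ in accepted) / len(accepted))
     ```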

  9. Likelihood Weighting
     • Problem with rejection sampling:
       • If evidence is unlikely, rejects lots of samples
       • Evidence is not exploited as you sample
       • Consider P(Shape | blue): with rejection sampling, samples like (pyramid, green), (pyramid, red), (cube, red), (sphere, green) are thrown away; with the evidence fixed, every sample has Color = blue: (pyramid, blue), (sphere, blue), (cube, blue), …
     • Idea: fix the evidence and sample the rest
       • Problem: the sample distribution is not consistent!
       • Solution: assign a weight to each sample according to the probability of the evidence given its parents

     Likelihood Weighting Example: P(C, R | +s, +w)
     P(C):        +c 0.5 | -c 0.5
     P(S | C):    +c: +s 0.1, -s 0.9 | -c: +s 0.5, -s 0.5
     P(R | C):    +c: +r 0.8, -r 0.2 | -c: +r 0.2, -r 0.8
     P(W | S, R): +s, +r: +w 0.99, -w 0.01 | +s, -r: +w 0.90, -w 0.10
                  -s, +r: +w 0.90, -w 0.10 | -s, -r: +w 0.01, -w 0.99
     Samples: +c, +s, +r, +w; …
     Intuition: assign higher w to "good" samples (i.e. samples with high probability for the evidence)

  10. Likelihood Weighting
      • Input: evidence assignment
      • w = 1.0
      • for i = 1, 2, …, n
        • if X_i is an evidence variable
          • X_i = x_i (from evidence)
          • w = w * P(x_i | Parents(X_i))
        • else
          • Sample x_i from P(X_i | Parents(X_i))
      • Return (x_1, x_2, …, x_n), w

      Likelihood Weighting
      • Sampling distribution if z is sampled and e is fixed evidence:
          S_WS(z, e) = ∏_i P(z_i | Parents(Z_i))
      • Now, samples have weights:
          w(z, e) = ∏_i P(e_i | Parents(E_i))
      • Together, the weighted sampling distribution is consistent:
          S_WS(z, e) · w(z, e) = ∏_i P(z_i | Parents(Z_i)) ∏_i P(e_i | Parents(E_i)) = P(z, e)
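      A minimal Python sketch of this algorithm for the example query P(C | +s, +w) on the Cloudy/Sprinkler/Rain/WetGrass network, assuming the CPT values from the slides (function names are illustrative):

      ```python
      import random

      def flip(p):
          return random.random() < p

      def weighted_sample():
          # Evidence S = +s, W = +w is fixed; every non-evidence variable is sampled
          # from its conditional, and the weight accumulates P(evidence | parents).
          weight = 1.0
          c = flip(0.5)                    # sample C (not evidence)
          s = True                         # evidence S = +s
          weight *= 0.1 if c else 0.5      # w *= P(+s | c)
          r = flip(0.8 if c else 0.2)      # sample R (not evidence)
          w = True                         # evidence W = +w
          weight *= 0.99 if r else 0.90    # w *= P(+w | +s, r)
          return (c, s, r, w), weight

      # Weighted estimate of P(+c | +s, +w): total weight of samples with C = +c
      # divided by the total weight of all samples.
      num = den = 0.0
      for _ in range(100000):
          (c, _, _, _), wt = weighted_sample()
          den += wt
          if c:
              num += wt
      print(num / den)
      ```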

  11. Exercise: Sampling from p(B, E, A | +j, +m) in the Alarm Network

      P(B):  +b 0.001 | -b 0.999
      P(E):  +e 0.002 | -e 0.998
      P(A | B, E):
        +b, +e: +a 0.95,  -a 0.05
        +b, -e: +a 0.94,  -a 0.06
        -b, +e: +a 0.29,  -a 0.71
        -b, -e: +a 0.001, -a 0.999
      P(J | A): +a: +j 0.9, -j 0.1 | -a: +j 0.05, -j 0.95
      P(M | A): +a: +m 0.7, -m 0.3 | -a: +m 0.01, -m 0.99

      1. What is the probability p(-b, -e, -a)?
      2. What weight w will likelihood weighting assign to the sample -b, -e, -a?
      3. What weight w will likelihood weighting assign to the sample -b, -e, +a?
      4. Will rejection sampling reject more or less than 99.9% of the samples?

      Likelihood Weighting
      • The Good: likelihood weighting doesn't solve all our problems
        • We take evidence into account as we generate downstream samples
        • E.g. here, W's value will get picked based on the evidence values of S, R
        • More of our samples will reflect the state of the world suggested by the evidence
      • But: evidence influences samples for downstream variables, not upstream ones
        • (we aren't more likely to sample a "good" value for C that matches the evidence)
      • We would like to consider evidence when we sample every variable (leads to Gibbs sampling)
