
SLIDE 1

CS 4100: Artificial Intelligence

Bayes’ Nets: Sampling

Jan-Willem van de Meent, Northeastern University

[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Bayes’ Net Representation

  • A directed, acyclic graph, one node per random variable
  • A conditional probability table (CPT) for each node
    • A collection of distributions over X, one for each possible assignment to the parent variables
  • Bayes’ nets implicitly encode joint distributions
    • As a product of local conditional distributions
    • To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:

      P(x1, x2, …, xn) = ∏i P(xi | Parents(Xi))
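As a minimal sketch (not from the slides), here is the product-of-conditionals idea in Python for a hypothetical two-node net Rain → WetGrass with made-up CPT values:

```python
# Minimal sketch: joint probability of a full assignment in a tiny Bayes' net
# (Rain -> WetGrass). CPT values are made up for illustration only.
p_rain = {"+r": 0.2, "-r": 0.8}                      # P(R)
p_wet_given_rain = {                                  # P(W | R)
    ("+r", "+w"): 0.9, ("+r", "-w"): 0.1,
    ("-r", "+w"): 0.1, ("-r", "-w"): 0.9,
}

def joint(r, w):
    # Product of local conditional distributions along the net
    return p_rain[r] * p_wet_given_rain[(r, w)]

print(joint("+r", "+w"))  # 0.2 * 0.9 = 0.18
```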

SLIDE 2

Variable Elimination

  • Interleave joining and marginalizing
  • d^k entries computed for a factor over k variables with domain sizes d
  • Ordering of elimination of hidden variables can affect the size of the factors generated
  • Worst case: running time exponential in the size of the Bayes’ net

Approximate Inference: Sampling

SLIDE 3

Sampling

  • Sampling is a lot like repeated simulation
    • Predicting the weather, basketball games, …
  • Basic idea
    • Draw N samples from a sampling distribution S
    • Compute an approximate posterior probability
    • Show this converges to the true probability P
  • Why sample?
    • Reinforcement Learning: can approximate (q-)values even when you don’t know the transition function
    • Inference: getting a sample is faster than computing the right answer (e.g. with variable elimination)

Sampling

  • Sampling from a given distribution
    • Step 1: Get a sample u from the uniform distribution over [0, 1)
      • E.g. random() in Python
    • Step 2: Convert this sample u into an outcome for the given distribution by associating each target outcome with a sub-interval of [0, 1), with sub-interval size equal to the probability of the outcome
  • Example

      C       P(C)    sub-interval of [0, 1)
      red     0.6     [0.0, 0.6)
      green   0.1     [0.6, 0.7)
      blue    0.3     [0.7, 1.0)

    • If random() returns u = 0.83, then our sample is C = blue
    • E.g., after sampling 8 times: …
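A small Python sketch of this sub-interval trick for the red/green/blue distribution above (the helper name sample_from is mine, not from the slides):

```python
import random

def sample_from(dist):
    """Sample an outcome from a discrete distribution {outcome: probability}
    by mapping a uniform u in [0, 1) onto cumulative sub-intervals."""
    u = random.random()           # Step 1: uniform sample in [0, 1)
    cumulative = 0.0
    for outcome, p in dist.items():
        cumulative += p           # Step 2: find the sub-interval containing u
        if u < cumulative:
            return outcome
    return outcome                # guard against floating-point round-off

p_color = {"red": 0.6, "green": 0.1, "blue": 0.3}
print(sample_from(p_color))       # e.g. u = 0.83 falls in [0.7, 1.0) -> "blue"
```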

SLIDE 4

Sampling in Bayes’ Nets

  • Prior Sampling
  • Rejection Sampling
  • Likelihood Weighting
  • Gibbs Sampling

Prior Sampling

SLIDE 5

Prior Sampling

[Figure: Bayes’ net with Cloudy → Sprinkler, Cloudy → Rain, and Sprinkler, Rain → WetGrass]

    C     P(C)
    +c    0.5
    -c    0.5

    C     S     P(S|C)
    +c    +s    0.1
    +c    -s    0.9
    -c    +s    0.5
    -c    -s    0.5

    C     R     P(R|C)
    +c    +r    0.8
    +c    -r    0.2
    -c    +r    0.2
    -c    -r    0.8

    S     R     W     P(W|S,R)
    +s    +r    +w    0.99
    +s    +r    -w    0.01
    +s    -r    +w    0.90
    +s    -r    -w    0.10
    -s    +r    +w    0.90
    -s    +r    -w    0.10
    -s    -r    +w    0.01
    -s    -r    -w    0.99

Samples: +c, -s, +r, +w
         -c, +s, -r, +w

Prior Sampling

  • For i = 1, 2, …, n
    • Sample xi from P(Xi | Parents(Xi))
  • Return (x1, x2, …, xn)
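A minimal self-contained Python sketch of prior sampling for the sprinkler network, using the CPTs from the slide above (the helper names sample_discrete and prior_sample are mine, not from the slides):

```python
import random

# CPTs for the sprinkler network, copied from the slide above.
P_C = {"+c": 0.5, "-c": 0.5}
P_S = {"+c": {"+s": 0.1, "-s": 0.9}, "-c": {"+s": 0.5, "-s": 0.5}}
P_R = {"+c": {"+r": 0.8, "-r": 0.2}, "-c": {"+r": 0.2, "-r": 0.8}}
P_W = {("+s", "+r"): {"+w": 0.99, "-w": 0.01}, ("+s", "-r"): {"+w": 0.90, "-w": 0.10},
       ("-s", "+r"): {"+w": 0.90, "-w": 0.10}, ("-s", "-r"): {"+w": 0.01, "-w": 0.99}}

def sample_discrete(dist):
    # Sub-interval trick from the earlier slide
    u, cumulative = random.random(), 0.0
    for value, p in dist.items():
        cumulative += p
        if u < cumulative:
            return value
    return value

def prior_sample():
    # Sample each variable from P(Xi | Parents(Xi)), in topological order
    c = sample_discrete(P_C)
    s = sample_discrete(P_S[c])
    r = sample_discrete(P_R[c])
    w = sample_discrete(P_W[(s, r)])
    return (c, s, r, w)

print(prior_sample())   # e.g. ('+c', '-s', '+r', '+w')
```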

SLIDE 6

Prior Sampling

  • This process generates samples with probability
      S_PS(x1, …, xn) = ∏i P(xi | Parents(Xi)) = P(x1, …, xn)
    i.e. the BN’s joint probability
  • Let the number of samples of an event be N_PS(x1, …, xn)
  • Then
      lim N→∞ N_PS(x1, …, xn) / N = S_PS(x1, …, xn) = P(x1, …, xn)
  • i.e., the sampling procedure is consistent* (*different from a consistent heuristic, or arc consistency)

Example

  • We’ll draw a batch of samples from the BN:
      +c, -s, +r, +w
      +c, +s, +r, +w
      -c, +s, +r, -w
      +c, -s, +r, +w
      -c, -s, -r, +w
  • If we want to know P(W)
    • Count outcomes: <+w: 4, -w: 1>
    • Normalize to get P(W) = <+w: 0.8, -w: 0.2>
    • Estimate will get closer to the true distribution with more samples
    • Can estimate anything else, too
    • What about P(C | +w)? P(C | +r, +w)? P(C | -r, -w)?
    • Fast: can use fewer samples if less time (what’s the drawback?)
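As a quick check, counting and normalizing the W values of that 5-sample batch in Python (the batch is the one listed above):

```python
from collections import Counter

# The batch of prior samples from the slide, as (C, S, R, W) tuples
samples = [("+c", "-s", "+r", "+w"),
           ("+c", "+s", "+r", "+w"),
           ("-c", "+s", "+r", "-w"),
           ("+c", "-s", "+r", "+w"),
           ("-c", "-s", "-r", "+w")]

counts = Counter(w for (_, _, _, w) in samples)   # tally outcomes of W
total = sum(counts.values())
print({w: n / total for w, n in counts.items()})  # {'+w': 0.8, '-w': 0.2}
```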

SLIDE 7

Rejection Sampling

  • Sample batch:
      +c, -s, +r, +w
      +c, +s, +r, +w
      -c, +s, +r, -w
      +c, -s, +r, +w
      -c, -s, -r, +w

Rejection Sampling

  • Let’s say we want P(C)
    • No point keeping all samples around
    • Just tally counts of C as we go
  • Let’s say we want P(C | +s)
    • Same idea: tally C outcomes, but ignore (reject) samples which don’t have S = +s
  • This is called rejection sampling
  • It is also consistent for conditional probabilities (i.e., correct in the limit of large N)

SLIDE 8

Rejection Sampling

  • Input: evidence assignments
  • For i = 1, 2, …, n
    • Sample xi from P(Xi | Parents(Xi))
    • If xi not consistent with evidence
      • Reject: return – no sample is generated in this cycle
  • Return (x1, x2, …, xn)
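A rough self-contained Python sketch of rejection sampling for the sprinkler network (helper names are mine; for brevity it checks the evidence after drawing a full sample rather than rejecting early), estimating P(C | +s) as in the example above:

```python
import random
from collections import Counter

# Sprinkler-network CPTs from the slides (C -> S, C -> R; S, R -> W)
P_C = {"+c": 0.5, "-c": 0.5}
P_S = {"+c": {"+s": 0.1, "-s": 0.9}, "-c": {"+s": 0.5, "-s": 0.5}}
P_R = {"+c": {"+r": 0.8, "-r": 0.2}, "-c": {"+r": 0.2, "-r": 0.8}}
P_W = {("+s", "+r"): {"+w": 0.99, "-w": 0.01}, ("+s", "-r"): {"+w": 0.90, "-w": 0.10},
       ("-s", "+r"): {"+w": 0.90, "-w": 0.10}, ("-s", "-r"): {"+w": 0.01, "-w": 0.99}}

def sample_discrete(dist):
    u, cum = random.random(), 0.0
    for value, p in dist.items():
        cum += p
        if u < cum:
            return value
    return value

def prior_sample():
    c = sample_discrete(P_C)
    s = sample_discrete(P_S[c])
    r = sample_discrete(P_R[c])
    w = sample_discrete(P_W[(s, r)])
    return {"C": c, "S": s, "R": r, "W": w}

def rejection_sample(evidence, n):
    """Tally the query variable C over prior samples, rejecting samples
    that are inconsistent with the evidence."""
    counts = Counter()
    for _ in range(n):
        sample = prior_sample()
        if all(sample[var] == val for var, val in evidence.items()):
            counts[sample["C"]] += 1
    total = sum(counts.values())
    return {c: k / total for c, k in counts.items()} if total else {}

print(rejection_sample({"S": "+s"}, 10000))   # approximate P(C | +s)
```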

Likelihood Weighting

SLIDE 9
  • Idea: fix the evidence and sample the rest
  • Problem: the sample distribution is not consistent!
  • Solution: assign each sample a weight according to the probability of the evidence given its parents

Likelihood Weighting

  • Problem with rejection sampling:
    • If evidence is unlikely, rejects lots of samples
    • Evidence not exploited as you sample
    • Consider P( Shape | blue )

  Samples (Shape, Color):
    Without fixing the evidence: pyramid, green; pyramid, red; sphere, blue; cube, red; sphere, green
    With Color fixed to blue:    pyramid, blue; pyramid, blue; sphere, blue; cube, blue; sphere, blue

Likelihood Weighting

  (CPTs for the sprinkler network as on the Prior Sampling slide above.)

  Sample: +c, +s, +r, +w, …

  Query: P(C, R | +s, +w)

  • Example: applying the weighting rule below to the sample +c, +s, +r, +w with evidence +s, +w gives
      w = P(+s | +c) · P(+w | +s, +r) = 0.1 · 0.99 = 0.099
  • Intuition: assign higher w to “good” samples (i.e. samples with high probability for the evidence)

SLIDE 10

Likelihood Weighting

  • Input: evidence assignment
  • w = 1.0
  • for i = 1, 2, …, n
    • if Xi is an evidence variable
      • Xi = xi (from evidence)
      • w = w * P(xi | Parents(Xi))
    • else
      • Sample xi from P(Xi | Parents(Xi))
  • Return (x1, x2, …, xn), w
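A minimal self-contained Python sketch of likelihood weighting for the sprinkler network (same made-up helper names as before), returning a sample plus its weight:

```python
import random

# Sprinkler-network CPTs from the slides
P_C = {"+c": 0.5, "-c": 0.5}
P_S = {"+c": {"+s": 0.1, "-s": 0.9}, "-c": {"+s": 0.5, "-s": 0.5}}
P_R = {"+c": {"+r": 0.8, "-r": 0.2}, "-c": {"+r": 0.2, "-r": 0.8}}
P_W = {("+s", "+r"): {"+w": 0.99, "-w": 0.01}, ("+s", "-r"): {"+w": 0.90, "-w": 0.10},
       ("-s", "+r"): {"+w": 0.90, "-w": 0.10}, ("-s", "-r"): {"+w": 0.01, "-w": 0.99}}

def sample_discrete(dist):
    u, cum = random.random(), 0.0
    for value, p in dist.items():
        cum += p
        if u < cum:
            return value
    return value

def weighted_sample(evidence):
    """Fix evidence variables, sample the rest, and accumulate the weight
    w = product of P(evidence value | parents)."""
    w = 1.0
    sample = {}
    # Visit variables in topological order: C, S, R, W
    for var, dist in [("C", P_C), ("S", None), ("R", None), ("W", None)]:
        if var == "S":
            dist = P_S[sample["C"]]
        elif var == "R":
            dist = P_R[sample["C"]]
        elif var == "W":
            dist = P_W[(sample["S"], sample["R"])]
        if var in evidence:
            sample[var] = evidence[var]
            w *= dist[evidence[var]]       # weight by P(e | parents)
        else:
            sample[var] = sample_discrete(dist)
    return sample, w

print(weighted_sample({"S": "+s", "W": "+w"}))  # e.g. ({'C': '+c', ...}, 0.099)
```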

Likelihood Weighting

  • Sampling distribution if z is sampled and e is fixed evidence:
      S_WS(z, e) = ∏i P(zi | Parents(Zi))
  • Now, samples have weights:
      w(z, e) = ∏j P(ej | Parents(Ej))
  • Together, the weighted sampling distribution is consistent:
      S_WS(z, e) · w(z, e) = ∏i P(zi | Parents(Zi)) ∏j P(ej | Parents(Ej)) = P(z, e)

SLIDE 11

Exercise: Sampling from p(B, E, A | +j, +m) in the Alarm Network

    B     P(B)
    +b    0.001
    -b    0.999

    E     P(E)
    +e    0.002
    -e    0.998

    A     J     P(J|A)
    +a    +j    0.9
    +a    -j    0.1
    -a    +j    0.05
    -a    -j    0.95

    A     M     P(M|A)
    +a    +m    0.7
    +a    -m    0.3
    -a    +m    0.01
    -a    -m    0.99

[Figure: alarm network with B, E → A and A → J, A → M]

  • 1. What is the probability p(-b, -e, -a)?
  • 2. What weight w will likelihood weighting assign to a sample -b, -e, -a?
  • 3. What weight w will likelihood weighting assign to a sample -b, -e, +a?
  • 4. Will rejection sampling reject more or less than 99.9% of the samples?

    B     E     A     P(A|B,E)
    +b    +e    +a    0.95
    +b    +e    -a    0.05
    +b    -e    +a    0.94
    +b    -e    -a    0.06
    -b    +e    +a    0.29
    -b    +e    -a    0.71
    -b    -e    +a    0.001
    -b    -e    -a    0.999

Likelihood Weighting

  • The Good: we take evidence into account as we generate downstream samples
    • E.g. here, W’s value will get picked based on the evidence values of S, R
    • More of our samples will reflect the state of the world suggested by the evidence
  • Likelihood weighting doesn’t solve all our problems
    • Evidence influences samples for downstream variables, but not upstream ones (we aren’t more likely to sample a “good” value for C that matches the evidence)
  • We would like to consider evidence when we sample every variable (this leads to Gibbs sampling)

SLIDE 12

Gibbs Sampling

  • Goal: generate samples from p(X1, X2, …, Xn | e1, e2, …, em)
  • Gibbs Sampling:
    • Initialize a full assignment X1 = x1, X2 = x2, …, Xn = xn
    • Repeat
      • Select any non-evidence variable Xi
      • Compute p(Xi | e1, e2, …, em, xj≠i)
      • Sample Xi from p(Xi | e1, e2, …, em, xj≠i)
  • Property: in the limit of repeating this infinitely often, the resulting samples come from the correct conditional distribution
  • Intuition: both upstream and downstream variables affect the probability of sampling Xi = xi at each iteration.
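A rough, self-contained Python sketch of Gibbs sampling for the sprinkler network (helper names are mine; no burn-in or thinning, so it is only illustrative), estimating P(S | +r) as in the example on the next slide:

```python
import random

# Sprinkler-network CPTs from the slides
P_C = {"+c": 0.5, "-c": 0.5}
P_S = {"+c": {"+s": 0.1, "-s": 0.9}, "-c": {"+s": 0.5, "-s": 0.5}}
P_R = {"+c": {"+r": 0.8, "-r": 0.2}, "-c": {"+r": 0.2, "-r": 0.8}}
P_W = {("+s", "+r"): {"+w": 0.99, "-w": 0.01}, ("+s", "-r"): {"+w": 0.90, "-w": 0.10},
       ("-s", "+r"): {"+w": 0.90, "-w": 0.10}, ("-s", "-r"): {"+w": 0.01, "-w": 0.99}}

DOMAINS = {"C": ["+c", "-c"], "S": ["+s", "-s"], "R": ["+r", "-r"], "W": ["+w", "-w"]}

def joint(a):
    # BN joint = product of local conditional distributions
    return P_C[a["C"]] * P_S[a["C"]][a["S"]] * P_R[a["C"]][a["R"]] * P_W[(a["S"], a["R"])][a["W"]]

def resample(a, var):
    # P(var | all other variables), by normalizing the joint over var's values
    weights = [joint(dict(a, **{var: v})) for v in DOMAINS[var]]
    u, cum = random.random() * sum(weights), 0.0
    for v, w in zip(DOMAINS[var], weights):
        cum += w
        if u < cum:
            return v
    return v

def gibbs(evidence, n_steps):
    # Initialize: fix evidence, give the remaining variables arbitrary values
    a = {"C": "+c", "S": "+s", "R": "+r", "W": "+w"}
    a.update(evidence)
    non_evidence = [v for v in DOMAINS if v not in evidence]
    samples = []
    for _ in range(n_steps):
        var = random.choice(non_evidence)   # pick a non-evidence variable
        a[var] = resample(a, var)           # resample it given everything else
        samples.append(dict(a))
    return samples

samples = gibbs({"R": "+r"}, 10000)
print(sum(s["S"] == "+s" for s in samples) / len(samples))   # approximate P(+s | +r)
```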

SLIDE 13
Gibbs Sampling Example: P( S | +r )

  • Step 1: Fix evidence
    • R = +r
  • Step 2: Initialize other variables
    • E.g. use prior sampling
  • Step 3: Repeat
    • Choose a non-evidence variable X
    • Resample X from P( X | all other variables )

Efficient Resampling of One Variable

  • Sample from P( S | +c, +r, -w )
  • Many things cancel out – only CPTs with S remain!
  • More generally: only CPTs that contain the resampled variable need to be considered, and joined together
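As a worked check (my own arithmetic, using the sprinkler CPTs from the slides), the resampling distribution P(S | +c, +r, -w) only needs the two CPTs that mention S:

```python
# P(S | +c, +r, -w) is proportional to P(S | +c) * P(-w | S, +r); the other CPTs cancel.
p_s_given_plus_c   = {"+s": 0.1,  "-s": 0.9}    # P(S | +c)
p_minus_w_given_sr = {"+s": 0.01, "-s": 0.10}   # P(-w | S, +r)

unnormalized = {s: p_s_given_plus_c[s] * p_minus_w_given_sr[s] for s in ("+s", "-s")}
z = sum(unnormalized.values())                  # 0.001 + 0.09 = 0.091
print({s: p / z for s, p in unnormalized.items()})   # {'+s': ~0.011, '-s': ~0.989}
```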

SLIDE 14

Bayes’ Net Sampling Summary

  • Prior Sampling: P( Q )
  • Likelihood Weighting: P( Q | e )
  • Rejection Sampling: P( Q | e )
  • Gibbs Sampling: P( Q | e )

Further Reading on Gibbs Sampling*

  • Gibbs sampling produces samples from the query distribution P( Q | e ) in the limit of re-sampling infinitely often
  • Gibbs sampling is a special case of more general methods called Markov chain Monte Carlo (MCMC) methods
  • Metropolis-Hastings is one of the more famous MCMC methods (in fact, Gibbs sampling is a special case of Metropolis-Hastings)
  • When people say “Monte Carlo” methods, they mean “sampling”