
SLIDE 1

CS 4100: Artificial Intelligence

Bayes’ Nets: Sampling

Jan-Willem van de Meent, Northeastern University

[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Bayes’ Net Representation

  • A directed, acyclic graph, one node per random variable
  • A conditional probability table (CPT) for each node
    • A collection of distributions over X, one for each possible assignment to the parent variables
  • Bayes’ nets implicitly encode joint distributions
    • As a product of local conditional distributions
    • To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:

      P(x1, x2, …, xn) = ∏i P(xi | Parents(Xi))
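As a minimal sketch (not from the slides), here is the product-of-conditionals idea in Python for a hypothetical two-node net Rain → WetGrass with made-up CPT values:

```python
# Minimal sketch: joint probability of a full assignment in a tiny Bayes' net
# (Rain -> WetGrass). CPT values are made up for illustration only.
p_rain = {"+r": 0.2, "-r": 0.8}                      # P(R)
p_wet_given_rain = {                                  # P(W | R)
    ("+r", "+w"): 0.9, ("+r", "-w"): 0.1,
    ("-r", "+w"): 0.1, ("-r", "-w"): 0.9,
}

def joint(r, w):
    # Product of local conditional distributions along the net
    return p_rain[r] * p_wet_given_rain[(r, w)]

print(joint("+r", "+w"))  # 0.2 * 0.9 = 0.18
```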

SLIDE 2

Variable Elimination

  • Interleave joining and marginalizing
  • d^k entries computed for a factor over k variables with domain sizes d
  • Ordering of elimination of hidden variables can affect the size of the factors generated
  • Worst case: running time exponential in the size of the Bayes’ net

Approximate Inference: Sampling

SLIDE 3

Sampling

  • Sampling is a lot like repeated simulation
    • Predicting the weather, basketball games, …
  • Basic idea
    • Draw N samples from a sampling distribution S
    • Compute an approximate posterior probability
    • Show this converges to the true probability P
  • Why sample?
    • Reinforcement Learning: can approximate (q-)values even when you don’t know the transition function
    • Inference: getting a sample is faster than computing the right answer (e.g. with variable elimination)

Sampling

  • Sampling from a given distribution
    • Step 1: Get a sample u from the uniform distribution over [0, 1)
      • E.g. random() in Python
    • Step 2: Convert this sample u into an outcome for the given distribution by associating each target outcome with a sub-interval of [0, 1), with sub-interval size equal to the probability of the outcome
  • Example

      C       P(C)    sub-interval of [0, 1)
      red     0.6     [0.0, 0.6)
      green   0.1     [0.6, 0.7)
      blue    0.3     [0.7, 1.0)

    • If random() returns u = 0.83, then our sample is C = blue
    • E.g., after sampling 8 times: …
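A small Python sketch of this sub-interval trick for the red/green/blue distribution above (the helper name sample_from is mine, not from the slides):

```python
import random

def sample_from(dist):
    """Sample an outcome from a discrete distribution {outcome: probability}
    by mapping a uniform u in [0, 1) onto cumulative sub-intervals."""
    u = random.random()           # Step 1: uniform sample in [0, 1)
    cumulative = 0.0
    for outcome, p in dist.items():
        cumulative += p           # Step 2: find the sub-interval containing u
        if u < cumulative:
            return outcome
    return outcome                # guard against floating-point round-off

p_color = {"red": 0.6, "green": 0.1, "blue": 0.3}
print(sample_from(p_color))       # e.g. u = 0.83 falls in [0.7, 1.0) -> "blue"
```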

SLIDE 4

Sampling in Bayes’ Nets

  • Prior Sampling
  • Rejection Sampling
  • Likelihood Weighting
  • Gibbs Sampling

Prior Sampling

SLIDE 5

Prior Sampling

[Figure: Bayes’ net with Cloudy → Sprinkler, Cloudy → Rain, and Sprinkler, Rain → WetGrass]

    C     P(C)
    +c    0.5
    -c    0.5

    C     S     P(S|C)
    +c    +s    0.1
    +c    -s    0.9
    -c    +s    0.5
    -c    -s    0.5

    C     R     P(R|C)
    +c    +r    0.8
    +c    -r    0.2
    -c    +r    0.2
    -c    -r    0.8

    S     R     W     P(W|S,R)
    +s    +r    +w    0.99
    +s    +r    -w    0.01
    +s    -r    +w    0.90
    +s    -r    -w    0.10
    -s    +r    +w    0.90
    -s    +r    -w    0.10
    -s    -r    +w    0.01
    -s    -r    -w    0.99

Samples: +c, -s, +r, +w
         -c, +s, -r, +w

Prior Sampling

  • For i = 1, 2, …, n
    • Sample xi from P(Xi | Parents(Xi))
  • Return (x1, x2, …, xn)
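A minimal self-contained Python sketch of prior sampling for the sprinkler network, using the CPTs from the slide above (the helper names sample_discrete and prior_sample are mine, not from the slides):

```python
import random

# CPTs for the sprinkler network, copied from the slide above.
P_C = {"+c": 0.5, "-c": 0.5}
P_S = {"+c": {"+s": 0.1, "-s": 0.9}, "-c": {"+s": 0.5, "-s": 0.5}}
P_R = {"+c": {"+r": 0.8, "-r": 0.2}, "-c": {"+r": 0.2, "-r": 0.8}}
P_W = {("+s", "+r"): {"+w": 0.99, "-w": 0.01}, ("+s", "-r"): {"+w": 0.90, "-w": 0.10},
       ("-s", "+r"): {"+w": 0.90, "-w": 0.10}, ("-s", "-r"): {"+w": 0.01, "-w": 0.99}}

def sample_discrete(dist):
    # Sub-interval trick from the earlier slide
    u, cumulative = random.random(), 0.0
    for value, p in dist.items():
        cumulative += p
        if u < cumulative:
            return value
    return value

def prior_sample():
    # Sample each variable from P(Xi | Parents(Xi)), in topological order
    c = sample_discrete(P_C)
    s = sample_discrete(P_S[c])
    r = sample_discrete(P_R[c])
    w = sample_discrete(P_W[(s, r)])
    return (c, s, r, w)

print(prior_sample())   # e.g. ('+c', '-s', '+r', '+w')
```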

SLIDE 6

Prior Sampling

  • This process generates samples with probability
      S_PS(x1, …, xn) = ∏i P(xi | Parents(Xi)) = P(x1, …, xn)
    i.e. the BN’s joint probability
  • Let the number of samples of an event be N_PS(x1, …, xn)
  • Then
      lim N→∞ N_PS(x1, …, xn) / N = S_PS(x1, …, xn) = P(x1, …, xn)
  • i.e., the sampling procedure is consistent* (*different from a consistent heuristic, or arc consistency)

Example

  • We’ll draw a batch of samples from the BN:
      +c, -s, +r, +w
      +c, +s, +r, +w
      -c, +s, +r, -w
      +c, -s, +r, +w
      -c, -s, -r, +w
  • If we want to know P(W)
    • Count outcomes: <+w: 4, -w: 1>
    • Normalize to get P(W) = <+w: 0.8, -w: 0.2>
    • Estimate will get closer to the true distribution with more samples
    • Can estimate anything else, too
    • What about P(C | +w)? P(C | +r, +w)? P(C | -r, -w)?
    • Fast: can use fewer samples if less time (what’s the drawback?)
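As a quick check, counting and normalizing the W values of that 5-sample batch in Python (the batch is the one listed above):

```python
from collections import Counter

# The batch of prior samples from the slide, as (C, S, R, W) tuples
samples = [("+c", "-s", "+r", "+w"),
           ("+c", "+s", "+r", "+w"),
           ("-c", "+s", "+r", "-w"),
           ("+c", "-s", "+r", "+w"),
           ("-c", "-s", "-r", "+w")]

counts = Counter(w for (_, _, _, w) in samples)   # tally outcomes of W
total = sum(counts.values())
print({w: n / total for w, n in counts.items()})  # {'+w': 0.8, '-w': 0.2}
```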

SLIDE 7

Rejection Sampling

  • Sample batch:
      +c, -s, +r, +w
      +c, +s, +r, +w
      -c, +s, +r, -w
      +c, -s, +r, +w
      -c, -s, -r, +w

Rejection Sampling

  • Let’s say we want P(C)
    • No point keeping all samples around
    • Just tally counts of C as we go
  • Let’s say we want P(C | +s)
    • Same idea: tally C outcomes, but ignore (reject) samples which don’t have S = +s
  • This is called rejection sampling
  • It is also consistent for conditional probabilities (i.e., correct in the limit of large N)

SLIDE 8

Rejection Sampling

  • Input: evidence assignments
  • For i = 1, 2, …, n
    • Sample xi from P(Xi | Parents(Xi))
    • If xi not consistent with evidence
      • Reject: return – no sample is generated in this cycle
  • Return (x1, x2, …, xn)
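A rough self-contained Python sketch of rejection sampling for the sprinkler network (helper names are mine; for brevity it checks the evidence after drawing a full sample rather than rejecting early), estimating P(C | +s) as in the example above:

```python
import random
from collections import Counter

# Sprinkler-network CPTs from the slides (C -> S, C -> R; S, R -> W)
P_C = {"+c": 0.5, "-c": 0.5}
P_S = {"+c": {"+s": 0.1, "-s": 0.9}, "-c": {"+s": 0.5, "-s": 0.5}}
P_R = {"+c": {"+r": 0.8, "-r": 0.2}, "-c": {"+r": 0.2, "-r": 0.8}}
P_W = {("+s", "+r"): {"+w": 0.99, "-w": 0.01}, ("+s", "-r"): {"+w": 0.90, "-w": 0.10},
       ("-s", "+r"): {"+w": 0.90, "-w": 0.10}, ("-s", "-r"): {"+w": 0.01, "-w": 0.99}}

def sample_discrete(dist):
    u, cum = random.random(), 0.0
    for value, p in dist.items():
        cum += p
        if u < cum:
            return value
    return value

def prior_sample():
    c = sample_discrete(P_C)
    s = sample_discrete(P_S[c])
    r = sample_discrete(P_R[c])
    w = sample_discrete(P_W[(s, r)])
    return {"C": c, "S": s, "R": r, "W": w}

def rejection_sample(evidence, n):
    """Tally the query variable C over prior samples, rejecting samples
    that are inconsistent with the evidence."""
    counts = Counter()
    for _ in range(n):
        sample = prior_sample()
        if all(sample[var] == val for var, val in evidence.items()):
            counts[sample["C"]] += 1
    total = sum(counts.values())
    return {c: k / total for c, k in counts.items()} if total else {}

print(rejection_sample({"S": "+s"}, 10000))   # approximate P(C | +s)
```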

Likelihood Weighting

SLIDE 9
  • Idea: fix the evidence and sample the rest
  • Problem: the sample distribution is not consistent!
  • Solution: assign each sample a weight according to the probability of the evidence given its parents

Likelihood Weighting

  • Problem with rejection sampling:
    • If evidence is unlikely, rejects lots of samples
    • Evidence not exploited as you sample
    • Consider P( Shape | blue )

  Samples (Shape, Color):
    Without fixing the evidence: pyramid, green; pyramid, red; sphere, blue; cube, red; sphere, green
    With Color fixed to blue:    pyramid, blue; pyramid, blue; sphere, blue; cube, blue; sphere, blue

Likelihood Weighting

  (CPTs for the sprinkler network as on the Prior Sampling slide above.)

  Sample: +c, +s, +r, +w, …

  Query: P(C, R | +s, +w)

  • Example: applying the weighting rule below to the sample +c, +s, +r, +w with evidence +s, +w gives
      w = P(+s | +c) · P(+w | +s, +r) = 0.1 · 0.99 = 0.099
  • Intuition: assign higher w to “good” samples (i.e. samples with high probability for the evidence)

SLIDE 10

Likelihood Weighting

  • Input: evidence assignment
  • w = 1.0
  • for i = 1, 2, …, n
    • if Xi is an evidence variable
      • Xi = xi (from evidence)
      • w = w * P(xi | Parents(Xi))
    • else
      • Sample xi from P(Xi | Parents(Xi))
  • Return (x1, x2, …, xn), w
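A minimal self-contained Python sketch of likelihood weighting for the sprinkler network (same made-up helper names as before), returning a sample plus its weight:

```python
import random

# Sprinkler-network CPTs from the slides
P_C = {"+c": 0.5, "-c": 0.5}
P_S = {"+c": {"+s": 0.1, "-s": 0.9}, "-c": {"+s": 0.5, "-s": 0.5}}
P_R = {"+c": {"+r": 0.8, "-r": 0.2}, "-c": {"+r": 0.2, "-r": 0.8}}
P_W = {("+s", "+r"): {"+w": 0.99, "-w": 0.01}, ("+s", "-r"): {"+w": 0.90, "-w": 0.10},
       ("-s", "+r"): {"+w": 0.90, "-w": 0.10}, ("-s", "-r"): {"+w": 0.01, "-w": 0.99}}

def sample_discrete(dist):
    u, cum = random.random(), 0.0
    for value, p in dist.items():
        cum += p
        if u < cum:
            return value
    return value

def weighted_sample(evidence):
    """Fix evidence variables, sample the rest, and accumulate the weight
    w = product of P(evidence value | parents)."""
    w = 1.0
    sample = {}
    # Visit variables in topological order: C, S, R, W
    for var, dist in [("C", P_C), ("S", None), ("R", None), ("W", None)]:
        if var == "S":
            dist = P_S[sample["C"]]
        elif var == "R":
            dist = P_R[sample["C"]]
        elif var == "W":
            dist = P_W[(sample["S"], sample["R"])]
        if var in evidence:
            sample[var] = evidence[var]
            w *= dist[evidence[var]]       # weight by P(e | parents)
        else:
            sample[var] = sample_discrete(dist)
    return sample, w

print(weighted_sample({"S": "+s", "W": "+w"}))  # e.g. ({'C': '+c', ...}, 0.099)
```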

Likelihood Weighting

  • Sampling distribution if z is sampled and e is fixed evidence:
      S_WS(z, e) = ∏i P(zi | Parents(Zi))
  • Now, samples have weights:
      w(z, e) = ∏j P(ej | Parents(Ej))
  • Together, the weighted sampling distribution is consistent:
      S_WS(z, e) · w(z, e) = ∏i P(zi | Parents(Zi)) ∏j P(ej | Parents(Ej)) = P(z, e)

SLIDE 11

Exercise: Sampling from p(B, E, A | +j, +m) in the Alarm Network

    B     P(B)
    +b    0.001
    -b    0.999

    E     P(E)
    +e    0.002
    -e    0.998

    A     J     P(J|A)
    +a    +j    0.9
    +a    -j    0.1
    -a    +j    0.05
    -a    -j    0.95

    A     M     P(M|A)
    +a    +m    0.7
    +a    -m    0.3
    -a    +m    0.01
    -a    -m    0.99

[Figure: alarm network with B, E → A and A → J, A → M]

  • 1. What is the probability p(-b, -e, -a)?
  • 2. What weight w will likelihood weighting assign to a sample -b, -e, -a?
  • 3. What weight w will likelihood weighting assign to a sample -b, -e, +a?
  • 4. Will rejection sampling reject more or less than 99.9% of the samples?

    B     E     A     P(A|B,E)
    +b    +e    +a    0.95
    +b    +e    -a    0.05
    +b    -e    +a    0.94
    +b    -e    -a    0.06
    -b    +e    +a    0.29
    -b    +e    -a    0.71
    -b    -e    +a    0.001
    -b    -e    -a    0.999

Likelihood Weighting

  • The Good: we take evidence into account as we generate downstream samples
    • E.g. here, W’s value will get picked based on the evidence values of S, R
    • More of our samples will reflect the state of the world suggested by the evidence
  • Likelihood weighting doesn’t solve all our problems
    • Evidence influences samples for downstream variables, but not upstream ones (we aren’t more likely to sample a “good” value for C that matches the evidence)
  • We would like to consider evidence when we sample every variable (this leads to Gibbs sampling)

SLIDE 12

Gibbs Sampling

  • Goal: generate samples from p(X1, X2, …, Xn | e1, e2, …, em)
  • Gibbs Sampling:
    • Initialize a full assignment X1 = x1, X2 = x2, …, Xn = xn
    • Repeat
      • Select any non-evidence variable Xi
      • Compute p(Xi | e1, e2, …, em, xj≠i)
      • Sample Xi from p(Xi | e1, e2, …, em, xj≠i)
  • Property: in the limit of repeating this infinitely often, the resulting samples come from the correct conditional distribution
  • Intuition: both upstream and downstream variables affect the probability of sampling Xi = xi at each iteration.
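A rough, self-contained Python sketch of Gibbs sampling for the sprinkler network (helper names are mine; no burn-in or thinning, so it is only illustrative), estimating P(S | +r) as in the example on the next slide:

```python
import random

# Sprinkler-network CPTs from the slides
P_C = {"+c": 0.5, "-c": 0.5}
P_S = {"+c": {"+s": 0.1, "-s": 0.9}, "-c": {"+s": 0.5, "-s": 0.5}}
P_R = {"+c": {"+r": 0.8, "-r": 0.2}, "-c": {"+r": 0.2, "-r": 0.8}}
P_W = {("+s", "+r"): {"+w": 0.99, "-w": 0.01}, ("+s", "-r"): {"+w": 0.90, "-w": 0.10},
       ("-s", "+r"): {"+w": 0.90, "-w": 0.10}, ("-s", "-r"): {"+w": 0.01, "-w": 0.99}}

DOMAINS = {"C": ["+c", "-c"], "S": ["+s", "-s"], "R": ["+r", "-r"], "W": ["+w", "-w"]}

def joint(a):
    # BN joint = product of local conditional distributions
    return P_C[a["C"]] * P_S[a["C"]][a["S"]] * P_R[a["C"]][a["R"]] * P_W[(a["S"], a["R"])][a["W"]]

def resample(a, var):
    # P(var | all other variables), by normalizing the joint over var's values
    weights = [joint(dict(a, **{var: v})) for v in DOMAINS[var]]
    u, cum = random.random() * sum(weights), 0.0
    for v, w in zip(DOMAINS[var], weights):
        cum += w
        if u < cum:
            return v
    return v

def gibbs(evidence, n_steps):
    # Initialize: fix evidence, give the remaining variables arbitrary values
    a = {"C": "+c", "S": "+s", "R": "+r", "W": "+w"}
    a.update(evidence)
    non_evidence = [v for v in DOMAINS if v not in evidence]
    samples = []
    for _ in range(n_steps):
        var = random.choice(non_evidence)   # pick a non-evidence variable
        a[var] = resample(a, var)           # resample it given everything else
        samples.append(dict(a))
    return samples

samples = gibbs({"R": "+r"}, 10000)
print(sum(s["S"] == "+s" for s in samples) / len(samples))   # approximate P(+s | +r)
```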

SLIDE 13
Gibbs Sampling Example: P( S | +r )

  • Step 1: Fix evidence
    • R = +r
  • Step 2: Initialize other variables
    • E.g. use prior sampling
  • Step 3: Repeat
    • Choose a non-evidence variable X
    • Resample X from P( X | all other variables )

Efficient Resampling of One Variable

  • Sample from P( S | +c, +r, -w )
  • Many things cancel out – only CPTs with S remain!
  • More generally: only CPTs that contain the resampled variable need to be considered, and joined together
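As a worked check (my own arithmetic, using the sprinkler CPTs from the slides), the resampling distribution P(S | +c, +r, -w) only needs the two CPTs that mention S:

```python
# P(S | +c, +r, -w) is proportional to P(S | +c) * P(-w | S, +r); the other CPTs cancel.
p_s_given_plus_c   = {"+s": 0.1,  "-s": 0.9}    # P(S | +c)
p_minus_w_given_sr = {"+s": 0.01, "-s": 0.10}   # P(-w | S, +r)

unnormalized = {s: p_s_given_plus_c[s] * p_minus_w_given_sr[s] for s in ("+s", "-s")}
z = sum(unnormalized.values())                  # 0.001 + 0.09 = 0.091
print({s: p / z for s, p in unnormalized.items()})   # {'+s': ~0.011, '-s': ~0.989}
```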

SLIDE 14

Bayes’ Net Sampling Summary

  • Prior Sampling: P( Q )
  • Likelihood Weighting: P( Q | e )
  • Rejection Sampling: P( Q | e )
  • Gibbs Sampling: P( Q | e )

Further Reading on Gibbs Sampling*

  • Gibbs sampling produces samples from the query distribution P( Q | e ) in the limit of re-sampling infinitely often
  • Gibbs sampling is a special case of more general methods called Markov chain Monte Carlo (MCMC) methods
  • Metropolis-Hastings is one of the more famous MCMC methods (in fact, Gibbs sampling is a special case of Metropolis-Hastings)
  • When people say “Monte Carlo” methods, they mean “sampling”