

SLIDE 1

Sampling Techniques for Probabilistic and Deterministic Graphical models

ICS 276, Spring 2018. Bozhena Bidyuk and Rina Dechter.

Reading: Darwiche chapter 15, related papers

SLIDE 2

Algorithms for Reasoning with Graphical Models

Slides Set 11 (part b): Sampling Techniques for Probabilistic and Deterministic Graphical Models

Rina Dechter

(Reading: Darwiche chapter 15, cutset-sampling paper posted)

SLIDE 3

Overview

  • 1. Probabilistic Reasoning/Graphical models
  • 2. Importance Sampling
  • 3. Markov Chain Monte Carlo: Gibbs Sampling
  • 4. Sampling in presence of Determinism
  • 5. Rao‐Blackwellisation
  • 6. AND/OR importance sampling

SLIDE 4

Markov Chain

  • A Markov chain is a discrete random process with the property that the next state depends only on the current state (Markov property):

[Figure: chain x1 → x2 → x3 → x4]

    $P(x_t \mid x_1, x_2, \dots, x_{t-1}) = P(x_t \mid x_{t-1})$

  • If $P(X_t \mid x_{t-1})$ does not depend on t (time-homogeneous) and the state space is finite, then it is often expressed as a transition function (aka transition matrix) whose entries for each source state sum to one:

    $\sum_{x} P(X = x) = 1$

SLIDE 5

Example: Drunkard’s Walk

  • A random walk on the number line where, at each step, the position may change by +1 or −1 with equal probability:

    $D(X) = \{1, 2, \dots\}$

    $P(n \to n+1) = P(n \to n-1) = 0.5$

[Figure: number-line states 1, 2, 3, ... and the transition matrix P(X)]
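A minimal simulation sketch of this walk (Python). The behavior at the boundary state 1 is not specified on the slide, so reflecting there is an assumption made only to keep the example self-contained:

```python
import random

def drunkards_walk(steps, start=1, seed=0):
    """Simulate the drunkard's walk: from position n, move to n+1 or n-1
    with equal probability. Reflecting at position 1 is an assumption,
    since the slide does not say what happens at the boundary."""
    rng = random.Random(seed)
    pos = start
    trajectory = [pos]
    for _ in range(steps):
        pos = max(1, pos + rng.choice((+1, -1)))  # assumed lower boundary at 1
        trajectory.append(pos)
    return trajectory

print(drunkards_walk(10))
```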

SLIDE 6

Example: Weather Model

    $D(X) = \{\text{rainy}, \text{sunny}\}$

    transition matrix $P(X)$ (rows = current state, ordered rainy, sunny):

    $\begin{pmatrix} 0.9 & 0.1 \\ 0.5 & 0.5 \end{pmatrix}$

[Figure: sample state sequence: rain, rain, rain, rain, sun]
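A short sketch that simulates this chain and tracks visit frequencies. The row assignment (rain persisting with probability 0.9) is reconstructed from the garbled original, so treat the exact numbers as an assumption; the long-run frequencies approach the stationary distribution introduced on the later slides:

```python
import random

# Transition matrix from the slide; the row assignment (rain persists with
# probability 0.9) is a reconstruction from the garbled original.
P = {
    "rainy": {"rainy": 0.9, "sunny": 0.1},
    "sunny": {"rainy": 0.5, "sunny": 0.5},
}

def visit_frequencies(n_steps, state="sunny", seed=0):
    """Run the chain and count how often each state is visited."""
    rng = random.Random(seed)
    counts = {s: 0 for s in P}
    for _ in range(n_steps):
        state = "rainy" if rng.random() < P[state]["rainy"] else "sunny"
        counts[state] += 1
    return {s: c / n_steps for s, c in counts.items()}

print(visit_frequencies(100_000))  # approaches the stationary distribution (~5/6 rainy)
```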

SLIDE 7

Multi‐Variable System

  • A state is an assignment of values to all the variables:

    $X = \{X_1, X_2, X_3\}$, each $D(X_i)$ finite and discrete

    $x^t = \{x_1^t, x_2^t, \dots, x_n^t\}$

[Figure: variables x1, x2, x3 at time t jointly transitioning to x1, x2, x3 at time t+1]

SLIDE 8

Bayesian Network System

  • A Bayesian network is a representation of the joint probability distribution over two or more variables:

    $X = \{X_1, X_2, X_3\}$

    $x^t = \{x_1^t, x_2^t, x_3^t\}$

[Figure: Bayesian network over X1, X2, X3; state x^t transitions to x^{t+1}]

SLIDE 9

Stationary Distribution Existence

  • If the Markov chain is time-homogeneous, then the vector $\pi(X)$ is a stationary distribution (aka invariant or equilibrium distribution, aka "fixed point") if its entries sum to 1 and satisfy:

    $\pi(x_j) = \sum_{x_i \in D(X)} \pi(x_i) \, P(x_j \mid x_i)$

  • A finite state space Markov chain has a unique stationary distribution if and only if:

    – the chain is irreducible
    – all of its states are positive recurrent
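A sketch of finding $\pi$ by power iteration, i.e., starting from an arbitrary distribution and repeatedly applying one chain step until the vector stops changing (reusing the weather matrix from the earlier sketch; valid for a finite, irreducible, aperiodic chain):

```python
def stationary(P, n_iter=1000):
    """Approximate pi with pi(x_j) = sum_i pi(x_i) P(x_j | x_i) by power
    iteration: start from a uniform vector and repeatedly apply one
    chain step."""
    states = list(P)
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(n_iter):
        pi = {sj: sum(pi[si] * P[si][sj] for si in states) for sj in states}
    return pi

P = {
    "rainy": {"rainy": 0.9, "sunny": 0.1},
    "sunny": {"rainy": 0.5, "sunny": 0.5},
}
print(stationary(P))  # ~{'rainy': 0.833, 'sunny': 0.167}, entries summing to 1
```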

SLIDE 10

Irreducible

  • A state x is irreducible if, under the transition rule, one has nonzero probability of moving from x to any other state and then coming back in a finite number of steps

  • If one state is irreducible, then all the states must be irreducible (Liu, Ch. 12, p. 249, Def. 12.1.1)

SLIDE 11

Recurrent

  • A state x is recurrent if the chain returns to x with probability 1

  • Let M(x) be the expected number of steps to return to state x

  • State x is positive recurrent if M(x) is finite

  • The recurrent states in a finite state chain are positive recurrent

SLIDE 12

Stationary Distribution Convergence

  • Consider an infinite Markov chain; the n-step transition probability is:

    $P_n(x' \mid x) = P(x^n = x' \mid x^0 = x), \qquad P_n = P^n$

  • If the chain is both irreducible and aperiodic, then:

    $\lim_{n \to \infty} P^n = \pi$

  • The initial state is not important in the limit: "The most useful feature of a 'good' Markov chain is its fast forgetfulness of its past..." (Liu, Ch. 12.1)

SLIDE 13

Aperiodic

  • Define d(i) = g.c.d.{n > 0 : it is possible to go from i to i in n steps}, where g.c.d. is the greatest common divisor of the integers in the set

  • If d(i) = 1 for all i, then the chain is aperiodic

  • Positive recurrent, aperiodic states are ergodic

SLIDE 14

Markov Chain Monte Carlo

  • How do we estimate P(X), e.g., P(X|e)?

  • Generate samples that form a Markov chain with stationary distribution $\pi = P(X \mid e)$

  • Estimate $\pi$ from the samples: the visited states $x^0, \dots, x^n$ can be viewed as "samples" from the distribution $\pi$:

    $\hat{\pi}(x) = \frac{1}{T} \sum_{t=1}^{T} \delta(x, x^t), \qquad \pi(x) = \lim_{T \to \infty} \hat{\pi}(x)$

SLIDE 15

MCMC Summary

  • Convergence is guaranteed in the limit

  • Samples are dependent, not i.i.d.

  • Convergence (mixing rate) may be slow

  • The stronger the correlation between states, the slower the convergence!

  • The initial state is not important, but... typically we throw away the first K samples ("burn-in")

SLIDE 16

Gibbs Sampling (Geman & Geman, 1984)

  • The Gibbs sampler is an algorithm to generate a sequence of samples from the joint probability distribution of two or more random variables

  • Sample a new value for one variable at a time from the variable's conditional distribution:

    $P(X_i) = P(X_i \mid x_1^t, \dots, x_{i-1}^t, x_{i+1}^t, \dots, x_n^t) = P(X_i \mid x^t \setminus x_i)$

  • The samples form a Markov chain with stationary distribution P(X|e)

SLIDE 17

Gibbs Sampling: Illustration

The process of Gibbs sampling can be understood as a random walk in the space of all instantiations X = x (remember the drunkard's walk): in one step we can reach instantiations that differ from the current one in the value of at most one variable (assume a randomized choice of variable Xi).

SLIDE 18

Ordered Gibbs Sampler

Generate sample x^{t+1} from x^t, processing all variables in some order:

    $x_1^{t+1} \leftarrow$ sampled from $P(X_1 \mid x_2^t, x_3^t, \dots, x_N^t, e)$
    $x_2^{t+1} \leftarrow$ sampled from $P(X_2 \mid x_1^{t+1}, x_3^t, \dots, x_N^t, e)$
    $\dots$
    $x_N^{t+1} \leftarrow$ sampled from $P(X_N \mid x_1^{t+1}, x_2^{t+1}, \dots, x_{N-1}^{t+1}, e)$

In short, for i = 1 to N:  $x_i^{t+1} \leftarrow$ sampled from $P(X_i \mid x^t \setminus x_i, e)$

SLIDE 19

Transition Probabilities in BN

Markov blanket:

    $markov_i \equiv pa_i \cup ch_i \cup \Big(\bigcup_{X_j \in ch_i} pa_j\Big)$

    $P(x_i \mid x^t \setminus x_i) = P(x_i \mid markov_i^t)$

    $P(x_i \mid x^t \setminus x_i) \propto P(x_i \mid pa_i) \prod_{X_j \in ch_i} P(x_j \mid pa_j)$

Given its Markov blanket (parents, children, and children's parents), Xi is independent of all other nodes.

Computation is linear in the size of the Markov blanket!

SLIDE 20

Ordered Gibbs Sampling Algorithm (Pearl,1988)

Input: X, E = e. Output: T samples {x^t}.
Fix evidence E = e, initialize x^0 at random.

1. For t = 1 to T (compute samples)
2.   For i = 1 to N (loop through variables)
3.     x_i^{t+1} ← sampled from P(X_i | markov_i^t)
4.   End For
5. End For
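A self-contained sketch of this algorithm on a made-up three-variable chain A → B → C with evidence C = 1. The CPT numbers are invented for illustration only; each variable is resampled from its Markov-blanket conditional exactly as in line 3 above:

```python
import random

# Toy chain network A -> B -> C (binary), evidence C = 1.
# The CPT numbers are invented for illustration.
P_A = {0: 0.6, 1: 0.4}                               # P(A)
P_B = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}     # P(B=b | A=a) = P_B[a][b]
P_C = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}     # P(C=c | B=b) = P_C[b][c]

def sample_from(weights, rng):
    """Sample a key of `weights` with probability proportional to its value."""
    r = rng.random() * sum(weights.values())
    for value, w in weights.items():
        r -= w
        if r <= 0:
            return value
    return value  # guard against float round-off

def gibbs(T=50_000, c=1, seed=0):
    rng = random.Random(seed)
    a, b = 0, 0                                      # x^0: arbitrary initialization
    count_a1 = 0
    for _ in range(T):
        # line 3 for A: P(A | markov_A) = P(A | b), proportional to P(A) P(b | A)
        a = sample_from({v: P_A[v] * P_B[v][b] for v in (0, 1)}, rng)
        # line 3 for B: P(B | markov_B) = P(B | a, c), proportional to P(B | a) P(c | B)
        b = sample_from({v: P_B[a][v] * P_C[v][c] for v in (0, 1)}, rng)
        count_a1 += (a == 1)
    return count_a1 / T

print(gibbs())  # approaches P(A=1 | C=1) = 0.58 for these CPTs
```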

SLIDE 21

Gibbs Sampling Example ‐ BN

    $X = \{X_1, X_2, \dots, X_9\}$, $E = \{X_9\}$

Current state: X1 = x1, X2 = x2, X3 = x3, X4 = x4, X5 = x5, X6 = x6, X7 = x7, X8 = x8

[Figure: Bayesian network over X1...X9 with evidence node X9]

SLIDE 22

Gibbs Sampling Example ‐ BN

    $X = \{X_1, X_2, \dots, X_9\}$, $E = \{X_9\}$

[Figure: the same network over X1...X9]

    $x_1^{t+1} \leftarrow P(X_1 \mid x_2^t, \dots, x_8^t, x_9)$
    $x_2^{t+1} \leftarrow P(X_2 \mid x_1^{t+1}, x_3^t, \dots, x_8^t, x_9)$

SLIDE 23

Answering Queries P(xi |e) = ?

  • Method 1: count the number of samples where Xi = xi (histogram estimator; $\delta$ is the Dirac delta function):

    $\hat{P}(X_i = x_i) = \frac{1}{T} \sum_{t=1}^{T} \delta(x_i, x_i^t)$

  • Method 2: average the conditional probability (mixture estimator):

    $\hat{P}(X_i = x_i) = \frac{1}{T} \sum_{t=1}^{T} P(x_i \mid markov_i^t)$

  • The mixture estimator converges faster (consider estimates for the unobserved values of Xi; prove via the Rao-Blackwell theorem)
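A sketch contrasting the two estimators, reusing the CPTs and the sample_from helper from the Slide 20 sketch above. Both estimates converge to the exact posterior (0.58 for those made-up CPTs); the mixture estimate typically fluctuates less:

```python
import random

# Reusing P_A, P_B, P_C and sample_from from the Slide 20 sketch.
def gibbs_two_estimators(T=50_000, c=1, seed=0):
    rng = random.Random(seed)
    a, b = 0, 0
    hist = mix = 0.0
    for _ in range(T):
        weights = {v: P_A[v] * P_B[v][b] for v in (0, 1)}   # P(A | markov_A^t)
        a = sample_from(weights, rng)
        b = sample_from({v: P_B[a][v] * P_C[v][c] for v in (0, 1)}, rng)
        hist += (a == 1)                              # Method 1: delta(1, a^t)
        mix += weights[1] / sum(weights.values())     # Method 2: P(A=1 | markov_A^t)
    return hist / T, mix / T

print(gibbs_two_estimators())  # both -> 0.58; the mixture estimate is smoother
```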

SLIDE 24

Rao‐Blackwell Theorem

Rao-Blackwell Theorem: let the random variable set X be composed of two groups of variables, R and L. Then, for the joint distribution $\pi(R, L)$ and a function of interest g (e.g., the mean or covariance), the following holds:

    $Var\{E[g(R) \mid L]\} \le Var\{g(R)\}$

(Casella & Robert, 1996; Liu et al., 1995)

  • The theorem makes a weak promise, but works well in practice!
  • The improvement depends on the choice of R and L

SLIDE 25

Importance vs. Gibbs

Gibbs:   $\hat{P}(X \mid e) = \frac{1}{T} \sum_{t=1}^{T} g(x^t)$,   with $x^t \leftarrow P(X \mid e)$

Importance:   $\hat{g} = \frac{1}{T} \sum_{t=1}^{T} g(x^t) \, \underbrace{\frac{P(x^t)}{Q(x^t)}}_{w^t}$,   with $x^t \leftarrow Q(X \mid e)$

SLIDE 26

Gibbs Sampling: Convergence

  • Sample from $\hat{P}(X \mid e)$, which converges to $P(X \mid e)$

  • Converges iff the chain is irreducible and ergodic

  • Intuition: the chain must be able to explore all states. If Xi and Xj are strongly correlated, e.g., Xi = 0 ⇔ Xj = 0, then we cannot explore states with Xi = 1 and Xj = 1

  • All conditions are satisfied when all probabilities are positive

  • The convergence rate can be characterized by the second eigenvalue of the transition matrix

SLIDE 27

Gibbs: Speeding Convergence

Reduce dependence between samples (autocorrelation):

  • Skip samples
  • Randomize the variable sampling order
  • Employ blocking (grouping)
  • Run multiple chains

Reduce variance (covered in the next section)

SLIDE 28

Blocking Gibbs Sampler

  • Sample several variables together, as a block

  • Example: given three variables X, Y, Z with domains of size 2, group Y and Z together to form a variable W = {Y, Z} with domain size 4. Then, given sample (x^t, y^t, z^t), compute the next sample (see the sketch below):

    $x^{t+1} \leftarrow P(X \mid y^t, z^t)$
    $w^{t+1} = (y^{t+1}, z^{t+1}) \leftarrow P(Y, Z \mid x^{t+1})$

  + Can improve convergence greatly when two variables are strongly correlated!
  − The domain of the block variable grows exponentially with the number of variables in a block!
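A self-contained sketch of one blocking step, with a made-up joint distribution in which Y and Z are strongly correlated (the case where blocking helps): X is sampled given (y, z), then the block W = (Y, Z) is sampled jointly given x:

```python
import itertools
import random

# A made-up joint P(x, y, z) over binary variables in which Y and Z are
# strongly correlated.
joint = {}
for x, y, z in itertools.product((0, 1), repeat=3):
    joint[(x, y, z)] = (0.25 if y == z else 0.02) * (0.8 if x == y else 0.2)
total = sum(joint.values())
joint = {k: v / total for k, v in joint.items()}

def sample_from(weights, rng):
    r = rng.random() * sum(weights.values())
    for value, w in weights.items():
        r -= w
        if r <= 0:
            return value
    return value

def blocked_gibbs(T=50_000, seed=0):
    rng = random.Random(seed)
    x, y, z = 0, 0, 0
    count_x1 = 0
    for _ in range(T):
        # Sample X from P(X | y, z): normalize the joint over X.
        x = sample_from({v: joint[(v, y, z)] for v in (0, 1)}, rng)
        # Sample the block W = (Y, Z) jointly from P(Y, Z | x).
        y, z = sample_from({(wy, wz): joint[(x, wy, wz)]
                            for wy in (0, 1) for wz in (0, 1)}, rng)
        count_x1 += (x == 1)
    return count_x1 / T

print(blocked_gibbs())  # estimate of P(X=1)
```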

SLIDE 29

Gibbs: Multiple Chains

  • Generate M chains, each of size K

  • Each chain m produces an independent estimate $P_m$:

    $P_m(x_i \mid e) = \frac{1}{K} \sum_{t=1}^{K} P(x_i \mid x^t \setminus x_i)$

  • Treat the $P_m$ as independent random variables and estimate $P(x_i \mid e)$ as the average of the $P_m(x_i \mid e)$:

    $\hat{P} = \frac{1}{M} \sum_{m=1}^{M} P_m$

SLIDE 30

Gibbs Sampling Summary

  • A Markov Chain Monte Carlo method (Gelfand & Smith, 1990; Smith & Roberts, 1993; Tierney, 1994)

  • Samples are dependent and form a Markov chain

  • Sample from $\hat{P}(X \mid e)$, which converges to $P(X \mid e)$

  • Guaranteed to converge when all P > 0

  • Methods to improve convergence:
    – Blocking
    – Rao-Blackwellisation

SLIDE 31

Overview

  • 1. Probabilistic Reasoning/Graphical models
  • 2. Importance Sampling
  • 3. Markov Chain Monte Carlo: Gibbs Sampling
  • 4. Sampling in presence of Determinism
  • 5. Rao‐Blackwellisation
  • 6. AND/OR importance sampling

SLIDE 32

Sampling: Performance

  • Gibbs sampling:
    – reduce dependence between samples

  • Importance sampling:
    – reduce variance

  • Cutset sampling: achieve both by sampling a subset of variables and integrating out the rest (reduced dimensionality), aka Rao-Blackwellisation

  • Exploit graph structure to manage the extra cost

SLIDE 33

Smaller Subset State‐Space

  • A smaller state space is easier to cover:

    $X = \{X_1, X_2, X_3, X_4\}$ with $|D(X)| = 64$   vs.   $C = \{X_1, X_2\}$ with $|D(C)| = 16$

SLIDE 34

Smoother Distribution

[Figure: bar charts comparing the peaked distribution P(X1, X2, X3, X4) with the smoother marginal P(X1, X2); probability values range from 0 to 0.26]

SLIDE 35

Speeding Up Convergence

  • Mean squared error of the estimator:

    $MSE_Q[\hat{P}] = BIAS^2 + Var_Q[\hat{P}]$

  • In the case of an unbiased estimator, BIAS = 0 and:

    $MSE_Q[\hat{P}] = Var_Q[\hat{P}] = E_Q[\hat{P}^2] - (E_Q[\hat{P}])^2$

  • Reducing the variance speeds up convergence!
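A quick Monte Carlo check of the decomposition on a toy estimation problem (estimating a Bernoulli parameter p; the estimators are made up). The two returned numbers agree up to float error, since MSE = BIAS² + Var is an algebraic identity of the empirical moments:

```python
import random
from statistics import mean

def mse_check(estimator, n=50, p=0.3, reps=20_000, seed=0):
    """Monte Carlo check of MSE = BIAS^2 + Var for an estimator of p."""
    rng = random.Random(seed)
    ests = []
    for _ in range(reps):
        xs = [rng.random() < p for _ in range(n)]    # n Bernoulli(p) draws
        ests.append(estimator(xs))
    m = mean(ests)
    var = mean((e - m) ** 2 for e in ests)
    mse = mean((e - p) ** 2 for e in ests)
    return mse, (m - p) ** 2 + var                   # these two agree

sample_mean = lambda xs: sum(xs) / len(xs)           # unbiased: BIAS = 0
shrunk = lambda xs: 0.5 * sum(xs) / len(xs) + 0.25   # deliberately biased
print(mse_check(sample_mean))
print(mse_check(shrunk))
```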

SLIDE 36

Rao‐Blackwellisation

Let X = R ∪ L and consider two estimators of a function g (Liu, Ch. 2.3):

    $\hat{g}(x) = \frac{1}{T} \sum_{t=1}^{T} h(x^t)$

    $\tilde{g}(x) = \frac{1}{T} \sum_{t=1}^{T} E[h(x) \mid l^t]$

Since $Var\{g(x)\} = Var\{E[g(x) \mid l]\} + E\{var[g(x) \mid l]\}$, we get $Var\{E[g(x) \mid l]\} \le Var\{g(x)\}$, and therefore:

    $Var\{\tilde{g}(x)\} \le Var\{\hat{g}(x)\}$

SLIDE 37

Rao‐Blackwellisation

  • X = R ∪ L: "carry out analytical computation as much as possible" (Liu)

  • Importance sampling (Liu, Ch. 2.5.5): marginalizing out L lowers the variance of the weights:

    $Var_Q\left\{\frac{P(R)}{Q(R)}\right\} \le Var_Q\left\{\frac{P(R, L)}{Q(R, L)}\right\}$

  • Gibbs sampling:
    – autocovariances are lower (less correlation between samples)
    – if Xi and Xj are strongly correlated, e.g., Xi = 0 ⇔ Xj = 0, include only one of them in the sampling set

SLIDE 38

Blocking Gibbs Sampler vs. Collapsed

Given three variables X, Y, Z:

  • Standard Gibbs:   $P(x \mid y, z),\; P(y \mid x, z),\; P(z \mid x, y)$   (1)

  • Blocking:   $P(x \mid y, z),\; P(y, z \mid x)$   (2)

  • Collapsed (Z integrated out):   $P(x \mid y),\; P(y \mid x)$   (3)

Faster convergence: (3) converges faster than (2), which converges faster than (1).

SLIDE 39

Collapsed Gibbs Sampling

Generating samples: generate sample c^{t+1} from c^t. In short, for i = 1 to K:

    $c_1^{t+1} \leftarrow$ sampled from $P(C_1 \mid c_2^t, c_3^t, \dots, c_K^t, e)$
    $c_2^{t+1} \leftarrow$ sampled from $P(C_2 \mid c_1^{t+1}, c_3^t, \dots, c_K^t, e)$
    $\dots$
    $c_K^{t+1} \leftarrow$ sampled from $P(C_K \mid c_1^{t+1}, \dots, c_{K-1}^{t+1}, e)$

i.e., $c_i^{t+1} \leftarrow$ sampled from $P(C_i \mid c^t \setminus c_i, e)$

SLIDE 40

Collapsed Gibbs Sampler

Input: C ⊆ X, E = e. Output: T samples {c^t}.
Fix evidence E = e, initialize c^0 at random.

1. For t = 1 to T (compute samples)
2.   For i = 1 to K (loop through cutset variables)
3.     c_i^{t+1} ← sampled from P(C_i | c^t \ c_i, e)
4.   End For
5. End For

SLIDE 41

Calculation Time

  • Computing P(ci | c^t \ ci, e) is more expensive (it requires inference)

  • Trading the number of samples for smaller variance:
    – generate more samples with higher covariance, or
    – generate fewer samples with lower covariance

  • Must control the time spent computing sampling probabilities in order to be time-effective!

SLIDE 42

Exploiting Graph Properties

Recall: computation time is exponential in the adjusted induced width of the graph.

  • A w-cutset is a subset of variables such that, when they are observed, the induced width of the graph is at most w

  • When the sampled variables form a w-cutset, inference is exp(w) (e.g., using Bucket Tree Elimination)

  • A cycle-cutset is a special case of a w-cutset (w = 1)

Sampling a w-cutset ⇒ w-cutset sampling!

SLIDE 43

What If C=Cycle‐Cutset ?

    $C = \{X_2, X_5\}$, $E = \{X_9\}$

[Figure: network over X1...X9; instantiating the cycle-cutset {X2, X5} together with the evidence X9 leaves a poly-tree]

  • P(x2, x5, x9) can be computed using Bucket Elimination (probability of evidence); over the resulting poly-tree the computation complexity is O(N)

SLIDE 44

Computing Transition Probabilities

[Figure: network over X1...X9]

Compute the joint probabilities with Bucket Elimination, one run per value of the sampled variable (here binary X2):

    BE: $P(x_2 = 0, x_3, x_9)$
    BE: $P(x_2 = 1, x_3, x_9)$

Normalize:

    $P(x_2 \mid x_3, x_9) = \alpha \, P(x_2, x_3, x_9), \qquad \alpha = \dfrac{1}{P(x_2 = 0, x_3, x_9) + P(x_2 = 1, x_3, x_9)}$
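A sketch of this normalization step. Here `joint_via_be` is a hypothetical stand-in for a Bucket Elimination call that returns the joint probability for one value of the sampled variable; it is a placeholder, not a real library API:

```python
def cutset_conditional(joint_via_be, domain):
    """Build the sampling distribution P(X_i | rest of cutset, e) by one
    exact inference call per value, then normalizing. `joint_via_be(v)`
    stands in for a Bucket Elimination run returning P(x_i = v, rest, e)."""
    weights = {v: joint_via_be(v) for v in domain}
    alpha = 1.0 / sum(weights.values())
    return {v: alpha * w for v, w in weights.items()}

# Toy stand-in values for the two BE runs over binary X2:
fake_be = {0: 0.012, 1: 0.036}
print(cutset_conditional(fake_be.get, (0, 1)))  # {0: 0.25, 1: 0.75}
```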

SLIDE 45

Cutset Sampling‐Answering Queries

  • Query: ci C, P(ci |e)=? same as Gibbs:

computed while generating sample t using bucket tree elimination compute after generating sample t using bucket tree elimination

T t i t i i

e c c c P T |e c P

1

) , \ | ( 1 ) ( ˆ

T t t i i

,e c x P T |e) (x P

1

) | ( 1

  • Query: xi X\C, P(xi |e)=?

SLIDE 46

Cutset Sampling vs. Cutset Conditioning

  • Cutset conditioning (exact):

    $P(x_i \mid e) = \sum_{c \in D(C)} P(x_i \mid c, e) \, P(c \mid e)$

  • Cutset sampling:

    $\hat{P}(x_i \mid e) = \frac{1}{T} \sum_{t=1}^{T} P(x_i \mid c^t, e) = \sum_{c \in D(C)} P(x_i \mid c, e) \, \frac{count(c)}{T} \approx \sum_{c \in D(C)} P(x_i \mid c, e) \, \hat{P}(c \mid e)$

SLIDE 47

Cutset Sampling Example

Estimating P(x2 | e) for sampled node X2, from samples 1, 2, 3 over the cutset:

    $\hat{P}(x_2 \mid x_9) = \frac{1}{3}\big[ P(x_2 \mid x_5^1, x_9) + P(x_2 \mid x_5^2, x_9) + P(x_2 \mid x_5^3, x_9) \big]$

[Figure: network over X1...X9 with cutset {X2, X5} and evidence X9]

SLIDE 48

Cutset Sampling Example

Estimating P(x3 | e) for non-sampled node X3, given the cutset samples $c^1 = \{x_2^1, x_5^1\}$, $c^2 = \{x_2^2, x_5^2\}$, $c^3 = \{x_2^3, x_5^3\}$:

    $\hat{P}(x_3 \mid x_9) = \frac{1}{3}\big[ P(x_3 \mid x_2^1, x_5^1, x_9) + P(x_3 \mid x_2^2, x_5^2, x_9) + P(x_3 \mid x_2^3, x_5^3, x_9) \big]$

[Figure: network over X1...X9 with cutset {X2, X5} and evidence X9]

SLIDE 49

CPCS54 Test Results

MSE vs. # samples (left) and time (right).
Ergodic, |X| = 54, D(Xi) = 2, |C| = 15, |E| = 3. Exact time = 30 sec using cutset conditioning.

[Figure: two plots for CPCS54 (n=54, |C|=15, |E|=3) comparing Cutset and Gibbs: MSE (0.001-0.004) vs. # samples (1000-5000), and MSE (0.0002-0.0008) vs. time (5-25 sec)]

SLIDE 50

CPCS179 Test Results

MSE vs. # samples (left) and time (right).
Non-ergodic (1 deterministic CPT entry), |X| = 179, |C| = 8, 2 ≤ D(Xi) ≤ 4, |E| = 35. Exact time = 122 sec using cutset conditioning.

[Figure: two plots for CPCS179 (n=179, |C|=8, |E|=35) comparing Cutset and Gibbs: MSE (0.002-0.012) vs. # samples (100-4000), and MSE (0.002-0.012) vs. time (20-80 sec)]

SLIDE 51

CPCS360b Test Results

MSE vs. # samples (left) and time (right).
Ergodic, |X| = 360, D(Xi) = 2, |C| = 21, |E| = 36. Exact time > 60 min using cutset conditioning; exact values obtained via Bucket Elimination.

[Figure: two plots for CPCS360b (n=360, |C|=21, |E|=36) comparing Cutset and Gibbs: MSE (0.00004-0.00016) vs. # samples (200-1000), and MSE vs. time (1-60 sec)]

SLIDE 52

Random Networks

MSE vs. # samples (left) and time (right).
|X| = 100, D(Xi) = 2, |C| = 13, |E| = 15-20. Exact time = 30 sec using cutset conditioning.

[Figure: two plots for RANDOM (n=100, |C|=13, |E|=15-20) comparing Cutset and Gibbs: MSE (0.0005-0.0035) vs. # samples (200-1200), and MSE (0.0002-0.001) vs. time (1-11 sec)]

SLIDE 53

Coding Networks

Cutset transforms a non-ergodic chain into an ergodic one.

MSE vs. time. Non-ergodic, |X| = 100, D(Xi) = 2, |C| = 13-16, |E| = 50. Sample the ergodic subspace U = {U1, U2, ..., Uk}. Exact time = 50 sec using cutset conditioning.

[Figure: coding network fragment with code bits u1-u4, parity bits p1-p4, and transmitted bits x1-x4, y1-y4; plot for Coding Networks (n=100, |C|=12-14): MSE (0.001-0.1, log scale) vs. time (10-60 sec), comparing IBP, Gibbs, and Cutset]

SLIDE 54

Non‐Ergodic Hailfinder

MSE vs. # samples (left) and time (right).
Non-ergodic, |X| = 56, |C| = 5, 2 ≤ D(Xi) ≤ 11, |E| = 0. Exact time = 2 sec using loop-cutset conditioning.

[Figure: two plots for HailFinder (n=56, |C|=5, |E|=1) comparing Cutset and Gibbs: MSE (0.0001-1, log scale) vs. time (1-10 sec), and MSE vs. # samples (500-1500)]

SLIDE 55

CPCS360b ‐ MSE

MSE vs. time. Ergodic, |X| = 360, |C| = 26, D(Xi) = 2. Exact time = 50 min using BTE.

[Figure: cpcs360b (N=360, |E|=[20-34], w*=20): MSE (0.000005-0.000025) vs. time (200-1600 sec), comparing Gibbs, IBP, cutset with |C|=26, fw=3, and cutset with |C|=48, fw=2]

SLIDE 56

Cutset Importance Sampling

  • Apply importance sampling over the cutset C (Gogate & Dechter, 2005; Bidyuk & Dechter, 2006):

    $\hat{P}(e) = \frac{1}{T} \sum_{t=1}^{T} \frac{P(c^t, e)}{Q(c^t)} = \frac{1}{T} \sum_{t=1}^{T} w^t$

    $\hat{P}(c_i \mid e) \propto \frac{1}{T} \sum_{t=1}^{T} \delta(c_i, c^t) \, w^t$

    $\hat{P}(x_i \mid e) \propto \frac{1}{T} \sum_{t=1}^{T} P(x_i \mid c^t, e) \, w^t$

where P(c^t, e) is computed using Bucket Elimination.
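A sketch of these estimates given precomputed quantities. The (w, p) pairs below are made-up stand-ins for w^t = P(c^t, e) / Q(c^t) and P(x_i | c^t, e), which in the real scheme come from Bucket Elimination:

```python
def cutset_is_estimates(samples):
    """Importance-weighted estimates from cutset samples.
    `samples` is a list of (w, p) pairs: w = P(c^t, e) / Q(c^t) and
    p = P(x_i | c^t, e)."""
    total_w = sum(w for w, _ in samples)
    p_evidence = total_w / len(samples)              # estimate of P(e)
    p_xi = sum(w * p for w, p in samples) / total_w  # normalized estimate of P(x_i | e)
    return p_evidence, p_xi

print(cutset_is_estimates([(0.4, 0.9), (0.1, 0.2), (0.5, 0.7)]))  # (0.333..., 0.73)
```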

SLIDE 57

Likelihood Cutset Weighting (LCS)

  • Z = {C, E} in topological order

  • Generating sample t+1, for i = 1 to |Z|:

    If Z_i ∈ E: z_i^{t+1} ← e_i
    Else: z_i^{t+1} ← sampled from P(Z_i | z_1^{t+1}, ..., z_{i-1}^{t+1})

  • P(Z_i | z_1^{t+1}, ..., z_{i-1}^{t+1}) is computed while generating sample t using bucket tree elimination

  • It can be memoized for some number of instances K (based on the memory available)

  • KL[P(C|e), Q(C)] ≤ KL[P(X|e), Q(X)]

SLIDE 58

Pathfinder 1

SLIDE 59

Pathfinder 2

SLIDE 60

Link

SLIDE 61

Summary

Importance Sampling                  | Gibbs Sampling
-------------------------------------|---------------------------------------------
i.i.d. samples                       | Dependent samples
Unbiased estimator                   | Biased estimator
Generates samples fast               | Generates samples slower
Samples from Q                       | Samples from P(X|e)
Rejects samples with zero weight     | Does not converge in presence of constraints
Improves on cutset                   | Improves on cutset

SLIDE 62

CPCS360b

[Figure: cpcs360b (N=360, |LC|=26, w*=21, |E|=15): MSE (1e-05 to 1e-02, log scale) vs. time (2-14 sec), comparing LW, AIS-BN, Gibbs, LCS, and IBP]

LW – likelihood weighting; LCS – likelihood weighting on a cutset

SLIDE 63

CPCS422b

[Figure: cpcs422b (N=422, |LC|=47, w*=22, |E|=28): MSE (1e-05 to 1e-02, log scale) vs. time (10-60 sec), comparing LW, AIS-BN, Gibbs, LCS, and IBP]

LW – likelihood weighting; LCS – likelihood weighting on a cutset

SLIDE 64

Coding Networks

[Figure: coding networks (N=200, P=3, |LC|=26, w*=21): MSE (1e-05 to 1e-01, log scale) vs. time (2-10 sec), comparing LW, AIS-BN, Gibbs, LCS, and IBP]

LW – likelihood weighting; LCS – likelihood weighting on a cutset
