Sampling Techniques for Probabilistic and Deterministic Graphical Models - PowerPoint PPT Presentation
SLIDE 1

Sampling Techniques for Probabilistic and Deterministic Graphical models

ICS 276, Spring 2017. Bozhena Bidyuk, Rina Dechter

Reading: Darwiche, Chapter 15, and related papers

SLIDE 2

Overview

  • 1. Probabilistic Reasoning/Graphical models
  • 2. Importance Sampling
  • 3. Markov Chain Monte Carlo: Gibbs Sampling
  • 4. Sampling in presence of Determinism
  • 5. Rao-Blackwellisation
  • 6. AND/OR importance sampling
SLIDE 3

Markov Chain

  • A Markov chain is a discrete random process with

the property that the next state depends only on the current state (Markov Property):

x1 → x2 → x3 → x4

P(x_t | x_1, x_2, …, x_{t−1}) = P(x_t | x_{t−1})

  • If P(X_t | x_{t−1}) does not depend on t (time-homogeneous) and the state space is finite, then it is often expressed as a transition function (aka transition matrix)

Σ_x P(X = x) = 1

SLIDE 4

Example: Drunkard’s Walk

  • A random walk on the number line where, at each step, the position may change by +1 or −1 with equal probability

D(X) = {…, 1, 2, …}

P(n → n+1) = P(n → n−1) = 0.5

(figure: transition diagram over states … 1, 2, 3 …, with transition matrix P(X))
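A quick simulation of this walk (a minimal sketch; the function name, seed, and step count are arbitrary choices, not from the slides):

```python
import random

def drunkards_walk(steps, start=0, seed=0):
    """Random walk on the integers: each step is +1 or -1 with equal probability."""
    rng = random.Random(seed)
    position = start
    path = [position]
    for _ in range(steps):
        position += rng.choice((1, -1))
        path.append(position)
    return path

path = drunkards_walk(10)
print(path)  # 11 positions; consecutive positions differ by exactly 1
```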

SLIDE 5

Example: Weather Model

D(X) = {rainy, sunny}

Transition matrix P(X):
    P(rainy | rainy) = 0.9    P(sunny | rainy) = 0.1
    P(rainy | sunny) = 0.5    P(sunny | sunny) = 0.5

Example state sequence: rain, rain, rain, rain, sun

SLIDE 6

Multi-Variable System

  • A state is an assignment of values to all the variables

(figure: each variable x1, x2, x3 at time t transitions to its value at time t+1)

x^t = {x_1^t, x_2^t, …, x_n^t}

X = {X1, X2, X3}, with D(Xi) finite and discrete

SLIDE 7

Bayesian Network System

  • A Bayesian network is a representation of the joint probability distribution over two or more variables

(figure: variables X1, X2, X3 at time slice t and their copies at time slice t+1)

X = {X1, X2, X3}

x^t = {x_1^t, x_2^t, x_3^t}

SLIDE 8

Stationary Distribution Existence

  • If the Markov chain is time-homogeneous, then the vector π(X) is a stationary distribution (aka invariant or equilibrium distribution, aka "fixed point") if its entries sum up to 1 and satisfy:

  • A finite state space Markov chain has a unique stationary distribution if and only if:

– The chain is irreducible
– All of its states are positive recurrent

π(x_j) = Σ_{x_i ∈ D(X)} π(x_i) · P(x_j | x_i)
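A concrete check of the fixed-point condition, assuming the transition probabilities shown on slide 5 (rainy→rainy 0.9, sunny→rainy 0.5): power iteration of π ← πP converges to π = (5/6, 1/6):

```python
# Rows index the current state (rainy, sunny); columns the next state.
P = [[0.9, 0.1],
     [0.5, 0.5]]

pi = [0.5, 0.5]  # any initial distribution works for this irreducible, aperiodic chain
for _ in range(200):
    pi = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]

print(pi)  # -> approximately [0.8333..., 0.1666...], i.e. (5/6, 1/6)
```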

SLIDE 9

Irreducible

  • A state x is irreducible if, under the transition rule, one has nonzero probability of moving from x to any other state and then coming back in a finite number of steps

  • If one state is irreducible, then all the states must be irreducible

(Liu, Ch. 12, pp. 249, Def. 12.1.1)

SLIDE 10

Recurrent

  • A state x is recurrent if the chain returns to x with probability 1
  • Let M(x) be the expected number of steps to return to state x
  • State x is positive recurrent if M(x) is finite

The recurrent states in a finite-state chain are positive recurrent.

SLIDE 11

Stationary Distribution Convergence

  • Consider an infinite Markov chain: the n-step transition probability is P^(n)(x | x^0), given by the matrix power P^n
  • If the chain is both irreducible and aperiodic, then:

lim_{n→∞} P^n = π

  • Initial state is not important in the limit

"The most useful feature of a "good" Markov chain is its fast forgetfulness of its past…" (Liu, Ch. 12.1)

SLIDE 12

Aperiodic

  • Define d(i) = g.c.d.{n > 0 : it is possible to go from i to i in n steps}, where g.c.d. means the greatest common divisor of the integers in the set
  • If d(i) = 1 for all i, then the chain is aperiodic
  • Positive recurrent, aperiodic states are ergodic

SLIDE 13

Markov Chain Monte Carlo

  • How do we estimate P(X), e.g., P(X|e) ?

  • Generate samples that form a Markov chain with stationary distribution π = P(X|e)
  • Estimate π from the samples (observed states): visited states x^0, …, x^n can be viewed as "samples" from distribution π

π̂(x) = (1/T) Σ_{t=1}^T δ(x, x^t)

lim_{T→∞} π̂(x) = π(x)
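A sketch of this estimator, assuming the weather chain of slide 5: record visited states along one long run and use visit frequencies as π̂:

```python
import random

# Transition probabilities of the weather chain: P(next = rainy | current).
P_RAINY = {'rainy': 0.9, 'sunny': 0.5}

rng = random.Random(0)
state, T = 'sunny', 100_000
counts = {'rainy': 0, 'sunny': 0}
for _ in range(T):
    state = 'rainy' if rng.random() < P_RAINY[state] else 'sunny'
    counts[state] += 1       # visited states are the "samples" from pi

pi_hat = {s: c / T for s, c in counts.items()}
print(pi_hat)  # close to the stationary distribution {'rainy': 5/6, 'sunny': 1/6}
```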

SLIDE 14

MCMC Summary

  • Convergence is guaranteed in the limit
  • Samples are dependent, not i.i.d.
  • Convergence (mixing rate) may be slow
  • The stronger the correlation between states, the slower the convergence!
  • Initial state is not important, but… typically, we throw away the first K samples - "burn-in"

SLIDE 15

Gibbs Sampling (Geman & Geman, 1984)

  • The Gibbs sampler is an algorithm to generate a sequence of samples from the joint probability distribution of two or more random variables
  • Sample a new value for one variable at a time from that variable's conditional distribution:
  • Samples form a Markov chain with stationary distribution P(X|e)

P(X_i) = P(X_i | x_1^t, …, x_{i−1}^t, x_{i+1}^t, …, x_n^t) = P(X_i | x^t \ x_i)

SLIDE 16

Gibbs Sampling: Illustration

The process of Gibbs sampling can be understood as a random walk in the space of all instantiations X = x (remember the drunkard's walk): in one step we can reach instantiations that differ from the current one in the value of at most one variable (assuming a randomized choice of variable Xi).

SLIDE 17

Ordered Gibbs Sampler

Generate sample x^{t+1} from x^t. In short, for i = 1 to N:

X_1 = x_1^{t+1}, sampled from P(X_1 | x_2^t, …, x_N^t, e)
X_2 = x_2^{t+1}, sampled from P(X_2 | x_1^{t+1}, x_3^t, …, x_N^t, e)
X_3 = x_3^{t+1}, sampled from P(X_3 | x_1^{t+1}, x_2^{t+1}, x_4^t, …, x_N^t, e)
…
X_N = x_N^{t+1}, sampled from P(X_N | x_1^{t+1}, …, x_{N−1}^{t+1}, e)

That is, x_i^{t+1} is sampled from P(X_i | x^t \ x_i, e), processing all variables in some order.

SLIDE 18

Transition Probabilities in BN

Markov blanket:

P(x_i | x^t \ x_i) = P(x_i | markov_i^t), where:

P(x_i | x \ x_i) ∝ P(x_i | pa_i) · Π_{X_j ∈ ch_i} P(x_j | pa_j)

markov_i = pa_i ∪ ch_i ∪ ( ∪_{X_j ∈ ch_i} pa_j )

Given its Markov blanket (parents, children, and children's parents), Xi is independent of all other nodes.

Computation is linear in the size of the Markov blanket!

SLIDE 19

Ordered Gibbs Sampling Algorithm (Pearl, 1988)

Input: X, E = e
Output: T samples {x^t}
Fix evidence E = e, initialize x^0 at random

1. For t = 1 to T (compute samples)
2.   For i = 1 to N (loop through variables)
3.     x_i^{t+1} ← sampled from P(X_i | markov_i^t)
4.   End For
5. End For
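The algorithm above can be sketched on a tiny network; the three-node structure A → C ← B and its CPTs below are a made-up example, not from the slides. Each variable is resampled from its Markov-blanket conditional with the evidence C = 1 held fixed:

```python
import random

# Hypothetical network A -> C <- B with evidence C = 1 (illustrative CPTs).
P_A1 = 0.5                # P(A = 1)
P_B1 = 0.5                # P(B = 1)

def p_c1(a, b):
    """P(C = 1 | A = a, B = b)."""
    return 0.9 if a == 1 and b == 1 else 0.1

def resample(rng, prior1, other):
    # P(X = 1 | other parent, C = 1) via the Markov blanket:
    # proportional to P(X = 1) * P(C = 1 | X = 1, other).
    # (p_c1 is symmetric in its arguments, so this works for both A and B.)
    w1 = prior1 * p_c1(1, other)
    w0 = (1 - prior1) * p_c1(0, other)
    return 1 if rng.random() < w1 / (w0 + w1) else 0

rng = random.Random(1)
a = b = 0
T, burn, hits = 50_000, 1_000, 0
for t in range(T + burn):
    a = resample(rng, P_A1, b)   # sample A given current b and the evidence
    b = resample(rng, P_B1, a)   # sample B given the new a and the evidence
    if t >= burn:
        hits += a

print(hits / T)  # close to the exact posterior P(A=1 | C=1) = 5/6
```

The first 1,000 samples are discarded as burn-in, as the MCMC summary slide suggests.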

SLIDE 20

Gibbs Sampling Example - BN

(figure: a Bayesian network over X1, …, X9)

X = {X1, X2, …, X9}, E = {X9}

Current sample: X1 = x1, X2 = x2, …, X8 = x8

SLIDE 21

Gibbs Sampling Example - BN

(figure: the same network over X1, …, X9)

X = {X1, X2, …, X9}, E = {X9}

x_1^{t+1} ← P(X_1 | x_2^t, …, x_8^t, x_9)
x_2^{t+1} ← P(X_2 | x_1^{t+1}, x_3^t, …, x_8^t, x_9)

SLIDE 22

Answering Queries: P(xi | e) = ?

  • Method 1: count the number of samples where Xi = xi (histogram estimator):

P̂(X_i = x_i) = (1/T) Σ_{t=1}^T δ(x_i, x_i^t)        (δ is the Dirac delta function)

  • Method 2: average the conditional probability (mixture estimator):

P̂(X_i = x_i) = (1/T) Σ_{t=1}^T P(X_i = x_i | markov_i^t)

  • The mixture estimator converges faster (consider estimates for the unobserved values of Xi; proved via the Rao-Blackwell theorem)
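The two estimators can be compared on a toy two-variable Gibbs chain; the joint distribution below is invented for illustration, and each estimator's variance is measured across repeated short runs:

```python
import random, statistics

# Toy joint distribution over two binary variables (invented for illustration).
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}   # P(X1 = 1) = 0.7

def cond_p1(target, given, given_val):
    """P(X_target = 1 | X_given = given_val) under the toy joint."""
    num = den = 0.0
    for x, p in joint.items():
        if x[given] == given_val:
            den += p
            if x[target] == 1:
                num += p
    return num / den

def run_chain(rng, T=500):
    x1 = x2 = 0
    hist = mix = 0.0
    for _ in range(T):
        x1 = 1 if rng.random() < cond_p1(0, 1, x2) else 0
        x2 = 1 if rng.random() < cond_p1(1, 0, x1) else 0
        hist += x1                  # histogram: indicator of X1 = 1
        mix += cond_p1(0, 1, x2)    # mixture: P(X1 = 1 | current x2)
    return hist / T, mix / T

rng = random.Random(0)
runs = [run_chain(rng) for _ in range(200)]
var_hist = statistics.pvariance([h for h, _ in runs])
var_mix = statistics.pvariance([m for _, m in runs])
print(var_hist, var_mix)  # the mixture estimator shows much lower variance
```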

SLIDE 23

Rao-Blackwell Theorem

Rao-Blackwell Theorem: Let the random variable set X be composed of two groups of variables, R and L. Then, for the joint distribution π(R, L) and a function g, the following result applies:

Var{ E[g(R) | L] } ≤ Var{ g(R) }

for a function of interest g, e.g., the mean or covariance (Casella & Robert, 1996; Liu et al., 1995).

  • The theorem makes a weak promise, but works well in practice!
  • The improvement depends on the choice of R and L
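A minimal numeric illustration of the inequality; the toy pair (L, R) and its conditional probabilities are invented for this example, not from the slides:

```python
# Toy pair: L ~ Uniform{0,1}, R | L ~ Bernoulli(0.2 if L = 0 else 0.8), g(R) = R.
p_r1_given_l = {0: 0.2, 1: 0.8}

# Var{g(R)}: marginally P(R = 1) = 0.5, so Var(R) = 0.25.
p_r1 = 0.5 * (p_r1_given_l[0] + p_r1_given_l[1])
var_g = p_r1 * (1 - p_r1)

# Var{E[g(R) | L]}: E[R | L] takes values 0.2 and 0.8, each with probability 0.5.
var_cond_exp = 0.5 * ((p_r1_given_l[0] - p_r1) ** 2 + (p_r1_given_l[1] - p_r1) ** 2)

print(var_cond_exp, var_g)  # 0.09 <= 0.25, as the theorem promises
```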
SLIDE 24

Importance vs. Gibbs

Gibbs (samples drawn from P̂(X|e), which converges to P(X|e)):

ĝ(X) = (1/T) Σ_{t=1}^T g(x^t),   x^t ∼ P̂(X|e)

Importance (samples drawn from a proposal Q):

ĝ = (1/T) Σ_{t=1}^T g(x^t) · P(x^t)/Q(x^t),   with weight w^t = P(x^t)/Q(x^t)

SLIDE 25

Gibbs Sampling: Convergence

  • Sample from P̂(X|e), which converges to P(X|e)
  • Converges iff the chain is irreducible and ergodic
  • Intuition - must be able to explore all states:

– if Xi and Xj are strongly correlated, Xi = 0 ⇔ Xj = 0, then we cannot explore states with Xi = 1 and Xj = 1

  • All conditions are satisfied when all probabilities are positive
  • The convergence rate can be characterized by the second eigenvalue of the transition matrix

SLIDE 26

Gibbs: Speeding Convergence

Reduce dependence between samples (autocorrelation)

  • Skip samples
  • Randomize Variable Sampling Order
  • Employ blocking (grouping)
  • Multiple chains

Reduce variance (covered in the next section)

SLIDE 27

Blocking Gibbs Sampler

  • Sample several variables together, as a block
  • Example: Given three variables X, Y, Z with domains of size 2, group Y and Z together to form a variable W = {Y, Z} with domain size 4. Then, given sample (x^t, y^t, z^t), compute the next sample:

x^{t+1} ← P(X | y^t, z^t)
w^{t+1} = (y^{t+1}, z^{t+1}) ← P(Y, Z | x^{t+1})

+ Can improve convergence greatly when two variables are strongly correlated!
− The domain of the block variable grows exponentially with the number of variables in a block!

SLIDE 28

Gibbs: Multiple Chains

  • Generate M chains of size K
  • Each chain produces an independent estimate P_m:

P_m(x_i | e) = (1/K) Σ_{t=1}^K P(x_i | x^t \ x_i)

  • Estimate P(x_i | e) as the average of the P_m(x_i | e):

P̂ = (1/M) Σ_{m=1}^M P_m

Treat the P_m as independent random variables.
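A sketch of the multiple-chains idea, assuming the weather model of slide 5 (the chain sizes M and K are arbitrary); averaging the per-chain estimates also yields a between-chain spread for free:

```python
import random, statistics

# Weather chain: P(next = rainy | current).
P_RAINY = {'rainy': 0.9, 'sunny': 0.5}

def chain_estimate(seed, K=5_000):
    """One chain's estimate of P(rainy) from K visited states."""
    rng = random.Random(seed)
    state, rainy = 'sunny', 0
    for _ in range(K):
        state = 'rainy' if rng.random() < P_RAINY[state] else 'sunny'
        rainy += state == 'rainy'
    return rainy / K

estimates = [chain_estimate(m) for m in range(10)]   # M = 10 independent chains
p_hat = statistics.mean(estimates)
print(p_hat, statistics.stdev(estimates))  # p_hat is close to 5/6
```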
SLIDE 29

Gibbs Sampling Summary

  • Markov Chain Monte Carlo method (Gelfand and Smith, 1990; Smith and Roberts, 1993; Tierney, 1994)
  • Samples are dependent, form a Markov chain
  • Sample from P̂(X|e), which converges to P(X|e)
  • Guaranteed to converge when all P > 0
  • Methods to improve convergence:

– Blocking
– Rao-Blackwellisation

SLIDE 30

Overview

  • 1. Probabilistic Reasoning/Graphical models
  • 2. Importance Sampling
  • 3. Markov Chain Monte Carlo: Gibbs Sampling
  • 4. Sampling in presence of Determinism
  • 5. Rao-Blackwellisation
  • 6. AND/OR importance sampling
SLIDE 31

Sampling: Performance

  • Gibbs sampling

– Reduce dependence between samples

  • Importance sampling

– Reduce variance

  • Achieve both by sampling a subset of variables and integrating out the rest (reduce dimensionality), aka Rao-Blackwellisation
  • Exploit graph structure to manage the extra cost

SLIDE 32

Smaller Subset State-Space

  • Smaller state-space is easier to cover

X = {X1, X2, X3, X4}, |D(X)| = 64;   C = {X1, X2} ⊂ X, |D(C)| = 16

SLIDE 33

Smoother Distribution

(figure: the distribution P(X1, X2) over the smaller space is smoother than P(X1, X2, X3, X4) over the full space)

SLIDE 34

Speeding Up Convergence

  • Mean Squared Error of the estimator:

MSE_Q[P̂] = Var_Q[P̂] + BIAS²

Var_Q[P̂] = E_Q[P̂²] − (E_Q[P̂])²

  • Reduce variance ⇒ speed up convergence!
  • In case of an unbiased estimator, BIAS = 0, so MSE_Q[P̂] = Var_Q[P̂]
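A numeric sanity check of the decomposition, using a deliberately biased estimator (shrinking the sample mean of Bernoulli draws toward 0.5; all numbers are illustrative):

```python
import random, statistics

rng = random.Random(0)
true_p, n, runs = 0.7, 20, 20_000

estimates = []
for _ in range(runs):
    xs = [rng.random() < true_p for _ in range(n)]
    estimates.append(0.9 * (sum(xs) / n) + 0.1 * 0.5)   # shrinkage -> bias

mse = statistics.mean((e - true_p) ** 2 for e in estimates)
var = statistics.pvariance(estimates)
bias = statistics.mean(estimates) - true_p
print(mse, var + bias ** 2)  # the two quantities agree
```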

SLIDE 35

Rao-Blackwellisation

Let X = R ∪ L, and consider estimating E[h(x)] with

ĥ(x) = (1/T) Σ_{t=1}^T h(x^t)     versus     h̃(x) = (1/T) Σ_{t=1}^T E[h(x) | l^t]

Since Var{h(x)} = Var{E[h(x) | l]} + E{Var[h(x) | l]}, we have Var{E[h(x) | l]} ≤ Var{h(x)}, and therefore Var{h̃} ≤ Var{ĥ}.

(Liu, Ch. 2.3)

SLIDE 36

Rao-Blackwellisation

  • X = R ∪ L
  • Importance Sampling:

Var_Q{ P(R, L) / Q(R, L) } ≥ Var_Q{ P(R) / Q(R) }     (Liu, Ch. 2.5.5)

  • Gibbs Sampling:

– autocovariances are lower (less correlation between samples)
– if Xi and Xj are strongly correlated, Xi = 0 ⇔ Xj = 0, only include one of them in the sampling set

"Carry out analytical computation as much as possible" - Liu

SLIDE 37

Blocking Gibbs Sampler vs. Collapsed

  • Standard Gibbs: (1)  P(x | y, z), P(y | x, z), P(z | x, y)
  • Blocking:       (2)  P(x | y, z), P(y, z | x)
  • Collapsed:      (3)  P(x | y), P(y | x)

(figure: chain X - Y - Z)

(2) and especially (3) give faster convergence.

SLIDE 38

Collapsed Gibbs Sampling

Generating Samples

Generate sample c^{t+1} from c^t. In short, for i = 1 to K:

c_i^{t+1} sampled from P(C_i | c_1^{t+1}, …, c_{i−1}^{t+1}, c_{i+1}^t, …, c_K^t, e) = P(C_i | c^t \ c_i, e)

SLIDE 39

Collapsed Gibbs Sampler

Input: C ⊆ X, E = e
Output: T samples {c^t}
Fix evidence E = e, initialize c^0 at random

1. For t = 1 to T (compute samples)
2.   For i = 1 to K (loop through cutset variables)
3.     c_i^{t+1} ← sampled from P(C_i | c^t \ c_i)
4.   End For
5. End For

SLIDE 40

Calculation Time

  • Computing P(c_i | c^t \ c_i, e) is more expensive (requires inference)
  • Trading #samples for smaller variance:

– generate more samples with higher covariance
– generate fewer samples with lower covariance

  • Must control the time spent computing sampling probabilities in order to be time-effective!

SLIDE 41

Exploiting Graph Properties

Recall: computation time is exponential in the adjusted induced width of a graph

  • A w-cutset is a subset of variables such that, when they are observed, the induced width of the graph is w
  • When the sampled variables form a w-cutset, inference is exp(w) (e.g., using Bucket Tree Elimination)
  • A cycle-cutset is a special case of a w-cutset

Sampling a w-cutset ⇒ w-cutset sampling!

SLIDE 42

What If C=Cycle-Cutset ?

42

} { }, {

9 5 2

X E ,x x c  

X1 X7 X5 X4 X2 X9 X8 X3 X6 X1 X7 X4 X9 X8 X3 X6

P(x2,x5,x9) – can compute using Bucket Elimination P(x2,x5,x9) – computation complexity is O(N)

SLIDE 43

Computing Transition Probabilities

43

) , , 1 ( : ) , , ( :

9 3 2 9 3 2

x x x P BE x x x P BE  

X1 X7 X5 X4 X2 X9 X8 X3 X6

) , , 1 ( ) | 1 ( ) , , ( ) | ( ) , , 1 ( ) , , (

9 3 2 3 2 9 3 2 3 2 9 3 2 9 3 2

x x x P x x P x x x P x x P x x x P x x x P             

Compute joint probabilities: Normalize:

SLIDE 44

Cutset Sampling: Answering Queries

  • Query: ci C, P(ci |e)=? same as Gibbs:

44

computed while generating sample t using bucket tree elimination compute after generating sample t using bucket tree elimination

T t i t i i

e c c c P T |e c P

1

) , \ | ( 1 ) ( ˆ

T t t i i

,e c x P T |e) (x P

1

) | ( 1

  • Query: xi X\C, P(xi |e)=?
SLIDE 45

Cutset Sampling vs. Cutset Conditioning

45

) | ( ) | ( ) ( ) | ( ) | ( 1

) ( ) ( 1

e c P c,e x P T c count c,e x P ,e c x P T |e) (x P

C D c i C D c i T t t i i

    

  

  

  • Cutset Conditioning
  • Cutset Sampling

) | ( ) | (

) (

e c P c,e x P |e) P(x

C D c i i

  

SLIDE 46

Cutset Sampling Example

Estimating P(x2 | e) for sampled node X2, from samples 1, 2, 3:

P̂(x2 | e) = (1/3) [ P(x2 | x5^1, x9) + P(x2 | x5^2, x9) + P(x2 | x5^3, x9) ]

(figure: the network over X1, …, X9)

SLIDE 47

Cutset Sampling Example

Estimating P(x3 | e) for non-sampled node X3, from samples c^1 = {x2^1, x5^1}, c^2 = {x2^2, x5^2}, c^3 = {x2^3, x5^3}:

P̂(x3 | e) = (1/3) [ P(x3 | x2^1, x5^1, x9) + P(x3 | x2^2, x5^2, x9) + P(x3 | x2^3, x5^3, x9) ]

(figure: the network over X1, …, X9)

SLIDE 48

CPCS54 Test Results

MSE vs. #samples (left) and time (right). Ergodic, |X| = 54, D(Xi) = 2, |C| = 15, |E| = 3. Exact Time = 30 sec using Cutset Conditioning.

(figure: two plots, "CPCS54, n=54, |C|=15, |E|=3", comparing Cutset and Gibbs)

SLIDE 49

CPCS179 Test Results

MSE vs. #samples (left) and time (right). Non-Ergodic (1 deterministic CPT entry), |X| = 179, |C| = 8, 2 ≤ D(Xi) ≤ 4, |E| = 35. Exact Time = 122 sec using Cutset Conditioning.

(figure: two plots, "CPCS179, n=179, |C|=8, |E|=35", comparing Cutset and Gibbs)

SLIDE 50

CPCS360b Test Results

MSE vs. #samples (left) and time (right). Ergodic, |X| = 360, D(Xi) = 2, |C| = 21, |E| = 36. Exact Time > 60 min using Cutset Conditioning; exact values obtained via Bucket Elimination.

(figure: two plots, "CPCS360b, n=360, |C|=21, |E|=36", comparing Cutset and Gibbs)

SLIDE 51

Random Networks

MSE vs. #samples (left) and time (right). |X| = 100, D(Xi) = 2, |C| = 13, |E| = 15-20. Exact Time = 30 sec using Cutset Conditioning.

(figure: two plots, "RANDOM, n=100, |C|=13, |E|=15-20", comparing Cutset and Gibbs)

SLIDE 52

Coding Networks

Cutset Transforms Non-Ergodic Chain to Ergodic

MSE vs. time. Non-Ergodic, |X| = 100, D(Xi) = 2, |C| = 13-16, |E| = 50. Sample the ergodic subspace U = {U1, U2, …, Uk}. Exact Time = 50 sec using Cutset Conditioning.

(figure: coding network structure over x1…x4, u1…u4, p1…p4, y1…y4, and a plot, "Coding Networks, n=100, |C|=12-14", comparing IBP, Gibbs, and Cutset)

SLIDE 53

Non-Ergodic Hailfinder

MSE vs. #samples (left) and time (right). Non-Ergodic, |X| = 56, |C| = 5, 2 ≤ D(Xi) ≤ 11, |E| = 0. Exact Time = 2 sec using Loop-Cutset Conditioning.

(figure: two plots, "HailFinder, n=56, |C|=5, |E|=1", comparing Cutset and Gibbs)

SLIDE 54

CPCS360b - MSE

MSE vs. Time. Ergodic, |X| = 360, |C| = 26, D(Xi) = 2. Exact Time = 50 min using BTE.

(figure: "cpcs360b, N=360, |E|=[20-34], w*=20, MSE" plot comparing Gibbs, IBP, |C|=26 fw=3, and |C|=48 fw=2)

SLIDE 55

Cutset Importance Sampling

  • Apply Importance Sampling over cutset C

P̂(e) = (1/T) Σ_{t=1}^T P(c^t, e) / Q(c^t) = (1/T) Σ_{t=1}^T w^t

P̂(c_i | e) ∝ (1/T) Σ_{t=1}^T δ(c_i, c^t) · w^t

P̂(x_i | e) ∝ (1/T) Σ_{t=1}^T P(x_i | c^t, e) · w^t

where P(c^t, e) is computed using Bucket Elimination

(Gogate & Dechter, 2005) and (Bidyuk & Dechter, 2006)

SLIDE 56

Likelihood Cutset Weighting (LCS)

  • Z = a topological ordering of the variables in C ∪ E
  • Generating sample t+1:

For i = 1 to |Z| do:
    If Z_i ∈ E, then z_i^{t+1} = e_i
    Else sample z_i^{t+1} from P(Z_i | z_1^{t+1}, …, z_{i−1}^{t+1})
End For

  • P(Z_i | z_1^{t+1}, …, z_{i−1}^{t+1}) is computed while generating sample t, using bucket tree elimination
  • It can be memoized for some number of instances K (based on the memory available)

KL[P(C|e), Q(C)] ≤ KL[P(X|e), Q(X)]
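As a sketch of the sample-and-weight scheme, here is plain likelihood weighting on a hypothetical network A → C ← B with evidence C = 1 (the structure and CPTs are invented for illustration). LCS applies the same idea, but samples only the cutset variables in topological order and obtains each sampling distribution exactly via bucket tree elimination:

```python
import random

P_A1 = P_B1 = 0.5

def p_c1(a, b):
    """P(C = 1 | A = a, B = b)."""
    return 0.9 if a == 1 and b == 1 else 0.1

rng = random.Random(0)
num = den = 0.0
for _ in range(100_000):
    a = 1 if rng.random() < P_A1 else 0   # sampled in topological order
    b = 1 if rng.random() < P_B1 else 0
    w = p_c1(a, b)        # evidence C = 1 is clamped; weight by its likelihood
    num += w * a
    den += w

print(num / den)  # estimates P(A = 1 | C = 1) = 5/6
```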

SLIDE 57

Pathfinder 1


SLIDE 58

Pathfinder 2


SLIDE 59

Link


SLIDE 60

Summary

Importance Sampling:
  • i.i.d. samples
  • Unbiased estimator
  • Generates samples fast
  • Samples from Q
  • Rejects samples with zero weight
  • Improves on cutset

Gibbs Sampling:
  • Dependent samples
  • Biased estimator
  • Generates samples slower
  • Samples from P̂(X|e)
  • Does not converge in the presence of constraints
  • Improves on cutset

SLIDE 61

CPCS360b

(figure: "cpcs360b, N=360, |LC|=26, w*=21, |E|=15", MSE vs. time for LW, AIS-BN, Gibbs, LCS, IBP)

LW - likelihood weighting; LCS - likelihood weighting on a cutset

SLIDE 62

CPCS422b

(figure: "cpcs422b, N=422, |LC|=47, w*=22, |E|=28", MSE vs. time for LW, AIS-BN, Gibbs, LCS, IBP)

LW - likelihood weighting; LCS - likelihood weighting on a cutset

SLIDE 63

Coding Networks

(figure: "coding, N=200, P=3, |LC|=26, w*=21", MSE vs. time for LW, AIS-BN, Gibbs, LCS, IBP)

LW - likelihood weighting; LCS - likelihood weighting on a cutset