
Overview 1. Probabilistic Reasoning/Graphical models 2. Importance Sampling 3. Markov Chain Monte Carlo: Gibbs Sampling 4. Sampling in presence of Determinism 5. Rao-Blackwellisation 6. AND/OR importance sampling


  1. Likelihood Weighting: Sampling
Sample in topological order over X: clamp the evidence variables E = e, then sample x_i ~ P(X_i | pa_i). Each P(X_i | pa_i) is a look-up in the CPT!

  2. Likelihood Weighting: Proposal Distribution
Q(X \ E) = Π_{X_i ∈ X\E} P(X_i | pa_i, e). Notice: Q is another Bayesian network.
Example: given a Bayesian network P(X_1, X_2, X_3) = P(X_1) P(X_2 | X_1) P(X_3 | X_1, X_2) and evidence X_2 = x_2:
Q(X_1, X_3) = P(X_1) P(X_3 | X_1, X_2 = x_2)
Weights: given a sample x = (x_1, ..., x_n),
w = P(x, e) / Q(x) = [ Π_{X_i ∈ X\E} P(x_i | pa_i, e) · Π_{E_j ∈ E} P(e_j | pa_j) ] / Π_{X_i ∈ X\E} P(x_i | pa_i, e) = Π_{E_j ∈ E} P(e_j | pa_j)

  3. Likelihood Weighting: Estimates
Estimate P(e): P̂(e) = (1/T) Σ_{t=1}^T w^(t)
Estimate posterior marginals:
P̂(x_i | e) = P̂(x_i, e) / P̂(e) = [ Σ_{t=1}^T w^(t) g_{x_i}(x^(t)) ] / Σ_{t=1}^T w^(t)
where g_{x_i}(x^(t)) = 1 if x_i^(t) = x_i and equals zero otherwise.
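The sampling and weighting steps above can be sketched on a tiny, hypothetical two-node network X_1 → X_2 with evidence X_2 = 1; all CPT numbers below are made up for illustration, and the exact answers are P(e) = 0.41 and P(X_1 = 1 | e) = 0.27/0.41:

```python
import random

random.seed(0)

# Hypothetical network X1 -> X2 with evidence X2 = 1 (made-up CPTs).
P_X1 = {1: 0.3, 0: 0.7}           # prior P(X1)
P_X2_given_X1 = {1: 0.9, 0: 0.2}  # P(X2=1 | X1)

T = 20000
weights, hits = [], []            # hits[t] = g_{x1=1}(x^(t))
for _ in range(T):
    # Sample the non-evidence variable from its CPT (topological order),
    # clamp the evidence, and weight by the evidence likelihood.
    x1 = 1 if random.random() < P_X1[1] else 0
    w = P_X2_given_X1[x1]         # w = product of P(e_j | pa_j)
    weights.append(w)
    hits.append(1 if x1 == 1 else 0)

p_e = sum(weights) / T                                           # estimate of P(e)
p_x1_given_e = sum(w * h for w, h in zip(weights, hits)) / sum(weights)

# Exact values: P(e) = 0.3*0.9 + 0.7*0.2 = 0.41, P(X1=1|e) = 0.27/0.41
print(p_e, p_x1_given_e)
```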

  4. Likelihood Weighting
• Converges to exact posterior marginals
• Generates samples fast
• Sampling distribution is close to the prior (especially if E ⊆ leaf nodes)
• Increasing sampling variance ⇒ convergence may be slow
• Many samples with P(x^(t)) = 0 rejected

  5. Outline • Definitions and Background on Statistics • Theory of importance sampling • Likelihood weighting • Error estimation • State-of-the-art importance sampling techniques

  6. Absolute error (figure)

  7. Outline • Definitions and Background on Statistics • Theory of importance sampling • Likelihood weighting • State-of-the-art importance sampling techniques

  8. Proposal selection
• One should try to select a proposal that is as close as possible to the posterior distribution.
Var_Q[P̂(e)] = Var_Q[w(z)] / T = (1/T) Σ_{z ∈ Z} Q(z) ( P(z, e)/Q(z) − P(e) )²
Setting P(z, e)/Q(z) − P(e) = 0 for every z gives a zero-variance estimator:
Q(z) = P(z, e) / P(e) = P(z | e)
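A quick numeric check of the zero-variance claim, on a hypothetical two-node network X_1 → X_2 with evidence X_2 = 1 and made-up CPTs: once Q(z) = P(z | e), every importance weight collapses to the constant P(e), so no sampling variance remains.

```python
# Toy network X1 -> X2, evidence X2 = 1 (made-up CPTs).
# P(x1, e): P(1,e) = 0.3*0.9 = 0.27, P(0,e) = 0.7*0.2 = 0.14, so P(e) = 0.41.
P_joint = {1: 0.3 * 0.9, 0: 0.7 * 0.2}
P_e = sum(P_joint.values())

# Zero-variance proposal: Q(z) = P(z | e).
Q = {z: p / P_e for z, p in P_joint.items()}

# Every weight P(z,e)/Q(z) collapses to the constant P(e) = 0.41,
# so the estimator has zero variance.
weights = {z: P_joint[z] / Q[z] for z in Q}
print(weights)
```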

  9. Perfect sampling using Bucket Elimination
• Algorithm:
– Run bucket elimination on the problem along an ordering o = (X_N, ..., X_1).
– Sample along the reverse ordering (X_1, ..., X_N).
– At each variable X_i, recover the probability P(X_i | x_1, ..., x_{i-1}) by referring to the bucket.

  10. Bucket Elimination
Query: P(a | e = 0) ∝ P(a, e = 0). Elimination order: d, e, b, c.
P(a, e = 0) = Σ_{c,b,d} P(a) P(b|a) P(c|a) P(d|a,b) P(e=0|b,c)
            = P(a) Σ_c P(c|a) Σ_b P(b|a) P(e=0|b,c) Σ_d P(d|a,b)
Buckets (original functions and messages):
D: P(d|a,b)                    → f_D(a,b) = Σ_d P(d|a,b)
E: P(e|b,c)                    → f_E(b,c) = P(e=0|b,c)
B: P(b|a), f_D(a,b), f_E(b,c)  → f_B(a,c) = Σ_b P(b|a) f_D(a,b) f_E(b,c)
C: P(c|a), f_B(a,c)            → f_C(a) = Σ_c P(c|a) f_B(a,c)
A: P(a), f_C(a)                → P(a, e=0) = P(a) f_C(a)
Time and space exp(w*).

  11. Bucket elimination (BE): Algorithm elim-bel (Dechter 1996)
bucket B: P(B|A), P(D|B,A), P(e|B,C)  → (eliminate B) h_B(A,D,C,e)
bucket C: P(C|A), h_B(A,D,C,e)        → h_C(A,D,e)
bucket D: h_C(A,D,e)                  → h_D(A,e)
bucket E: h_D(A,e)                    → h_E(A)
bucket A: P(A), h_E(A)                → P(e)
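A minimal sketch of this bucket schedule in Python, on the slides' network A → B, A → C, (A,B) → D, (B,C) → E with evidence e = 0. The CPTs are randomly generated placeholders, and the elimination result P(a, e=0) is checked against brute-force enumeration of the joint:

```python
import itertools
import random

random.seed(1)
dom = (0, 1)

# Made-up CPTs for the network A->B, A->C, (A,B)->D, (B,C)->E.
def rand_cpt(n_parents):
    cpt = {}
    for pa in itertools.product(dom, repeat=n_parents):
        p = random.random()
        cpt[pa] = {1: p, 0: 1 - p}   # distribution over the child
    return cpt

P_A = {1: 0.6, 0: 0.4}
P_B, P_C = rand_cpt(1), rand_cpt(1)  # indexed by (a,)
P_D, P_E = rand_cpt(2), rand_cpt(2)  # indexed by (a, b) / (b, c)
e0 = 0                               # evidence E = 0

# Bucket schedule from the slides (eliminate d, then e, b, c):
f_D = {(a, b): sum(P_D[(a, b)][d] for d in dom) for a in dom for b in dom}
f_E = {(b, c): P_E[(b, c)][e0] for b in dom for c in dom}
f_B = {(a, c): sum(P_B[(a,)][b] * f_D[(a, b)] * f_E[(b, c)] for b in dom)
       for a in dom for c in dom}
f_C = {a: sum(P_C[(a,)][c] * f_B[(a, c)] for c in dom) for a in dom}
P_a_e0 = {a: P_A[a] * f_C[a] for a in dom}   # P(a, e=0)

# Check against brute-force enumeration of the joint distribution.
brute = {a: sum(P_A[a] * P_B[(a,)][b] * P_C[(a,)][c]
                * P_D[(a, b)][d] * P_E[(b, c)][e0]
                for b in dom for c in dom for d in dom)
         for a in dom}
print(P_a_e0, brute)
```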

  12. Sampling from the output of BE (Dechter 2002)
Sample along the reverse elimination order (A, D, C, then B):
bucket A: P(A), h_E(A). Q(A) ∝ P(A) h_E(A); sample A = a from Q(A).
bucket E: h_D(A,e). Evidence bucket: ignore.
bucket D: h_C(A,D,e). Set A = a in the bucket; sample D = d from Q(D | a, e) ∝ h_C(a, D, e).
bucket C: P(C|A), h_B(A,D,C,e). Set A = a, D = d in the bucket; sample C = c from Q(C | a, d, e) ∝ P(C|a) h_B(a, d, C, e).
bucket B: P(B|A), P(D|B,A), P(e|B,C). Set A = a, D = d, C = c in the bucket; sample B = b from Q(B | a, d, c, e) ∝ P(B|a) P(d|B,a) P(e|B,c).

  13. Mini-buckets: “local inference”
• Computation in a bucket is time and space exponential in the number of variables involved
• Therefore, partition the functions in a bucket into “mini-buckets” over smaller numbers of variables
• Controlling the size of each “mini-bucket” yields polynomial complexity.

  14. Mini-Bucket Elimination: space and time constraints
The maximum scope size of any newly generated function is bounded (here by 2). Full BE would generate a function of scope size 3 in bucket B, so it cannot be used; instead, bucket B is partitioned into mini-buckets, each summed over B separately:
bucket B: { P(e|B,C) } and { P(B|A), P(D|B,A) }  → Σ_B each: h_B(C,e) and h_B(A,D)
bucket C: P(C|A), h_B(C,e)       → h_C(A,e)
bucket D: h_B(A,D)               → h_D(A)
bucket E: h_C(A,e)               → h_E(A)
bucket A: P(A), h_D(A), h_E(A)   → approximation of P(e)

  15. Sampling from the output of MBE
Same bucket structure as on the previous slide. Sampling is the same as in BE-sampling, except that now we construct Q from a randomly selected “mini-bucket”.

  16. IJGP-Sampling (Gogate and Dechter, 2005)
• Iterative Join Graph Propagation (IJGP): a generalized belief propagation scheme (Yedidia et al., 2002)
• IJGP yields better approximations of P(X|E) than MBE (Dechter, Kask and Mateescu, 2002)
• The output of IJGP has the same form as mini-bucket “clusters”
• Currently the best-performing IS scheme!

  17. Current research question
• Given a Bayesian network with evidence, or a Markov network, representing a function P, generate another Bayesian network representing a function Q (from a family of distributions restricted by structure) such that Q is closest to P.
• Current approaches:
– Mini-buckets
– IJGP
– Both
• Validated experimentally, but they need to be justified theoretically.

  18. Algorithm: Approximate Sampling
1) Run IJGP or MBE
2) At each branch point, compute the edge probabilities by consulting the output of IJGP or MBE
• Rejection problem:
– Some of the generated assignments are non-solutions

  19. Adaptive Importance Sampling
Initial proposal: Q¹(Z) = Q(Z_1) Q(Z_2 | pa(Z_2)) ... Q(Z_n | pa(Z_n))
P̂(E = e) = 0
For i = 1 to k do:
  Generate samples z¹, ..., z^N from Q^i
  P̂(E = e) = P̂(E = e) + (1/N) Σ_{j=1}^N w(z^j)
  Update: Q^{i+1} = Q^i + γ(i) (Q' − Q^i)
End
Return P̂(E = e) / k

  20. Adaptive Importance Sampling
• General case
• Given k proposal distributions
• Take N samples from each distribution
• Approximate P(e):
P̂(e) = (1/k) Σ_{j=1}^k (average weight under the j-th proposal)
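A sketch of this average-of-averages estimator on the same toy network X_1 → X_2 with evidence X_2 = 1 (made-up CPTs; exact P(e) = 0.41). The two proposal parameters 0.5 and 0.65 are arbitrary choices standing in for the successive proposals of an adaptive scheme; each per-proposal average weight is an unbiased estimate of P(e), so their average is too:

```python
import random

random.seed(2)

# Toy network X1 -> X2, evidence X2 = 1 (made-up CPTs); exact P(e) = 0.41.
P_X1 = {1: 0.3, 0: 0.7}
P_e_given = {1: 0.9, 0: 0.2}

def avg_weight(q1, n):
    """Average importance weight P(x1, e)/Q(x1) under the proposal Q(X1=1) = q1."""
    total = 0.0
    for _ in range(n):
        x1 = 1 if random.random() < q1 else 0
        q = q1 if x1 == 1 else 1 - q1
        total += P_X1[x1] * P_e_given[x1] / q
    return total / n

# Two different proposals; average the per-proposal estimates of P(e).
proposals = [0.5, 0.65]
est = sum(avg_weight(q, 20000) for q in proposals) / len(proposals)
print(est)
```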

  21. Estimating Q'(z)
Q'(Z) = Q'(Z_1) Q'(Z_2 | pa(Z_2)) ... Q'(Z_n | pa(Z_n))
where each Q'(Z_i | Z_1, ..., Z_{i-1}) is estimated by importance sampling.

  22. Overview 1. Probabilistic Reasoning/Graphical models 2. Importance Sampling 3. Markov Chain Monte Carlo: Gibbs Sampling 4. Sampling in presence of Determinism 5. Rao-Blackwellisation 6. AND/OR importance sampling

  23. Markov Chain
• A Markov chain is a discrete random process with the property that the next state depends only on the current state (Markov property):
P(x^t | x^1, x^2, ..., x^{t-1}) = P(x^t | x^{t-1})
• If P(X^t | x^{t-1}) does not depend on t (time-homogeneous) and the state space is finite, then it is often expressed as a transition function (aka transition matrix), each row of which satisfies Σ_x P(x | x') = 1.

  24. Example: Drunkard’s Walk
• A random walk on the number line where, at each step, the position may change by +1 or −1 with equal probability: D(X) = {0, 1, 2, ...}, and for state n the transition function is P(n → n+1) = P(n → n−1) = 0.5.

  25. Example: Weather Model
• D(X) = {rainy, sunny}; transition matrix P(X):
          P(rainy)  P(sunny)
  rainy     0.9       0.1
  sunny     0.5       0.5
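The long-run behavior of this chain can be sketched by power iteration, repeatedly pushing a distribution through the transition matrix; solving π = πP by hand gives π = (5/6, 1/6). The (rainy, sunny) ordering of rows and columns is assumed:

```python
# Transition matrix for the weather model: rows are the current state,
# columns the next state, ordered (rainy, sunny).
P = [[0.9, 0.1],
     [0.5, 0.5]]

# Power iteration: repeatedly push a distribution through the chain.
pi = [0.5, 0.5]
for _ in range(200):
    pi = [pi[0] * P[0][0] + pi[1] * P[1][0],
          pi[0] * P[0][1] + pi[1] * P[1][1]]

# pi converges to the stationary distribution (5/6, 1/6).
print(pi)
```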

  26. Multi-Variable System
• X = {X_1, X_2, X_3}, each D(X_i) discrete and finite
• A state is an assignment of values to all the variables: x^t = {x_1^t, x_2^t, ..., x_n^t}

  27. Bayesian Network System
• A Bayesian network is a representation of the joint probability distribution over two or more variables
• X = {X_1, X_2, X_3}, x^t = {x_1^t, x_2^t, x_3^t}

  28. Stationary Distribution Existence
• If the Markov chain is time-homogeneous, then the vector π(X) is a stationary distribution (aka invariant or equilibrium distribution, aka “fixed point”) if its entries sum up to 1 and satisfy:
π(x_i) = Σ_{x_j ∈ D(X)} π(x_j) P(x_i | x_j)
• A finite-state-space Markov chain has a unique stationary distribution if and only if:
– The chain is irreducible
– All of its states are positive recurrent

  29. Irreducible
• A state x is irreducible if under the transition rule one has nonzero probability of moving from x to any other state and then coming back in a finite number of steps
• If one state is irreducible, then all the states must be irreducible (Liu, Ch. 12, p. 249, Def. 12.1.1)

  30. Recurrent
• A state x is recurrent if the chain returns to x with probability 1
• Let M(x) be the expected number of steps to return to state x
• State x is positive recurrent if M(x) is finite
• The recurrent states in a finite-state chain are positive recurrent.

  31. Stationary Distribution Convergence
• Consider an infinite Markov chain with n-step transition probabilities P^(n) = P(x^n | x^0)
• If the chain is both irreducible and aperiodic, then:
lim_{n→∞} P^(n) = π
• The initial state is not important in the limit: “The most useful feature of a ‘good’ Markov chain is its fast forgetfulness of its past…” (Liu, Ch. 12.1)

  32. Aperiodic
• Define d(i) = g.c.d.{n > 0 | it is possible to go from i to i in n steps}, where g.c.d. is the greatest common divisor of the integers in the set. If d(i) = 1 for all i, then the chain is aperiodic
• Positive recurrent, aperiodic states are ergodic

  33. Markov Chain Monte Carlo
• How do we estimate P(X), e.g., P(X|e)?
• Generate samples that form a Markov chain with stationary distribution π = P(X|e)
• Estimate π from the samples: the visited states x^0, ..., x^T can be viewed as “samples” from distribution π:
π̂(x) = (1/T) Σ_{t=1}^T δ(x, x^t),   lim_{T→∞} π̂(x) = π(x)

  34. MCMC Summary
• Convergence is guaranteed in the limit
• The initial state is not important, but… typically we throw away the first K samples (“burn-in”)
• Samples are dependent, not i.i.d.
• Convergence (mixing rate) may be slow
• The stronger the correlation between states, the slower the convergence!

  35. Gibbs Sampling (Geman & Geman, 1984)
• The Gibbs sampler is an algorithm to generate a sequence of samples from the joint probability distribution of two or more random variables
• Sample a new value for one variable at a time from that variable’s conditional distribution:
P(X_i) = P(X_i | x_1^t, ..., x_{i-1}^t, x_{i+1}^t, ..., x_n^t) = P(X_i | x^t \ x_i^t)
• The samples form a Markov chain with stationary distribution P(X|e)

  36. Gibbs Sampling: Illustration
The process of Gibbs sampling can be understood as a random walk in the space of all instantiations X = x (remember the drunkard’s walk): in one step we can reach instantiations that differ from the current one in the value of at most one variable (assuming a randomized choice of variables X_i).

  37. Ordered Gibbs Sampler
Generate sample x^{t+1} from x^t by processing all variables in some order:
X_1 = x_1^{t+1}, sampled from P(X_1 | x_2^t, x_3^t, ..., x_N^t, e)
X_2 = x_2^{t+1}, sampled from P(X_2 | x_1^{t+1}, x_3^t, ..., x_N^t, e)
...
X_N = x_N^{t+1}, sampled from P(X_N | x_1^{t+1}, x_2^{t+1}, ..., x_{N-1}^{t+1}, e)
In short, for i = 1 to N: x_i^{t+1} sampled from P(X_i | x^t \ x_i^t, e)

  38. Transition Probabilities in a BN
Given its Markov blanket (parents, children, and children’s parents), X_i is independent of all other nodes:
markov_i = pa_i ∪ ch_i ∪ (∪_{X_j ∈ ch_i} pa_j)
P(X_i | x^t \ x_i) = P(X_i | markov_i):
P(x_i | x^t \ x_i) ∝ P(x_i | pa_i) Π_{X_j ∈ ch_i} P(x_j | pa_j)
Computation is linear in the size of the Markov blanket!

  39. Ordered Gibbs Sampling Algorithm (Pearl, 1988)
Input: X, E = e. Output: T samples {x^t}.
Fix evidence E = e, initialize x^0 at random.
1. For t = 1 to T (compute samples)
2.   For i = 1 to N (loop through variables)
3.     x_i^{t+1} ← sampled from P(X_i | markov_i^t)
4.   End For
5. End For
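A minimal sketch of the ordered Gibbs sampler on a hypothetical three-node chain X_1 → X_2 → X_3 with evidence X_3 = 1 (all CPT numbers made up). Each variable is resampled from its Markov-blanket conditional, and the mixture estimator is used for P(X_1 = 1 | e); the exact posterior here is 0.24/0.45 ≈ 0.533:

```python
import random

random.seed(3)

# Made-up chain network X1 -> X2 -> X3 with evidence X3 = 1.
p1 = 0.4                  # P(X1=1)
p2 = {0: 0.3, 1: 0.8}     # P(X2=1 | X1)
p3 = {0: 0.2, 1: 0.7}     # P(X3=1 | X2)

def bern(p):
    return 1 if random.random() < p else 0

def cond_x1(x2):
    # P(X1=1 | x2) from the Markov blanket of X1 (just X2 here):
    # proportional to P(X1) * P(x2 | X1).
    a = p1 * (p2[1] if x2 == 1 else 1 - p2[1])
    b = (1 - p1) * (p2[0] if x2 == 1 else 1 - p2[0])
    return a / (a + b)

def cond_x2(x1):
    # P(X2=1 | x1, X3=1), proportional to P(X2 | x1) * P(X3=1 | X2).
    a = p2[x1] * p3[1]
    b = (1 - p2[x1]) * p3[0]
    return a / (a + b)

x1, x2 = 0, 0
est, T, burn = 0.0, 20000, 1000
for t in range(T + burn):
    q1 = cond_x1(x2)
    x1 = bern(q1)
    x2 = bern(cond_x2(x1))
    if t >= burn:
        est += q1             # mixture estimator for P(X1=1 | e)
est /= T

# Exact posterior: P(X1=1 | X3=1) = 0.24 / 0.45.
print(est)
```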

  40. Gibbs Sampling Example: BN
X = {X_1, X_2, ..., X_9}, E = {X_9} (a 9-node network).
Initialize at random: X_1 = x_1^0, X_2 = x_2^0, ..., X_8 = x_8^0; the evidence X_9 stays clamped.

  41. Gibbs Sampling Example: BN (continued)
X = {X_1, X_2, ..., X_9}, E = {X_9}.
x_1^1 ← P(X_1 | x_2^0, ..., x_8^0, x_9)
x_2^1 ← P(X_2 | x_1^1, x_3^0, ..., x_8^0, x_9)
...

  42. Answering Queries: P(x_i | e) = ?
• Method 1: count the fraction of samples where X_i = x_i (histogram estimator; δ is the Dirac delta function):
P̂(X_i = x_i) = (1/T) Σ_{t=1}^T δ(x_i, x_i^t)
• Method 2: average the conditional probability (mixture estimator):
P̂(X_i = x_i) = (1/T) Σ_{t=1}^T P(X_i = x_i | markov_i^t)
• The mixture estimator converges faster (consider the estimates for the unobserved values of X_i; prove via the Rao-Blackwell theorem)
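The two estimators can be compared on a toy joint (all numbers made up): since the conditional P(X_1 | X_2) does part of the averaging analytically, the mixture estimator has visibly lower variance across repeated short runs. For clarity, this sketch uses i.i.d. draws from the joint rather than a Gibbs chain; the Rao-Blackwell argument is the same:

```python
import random

random.seed(4)

# Toy joint (made up): P(X2=1) = 0.5, P(X1=1 | X2=1) = 0.7,
# P(X1=1 | X2=0) = 0.3, so P(X1=1) = 0.5. Estimate P(X1=1) both ways.
def one_run(n=100):
    hist = mix = 0.0
    for _ in range(n):
        x2 = 1 if random.random() < 0.5 else 0
        p = 0.7 if x2 == 1 else 0.3    # P(X1=1 | x2)
        x1 = 1 if random.random() < p else 0
        hist += x1                     # histogram: indicator of X1 = 1
        mix += p                       # mixture: conditional probability
    return hist / n, mix / n

runs = [one_run() for _ in range(500)]
var_hist = sum((h - 0.5) ** 2 for h, _ in runs) / len(runs)
var_mix = sum((m - 0.5) ** 2 for _, m in runs) / len(runs)
print(var_hist, var_mix)
```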

  43. Rao-Blackwell Theorem
Rao-Blackwell Theorem: let the random variable set X be composed of two groups of variables, R and L. Then, for the joint distribution π(R, L) and a function of interest g (e.g., the mean or covariance), the following holds:
Var[E{g(R) | L}] ≤ Var[g(R)]
(Casella & Robert, 1996; Liu et al., 1995)
• The theorem makes a weak promise, but works well in practice!
• The improvement depends on the choice of R and L

  44. Importance vs. Gibbs
Gibbs: x^t ~ P̂(X | e), where P̂(X | e) → P(X | e);  ĝ = (1/T) Σ_{t=1}^T g(x^t)
Importance: x^t ~ Q(X | e);  ĝ = (1/T) Σ_{t=1}^T g(x^t) P(x^t)/Q(x^t) = (1/T) Σ_{t=1}^T g(x^t) w^t

  45. Gibbs Sampling: Convergence
• Samples are drawn from P̂(X|e), which converges to P(X|e)
• Converges iff the chain is irreducible and ergodic
• Intuition: it must be possible to explore all states:
– if X_i and X_j are strongly correlated, e.g., X_i = 0 ⇔ X_j = 0, then we cannot explore states with X_i = 1 and X_j = 1
• All conditions are satisfied when all probabilities are positive
• The convergence rate can be characterized by the second eigenvalue of the transition matrix

  46. Gibbs: Speeding Convergence
Reduce dependence between samples (autocorrelation):
• Skip samples
• Randomize the variable sampling order
• Employ blocking (grouping)
• Use multiple chains
Reduce variance (covered in the next section)

  47. Blocking Gibbs Sampler
• Sample several variables together, as a block
• Example: given three variables X, Y, Z with domains of size 2, group Y and Z together to form a variable W = {Y, Z} with domain size 4. Then, given sample (x^t, y^t, z^t), compute the next sample:
x^{t+1} ← P(X | y^t, z^t) = P(X | w^t)
(y^{t+1}, z^{t+1}) = w^{t+1} ← P(Y, Z | x^{t+1})
+ Can improve convergence greatly when two variables are strongly correlated!
− The domain of the block variable grows exponentially with the number of variables in a block!

  48. Gibbs: Multiple Chains
• Generate M chains of size K
• Each chain produces an independent estimate P_m:
P_m(x_i | e) = (1/K) Σ_{t=1}^K P(x_i | x^t \ x_i)
• Estimate P(x_i | e) as the average of the P_m(x_i | e):
P̂ = (1/M) Σ_{m=1}^M P_m
• Treat the P_m as independent random variables.

  49. Gibbs Sampling Summary
• Markov Chain Monte Carlo method (Gelfand and Smith, 1990; Smith and Roberts, 1993; Tierney, 1994)
• Samples are dependent and form a Markov chain
• Samples are drawn from P̂(X|e), which converges to P(X|e)
• Guaranteed to converge when all P > 0
• Methods to improve convergence:
– Blocking
– Rao-Blackwellisation

  50. Overview 1. Probabilistic Reasoning/Graphical models 2. Importance Sampling 3. Markov Chain Monte Carlo: Gibbs Sampling 4. Sampling in presence of Determinism 5. Rao-Blackwellisation 6. AND/OR importance sampling

  51. Sampling: Performance
• Gibbs sampling: reduce dependence between samples
• Importance sampling: reduce variance
• Achieve both by sampling a subset of variables and integrating out the rest (reduce dimensionality), aka Rao-Blackwellisation
• Exploit graph structure to manage the extra cost

  52. Smaller Subset State-Space
• A smaller state-space is easier to cover: X = {X_1, X_2, X_3, X_4} with |D(X)| = 64 vs. X = {X_1, X_2} with |D(X)| = 16

  53. Smoother Distribution
(Bar charts comparing P(X_1, X_2, X_3, X_4) with the marginal P(X_1, X_2), probability scale 0–0.26: the lower-dimensional distribution is smoother.)

  54. Speeding Up Convergence
• Mean squared error of the estimator:
MSE_Q[P̂] = BIAS² + Var_Q[P̂]
• In the case of an unbiased estimator, BIAS = 0:
MSE_Q[P̂] = Var_Q[P̂] = E_Q[(P̂ − E_Q[P̂])²]
• Reduce variance ⇒ speed up convergence!

  55. Rao-Blackwellisation
X = R ∪ L
ĝ_T(x) = (1/T) {h(x^1) + ... + h(x^T)}
g̃_T(x) = (1/T) {E[h(x^1) | l^1] + ... + E[h(x^T) | l^T]}
Var{g(x)} = Var{E[g(x) | l]} + E{Var[g(x) | l]}
⇒ Var{g(x)} ≥ Var{E[g(x) | l]}, i.e., Var{h(x)} ≥ Var{E[h(x) | l]}
⇒ Var{g̃_T(x)} ≤ Var{ĝ_T(x)}
(Liu, Ch. 2.3)

  56. Rao-Blackwellisation
“Carry out analytical computation as much as possible” (Liu)
• X = R ∪ L
• Importance sampling:
Var_Q{P(R, L)/Q(R, L)} ≥ Var_Q{P(R)/Q(R)}   (Liu, Ch. 2.5.5)
• Gibbs sampling:
– autocovariances are lower (less correlation between samples)
– if X_i and X_j are strongly correlated, e.g., X_i = 0 ⇔ X_j = 0, only include one of them in the sampling set
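The importance-sampling inequality above can be checked numerically on a made-up joint over two binary variables R and L with a uniform proposal Q; the example is deliberately extreme, since after summing L out analytically the marginal P(R) happens to match Q(R) and the collapsed weight becomes a constant:

```python
import random

random.seed(5)

# Made-up joint over (R, L); target P, proposal Q uniform over 4 states.
P = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.2}
P_R = {0: 0.5, 1: 0.5}           # marginal after summing out L analytically

N = 20000
w_full, w_marg = [], []
for _ in range(N):
    r, l = random.randint(0, 1), random.randint(0, 1)
    w_full.append(P[(r, l)] / 0.25)   # weight on the full (R, L) space
    w_marg.append(P_R[r] / 0.5)       # weight with L integrated out

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Var{P(R,L)/Q(R,L)} >= Var{P(R)/Q(R)}; here the collapsed weight is constant.
print(var(w_full), var(w_marg))
```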

  57. Blocking Gibbs Sampler vs. Collapsed
Variables X, Y, Z:
(1) Standard Gibbs: sample from P(x | y, z), P(y | x, z), P(z | x, y)
(2) Blocking: sample from P(x | y, z), P(y, z | x)
(3) Collapsed: sample from P(x | y), P(y | x)
Convergence gets faster from (1) to (3).

  58. Collapsed Gibbs Sampling: Generating Samples
Generate sample c^{t+1} from c^t:
C_1 = c_1^{t+1} ← P(c_1 | c_2^t, c_3^t, ..., c_K^t, e)
C_2 = c_2^{t+1} ← P(c_2 | c_1^{t+1}, c_3^t, ..., c_K^t, e)
...
C_K = c_K^{t+1} ← P(c_K | c_1^{t+1}, c_2^{t+1}, ..., c_{K-1}^{t+1}, e)
In short, for i = 1 to K: c_i^{t+1} sampled from P(c_i | c^t \ c_i^t, e)

  59. Collapsed Gibbs Sampler
Input: C ⊆ X, E = e. Output: T samples {c^t}.
Fix evidence E = e, initialize c^0 at random.
1. For t = 1 to T (compute samples)
2.   For i = 1 to K (loop through the sampled variables)
3.     c_i^{t+1} ← sampled from P(C_i | c^t \ c_i)
4.   End For
5. End For

  60. Calculation Time
• Computing P(c_i | c^t \ c_i, e) is more expensive (it requires inference)
• Trading the number of samples for smaller variance:
– generate more samples with higher covariance, or
– generate fewer samples with lower covariance
• Must control the time spent computing sampling probabilities in order to be time-effective!

  61. Exploiting Graph Properties
Recall: computation time is exponential in the adjusted induced width of a graph
• A w-cutset is a subset of variables such that, when they are observed, the induced width of the graph is w
• When the sampled variables form a w-cutset, inference is exp(w) (e.g., using Bucket Tree Elimination)
• A cycle-cutset is a special case of a w-cutset
Sampling a w-cutset ⇒ w-cutset sampling!

  62. What If C = Cycle-Cutset?
• Example: c = {x_2, x_5}, E = {X_9} in a 3×3 grid network over X_1, ..., X_9
• Conditioned on the cutset, the network becomes a tree, so P(x_2, x_5, x_9) can be computed using Bucket Elimination
• The computation complexity of P(x_2, x_5, x_9) is O(N)

  63. Computing Transition Probabilities
Compute the joint probabilities:
BE: P(x_2 = 0, x_3, x_9)
BE: P(x_2 = 1, x_3, x_9)
Normalize: α = 1 / [P(x_2 = 0, x_3, x_9) + P(x_2 = 1, x_3, x_9)]
P(x_2 = 0 | x_3, x_9) = α P(x_2 = 0, x_3, x_9)
P(x_2 = 1 | x_3, x_9) = α P(x_2 = 1, x_3, x_9)

  64. Cutset Sampling: Answering Queries
• Query ∀ c_i ∈ C: P(c_i | e) = ?  Same as Gibbs:
P̂(c_i | e) = (1/T) Σ_{t=1}^T P(c_i | c^t \ c_i, e)
computed while generating sample t using bucket tree elimination
• Query ∀ x_i ∈ X \ C: P(x_i | e) = ?
P̂(x_i | e) = (1/T) Σ_{t=1}^T P(x_i | c^t, e)
computed after generating sample t using bucket tree elimination

  65. Cutset Sampling vs. Cutset Conditioning
• Cutset conditioning:
P(x_i | e) = Σ_{c ∈ D(C)} P(x_i | c, e) P(c | e)
• Cutset sampling:
P̂(x_i | e) = (1/T) Σ_{t=1}^T P(x_i | c^t, e)
           = Σ_{c ∈ D(C)} P(x_i | c, e) count(c)/T
           → Σ_{c ∈ D(C)} P(x_i | c, e) P(c | e)
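The rewriting of the sampling estimator as a weighted count can be verified directly. This sketch uses a hypothetical single cutset variable C with three states and a made-up conditional table P(x_i | c, e); the identity holds regardless of how the cutset samples are distributed:

```python
import random
from collections import Counter

random.seed(6)

# Hypothetical cutset variable C with 3 states and a made-up
# conditional table P(x_i | c, e).
P_x_given_c = {0: 0.2, 1: 0.5, 2: 0.9}

samples = [random.choice([0, 1, 2]) for _ in range(999)]
T = len(samples)

# Sample-average form of the cutset estimator ...
avg_form = sum(P_x_given_c[c] for c in samples) / T

# ... equals the weighted-count form sum_c P(x|c,e) * count(c)/T.
counts = Counter(samples)
count_form = sum(P_x_given_c[c] * counts[c] / T for c in counts)

print(avg_form, count_form)
```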

  66. Cutset Sampling Example
Estimating P(x_2 | e) for sampled node X_2:
Sample 1: x_2^1 ← P(x_2 | x_5^0, x_9)
Sample 2: x_2^2 ← P(x_2 | x_5^1, x_9)
Sample 3: x_2^3 ← P(x_2 | x_5^2, x_9)
P̂(x_2 | x_9) = (1/3) [ P(x_2 | x_5^0, x_9) + P(x_2 | x_5^1, x_9) + P(x_2 | x_5^2, x_9) ]

  67. Cutset Sampling Example
Estimating P(x_3 | e) for non-sampled node X_3:
c^1 = {x_2^1, x_5^1} → P(x_3 | x_2^1, x_5^1, x_9)
c^2 = {x_2^2, x_5^2} → P(x_3 | x_2^2, x_5^2, x_9)
c^3 = {x_2^3, x_5^3} → P(x_3 | x_2^3, x_5^3, x_9)
P̂(x_3 | x_9) = (1/3) [ P(x_3 | x_2^1, x_5^1, x_9) + P(x_3 | x_2^2, x_5^2, x_9) + P(x_3 | x_2^3, x_5^3, x_9) ]

  68. CPCS54 Test Results
(Plots of MSE vs. number of samples, left, and vs. time in seconds, right, comparing cutset sampling with Gibbs sampling.) Ergodic network: n = 54, D(X_i) = 2, |C| = 15, |E| = 3. Exact time = 30 sec using cutset conditioning.

  69. CPCS179 Test Results
(Plots of MSE vs. number of samples, left, and vs. time in seconds, right, comparing cutset sampling with Gibbs sampling.) Non-ergodic network (1 deterministic CPT entry): n = 179, 2 ≤ D(X_i) ≤ 4, |C| = 8, |E| = 35. Exact time = 122 sec using cutset conditioning.
