Reasoning with graphical models
Slides Set 9(part b):
Rina Dechter
slides 9b 276 2020
Sampling Techniques for Probabilistic and Deterministic Graphical models
(Reading: Darwiche, Chapter 15; cutset-sampling paper posted)
[Figure: a Markov chain producing a sequence of states x^1, x^2, x^3, x^4, ... according to a transition matrix P(X); example state sequence over {rain, sun}: rain, rain, rain, rain, sun]
[Figure: one Gibbs step — the current assignment $(x_1^t, x_2^t, \ldots, x_n^t)$ is updated one variable at a time to produce $(x_1^{t+1}, x_2^{t+1}, \ldots, x_n^{t+1})$]
(Liu, Ch. 12, pp. 249, Def. 12.1.1)
The recurrent states in a finite-state chain are positive recurrent.
Ergodic average: $\hat{f}_T = \frac{1}{T}\sum_{t=1}^{T} f(x^t)$
$x_i^{t+1} \sim P(x_i \mid x_1^{t+1}, \ldots, x_{i-1}^{t+1}, x_{i+1}^{t}, \ldots, x_n^{t})$
Gibbs sampling can be understood as a random walk in the space of all instantiations of X = x (recall the drunkard's walk): in one step we can reach only instantiations that differ from the current one in the value of at most one variable (assuming a randomized choice of the variable Xi).
$x_1^{t+1} \sim P(x_1 \mid x_2^{t}, x_3^{t}, \ldots, x_N^{t})$
$x_2^{t+1} \sim P(x_2 \mid x_1^{t+1}, x_3^{t}, \ldots, x_N^{t})$
$x_3^{t+1} \sim P(x_3 \mid x_1^{t+1}, x_2^{t+1}, x_4^{t}, \ldots, x_N^{t})$
$\ldots$
$x_i^{t+1} \sim P(x_i \mid x_1^{t+1}, \ldots, x_{i-1}^{t+1}, x_{i+1}^{t}, \ldots, x_N^{t})$
Process All Variables In Some Order
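The systematic-scan scheme can be sketched in code. This is a minimal illustration, not from the slides: a hypothetical two-variable binary joint P (any strictly positive table, which keeps the chain ergodic), with each conditional obtained by renormalizing the joint.

```python
import random

# Hypothetical 2-variable binary joint; any strictly positive table works
# (positivity keeps the Gibbs chain ergodic).
P = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}

def conditional(i, x):
    """P(X_i = v | x_{-i}) for v in {0, 1}, by renormalizing the joint."""
    other = x[1 - i]
    weights = [P[(v, other)] if i == 0 else P[(other, v)] for v in (0, 1)]
    z = weights[0] + weights[1]
    return [w / z for w in weights]

def gibbs(T, seed=0):
    rng = random.Random(seed)
    x = [0, 0]                       # arbitrary initial assignment x^0
    hits = 0
    for _ in range(T):
        for i in (0, 1):             # systematic scan: process variables in order
            p = conditional(i, x)
            x[i] = 0 if rng.random() < p[0] else 1
        hits += x[0]
    return hits / T                  # histogram estimate of P(X1 = 1)

# True marginal is P(1,0) + P(1,1) = 0.50; the estimate approaches it as T grows.
estimate = gibbs(50_000)
```

In a real graphical model the full-joint renormalization in `conditional` is replaced by the Markov-blanket computation described next.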
$P(x_i \mid x^t \setminus x_i) = P(x_i \mid markov_i^t)$
$P(x_i \mid x^t \setminus x_i) \propto P(x_i \mid pa_i) \prod_{X_j \in ch_i} P(x_j \mid pa_j)$
where $markov_i = pa_i \cup ch_i \cup \bigcup_{X_j \in ch_i} pa_j$
Given its Markov blanket (parents, children, and children's parents), Xi is independent of all other nodes.
Computation is linear in the size of the Markov blanket!
$x_i^{t+1} \sim P(X_i \mid markov_i^t)$
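As a concrete illustration of computing $P(x_i \mid markov_i)$, the sketch below uses a made-up three-variable chain X1 -> X2 -> X3 with binary variables; all CPT numbers are hypothetical.

```python
# Tiny hypothetical chain network X1 -> X2 -> X3, binary variables.
# cpt[i] maps a parent-assignment tuple to P(X_i = 1 | parents).
cpt = {
    1: {(): 0.6},                 # P(X1 = 1)
    2: {(0,): 0.3, (1,): 0.8},    # P(X2 = 1 | X1)
    3: {(0,): 0.2, (1,): 0.9},    # P(X3 = 1 | X2)
}
parents = {1: (), 2: (1,), 3: (2,)}
children = {1: (2,), 2: (3,), 3: ()}

def p_node(i, value, x):
    """P(X_i = value | pa_i) read from the CPT under assignment x."""
    p1 = cpt[i][tuple(x[p] for p in parents[i])]
    return p1 if value == 1 else 1.0 - p1

def markov_conditional(i, x):
    """P(X_i | markov_i): proportional to P(x_i | pa_i) * prod_j P(x_j | pa_j)
    over children j -- only the Markov blanket is touched, so the cost is
    linear in the blanket size, not in the network size."""
    w = []
    for v in (0, 1):
        y = dict(x)
        y[i] = v
        score = p_node(i, v, y)
        for j in children[i]:
            score *= p_node(j, y[j], y)
        w.append(score)
    z = w[0] + w[1]
    return [w[0] / z, w[1] / z]

# Example: P(X2 | X1=1, X3=1) is proportional to P(X2 | X1=1) * P(X3=1 | X2).
dist = markov_conditional(2, {1: 1, 2: 0, 3: 1})
```

Here `dist[1]` equals $0.8 \cdot 0.9$ normalized against $0.2 \cdot 0.2$, i.e. $0.72 / 0.76$.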
[Figure: a Bayesian network over $X = \{X_1, X_2, \ldots, X_9\}$ with evidence $E = \{X_9\}$]
[Figure: the same network, $X = \{X_1, \ldots, X_9\}$, $E = \{X_9\}$]
$x_1^{t+1} \sim P(x_1 \mid x_2^{t}, \ldots, x_8^{t}, x_9)$
$x_2^{t+1} \sim P(x_2 \mid x_1^{t+1}, x_3^{t}, \ldots, x_8^{t}, x_9)$
$\hat{P}(x_i \mid e) = \frac{1}{T}\sum_{t=1}^{T} P(x_i \mid x^t \setminus x_i)$ (mixture estimator)
$\hat{P}(x_i \mid e) = \frac{1}{T}\sum_{t=1}^{T} \delta(x_i, x_i^t)$ (histogram estimator; $\delta$ is the Dirac delta function)
$\hat{f}_T = \frac{\sum_{t=1}^{T} w^t f(x^t)}{\sum_{t=1}^{T} w^t}$, with importance weights $w^t$
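A weighted average of this form can be sketched as self-normalized importance sampling; the target P and proposal Q below are made-up binary distributions, not from the slides.

```python
import random

# Self-normalized importance sampling: estimate E_P[f(X)] from samples of a
# proposal Q, with weights w = P(x) / Q(x).  Both tables are hypothetical.
P = {0: 0.2, 1: 0.8}          # target distribution over {0, 1}
Q = {0: 0.5, 1: 0.5}          # proposal distribution over {0, 1}

def snis(f, T, seed=0):
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(T):
        x = 0 if rng.random() < Q[0] else 1
        w = P[x] / Q[x]       # importance weight w^t
        num += w * f(x)       # numerator:   sum_t w^t f(x^t)
        den += w              # denominator: sum_t w^t
    return num / den

# E_P[X] = 0.8; the self-normalized estimate converges to it.
estimate = snis(lambda x: x, 50_000)
```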
Run $M$ independent chains, each for $K$ steps, and average the per-chain estimates:
$\hat{P}(x_i \mid e) = \frac{1}{M}\sum_{m=1}^{M} \hat{P}_m(x_i \mid e)$, where $\hat{P}_m(x_i \mid e) = \frac{1}{K}\sum_{t=1}^{K} P(x_i \mid markov_i^{t,m})$
(Gelfand and Smith, 1990, Smith and Roberts, 1993, Tierney, 1994)
Variance-reduction schemes for estimating $P(X \mid e)$:
– Blocking
– Rao-Blackwellisation
[Figure: sampling by cumulative intervals — the joint $P(X_1, X_2, X_3, X_4)$ and the marginal $P(X_1, X_2)$ over assignments 00, 01, 10, 11 partition $[0, 1]$ into intervals (0-0.1, 0.1-0.2, 0.2-0.26, ...); a uniform random draw selects the assignment whose interval contains it]
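A sketch of the cumulative-interval scheme. The probabilities below are illustrative only, chosen to mirror the interval endpoints 0.1, 0.2, 0.26 shown in the figure.

```python
import random
from itertools import accumulate

# Inverse-CDF sampling over full assignments: partition [0,1) into intervals
# whose lengths are the assignment probabilities, draw u ~ Uniform[0,1), and
# return the assignment whose interval contains u.
assignments = [(0, 0), (0, 1), (1, 0), (1, 1)]
probs       = [0.10, 0.10, 0.06, 0.74]      # a hypothetical P(X1, X2)

def sample_assignment(rng=random):
    u = rng.random()
    for a, right in zip(assignments, accumulate(probs)):
        if u < right:                        # u fell in this interval
            return a
    return assignments[-1]                   # guard against rounding
```

The cumulative cut points here are exactly 0.10, 0.20, 0.26, 1.00, matching the figure's intervals.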
Rao-Blackwellisation (Liu, Ch. 2.3): for the same number of samples $T$, averaging exact conditional expectations never increases variance, since $var\{E[f(X) \mid Y]\} \le var\{f(X)\}$.
[Figure: sampling schemes over variables X, Y, Z — integrating some variables out analytically gives faster convergence]
$c_1^{t+1} \sim P(c_1 \mid c_2^{t}, c_3^{t}, \ldots, c_K^{t})$
$c_2^{t+1} \sim P(c_2 \mid c_1^{t+1}, c_3^{t}, \ldots, c_K^{t})$
$c_3^{t+1} \sim P(c_3 \mid c_1^{t+1}, c_2^{t+1}, c_4^{t}, \ldots, c_K^{t})$
$\ldots$
$c_i^{t+1} \sim P(c_i \mid c_1^{t+1}, \ldots, c_{i-1}^{t+1}, c_{i+1}^{t}, \ldots, c_K^{t})$
$c_i^{t+1} \sim P(C_i \mid c^t \setminus c_i)$
Cutset $c = \{x_2, x_5\}$, evidence $E = \{X_9\}$.
[Figure: the network over $X_1, \ldots, X_9$; conditioning on the cutset leaves a singly connected network]
$P(x_2, x_5, x_9)$ — can be computed using Bucket Elimination (probability of evidence); the computation complexity is O(N).
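The resulting Rao-Blackwellised (mixture) estimator can be sketched as follows. The exact conditionals $P(x_i \mid c, e)$, which bucket elimination would supply in the real scheme, are replaced here by a made-up lookup table over a binary cutset.

```python
# Rao-Blackwellised (mixture) estimate: average exact conditionals
# P(x_i | c^t, e) over cutset samples c^t instead of counting indicator hits.
# 'cond' stands in for the bucket-elimination computation; its values are
# hypothetical.
cond = {0: 0.25, 1: 0.90}        # pretend exact P(X_i = 1 | C = c, e)

def mixture_estimate(cutset_samples):
    """hat P(x_i | e) = (1/T) * sum_t P(x_i | c^t, e)."""
    return sum(cond[c] for c in cutset_samples) / len(cutset_samples)

# Three cutset samples c^1, c^2, c^3, as in the worked example below:
estimate = mixture_estimate([0, 1, 1])      # (0.25 + 0.90 + 0.90) / 3
```

Because each term is an exact conditional rather than a 0/1 indicator, the estimator has lower variance than the plain histogram for the same number of samples.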
Each sampling distribution is obtained by normalizing joint probabilities computed by bucket elimination, e.g. $P(x_2 \mid x_3, x_9) = \frac{P(x_2, x_3, x_9)}{\sum_{x_2} P(x_2, x_3, x_9)}$.
For a cutset variable $C_i$: $\hat{P}(c_i \mid e) = \frac{1}{T}\sum_{t=1}^{T} P(c_i \mid c^t \setminus c_i, e)$ — computed while generating sample t using bucket tree elimination.
For any other variable $X_i$: $\hat{P}(x_i \mid e) = \frac{1}{T}\sum_{t=1}^{T} P(x_i \mid c^t, e)$ — computed after generating sample t using bucket tree elimination.
The estimate is computed for every value $c_i \in D(C_i)$ in a cutset variable's domain.
Sample 1, Sample 2, Sample 3:
$\hat{P}(x_2 \mid x_9) = \frac{1}{3}\left[P(x_2 \mid x_5^{1}, x_9) + P(x_2 \mid x_5^{2}, x_9) + P(x_2 \mid x_5^{3}, x_9)\right]$
[Figure: the network over $X_1, \ldots, X_9$]
$c^1 = \{x_2^1, x_5^1\}$: compute $P(x_3 \mid x_2^1, x_5^1, x_9)$
$c^2 = \{x_2^2, x_5^2\}$: compute $P(x_3 \mid x_2^2, x_5^2, x_9)$
$c^3 = \{x_2^3, x_5^3\}$: compute $P(x_3 \mid x_2^3, x_5^3, x_9)$
$\hat{P}(x_3 \mid x_9) = \frac{1}{3}\left[P(x_3 \mid x_2^1, x_5^1, x_9) + P(x_3 \mid x_2^2, x_5^2, x_9) + P(x_3 \mid x_2^3, x_5^3, x_9)\right]$
[Figure: the network over $X_1, \ldots, X_9$]
MSE vs. #samples (left) and time (right). Ergodic, |X|=54, D(Xi)=2, |C|=15, |E|=3. Exact time = 30 sec using cutset conditioning.
[Figure: CPCS54, n=54, |C|=15, |E|=3 — MSE of Cutset vs. Gibbs over 1000-5000 samples (left) and 5-25 sec (right)]
MSE vs. #samples (left) and time (right). Non-ergodic (1 deterministic CPT entry), |X|=179, |C|=8, 2 <= D(Xi) <= 4, |E|=35. Exact time = 122 sec using cutset conditioning.
[Figure: CPCS179, n=179, |C|=8, |E|=35 — MSE of Cutset vs. Gibbs over 100-4000 samples (left) and 20-80 sec (right)]
MSE vs. #samples (left) and time (right). Ergodic, |X|=360, D(Xi)=2, |C|=21, |E|=36. Exact time > 60 min using cutset conditioning; exact values obtained via Bucket Elimination.
[Figure: CPCS360b, n=360, |C|=21, |E|=36 — MSE of Cutset vs. Gibbs over 200-1000 samples (left) and 1-60 sec (right)]
MSE vs. #samples (left) and time (right). |X|=100, D(Xi)=2, |C|=13, |E|=15-20. Exact time = 30 sec using cutset conditioning.
[Figure: RANDOM, n=100, |C|=13, |E|=15-20 — MSE of Cutset vs. Gibbs over 200-1200 samples (left) and 1-11 sec (right)]
Cutset Transforms Non-Ergodic Chain to Ergodic
MSE vs. time. Non-ergodic, |X|=100, D(Xi)=2, |C|=13-16, |E|=50. Sample the ergodic subspace U = {U1, U2, ..., Uk}. Exact time = 50 sec using cutset conditioning.
[Figure: a coding network with inputs u1-u4, parities p1-p4, and transmitted bits x1-x4, y1-y4. Coding networks, n=100, |C|=12-14 — MSE of IBP, Gibbs, and Cutset over 10-60 sec]
MSE vs. #samples (left) and time (right). Non-ergodic, |X|=56, |C|=5, 2 <= D(Xi) <= 11, |E|=0. Exact time = 2 sec using loop-cutset conditioning.
[Figure: HailFinder, n=56, |C|=5, |E|=1 — MSE of Cutset vs. Gibbs over 1-10 sec (left) and 500-1500 samples (right)]
MSE vs. time. Ergodic, |X|=360, |C|=26, D(Xi)=2. Exact time = 50 min using BTE.
[Figure: cpcs360b, N=360, |E|=[20-34], w*=20 — MSE of Gibbs, IBP, and cutset sampling with |C|=26, fw=3 and |C|=48, fw=2, over 200-1600 sec]
$\hat{P}(e) = \frac{1}{T}\sum_{t=1}^{T} w^t$, where $w^t = \frac{P(c^t, e)}{Q(c^t)}$
$\hat{P}(c_i \mid e) \propto \frac{1}{T}\sum_{t=1}^{T} w^t \, \delta(c_i, c^t)$
$\hat{P}(x_i \mid e) = \frac{1}{T}\sum_{t=1}^{T} w^t \, P(x_i \mid c^t, e)$
(Gogate & Dechter, 2005) and (Bidyuk & Dechter, 2006)
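A minimal sketch of the first estimator above: sample the cutset from a proposal Q, weight by $w = P(c, e)/Q(c)$, and average the weights to estimate $P(e)$. The table $P(c, e)$, which bucket elimination would compute exactly in the real scheme, is a made-up toy here over a binary cutset.

```python
import random

# Likelihood weighting on a (binary) cutset: hat P(e) = (1/T) sum_t w^t,
# with w^t = P(c^t, e) / Q(c^t).  Both tables are hypothetical.
P_ce = {0: 0.12, 1: 0.28}        # pretend exact P(C = c, e)
Q    = {0: 0.5, 1: 0.5}          # proposal over the cutset

def estimate_evidence(T, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(T):
        c = 0 if rng.random() < Q[0] else 1
        total += P_ce[c] / Q[c]  # importance weight w^t
    return total / T             # hat P(e)

# The true P(e) is 0.12 + 0.28 = 0.40; the estimate converges to it.
estimate = estimate_evidence(50_000)
```

Because the cutset is low-dimensional and each $P(c, e)$ is exact, the weights have far lower variance than weighting full-network samples.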
Importance Sampling vs. Gibbs Sampling
[Figure: cpcs360b, N=360, |LC|=26, w*=21, |E|=15 — MSE vs. time (2-14 sec) for LW, AIS-BN, Gibbs, LCS, and IBP]
LW – likelihood weighting; LCS – likelihood weighting on a cutset
[Figure: cpcs422b, N=422, |LC|=47, w*=22, |E|=28 — MSE vs. time (10-60 sec) for LW, AIS-BN, Gibbs, LCS, and IBP]
[Figure: coding, N=200, P=3, |LC|=26, w*=21 — MSE vs. time (2-10 sec) for LW, AIS-BN, Gibbs, LCS, and IBP]