Sampling Techniques for Probabilistic and Deterministic Graphical Models
ICS 276, Spring 2018
Bozhena Bidyuk, Rina Dechter
Reading: Darwiche, Chapter 15; cutset-sampling paper (posted); related papers
slides11b 828X 2019
Markov Chain Examples
[Figure: a Markov chain trajectory x^1, x^2, x^3, x^4 over states {1, 2, 3}, generated by a transition matrix P(X)]
[Figure: a two-state weather chain with transition matrix P(X); sample trajectory: rain, rain, rain, rain, sun]
Gibbs Sampling as a Markov Chain
[Figure: one Gibbs transition — the state x^t = (x_1^t, x_2^t, x_3^t) moves to x^{t+1} = (x_1^{t+1}, x_2^{t+1}, x_3^{t+1}) by resampling the variables one at a time, each x_i taking values in its domain D(X_i)]
(Liu, Ch. 12, p. 249, Def. 12.1.1)
The recurrent states in a finite-state chain are positive recurrent.
[Equations: for an ergodic chain, the n-step transition probabilities P^n(x^0, x) converge, as n → ∞, to the stationary distribution π(x)]
Ergodic averages: estimate expectations by the sample mean (1/T) Σ_{t=1..T} f(x^t), which converges to E_π[f] as T grows.
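To make the convergence statement concrete, here is a minimal sketch (not from the slides): a two-state chain with an assumed transition matrix, whose empirical state frequencies approach the stationary distribution π = (5/6, 1/6).

```python
# Minimal sketch (not from the slides): a two-state Markov chain whose
# empirical state frequencies converge to the stationary distribution pi.
import random

# Transition matrix P[i][j] = P(next = j | current = i); assumed values.
P = [[0.9, 0.1],
     [0.5, 0.5]]
# The stationary distribution solves pi = pi P: here pi = (5/6, 1/6).

def run_chain(T, x0=0):
    counts = [0, 0]
    x = x0
    for _ in range(T):
        x = 0 if random.random() < P[x][0] else 1
        counts[x] += 1
    return [c / T for c in counts]

print(run_chain(100_000))  # approaches [0.833..., 0.166...]
```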
Gibbs transition: x_i^{t+1} ∼ P(x_i | x_1^{t+1}, …, x_{i-1}^{t+1}, x_{i+1}^t, …, x_n^t)
The process of Gibbs sampling can be understood as a random walk in the space of all instantiations X = x (remember the drunkard's walk): in one step we can reach instantiations that differ from the current one in the value assignment of at most one variable (assuming a randomized choice of variable X_i).
Ordered Gibbs Sampler — process all variables in some order:
x_1^{t+1} ∼ P(x_1 | x_2^t, x_3^t, …, x_N^t)
x_2^{t+1} ∼ P(x_2 | x_1^{t+1}, x_3^t, …, x_N^t)
…
x_i^{t+1} ∼ P(x_i | x_1^{t+1}, …, x_{i-1}^{t+1}, x_{i+1}^t, …, x_N^t)
…
x_N^{t+1} ∼ P(x_N | x_1^{t+1}, …, x_{N-1}^{t+1})
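A minimal sketch of this ordered sampler (not the slides' code), on a hypothetical chain network A → B → C with assumed CPTs; the conditional P(var | all others) is obtained here by brute-force renormalization of the joint, which is feasible only for tiny networks.

```python
# A minimal sketch of the ordered Gibbs sampler (not the slides' code).
import random

P_A = {0: 0.6, 1: 0.4}                                  # P(A), assumed
P_B_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}      # P(B | A), assumed
P_C_B = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}      # P(C | B), assumed

def joint(s):
    return P_A[s['A']] * P_B_A[s['A']][s['B']] * P_C_B[s['B']][s['C']]

def gibbs(T, evidence):
    state = {'A': 0, 'B': 0, 'C': 0, **evidence}        # arbitrary start
    free = [v for v in ('A', 'B', 'C') if v not in evidence]
    samples = []
    for _ in range(T):
        for var in free:                                # fixed variable order
            w = []
            for val in (0, 1):                          # score each value of var
                state[var] = val
                w.append(joint(state))                  # unnormalized P(val | rest)
            state[var] = 0 if random.random() * (w[0] + w[1]) < w[0] else 1
        samples.append(dict(state))
    return samples

samples = gibbs(20000, evidence={'C': 1})
print(sum(s['A'] for s in samples) / len(samples))      # histogram of P(A=1 | C=1)
```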
Gibbs Sampling in Bayesian Networks
P(x_i | x^t \ x_i) = P(x_i | markov_i^t), where
P(x_i | markov_i) ∝ P(x_i | pa_i) · Π_{j ∈ ch(i)} P(x_j | pa_j)
Given its Markov blanket (parents, children, and children's other parents), X_i is independent of all other nodes.
Computation is linear in the size of the Markov blanket!
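A sketch of that local computation follows; the helper names and CPT tables are assumptions, not the slides' notation. Only X_i's CPT and its children's CPTs are evaluated, so the cost is linear in the size of the Markov blanket.

```python
# Sketch: P(x_i | markov_i) ∝ P(x_i | pa_i) * prod_{j in ch(i)} P(x_j | pa_j).
def blanket_conditional(var, values, own_cpt, child_cpts, state):
    """Return [P(var = v | Markov blanket) for v in values].
    own_cpt(s)  -> P(s[var] | its parents, read from assignment s)
    child_cpts  -> list of functions s -> P(s[child] | its parents)"""
    scores = []
    for v in values:
        s = {**state, var: v}
        w = own_cpt(s)
        for cpt in child_cpts:
            w *= cpt(s)
        scores.append(w)
    z = sum(scores)
    return [w / z for w in scores]

# For B in a chain A -> B -> C, the blanket of B is {A, C}: only P(B | A)
# and P(C | B) are touched (assumed CPTs).
P_B_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
P_C_B = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}
print(blanket_conditional('B', (0, 1),
                          own_cpt=lambda s: P_B_A[s['A']][s['B']],
                          child_cpts=[lambda s: P_C_B[s['B']][s['C']]],
                          state={'A': 0, 'C': 1}))     # -> [0.28, 0.72]
```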
Gibbs step: x_i^{t+1} ← a value drawn from P(X_i | markov_i^t)
Example: Gibbs Sampling
[Figure: Bayesian network over X = {X_1, …, X_9}, evidence E = {X_9}]
Resample the unobserved variables one at a time, e.g.:
x_1^{t+1} ∼ P(x_1 | x_2^t, …, x_8^t, x_9)
x_2^{t+1} ∼ P(x_2 | x_1^{t+1}, x_3^t, …, x_8^t, x_9)
Estimating the posterior marginals from the samples:
mixture estimator:   P̂(x_i | e) = (1/T) Σ_{t=1..T} P(x_i | x^t \ x_i, e)
histogram estimator: P̂(x_i | e) = (1/T) Σ_{t=1..T} δ(x_i, x_i^t), where δ is the Dirac delta function
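A short sketch contrasting the two estimators (the function inputs are assumptions): `samples` is a list of full assignments x^t, and `conditional(s)` evaluates the exact distribution P(X_i | x^t \ x_i, e), standing in for the Markov-blanket computation above.

```python
def histogram_estimate(samples, var, value):
    # (1/T) * sum_t delta(value, x_i^t)
    return sum(s[var] == value for s in samples) / len(samples)

def mixture_estimate(samples, conditional, value):
    # (1/T) * sum_t P(x_i = value | x^t \ x_i, e)
    return sum(conditional(s)[value] for s in samples) / len(samples)
```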
slides11b 828X 2019
[Equation: a weighted sample average, Σ_{t=1..T} w^t f(x^t) / Σ_{t=1..T} w^t, built from importance weights w^t]
Multiple chains: run M independent chains and average their estimates, P̂ = (1/M) Σ_{m=1..M} P̂_m, where each P̂_m is computed from the K samples of chain m.
Variance Reduction (Gelfand and Smith, 1990; Smith and Roberts, 1993; Tierney, 1994)
Two schemes for improving the estimates of P(X | e):
– Blocking
– Rao-Blackwellisation
[Figure: blocking example — drawing the pair (X1, X2) jointly from P(X1, X2) instead of from the full joint P(X1, X2, X3, X4); the assignments 00, 01, 10, 11 are mapped onto cumulative-probability intervals 0–0.1, 0.1–0.2, 0.2–0.26, …, 1 of the unit interval]
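A sketch of that cumulative-interval draw for a block of variables; the block probabilities below echo the figure's intervals (0–0.1, 0.1–0.2, 0.2–0.26, …, 1) and are otherwise assumptions.

```python
import random

block_dist = {(0, 0): 0.10, (0, 1): 0.10, (1, 0): 0.06, (1, 1): 0.74}

def sample_block(dist):
    """Map a uniform draw u in [0, 1) onto the cumulative intervals of dist."""
    u, acc = random.random(), 0.0
    for assignment, p in dist.items():
        acc += p
        if u < acc:
            return assignment
    return assignment  # guard against floating-point round-off
```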
Rao-Blackwellisation (Liu, Ch. 2.3): conditioning never increases variance, Var{E[h(X) | Y]} ≤ Var{h(X)}, so over T samples an estimator that averages exact conditional probabilities has variance no larger than the corresponding histogram estimator.
Rao-Blackwellised estimators put into practice Liu's principle: "Carry out analytical computation as much as possible." (Liu, Ch. 2.5.5)
[Figure: collapsing example over variables X, Y, Z — integrating a variable out analytically and sampling only the rest gives faster convergence]
Cutset sampling — ordered Gibbs over the cutset C = {C_1, …, C_K}:
c_i^{t+1} ∼ P(c_i | c_1^{t+1}, …, c_{i-1}^{t+1}, c_{i+1}^t, …, c_K^t, e)
Cutset sampling step: c_i^{t+1} ← a value drawn from P(C_i | c^t \ c_i, e)
Example: cutset sampling with c = {x_2, x_5}, E = {X_9}
[Figure: network over X_1, …, X_9; instantiating the cutset leaves a singly connected network]
P(x_2, x_5, x_9) can be computed using bucket elimination (probability of evidence); the computation complexity is O(N).
[Equations: the sampling distribution of a cutset variable, e.g. P(x_2 | x_5^t, x_9), is obtained by computing the joints P(x_2, x_5^t, x_9) with bucket elimination over the conditioned network and normalizing over D(X_2)]
Estimating the marginals:
for C_i ∈ C:  P̂(c_i | e) = (1/T) Σ_{t=1..T} P(c_i | c^t \ c_i, e) — computed while generating sample t using bucket tree elimination
for X_i ∉ C:  P̂(x_i | e) = (1/T) Σ_{t=1..T} P(x_i | c^t, e) — computed after generating sample t using bucket tree elimination
Each conditional is a normalized joint: P(c_i | c^t \ c_i, e) = P(c_i, c^t \ c_i, e) / Σ_{c_i ∈ D(C_i)} P(c_i, c^t \ c_i, e)
Example: estimating P(x_2 | x_9)
[Figure: the network; Sample 1, Sample 2, Sample 3]
Each sample t draws x_2^t from P(x_2 | x_5, x_9) given the current value of x_5; the estimate averages the exact conditionals:
P̂(x_2 | x_9) = (1/3) [ P(x_2 | x_5^1, x_9) + P(x_2 | x_5^2, x_9) + P(x_2 | x_5^3, x_9) ]
Example: estimating P(x_3 | x_9) for a variable outside the cutset
[Figure: the network; Sample 1, Sample 2, Sample 3]
c^1 = {x_2^1, x_5^1}: compute P(x_3 | x_2^1, x_5^1, x_9)
c^2 = {x_2^2, x_5^2}: compute P(x_3 | x_2^2, x_5^2, x_9)
c^3 = {x_2^3, x_5^3}: compute P(x_3 | x_2^3, x_5^3, x_9)
P̂(x_3 | x_9) = (1/3) [ P(x_3 | x_2^1, x_5^1, x_9) + P(x_3 | x_2^2, x_5^2, x_9) + P(x_3 | x_2^3, x_5^3, x_9) ]
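A sketch of this Rao-Blackwellised cutset estimate; the exact-inference call is a hypothetical stand-in for bucket elimination.

```python
def estimate_noncutset(cutset_samples, exact_conditional, value):
    """P_hat(x_i = value | e) = (1/T) * sum_t P(x_i = value | c^t, e)."""
    T = len(cutset_samples)
    return sum(exact_conditional(c)[value] for c in cutset_samples) / T

# With three samples c^t = (x_2^t, x_5^t) and exact_conditional(c) returning
# P(X_3 | c, x_9) via bucket elimination, this reproduces
#   P_hat(x_3 | x_9) = (1/3) * sum_t P(x_3 | x_2^t, x_5^t, x_9).
```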
Results: CPCS54 — MSE vs. #samples (left) and vs. time (right)
Ergodic, |X| = 54, D(Xi) = 2, |C| = 15, |E| = 3; exact time = 30 sec using cutset conditioning.
[Plots: CPCS54, n = 54 — cutset sampling vs. Gibbs sampling over 1000–5000 samples and 5–25 sec]
Results: CPCS179 — MSE vs. #samples (left) and vs. time (right)
Non-ergodic (one deterministic CPT entry), |X| = 179, |C| = 8, 2 ≤ D(Xi) ≤ 4, |E| = 35; exact time = 122 sec using cutset conditioning.
[Plots: CPCS179, n = 179 — cutset sampling vs. Gibbs sampling over 100–4000 samples and 20–80 sec]
Results: CPCS360b — MSE vs. #samples (left) and vs. time (right)
Ergodic, |X| = 360, D(Xi) = 2, |C| = 21, |E| = 36; exact time > 60 min using cutset conditioning; exact values obtained via bucket elimination.
[Plots: CPCS360b, n = 360 — cutset sampling vs. Gibbs sampling over 200–1000 samples and 1–60 sec]
Results: RANDOM networks — MSE vs. #samples (left) and vs. time (right)
|X| = 100, D(Xi) = 2, |C| = 13, |E| = 15–20; exact time = 30 sec using cutset conditioning.
[Plots: RANDOM, n = 100 — cutset sampling vs. Gibbs sampling over 200–1200 samples and 1–11 sec]
Cutset Transforms a Non-Ergodic Chain into an Ergodic One
Results: coding networks — MSE vs. time
Non-ergodic, |X| = 100, D(Xi) = 2, |C| = 13–16, |E| = 50; sample the ergodic subspace U = {U1, U2, …, Uk}; exact time = 50 sec using cutset conditioning.
[Figure: coding network with inputs u1–u4, code bits x1–x4, parity bits p1–p4, outputs y1–y4]
[Plot: coding networks, n = 100, |C| = 12–14 — IBP vs. Gibbs vs. cutset sampling over 10–60 sec]
Results: HailFinder — MSE vs. #samples (left) and vs. time (right)
Non-ergodic, |X| = 56, |C| = 5, 2 ≤ D(Xi) ≤ 11, |E| = 0; exact time = 2 sec using loop-cutset conditioning.
[Plots: HailFinder, n = 56, |C| = 5, |E| = 1 — cutset sampling vs. Gibbs sampling over 500–1500 samples and 1–10 sec]
Results: cpcs360b — MSE vs. time
Ergodic, |X| = 360, |C| = 26, D(Xi) = 2, |E| = 20–34, w* = 20; exact time = 50 min using BTE.
[Plot: Gibbs vs. IBP vs. cutset sampling with |C| = 26, fw = 3 and |C| = 48, fw = 2, over 200–1600 sec]
Likelihood weighting on a cutset (Gogate & Dechter, 2005; Bidyuk & Dechter, 2006):
P̂(e) = (1/T) Σ_{t=1..T} P(c^t, e) / Q(c^t) = (1/T) Σ_{t=1..T} w^t
P̂(c_i | e) ∝ (1/T) Σ_{t=1..T} w^t δ(c_i, c^t)
P̂(x_i | e) ∝ (1/T) Σ_{t=1..T} w^t P(x_i | c^t, e)
Generating sample t over the ordered variables Z_1, …, Z_n:
For i = 1 to n do:
  If Z_i ∈ E then z_i^t ← its evidence value
  Else sample z_i^t from Q(Z_i | z_1^t, …, z_{i-1}^t)
  End If
End For
The required conditionals for sample t are computed using bucket tree elimination; the number of instances K is chosen based on the memory available.
The cutset proposal is never worse than the full-space one: KL[P(C|e), Q(C)] ≤ KL[P(X|e), Q(X)]
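A sketch of the weighted estimate these formulas define; every helper below is an assumption standing in for the slides' bucket-tree-elimination machinery.

```python
def lw_on_cutset(T, sample_from_Q, Q_prob, joint_with_evidence,
                 conditional_given_cutset, value):
    """Estimate P(x_i = value | e) with weights w^t = P(c^t, e) / Q(c^t)."""
    num, wsum = 0.0, 0.0
    for _ in range(T):
        c = sample_from_Q()                            # c^t ~ Q(C)
        w = joint_with_evidence(c) / Q_prob(c)         # w^t = P(c^t, e) / Q(c^t)
        num += w * conditional_given_cutset(c)[value]  # w^t * P(x_i | c^t, e)
        wsum += w                                      # accumulates T * P_hat(e)
    return num / wsum  # equals the ratio of the slides' two (1/T)-averages
```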
Importance Sampling vs. Gibbs Sampling
Results: cpcs360b, N = 360, |LC| = 26, w* = 21, |E| = 15 — MSE vs. time
[Plot: LW, AIS-BN, Gibbs, LCS, and IBP over 2–14 sec; MSE from 1e-5 to 1e-2]
LW – likelihood weighting; LCS – likelihood weighting on a cutset
Results: cpcs422b, N = 422, |LC| = 47, w* = 22, |E| = 28 — MSE vs. time
[Plot: LW, AIS-BN, Gibbs, LCS, and IBP over 10–60 sec; MSE from 1e-5 to 1e-2]
LW – likelihood weighting; LCS – likelihood weighting on a cutset
Results: coding networks, N = 200, P = 3, |LC| = 26, w* = 21 — MSE vs. time
[Plot: LW, AIS-BN, Gibbs, LCS, and IBP over 2–10 sec; MSE from 1e-5 to 1e-1]
LW – likelihood weighting; LCS – likelihood weighting on a cutset