Sampling Techniques for Probabilistic and Deterministic Graphical models
ICS 276, Spring 2017 Bozhena Bidyuk Rina Dechter
Reading: Darwiche, chapter 15, and related papers
Overview:
1. Probabilistic reasoning / graphical models
2. Importance sampling
– Bayesian network, constraint networks, mixed network
– using inference, search, and hybrids
– tree-width, cycle-cutset, w-cutset
Belief updating:

P(e) = Σ_{X \ E} ∏_{i=1..n} P(x_i | pa_i) |_{E=e}

P(x_i | e) = P(x_i, e) / P(e) = [ Σ_{X \ (E ∪ {X_i})} ∏_{j=1..n} P(x_j | pa_j) |_{E=e} ] / [ Σ_{X \ E} ∏_{j=1..n} P(x_j | pa_j) |_{E=e} ]
If the graph is dense (high treewidth), then exact inference is infeasible and we resort to sampling.
A sample is a full assignment x^t = (x_1^t, x_2^t, …, x_n^t). To sample X from P(X) = (0.3, 0.7):
– draw random number r ∈ [0, 1]
– If r < 0.3, then set X = 0
– Else set X = 1
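The recipe above can be sketched in Python (a minimal illustration, not from the slides):

```python
import random

random.seed(0)

# Sample a binary X from P(X) = (0.3, 0.7): draw r uniformly from [0, 1],
# return 0 if r < 0.3, else 1.
def sample_x():
    r = random.random()
    return 0 if r < 0.3 else 1

# Empirical check: the fraction of zeros should approach P(X = 0) = 0.3.
counts = [0, 0]
for _ in range(100_000):
    counts[sample_x()] += 1
frac_zero = counts[0] / 100_000
```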
P(X) = P(X_1) · P(X_2 | X_1) · … · P(X_n | X_1, …, X_{n−1})
– Likelihood Sampling – Choosing a Proposal Distribution
– Metropolis-Hastings – Gibbs sampling
Input: Bayesian network X = {X1, …, XN}, N = #nodes, T = #samples
Output: T samples
Process nodes in topological order – first process the ancestors of a node, then the node itself:
1. For t = 1 to T
2.   For i = 1 to N
3.     Xi ← sample xi^t from P(xi | pai)
Example: P(X1, X2, X3, X4) = P(X1) P(X2 | X1) P(X3 | X1) P(X4 | X2, X3)

No evidence // generate sample k:
1. Sample x1 from P(X1)
2. Sample x2 from P(X2 | X1 = x1)
3. Sample x3 from P(X3 | X1 = x1)
4. Sample x4 from P(X4 | X2 = x2, X3 = x3)
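Forward sampling on a network of this shape can be sketched as follows (the CPT numbers are made up for illustration):

```python
import random

random.seed(0)

# Forward (logic) sampling on a toy network
# P(X1, X2, X3, X4) = P(X1) P(X2|X1) P(X3|X1) P(X4|X2, X3).
p1 = 0.6                                        # P(X1 = 1)
p2 = {0: 0.3, 1: 0.8}                           # P(X2 = 1 | X1)
p3 = {0: 0.5, 1: 0.2}                           # P(X3 = 1 | X1)
p4 = {(0, 0): 0.1, (0, 1): 0.4,
      (1, 0): 0.7, (1, 1): 0.9}                 # P(X4 = 1 | X2, X3)

def bern(p):
    return 1 if random.random() < p else 0

def forward_sample():
    # Topological order: parents are sampled before their children.
    x1 = bern(p1)
    x2 = bern(p2[x1])
    x3 = bern(p3[x1])
    x4 = bern(p4[(x2, x3)])
    return (x1, x2, x3, x4)

samples = [forward_sample() for _ in range(100_000)]
frac_x1 = sum(s[0] for s in samples) / len(samples)
```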
Input: Bayesian network X = {X1, …, XN}, N = #nodes, E = evidence, T = #samples
Output: T samples consistent with E
1. For t = 1 to T
2.   For i = 1 to N
3.     Xi ← sample xi^t from P(xi | pai)
4.     If Xi ∈ E and the sampled value differs from the evidence, reject the sample: go to step 1.
Evidence X3 = 0 // generate sample k:
1. Sample x1 from P(X1)
2. Sample x2 from P(X2 | x1)
3. Sample x3 from P(X3 | x1)
4. If x3 ≠ 0, reject the sample and start from 1
5. Sample x4 from P(X4 | x2, x3)
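The rejection step can be sketched on the same toy network used above (CPT numbers are made up for illustration), with evidence X3 = 0:

```python
import random

random.seed(0)

# Rejection sampling with evidence X3 = 0 on a toy network
# P(X1) P(X2|X1) P(X3|X1) P(X4|X2, X3).
p1 = 0.6                                        # P(X1 = 1)
p2 = {0: 0.3, 1: 0.8}                           # P(X2 = 1 | X1)
p3 = {0: 0.5, 1: 0.2}                           # P(X3 = 1 | X1)
p4 = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.7, (1, 1): 0.9}

def bern(p):
    return 1 if random.random() < p else 0

def rejection_sample():
    while True:
        x1 = bern(p1)
        x2 = bern(p2[x1])
        x3 = bern(p3[x1])
        if x3 != 0:          # inconsistent with evidence X3 = 0:
            continue         # reject and restart from step 1
        x4 = bern(p4[(x2, x3)])
        return (x1, x2, x3, x4)

samples = [rejection_sample() for _ in range(50_000)]
frac_x1 = sum(s[0] for s in samples) / len(samples)
# Exact posterior for comparison:
# P(X1=1 | X3=0) = 0.8*0.6 / (0.8*0.6 + 0.5*0.4) = 0.48 / 0.68
```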
Expected value: Given a probability distribution P(X) and a function g(X) defined over a set of variables X = {X1, X2, …, Xn}, the expected value of g w.r.t. P is

E_P[g(X)] = Σ_x g(x) P(x)

Variance: The variance of g w.r.t. P is:

Var_P[g(X)] = Σ_x (g(x) − E_P[g(X)])² P(x)
– An estimator is a function of the samples. – It produces an estimate of the unknown parameter of the sampling distribution.
Given samples x^1, x^2, …, x^T from P, the Monte Carlo estimate is

ĝ = (1/T) Σ_{t=1..T} g(x^t)
– A distribution P(X) = (0.3, 0.7). – g(X) = 40 if X equals 0 = 50 if X equals 1.
ĝ = [40 · #(X = 0) + 50 · #(X = 1)] / #samples = (40 · 4 + 50 · 6) / 10 = 46
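The arithmetic of this estimate can be reproduced directly:

```python
# g(X) = 40 if X = 0 else 50, with 4 samples of X = 0 and 6 of X = 1.
samples = [0] * 4 + [1] * 6

def g(x):
    return 40 if x == 0 else 50

g_hat = sum(g(x) for x in samples) / len(samples)   # (40*4 + 50*6) / 10
```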
Let Z = X \ E, and let Q(Z) be a (proposal) distribution satisfying P(z, e) > 0 ⇒ Q(z) > 0. Then we can rewrite P(e) as:

P(e) = Σ_z P(z, e) = Σ_z [P(z, e) / Q(z)] Q(z) = E_Q[P(z, e) / Q(z)] = E_Q[w(z)],  where w(z) = P(z, e) / Q(z)

Monte Carlo estimate:

P̂(e) = (1/T) Σ_{t=1..T} w(z^t),  where z^t is generated from Q

P̂(e) → P(e) a.s. as T → ∞, and

Var_Q[P̂(e)] = Var_Q[(1/T) Σ_{t=1..T} w(z^t)] = Var_Q[w(z)] / T
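The estimator P̂(e) = (1/T) Σ_t w(z^t) can be sketched for a single binary Z (the target and proposal numbers below are made up for illustration):

```python
import random

random.seed(0)

# Unnormalized target P(z, e) and a uniform proposal Q over binary z.
P_ze = {0: 0.06, 1: 0.14}              # so the true P(e) = 0.20
Q = {0: 0.5, 1: 0.5}

T = 100_000
total = 0.0
for _ in range(T):
    z = 0 if random.random() < Q[0] else 1
    total += P_ze[z] / Q[z]            # w(z) = P(z, e) / Q(z)
p_e_hat = total / T                    # (1/T) * sum_t w(z^t)
```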
The quantity enclosed in the brackets is zero because the expected value of the estimator equals the expected value of g(X), i.e., the estimator is unbiased.
Idea: estimate numerator and denominator by IS.

P(x_i | e) = P(x_i, e) / P(e) = [ Σ_z δ_{x_i}(z) P(z, e) ] / [ Σ_z P(z, e) ] = E_Q[δ_{x_i}(z) w(z)] / E_Q[w(z)]

Ratio estimate:

P̂(x_i | e) = P̂(x_i, e) / P̂(e) = [ Σ_{k=1..T} δ_{x_i}(z^k) w(z^k) ] / [ Σ_{k=1..T} w(z^k) ]

where δ_{x_i}(z) is a delta function, which is 1 if z contains x_i and 0 otherwise.

Estimate is biased: E[P̂(x_i | e)] ≠ P(x_i | e).
– Harder to analyze – Liu suggests a measure called “Effective sample size”
But it is consistent: P̂(x_i | e) → P(x_i | e) as T → ∞, i.e.

lim_{T→∞} E_P[P̂(x_i | e)] = P(x_i | e)
– Q(Z) = Q(Z1) × Q(Z2 | Z1) × … × Q(Zn | Z1, …, Zn−1)
– Z1 ← Q(Z1) = (0.2, 0.8)
– Z2 ← Q(Z2 | Z1) = (0.1, 0.9; 0.2, 0.8)
– Z3 ← Q(Z3 | Z1, Z2) = Q(Z3) = (0.5, 0.5)
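Sampling from this factored proposal, using the tables above, can be sketched as:

```python
import random

random.seed(0)

# Factored proposal Q(Z) = Q(Z1) Q(Z2|Z1) Q(Z3), with Q(Z3|Z1,Z2) = Q(Z3).
q1 = {0: 0.2, 1: 0.8}
q2 = {0: {0: 0.1, 1: 0.9},             # Q(Z2 | Z1 = 0)
      1: {0: 0.2, 1: 0.8}}             # Q(Z2 | Z1 = 1)
q3 = {0: 0.5, 1: 0.5}

def draw(dist):
    # Inverse-CDF draw from a small discrete table.
    r, acc = random.random(), 0.0
    for value, prob in dist.items():
        acc += prob
        if r < acc:
            return value
    return value                        # guard against rounding

def sample_z():
    z1 = draw(q1)
    z2 = draw(q2[z1])                   # condition on the earlier draw
    z3 = draw(q3)
    return (z1, z2, z3)

zs = [sample_z() for _ in range(100_000)]
frac_z1_zero = sum(z[0] == 0 for z in zs) / len(zs)
```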
(Fung and Chang, 1990; Shachter and Peot, 1990)
Works well for likely evidence!
“Clamping” evidence + logic sampling + weighting samples by the evidence likelihood is an instance of importance sampling!
Sample in topological order over X! Clamp the evidence; sample xi ← P(Xi | pai); each P(Xi | pai) is a look-up in the CPT!
Q(X \ E) = ∏_{Xi ∈ X\E} P(Xi | pai, e)

P(X) = P(X1, …, Xn) = ∏_i P(Xi | pai)

Weights:

w(x^t) = P(x^t, e) / Q(x^t) = [ ∏_{Xi ∈ X\E} P(xi | pai, e) · ∏_{Ej ∈ E} P(ej | paj, e) ] / ∏_{Xi ∈ X\E} P(xi | pai, e) = ∏_{Ej ∈ E} P(ej | paj, e)

Example: Given a Bayesian network P(X1, X2, X3) = P(X1) P(X2 | X1) P(X3 | X1, X2) and evidence X2 = x2: Q(X1, X3) = P(X1) P(X3 | X1, x2). Given a sample (x1, x3), the weight is w = P(x2 | x1).

Notice: Q is another Bayesian network
Estimate P(e):

P̂(e) = (1/T) Σ_{t=1..T} w(x^t)

Estimate posterior marginals:

P̂(xi | e) = P̂(xi, e) / P̂(e) = [ Σ_{t=1..T} w(x^t) δ_{xi}(x^t) ] / [ Σ_{t=1..T} w(x^t) ]

where δ_{xi}(x^t) is 1 if xi^t = xi and zero otherwise.
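Likelihood weighting on the example network above can be sketched as follows (the CPT numbers are made up for illustration):

```python
import random

random.seed(0)

# Likelihood weighting on P(X1, X2, X3) = P(X1) P(X2|X1) P(X3|X1, X2)
# with evidence X2 = 1.
p1 = 0.3                                # P(X1 = 1)
p2 = {0: 0.6, 1: 0.1}                   # P(X2 = 1 | X1)
p3 = {(0, 0): 0.2, (0, 1): 0.5,
      (1, 0): 0.4, (1, 1): 0.9}         # P(X3 = 1 | X1, X2)

T = 200_000
num = {0: 0.0, 1: 0.0}
den = 0.0
for _ in range(T):
    x1 = 1 if random.random() < p1 else 0
    # Clamp the evidence X2 = 1; the weight is P(evidence | its parents).
    w = p2[x1]
    x3 = 1 if random.random() < p3[(x1, 1)] else 0
    den += w
    num[x1] += w

p_e_hat = den / T                             # estimate of P(X2 = 1)
post_x1 = {v: num[v] / den for v in (0, 1)}   # estimate of P(X1 | X2 = 1)
# Exact values: P(X2=1) = 0.7*0.6 + 0.3*0.1 = 0.45,
# P(X1=0 | X2=1) = 0.42 / 0.45.
```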
Var_Q[P̂(e)] = (1/T) Var_Q[w(z)] = (1/T) [ Σ_z P(z, e)² / Q(z) − P(e)² ]

To have a zero-variance estimator, choose Q(z) = P(z, e) / P(e) = P(z | e).
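The zero-variance claim is easy to check numerically: with Q(z) = P(z | e), every weight equals P(e) exactly (toy unnormalized target, numbers made up):

```python
# Unnormalized target P(z, e) over a single binary z.
P_ze = {0: 0.06, 1: 0.14}
p_e = sum(P_ze.values())                     # P(e) = 0.20
Q = {z: P_ze[z] / p_e for z in P_ze}         # Q(z) = P(z, e)/P(e) = P(z|e)
weights = [P_ze[z] / Q[z] for z in P_ze]     # w(z) = P(z, e)/Q(z)
```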
– Run bucket elimination on the problem along an ordering (XN, …, X1).
– Sample along the reverse ordering (X1, …, XN): at each variable Xi, recover the probability P(Xi | x1, …, xi−1) by referring to Xi's bucket.
Query: P(a | e) = P(a, e) / P(e)

P(a, e) = Σ_{c,b,d} P(a) P(c | a) P(b | a) P(d | b, a) P(e | b, c)

Elimination order: d, e, b, c.

bucket D: P(d | b, a) → fD(a, b) = Σ_d P(d | b, a)
bucket E: P(e | b, c) → fE(b, c) = P(e | b, c) (evidence e clamped)
bucket B: P(b | a), fD(a, b), fE(b, c) → fB(a, c) = Σ_b P(b | a) fD(a, b) fE(b, c)
bucket C: P(c | a), fB(a, c) → fC(a) = Σ_c P(c | a) fB(a, c)
bucket A: P(a), fC(a) → P(a, e) = P(a) fC(a)

Bucket tree: clusters (D,A,B), (E,B,C), (B,A,C), (C,A), (A), connected by the messages fD(a, b), fE(b, c), fB(a, c), fC(a).

Original functions and messages; time and space exp(w*).
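The bucket-elimination pass for P(a, e) can be sketched on a binary version of this network; the CPT numbers below are made up, and the result is checked against brute-force enumeration:

```python
from itertools import product

# A factor is (scope, table), with the table keyed by value tuples in
# scope order.  All variables are binary; evidence is E = 1.

def multiply(f1, f2):
    (s1, t1), (s2, t2) = f1, f2
    scope = list(dict.fromkeys(s1 + s2))
    table = {}
    for vals in product((0, 1), repeat=len(scope)):
        asg = dict(zip(scope, vals))
        table[vals] = (t1[tuple(asg[v] for v in s1)] *
                       t2[tuple(asg[v] for v in s2)])
    return (tuple(scope), table)

def sum_out(f, var):
    scope, t = f
    i = scope.index(var)
    out = {}
    for vals, p in t.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return (scope[:i] + scope[i + 1:], out)

pA = (('A',), {(0,): 0.6, (1,): 0.4})
pB = (('B', 'A'), {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8})
pC = (('C', 'A'), {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.9, (1, 1): 0.1})
pD = (('D', 'B', 'A'), {(0, 0, 0): 0.3, (1, 0, 0): 0.7, (0, 1, 0): 0.6,
                        (1, 1, 0): 0.4, (0, 0, 1): 0.8, (1, 0, 1): 0.2,
                        (0, 1, 1): 0.1, (1, 1, 1): 0.9})
pE = (('E', 'B', 'C'), {(0, 0, 0): 0.4, (1, 0, 0): 0.6, (0, 0, 1): 0.7,
                        (1, 0, 1): 0.3, (0, 1, 0): 0.2, (1, 1, 0): 0.8,
                        (0, 1, 1): 0.5, (1, 1, 1): 0.5})

# bucket D: f_D(b, a) = sum_d P(d | b, a)
fD = sum_out(pD, 'D')
# bucket E: clamp evidence E = 1 -> f_E(b, c)
fE = (('B', 'C'), {(b, c): pE[1][(1, b, c)]
                   for b, c in product((0, 1), repeat=2)})
# bucket B: f_B(a, c) = sum_b P(b|a) f_D(b, a) f_E(b, c)
fB = sum_out(multiply(multiply(pB, fD), fE), 'B')
# bucket C: f_C(a) = sum_c P(c|a) f_B(a, c)
fC = sum_out(multiply(pC, fB), 'C')
# bucket A: P(a, e) = P(a) f_C(a)
_, tA = multiply(pA, fC)
p_ae = {a: tA[(a,)] for a in (0, 1)}

# Brute-force check by enumerating the full joint with E = 1.
brute = {0: 0.0, 1: 0.0}
for a, b, c, d in product((0, 1), repeat=4):
    brute[a] += (pA[1][(a,)] * pB[1][(b, a)] * pC[1][(c, a)] *
                 pD[1][(d, b, a)] * pE[1][(1, b, c)])
```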
Algorithm elim-bel (Dechter 1996). Elimination operator: Σ_b.

bucket B: P(B|A), P(D|B,A), P(e|B,C)
bucket C: P(C|A), hB(A, D, C, e)
bucket D: hC(A, D, e)
bucket E: hD(A, e)
bucket A: P(A), hE(A)
→ P(e)
(Dechter 2002)

bucket B: P(B|A), P(D|B,A), P(e|B,C)
bucket C: P(C|A), hB(A, D, C, e)
bucket D: hC(A, D, e)
bucket E: hD(A, e)
bucket A: P(A), hE(A)

Sample in reverse order of elimination:
– Sample A: Q(A) ∝ P(A) hE(A); draw A = a.
– Evidence bucket E: ignore (E is fixed to e).
– Set A = a in the bucket D; sample D ← Q(D | a, e) ∝ hC(a, D, e).
– Set A = a, D = d in the bucket C; sample C ← Q(C | a, d, e) ∝ P(C | a) hB(a, d, C, e).
– Set A = a, D = d, C = c in the bucket B; sample B ← Q(B | a, d, c, e) ∝ P(B | a) P(d | B, a) P(e | B, c).
Mini-buckets: approximation of P(e). Space and time constraints: the maximum scope size of a newly generated function should be bounded by 2. BE generates a function having scope size 3, so it cannot be used. Instead, split bucket B into mini-buckets:

bucket B: { P(B|A), P(D|B,A) } → Σ_B → hB(A, D);  { P(e|B,C) } → Σ_B → hB(C, e)
bucket C: P(C|A), hB(C, e) → hC(A, e)
bucket D: hB(A, D) → hD(A)
bucket E: hC(A, e), hD(A) → hE(A)
bucket A: P(A), hE(A) → approximation of P(e)
Sampling is the same as in BE-sampling, except that now we construct Q from a randomly selected “mini-bucket”.
– A Generalized Belief Propagation scheme (Yedidia et al., 2002)
– (Dechter, Kask and Mateescu, 2002)
– Mini-buckets – IJGP – Both
– Some assignments generated are non-solutions
Initial proposal: Q^0(Z) = Q(Z1) Q(Z2 | pa(Z2)) … Q(Zn | pa(Zn))
For i = 1 to k do:
  Generate samples z^1, …, z^N from Q^{i−1}
  P̂_i(E = e) ← (1/N) Σ_{j=1..N} w(z^j)
  Update Q^i ← Q'(Q^{i−1}; z^1, …, z^N)
End For
Return P̂(E = e) = (1/k) Σ_{i=1..k} P̂_i(E = e)
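The loop above can be sketched for a single binary Z with a made-up unnormalized target; the update rule used here (moving Q toward the weighted estimate of P(Z | e)) is one common choice, since the slides leave the exact update Q' unspecified:

```python
import random

random.seed(0)

# Fixed unnormalized target P(z, e): true P(e) = 0.20, P(Z=1 | e) = 0.7.
P_ze = {0: 0.06, 1: 0.14}

q = 0.5                                 # initial proposal Q^0(Z = 1)
estimates = []
for i in range(5):                      # k = 5 rounds
    num = den = 0.0
    for _ in range(20_000):             # N samples per round from Q^{i-1}
        z = 1 if random.random() < q else 0
        w = P_ze[z] / (q if z == 1 else 1.0 - q)
        den += w
        num += w * z
    estimates.append(den / 20_000)      # P-hat_i(e)
    q = num / den                       # update Q toward P-hat(Z=1 | e)

p_e_hat = sum(estimates) / len(estimates)
```

Note that once q reaches P(Z=1 | e) = 0.7, the weights become constant and the per-round estimates have (near-)zero variance, which is exactly why adapting the proposal helps.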