Algorithms for Reasoning with Graphical Models
Slides Set 10: Bounded Inference Non-iteratively; Mini-Bucket Elimination
Rina Dechter
(Class Notes 8-9; Darwiche, Chapter 14)
slides10 828X 2019
Outline: Mini-bucket elimination
Sum-Inference, Max-Inference, Mixed-Inference:

Sum-inference:  P(e) = Σ_{X−var(e)} ∏_{i=1}^{n} P(x_i | x_{pa_i}, e)
Max-inference:  MPE = max_{X−var(e)} ∏_{j=1}^{n} P(x_j | x_{pa_j}, e)
Mixed-inference (marginal MAP):  MAP = max_Y Σ_{X−Y−var(e)} ∏_{j=1}^{n} P(x_j | x_{pa_j}, e)
Query: P(a|e) = α·P(a,e)

P(a,e) = Σ_{b,c,d} P(a)·P(b|a)·P(c|a)·P(d|a,b)·P(e|b,c)

Elimination order: d, e, b, c.
Bucket D: P(d|a,b)                      →  f_D(a,b) = Σ_d P(d|a,b)
Bucket E: P(e|b,c)                      →  f_E(b,c) = P(e|b,c)  (e observed)
Bucket B: P(b|a), f_D(a,b), f_E(b,c)    →  f_B(a,c) = Σ_b P(b|a)·f_D(a,b)·f_E(b,c)
Bucket C: P(c|a), f_B(a,c)              →  f_C(a) = Σ_c P(c|a)·f_B(a,c)
Bucket A: P(a), f_C(a)                  →  P(a,e) = P(a)·f_C(a)
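The bucket-elimination schedule above can be checked numerically. Below is a minimal sketch with made-up CPT values (the names P_a, f_D, etc. are just illustrative); it runs the same schedule for P(a, e=0) and compares against brute-force summation.

```python
# Bucket elimination for P(a, e=0) on the chain-of-buckets example,
# with randomly generated (hypothetical) CPTs; all variables binary.
import itertools, random

random.seed(0)
vals = [0, 1]

def norm(d):
    s = sum(d.values())
    return {k: v / s for k, v in d.items()}

P_a = norm({a: random.random() for a in vals})
P_b = {a: norm({b: random.random() for b in vals}) for a in vals}
P_c = {a: norm({c: random.random() for c in vals}) for a in vals}
P_d = {(a, b): norm({d: random.random() for d in vals}) for a in vals for b in vals}
P_e = {(b, c): norm({e: random.random() for e in vals}) for b in vals for c in vals}

def joint(a, b, c, d, e):
    return P_a[a] * P_b[a][b] * P_c[a][c] * P_d[(a, b)][d] * P_e[(b, c)][e]

e0 = 0  # observed evidence e = 0
# Buckets processed along order d, e, b, c:
f_D = {(a, b): sum(P_d[(a, b)][d] for d in vals) for a in vals for b in vals}
f_E = {(b, c): P_e[(b, c)][e0] for b in vals for c in vals}      # e clamped
f_B = {(a, c): sum(P_b[a][b] * f_D[(a, b)] * f_E[(b, c)] for b in vals)
       for a in vals for c in vals}
f_C = {a: sum(P_c[a][c] * f_B[(a, c)] for c in vals) for a in vals}
P_ae = {a: P_a[a] * f_C[a] for a in vals}

# Brute-force check: P(a, e=0) = sum over b, c, d of the joint.
brute = {a: sum(joint(a, b, c, d, e0)
                for b, c, d in itertools.product(vals, repeat=3)) for a in vals}
for a in vals:
    assert abs(P_ae[a] - brute[a]) < 1e-12
```

The elimination order only changes which intermediate functions appear; the final P(a,e) is the same.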
[Figure: bucket tree over clusters {D,A,B}, {E,B,C}, {B,A,C}, {C,A}, {A}, with original functions in the buckets and messages f_D(a,b), f_E(b,c), f_B(a,c), f_C(a) passed between them.]
Time and space: exp(w*).
Algorithm BE-mpe computes OPT, the optimal MPE value (Dechter 1996; Bertelè and Brioschi, 1977).
[Induced graph along the ordering A, D, E, C, B: w* = 4, the "induced width" (max clique size).]
BE-mpe processes buckets E, C, D, B, A, computes OPT = the optimal value, and returns an optimal configuration (a*, b*, c*, d*, e*).
Processing a bucket is exponential in the number of variables involved; the mini-bucket idea is to split a bucket into "mini-buckets" over smaller numbers of variables.
X:            0    1    2    3
f1(X):        1.0  2.0  3.0  4.0
f2(X):        1.0  2.0  2.0  0.0
F(X)=f1·f2:   1.0  4.0  6.0  0.0

max_X F(X) = 6.0  ≤  (max_X f1)·(max_X f2) = 4.0 × 2.0 = 8.0
Splitting a bucket into mini-buckets bounds the complexity: the cost of processing the bucket decreases exponentially, and maximizing each mini-bucket separately yields an upper bound.
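The splitting bound above can be stated in two lines of code, using the same made-up values as the table: for nonnegative functions, the max of a product is at most the product of the maxes.

```python
# max_x f1(x)*f2(x) <= (max_x f1(x)) * (max_x f2(x)): the mini-bucket bound.
f1 = [1.0, 2.0, 3.0, 4.0]
f2 = [1.0, 2.0, 2.0, 0.0]

exact = max(a * b for a, b in zip(f1, f2))   # the maximizers must agree: 6.0
bound = max(f1) * max(f2)                    # each max taken separately: 8.0
assert exact == 6.0 and bound == 8.0 and exact <= bound
```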
MBE-mpe example (ordering A, D, E, C, B; evidence e = 0):

Bucket B: [P(e|b,c)] [P(d|a,b), P(b|a)]  (two mini-buckets)
  λ_{B→C}(e,c) = max_b P(e|b,c);   λ_{B→D}(a,d) = max_b P(d|a,b)·P(b|a)
Bucket C: P(c|a), λ_{B→C}(e,c)  →  λ_{C→E}(a,e) = max_c P(c|a)·λ_{B→C}(e,c)
Bucket D: λ_{B→D}(a,d)          →  λ_{D→A}(a) = max_d λ_{B→D}(a,d)
Bucket E: e = 0, λ_{C→E}(a,e)   →  λ_{E→A}(a) = λ_{C→E}(a, e=0)
Bucket A: P(a), λ_{D→A}(a), λ_{E→A}(a)  →  U = max_a P(a)·λ_{D→A}(a)·λ_{E→A}(a)

U = upper bound [Dechter & Rish 2003]
Splitting bucket B amounts to renaming: the first mini-bucket uses a duplicate variable B', i.e., P(e|b',c), while the second keeps P(d|a,b), P(b|a). MBE solves exactly the relaxed network in which B is duplicated into B and B'. [Dechter and Rish, 1997; 2003]
Why an upper bound? Maximizing each mini-bucket independently can only increase the value:
max_b ∏_{g ∈ bucket} g  ≤  (max_{b'} ∏_{g ∈ mini-bucket 1} g) · (max_b ∏_{g ∈ mini-bucket 2} g)
A greedy configuration yields a lower bound, so L ≤ MPE ≤ U.
Greedy configuration (lower bound), assigning variables in reverse order using the functions in each bucket:
a* = argmax_a P(a)·λ_{D→A}(a)·λ_{E→A}(a)
e* = 0
d* = argmax_d λ_{B→D}(a*, d)
c* = argmax_c P(c|a*)·λ_{B→C}(e*, c)
b* = argmax_b P(e*|b, c*)·P(d*|a*, b)·P(b|a*)
return (a*, e*, d*, c*, b*); its probability is the lower bound L.
Before Splitting: Network N After Splitting: Network N'
Variables in different mini-buckets are renamed and duplicated (Kask et al., 2001; Geffner et al., 2007; Choi, Chavira and Darwiche, 2007).
Example: MBE-mpe(3) versus BE-mpe. The i-bound (here i = 3) caps the number of variables in a mini-bucket; the mini-bucket sizes across buckets E, C, D, B, A are 2, 3, 3, 3, 1. MBE-mpe(3) computes U ≥ OPT, with effective width w* = 2 versus w* = 4 for exact BE-mpe.
Min-sum example. Bucket C holds g(b,c), g(c,d), g(c,e), g(c,f); the remaining buckets hold g(d,b), g(d,f), g(b,e), g(b). Split bucket C into mini-buckets {g(b,c), g(c,d)} and {g(c,e), g(c,f)}:
μ_{C→D}(b,d) = min_c [g(b,c) + g(c,d)]
μ_{C→E}(e,f) = min_c [g(c,e) + g(c,f)]
Bucket D: g(d,b), g(d,f), μ_{C→D}(b,d)  →  μ_{D→F}(b,f)
Bucket E: g(b,e), μ_{C→E}(e,f)          →  μ_{E→F}(b,f)
Bucket F: μ_{D→F}(b,f), μ_{E→F}(b,f)    →  μ_{F→B}(b)
Bucket B: L = min_b [g(b) + μ_{F→B}(b)] = lower bound

Greedy configuration = upper bound:
b̂ = argmin_b g(b) + μ_{F→B}(b)
f̂ = argmin_f μ_{D→F}(b̂,f) + μ_{E→F}(b̂,f)
ê = argmin_e g(b̂,e) + μ_{C→E}(e,f̂)
d̂ = argmin_d μ_{C→D}(b̂,d) + g(d,b̂) + g(d,f̂)
ĉ = argmin_c g(b̂,c) + g(c,d̂) + g(c,ê) + g(c,f̂)
[Dechter and Rish, 2003]
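A minimal sketch of the bucket-C split above, with made-up cost tables: minimizing each mini-bucket separately can only decrease a min-sum value, so the mini-bucket combination is a lower bound on the exact elimination of c.

```python
# Min-sum mini-bucket bound for one bucket: split {g_bc, g_cd, g_ce, g_cf}
# into {g_bc, g_cd} and {g_ce, g_cf}, eliminate c in each part separately.
import itertools, random

random.seed(1)
D = [0, 1]
g_bc = {k: random.uniform(0, 5) for k in itertools.product(D, D)}
g_cd = {k: random.uniform(0, 5) for k in itertools.product(D, D)}
g_ce = {k: random.uniform(0, 5) for k in itertools.product(D, D)}
g_cf = {k: random.uniform(0, 5) for k in itertools.product(D, D)}

for b, d, e, f in itertools.product(D, repeat=4):
    exact = min(g_bc[b, c] + g_cd[c, d] + g_ce[c, e] + g_cf[c, f] for c in D)
    mu_CD = min(g_bc[b, c] + g_cd[c, d] for c in D)   # mini-bucket 1
    mu_CE = min(g_ce[c, e] + g_cf[c, f] for c in D)   # mini-bucket 2
    assert mu_CD + mu_CE <= exact + 1e-12             # lower bound holds
```

The two mini-buckets may pick different values of c, which is exactly the relaxation that makes the result a bound rather than the exact value.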
[Dechter and Rish, 1997], [Liu and Ihler, 2011], [Liu and Ihler, 2013]
MBE-bel approximates BE-bel, and MBE-map approximates BE-map. For summation the analogous bound is
Σ_x f(x)·g(x) ≤ (Σ_x f(x)) · (max_x g(x)),
an upper bound (using a mean instead of a max yields an approximation; a min yields a lower bound).
[Plot: anytime-mpe(0.0001), ratio U/L (error) versus time and parameter i (i = 1, ..., 21), on cpcs360b and cpcs422b; test case: no evidence.]
Algorithm                 cpcs360 time (sec)   cpcs422 time (sec)
anytime-mpe(ε = 10⁻⁴)     70.3                 505.2
anytime-mpe(ε = 10⁻¹)     70.3                 110.5
elim-mpe                  115.8                1697.6
(Qiang Liu slides)
New York, 1934.
Weighted mini-buckets (for summation). Exact bucket elimination over bucket C computes
μ_C(b,d,e,f) = Σ_c g(b,c)·g(c,d)·g(c,e)·g(c,f).
Splitting into mini-buckets with weighted sums bounds it:
Σ_c g(b,c)·g(c,d)·g(c,e)·g(c,f)  ≤  (Σ_c^{x₁} g(b,c)·g(c,d)) · (Σ_c^{x₂} g(c,e)·g(c,f))  =  μ_{C→D}(b,d) · μ_{C→E}(e,f),
where
Σ_y^{x} g(y) = ( Σ_y g(y)^{1/x} )^{x}
is the weighted or "power" sum operator, and Hölder's inequality gives
Σ_y^{x} g₁(y)·g₂(y)  ≤  (Σ_y^{x₁} g₁(y)) · (Σ_y^{x₂} g₂(y))   for x₁ + x₂ = x, x₁ > 0, x₂ > 0
(a lower bound if x₁ > 0, x₂ < 0) [Liu and Ihler, 2011].
The downward pass computes μ_{C→D}(b,d), μ_{C→E}(e,f), μ_{D→F}(b,f), μ_{E→F}(b,f), μ_{F→B}(b), and U = upper bound.
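A small sketch of the power-sum operator and the Hölder bound above, on made-up tables; it also checks that as the weight shrinks toward zero the power sum approaches the max, recovering the plain max/sum mini-bucket bound.

```python
# Weighted ("power") sum and the Holder bound for summation.
import random

random.seed(2)

def pow_sum(g, x):
    # (sum_y g(y)^(1/x))^x  -- the weighted power sum of the values in g
    return sum(v ** (1.0 / x) for v in g) ** x

g1 = [random.uniform(0.1, 2.0) for _ in range(4)]
g2 = [random.uniform(0.1, 2.0) for _ in range(4)]

exact = sum(a * b for a, b in zip(g1, g2))
for x1 in (0.2, 0.5, 0.8):           # weights with x1 + x2 = 1, both positive
    x2 = 1.0 - x1
    assert exact <= pow_sum(g1, x1) * pow_sum(g2, x2) + 1e-9   # upper bound

# As the weight -> 0+, the power sum approaches the max of the table.
m = max(g1)
assert m <= pow_sum(g1, 0.01) <= 1.05 * m
```

With x = 1 the power sum is an ordinary sum, so weighted mini-buckets interpolate between the sum and max relaxations.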
Bucket Elimination for marginal MAP: SUM buckets (eliminated first, under a constrained elimination order) are processed with summation and MAX buckets with maximization; MAP* is the marginal MAP value.
Weighted MBE for marginal MAP [Liu and Ihler, 2011; 2013], [Dechter and Rish, 2003]: process max buckets with max mini-buckets and sum buckets with weighted mini-buckets.
Sum bucket C, with weights x₁ + x₂ = 1:
μ_{C→D}(b,d) = Σ_c^{x₁} g(b,c)·g(c,d)
μ_{C→E}(e,f) = Σ_c^{x₂} g(c,e)·g(c,f)
Max bucket F: μ_{F→B}(b) = max_f μ_{D→F}(b,f)·μ_{E→F}(b,f)
V = max_b g(b)·μ_{F→B}(b) = U, an upper bound.
One can further optimize over cost-shifting and weights (single-pass "MM" or iterative message passing).
Initial partitioning
Cluster-Tree Elimination (CTE) example. Clusters: ABC (1), BCDF (2), BEF (3), EFG (4); separators: BC, BF, EF. Messages:
h^{(1,2)}(b,c) = Σ_a p(a)·p(b|a)·p(c|a,b)
h^{(2,1)}(b,c) = Σ_{d,f} p(d|b)·p(f|c,d)·h^{(3,2)}(b,f)
h^{(2,3)}(b,f) = Σ_{c,d} p(d|b)·p(f|c,d)·h^{(1,2)}(b,c)
h^{(3,2)}(b,f) = Σ_e p(e|b,f)·h^{(4,3)}(e,f)
h^{(3,4)}(e,f) = Σ_b p(e|b,f)·h^{(2,3)}(b,f)
h^{(4,3)}(e,f) = p(G = g_e | e,f)
Time and space: exp(cluster size) = exp(treewidth). An EXACT algorithm.
Mini-Clustering: we can replace the sum with a power sum, with weights that sum to 1 in each mini-cluster. Mini-clusters (i-bound = 3):
(1) ABC: p(a), p(b|a), p(c|a,b)
(2) split into BCD: p(d|b), h^{(1,2)}(b,c), and CDF: p(f|c,d)
(3) BEF: p(e|b,f), h1^{(2,3)}(b), h2^{(2,3)}(f)
(4) EFG: p(g|e,f)
Separators: BC, BF, EF. For example,
h1^{(1,2)}(b,c) = Σ_a p(a)·p(b|a)·p(c|a,b).
Time and space: exp(i-bound), where the i-bound is the maximum number of variables in a mini-cluster. An APPROXIMATE algorithm.
MC messages (separators BC, BF, EF):
H^{(1,2)}: h1^{(1,2)}(b,c) := Σ_a p(a)·p(b|a)·p(c|a,b)
H^{(2,1)}: h1^{(2,1)}(b) := Σ_{d,f} p(d|b)·h1^{(3,2)}(b,f);   h2^{(2,1)}(c) := max_{d,f} p(f|c,d)
H^{(2,3)}: h1^{(2,3)}(b) := Σ_{c,d} p(d|b)·h1^{(1,2)}(b,c);   h2^{(2,3)}(f) := max_{c,d} p(f|c,d)
H^{(3,2)}: h1^{(3,2)}(b,f) := Σ_e p(e|b,f)·h1^{(4,3)}(e,f)
H^{(3,4)}: h1^{(3,4)}(e,f) := Σ_b p(e|b,f)·h1^{(2,3)}(b)·h2^{(2,3)}(f)
H^{(4,3)}: h1^{(4,3)}(e,f) := p(G = g_e | e,f)
Clusters: ABC, BCDF, BEF, EFG.
[Diagram: CTE versus MC message flow on the cluster tree ABC, BCDF, BEF, EFG (separators BC, BF, EF), with messages h^{(1,2)}, h^{(2,1)}, h^{(2,3)}, h^{(3,2)}, h^{(3,4)}, h^{(4,3)}.]
Partitioning heuristics (Dechter and Rish, 2003; Rollon and Dechter, 2010): use a greedy heuristic, derived from a distance function, to decide which functions go into a single mini-bucket.

Scope-based Partitioning Heuristic (SCP). SCP aims at minimizing the number of mini-buckets in the partition by including in each mini-bucket as many functions as possible, as long as the i-bound is satisfied. First, single-function mini-buckets are ordered by decreasing arity from left to right. Then each mini-bucket is absorbed into the left-most mini-bucket with which it can be merged. The time complexity of Partition(B, i), where B is the bucket to be partitioned and |B| the number of functions in it, is O(|B| log|B| + |B|²) using the SCP heuristic. The scope-based heuristic is quite fast; its shortcoming is that it does not consider the actual information in the functions.
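The SCP procedure described above can be sketched in a few lines; the function and bucket names below are hypothetical, and scopes are just sets of variable names.

```python
# Scope-based partitioning (SCP) sketch: sort scopes by decreasing arity,
# then absorb each into the left-most mini-bucket that respects the i-bound.
def scp_partition(scopes, i_bound):
    """scopes: iterable of frozensets of variables; returns mini-buckets."""
    minibuckets = []  # each entry: (set of variables, list of member scopes)
    for s in sorted(scopes, key=len, reverse=True):
        for vars_, funcs in minibuckets:
            if len(vars_ | s) <= i_bound:   # merged scope respects the i-bound
                vars_ |= s
                funcs.append(s)
                break
        else:                               # no left-most fit: open a new one
            minibuckets.append((set(s), [s]))
    return minibuckets

# Hypothetical bucket of variable B with i-bound 3:
bucket = [frozenset("EBC"), frozenset("DAB"), frozenset("BA")]
parts = scp_partition(bucket, 3)
# {E,B,C} and {D,A,B} cannot merge (5 variables), but {B,A} fits into {D,A,B}.
assert len(parts) == 2
```

This only looks at scopes, which is why, as noted above, it ignores the actual information in the functions.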
Comparing Mini-Clustering against Belief Propagation. What is belief propagation?
[Diagram: one step of Pearl's belief propagation, updating BEL(U₁) by combining the λ messages sent to U₁ by its children (e.g. λ_{X₁}(u₁)) with the π messages sent by its parents.]
[Figure: a linear block code as a belief network: input bits A–H, parity bits p1–p6 computed by XOR (+) nodes, and received bits a–h observed through Gaussian channel noise (σ).]
Error-correcting linear block code. State of the art: an approximate algorithm, iterative belief propagation (IBP), i.e., Pearl's poly-tree algorithm applied to loopy networks.
Bit error rate (BER) as a function of noise (sigma):
MBE-mpe is better on codes with low w*; IBP (or BP) is better on randomly generated (high-w*) codes.
[Plots: Grid 15x15, evid=10, w*=22, 10 instances. NHD, absolute error, relative error, and time (seconds) versus i-bound (2–18), comparing MC and IBP.]
Functions: p(a), p(b|a), p(c|a,b), p(d|b), p(f|c,d), p(e|b,f), p(g|e,f).
Tree decomposition:
ABC: p(a), p(b|a), p(c|a,b)
BCDF: p(d|b), p(f|c,d)
BEF: p(e|b,f)
EFG: p(g|e,f)
Separators: BC, BF, EF.
Time: O(exp(w+1)), space: O(exp(sep)), where w is the treewidth and sep the maximum separator size. For each cluster, P(X|e) is computed for its variables, and also P(e).
Tree decomposition: for a belief network BN = ⟨X, D, G, P⟩, a tree decomposition is a triple ⟨T, χ, ψ⟩, where T = (V, E) is a tree and χ and ψ are labeling functions associating with each vertex v ∈ V two sets, χ(v) ⊆ X and ψ(v) ⊆ P, satisfying:
1. For each function p_i ∈ P there is exactly one vertex v such that p_i ∈ ψ(v), and scope(p_i) ⊆ χ(v).
2. For each variable X_i ∈ X, the set {v ∈ V | X_i ∈ χ(v)} forms a connected subtree (running intersection property).
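The two conditions above are easy to check mechanically. Below is a minimal sketch on the example decomposition (the chain ABC, BCDF, BEF, EFG); the contiguity test for condition 2 assumes the tree is a chain indexed 1..4, as here.

```python
# Verify the tree-decomposition conditions on the example clusters.
chi = {1: {"A", "B", "C"}, 2: {"B", "C", "D", "F"},
       3: {"B", "E", "F"}, 4: {"E", "F", "G"}}
psi = {1: [{"A"}, {"A", "B"}, {"A", "B", "C"}],   # scopes of p(a), p(b|a), p(c|a,b)
       2: [{"B", "D"}, {"C", "D", "F"}],          # p(d|b), p(f|c,d)
       3: [{"B", "E", "F"}],                      # p(e|b,f)
       4: [{"E", "F", "G"}]}                      # p(g|e,f)

# Condition 1: every function's scope lies inside its cluster's variables.
assert all(scope <= chi[v] for v, scopes in psi.items() for scope in scopes)

# Condition 2 (running intersection): on a chain, the vertices containing
# each variable must be contiguous.
for var in set().union(*chi.values()):
    verts = sorted(v for v in chi if var in chi[v])
    assert verts == list(range(verts[0], verts[-1] + 1))
```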
Belief network and its tree decomposition:
ABC: p(a), p(b|a), p(c|a,b)
BCDF: p(d|b), p(f|c,d)
BEF: p(e|b,f)
EFG: p(g|e,f)
Separators: BC, BF, EF.
a) Fragment of an arc-labeled join-graph; b) shrinking labels to make it a minimal arc-labeled join-graph.
[Clusters ABCDE, BCE, CDEF with arc labels BC, CDE, CE; after shrinking, BC, DE, CE.]
[Join-graph: clusters ABCDE, BCE, CDEF, FGI, GHIJ, FGH with arc labels BC, CDE, CE, F, GH, GI; cluster ABCDE holds p(a), p(c), p(b|ac), p(d|abe), p(e|bc), h_{(3,1)}(bc).]

Minimal arc-labeled: sep(1,2) = {D,E}, elim(1,2) = {A,B,C}:
h_{(1,2)}(d,e) = Σ_{a,b,c} p(a)·p(c)·p(b|ac)·p(d|abe)·p(e|bc)·h_{(3,1)}(bc)
Non-minimal arc-labeled: sep(1,2) = {C,D,E}, elim(1,2) = {A,B}:
h_{(1,2)}(c,d,e) = Σ_{a,b} p(a)·p(c)·p(b|ac)·p(d|abe)·p(e|bc)·h_{(3,1)}(bc)
[Figure: a belief network over variables A–J and its loopy BP (dual) graph, with clusters A, ABDE, ABC, BCE, CDEF, FGH, FGI, GHIJ and labeled arcs.]
Arcs labeled with any single variable should form a TREE.
[Figure: a spectrum of join-graphs over the same functions, from the loopy BP graph down to a join-tree (merging clusters, e.g. into ABCDE, CDEF, FGHI, GHIJ), obtained by shrinking arc labels and merging clusters.]
a) Schematic mini-bucket(i), i = 3; b) arc-labeled join-graph decomposition.

Buckets and mini-buckets:
G: (GFE)
E: (EBF) (EF)
F: (FCD) (BF)
D: (DB) (CD)
C: (CAB) (CB)
B: (BA) (AB) (B)
A: (A)

[Join-graph clusters GFE: P(G|F,E); EBF: P(E|B,F); FCD: P(F|C,D); CDB: P(D|B); CAB: P(C|A,B); BA: P(B|A); A: P(A), with arc labels CB, BA, BF, EF, CD.]
Time: O(deg·(n+N)·d^(i+1)); space: O(N·d).
◼ Measures:
◼ Absolute error ◼ Relative error ◼ Kullback-Leibler (KL) distance ◼ Bit Error Rate ◼ Time
◼ Networks (all variables are binary):
◼ Random networks ◼ Grid networks (MxM) ◼ CPCS 54, 360, 422 ◼ Coding networks
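The measures listed above are straightforward to compute; below is a hedged sketch on made-up numbers (a single binary marginal and a four-bit codeword), using common definitions of these measures.

```python
# Evaluation measures: absolute error, relative error, KL distance, BER.
import math

exact  = [0.7, 0.3]   # exact marginal (made up)
approx = [0.6, 0.4]   # approximate marginal (made up)

abs_err = sum(abs(p - q) for p, q in zip(exact, approx)) / len(exact)
rel_err = sum(abs(p - q) / p for p, q in zip(exact, approx)) / len(exact)
kl      = sum(p * math.log(p / q) for p, q in zip(exact, approx))

bits_sent, bits_decoded = [0, 1, 1, 0], [0, 1, 0, 0]
ber = sum(a != b for a, b in zip(bits_sent, bits_decoded)) / len(bits_sent)

assert kl > 0                      # KL is nonnegative, zero only if equal
assert abs(ber - 0.25) < 1e-12     # one wrong bit out of four
```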
[Plots: coding networks, N=400, 500–1000 instances, 30 iterations, w*=43. Bit Error Rate (BER) versus i-bound (2–12) for σ = .22, .32, .51, .65, comparing IJGP, MC, and IBP.]
[Plots: CPCS 422, w*=23, 1 instance. KL distance versus i-bound for evidence=0 and evidence=30, comparing IJGP (30 iterations, at convergence), MC, and IBP (at convergence).]
[Plots: CPCS 422, w*=23, 1 instance. KL distance versus number of iterations (5–35) for evidence=0 and evidence=30, comparing IJGP(3), IJGP(10), and IBP.]
[Plot: coding networks, N=400, 500 instances, w*=43. Time (seconds) versus i-bound (2–12) for IJGP (30 iterations), MC, and IBP (30 iterations).]
(λ denotes grounding on the evidence e.)
Theorem (Yedidia, Freeman and Weiss, 2005).
(Reparameterization) Modify the individual functions while keeping their sum the same. With λ(B): λ(b) = 3, λ(g) = −1, add λ(B) to f(A,B) and subtract it from f(B,C):

A B   f(A,B)        B C   f(B,C)        A B C   F(A,B,C)
b b   6 + 3         b b   6 − 3         b b b   12
b g   0 − 1         b g   0 − 3         b b g   6
g b   0 + 3         g b   0 + 1         b g b   0
g g   6 − 1         g g   6 + 1         b g g   6
                                        g b b   6
                                        g b g   0
                                        g g b   6
                                        g g g   12

The shifts cancel (+λ(B) − λ(B) = 0), e.g. 0 + 6 = 6, so F(A,B,C) is unchanged; shifts +μ(C), −μ(C) work the same way.
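The invariance above in code, using the same table values as the slide: shifting f(A,B) by +λ(B) and f(B,C) by −λ(B) changes the individual tables but not their sum.

```python
# Reparameterization leaves the sum of functions unchanged.
import itertools

f_ab = {("b", "b"): 6, ("b", "g"): 0, ("g", "b"): 0, ("g", "g"): 6}
f_bc = {("b", "b"): 6, ("b", "g"): 0, ("g", "b"): 0, ("g", "g"): 6}
lam  = {"b": 3, "g": -1}

f_ab2 = {(a, b): v + lam[b] for (a, b), v in f_ab.items()}  # shift by +lam(B)
f_bc2 = {(b, c): v - lam[b] for (b, c), v in f_bc.items()}  # shift by -lam(B)

for a, b, c in itertools.product("bg", repeat=3):
    before = f_ab[a, b] + f_bc[b, c]
    after  = f_ab2[a, b] + f_bc2[b, c]
    assert before == after   # F(A,B,C) is invariant
```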
[Tables: cost functions f1(A,B) and f2(B,C) and their sum F(A,B,C); shifting f1 by +λ(B) and f2 by −λ(B) changes the individual tables but leaves F(A,B,C) unchanged: the adjusting functions cancel each other.]
(Decomposition bound) For pairwise functions g₁₂(y₁,y₂), g₁₃(y₁,y₃), g₂₃(y₂,y₃), splitting each variable into independent copies gives
G* = min_y Σ_β g_β(y)  ≥  Σ_β min_y g_β(y).
Enforce the lost equality constraints via Lagrange multipliers μ_{1→12}(y₁), μ_{1→13}(y₁), μ_{2→12}(y₂), μ_{2→23}(y₂), μ_{3→13}(y₃), μ_{3→23}(y₃):
G* = min_y Σ_β g_β(y)  ≥  max_μ Σ_β min_y [ g_β(y) + Σ_{j∈β} μ_{j→β}(y_j) ]
Reparameterization: ∀k: Σ_{β∋k} μ_{k→β}(y_k) = 0.
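A minimal sketch of the decomposition bound on a made-up three-variable pairwise model, also checking that a zero-sum shift μ on a shared variable is a reparameterization (it leaves G* unchanged).

```python
# G* = min_y sum_b g_b(y) >= sum_b min_y g_b(y), and mu-shifts preserve G*.
import itertools, random

random.seed(3)
D = [0, 1, 2]
g12 = {k: random.uniform(0, 4) for k in itertools.product(D, D)}
g13 = {k: random.uniform(0, 4) for k in itertools.product(D, D)}
g23 = {k: random.uniform(0, 4) for k in itertools.product(D, D)}

G_star = min(g12[y1, y2] + g13[y1, y3] + g23[y2, y3]
             for y1, y2, y3 in itertools.product(D, repeat=3))
bound = min(g12.values()) + min(g13.values()) + min(g23.values())
assert bound <= G_star + 1e-12          # decomposition lower bound

# Zero-sum multipliers on y1: mu into g12, -mu into g13.
mu = {y1: random.uniform(-1, 1) for y1 in D}
g12s = {(y1, y2): v + mu[y1] for (y1, y2), v in g12.items()}
g13s = {(y1, y3): v - mu[y1] for (y1, y3), v in g13.items()}
G2 = min(g12s[y1, y2] + g13s[y1, y3] + g23[y2, y3]
         for y1, y2, y3 in itertools.product(D, repeat=3))
assert abs(G2 - G_star) < 1e-10         # reparameterization: G* unchanged
```

Optimizing the bound over μ is exactly what the dual-decomposition methods listed next do.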
Many names for the same class of bounds:
‒ Dual decomposition [Komodakis et al. 2007]
‒ TRW, MPLP [Wainwright et al. 2005; Globerson & Jaakkola 2007]
‒ Soft arc consistency [Cooper & Schiex 2004]
‒ Max-sum diffusion [Werner 2007]
Many ways to optimize the bound:
‒ Sub-gradient descent [Komodakis et al. 2007; Jojic et al. 2010]
‒ Coordinate descent [Werner 2007; Globerson & Jaakkola 2007; Sontag et al. 2009; Ihler et al. 2012]
‒ Proximal optimization [Ravikumar et al. 2010]
‒ ADMM [Meshi & Globerson 2011; Martins et al. 2011; Forouzan & Ihler 2013]
[Worked example: iterative cost shifting between tables f1(A,B) and f2(B,C) through λ(B); repeated ±1 updates move cost across the shared variable B while preserving the sum of the two functions.]
Mini-buckets as a "join graph" [Ihler et al. 2012]: the mini-bucket clusters {A,B,C}, {B,D,E}, {A,C,E}, {A,D,E}, {A,E}, {A} form a join graph over which one message exchange takes place within each bucket during the downward sweep; the choice of mini-buckets fixes the regions, and the cost-shifting function scopes fix the "coordinates".

MBE-mpe with moment matching (bucket B split, w = 2): the two mini-buckets of B, {P(B|A), P(D|A,B)} and {P(E|B,C)}, exchange moment-matching messages m11, m12 before B is eliminated, producing hB(A,D) = max_B ∏(·) and hB(C,E). Buckets C, D, E then produce hC(A,E), hD(A), hE(A) (with evidence E = 0), and bucket A computes MPE* = max_A P(A)·hD(A)·hE(A). MPE* is an upper bound on MPE (U); generating a solution yields a lower bound (L).
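A sketch of the moment-matching idea on two factors sharing one variable (tables are made up): each mini-bucket is rescaled by the ratio of the geometric mean of the shared-variable max-marginals to its own max-marginal. The matched bound is never worse than the plain product of maxes; in this small two-factor case it actually recovers the exact max.

```python
# Moment matching for two max mini-buckets sharing variable B.
import itertools, random

random.seed(5)
D = [0, 1, 2]
f1 = {k: random.uniform(0.1, 1.0) for k in itertools.product(D, D)}  # f1(a,b)
f2 = {k: random.uniform(0.1, 1.0) for k in itertools.product(D, D)}  # f2(b,c)

mu1 = {b: max(f1[a, b] for a in D) for b in D}   # max-marginal of f1 on B
mu2 = {b: max(f2[b, c] for c in D) for b in D}   # max-marginal of f2 on B
gm  = {b: (mu1[b] * mu2[b]) ** 0.5 for b in D}   # matched "moment"

# Cost-shift each factor so both have max-marginal gm on B.
f1m = {(a, b): v * gm[b] / mu1[b] for (a, b), v in f1.items()}
f2m = {(b, c): v * gm[b] / mu2[b] for (b, c), v in f2.items()}

exact   = max(f1[a, b] * f2[b, c] for a, b, c in itertools.product(D, repeat=3))
plain   = max(f1.values()) * max(f2.values())       # unmatched mini-bucket bound
matched = max(f1m.values()) * max(f2m.values())     # after moment matching
assert exact - 1e-12 <= matched <= plain + 1e-12
```

With more functions and variables the matched bound generally stays above the exact value, but the same monotone improvement over the plain bound holds.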