Slides Set 10: Bounded Inference Non-iteratively; Mini-Bucket Elimination - PowerPoint PPT Presentation



slide-1
SLIDE 1

Algorithms for Reasoning with graphical models

Slides Set 10:

Rina Dechter

slides10 828X 2019

Bounded Inference Non-iteratively; Mini-Bucket Elimination

(Class Notes 8-9, Darwiche Chapter 14)

slide-2
SLIDE 2

Outline

  • Mini-bucket elimination
  • Weighted Mini-bucket
  • Mini-clustering
  • Re-parameterization, cost-shifting
  • Iterative Belief propagation
  • Iterative-join-graph propagation


slide-3
SLIDE 3

Types of queries

  • Sum-Inference, Max-Inference, Mixed-Inference (increasingly harder)
  • NP-hard: exponentially many terms
  • We will focus on approximation algorithms
  • Anytime: very fast & very approximate → slower & more accurate

slide-4
SLIDE 4

Queries

  • Probability of evidence (or partition function)
  • Posterior marginal (beliefs):
  • Most Probable Explanation

 

Probability of evidence:  P(e) = Σ_{X \ E} Π_{i=1..n} P(x_i | pa_i, e)

Partition function:  Z = Σ_X Π_i f_i(C_i)

Posterior marginal (belief):
P(x_i | e) = P(x_i, e) / P(e) = [ Σ_{X \ ({X_i} ∪ E)} Π_{j=1..n} P(x_j | pa_j, e) ] / [ Σ_{X \ E} Π_{j=1..n} P(x_j | pa_j, e) ]

Most Probable Explanation:  x* = argmax_x P(x, e)

slide-5
SLIDE 5

Bucket Elimination

Query:  P(a | e) ∝ P(a, e)

P(a, e) = Σ_{b,c,d,e} P(a) P(b|a) P(c|a) P(d|a,b) P(e|b,c)

Elimination order: d, e, b, c

bucket D:  P(d|a,b)                     →  f_D(a,b) = Σ_d P(d|a,b)
bucket E:  P(e|b,c)                     →  f_E(b,c) = P(e|b,c)   (E instantiated to the evidence)
bucket B:  P(b|a), f_D(a,b), f_E(b,c)   →  f_B(a,c) = Σ_b P(b|a) f_D(a,b) f_E(b,c)
bucket C:  P(c|a), f_B(a,c)             →  f_C(a) = Σ_c P(c|a) f_B(a,c)
bucket A:  P(a), f_C(a)                 →  P(a, e) = P(a) f_C(a)

Bucket Tree over scopes: {D,A,B} — {E,B,C} — {B,A,C} — {C,A} — {A}

Original functions + messages. Time and space: exp(w*).
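The bucket schedule above can be run directly. Below is a minimal Python sketch; the CPT numbers are invented for illustration (only the network structure and elimination order d, e, b, c follow the slide), and a brute-force sum checks the result.

```python
import itertools

vals = [0, 1]
# Invented CPTs for the network A->B, A->C, (A,B)->D, (B,C)->E (assumption).
P_a = {(a,): 0.6 if a == 0 else 0.4 for a in vals}
P_b_a = {(b, a): 0.7 if b == a else 0.3 for b in vals for a in vals}
P_c_a = {(c, a): 0.8 if c == a else 0.2 for c in vals for a in vals}
P_d_ab = {(d, a, b): 0.5 for d in vals for a in vals for b in vals}
P_e_bc = {(e, b, c): 0.9 if e == (b ^ c) else 0.1
          for e in vals for b in vals for c in vals}

def bucket_elim(a, e=0):
    """P(a, e) along elimination order d, e, b, c."""
    # bucket D: f_D(a,b) = sum_d P(d|a,b)
    f_D = {(aa, b): sum(P_d_ab[(d, aa, b)] for d in vals)
           for aa in vals for b in vals}
    # bucket E: evidence E = e -> f_E(b,c) = P(e|b,c)
    f_E = {(b, c): P_e_bc[(e, b, c)] for b in vals for c in vals}
    # bucket B: f_B(a,c) = sum_b P(b|a) f_D(a,b) f_E(b,c)
    f_B = {(aa, c): sum(P_b_a[(b, aa)] * f_D[(aa, b)] * f_E[(b, c)]
                        for b in vals) for aa in vals for c in vals}
    # bucket C: f_C(a) = sum_c P(c|a) f_B(a,c)
    f_C = {aa: sum(P_c_a[(c, aa)] * f_B[(aa, c)] for c in vals) for aa in vals}
    # bucket A: P(a, e) = P(a) f_C(a)
    return P_a[(a,)] * f_C[a]

def brute_force(a, e=0):
    return sum(P_a[(a,)] * P_b_a[(b, a)] * P_c_a[(c, a)] *
               P_d_ab[(d, a, b)] * P_e_bc[(e, b, c)]
               for b, c, d in itertools.product(vals, repeat=3))

assert abs(bucket_elim(0) - brute_force(0)) < 1e-12
```

Each bucket touches at most w*+1 variables, which is where the exp(w*) time and space come from.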

slide-6
SLIDE 6

Finding MPE/MAP

OPT = max_x P(x, e)

[Figure: buckets B, C, D, E, A processed by max-elimination]

Algorithm BE-mpe (Dechter 1996; Bertelè and Brioschi, 1972)

w* = 4: "induced width" (max clique size)
W*=4 “induced width” (max clique size)

slide-7
SLIDE 7

Generating the Optimal Assignment

  • Given BE messages, select optimum config in reverse order

[Figure: buckets revisited in reverse order E, C, D, B, A]

Return the optimal configuration (a*, b*, c*, d*, e*); OPT = the optimal value

slide-8
SLIDE 8

slides10 828X 2019

Approximate Inference

  • Metrics of evaluation
  • Absolute error: given ε > 0 and a query p = P(x|e), an estimate r has absolute error ε iff |p − r| < ε
  • Relative error: the ratio r/p is in [1−ε, 1+ε]
  • Dagum and Luby 1993: approximation up to a relative error is NP-hard
  • Absolute error is also NP-hard if the error is less than 0.5
slide-9
SLIDE 9

Outline

  • Mini-bucket elimination
  • Weighted Mini-bucket
  • Mini-clustering
  • Re-parameterization, cost-shifting
  • Iterative Belief propagation
  • Iterative-join-graph propagation


slide-10
SLIDE 10


Mini-Buckets: “Local Inference”

  • Computation in a bucket is time and space exponential in the number of variables involved
  • Therefore, partition the functions in a bucket into "mini-buckets", each over a smaller number of variables

slide-11
SLIDE 11

Decomposition Bounds

  • Upper & lower bounds via approximate problem decomposition
  • Example: MAP inference
  • Relaxation: two “copies” of x, no longer required to be equal
  • Bound is tight (equality) if f1, f2 agree on maximizing value x

X:      0    1    2    3
f1(X):  1.0  2.0  3.0  4.0
f2(X):  1.0  2.0  2.0  0.0
F(X) = f1(X) × f2(X):  1.0  4.0  6.0  0.0

max_X F(X) = 6.0  ≤  (max_X f1(X)) × (max_X f2(X)) = 4.0 × 2.0 = 8.0
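The bound above can be checked numerically. A small sketch, assuming the domain values 0–3 for X read off the flattened table:

```python
# Decomposition bound for max-inference:
#   max_X f1(X)*f2(X)  <=  (max_X f1(X)) * (max_X f2(X)).
f1 = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
f2 = {0: 1.0, 1: 2.0, 2: 2.0, 3: 0.0}

exact = max(f1[x] * f2[x] for x in f1)        # 6.0, attained at X = 2
bound = max(f1.values()) * max(f2.values())   # 4.0 * 2.0 = 8.0
assert exact <= bound
```

The bound is tight exactly when f1 and f2 are maximized at the same value of X, which is the relaxation condition stated on the slide.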

slide-12
SLIDE 12

Mini-Bucket Approximation

Split a bucket into mini-buckets → bound the complexity. Exponential complexity decrease: processing bucket(X) as separate mini-buckets replaces exp(n) by exp(r), where r < n is the size of the largest mini-bucket.

slide-13
SLIDE 13

Mini-Bucket Elimination

A D E C B

bucket B:  { P(e|b,c) } ‖ { P(d|a,b), p(b|a) }   (mini-buckets)
bucket C:  P(c|a), λ_B→C(e,c)
bucket D:  λ_B→D(a,d)
bucket E:  e = 0, λ_C→E(e,a)
bucket A:  P(a), λ_D→A(a), λ_E→A(a)

λ_B→C(e,c) = max_b P(e|b,c)
λ_B→D(a,d) = max_b P(d|a,b) p(b|a)
…

U = upper bound [Dechter & Rish 2003]

slide-14
SLIDE 14

Mini-Bucket Elimination

bucket B:  { P(e|b′,c) } ‖ { P(d|a,b), p(b|a) }   (b renamed b′ in one mini-bucket)

Processing the mini-buckets separately is equivalent to exact elimination in a relaxed network in which B is split into B and B′.

U = upper bound [Dechter & Rish 2003]   (e = 0)

slide-15
SLIDE 15

Mini-Bucket Elimination

mini-buckets   [Dechter and Rish, 1997; 2003]

bucket B:  { P(e|b′,c) } ‖ { P(d|a,b), p(b|a) }
bucket C:  P(c|a), λ_B→C(e,c);  bucket D:  λ_B→D(a,d);  bucket E:  e = 0;  bucket A:  P(a)

max_b Π g  ≤  ( max_b Π_{mini-bucket 1} g ) · ( max_b′ Π_{mini-bucket 2} g )

U = upper bound

slide-16
SLIDE 16

Mini-Bucket Decoding

  • Assign values in reverse order using approximate messages

Greedy configuration = lower bound

bucket B split into { P(e|b,c) } ‖ { P(d|a,b), p(b|a) };  messages λ_B→C(e,c), λ_B→D(a,d), λ_C→E(e,a), λ_D→A(a), λ_E→A(a);  U = upper bound

a* = argmax_a P(a) λ_D→A(a) λ_E→A(a)
e* = 0   (evidence)
d* = argmax_d λ_B→D(a*, d)
c* = argmax_c λ_B→C(e*, c)
b* = argmax_b P(e*|b, c*) P(d*|a*, b) p(b|a*)
return (a*, e*, d*, c*, b*)

slide-17
SLIDE 17


Semantics of Mini-Bucket: Splitting a Node

U U

Û

Before Splitting: Network N After Splitting: Network N'

Variables in different buckets are renamed and duplicated (Kask et al., 2001; Geffner et al., 2007; Choi, Chavira, Darwiche, 2007)

slide-18
SLIDE 18

MBE-MPE(i): Algorithm MBE-mpe

  • Input: i — the max number of variables allowed in a mini-bucket
  • Output: [lower bound (P of a suboptimal solution), upper bound]

Example: MBE-mpe(3) versus BE-mpe
[Figure: buckets E, C, D, B, A; MBE-mpe(3) keeps at most 3 variables per mini-bucket and returns U = upper bound, while BE-mpe computes OPT exactly; e = 0]

slide-19
SLIDE 19

Mini-Bucket Decoding (for min-sum)


bucket C:  g(b,c), g(c,d), g(c,e), g(c,f)  →  split into mini-buckets { g(b,c), g(c,d) } ‖ { g(c,e), g(c,f) }
bucket D:  g(d,b), g(d,f), μ_C→D(b,d)
bucket E:  g(b,e), μ_C→E(e,f)
bucket F:  μ_D→F(b,f), μ_E→F(b,f)
bucket B:  g(b), μ_F→B(b)

μ_C→D(b,d) = min_c [ g(b,c) + g(c,d) ]
μ_C→E(e,f) = min_c [ g(c,e) + g(c,f) ]
μ_D→F(b,f) = min_d [ μ_C→D(b,d) + g(d,b) + g(d,f) ]
μ_E→F(b,f) = min_e [ g(b,e) + μ_C→E(e,f) ]
μ_F→B(b)  = min_f [ μ_D→F(b,f) + μ_E→F(b,f) ]

L = min_b [ g(b) + μ_F→B(b) ] = lower bound

Mini-Bucket Decoding (greedy, in reverse order):
b̂ = argmin_b g(b) + μ_F→B(b)
f̂ = argmin_f μ_D→F(b̂,f) + μ_E→F(b̂,f)
ê = argmin_e g(b̂,e) + μ_C→E(e,f̂)
d̂ = argmin_d μ_C→D(b̂,d) + g(d,b̂) + g(d,f̂)
ĉ = argmin_c g(b̂,c) + g(c,d̂) + g(c,ê) + g(c,f̂)

Greedy configuration = upper bound

[Dechter and Rish, 2003]

slide-20
SLIDE 20

(i,m)-Partitionings


slide-21
SLIDE 21


MBE(i,m), MBE(i)

  • Input: Belief network (X, D, G, P)
  • Output: upper and lower bounds
  • Initialize: put functions in buckets along the ordering
  • Process each bucket from p = n down to 1:
    • Create an (i,m)-partition P_1, …, P_m of the bucket
    • Process each mini-bucket
  • (For mpe): assign values along ordering d
  • Return: mpe configuration, upper and lower bounds

slide-22
SLIDE 22
slide-23
SLIDE 23

Partitioning, Refinements


slide-24
SLIDE 24

Properties of MBE(i)

  • Complexity: O(r·exp(i)) time and O(exp(i)) space
  • Yields a lower bound and an upper bound
  • Accuracy: measured by the upper/lower (U/L) bound ratio
  • Possible uses of mini-bucket approximations:
    • As anytime algorithms
    • As heuristics in search
  • Other tasks (similar mini-bucket approximations):
    • Belief updating, Marginal MAP, MEU, WCSP, Max-CSP

[Dechter and Rish, 1997], [Liu and Ihler, 2011], [Liu and Ihler, 2013]


slide-25
SLIDE 25

Anytime Approximation


slide-26
SLIDE 26

MBE for Belief Updating and for Probability of Evidence or Partition Function

  • The mini-bucket idea is the same:
  • We can apply a sum in each mini-bucket or, better, one sum and the rest max (for an upper bound) or min (for a lower bound)
  • MBE-bel-max(i,m) and MBE-bel-min(i,m) generate upper and lower bounds on beliefs; they approximate BE-bel
  • MBE-map(i,m): max mini-buckets are maximized, sum mini-buckets are sum-max; approximates BE-map

Σ_X f(x)·g(x) ≤ (Σ_X f(x)) · (max_X g(x));   Σ_X f(x)·g(x) ≥ (Σ_X f(x)) · (min_X g(x))

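The two inequalities above are easy to check on a tiny example (numbers invented):

```python
# Sum mini-bucket bounds:
#   sum_X f(x)g(x) <= (sum_X f(x)) * (max_X g(x))
#   sum_X f(x)g(x) >= (sum_X f(x)) * (min_X g(x))
f = {0: 0.2, 1: 0.5, 2: 0.3}
g = {0: 1.0, 1: 4.0, 2: 2.0}

exact = sum(f[x] * g[x] for x in f)          # 0.2 + 2.0 + 0.6 = 2.8
upper = sum(f.values()) * max(g.values())    # 1.0 * 4.0 = 4.0
lower = sum(f.values()) * min(g.values())    # 1.0 * 1.0 = 1.0
assert lower <= exact <= upper
```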

slide-27
SLIDE 27


slide-28
SLIDE 28


Anytime-mpe(0.0001): U/L error vs. time

[Plot: Upper/Lower ratio (0.6–3.8) vs. time and parameter i (1–1000, log scale) for cpcs422b and cpcs360b; i ranges from 1 to 21]

CPCS networks — medical diagnosis (noisy-OR model). Test case: no evidence.

Algorithm            | cpcs360 time (sec) | cpcs422 time (sec)
anytime-mpe(ε=10⁻⁴)  | 70.3               | 505.2
anytime-mpe(ε=10⁻¹)  | 70.3               | 110.5
elim-mpe             | 115.8              | 1697.6

slide-29
SLIDE 29

Outline

  • Mini-bucket elimination
  • Weighted Mini-bucket
  • Mini-clustering
  • Re-parameterization, cost-shifting
  • Iterative Belief propagation
  • Iterative-join-graph propagation


slide-30
SLIDE 30

Decomposition for Sum

  • Generalize the technique to sum via Hölder's inequality
  • Define the weighted (or powered) sum
  • "Temperature" interpolates between sum & max
  • Different weights do not commute

slide-31
SLIDE 31

The Power Sum and Hölder Inequality


slide-32
SLIDE 32

Working Example

  • Model:
  • Markov network
  • Task:
  • Partition function

A B C

(Qiang Liu slides)


slide-33
SLIDE 33

Mini-Bucket (Basic Principles)

  • Upper bound
  • Lower bound

(Qiang Liu slides)


slide-34
SLIDE 34

Hölder Inequality

  • Where … and …
  • When …, equality is achieved.
  • G. H. Hardy, J. E. Littlewood and G. Pólya, Inequalities, Cambridge Univ. Press, London and New York, 1934.

(Qiang Liu slides)

slide-35
SLIDE 35

Reverse Hölder Inequality

  • If one of the weights is negative, the inequality still holds but its direction reverses.
  • G. H. Hardy, J. E. Littlewood and G. Pólya, Inequalities, Cambridge Univ. Press, London and New York, 1934.

(Qiang Liu slides)

slide-36
SLIDE 36

Weighted Mini-Bucket


bucket C:  g(b,c), g(c,d), g(c,e), g(c,f)   (split into mini-buckets)
bucket D:  g(b,d), g(d,f), μ_C→D(b,d)
bucket E:  g(b,e), μ_C→E(e,f)
bucket F:  μ_D→F(b,f), μ_E→F(b,f)
bucket B:  g(b), μ_F→B(b)

U = upper bound

Exact bucket elimination:
μ_C(b,d,e,f) = Σ_c g(b,c) · g(c,d) · g(c,e) · g(c,f)

Mini-buckets (weighted):
μ_C(b,d,e,f) ≤ [ Σ^{x1}_c g(b,c) g(c,d) ] · [ Σ^{x2}_c g(c,e) g(c,f) ] = μ_C→D(b,d) · μ_C→E(e,f)

where  Σ^{x}_y g(y) = ( Σ_y g(y)^{1/x} )^{x}  is the weighted or "power" sum operator,

and by Hölder's inequality
Σ^{x}_y g1(y) g2(y) ≤ ( Σ^{x1}_y g1(y) ) · ( Σ^{x2}_y g2(y) ),   where x1 + x2 = x, x1 > 0, x2 > 0

(a lower bound if x1 > 0, x2 < 0)   [Liu and Ihler, 2011]   (for summation)
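A small numeric sketch of the power-sum operator and the Hölder bound above (tables invented); it also checks that a small weight pushes the power sum toward the max, the "temperature" behavior mentioned earlier.

```python
def power_sum(g, x):
    """Weighted "power" sum: (sum_y g(y)^(1/x))^x."""
    return sum(v ** (1.0 / x) for v in g) ** x

g1 = [1.0, 2.0, 3.0]
g2 = [0.5, 1.5, 2.5]

# Hölder: with x1 + x2 = 1, x1, x2 > 0, the ordinary sum of the product
# is bounded by the product of the two power sums.
x1, x2 = 0.5, 0.5
exact = sum(a * b for a, b in zip(g1, g2))   # ordinary sum (x = 1)
bound = power_sum(g1, x1) * power_sum(g2, x2)
assert exact <= bound + 1e-12

# As x -> 0+, the power sum approaches max_y g(y).
assert abs(power_sum(g1, 0.01) - max(g1)) < 0.1
```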

slide-37
SLIDE 37


slide-38
SLIDE 38


Weighted-mini-bucket for Marginal Map

slide-39
SLIDE 39

Bucket Elimination for MMAP

[Figure: buckets B, C, D, E, A over the network A, B, C, E, D, with MAX buckets and SUM buckets along a constrained elimination order]

MAP* is the marginal MAP value; the elimination order is constrained (sum variables are eliminated before max variables).

slide-40
SLIDE 40

MB and WMB for Marginal MAP


bucket C:  g(b,c), g(c,d), g(c,e), g(c,f)   (sum mini-buckets)
bucket D:  g(b,d), g(d,f), μ_C→D(b,d)
bucket E:  g(b,e), μ_C→E(e,f)
bucket F:  μ_D→F(b,f), μ_E→F(b,f)
bucket B:  g(b), μ_F→B(b)

U = upper bound   [Liu and Ihler, 2011; 2013] [Dechter and Rish, 2003]

Marginal MAP: sum variables are eliminated with power sums, max variables with max:

μ_C→D(b,d) = Σ^{x1}_c g(b,c) g(c,d)
μ_C→E(e,f) = Σ^{x2}_c g(c,e) g(c,f)    (x1 + x2 = 1)
…
μ_F→B(b) = max_f μ_D→F(b,f) · μ_E→F(b,f)
V = max_b g(b) · μ_F→B(b)

Can optimize over cost-shifting and weights (single-pass "MM" or iterative message passing)

slide-41
SLIDE 41

MBE-map


Process max buckets with max mini-buckets, and sum buckets with weighted mini-buckets.

slide-42
SLIDE 42


Initial partitioning

slide-43
SLIDE 43


slide-44
SLIDE 44

Complexity and Tractability of MBE(i,m)


slide-45
SLIDE 45

Outline

  • Mini-bucket elimination
  • Weighted Mini-bucket
  • Mini-clustering
  • Re-parameterization, cost-shifting
  • Iterative Belief propagation
  • Iterative-join-graph propagation


slide-46
SLIDE 46

Join-Tree Clustering (Cluster-Tree Elimination)

Clusters: 1 = {A,B,C}, 2 = {B,C,D,F}, 3 = {B,E,F}, 4 = {E,F,G}; separators BC, BF, EF.

h_(1,2)(b,c) = Σ_a p(a) p(b|a) p(c|a,b)
h_(2,1)(b,c) = Σ_{d,f} p(d|b) p(f|c,d) h_(3,2)(b,f)
h_(2,3)(b,f) = Σ_{c,d} p(d|b) p(f|c,d) h_(1,2)(b,c)
h_(3,2)(b,f) = Σ_e p(e|b,f) h_(4,3)(e,f)
h_(3,4)(e,f) = Σ_b p(e|b,f) h_(2,3)(b,f)
h_(4,3)(e,f) = P(G = g_e | e, f)

Time and space: exp(cluster size) = exp(treewidth). EXACT algorithm.
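As a sanity check, the first message h_(1,2)(b,c) can be computed directly. The CPTs below are invented (only the message definition follows the slide); since h_(1,2) is the marginal of the joint over A, B, C, it must sum to 1.

```python
vals = [0, 1]
# Invented CPTs for p(a), p(b|a), p(c|a,b) (assumption).
p_a = {0: 0.3, 1: 0.7}
p_b_a = {(b, a): 0.6 if b == a else 0.4 for b in vals for a in vals}
p_c_ab = {(c, a, b): 0.9 if c == (a & b) else 0.1
          for c in vals for a in vals for b in vals}

# Cluster 1 = {A,B,C} sends to cluster 2 = {B,C,D,F} over separator {B,C}:
# h_(1,2)(b,c) = sum_a p(a) p(b|a) p(c|a,b)
h_12 = {(b, c): sum(p_a[a] * p_b_a[(b, a)] * p_c_ab[(c, a, b)] for a in vals)
        for b in vals for c in vals}

assert abs(sum(h_12.values()) - 1.0) < 1e-12  # marginal of a joint distribution
```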

slide-47
SLIDE 47


We can replace the sum with a power sum, with weights that sum to 1 in each mini-bucket.

slide-48
SLIDE 48

Mini-Clustering, i-bound=3

Clusters (mini-clusters, i-bound = 3):
1 = {A,B,C}:  p(a), p(b|a), p(c|a,b)
2 = {B,C,D,F}, split into {B,C,D}: p(d|b), h¹_(1,2)(b,c)  ‖  {C,D,F}: p(f|c,d)
3 = {B,E,F}:  p(e|b,f), h¹_(2,3)(b), h²_(2,3)(f)
4 = {E,F,G}:  p(g|e,f)
Separators: BC, BF, EF

h¹_(1,2)(b,c) = Σ_a p(a) p(b|a) p(c|a,b)
h¹_(2,3)(b) = Σ_{c,d} p(d|b) h¹_(1,2)(b,c)
h²_(2,3)(f) = max_{c,d} p(f|c,d)

Time and space: exp(i-bound). APPROXIMATE algorithm. (The i-bound limits the number of variables in a mini-cluster.)

slide-49
SLIDE 49


Mini-Clustering - Example

Clusters: 1 = {A,B,C}, 2 = {B,C,D,F}, 3 = {B,E,F}, 4 = {E,F,G}; separators BC, BF, EF.

H_(1,2):  h¹_(1,2)(b,c) = Σ_a p(a) p(b|a) p(c|a,b)
H_(2,1):  h¹_(2,1)(b) = Σ_{d,f} p(d|b) h¹_(3,2)(b,f);   h²_(2,1)(c) = max_{d,f} p(f|c,d)
H_(2,3):  h¹_(2,3)(b) = Σ_{c,d} p(d|b) h¹_(1,2)(b,c);   h²_(2,3)(f) = max_{c,d} p(f|c,d)
H_(3,2):  h¹_(3,2)(b,f) = Σ_e p(e|b,f) h¹_(4,3)(e,f)
H_(3,4):  h¹_(3,4)(e,f) = Σ_b p(e|b,f) h¹_(2,3)(b) h²_(2,3)(f)
H_(4,3):  h¹_(4,3)(e,f) = P(G = g_e | e, f)

slide-50
SLIDE 50


Cluster Tree Elimination vs. Mini-Clustering

Both algorithms run on the tree with clusters 1 = ABC, 2 = BCDF, 3 = BEF, 4 = EFG and separators BC, BF, EF.

CTE messages (exact, full separator scope):
h_(1,2)(b,c), h_(2,1)(b,c), h_(2,3)(b,f), h_(3,2)(b,f), h_(3,4)(e,f), h_(4,3)(e,f)

MC messages (sets of smaller-scope functions):
H_(1,2) = { h¹_(1,2)(b,c) }
H_(2,1) = { h¹_(2,1)(b), h²_(2,1)(c) }
H_(2,3) = { h¹_(2,3)(b), h²_(2,3)(f) }
H_(3,2) = { h¹_(3,2)(b,f) }
H_(3,4) = { h¹_(3,4)(e,f) }
H_(4,3) = { h¹_(4,3)(e,f) }

slide-51
SLIDE 51

Heuristics for partitioning

(Dechter and Rish, 2003; Rollon and Dechter, 2010.) Use a greedy heuristic derived from a distance function to decide which functions go into a single mini-bucket. The scope-based partitioning heuristic (SCP) aims at minimizing the number of mini-buckets in the partition by including in each mini-bucket as many functions as possible, as long as the i-bound is satisfied.

slide-52
SLIDE 52

Greedy Scope-based Partitioning


slide-53
SLIDE 53

Heuristic for Partitioning

Scope-based Partitioning Heuristic. The scope-based partitioning heuristic (SCP) aims at minimizing the number of mini-buckets in the partition by including in each mini-bucket as many functions as possible, as long as the i-bound is satisfied. First, single-function mini-buckets are ordered by decreasing arity, from left to right. Then each mini-bucket is absorbed into the left-most mini-bucket with which it can be merged. The time complexity of Partition(B, i), where B is the bucket to be partitioned and |B| is the number of functions in the bucket, is O(|B| log|B| + |B|²) using the SCP heuristic. The scope-based heuristic is quite fast; its shortcoming is that it does not consider the actual information in the functions.
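The procedure described above can be sketched as follows, representing each function by its scope (the exact data structures are an implementation choice, not specified on the slide). On the bucket of slide 19 — g(b,c), g(c,d), g(c,e), g(c,f) — with i = 3, it produces the same two mini-buckets.

```python
def scp_partition(functions, i_bound):
    """Greedy scope-based partitioning (SCP): sort functions by
    decreasing arity, then absorb each into the left-most mini-bucket
    whose combined scope stays within the i-bound."""
    funcs = sorted(functions, key=lambda scope: -len(scope))
    minibuckets = []  # list of (scope_set, [function scopes])
    for scope in funcs:
        for mb_scope, mb_funcs in minibuckets:
            if len(mb_scope | set(scope)) <= i_bound:
                mb_scope |= set(scope)   # in-place union keeps the entry updated
                mb_funcs.append(scope)
                break
        else:
            minibuckets.append((set(scope), [scope]))
    return minibuckets

mbs = scp_partition([("b", "c"), ("c", "d"), ("c", "e"), ("c", "f")], 3)
assert [mb[0] for mb in mbs] == [{"b", "c", "d"}, {"c", "e", "f"}]
```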

slide-54
SLIDE 54

Greedy Partition as a function of a distance function h


slide-55
SLIDE 55


Comparing Mini-Clustering against Belief Propagation. What is belief propagation?

slide-56
SLIDE 56


Iterative Belief Propagation

  • Belief propagation is exact for poly-trees
  • IBP: applying BP iteratively to cyclic networks
  • No guarantees for convergence
  • Works well for many coding networks

[Figure: poly-tree fragment with parents U1, U2, U3 and children X1, X2; one step: update BEL(U1) from the incoming λ and π messages]

slide-57
SLIDE 57

Linear Block Codes

[Figure: input bits a–h (A–H) feed parity (+) nodes producing parity bits p1–p6; input and parity bits pass through a Gaussian channel (noise σ) to give the received bits]

slide-58
SLIDE 58


Probabilistic decoding

Error-correcting linear block code. State of the art: an approximate algorithm, iterative belief propagation (IBP) — Pearl's poly-tree algorithm applied to loopy networks.

slide-59
SLIDE 59

MBE-mpe vs. IBP

Bit error rate (BER) as a function of noise (σ):

MBE-mpe is better on low-w* codes; IBP (or BP) is better on randomly generated (high-w*) codes.

slide-60
SLIDE 60

Grid 15x15 - 10 evidence

[Four plots vs. i-bound (2–18) for Grid 15×15, evid=10, w*=22, 10 instances, comparing MC and IBP: NHD (0–0.14), absolute error (0–0.06), relative error (0–0.12), and time in seconds (2–12)]

slide-61
SLIDE 61

Outline

  • Mini-bucket elimination
  • Weighted Mini-bucket
  • Mini-clustering
  • Iterative Belief propagation
  • Iterative-join-graph propagation
  • Re-parameterization, cost-shifting


slide-62
SLIDE 62

Iterative Belief Propagation

  • Belief propagation is exact for poly-trees
  • IBP: applying BP iteratively to cyclic networks
  • No guarantees for convergence
  • Works well for many coding networks
  • Let's combine BP's iterative nature with anytime behavior: IJGP

[Figure: poly-tree fragment with parents U1, U2, U3 and children X1, X2; one step: update BEL(U1) from the incoming λ and π messages]

slide-63
SLIDE 63

Iterative Join Graph Propagation

  • Loopy Belief Propagation
  • Cyclic graphs
  • Iterative
  • Converges fast in practice (no guarantees though)
  • Very good approximations (e.g., turbo decoding, LDPC codes, SAT – survey propagation)
  • Mini-Clustering(i)
  • Tree decompositions
  • Only two sets of messages (inward, outward)
  • Anytime behavior – can improve with more time by increasing the i-bound
  • We want to combine:
  • Iterative virtues of Loopy BP
  • Anytime behavior of Mini-Clustering(i)


slide-64
SLIDE 64


IJGP - The basic idea

  • Apply Cluster Tree Elimination to any join-graph
  • We commit to graphs that are I-maps
  • Avoid cycles as long as I-mapness is not violated
  • Result: use minimal arc-labeled join-graphs
slide-65
SLIDE 65

[Belief network over variables A–G with factors p(a), p(b|a), p(c|a,b), P(d|b), p(f|c,d), p(e|b,f), p(g|e,f)]

Tree Decomposition for Belief Updating

slide-66
SLIDE 66

[Belief network over variables A–G with factors p(a), p(b|a), p(c|a,b), P(d|b), p(f|c,d), p(e|b,f), p(g|e,f)]

Tree Decomposition for belief updating

Clusters: {A,B,C}: p(a), p(b|a), p(c|a,b);  {B,C,D,F}: p(d|b), p(f|c,d);  {B,E,F}: p(e|b,f);  {E,F,G}: p(g|e,f).  Separators: BC, BF, EF.

slide-67
SLIDE 67

CTE: Cluster Tree Elimination

h_(1,2)(b,c) = Σ_a p(a) p(b|a) p(c|a,b)
h_(2,1)(b,c) = Σ_{d,f} p(d|b) p(f|c,d) h_(3,2)(b,f)
h_(2,3)(b,f) = Σ_{c,d} p(d|b) p(f|c,d) h_(1,2)(b,c)
h_(3,2)(b,f) = Σ_e p(e|b,f) h_(4,3)(e,f)
h_(3,4)(e,f) = Σ_b p(e|b,f) h_(2,3)(b,f)
h_(4,3)(e,f) = P(G = g_e | e, f)

Clusters: 1 = {A,B,C}, 2 = {B,C,D,F}, 3 = {B,E,F}, 4 = {E,F,G}; separators BC, BF, EF.

Time: O(exp(w+1)); Space: O(exp(sep))

For each cluster, P(X|e) is computed, and also P(e).

slide-68
SLIDE 68

Example

A tree decomposition for a belief network BN = ⟨X, D, G, P⟩ is a triple ⟨T, χ, ψ⟩, where T = (V, E) is a tree and χ and ψ are labeling functions associating with each vertex v ∈ V two sets, χ(v) ⊆ X and ψ(v) ⊆ P, satisfying:
1. For each function p_i ∈ P, there is exactly one vertex v such that p_i ∈ ψ(v) and scope(p_i) ⊆ χ(v).
2. For each variable X_i ∈ X, the set {v ∈ V | X_i ∈ χ(v)} forms a connected subtree (running intersection property).

Clusters: {A,B,C}: p(a), p(b|a), p(c|a,b);  {B,C,D,F}: p(d|b), p(f|c,d);  {B,E,F}: p(e|b,f);  {E,F,G}: p(g|e,f).  Separators: BC, BF, EF.

Belief network → Tree decomposition

slide-69
SLIDE 69


IJGP - The basic idea

  • Apply Cluster Tree Elimination to any join-graph
  • We commit to graphs that are I-maps
  • Avoid cycles as long as I-mapness is not violated
  • Result: use minimal arc-labeled join-graphs
slide-70
SLIDE 70

Minimal Arc-Labeled Decomposition

  • Use a DFS algorithm to eliminate cycles relative to each variable

a) Fragment of an arc-labeled join-graph
b) Shrinking labels to make it a minimal arc-labeled join-graph

[Figure: clusters ABCDE, BCE, CDEF with arc labels BC, CDE, CE; the CDE label shrinks to DE]

slide-71
SLIDE 71

Minimal arc-labeled join-graph

slide-72
SLIDE 72

Message propagation

[Join-graph clusters ABCDE, BCE, CDEF, FGI, GHIJ, FGH with arc labels BC, CDE, CE, F, GH, GI]

Cluster 1 = ABCDE contains p(a), p(c), p(b|ac), p(d|abe), p(e|b,c) and h_(3,1)(bc).

Minimal arc-labeled: sep(1,2) = {D,E}, elim(1,2) = {A,B,C}:
h_(1,2)(d,e) = Σ_{a,b,c} p(a) p(c) p(b|ac) p(d|abe) p(e|b,c) h_(3,1)(bc)

Non-minimal arc-labeled: sep(1,2) = {C,D,E}, elim(1,2) = {A,B}:
h_(1,2)(c,d,e) = Σ_{a,b} p(a) p(c) p(b|ac) p(d|abe) p(e|b,c) h_(3,1)(bc)

slide-73
SLIDE 73

IJGP - Example

A D I B E J F G C H

Belief network

A ABDE FGI ABC BCE GHIJ CDEF FGH C H A C A AB BC BE C C DE CE F H F FG GH H GI

Loopy BP graph


slide-74
SLIDE 74

Arc-Minimal Join-Graph

[Animation: starting from the loopy BP graph, arcs labeled with a single variable are deleted one at a time until, for each variable, the arcs labeled with it form a tree]

Arcs labeled with any single variable should form a TREE


slide-75
SLIDE 75

[Animation: clusters of the join-graph are collapsed step by step, e.g. ABDE and ABC merge into ABCDE, and FGI, FGH merge into FGHI, yielding clusters ABCDE, CDEF, FGHI, GHIJ]

slide-76
SLIDE 76

Join-Graphs

[Spectrum of join-graphs, from the loopy BP graph to a join-tree: larger clusters give more accuracy, smaller clusters less complexity]


slide-77
SLIDE 77


Bounded decompositions

  • We want arc-labeled decompositions such that:
  • the cluster size (internal width) is bounded by i (the accuracy parameter)
  • Possible approaches to build decompositions:
  • partition-based algorithms - inspired by the mini-bucket decomposition
  • grouping-based algorithms
slide-78
SLIDE 78

Constructing Join-Graphs

a) Schematic mini-bucket(i), i = 3:
G: (GFE)
E: (EBF) (EF)
F: (FCD) (BF)
D: (DB) (CD)
C: (CAB) (CB)
B: (BA) (AB) (B)
A: (A)

b) Arc-labeled join-graph decomposition: clusters CDB, CAB, BA, A, FCD, GFE, EBF, with the functions P(D|B), P(C|A,B), P(A), P(B|A), P(F|C,D), P(E|B,F), P(G|F,E) placed in their clusters and arcs labeled CB, BA, A, CD, BF, EF, F, B.

slide-79
SLIDE 79


IJGP properties

  • IJGP(i) applies BP to a min arc-labeled join-graph whose cluster size is bounded by i
  • On join-trees, IJGP finds exact beliefs
  • IJGP is a Generalized Belief Propagation algorithm (Yedidia, Freeman, Weiss 2001)
  • Complexity of one iteration:
    • time: O(deg·(n+N)·d^(i+1))
    • space: O(N·d)

slide-80
SLIDE 80

Empirical evaluation

  • Algorithms:
  • Exact
  • IBP
  • MC
  • IJGP

  • Measures: absolute error, relative error, Kullback–Leibler (KL) distance, bit error rate, time
  • Networks (all variables are binary): random networks, grid networks (M×M), CPCS 54, 360, 422, coding networks


slide-81
SLIDE 81

Coding Networks – Bit Error Rate

[Four plots: BER vs. i-bound (2–12) for coding networks, N=400, 500–1000 instances, 30 iterations, w*=43, at σ = .22, .32, .51, .65, comparing IBP and IJGP (and MC at σ = .22)]

slide-82
SLIDE 82

CPCS 422 — KL Distance

[Two plots: KL distance (10⁻⁴–10⁻¹, log scale) vs. i-bound for CPCS 422, w*=23, 1 instance, with evidence=0 and evidence=30, comparing IJGP (at convergence), MC, and IBP (at convergence)]

slide-83
SLIDE 83

CPCS 422 — KL vs. Iterations

[Two plots: KL distance vs. number of iterations (5–35) for CPCS 422, w*=23, 1 instance, with evidence=0 and evidence=30, comparing IJGP(3), IJGP(10), and IBP]

slide-84
SLIDE 84


Coding networks — Time

[Plot: time in seconds (2–10) vs. i-bound (2–12) for coding networks, N=400, 500 instances, 30 iterations, w*=43, comparing IJGP (30 iterations), MC, and IBP (30 iterations)]

slide-85
SLIDE 85

More On the Power of Belief Propagation

  • BP as local minima of KL distance (Read Darwiche)
  • BP’s power from constraint propagation perspective.


slide-86
SLIDE 86
slide-87
SLIDE 87
slide-88
SLIDE 88
slide-89
SLIDE 89

(λ is the grounding for evidence e)

slide-90
SLIDE 90

Theorem: Yedidia, Freeman and Weiss 2005

slide-91
SLIDE 91
slide-92
SLIDE 92
slide-93
SLIDE 93

Summary of IJGP so far

slide-94
SLIDE 94

Outline

  • Mini-bucket elimination
  • Weighted Mini-bucket
  • Mini-clustering
  • Iterative Belief propagation
  • Iterative-join-graph propagation
  • Re-parameterization, cost-shifting


slide-95
SLIDE 95

Cost-Shifting (Reparameterization)

  • Modify the individual functions,
  • but keep the sum of the functions the same.

λ(B):  λ(b) = 3, λ(g) = −1

f(A,B) shifted to f(A,B) + λ(B):
(b,b): 6 → 9;  (b,g): 0 → −1;  (g,b): 0 → 3;  (g,g): 6 → 5

f(B,C) shifted to f(B,C) − λ(B):
(b,b): 6 → 3;  (b,g): 0 → −3;  (g,b): 0 → 1;  (g,g): 6 → 7

f(A,B,C) = f(A,B) + f(B,C) is unchanged:
(b,b,b) 12, (b,b,g) 6, (b,g,b) 0, (b,g,g) 6, (g,b,b) 6, (g,b,g) 0, (g,g,b) 6, (g,g,g) 12
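The invariance claimed above is easy to verify mechanically with the slide's tables (both pairwise tables have the same entries 6, 0, 0, 6 in this example):

```python
B = ["b", "g"]
f_ab = {("b", "b"): 6, ("b", "g"): 0, ("g", "b"): 0, ("g", "g"): 6}
f_bc = dict(f_ab)            # same table in the slide's example
lam = {"b": 3, "g": -1}      # the cost-shifting function lambda(B)

# Shift lambda(B) out of f(B,C) and into f(A,B).
f_ab_shift = {(a, b): v + lam[b] for (a, b), v in f_ab.items()}
f_bc_shift = {(b, c): v - lam[b] for (b, c), v in f_bc.items()}

# Every entry of the combined function f(A,B) + f(B,C) is unchanged.
for a in B:
    for b in B:
        for c in B:
            before = f_ab[(a, b)] + f_bc[(b, c)]
            after = f_ab_shift[(a, b)] + f_bc_shift[(b, c)]
            assert before == after
```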

slide-96
SLIDE 96

Tightening the bound

  • Reparameterization (or "cost shifting")
  • Decrease the bound without changing the overall function

[Tables: λ(B) is added to f1(A,B) and subtracted from f2(B,C); the adjusting functions cancel each other, F(A,B,C) = f1 + f2 is unchanged, and the decomposition bound becomes exact]

slide-97
SLIDE 97

Dual Decomposition

Factors on a triangle: g12(y1,y2), g13(y1,y3), g23(y2,y3), split into independent copies g12(·), g13(·), g23(·) with local copies of y1, y2, y3.

G* = min_y Σ_β g_β(y)  ≥  Σ_β min_y g_β(y)

  • Bound the solution using decomposed optimization
  • Solve independently: an optimistic bound
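A minimal numeric sketch of the bound on the triangle example; the cost tables are random stand-ins (an assumption, since no numbers are given):

```python
import itertools, random

random.seed(1)
vals = [0, 1]
# Three pairwise cost tables on the triangle y1-y2-y3.
g = {pair: {idx: random.uniform(0, 5)
            for idx in itertools.product(vals, repeat=2)}
     for pair in [(1, 2), (1, 3), (2, 3)]}

# Exact: minimize the sum jointly over (y1, y2, y3).
G_star = min(g[(1, 2)][(y1, y2)] + g[(1, 3)][(y1, y3)] + g[(2, 3)][(y2, y3)]
             for y1, y2, y3 in itertools.product(vals, repeat=3))

# Dual decomposition: minimize each factor independently -> optimistic bound.
bound = sum(min(t.values()) for t in g.values())
assert bound <= G_star
```

The gap comes from the independent copies disagreeing on the shared variables; the reparameterization on the following slides shrinks exactly that gap.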

slide-98
SLIDE 98

Dual Decomposition

G* = min_y Σ_β g_β(y)  ≥  Σ_β min_y g_β(y)

  • Bound the solution using decomposed optimization
  • Solve independently: an optimistic bound
  • Tighten the bound by reparameterization
    ‒ Enforce the lost equality constraints via Lagrange multipliers μ_{1→12}(y1), μ_{1→13}(y1), μ_{2→12}(y2), μ_{2→23}(y2), μ_{3→13}(y3), μ_{3→23}(y3)

Reparameterization:  g_β(y) → g_β(y) + Σ_{j∈β} μ_{j→β}(y_j),  subject to  ∀k: Σ_{β∋k} μ_{k→β}(y_k) = 0;  maximize the bound over μ.

slide-99
SLIDE 99

Dual Decomposition

G* = min_y Σ_β g_β(y)  ≥  Σ_β min_y g_β(y)

Many names for the same class of bounds:
‒ Dual decomposition [Komodakis et al. 2007]
‒ TRW, MPLP [Wainwright et al. 2005; Globerson & Jaakkola 2007]
‒ Soft arc consistency [Cooper & Schiex 2004]
‒ Max-sum diffusion [Werner 2007]

Reparameterization:  g_β(y) → g_β(y) + Σ_{j∈β} μ_{j→β}(y_j),  with  ∀k: Σ_{β∋k} μ_{k→β}(y_k) = 0.

slide-100
SLIDE 100

Dual Decomposition

G* = min_y Σ_β g_β(y)  ≥  Σ_β min_y g_β(y)

Many ways to optimize the bound:
‒ Sub-gradient descent [Komodakis et al. 2007; Jojic et al. 2010]
‒ Coordinate descent [Werner 2007; Globerson & Jaakkola 2007; Sontag et al. 2009; Ihler et al. 2012]
‒ Proximal optimization [Ravikumar et al. 2010]
‒ ADMM [Meshi & Globerson 2011; Martins et al. 2011; Forouzan & Ihler 2013]

Reparameterization:  g_β(y) → g_β(y) + Σ_{j∈β} μ_{j→β}(y_j),  with  ∀k: Σ_{β∋k} μ_{k→β}(y_k) = 0.

slide-101
SLIDE 101

Optimizing the bound

  • Can optimize the bound in various ways:
  • (Sub-)gradient descent

[Tables: f1(A,B) and f2(B,C) share variable B; a shift λ(B) will be moved between them]

slide-102
SLIDE 102

Optimizing the bound

  • Can optimize the bound in various ways:
  • (Sub-)gradient descent

[Tables: a first subgradient step shifts λ(B) by ±1 between entries of f1(A,B) and f2(B,C)]

slide-103
SLIDE 103

Optimizing the bound

  • Can optimize the bound in various ways:
  • (Sub-)gradient descent

[Tables: the λ(B) shifts of the previous step, repeated as an animation frame]

slide-104
SLIDE 104

Optimizing the bound

  • Can optimize the bound in various ways:
  • (Sub-)gradient descent

[Tables: accumulated shifts of ±2 and ±1 between f1(A,B) and f2(B,C)]

slide-105
SLIDE 105

Various Update Schemes

  • Can use any decomposition updates (message passing, subgradient, augmented, etc.)
  • FGLP: update the original factors
  • JGLP: update the clique functions of the join graph
  • MBE-MM: mini-bucket with moment matching
    • Apply cost-shifting within each bucket only


slide-106
SLIDE 106

Factor graph Linear Programming

  • Update the original factors (FGLP)
  • Tighten all factors over x_i simultaneously
  • Compute max-marginals, and update
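One common concrete form of this update matches each factor's max-marginal over x_i to their average, following Ihler et al. 2012 as summarized here; the two-factor setup and tables below are invented for illustration. The update is a reparameterization (the combined function is unchanged) and never worsens the decomposition bound.

```python
import itertools

vals = [0, 1]
# Two max-factors sharing variable B (invented tables).
f1 = {(a, b): float(2 * a + b) for a in vals for b in vals}      # f1(A,B)
f2 = {(b, c): float(3 - b + c) for b in vals for c in vals}      # f2(B,C)

def maxmarg_over_b(f, b_pos):
    """Max-marginal of a 2-variable table onto B at position b_pos."""
    return {b: max(v for k, v in f.items() if k[b_pos] == b) for b in vals}

g1 = maxmarg_over_b(f1, 1)
g2 = maxmarg_over_b(f2, 0)
avg = {b: (g1[b] + g2[b]) / 2 for b in vals}

# FGLP-style update: remove each factor's own max-marginal, add the average.
f1_new = {(a, b): v - g1[b] + avg[b] for (a, b), v in f1.items()}
f2_new = {(b, c): v - g2[b] + avg[b] for (b, c), v in f2.items()}

bound_old = max(f1.values()) + max(f2.values())
bound_new = max(f1_new.values()) + max(f2_new.values())
assert bound_new <= bound_old + 1e-12     # the bound never worsens

# Reparameterization: the combined function is unchanged.
for a, b, c in itertools.product(vals, repeat=3):
    assert abs((f1[(a, b)] + f2[(b, c)]) -
               (f1_new[(a, b)] + f2_new[(b, c)])) < 1e-12
```

After the update both factors have the same max-marginal over B, so the independent maximizations can no longer disagree on B.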

slide-107
SLIDE 107

Mini-Bucket as Decomposition

[Figure: buckets E, C, D, B, A with join graph over cliques {A,B,C}, {B,D,E}, {A,C,E}, {A,D,E}, {A,E}, {A} and separators {B}, {D,E}, {A}, {A}, {A,E}, {A,C}; U = upper bound]

  • The downward pass acts as cost shifting
  • Can also do cost shifting within mini-buckets: "join graph" message passing
  • "Moment-matching" version: one message exchange within each bucket during the downward sweep
  • The optimal bound is defined by the cliques ("regions") and the cost-shifting function scopes ("coordinates")

[Ihler et al. 2012]

slide-108
SLIDE 108

MBE-MM: MBE with moment matching

[Figure: network with CPTs P(A), P(B|A), P(C|A), P(E|B,C), P(D|A,B); evidence E = 0; w = 2]

Bucket B is split into mini-buckets; moment-matching messages m11, m12 are exchanged within the bucket before elimination. The computed MPE* is an upper bound U on the MPE; generating a solution yields a lower bound L.

slide-109
SLIDE 109

MBE-MM (MBE with Moment-Matching)

slide-110
SLIDE 110

Anytime Approximation

  • Can tighten the bound in various ways
  • Cost-shifting (improve consistency between cliques)
  • Increase i-bound (higher order consistency)
  • Simple moment-matching step improves bound significantly


slide-111
SLIDE 111

Anytime Approximation


slide-112
SLIDE 112

Anytime Approximation


slide-113
SLIDE 113

Outline

  • Mini-bucket elimination
  • Weighted Mini-bucket
  • Mini-clustering
  • Iterative Belief propagation
  • Iterative-join-graph propagation
  • Re-parameterization, cost-shifting
