Bounded inference, non-iteratively: Mini-bucket elimination — PowerPoint PPT Presentation


slide-1
SLIDE 1

Bounded inference, non-iteratively: Mini-bucket elimination

COMPSCI 276, Spring 2017, Set 8: Rina Dechter

Reading: Class Notes (8), Darwiche chapter 14

slide-2
SLIDE 2

Agenda

  • Mini-bucket elimination
  • Weighted Mini-bucket
  • Mini-clustering
  • Iterative Belief propagation
  • Iterative-join-graph propagation

2

slide-3
SLIDE 3

Probabilistic Inference Tasks

  • Belief updating: $BEL(X_i) = P(X_i = x_i \mid \text{evidence})$
  • Finding the most probable explanation (MPE): $x^* = \arg\max_x P(x, e)$
  • Finding the maximum a-posteriori hypothesis: $(a_1^*, \ldots, a_k^*) = \arg\max_{\bar a} \sum_{X/A} P(x, e)$, where $A \subseteq X$ are the hypothesis variables
  • Finding the maximum-expected-utility (MEU) decision: $(d_1^*, \ldots, d_k^*) = \arg\max_{\bar d} \sum_{X/D} P(x, e)\, U(x)$, where $D \subseteq X$ are the decision variables and $U(x)$ is a utility function

slide-4
SLIDE 4

Queries

  • Probability of evidence (or partition function): $P(e) = \sum_{X} \prod_{i=1}^{n} P(x_i \mid pa_i)\big|_{e}$; for a Markov network, $Z = \sum_{X} \prod_{i} \psi(C_i)$
  • Posterior marginals (beliefs): $P(x_i \mid e) = \dfrac{P(x_i, e)}{P(e)} = \dfrac{\sum_{X \setminus X_i} \prod_{j=1}^{n} P(x_j \mid pa_j)\big|_{e}}{\sum_{X} \prod_{j=1}^{n} P(x_j \mid pa_j)\big|_{e}}$
  • Most probable explanation: $x^* = \arg\max_x P(x, e)$

slide-5
SLIDE 5

Bucket Elimination

Query: $P(a \mid e) \propto P(a, e)$, where

$P(a,e) = \sum_{b,c,d,e} P(a)\, P(b|a)\, P(c|a)\, P(d|b,a)\, P(e|b,c) = P(a) \sum_c P(c|a) \sum_b P(b|a) \sum_e P(e|b,c) \sum_d P(d|b,a)$

Elimination order: d, e, b, c. Processing the buckets:

  • bucket D: $P(d|b,a)$ → $f_D(a,b) = \sum_d P(d|b,a)$
  • bucket E: $P(e|b,c)$ → $f_E(b,c) = \sum_e P(e|b,c)$
  • bucket B: $P(b|a),\ f_D(a,b),\ f_E(b,c)$ → $f_B(a,c) = \sum_b P(b|a)\, f_D(a,b)\, f_E(b,c)$
  • bucket C: $P(c|a),\ f_B(a,c)$ → $f_C(a) = \sum_c P(c|a)\, f_B(a,c)$
  • bucket A: $P(a),\ f_C(a)$ → $P(a,e) = P(a)\, f_C(a)$

Bucket tree: clusters {D,A,B}, {E,B,C}, {B,A,C}, {C,A}, {A}, with messages $f_D(a,b)$, $f_E(b,c)$, $f_B(a,c)$, $f_C(a)$ between original functions.

Time and space: exp(w*)
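The bucket-elimination pass above can be sketched in a few lines of factor algebra. This is an illustrative sketch, not the course's reference code: the network is the slide's P(a)P(b|a)P(c|a)P(d|b,a)P(e|b,c) with hypothetical random binary CPTs, and evidence e = 0 mirrors the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def align(scope, s, t):
    """Reorder table t (axes ordered as s) to the axis order `scope`,
    adding singleton axes for variables missing from s (for broadcasting)."""
    missing = [v for v in scope if v not in s]
    t = t.reshape(t.shape + (1,) * len(missing))
    labels = list(s) + missing
    return np.transpose(t, [labels.index(v) for v in scope])

def multiply(f, g):
    fs, ft = f
    gs, gt = g
    scope = tuple(fs) + tuple(v for v in gs if v not in fs)
    return scope, align(scope, fs, ft) * align(scope, gs, gt)

def sum_out(f, var):
    scope, t = f
    return tuple(v for v in scope if v != var), t.sum(axis=scope.index(var))

def bucket_eliminate(factors, order):
    """Process buckets along `order`: multiply the bucket's functions,
    sum out the bucket's variable, and drop the message into a lower bucket."""
    for var in order:
        bucket = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        msg = bucket[0]
        for f in bucket[1:]:
            msg = multiply(msg, f)
        factors.append(sum_out(msg, var))
    out = factors[0]
    for f in factors[1:]:
        out = multiply(out, f)
    return out

def cpt(scope):
    """Random binary CPT P(last variable in scope | the rest)."""
    t = rng.random((2,) * len(scope))
    return scope, t / t.sum(axis=-1, keepdims=True)

# The slide's network: P(a) P(b|a) P(c|a) P(d|b,a) P(e|b,c)
factors = [cpt(('a',)), cpt(('a', 'b')), cpt(('a', 'c')),
           cpt(('b', 'a', 'd')), cpt(('b', 'c', 'e'))]

# Evidence e = 0: restrict P(e|b,c) to a function of (b, c), as on the slide
be_scope, be_t = factors.pop()
factors.append((('b', 'c'), be_t[:, :, 0]))

scope, table = bucket_eliminate(factors, ['d', 'b', 'c'])
print(scope, table.sum())   # table is P(a, e=0); its sum is P(e=0)
```

The final factor is on `a` alone, so `table` is proportional to the queried belief $P(a \mid e)$.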

slide-6
SLIDE 6

Finding MPE: $\text{MPE} = \max_x P(x)$

Algorithm elim-mpe (Dechter 1996): $\sum$ is replaced by $\max$, i.e. $\text{MPE} = \max_{a,e,d,c,b} P(a)\, P(c|a)\, P(b|a)\, P(d|b,a)\, P(e|b,c)$, with elimination operator $\max_b$.

  • bucket B: $P(b|a),\ P(d|b,a),\ P(e|b,c)$
  • bucket C: $P(c|a),\ h_B(a,d,c,e)$
  • bucket D: $h_C(a,d,e)$
  • bucket E: $e = 0,\ h_D(a,e)$
  • bucket A: $P(a),\ h_E(a)$

W* = 4, the "induced width" (max clique size).

slide-7
SLIDE 7

Generating the MPE-tuple

Buckets after the elimination pass:

  • B: $P(b|a),\ P(d|b,a),\ P(e|b,c)$
  • C: $P(c|a),\ h_B(a,d,c,e)$
  • D: $h_C(a,d,e)$
  • E: $e = 0,\ h_D(a,e)$
  • A: $P(a),\ h_E(a)$

Assign values going forward:

1. $a' = \arg\max_a P(a)\, h_E(a)$
2. $e' = 0$ (evidence)
3. $d' = \arg\max_d h_C(a', d, e')$
4. $c' = \arg\max_c P(c|a')\, h_B(a', d', c, e')$
5. $b' = \arg\max_b P(b|a')\, P(d'|b, a')\, P(e'|b, c')$

Return $(a', b', c', d', e')$.
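The backward max-pass plus forward argmax decoding can be illustrated on a tiny chain; the factors `f_ab`, `f_bc` below are hypothetical stand-ins for the slide's CPTs.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
# A tiny chain a - b - c with random positive factors (hypothetical tables)
f_ab = rng.random((2, 2))
f_bc = rng.random((2, 2))

# Backward (elimination) pass with max, recording argmax tables
h_c = f_bc.max(axis=1)             # h_C(b) = max_c f(b,c)
arg_c = f_bc.argmax(axis=1)        # best c for each b
h_b = (f_ab * h_c).max(axis=1)     # h_B(a) = max_b f(a,b) h_C(b)
arg_b = (f_ab * h_c).argmax(axis=1)

# Forward pass: pick the best value bucket by bucket
a_star = int(h_b.argmax())
b_star = int(arg_b[a_star])
c_star = int(arg_c[b_star])
mpe_value = h_b[a_star]

# Brute-force check over all 8 assignments
best = max(itertools.product(range(2), repeat=3),
           key=lambda t: f_ab[t[0], t[1]] * f_bc[t[1], t[2]])
assert (a_star, b_star, c_star) == best
assert np.isclose(mpe_value, f_ab[best[0], best[1]] * f_bc[best[1], best[2]])
```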

slide-8
SLIDE 8

8

Approximate Inference

  • Metrics of evaluation
  • Absolute error: given $\epsilon > 0$ and a query $p = P(x|e)$, an estimate $r$ has absolute error $\epsilon$ iff $|p - r| < \epsilon$
  • Relative error: the ratio $r/p$ lies in $[1-\epsilon, 1+\epsilon]$
  • Dagum and Luby 1993: approximation up to a relative error is NP-hard
  • Absolute error is also NP-hard if the error is less than 0.5
slide-9
SLIDE 9

9

Mini-Buckets: “Local Inference”

  • Computation in a bucket is time and space exponential in the number of variables involved
  • Therefore, partition the functions in a bucket into "mini-buckets", each defined on a smaller number of variables

slide-10
SLIDE 10

10

Mini-Bucket Approximation: MPE task

Split a bucket into mini-buckets => bound complexity:

$\max_X \big(g(X)\, h(X)\big) \;\le\; \big(\max_X g(X)\big)\big(\max_X h(X)\big)$

Exponential complexity decrease: $O(e^n) \rightarrow O(e^r) + O(e^{n-r})$
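A quick numeric check of the splitting bound, with hypothetical tables h(x,u) and g(x,v): eliminating x jointly is never larger than eliminating it separately in each mini-bucket.

```python
import numpy as np

rng = np.random.default_rng(2)
# One bucket of variable x with two functions (hypothetical tables)
h = rng.random((2, 3))   # h(x, u)
g = rng.random((2, 4))   # g(x, v)

# Exact bucket processing: eliminate x jointly -> a function on (u, v)
exact = (h[:, :, None] * g[:, None, :]).max(axis=0)       # max_x h(x,u) g(x,v)

# Mini-bucket split: eliminate x separately in each mini-bucket
bound = h.max(axis=0)[:, None] * g.max(axis=0)[None, :]   # (max_x h)(max_x g)

assert np.all(exact <= bound + 1e-12)   # the split always upper-bounds the exact message
```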

slide-11
SLIDE 11

Mini-Bucket Elimination

Variables A, B, C, D, E; evidence e = 0; elimination operator $\max_B \prod$ applied per mini-bucket.

  • bucket B: mini-buckets $\{F(a,b), F(b,c)\}$ and $\{F(b,d), F(b,e)\}$ → $h_B(a,c)$ and $h_B(d,e)$
  • bucket C: $F(c,e),\ F(a,c),\ h_B(a,c)$ → $h_C(e,a)$
  • bucket D: $F(a,d),\ h_B(d,e)$ → $h_D(e,a)$
  • bucket E: $e = 0,\ h_C(e,a),\ h_D(e,a)$ → $h_E(a)$
  • bucket A: $h_E(a)$ → U, an upper bound on the MPE

We can generate a solution s going forward as before; its value F(s) is a lower bound.

slide-12
SLIDE 12

Mini-Bucket Elimination

Variables A, B, C, D, E (min-sum). Bucket C is split into the mini-buckets $\{g(b,c'), g(c',d)\}$ and $\{g(c,e), g(c,f)\}$ (C is duplicated into C and C'), each processed by $\min_C \sum g(\cdot)$.

Messages: $\mu_{C\to D}(b,d)$, $\mu_{C\to E}(e,f)$, $\mu_{D\to F}(b,f)$, $\mu_{E\to F}(b,f)$, $\mu_{F\to B}(b)$; L = lower bound.

[Dechter and Rish, 1997; 2003]

slide-13
SLIDE 13

14 14

Semantics of Mini-Bucket: Splitting a Node

(Figure: node U is split into U and Û. Before splitting: network N; after splitting: network N'.)

Variables in different mini-buckets are renamed and duplicated (Kask et al., 2001), (Geffner et al., 2007), (Choi, Chavira, Darwiche, 2007)

slide-14
SLIDE 14

Relaxed Network Example

15

slide-15
SLIDE 15

MBE-MPE(i): Algorithm MBE-mpe

  • Input: i – the maximum number of variables allowed in a mini-bucket
  • Output: [lower bound (P of a suboptimal solution), upper bound]

Example: MBE-mpe(3) versus BE-mpe [Dechter and Rish, 1997]

Buckets E, C, D, B, A contain $g(b,c), g(c,d), g(c,e), g(c,f), g(d,b), g(d,f), g(b,e), g(b)$.

MBE-mpe(3) (at most 3 variables per mini-bucket) sends messages $\mu_{C\to D}(b,d)$, $\mu_{C\to E}(e,f)$, $\mu_{D\to F}(b,f)$, $\mu_{E\to F}(b,f)$, $\mu_{F\to B}(b)$; U = upper bound.

BE-mpe (exact, OPT) sends $\mu_{C\to D}(b,d,e,f)$, $\mu_{D\to E}(b,e,f)$, $\mu_{E\to F}(b,f)$, $\mu_{F\to B}(b)$.

slide-16
SLIDE 16

Mini-Bucket Decoding

ECAI 2016

Buckets E, C, D, B, A contain $g(b,c), g(c,d), g(c,e), g(c,f), g(d,b), g(d,f), g(b,e), g(b)$; bucket C is split into the mini-buckets $\{g(b,c), g(c,d)\}$ and $\{g(c,e), g(c,f)\}$, each processed by $\min_C \sum g(\cdot)$. Messages: $\mu_{C\to D}(b,d)$, $\mu_{C\to E}(e,f)$, $\mu_{D\to F}(b,f)$, $\mu_{E\to F}(b,f)$, $\mu_{F\to B}(b)$; L = lower bound.

Decoding (greedy forward pass):

$\hat b = \arg\min_b \big[g(b) + \mu_{F\to B}(b)\big]$
$\hat f = \arg\min_f \big[\mu_{D\to F}(\hat b, f) + \mu_{E\to F}(\hat b, f)\big]$
$\hat e = \arg\min_e \big[g(\hat b, e) + \mu_{C\to E}(e, \hat f)\big]$
$\hat d = \arg\min_d \big[\mu_{C\to D}(\hat b, d) + g(d, \hat b) + g(d, \hat f)\big]$
$\hat c = \arg\min_c \big[g(\hat b, c) + g(c, \hat d) + g(c, \hat e) + g(c, \hat f)\big]$

The greedy configuration's cost is an upper bound.

[Dechter and Rish, 2003]

slide-17
SLIDE 17

(i,m)-Partitionings

18

slide-18
SLIDE 18

19

MBE(i,m), MBE(i)

  • Input: Belief network $(P_1, \ldots, P_n)$
  • Output: upper and lower bounds
  • Initialize: put functions in buckets along the ordering
  • Process each bucket from p = n to 1:
  • Create (i,m)-partitionings
  • Process each mini-bucket
  • (For mpe): assign values in ordering d
  • Return: mpe-configuration, upper and lower bounds

slide-19
SLIDE 19
slide-20
SLIDE 20

Partitioning, Refinements

21

slide-21
SLIDE 21

22

Properties of MBE-mpe(i)

  • Complexity: O(r exp(i)) time and O(r exp(i)) space.
  • Accuracy: determined by upper/lower (U/L) bound.
  • As i increases, both accuracy and complexity increase.
  • Possible use of mini-bucket approximations:
  • As anytime algorithms
  • As heuristics in best-first search
slide-22
SLIDE 22

23

Anytime Approximation

slide-23
SLIDE 23

24

MBE for Belief Updating and for Probability of Evidence or Partition Function

  • The mini-bucket idea is the same:
  • We can apply a sum in each mini-bucket, or better, one sum and the rest max (upper bound) or min (lower bound)
  • MBE-bel-max(i,m) and MBE-bel-min(i,m) generate upper and lower bounds on beliefs, approximating BE-bel
  • MBE-map(i,m): max buckets are maximized, sum buckets are processed sum-max. Approximates BE-map.

$\sum_X f(x)\, g(x) \;\le\; \Big(\sum_X f(x)\Big)\Big(\max_X g(x)\Big)$
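The displayed inequality, and its min counterpart, can be checked numerically; f and g are hypothetical tables.

```python
import numpy as np

rng = np.random.default_rng(3)
f = rng.random((2, 3))   # f(x, u), hypothetical
g = rng.random((2, 4))   # g(x, v), hypothetical

exact = (f[:, :, None] * g[:, None, :]).sum(axis=0)       # sum_x f(x,u) g(x,v)
upper = f.sum(axis=0)[:, None] * g.max(axis=0)[None, :]   # one sum, the rest max
lower = f.sum(axis=0)[:, None] * g.min(axis=0)[None, :]   # one sum, the rest min

assert np.all(lower - 1e-12 <= exact) and np.all(exact <= upper + 1e-12)
```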

slide-24
SLIDE 24
slide-25
SLIDE 25

Normalization

  • MBE-bel computes upper/lower bounds on the joint marginal distributions.

We sometimes normalize the approximation, but then there is no guarantee; the problem is that we must also approximate P(e).

slide-26
SLIDE 26

27

Empirical Evaluation

(Dechter and Rish, 1997; Rish thesis, 1999)

  • Randomly generated networks
  • Uniform random probabilities
  • Random noisy-OR
  • CPCS networks
  • Probabilistic decoding

Comparing MBE-mpe and anytime-mpe versus BE-mpe

slide-27
SLIDE 27

28

Methodology for Empirical Evaluation (for mpe)

  • U/L accuracy
  • Better: U/mpe or mpe/L
  • Benchmarks: random networks
  • Given n, e, v, generate a random DAG
  • For each xi and its parents, generate a table from uniform [0,1], or noisy-OR
  • Create k instances; for each, generate random evidence and likely evidence
  • Measure averages
slide-28
SLIDE 28

30

Anytime-mpe(0.0001) U/L error vs. time

(Figure: Upper/Lower error vs. time and parameter i (i = 1..21), for cpcs422b and cpcs360b.)

CPCS networks – medical diagnosis (noisy-OR model). Test case: no evidence.

Algorithm                          cpcs360    cpcs422    Time (sec)
elim-mpe                           115.8      1697.6
anytime-mpe(epsilon = 10^-4)       70.3       505.2
anytime-mpe(epsilon = 10^-1)       70.3       110.5

slide-29
SLIDE 29

Agenda

  • Mini-bucket elimination
  • Weighted Mini-bucket
  • Mini-clustering
  • Iterative Belief propagation
  • Iterative-join-graph propagation

32

slide-30
SLIDE 30

The Power Sum and Hölder Inequality

slide-31
SLIDE 31

Working Example

  • Model:
  • Markov network
  • Task:
  • Partition function

A B C

(Qiang Liu slides)

slide-32
SLIDE 32

Mini-Bucket (Basic Principles)

  • Upper bound
  • Lower bound

(Qiang Liu slides)

slide-33
SLIDE 33

Hölder Inequality

  • Where and
  • When , the equality is achieved.
  • G. H. Hardy, J. E. Littlewood and G. Pólya, Inequalities, Cambridge Univ. Press, London and

New York, 1934.

(Qiang Liu slides)
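The slide's formulas did not survive extraction. In the weight notation used later in these slides (the symbols $w_1, w_2$ are my labeling), Hölder's inequality reads:

```latex
\sum_y f(y)\,g(y)
  \;\le\;
  \Bigl(\sum_y f(y)^{1/w_1}\Bigr)^{\!w_1}
  \Bigl(\sum_y g(y)^{1/w_2}\Bigr)^{\!w_2},
  \qquad w_1 + w_2 = 1,\; w_1, w_2 > 0,
```

with equality achieved when $f^{1/w_1} \propto g^{1/w_2}$.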

slide-34
SLIDE 34

Reverse Hölder Inequality

  • If one of the weights is negative, the same bound holds but the direction of the inequality reverses.

  • G. H. Hardy, J. E. Littlewood and G. Pólya, Inequalities, Cambridge Univ. Press, London and

New York, 1934.

(Qiang Liu slides)

slide-35
SLIDE 35
Mini-Bucket as Hölder Inequality

  • (mbe-bel-max)
  • (mbe-bel-min)

(Qiang Liu slides)

slide-36
SLIDE 36
slide-37
SLIDE 37

Weighted Mini-Bucket

Buckets E, C, D, B, A contain $g(b,c), g(c,d), g(c,e), g(c,f), g(b,d), g(d,f), g(b,e), g(b)$; bucket C is split into the mini-buckets $\{g(b,c), g(c,d)\}$ and $\{g(c,e), g(c,f)\}$. Messages: $\mu_{C\to D}(b,d)$, $\mu_{C\to E}(e,f)$, $\mu_{D\to F}(b,f)$, $\mu_{E\to F}(b,f)$, $\mu_{F\to B}(b)$; U = upper bound.

Exact bucket elimination: $\mu_C(b,d,e,f) = \sum_c g(b,c)\, g(c,d)\, g(c,e)\, g(c,f)$

Mini-buckets: $\mu_C(b,d,e,f) \;\le\; \Big(\textstyle\sum_c^{x_1} g(b,c)\, g(c,d)\Big) \cdot \Big(\textstyle\sum_c^{x_2} g(c,e)\, g(c,f)\Big) \;=\; \mu_{C\to D}(b,d) \cdot \mu_{C\to E}(e,f)$

where $\sum_y^{x} g(y) = \big(\sum_y g(y)^{1/x}\big)^{x}$ is the weighted or "power" sum operator, and (for summation)

$\sum_y^{x} g_1(y)\, g_2(y) \;\le\; \sum_y^{x_1} g_1(y) \cdot \sum_y^{x_2} g_2(y)$, where $x_1 + x_2 = x$ and $x_1 > 0,\ x_2 > 0$

(a lower bound if $x_1 > 0,\ x_2 < 0$). [Liu and Ihler, 2011]
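The power-sum bound can be verified numerically; the tables and the weights x1, x2 below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

def power_sum(t, x, axis):
    """Weighted ("power") sum operator: (sum_y t^(1/x))^x along `axis`."""
    return (t ** (1.0 / x)).sum(axis=axis) ** x

g1 = rng.random((5, 3))   # g1(c, d)-style table, c on axis 0
g2 = rng.random((5, 4))   # g2(c, e)-style table

exact = (g1[:, :, None] * g2[:, None, :]).sum(axis=0)   # sum_c g1 g2  (x = 1)

x1, x2 = 0.4, 0.6                                       # x1 + x2 = 1, both > 0
upper = power_sum(g1, x1, 0)[:, None] * power_sum(g2, x2, 0)[None, :]
assert np.all(exact <= upper + 1e-12)                   # Holder: upper bound

x1, x2 = 1.5, -0.5                                      # x1 > 0, x2 < 0
lower = power_sum(g1, x1, 0)[:, None] * power_sum(g2, x2, 0)[None, :]
assert np.all(lower - 1e-12 <= exact)                   # reverse Holder: lower bound
```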

slide-38
SLIDE 38
Choosing the weights

  • (Cauchy–Schwarz inequality)

(Qiang Liu slides)

slide-39
SLIDE 39

Allocating the probabilities

Notice that

(Qiang Liu slides)

slide-40
SLIDE 40
Extension of Mini-Bucket

  • Weights: … and …
  • Allocation of the probability: f1(A), f2(A)

(Figure: the A–B–C chain duplicated into two copies sharing A.)

(Qiang Liu slides)

slide-41
SLIDE 41
slide-42
SLIDE 42

MBE-map

Process max buckets with max mini-buckets, and sum buckets with sum mini-buckets and max mini-buckets.

slide-43
SLIDE 43

MB and WMB for Marginal MAP

Marginal MAP: $\max_Y \sum_{X \setminus Y} \prod_j P_j$

Buckets E, C, D, B, A contain $g(b,c), g(c,d), g(c,e), g(c,f), g(b,d), g(d,f), g(b,e), g(b)$; sum buckets (e.g., $\Sigma_C$, $\Sigma_D$) are processed with power sums, max buckets (e.g., $\max_A$, $\max_E$) with max. Messages:

$\mu_{C\to D}(b,d) = \sum_c^{x_1} g(b,c)\, g(c,d)$,  $\mu_{C\to E}(e,f) = \sum_c^{x_2} g(c,e)\, g(c,f)$,  with $x_1 + x_2 = 1$

. . .

$\mu_{F\to B}(b) = \max_f \mu_{D\to F}(b,f)\, \mu_{E\to F}(b,f)$,  $V = \max_b g(b)\, \mu_{F\to B}(b)$;  U = upper bound

Can optimize over cost-shifting and weights (single-pass "MM" or iterative message passing).

[Liu and Ihler, 2011; 2013] [Dechter and Rish, 2003]

slide-44
SLIDE 44

50

Probabilistic decoding

Error-correcting linear block codes. State of the art: an approximate algorithm, iterative belief propagation (IBP), i.e., Pearl's poly-tree algorithm applied to loopy networks.

slide-45
SLIDE 45

51

Initial partitioning

slide-46
SLIDE 46

52

slide-47
SLIDE 47

Complexity and Tractability of MBE(i,m)

53

slide-48
SLIDE 48

54

Iterative Belief Propagation

  • Belief propagation is exact for poly-trees
  • IBP - applying BP iteratively to cyclic networks
  • No guarantees for convergence
  • Works well for many coding networks

(Figure: a poly-tree fragment with parents U1, U2, U3 and children X1, X2; BP's λ and π messages combine to compute bel(U1).)

  • One step of CTE: computing bel(U1)

slide-49
SLIDE 49

55

MBE-mpe vs. IBP

  • mbe-mpe is better on low-w* codes
  • IBP is better on randomly generated (high-w*) codes

Bit error rate (BER) as a function of noise (sigma):

slide-50
SLIDE 50

56

Mini-Buckets: Summary

Mini-buckets – local inference approximation

Idea: bound size of recorded functions

MBE-mpe(i) - mini-bucket algorithm for MPE

Better results for noisy-OR than for random problems

Accuracy increases with decreasing noise in coding

Accuracy increases for likely evidence

Sparser graphs -> higher accuracy

Coding networks: MBE-mpe outperforms IBP on low-induced-width codes

slide-51
SLIDE 51

Agenda

  • Mini-bucket elimination
  • Mini-clustering
  • Iterative Belief propagation
  • Iterative-join-graph propagation

57

slide-52
SLIDE 52

58

Cluster Tree Elimination - properties

  • Correctness and completeness: Algorithm CTE is correct, i.e., it computes the exact joint probability of a single variable and the evidence.
  • Time complexity: $O(\deg \cdot (n + N) \cdot d^{w^*+1})$
  • Space complexity: $O(N \cdot d^{sep})$

where deg = the maximum degree of a node, n = number of variables (= number of CPTs), N = number of nodes in the tree decomposition, d = the maximum domain size of a variable, w* = the induced width, and sep = the separator size

slide-53
SLIDE 53

Join-Tree Clustering (Cluster-Tree Elimination)

Clusters: 1 = ABC, 2 = BCDF, 3 = BEF, 4 = EFG; separators BC (1–2), BF (2–3), EF (3–4). Messages:

$h_{(1,2)}(b,c) = \sum_a p(a)\, p(b|a)\, p(c|a,b)$
$h_{(2,1)}(b,c) = \sum_{d,f} p(d|b)\, p(f|c,d)\, h_{(3,2)}(b,f)$
$h_{(2,3)}(b,f) = \sum_{c,d} p(d|b)\, p(f|c,d)\, h_{(1,2)}(b,c)$
$h_{(3,2)}(b,f) = \sum_e p(e|b,f)\, h_{(4,3)}(e,f)$
$h_{(3,4)}(e,f) = \sum_b p(e|b,f)\, h_{(2,3)}(b,f)$
$h_{(4,3)}(e,f) = p(G = g_e \mid e,f)$

Time and space: exp(cluster size) = exp(treewidth). EXACT algorithm.

slide-54
SLIDE 54

Mini-Clustering

Split a cluster into mini-clusters => bound complexity. The cluster functions $\{h_1, \ldots, h_r, h_{r+1}, \ldots, h_n\}$ are partitioned into $\{h_1, \ldots, h_r\}$ and $\{h_{r+1}, \ldots, h_n\}$, and

$\sum_{elim} \prod_{i=1}^{n} h_i \;\le\; \Big(\sum_{elim} \prod_{i=1}^{r} h_i\Big) \cdot \Big(\sum_{elim} \prod_{i=r+1}^{n} h_i\Big)$

Exponential complexity decrease: $O(e^n) \rightarrow O(e^r) + O(e^{n-r})$. APPROXIMATE algorithm.

slide-55
SLIDE 55

Mini-Clustering, i-bound=3

i-bound = the number of variables allowed in a mini-cluster. Clusters:

1: ABC – p(a), p(b|a), p(c|a,b)
2: BCD – p(d|b), $h_{(1,2)}(b,c)$ | CDF – p(f|c,d)
3: BEF – p(e|b,f), $h^1_{(2,3)}(b)$, $h^2_{(2,3)}(f)$
4: EFG – p(g|e,f)

Example messages:

$h^1_{(1,2)}(b,c) = \sum_a p(a)\, p(b|a)\, p(c|a,b)$
$h^1_{(2,3)}(b) = \sum_{c,d} p(d|b)\, h^1_{(1,2)}(b,c)$
$h^2_{(2,3)}(f) = \max_{c,d} p(f|c,d)$

Time and space: exp(i-bound). APPROXIMATE algorithm.
slide-56
SLIDE 56

Mini-Clustering – example

Clusters: 1 = ABC, 2 = BCDF, 3 = BEF, 4 = EFG; separators BC, BF, EF. Message sets:

$H_{(1,2)}$: $h^1_{(1,2)}(b,c) = \sum_a p(a)\, p(b|a)\, p(c|a,b)$
$H_{(2,1)}$: $h^1_{(2,1)}(b) = \sum_{d,f} p(d|b)\, h^1_{(3,2)}(b,f)$;  $h^2_{(2,1)}(c) = \max_{d,f} p(f|c,d)$
$H_{(2,3)}$: $h^1_{(2,3)}(b) = \sum_{c,d} p(d|b)\, h^1_{(1,2)}(b,c)$;  $h^2_{(2,3)}(f) = \max_{c,d} p(f|c,d)$
$H_{(3,2)}$: $h^1_{(3,2)}(b,f) = \sum_e p(e|b,f)\, h^1_{(4,3)}(e,f)$
$H_{(3,4)}$: $h^1_{(3,4)}(e,f) = \sum_b p(e|b,f)\, h^1_{(2,3)}(b)\, h^2_{(2,3)}(f)$
$H_{(4,3)}$: $h^1_{(4,3)}(e,f) = p(G = g_e \mid e,f)$

slide-57
SLIDE 57

Cluster Tree Elimination vs. Mini-Clustering

Same cluster tree (1 = ABC, 2 = BCDF, 3 = BEF, 4 = EFG; separators BC, BF, EF). CTE sends one exact message per edge; MC sends sets of smaller messages:

CTE: $h_{(1,2)}(b,c)$, $h_{(2,1)}(b,c)$, $h_{(2,3)}(b,f)$, $h_{(3,2)}(b,f)$, $h_{(3,4)}(e,f)$, $h_{(4,3)}(e,f)$

MC: $H_{(1,2)} = \{h^1_{(1,2)}(b,c)\}$, $H_{(2,1)} = \{h^1_{(2,1)}(b), h^2_{(2,1)}(c)\}$, $H_{(2,3)} = \{h^1_{(2,3)}(b), h^2_{(2,3)}(f)\}$, $H_{(3,2)} = \{h^1_{(3,2)}(b,f)\}$, $H_{(3,4)} = \{h^1_{(3,4)}(e,f)\}$, $H_{(4,3)} = \{h^1_{(4,3)}(e,f)\}$

slide-58
SLIDE 58

Semantics of node duplication for mini-clustering

  • We can have a different duplication of nodes going up and down. Example: going down.

65

slide-59
SLIDE 59

67

Lower Bounds and Mean Approximations

We can replace the max operator by:

  • min => lower bound on the joint
  • mean => approximation of the joint

slide-60
SLIDE 60

Grid 15x15 – 10 evidence

(Figures: Grid 15x15, evid=10, w*=22, 10 instances — NHD, absolute error, relative error, and time in seconds vs. i-bound (2–18), comparing MC and IBP.)

slide-61
SLIDE 61

CPCS422 – Absolute error

(Figures: CPCS 422, w*=23, 1 instance — absolute error vs. i-bound (2–18), for evidence=0 and evidence=10, comparing MC and IBP.)

slide-62
SLIDE 62

Coding networks – Bit Error Rate

(Figures: coding networks, N=100, P=4, w*=12, 50 instances — bit error rate vs. i-bound (2–12), for sigma=0.22 and sigma=0.51, comparing MC and IBP.)


slide-65
SLIDE 65

Heuristic for Partitioning

75

Scope-based partitioning heuristic. The scope-based partitioning heuristic (SCP) aims at minimizing the number of mini-buckets in the partition by including in each mini-bucket as many functions as possible, as long as the i-bound is satisfied. First, single-function mini-buckets are ordered by decreasing arity. Then, each mini-bucket is absorbed into the left-most mini-bucket with which it can be merged. The time and space complexity of Partition(B, i), where B is the partitioned bucket, using the SCP heuristic is O(|B| log(|B|) + |B|^2) and O(exp(i)), respectively. The scope-based heuristic is quite fast; its shortcoming is that it does not consider the actual information in the functions.
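A minimal sketch of the SCP idea, over scopes only (no tables); the variable names are hypothetical. On the deck's running bucket-C example this reproduces the split into {b,c,d} and {c,e,f}.

```python
def scope_partition(bucket, i_bound):
    """Scope-based partitioning (SCP) sketch: visit functions by decreasing
    arity and absorb each into the left-most mini-bucket whose combined
    scope stays within i_bound. `bucket` is a list of scopes (sets)."""
    minibuckets = []  # list of [combined_scope, member_scopes]
    for scope in sorted(bucket, key=len, reverse=True):
        for mb in minibuckets:
            if len(mb[0] | scope) <= i_bound:
                mb[0] |= scope          # absorb into the left-most fitting mini-bucket
                mb[1].append(scope)
                break
        else:
            minibuckets.append([set(scope), [scope]])
    return minibuckets

# Bucket C from the running example: g(b,c), g(c,d), g(c,e), g(c,f)
bucket = [{'b', 'c'}, {'c', 'd'}, {'c', 'e'}, {'c', 'f'}]
parts = scope_partition(bucket, i_bound=3)
print([sorted(mb[0]) for mb in parts])   # → [['b', 'c', 'd'], ['c', 'e', 'f']]
```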

slide-66
SLIDE 66

Content-based heuristics

(Rollon and Dechter 2010)

76

Use a greedy heuristic derived from a distance function to decide which functions go into a single mini-bucket.

slide-67
SLIDE 67

Greedy Partition as a function of a distance function h

slide-68
SLIDE 68

Agenda

  • Mini-bucket elimination
  • Mini-clustering
  • Reparameterization, cost-shifting
  • Iterative Belief propagation
  • Iterative-join-graph propagation

78

slide-69
SLIDE 69

Cost-Shifting

(Reparameterization) Modify the individual functions, but keep the sum of functions the same: add and subtract the same shift ($+\mu$, $-\mu$).

Shift $\lambda(B)$ (b: +3, g: −1) is added to f(A,B) and subtracted from f(B,C):

f(A,B):  bb 6+3,  bg 0−1,  gb 0+3,  gg 6−1
f(B,C):  bb 6−3,  bg 0−3,  gb 0+1,  gg 6+1
f(A,B,C) = f(A,B) + f(B,C):  bbb 12,  bbg 6,  bgb 0,  bgg 6,  gbb 6,  gbg 0,  ggb 6,  ggg 12  (unchanged)
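The invariance can be checked directly on the slide's tables (b encoded as index 0, g as 1): the shifted tables differ entry by entry, but their sum does not.

```python
import numpy as np

# The slide's two tables over binary A, B, C, and the shift lambda(B)
f_ab = np.array([[6., 0.], [0., 6.]])   # f(A,B)
f_bc = np.array([[6., 0.], [0., 6.]])   # f(B,C)
lam  = np.array([3., -1.])              # lambda(B): b -> +3, g -> -1

# Shift: add lambda(B) to f(A,B), subtract it from f(B,C)
f_ab_new = f_ab + lam[None, :]
f_bc_new = f_bc - lam[:, None]

# The combined function f(A,B,C) = f(A,B) + f(B,C) is unchanged
before = f_ab[:, :, None] + f_bc[None, :, :]
after  = f_ab_new[:, :, None] + f_bc_new[None, :, :]
assert np.array_equal(before, after)

# ...while the decomposed quantity  min f(A,B) + min f(B,C)  changes
print(f_ab.min() + f_bc.min(), f_ab_new.min() + f_bc_new.min())   # → 0.0 -4.0
```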

slide-70
SLIDE 70

Dual Decomposition

(Figure: factors $g_{12}(y_1,y_2)$, $g_{13}(y_1,y_3)$, $g_{23}(y_2,y_3)$ over $y_1, y_2, y_3$, decomposed into independent copies.)

$G^* = \min_y \sum_\beta g_\beta(y) \;\ge\; \sum_\beta \min_y g_\beta(y)$

  • Bound solution using decomposed optimization
  • Solve independently: optimistic bound
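The two bullets above can be checked on a tiny numeric instance (tables hypothetical): minimizing each factor independently can only undershoot the joint optimum.

```python
import itertools
import numpy as np

rng = np.random.default_rng(5)
# Three pairwise factors over binary y1, y2, y3 (hypothetical tables)
g12, g13, g23 = (rng.random((2, 2)) for _ in range(3))

# Exact optimum: minimize the sum jointly
G = min(g12[a, b] + g13[a, c] + g23[b, c]
        for a, b, c in itertools.product(range(2), repeat=3))

# Dual decomposition bound: minimize each factor independently
bound = g12.min() + g13.min() + g23.min()

assert bound <= G + 1e-12   # solving the parts independently is optimistic
print(bound, G)
```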
slide-71
SLIDE 71

Dual Decomposition

(Same decomposition: factors $g_{12}(y_1,y_2)$, $g_{13}(y_1,y_3)$, $g_{23}(y_2,y_3)$ split into independent copies.)

$G^* = \min_y \sum_\beta g_\beta(y) \;\ge\; \sum_\beta \min_y g_\beta(y)$

  • Bound solution using decomposed optimization
  • Solve independently: optimistic bound
  • Tighten the bound by reparameterization
    ‒ Enforce the lost equality constraints via Lagrange multipliers $\mu_{k\to\beta}(y_k)$

Reparameterization: $\forall k:\ \sum_{\beta \ni k} \mu_{k\to\beta}(y_k) = 0$; optimize $\max_{\mu}\ \sum_\beta \min_y \big[g_\beta(y) + \sum_{j\in\beta} \mu_{j\to\beta}(y_j)\big]$

slide-72
SLIDE 72

Dual Decomposition

(Same decomposition and reparameterization as before: $\forall k:\ \sum_{\beta \ni k} \mu_{k\to\beta}(y_k) = 0$.)

$G^* = \min_y \sum_\beta g_\beta(y) \;\ge\; \sum_\beta \min_y g_\beta(y)$

Many names for the same class of bounds:
‒ Dual decomposition [Komodakis et al. 2007]
‒ TRW, MPLP [Wainwright et al. 2005; Globerson & Jaakkola, 2007]
‒ Soft arc consistency [Cooper & Schiex, 2004]
‒ Max-sum diffusion [Werner 2007]

slide-73
SLIDE 73

Dual Decomposition

(Same decomposition and reparameterization as before.)

$G^* = \min_y \sum_\beta g_\beta(y) \;\ge\; \sum_\beta \min_y g_\beta(y)$

Many ways to optimize the bound:
‒ Sub-gradient descent [Komodakis et al. 2007; Jojic et al. 2010]
‒ Coordinate descent [Werner 2007; Globerson & Jaakkola 2007; Sontag et al. 2009; Ihler et al. 2012]
‒ Proximal optimization [Ravikumar et al. 2010]
‒ ADMM [Meshi & Globerson 2011; Martins et al. 2011; Forouzan & Ihler 2013]

slide-74
SLIDE 74

Mini-Bucket as Dual Decomposition

Buckets E, C, D, B, A contain $g(b,c), g(c,d), g(c,e), g(c,f), g(b,d), g(d,f), g(b,e), g(b)$, with bucket C split into mini-buckets; messages $\mu_{C\to D}(b,d)$, $\mu_{C\to E}(e,f)$, $\mu_{D\to F}(b,f)$, $\mu_{E\to F}(b,f)$, $\mu_{F\to B}(b)$; L = lower bound.

slide-75
SLIDE 75

Mini-Bucket as Dual Decomposition

Buckets E, C, D, B, A as before, with mini-bucket messages $\mu_{C\to D}(b,d)$, $\mu_{C\to E}(e,f)$, $\mu_{D\to F}(b,f)$, $\mu_{E\to F}(b,f)$, $\mu_{F\to B}(b)$; L = lower bound. The shifted subproblems satisfy:

$\min_{b,d,c} \big[g(b,c) + g(c,d) - \mu_{C\to D}(b,d)\big] = 0$
$\min_{e,f,c} \big[g(c,e) + g(c,f) - \mu_{C\to E}(e,f)\big] = 0$

slide-76
SLIDE 76

Mini-Bucket as Dual Decomposition

Buckets and messages as on the previous slide; L = lower bound.

$\min_{b,d,c} \big[g(b,c) + g(c,d) - \mu_{C\to D}(b,d)\big] = 0$
$\min_{e,f,c} \big[g(c,e) + g(c,f) - \mu_{C\to E}(e,f)\big] = 0$
$\min_{b,f,d} \big[\mu_{C\to D}(b,d) + g(b,d) + g(d,f) - \mu_{D\to F}(b,f)\big] = 0$

slide-77
SLIDE 77

Mini-Bucket as Dual Decomposition

Buckets and messages as on the previous slide; L = lower bound.

$\min_{b,d,c} \big[g(b,c) + g(c,d) - \mu_{C\to D}(b,d)\big] = 0$
$\min_{e,f,c} \big[g(c,e) + g(c,f) - \mu_{C\to E}(e,f)\big] = 0$
$\min_{b,f,d} \big[\mu_{C\to D}(b,d) + g(b,d) + g(d,f) - \mu_{D\to F}(b,f)\big] = 0$
$\min_{b,f,e} \big[g(b,e) + \mu_{C\to E}(e,f) - \mu_{E\to F}(b,f)\big] = 0$

slide-78
SLIDE 78

Mini-Bucket as Dual Decomposition

Buckets and messages as on the previous slide; L = lower bound.

$\min_{b,d,c} \big[g(b,c) + g(c,d) - \mu_{C\to D}(b,d)\big] = 0$
$\min_{e,f,c} \big[g(c,e) + g(c,f) - \mu_{C\to E}(e,f)\big] = 0$
$\min_{b,f,d} \big[\mu_{C\to D}(b,d) + g(b,d) + g(d,f) - \mu_{D\to F}(b,f)\big] = 0$
$\min_{b,f,e} \big[g(b,e) + \mu_{C\to E}(e,f) - \mu_{E\to F}(b,f)\big] = 0$
$\min_{b,f} \big[\mu_{D\to F}(b,f) + \mu_{E\to F}(b,f) - \mu_{F\to B}(b)\big] = 0$

slide-79
SLIDE 79

Mini-Bucket as Dual Decomposition

Buckets and messages as on the previous slide; L = lower bound.

$\min_{b,d,c} \big[g(b,c) + g(c,d) - \mu_{C\to D}(b,d)\big] = 0$
$\min_{e,f,c} \big[g(c,e) + g(c,f) - \mu_{C\to E}(e,f)\big] = 0$
$\min_{b,f,d} \big[\mu_{C\to D}(b,d) + g(b,d) + g(d,f) - \mu_{D\to F}(b,f)\big] = 0$
$\min_{b,f,e} \big[g(b,e) + \mu_{C\to E}(e,f) - \mu_{E\to F}(b,f)\big] = 0$
$\min_{b,f} \big[\mu_{D\to F}(b,f) + \mu_{E\to F}(b,f) - \mu_{F\to B}(b)\big] = 0$
$\min_{b} \big[g(b) + \mu_{F\to B}(b)\big] = M$

slide-80
SLIDE 80

Various Update Schemes

  • Can use any decomposition updates (message passing, subgradient, augmented Lagrangian, etc.)
  • FGLP: update the original factors
  • JGLP: update clique functions of the join graph
  • MBE-MM: mini-bucket with moment matching; apply cost-shifting within each bucket only

UTA 1/2015

slide-81
SLIDE 81

FGLP (Factor Graph Linear Programming)

slide-82
SLIDE 82

Factor Graph Linear Programming

  • Update the original factors (FGLP)
  • Tighten all factors over xi simultaneously
  • Compute max-marginals & update

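One max-marginal-matching update on a single shared variable can be sketched as follows (log/max-sum domain, hypothetical tables). With only one shared variable, a single matching step already makes the decomposed bound tight.

```python
import numpy as np

rng = np.random.default_rng(6)
# Two log-domain factors sharing variable x0: f(x0, x1) and g(x0, x2)
f = rng.random((2, 3))
g = rng.random((2, 4))

def bound(f, g):
    # decomposition upper bound on max_x [f + g]: optimize each factor alone
    return f.max() + g.max()

before = bound(f, g)

# FGLP-style update on the shared variable x0:
# 1) max-marginals of each factor over x0
lam_f = f.max(axis=1)          # lambda_f(x0)
lam_g = g.max(axis=1)          # lambda_g(x0)
avg = (lam_f + lam_g) / 2.0
# 2) shift each factor toward the average max-marginal (a reparameterization)
f = f - lam_f[:, None] + avg[:, None]
g = g - lam_g[:, None] + avg[:, None]

after = bound(f, g)
assert after <= before + 1e-12   # the update never loosens the bound
```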

slide-83
SLIDE 83
slide-84
SLIDE 84

Factor Graph Linear Programming

slide-85
SLIDE 85

Complexity of FGLP

slide-86
SLIDE 86

Mini-Bucket as Dual Decomposition

  • Downward pass as cost-shifting
  • Can also do cost-shifting within mini-buckets
  • "Join graph" message passing
  • "Moment matching" version: one message update within each bucket during the downward sweep

Join graph: buckets E, C, D, B, A with cliques {b,c,d}, {c,e,f}, {b,d,f}, {b,e,f}, {b}; separator {b,f}. L = lower bound.

slide-87
SLIDE 87

MBE-MM: MBE with moment matching

Variables A, B, C, D, E with CPTs P(A), P(B|A), P(C|A), P(E|B,C), P(D|A,B); evidence E = 0; W = 2.

  • bucket B ($\max_B \prod$, two mini-buckets exchanging moment-matching messages m11, m12): $\{P(B|A), P(D|A,B)\}$ → $h_B(A,D)$ and $\{P(E|B,C)\}$ → $h_B(C,E)$
  • bucket C: $P(C|A)$, $h_B(C,E)$ → $h_C(A,E)$
  • bucket D: $h_B(A,D)$ → $h_D(A)$
  • bucket E: $E = 0$, $h_C(A,E)$ → $h_E(A)$
  • bucket A: $P(A)$, $h_D(A)$, $h_E(A)$

MPE* is an upper bound on MPE (U); generating a solution yields a lower bound (L).

slide-88
SLIDE 88

MBE-MM (MBE with Moment-Matching)

slide-89
SLIDE 89

Anytime Approximation


[Dechter and Rish, 2003]

slide-90
SLIDE 90

Anytime Approximation

  • Can tighten the bound in various ways
  • Cost-shifting (improve consistency between cliques)
  • Increase i-bound (higher order consistency)
  • Simple moment-matching step improves bound significantly


slide-93
SLIDE 93

Weighted Mini-Bucket

Buckets E, C, D, B, A contain $g(b,c), g(c,d), g(c,e), g(c,f), g(b,d), g(d,f), g(b,e), g(b)$; bucket C is split into the mini-buckets $\{g(b,c), g(c,d)\}$ and $\{g(c,e), g(c,f)\}$. Messages: $\mu_{C\to D}(b,d)$, $\mu_{C\to E}(e,f)$, $\mu_{D\to F}(b,f)$, $\mu_{E\to F}(b,f)$, $\mu_{F\to B}(b)$; U = upper bound.

Exact bucket elimination: $\mu_C(b,d,e,f) = \sum_c g(b,c)\, g(c,d)\, g(c,e)\, g(c,f)$

Mini-buckets: $\mu_C(b,d,e,f) \;\le\; \Big(\textstyle\sum_c^{x_1} g(b,c)\, g(c,d)\Big) \cdot \Big(\textstyle\sum_c^{x_2} g(c,e)\, g(c,f)\Big) \;=\; \mu_{C\to D}(b,d) \cdot \mu_{C\to E}(e,f)$

where $\sum_y^{x} g(y) = \big(\sum_y g(y)^{1/x}\big)^{x}$ is the weighted or "power" sum operator, and (for summation)

$\sum_y^{x} g_1(y)\, g_2(y) \;\le\; \sum_y^{x_1} g_1(y) \cdot \sum_y^{x_2} g_2(y)$, where $x_1 + x_2 = x$ and $x_1 > 0,\ x_2 > 0$

(a lower bound if $x_1 > 0,\ x_2 < 0$). [Liu and Ihler, 2011]

slide-94
SLIDE 94

Weighted Mini-Bucket

  • Related to conditional entropy decomposition, but with an efficient "primal" bound form
  • Can optimize the bound over: cost-shifting and weights ($x_1$, $x_2$)
  • Again, involves message passing over the join graph
  • Similar one-pass "moment-matching" variant

Join graph: buckets E, C, D, B, A with cliques {b,c,d}, {c,e,f}, {b,d,f}, {b,e,f}, {b}; separator {b,f}. U = upper bound. [Liu and Ihler, 2011]

slide-95
SLIDE 95

MB and WMB for Marginal MAP

Marginal MAP: $\max_Y \sum_{X \setminus Y} \prod_j P_j$

Buckets E, C, D, B, A contain $g(b,c), g(c,d), g(c,e), g(c,f), g(b,d), g(d,f), g(b,e), g(b)$; sum buckets (e.g., $\Sigma_C$, $\Sigma_D$) are processed with power sums, max buckets (e.g., $\max_A$, $\max_E$) with max. Messages:

$\mu_{C\to D}(b,d) = \sum_c^{x_1} g(b,c)\, g(c,d)$,  $\mu_{C\to E}(e,f) = \sum_c^{x_2} g(c,e)\, g(c,f)$,  with $x_1 + x_2 = 1$

. . .

$\mu_{F\to B}(b) = \max_f \mu_{D\to F}(b,f)\, \mu_{E\to F}(b,f)$,  $V = \max_b g(b)\, \mu_{F\to B}(b)$;  U = upper bound

Can optimize over cost-shifting and weights (single-pass "MM" or iterative message passing).

[Liu and Ihler, 2011; 2013] [Dechter and Rish, 2003]