SLIDE 1

Graphical Models

MAP inference

Siamak Ravanbakhsh, Winter 2018

SLIDE 2

Learning objectives

• MAP inference and its complexity
• exact & approximate MAP inference
• max-product and max-sum message passing
• relationship to LP relaxation
• graph-cuts for MAP inference

SLIDE 3-4

Definition & complexity

MAP (decision problem): given a Bayes-net, deciding whether p(x) > c for some x is NP-complete!

    x* = arg max_x p(x)

(NP: a non-deterministic Turing machine that accepts if a single path accepts.)

Marginal MAP (decision problem): given a Bayes-net for p(x, y), deciding whether p(x) > c for some x is complete for NP^PP

    x* = arg max_x ∑_y p(x, y)

(PP: a non-deterministic Turing machine that accepts if the majority of paths accept; NP^PP: NP with access to a PP oracle.)

Marginal MAP is NP-hard even for trees: we cannot use the distributive law.

[figure: side-chain prediction as MAP inference (Yanover & Weiss)]

SLIDE 5-7

Problem & terminology

MAP inference:

    arg max_x p(x) = arg max_x (1/Z) ∏_I ϕ_I(x_I)
                   ≡ arg max_x p̃(x) = arg max_x ∏_I ϕ_I(x_I)

ignore the normalization constant (aka max-product inference)

with evidence:

    arg max_x p(x ∣ e) = arg max_x p(x, e)/p(e) ≡ arg max_x p(x, e)

log domain:

    arg max_x p(x) ≡ arg max_x ∑_I ln ϕ_I(x_I) ≡ arg min_x −ln p̃(x)

aka max-sum inference; aka min-sum inference (energy minimization)
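As a quick numerical sanity check of these equivalences, the sketch below builds an unnormalized p̃ as a product of positive factors (the factor values are random and purely illustrative) and confirms that max-product, max-sum, and min-sum all pick the same assignment:

```python
import numpy as np

rng = np.random.default_rng(0)
# three positive factors over the same 8 joint assignments (toy setup)
phi = rng.uniform(0.1, 1.0, size=(3, 8))
p_tilde = phi.prod(axis=0)

x_prod = np.argmax(p_tilde)                  # max-product
x_sum = np.argmax(np.log(phi).sum(axis=0))   # max-sum (log domain)
x_min = np.argmin(-np.log(p_tilde))          # min-sum (energy minimization)
assert x_prod == x_sum == x_min
```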

SLIDE 8

Max-marginals

the marginal ∑_{x ∈ Val(X)} ϕ(x, y), used in sum-product inference, is replaced with the max-marginal max_{x ∈ Val(X)} ϕ(x, y)

example: ϕ(a, b, c) → ϕ′(a, c) = max_b ϕ(a, b, c)

SLIDE 9-10

Distributive law for MAP inference

save computation by factoring the operations (3 operations → 2 operations):

    ab + ac = a(b + c)                                sum-product inference
    max(ab, ac) = a max(b, c)                         max-product inference
    max(a + b, a + c) = a + max(b, c)                 max-sum inference
    max(min(a, b), min(a, c)) = max(a, min(b, c))     min-max inference

the same law in disguise:

    max_{x,y} f(x, y) g(y, z) = max_y g(y, z) max_x f(x, y)

assuming ∣Val(X)∣ = ∣Val(Y)∣ = ∣Val(Z)∣ = d, the complexity drops from O(d³) to O(d²)
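The factored identity above is easy to verify numerically; a small sketch with random tables f(x, y) and g(y, z) (names and sizes are arbitrary) computes the result both ways:

```python
import numpy as np

d = 5
rng = np.random.default_rng(1)
f = rng.uniform(size=(d, d))   # f(x, y)
g = rng.uniform(size=(d, d))   # g(y, z)

# naive: build the full d^3 table, then maximize over x and y -> O(d^3)
naive = (f[:, :, None] * g[None, :, :]).max(axis=(0, 1))

# factored: push max_x inside -> max_y g(y, z) * (max_x f(x, y)) -> O(d^2)
factored = (f.max(axis=0)[:, None] * g).max(axis=0)

assert np.allclose(naive, factored)
```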

SLIDE 11

Max-product variable elimination

input: a set of factors Φ_{t=0} = {ϕ_1, …, ϕ_K} (e.g. CPDs)
output: max_x p̃(x) = max_x ∏_I ϕ_I(x_I)

go over x_{i1}, …, x_{in} in some order; at step t:
• collect all the relevant factors: Ψ_t = {ϕ ∈ Φ_{t−1} ∣ x_{it} ∈ Scope[ϕ]}
• calculate their product: ψ_t = ∏_{ϕ ∈ Ψ_t} ϕ
• max-marginalize out x_{it}: ψ′_t = max_{x_{it}} ψ_t
• update the set of factors: Φ_t = Φ_{t−1} − Ψ_t + {ψ′_t}
return the scalar in Φ_{t=n} as max_x p̃(x)

the procedure is similar to VE for sum-product inference: eliminate all the variables

similar to the partition function Z = ∑_x p̃(x)
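A minimal sketch of this procedure on a 3-variable chain with two pairwise factors (the factor values are random and purely illustrative); the elimination result is checked against brute-force enumeration:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
phi12 = rng.uniform(size=(2, 2))   # phi_{1,2}(x1, x2)
phi23 = rng.uniform(size=(2, 2))   # phi_{2,3}(x2, x3)

# eliminate x1: psi'_1(x2) = max_{x1} phi12(x1, x2)
m1 = phi12.max(axis=0)
# eliminate x2: psi'_2(x3) = max_{x2} m1(x2) * phi23(x2, x3)
m2 = (m1[:, None] * phi23).max(axis=0)
# eliminate x3: the remaining factor is the scalar max_x p~(x)
map_value = m2.max()

brute = max(phi12[a, b] * phi23[b, c]
            for a, b, c in product(range(2), repeat=3))
assert np.isclose(map_value, brute)
```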

SLIDE 12-14

Decoding the max-value

the elimination procedure above returns max_x p̃(x); we also need to recover the maximizing assignment x*

• keep the intermediate factors {ψ_{t=1}, …, ψ_{t=n}} produced during inference
• start from the last eliminated variable: ψ_{t=n} should be a function of x_{in} alone, so

    x*_{in} ← arg max_{x_{in}} ψ_n(x_{in})

• at this point ψ_{t=n−1} can only have x_{in−1}, x_{in} in its domain:

    x*_{in−1} ← arg max_{x_{in−1}} ψ_{n−1}(x_{in−1}, x*_{in})

• and so on, back to the first eliminated variable
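The traceback can be sketched on the same kind of 3-variable chain: the ψ tables from the forward pass are kept and then decoded in reverse (factor values are random, the result is checked against brute force):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
phi12 = rng.uniform(size=(2, 2))   # phi_{1,2}(x1, x2)
phi23 = rng.uniform(size=(2, 2))   # phi_{2,3}(x2, x3)

# forward pass: keep the intermediate factors psi_t
psi1 = phi12                       # product of factors touching x1
m1 = psi1.max(axis=0)              # psi'_1(x2): x1 eliminated
psi2 = m1[:, None] * phi23         # product of factors touching x2
m2 = psi2.max(axis=0)              # psi'_2(x3): x2 eliminated
psi3 = m2                          # function of x3 alone

# backward pass: decode starting from the last eliminated variable
x3 = int(np.argmax(psi3))
x2 = int(np.argmax(psi2[:, x3]))
x1 = int(np.argmax(psi1[:, x2]))

best = max(product(range(2), repeat=3),
           key=lambda x: phi12[x[0], x[1]] * phi23[x[1], x[2]])
assert (x1, x2, x3) == best
```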

SLIDE 15-19

Marginal-MAP variable elimination

    arg max_{y_1,…,y_m} ∑_{x_1,…,x_n} ∏_I ϕ_I(x_I)

the procedure remains similar, but max and sum do not commute:

    max_x ∑_y ϕ(x, y) ≠ ∑_y max_x ϕ(x, y)

so we cannot use an arbitrary elimination order:
• first, eliminate {x_1, …, x_n} (sum-prod VE)
• then eliminate {y_1, …, y_m} (max-prod VE)
• decode the maximizing value

example: this constrained ordering can give exponential complexity despite low tree-width
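A two-by-two table is enough to see the failure of commutativity; the numbers below are hand-picked so the two orders disagree:

```python
import numpy as np

# phi(x, y) with binary x (rows) and y (columns)
phi = np.array([[3.0, 0.0],
                [2.0, 2.0]])

max_then_sum = phi.sum(axis=1).max()   # max_x sum_y phi(x, y) = 4
sum_then_max = phi.max(axis=0).sum()   # sum_y max_x phi(x, y) = 5
assert max_then_sum != sum_then_max

# marginal MAP over x (with y summed out) must sum first, then maximize
x_star = int(np.argmax(phi.sum(axis=1)))   # -> x = 1
```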

SLIDE 20-26

Max-product BP

in clique-trees, cluster-graphs and factor-graphs:
• building the chordal graph
• building the clique-tree
• tree-width (complexity of inference)
… all remain the same!

main differences:
• replacing sum with max
• decoding the maximizing assignment
• variational interpretation

example factor-graph: variables x1, …, x5 with factors ψ_{1,2,4}, ψ_{3,5}, and

    p(x) = (1/Z) ∏_I ψ_I(x_I)

variable-to-factor message:

    δ_{i→I}(x_i) ∝ ∏_{J ∣ i∈J, J≠I} δ_{J→i}(x_i)

factor-to-variable message:

    δ_{I→i}(x_i) ∝ max_{x_{I−i}} ψ_I(x_I) ∏_{j ∈ I−i} δ_{j→I}(x_j)

• approx. max-marginals:

    β(x_i) ∝ ∏_{J ∣ i∈J} δ_{J→i}(x_i)

use damping for convergence in loopy graphs
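A minimal sketch of these updates on a loop-free chain factor graph (x1 - ψ12 - x2 - ψ23 - x3, random factor values, purely illustrative); since there are no loops, a single sweep in each direction gives exact max-marginals:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
psi12 = rng.uniform(size=(2, 2))   # psi_{1,2}(x1, x2)
psi23 = rng.uniform(size=(2, 2))   # psi_{2,3}(x2, x3)

# leaf variables send uniform messages, so the first factor-to-variable
# messages into x2 are plain max-marginalizations of the factors
d_f12_to_2 = psi12.max(axis=0)     # max_{x1} psi12(x1, x2)
d_f23_to_2 = psi23.max(axis=1)     # max_{x3} psi23(x2, x3)

# variable-to-factor messages from x2 exclude the recipient factor
d_2_to_f12 = d_f23_to_2
d_2_to_f23 = d_f12_to_2

# remaining factor-to-variable messages
d_f12_to_1 = (psi12 * d_2_to_f12[None, :]).max(axis=1)
d_f23_to_3 = (psi23 * d_2_to_f23[:, None]).max(axis=0)

# beliefs (max-marginals; exact here because the graph has no loops)
beta = {1: d_f12_to_1, 2: d_f12_to_2 * d_f23_to_2, 3: d_f23_to_3}

best = max(product(range(2), repeat=3),
           key=lambda x: psi12[x[0], x[1]] * psi23[x[1], x[2]])
for i in (1, 2, 3):
    assert int(np.argmax(beta[i])) == best[i - 1]
```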

SLIDE 27-29

Decoding exact max-marginals

(clique-trees & factor-graphs without any loops)

    x*_i = arg max_{x_i} β(x_i)

Single MAP assignment: if the MAP assignment x* = arg max_x p(x) is unique, the max-marginals are unambiguous.

Multiple MAP assignments: example

    p(x_1, x_2) = (1/2) I(x_1 = x_2)

    β(x_1 = 0) = β(x_1 = 1)
    β(x_2 = 0) = β(x_2 = 1)

⇒ a joint assignment x* exists that is locally optimal:

    β(x*_i) = max_{x_i} β(x_i)  ∀i
    β(x*_I) = max_{x_I} β(x_I)  ∀I

easy to find (how?)

SLIDE 30-32

Decoding pseudo max-marginals

(cluster-graphs, loopy factor-graphs)

best local assignments may be incompatible. example with binary variables a, b, c:

    β(a, b):  b=0  b=1      β(b, c):  b=0  b=1      β(a, c):  a=0  a=1
       a=0     1    2          c=0     1    2          c=0     1    2
       a=1     2    1          c=1     2    1          c=1     2    1

however, if the singleton beliefs m(a), m(b), m(c) have a unique max, a unique locally optimal belief exists

second example:

    β(a, b):  b=0  b=1      β(b, c):  b=0  b=1      β(a, c):  a=0  a=1
       a=0     3    2          c=0     3    2          c=0     3    2
       a=1     2    3          c=1     2    3          c=1     2    3

β(a), β(b), β(c) do not have a unique max, but a locally optimal assignment (a = b = c = 0) exists

so, it's complicated!

SLIDE 33

Decoding pseudo max-marginals (cont.)

(cluster-graphs, loopy factor-graphs)

given a set of cluster max-marginals {m_I(x_I)}, how to find a locally optimal x̂* (optimal in all m_I), if it exists:

    x̂*_I = arg max_{x_I} m_I(x_I)

• reduce to a Constraint Satisfaction Problem, or
• use decimation: run inference, fix a subset of variables, repeat until all vars are fixed
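A toy decimation pass on the tied beliefs from the second example above. In a real system one would re-run message passing after fixing each variable; here, for brevity, the remaining variables are decoded directly from the conditioned pairwise beliefs:

```python
import numpy as np

# second example from the previous slides: every singleton belief is tied
B = np.array([[3.0, 2.0], [2.0, 3.0]])
beta = {('a', 'b'): B, ('b', 'c'): B, ('a', 'c'): B}

fixed = {}
fixed['a'] = 0   # break the tie by fixing one variable (arbitrary choice)
# condition the pairwise beliefs on the fixed variable and decode b, c
fixed['b'] = int(np.argmax(beta[('a', 'b')][fixed['a'], :]))
fixed['c'] = int(np.argmax(beta[('a', 'c')][fixed['a'], :]))
assert fixed == {'a': 0, 'b': 0, 'c': 0}

# the joint assignment is locally optimal in every pairwise belief
for (u, v), b in beta.items():
    assert b[fixed[u], fixed[v]] == b.max()
```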

SLIDE 34

Optimality of max-product loopy BP

a locally optimal assignment x̂*, i.e.

    m(x̂*_i) = max_{x_i} m(x_i)  ∀i
    m(x̂*_I) = max_{x_I} m(x_I)  ∀I

is a strong local maximum of p(x): no better assignment exists in a large neighborhood of x̂*

• pick any subset of variables T ⊆ {1, …, n}
• build the maximal subgraph G_T s.t. each factor has a variable in T
• if this subgraph does not have more than one loop, then p(x̂*) cannot be improved by changing the vars in T

SLIDE 35-38

Using integer and linear programming

pairwise case: ln p̃(x) = ∑_{i,j ∈ E} ln ϕ_{i,j}(x_i, x_j); we are looking for an assignment x* maximizing this sum

integer-programming formulation:

    arg max_q ∑_{i,j ∈ E} ∑_{x_i,x_j} q_{i,j}(x_i, x_j) ln ϕ_{i,j}(x_i, x_j)

    q_{i,j}(x_i, x_j) ∈ {0, 1}             ∀i,j ∈ E, x_i, x_j
    ∑_{x_i} q_i(x_i) = 1                   ∀i                   (picks a single assignment for the vars in each factor)
    ∑_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j)   ∀i,j ∈ E, x_j        (ensures that assignments to different factors are consistent)

the solution to this NP-hard program is the MAP assignment

SLIDE 39-41

Using integer and linear programming (cont.)

linear programming has a polynomial-time solution; relax the integrality constraint q_{i,j}(x_i, x_j) ∈ {0, 1} to

    q_{i,j}(x_i, x_j) ≥ 0                  ∀i,j ∈ E, x_i, x_j

keeping the objective and the constraints that ensure assignments to different factors are consistent:

    arg max_q ∑_{i,j ∈ E} ∑_{x_i,x_j} q_{i,j}(x_i, x_j) ln ϕ_{i,j}(x_i, x_j)

    ∑_{x_i} q_i(x_i) = 1                   ∀i
    ∑_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j)   ∀i,j ∈ E, x_j

these are the local consistency constraints that we saw earlier: an outer-bound to the marginal polytope of globally consistent {q_{i,j}}
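The classic frustrated-triangle example shows why this outer bound can be loose: a fractional point of the local consistency polytope beats every integral assignment. A sketch (edge potentials hand-picked, ln ϕ = 1 on disagreement, 0 on agreement):

```python
import numpy as np
from itertools import product

# anti-ferromagnetic triangle over binary x0, x1, x2
edges = [(0, 1), (1, 2), (0, 2)]
ln_phi = np.array([[0.0, 1.0],
                   [1.0, 0.0]])   # rewards disagreement

# best integral assignment: a 3-cycle cannot have all three edges disagree
map_val = max(sum(ln_phi[x[i], x[j]] for i, j in edges)
              for x in product((0, 1), repeat=3))
assert map_val == 2.0

# fractional point: each edge puts mass 1/2 on the two disagreeing configs
q_edge = np.array([[0.0, 0.5],
                   [0.5, 0.0]])   # same table on every edge
q_node = np.array([0.5, 0.5])

# local consistency holds: normalized, and edge marginals match node marginals
assert np.isclose(q_edge.sum(), 1.0)
assert np.allclose(q_edge.sum(axis=0), q_node)
assert np.allclose(q_edge.sum(axis=1), q_node)

lp_val = sum((q_edge * ln_phi).sum() for _ in edges)
assert lp_val == 3.0 and lp_val > map_val   # strict upper bound on the MAP value
```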

SLIDE 42-43

Using integer and linear programming (cont.)

Marginal polytope (pairwise case): the convex hull of the sufficient statistics for all assignments to x

    M = conv{ [I[X_i = x_i, X_j = x_j]]_{i,j ∈ E, x_i, x_j} ∣ x }

alternative form: the set of vectors [q_{i,j}(x_i, x_j)]_{i,j ∈ E, x_i, x_j} such that

    ∃ q(x) s.t. ∑_{x ∖ {i,j}} q(x) = q_{i,j}(x_i, x_j)

Local consistency polytope: the set of [q_{i,j}(x_i, x_j)]_{i,j ∈ E, x_i, x_j} satisfying

    ∑_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j)   ∀i,j ∈ E, x_j
    ∑_{x_i} q_i(x_i) = 1                   ∀i
    q_{i,j}(x_i, x_j) ≥ 0                  ∀i,j ∈ E, x_i, x_j

SLIDE 44-46

Using integer and linear programming (cont.)

why is this important?
• LP solutions are at corners of the polytope (why?)
• the LP over L is an upper-bound to the MAP value obtained using M

LP solution found using L vs LP solution found using M:
• the solution over M is integral (by definition) and gives the correct MAP assignment
• but M is difficult to specify

SLIDE 47-49

Recall: variational derivation of BP

    arg max_q ∑_{i,j ∈ E} H(q_{i,j}) − ∑_i (∣Nb_i∣ − 1) H(q_i) + ∑_{i,j ∈ E} ∑_{x_i,x_j} q_{i,j}(x_i, x_j) ln ϕ_{i,j}(x_i, x_j)

subject to locally consistent marginal distributions:

    ∑_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j)   ∀i,j ∈ E, x_j
    ∑_{x_i} q_i(x_i) = 1                   ∀i
    q_{i,j}(x_i, x_j) ≥ 0                  ∀i,j ∈ E, x_i, x_j

the BP update is derived from "fixed-points" of the Lagrangian; BP messages are the (exponential form of the) Lagrange multipliers

SLIDE 50-51

Relationship between LP & BP

pairwise case: take a tempered model

    p(x) ∝ ∏_{i,j ∈ E} ϕ_{i,j}(x_i, x_j)^{1/T}

and replace it in the variational objective above (subject to the same local consistency constraints):

    arg max_q (1/T) ∑_{i,j ∈ E} ∑_{x_i,x_j} q_{i,j}(x_i, x_j) ln ϕ_{i,j}(x_i, x_j) + H(q)
      = arg max_q ∑_{i,j ∈ E} ∑_{x_i,x_j} q_{i,j}(x_i, x_j) ln ϕ_{i,j}(x_i, x_j) + T H(q)

the linear term is the LP objective; together with the entropy term it is the sum-product BP objective

SLIDE 52

Relationship between LP & BP (cont.)

consider the zero-temperature limit lim_{T→0} p(x)^{1/T}:
• sum-product BP for marginalization at the zero-temperature limit is similar to the LP relaxation of MAP inference
• sum-product BP at the zero-temperature limit is similar to max-product BP
• they are equivalent for concave entropy approximations

in practice, max-product BP can be much more efficient than LP: it uses the graph structure
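A small numerical illustration of the zero-temperature limit: raising an unnormalized distribution to the power 1/T and renormalizing concentrates it on the MAP assignment, so its ordinary (sum-product) marginals decode the MAP coordinates. The table below is hand-picked so the maximum is unique:

```python
import numpy as np

# unnormalized distribution over 3 binary variables, MAP at (1, 1, 1)
p_tilde = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]).reshape(2, 2, 2)
x_map = np.unravel_index(np.argmax(p_tilde), p_tilde.shape)

for T in (1.0, 0.1, 0.01):
    pT = p_tilde ** (1.0 / T)
    pT /= pT.sum()
    marg = [pT.sum(axis=tuple(j for j in range(3) if j != i)) for i in range(3)]

# at T = 0.01 the tempered distribution is essentially a point mass on x_map
assert pT[x_map] > 0.99
assert all(int(np.argmax(m)) == x_map[i] for i, m in enumerate(marg))
```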

slide-53
SLIDE 53

reduce MAP inference to min-cut problem use efficient & optimal min-cut solvers setting: binary pairwise MRF

using using graph cuts graph cuts

p(x) ∝ exp(−E(x)) E(x) = ϵ (x ) + ϵ (x , x ) ∑i

i i

∑i,j∈E

i,j i j

image: https://www.geeksforgeeks.org

graph-cut problem: partition the nodes into two sets that include source and target at min cost

slide-54
SLIDE 54

reduce MAP inference to min-cut problem use efficient & optimal min-cut solvers setting: binary pairwise MRF metric interactions

using using graph cuts graph cuts

p(x) ∝ exp(−E(x)) E(x) = ϵ (x ) + ϵ (x , x ) ∑i

i i

∑i,j∈E

i,j i j

reflexivity symmetry triangle inequality ϵ (x , x ) = 0 ⇔ x = x

i,j i j i j

ϵ (x , x ) = ϵ (x , x )

i,j i j j,i j i

ϵ (a, b) + ϵ (b, c) ≥ ϵ (a, c)

i,j i,j i,j

image: https://www.geeksforgeeks.org

graph-cut problem: partition the nodes into two sets that include source and target at min cost

slide-55
SLIDE 55

reduction reduction to graph-cuts to graph-cuts

x1 x2 x3 x4

ϵ (x ) = 2x

2 2 2

ϵ ( x ) = 7 ( 1 − x )

1 1 1

ϵ ( x ) = x

3 3 3

ϵ (x ) = 6x

4 4 4

ϵ (x , x ) = −6I(x = x )

1 , 2 1 2 1 2

ϵ (x , x ) = −6I(x = x )

2,3 2 3 2 3

ϵ (x , x ) = −2I(x = x )

3,4 3 4 3 4

ϵ (x , x ) = −I(x = x )

1,4 1 4 1 4

reduction through an example:

slide-56
SLIDE 56

source node's partition assignment of 0 target node's partition assignment of 1

reduction reduction to graph-cuts to graph-cuts

x1 x2 x3 x4

ϵ (x ) = 2x

2 2 2

ϵ ( x ) = 7 ( 1 − x )

1 1 1

ϵ ( x ) = x

3 3 3

ϵ (x ) = 6x

4 4 4

ϵ (x , x ) = −6I(x = x )

1 , 2 1 2 1 2

ϵ (x , x ) = −6I(x = x )

2,3 2 3 2 3

ϵ (x , x ) = −2I(x = x )

3,4 3 4 3 4

ϵ (x , x ) = −I(x = x )

1,4 1 4 1 4

⇒ ⇒

reduction through an example:

p(x) ∝ exp(−E(x))

E(x) = ϵ (x ) + ϵ (x , x ) ∑i

i i

∑i,j∈E

i,j i j

slide-57
SLIDE 57

source node's partition assignment of 0 target node's partition assignment of 1

reduction reduction to graph-cuts to graph-cuts

x1 x2 x3 x4

ϵ (x ) = 2x

2 2 2

ϵ ( x ) = 7 ( 1 − x )

1 1 1

ϵ ( x ) = x

3 3 3

ϵ (x ) = 6x

4 4 4

ϵ (x , x ) = −6I(x = x )

1 , 2 1 2 1 2

ϵ (x , x ) = −6I(x = x )

2,3 2 3 2 3

ϵ (x , x ) = −2I(x = x )

3,4 3 4 3 4

ϵ (x , x ) = −I(x = x )

1,4 1 4 1 4

⇒ ⇒

any metric MRF is reducible to this form reduction through an example:

p(x) ∝ exp(−E(x))

E(x) = ϵ (x ) + ϵ (x , x ) ∑i

i i

∑i,j∈E

i,j i j

slide-58
SLIDE 58

source node's partition assignment of 0 target node's partition assignment of 1

reduction reduction to graph-cuts to graph-cuts

x1 x2 x3 x4

ϵ (x ) = 2x

2 2 2

ϵ ( x ) = 7 ( 1 − x )

1 1 1

ϵ ( x ) = x

3 3 3

ϵ (x ) = 6x

4 4 4

ϵ (x , x ) = −6I(x = x )

1 , 2 1 2 1 2

ϵ (x , x ) = −6I(x = x )

2,3 2 3 2 3

ϵ (x , x ) = −2I(x = x )

3,4 3 4 3 4

ϵ (x , x ) = −I(x = x )

1,4 1 4 1 4

⇒ ⇒

any metric MRF is reducible to this form reduction through an example: non-optimal extensions to variables with higher cardinality

p(x) ∝ exp(−E(x))

E(x) = ϵ (x ) + ϵ (x , x ) ∑i

i i

∑i,j∈E

i,j i j
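A self-contained sketch of the reduction on this example: build the s-t graph (source side = label 0, target side = label 1), solve max-flow with a minimal Edmonds-Karp routine (not an optimized min-cut solver), and check the min-cut value against brute-force energy minimization. The constant offset comes from rewriting −w·I(x_i = x_j) as −w + w·I(x_i ≠ x_j):

```python
from collections import deque
from itertools import product

# unary energies eps_i as (eps_i(0), eps_i(1)); pairwise eps_ij = -w_ij * I(x_i = x_j)
theta = {1: (7, 0), 2: (0, 2), 3: (0, 1), 4: (0, 6)}
w = {(1, 2): 6, (2, 3): 6, (3, 4): 2, (1, 4): 1}

def energy(x):
    e = sum(theta[i][x[i]] for i in theta)
    e += sum(-wij * (x[i] == x[j]) for (i, j), wij in w.items())
    return e

# brute-force MAP = minimum-energy assignment
best = min((dict(zip(theta, bits)) for bits in product((0, 1), repeat=4)),
           key=energy)

# E(x) = -sum(w) + sum_i theta_i(x_i) + sum_ij w_ij * I(x_i != x_j), so the
# cut value of a labeling (s-side = 0, t-side = 1) equals E(x) + sum(w)
cap = {n: {} for n in ('s', 't', 1, 2, 3, 4)}
def add_edge(u, v, c):
    if c:
        cap[u][v] = cap[u].get(v, 0) + c
for i, (t0, t1) in theta.items():
    add_edge('s', i, t1)   # cut when x_i = 1
    add_edge(i, 't', t0)   # cut when x_i = 0
for (i, j), wij in w.items():
    add_edge(i, j, wij)    # cut (in one direction) when x_i != x_j
    add_edge(j, i, wij)

def max_flow(cap, s, t):   # Edmonds-Karp: BFS augmenting paths
    flow = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= aug
            cap[v][u] = cap[v].get(u, 0) + aug
        flow += aug

min_cut = max_flow(cap, 's', 't')
assert min_cut - sum(w.values()) == energy(best)   # min cut = min energy + 15
assert best == {1: 1, 2: 1, 3: 1, 4: 0}
```

For this instance the minimum energy is −9 at x = (1, 1, 1, 0), and the min cut accordingly has value 6.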

slide-59
SLIDE 59

variable elimination max-product belief propagation IP and LP relaxation graph-cuts dual decomposition branch and bound methods local search

Other methods for MAP inference Other methods for MAP inference

slide-60
SLIDE 60

MAP and marginal MAP are NP-hard distributive law extends to MAP inference

variable elimination clique-tree loopy BP

Summary Summary

an additional challenge of decoding

slide-61
SLIDE 61

MAP and marginal MAP are NP-hard distributive law extends to MAP inference

variable elimination clique-tree loopy BP

variational perspective, connects three approaches: max-product LBP (can find strong local optima!) sum-product LBP (theoretical zero temperature limit) LP relaxations

Summary Summary

an additional challenge of decoding

slide-62
SLIDE 62

MAP and marginal MAP are NP-hard distributive law extends to MAP inference

variable elimination clique-tree loopy BP

variational perspective, connects three approaches: max-product LBP (can find strong local optima!) sum-product LBP (theoretical zero temperature limit) LP relaxations for some family of loopy graphs, exact polynomial-time inference is possible (graph-cuts)

Summary Summary

an additional challenge of decoding