

SLIDE 1

Probabilistic Graphical Models

MAP inference

Siamak Ravanbakhsh, Fall 2019

SLIDE 2

Learning objectives

  • MAP inference and its complexity
  • exact & approximate MAP inference
  • max-product and max-sum message passing
  • relationship to LP relaxation
  • graph-cuts for MAP inference

SLIDE 3

Optimization

$x^* = \arg\max_x f(x)$

SLIDE 4

Optimization

$x^* = \arg\max_x f(x)$
subject to $g_c(x) \geq 0 \;\; \forall c$ and $h_d(x) = 0 \;\; \forall d$

may or may not have constraints; continuous or discrete (combinatorial) ...

SLIDE 5

Optimization

general-purpose approaches:

  • local search heuristics: hill-climbing, beam search, tabu search, ...
  • simulated annealing
  • integer programming
  • genetic algorithms
  • branch and bound: when you can efficiently upper-bound partial assignments

SLIDE 6

Optimization

what if $f(x)$ is structured?

$f(x) = \sum_I f_I(x_I)$

this is MAP inference in a graphical model

SLIDE 7

Definition & complexity

MAP decision problem, for $\arg\max_x p(x)$: given a Bayes-net, deciding whether $p(x) > c$ for some $x$ is NP-complete!

example: side-chain prediction as MAP inference (Yanover & Weiss)

SLIDE 8

Definition & complexity

MAP decision problem, for $\arg\max_x p(x)$: given a Bayes-net, deciding whether $p(x) > c$ for some $x$ is NP-complete!
(NP: a non-deterministic Turing machine that accepts if a single path accepts)

Marginal MAP decision problem, for $\arg\max_x \sum_y p(x, y)$: given a Bayes-net for $p(x, y)$, deciding whether $p(x) > c$ for some $x$ is complete for $\mathrm{NP}^{\mathrm{PP}}$
(PP: a non-deterministic Turing machine that accepts if the majority of paths accept; $\mathrm{NP}^{\mathrm{PP}}$: NP with access to a PP oracle)

marginal MAP is NP-hard even for trees

example: side-chain prediction as MAP inference (Yanover & Weiss)

SLIDE 9

Problem & terminology

MAP inference:
$\arg\max_x p(x) = \arg\max_x \frac{1}{Z} \prod_I \phi_I(x_I) \equiv \arg\max_x \tilde{p}(x) = \arg\max_x \prod_I \phi_I(x_I)$

we can ignore the normalization constant — aka max-product inference

SLIDE 10

Problem & terminology

with evidence:
$\arg\max_x p(x \mid e) = \arg\max_x \frac{p(x, e)}{p(e)} \equiv \arg\max_x p(x, e)$

SLIDE 11

Problem & terminology

log domain:
$\arg\max_x p(x) \equiv \arg\max_x \sum_I \ln \phi_I(x_I) \equiv \arg\min_x -\ln \tilde{p}(x)$

aka max-sum inference, or min-sum inference (energy minimization)

SLIDE 12

Max-marginals

the marginal $\sum_{x \in Val(X)} \phi(x, y)$ used in sum-product inference is replaced with the max-marginal $\max_{x \in Val(X)} \phi(x, y)$

example: $\phi'(a, c) = \max_b \phi(a, b, c)$

SLIDE 13

Distributive law for MAP inference

$\max(ab, ac) = a \max(b, c)$ for $a \geq 0$: 3 operations reduced to 2 operations

the same distributive pattern underlies each flavor of inference:

  • sum-product inference: $ab + ac = a(b + c)$
  • max-product inference: $\max(ab, ac) = a \max(b, c)$
  • max-sum inference: $\max(a + b, a + c) = a + \max(b, c)$
  • min-max inference: $\max(\min(a, b), \min(a, c)) = \min(a, \max(b, c))$

SLIDE 14

Distributive law for MAP inference

save computation by factoring the operations:

$\max_{x, y} f(x, y)\, g(y, z) = \max_y \left( g(y, z) \max_x f(x, y) \right)$

assuming $|Val(X)| = |Val(Y)| = |Val(Z)| = d$, the complexity drops from $O(d^3)$ to $O(d^2)$
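The factored max above can be checked numerically. A small NumPy sketch (the random factor values and $d = 4$ are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # |Val(X)| = |Val(Y)| = |Val(Z)| = d

# two pairwise factors f(x, y) and g(y, z)
f = rng.random((d, d))
g = rng.random((d, d))

# brute force: for each z, max over all (x, y) pairs -- O(d^3) total
brute = np.array([max(f[x, y] * g[y, z]
                      for x in range(d) for y in range(d))
                  for z in range(d)])

# factored: push max_x inside first, then max_y -- O(d^2) total
fy = f.max(axis=0)                        # max_x f(x, y), one value per y
factored = (fy[:, None] * g).max(axis=0)  # max_y fy(y) * g(y, z)

assert np.allclose(brute, factored)
```

The two computations agree for every $z$; only the operation count differs.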

SLIDE 15

Max-product variable elimination

input: a set of factors $\Phi^{t=0} = \{\phi_1, \dots, \phi_K\}$ (e.g., CPDs)
output: $\max_x \tilde{p}(x) = \max_x \prod_I \phi_I(x_I)$

go over the variables in some order $x_{i_1}, \dots, x_{i_n}$; at step $t$:

  • collect all the relevant factors: $\Psi^t = \{\phi \in \Phi^{t-1} \mid x_{i_t} \in \mathrm{Scope}[\phi]\}$
  • calculate their product: $\psi^t = \prod_{\phi \in \Psi^t} \phi$
  • max-marginalize out $x_{i_t}$: $\psi'^t = \max_{x_{i_t}} \psi^t$
  • update the set of factors: $\Phi^t = \Phi^{t-1} - \Psi^t + \{\psi'^t\}$

return the scalar in $\Phi^{t=n}$ as $\max_x \tilde{p}(x)$

the procedure is similar to VE for sum-product inference (eliminate all the variables); the result is analogous to the partition function $Z = \sum_x \tilde{p}(x)$
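The elimination steps above can be sketched in NumPy for a tiny chain (the 3-variable binary chain is an illustrative choice, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# chain MRF over binary x1 - x2 - x3 with pairwise factors
phi12 = rng.random((2, 2))   # phi12[x1, x2]
phi23 = rng.random((2, 2))   # phi23[x2, x3]

# brute force: max of phi12 * phi23 over all 8 assignments
brute = max(phi12[x1, x2] * phi23[x2, x3]
            for x1 in range(2) for x2 in range(2) for x3 in range(2))

# max-product VE, eliminating x1, then x2, then x3
psi1 = phi12.max(axis=0)                    # max_{x1}: a function of x2
psi2 = (psi1[:, None] * phi23).max(axis=0)  # max_{x2}: a function of x3
ve_value = psi2.max()                       # max_{x3}: a scalar

assert np.isclose(brute, ve_value)
```

Each step multiplies the factors containing the current variable and max-marginalizes it out, exactly as in the boxed procedure.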

SLIDE 16

Decoding the max-value

run the same max-product variable elimination, but keep the intermediate products $\psi^{t=1}, \dots, \psi^{t=n}$ produced during inference

we still need to recover the maximizing assignment $x^*$

SLIDE 17

Decoding the max-value

start from the last eliminated variable: $\psi^{t=n}$ must be a function of $x_{i_n}$ alone, so

$x_{i_n}^* = \arg\max_{x_{i_n}} \psi^{n}(x_{i_n})$

SLIDE 18

Decoding the max-value

at this point $\psi^{t=n-1}$ can only have $x_{i_{n-1}}$ and $x_{i_n}$ in its domain:

$x_{i_{n-1}}^* = \arg\max_{x_{i_{n-1}}} \psi^{n-1}(x_{i_{n-1}}, x_{i_n}^*)$

and so on...
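The traceback can be sketched on the same illustrative 3-variable binary chain: keep the intermediate products from the forward pass, then decode from the last eliminated variable backwards.

```python
import numpy as np

rng = np.random.default_rng(2)

# chain MRF over binary x1 - x2 - x3
phi12 = rng.random((2, 2))
phi23 = rng.random((2, 2))

def p_tilde(x1, x2, x3):
    return phi12[x1, x2] * phi23[x2, x3]

# forward pass: eliminate x1 then x2, keeping the intermediate products psi^t
psi1 = phi12                 # factors containing x1, a function of (x1, x2)
m1 = psi1.max(axis=0)        # max_{x1}, a function of x2
psi2 = m1[:, None] * phi23   # factors containing x2, a function of (x2, x3)
m2 = psi2.max(axis=0)        # max_{x2}, a function of x3

# backward pass: decode, starting from the last eliminated variable
x3 = int(np.argmax(m2))            # m2 depends on x3 alone
x2 = int(np.argmax(psi2[:, x3]))   # condition on x3*
x1 = int(np.argmax(psi1[:, x2]))   # condition on x2*

# the decoded assignment attains the global max of p_tilde
brute = max(p_tilde(a, b, c)
            for a in range(2) for b in range(2) for c in range(2))
assert np.isclose(p_tilde(x1, x2, x3), brute)
```

Conditioning each earlier $\psi^t$ on the already-decoded variables is what makes the greedy backward pass globally optimal.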

SLIDE 19

Marginal-MAP variable elimination

$\arg\max_{y_1, \dots, y_m} \sum_{x_1, \dots, x_n} \prod_I \phi_I(x_I)$

the procedure remains similar; however, max and sum do not commute:

$\max_x \sum_y \phi(x, y) \neq \sum_y \max_x \phi(x, y)$

SLIDE 20

Marginal-MAP variable elimination

because max and sum do not commute, we cannot use an arbitrary elimination order

SLIDE 21

Marginal-MAP variable elimination

first, eliminate $\{x_1, \dots, x_n\}$ (sum-product VE)

SLIDE 22

Marginal-MAP variable elimination

then eliminate $\{y_1, \dots, y_m\}$ (max-product VE) and decode the maximizing value

SLIDE 23

Marginal-MAP variable elimination

example: exponential complexity despite low tree-width
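The non-commutativity is easy to see numerically. A NumPy sketch on one random factor (domain sizes 3 and 4 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
phi = rng.random((3, 4))   # phi(x, y) with |Val(X)| = 3, |Val(Y)| = 4

max_sum = phi.sum(axis=1).max()   # max_x sum_y phi(x, y): the marginal-MAP order
sum_max = phi.max(axis=0).sum()   # sum_y max_x phi(x, y): the swapped order

# the two orders generally disagree; swapping max inside sum can only increase
assert max_sum <= sum_max
```

Since for each $y$, $\phi(x^*, y) \leq \max_x \phi(x, y)$, summing over $y$ gives $\max_x \sum_y \phi(x, y) \leq \sum_y \max_x \phi(x, y)$: the operators cannot be reordered freely, which is why the elimination order is constrained.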

SLIDE 24

Max-product BP

in clique-trees, cluster-graphs, and factor-graphs: building the chordal graph, building the clique-tree, the tree-width (complexity of inference) ... all remain the same!

SLIDE 25

Max-product BP

main differences:

  • replacing sum with max
  • decoding the maximizing assignment
  • variational interpretation

SLIDE 26

Max-product BP

example factor-graph: variables $x_1, \dots, x_5$ with factors including $\psi_{\{1,2,4\}}$ and $\psi_{\{3,5\}}$, where $p(x) = \frac{1}{Z} \prod_I \psi_I(x_I)$

SLIDE 27

Max-product BP

variable-to-factor message:
$\delta_{i \to I}(x_i) \propto \prod_{J \mid i \in J,\, J \neq I} \delta_{J \to i}(x_i)$

SLIDE 28

Max-product BP

factor-to-variable message:
$\delta_{I \to i}(x_i) \propto \max_{x_{I - i}} \psi_I(x_I) \prod_{j \in I - i} \delta_{j \to I}(x_j)$

SLIDE 29

Max-product BP

approximate max-marginals:
$\beta(x_i) \propto \prod_{J \mid i \in J} \delta_{J \to i}(x_i)$

SLIDE 30

Max-product BP

use damping for convergence in loopy graphs
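On a tree-structured factor graph one sweep of these messages in each direction suffices, and the beliefs are exact. A NumPy sketch on an illustrative 3-variable binary chain (not the 5-variable example above):

```python
import numpy as np

rng = np.random.default_rng(4)

# tree factor graph: x1 -[phi12]- x2 -[phi23]- x3, all variables binary
phi12 = rng.random((2, 2))   # phi12[x1, x2]
phi23 = rng.random((2, 2))   # phi23[x2, x3]

# factor-to-variable messages (variable-to-factor messages from leaves are uniform)
d_F12_to_2 = phi12.max(axis=0)                      # max_{x1} phi12(x1, x2)
d_F23_to_2 = phi23.max(axis=1)                      # max_{x3} phi23(x2, x3)
d_F23_to_3 = (d_F12_to_2[:, None] * phi23).max(axis=0)
d_F12_to_1 = (phi12 * d_F23_to_2[None, :]).max(axis=1)

# beliefs: product of incoming factor-to-variable messages
beta1, beta2, beta3 = d_F12_to_1, d_F12_to_2 * d_F23_to_2, d_F23_to_3

# on a tree the beliefs equal the exact max-marginals
def max_marginal(axis):
    joint = phi12[:, :, None] * phi23[None, :, :]   # potential over (x1, x2, x3)
    return joint.max(axis=tuple(d for d in range(3) if d != axis))

for beta, axis in [(beta1, 0), (beta2, 1), (beta3, 2)]:
    assert np.allclose(beta, max_marginal(axis))
```

On loopy graphs the same updates are iterated (with damping, as noted above) and the beliefs are only approximate max-marginals.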

SLIDE 31

Decoding exact max-marginals

$x_i^* = \arg\max_{x_i} \beta(x_i)$

Single MAP assignment (clique-trees & factor-graphs without any loops): if the MAP assignment $x^* = \arg\max_x p(x)$ is unique, the max-marginals are unambiguous

SLIDE 32

Decoding exact max-marginals

Multiple MAP assignments — example: $p(x_1, x_2) = \frac{1}{2} \mathbb{I}(x_1 = x_2)$

here $\beta(x_1 = 0) = \beta(x_1 = 1)$ and $\beta(x_2 = 0) = \beta(x_2 = 1)$, so maximizing each $\beta$ independently can give an inconsistent assignment

SLIDE 33

Decoding exact max-marginals

$\Rightarrow$ a joint assignment $x^*$ exists that is locally optimal:

$\beta(x_i^*) = \max_{x_i} \beta(x_i) \;\; \forall i$ and $\beta(x_I^*) = \max_{x_I} \beta(x_I) \;\; \forall I$

easy to find (how?)

SLIDE 34

Decoding pseudo max-marginals (cluster-graphs, loopy factor-graphs)

best local assignments may be incompatible — example over binary $a, b, c$:

β(a, b):          β(b, c):          β(a, c):
       b=0  b=1          b=0  b=1          a=0  a=1
  a=0   1    2      c=0   1    2      c=0   1    2
  a=1   2    1      c=1   2    1      c=1   2    1

SLIDE 35

Decoding pseudo max-marginals

... or compatible — example:

β(a, b):          β(b, c):          β(a, c):
       b=0  b=1          b=0  b=1          a=0  a=1
  a=0   3    2      c=0   3    2      c=0   3    2
  a=1   2    3      c=1   2    3      c=1   2    3

(here every belief prefers agreement, so $a = b = c$ is consistent with all of them)

SLIDE 36

Decoding pseudo max-marginals

if $m(a), m(b), m(c)$ have a unique max, a unique locally optimal assignment exists

SLIDE 37

Decoding pseudo max-marginals

given a set of cluster max-marginals $\{m_I(x_I)\}_I$, how to find a locally optimal $\hat{x}^*$ (optimal in all $m_I$, i.e., $\hat{x}_I = \arg\max_{x_I} m_I(x_I)$) if it exists?

  • reduce to a constraint satisfaction problem
  • use decimation: run inference, fix a subset of variables, repeat until all vars are fixed

SLIDE 38

Optimality of max-product loopy BP

a locally optimal assignment $\hat{x}^*$, i.e.

$m(\hat{x}_i) = \max_{x_i} m(x_i) \;\; \forall i$ and $m(\hat{x}_I) = \max_{x_I} m(x_I) \;\; \forall I$,

is a strong local maximum of $p(x)$

SLIDE 39

Optimality of max-product loopy BP

strong local maximum: no better assignment exists in a large neighborhood of $\hat{x}^*$

SLIDE 40

Optimality of max-product loopy BP

  • pick any subset of variables $T \subseteq \{1, \dots, n\}$
  • build a subgraph $G_T$ with all factors that have a variable in $T$
  • if this subgraph does not have more than one loop, then $p(\hat{x}^*)$ cannot be improved by changing the vars in $T$

(from: Weiss & Freeman)

SLIDE 41

Using integer and linear programming

pairwise case: $\ln \tilde{p}(x) = \sum_{i,j} \ln \phi_{i,j}(x_i, x_j)$

we are looking for an assignment $x^*$ to maximize this sum

SLIDE 42

Using integer and linear programming

integer-programming formulation:

$\arg\max_{\{q\}} \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j)$

$q_{i,j}(x_i, x_j) \in \{0, 1\} \quad \forall i,j \in E,\; x_i, x_j$

this picks a single assignment for the vars in each factor

SLIDE 43

Using integer and linear programming

add constraints to ensure that assignments to different factors are consistent:

$\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j) \quad \forall i,j \in E,\; x_j$
$\sum_{x_i} q_i(x_i) = 1 \quad \forall i$

SLIDE 44

Using integer and linear programming

the solution to this NP-hard integer program is the MAP assignment

SLIDE 45

Using integer and linear programming

linear programming, in contrast, has a polynomial-time solution

SLIDE 46

Using integer and linear programming

relax the integrality constraint $q_{i,j}(x_i, x_j) \in \{0, 1\}$ to

$q_{i,j}(x_i, x_j) \geq 0 \quad \forall i,j \in E,\; x_i, x_j$

while keeping the constraints that ensure assignments to different factors are consistent

SLIDE 47

Using integer and linear programming

these are the local consistency constraints that we saw earlier

  • an outer-bound to the marginal polytope, whose points are the globally consistent $\{q_{i,j}\}$
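The relaxed program is an ordinary LP. A minimal sketch, assuming SciPy is available, for the smallest possible case: one edge between two binary variables (a tree, so the relaxation is tight and the LP solution is integral):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(5)
phi = rng.random((2, 2))  # one pairwise factor phi(x1, x2), binary variables

# decision vector: [q1(0), q1(1), q2(0), q2(1), q12(00), q12(01), q12(10), q12(11)]
c = np.zeros(8)
c[4:] = -np.log(phi).ravel()   # linprog minimizes, so negate the LP objective

A_eq = np.array([
    [0, 0, -1, 0, 1, 0, 1, 0],   # sum_{x1} q12(x1, 0) = q2(0)
    [0, 0, 0, -1, 0, 1, 0, 1],   # sum_{x1} q12(x1, 1) = q2(1)
    [-1, 0, 0, 0, 1, 1, 0, 0],   # sum_{x2} q12(0, x2) = q1(0)
    [0, -1, 0, 0, 0, 0, 1, 1],   # sum_{x2} q12(1, x2) = q1(1)
    [1, 1, 0, 0, 0, 0, 0, 0],    # q1 normalizes
    [0, 0, 1, 1, 0, 0, 0, 0],    # q2 normalizes
])
b_eq = np.array([0, 0, 0, 0, 1, 1])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
q12 = res.x[4:].reshape(2, 2)

# tight relaxation: q12 is a one-hot indicator of the MAP edge assignment
assert np.allclose(np.sort(q12.ravel()), [0, 0, 0, 1], atol=1e-6)
assert np.unravel_index(q12.argmax(), (2, 2)) == np.unravel_index(phi.argmax(), (2, 2))
```

On loopy graphs the same LP over the local consistency constraints may return fractional $q$, which is exactly the gap between the two polytopes discussed next.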

SLIDE 48

Using integer and linear programming

pairwise case

Marginal polytope $\mathbb{M}$: the set of $[q_{i,j}(x_i, x_j)]_{i,j \in E,\, x_i, x_j}$ such that $\exists q(x)$ with $\sum_{x_{-\{i,j\}}} q(x) = q_{i,j}(x_i, x_j)$

alternative form: $\mathrm{conv}\{ [\mathbb{I}[X_i = x_i, X_j = x_j]]_{i,j \in E,\, x_i, x_j} \mid x \in Val(X) \}$
— the convex hull of the sufficient statistics of all assignments to $x$

SLIDE 49

Using integer and linear programming

Local consistency polytope $\mathbb{L}$: the set of $[q_{i,j}(x_i, x_j)]_{i,j \in E,\, x_i, x_j}$ satisfying

$\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j) \quad \forall i,j \in E,\; x_j$
$\sum_{x_i} q_i(x_i) = 1 \quad \forall i$
$q_{i,j}(x_i, x_j) \geq 0 \quad \forall i,j \in E,\; x_i, x_j$

SLIDE 50

Using integer and linear programming

why is this important?

  • LP solutions are at corners of the polytope (why?)
  • the LP over $\mathbb{L}$ is an upper-bound to the MAP value, which is the LP over $\mathbb{M}$

SLIDE 51

Using integer and linear programming

LP solution found using $\mathbb{L}$: may land at a fractional corner, giving only an upper bound

SLIDE 52

Using integer and linear programming

LP solution found using $\mathbb{M}$:

  • is integral (by definition)
  • gives the correct MAP assignment
  • but $\mathbb{M}$ is difficult to specify

SLIDE 53

Recall: variational derivation of BP

$\arg\max_{\{q\}} \sum_{i,j \in E} H(q_{i,j}) - \sum_i (|Nb_i| - 1) H(q_i) + \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j)$

SLIDE 54

Recall: variational derivation of BP

subject to locally consistent marginal distributions:

$\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j) \quad \forall i,j \in E,\; x_j$
$\sum_{x_i} q_i(x_i) = 1 \quad \forall i$
$q_{i,j}(x_i, x_j) \geq 0 \quad \forall i,j \in E,\; x_i, x_j$

SLIDE 55

Recall: variational derivation of BP

the BP update is derived from the "fixed-points" of the Lagrangian; the BP messages are the (exponential form of the) Lagrange multipliers

SLIDE 56

Relationship between LP & BP

pairwise case — the sum-product BP objective, subject to the local consistency constraints above:

$\arg\max_{\{q\}} \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j) + H(q)$

SLIDE 57

Relationship between LP & BP

the first term of the sum-product BP objective is exactly the LP objective; BP adds the entropy $H(q)$

SLIDE 58

Relationship between LP & BP

introduce a temperature $T$: $p(x) \propto \prod_{i,j \in E} \phi_{i,j}(x_i, x_j)^{\frac{1}{T}}$, and replace this in the objective above

SLIDE 59

Relationship between LP & BP

$\arg\max_{\{q\}} \frac{1}{T} \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j) + H(q)$

SLIDE 60

Relationship between LP & BP

$= \arg\max_{\{q\}} \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j) + T H(q)$

SLIDE 61

Relationship between LP & BP

now take the zero-temperature limit $T \to 0$: the entropy term vanishes and only the LP objective remains

SLIDE 62

Relationship between LP & BP

sum-product BP for marginalization at the zero-temperature limit ($T \to 0$ in $p(x) \propto \prod_{i,j \in E} \phi_{i,j}(x_i, x_j)^{\frac{1}{T}}$) is similar to the LP relaxation of MAP inference; they are equivalent for concave entropy approximations

SLIDE 63

Relationship between LP & BP

sum-product BP at the zero-temperature limit is also similar to max-product BP; again, they are equivalent for concave entropy approximations

SLIDE 64

Relationship between LP & BP

in practice, max-product BP can be much more efficient than LP — it uses the graph structure

they are equivalent for concave entropy approximations
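The zero-temperature limit can be seen on a toy joint distribution: raising the potential to $\frac{1}{T}$ concentrates all mass on the MAP assignment, so marginals turn into (indicator-like) max-marginals. A brute-force NumPy sketch with an explicit potential (chosen for a clear gap between the top two assignments):

```python
import numpy as np

# an explicit joint potential over three binary variables, small enough to enumerate
phi = np.arange(1.0, 9.0).reshape(2, 2, 2) / 9.0
map_idx = np.unravel_index(phi.argmax(), phi.shape)

def tempered(T):
    """Joint and x1-marginal of p_T(x) propto phi(x)^(1/T)."""
    pT = phi ** (1.0 / T)
    pT /= pT.sum()
    return pT, pT.sum(axis=(1, 2))

# at T = 1 the marginal is spread out; as T -> 0 it concentrates on the MAP value
_, m_warm = tempered(1.0)
pT, m_cold = tempered(0.01)

assert int(np.argmax(m_cold)) == map_idx[0]      # marginal argmax = MAP component
assert np.isclose(pT.max(), 1.0, atol=1e-3)      # tempered joint is nearly a point mass
assert m_cold[map_idx[0]] > m_warm[map_idx[0]]   # mass concentrates as T drops
```

This is the sense in which sum-product on the tempered model approaches MAP inference as $T \to 0$.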

SLIDE 65

Using graph cuts

reduce MAP inference to a min-cut problem and use efficient & optimal min-cut solvers

graph-cut problem: partition the nodes into two sets, one containing the source and one containing the target, at minimum cost

  • only works for a family of factors
  • arbitrary graph structure (i.e., large tree-width poses no problem)
  • $O(VE)$ algorithms exist

image: https://www.geeksforgeeks.org

SLIDE 66

Using graph cuts

setting: binary pairwise MRF

$p(x) \propto \exp(-E(x))$, with $E(x) = \sum_i \epsilon_i(x_i) + \sum_{i,j \in E} \epsilon_{i,j}(x_i, x_j)$

sub-modular pairwise terms:
$\epsilon_{i,j}(1, 1) + \epsilon_{i,j}(0, 0) \leq \epsilon_{i,j}(1, 0) + \epsilon_{i,j}(0, 1)$

example (variables $x_1, \dots, x_4$):
$\epsilon_1(x_1) = 7(1 - x_1)$, $\epsilon_2(x_2) = 2x_2$, $\epsilon_3(x_3) = x_3$, $\epsilon_4(x_4) = 6x_4$
$\epsilon_{1,2}(x_1, x_2) = -6\,\mathbb{I}(x_1 = x_2)$, $\epsilon_{2,3}(x_2, x_3) = -6\,\mathbb{I}(x_2 = x_3)$, $\epsilon_{3,4}(x_3, x_4) = -2\,\mathbb{I}(x_3 = x_4)$, $\epsilon_{1,4}(x_1, x_4) = -\mathbb{I}(x_1 = x_4)$
SLIDE 67

Reduction to graph-cuts: example

use the binary pairwise MRF above: unary energies $\epsilon_1(x_1) = 7(1 - x_1)$, $\epsilon_2(x_2) = 2x_2$, $\epsilon_3(x_3) = x_3$, $\epsilon_4(x_4) = 6x_4$; pairwise energies $\epsilon_{1,2} = -6\,\mathbb{I}(x_1 = x_2)$, $\epsilon_{2,3} = -6\,\mathbb{I}(x_2 = x_3)$, $\epsilon_{3,4} = -2\,\mathbb{I}(x_3 = x_4)$, $\epsilon_{1,4} = -\mathbb{I}(x_1 = x_4)$

SLIDE 68

Reduction to graph-cuts: example

  • nodes in the source node's partition get an assignment of 0
  • nodes in the target node's partition get an assignment of 1

SLIDE 69

Reduction to graph-cuts: example

  • min. cut ⇔ min. energy

SLIDE 70

Reduction to graph-cuts: example

  • min. cut ⇔ min. energy
  • non-optimal extensions exist for variables with higher cardinality
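The reduction can be sketched end-to-end on the slides' 4-variable example. This is a minimal pure-Python sketch: the graph construction uses the standard decomposition for sub-modular binary pairwise energies, and the max-flow routine is an inline Edmonds-Karp, not a production min-cut solver.

```python
import itertools
from collections import defaultdict, deque

# energies from the slides' example: a binary pairwise MRF over x1..x4
unary = {1: lambda x: 7 * (1 - x), 2: lambda x: 2 * x,
         3: lambda x: x, 4: lambda x: 6 * x}
pair_w = {(1, 2): 6, (2, 3): 6, (3, 4): 2, (1, 4): 1}  # eps_ij = -w * I(xi == xj)

def energy(x):
    e = sum(f(x[i]) for i, f in unary.items())
    return e + sum(-w for (i, j), w in pair_w.items() if x[i] == x[j])

# build the s-t graph; source side reads out as 0, target side as 1
cap = defaultdict(float)
def add_cap(u, v, c):
    if c > 0:
        cap[(u, v)] += c

for i, f in unary.items():
    add_cap('s', i, f(1) - f(0))    # net cost for xi = 1 (if positive)
    add_cap(i, 't', f(0) - f(1))    # net cost for xi = 0 (if positive)

for (i, j), w in pair_w.items():
    # eps_ij = -w*I(xi = xj) is sub-modular; up to the constant -2w it equals
    # w*xi + w*(1 - xj) + 2w*(1 - xi)*xj, and constants do not affect the argmin
    add_cap('s', i, w)       # paid when xi = 1 (i on the target side)
    add_cap(j, 't', w)       # paid when xj = 0 (j on the source side)
    add_cap(i, j, 2 * w)     # paid when xi = 0, xj = 1

def min_cut_source_side(cap, s='s', t='t'):
    """Edmonds-Karp max-flow, then the source side of the min cut."""
    flow = defaultdict(float)
    adj = defaultdict(set)
    for u, v in cap:
        adj[u].add(v); adj[v].add(u)
    res = lambda u, v: cap.get((u, v), 0.0) - flow[(u, v)] + flow[(v, u)]
    while True:
        parent, queue = {s: None}, deque([s])   # BFS for an augmenting path
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and res(u, v) > 1e-9:
                    parent[v] = u; queue.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v)); v = parent[v]
        push = min(res(u, v) for u, v in path)
        for u, v in path:
            flow[(u, v)] += push
    side, queue = {s}, deque([s])   # nodes still reachable in the residual graph
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in side and res(u, v) > 1e-9:
                side.add(v); queue.append(v)
    return side

source_side = min_cut_source_side(cap)
x_cut = {i: 0 if i in source_side else 1 for i in unary}

# the cut assignment attains the brute-force minimum energy
best = min(energy(dict(zip([1, 2, 3, 4], bits)))
           for bits in itertools.product((0, 1), repeat=4))
assert energy(x_cut) == best
```

Because the pairwise energies are sub-modular, every edge capacity is non-negative and the min cut exactly minimizes $E(x)$; with non-sub-modular terms this construction breaks down, which is the "family of factors" restriction from the slides.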

SLIDE 71

Other methods for MAP inference

  • variable elimination
  • max-product belief propagation
  • IP and LP relaxation
  • graph-cuts
  • dual decomposition
  • branch and bound methods
  • local search

SLIDE 72

Summary

  • MAP and marginal MAP are NP-hard
  • the distributive law extends to MAP inference: variable elimination, clique-tree BP, loopy BP — with the additional challenge of decoding

SLIDE 73

Summary

  • the variational perspective connects three approaches: max-product loopy BP (can find strong local optima!), sum-product loopy BP (theoretical zero-temperature limit), and LP relaxations

SLIDE 74

Summary

  • for some families of loopy graphs, exact polynomial-time inference is possible (graph-cuts)