Part 3: Probabilistic Inference in Graphical Models (PowerPoint PPT Presentation)



SLIDE 1

Part 3: Probabilistic Inference in Graphical Models

Sebastian Nowozin and Christoph H. Lampert
Colorado Springs, 25th June 2011

Outline: Belief Propagation, Variational Inference, Sampling, Break

SLIDE 2

Belief Propagation

◮ “Message-passing” algorithm
◮ Exact and optimal for tree-structured graphs
◮ Approximate for cyclic graphs

SLIDE 3

Example: Inference on Chains

Chain factor graph: variables Yi, Yj, Yk, Yl linked by pairwise factors F, G, H.

Z = Σ_{y∈Y} exp(−E(y))
  = Σ_{yi∈Yi} Σ_{yj∈Yj} Σ_{yk∈Yk} Σ_{yl∈Yl} exp(−E(yi, yj, yk, yl))
  = Σ_{yi∈Yi} Σ_{yj∈Yj} Σ_{yk∈Yk} Σ_{yl∈Yl} exp(−(EF(yi, yj) + EG(yj, yk) + EH(yk, yl)))


SLIDE 6

Example: Inference on Chains

Z = Σ_{yi∈Yi} Σ_{yj∈Yj} Σ_{yk∈Yk} Σ_{yl∈Yl} exp(−(EF(yi, yj) + EG(yj, yk) + EH(yk, yl)))
  = Σ_{yi∈Yi} Σ_{yj∈Yj} Σ_{yk∈Yk} Σ_{yl∈Yl} exp(−EF(yi, yj)) exp(−EG(yj, yk)) exp(−EH(yk, yl))
  = Σ_{yi∈Yi} Σ_{yj∈Yj} exp(−EF(yi, yj)) Σ_{yk∈Yk} exp(−EG(yj, yk)) Σ_{yl∈Yl} exp(−EH(yk, yl))
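The distribution of sums over factors can be checked numerically. A minimal sketch (the state count K and the random energy tables EF, EG, EH are made up for illustration):

```python
import itertools
import math
import random

random.seed(0)
K = 3  # states per variable (toy choice)

# Random pairwise energy tables EF(yi, yj), EG(yj, yk), EH(yk, yl)
EF = [[random.random() for _ in range(K)] for _ in range(K)]
EG = [[random.random() for _ in range(K)] for _ in range(K)]
EH = [[random.random() for _ in range(K)] for _ in range(K)]

# Naive evaluation: sum over all K^4 joint states
Z_naive = sum(
    math.exp(-(EF[yi][yj] + EG[yj][yk] + EH[yk][yl]))
    for yi, yj, yk, yl in itertools.product(range(K), repeat=4)
)

# Distributed evaluation: push each sum inward, right to left
r_H = [sum(math.exp(-EH[yk][yl]) for yl in range(K)) for yk in range(K)]
r_G = [sum(math.exp(-EG[yj][yk]) * r_H[yk] for yk in range(K)) for yj in range(K)]
Z_chain = sum(math.exp(-EF[yi][yj]) * r_G[yj]
              for yi in range(K) for yj in range(K))

assert abs(Z_naive - Z_chain) < 1e-9
```

The inner sums r_H and r_G are exactly the chain messages rH→Yk and rG→Yj; the distributed evaluation costs O(K²) per factor instead of O(K⁴) overall.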


SLIDE 9

Example: Inference on Chains

Define the message rH→Yk ∈ R^{Yk} as the innermost sum, rH→Yk(yk) = Σ_{yl∈Yl} exp(−EH(yk, yl)). Then

Z = Σ_{yi∈Yi} Σ_{yj∈Yj} exp(−EF(yi, yj)) Σ_{yk∈Yk} exp(−EG(yj, yk)) Σ_{yl∈Yl} exp(−EH(yk, yl))
  = Σ_{yi∈Yi} Σ_{yj∈Yj} exp(−EF(yi, yj)) Σ_{yk∈Yk} exp(−EG(yj, yk)) rH→Yk(yk)


SLIDE 11

Example: Inference on Chains

Define rG→Yj ∈ R^{Yj} as rG→Yj(yj) = Σ_{yk∈Yk} exp(−EG(yj, yk)) rH→Yk(yk). Then

Z = Σ_{yi∈Yi} Σ_{yj∈Yj} exp(−EF(yi, yj)) Σ_{yk∈Yk} exp(−EG(yj, yk)) rH→Yk(yk)
  = Σ_{yi∈Yi} Σ_{yj∈Yj} exp(−EF(yi, yj)) rG→Yj(yj)


SLIDE 13

Example: Inference on Trees

Tree factor graph: variables Yi, Yj, Yk, Yl, Ym; factors F, G, H, I.

Z = Σ_{y∈Y} exp(−E(y))
  = Σ_{yi∈Yi} Σ_{yj∈Yj} Σ_{yk∈Yk} Σ_{yl∈Yl} Σ_{ym∈Ym} exp(−(EF(yi, yj) + · · · + EI(yk, ym)))

SLIDE 14

Example: Inference on Trees

Z = Σ_{yi∈Yi} Σ_{yj∈Yj} exp(−EF(yi, yj)) Σ_{yk∈Yk} exp(−EG(yj, yk))
      · [Σ_{yl∈Yl} exp(−EH(yk, yl))]   (= rH→Yk(yk))
      · [Σ_{ym∈Ym} exp(−EI(yk, ym))]   (= rI→Yk(yk))

SLIDE 15

Example: Inference on Trees

Z = Σ_{yi∈Yi} Σ_{yj∈Yj} exp(−EF(yi, yj)) Σ_{yk∈Yk} exp(−EG(yj, yk)) · (rH→Yk(yk) · rI→Yk(yk))

SLIDE 16

Example: Inference on Trees

With qYk→G(yk) = rH→Yk(yk) · rI→Yk(yk),

Z = Σ_{yi∈Yi} Σ_{yj∈Yj} exp(−EF(yi, yj)) Σ_{yk∈Yk} exp(−EG(yj, yk)) · (rH→Yk(yk) · rI→Yk(yk))

SLIDE 17

Example: Inference on Trees

Z = Σ_{yi∈Yi} Σ_{yj∈Yj} exp(−EF(yi, yj)) Σ_{yk∈Yk} exp(−EG(yj, yk)) qYk→G(yk)

SLIDE 18

Factor Graph Sum-Product Algorithm

◮ “Message”: pair of vectors at each factor graph edge (i, F) ∈ E
  1. rF→Yi ∈ R^{Yi}: factor-to-variable message
  2. qYi→F ∈ R^{Yi}: variable-to-factor message
◮ The algorithm iteratively updates the messages
◮ After convergence, Z and the marginals μF can be obtained from the messages


SLIDE 20

Sum-Product: Variable-to-Factor Message

◮ Set of factors adjacent to variable i: M(i) = {F ∈ F : (i, F) ∈ E}
◮ Variable-to-factor message:

  qYi→F(yi) = Π_{F′∈M(i)\{F}} rF′→Yi(yi)

SLIDE 21

Sum-Product: Factor-to-Variable Message

◮ Factor-to-variable message:

  rF→Yi(yi) = Σ_{y′F∈YF : y′i=yi} [ exp(−EF(y′F)) Π_{j∈N(F)\{i}} qYj→F(y′j) ]

SLIDE 22

Message Scheduling

qYi→F(yi) = Π_{F′∈M(i)\{F}} rF′→Yi(yi)
rF→Yi(yi) = Σ_{y′F∈YF : y′i=yi} [ exp(−EF(y′F)) Π_{j∈N(F)\{i}} qYj→F(y′j) ]

◮ Problem: message updates depend on each other
◮ No dependencies if the product is empty (= 1)
◮ For tree-structured graphs we can resolve all dependencies


SLIDE 25

Message Scheduling: Trees

(Figure: a tree factor graph over Yi, Yj, Yk, Yl, Ym with factors A to F; messages numbered 1 to 10 in leaf-to-root order, then in reverse root-to-leaf order.)

1. Select one variable node as tree root
2. Compute leaf-to-root messages
3. Compute root-to-leaf messages

SLIDE 26

Inference Results: Z and Marginals

◮ Partition function, evaluated at the root r:

  Z = Σ_{yr∈Yr} Π_{F∈M(r)} rF→Yr(yr)

◮ Marginal distributions, for each factor:

  μF(yF) = p(YF = yF) = (1/Z) exp(−EF(yF)) Π_{i∈N(F)} qYi→F(yi)
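These two formulas can be exercised on a small chain Yi–F–Yj–G–Yk (a chain is a tree; take Yk as root). A sketch with made-up energies, validated against brute-force enumeration:

```python
import itertools
import math
import random

random.seed(1)
K = 3  # states per variable (toy choice)
EF = [[random.random() for _ in range(K)] for _ in range(K)]  # EF(yi, yj)
EG = [[random.random() for _ in range(K)] for _ in range(K)]  # EG(yj, yk)

# Leaf-to-root pass (root = Yk); leaf-variable messages q are all-ones
r_F_to_Yj = [sum(math.exp(-EF[yi][yj]) for yi in range(K)) for yj in range(K)]
q_Yj_to_G = r_F_to_Yj  # Yj's only other neighbouring factor is F
r_G_to_Yk = [sum(math.exp(-EG[yj][yk]) * q_Yj_to_G[yj] for yj in range(K))
             for yk in range(K)]

# Partition function at the root: Z = sum_yk prod_{F in M(root)} r_{F->Yk}(yk)
Z = sum(r_G_to_Yk)

# Root-to-leaf pass, needed for the marginal of factor F
r_G_to_Yj = [sum(math.exp(-EG[yj][yk]) for yk in range(K)) for yj in range(K)]
q_Yj_to_F = r_G_to_Yj

# Factor marginal mu_F(yi, yj) = (1/Z) exp(-EF) q_{Yi->F}(yi) q_{Yj->F}(yj)
mu_F = [[math.exp(-EF[yi][yj]) * q_Yj_to_F[yj] / Z for yj in range(K)]
        for yi in range(K)]

# Brute-force checks against the explicit joint distribution
Z_bf = sum(math.exp(-(EF[yi][yj] + EG[yj][yk]))
           for yi, yj, yk in itertools.product(range(K), repeat=3))
assert abs(Z - Z_bf) < 1e-9
for yi, yj in itertools.product(range(K), repeat=2):
    p = sum(math.exp(-(EF[yi][yj] + EG[yj][yk])) for yk in range(K)) / Z_bf
    assert abs(mu_F[yi][yj] - p) < 1e-9
```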

SLIDE 27

Max-Product/Max-Sum Algorithm

◮ Belief Propagation for MAP inference:

  y∗ = argmax_{y∈Y} p(Y = y | x, w)

◮ Exact for trees
◮ For cyclic graphs: not as well understood as the sum-product algorithm

Messages:

  qYi→F(yi) = Σ_{F′∈M(i)\{F}} rF′→Yi(yi)
  rF→Yi(yi) = max_{y′F∈YF : y′i=yi} [ −EF(y′F) + Σ_{j∈N(F)\{i}} qYj→F(y′j) ]

SLIDE 28

Sum-Product/Max-Sum Comparison

Sum-Product:
  qYi→F(yi) = Π_{F′∈M(i)\{F}} rF′→Yi(yi)
  rF→Yi(yi) = Σ_{y′F∈YF : y′i=yi} [ exp(−EF(y′F)) Π_{j∈N(F)\{i}} qYj→F(y′j) ]

Max-Sum:
  qYi→F(yi) = Σ_{F′∈M(i)\{F}} rF′→Yi(yi)
  rF→Yi(yi) = max_{y′F∈YF : y′i=yi} [ −EF(y′F) + Σ_{j∈N(F)\{i}} qYj→F(y′j) ]
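On a chain Yi–F–Yj–G–Yk the max-sum recursion reduces to a Viterbi-style pass. A sketch with made-up energies, checked against brute-force MAP search:

```python
import itertools
import random

random.seed(2)
K = 4  # states per variable (toy choice)
EF = [[random.random() for _ in range(K)] for _ in range(K)]  # EF(yi, yj)
EG = [[random.random() for _ in range(K)] for _ in range(K)]  # EG(yj, yk)

# Max-sum messages toward root Yk (leaf messages are 0 in the log domain)
r_F_to_Yj = [max(-EF[yi][yj] for yi in range(K)) for yj in range(K)]
r_G_to_Yk = [max(-EG[yj][yk] + r_F_to_Yj[yj] for yj in range(K))
             for yk in range(K)]

# Optimal value max_y -E(y), read off at the root
best_bp = max(r_G_to_Yk)

# Brute-force MAP value over all K^3 joint states
best_bf = max(-(EF[yi][yj] + EG[yj][yk])
              for yi, yj, yk in itertools.product(range(K), repeat=3))
assert abs(best_bp - best_bf) < 1e-12
```

Recovering the argmax itself additionally requires back-pointers recorded during the max operations.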

SLIDE 29

Example: Pictorial Structures

(Figure: tree-structured factor graph over body-part variables Ytop, Yhead, Ytorso, Yrarm, Yrhnd, Yrleg, Yrfoot, Ylfoot, Ylleg, Ylarm, Ylhnd, connected to the image X; unary factors such as F(1)_top and pairwise factors such as F(2)_{top,head}.)

◮ Tree-structured model for articulated pose (Felzenszwalb and Huttenlocher, 2000), (Fischler and Elschlager, 1973)
◮ Body-part variables; states: discretized tuple (x, y, s, θ)
◮ (x, y) position, s scale, and θ rotation


SLIDE 31

Example: Pictorial Structures (cont)

  rF→Yi(yi) = max_{(y′i, y′j)∈Yi×Yj : y′i=yi} [ −EF(y′i, y′j) + Σ_{j∈N(F)\{i}} qYj→F(y′j) ]    (1)

◮ Because Yi is large (≈ 500k states), Yi × Yj is too big to enumerate
◮ (Felzenszwalb and Huttenlocher, 2000) use a special form for EF(yi, yj) so that (1) is computable in O(|Yi|)

SLIDE 32

Loopy Belief Propagation

◮ Key difference: no schedule that removes dependencies
◮ But: message computation is still well defined
◮ Therefore, classic loopy belief propagation (Pearl, 1988):
  1. Initialize message vectors to 1 (sum-product) or 0 (max-sum)
  2. Update messages, hoping for convergence
  3. Upon convergence, treat beliefs μF as approximate marginals
◮ Different messaging schedules (synchronous/asynchronous, static/dynamic)
◮ Improvements: generalized BP (Yedidia et al., 2001), convergent BP (Heskes, 2006)


SLIDE 34

Synchronous Iteration

(Figure: a grid-structured factor graph with variables Yi, ..., Yq and factors A, ..., L, shown before and after one synchronous update of all messages.)

SLIDE 35

Mean Field Methods

◮ Mean field methods (Jordan et al., 1999), (Xing et al., 2003)
◮ Distribution p(y|x, w), inference intractable
◮ Approximate distribution q(y)
◮ Tractable family Q

SLIDE 36

Mean Field Methods (cont)

  q∗ = argmin_{q∈Q} DKL(q(y) ‖ p(y|x, w))

DKL(q(y) ‖ p(y|x, w))
  = Σ_{y∈Y} q(y) log [q(y) / p(y|x, w)]
  = Σ_{y∈Y} q(y) log q(y) − Σ_{y∈Y} q(y) log p(y|x, w)
  = −H(q) + Σ_{F∈F} Σ_{yF∈YF} μ_{F,yF}(q) EF(yF; xF, w) + log Z(x, w),

where H(q) is the entropy of q and μ_{F,yF} are the marginals of q. (The form of μ depends on Q.)
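The decomposition can be verified numerically on a toy model with a single pairwise factor (the energy table and the trial distribution q below are made up); with one factor covering all variables, μ_{F,yF}(q) is just q itself:

```python
import itertools
import math

K = 2
# One pairwise factor: p(y1, y2) = exp(-E(y1, y2)) / Z  (toy energies)
E = [[0.3, 1.2], [0.7, 0.1]]
Z = sum(math.exp(-E[a][b]) for a, b in itertools.product(range(K), repeat=2))
p = {(a, b): math.exp(-E[a][b]) / Z
     for a, b in itertools.product(range(K), repeat=2)}

# A fully factorized trial distribution q(y) = q1(y1) q2(y2)
q1, q2 = [0.6, 0.4], [0.3, 0.7]
q = {(a, b): q1[a] * q2[b] for a, b in itertools.product(range(K), repeat=2)}

# Direct KL divergence
kl_direct = sum(q[y] * math.log(q[y] / p[y]) for y in q)

# Decomposition: -H(q) + sum_yF mu_{F,yF}(q) E_F(yF) + log Z
H = -sum(q[y] * math.log(q[y]) for y in q)
energy = sum(q[(a, b)] * E[a][b]
             for a, b in itertools.product(range(K), repeat=2))
kl_decomposed = -H + energy + math.log(Z)

assert abs(kl_direct - kl_decomposed) < 1e-12
assert kl_direct >= 0.0  # Gibbs inequality
```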


SLIDE 38

Mean Field Methods (cont)

  q∗ = argmin_{q∈Q} DKL(q(y) ‖ p(y|x, w))

◮ When Q is rich: q∗ is close to p
◮ Marginals of q∗ approximate marginals of p
◮ Gibbs inequality: DKL(q(y) ‖ p(y|x, w)) ≥ 0
◮ Therefore, we have a lower bound:

  log Z(x, w) ≥ H(q) − Σ_{F∈F} Σ_{yF∈YF} μ_{F,yF}(q) EF(yF; xF, w).


SLIDE 40

Naive Mean Field

◮ Set Q: all distributions of the form

  q(y) = Π_{i∈V} qi(yi).

SLIDE 41

Naive Mean Field

◮ Set Q: all distributions of the form

  q(y) = Π_{i∈V} qi(yi).

◮ Marginals μ_{F,yF} take the form

  μ_{F,yF}(q) = Π_{i∈N(F)} qi(yi).

◮ Entropy H(q) decomposes:

  H(q) = Σ_{i∈V} Hi(qi) = −Σ_{i∈V} Σ_{yi∈Yi} qi(yi) log qi(yi).

SLIDE 42

Naive Mean Field (cont)

Putting it together,

argmin_{q∈Q} DKL(q(y) ‖ p(y|x, w))
  = argmax_{q∈Q} H(q) − Σ_{F∈F} Σ_{yF∈YF} μ_{F,yF}(q) EF(yF; xF, w) − log Z(x, w)
  = argmax_{q∈Q} H(q) − Σ_{F∈F} Σ_{yF∈YF} μ_{F,yF}(q) EF(yF; xF, w)
  = argmax_{q∈Q} −Σ_{i∈V} Σ_{yi∈Yi} qi(yi) log qi(yi) − Σ_{F∈F} Σ_{yF∈YF} (Π_{i∈N(F)} qi(yi)) EF(yF; xF, w).

Optimizing over Q is optimizing over qi ∈ Δi, the probability simplices.

SLIDE 43

Naive Mean Field (cont)

argmax_{q∈Q} −Σ_{i∈V} Σ_{yi∈Yi} qi(yi) log qi(yi) − Σ_{F∈F} Σ_{yF∈YF} (Π_{i∈N(F)} qi(yi)) EF(yF; xF, w).

◮ Non-concave maximization problem → hard. (For general EF and pairwise or higher-order factors.)
◮ Block coordinate ascent: closed-form update for each qi

SLIDE 44

Naive Mean Field (cont)

Closed-form update for qi:

  q∗i(yi) = exp( 1 − Σ_{F∈F : i∈N(F)} Σ_{yF∈YF : [yF]i=yi} (Π_{j∈N(F)\{i}} q̂j(yj)) EF(yF; xF, w) + λ )

with the normalizing constant

  λ = −log Σ_{yi∈Yi} exp( 1 − Σ_{F∈F : i∈N(F)} Σ_{yF∈YF : [yF]i=yi} (Π_{j∈N(F)\{i}} q̂j(yj)) EF(yF; xF, w) )

◮ Looks scary, but is very easy to implement
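A sketch of the coordinate-ascent updates on a made-up three-variable chain: each qi becomes the normalized exponential of the negative expected energy of its adjacent factors, and each sweep can only increase the lower bound on log Z:

```python
import itertools
import math
import random

random.seed(3)
K = 3
EF = [[random.random() for _ in range(K)] for _ in range(K)]  # EF(y1, y2)
EG = [[random.random() for _ in range(K)] for _ in range(K)]  # EG(y2, y3)

# Exact log partition function (tractable here; used only to check the bound)
logZ = math.log(sum(math.exp(-(EF[a][b] + EG[b][c]))
                    for a, b, c in itertools.product(range(K), repeat=3)))

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def bound(q1, q2, q3):
    # H(q) - sum_F sum_yF mu_{F,yF}(q) EF(yF): the mean field lower bound
    H = -sum(x * math.log(x) for q in (q1, q2, q3) for x in q)
    e = sum(q1[a] * q2[b] * EF[a][b] for a in range(K) for b in range(K))
    e += sum(q2[b] * q3[c] * EG[b][c] for b in range(K) for c in range(K))
    return H - e

q1, q2, q3 = ([1.0 / K] * K for _ in range(3))  # uniform start
prev = bound(q1, q2, q3)
for sweep in range(20):
    # Closed-form block updates: qi(yi) ∝ exp(-E_q[adjacent factor energies])
    q1 = normalize([math.exp(-sum(q2[b] * EF[a][b] for b in range(K)))
                    for a in range(K)])
    q2 = normalize([math.exp(-sum(q1[a] * EF[a][b] for a in range(K))
                             - sum(q3[c] * EG[b][c] for c in range(K)))
                    for b in range(K)])
    q3 = normalize([math.exp(-sum(q2[b] * EG[b][c] for b in range(K)))
                    for c in range(K)])
    cur = bound(q1, q2, q3)
    assert cur >= prev - 1e-12  # coordinate ascent never decreases the bound
    prev = cur

assert prev <= logZ + 1e-12  # Gibbs inequality: bound never exceeds log Z
```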

SLIDE 45

Structured Mean Field

◮ The naive mean field approximation can be poor
◮ Structured mean field (Saul and Jordan, 1995) uses factorial approximations with larger tractable subgraphs
◮ Block coordinate ascent: optimize an entire subgraph using exact probabilistic inference on trees

SLIDE 46

Sampling

Probabilistic inference and related tasks require the computation of E_{y∼p(y|x,w)}[h(x, y)], where h : X × Y → R is an arbitrary but well-behaved function.

◮ Inference: hF,zF(x, y) = I(yF = zF), so E_{y∼p(y|x,w)}[hF,zF(x, y)] = p(yF = zF | x, w), the marginal probability of factor F taking state zF.
◮ Parameter estimation: feature map h(x, y) = φ(x, y), φ : X × Y → R^d; E_{y∼p(y|x,w)}[φ(x, y)] gives the “expected sufficient statistics under the model distribution”.


SLIDE 49

Monte Carlo

◮ Sample approximation from y(1), y(2), ...:

  E_{y∼p(y|x,w)}[h(x, y)] ≈ (1/S) Σ_{s=1}^{S} h(x, y(s)).

◮ When the expectation exists, the law of large numbers guarantees convergence for S → ∞
◮ For S independent samples, the approximation error is O(1/√S), independent of the dimension d

Problem

◮ Producing exact samples y(s) from p(y|x) is hard
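A minimal illustration with a distribution we can sample from exactly (uniform on {0, ..., 9}, a made-up stand-in for p(y|x, w)):

```python
import random

random.seed(4)

# Toy target: y uniform on {0,...,9}; h(y) = y has exact expectation 4.5
true_mean = 4.5

for S, tol in ((100, 1.5), (10000, 0.2)):
    est = sum(random.randrange(10) for _ in range(S)) / S
    # The tolerance is many standard errors (~2.87/sqrt(S)) wide, so the
    # check passes for essentially any seed
    assert abs(est - true_mean) < tol
```

More samples shrink the expected error at the O(1/√S) rate, regardless of the dimension of y.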


SLIDE 51

Markov Chain Monte Carlo (MCMC)

◮ Markov chain with p(y|x) as stationary distribution
◮ Here: Y = {A, B, C}
◮ Here: p(y) = (0.1905, 0.3571, 0.4524)

(Figure: three-state chain over A, B, C with transition probabilities 0.25, 0.5, 0.4, 0.1, 0.5, 0.5, 0.5, 0.25 on its edges.)

SLIDE 52

Markov Chains

Definition (Finite Markov chain)
Given a finite set Y and a matrix P ∈ R^{Y×Y}, a random process (Z1, Z2, ...) with Zt taking values from Y is a Markov chain with transition matrix P if

  p(Z_{t+1} = y(j) | Z1 = y(1), Z2 = y(2), ..., Zt = y(t)) = p(Z_{t+1} = y(j) | Zt = y(t)) = P_{y(t),y(j)}
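Given any transition matrix P, the stationary distribution can be found by power iteration. The matrix below is made up (the slide's exact edge probabilities are not fully recoverable from the figure):

```python
# Hypothetical 3-state transition matrix; rows must sum to 1
P = [
    [0.25, 0.50, 0.25],  # from state A
    [0.10, 0.40, 0.50],  # from state B
    [0.25, 0.25, 0.50],  # from state C
]
for row in P:
    assert abs(sum(row) - 1.0) < 1e-12

# Power iteration: pi <- pi P converges for an irreducible, aperiodic chain
pi = [1.0, 0.0, 0.0]
for _ in range(200):
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

# Stationarity check: pi P = pi
pi_next = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
assert all(abs(a - b) < 1e-10 for a, b in zip(pi, pi_next))
```

MCMC runs this logic in reverse: instead of computing the stationary distribution of a given chain, one constructs a chain whose stationary distribution is the target p(y|x, w).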

SLIDE 53

MCMC Simulation (1)

Steps
1. Construct a Markov chain with stationary distribution p(y|x, w)
2. Start at y(0)
3. Perform a random walk according to the Markov chain
4. After a sufficient number S of steps, stop and treat y(S) as a sample from p(y|x, w)

◮ Justified by the ergodic theorem
◮ In practice: discard a fixed number of initial samples (“burn-in phase”) to forget the starting point
◮ In practice: afterwards, use between 100 and 100,000 samples


SLIDE 55

MCMC Simulation (cont)

Steps
1. Construct a Markov chain with stationary distribution p(y|x, w)
2. Start at y(0)
3. Perform a random walk according to the Markov chain
4. After each step, output y(i) as a sample from p(y|x, w)

◮ Justified by the ergodic theorem
◮ In practice: discard a fixed number of initial samples (“burn-in phase”) to forget the starting point
◮ In practice: afterwards, use between 100 and 100,000 samples


SLIDE 57

Gibbs sampler

How to construct a suitable Markov chain?

◮ Metropolis-Hastings chain: almost always possible
◮ Special case: Gibbs sampler (Geman and Geman, 1984)
  1. Select a variable yi
  2. Sample yi ∼ p(yi | y_{V\{i}}, x)


SLIDE 59

Gibbs Sampler

  p(yi | y(t)_{V\{i}}, x, w) = p(yi, y(t)_{V\{i}} | x, w) / Σ_{yi∈Yi} p(yi, y(t)_{V\{i}} | x, w)
                            = p̃(yi, y(t)_{V\{i}} | x, w) / Σ_{yi∈Yi} p̃(yi, y(t)_{V\{i}} | x, w)

where p̃ denotes the unnormalized distribution.

SLIDE 60

Gibbs Sampler

  p(yi | y(t)_{V\{i}}, x, w) = Π_{F∈M(i)} exp(−EF(yi, y(t)_{F\{i}}, xF, w)) / Σ_{yi∈Yi} Π_{F∈M(i)} exp(−EF(yi, y(t)_{F\{i}}, xF, w))
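A sketch checking this identity on a made-up chain y1–F–y2–G–y3 with an extra unary factor EH on y1: the conditional of y2 computed from its adjacent factors M(2) = {F, G} alone matches the conditional computed from the full joint, because factors not involving y2 cancel in the ratio:

```python
import math
import random

random.seed(5)
K = 3
EH = [random.random() for _ in range(K)]                      # unary on y1
EF = [[random.random() for _ in range(K)] for _ in range(K)]  # EF(y1, y2)
EG = [[random.random() for _ in range(K)] for _ in range(K)]  # EG(y2, y3)

def joint_unnorm(y1, y2, y3):
    return math.exp(-(EH[y1] + EF[y1][y2] + EG[y2][y3]))

y = (1, 0, 2)  # current state y(t)

# Local computation: only the factors adjacent to y2, i.e. M(2) = {F, G}
w = [math.exp(-(EF[y[0]][b] + EG[b][y[2]])) for b in range(K)]
cond_local = [x / sum(w) for x in w]

# Exact conditional from the full unnormalized joint
w_full = [joint_unnorm(y[0], b, y[2]) for b in range(K)]
cond_exact = [x / sum(w_full) for x in w_full]

assert all(abs(a - b) < 1e-12 for a, b in zip(cond_local, cond_exact))
```

This is why Gibbs updates are cheap: resampling one variable only touches its Markov blanket, never the whole model.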

SLIDE 61

Example: Gibbs sampler

(Figure.)

SLIDE 62

Example: Gibbs sampler

(Figure: cumulative mean of p(foreground) over 3000 Gibbs sweeps.)

SLIDE 63

Example: Gibbs sampler

(Figure: Gibbs sample means and the average of 50 repetitions; estimated probability vs. Gibbs sweep.)

◮ p(yi = “foreground”) ≈ 0.770 ± 0.011
slide-64
SLIDE 64

Belief Propagation Variational Inference Sampling Break Sampling

Example: Gibbs sampler

500 1000 1500 2000 2500 3000 0.05 0.1 0.15 0.2 0.25

Estimated one unit standard deviation

Standard deviation Gibbs sweep

◮ Standard deviation O(1/

√ S)

Sebastian Nowozin and Christoph H. Lampert Part 3: Probabilistic Inference in Graphical Models

SLIDE 65

Gibbs Sampler Transitions

(Figure.)


SLIDE 67

Block Gibbs Sampler

Extension to larger groups of variables:
1. Select a block yI
2. Sample yI ∼ p(yI | y_{V\I}, x)

→ Tractable if sampling from the blocks is tractable.

SLIDE 68

Summary: Sampling

Two families of Monte Carlo methods
1. Markov Chain Monte Carlo (MCMC)
2. Importance Sampling

Properties
◮ Often simple to implement, general, parallelizable
◮ (Cannot compute the partition function Z)
◮ Can fail without any warning signs

References
◮ (Häggström, 2000), introduction to Markov chains
◮ (Liu, 2001), excellent Monte Carlo introduction

SLIDE 69

Coffee Break

Continuing at 10:30am.
Slides available at http://www.nowozin.net/sebastian/cvpr2011tutorial/