Mean field approximation methods and information geometry



SLIDE 1

Mean field approximation methods and information geometry

Shiro Ikeda

ISM, Tokyo, Japan

31 August 2009

Ikeda (ISM) MF approx and Info Geom 31/Aug/2009 1 / 38

SLIDE 2

Outline

1. Belief Propagation
2. Information Geometrical View
3. Survey Propagation (SP)
4. Conclusion

SLIDE 3

Belief Propagation: Graphical model and inference

Graphical Model

Example

[Figure: graphical model with cliques a, b, c, d and variables x_i, x_j, x_k, x_l]

Stochastic variable: $x = (x_i, x_j, x_k, x_l)^T$

Clique: $r \in L = \{a, b, c, d\}$

SLIDE 4

Belief Propagation: Graphical model and inference

Graphical model

Joint distribution

$$q(x) = \frac{1}{Z} \prod_{r \in L} \psi_r(x_r) = \exp\Bigl(\sum_{r \in L} c_r(x_r) - \phi_q\Bigr)$$

$$\psi_r(x_r) > 0, \qquad Z = \sum_x \prod_{r \in L} \psi_r(x_r)$$

$$c_r(x_r) = \log \psi_r(x_r), \qquad \phi_q = \log Z$$

$x_r = \{x_i \mid i \in V(r)\}$, where $V(r)$ is the set of members of clique $r$.
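The two forms of the joint distribution above can be checked by brute force on a tiny model. The sketch below is only illustrative: the clique memberships `V` (a 4-cycle) and the pairwise potential `psi` are assumptions, since the slide's figure and potentials are not given in the text.

```python
import itertools
import math

# Hypothetical clique memberships V(r): the slide's figure is not recoverable,
# so this 4-cycle over (x_i, x_j, x_k, x_l) is an assumption for illustration.
V = {"a": (0, 1), "b": (1, 2), "c": (2, 3), "d": (3, 0)}

def psi(r, xr):
    """Assumed positive pairwise potential psi_r(x_r) = exp(0.5 * x_u * x_v)."""
    return math.exp(0.5 * xr[0] * xr[1])

def clique_values(x):
    """Yield psi_r(x_r) for every clique r, with x_r = {x_i : i in V(r)}."""
    for r, members in V.items():
        yield psi(r, tuple(x[i] for i in members))

states = list(itertools.product((-1, 1), repeat=4))
Z = sum(math.prod(clique_values(x)) for x in states)  # partition function
phi_q = math.log(Z)

def q_product_form(x):
    """q(x) = (1/Z) * prod_r psi_r(x_r)."""
    return math.prod(clique_values(x)) / Z

def q_exponential_form(x):
    """q(x) = exp(sum_r c_r(x_r) - phi_q) with c_r = log psi_r."""
    return math.exp(sum(math.log(v) for v in clique_values(x)) - phi_q)

x = (1, 1, -1, 1)
print(q_product_form(x), q_exponential_form(x))
```

Both functions should agree on every state, and the probabilities should sum to one.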

SLIDE 5

Belief Propagation: Graphical model and inference

Belief Propagation

Message update

For each $(r, i)$, update the messages $\nu_{ri}(x_i)$ and $\mu_{ri}(x_i)$.

1. Initialize $t = 1$, $\nu^t_{ri}(x_i) = 1/2$, $\mu^t_{ri}(x_i) = 1/2$.

2. Update the messages as follows:

$$\nu^{t+1}_{ri}(x_i) \propto \sum_{x_r \setminus x_i} \psi_r(x_r) \prod_{j \in V(r) \setminus i} \mu^t_{rj}(x_j)$$

$$\mu^{t+1}_{ri}(x_i) \propto \prod_{s \in N(i) \setminus r} \nu^{t+1}_{si}(x_i)$$

3. The belief (marginal distribution) is

$$b^{t+1}_i(x_i) \propto \prod_{r \in N(i)} \nu^{t+1}_{ri}(x_i)$$
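The message updates above can be sketched in a few lines for binary variables. This is a minimal illustration, not the author's implementation: the chain-shaped model `V_chain`, the potential `psi_chain`, and the function names are all assumptions. On a tree such as this chain, the beliefs should match the exact marginals.

```python
import itertools
import math

STATES = (-1, 1)

def normalize(msg):
    total = sum(msg.values())
    return {s: v / total for s, v in msg.items()}

def run_bp(V, psi, n_vars, n_iter=50):
    """Sum-product updates: nu[(r, i)] is clique-to-variable, mu[(r, i)] variable-to-clique."""
    N = {i: [r for r in V if i in V[r]] for i in range(n_vars)}
    edges = [(r, i) for r in V for i in V[r]]
    mu = {e: {s: 0.5 for s in STATES} for e in edges}
    nu = {e: {s: 0.5 for s in STATES} for e in edges}
    for _ in range(n_iter):
        new_nu = {}
        for r, i in edges:
            members = V[r]
            pos = members.index(i)
            msg = {s: 0.0 for s in STATES}
            # Sum over x_r with x_i held fixed, weighting by incoming mu messages.
            for xr in itertools.product(STATES, repeat=len(members)):
                w = psi(r, xr)
                for j, xj in zip(members, xr):
                    if j != i:
                        w *= mu[(r, j)][xj]
                msg[xr[pos]] += w
            new_nu[(r, i)] = normalize(msg)
        nu = new_nu
        mu = {(r, i): normalize({s: math.prod(nu[(u, i)][s] for u in N[i] if u != r)
                                 for s in STATES})
              for r, i in edges}
    return [normalize({s: math.prod(nu[(r, i)][s] for r in N[i]) for s in STATES})
            for i in range(n_vars)]

def exact_marginals(V, psi, n_vars):
    """Brute-force marginals of q(x) for comparison."""
    marg = [{s: 0.0 for s in STATES} for _ in range(n_vars)]
    for x in itertools.product(STATES, repeat=n_vars):
        w = math.prod(psi(r, tuple(x[i] for i in V[r])) for r in V)
        for i in range(n_vars):
            marg[i][x[i]] += w
    return [normalize(m) for m in marg]

# Assumed example: a chain x0 - a - x1 - b - x2 (a tree, so BP is exact).
V_chain = {"a": (0, 1), "b": (1, 2)}
psi_chain = lambda r, xr: math.exp(0.8 * xr[0] * xr[1] + 0.3 * xr[0])
beliefs = run_bp(V_chain, psi_chain, 3)
exact = exact_marginals(V_chain, psi_chain, 3)
print(beliefs[0], exact[0])
```

On graphs with loops the same code runs unchanged, but the beliefs are then only approximations of the marginals.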

SLIDE 6

Information Geometrical View

Our results

Information geometry: differential geometry applied to statistical/stochastic models.

Amari (1985). Springer-Verlag.
Murray & Rice (1993). Chapman & Hall.
Amari & Nagaoka (2000). AMS and Oxford University Press.

SLIDE 7

Information Geometrical View: Information Geometry

[Figure: the space S with a point r(x)]

$S$: space of probability distributions over $x$. Each point $r(x) \in S$ is a probability distribution.

SLIDE 8

Information Geometrical View: Information Geometry

[Figure: the space S with a point r(x) and the model M = {p(x; θ)}]

$M = \{p(x; \theta)\}$: distributions parametrized by $\theta$.

SLIDE 9

Information Geometrical View: Information Geometry

[Figure: the space S with a point r(x), the model M = {p(x; θ)}, and a displacement $\Delta$ on M labeled $\Delta^T I(\theta) \Delta$]

$M = \{p(x; \theta)\}$: model manifold.

SLIDE 10

Information Geometrical View: Information Geometry

[Figure: m-projection of r(x) in S onto the point $\hat{\theta}$ of M = {p(x; θ)}]

$$\hat{\theta} = \operatorname*{argmin}_{\theta \in \Theta} KL(r(x); p(x; \theta))$$

Maximum likelihood: m-projection onto the model. If the model is an exponential family, the projection is unique.

SLIDE 11

Information Geometrical View: Information Geometry

[Figure: m-projection of r(x) in S onto the point $\hat{\theta}$ of M = {p(x; θ)}]

If $p(x; \theta)$ is an exponential family with sufficient statistic $t(x)$, the m-projection gives

$$E_{r(x)}[t(x)] = E_{p(x; \hat{\theta})}[t(x)]$$
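As a small numerical illustration of this moment-matching property, the sketch below projects a correlated two-variable distribution onto the family of independent distributions, whose sufficient statistic is t(x) = x. The target `r` and the comparison grid are arbitrary choices made for illustration only.

```python
import itertools
import math

def kl(p, q):
    """KL(p; q) = sum_x p(x) log(p(x) / q(x))."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

def product_dist(means):
    """Independent distribution on {-1,+1}^n with E[x_i] = means[i]."""
    n = len(means)
    return {x: math.prod((1 + m * xi) / 2 for m, xi in zip(means, x))
            for x in itertools.product((-1, 1), repeat=n)}

# An arbitrary correlated target r(x) on two binary variables (illustrative).
r = {(-1, -1): 0.4, (-1, 1): 0.1, (1, -1): 0.2, (1, 1): 0.3}

# Moment matching: the projected point reproduces E_r[x_i].
means_r = [sum(px * x[i] for x, px in r.items()) for i in range(2)]
projected = product_dist(means_r)

# Compare against a grid of other independent distributions.
grid = [k / 10 for k in range(-9, 10)]
best_on_grid = min(kl(r, product_dist([m0, m1])) for m0 in grid for m1 in grid)
print(means_r, kl(r, projected), best_on_grid)
```

No grid point should achieve a smaller KL(r; p) than the moment-matched distribution.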

SLIDE 12

Information Geometrical View: Information Geometry

[Figure: e-projection of r(x) in S onto the point $\hat{\theta}_{mf}$ of M = {p(x; θ)}]

$$\hat{\theta}_{mf} = \operatorname*{argmin}_{\theta \in \Theta} KL(p(x; \theta); r(x))$$

Naive mean field approximation: e-projection. For an exponential family, there can be many local minima.
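The existence of several local minima can be seen in a toy case. The sketch below assumes an Ising-type target r(x) ∝ exp(Σ_{i<j} J_ij x_i x_j + Σ_i h_i x_i), for which the stationarity condition of KL(p; r) over independent p reduces to the familiar fixed-point equations m_i = tanh(h_i + Σ_j J_ij m_j). The coupling values and the function name are illustrative assumptions.

```python
import math

def naive_mean_field(J, h, m_init, n_iter=200):
    """Iterate m_i <- tanh(h_i + sum_{j != i} J[i][j] * m_j) starting from m_init."""
    m = list(m_init)
    n = len(m)
    for _ in range(n_iter):
        m = [math.tanh(h[i] + sum(J[i][j] * m[j] for j in range(n) if j != i))
             for i in range(n)]
    return m

# Assumed model: 4 fully connected spins, uniform coupling 0.5, no external field.
n = 4
J = [[0.0 if i == j else 0.5 for j in range(n)] for i in range(n)]
h = [0.0] * n

# Two different initializations end at two different fixed points.
m_plus = naive_mean_field(J, h, [0.1] * n)
m_minus = naive_mean_field(J, h, [-0.1] * n)
print(m_plus, m_minus)
```

By symmetry the two solutions are mirror images of each other, and m = 0 is a third stationary point of the same equations.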

SLIDE 13

Information Geometrical View: Information Geometry

[Figure: e-projection of r(x) in S onto the point $\hat{\theta}_{mf}$ of M = {p(x; θ)}]

In statistical physics, multiple local minima are important: the multiple solutions are considered to correspond to the landscape of the energy function and to "phase transitions."

SLIDE 14

Information Geometrical View: Belief Propagation and Information Geometry

Our results

Discuss the accuracy and convergence of LBP (loopy belief propagation) with information geometry.

Ikeda, Tanaka, & Amari (2004). IEEE Trans. on IT, 50(6), 1097-1114.
Ikeda, Tanaka, & Amari (2004). Neural Comput., 16(9), 1779-1810.

SLIDE 15

Information Geometrical View: Belief Propagation and Information Geometry

Belief Propagation

Joint distribution

[Figure: graphical model with cliques a, b, c, d and variables x_i, x_j, x_k, x_l; the point q(x) in S]

$$q(x) = \exp\Bigl(\sum_r c_r(x) - \phi_q\Bigr)$$

SLIDE 16

Information Geometrical View: Belief Propagation and Information Geometry

Belief Propagation

Single link models

[Figure: graphical model with cliques a, b, c, d and variables x_i, x_j, x_k, x_l; the point q(x) and the model M_r in S]

$$M_r = \bigl\{ p_r(x; \zeta_r) = \exp\bigl(c_r(x) + \zeta_r \cdot x - \phi_r(\zeta_r)\bigr) \bigr\}, \qquad r = 1, \dots, L$$

SLIDE 17

Information Geometrical View: Belief Propagation and Information Geometry

Belief Propagation

Marginals

[Figure: graphical model with cliques a, b, c, d and variables x_i, x_j, x_k, x_l; the point q(x) and the models M_r, M_0 in S]

$$M_0 = \bigl\{ p_0(x; \theta) = \exp\bigl(\theta \cdot x - \phi_0(\theta)\bigr) \bigr\}$$

SLIDE 18

Information Geometrical View: Belief Propagation and Information Geometry

Belief Propagation

Convergence

[Figure: the space S with M(θ), M_r, M_0 and the points p_0(x; θ), p_r(x; ζ_r), q(x)]

$M(\theta)$ = {distributions whose product of marginals is $p_0(x; \theta)$}

Condition 1: $p_0(x; \theta),\ p_r(x; \zeta_r) \in M(\theta)$

SLIDE 19

Information Geometrical View: Belief Propagation and Information Geometry

Belief Propagation

At convergent points

[Figure: the space S with M(θ), M_r, M_0 and the points p_0(x; θ), p_r(x; ζ_r), q(x)]

If $M(\theta)$ includes $q(x)$, then $p_0(x; \theta)$ gives the true marginals.

SLIDE 20

Information Geometrical View: Belief Propagation and Information Geometry

Belief Propagation

Convergence

[Figure: the space S with E, M_r, M_0 and the points p_0(x; θ), p_r(x; ζ_r), q(x)]

$$E = \{\text{log-linear mixture of } p_0, p_r\} = \Bigl\{ \frac{1}{Z_E(t)}\, p_0(x; \theta)^{t_0} \prod_r p_r(x; \zeta_r)^{t_r} \;\Bigm|\; \sum_r t_r = 1 \Bigr\}$$

Condition 2: $q(x),\ p_0(x; \theta),\ p_r(x; \zeta_r) \in E$
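As a hedged worked check of when $q(x)$ lies in $E$: assume the constraint sums over $r = 0, \dots, L$, and assume the parameters satisfy $\sum_{r=1}^{L} \zeta_r = (L-1)\theta$. This relation is not stated on the slide; it is an assumption taken from the usual information-geometric treatment of BP. Choosing $t_0 = -(L-1)$ and $t_r = 1$ for $r \ge 1$ then recovers $q(x)$ from the definitions of $p_0$ and $p_r$:

```latex
\begin{aligned}
p_0(x;\theta)^{-(L-1)} \prod_{r=1}^{L} p_r(x;\zeta_r)
&\propto \exp\Bigl(-(L-1)\,\theta\cdot x
   + \sum_{r=1}^{L}\bigl(c_r(x) + \zeta_r\cdot x\bigr)\Bigr) \\
&= \exp\Bigl(\sum_{r=1}^{L} c_r(x)
   + \Bigl(\sum_{r=1}^{L}\zeta_r - (L-1)\,\theta\Bigr)\cdot x\Bigr) \\
&= \exp\Bigl(\sum_{r=1}^{L} c_r(x)\Bigr) \;\propto\; q(x),
\qquad t_0 + \sum_{r=1}^{L} t_r = -(L-1) + L = 1 .
\end{aligned}
```

The other members are immediate: $p_0$ corresponds to $t_0 = 1$ with all other $t_r = 0$, and each $p_r$ to $t_r = 1$ with the rest zero.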

SLIDE 21

Information Geometrical View: Belief Propagation and Information Geometry

Belief Propagation

Convergence

[Figure: the space S with E, M(θ), M_r, M_0 and the points p_0(x; θ), p_r(x; ζ_r), q(x)]

Theorem: When $p_0(x; \theta)$ and $p_r(x; \zeta_r)$, $r = 1, \dots, L$, satisfy Condition 1 and Condition 2, they form a convergent point of BP.

SLIDE 22

Information Geometrical View: Improving BP

Approximate accuracy

Perturbation analysis

Difference between $E$ and $M(\theta)$ → accuracy.

$x_q$: expectation of $x$ w.r.t. $q(x)$.
$x_{BP}$: convergent point of BP.

$$x_q \simeq x_{BP} + \frac{1}{2} \sum_{r \neq s} B_{rs}\, x_{BP} + \frac{1}{6} \Bigl( \sum_{r,s,t} B_{rst} - \sum_r B_{rrr} \Bigr) x_{BP}$$

$B_{rs}\, x_{BP}$: order 4 loop, embedded m-curvature of $E$.
$B_{rst}\, x_{BP}$: order 6 loop, torsion of $E$.

[Figure: two graphical models with cliques a, b, c, d and variables x_i, x_j, x_k, x_l]

SLIDE 23

Information Geometrical View: Improving BP

Convergence

e-constraint algorithm

[Figure, two panels: the space S with E, M_r, M_0 and the points p_0(x; θ), p_r(x; ζ_r), q(x)]

Condition 2 is always satisfied; update the parameters to satisfy Condition 1.

SLIDE 24

Information Geometrical View: Improving BP

Convergence

e-constraint algorithm

[Figure, two panels: the space S with E, M(θ), M_r, M_0 and the points p_0(x; θ), p_r(x; ζ_r), q(x)]

BP, TRP (Wainwright et al., NIPS*14)

SLIDE 25

Information Geometrical View: Improving BP

Convergence

m-constraint algorithm

[Figure, two panels: the space S with M(θ), M_r, M_0 and the points p_0(x; θ), p_r(x; ζ_r), q(x)]

Condition 1 is always satisfied; update the parameters to satisfy Condition 2.

SLIDE 26

Information Geometrical View: Improving BP

Convergence

m-constraint algorithm

[Figure, two panels: the space S with E, M(θ), M_r, M_0 and the points p_0(x; θ), p_r(x; ζ_r), q(x)]

CCCP (Yuille & Rangarajan, NIPS*15)

SLIDE 27

Survey Propagation (SP): SP

Survey Propagation

Background

Mézard, Parisi, & Zecchina (2002). Science, 297, 812-815.

A method to analyze K-SAT problems.

3-SAT problem:

$$(x_1 \lor \bar{x}_2 \lor x_3) \land (\bar{x}_1 \lor \bar{x}_4 \lor x_5) \land (x_4 \lor x_2 \lor x_3)$$

3-SAT is NP-complete. For the example above, $(x_1, x_2, x_3, x_4, x_5) = (+, +, *, *, +)$ is a solution, and so is $(*, *, +, -, *)$, where $*$ stands for either value.

SLIDE 28

Survey Propagation (SP): SP

Survey Propagation

Notation

With $x_i \in \{-1, +1\}$,

$$\psi_1(x_1, x_2, x_3) = 1 - \frac{1}{8}(1 - x_1)(1 + x_2)(1 - x_3)$$

is 1 if $(x_1 \lor \bar{x}_2 \lor x_3)$ is true and 0 otherwise, and

$$V = (x_1 \lor \bar{x}_2 \lor x_3) \land (\bar{x}_1 \lor \bar{x}_4 \lor x_5) \land (x_4 \lor x_2 \lor x_3) = \prod_r \psi_r(x_r)$$

$$\psi_r(x_r) = 1 - \frac{1}{8} \prod_{i \in V(r)} (1 + J_{ri} x_i), \qquad J_{ri} \in \{-1, +1\}$$

If $V = 1$, the formula is satisfied (SAT).
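The clause indicator above is easy to evaluate directly. In the sketch below, the encoding of each clause as (variable index, J) pairs is an assumption chosen to match $\psi_1$ on the slide: J = -1 for a positive literal and J = +1 for a negated one. The helper `completions` is a hypothetical name used to expand the "*" entries of a partial solution.

```python
import itertools

# Each clause is a list of (variable index, J_ri) pairs, 0-based indices.
# J = -1 for a positive literal, J = +1 for a negated literal (assumed encoding).
CLAUSES = [
    [(0, -1), (1, +1), (2, -1)],  # (x1 or not x2 or x3)
    [(0, +1), (3, +1), (4, -1)],  # (not x1 or not x4 or x5)
    [(3, -1), (1, -1), (2, -1)],  # (x4 or x2 or x3)
]

def psi(clause, x):
    """psi_r(x_r) = 1 - (1/8) * prod_i (1 + J_ri * x_i); the product is 0 or 8."""
    prod = 1
    for i, J in clause:
        prod *= 1 + J * x[i]
    return 1 - prod // 8

def V(x):
    """V = prod_r psi_r(x_r): 1 if every clause is satisfied, else 0."""
    result = 1
    for clause in CLAUSES:
        result *= psi(clause, x)
    return result

def completions(partial):
    """Expand every '*' entry of a partial assignment to both -1 and +1."""
    options = [(-1, 1) if v == "*" else (v,) for v in partial]
    return list(itertools.product(*options))

solution_1 = (1, 1, "*", "*", 1)    # (+, +, *, *, +)
solution_2 = ("*", "*", 1, -1, "*")  # (*, *, +, -, *)
print(all(V(x) == 1 for x in completions(solution_1)),
      all(V(x) == 1 for x in completions(solution_2)))
```

Every completion of the two partial assignments from the background slide should give V = 1, while an assignment that violates a clause, such as x1 = -1, x2 = +1, x3 = -1, gives V = 0.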

SLIDE 29

Survey Propagation (SP): SP

Probability Distribution

Model

$$q(x) = \exp\Bigl(\beta \sum_r \psi_r(x_r) - \phi_q(\beta)\Bigr)$$

This distribution describes the SAT problem as $\beta \to \infty$.

Finite temperature

The limit $\beta \to \infty$ corresponds to survey propagation.
Maneva, Mossel, & Wainwright (2007). JACM, 54(4), No. 17.

For finite temperature, the relation to BP is not clear.
Kabashima (2005). JPSJ, 74(8), 2133-2136.
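To illustrate how this distribution concentrates on satisfying assignments as β grows, the sketch below computes the probability mass that q(x) places on solutions of the three-clause example, by enumeration. The clause encoding and the name `sat_mass` are assumptions made for this illustration.

```python
import itertools
import math

# Assumed encoding: (variable index, J) with J = -1 for a positive literal
# and J = +1 for a negated literal.
CLAUSES = [
    [(0, -1), (1, +1), (2, -1)],  # (x1 or not x2 or x3)
    [(0, +1), (3, +1), (4, -1)],  # (not x1 or not x4 or x5)
    [(3, -1), (1, -1), (2, -1)],  # (x4 or x2 or x3)
]

def psi(clause, x):
    """Clause indicator: 1 if the clause is satisfied by x, else 0."""
    prod = 1
    for i, J in clause:
        prod *= 1 + J * x[i]
    return 1 - prod // 8

def sat_mass(beta):
    """Mass that q(x) ∝ exp(beta * sum_r psi_r(x_r)) puts on satisfying x."""
    total = sat = 0.0
    for x in itertools.product((-1, 1), repeat=5):
        n_ok = sum(psi(c, x) for c in CLAUSES)
        # Subtracting the maximum exponent leaves q(x) unchanged and avoids overflow.
        w = math.exp(beta * (n_ok - len(CLAUSES)))
        total += w
        if n_ok == len(CLAUSES):
            sat += w
    return sat / total

betas = (0.0, 1.0, 5.0, 20.0)
masses = [sat_mass(b) for b in betas]
print(list(zip(betas, masses)))
```

At β = 0 the mass is simply the fraction of satisfying assignments; it increases with β and approaches 1.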

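SLIDE 30

Survey Propagation (SP): SP

SP is based on BP

BP

Important distributions:

$$p_r(x_r; \zeta_r) = \exp\bigl(\beta \psi_r(x_r) + \zeta_r \cdot x_r - \phi_r(\zeta_r)\bigr)$$

$$p_0(x_i; \theta) = \exp\bigl(\theta_i x_i - \phi_0(\theta_i)\bigr) = \frac{\exp(\theta_i x_i)}{2\cosh(\theta_i)}$$

At the convergent point,

$$p_0(x_i; \theta^*_i) = \sum_{x_r \setminus x_i} p_r(x_r; \zeta^*_r)$$

$$q(x) \propto p_0(x; \theta^*) \prod_r \frac{p_r(x_r; \zeta^*_r)}{p_0(x_r; \theta^*)} = \exp\Bigl(\beta \sum_r \psi_r(x_r) + F_B(\theta^*, \{\zeta^*_r\})\Bigr)$$

$F_B(\theta^*, \{\zeta^*_r\}) = \sum_r \bigl(\phi_0(\theta^*_r) - \phi_r(\zeta^*_r)\bigr) - \phi_0(\theta^*)$: Bethe free energy, where $\theta^*_r$ collects the components $\theta^*_i$ with $i \in V(r)$.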

SLIDE 31

Survey Propagation (SP): SP

Difference between SP and BP

BP

There are many convergent points of BP, and SP utilizes all of them. Let $a$ index the convergent points, written $\theta^a, \{\zeta^a_r\}$.

For each $a = 1, \dots, A$ the following relations hold:

$$p_0(x_i; \theta^a_i) = \sum_{x_r \setminus x_i} p_r(x_r; \zeta^a_r)$$

$$q(x) \propto p_0(x; \theta^a) \prod_r \frac{p_r(x_r; \zeta^a_r)}{p_0(x_r; \theta^a)} = \exp\Bigl(\beta \sum_r \psi_r(x_r) + F_B(\theta^a, \{\zeta^a_r\})\Bigr)$$

$F_B(\theta^a, \{\zeta^a_r\}) = \sum_r \bigl(\phi_0(\theta^a_r) - \phi_r(\zeta^a_r)\bigr) - \phi_0(\theta^a)$: Bethe free energy, where $\theta^a_r$ collects the components $\theta^a_i$ with $i \in V(r)$.

SLIDE 32

Survey Propagation (SP): SP

Idea of SP

Approximation with a mixture

Approximate $q(x)$ as follows:

$$\sum_a \pi(a)\, p_0(x; \theta^a)$$

1-RSB?

How should $\pi(a)$ be determined? SP does not use $\pi(a)$ itself but the distribution of $\xi^a_r$, $\zeta^a_r$.

SLIDE 33

Survey Propagation (SP): SP

Update rule of SP

From Kabashima's result

If every $\xi^a_{ri}$ and $\zeta^a_{ri}$ differs depending on $a$, finite temperature SP is defined as follows.

1. Increase $t$ by 1 and update $\pi(\xi^a_{ri})$ as follows:

$$\alpha\, \pi(\xi^a_{ri}) \Bigl( \frac{\exp(\xi^a_{ri} x_i)}{2\cosh(\xi^a_{ri})} \Bigr)^y = \sum_{x_r \setminus x_i} e^{y \beta \psi_r(x_r)} \prod_{j \in V(r) \setminus i} \pi(\zeta^a_{rj}) \Bigl( \frac{\exp(\zeta^a_{rj} x_j)}{2\cosh(\zeta^a_{rj})} \Bigr)^y$$

2. Update $\pi(\zeta^a_{ri})$ as follows:

$$\beta\, \pi(\zeta^a_{ri}) \Bigl( \frac{\exp(\zeta^a_{ri} x_i)}{2\cosh(\zeta^a_{ri})} \Bigr)^y = \prod_{u \in N(i) \setminus r} \pi(\xi^a_{ui}) \Bigl( \frac{\exp(\xi^a_{ui} x_i)}{2\cosh(\xi^a_{ui})} \Bigr)^y$$

SLIDE 34

Survey Propagation (SP): Finite temperature SP

Update rule of SP

$$\alpha\, \frac{\pi(\xi^a_{ri})}{\bigl(2\cosh(\xi^a_{ri})\bigr)^y} = e^{y(\phi_r(\zeta^a_r) - \phi_0(\theta^a_i))} \prod_{j \in V(r) \setminus i} \frac{\pi(\zeta^a_{rj})}{\bigl(2\cosh(\zeta^a_{rj})\bigr)^y}$$

$$\beta\, \frac{\pi(\zeta^a_{ri})}{\bigl(2\cosh(\zeta^a_{ri})\bigr)^y} = \prod_{u \in N(i) \setminus r} \frac{\pi(\xi^a_{ui})}{\bigl(2\cosh(\xi^a_{ui})\bigr)^y}$$

SLIDE 35

Survey Propagation (SP): Finite temperature SP

Update rule of SP

If every $\xi^a_{ri}$ and $\zeta^a_{ri}$ is different for each $a$, then $\pi(a) = \pi(\xi^a_{ri}) = \pi(\zeta^a_{ri})$ holds.

$$\frac{1}{\alpha}\, \pi(a)^{|V(r)| - 2} = \Bigl( e^{\phi_0(\theta^a_i) - \phi_r(\zeta^a_r)}\, \frac{\prod_{j \in V(r)} 2\cosh(\zeta^a_{rj})}{2\cosh(\xi^a_{ri})\, 2\cosh(\zeta^a_{ri})} \Bigr)^y$$

$$\frac{1}{\beta}\, \pi(a)^{|N(i)| - 2} = \Bigl( \frac{\prod_{u \in N(i)} 2\cosh(\xi^a_{ui})}{2\cosh(\zeta^a_{ri})\, 2\cosh(\xi^a_{ri})} \Bigr)^y$$

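SLIDE 36

Survey Propagation (SP): Finite temperature SP

Update rule of SP

Let $E$ be the number of edges, so that $\sum_r |V(r)| = E$ and $\sum_i |N(i)| = E$ (with $L$ cliques and $N$ variables).

Suppose $2\cosh(\zeta^a_{ri})\, 2\cosh(\xi^a_{ri}) \simeq 4\cosh(\theta^a_i)$ holds. (Since $\theta^a_i = \xi^a_{ri} + \zeta^a_{ri}$, this means that one of $\xi^a_{ri}$ and $\zeta^a_{ri}$ is 0 or very large.) Then

$$\alpha'\, \pi(a)^{E - 2L} \simeq \frac{1}{2^L} \exp\Bigl(-y \sum_r \phi_r(\zeta^a_r)\Bigr) \prod_r \prod_{j \in V(r)} \bigl(2\cosh(\zeta^a_{rj})\bigr)^y \qquad (1)$$

$$\beta'\, \pi(a)^{E - 2N} \simeq \frac{1}{2^N} \exp\Bigl(-y \sum_i \phi_0(\theta^a_i)\Bigr) \prod_i \prod_{r \in N(i)} \bigl(2\cosh(\xi^a_{ri})\bigr)^y \qquad (2)$$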

SLIDE 37

Survey Propagation (SP): Finite temperature SP

Update rule of SP

$$C\, \pi(a)^{2(E - N - L)} \simeq \exp\Bigl( y \Bigl[ \sum_r \bigl(\phi_0(\theta^a_r) - \phi_r(\zeta^a_r)\bigr) - \phi_0(\theta^a) \Bigr] \Bigr) \qquad (3)$$

$$= \exp\bigl( y\, F_B(\theta^a, \{\zeta^a_r\}) \bigr) \qquad (4)$$

SLIDE 38

Conclusion

Information geometrical view of BP

Approximation accuracy: perturbation analysis.
Convergence: e- and m-constraint algorithms.

New understanding of SP.