Fast Convergence of Belief Propagation to Global Optima: Beyond Correlation Decay (PowerPoint PPT Presentation)


SLIDE 1

Fast Convergence of Belief Propagation to Global Optima: Beyond Correlation Decay

Frederic Koehler

Massachusetts Institute of Technology

NeurIPS 2019

Frederic Koehler Fast Convergence of BP to Global Optima 1 / 7

SLIDE 2

Graphical models

Ising model: For x ∈ {±1}ⁿ,

Pr(X = x) = (1/Z) exp( (1/2) xᵀJx + hᵀx )

  • Natural model of correlated random variables. Some examples: Hopfield networks, Restricted Boltzmann Machine (RBM) = bipartite Ising model.

[Figure: example graphical models, one on nodes Xa, Xb, Xc, Xe, Xf and one on US states X_WI, X_MI, X_MN, X_OH, X_IA]

Popular model in ML, natural and social sciences, etc.
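As a concrete illustration of the model above: for a graph small enough to enumerate all 2ⁿ configurations, the normalizer Z and the resulting probabilities can be computed by brute force. A minimal sketch (the 3-spin path and the coupling values here are arbitrary choices, not from the slides):

```python
import itertools
import numpy as np

def ising_weight(J, h, x):
    """Unnormalized weight exp((1/2) x^T J x + h^T x)."""
    return np.exp(0.5 * x @ J @ x + h @ x)

# Tiny ferromagnetic example: 3 spins on a path, couplings 0.5, no field.
J = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
h = np.zeros(3)

# Brute-force partition function Z over all 2^n configurations.
configs = [np.array(s) for s in itertools.product([-1, 1], repeat=3)]
Z = sum(ising_weight(J, h, x) for x in configs)
probs = {tuple(x): ising_weight(J, h, x) / Z for x in configs}
```

With ferromagnetic couplings, aligned configurations such as (1, 1, 1) receive strictly more mass than misaligned ones such as (1, −1, 1).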


SLIDE 3

Inference

Inference: Given J, h compute properties of the model. E.g. E[Xi] or E[Xi|Xj = xj].

[Figure: US-state Ising model, nodes X_WI, X_MI, X_MN, X_OH, X_IA]

E[XWI | XOH = +1] = ?

Problem: inference in Ising models (e.g. approximating E[Xi]) is NP-hard! Natural Markov chain approaches (e.g. Gibbs sampling) may mix very slowly.
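The Gibbs sampler mentioned above resamples one spin at a time from its conditional law, which for this model is Pr(Xᵢ = +1 | rest) = 1/(1 + exp(−2(hᵢ + Σⱼ Jᵢⱼ xⱼ))). A minimal sketch, with an arbitrary toy instance and chain lengths chosen only for illustration:

```python
import numpy as np

def gibbs_sample(J, h, n_steps, rng):
    """Single-site Gibbs sampler for Pr(x) ∝ exp((1/2) x^T J x + h^T x)."""
    n = len(h)
    x = rng.choice([-1, 1], size=n)          # random initialization
    for _ in range(n_steps):
        i = rng.integers(n)
        a = h[i] + J[i] @ x - J[i, i] * x[i]  # local field at site i
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * a))
        x[i] = 1 if rng.random() < p_plus else -1
    return x

# Crude marginal estimate E[X_i] by averaging many independent chains.
rng = np.random.default_rng(0)
J = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
h = np.ones(3)
samples = np.array([gibbs_sample(J, h, 200, rng) for _ in range(300)])
mean_est = samples.mean(axis=0)
```

On this tiny, strongly biased instance the chain mixes easily; the slow-mixing phenomenon the slide warns about appears on larger models, e.g. at low temperature.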


SLIDE 4

Variational Inference

Variational objectives (Mean-Field/VI, Bethe):

ΦMF(x) = (1/2) xᵀJx + hᵀx + Σᵢ H( Ber( (1 + xᵢ)/2 ) )

ΦBethe(P) = E_P[ (1/2) XᵀJX + hᵀX ] + Σ_{e∈E} H_P(X_e) − Σᵢ (deg(i) − 1) H_P(Xᵢ)

Message-passing algorithms (MF/VI, BP):

x(t+1) = tanh⊗ⁿ(J x(t) + h)

ν(t+1)_{i→j} = tanh( hᵢ + Σ_{k∈∂i∖j} tanh⁻¹( tanh(J_ik) ν(t)_{k→i} ) )

Non-convex objectives: when do these algorithms find global optima?
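In code, both updates above are short. A sketch under assumed conventions (messages stored in an n×n array with nu[i, j] = ν_{i→j}, nonzero only on edges; the 3-node path instance is made up for illustration):

```python
import numpy as np

def mean_field_step(J, h, x):
    """Coordinatewise mean-field update x(t+1) = tanh(J x(t) + h)."""
    return np.tanh(J @ x + h)

def bp_step(J, h, nu):
    """BP update nu_new[i,j] = tanh(h_i + sum_{k in di\j} atanh(tanh(J_ik) nu[k,i]))."""
    n = len(h)
    nu_new = np.zeros_like(nu)
    for i in range(n):
        for j in range(n):
            if i == j or J[i, j] == 0:
                continue                      # messages live only on edges
            s = h[i]
            for k in range(n):
                if k not in (i, j) and J[i, k] != 0:
                    s += np.arctanh(np.tanh(J[i, k]) * nu[k, i])
            nu_new[i, j] = np.tanh(s)
    return nu_new

# Toy run on a ferromagnetic 3-node path; on a tree, BP converges exactly
# after a number of rounds equal to the diameter.
J = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
h = 0.1 * np.ones(3)
nu = (J != 0).astype(float)                   # nu(0)_{i->j} = 1 on every edge
for _ in range(10):
    nu = bp_step(J, h, nu)
```

The all-ones initialization of the messages matches the initialization used in the theorems on the next slide.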


SLIDE 5

Our Assumption

We suppose, following Dembo-Montanari ’10, that the model is ferromagnetic:

Jij ≥ 0, hi ≥ 0 for all i, j

I.e. neighbors want to align. This assumption is necessary: without it, computing the optimal mean-field approximation, even approximately, is NP-hard.

Objective typically has sub-optimal critical points. (cf. correlation decay)


SLIDE 6

Our Theorems

Fix a ferromagnetic Ising model (J, h) with m edges and n nodes.

Theorem (Mean-Field Convergence)

Let x∗ be a global maximizer of ΦMF. Initializing with x(0) = 1 and defining x(1), x(2), … by iterating the mean-field equations, for every t ≥ 1:

0 ≤ ΦMF(x∗) − ΦMF(x(t)) ≤ min( (‖J‖₁ + ‖h‖₁)/t , 2(‖J‖₁ + ‖h‖₁)/t^(4/3) )

Theorem (BP Convergence)

Let P∗ be a global maximizer of ΦBethe. Initializing ν(0)_{i→j} = 1 for all i ∼ j and defining ν(1), ν(2), … by BP iteration,

0 ≤ ΦBethe(P∗) − ΦBethe(ν(t)) ≤ 8mn(1 + ‖J‖∞)/t.
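The mean-field guarantee can be checked numerically on a toy instance: iterate from x(0) = 1 and track ΦMF (binary entropy handled with the convention 0·log 0 = 0 via clipping). The small fully-connected ferromagnetic model below is an arbitrary choice for illustration, not an example from the paper:

```python
import numpy as np

def binary_entropy(p):
    """H(Ber(p)) in nats, with 0*log 0 = 0 handled by clipping."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def phi_mf(J, h, x):
    """Mean-field objective (1/2) x^T J x + h^T x + sum_i H(Ber((1+x_i)/2))."""
    return 0.5 * x @ J @ x + h @ x + binary_entropy((1 + x) / 2).sum()

# Small ferromagnetic instance (assumed): 4 spins, uniform couplings/field.
J = 0.2 * (np.ones((4, 4)) - np.eye(4))
h = 0.1 * np.ones(4)

x = np.ones(4)                      # x(0) = 1, as in the theorem
vals = []
for _ in range(100):
    x = np.tanh(J @ x + h)          # mean-field iteration
    vals.append(phi_mf(J, h, x))
```

Here ‖J‖∞ < 1 makes the iteration a contraction, so x(t) settles on a fixed point and the objective values stabilize, consistent with the O(1/t) bound above.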


SLIDE 7

For More

The poster: Poster 174, Wednesday 10:45-12:45
The paper: https://arxiv.org/abs/1905.09992
