Fast Convergence of Belief Propagation to Global Optima: Beyond Correlation Decay


  1. Fast Convergence of Belief Propagation to Global Optima: Beyond Correlation Decay. Frederic Koehler, Massachusetts Institute of Technology. NeurIPS 2019.

  2. Graphical models. Ising model: for x ∈ {±1}^n,

     Pr(X = x) = (1/Z) exp( ½ xᵀJx + hᵀx ).

     A natural model of correlated random variables. Some examples: Hopfield networks; a Restricted Boltzmann Machine (RBM) is a bipartite Ising model. [Figure: a graphical model on nodes labeled IA, OH, WI, MI, MN, ...] A popular model in ML, the natural and social sciences, etc.
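To make the definition above concrete, here is a minimal brute-force sketch (my own illustration, not code from the talk; the tiny 3-spin chain and function names are mine) that evaluates Pr(X = x) by summing the partition function Z over all 2^n configurations:

```python
import math
from itertools import product

def ising_weight(J, h, x):
    """Unnormalized weight exp(0.5 x^T J x + h^T x) of a spin vector x."""
    n = len(h)
    e = 0.5 * sum(J[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    e += sum(h[i] * x[i] for i in range(n))
    return math.exp(e)

def partition(J, h):
    """Brute-force partition function Z (exponential in n; tiny n only)."""
    n = len(h)
    return sum(ising_weight(J, h, x) for x in product([-1, 1], repeat=n))

# Tiny ferromagnetic chain on 3 spins: J_01 = J_12 = 0.5, no external field.
J = [[0, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0]]
h = [0.0, 0.0, 0.0]
Z = partition(J, h)
p_all_up = ising_weight(J, h, (1, 1, 1)) / Z
```

With a symmetric J, the ½ xᵀJx term counts each ordered pair once, so each edge contributes J_ij x_i x_j; exact enumeration like this is only feasible for small n, which is exactly why the inference question on the next slides matters.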

  3. Inference. Inference: given J, h, compute properties of the model, e.g. E[X_i] or E[X_i | X_j = x_j]. [Figure: E[X_WI | X_OH = +1] = ? on the graphical model of states.] Problem: inference in Ising models (e.g. approximating E[X_i]) is NP-hard! Natural Markov chain approaches (e.g. Gibbs sampling) may mix very slowly.
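The Gibbs sampling approach mentioned above can be sketched in a few lines. This is a generic single-site sampler (my own illustration, not from the talk), using the exact conditional Pr(X_i = +1 | rest) = 1/(1 + exp(−2(h_i + Σ_j J_ij x_j))):

```python
import math
import random

def gibbs_sample(J, h, steps, seed=0):
    """Single-site Gibbs sampler for Pr(x) proportional to exp(0.5 x^T J x + h^T x).
    Each step resamples one spin from its exact conditional distribution."""
    rng = random.Random(seed)
    n = len(h)
    x = [rng.choice([-1, 1]) for _ in range(n)]
    for _ in range(steps):
        i = rng.randrange(n)
        m = h[i] + sum(J[i][j] * x[j] for j in range(n))  # local field at i
        p_plus = 1.0 / (1.0 + math.exp(-2.0 * m))         # Pr(X_i = +1 | rest)
        x[i] = 1 if rng.random() < p_plus else -1
    return x

# Ferromagnetic chain with a small positive field (illustrative instance).
J = [[0, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0]]
h = [0.2, 0.2, 0.2]
sample = gibbs_sample(J, h, steps=1000)
```

Each step is cheap, but as the slide notes, the number of steps needed before samples reflect the true distribution (the mixing time) can be very large.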

  4. Variational Inference. Variational objectives (Mean-Field/VI, Bethe):

     Φ_MF(x) = ½ xᵀJx + hᵀx + Σ_i H( Ber((1 + x_i)/2) )

     Φ_Bethe(P) = E_P[ ½ XᵀJX + hᵀX ] + Σ_{e∈E} H_P(X_e) − Σ_i (deg(i) − 1) H_P(X_i)

     Message-passing algorithms (MF/VI, BP):

     x^(t+1) = tanh^⊗n( J x^(t) + h )

     ν^(t+1)_{i→j} = tanh( h_i + Σ_{k∈∂i∖j} tanh⁻¹( tanh(J_ik) ν^(t)_{k→i} ) )

     Non-convex objective: when do these algorithms find global optima?
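The two message-passing updates above translate directly into code. This is a minimal sketch (my own, with dense loops for clarity, not the paper's code), with both iterations initialized at all-ones as in the ferromagnetic analysis:

```python
import math

def mf_iterate(J, h, T):
    """Mean-field iteration x <- tanh(J x + h), started from x = all-ones."""
    n = len(h)
    x = [1.0] * n
    for _ in range(T):
        x = [math.tanh(h[i] + sum(J[i][j] * x[j] for j in range(n)))
             for i in range(n)]
    return x

def bp_iterate(J, h, T):
    """BP update: nu_{i->j} <- tanh(h_i + sum over k in N(i), k != j,
    of atanh(tanh(J_ik) * nu_{k->i})). Messages initialized to 1."""
    n = len(h)
    nbrs = [[j for j in range(n) if j != i and J[i][j] != 0] for i in range(n)]
    nu = {(i, j): 1.0 for i in range(n) for j in nbrs[i]}
    for _ in range(T):
        nu = {(i, j): math.tanh(h[i] + sum(math.atanh(math.tanh(J[i][k]) * nu[(k, i)])
                                           for k in nbrs[i] if k != j))
              for (i, j) in nu}
    return nu

# Illustrative ferromagnetic chain with a small positive field.
J = [[0, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0]]
h = [0.1, 0.1, 0.1]
x = mf_iterate(J, h, 100)
nu = bp_iterate(J, h, 100)
```

Note the updates are synchronous (each new value reads only old values), and with J_ij ≥ 0, h_i ≥ 0, and the all-ones start, both iterations stay in the positive orthant.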

  5. Our Assumption. We suppose, following Dembo-Montanari '10, that the model is ferromagnetic: J_ij ≥ 0 and h_i ≥ 0 for all i, j. That is, neighbors want to align. This assumption is necessary: without it, computing the optimal mean-field approximation, even approximately, is NP-hard, and the objective typically has sub-optimal critical points. (cf. correlation decay)

  6. Our Theorems. Fix a ferromagnetic Ising model (J, h) with m edges and n nodes.

     Theorem (Mean-Field Convergence). Let x* be a global maximizer of Φ_MF. Initializing with x^(0) = all-ones and defining x^(1), x^(2), ... by iterating the mean-field equations, for every t ≥ 1:

     0 ≤ Φ_MF(x*) − Φ_MF(x^(t)) ≤ min( (‖J‖₁ + ‖h‖₁)^(4/3) / t , 2(‖J‖₁ + ‖h‖₁) / t )

     Theorem (BP Convergence). Let P* be a global maximizer of Φ_Bethe. Initializing ν^(0)_{i→j} = 1 for all i ∼ j and defining ν^(1), ν^(2), ... by BP iteration,

     0 ≤ Φ_Bethe(P*) − Φ*_Bethe(ν^(t)) ≤ √( 8mn(1 + ‖J‖_∞) / t )
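One can watch the mean-field guarantee at work by tracking Φ_MF along the iterates. The following sketch (my own bookkeeping on a tiny illustrative instance, not the paper's code; it shows the iterates stabilizing rather than computing the true gap, since x* is unknown) evaluates the objective at each step from the all-ones start:

```python
import math

def binary_entropy(p):
    """Entropy of Ber(p) in nats; H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def phi_mf(J, h, x):
    """Mean-field objective: 0.5 x^T J x + h^T x + sum_i H(Ber((1 + x_i)/2))."""
    n = len(h)
    quad = 0.5 * sum(J[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    lin = sum(h[i] * x[i] for i in range(n))
    ent = sum(binary_entropy((1 + x[i]) / 2) for i in range(n))
    return quad + lin + ent

def mf_trace(J, h, T):
    """Objective values along the mean-field iteration from the all-ones start."""
    n = len(h)
    x = [1.0] * n
    vals = [phi_mf(J, h, x)]
    for _ in range(T):
        x = [math.tanh(h[i] + sum(J[i][j] * x[j] for j in range(n)))
             for i in range(n)]
        vals.append(phi_mf(J, h, x))
    return vals

# Ferromagnetic chain with a small positive field (illustrative instance).
J = [[0, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0]]
h = [0.1, 0.1, 0.1]
vals = mf_trace(J, h, 200)
```

On this small instance the objective improves on the all-ones start and the iterates settle quickly, consistent with the O(1/t) rate in the theorem.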

  7. For More. The poster: Poster 174, Wednesday 10:45-12:45. The paper: https://arxiv.org/abs/1905.09992
