BEYOND MEAN-FIELD APPROXIMATION
AURÉLIEN DECELLE
LABORATOIRE DE RECHERCHE EN INFORMATIQUE, UNIVERSITÉ PARIS SUD
MOTIVATIONS
Why inverse problems?
- In Machine Learning → online recognition tasks
- In Physics → understanding a physical system from observations
- In social science → getting insight into latent properties
Direct problems are already hard: understanding equilibrium properties can be (very) challenging (e.g. spin glasses). Inverse problems can be harder: ideally, maximizing the likelihood would require computing the partition function many times. In particular, serious problems can appear if …
Depending on the system, different optimization schemes can be adopted.
MF maps the distribution of the data onto a particular form of probability distribution:
$$\min_{\theta} \; \mathrm{KL}\big(p_{\mathrm{data}} \,\big\|\, p_{\mathrm{target}}(\theta)\big), \qquad q(\underline{s}) = \prod_{(ij)} \frac{q_{ij}(s_i, s_j)}{q_i(s_i)\, q_j(s_j)} \prod_i q_i(s_i)$$
What about when the system cannot be described by this particular form of distribution?
⊕ How to put in prior information?
- Pseudo-likelihood: but it overfits
- Max likelihood: but it overfits and can be very slow
- Adaptive cluster expansion: but it is hard to write it …
- Contrastive divergence: it overfits, and can be bad if convergence is very slow!
- Minimum Probabilistic Flow: but it probably does not work well for small sampling.
We consider the following problem: a system of discrete variables $s_i = 1, \dots, q$ (OK, let's say $s_i = \pm 1$ in the following), with
$$\mathcal{H} = \sum_{\langle i,j \rangle} J_{ij} s_i s_j + \sum_i h_i s_i, \qquad p(\underline{s}) = \frac{e^{-\beta \mathcal{H}(\underline{s})}}{Z}$$
Then a set of configurations is collected: $\{\underline{s}^{(a)}\}_{a=1,\dots,M}$. Using them, it is possible to compute the likelihood. Reconstruction error:
$$\varepsilon^2 = \frac{\sum_{i<j} \big(J_{ij} - J^*_{ij}\big)^2}{\sum_{i<j} J_{ij}^2}$$
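A minimal sketch of this setup (mine, not from the slides): a Metropolis sampler collecting the M equilibrium configurations, plus the reconstruction error. The Hamiltonian sign convention follows the one above; `n_sweeps` and the helper names are arbitrary choices.

```python
import numpy as np

def energy(s, J, h):
    """H = sum_{i<j} J_ij s_i s_j + sum_i h_i s_i (J symmetric, zero diagonal)."""
    return 0.5 * s @ J @ s + h @ s

def metropolis_samples(J, h, beta, n_samples, n_sweeps=50, rng=None):
    """Collect n_samples approximately equilibrated configurations in {-1,+1}^N."""
    rng = np.random.default_rng(rng)
    N = len(h)
    s = rng.choice([-1, 1], size=N)
    samples = []
    for _ in range(n_samples):
        for _ in range(n_sweeps * N):
            i = rng.integers(N)
            dE = -2 * s[i] * (J[i] @ s + h[i])   # energy change when flipping s_i
            if dE <= 0 or rng.random() < np.exp(-beta * dE):
                s[i] = -s[i]
        samples.append(s.copy())
    return np.array(samples)

def reconstruction_error(J_inferred, J_true):
    """epsilon^2 = sum (J - J*)^2 / sum J^2, over the off-diagonal couplings."""
    iu = np.triu_indices_from(J_true, k=1)
    return np.sum((J_inferred[iu] - J_true[iu])**2) / np.sum(J_true[iu]**2)
```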
The likelihood function
Probability of observing the configurations: $\prod_a e^{-\beta \mathcal{H}(\underline{s}^{(a)})} / Z$. Define the log-likelihood $\mathcal{L} = \sum_a \big(-\beta \mathcal{H}(\underline{s}^{(a)})\big) - M \log Z$.
Problem of maximization … how to compute average values efficiently?
$$\frac{\partial \mathcal{L}}{\partial J_{ij}} \propto \langle s_i s_j \rangle_{\mathrm{data}} - \langle s_i s_j \rangle_{\mathrm{model}}$$
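A sketch of why this is expensive (my illustration, not the speaker's code): every gradient step must re-estimate the model correlations, e.g. with the Metropolis sampler from the previous snippet. With $\mathcal{H} = +\sum J_{ij} s_i s_j$ the ascent direction picks up a minus sign.

```python
import numpy as np

def likelihood_gradient_step(J, h, beta, data, lr=0.01, n_model=1000):
    """One Boltzmann-learning step: match <s_i s_j>_model to <s_i s_j>_data."""
    corr_data = data.T @ data / len(data)            # <s_i s_j>_data
    model = metropolis_samples(J, h, beta, n_model)  # re-sample the model (slow!)
    corr_model = model.T @ model / len(model)        # <s_i s_j>_model
    # dL/dJ_ij = -beta*M*(<ss>_data - <ss>_model) under H = +sum J s s,
    # so ascent on L moves J against the moment mismatch
    J = J - lr * (corr_data - corr_model)
    np.fill_diagonal(J, 0.0)
    return J
```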
Goal: find a function that can be maximized and that would infer the Js correctly.
$$p(\underline{s}) = p(s_i \mid \underline{s}_{\setminus i}) \, p(\underline{s}_{\setminus i})$$
$$p(s_i \mid \underline{s}_{\setminus i}) = \frac{e^{-\beta s_i \left(\sum_k J_{ik} s_k + h_i\right)}}{2 \cosh\!\big(\beta \left(\sum_k J_{ik} s_k + h_i\right)\big)}$$
This can be maximized! Ekeberg et al.: protein folding. ???: training RBMs.
Can we have theoretical insight? Yes: for infinite Gibbs sampling, the maximum is correct! Consider
$$\mathcal{PL}_i = \sum_a \log p\big(s_i^{(a)} \mid \underline{s}_{\setminus i}^{(a)}\big)$$
and replace the distribution over the data by the Boltzmann distribution of a generating Hamiltonian $\mathcal{H}_G$:
$$\mathcal{PL}_i = \sum_{C} \frac{e^{-\beta \mathcal{H}_G(\underline{s}^{C})}}{Z_G} \log p\big(s_i^{C} \mid \underline{s}_{\setminus i}^{C}\big)$$
The maximum is reached when the couplings of $\mathcal{H}_G$ and $\mathcal{H}$ are equal.
When no hidden variables are present, the PL is convex! Therefore only one maximum exists! The PL can be optimized without too much trouble using, for instance, … And the complexity goes as $O(N^2 M)$. Let's understand how this works and how it compares with MF.
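For instance, each site can be fitted independently with an off-the-shelf convex optimizer. A sketch under my own choices (SciPy's L-BFGS-B, final symmetrization): one objective evaluation costs O(NM) per site, hence O(N²M) overall, matching the complexity quoted above.

```python
import numpy as np
from scipy.optimize import minimize

def plm_fit(data, beta=1.0):
    """Maximize PL_i row by row; data is an (M, N) array of +-1 spins."""
    M, N = data.shape
    J, h = np.zeros((N, N)), np.zeros(N)
    for i in range(N):
        s_i = data[:, i]
        rest = np.delete(data, i, axis=1)         # all spins except i
        def neg_pl(w):                            # w = (couplings of site i, field h_i)
            x = beta * s_i * (rest @ w[:-1] + w[-1])
            return np.logaddexp(0.0, 2.0 * x).sum()   # -log p, convex in w
        w = minimize(neg_pl, np.zeros(N), method="L-BFGS-B").x
        J[i, np.arange(N) != i] = w[:-1]
        h[i] = w[-1]
    return (J + J.T) / 2, h                       # average the two estimates of J_ij
```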
Take a set of M equilibrium configurations $\underline{s}^{(a)}$, $a = 1, \dots, M$. On one side we use the MF equations:
$$J_{ij} = -\big(C^{-1}\big)_{ij}, \qquad m_i = \tanh\Big(\sum_j J_{ij} m_j + h_i\Big) \quad \forall i$$
On the other side we maximize the pseudo-likelihood, i.e. we minimize for each $i$
$$-\mathcal{PL}_i = \sum_a \log\Big(1 + e^{\,2 \beta s_i^{(a)} \sum_k J_{ik} s_k^{(a)}}\Big) \quad \forall i$$
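The MF side of this comparison is a single matrix inversion. A sketch, where $C$ is the connected correlation matrix estimated from the same M configurations:

```python
import numpy as np

def mean_field_couplings(data):
    """Naive mean-field / linear-response inversion: J = -C^{-1} (off-diagonal)."""
    m = data.mean(axis=0)                            # magnetizations m_i
    C = data.T @ data / len(data) - np.outer(m, m)   # C_ij = <s_i s_j> - m_i m_j
    J = -np.linalg.inv(C)                            # J_ij = -(C^-1)_ij, as above
    np.fill_diagonal(J, 0.0)                         # the diagonal is not a coupling
    return J
```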
Curie–Weiss: $J_{ij} = -1/N$, with N = 100 spins. Hopfield: $J_{ij} = \sum_a \xi_i^a \xi_j^a$, with N = 100 spins, two patterns, and M = 100k samples.
SK model: N = 64, with M = 10⁶, 10⁷, 10⁸. 2D ferromagnetic model: $J_{ij} = -1$, N = 49, with M = 10⁴, 10⁵, 10⁶.
How is the L1-norm included in PLM?
$$\mathcal{PL}_i = -\sum_a \log\Big(1 + e^{\,2 \beta s_i^{(a)} \sum_k J_{ik} s_k^{(a)}}\Big) - \lambda \sum_k |J_{ik}| \quad \forall i$$
This leads to sparse solutions … but how to fix $\lambda$?
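One way to minimize this l1-penalized objective for a single site is a proximal-gradient (ISTA) loop; this is a sketch under my own assumptions (step size, iteration count, field omitted), with `lam` playing the role of $\lambda$ above.

```python
import numpy as np
from scipy.special import expit   # numerically stable logistic sigmoid

def plm_l1_site(data, i, beta=1.0, lam=0.01, lr=0.05, n_iter=2000):
    """l1-regularized PLM for the couplings of one site i (field omitted)."""
    s_i, rest = data[:, i], np.delete(data, i, axis=1)
    w = np.zeros(rest.shape[1])
    for _ in range(n_iter):
        x = beta * s_i * (rest @ w)
        # gradient of (1/M) * sum_a log(1 + e^{2 x_a}) with respect to w
        grad = rest.T @ (2.0 * beta * s_i * expit(2.0 * x)) / len(data)
        w -= lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold
    return w   # sparse estimate of the couplings of site i
```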
Progressively decimate the parameters with a small absolute value. Not new: …
Joint work with F. Ricci-Tersenghi.
Given a set of equilibrium configurations, start with all parameters unfixed; then repeatedly maximize the PL and fix to zero the couplings with the smallest absolute values.
[Figure: random graph with 16 nodes — the difference increases, then decreases]
[Figure: 2D ferro model, M = 4500, β = 0.8; axes: # true negatives vs # true positives — my objective!]
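A hedged sketch of the decimation loop just described. Here `plm_fit_masked` is a hypothetical variant of `plm_fit` above that only optimizes the couplings flagged as active; the fraction removed per step is an arbitrary choice.

```python
import numpy as np

def plm_decimate(data, beta=1.0, frac_per_step=0.05, n_steps=10):
    """Alternate PLM fits with fixing the weakest couplings to zero."""
    N = data.shape[1]
    active = np.ones((N, N), dtype=bool)       # which couplings are still free
    np.fill_diagonal(active, False)
    for _ in range(n_steps):
        # hypothetical masked fit: decimated couplings stay pinned at zero
        J, h = plm_fit_masked(data, beta, active)
        cut = np.quantile(np.abs(J[active]), frac_per_step)
        active &= np.abs(J) > cut              # decimate the weakest fraction
    return J, h
```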
The method can be adapted to the max-likelihood of the parallel dynamics (A.D. and P. Zhang):
$$p\big(\underline{s}(t+1) \mid \underline{s}(t)\big) = \prod_i \frac{e^{-\beta s_i(t+1) \left(\sum_k J_{ik} s_k(t) + h_i\right)}}{2 \cosh\!\big(\beta \left(\sum_k J_{ik} s_k(t) + h_i\right)\big)}$$
It has been applied to « detection of cheating by decimation algorithm » (Shogo Yamanaka, Masayuki Ohzeki, A.D.).
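In this dynamical setting each site is effectively a logistic regression of $s_i(t+1)$ on the previous configuration $\underline{s}(t)$; here is a minimal sketch (J need not be symmetric here, so no symmetrization is applied, and the self-coupling is kept).

```python
import numpy as np
from scipy.optimize import minimize

def kinetic_plm(traj, beta=1.0):
    """traj: (T+1, N) array of +-1 spins generated by the parallel dynamics."""
    prev, nxt = traj[:-1], traj[1:]
    N = traj.shape[1]
    J, h = np.zeros((N, N)), np.zeros(N)
    for i in range(N):
        def neg_log_lik(w):                    # w = (row i of J, field h_i)
            # p(s_i(t+1)|s(t)) = e^{-beta s_i(t+1) theta_i(t)} / 2cosh(beta theta_i(t))
            x = beta * nxt[:, i] * (prev @ w[:-1] + w[-1])
            return np.logaddexp(0.0, 2.0 * x).sum()
        w = minimize(neg_log_lik, np.zeros(N + 1), method="L-BFGS-B").x
        J[i], h[i] = w[:-1], w[-1]
    return J, h
```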
The PLM relies on the evaluation of the one-point conditionals; why not use two points or more? ("Composite Likelihood Estimation for Restricted Boltzmann Machines" by Yasuda et al.) Define
$$\mathcal{PL}_k = \frac{1}{\#\,k\text{-tuples}} \sum_{k\text{-tuples}\; c} \sum_{\mathrm{data}} \log p\big(\underline{s}_c^{(\mathrm{data})} \mid \underline{s}_{\setminus c}^{(\mathrm{data})}\big)$$
They show that $\mathcal{PL}_1 \le \mathcal{PL}_2 \le \dots \le \mathcal{PL}_k \le \dots \le \mathcal{PL}_N$ = the true likelihood!
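As an illustration of the k = 2 term (my sketch, not from Yasuda et al.): the conditional of a pair (i, j) given the rest is normalized over its four joint states instead of two, which is where both the extra accuracy and the extra cost come from.

```python
import numpy as np
from itertools import product

def pair_conditional_loglik(J, h, beta, data, i, j):
    """Sum over samples of log p(s_i, s_j | all other spins), Ising model above."""
    total = 0.0
    for s in data:
        # local fields from the *other* spins (independent of s_i, s_j)
        theta_i = J[i] @ s - J[i, i] * s[i] - J[i, j] * s[j] + h[i]
        theta_j = J[j] @ s - J[j, j] * s[j] - J[j, i] * s[i] + h[j]
        def neg_beta_e(si, sj):                # -beta * (terms of H touching i or j)
            return -beta * (si * theta_i + sj * theta_j + J[i, j] * si * sj)
        log_z = np.logaddexp.reduce(           # normalize over the 4 joint states
            [neg_beta_e(si, sj) for si, sj in product((-1, 1), repeat=2)])
        total += neg_beta_e(s[i], s[j]) - log_z
    return total
```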
The maximum likelihood can be seen as a maximum-entropy problem where we would like to fit the 2-point correlations and the local biases:
$$\mathcal{H} = \sum_{i<j} J_{ij} s_i s_j + \sum_i h_i s_i$$
There are already a lot of parameters, $O(N^2)$. What if the system « could » have n-body interactions?
$$\mathcal{H} = \sum_{i<j} J_{ij} s_i s_j + \sum_i h_i s_i + \sum_{i<j<l} J_{ijl} s_i s_j s_l + \cdots$$
We need to find an indicator that there could be new interactions. Let's consider the following experiment: a model with 3-body interactions included.
[Figure — LEFT: S1 (whatever model is used for inference); RIGHT: S2 when doing inference with the wrong model. Error on the correlations.]
Take the errors on the 3-point correlation functions and plot them in decreasing order! Can you guess how many three-body interactions there are?
[Figure: histograms of the error on the 3-point correlations]
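A sketch of how this indicator could be computed, assuming fresh samples drawn from the inferred pairwise model (e.g. via the Metropolis sketch earlier): a visible gap in the sorted errors hints at how many 3-body terms are missing.

```python
import numpy as np
from itertools import combinations

def three_point_errors(data, model_samples):
    """|<s_i s_j s_k>_data - <s_i s_j s_k>_model| for all triplets, sorted."""
    N = data.shape[1]
    errs = [abs(np.mean(data[:, i] * data[:, j] * data[:, k]) -
                np.mean(model_samples[:, i] * model_samples[:, j] * model_samples[:, k]))
            for i, j, k in combinations(range(N), 3)]
    return np.sort(errs)[::-1]                 # decreasing order, as on the slide
```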
… (or strong-coupling regime)
… (without the need of fixing parameters)
« Generalizing » max-ent: as seen, the PLM can be extended to become better and better, at the cost of complexity!