  1. BEYOND MEAN-FIELD APPROXIMATION
  Aurélien Decelle, Laboratoire de Recherche en Informatique, Université Paris Sud

  2. MOTIVATIONS
  Why inverse problems?
  • In machine learning → online recognition tasks
  • In physics → understanding a physical system from observations
  • In social science → gaining insight into latent properties

  3. HOW HARD ?
  Direct problems are already hard: understanding equilibrium properties can be (very) challenging (e.g. spin glasses).
  Inverse problems can be harder: ideally, maximizing the likelihood would involve computing the partition function many times.
  In particular, serious problems can appear because of:
  • Overfitting
  • Non-convex functions
  • Slow convergence in the direct problem

  4. HOW HARD ?
  Depending on the system, different optimization schemes can be adopted.

  5. DEEP LEARNING

  6. ICML STUFFS

  7. WHY IT IS NECESSARY TO GO BEYOND MF
  MF maps the distribution of the data onto a particular parametric form of probability distribution:
  min_θ KL( p_data || p_target(θ) )
  nMF: p_nMF(s) = ∏_i p_i(s_i)
  Bethe approx: p_BA(s) = ∏_(ij) [ p_ij(s_i, s_j) / (p_i(s_i) p_j(s_j)) ] ∏_i p_i(s_i)

  8. WHY IT IS NECESSARY TO GO BEYOND MF
  What about when the system cannot be described by this particular form of distribution?
  • Long-range correlations
  • Very specific topology
  • Presence of hidden nodes
  ⊕ How do we include prior information?

  9. OTHER METHODS ?
  Pseudo-likelihood: a trade-off between complexity and the level of approximation
  • Consistent for infinite sampling
  • Can deal with priors
  • But overfits
  Max-likelihood: same as the last two points above
  • But overfits and can be very slow

  10. OTHER METHODS ?
  Adaptive cluster expansion: avoids overfitting
  • Consistently develops clusters of larger sizes
  • But it is hard to write down …
  Minimum Probability Flow: fast to converge
  • Consistent
  • But probably does not work well for small sampling
  Contrastive divergence: very fast
  • A trade-off can be found between speed and exactness
  • Overfits, and can be bad if convergence is very slow!

  11. PSEUDO-LIKELIHOOD METHOD  Principle  Comparison with MF  Regularization  Decimation  Generalisation and extension

  12. SETTINGS
  We consider the following problem: a system of discrete variables s_i = 1, …, q (ok, let's say s_i = ±1 in the following), interacting by pairs and subject to local biases:
  p(s) = e^(−βH(s)) / Z,  with  H = Σ_(<i,j>) J_ij s_i s_j + Σ_i h_i s_i
  Then a set of configurations is collected: { s^(a) }_(a=1,…,M). Using them, it is possible to compute the likelihood.
  Reconstruction error: ε² = Σ_(ij) (J_ij − J*_ij)² / Σ_(ij) (J*_ij)²
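As an illustration (not part of the original slides), the reconstruction error defined above can be computed in a few lines; the coupling values here are made up for the example:

```python
import numpy as np

def reconstruction_error(J_true, J_inferred):
    """Return epsilon, where epsilon^2 = sum (J_inf - J_true)^2 / sum J_true^2."""
    return np.sqrt(np.sum((J_inferred - J_true) ** 2) / np.sum(J_true ** 2))

J = np.array([[0.0, 0.3], [0.3, 0.0]])    # toy coupling matrix
perfect = reconstruction_error(J, J)      # exact reconstruction
off = reconstruction_error(J, 1.1 * J)    # 10% multiplicative error
```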

  13. SETTINGS
  The likelihood function: the probability of observing the configurations is ∏_a e^(−βH(s^(a))) / Z
  Define the log-likelihood: L = Σ_a ( −βH(s^(a)) − log Z )
  ∂L/∂J_ij ∝ <s_i s_j>_data − <s_i s_j>_model
  A problem of maximization… how can the average values be computed efficiently?
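The gradient above is the familiar moment-matching rule: raise J_ij when the data correlate more than the model. A minimal sketch (hypothetical code, with the expensive model average replaced by a placeholder, since computing it is exactly the hard part the slide points out):

```python
import numpy as np

def pair_correlations(samples):
    """Empirical <s_i s_j> over M configurations (rows of +/-1)."""
    return samples.T @ samples / samples.shape[0]

rng = np.random.default_rng(0)
samples = rng.choice([-1, 1], size=(1000, 5))   # M=1000 synthetic configurations
C_data = pair_correlations(samples)

# one gradient-ascent step on the log-likelihood; <s_i s_j>_model would
# normally come from Monte Carlo, here it is just a zero placeholder
C_model = np.zeros_like(C_data)
eta = 0.1
J = np.zeros((5, 5))
J += eta * (C_data - C_model)
```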

  14. PSEUDO-LIKELIHOOD
  Goal: find a function that can be maximized and would infer the Js correctly.
  p(s) = p(s_i | s_\i) p(s_\i)
  p(s_i | s_\i) = e^(−β s_i (Σ_k J_ik s_k + h_i)) / 2cosh(β (Σ_k J_ik s_k + h_i))
  This can be maximized!
  Ekeberg et al.: protein folding
  ???: training RBMs
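The conditional distribution on this slide is cheap to evaluate because it never touches the partition function Z. A small sketch (toy values, following the sign convention of the slide):

```python
import numpy as np

def p_cond(s_i, i, s, J, h, beta=1.0):
    """p(s_i | s_\\i) for an Ising model, in the form shown on the slide."""
    field = J[i] @ s - J[i, i] * s[i] + h[i]        # local field, no self-term
    return np.exp(-beta * s_i * field) / (2 * np.cosh(beta * field))

J = np.array([[0.0, 0.5, -0.3],
              [0.5, 0.0, 0.2],
              [-0.3, 0.2, 0.0]])
h = np.array([0.1, -0.2, 0.0])
s = np.array([1.0, -1.0, 1.0])
total = p_cond(1.0, 0, s, J, h) + p_cond(-1.0, 0, s, J, h)
```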

  15. PSEUDO-LIKELIHOOD
  Can we get theoretical insight? Yes: for infinite Gibbs sampling, the maximum is correct!
  Consider PL_i = Σ_a log p(s_i^(a) | s_\i^(a)); we replace the distribution over the data by the Boltzmann distribution:
  PL_i = Σ_s ( e^(−βH*(s)) / Z* ) log p(s_i | s_\i)
  The maximum is reached when the couplings of H* and of H are equal.

  16. PSEUDO-LIKELIHOOD
  When no hidden variables are present, the PL is convex! Therefore only one maximum exists!
  The PL can be maximized without too much trouble using, for instance:
  • Newton's method
  • Gradient descent
  And the complexity goes as O(N²M).
  Let's understand how this works and how it compares to MF.
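Convexity is what makes plain gradient methods safe here. A rough sketch of gradient descent on the (mean) negative pseudo-log-likelihood of one spin; the data are synthetic and uncorrelated, so this only demonstrates that the objective decreases, not that the true couplings are recovered:

```python
import numpy as np

def neg_pl(Jrow, h_i, i, S, beta=1.0):
    """Mean negative pseudo-log-likelihood of spin i over samples S (rows of +/-1)."""
    x = S @ Jrow - S[:, i] * Jrow[i] + h_i         # local field without self-coupling
    y = 2.0 * beta * S[:, i] * x
    return np.mean(np.log1p(np.exp(y)))

def neg_pl_grad(Jrow, h_i, i, S, beta=1.0):
    """Gradient of neg_pl with respect to the coupling row J_i."""
    x = S @ Jrow - S[:, i] * Jrow[i] + h_i
    y = 2.0 * beta * S[:, i] * x
    w = 2.0 * beta * S[:, i] / (1.0 + np.exp(-y))  # logistic weights
    g = S.T @ w / len(S)
    g[i] = 0.0                                     # self-coupling stays zero
    return g

rng = np.random.default_rng(1)
S = rng.choice([-1, 1], size=(500, 4)).astype(float)
J, i = np.zeros(4), 0
before = neg_pl(J, 0.0, i, S)
for _ in range(100):                               # plain gradient descent
    J -= 0.1 * neg_pl_grad(J, 0.0, i, S)
after = neg_pl(J, 0.0, i, S)
```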

  17. RECALL OF THE SETTING
  A set of M equilibrium configurations s^(a), a = 1, …, M.
  On one side we use the MF equations:
  m_i = tanh( Σ_k J_ik m_k + h_i ),  J_ij = −(C⁻¹)_ij
  On the other side we maximize the pseudo-likelihood:
  PL_i = −Σ_a log( 1 + e^(2β s_i^(a) ( Σ_k J_ik s_k^(a) + h_i )) )  ∀i
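The MF side of this comparison is a single matrix inversion. A sketch of the nMF inversion J = −C⁻¹ on synthetic samples (independent spins here, so the resulting couplings are just sampling noise; the point is the pipeline, not the values):

```python
import numpy as np

rng = np.random.default_rng(2)
S = rng.choice([-1, 1], size=(2000, 6)).astype(float)   # M=2000 samples, N=6 spins
m = S.mean(axis=0)                                       # magnetizations
C = S.T @ S / len(S) - np.outer(m, m)                    # connected correlations
J_nmf = -np.linalg.inv(C)                                # nMF inversion, J = -(C^-1)
np.fill_diagonal(J_nmf, 0.0)                             # discard self-couplings
```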

  18. MEAN-FIELD AND PLM
  Curie-Weiss: J_ij = −1/N, with N = 100 spins
  Hopfield: J_ij = Σ_a ξ_i^(a) ξ_j^(a), with N = 100 spins and two patterns, M = 100k

  19. MEAN-FIELD AND PLM
  SK model, N = 64, with M = 10⁶, 10⁷, 10⁸
  2D model, J_ij = −1, N = 49, with M = 10⁴, 10⁵, 10⁶
  E. Aurell and M. Ekeberg 2012

  20. WHAT ABOUT THE STRUCTURE ?

  21. WHAT ABOUT THE STRUCTURE ?
  How is the L1 norm included in the PLM?
  PL_i^(λ) = −Σ_a log( 1 + e^(2β s_i^(a) Σ_k J_ik s_k^(a)) ) − λ Σ_k |J_ik|  ∀i
  This leads to sparse solutions… but how do we fix λ?
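One standard way to handle the non-smooth L1 term (not spelled out on the slide, so this is a generic illustration) is proximal gradient descent, whose key ingredient is the soft-thresholding operator that produces the sparsity:

```python
import numpy as np

def soft_threshold(J, lam):
    """Proximal operator of lam * |J|: shrink each coupling toward zero,
    setting to zero anything smaller than lam in magnitude."""
    return np.sign(J) * np.maximum(np.abs(J) - lam, 0.0)

shrunk = soft_threshold(np.array([0.9, -0.05, 0.2]), 0.1)
```

Couplings below λ in magnitude are zeroed outright, which is where the sparse solutions come from.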

  22. WHAT ABOUT THE STRUCTURE ?

  23. WHAT ABOUT THE STRUCTURE ?

  24. VERY SIMPLE IDEA : DECIMATION
  Progressively decimate the parameters with small absolute values.
  Not new:
  • In optimization problems using BP (Montanari et al.)
  • Brain damage (LeCun)

  25. DECIMATION ALGORITHM
  Given a set of equilibrium configurations, with all parameters initially unfixed:
  1. Maximize the pseudo-likelihood function over all non-fixed variables
  2. Decimate the ρ(t) smallest variables (in magnitude) and fix them
  3. If (criterion is reached): exit
  4. Else: t ← t + 1, go to 1.
  Joint work with F. Ricci-Tersenghi
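The decimation step itself can be sketched as follows (hypothetical code: the PL re-maximization of step 1 is left as a comment, since any of the optimizers above would do, and the coupling values are random stand-ins):

```python
import numpy as np

def decimate(J, mask, frac=0.25):
    """Fix to zero the `frac` smallest-magnitude couplings still free in `mask`."""
    free = np.flatnonzero(mask)
    k = max(1, int(frac * len(free)))
    smallest = free[np.argsort(np.abs(J[free]))[:k]]
    J[smallest] = 0.0
    mask[smallest] = False          # these parameters are now fixed
    return J, mask

rng = np.random.default_rng(3)
J = rng.normal(scale=0.1, size=20)   # flattened couplings (toy values)
mask = np.ones(20, dtype=bool)       # True = still free to optimize
for _ in range(3):
    # (step 1 of the algorithm: re-maximize the PL over the free couplings here)
    J, mask = decimate(J, mask)
```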

  26. DECIMATION ALGORITHM
  Given a set of equilibrium configurations, with all parameters initially unfixed:
  1. Maximize the pseudo-likelihood function over all non-fixed variables
  2. Decimate the ρ(t) smallest variables (in magnitude) and fix them
  3. If (criterion is reached): exit
  4. Else: t ← t + 1, go to 1.
  ????

  27. CAN YOU GUESS THE CRITERION ? Random graph with 16 nodes

  28. CAN YOU GUESS THE CRITERION ?
  Random graph with 16 nodes. (Plot annotations: the difference increases / the difference decreases.)

  29. HOW DOES IT LOOK!
  2D ferro model, M = 4500, β = 0.8

  30. COMPARISON WITH L1 : ROC
  My objective! (Axes: # true positives vs. # true negatives)
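Each point of such a ROC curve compares the thresholded inferred couplings against the true edge set. A small sketch (toy couplings invented for the example); sweeping the threshold traces out the curve:

```python
import numpy as np

def roc_point(J_true, J_inferred, thresh):
    """True/false positive rates for edge recovery at threshold `thresh`."""
    actual = np.abs(J_true) > 0
    predicted = np.abs(J_inferred) > thresh
    tpr = np.sum(predicted & actual) / max(actual.sum(), 1)
    fpr = np.sum(predicted & ~actual) / max((~actual).sum(), 1)
    return tpr, fpr

J_true = np.array([1.0, 0.0, -1.0, 0.0])    # toy "true" couplings
tpr, fpr = roc_point(J_true, J_true, 0.5)   # a perfect inference
```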

  31. COMPARISON WITH L1 : ROC

  32. SOME MORE COMPARISONS (IF TIME)

  33. TO BE CONTINUED …
  Can be adapted for the max-likelihood of the parallel dynamics (A.D. and P. Zhang):
  p( s(t+1) | s(t) ) = ∏_i e^(−β s_i(t+1) ( Σ_k J_ik s_k(t) + h_i )) / 2cosh(β ( Σ_k J_ik s_k(t) + h_i ))
  Has been applied to "detection of cheating by decimation algorithm", Shogo Yamanaka, Masayuki Ohzeki, A.D.
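The dynamical transition probability factorizes over spins, which is what makes its likelihood tractable without any partition function. A sketch with made-up couplings, checking normalization by brute-force enumeration over all next states:

```python
import itertools
import numpy as np

def transition_prob(s_next, s, J, h, beta=1.0):
    """One-step transition probability of the parallel dynamics on the slide."""
    x = J @ s + h
    return float(np.prod(np.exp(-beta * s_next * x) / (2 * np.cosh(beta * x))))

rng = np.random.default_rng(5)
J = rng.normal(scale=0.3, size=(3, 3))     # toy couplings
h = rng.normal(scale=0.1, size=3)
s = np.array([1.0, -1.0, 1.0])
total = sum(transition_prob(np.array(c, dtype=float), s, J, h)
            for c in itertools.product([-1, 1], repeat=3))
```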

  34. EXTENSION ?
  The PLM relies on the evaluation of the one-point marginals; why not use two points or more?
  "Composite Likelihood Estimation for Restricted Boltzmann Machines" by Yasuda et al.
  Define PL_k = (1 / #k-tuples) Σ_(k-tuples c) Σ_data log p( s_c^(data) | s_\c^(data) )
  They show that PL_1 ≤ PL_2 ≤ … ≤ PL_k ≤ … ≤ PL_N = the true likelihood!

  35. EXTENSION : THREE-BODY INTERACTIONS
  The maximum likelihood can be seen as a maximum-entropy problem where we would like to fit the 2-point correlations and the local biases:
  H = Σ_(i<j) J_ij s_i s_j + Σ_i h_i s_i
  There are already a lot of parameters, O(N²). What if the system « could » have n-body interactions?
  H = Σ_(i<j) J_ij s_i s_j + Σ_i h_i s_i + Σ_(i<j<k) K_ijk s_i s_j s_k + …

  36. EXTENSION : THREE-BODY INTERACTIONS
  We need to find an indicator that there could be new interactions. Let's consider the following experiment:
  • Take a system S1: 2D ferro without field
  • Take a system S2: 2D ferro without field but with some 3-body interactions
  • Make the inference on the two models with a pairwise model and with a model that includes 3-body interactions

  37. EXTENSION : THREE-BODY INTERACTIONS
  Error on the correlation matrix.
  LEFT: S1 (whatever model I use for inference)
  RIGHT: S2 when doing inference with the wrong model

  38. EXTENSION : THREE-BODY INTERACTIONS
  Take the errors on the 3-point correlation functions and plot them in decreasing order!
  Can you guess how many three-body interactions there are?
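Computing and ranking the 3-point correlations is straightforward; a sketch on synthetic samples (independent spins, so the sorted values are pure sampling noise, standing in for the "error vs. the pairwise model" of the slide):

```python
import numpy as np
from itertools import combinations

def three_point_corrs(S):
    """Empirical <s_i s_j s_k> for all triplets i < j < k."""
    N = S.shape[1]
    return np.array([np.mean(S[:, i] * S[:, j] * S[:, k])
                     for i, j, k in combinations(range(N), 3)])

rng = np.random.default_rng(4)
S = rng.choice([-1, 1], size=(1000, 5))
errs = np.sort(np.abs(three_point_corrs(S)))[::-1]   # decreasing order
```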

  39. EXTENSION : THREE-BODY INTERACTIONS
  Wrong model: histogram of the error on the 3-point correlations
  Correct model: histogram of the error on the 3-point correlations
