Model inference from observed data - PowerPoint PPT Presentation



SLIDE 1

Model inference from observed data

time series of observables → dynamics (correlations), underlying mechanisms (interactions), predictive model

Many issues: limited temporal and spatial sampling, noise (measurement, dynamics), stationarity, classes of models (number of parameters), computational effort for inference, ... (signal/noise < 1, large systems)

Contrast with physics: uniform interactions ⇒ low-dimensional models; reproducibility ⇒ good sampling; thermal equilibrium

SLIDE 2

Example 1:

Concerted activity of a neural population

Fujisawa, Amarasingham, Harrison, Buzsaki (2008) Schnitzer, Meister (2003) Schneidman et al. (2006)
SLIDE 3
  • Networks depend on activity (functional connections)
  • Connections can be modified through learning …
  • More sophisticated methods to infer effective connections for encoding/decoding:

input ⇄ output (encoding: input → output; decoding: output → input)

→ Talk by S. Cocco on replay of cell assemblies after learning (memory consolidation)

SLIDE 4

Example 2:

Coevolution of residues in protein families

PDZ domains: Morais Cabral et al. (1996)

Göbel et al. (1994)

  • Conservation of residues (used for homology detection, phylogeny reconstruction)
  • Two-residue correlations: could reflect structural and functional constraints ...

→ Talk by M. Weigt on covariation in protein families and the inverse Hopfield model
→ Talk by O. Rivoire on spin glass models for protein evolution

SLIDE 5

Example 3:

Order and organization in bird flocks

→ Talk by A. Jelic on information transfer in flocks of starlings

SLIDE 6

Example 4:

Coupled dynamics of species in an ecological system

Population ecology: interactions between species (existence, additivity, …)

Issues: measurement noise, dynamical noise, limited number of samples, unknown (not measured) species, … Is there any reliable signal about species-species 'interactions'? (additivity?) Exploit interactions to predict dynamics, extinction?

Lotka-Volterra equations: dN_i/dt = N_i ( r_i − Σ_j A_ij N_j )
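The Lotka-Volterra dynamics above can be integrated directly; a minimal Python/NumPy sketch, with illustrative growth rates r and interaction matrix A (not fitted to any ecological data):

```python
import numpy as np

# Minimal sketch: Lotka-Volterra dynamics dN_i/dt = N_i ( r_i - sum_j A_ij N_j )
# integrated with a forward-Euler step. r and A are illustrative placeholders.
r = np.array([1.0, 0.8])
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])   # A_ii: self-limitation, A_ij: inter-species pressure

def simulate(N0, r, A, dt=0.01, steps=5000):
    N = np.array(N0, dtype=float)
    traj = [N.copy()]
    for _ in range(steps):
        N = N + dt * N * (r - A @ N)   # Euler step of the LV equations
        traj.append(N.copy())
    return np.array(traj)

traj = simulate([0.1, 0.1], r, A)
N_star = np.linalg.solve(A, r)   # coexistence fixed point, A N* = r
```

With these (stable) parameters the trajectory relaxes to the coexistence fixed point; perturbing A or removing a species is one way to probe the extinction question raised above.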

SLIDE 7

Goals

  • Compression of data (eliminate indirect correlations, sparser representation?)
  • Find effective interaction network

(ex: contact map in protein residue case)

  • Obtain predictive, generative models

(ex : model for artificial protein sequences)

  • Feedback with experiments: design of optimal, maximally informative protocols

Could be used to test the effect of perturbations … to define an 'energy' landscape and probe configuration space ...

(diagram: correlated activity generated by a network of pairwise interactions)
SLIDE 8

Microscopic model for the data (1)

Data: B sampled configurations
(1,0,0,0,1,0,1,1, ..., 1,0,0,1,1,0)
(0,1,0,0,0,1,1,1, ..., 0,1,1,0,0,0)
(1,1,0,1,0,1,1,0, ..., 1,1,0,0,1,1)
...
(0,1,0,0,1,0,0,0, ..., 0,0,0,1,0,0)

Constraints: m_i = <σ_i>, c_ij = <σ_i σ_j>, ... (constraints are realizable)

⇒ Probability p(σ_1, σ_2, ..., σ_N)? Maximum entropy principle (Jaynes, 1957):
find p(σ) maximizing the entropy S[p] = − Σ_σ p(σ) ln p(σ) under the selected constraints (here, stationary and discrete data)
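A minimal sketch of the constraint construction, assuming (as the slide suggests) one binary configuration per sample; the data here are synthetic placeholders:

```python
import numpy as np

# Sketch: build the empirical constraints for the maximum entropy problem.
# Assumed layout: one 0/1 configuration per row, B rows, N columns.
rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(1000, 5))   # B = 1000 synthetic configurations, N = 5

m = data.mean(axis=0)                # m_i  = <σ_i>
c = (data.T @ data) / len(data)      # c_ij = <σ_i σ_j>
# for 0/1 variables σ_i² = σ_i, so the diagonal of c reproduces m
```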

SLIDE 9

Microscopic model for the data (2)

Analogy with Thermodynamics and Ensembles in Statistical Physics

  • System with energy E, volume V, N particles, ...
  • Fix the average value of the volume V ⇒ impose a pressure p : E → E + pV
    Fix the average number of particles N ⇒ impose a chemical potential µ : E → E − µN

Model

  • E(σ; J,h) = − Σ_{i<j} J_ij σ_i σ_j − Σ_i h_i σ_i
  • p_MAXENT(σ; J,h) = exp( −E(σ; J,h) ) / Z[J,h], where Z[J,h] = Σ_σ exp( −E(σ; J,h) )
  • find couplings and fields such that all N + N(N−1)/2 constraints are fulfilled:

∂ ln Z[J,h] / ∂h_i = m_i ,  ∂ ln Z[J,h] / ∂J_ij = c_ij

⇒ the Ising model!
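For a tiny system the partition function can be summed exactly and the identity ∂ ln Z / ∂h_i = m_i checked by a finite difference; a minimal sketch with illustrative random couplings and 0/1 spins:

```python
import itertools
import numpy as np

# Sketch: exact Z for N = 3 by enumeration of the 2^N configurations, plus a
# finite-difference check of d lnZ / d h_0 = m_0. J and h are illustrative.
N = 3
rng = np.random.default_rng(1)
J = np.triu(rng.normal(0, 0.5, (N, N)), k=1)   # couplings J_ij, i<j
h = rng.normal(0, 0.5, N)                      # fields h_i

def log_Z_and_moments(J, h):
    configs = np.array(list(itertools.product([0, 1], repeat=N)))
    E = -np.einsum('bi,ij,bj->b', configs, J, configs) - configs @ h
    w = np.exp(-E)
    Z = w.sum()
    p = w / Z
    m = p @ configs                                    # m_i  = <σ_i>
    c = np.einsum('b,bi,bj->ij', p, configs, configs)  # c_ij = <σ_i σ_j>
    return np.log(Z), m, c

logZ, m, c = log_Z_and_moments(J, h)

eps = 1e-6                                     # finite-difference check
h_shift = h.copy()
h_shift[0] += eps
logZ_shift, _, _ = log_Z_and_moments(J, h_shift)
grad_h0 = (logZ_shift - logZ) / eps            # should equal m[0]
```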

SLIDE 10

Boltzmann Machine Learning

  • Start from random J_ij and h_i
  • Calculate <σ_i σ_j> and <σ_k> using Monte Carlo simulations
  • Compare to c_ij and m_k (data) and update
    J_ij → J_ij − a ( <σ_i σ_j> − c_ij )
    h_k → h_k − a ( <σ_k> − m_k )

Problems:

  • 1. Issue of thermalization (critical point? may take exponential-in-N time …)
  • 2. Convergence (yes, but flat modes?) ⇒ slow

Ackley, Hinton, Sejnowski (1985)
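The learning loop can be sketched on a tiny system; in this minimal version the Monte Carlo estimate of the model moments is replaced by exact enumeration (affordable at N = 3), the "data" moments come from an illustrative target model, and the learning rate a and zero initialization are arbitrary choices:

```python
import itertools
import numpy as np

# Sketch of Boltzmann machine learning, with exact enumeration standing in
# for Monte Carlo so the update rule can be seen in isolation.
N = 3
configs = np.array(list(itertools.product([0, 1], repeat=N)))

def moments(J, h):
    E = -np.einsum('bi,ij,bj->b', configs, J, configs) - configs @ h
    p = np.exp(-E)
    p /= p.sum()
    return p @ configs, np.einsum('b,bi,bj->ij', p, configs, configs)

rng = np.random.default_rng(2)
J_true = np.triu(rng.normal(0, 1.0, (N, N)), k=1)
h_true = rng.normal(0, 0.5, N)
m_data, c_data = moments(J_true, h_true)    # stands in for the data moments

J, h = np.zeros((N, N)), np.zeros(N)        # (the slide starts from random values)
a = 0.5                                     # learning rate
for _ in range(10000):
    m, c = moments(J, h)
    J -= a * np.triu(c - c_data, k=1)       # J_ij <- J_ij - a ( <σiσj> - c_ij )
    h -= a * (m - m_data)                   # h_k  <- h_k  - a ( <σk>  - m_k  )

m_fit, c_fit = moments(J, h)                # should now match the data moments
```

Because the cross-entropy is convex (next slides), this plain gradient descent converges; with real Monte Carlo estimates the two listed problems, thermalization and flat modes, appear.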
SLIDE 11

Microscopic model for the data (3)

Cross-entropy of the data (= σ^1, ..., σ^B):

S†[J,h] = Σ_{b=1..B} − ln p(σ^b; J,h) = B ( ln Z[J,h] − Σ_{i<j} J_ij c_ij − Σ_i h_i m_i )

  • The minimum of S† over (J,h) is the Ising model we are looking for: (J*, h*) = argmin S†[J,h; data]
  • The Hessian of S† is positive semi-definite, hence S† is convex

At the minimum: S[p_MAXENT] = S†[J*,h*] ≤ S†[J,h]

SLIDE 12

Microscopic model for the data (4)

Hessian of the cross-entropy, ∂²S†(J,h; data) / ∂(J,h) ∂(J,h): the covariance matrix of the observables, with blocks

  coupling-coupling:  <σ_i σ_j σ_k σ_l> − <σ_i σ_j> <σ_k σ_l>
  coupling-field:     <σ_i σ_j σ_k> − <σ_i σ_j> <σ_k>
  field-field:        <σ_i σ_j> − <σ_i> <σ_j>

For any direction X = ( x_ij , x_k ):

X^T (Hessian) X = < ( Σ_{i<j} x_ij ( σ_i σ_j − <σ_i σ_j> ) + Σ_k x_k ( σ_k − <σ_k> ) )² > ≥ 0

where < · > = Gibbs average with the Ising model (J,h).

Zero modes? → log-prior on the parameters...
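The positive semi-definiteness can be checked numerically on a small system: up to the factor B, the Hessian is the covariance matrix of the features (σ_iσ_j for i<j, and σ_k) under the Gibbs measure. A minimal sketch by exact enumeration, with illustrative random parameters:

```python
import itertools
import numpy as np

# Sketch: build the feature covariance matrix under the Gibbs measure for
# N = 3 and confirm that all its eigenvalues are non-negative.
N = 3
configs = np.array(list(itertools.product([0, 1], repeat=N)))
rng = np.random.default_rng(3)
J = np.triu(rng.normal(0, 0.7, (N, N)), k=1)
h = rng.normal(0, 0.3, N)

E = -np.einsum('bi,ij,bj->b', configs, J, configs) - configs @ h
p = np.exp(-E)
p /= p.sum()

# feature vector per configuration: the pairs σ_iσ_j (i<j), then the σ_k
iu = np.triu_indices(N, k=1)
feats = np.concatenate([configs[:, iu[0]] * configs[:, iu[1]], configs], axis=1)

mean = p @ feats
cov = np.einsum('b,bi,bj->ij', p, feats, feats) - np.outer(mean, mean)
eigvals = np.linalg.eigvalsh(cov)           # all >= 0 up to rounding
```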

SLIDE 13

Bayesian inference framework (1)

Data = set of configurations σ^b, b = 1, 2, …, B = nb. of configs

Likelihood: P[σ^b | J,h] = exp( Σ_{i<j} J_ij σ_i σ_j + Σ_i h_i σ_i ) / Z[J,h]

Prior: P0[J,h] (useful in case of undersampling ...)

Bayes formula: P[J,h | Data] ∝ Π_b P[σ^b | J,h] × P0[J,h]

For instance: P0 ∝ exp( − Σ_{i<j} J_ij² / (2J²) )
SLIDE 14

Bayesian inference framework (2)

Posterior probability of J,h:

P[J,h | Data] ∝ exp( B [ Σ_{i<j} J_ij c_ij + Σ_i h_i m_i ] ) / Z[J,h]^B × P0[J,h]

Regularized cross-entropy:

S = − ln P[J,h | Data]
  = B ( ln Z[J,h] − Σ_{i<j} J_ij c_ij − Σ_i h_i m_i ) − ln P0[J,h]
  = B ( ln Z[J,h] − Σ_{i<j} J_ij c_ij − Σ_i h_i m_i ) + Σ_{i<j} J_ij² / (2J²)   (with Gaussian prior)

SLIDE 15

Questions

  • 1. Practical methods to find the interactions J_ij from the correlations c_ij? (fast, accurate algorithms)
  • 2. How much data does one need to get reliable interactions? (overfitting ...)

True model J* → sampled configurations {σ_i} → correlations c → inferred J; how large is |J* − J| as a function of B = nb. of configs?

SLIDE 16

Questions

  • 3. How large should the sampled sub-system be? (compared to the correlation length)

Is the inverse problem well-behaved?
Asymptotic inference: B → ∞ while N is kept fixed; the error on each parameter is of order B^(−1/2).
What happens in practice, i.e. when B and N are of the same order of magnitude?

SLIDE 17

Inference approaches

  • Gaussian inference (and Mean field)
  • Inverse Hopfield-Potts model
  • Pseudo-likelihood algorithms
  • Advanced statistical physics methods
SLIDE 18

Interactions from correlations for Gaussian variables

N Gaussian variables x_i with variances c_ii = <x_i x_i> = 1

For i≠j: c_ij = <x_i x_j> = J_ij + Σ_k J_ik J_kj + Σ_{k,l} J_ik J_kl J_lj + …

Matrix notation: c = Id + J + J² + J³ + … = (Id − J)^(−1)  ⇒  J = Id − c^(−1)   (Time ~ N³)

Problems:

  • The empirical matrix c is corrupted by noise, which makes the inversion very unreliable:
    ĉ_ij ± B^(−1/2)  ⇒  errors on J of order (N/B)^(1/2) …
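A minimal numerical sketch of this inversion: samples are drawn from a known sparse coupling matrix, then J = Id − c^(−1) is applied to the empirical correlations. The chain couplings, N and B are illustrative, and the variances are not normalized to one here, a harmless simplification:

```python
import numpy as np

# Sketch of the Gaussian/mean-field route: sample, estimate c, invert.
rng = np.random.default_rng(4)
N, B = 10, 200000
J_true = np.zeros((N, N))
for i in range(N - 1):                      # nearest-neighbour chain, illustrative
    J_true[i, i + 1] = J_true[i + 1, i] = 0.3
C_true = np.linalg.inv(np.eye(N) - J_true)  # c = (Id - J)^(-1)

x = rng.multivariate_normal(np.zeros(N), C_true, size=B)
c_emp = (x.T @ x) / B                       # empirical correlation matrix
J_inf = np.eye(N) - np.linalg.inv(c_emp)    # noisy by ~ (N/B)^(1/2)
```

Shrinking B makes the inversion degrade visibly, which is the noise problem stated above.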

SLIDE 19

  • Not correct for discrete variables (neurons = 0, 1; amino acids = 1, 2, ..., 21)
  • Gaussian theory corresponds to mean-field theory

Why? MF is exact when effective fields average out, i.e. with many neighbours and weak interactions, so that fluctuations are described by a Gaussian law (example: very large dimension D in physical systems leads to couplings ~ N(0, 1/D))

… but with regularization: pseudo-count, or L2 (quadratic log-prior)

(scatter plot: J_Gaussian vs. true couplings for a 2-spin system)

SLIDE 20

Inference approaches

  • Mean field inference
  • Inverse Hopfield-Potts model
  • Pseudo-likelihood algorithms
  • Advanced statistical physics methods
SLIDE 21

Inverse Hopfield model: retarded learning phase transition

Example: N = 100, patterns ξ drawn from Gaussian(0, 0.7). Retarded learning: Watkin, Nadal (1994); Baik, Ben Arous, Peche (2005)

Phase transition! (plots of the error |ξ̂ − ξ| vs. #config / N, shown for #config = 40 and #config = 400)
SLIDE 22

Inverse Hopfield Model: posterior entropy of patterns

Example: P = 1 pattern, no field. Retarded learning transition. Cocco, M., Sessak (2011)

(plots: posterior entropy of the pattern, in bits, vs. sample size, for ξ²/N = 1.1 and ξ²/N = 0.5)

SLIDE 23

Inverse Hopfield Model : error on the inferred patterns

For pattern components ~1, corrections to S0 are useless (at best) unless many configurations are available ...
⇒ Mean-Field is better (even if wrong) when few data are available

SLIDE 24

Inference approaches

  • Mean field inference
  • Inverse Hopfield-Potts model
  • Pseudo-likelihood algorithms
  • Advanced statistical physics methods
SLIDE 25

Pseudo-likelihood methods (1)

Idea: avoid the calculation of the partition function using Callen identities (1963):

<σ_0> = < tanh( Σ_{k≠0} J_0k σ_k + h_0 ) >  ≈  (1/B) Σ_b tanh( Σ_{k≠0} J_0k σ_k^b + h_0 )   (sum over sampled configurations b)

Pseudo cross-entropy:

« S » = Σ_b log 2 cosh( Σ_{k≠0} J_0k σ_k^b + h_0 ) − B ( h_0 m_0 + Σ_{k≠0} J_0k c_0k )

Cost function (h_0, {J_0k}) = pseudo cross-entropy + λ Σ_k |J_0k|

Prior: increase the signal/noise ratio by exploiting the sparsity of J_ij
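A minimal sketch of the resulting procedure for one spin, with the L1 penalty handled by a proximal (soft-thresholding) gradient step. Spins are ±1 here, as the tanh form requires; the generating model, B, the step size a and the penalty strength are all illustrative choices:

```python
import itertools
import numpy as np

# Sketch: pseudo-likelihood inference of J_0k and h_0 for spin 0.
N, B = 4, 20000
rng = np.random.default_rng(5)
J_true = np.triu(rng.normal(0, 0.4, (N, N)), k=1)
J_true = J_true + J_true.T
h_true = np.zeros(N)

configs = np.array(list(itertools.product([-1, 1], repeat=N)))
E = -0.5 * np.einsum('bi,ij,bj->b', configs, J_true, configs) - configs @ h_true
p = np.exp(-E)
p /= p.sum()
sample = configs[rng.choice(len(configs), size=B, p=p)]   # exact sampling

s0, rest = sample[:, 0], sample[:, 1:]      # spin 0 and the other spins
J0, h0 = np.zeros(N - 1), 0.0
a, lam = 0.1, 0.001                         # step size, L1 strength
for _ in range(1000):
    t = np.tanh(rest @ J0 + h0)
    # gradient of the pseudo cross-entropy (per sample) in J_0k and h_0
    gJ = (rest * t[:, None]).mean(axis=0) - (rest * s0[:, None]).mean(axis=0)
    gh = t.mean() - s0.mean()
    J0 = J0 - a * gJ
    J0 = np.sign(J0) * np.maximum(np.abs(J0) - a * lam, 0.0)  # prox of the L1 term
    h0 = h0 - a * gh

err = np.abs(J0 - J_true[0, 1:]).max()      # compare to the true couplings of spin 0
```

Running the same minimization once per spin gives the full (generally non-symmetric) coupling matrix discussed on the next slide.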

SLIDE 26

Pseudo-likelihood methods (2)

Complexity: if B > a log N, the procedure finds couplings of amplitude |J| > a′ ( log N / B )^(1/2) in time poly(B, N)
[a and a′ depend on the maximal degree (number of neighbours of i with nonzero J_ij)]

Caveats:

  • Ising model should be the true model for the data
  • Coupling matrix is not symmetric (but is asymptotically consistent)
  • Susceptibility χ should be small (fails in the vicinity of the critical point)
  • All poly algorithms fail at the critical point?

Bento, Montanari (2009); Ravikumar, Wainwright, Lafferty (2010)
SLIDE 27

What is the relevant susceptibility for the inverse problem?

Direct problem. Susceptibility: χ_ij,kl = ∂<s_i s_j> / ∂J_kl = response of a correlation to a small change in an interaction; may be long range ...

Inverse problem. Susceptibility: χ⁻¹_ij,kl = ∂J_ij / ∂c_kl = response of an interaction to a small change in a correlation

The inverse problem is well-behaved if χ⁻¹ is short range!

SLIDE 28

Examples of inverse susceptibility matrices

  • Spherical (Gaussian) models: χ⁻¹_ij,kl = J_ik J_jl has the same sparsity as J
  • Liquid theory: χ⁻¹ is closely related to the Ornstein-Zernike direct correlation function; the short-rangedness of χ⁻¹ is used for closure schemes, e.g. Percus-Yevick
  • Critical point of a ferromagnet: χ⁻¹(q) ~ q^(2−η), hence ∫_{r>R} dr χ⁻¹(r) decays with R
  • 1D Ising model: the 4-point χ⁻¹ is sparse!
  • Real data?
SLIDE 29

Inference approaches

  • Mean field inference
  • Inverse Hopfield-Potts model
  • Pseudo-likelihood algorithms
  • Advanced statistical physics methods
SLIDE 30

Adaptive cluster expansion (1)

Cross-entropy: S = min_{J,h} − Σ_b log P(s_1, s_2, ..., s_N | J,h)   (sum over sampled configurations)

Expansion over clusters of variables:

S = Σ_i ΔS_i(m_i) + Σ_{i<j} ΔS_ij(m_i, m_j, c_ij) + Σ_{i<j<k} ΔS_ijk(m_i, m_j, m_k, c_ij, c_ik, c_jk) + ...

Single-site clusters determine the fields h_i; two-site clusters add corrections Δh_i and the couplings J_ij; three-site clusters add further corrections ΔΔh_i, ΔJ_ij (and other variables ...)
Cocco, R.M. (2011,2012) ; Barton, Cocco (2012)
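For 0/1 variables the first two cluster terms can be written in closed form: the single-site entropy depends only on m_i, and for a pair the maximum entropy model matching (m_i, m_j, c_ij) is the empirical two-site distribution, so the cluster contribution ΔS_ij = S_ij − S_i − S_j vanishes for independent sites (c_ij = m_i m_j). A minimal sketch with illustrative moments:

```python
import numpy as np

# Sketch: single-site and two-site cluster entropies for 0/1 variables.
def S1(m):
    return -m * np.log(m) - (1 - m) * np.log(1 - m)

def S2(mi, mj, cij):
    ps = np.array([cij, mi - cij, mj - cij, 1 - mi - mj + cij])  # p11,p10,p01,p00
    return -(ps * np.log(ps)).sum()

def dS2(mi, mj, cij):
    return S2(mi, mj, cij) - S1(mi) - S1(mj)

strong = dS2(0.4, 0.5, 0.3)   # correlated pair: c_ij > m_i m_j = 0.2
weak = dS2(0.4, 0.5, 0.2)     # independent pair: contribution vanishes
```

The adaptive part of the method, keeping only clusters whose |ΔS| exceeds a threshold and growing them recursively, builds on exactly these quantities.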
SLIDE 31

Adaptive cluster expansion (2)

Large cluster entropies are network-specific; small cluster entropies are due to sampling noise.

(plot: cluster entropies vs. c_ij and distance |i−j|; noise floor at #config^(−1/2))
SLIDE 32

Adaptive cluster expansion (3)

  • Number and size of clusters adapt to the data structure
  • J is (almost surely) unveiled if B >> log N (and not N)
  • Successful on critical Ising models in 2D

(reference curve: universal distribution of cluster entropies for independent spins)
SLIDE 33

Application to retinal recordings (1)

N = 32-60 ganglion cells recorded for about 2000 s (spontaneous activity)

Meister et al. (2003); Schneidman et al., Nature (2006); Shlens et al., J. Neuroscience (2006)

(setup: retina with visual stimulus, recorded on a multi-electrode array (MEA))

SLIDE 34

Application to retinal recordings (2)

Maps in retinal plane

Cocco, M., Leibler (2009)
SLIDE 35

Is the Ising model good? (1)

Ability to predict higher-order moments: Schneidman et al., Nature (2006)

Qualitatively good also for very rare configurations, i.e. unobserved in the recording!

SLIDE 36

Is the Ising model good? (2)

S_1 > S_2 > S_3 > ... > S_N

S_K = entropy with all moments of order ≤ K imposed

Fraction captured by the pairwise model: ( S_1 − S_2 ) / ( S_1 − S_N )

SLIDE 37

Is the Ising model good? (3)

But …

S_1 > S_2 > S_3 > ... > S_N

S_K = entropy with all moments of order ≤ K imposed; ( S_1 − S_2 ) / ( S_1 − S_N )

SLIDE 38

Is the Ising model good? (4)

Evidence that 2nd moments are not sufficient in some cases:

  • J. Victor et al. (3-body moments),
  • G. Tkacik et al. (P(k))

General understanding of the underlying reasons for success is lacking:

  • why MEP distribution?
  • how smooth should target distribution be?
  • choice of observables (=constraints)?
SLIDE 39

Response to multi-drug combinations in bacteria

Wood, Nishida, Sontag, Cluzel, PNAS (2012)

No simple way to predict the response to a combination of 2 drugs from the responses to the single drugs (independent, antagonistic, synergistic) …

SLIDE 40

Response to multi-drug combinations in bacteria

But responses to combinations of 3 or 4 drugs are predictable from the 2- and 1-drug responses!

E. coli

SLIDE 41

Conclusion (2)

  • Quality and quantity of data make microscopic modeling possible
  • Hopfield model offers Ising model with natural prior
  • Difficulty: generative models …
  • See talks this afternoon!