A statistical Bayesian framework for the identification of biological networks from perturbation experiments ECCB 2010, Ghent, Belgium Nicole Radde Institute for Systems Theory and Automatic Control University of Stuttgart September 26, 2010
Parameter estimation as an inverse problem
Optimization problem
Given: a model m characterized by a parameter vector θ, and a dataset y.
Wanted: an estimate θ̂ that optimizes an objective function F(θ, y):

θ̂ = arg min_θ F(θ, y)
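To make the argmin formulation concrete, here is a minimal sketch (not from the talk) that fits a hypothetical scalar linear model y_i ≈ θ·x_i by brute-force minimization of a sum-of-squares objective and checks the result against the closed-form least-squares solution:

```python
# Illustrative only: hypothetical scalar linear model y_i = theta * x_i + noise.

def objective(theta, xs, ys):
    """Sum-of-squares objective F(theta, y)."""
    return sum((y - theta * x) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]   # generated roughly with theta = 2

# Closed-form least-squares solution for this model
theta_closed = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Brute-force grid search mimicking theta_hat = argmin_theta F(theta, y)
grid = [i / 1000 for i in range(4000)]
theta_hat = min(grid, key=lambda t: objective(t, xs, ys))

print(round(theta_closed, 3), round(theta_hat, 3))  # both close to 2
```

In practice the grid search would be replaced by a gradient-based optimizer; the point is only that θ̂ is defined through the objective F.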
Identification of biological networks from perturbations, Nicole Radde 1 / 23
Content
1. Statistical approaches for parameter estimation
2. Bayesian regularization
3. Application results
4. Conclusions
Statistical approaches
y: random variables with sampling distribution p(y|θ)

Standard objective function: the negative log-likelihood

−log L_y(θ) = −log p(y|θ)

can directly include noise
can handle latent variables → marginalization:

p(y|θ) = Σ_x p(y, x|θ)   with y: observables, x: latent variables
Identifiability
Practical non-identifiability:
sparse data (low time resolution, hidden states)
flat likelihood: the Fisher information I(θ_ML) has small eigenvalues
→ the normal approximation N(θ_ML, I⁻¹(θ_ML)) has large covariance
improves with increasing dataset size

Structural non-identifiability:
independent of dataset size
correlation between parameters
Structural non-identifiability
Example: reversible reaction A ⇌ B with forward rate k1 and backward rate k−1, parameters θ = (k1, k−1)

d[A]/dt = −k1[A] + k−1[B]
d[B]/dt = −d[A]/dt
[A] + [B] = N

Measure the steady state: y = [Ā]/[B̄] = k−1/k1

Likelihood function: p(k1, k−1) ∝ exp( −(1/2) (k−1/k1 − 1)² )
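The ridge structure of this likelihood can be checked numerically. The sketch below is illustrative only; the measured ratio y = 1 and the unit noise variance are assumptions. It evaluates the log-likelihood on and off the line k−1/k1 = 1:

```python
# Structural non-identifiability: the likelihood depends on (k1, k_{-1})
# only through the ratio k_{-1}/k1 (measured ratio y = 1, sigma = 1 assumed).

def loglik(k1, km1, y_meas=1.0, sigma=1.0):
    # steady state of A <-> B gives [A]/[B] = k_{-1}/k1
    return -0.5 * ((km1 / k1 - y_meas) / sigma) ** 2

on_ridge = [loglik(k, k) for k in (0.1, 1.0, 10.0)]  # all with k_{-1}/k1 = 1
off_ridge = loglik(1.0, 2.0)                          # ratio 2, off the ridge

print(on_ridge, off_ridge)  # ridge values identical and maximal
```

No amount of additional steady-state data of this type breaks the ridge, which is exactly what distinguishes structural from practical non-identifiability.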
Regularization
Idea
Problem: the data does not contain enough information to identify parameter values → large variance of ML/least-squares estimates across different experiments.

Regularization: add an additional data-independent regularization term to the objective function, e.g. Tikhonov regularization:

θ̂_TR = arg min_θ Σ_i ‖y_i − x_i(θ)‖² + α‖θ‖²     (1)

where the sum is the data term and α‖θ‖² is the regularization term.
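As a toy instance of Eq. (1), the sketch below computes the Tikhonov (ridge) estimate for a hypothetical scalar model y_i ≈ θ·x_i, where the minimizer has the closed form Σ x_i y_i / (Σ x_i² + α):

```python
# Toy Tikhonov regularization for y_i ~ theta * x_i (hypothetical data).

def tikhonov_estimate(xs, ys, alpha):
    """Minimizer of sum_i (y_i - theta*x_i)^2 + alpha*theta^2 (closed form)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

xs, ys = [1.0, 2.0, 3.0], [1.1, 2.0, 2.9]
estimates = {alpha: tikhonov_estimate(xs, ys, alpha) for alpha in (0.0, 1.0, 100.0)}
for alpha, est in estimates.items():
    print(alpha, round(est, 3))
# alpha = 0 recovers plain least squares; larger alpha shrinks theta toward 0
```

The shrinkage toward zero with growing α is the trade-off regularization buys: lower estimator variance at the price of bias.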
Bayesian regularization
θ and y are both random variables, with joint distribution p(y, θ) = p(y|θ) p(θ) = p(θ|y) p(y). The objective function is the posterior distribution

p(θ|y) = p(y|θ) p(θ) / p(y)

with p(y|θ) the likelihood function, p(θ) the prior distribution, and p(y) the evidence.
(Figure: contour plots of the prior and the posterior over (θ1, θ2).)
Bayesian regularization
−log p(θ|y) = −log p(y|θ) − log p(θ) + log p(y)

where −log p(y|θ) is the data term, −log p(θ) is the regularization term, and log p(y) is independent of θ.

The posterior distribution not only provides point estimates, but also contains information about confidence intervals and identifiability. Information-theoretic concepts can be used as measures of the information content of the posterior distribution.
Stochastic embedding of ODEs
Measurement noise models
System (deterministic): ẋ = f(x, θ1) → x(t, x0, θ1)

Observations (stochastic): y_i^t = x_i(t, x0, θ1) + ε(θ2), with ε: noise

Independence graph → the sampling distribution factorizes:

p(y|x, θ) = ∏_{i=1}^n ∏_{t=1}^T p(y_i^t | x_i(t, x0, θ1), θ2)
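Under this factorization the log sampling distribution is a sum over components and time points. A minimal sketch, assuming independent Gaussian measurement noise with a hypothetical σ and hand-made trajectories:

```python
import math

def gauss_logpdf(y, mu, sigma):
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2.0 * sigma ** 2)

def log_likelihood(y, x, sigma):
    """log p(y|x, theta): sum over components i and time points t of
    log p(y_i^t | x_i(t, x0, theta1), theta2), here with Gaussian noise."""
    return sum(gauss_logpdf(y[i][t], x[i][t], sigma)
               for i in range(len(y)) for t in range(len(y[0])))

# hand-made trajectories x_i(t) and noisy observations y_i^t
# (2 components, 3 time points; all values are made up for illustration)
x = [[1.0, 0.5, 0.25], [0.0, 0.4, 0.7]]
y = [[1.1, 0.4, 0.30], [0.1, 0.5, 0.6]]
print(round(log_likelihood(y, x, sigma=0.2), 2))  # → 3.49
```

In a real setting x would come from a numerical ODE solver evaluated at θ1, and the noise parameters θ2 would be estimated alongside the dynamics.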
Sampling schemes: Rejection sampling
Rejection sampling
1. Sample θ_t from the prior p(θ)
2. Accept θ_t with probability p = p(θ_t|y) / (M · p(θ_t)), where the envelope constant M is chosen such that M · p(θ) ≥ p(θ|y) for all θ

(Figure: posterior, prior, and envelope M · prior.)

Uncorrelated samples, but low (1/M) acceptance rate!
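The two steps above can be sketched as follows; the flat prior on [−5, 5] and the standard-normal "posterior" are hypothetical stand-ins chosen so that the envelope constant M is easy to compute:

```python
import math
import random

random.seed(0)

def prior_pdf(theta):
    """Hypothetical flat prior on [-5, 5]."""
    return 0.1 if -5.0 <= theta <= 5.0 else 0.0

def posterior_pdf(theta):
    """Hypothetical posterior: standard normal (stand-in for p(theta|y))."""
    return math.exp(-0.5 * theta * theta) / math.sqrt(2.0 * math.pi)

# Envelope constant M with M * prior(theta) >= posterior(theta) everywhere
M = posterior_pdf(0.0) / prior_pdf(0.0)

samples, trials = [], 0
while len(samples) < 2000:
    trials += 1
    theta = random.uniform(-5.0, 5.0)      # step 1: draw from the prior
    if random.random() < posterior_pdf(theta) / (M * prior_pdf(theta)):
        samples.append(theta)              # step 2: accept with p(theta|y)/(M p(theta))

rate = len(samples) / trials
print(round(M, 2), round(rate, 2))         # acceptance rate is about 1/M
```

Here M ≈ 4, so roughly three of every four prior draws are wasted; with a vague prior in many dimensions M explodes and the method becomes unusable, which motivates MCMC.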
MCMC sampling
Markov Chain Monte Carlo sampling
Can be used when the acceptance rate of rejection or importance sampling is low. Produces correlated samples, but has a higher acceptance rate. Computationally expensive if mixing is slow (the Markov chain takes a long time to converge to its equilibrium distribution).

Sampling scheme:
1. Sample θ_{t+1} from a Markov chain with transition density p(θ_{t+1}|θ_t)
2. Accept θ_{t+1} with probability

p = min( 1, [p(θ_{t+1}|y) p(θ_t|θ_{t+1})] / [p(θ_t|y) p(θ_{t+1}|θ_t)] )
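A minimal random-walk Metropolis sketch (the Gaussian target and the proposal width are assumptions; with a symmetric proposal the transition-density terms of the acceptance ratio cancel):

```python
import math
import random

random.seed(1)

def log_post(theta):
    """Hypothetical unnormalized log-posterior: Gaussian with mean 2, variance 1."""
    return -0.5 * (theta - 2.0) ** 2

theta, chain = 0.0, []
for _ in range(20000):
    prop = theta + random.gauss(0.0, 1.0)   # symmetric proposal: q-terms cancel
    # Metropolis acceptance: p = min(1, p(prop|y) / p(theta|y)), in log space
    if math.log(random.random()) < log_post(prop) - log_post(theta):
        theta = prop
    chain.append(theta)

burned = chain[5000:]                        # discard burn-in
mean = sum(burned) / len(burned)
print(round(mean, 1))                        # close to the posterior mean 2
```

Only the unnormalized posterior is needed, so the intractable evidence p(y) never has to be computed; the price is the correlation between successive samples discussed on the sampling-summary slide.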
Hamiltonian Monte Carlo
1. Write the target density as p(θ) ∝ exp(−V(θ))
2. Extend the sampling space by an auxiliary momentum vector η:
   p(θ, η) ∝ exp( −(1/2) ηᵀη − V(θ) ) = exp(−H(θ, η))
3. Start with a random momentum drawn from a Gaussian distribution
4. Create a trajectory in θ-space according to the Hamiltonian dynamics
   θ̇ = η,  η̇ = −∇V(θ)
5. Accept the new θ with P_A = min(1, exp(−ΔH(θ, η)))

Faster mixing through less correlated samples (larger steps), but harder to tune and implement.
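Steps 1–5 can be sketched with a leapfrog integrator for the trajectory in step 4; the quadratic potential V(θ) = θ²/2 (target N(0, 1)), the step size, and the trajectory length are hypothetical choices:

```python
import math
import random

random.seed(2)

def V(theta):          # potential for a standard normal target (assumption)
    return 0.5 * theta * theta

def grad_V(theta):
    return theta

def hmc_step(theta, eps=0.2, n_leap=10):
    eta = random.gauss(0.0, 1.0)                 # step 3: random momentum
    h0 = 0.5 * eta * eta + V(theta)
    th, e = theta, eta
    # step 4: leapfrog integration of theta' = eta, eta' = -grad V(theta)
    e -= 0.5 * eps * grad_V(th)
    for i in range(n_leap):
        th += eps * e
        if i < n_leap - 1:
            e -= eps * grad_V(th)
    e -= 0.5 * eps * grad_V(th)
    h1 = 0.5 * e * e + V(th)
    # step 5: accept with min(1, exp(-delta H))
    if random.random() < min(1.0, math.exp(h0 - h1)):
        return th
    return theta

theta, chain = 3.0, []
for _ in range(5000):
    theta = hmc_step(theta)
    chain.append(theta)

kept = chain[500:]
mean = sum(kept) / len(kept)
var = sum((t - mean) ** 2 for t in kept) / len(kept)
print(round(mean, 1), round(var, 1))   # near 0 and 1 for the N(0, 1) target
```

Because the leapfrog integrator nearly conserves H, ΔH stays small and the acceptance rate stays high even though each step moves far through θ-space, which is exactly the "larger steps" advantage named above.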
Posterior summaries
Posterior samples can be used for:

Posterior density estimation

Estimation of posterior summaries:
Entropy: posterior information content about θ
KLD(prior ‖ posterior): information content of the data y about θ
Mode: maximum a-posteriori point estimator
Mean: point estimator

Experimental design: choose the experiment that maximizes the expected information content of the posterior distribution
Secretory pathway control
Regulation of secretion at the TGN via protein kinase D (PKD) and the ceramide transfer protein CERT. Cooperation with the Institute of Immunology and Cell Biology: Angelika Hausser, Monilola Olayioye.
Modeling framework
Model for secretory pathway control
(Network diagram: components 1–6, perturbation targets marked ⋆.)

ẋ1 = θ1 x6 − x1                          (PKD)
ẋ2 = θ2 x1 − x2                          (PI(4)KIIIβ)
ẋ3 = θ3 x2 − x3                          (PI(4)P)
ẋ4 = θ4 x3 − x4 − θ8 x1 x4 / (1 + x4)    (CERT)
ẋ5 = θ5 x4 − θ6 x5 − θ9 x5 / (1 + x5)    (ceramide)
ẋ6 = −θ7 x6 + θ9 x5 / (1 + x5)           (DAG)

In the following: estimation of θ = (θ2, θ8)
Perturbation experiments
(Network diagrams of the two perturbation experiments.)

Measurements: relative steady states of two components under different network perturbations:

ȳ_i^P = x̄_i^P / x̄_i^U + ε,   ε ∼ N(0, σ²)

Prior: Gamma distributions
Posterior
(Figure: posterior samples over (θ2, θ8), both axes ranging from 0.5 to 4.5.)
Simple ODE Example
(Network diagram: three components with parameters θ1, θ2.)

System:

ẋ1 = a1 − u θ1 x1 − θ2 x2 − b1 x1
ẋ2 = a2 + u θ1 x1 − h(c1, x2) c1 − b2 x2
ẋ3 = a3 − θ2 x1 + h(c1, x2) − b3 x3

with h(c, x) = c x² / (1 + x²)

Measurements: y_i = x̄_i(θ, u) / x̄_i(θ, û) + ε,  ε ∼ N(0, (0.01 y_i)²),  i ∈ I = {1, 3}

Parameters: a = (1/2)(2, 3, 1)ᵀ, b = (1/2)(1, 4, 2)ᵀ, c = 0.7, α = 0.1, θ⋆ = (1, 0.1)ᵀ

Posterior: p(θ|y) ∝ exp( −(1/2σ²) ‖y⋆ − y(θ)‖² − α‖θ‖_{0.5} )
Sampling tests: MCMC vs. HMC
(Figure: prior, MCMC posterior, and HMC posterior samples.)

Parameters are identifiable. HMC performs better in this example for the same effective sample size, but is also computationally more expensive. We expect HMC to outperform MCMC in higher dimensions.
Sampling summary
Correlation time τ_int,A: the average number of Markov chain steps after which we obtain a new independent point. It depends on the observable A and reduces the effective sample size:

N_eff^A = N / (2 τ_int,A)

Example (N = 1000):

Method                  | t in s | τ_int,θ1 | τ_int,θ2 | efficiency
Hybrid Monte Carlo      | 1479.8 | 23       | 10       | 0.020
Metropolis Monte Carlo  | 315.83 | 103      | 84       | 0.017

The efficiency is the number of independent points per second; t is the computation time (duration of the whole sampling procedure).
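The estimate of τ_int,A and the resulting N_eff can be sketched as follows; the AR(1) chain stands in for correlated MCMC output, and truncating the autocorrelation sum at the first non-positive estimate is one simple convention among several:

```python
import random

random.seed(3)

def autocorr(x, lag):
    """Normalized autocorrelation estimate of a sequence at a given lag."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    return sum((x[i] - mu) * (x[i + lag] - mu) for i in range(n - lag)) / ((n - lag) * var)

def tau_int(x, max_lag=200):
    """Integrated autocorrelation time 1/2 + sum_k rho_k,
    truncated at the first non-positive estimate."""
    tau = 0.5
    for lag in range(1, max_lag):
        rho = autocorr(x, lag)
        if rho <= 0.0:
            break
        tau += rho
    return tau

# AR(1) process as a stand-in for correlated MCMC output (true tau ~ 9.5)
phi, x_t, chain = 0.9, 0.0, []
for _ in range(20000):
    x_t = phi * x_t + random.gauss(0.0, 1.0)
    chain.append(x_t)

tau = tau_int(chain)
n_eff = len(chain) / (2.0 * tau)   # N_eff = N / (2 * tau_int)
print(round(tau, 1), int(n_eff))
```

This is why raw chain length is a poor comparison metric: the table above divides independent points by wall-clock time, which is the quantity that actually matters when choosing between samplers.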
Conclusions
Summary:
Statistical Bayesian approaches for parameter estimation of ODE models of biological networks
Identifiability problems
Investigation of posterior distributions via sampling

Future work:
Parameter estimation for a more realistic model of secretory pathway control and biological data
Further improvement of sampling schemes
Efficient estimation of posterior summaries
Design of experiments for optimal parameter identification
Acknowledgements
Andrei Kramer, Patrick Weber, Thomas Hamm

References:

1. Kramer A, Hasenauer J, Allgöwer F, Radde N. (2010). Computation of the posterior entropy in a Bayesian framework for parameter estimation in biological networks. IEEE Multi-Conference on Systems and Control (MCS 2010), September 8–10, Yokohama, Japan.
2. Kramer A, Radde N. (2010). Towards experimental design using a Bayesian framework for parameter identification in dynamic intracellular network models. Procedia Comput Sci 1(1), 1639–1647.
Thanks for your interest. Any questions?