Nested Effects Models at Work
Prof. Dr. Holger Fröhlich
Algorithmic Bioinformatics, Bonn-Aachen International Center for Information Technology (B-IT)
30/09/2010
Page 5 Holger Fröhlich Algorithmic Bioinformatics
Principle Idea of Nested Effects Models
Distinguish between:
- Perturbed genes (S-genes, hidden variables), connected by the signaling graph Φ
- Observed effects (E-genes), attached to the S-genes via θ
Measure the downstream effects of each knock-down. Network reconstruction is based on the effects observed under different perturbations.
[Figure: perturbed genes S1 to S4 and their observed effects (E-genes)] (Markowetz et al., 2005)
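To make the nesting concrete, here is a small R sketch on toy inputs (the gene names and matrices are illustrative, not from the slides). It assumes Φ is a transitively closed 0/1 matrix with self-loops, where Φ[k, s] = 1 means S-gene k acts on S-gene s, and that Θ attaches each E-gene to exactly one S-gene. The E-genes expected to react to a knock-down of k are then the non-zero entries of row k of Φ Θ.

```r
# Toy signaling graph S1 -> S2 -> S3, transitively closed, with self-loops
S <- c("S1", "S2", "S3")
Phi <- matrix(c(1, 1, 1,
                0, 1, 1,
                0, 0, 1), nrow = 3, byrow = TRUE, dimnames = list(S, S))

# Theta[s, t] = 1 if E-gene t is attached to S-gene s (one attachment per E-gene)
Theta <- matrix(0, nrow = 3, ncol = 4, dimnames = list(S, paste0("E", 1:4)))
Theta["S1", "E1"] <- 1
Theta["S2", "E2"] <- 1
Theta["S3", c("E3", "E4")] <- 1

# Expected effects: row k lists the E-genes that should react when S-gene k is knocked down
expected <- (Phi %*% Theta) > 0
print(expected)
```

The rows come out nested: the effects of S3 are a subset of those of S2, which are a subset of those of S1. This subset structure is what lets the model order the hidden S-genes from observed effects alone.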
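Likelihood of the Signaling Graph (Φ)
Two different approaches:
- Bayesian: integrate over effects linkage graphs Θ, assuming $P(\Theta \mid \Phi) = P(\Theta)$:
$$P(D \mid \Phi) = \sum_{\Theta} P(D \mid \Phi, \Theta)\, P(\Theta)$$
- MAP/ML: take the MAP/ML estimator for Θ:
$$\hat{\Theta} = \arg\max_{\Theta} P(D \mid \Phi, \Theta)\, P(\Theta), \qquad P(D \mid \Phi) \approx P(D \mid \Phi, \hat{\Theta})$$
(Markowetz et al., 2005; Fröhlich et al., 2007, 2008; Tresch & Markowetz, 2008)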
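Calculation of Effect Likelihoods
Factorization of the likelihood under the i.i.d. assumption:
$$P(D \mid \Phi) = \sum_{\Theta} P(D \mid \Phi, \Theta)\, P(\Theta) = \prod_{t \in \mathcal{E}} \sum_{s \in \mathcal{S}} P(\theta_{ts} = 1) \prod_{k \in \mathcal{S}} P(D_{tk} \mid \Phi, \theta_{ts} = 1), \qquad P(D_{tk} \mid \Phi, \theta_{ts} = 1) = P(D_{tk} \mid m_{tk})$$
where $m_{tk} = \Phi_{ks} \in \{0, 1\}$ is the effect the model predicts for E-gene $t$ (attached to S-gene $s$) when S-gene $k$ is perturbed, and
$$P(D_{tk} \mid m_{tk}) = \begin{cases} \alpha^{D_{tk}} (1-\alpha)^{1-D_{tk}} & \text{if } m_{tk} = 0 \\ \beta^{1-D_{tk}} (1-\beta)^{D_{tk}} & \text{if } m_{tk} = 1 \end{cases}$$
with false-positive rate α and false-negative rate β.
(Markowetz et al., 2005)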
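A minimal R sketch of this factorization for binary data, assuming D has E-genes in rows and perturbed S-genes in columns, Φ is transitively closed with self-loops, and the position prior is uniform, P(θ_ts = 1) = 1/|S|. It computes both scores from the likelihood slide: the Bayesian score (summing over E-gene positions) and the ML plug-in score (best position per E-gene). The function names and toy data are hypothetical; this is not the nem package's implementation.

```r
# L[t, s] = sum_k log P(D_tk | m_tk) with m_tk = Phi[k, s]
position_loglik <- function(D, Phi, alpha, beta) {
  log_p1 <- Phi * log(1 - beta) + (1 - Phi) * log(alpha)   # log P(D = 1 | m)
  log_p0 <- Phi * log(beta) + (1 - Phi) * log(1 - alpha)   # log P(D = 0 | m)
  D %*% log_p1 + (1 - D) %*% log_p0
}

log_sum_exp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

# Bayesian score: average over E-gene positions with uniform prior 1 / |S|
nem_marginal_loglik <- function(D, Phi, alpha, beta) {
  L <- position_loglik(D, Phi, alpha, beta)
  sum(apply(L, 1, log_sum_exp)) - nrow(D) * log(ncol(Phi))
}

# ML plug-in score: attach every E-gene to its single best position
nem_ml_loglik <- function(D, Phi, alpha, beta) {
  L <- position_loglik(D, Phi, alpha, beta)
  sum(apply(L, 1, max))
}

# Toy data: E1 reacts only to the S1 knock-down, E2 reacts to both knock-downs
D <- rbind(E1 = c(1, 0), E2 = c(1, 1))
chain    <- matrix(c(1, 1, 0, 1), 2, byrow = TRUE)  # S1 -> S2
reversed <- t(chain)                               # S2 -> S1
nem_marginal_loglik(D, chain, alpha = 0.05, beta = 0.1)
nem_marginal_loglik(D, reversed, alpha = 0.05, beta = 0.1)
```

With these toy numbers the chain S1 -> S2 scores about -1.60 and the reversed graph about -3.90: only the chain explains why E1 reacts to the S1 knock-down but not to the S2 knock-down.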
Modeling Continuous Data
Compute p-values from a statistical test comparing interventions to non-interventions.
Under the null hypothesis (i.e. no effect expected), p-values are uniformly distributed. Under the alternative hypothesis (i.e. an effect expected), the density is high for small p-values and decreases strongly as the p-value increases [Pounds et al., 2003]. -> Fit a three-component beta-uniform mixture via the EM algorithm:
$$f(D_{tk}) = \pi_{1k} + \pi_{2k}\, \mathrm{Beta}(D_{tk}; \alpha_t, 1) + \pi_{3k}\, \mathrm{Beta}(D_{tk}; 1, \beta_t)$$
(Fröhlich et al., 2008)
$$\frac{f_{tk}(D_{tk}) - f_{tk}(1)}{1 - f_{tk}(1)}$$
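The "fit via EM" step can be sketched in R for a single vector of p-values. This is a minimal illustration under stated assumptions, not the fitting code of `getDensityMatrix`: it fits one mixture (dropping the per-gene and per-perturbation indices) and uses arbitrary start values. Both beta components have closed-form weighted ML updates, since Beta(p; a, 1) = a p^(a-1) and Beta(p; 1, b) = b (1-p)^(b-1), so each EM iteration is exact and the log-likelihood cannot decrease.

```r
# f(p) = prop[1] * 1 + prop[2] * Beta(p; a, 1) + prop[3] * Beta(p; 1, b)
bum3_density <- function(p, fit) {
  fit$prop[1] + fit$prop[2] * dbeta(p, fit$a, 1) + fit$prop[3] * dbeta(p, 1, fit$b)
}

fit_bum3 <- function(p, prop = c(0.8, 0.1, 0.1), a = 0.3, b = 10, iters = 200) {
  p <- pmin(pmax(p, 1e-12), 1 - 1e-12)   # keep log(p) and log(1 - p) finite
  loglik <- numeric(iters)
  for (i in seq_len(iters)) {
    # E-step: contribution of each component to each p-value
    comp <- cbind(rep(prop[1], length(p)),
                  prop[2] * dbeta(p, a, 1),
                  prop[3] * dbeta(p, 1, b))
    total <- rowSums(comp)
    loglik[i] <- sum(log(total))          # log-likelihood before this update
    w <- comp / total                     # responsibilities, rows sum to 1
    # M-step: mixture weights, then weighted ML for Beta(a, 1) and Beta(1, b)
    prop <- colMeans(w)
    a <- -sum(w[, 2]) / sum(w[, 2] * log(p))
    b <- -sum(w[, 3]) / sum(w[, 3] * log1p(-p))
  }
  list(prop = prop, a = a, b = b, loglik = loglik)
}

set.seed(1)
p <- c(runif(800), rbeta(200, 0.2, 1))   # simulated: 80% null, 20% with an effect
fit <- fit_bum3(p)
log(bum3_density(c(0.001, 0.5), fit))    # log-density: high for small p, low for large p
```

The log of the fitted density is the kind of quantity the nem package works with ("p-value log-densities"); how `getDensityMatrix` parametrizes and normalizes it is not reproduced here.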
Bioconductor Package "nem"
library(nem)
load("raw_pvaluesBoutros2002.rda")   # assumed to provide the p-value matrix `pvalues`
D <- getDensityMatrix(pvalues)
How to Infer the Network Structure?
- Choose a candidate graph.
- Calculate its score with the likelihood model, e.g. using Bayesian statistics (averaging over E-gene positions).
- Propose a different topology.
Scoring all topologies runs into a combinatorial explosion:
- n = 4: 355 possible networks
- n = 10: ~10^27 possible networks
(Markowetz et al., 2005)
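The count for n = 4 can be checked by brute force. The R sketch below enumerates every directed graph on n labeled S-genes (self-loops fixed, cycles allowed) and keeps those that are transitively closed. Note that the ~10^27 figure for n = 10 matches 2^(n(n-1)) = 2^90, the number of all directed graphs without self-loops; restricted to transitively closed graphs the count for n = 10 is about 9 × 10^12, which is still far beyond exhaustive scoring. The function name is hypothetical.

```r
# Count transitively closed directed graphs on n labeled nodes by brute force.
# Self-loops are fixed (each S-gene affects its own E-genes); cycles are allowed.
count_transitive_graphs <- function(n) {
  offdiag <- which(row(diag(n)) != col(diag(n)))   # indices of the n * (n - 1) possible edges
  count <- 0
  for (code in 0:(2^length(offdiag) - 1)) {
    A <- diag(n)
    A[offdiag] <- as.integer(intToBits(code))[seq_along(offdiag)]
    # transitively closed <=> every two-step path is also a direct edge
    if (all((A %*% A > 0) == (A > 0))) count <- count + 1
  }
  count
}

count_transitive_graphs(4)   # 355
2^(10 * 9)                   # about 1.2e27 directed graphs without self-loops on 10 nodes
```

The brute force is only practical up to n = 4 or 5 (2^20 graphs), which is exactly why the heuristics on the next slide are needed.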
Heuristics for Large Networks (> 4 S-Genes)
- Sampling based (MCMC, simulated annealing): SA in Fröhlich et al., BMC Bioinformatics, 2007; time consuming, and a neighborhood relation on transitively closed graphs is difficult to define
- Greedy hill climbing: Fröhlich et al., Bioinformatics, 2008
- Module networks: Fröhlich et al., BMC Bioinformatics, 2007; Fröhlich et al., Bioinformatics, 2008
- Triplets inference: Markowetz et al., Bioinformatics, 2007
- Alternating MAP optimization over Φ and θ: Tresch and Markowetz, Stat. Appl. Mol. Biol., 2008
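Greedy hill climbing in the space of transitively closed graphs can be sketched as follows: start from the empty graph (self-loops only), try adding each missing edge, take the transitive closure, and keep the single move that improves the score most, until no move helps. The score here is a hypothetical toy (agreement with a known target graph) standing in for a NEM likelihood; the actual search in the nem package may differ in its details.

```r
# Transitive closure with self-loops, by repeated squaring of the reachability matrix
transitive_closure <- function(A) {
  R <- (A + diag(nrow(A))) > 0
  repeat {
    R2 <- (R %*% R) > 0
    if (all(R2 == R)) return(R * 1)
    R <- R2
  }
}

# Repeatedly add the single edge (followed by closure) that improves the score most
greedy_hill_climb <- function(n, score) {
  Phi <- diag(n)
  trace <- score(Phi)
  repeat {
    best_score <- tail(trace, 1)
    best_phi <- NULL
    for (i in seq_len(n)) {
      for (j in seq_len(n)) {
        if (Phi[i, j] == 1) next              # edge (or self-loop) already present
        cand <- Phi
        cand[i, j] <- 1
        cand <- transitive_closure(cand)
        s <- score(cand)
        if (s > best_score) {
          best_score <- s
          best_phi <- cand
        }
      }
    }
    if (is.null(best_phi)) break              # no single edge improves: local optimum
    Phi <- best_phi
    trace <- c(trace, best_score)
  }
  list(Phi = Phi, trace = trace)
}

# Hypothetical toy score: agreement with a target graph S1 -> S2 -> S3
target <- transitive_closure(matrix(c(0, 1, 0,
                                      0, 0, 1,
                                      0, 0, 0), 3, byrow = TRUE))
toy_score <- function(Phi) -sum(Phi != target)
res <- greedy_hill_climb(3, toy_score)
res$trace   # -3 -2 0
```

Taking the closure after every move keeps the search inside the valid model space, at the price of moves that can add several edges at once.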
Large Scale Networks: Module Networks
Complete enumeration of all network hypotheses is feasible only for small networks (< 5 S-genes).
Divide and conquer:
1. Find the highest scoring subnetworks for modules of S-genes.
2. Estimate the connections between modules.
[Figure: module network over S-genes S1 to S10 with attached E-genes, annotated with the network log-likelihood] (Fröhlich et al., 2007, 2008)
Network Inference with the nem-Package
library(nem)
control <- set.default.parameters(unique(colnames(D)), type = "CONTmLLBayes")
mynem <- nem(D, inference = "ModuleNetwork", control = control, verbose = FALSE)
plot.nem(mynem, SCC = FALSE, D = D, draw.lines = TRUE)
Automated Selection of Relevant E-Genes (Feature Selection)
Motivation: irrelevant E-genes can degrade the accuracy of network estimation.
1. Select the E-genes that make a positive contribution to the model's log-likelihood.
2. Re-estimate the network with the new set of E-genes.
3. Iterate the process until convergence.
(Fröhlich et al., 2008)
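The three steps above form a simple fixed-point loop, sketched here in R. The helpers `infer` and `contribution` are hypothetical stand-ins for network estimation and the per-E-gene log-likelihood contribution; the toy versions in the example are deliberately trivial and are not NEM scores. The sketch only shows the select, re-estimate, iterate structure.

```r
# Iterative E-gene selection (sketch). Both helpers are hypothetical:
#   infer(D)             -> network estimate from the E-genes (rows) of D
#   contribution(D, Phi) -> one log-likelihood contribution per E-gene under Phi
select_egenes <- function(D, infer, contribution, max_iter = 20) {
  keep <- rep(TRUE, nrow(D))
  for (iter in seq_len(max_iter)) {
    Phi <- infer(D[keep, , drop = FALSE])    # (re-)estimate the network on the selected E-genes
    new_keep <- contribution(D, Phi) > 0     # keep E-genes with a positive contribution
    if (identical(new_keep, keep)) break     # converged: the selection no longer changes
    keep <- new_keep
  }
  list(Phi = Phi, keep = keep, iterations = iter)
}

# Toy stand-ins (not NEM scores): the "network" is a sign profile over perturbations,
# and an E-gene contributes positively if its profile agrees with it
toy_infer <- function(D) sign(colMeans(D))
toy_contribution <- function(D, Phi) as.vector(D %*% Phi)

D <- rbind(c(1, 1), c(2, 1), c(-3, -1), c(1, 2))
res <- select_egenes(D, toy_infer, toy_contribution)
res$keep   # the third E-gene is dropped
```

A real implementation would also need to guard against selecting no E-genes at all, in which case `infer` has nothing to work with.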
Network Inference with the nem-Package
data("BoutrosRNAi2002")   # example data shipped with nem, provides BoutrosRNAiDiscrete
D2 <- BoutrosRNAiDiscrete[, 9:16]
control <- set.default.parameters(unique(colnames(D2)), selEGenes = TRUE)
mynem2 <- nem(D2, inference = "triples", control = control, verbose = FALSE)
plot.nem(mynem2, D = D2, draw.lines = TRUE)
Incorporation of Prior Knowledge
$$P(\Phi) = \prod_{i,j} P(\Phi_{ij}), \qquad P(\Phi_{ij} \mid \nu) = \frac{1}{2\nu} \exp\left(-\frac{|\Phi_{ij} - \Phi'_{ij}|}{\nu}\right)$$
$$P(\Phi_{ij}) = \int_0^\infty P(\Phi_{ij} \mid \nu)\, P(\nu)\, d\nu = \frac{1}{\left(1 + 2\,|\Phi_{ij} - \Phi'_{ij}|\right)^2}, \qquad \nu \sim \mathrm{InvGamma}(1, 0.5)$$
ν (scale of the prior) ranges from complete trust in the prior (small ν) to complete trust in the data (large ν).
Φ = signaling graph, Φ' = prior belief, ν = hyperparameter of the Laplace distribution
(Fröhlich et al., 2008)
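The closed form follows from one substitution: with x = |Φ_ij − Φ'_ij| and an inverse-gamma density with shape 1 and scale 0.5, i.e. 0.5 ν^(-2) e^(-0.5/ν), the integral of (1/(2ν)) e^(-x/ν) times that density over ν equals 0.25 / (x + 0.5)^2 = 1 / (1 + 2x)^2. This shape/scale parametrization is an assumption, chosen because it reproduces the formula above. The R sketch checks the result numerically and sums the edge log priors over a whole graph; all function names are hypothetical.

```r
# Inverse-gamma log-density with shape and scale (base R has no dinvgamma)
log_inv_gamma <- function(nu, shape = 1, scale = 0.5) {
  shape * log(scale) - lgamma(shape) - (shape + 1) * log(nu) - scale / nu
}

# Laplace edge prior P(phi | nu) with nu integrated out numerically (in log space for stability)
edge_prior_numeric <- function(phi, phi_prior) {
  integrand <- function(nu) exp(-abs(phi - phi_prior) / nu - log(2 * nu) + log_inv_gamma(nu))
  integrate(integrand, lower = 0, upper = Inf)$value
}

# Closed form from the slide
edge_prior_closed <- function(phi, phi_prior) 1 / (1 + 2 * abs(phi - phi_prior))^2

# Log prior of a whole graph: sum over all entries (i, j) of Phi and the prior belief Phi'
log_graph_prior <- function(Phi, Phi_prior) sum(log(edge_prior_closed(Phi, Phi_prior)))

c(numeric = edge_prior_numeric(0, 1), closed = edge_prior_closed(0, 1))   # both about 1/9
```

Every disagreement between Φ and the prior belief Φ' costs a factor of 1/9 in this prior, so the prior acts as a soft penalty rather than a hard constraint.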
Using Prior Knowledge with the nem-Package
# Pm: prior belief Φ' over the 4 S-genes
control <- set.default.parameters(unique(colnames(D)), selEGenes = TRUE, type = "CONTmLLMAP", Pm = diag(4))
mynem3 <- nem(D, control = control, verbose = FALSE)
plot.nem(mynem3, SCC = FALSE, D = D, draw.lines = TRUE)
Statistical Stability and Significance
- How stable is the inferred network? Do small changes in the set of E-genes lead to different network hypotheses? Use the non-parametric bootstrap.
- Is the inferred network better than random? Randomly permute the node labels and check whether the random network has a higher likelihood.
[Figure: sample n E-genes with replacement, re-estimate the network over S-genes P, Q, R, S, and repeat; edges are annotated with their bootstrap frequencies (e.g. 0.8, 0.9, 0.7)]
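The bootstrap loop in the figure can be sketched in R: resample the E-genes (rows) with replacement, re-run the inference, and average the resulting adjacency matrices to get per-edge frequencies. The `infer` argument is a hypothetical stand-in for the actual network inference (for real data, `nem.bootstrap` does this); the toy inference in the example is illustrative only.

```r
# Non-parametric bootstrap over E-genes (sketch)
# D: E-genes in rows; infer(D) returns a 0/1 S x S adjacency matrix
bootstrap_edge_freq <- function(D, infer, nboot = 100) {
  counts <- 0
  for (b in seq_len(nboot)) {
    rows <- sample(nrow(D), replace = TRUE)      # sample n E-genes with replacement
    counts <- counts + infer(D[rows, , drop = FALSE])
  }
  counts / nboot                                  # fraction of bootstrap networks with each edge
}

# Hypothetical toy inference: edge S1 -> S2 if most E-genes react to both knock-downs
toy_infer <- function(D) {
  A <- diag(2)
  A[1, 2] <- as.numeric(mean(D[, 1] == 1 & D[, 2] == 1) > 0.5)
  A
}
D <- rbind(c(1, 1), c(1, 1), c(1, 0), c(1, 1), c(0, 1))
set.seed(42)
freq <- bootstrap_edge_freq(D, toy_infer, nboot = 200)
freq
```

Edges supported by only a slim majority of E-genes, like S1 -> S2 here, get intermediate frequencies, which is exactly the instability the bootstrap is meant to expose.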
Bootstrapping and Significance Calculation with the nem-Package
control <- set.default.parameters(unique(colnames(D)), type = "CONTmLLBayes", Pm = diag(4))
mynem.boot <- nem.bootstrap(D, nboot = 100, control = control)
plot.nem(mynem.boot, SCC = FALSE, plot.probs = TRUE)
nem.calcSignificance(D, N = 1000, mynem.boot)
Result: p = 0.037 (label permutation test)
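The label permutation test can be sketched in R: keep the inferred topology, shuffle the S-gene labels (rows and columns together), rescore, and report how often a random labeling does at least as well. The `score` argument is a hypothetical stand-in for the network log-likelihood, and the add-one estimate (1 + hits) / (N + 1) is an assumption of this sketch; `nem.calcSignificance` may define its statistic differently.

```r
# Label permutation test (sketch)
perm_pvalue <- function(Phi, score, N = 1000) {
  obs <- score(Phi)
  perm_scores <- replicate(N, {
    p <- sample(nrow(Phi))
    score(Phi[p, p])                        # same topology, S-gene labels shuffled
  })
  (1 + sum(perm_scores >= obs)) / (N + 1)   # add-one correction avoids p = 0
}

# Toy example: network S1 -> S2 -> S3 and a hypothetical score rewarding agreement with it
target <- matrix(c(1, 1, 1,
                   0, 1, 1,
                   0, 0, 1), 3, byrow = TRUE)
toy_score <- function(Phi) -sum(Phi != target)
set.seed(1)
perm_pvalue(target, toy_score, N = 200)
```

With only three S-genes there are just six labelings, one of which is the original, so the p-value cannot drop much below 1/6 here; larger networks allow far smaller p-values such as the 0.037 reported above.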
Summary: nem-package
- Inference of features of signaling pathways from high-dimensional, targeted perturbation effects
- Different likelihood models: discretized data, p-value log-densities
- Algorithms for inference of large networks: module networks, triplets, greedy hill climbing, ...
- Possibility to integrate prior knowledge
- Automatic selection of relevant E-genes
- Various plotting and analysis methods: non-parametric bootstrap, label permutation p-values
Acknowledgements
- German Cancer Research Center (DKFZ): Holger Sültmann, Marc Fellmann (alumnus), Sabrina Belauger
- University of Göttingen: Tim Beißbarth
- University of Regensburg: Rainer Spang
- University of Munich (LMU): Achim Tresch
- Cancer Research UK: Florian Markowetz
- Bonn-Aachen International Center for IT (B-IT), Algorithmic Bioinformatics: Paurush Praveen, Khalid Abnaof, Yupeng Cun, Jan Plitschka, Afshin Sadeghi, Amit Kawalia, Sohil Anand