SLIDE 1
Nested Effects Models at Work

Prof. Dr. Holger Fröhlich

Algorithmic Bioinformatics, Bonn-Aachen International Center for Information Technology (B-IT)

30/09/2010

SLIDE 2

Page 5 Holger Fröhlich Algorithmic Bioinformatics

Basic Idea of Nested Effects Models

Distinguish between:
  • Perturbed genes (S-genes; hidden variables)
  • Observed effects (E-genes)

Measure the downstream effects of each knock-down. Network reconstruction is based on the effects observed under different perturbations.

[Figure: signaling graph Φ among the perturbed genes S1, S2, S3, S4; the effect linkage θ attaches each observed effect E to a perturbed gene (Markowetz et al., 2005)]

SLIDE 3

Nested Effects Models (NEMs) are transitively closed causal networks explaining the nested structure of downstream effects.
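To make the nested structure concrete, here is a minimal base-R sketch (an illustration, not code from the nem package), assuming a hypothetical cascade S1 → S2 → S3 and four E-genes. It takes the transitive closure of Φ (with self-loops, so every S-gene reaches itself) and derives the expected effect pattern: perturbing S-gene t should affect E-gene k whenever t reaches the S-gene that k is attached to.

```r
# Reflexive-transitive closure of a 0/1 adjacency matrix (Warshall's algorithm)
transitive_closure <- function(A) {
  diag(A) <- 1
  for (k in seq_len(nrow(A))) {
    # add edge i -> j whenever i -> k and k -> j are present
    A <- pmax(A, outer(A[, k], A[k, ]))
  }
  A
}

S <- c("S1", "S2", "S3")
Phi <- matrix(0, 3, 3, dimnames = list(S, S))
Phi["S1", "S2"] <- 1                    # hypothetical cascade S1 -> S2 -> S3
Phi["S2", "S3"] <- 1
Phi <- transitive_closure(Phi)          # adds S1 -> S3 and the self-loops

# Effect linkage theta: each E-gene is attached to exactly one S-gene
attach_to <- c(E1 = "S1", E2 = "S2", E3 = "S3", E4 = "S3")
Theta <- sapply(attach_to, function(s) as.numeric(S == s))   # S-genes x E-genes
rownames(Theta) <- S

# Expected effects: M[t, k] = 1 if perturbing t should change E-gene k
M <- (Phi %*% Theta > 0) * 1
M
```

The rows of `M` are nested: the E-genes affected by S3 are a subset of those affected by S2, which are a subset of those affected by S1.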

SLIDE 4

Likelihood of the Signaling Graph (Φ)

Two different approaches:
  • Bayesian: integrate over effect linkage graphs Θ, assuming P(Θ | Φ) = P(Θ):

$$P(D \mid \Phi) = \sum_{\Theta} P(D \mid \Phi, \Theta)\, P(\Theta)$$

  • Take the MAP/ML estimator for Θ:

$$\hat{\Theta} = \arg\max_{\Theta} P(D \mid \Phi, \Theta)\, P(\Theta), \qquad P(\Phi \mid D, \hat{\Theta}) \propto P(D \mid \Phi, \hat{\Theta})\, P(\Phi)$$

Markowetz et al., 2005; Fröhlich et al., 2007, 2008; Tresch & Markowetz, 2008
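The difference between the two approaches can be sketched in base R (our illustration, not nem code). The input `LL` is a hypothetical matrix of log-likelihoods with one row per E-gene and one column per possible S-gene position. The Bayesian score averages over positions, assuming a uniform prior P(θ_sk = 1) = 1/|S|; the MAP score keeps only the best position per E-gene and drops the constant log prior of the uniform Θ̂.

```r
# Numerically stable log(sum(exp(x)))
logsumexp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

# LL[k, s] = log P(D_k | Phi, E-gene k attached to S-gene s)   (hypothetical input)
score_bayes <- function(LL) {
  # log P(D | Phi) = sum_k log( (1/|S|) * sum_s exp(LL[k, s]) )
  sum(apply(LL, 1, logsumexp) - log(ncol(LL)))
}

score_map <- function(LL) {
  # log P(D | Phi, Theta_hat): each E-gene takes its best-scoring position
  sum(apply(LL, 1, max))
}

LL <- log(matrix(c(0.50, 0.25,
                   0.10, 0.20), nrow = 2, byrow = TRUE))
score_bayes(LL)   # log(0.375 * 0.15)
score_map(LL)     # log(0.5 * 0.2)
```

For the same `LL`, the Bayesian score can never exceed the MAP score defined this way, because an average over positions cannot be larger than its maximum.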

SLIDE 5

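Calculation of Effect Likelihoods
  • Factorization of the likelihood under the i.i.d. assumption (the proportionality holds for a uniform prior P(θ_sk = 1) = 1/|S|):

$$P(D \mid \Phi) = \sum_{\Theta} P(D \mid \Phi, \Theta)\, P(\Theta) = \prod_{k \in \mathcal{E}} \sum_{s \in S} P(\theta_{sk} = 1) \prod_{t \in S} P(D_{tk} \mid \Phi, \theta_{sk} = 1) \;\propto\; \prod_{k \in \mathcal{E}} \sum_{s \in S} \prod_{t \in S} P(D_{tk} \mid m_{tk} = \Phi_{ts})$$

Here t ∈ S indexes the perturbed S-genes, k ∈ ℰ the E-genes, and m_tk = Φ_ts is the effect expected on E-gene k (attached to S-gene s) when t is perturbed.

  • 1. Model for binary data D with fixed error probabilities α (false positive) and β (false negative):

$$P(D_{tk} \mid m_{tk}) = \begin{cases} \alpha^{D_{tk}} (1 - \alpha)^{1 - D_{tk}} & \text{if } m_{tk} = 0 \\ (1 - \beta)^{D_{tk}}\, \beta^{1 - D_{tk}} & \text{if } m_{tk} = 1 \end{cases}$$

Markowetz et al., 2005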
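The factorization can be evaluated directly. The base-R sketch below is an illustration under stated assumptions, not the nem implementation: `D` is a hypothetical 0/1 matrix with perturbed S-genes t in rows and E-genes k in columns, `Phi` is transitively closed with Φ_ss = 1, and the uniform prior P(θ_sk = 1) = 1/|S| is included explicitly.

```r
binary_nem_loglik <- function(D, Phi, alpha, beta) {
  nS <- nrow(Phi)
  # log P(D_tk | m_tk = 1) and log P(D_tk | m_tk = 0) for every entry
  L1 <- D * log(1 - beta) + (1 - D) * log(beta)
  L0 <- D * log(alpha) + (1 - D) * log(1 - alpha)
  # LL[s, k] = sum_t log P(D_tk | m_tk = Phi[t, s]): E-gene k attached to S-gene s
  LL <- t(Phi) %*% L1 + t(1 - Phi) %*% L0
  # sum over positions s (uniform prior 1/|S|), product over E-genes k, in log space
  per_egene <- apply(LL, 2, function(x) {
    m <- max(x)
    m + log(sum(exp(x - m))) - log(nS)
  })
  sum(per_egene)
}

# Toy data (hypothetical): noise-free effects of a cascade S1 -> S2 -> S3,
# with E-gene k attached to S-gene k
D <- rbind(c(1, 1, 1),
           c(0, 1, 1),
           c(0, 0, 1))
Phi_cascade <- upper.tri(diag(3), diag = TRUE) * 1
Phi_empty <- diag(3)
binary_nem_loglik(D, Phi_cascade, alpha = 0.1, beta = 0.2)
binary_nem_loglik(D, Phi_empty, alpha = 0.1, beta = 0.2)
```

The cascade explains the nested data better than the graph without edges, so it receives the higher log-likelihood.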

SLIDE 6

Modeling Continuous Data
  • 2. Model for continuous data: D are p-values for significant change when comparing interventions to non-interventions.
  • Under the null hypothesis (i.e. expecting no effect), p-values are uniformly distributed.
  • Under the alternative hypothesis (i.e. expecting an effect), there is a high density for small p-values and a strong decrease for increasing p-values [Pounds et al., 2003].
  • → Fit the following mixture via the EM algorithm (α and β are Beta shape parameters here, not the error rates of the binary model):

$$f(D_{tk}) = \pi_1 + \pi_2\, \mathrm{Beta}(D_{tk}; \alpha, 1) + \pi_3\, \mathrm{Beta}(D_{tk}; 1, \beta)$$

$$P(D_{tk} \mid m_{tk}) = \begin{cases} \dfrac{f(D_{tk}) - f(1)}{1 - f(1)} & \text{if } m_{tk} = 1 \\ 1 & \text{if } m_{tk} = 0 \end{cases}$$

Fröhlich et al., 2008

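A base-R sketch of this p-value model with fixed, hypothetical mixture parameters (in the method they are estimated by EM, which is not shown here). It evaluates the mixture density f, the effect density (f(p) − f(1))/(1 − f(1)) used for m_tk = 1, and the resulting log-likelihood ratio against the uniform null density.

```r
# Hypothetical mixture parameters: weights of Uniform, Beta(a, 1), Beta(1, b)
w <- c(0.6, 0.3, 0.1)   # must sum to 1
a <- 0.3                # shape < 1: density rises sharply towards p = 0
b <- 10                 # shape > 1: density decreases with p

# Mixture density of p-values
f <- function(p) w[1] + w[2] * dbeta(p, a, 1) + w[3] * dbeta(p, 1, b)

# Density under "effect" (m = 1); the null density (m = 0) is uniform, i.e. 1
f_effect <- function(p) (f(p) - f(1)) / (1 - f(1))

# Log-likelihood ratio log P(D | m = 1) - log P(D | m = 0)
log_ratio <- function(p) log(f_effect(p))

log_ratio(c(0.001, 0.05, 0.9))
```

With these parameters f(1) = 0.6 + 0.3 · 0.3 = 0.69, so small p-values get a positive log ratio (evidence for an effect), while p-values close to 1 get a negative one.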
SLIDE 7

Bioconductor Package "nem"

    library(nem)
    load("raw_pvaluesBoutros2002.rda")   # provides the matrix of raw p-values `pvalues`
    D = getDensityMatrix(pvalues)        # p-values -> log-densities

SLIDE 8

How to Infer the Network Structure?
  • Choose a candidate graph
  • Calculate its score with the likelihood model, e.g. using Bayesian statistics (average over E-gene positions)
  • Propose a different topology

[Figure: candidate network over S1, S2, S3, S4 with attached E-genes, scored by the likelihood model (Markowetz et al., 2005)]

  • Complete enumeration of all topologies leads to a combinatorial explosion:
    n = 4: 355 possible networks
    n = 10: ~10^27 possible networks
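The figure for n = 4 can be checked by brute force. The base-R sketch below (our illustration) enumerates every 0/1 pattern of the n(n − 1) off-diagonal entries, keeps the self-loops, and counts the relations that are transitive; these include graphs with cycles, which NEMs allow. The reading that the n = 10 figure corresponds to all 2^(n(n − 1)) directed graphs without self-loops is our assumption.

```r
count_transitively_closed <- function(n) {
  I <- diag(n)
  off <- which(row(I) != col(I))            # the n(n-1) off-diagonal positions
  count <- 0
  # feasible only for very small n (intToBits provides 32 bits)
  for (code in 0:(2^length(off) - 1)) {
    A <- I                                  # self-loops are always present
    A[off] <- as.integer(intToBits(code))[seq_along(off)]
    # transitive iff every two-step path i -> j -> l has a direct edge i -> l
    if (all((A %*% A > 0) <= (A > 0))) count <- count + 1
  }
  count
}

count_transitively_closed(4)   # 355
2^(10 * 9)                     # about 1.24e27 directed graphs on 10 nodes
```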

SLIDE 9

Heuristics for Large Networks (> 4 S-Genes)

  • Sampling-based (MCMC, simulated annealing)
    - SA: Fröhlich et al., BMC Bioinformatics, 2007
    - time consuming
    - the neighborhood relation in transitively closed graphs is difficult to define
  • Greedy hill climbing: Fröhlich et al., Bioinformatics, 2008
  • Module networks: Fröhlich et al., BMC Bioinformatics, 2007; Fröhlich et al., Bioinformatics, 2008
  • Triplets inference: Markowetz et al., Bioinformatics, 2007
  • Alternating MAP optimization over Φ and θ: Tresch and Markowetz, Stat. Appl. Genet. Mol. Biol., 2008

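A generic greedy hill-climbing sketch in base R (an illustration, not the nem package's algorithm). It sidesteps the neighborhood problem by toggling a single edge and then re-closing the graph transitively; `score` is a hypothetical callback, e.g. a network log-likelihood. The usage example scores graphs by their agreement with a known cascade.

```r
closure <- function(A) {              # reflexive-transitive closure (Warshall)
  diag(A) <- 1
  for (k in seq_len(nrow(A))) A <- pmax(A, outer(A[, k], A[k, ]))
  A
}

greedy_hillclimb <- function(n, score, max_iter = 100) {
  G <- closure(matrix(0, n, n))       # start from the graph without edges
  best <- score(G)
  for (iter in seq_len(max_iter)) {
    next_G <- NULL
    for (i in seq_len(n)) for (j in seq_len(n)) {
      if (i == j) next
      H <- G
      H[i, j] <- 1 - H[i, j]          # insert or delete edge i -> j
      H <- closure(H)                 # project back onto transitively closed graphs
      s <- score(H)
      if (s > best) { best <- s; next_G <- H }
    }
    if (is.null(next_G)) break        # no move improves the score: local optimum
    G <- next_G
  }
  list(graph = G, score = best)
}

# Usage (hypothetical score): agreement with the cascade 1 -> 2 -> 3
target <- closure(rbind(c(0, 1, 0),
                        c(0, 0, 1),
                        c(0, 0, 0)))
res <- greedy_hillclimb(3, function(G) -sum(G != target))
res$graph
```

Like any greedy search, this can stop in a local optimum for realistic scores; it only evaluates O(n²) neighbors per step instead of all topologies.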
SLIDE 10

Large Scale Networks: Module Networks

  • Problem: complete enumeration of all network hypotheses is only possible for small networks (< 5 S-genes)
  • Solution: divide and conquer
    1. Find the highest scoring subnetworks for modules of S-genes
    2. Estimate the connections between modules

SLIDE 11

Large Scale Networks: Module Networks

[Figure: module network over the S-genes S1 to S10 with attached E-genes; panel labeled "Network Log-likelihood" (Fröhlich et al., 2007, 2008)]

SLIDE 12

Network Inference with the nem-Package

    control = set.default.parameters(unique(colnames(D)), type = "CONTmLLBayes")
    mynem = nem(D, inference = "ModuleNetwork", control = control, verbose = FALSE)   # divide-and-conquer search
    plot.nem(mynem, SCC = FALSE, D = D, draw.lines = TRUE)

SLIDE 13

Automated Selection of Relevant E-Genes (Feature Selection)

  • Motivation: irrelevant E-genes can degrade the accuracy of network estimation
    1. Select only E-genes that make a positive contribution to the model's log-likelihood
    2. Re-estimate the network with the new set of E-genes
    3. Iterate the process until convergence

Fröhlich et al., 2008
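The iteration can be sketched generically in base R. `estimate` and `contribution` are hypothetical callbacks standing in for network estimation and for each E-gene's contribution to the log-likelihood; this illustrates the loop only and is not the nem implementation.

```r
select_egenes <- function(D, estimate, contribution, max_iter = 20) {
  keep <- rep(TRUE, ncol(D))                          # start with all E-genes
  for (iter in seq_len(max_iter)) {
    net <- estimate(D[, keep, drop = FALSE])          # step 2: (re-)estimate the network
    new_keep <- contribution(net, D) > 0              # step 1: positive contributions only
    if (!any(new_keep) || all(new_keep == keep)) break   # step 3: stop at convergence
    keep <- new_keep
  }
  list(network = net, selected = which(keep))
}

# Toy callbacks (hypothetical): the "network" is the mean effect profile of the
# selected E-genes, and an E-gene contributes positively if it agrees with it
estimate <- function(Dsub) rowMeans(Dsub)
contribution <- function(net, D) colSums(D * (net - 0.5))

D <- cbind(c(1, 1, 0, 0), c(1, 1, 0, 0), c(1, 1, 0, 0),   # consistent E-genes
           c(0, 0, 1, 1), c(0, 0, 1, 1))                  # conflicting E-genes
select_egenes(D, estimate, contribution)$selected
```

Contributions are recomputed for all E-genes in each round, so an E-gene dropped early could re-enter later; whether the original method does this is not stated on the slide.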

SLIDE 14

Network Inference with the nem-Package

    data("BoutrosRNAi2002")              # provides BoutrosRNAiDiscrete
    D2 = BoutrosRNAiDiscrete[, 9:16]
    control = set.default.parameters(unique(colnames(D2)), selEGenes = TRUE)   # automated E-gene selection
    mynem2 = nem(D2, inference = "triples", control = control, verbose = FALSE)
    plot.nem(mynem2, D = D2, draw.lines = TRUE)

SLIDE 15

Incorporation of Prior Knowledge

  • Bias scoring such that known interactions are considered
  • Bayesian prior on network structure:

$$P(\Phi) = \prod_{i,j} P(\Phi_{ij}), \qquad P(\Phi_{ij} \mid \nu) = \frac{1}{2\nu} \exp\left(-\frac{|\Phi_{ij} - \Phi'_{ij}|}{\nu}\right)$$

ν (scale of the prior): ν → 0 means complete trust in the prior; ν → ∞ means complete trust in the data.

Integrating out ν with ν ~ InvGamma(1, 0.5):

$$P(\Phi_{ij}) = \int P(\Phi_{ij} \mid \nu)\, P(\nu)\, d\nu = \frac{1}{\left(1 + 2\,|\Phi_{ij} - \Phi'_{ij}|\right)^2}$$

Φ = signaling graph; Φ' = prior belief; ν = hyperparameter of the Laplace distribution

Fröhlich et al., 2008
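A base-R sketch of both priors (our illustration; the prior matrix and graphs are hypothetical). It also checks the closed form numerically, assuming InvGamma(1, 0.5) is parametrized by shape 1 and scale 0.5, i.e. P(ν) = 0.5 ν^(−2) exp(−0.5/ν).

```r
# Log prior of a 0/1 signaling graph Phi given a prior-belief matrix Phi_prior
# Laplace prior with fixed scale nu (small nu: trust the prior; large nu: trust the data)
log_prior_laplace <- function(Phi, Phi_prior, nu) {
  d <- abs(Phi - Phi_prior)
  sum(-log(2 * nu) - d / nu)
}

# nu integrated out with nu ~ InvGamma(1, 0.5): each entry contributes -2 log(1 + 2 d)
log_prior_marginal <- function(Phi, Phi_prior) {
  d <- abs(Phi - Phi_prior)
  sum(-2 * log(1 + 2 * d))
}

# Numerical check of the closed form for one entry with |Phi_ij - Phi'_ij| = x:
# (1/(2 nu)) exp(-x/nu) * 0.5 nu^-2 exp(-0.5/nu), combined in log space
marginal_numeric <- function(x) {
  integrand <- function(nu) exp(log(0.25) - 3 * log(nu) - (x + 0.5) / nu)
  integrate(integrand, 0, Inf)$value
}

Phi_prior <- diag(3); Phi_prior[1, 2] <- 1   # hypothetical prior: edge 1 -> 2 is known
Phi <- diag(3)                               # candidate graph without that edge
log_prior_laplace(Phi, Phi_prior, nu = 0.5)
log_prior_marginal(Phi, Phi_prior)           # -2 * log(3)
c(marginal_numeric(1), 1 / 9)
```

Every disagreement with the prior belief lowers the log prior, and the penalty per disagreement is fixed once ν is integrated out.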

SLIDE 16

Using Prior Knowledge with the nem-Package

    # Pm: prior belief Φ' over the 4 S-genes (identity: no edges expected)
    control = set.default.parameters(unique(colnames(D)), selEGenes = TRUE,
                                     type = "CONTmLLMAP", Pm = diag(4))
    mynem3 = nem(D, control = control, verbose = FALSE)
    plot.nem(mynem3, SCC = FALSE, D = D, draw.lines = TRUE)

SLIDE 17

Statistical Stability and Significance

  • How stable is the inferred network? Do small changes in the set of E-genes lead to different network hypotheses?
    → Use the non-parametric bootstrap: sample n E-genes with replacement, re-estimate the network, repeat.
  • Is the inferred network better than random?
    → Randomly permute the node labels and check whether the random network has a higher likelihood.

[Figure: bootstrap scheme over the S-genes P, Q, R, S (sample n E-genes with replacement, re-estimate, repeat); edges annotated with 0.8, 0.9, 0.7]
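The bootstrap loop can be sketched in base R as follows (an illustration, not `nem.bootstrap`). `infer` is a hypothetical callback returning a 0/1 adjacency matrix over the S-genes; the result is the fraction of bootstrap networks containing each edge, which is how we read edge annotations such as 0.8 or 0.9.

```r
bootstrap_edges <- function(D, infer, nboot = 100) {
  freq <- 0
  for (b in seq_len(nboot)) {
    cols <- sample(ncol(D), ncol(D), replace = TRUE)   # sample n E-genes with replacement
    freq <- freq + infer(D[, cols, drop = FALSE])      # re-estimate the network; repeat
  }
  freq / nboot   # fraction of bootstrap networks that contain each edge
}

# Toy inference rule (hypothetical): edge 1 -> 2 if most sampled E-genes
# respond to both S-genes
infer <- function(Dsub) {
  A <- diag(2)
  A[1, 2] <- as.numeric(mean(Dsub[1, ] == 1 & Dsub[2, ] == 1) > 0.5)
  A
}

set.seed(1)
D <- matrix(rbinom(2 * 20, 1, 0.8), nrow = 2)   # 2 S-genes x 20 E-genes (simulated)
bootstrap_edges(D, infer, nboot = 100)
```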

SLIDE 20

Bootstrapping and Significance Calculation with the nem-Package

    control = set.default.parameters(unique(colnames(D)), type = "CONTmLLBayes", Pm = diag(4))
    mynem.boot = nem.bootstrap(D, nboot = 100, control = control)   # 100 bootstrap samples of E-genes
    plot.nem(mynem.boot, SCC = FALSE, plot.probs = TRUE)            # edges labeled with bootstrap frequencies
    nem.calcSignificance(D, mynem.boot, N = 1000)                   # label permutation test

p = 0.037 (label permutation test)
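The label permutation test can be sketched in base R (an illustration, not `nem.calcSignificance`). `score` is a hypothetical callback such as the network log-likelihood; permuting rows and columns of Φ together relabels the S-genes while keeping the graph's shape. The add-one correction (hits + 1)/(N + 1) is a common convention and our assumption; the package may compute its p-value differently.

```r
permutation_pvalue <- function(Phi, score, N = 1000) {
  observed <- score(Phi)
  n <- nrow(Phi)
  hits <- 0
  for (i in seq_len(N)) {
    p <- sample(n)                                    # random relabeling of the S-genes
    if (score(Phi[p, p, drop = FALSE]) >= observed) hits <- hits + 1
  }
  (hits + 1) / (N + 1)
}

# Toy score (hypothetical): agreement with a known cascade over 5 S-genes
target <- upper.tri(diag(5), diag = TRUE) * 1
score <- function(G) -sum(G != target)
set.seed(1)
permutation_pvalue(target, score, N = 200)
```

Here only the identity relabeling reproduces the cascade, so the p-value is small, mirroring a network that fits clearly better than its random relabelings.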

SLIDE 21

Summary: nem-package

  • Inference of features of signaling pathways from high-dimensional, targeted perturbation effects
  • Different likelihood models
    - Discretized data
    - P-value log-densities
  • Algorithms for inference of large networks
    - Module networks
    - Triplets
    - Greedy hill climbing
    - ...
  • Possibility to integrate prior knowledge
  • Automatic selection of relevant E-genes
  • Various plotting and analysis methods
    - Non-parametric bootstrap
    - Label permutation p-values

SLIDE 22

Fast and Efficient Learning of Dynamic Nested Effects Models

Please come to our poster (G28, Tue)!

SLIDE 23

Acknowledgements

  • German Cancer Research Center (DKFZ): Holger Sültmann, Marc Fellmann (alumnus), Sabrina Belauger
  • University of Göttingen: Tim Beißbarth
  • University of Regensburg: Rainer Spang
  • University of Munich (LMU): Achim Tresch
  • Cancer Research UK: Florian Markowetz
  • Bonn-Aachen International Center for IT (B-IT), Algorithmic Bioinformatics: Paurush Praveen, Khalid Abnaof, Yupeng Cun, Jan Plitschka, Afshin Sadeghi, Amit Kawalia, Sohil Anand