
A comparative study of Gaussian Graphical Model approaches for genomic data
Roberto Anglani
Institute of Intelligent Systems for Automation, CNR-ISSIA, Bari, Italy
in collaboration with PF Stifanelli, TM Creanza, VC Liuzzi, S Mukherjee, N Ancona
1st International Workshop on Pattern Recognition in Proteomics, Structural Biology and Bioinformatics (PR PS BB 2011), Ravenna, Italy, 13 Sept 2011

Motivation
A living cell is a complex system.
Genes and gene products interact in complicated patterns controlled by biochemical interactions and regulatory activities.
Systems biology tasks: uncovering the interaction pictures; modelling functional interactions between genes, proteins and transcription factors in a Gene Regulatory Network (GRN).
R Anglani, PR PS BB 2011 - 13. Sept 2011 - A comparative study of GGM approaches for genomic data

Motivation
Complexity needs mathematical modelling.
High-throughput technologies provide huge amounts of data.
Theoretical and computational approaches are necessary to model gene regulatory networks.
Stochastic tools: graphical models.
Focus: study and visualize the conditional independence structure between random variables (e.g. microarray data).

Scope
Preliminary investigation on isoprenoid pathways in A. thaliana.
1 Compare different theoretical approaches for the study of the conditional dependencies
2 Infer a gene network for the isoprenoid biosynthesis pathways in A. thaliana

1 Compare different theoretical approaches for the study of the conditional dependencies

1.0 Graphical models
Graph: $G = (V, E)$
Vertices: genes
Edges: conditional dependencies
Advantage: powerful tool for a small number of genes (with respect to the number of observations).
Shortcoming: in high-throughput data the number of genes $p$ is much larger than the number of samples $n$ ($p \gg n$).
Problem: this affects any statistical inference and the reliability of inferred GRNs.

1.1 GGMs with pairwise Markov property
Undirected graphs: in this study we consider only undirected Gaussian graphs with pairwise Markov property.
p-variate normal distribution: $X = (X_1, X_2, \ldots, X_p) \in \mathbb{R}^p$
Absence of edge:
$$(i, j) \notin E \;\Leftrightarrow\; X_i \perp\!\!\!\perp X_j \mid X_{V \setminus \{i,j\}} \;\Leftrightarrow\; \rho_{ij \cdot V \setminus \{i,j\}} = 0$$

1.2 Facing the n << p problem
The partial correlation matrix is then crucial for the study of the edge structure.
How to solve the n << p problem?
Approaches that neglect multi-gene effects:
Reducing the number of genes or gene lists. Toh & Horimoto (2002)
Evaluating only limited-order correlation. Wille & Bühlmann (2004), Castelo & Roverato (2006), Gilbert & Dudoit (2009)
Approaches that keep multi-gene effects:
Regularized estimates of the precision matrix. Yuan & Lin (2007), Friedman & Tibshirani (2008), Witten & Tibshirani (2009)
Pseudoinverse estimates of the precision matrix. Schäfer & Strimmer (2005)

1.3 Moore-Penrose Pseudoinverse
Dataset with $n$ samples and $p$ variables, $n < p$:
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$$
$S$ = estimate of the covariance; $\hat{\Theta}$ = estimate of the inverse covariance.
PINV (Moore-Penrose pseudoinverse): the precision matrix $\Theta$ is obtained as the pseudoinverse of $S$, by using the Singular Value Decomposition.
$$\rho_{ij \cdot V \setminus \{i,j\}} = -\frac{\theta_{ij}}{\sqrt{\theta_{ii}\,\theta_{jj}}}, \qquad i \neq j$$
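As a rough illustration of the PINV slide, the following NumPy sketch builds the pseudoinverse of the sample covariance from its SVD and converts it to partial correlations with the formula above. The function names, the singular-value cutoff and the random test data are assumptions made for this example; they are not taken from the presentation.

```python
import numpy as np

def partial_corr_from_precision(theta):
    """rho_ij = -theta_ij / sqrt(theta_ii * theta_jj) for i != j, 1 on the diagonal."""
    d = np.sqrt(np.diag(theta))
    rho = -theta / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

def pinv_partial_corr(X):
    """Partial correlations from the SVD-based pseudoinverse of the sample covariance."""
    S = np.cov(X, rowvar=False)                    # p x p covariance estimate
    U, s, Vt = np.linalg.svd(S)
    # Cutoff for "numerically zero" singular values (an assumed, conventional choice).
    tol = s.max() * max(S.shape) * np.finfo(float).eps
    s_inv = np.zeros_like(s)
    keep = s > tol
    s_inv[keep] = 1.0 / s[keep]
    theta_hat = (Vt.T * s_inv) @ U.T               # pseudoinverse of S
    return partial_corr_from_precision(theta_hat)

# Hypothetical data with n = 20 samples and p = 50 variables (n < p).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))
rho = pinv_partial_corr(X)
print(rho.shape)
```

With n < p the sample covariance is rank deficient, so an ordinary inverse does not exist; dropping the near-zero singular values is what makes the estimate computable.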

1.4 L2 penalization
L2C (covariance-regularized method): the precision matrix $\Theta$ is obtained from maximization of a log-likelihood function with an L2 penalization. Witten & Tibshirani (2009)
$$L(\Theta) = \log\det\Theta - \mathrm{Tr}(S\Theta) - \lambda \|\Theta\|_F^2 \quad (\lambda > 0), \qquad \|\Theta\|_F^2 = \mathrm{tr}(\Theta^\top \Theta)$$
Eigenvalue problem, with $s_i$ and $u_i$ the eigenvalues and eigenvectors of $S$:
$$\Theta^{-1} - 2\lambda\Theta = S \;\Rightarrow\; \theta_i^{\pm} = \frac{-s_i \pm \sqrt{s_i^2 + 8\lambda}}{4\lambda}, \qquad \hat{\Theta} = \sum_i \theta_i^{+}\, u_i u_i^\top$$
Choice of the parameter $\lambda$: the $\lambda$ that maximizes the penalized log-likelihood. We carry out 20 random splits of the dataset into a training set and a validation set, and then we evaluate the log-likelihood over the validation set. Friedman & Tibshirani (2008)
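The closed-form solution on this slide can be sketched in a few lines of NumPy: eigendecompose $S$, map each eigenvalue $s_i$ to $\theta_i^{+}$, and rebuild $\hat{\Theta}$. The selection routine follows the 20-split description, but the 50/50 split ratio, the use of the unpenalized log-likelihood on the validation half, the grid of $\lambda$ values and all function names are assumptions for this example.

```python
import numpy as np

def l2c_precision(S, lam):
    """Theta_hat = sum_i theta_i^+ u_i u_i^T, theta_i^+ = (-s_i + sqrt(s_i^2 + 8 lam)) / (4 lam)."""
    s, U = np.linalg.eigh(S)                       # eigenvalues s_i, eigenvectors u_i of S
    theta_plus = (-s + np.sqrt(s**2 + 8.0 * lam)) / (4.0 * lam)
    return (U * theta_plus) @ U.T

def gaussian_loglik(theta, S):
    """log det(Theta) - Tr(S Theta), up to constants."""
    _, logdet = np.linalg.slogdet(theta)
    return logdet - np.trace(S @ theta)

def choose_lambda(X, lambdas, n_splits=20, train_frac=0.5, seed=0):
    """Pick the lambda with the best validation log-likelihood summed over random splits.
    train_frac = 0.5 is an assumption; the slides do not give the split ratio."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_train = int(round(train_frac * n))
    scores = np.zeros(len(lambdas))
    for _ in range(n_splits):
        perm = rng.permutation(n)
        S_train = np.cov(X[perm[:n_train]], rowvar=False)
        S_val = np.cov(X[perm[n_train:]], rowvar=False)
        for k, lam in enumerate(lambdas):
            scores[k] += gaussian_loglik(l2c_precision(S_train, lam), S_val)
    return lambdas[int(np.argmax(scores))]

# Hypothetical data: n = 40 samples, p = 60 variables.
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 60))
lambdas = [0.01, 0.1, 1.0, 10.0]
best = choose_lambda(X, lambdas)
S = np.cov(X, rowvar=False)
theta_hat = l2c_precision(S, best)
print(best)
```

Because $\theta_i^{+} > 0$ for every $s_i \ge 0$, the estimate stays positive definite even when $S$ is singular, which is the point of the penalty in the n < p setting.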

1.5 Regularized Least Squares
RCM (residual correlation method): given RLS estimates of the variables $X_i$ and $X_j$, we evaluate the Pearson correlation between the residuals.
Regression model:
$$X_i = \langle \beta^{(i)}, X_{\setminus i \setminus j} \rangle + b_i, \qquad X_j = \langle \beta^{(j)}, X_{\setminus i \setminus j} \rangle + b_j$$
Regularized least squares:
$$\min_{\beta \in \mathbb{R}^{p-2}} \; \frac{1}{n} \left\| X_i - \beta^{(i)} X_{\setminus i \setminus j} \right\|_2^2 + \lambda \left\| \beta^{(i)} \right\|_2^2$$
Residual vectors:
$$r_i = X_i - \tilde{X}_i, \qquad r_j = X_j - \tilde{X}_j$$
Partial correlation matrix:
$$\rho_{ij \cdot V \setminus \{i,j\}} = r_{r_i r_j} = \frac{\mathrm{cov}(r_i, r_j)}{\sqrt{\mathrm{var}(r_i)\,\mathrm{var}(r_j)}}$$
Choice of the parameter $\lambda$: minimization of the Leave-One-Out cross validation errors.
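A minimal sketch of the residual correlation idea, assuming a fixed $\lambda$ (the leave-one-out selection mentioned on the slide is left out) and handling the intercepts $b_i$, $b_j$ by centering the columns. It uses the dual form of ridge regression, in which the residual vector is $n\lambda\,(K + n\lambda I)^{-1} y$ with $K = Z Z^\top$, so only an n x n system is solved for each gene pair. The function name and the test data are hypothetical.

```python
import numpy as np

def rcm_partial_corr(X, lam):
    """Pearson correlation of ridge residuals for every pair (i, j).
    lam is held fixed here; choosing it by leave-one-out CV is not implemented."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)                        # centering absorbs the intercepts
    rho = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            rest = [k for k in range(p) if k != i and k != j]
            Z = Xc[:, rest]                        # predictors X_{\i\j}, shape n x (p - 2)
            K = Z @ Z.T
            # Residuals of both ridge fits at once: n*lam * (K + n*lam*I)^{-1} y
            R = n * lam * np.linalg.solve(K + n * lam * np.eye(n), Xc[:, [i, j]])
            rho[i, j] = rho[j, i] = np.corrcoef(R[:, 0], R[:, 1])[0, 1]
    return rho

# Hypothetical check on a 4-variable chain, where n > p and lam is tiny.
theta_true = np.array([[2.0, -1.0, 0.0, 0.0],
                       [-1.0, 2.0, -1.0, 0.0],
                       [0.0, -1.0, 2.0, -1.0],
                       [0.0, 0.0, -1.0, 2.0]])
rng = np.random.default_rng(2)
X = rng.multivariate_normal(np.zeros(4), np.linalg.inv(theta_true), size=200)
rho = rcm_partial_corr(X, lam=1e-6)
print(np.round(rho, 2))
```

When $\lambda$ tends to zero and n > p, the residual correlations coincide with the partial correlations obtained from the inverse sample covariance, which gives a convenient sanity check. The pairwise loop costs one linear solve per pair, consistent with RCM being the slowest method in the timing columns of the results.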

1.6 A comparative study
Generated datasets: from the multivariate Gaussian distribution $N(0, \Sigma_{gs})$, $\Sigma_{gs} = \Theta_{gs}^{-1}$, with $p$ = 50, 200, 400 and $n$ = 20, 200, 500.
Structure and sparsity of $\Theta_{gs}^{-1}$: random, hubs and cliques.
Random: $2p$ of the $p(p-1)/2$ off-diagonal terms are set randomly to a fixed value $\theta_{ik} = \theta$.
Hubs: we partition the columns into disjoint groups $G_k$; the index $k$ indicates the $k$-th column chosen as central in each group. Off-diagonal terms are $\theta_{ik} = \theta$ if $i \in G_k$, otherwise $\theta_{ik} = 0$.
Cliques: fully connected hubs.
Accuracy and timing: for each pattern and for each inferring method, we evaluate timing and AUC performances (accuracy of classification of edges and non-edges). Friedman & Tibshirani (2010)
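The simulation design can be sketched as follows, under several assumptions that the slide does not settle: the pattern is imposed on the precision matrix (as the entries $\theta_{ik}$ suggest), the value $\theta = 0.3$, the group size of 10, and a diagonally dominant diagonal chosen only to guarantee positive definiteness. The AUC is computed as the probability that a true edge receives a larger absolute partial correlation than a non-edge. All names are hypothetical, and the plain inverse of the sample covariance stands in for the three estimators being compared.

```python
import numpy as np

def make_precision(p, pattern, theta=0.3, group_size=10, seed=0):
    """Sparse precision matrix with a 'random', 'hubs' or 'cliques' pattern."""
    rng = np.random.default_rng(seed)
    A = np.zeros((p, p))
    if pattern == "random":
        iu, ju = np.triu_indices(p, k=1)
        pick = rng.choice(iu.size, size=min(2 * p, iu.size), replace=False)
        A[iu[pick], ju[pick]] = theta              # 2p of the p(p-1)/2 terms
    elif pattern in ("hubs", "cliques"):
        for start in range(0, p, group_size):
            g = np.arange(start, min(start + group_size, p))
            if pattern == "hubs":
                A[g[0], g[1:]] = theta             # first column of the group is the hub
            else:
                A[np.ix_(g, g)] = theta            # fully connected group
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    A = np.triu(A, k=1)
    A = A + A.T
    # Assumed diagonal: strict diagonal dominance keeps the matrix positive definite.
    np.fill_diagonal(A, np.abs(A).sum(axis=1) + 1.0)
    return A

def edge_auc(theta_true, rho_hat):
    """AUC for separating edges from non-edges using |rho_hat| as the score."""
    iu = np.triu_indices_from(theta_true, k=1)
    is_edge = theta_true[iu] != 0
    score = np.abs(rho_hat[iu])
    pos, neg = score[is_edge], score[~is_edge]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

# Hypothetical run: hubs pattern, p = 50, n = 500.
p, n = 50, 500
theta_gs = make_precision(p, "hubs")
sigma_gs = np.linalg.inv(theta_gs)
sigma_gs = (sigma_gs + sigma_gs.T) / 2.0           # remove rounding asymmetry
rng = np.random.default_rng(3)
X = rng.multivariate_normal(np.zeros(p), sigma_gs, size=n)
P = np.linalg.inv(np.cov(X, rowvar=False))         # stand-in estimator (valid since n > p)
d = np.sqrt(np.diag(P))
rho_hat = -P / np.outer(d, d)
auc = edge_auc(theta_gs, rho_hat)
print(round(auc, 3))
```

Swapping the stand-in estimator for any of the three methods, and wrapping the call in a timer, would give the AUC and T (s) quantities that the comparison reports.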

1.7 Results of comparative study
Pattern: r = random, h = hubs, c = cliques. T (s) = computing time in seconds.

p = 400
| n | pattern | L2C AUC | L2C AUC std | L2C T (s) | PINV AUC | PINV AUC std | PINV T (s) | RCM AUC | RCM AUC std | RCM T (s) |
|---|---|---|---|---|---|---|---|---|---|---|
| 500 | r | 0.998 | 0.0001 | 38.86 | 0.987 | 0.0006 | 0.161 | 0.999 | 0.0001 | 8343 |
| 500 | h | 1.000 | 0.0000 | 83.74 | 0.999 | 0.0000 | 0.164 | 1.000 | 0.0000 | 6468 |
| 500 | c | 0.995 | 0.0002 | 84.95 | 0.963 | 0.0014 | 0.164 | 0.996 | 0.0002 | 6449 |
| 200 | r | 0.976 | 0.0003 | 38.44 | 0.581 | 0.0161 | 0.111 | 0.984 | 0.0006 | 3566 |
| 200 | h | 1.000 | 0.0000 | 81.13 | 0.806 | 0.0150 | 0.115 | 0.999 | 0.0001 | 3555 |
| 200 | c | 0.936 | 0.0008 | 82.02 | 0.587 | 0.0049 | 0.121 | 0.923 | 0.0009 | 3747 |
| 20 | r | 0.808 | 0.0011 | 39.03 | 0.929 | 0.0018 | 0.093 | 0.924 | 0.0017 | 105 |
| 20 | h | 0.999 | 0.0001 | 82.03 | 1.000 | 0.0000 | 0.091 | 0.999 | 0.0000 | 106 |
| 20 | c | 0.668 | 0.0014 | 82.13 | 0.659 | 0.0014 | 0.091 | 0.659 | 0.0014 | 108 |

p = 200
| n | pattern | L2C AUC | L2C AUC std | L2C T (s) | PINV AUC | PINV AUC std | PINV T (s) | RCM AUC | RCM AUC std | RCM T (s) |
|---|---|---|---|---|---|---|---|---|---|---|
| 500 | r | 0.999 | 0.0001 | 5.807 | 0.999 | 0.0001 | 0.0377 | 0.999 | 0.0001 | 807 |
| 500 | h | 1.000 | 0.0000 | 10.655 | 1.000 | 0.0000 | 0.0376 | 1.000 | 0.0000 | 450 |
| 500 | c | 0.996 | 0.0002 | 10.821 | 0.999 | 0.0001 | 0.0439 | 0.999 | 0.0000 | 436 |
| 200 | r | 0.986 | 0.0003 | 5.592 | 0.703 | 0.0067 | 0.0310 | 0.990 | 0.0007 | 861 |
| 200 | h | 1.000 | 0.0000 | 10.425 | 0.748 | 0.0124 | 0.0309 | 0.999 | 0.0003 | 856 |
| 200 | c | 0.944 | 0.0010 | 10.529 | 0.612 | 0.0064 | 0.0336 | 0.950 | 0.0008 | 1028 |
| 20 | r | 0.784 | 0.0016 | 6.150 | 0.880 | 0.0048 | 0.0187 | 0.871 | 0.0046 | 24.5 |
| 20 | h | 0.999 | 0.0001 | 10.574 | 0.999 | 0.0002 | 0.0182 | 0.999 | 0.0001 | 27.9 |
| 20 | c | 0.669 | 0.0016 | 10.545 | 0.649 | 0.0017 | 0.0189 | 0.654 | 0.0017 | 25.3 |

Schäfer & Strimmer (2005)

2 Infer a gene network for the isoprenoid biosynthesis pathways in A. thaliana

2.1 Isoprenoid pathways in A. thaliana
Isoprenoids: a group of plant natural products.
Functions: membrane components, hormones, plant defence compounds, etc.
MVA and MEP pathways: isoprenoids are synthesized through two different routes that take place in two distinct cellular compartments.
[Figure: image from Universitat de Barcelona website http://www.bq.ub.es/~mrodrigu/RESEARCH.htm]
Evidence of interactions at the metabolic level: gene expression levels do not respond to the single inhibition of the two pathways. Laule et al., PNAS (2003)
Beyond the one-gene approach, a GRN has been inferred (795 gene expression levels from 56 other pathways). It showed the possible presence of various connections between genes in the two pathways, i.e. possible crosstalk at the transcriptional level. Wille & Bühlmann, Genome Biology (2004)
