Gene Networks Estimation: Extensions of the lasso
José Sánchez
Mathematical Sciences, Chalmers University of Technology
Sep 12, 2013

Cancer systems biology
The transfer of information from a protein to either DNA or RNA is not possible. This fact establishes a framework for the study of cancer at the molecular level.

Network Modeling: Why gene networks?
- A gene regulatory network describes how genes interact with each other to form modules and carry out cell functions.
- Gene networks help in systematically understanding complex molecular mechanisms.
- They allow the identification of hub genes, which are potential disease drivers (Kendall et al., 2005; Mani et al., 2008; Nibbe et al., 2010; Slavov and Dawson, 2009).

Goals
- Estimate joint gene regulatory networks for several types of cancer and data types.
- Incorporate biologically meaningful constraints into the model (commonality, modularity).
- Take into account the high dimensionality (p >> N) of the problem.

Gaussian Graphical Models
A graph consists of a set of vertices $V$ and a set of edges $E$, which is a subset of $V \times V$. In a graphical model, the vertices correspond to a set of random variables $X = (X_1, X_2, \dots, X_p)$ coming from a distribution $P$.

A conditional independence graph (CIG) is a graphical model where the absence of an edge between variables $X_i$ and $X_j$ implies that they are conditionally independent given the rest, that is, $X_i \perp X_j \mid X_{V \setminus \{i, j\}}$.

If the variables $X = (X_1, X_2, \dots, X_p)$ come from the multivariate normal distribution $N(0, \Sigma)$, the CIG corresponds to a Gaussian Graphical Model (Lauritzen, 1996). In this case the conditional independencies between the variables in the model (the edges in the graph) are given by the inverse covariance matrix $\Theta = \Sigma^{-1}$: $X_i$ and $X_j$ are conditionally independent exactly when $\theta_{ij} = 0$.
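As a small illustration of the link between zeros of $\Theta$ and missing edges, the sketch below builds a made-up 4-variable chain graph. The matrix values and the helper names `partial_correlations` and `edges_from_precision` are assumptions chosen for illustration, not taken from the slides. The covariance matrix is dense, while the precision matrix recovers the chain structure.

```python
import numpy as np

# Hypothetical chain X1 - X2 - X3 - X4: a tridiagonal, positive definite
# precision matrix (values chosen only for illustration).
theta = np.array([
    [2.0, -1.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0],
    [0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, -1.0, 2.0],
])

# The covariance matrix is dense: every pair is marginally correlated.
sigma = np.linalg.inv(theta)


def partial_correlations(precision):
    """Partial correlation of X_i and X_j given the rest: -theta_ij / sqrt(theta_ii * theta_jj)."""
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def edges_from_precision(precision, tol=1e-10):
    """List the pairs (i, j), i < j, whose precision entry is non-zero."""
    p = precision.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(precision[i, j]) > tol]


# Inverting the (dense) covariance gives back the sparse edge structure.
print(edges_from_precision(np.linalg.inv(sigma)))
print(np.round(partial_correlations(theta), 3))
```

Only neighbouring variables in the chain have a non-zero partial correlation, even though no entry of `sigma` is zero.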

Gene Network Modeling

GGM for gene networks
- Assume the gene measurements are $N(\mu, \Sigma)$ distributed and model them using Gaussian graphical models.
- The links of the gene network are given by the non-zero entries of the precision matrix $\Theta = \Sigma^{-1}$.
- Since p >> N, the precision matrix cannot be estimated directly; regularization (sparsity) has to be introduced.

Not the only method
- Bayesian networks.
- Information theory-based methods.
- Correlation-based methods.

Network Modeling: a high-dimensional problem
We may not be grapes, but estimation of (human) gene networks is still a high-dimensional problem.
[Figure. Source: M. Pertea and S. Salzberg, Genome Biology, 2010]

The Lasso: an approach to the p >> N problem
Consider the usual linear regression setting, with $p$-dimensional covariates $X_1, X_2, \dots, X_n$ and a univariate response $Y_1, Y_2, \dots, Y_n$. We model the response variable through a linear model

$$Y_i = \sum_{j=1}^{p} \beta_j X_i^{j} + \varepsilon_i, \qquad i = 1, 2, \dots, n,$$

where $X_i^{j}$ is the $j$-th component of $X_i$. The lasso estimate of $\beta$ is given by the minimizer (Tibshirani, 1996)

$$\hat{\beta}(\lambda) = \arg\min_{\beta} \; \frac{1}{n} \lVert Y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 .$$
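The slides do not say how the lasso objective is minimized. A minimal sketch, assuming cyclic coordinate descent (one common choice, not necessarily the one used by the author), is given below for the objective $\frac{1}{n}\lVert Y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1$ exactly as scaled above. The function names, the simulated data and the value of `lam` are all hypothetical.

```python
import numpy as np


def soft_threshold(x, t):
    """Scalar soft-thresholding operator S(x, t) = sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * max(abs(x) - t, 0.0)


def lasso_objective(X, y, beta, lam):
    """(1/n) * ||y - X beta||_2^2 + lam * ||beta||_1, as on the slide."""
    r = y - X @ beta
    return (r @ r) / X.shape[0] + lam * np.abs(beta).sum()


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the lasso objective above (illustrative sketch)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * ||X_j||^2 for each column
    r = np.array(y, dtype=float)           # current residual y - X beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * beta[j]         # remove coordinate j from the fit
            rho = X[:, j] @ r / n
            # Exact minimizer over beta_j; the threshold is lam / 2 because
            # the squared loss is scaled by 1/n rather than 1/(2n).
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
            r -= X[:, j] * beta[j]         # add the updated coordinate back
    return beta


# Simulated p >> n example with three active covariates (made-up data).
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat = lasso_cd(X, y, lam=0.5)
print("non-zero coefficients:", np.flatnonzero(beta_hat))
```

Each coordinate update is an exact one-dimensional minimization, so the objective never increases from one sweep to the next.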

Penalized GGM for gene networks
Maximize the $L_1$-penalized likelihood function for the precision matrix $\Theta$,

$$l(\Theta) = \ln[\det(\Theta)] - \mathrm{tr}(S\Theta) - g(\lambda, \Theta),$$

where $S = \frac{1}{n} X^T X$ is the empirical covariance matrix. For $K$ networks, $\{\Theta\} = \{\Theta^1, \dots, \Theta^K\}$ with entries $\theta^k_{ij}$.

The graphical lasso (Friedman et al., 2008):

$$g(\lambda, \Theta) = \lambda \sum_{i \neq j} |\theta_{ij}|$$

The group lasso (Yuan and Lin, 2007):

$$g(\lambda, \{\Theta\}) = \lambda_1 \sum_{k=1}^{K} \sum_{i \neq j} |\theta^k_{ij}| + \lambda_2 \sum_{i \neq j} \sqrt{\sum_{k=1}^{K} |\theta^k_{ij}|^2}$$

The fused lasso (Danaher et al., 2011):

$$g(\lambda, \{\Theta\}) = \lambda_1 \sum_{k=1}^{K} \sum_{i \neq j} |\theta^k_{ij}| + \lambda_2 \sum_{k < k'} \sum_{i, j} |\theta^k_{ij} - \theta^{k'}_{ij}|$$
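To make the three penalties concrete, the following sketch evaluates each of them with NumPy for a stack of $K$ precision matrices. It only evaluates the penalty terms and does not fit anything. The function names and the two 2 × 2 example matrices are assumptions introduced for illustration.

```python
import numpy as np


def offdiag_mask(p):
    """Boolean p x p mask selecting the entries with i != j."""
    return ~np.eye(p, dtype=bool)


def graphical_lasso_penalty(theta, lam):
    """lam * sum_{i != j} |theta_ij| for a single precision matrix."""
    return lam * np.abs(theta[offdiag_mask(theta.shape[0])]).sum()


def group_lasso_penalty(thetas, lam1, lam2):
    """L1 term plus, for each off-diagonal (i, j), the Euclidean norm across the K networks."""
    thetas = np.asarray(thetas, dtype=float)        # shape (K, p, p)
    off = thetas[:, offdiag_mask(thetas.shape[1])]  # shape (K, p * (p - 1))
    return lam1 * np.abs(off).sum() + lam2 * np.sqrt((off ** 2).sum(axis=0)).sum()


def fused_lasso_penalty(thetas, lam1, lam2):
    """L1 term plus absolute differences between every pair of networks, over all (i, j)."""
    thetas = np.asarray(thetas, dtype=float)
    K = thetas.shape[0]
    off = thetas[:, offdiag_mask(thetas.shape[1])]
    fused = sum(np.abs(thetas[k] - thetas[kp]).sum()
                for k in range(K) for kp in range(k + 1, K))
    return lam1 * np.abs(off).sum() + lam2 * fused


# Two made-up 2 x 2 precision matrices (K = 2).
theta_a = np.array([[1.0, 0.5], [0.5, 1.0]])
theta_b = np.array([[2.0, 0.0], [0.0, 2.0]])

print(graphical_lasso_penalty(theta_a, lam=1.0))
print(group_lasso_penalty([theta_a, theta_b], lam1=1.0, lam2=1.0))
print(fused_lasso_penalty([theta_a, theta_b], lam1=1.0, lam2=1.0))
```

The group term only looks at the magnitude of an edge across networks, while the fused term also penalizes differences in the diagonal entries, since its sum runs over all $(i, j)$.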

Network Modeling: a high-dimensional problem
Specifically, we are interested in estimating the networks for 8 cancer types and 6 types of variables. The problem results in the estimation of about 485 million edges.

Data type      Variables
mRNA           7954
CNA            6562
miRNA          285
Methylation    3831
Mutation       469
Clinical       3

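The Alternating Directions Method of Multipliers
To jointly model sparse GGMs we propose an extended version of the fused lasso penalty. Written as a minimization problem, the penalized negative log-likelihood is

$$l(\{\Theta\}) = \sum_{k=1}^{K} n_k \left[ \mathrm{tr}(S^k \Theta^k) - \ln\det(\Theta^k) \right] + g(\lambda, \{Z\}),$$

$$g(\lambda, \{Z\}) = \lambda_1 \sum_{k=1}^{K} \sum_{i \neq j} \left[ \alpha \, |Z^k_{ij}| + (1 - \alpha) \, (Z^k_{ij})^2 \right] + \lambda_2 \sum_{k < k'} \sum_{i, j} |Z^k_{ij} - Z^{k'}_{ij}| .$$

The ADMM (Boyd et al., 2011) can be applied to the general problem

$$\min_{\{\Theta\}, \{Z\}} \; f(\{\Theta\}) + g(\lambda, \{Z\}) \quad \text{subject to} \quad \Theta^k = Z^k, \quad k = 1, \dots, K,$$

where here $f(\{\Theta\}) = \sum_{k=1}^{K} n_k \left[ \mathrm{tr}(S^k \Theta^k) - \ln\det(\Theta^k) \right]$.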

ADMM steps
ADMM solves this problem by defining the scaled augmented Lagrangian as follows:

$$L(\{\Theta\}, \{Z\}, \{U\}) = f(\{\Theta\}) + g(\lambda, \{Z\}) + \frac{\rho}{2} \sum_{k=1}^{K} \lVert \Theta^k - Z^k + U^k \rVert_F^2 ,$$

where $U^k$ are the dual variables. At iteration $m$, the variables $\{\Theta\}$, $\{Z\}$ and $\{U\}$ are updated according to

1. $\Theta^k_m \leftarrow \arg\min_{\{\Theta\}} L(\{\Theta\}, \{Z_{m-1}\}, \{U_{m-1}\})$
2. $Z^k_m \leftarrow \arg\min_{\{Z\}} L(\{\Theta_m\}, \{Z\}, \{U_{m-1}\})$
3. $U^k_m \leftarrow U^k_{m-1} + \Theta^k_m - Z^k_m$

for $k = 1, \dots, K$.

ADMM, first step
For the first step, the function $g$ is constant, so the problem is to minimize the function

$$\sum_{k=1}^{K} n_k \left[ \mathrm{tr}(S^k \Theta^k) - \ln\det(\Theta^k) \right] + \frac{\rho}{2} \sum_{k=1}^{K} \lVert \Theta^k - Z^k + U^k \rVert_F^2$$

with respect to $\Theta$. Let $V D V^T$ be the eigendecomposition of the symmetric matrix $\frac{\rho}{n_k} (Z^k - U^k) - S^k$. The minimizer is given by $V \tilde{D} V^T$ (Witten and Tibshirani, 2009), where $\tilde{D}$ is diagonal and

$$\tilde{D}_{jj} = \frac{n_k}{2\rho} \left( D_{jj} + \sqrt{D_{jj}^2 + 4\rho / n_k} \right).$$
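The closed-form update for a single $\Theta^k$ can be sketched in a few lines of NumPy. The function name `theta_update` and the simulated inputs are assumptions made for illustration. The check at the end verifies the first-order condition $n_k (S^k - (\Theta^k)^{-1}) + \rho (\Theta^k - Z^k + U^k) = 0$, which is what the eigenvalue formula solves.

```python
import numpy as np


def theta_update(S, Z, U, n_k, rho):
    """Closed-form Theta^k update of the first ADMM step (illustrative sketch)."""
    M = (rho / n_k) * (Z - U) - S
    M = (M + M.T) / 2.0                       # guard against rounding asymmetry
    d, V = np.linalg.eigh(M)                  # M = V diag(d) V^T
    d_tilde = (n_k / (2.0 * rho)) * (d + np.sqrt(d ** 2 + 4.0 * rho / n_k))
    return (V * d_tilde) @ V.T                # V diag(d_tilde) V^T


# Made-up inputs: one data set with n_k = 30 samples and p = 5 variables.
rng = np.random.default_rng(1)
n_k, p, rho = 30, 5, 1.0
data = rng.standard_normal((n_k, p))
S = data.T @ data / n_k
Z = np.eye(p)
U = np.zeros((p, p))

Theta = theta_update(S, Z, U, n_k, rho)

# Gradient of the first-step objective at the returned matrix (should be ~0).
grad = n_k * (S - np.linalg.inv(Theta)) + rho * (Theta - Z + U)
print("max |gradient|:", np.abs(grad).max())
print("smallest eigenvalue of Theta:", np.linalg.eigvalsh(Theta).min())
```

Because $D_{jj} + \sqrt{D_{jj}^2 + 4\rho/n_k} > 0$ for every $j$, the update is always positive definite, so the log-determinant stays well defined across iterations.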

ADMM, second step
For the second step, the function $f$ is constant, so the problem is to minimize the function

$$g(\lambda, \{Z\}) + \frac{\rho}{2} \sum_{k=1}^{K} \lVert \Theta^k - Z^k + U^k \rVert_F^2$$

$$= \frac{\rho}{2} \sum_{k=1}^{K} \lVert Z^k - A^k \rVert_F^2 + \lambda_1 \sum_{k=1}^{K} \sum_{i \neq j} \left[ \alpha \, |Z^k_{ij}| + (1 - \alpha) \, (Z^k_{ij})^2 \right] + \lambda_2 \sum_{k < k'} \sum_{i, j} |Z^k_{ij} - Z^{k'}_{ij}|$$

with respect to $Z$, where $A^k = \Theta^k + U^k$. This problem is separable for each element $(i, j)$, so we can solve separately the problems

$$\min_{\{Z_{ij}\}} \; \frac{1}{2} \sum_{k=1}^{K} \left( Z^k_{ij} - A^k_{ij} \right)^2 + I_{i \neq j} \, \frac{\lambda_1}{\rho} \sum_{k=1}^{K} \left[ \alpha \, |Z^k_{ij}| + (1 - \alpha) \, (Z^k_{ij})^2 \right] + \frac{\lambda_2}{\rho} \sum_{k < k'} |Z^k_{ij} - Z^{k'}_{ij}| .$$
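The slides do not show how the element-wise problems are solved in general. As a hedged illustration, the sketch below covers only the special case $\lambda_2 = 0$ (no fusion term), where each off-diagonal entry decouples across $k$ and has the closed form $Z^k_{ij} = S(A^k_{ij}, \alpha\lambda_1/\rho) \,/\, (1 + 2(1-\alpha)\lambda_1/\rho)$, with $S$ the soft-thresholding operator. The names `z_update_no_fusion` and `elementwise_objective` and all numeric values are assumptions; the case $\lambda_2 > 0$ needs a dedicated fused-lasso solver and is not covered here.

```python
import numpy as np


def z_update_no_fusion(A, lam1, alpha, rho):
    """Z update for lam2 = 0 only: elastic-net shrinkage of off-diagonal entries.

    A has shape (K, p, p) or (p, p). Diagonal entries carry no penalty when
    lam2 = 0, so they are copied from A unchanged.
    """
    A = np.asarray(A, dtype=float)
    c = lam1 / rho
    shrunk = np.sign(A) * np.maximum(np.abs(A) - c * alpha, 0.0)  # soft-threshold
    Z = shrunk / (1.0 + 2.0 * c * (1.0 - alpha))                  # ridge-type scaling
    idx = np.arange(A.shape[-1])
    Z[..., idx, idx] = A[..., idx, idx]
    return Z


def elementwise_objective(z, a, lam1, alpha, rho):
    """Scalar off-diagonal objective with lam2 = 0: 0.5 (z - a)^2 + (lam1 / rho) (alpha |z| + (1 - alpha) z^2)."""
    return 0.5 * (z - a) ** 2 + (lam1 / rho) * (alpha * np.abs(z) + (1.0 - alpha) * z ** 2)


# Made-up A^k = Theta^k + U^k for K = 1 and p = 2.
A = np.array([[[1.0, 0.9], [0.2, 2.0]]])
Z = z_update_no_fusion(A, lam1=1.0, alpha=0.5, rho=2.0)
print(Z)

# Brute-force comparison on a grid for the entry a = 0.9.
grid = np.linspace(-2.0, 2.0, 4001)
best_on_grid = elementwise_objective(grid, 0.9, 1.0, 0.5, 2.0).min()
print(elementwise_objective(Z[0, 0, 1], 0.9, 1.0, 0.5, 2.0) <= best_on_grid + 1e-12)
```

With $\alpha = 1$ this reduces to plain soft-thresholding, and with $\alpha = 0$ to a pure ridge-type rescaling of the off-diagonal entries.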
