Systems genetics with graphical Markov models - Robert Castelo


  1. Systems genetics with graphical Markov models
     Robert Castelo, robert.castelo@upf.edu, @robertclab
     Dept. of Experimental and Health Sciences (DCEXS), Universitat Pompeu Fabra (UPF), Barcelona
     Machine Learning for Personalized Medicine, Satellite Symposium of the ESHG Conference, Barcelona, May 19th, 2016
     Robert Castelo - robert.castelo@upf.edu - @robertclab Systems genetics with GMMs 1 / 63

  2. DCEXS/UPF is located at the Barcelona Biomedical Research Park (PRBB).

  3. Joint work with Inma Tur (Kernel Analytics, Barcelona) and Alberto Roverato (University of Bologna).
     I. Tur, A. Roverato and R. Castelo. Mapping eQTL networks with mixed graphical Markov models. Genetics, 198(4):1377-1383, 2014. http://arxiv.org/abs/1402.4547

  4. Motivation - Quantitative genetics
     Primary goal: finding the genetic basis of complex (quantitative) higher-order phenotypes (traits).
     [Figure: intercross diagram with P1, P2, F1 and F2 (Fig. by Karl Broman in "Introduction to QTL mapping in model organisms"), next to a density plot; x-axis: HDL, y-axis: Density.]
     Leduc et al. Using bioinformatics and systems genetics to dissect HDL-cholesterol genetics in an MRL/MpJ x SM/J intercross. Journal of Lipid Research, 53:1163-1175, 2012.

  5. Motivation - Quantitative genetics
     Find DNA sites along the genome associated with the phenotype, known as quantitative trait loci (QTLs).
     Simplest approach: regress the phenotype on each marker (Soller, 1976), calculating the so-called logarithm of odds (LOD) score:
     $$H_0: y_i \sim N(\mu_0, \sigma_0^2), \qquad H_1: y_i \mid g_i \sim N(\mu_{g_i}, \sigma_1^2).$$
     $$\mathrm{LOD} = \log_{10} \frac{L_1}{L_0} = \frac{n}{2} \log_{10} \frac{\mathrm{RSS}_0}{\mathrm{RSS}_1}.$$
     [Figure: LOD score along the genome; x-axis: Chromosome (1 to 19, X), y-axis: LOD score.]
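The LOD formula on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation: the function name `lod_score`, the toy HDL values and the genotype labels are all made up for the example. RSS0 is taken as the residual sum of squares around the overall mean, and RSS1 as the residual sum of squares around the per-genotype means.

```python
import math


def lod_score(phenotype, genotype):
    """LOD = (n / 2) * log10(RSS0 / RSS1) for a single marker.

    RSS0: residual sum of squares around the overall mean (H0).
    RSS1: residual sum of squares around per-genotype means (H1).
    """
    n = len(phenotype)
    grand_mean = sum(phenotype) / n
    rss0 = sum((y - grand_mean) ** 2 for y in phenotype)

    # Group phenotype values by genotype class.
    groups = {}
    for y, g in zip(phenotype, genotype):
        groups.setdefault(g, []).append(y)

    rss1 = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        rss1 += sum((y - group_mean) ** 2 for y in values)

    # Note: rss1 == 0 (a perfect fit) would make the ratio undefined.
    return (n / 2) * math.log10(rss0 / rss1)


# Hypothetical toy data: nine individuals, three genotype classes.
hdl = [80.0, 85.0, 90.0, 100.0, 105.0, 110.0, 120.0, 125.0, 130.0]
linked_marker = ["MM"] * 3 + ["MS"] * 3 + ["SS"] * 3
unlinked_marker = ["MM", "MS", "SS"] * 3

print(lod_score(hdl, linked_marker))
print(lod_score(hdl, unlinked_marker))
```

A marker whose genotype classes separate the phenotype values yields a much larger LOD score than one whose classes are spread evenly over the phenotype range.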

  6. Motivation - Quantitative genetics
     Estimate the effect size of the QTLs found using, for instance, the percentage of variance explained by the QTL:
     $$\eta^2 = \frac{\mathrm{RSS}_0 - \mathrm{RSS}_1}{(n - 1) \cdot s_Y^2} = 0.346.$$
     About 35% of the variability in HDL levels is explained by this QTL.
     [Figure: HDL by genotype; x-axis: Genotype (MM, MS, SS), y-axis: HDL.]
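As a rough sketch of the effect-size formula on this slide, the snippet below computes the proportion of variance explained for a single marker. The function name `eta_squared` and the toy data are assumptions made for illustration. Since (n - 1) times the sample variance of the phenotype equals RSS0, the same quantity can also be written as 1 - RSS1/RSS0.

```python
import statistics


def eta_squared(phenotype, genotype):
    """Proportion of phenotypic variance explained by one marker."""
    n = len(phenotype)
    grand_mean = sum(phenotype) / n
    rss0 = sum((y - grand_mean) ** 2 for y in phenotype)

    # Residual sum of squares around the per-genotype means.
    groups = {}
    for y, g in zip(phenotype, genotype):
        groups.setdefault(g, []).append(y)
    rss1 = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        rss1 += sum((y - group_mean) ** 2 for y in values)

    # statistics.variance is the sample variance (denominator n - 1).
    return (rss0 - rss1) / ((n - 1) * statistics.variance(phenotype))


# Hypothetical toy data, not the HDL data from the talk.
hdl = [80.0, 85.0, 90.0, 100.0, 105.0, 110.0, 120.0, 125.0, 130.0]
marker = ["MM"] * 3 + ["MS"] * 3 + ["SS"] * 3

print(eta_squared(hdl, marker))
```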

  7. Motivation - Quantitative genetics on genomics data
     Yeast BY x RM cross (Fig. by Rockman and Kruglyak, 2006).
     The resulting data, published by Brem and Kruglyak (2005), consist of ∼6,000 genes and ∼3,000 genotype markers.
     DNA sites along the genome associated with gene expression are called expression QTLs (eQTLs).

  8. Motivation - Quantitative genetics on genomics data
     Straightforward approach: apply classical QTL analysis methods independently to each gene expression profile (Soller, 1976):
     $$H_0: y \sim N(\mu_0, \sigma_0^2), \qquad H_1: y \mid g \sim N(\mu_g, \sigma_1^2).$$
     $$\mathrm{LOD} = \log_{10} \frac{L_1}{L_0} = \frac{n}{2} \log_{10} \frac{\mathrm{RSS}_0}{\mathrm{RSS}_1}.$$
     Plot the location of genome-wide significant eQTLs with respect to both the eQTL and the gene genomic position (dot plot).

  9. Motivation - Quantitative genetics on genomics data
     Let $\Gamma$ denote an index set for all genes, with $p_\Gamma = |\Gamma|$ (thousands).
     Let $n$ denote the number of profiled individuals (tens, hundreds).
     Let $Y = \{y_{ij}\}_{p_\Gamma \times n}$ denote the matrix of gene expression values, with $p_\Gamma \gg n$:
     $$Y = \begin{array}{c|cccc} & 1 & 2 & \cdots & n \\ \hline g_1 & y_{11} & y_{12} & \cdots & y_{1n} \\ g_2 & y_{21} & y_{22} & \cdots & y_{2n} \\ g_3 & y_{31} & y_{32} & \cdots & y_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ g_{p_\Gamma} & y_{p_\Gamma 1} & y_{p_\Gamma 2} & \cdots & y_{p_\Gamma n} \end{array}$$
     Gene expression is a high-dimensional multivariate trait.

  10. Motivation - Quantitative genetics on genomics data
      Gene expression measurements by high-throughput instruments are the result of multiple types of effects:
      - Genetic: DNA polymorphisms affecting transcription initiation and RNA processing.
      - Molecular: RNA-binding events affecting post-transcriptional regulation (e.g., RNA degradation).
      - Environmental: response of the cell to external stimuli.
      - Technical: sample preparation protocols or laboratory conditions create sample-specific biases affecting most of the genes.

      All these effects render expression measurements in $Y$ highly correlated, thereby complicating the distinction between direct and indirect effects.

  11. Motivation - Quantitative genetics on genomics data
      Think of genes and eQTLs as forming a network, which we shall call an eQTL network.
      [Figure: network with nodes g5, g15, g22 and QTL2, next to LOD score profiles for g5, g15 and g22; x-axis: Map position (cM), y-axis: LOD scores.]
      Assume that gene expression forms a $p_\Gamma$-multivariate sample following a conditional Gaussian distribution given the joint probability of all eQTLs $\Rightarrow$ mixed graphical Markov model (Lauritzen and Wermuth, 1989).

  12. Software availability: the R/Bioconductor package qpgraph
      Available at http://bioconductor.org/packages/qpgraph

  13. Outline
      1. Overview of GMMs
      2. Propagation of eQTL (genetic) additive effects
      3. Conditional independence in mixed GMMs
      4. q-Order correlation graphs
      5. A three-step estimation strategy
      6. Visualization of eQTL networks
      7. Analysis of a yeast cross
      8. Concluding remarks

  15. Overview of GMMs - undirected Gaussian GMMs
      Let $X_V$ be continuous r.v.'s and $G = (V, E)$ an undirected labeled graph:
      - $V = \{1, \dots, p\}$ are the vertices of $G$
      - $X_V \sim P(X_V) \equiv N(\mu, \Sigma)$
      - $\mu$ is the $p$-dimensional mean vector
      - $\Sigma = \{\sigma_{ij}\}_{p \times p}$ is the covariance matrix
      - $\Sigma^{-1} = \{\kappa_{ij}\}_{p \times p}$ is the concentration matrix

      Note that Pearson and partial correlation coefficients follow from scaling the covariance ($\Sigma$) and concentration ($\Sigma^{-1}$) matrices, respectively:
      $$\rho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\,\sigma_{jj}}}, \qquad \rho_{ij.R} = \frac{-\kappa_{ij}}{\sqrt{\kappa_{ii}\,\kappa_{jj}}}, \quad R = V \setminus \{i, j\}.$$
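The two scalings on this slide can be illustrated with a small, dependency-free sketch. The helper names (`invert`, `pearson`, `partial`) and the 3 x 3 covariance matrix are assumptions chosen for the example, and indices are 0-based. The matrix has entries 0.5^|i-j|, whose inverse is tridiagonal, so the first and third variables are marginally correlated but have zero partial correlation given the second.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pearson(sigma, i, j):
    """Marginal correlation: scale the covariance matrix."""
    return sigma[i][j] / math.sqrt(sigma[i][i] * sigma[j][j])


def partial(sigma, i, j):
    """Partial correlation given all other variables: scale the concentration matrix."""
    kappa = invert(sigma)
    return -kappa[i][j] / math.sqrt(kappa[i][i] * kappa[j][j])


# Hypothetical covariance matrix with entries 0.5 ** |i - j|.
sigma = [
    [1.00, 0.50, 0.25],
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 1.00],
]

print(pearson(sigma, 0, 2))  # marginal correlation is nonzero
print(partial(sigma, 0, 2))  # partial correlation is zero up to rounding
```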

  16. Overview of GMMs - undirected Gaussian GMMs
      Let $G = (V, E)$ be an undirected graph with $V = \{1, \dots, p\}$. A Gaussian graphical model can be described as follows:
      [Figure: undirected graph G on vertices 1 to 5.]
      $$\Sigma^{-1} = \begin{pmatrix} \kappa_{11} & \kappa_{12} & 0 & 0 & 0 \\ \kappa_{21} & \kappa_{22} & \kappa_{23} & \kappa_{24} & 0 \\ 0 & \kappa_{32} & \kappa_{33} & 0 & \kappa_{35} \\ 0 & \kappa_{42} & 0 & \kappa_{44} & \kappa_{45} \\ 0 & 0 & \kappa_{53} & \kappa_{54} & \kappa_{55} \end{pmatrix}$$
      A probability distribution $P(X_V)$ is undirected Markov w.r.t. $G$ if
      $$(i, j) \notin E \Rightarrow \kappa_{ij} = 0 \Leftrightarrow X_i \perp\!\!\!\perp X_j \mid X_V \setminus \{X_i, X_j\}.$$
      These models are also known as covariance selection models (Dempster, 1972) or concentration graph models (Cox and Wermuth, 1996).
      Two vertices $i$ and $j$ are separated in $G$ by a subset $S \subset V \setminus \{i, j\}$ iff every path between $i$ and $j$ intersects $S$, denoted hereafter by $i \perp_G j \mid S$.
      Global Markov property (Hammersley and Clifford, 1971):
      $$i \perp_G j \mid S \Rightarrow X_i \perp\!\!\!\perp X_j \mid X_S.$$
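Graph separation as defined on this slide can be checked with a breadth-first search that is not allowed to pass through the separating set. The sketch below is illustrative only: the function name `separated` is hypothetical, and the edge list is read off the nonzero off-diagonal cells of the concentration matrix shown on the slide (1-2, 2-3, 2-4, 3-5, 4-5).

```python
from collections import deque


def separated(edges, i, j, s):
    """Return True if every path between i and j intersects the set s.

    Assumes s does not contain i or j.
    """
    blocked = set(s)
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    # Breadth-first search from i that never enters a blocked vertex.
    seen = {i}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v == j:
                return False  # found a path that avoids s
            if v not in seen and v not in blocked:
                seen.add(v)
                queue.append(v)
    return True


# Edges implied by the zero pattern of the 5 x 5 concentration matrix.
EDGES = [(1, 2), (2, 3), (2, 4), (3, 5), (4, 5)]

print(separated(EDGES, 1, 5, {2}))      # True: every path from 1 goes through 2
print(separated(EDGES, 1, 5, {3}))      # False: 1-2-4-5 avoids 3
print(separated(EDGES, 3, 4, {2, 5}))   # True: both connecting paths are blocked
```

Under the global Markov property, each `True` result corresponds to a conditional independence between the two variables given the variables in the separating set.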

  17. Overview of GMMs - undirected Gaussian GMMs
      Consider simulating an undirected Gaussian GMM by simulating a covariance matrix $\Sigma$ such that
      1. $\Sigma$ is positive definite ($\Sigma \in S^+$),
      2. the off-diagonal cells of the scaled $\Sigma$ corresponding to the present edges in $G$ match a given marginal correlation $\rho$,
      3. the zero pattern of $\Sigma^{-1}$ matches the missing edges in $G$.

      This is not straightforward, since directly setting off-diagonal cells to zero in some initial $\Gamma \in S^+$ will not typically lead to a positive definite matrix.
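The difficulty mentioned on this slide can be seen in a small numeric example. The sketch below is an illustration under assumed values: it tests positive definiteness by attempting a Cholesky factorization, and shows that a 3 x 3 matrix with all off-diagonal cells equal to 0.9 is positive definite, while the same matrix with one pair of off-diagonal cells set to zero is not. The function name and the matrices are hypothetical.

```python
import math


def is_positive_definite(matrix):
    """Check a symmetric matrix by attempting a Cholesky factorization."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False  # non-positive pivot: not positive definite
                chol[i][j] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return True


# Hypothetical starting matrix: unit diagonal, all correlations 0.9.
full = [
    [1.0, 0.9, 0.9],
    [0.9, 1.0, 0.9],
    [0.9, 0.9, 1.0],
]

# Same matrix after setting the (1, 3) and (3, 1) cells to zero.
zeroed = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.9],
    [0.0, 0.9, 1.0],
]

print(is_positive_definite(full))    # True
print(is_positive_definite(zeroed))  # False: the determinant is negative
```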

  18. Overview of GMMs - undirected Gaussian GMMs
      Let $\Gamma_G$ be an incomplete matrix with elements $\{\gamma_{ij}\}$ for $i = j$ or $(i, j) \in G$.
      [Figure: undirected graph G on vertices 1 to 4.]
      $$\Gamma_G = \begin{pmatrix} \gamma_{11} & \gamma_{12} & \gamma_{13} & * \\ \gamma_{21} & \gamma_{22} & * & \gamma_{24} \\ \gamma_{31} & * & \gamma_{33} & \gamma_{34} \\ * & \gamma_{42} & \gamma_{43} & \gamma_{44} \end{pmatrix}$$
      $\Gamma$ is a positive completion of $\Gamma_G$ if $\Gamma \in S^+$ and $\{\Gamma^{-1}\}_{ij} = 0$ for $i \neq j$, $(i, j) \notin G$.
      Draw $\Gamma_G$ from a Wishart distribution $W_p(\Lambda, p)$, where $\Lambda = \Delta R \Delta$, $\Delta = \mathrm{diag}(\{\sqrt{1/p}\}_p)$ and $R = \{R_{ij}\}_{p \times p}$, with $R_{ij} = 1$ for $i = j$ and $R_{ij} = \rho$ for $i \neq j$.
      It is required that $\Lambda \in S^+$, and this happens if and only if $-1/(p - 1) < \rho < 1$.
      Finally, to obtain $\Sigma \equiv \Gamma$ from $\Gamma_G$, qpgraph uses the regression algorithm by Hastie, Tibshirani and Friedman (2009, pg. 634) as the matrix completion algorithm.
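The first part of this recipe, building the scale matrix and drawing the incomplete matrix, can be sketched as follows. This is not the qpgraph implementation: all function names are hypothetical, the Wishart draw uses the standard construction as a sum of outer products of Gaussian vectors (valid for integer degrees of freedom), and it is an assumption of this sketch that the incomplete matrix is obtained by keeping only the diagonal and edge cells of a full draw. The matrix completion step is not shown. Vertices are 0-based, and the edge list matches the pattern of known cells on the slide.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor; raises ValueError if not positive definite."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                chol[i][j] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return chol


def build_lambda(p, rho):
    """Scale matrix Delta R Delta with Delta = diag(sqrt(1/p)) and R equicorrelated."""
    if not -1.0 / (p - 1) < rho < 1.0:
        raise ValueError("rho must satisfy -1/(p-1) < rho < 1")
    delta = math.sqrt(1.0 / p)
    return [
        [delta * (1.0 if i == j else rho) * delta for j in range(p)]
        for i in range(p)
    ]


def draw_wishart(scale, dof, rng):
    """One Wishart draw as a sum of dof outer products z z^T with z ~ N(0, scale)."""
    p = len(scale)
    chol = cholesky(scale)
    draw = [[0.0] * p for _ in range(p)]
    for _ in range(dof):
        e = [rng.gauss(0.0, 1.0) for _ in range(p)]
        z = [sum(chol[i][k] * e[k] for k in range(i + 1)) for i in range(p)]
        for i in range(p):
            for j in range(p):
                draw[i][j] += z[i] * z[j]
    return draw


def mask_to_graph(matrix, edges):
    """Keep diagonal and edge cells; mark every other cell as unknown (None)."""
    keep = {frozenset(edge) for edge in edges}
    p = len(matrix)
    return [
        [matrix[i][j] if i == j or frozenset((i, j)) in keep else None for j in range(p)]
        for i in range(p)
    ]


# 0-based edges of the four-vertex graph: cells (1,4) and (2,3) are missing.
EDGES = [(0, 1), (0, 2), (1, 3), (2, 3)]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
gamma_g = mask_to_graph(draw_wishart(build_lambda(4, 0.5), 4, rng), EDGES)

for row in gamma_g:
    print(row)
```

Calling `build_lambda` with a value of `rho` outside the admissible interval raises an error, which mirrors the requirement that the scale matrix be positive definite.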
