
Bayesian Methods for Variable Selection with Applications to High-Dimensional Data
Part 2: Bayesian Models for Integrative Genomics
Marina Vannucci, Rice University, USA
ABS13-Italy, 06/17-21/2013

Part 2: Bayesian Models for Integrative Genomics
- Summary of methods so far (annotated bibliography).
- Models that incorporate a priori biological information.
- Bayesian networks for genomic data integration.
Ref: Vannucci, M. and Stingo, F.C. (2011). Bayesian Models for Variable Selection that Incorporate Biological Information (with discussion). In Bayesian Statistics 9 (J.M. Bernardo, M.J. Bayarri, J.O. Berger, A.P. Dawid, D. Heckerman, A.F.M. Smith and M. West eds.). Oxford University Press, 659-678.

Identification of Genomic Biomarkers
- DNA microarrays allow the parallel quantification of thousands of genes in a single experiment.
- Goal: identification (selection) of biomarkers that predict a response (clinical outcome, survival time, etc.).
- Major challenge: small n, large p.
- Biomarker selection is important for treatment strategies and diagnostic tools.
- Identifying individual genes as therapeutic targets is not sufficient. Cancer drugs are designed to target specific pathways.

Pathways: ordered series of chemical reactions in a living cell that serve different functions.
- A vast amount of biological knowledge is generated and stored in public databases: KEGG, Cell Signaling Technology (CST) Pathway, Invitrogen iPath, Reactome, ...
- Pathways can be activated or inhibited at different points.
- Genes are not independent biological elements. Information is available on "gene networks", describing relations among genes both within and between pathways.
- Signaling can proceed through branches or alternative pathways.

Available data and information:
- Response variable: $Y$ ($n \times 1$), log(time to distant metastasis)
- Covariates (gene expressions): $X$ ($n \times p$)
- Pathway-gene relationship: $S$ ($p \times K$), where $s_{jk} = I\{\text{gene } j \in \text{pathway } k\}$
- Gene-gene network: $R$ ($p \times p$), where $r_{ij} = I\{\text{direct link between genes } i \text{ and } j\}$
Therefore, we propose to:
- incorporate pathway information in gene selection for disease prediction;
- use priors that account for the gene network;
- select critical genes and pathways.
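The indicator matrices S and R defined above can be illustrated with a small sketch. The gene names, pathway memberships, and edges below are hypothetical toy inputs, and the network is assumed to be undirected (so R is symmetric); neither assumption comes from the slides.

```python
# Hypothetical toy inputs: gene names, pathway membership, and network edges.
genes = ["g1", "g2", "g3", "g4"]
pathways = {"pathA": ["g1", "g2"], "pathB": ["g2", "g3", "g4"]}
edges = [("g1", "g2"), ("g3", "g4")]


def build_S(genes, pathways):
    """Return the p x K matrix with s_jk = 1 if gene j belongs to pathway k."""
    names = list(pathways)
    return [[1 if g in pathways[k] else 0 for k in names] for g in genes]


def build_R(genes, edges):
    """Return the p x p matrix with r_ij = 1 if genes i and j are directly linked.

    Assumption: links are undirected, so the matrix is filled symmetrically.
    """
    index = {g: i for i, g in enumerate(genes)}
    p = len(genes)
    R = [[0] * p for _ in range(p)]
    for a, b in edges:
        i, j = index[a], index[b]
        R[i][j] = R[j][i] = 1
    return R


S = build_S(genes, pathways)
R = build_R(genes, edges)
```

A gene can belong to several pathways (g2 here), which is why S is kept as a full p x K indicator matrix rather than a single label per gene.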

Pathway analyses
- Gene-set enrichment analysis (Subramanian et al., 2005)
- Other pathway-based analyses:
  - Supergene (Park et al., 2007): cluster genes using GO, then filter by cluster size and PCs; perform Lasso for the selection of clusters. Selection is on clusters only, not on genes.
  - Markov random field model (Wei & Li, 2007 & 2008): gene selection. Identifies differentially expressed genes between two experimental conditions utilizing the pathway structure information.
  - Bayes models: Telesca et al. (2008) for gene selection and Li & Zhang (2010) for "motifs" selection.
- We select both genes and pathways.

Proposed Method
Pathway information is used
1. in the likelihood,
2. to elicit the prior,
3. to structure the MCMC moves.

Model - Pathway Scores and Priors
$$Y = \mathbf{1}\alpha + T\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I)$$
- $T$ is $n \times K$ and summarizes the group behavior of genes as PCA components obtained from the expression data of genes belonging to individual pathways.
- Pathway selection via a latent $K$-vector $\theta$:
$$\theta_k = \begin{cases} 1 & \text{if pathway } k \text{ is included} \\ 0 & \text{otherwise} \end{cases} \qquad k = 1, \ldots, K.$$
- Mixture prior on the regression coefficient $\beta_k$, indexed by $\theta_k$:
$$\beta_k \mid \theta_k, \sigma^2 \sim \theta_k \cdot N(\beta_0, h\sigma^2) + (1 - \theta_k) \cdot \delta_0(\beta_k).$$
- Independent Bernoulli priors for the $\theta_k$'s and conjugate priors on $\alpha$, $\sigma^2$.
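To make the mixture (spike-and-slab) prior concrete, here is a minimal sketch that draws one realization of θ and β from it. The function name `sample_prior` and all numeric values (inclusion probability, β_0, h, σ²) are illustrative assumptions; in particular σ² is held fixed here, whereas the model places a conjugate prior on it.

```python
import random


def sample_prior(K, prob_include, beta0, h, sigma2, rng):
    """Draw (theta, beta) from the Bernoulli / spike-and-slab prior.

    theta_k ~ Bernoulli(prob_include)
    beta_k  ~ N(beta0, h * sigma2) if theta_k = 1, else a point mass at 0.
    """
    theta = [1 if rng.random() < prob_include else 0 for _ in range(K)]
    sd = (h * sigma2) ** 0.5
    beta = [rng.gauss(beta0, sd) if t == 1 else 0.0 for t in theta]
    return theta, beta


# Illustrative values only (not taken from the slides).
rng = random.Random(0)
theta, beta = sample_prior(K=10, prob_include=0.1, beta0=0.0,
                           h=1.0, sigma2=1.0, rng=rng)
```

Excluded pathways get a coefficient of exactly zero, which is what lets the posterior of θ be read directly as pathway selection.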

Gene selection via a latent $p$-vector $\gamma$:
$$\gamma_j = \begin{cases} 1 & \text{if gene } j \text{ is included} \\ 0 & \text{otherwise} \end{cases} \qquad j = 1, \ldots, p.$$
Markov Random Field prior on $\gamma$:
$$P(\gamma_j \mid \theta, \gamma_i, i \in N_j) = \frac{\exp(\gamma_j F(\gamma_j))}{1 + \exp(F(\gamma_j))}, \qquad F(\gamma_j) = \mu + \eta \sum_{i \in N_j} (2\gamma_i - 1),$$
with $N_j$ the set of neighbors of gene $j$ from included pathways.
- $\mu$ controls sparsity. Higher values of $\eta$ induce more neighbors to take on the same values.
- We use a hyperprior for $\eta$: $\eta \sim \text{Gamma}(\alpha_\eta, \beta_\eta)$.
- See also Wei & Li (2008, Ann. Appl. Stat.), Telesca et al. (2008), Li & Zhang (2010, JASA).
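The conditional probability in the MRF prior can be evaluated directly. The sketch below computes P(γ_j = 1 | neighbors), which equals exp(F)/(1 + exp(F)). The `neighbors` dictionary, the function name, and the values of μ and η are hypothetical, and the neighbor sets are assumed to be already restricted to genes from included pathways.

```python
import math


def mrf_conditional(j, gamma, neighbors, mu, eta):
    """Return P(gamma_j = 1 | neighbours) with F = mu + eta * sum(2*gamma_i - 1)."""
    F = mu + eta * sum(2 * gamma[i] - 1 for i in neighbors[j])
    # exp(F) / (1 + exp(F)) written in the equivalent logistic form.
    return 1.0 / (1.0 + math.exp(-F))


# Toy network: gene 0 is linked to genes 1 and 2.
gamma = [1, 1, 0]
neighbors = {0: [1, 2], 1: [0], 2: [0]}

p0 = mrf_conditional(0, gamma, neighbors, mu=-2.0, eta=0.5)  # neighbours cancel out
p1 = mrf_conditional(1, gamma, neighbors, mu=-2.0, eta=0.5)  # one included neighbour
```

Each included neighbor adds η to F and each excluded neighbor subtracts η, so a gene surrounded by selected genes is more likely to be selected itself, while a negative μ keeps the overall selection sparse.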

Model Fitting and Posterior Inference
- Integrate out $\alpha$, $\beta$, $\sigma^2$ to get the marginal posterior
$$f(\theta, \gamma, \eta \mid Y, T) \propto f(Y \mid T, \theta, \gamma) \cdot p(\theta, \gamma \mid \eta) \cdot p(\eta)$$
- We use a 2-stage Metropolis to update $(\theta, \gamma)$:
  - pick a pathway $k$,
  - pick a gene $j$ from pathway $k$,
  - add/delete set of moves (with constraints),
  and update the parameter $\eta$ of the MRF by employing the general method proposed by Møller et al. (2006), which uses auxiliary variables.
- Inference for pathways and genes can be based on:
  - the $(\theta, \gamma)$ with largest joint posterior probability,
  - the $\theta_k$'s and $\gamma_j$'s with largest marginal posterior probabilities.
- Prediction of future samples can be made via Bayesian model averaging.
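The "pick a pathway, pick a gene, add/delete" idea can be sketched as a simplified Metropolis sampler. This is not the sampler from the slides: θ is held fixed, the pathway add/delete moves, the constraints, and the η update are omitted, and `toy_log_post` is a hypothetical stand-in for the log of the marginal posterior. The sketch only shows the two-stage pick, the flip proposal, and how marginal inclusion frequencies are accumulated.

```python
import math
import random


def run_chain(gamma, pathway_genes, log_post, n_iter, seed=0):
    """Simplified two-stage Metropolis over gamma; returns inclusion frequencies."""
    rng = random.Random(seed)
    gamma = list(gamma)
    counts = [0] * len(gamma)
    current = log_post(gamma)
    for _ in range(n_iter):
        k = rng.choice(sorted(pathway_genes))   # stage 1: pick a pathway
        j = rng.choice(pathway_genes[k])        # stage 2: pick a gene in it
        proposal = list(gamma)
        proposal[j] = 1 - proposal[j]           # add or delete gene j
        cand = log_post(proposal)
        # The pick does not depend on the current state, so the proposal
        # is symmetric and the plain Metropolis ratio applies.
        if rng.random() < math.exp(min(0.0, cand - current)):
            gamma, current = proposal, cand
        for i, g in enumerate(gamma):
            counts[i] += g
    return [c / n_iter for c in counts]


def toy_log_post(gamma):
    """Hypothetical log posterior: gene 0 is favoured, every inclusion is penalised."""
    return 3.0 * gamma[0] - 1.0 * sum(gamma)


pathway_genes = {0: [0, 1], 1: [1, 2]}  # toy pathway membership
freq = run_chain([0, 0, 0], pathway_genes, toy_log_post, n_iter=20000)
```

The returned frequencies estimate the marginal posterior probabilities of the γ_j's under the toy target; in the real model the same averaging would be applied to the sampled θ_k's as well.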

Case Study on Breast Cancer - Van't Veer et al. (2002, Nature)
- Microarray data for 76 breast cancer patients, of whom 33 developed distant metastases within 5 years.
- X: gene expression. Y: log(time to distant metastasis).
- Matrices S and R (gene-pathway and gene-gene relationships):
  - Link the probes to the Gene IDs (LocusLink) and link the Gene IDs to the pathways (KEGG).
  - R package KEGGgraph to download the gene network.
- A total of 3,592 probes, mapped to 196 pathways, were included in the study.
- Training and validation sets.
- A priori we expect about 10% of the pathways and 3% of the genes to be included.
- Vague priors on model parameters.
- Two MCMC chains with 600,000 iterations (r = 0.9996).

Figure: Trace plots of the number of included pathways and the number of included genes against iteration (×10^5).

Prediction:
- MSE = 1.57 (7 pathways & 12 genes)
- MSE = 1.93 (11 genes, Sha et al. 2006, Bioinfo.)
Selection:
Figure: Marginal posterior probability (0 to 1) of each pathway, plotted against pathway index (0 to 200). Labeled pathways: Purine metabolism, MAPK signaling pathway, Cytokine-cytokine receptor interaction, Neuroactive ligand-receptor interaction, Cell cycle, Axon guidance, Cell adhesion molecules (CAMs), Complement and coagulation cascades, Regulation of actin cytoskeleton, Insulin signaling pathway, Pathways in cancer.

Selection (cont'd):

Singleton genes (no direct neighbor selected): ACACB (10), C4A (8,12), CALM1 (10), CCNB2 (5), CD4 (7), CDC2 (5), CLDN11 (7), FZD9 (11), GYS2 (10), HIST1H2BN (12), IFNA7 (3), NFASC (7), NRCAM (7), PCK1 (10), PFKP (10), PPARGC1A (10), PXN (9)
Island 1: ACTB (9), ACTG1 (9), ITGA1 (9), ITGA7 (9), ITGB3 (9), ITGB4 (9), ITGB6 (9), ITGB8 (7,10), MYL5 (9), MYL9 (9), PDPK1 (10), PIK3CD (9,10,11), PLA2G4A (2), PLCG1 (11), PRKCA (2,11), PRKY (2,10), PRKY (2,10), PTGS2 (11), SOCS3 (10)
Island 2: ACVR1B (2,3,11), ACVR1B (2,3,11), TGFB3 (2,3,5,11)
Island 3: ENTPD3 (1), GMPS (1)

Table: The 41 selected genes divided by islands and with associated pathway indices (in parentheses). The pathway indices correspond to: 1-Purine metabolism, 2-MAPK signaling pathway, 3-Cytokine-cytokine receptor interaction, 4-Neuroactive ligand-receptor interaction, 5-Cell cycle, 6-Axon guidance, 7-Cell adhesion molecules (CAMs), 8-Complement and coagulation cascades, 9-Regulation of actin cytoskeleton, 10-Insulin signaling pathway, 11-Pathways in cancer, 12-Systemic lupus erythematosus.

Island 8: DUSP3, DUSP4, MAPK10
Figure: Some selected pathways and islands (sets of connected genes). Stingo et al. (Ann. Appl. Stat., 2011)
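Since islands are described as sets of connected genes, one plausible way to recover them is to take connected components of the gene network R restricted to the selected genes. The sketch below assumes exactly that reading; the function name `find_islands` and the toy network are hypothetical. A component of size one corresponds to a singleton gene (no direct neighbor selected).

```python
def find_islands(selected, R):
    """Group selected gene indices into connected components of R.

    Only links between two selected genes are followed, so two selected
    genes joined solely through an unselected gene stay in separate islands.
    """
    selected = set(selected)
    seen, islands = set(), []
    for start in sorted(selected):
        if start in seen:
            continue
        stack, island = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            island.append(i)
            for j in sorted(selected):
                if R[i][j] and j not in seen:
                    seen.add(j)
                    stack.append(j)
        islands.append(sorted(island))
    return islands


# Toy symmetric network on 5 genes with links 0-1, 1-2 and 3-4.
R_toy = [[0, 1, 0, 0, 0],
         [1, 0, 1, 0, 0],
         [0, 1, 0, 0, 0],
         [0, 0, 0, 0, 1],
         [0, 0, 0, 1, 0]]

islands_a = find_islands([0, 1, 3], R_toy)
islands_b = find_islands([0, 2, 3, 4], R_toy)
```

In the second call genes 0 and 2 are linked only through the unselected gene 1, so they are reported as two singletons rather than one island.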
