A Bayesian clustering approach for detecting gene-gene interactions

A Bayesian clustering approach for detecting gene-gene interactions in high-dimensional genotype data
Sui-Pi Chen and Guan-Hua Huang
Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan
Email: ghuang@stat.nctu.edu.tw
2012.8.16

Outline
1 Motivation
2 Methods for detecting gene-gene interaction
3 Proposed method: ABCDE
4 Simulation
5 Real data
6 Efficient Stochastic Search
7 Conclusion


Motivation
[Figure: diagram of contributing factors. Recoverable labels: cultural factors, common environment, polygenic background, individual environment.]

Single nucleotide polymorphism (SNP)
- A SNP is a DNA sequence variation.
- Two alleles: A and a.
- SNPs are treated as categorical features with three possible values: AA, Aa, aa.
- Relabel: AA (2), Aa (1), aa (0).
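The relabelling on this slide can be sketched in a few lines of Python. The helper name `encode_genotypes` and the decision to accept the unordered spelling "aA" are assumptions made for illustration; the slide only specifies AA (2), Aa (1), aa (0).

```python
# Hypothetical lookup table: the code equals the number of A alleles.
# Accepting "aA" as a synonym of "Aa" is an assumption, not part of the slide.
GENOTYPE_CODE = {"AA": 2, "Aa": 1, "aA": 1, "aa": 0}


def encode_genotypes(genotypes):
    """Map genotype strings to the 2/1/0 coding (AA=2, Aa=1, aa=0)."""
    return [GENOTYPE_CODE[g] for g in genotypes]


codes = encode_genotypes(["AA", "Aa", "aa", "aA"])
print(codes)
```

An unknown genotype string raises `KeyError`, which is usually preferable to silently coding it as missing.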

What is gene-gene interaction (epistasis)?
- The effects of a given gene on a biological trait are masked or enhanced by one or more other genes.
- An increasing body of evidence suggests that epistasis plays an important role in susceptibility to complex human diseases such as type 1 diabetes, breast cancer, obesity, and schizophrenia.
- Growing evidence also confirms that genes can display interaction effects without displaying marginal effects.

2 Methods for detecting gene-gene interaction
- MDR
- BEAM

Methods for detecting gene-gene interaction
[Figure: diagram of four families of methods for detecting epistasis: traditional method, two-stage method, data mining, Bayesian model selection.]

Methods for detecting gene-gene interaction
Traditional method
- Logistic regression, contingency table $\chi^2$ test.
- It does not include interaction terms for loci without main effects.
- For high-dimensional data with high-order interactions, the contingency table has many empty cells.
Two-stage method
- A subset of loci that pass some single-locus significance threshold is chosen as the "filtered" subset.
- An exhaustive search of all two-locus or higher-order interactions is carried out on the "filtered" subset.
Data mining
- Nonparametric method.
- Does not perform an exhaustive search.
- Multifactor Dimensionality Reduction (MDR).
Bayesian model selection
- Bayesian epistasis association mapping (BEAM).
- Algorithm via Bayesian Clustering to Detect Epistasis (ABCDE).

Multifactor Dimensionality Reduction (MDR)
- Step 1: Form the 2-locus combinations, e.g. SNPs 1, 2, 3 give (1,2), (1,3), (2,3).
- Step 2: Calculate case-control ratios for each multilocus genotype.
- Step 3: Identify high-risk multilocus genotypes.
- Step 4: Calculate the prediction error (PE).
- Step 5: Cross-validation: average the PE.
- Step 6: Select the best 2-locus model.
[Figure: flow diagram of the six steps, with a SNP1-by-SNP2 genotype grid.]
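The six MDR steps can be illustrated with a minimal Python sketch. It is not the reference MDR software: the function names, the high-risk threshold of 1.0 (reasonable only for balanced case-control data), and the interleaved fold assignment are all assumptions chosen to keep the example short.

```python
from collections import Counter
from itertools import combinations


def high_risk_cells(geno, labels, pair, threshold=1.0):
    """Steps 2-3: a 2-locus genotype cell is high-risk if cases > threshold * controls."""
    a, b = pair
    cases, controls = Counter(), Counter()
    for row, y in zip(geno, labels):
        (cases if y == 1 else controls)[(row[a], row[b])] += 1
    cells = set(cases) | set(controls)
    return {c for c in cells if cases[c] > threshold * controls[c]}


def prediction_error(geno, labels, pair, risky):
    """Step 4: misclassification rate when high-risk cells are predicted as cases."""
    a, b = pair
    wrong = sum(((row[a], row[b]) in risky) != (y == 1)
                for row, y in zip(geno, labels))
    return wrong / len(labels)


def best_pair(geno, labels, n_folds=2):
    """Steps 1, 5, 6: score every SNP pair by cross-validated average PE."""
    n = len(geno)
    folds = [set(range(k, n, n_folds)) for k in range(n_folds)]  # assumed fold scheme
    scores = {}
    for pair in combinations(range(len(geno[0])), 2):
        errors = []
        for held_out in folds:
            train = [i for i in range(n) if i not in held_out]
            test = sorted(held_out)
            risky = high_risk_cells([geno[i] for i in train],
                                    [labels[i] for i in train], pair)
            errors.append(prediction_error([geno[i] for i in test],
                                           [labels[i] for i in test], pair, risky))
        scores[pair] = sum(errors) / n_folds
    return min(scores, key=scores.get), scores


# Toy data (invented): disease status depends only on whether SNP 0 and SNP 1
# match, so neither SNP has a marginal effect; SNP 2 is noise.
geno, labels = [], []
for s0 in (0, 2):
    for s1 in (0, 2):
        for rep in range(4):
            geno.append([s0, s1, rep % 3])
            labels.append(1 if s0 == s1 else 0)

best, scores = best_pair(geno, labels)
print(best, scores)
```

On this toy data the pair of interacting SNPs is selected even though a single-locus test would see nothing, which is the situation MDR is meant to handle.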

MDR
- From all best models, the model with minimal average prediction error is the final best model.
- MDR is a data reduction strategy that is nonparametric and genetic-model-free.
- Permutation test for the final best model: MDR is applied to 1000 permuted datasets, and the PEs of the resulting 1000 final best models form an empirical distribution against which the PE of the final best model for the original data is compared to estimate a p-value.
- Note: this permutation test includes the variation of the search.
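A permutation test of this kind can be sketched as follows. The function name, the `statistic` callback, and the (hits + 1) / (n_perm + 1) estimate are assumptions for illustration. To capture the variation of the search, as the slide notes, `statistic` should rerun the whole model search on each permuted dataset; the toy statistic below does not, and only demonstrates the mechanics.

```python
import random


def permutation_p_value(labels, statistic, n_perm=1000, seed=0):
    """Estimate a p-value for a 'smaller is better' statistic such as prediction error.

    `statistic` takes a list of case/control labels and returns a number.
    """
    rng = random.Random(seed)
    observed = statistic(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype link
        if statistic(shuffled) <= observed:
            hits += 1
    # Adding 1 to numerator and denominator avoids reporting p = 0 (a common convention).
    return (hits + 1) / (n_perm + 1)


# Toy example (invented): a fixed predictor that matches the labels exactly.
predictor = [0, 1] * 10
labels = list(predictor)


def error_rate(y):
    return sum(p != v for p, v in zip(predictor, y)) / len(y)


p_value = permutation_p_value(labels, error_rate, n_perm=200)
print(p_value)
```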

BEAM algorithm
- BEAM (Zhang and Liu, 2007) is designed for case-control studies.
- A Metropolis-Hastings algorithm yields posterior probabilities that
  - each SNP is not associated with the disease,
  - each SNP is associated with the disease,
  - each SNP is involved with other SNPs in epistasis.
- The B statistic tests each SNP or set of SNPs for significant association; it is asymptotically distributed as a shifted $\chi^2$ with $3^k - 1$ degrees of freedom.

BEAM algorithm
- $I = (I_1, \cdots, I_L)$ indicates the membership of the SNPs, with $I_j = 0, 1, 2$.
- BEAM found no significant interactions in the AMD data.
[Figure: diagram relating the SNP groups to Disease.]

3 Proposed method: ABCDE
- Model
- Stochastic search
- Permutation test

Algorithm via Bayesian Clustering to Detect Epistasis (ABCDE)
[Figure: two panels, (a) BEAM and (b) ABCDE, each showing SNP groups with independent effects linked to Disease.]

ABCDE algorithm
- A Bayesian clustering approach for case-control studies.
- The Gibbs weighted Chinese restaurant (GWCR) procedure yields posterior probabilities that
  - each SNP is associated with the disease,
  - clustered SNPs are associated with the disease.
- Permutation test for the candidate disease subset selected by ABCDE:
  - 10-fold cross-validation,
  - the heart of the MDR approach: dimensional reduction.

Example
- A partition of the SNPs: $c = (C_1, \cdots, C_{n(c)})$, e.g. $c = (\{1\}, \{2, 3\}, \{4, 5\}, \{6\})$.
- Add the group indicator $a = (a_1, a_2, \cdots, a_{n(c)})$.
- Group membership of subset $C_j$: $a_j \in \{0, 1, 2, \cdots, g(c)\}$.
- The partition of interest is $h = (H_1, \cdots, H_{n(h)})$, where $H_j = (C_j, a_j)$, e.g. $h = ((\{1\}, \{2, 3\}, \{4, 5\}, \{6\}), (0, 2, 2, 1))$.
[Figure: SNP 1 to SNP 6 linked to Disease.]
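One way to hold the labelled partition $h$ in code is a list of (subset, group) pairs. The sketch below is an assumed representation, not the authors' implementation; `make_partition` is a hypothetical helper that also checks that the subsets really partition SNPs 1 to 6.

```python
def make_partition(clusters, groups, n_snps):
    """Pair each SNP subset C_j with its group label a_j, giving H_j = (C_j, a_j)."""
    if len(clusters) != len(groups):
        raise ValueError("need exactly one group label per subset")
    covered = sorted(snp for cluster in clusters for snp in cluster)
    if covered != list(range(1, n_snps + 1)):
        raise ValueError("subsets must cover every SNP exactly once")
    return [(frozenset(c), a) for c, a in zip(clusters, groups)]


# The example from the slide: c = ({1}, {2,3}, {4,5}, {6}), a = (0, 2, 2, 1).
h = make_partition([{1}, {2, 3}, {4, 5}, {6}], [0, 2, 2, 1], n_snps=6)

# Collect the SNPs that share a group label.
snps_in_group = {}
for subset, group in h:
    snps_in_group.setdefault(group, []).extend(sorted(subset))
print(snps_in_group)
```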

Notations in ABCDE
- SNPs are treated as categorical features with three possible values: AA (2), Aa (1), aa (0).
- $N_d$ cases and $N_u$ controls are genotyped at $L$ SNPs.
- $G = (D, U)$, where $D = (d_1, d_2, \cdots, d_{N_d})$ is the case genotype data and $U = (u_1, u_2, \cdots, u_{N_u})$ is the control genotype data.
- Genotypes of patient $i$ at $L$ SNPs: $d_i = (d_{i1}, \cdots, d_{iL})$.
- Genotypes of control $i$ at $L$ SNPs: $u_i = (u_{i1}, \cdots, u_{iL})$.
Example genotype data (labelled SNP1, SNP2, ..., SNP10 in the original figure):
Case         Control
0210012112   0122201110
0120222110   0222001222
...          ...
1122100021   1002222110

Product partition model

The data model: Group 0
Case genotype frequencies at unlinked SNPs are the same as control frequencies.

Case
Genotype   AA          Aa          aa
Count      $m_{0j1}$   $m_{0j2}$   $m_{0j3}$

Control
Genotype   AA          Aa          aa
Count      $n_{0j1}$   $n_{0j2}$   $n_{0j3}$

Case + Control
Genotype      AA                    Aa                    aa
Frequencies   $\theta_{0j1}$        $\theta_{0j2}$        $\theta_{0j3}$
Count         $m_{0j1} + n_{0j1}$   $m_{0j2} + n_{0j2}$   $m_{0j3} + n_{0j3}$
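The counts in these tables can be tallied directly from genotype data. The sketch below assumes each individual is a list of 2/1/0 codes, one per SNP, following the notation $d_i = (d_{i1}, \cdots, d_{iL})$; the function name and the toy data are invented for illustration.

```python
def genotype_counts(genotypes, snp):
    """Return counts in the order (AA, Aa, aa) at one SNP (0-based index).

    `genotypes` is a list of individuals, each a list of 2/1/0 codes.
    """
    counts = [0, 0, 0]
    for person in genotypes:
        counts[2 - person[snp]] += 1  # code 2 (AA) -> slot 0, code 0 (aa) -> slot 2
    return counts


# Toy data (invented): three cases and three controls genotyped at two SNPs.
cases = [[0, 2], [0, 1], [1, 1]]
controls = [[0, 1], [0, 2], [1, 0]]

m = genotype_counts(cases, snp=0)     # case counts at the first SNP
n = genotype_counts(controls, snp=0)  # control counts at the first SNP
pooled = [mi + ni for mi, ni in zip(m, n)]
print(m, n, pooled)
```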

The data model: Group 0
- The conditional distribution of $G_{C_j}$ given $h$ and $\theta_{0j}$ is
  $$f_0(G_{C_j} \mid \theta_{0j}) = \prod_{i=1}^{3} \theta_{0ji}^{(m_{0ji} + n_{0ji})}.$$
- Specify a Dirichlet($\alpha_0$) prior for $\theta_{0j} = (\theta_{0j1}, \theta_{0j2}, \theta_{0j3})$, where $\alpha_0 = (\alpha_{01}, \alpha_{02}, \alpha_{03})$.
- Integrating out $\theta_{0j}$ gives the marginal distribution given $h$:
  $$f_0(G_{C_j}) = \frac{\Gamma(|\alpha_0|)}{\Gamma(|\alpha_0| + N_d + N_u)} \prod_{i=1}^{3} \frac{\Gamma(\alpha_{0i} + m_{0ji} + n_{0ji})}{\Gamma(\alpha_{0i})},$$
  where $|\alpha_0|$ is the sum of all elements in $\alpha_0$.
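The marginal distribution for a Group 0 SNP is straightforward to evaluate on the log scale with the log-gamma function. This is a minimal sketch: the function name is hypothetical, and the default $\alpha_0 = (1, 1, 1)$ is an assumption, since the slides do not state the hyperparameter values.

```python
from math import exp, lgamma


def log_marginal_group0(case_counts, control_counts, alpha=(1.0, 1.0, 1.0)):
    """Log of f_0(G_Cj) for one SNP: Dirichlet prior integrated against pooled counts.

    case_counts and control_counts are (AA, Aa, aa) counts; alpha is an assumed prior.
    """
    total = sum(case_counts) + sum(control_counts)  # N_d + N_u
    alpha_sum = sum(alpha)                          # |alpha_0|
    log_f = lgamma(alpha_sum) - lgamma(alpha_sum + total)
    for a, m, n in zip(alpha, case_counts, control_counts):
        log_f += lgamma(a + m + n) - lgamma(a)
    return log_f


# Toy counts (invented): pooled counts are (0, 2, 4) over six individuals.
log_f = log_marginal_group0([0, 1, 2], [0, 1, 2])
print(log_f, exp(log_f))
```

Working with `lgamma` rather than the gamma function itself avoids overflow once $N_d + N_u$ reaches a few hundred.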
