Detecting gene-gene interactions in high-throughput genotype data



SLIDE 1

Detecting gene-gene interactions in high-throughput genotype data through a Bayesian clustering procedure

Sui-Pi Chen and Guan-Hua Huang

Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan
Email: ghuang@stat.nctu.edu.tw

2012.11.20

SLIDE 2

Motivation

[Figure labels: cultural factors; individual environment; polygenic background; common environment]

SLIDE 3

Single nucleotide polymorphism (SNP)

  • A DNA sequence variation
  • Two alleles: A and a
  • SNPs are treated as categorical features with three possible values: AA, Aa, aa
  • Relabel: AA (2), Aa (1), aa (0)
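The relabeling above can be sketched in a few lines of Python. The function name, the acceptance of "aA" as a heterozygote, and the use of `None` for unrecognized calls are illustrative assumptions, not part of the original slides.

```python
# Coding from the slide: AA -> 2, Aa -> 1, aa -> 0 (the count of "A" alleles).
# Accepting "aA" as a heterozygote is an assumption made for convenience.
GENOTYPE_CODE = {"AA": 2, "Aa": 1, "aA": 1, "aa": 0}


def encode_genotypes(calls):
    """Map genotype strings to 0/1/2; unrecognized calls become None (assumed missing marker)."""
    return [GENOTYPE_CODE.get(call) for call in calls]


encoded = encode_genotypes(["AA", "Aa", "aa", "aA", "NN"])
print(encoded)  # [2, 1, 0, 1, None]
```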

SLIDE 4

What is gene-gene interaction (epistasis)?

  • The effects of a given gene on a biological trait are masked or enhanced by one or more other genes.
  • An increasing body of evidence suggests that epistasis plays an important role in susceptibility to complex human diseases such as type 1 diabetes, breast cancer, obesity, and schizophrenia.
  • Further evidence confirms that some genes display interaction effects without displaying marginal effects.
  • When thousands of genes from high-throughput SNP arrays are analyzed, the computational burden further complicates the problem.
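To make "interaction effects without marginal effects" concrete, here is a small worked example. The penetrance table is hypothetical (it is not taken from the slides), and the two loci are assumed to be independent with allele frequency 0.5 under Hardy-Weinberg proportions.

```python
# Genotype frequencies for codes 0, 1, 2 under Hardy-Weinberg with allele
# frequency 0.5 (assumption for this illustration).
FREQ = [0.25, 0.5, 0.25]

# Hypothetical penetrance table: PENETRANCE[g1][g2] = P(disease | g1, g2).
PENETRANCE = [
    [0.0, 0.0, 1.0],
    [0.0, 0.5, 0.0],
    [1.0, 0.0, 0.0],
]


def marginal_penetrance(table, freq, locus):
    """P(disease | genotype at one locus), averaging over the other locus."""
    if locus == 0:
        return [sum(table[g][h] * freq[h] for h in range(3)) for g in range(3)]
    return [sum(table[h][g] * freq[h] for h in range(3)) for g in range(3)]


print(marginal_penetrance(PENETRANCE, FREQ, 0))  # [0.25, 0.25, 0.25]
print(marginal_penetrance(PENETRANCE, FREQ, 1))  # [0.25, 0.25, 0.25]
```

Disease risk depends strongly on the genotype pair, yet each locus taken alone shows the same risk for all three genotypes, so a single-locus test would see nothing.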

SLIDE 5

Methods for detecting gene-gene interaction

Approaches to detecting epistasis:

  • Traditional methods
  • Two-stage methods
  • Data mining
  • Bayesian model selection

SLIDE 6

Methods for detecting gene-gene interaction

Traditional method
  – Logistic regression, contingency table χ2 test
  – It does not include interaction terms without main effects.
  – For high-dimensional data with high-order interactions, the contingency table has many empty cells.

Two-stage method
  – A subset of loci that pass some single-locus significance threshold is chosen as the "filtered" subset.
  – An exhaustive search of all two-locus or higher-order interactions is carried out on the "filtered" subset.

Data-mining
  – Nonparametric method
  – Does not perform an exhaustive search
  – Multifactor Dimensionality Reduction (MDR)

Bayesian model selection
  – Bayesian epistasis association mapping (BEAM)
  – Algorithm via Bayesian Clustering to Detect Epistasis (ABCDE)
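A minimal sketch of the two-stage idea, under stated assumptions: the single-locus filter is a Pearson χ2 statistic on a 2 x 3 genotype table, the threshold of 5.99 is a hypothetical choice, and the SNP names and toy data are invented for illustration. It is not the procedure of any specific published method.

```python
from itertools import combinations


def chi2_stat(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            if exp > 0:  # skip empty columns to avoid dividing by zero
                stat += (obs - exp) ** 2 / exp
    return stat


def genotype_table(genos, status):
    """2 x 3 table of counts: rows = control (0) / case (1), columns = genotype 0/1/2."""
    table = [[0, 0, 0], [0, 0, 0]]
    for g, y in zip(genos, status):
        table[y][g] += 1
    return table


def two_stage_pairs(snps, status, threshold):
    """Stage 1: keep SNPs whose single-locus statistic reaches the threshold.
    Stage 2: enumerate every pair within the filtered subset."""
    kept = [name for name, genos in snps.items()
            if chi2_stat(genotype_table(genos, status)) >= threshold]
    return kept, list(combinations(kept, 2))


# Toy data (hypothetical): 4 controls followed by 4 cases.
status = [0, 0, 0, 0, 1, 1, 1, 1]
snps = {
    "snpA": [0, 0, 0, 1, 2, 2, 2, 1],
    "snpB": [0, 1, 2, 1, 0, 1, 2, 1],  # no marginal signal at all
    "snpC": [0, 0, 0, 1, 1, 2, 2, 2],
}
kept, pairs = two_stage_pairs(snps, status, threshold=5.99)
print(kept)   # ['snpA', 'snpC']
print(pairs)  # [('snpA', 'snpC')]
```

The filter shrinks the search space, but a SNP with no marginal signal (here `snpB`) is dropped before any interaction involving it can be tested.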

SLIDE 7

BEAM algorithm

BEAM (Zhang and Liu, 2007)

Case-control study
Metropolis-Hastings algorithm
Posterior probabilities that

  • each SNP is not associated with the disease
  • each SNP is associated with the disease
  • each SNP is involved with other SNPs in epistasis

B statistic

  • tests each SNP or set of SNPs for significant association
  • asymptotically distributed as a shifted χ2 with 3^k − 1 degrees of freedom
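As a quick worked example, the snippet below takes the degrees of freedom to be 3^k − 1, one less than the number of genotype combinations over k SNPs with three genotypes each. Reading the exponent this way is an assumption about the slide's notation, and the function name is illustrative.

```python
def b_statistic_df(k):
    """Degrees of freedom 3**k - 1 for a set of k SNPs (three genotypes per SNP)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return 3 ** k - 1


for k in range(1, 5):
    print(k, b_statistic_df(k))
# 1 2
# 2 8
# 3 26
# 4 80
```

The rapid growth with k is one reason high-order interaction tests need large samples.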

SLIDE 8

BEAM algorithm

  • I = (I_1, ..., I_L) indicates the group membership of the L SNPs, with I_j = 0, 1, or 2.
  • BEAM found no significant interactions in the AMD (age-related macular degeneration) data.

[Figure: BEAM model diagram; node label: Disease]
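The membership indicator can be pictured as a simple partition of the SNP list. In this sketch the mapping of values to groups (0 = not associated, 1 = associated, 2 = involved in epistasis) is an assumption based on the three groups listed for BEAM, and the SNP names are hypothetical.

```python
# Assumed meaning of the indicator values; the labels are illustrative.
GROUPS = {0: "not associated", 1: "associated", 2: "epistasis"}


def split_by_membership(snp_names, indicator):
    """Group SNP names by their membership value I_j in {0, 1, 2}."""
    if len(snp_names) != len(indicator):
        raise ValueError("one indicator value is needed per SNP")
    groups = {label: [] for label in GROUPS.values()}
    for name, value in zip(snp_names, indicator):
        groups[GROUPS[value]].append(name)
    return groups


snp_names = ["snp1", "snp2", "snp3", "snp4", "snp5"]
indicator = [0, 2, 1, 2, 0]
membership = split_by_membership(snp_names, indicator)
print(membership["epistasis"])  # ['snp2', 'snp4']
```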

SLIDE 9

Algorithm via Bayesian Clustering to Detect Epistasis (ABCDE)

[Figure: two model diagrams, (a) BEAM and (b) ABCDE. Each panel contains a "Disease" node; the label "Independent effect" appears three times with panel (b).]

SLIDE 10

ABCDE algorithm

Bayesian clustering approach
Case-control study
Gibbs weighted Chinese restaurant (GWCR) procedure
Posterior probabilities that

  • each SNP is associated with the disease
  • each cluster of SNPs is associated with the disease

Permutation test for the candidate disease subsets selected by ABCDE

  • 10-fold cross-validation
  • the heart of the MDR approach: dimension reduction
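The permutation-test idea can be sketched generically. The statistic used here is an MDR-style balanced accuracy computed on the full data without cross-validation, which is a simplification chosen for brevity and not the authors' exact procedure. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def mdr_balanced_accuracy(genotypes, status):
    """MDR-style reduction: pool multi-locus genotype cells into high/low risk,
    then score the resulting binary classifier by balanced accuracy."""
    n_case = sum(status)
    n_ctrl = len(status) - n_case
    cases = Counter(g for g, y in zip(genotypes, status) if y == 1)
    ctrls = Counter(g for g, y in zip(genotypes, status) if y == 0)
    # A cell is high risk when its case:control ratio exceeds the overall ratio.
    high = {g for g in set(genotypes) if cases[g] * n_ctrl > ctrls[g] * n_case}
    sensitivity = sum(cases[g] for g in high) / n_case
    specificity = sum(n for g, n in ctrls.items() if g not in high) / n_ctrl
    return (sensitivity + specificity) / 2


def permutation_p_value(genotypes, status, statistic, n_perm=1000, seed=0):
    """Shuffle case/control labels and count how often the statistic on the
    shuffled labels is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = statistic(genotypes, status)
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(genotypes, shuffled) >= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy two-SNP data (hypothetical): each tuple is one individual's genotype pair.
cells = [(0, 2), (2, 0), (1, 1), (0, 0), (2, 2), (0, 1)]
labels = [1, 1, 1, 0, 0, 0]
genotypes = cells * 5
status = labels * 5
observed, p_value = permutation_p_value(
    genotypes, status, mdr_balanced_accuracy, n_perm=200)
print(observed, p_value)
```

Because the same statistic is recomputed on every shuffled label vector, any optimism from scoring on the training data affects the observed value and the null distribution alike.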

SLIDE 11

Product partition model

SLIDE 12

Simulation

To evaluate the performance of ABCDE, we simulated data from 10 different models.

  • Single-set models (models 1-5)
  • Multiple-set models (models 6-8)
  • LD-extend models (models 9-10)

Comparison between ABCDE and BEAM.
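A minimal sketch of the ingredients of such a simulation study, under assumptions not stated on the slides: genotypes are drawn independently under Hardy-Weinberg proportions for a given minor allele frequency (MAF), codes count copies of the minor allele, and power is estimated as the fraction of simulated data sets in which the true SNP set was detected. Function names are hypothetical and the ten disease models themselves are not reproduced.

```python
import random


def hwe_frequencies(maf):
    """Genotype frequencies for 0, 1, 2 copies of the minor allele under HWE."""
    return [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]


def simulate_genotypes(n_individuals, n_snps, maf, seed=0):
    """Draw an n_individuals x n_snps matrix of independent genotype codes."""
    rng = random.Random(seed)
    freqs = hwe_frequencies(maf)
    return [rng.choices([0, 1, 2], weights=freqs, k=n_snps)
            for _ in range(n_individuals)]


def estimate_power(detected_flags):
    """Power = share of simulated data sets in which the true set was found."""
    return sum(detected_flags) / len(detected_flags)


data = simulate_genotypes(n_individuals=100, n_snps=5, maf=0.2)
print(len(data), len(data[0]))
print(estimate_power([True, False, True, True]))  # 0.75
```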

SLIDE 13

Single-set models

[Figure: diagrams of Models 1-5, each linking SNP sets to a "disease" node. SNP-set labels in order of appearance: "1 2"; "1,2"; "1,2,3"; "1,2,3,4,5,6"; "1,2".]

SLIDE 14

Result for Single-set models

[Figure: power (0.0 to 1.0) versus MAF (0.05, 0.1, 0.2, 0.5) for ABCDE and BEAM; one panel each for Models 1-5.]

SLIDE 15

Multiple-set models and LD-extend models

[Figure: diagrams of Models 6-10, each linking SNP sets to a "disease" node. SNP-set labels in order of appearance: "1,2 3,4"; "1,2 3,4,5"; "1,2,3 4 5"; "1,2 3,4 5 6"; "1,2 3,4 5 6 7 8".]

SLIDE 16

Result for Multiple-set models and LD-extend models

[Figure: power (0.0 to 1.0) versus MAF (0.05, 0.1, 0.2, 0.5); one panel each for Models 6-10.]

SLIDE 17

Real data

  • Detect pairwise and/or higher-order SNP interactions and understand the genetic architecture of schizophrenia through ABCDE and BEAM.
  • 1512 individuals, including 912 schizophrenia cases and 600 controls.

Gene      Chr   Number of SNPs
DISC1     1q    16
LMBRD1    6q    11
DPYSL2    8p    14
TRIM35    8p    10
PTK2B     8p    19
NRG1      8p    10
DAO       12q   5
G72       13q   5
RASD2     22q   4
CACNG2    22q   6

SLIDE 18

Flow chart: quality control

Input: 1512 samples (912 cases, 600 controls), 100 SNPs (10 genes)

Quality control (Haploview)

Exclusion criterion for samples

  • individual with genotype call rate (GCR) < 70%

Exclusion criteria for SNPs

  • Hardy-Weinberg (HW) p-value < 0.0001
  • GCR < 75%
  • MAF < 0.005

Output: 1509 samples (909 cases, 600 controls), 95 SNPs (10 genes)
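The SNP exclusion criteria above can be expressed as a small filter. This is a sketch under stated assumptions, not Haploview's implementation: missing calls are marked with `None`, the Hardy-Weinberg check is a 1-df χ2 goodness-of-fit test (swapped in here for simplicity), and all function names are hypothetical.

```python
import math


def call_rate(genos):
    """Fraction of non-missing genotype calls (None marks a missing call)."""
    return sum(g is not None for g in genos) / len(genos)


def minor_allele_freq(genos):
    """Frequency of the rarer allele among called genotypes coded 0/1/2."""
    called = [g for g in genos if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def hwe_p_value(genos):
    """Chi-square goodness-of-fit test (1 df) against Hardy-Weinberg proportions."""
    called = [g for g in genos if g is not None]
    n = len(called)
    p = sum(called) / (2 * n)
    q = 1 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    observed = [called.count(code) for code in (0, 1, 2)]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2))


def snp_passes_qc(genos, hw_min=1e-4, gcr_min=0.75, maf_min=0.005):
    """Apply the three SNP-level thresholds listed on the slide."""
    return (call_rate(genos) >= gcr_min
            and minor_allele_freq(genos) >= maf_min
            and hwe_p_value(genos) >= hw_min)


in_equilibrium = [0] * 25 + [1] * 50 + [2] * 25
all_heterozygous = [1] * 100
poorly_called = [None] * 30 + [0] * 35 + [1] * 30 + [2] * 5
print(snp_passes_qc(in_equilibrium))    # True
print(snp_passes_qc(all_heterozygous))  # False
print(snp_passes_qc(poorly_called))     # False
```

The call-rate check runs first so that a SNP with no called genotypes is rejected before the frequency calculations are attempted. The sample-level criterion can reuse `call_rate` on one individual's genotypes with a 0.70 cutoff.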

SLIDE 19

Result

Table: Identified significant epistatic sets by BEAM using all 95 SNPs.

SNP         Chr.  Gene   B-statistic (p-value)   BA (p-value)  PA (p-value)
rsDISC1P-3  1q    DISC1  55.19 (9.89 × 10^-11)   0.5944 (0)    0.5557 (0.018)
rsDISC1-23  1q    DISC1  31.31 (1.51 × 10^-5)    0.5705 (0)    0.5416 (0.224)
rsDPYSL-4   8p    DPYSL  21.26 (0.002)           0.5561 (0)    0.5156 (0.399)
rsTRIM35-5  8p    TRIM   32.23 (9.52 × 10^-6)    0.5693 (0)    0.5296 (0.386)
rsNRG1P-7   8p    NRG1   59.88 (9.44 × 10^-12)   0.5996 (0)    0.5815 (0.024)
rsG72-E-2   13q   G72    43.16 (4.03 × 10^-8)    0.5839 (0)    0.5695 (0.029)

SLIDE 20

Result

Table: Identified significant epistatic sets by ABCDE using all 95 SNPs.

SNPs                                  Chr.         Gene          B-statistic (p-value)  BA (p-value)   PA (p-value)
rsDPYSL-15, rsSDPYSL2-11              8p           DPYSL         58.48 (4 × 10^-6)      0.5304 (0.01)  0.5933 (0.005)
rsSTRIM35-1, rsTRIM35-2, rsTRIM35-5   8p           TRIM35        127.97 (0)             0.5647 (0)     0.5146 (0.412)
rsSDPYSL2-1, rsDPYSL-3, rsDPYSL-4     8p           DPYSL2        81.63 (0.016)          0.5678 (0)     0.6619 (0)
rsDAO-6, rsDAO-7, rsDAO-8             12q          DAO           216.99 (0)             0.582 (0)      0.6531 (0)
rsG72-E-1, rsG72-E-2, rsG72-13        13q          G72           91.00 (5.32 × 10^-4)   0.5866 (0)     0.575 (0.006)
rsSDISC1-1, rsDISC1P-3,               1q           DISC1         251.41 (0)             0.6325 (0)     0.6178 (0)
  rsDISC1-23, rsDISC1-27
rsSDPYSL2-1, rsDPYSL-3,               8p           DPYSL2        197.15 (2.3 × 10^-5)   0.5686 (0)     0.6185 (0)
  rsDPYSL-4, rsSDPYSL2-5
rsNRG1P-6, rsNRG1P-7,                 (8p, 22q)    NRG1, CACNG2  86.96 (1)              0.5962 (0)     0.5642 (0.05)
  rsCACNG2-16, rsCACNG2-15
rsSTRIM35-1, rsTRIM35-2, rsTRIM35-4,  8p           TRIM35        354.85 (1)             0.572 (0)      0.5255 (0.403)
  rsTRIM35-5, rsTRIM35-6
rsDAO-6, rsDAO-7, rsDAO-8,            (12q, 22q)   DAO, CACNG2   171.62 (1)             0.5737 (0)     0.6137 (0)
  rsCACNG2-2, rsCACNG2P-1,
  rsCNCNG2-18

SLIDE 21

Conclusion

  • We propose the ABCDE algorithm, which can characterize all explicit (interaction) effects, regardless of the number of groups.
  • We further develop permutation tests to validate the disease association of SNP subsets selected by ABCDE.
  • Applying ABCDE to the real data, we identify several known and novel schizophrenia-associated SNPs and sets of SNPs.
  • We may develop a parallel implementation of ABCDE for large-scale epistatic interaction mapping, including genome-wide studies with hundreds of thousands of markers.
