SLIDE 1

Detecting Epistatic Interactions Contributing to a Quantitative Trait: The Restricted Partition Method

Rob Culverhouse, PhD
Washington University in St. Louis, School of Medicine
May 28, 2004

SLIDE 2

Single locus analog for our analyses:

Measured Genotype

Quantitative trait analysis using unrelated individuals

No notion of “affected” without placing a threshold
For loci in linkage disequilibrium with the trait locus, expect genotypes to have different mean trait values

Genotype      aa     Aa     AA
mean(trait)   41.5   12.2   34.5

SLIDE 5

Epistasis

Genes interacting in a non-additive way

Examples:

Triglyceride level (Nelson et al. 2001)
Alzheimer disease (Zubenko et al. 2001)
Breast cancer (Ritchie et al. 2001)
Drug effects (response and toxicity)

SLIDE 6

Epistasis

Genes interacting in a non-additive way

Some possible consequences:

Which is the “bad” allele may depend on genetic background or environmental exposure

SLIDE 7

Kardia et al 1999.

SLIDE 8

Epistasis

Genes interacting in a non-additive way

Some possible consequences:

Which is the “bad” allele may depend on genetic background or environmental exposure
“Importance” of a locus depends on allele frequency

SLIDE 11

“Importance” of a locus depends on allele frequency

Fixed genetic model for TSC (Alan Templeton 2000)

Allele frequencies:

               ApoE alleles              LDLR alleles
               p(ε2)   p(ε3)   p(ε4)     p(A1)   p(A2)
Population 1   0.08    0.77    0.15      0.22    0.78
Population 2   0.02    0.03    0.95      0.50    0.50

% Variance explained:

               ApoE    LDLR    ApoE x LDLR   Total
Population 1   41.0    2.9     8.9           52.8
Population 2   3.7     25.3    2.0           31.1

SLIDE 12

Epistasis

Genes interacting in a non-additive way

Some possible consequences:

Which is the “bad” allele may depend on genetic background or environmental exposure
“Importance” of a locus depends on allele frequency
Contributing loci may only be noticed in a multilocus analysis

SLIDE 13

Variability in Ln(Triglyceride) explained by single-locus vs two-locus analyses
(Nelson et al. 2001)

Males, n = 188

[Figure: CPM results, ln(triglyceride) variability explained by the best partitions into three genotypic classes. Axis: % of variation explained. Bars: single-site contributions for InDel (A1C3A4) and HincII (LDLR), and the best set for InDel & HincII. Values shown: 0.0, 8.7, 1.0.]

SLIDE 15

Two Locus Epistatic Model
(a qualitative trait example)

p(A) = p(B) = 0.5. Cell entries indicate the probability of having disease.

           bb     Bb     BB     marginal
aa         ?      ?      ?      0.5
Aa         ?      ?      ?      0.5
AA         ?      ?      ?      0.5
marginal   0.5    0.5    0.5

Analyzing these loci separately would give the impression that neither one contributes to the phenotype.

SLIDE 17

Two Locus Epistatic Model
(a qualitative trait example)

p(A) = p(B) = 0.5. Cell entries indicate the probability of having disease.

           bb     Bb     BB     marginal
aa         1      0      1      0.5
Aa         0      1      0      0.5
AA         1      0      1      0.5
marginal   0.5    0.5    0.5

In fact, the trait is completely determined by the 2-locus genotype.
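
The marginal values of 0.5 can be checked directly. The sketch below assumes Hardy-Weinberg genotype frequencies (1/4, 1/2, 1/4) at each locus, independence between the two loci, and a penetrance of 1 in the four corner cells and the centre cell with 0 elsewhere; the names `GENO_FREQ`, `PENETRANCE` and `marginal_penetrance` are illustrative only.

```python
from fractions import Fraction

# Assumed genotype frequencies for allele frequency 0.5 under Hardy-Weinberg
# (index 0, 1, 2 = number of copies of the capital-letter allele).
GENO_FREQ = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

# Assumed penetrance table: PENETRANCE[a][b], a = copies of A, b = copies of B.
PENETRANCE = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]


def marginal_penetrance(table, locus):
    """Average the two-locus penetrances over the genotypes of the other locus."""
    if locus == "A":
        return [sum(GENO_FREQ[b] * table[a][b] for b in range(3)) for a in range(3)]
    return [sum(GENO_FREQ[a] * table[a][b] for a in range(3)) for b in range(3)]


marginal_a = marginal_penetrance(PENETRANCE, "A")
marginal_b = marginal_penetrance(PENETRANCE, "B")
print(marginal_a, marginal_b)
```

Every single-locus genotype has the same marginal risk, so a one-locus-at-a-time analysis sees no effect even though the two-locus genotype fixes the outcome.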

SLIDE 18

Maximum Possible Heritability

in Purely Epistatic (Qualitative) Models

SLIDE 22

Testing for Epistasis contributing to quantitative traits

Basic Question: Do subsets of multi-locus genotypes correspond to different mean trait values?

Simplest approach: F-test for difference in means between several groups

Drawbacks:

Rejection of the null does not provide a model
No measure of importance for the differences
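
As a rough illustration of the F-test approach, the sketch below computes the one-way ANOVA F statistic for a set of genotype groups, together with R² (the share of trait variance explained by the grouping). The function name `one_way_anova` and the trait values are made up for illustration; a p-value would additionally need the F distribution, for example via `scipy.stats.f_oneway`.

```python
def one_way_anova(groups):
    """Return (F, R2) for a list of groups, each a list of trait values."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Variation between group means and variation within groups.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    r2 = ss_between / (ss_between + ss_within)
    return f_stat, r2


# Hypothetical trait values for three genotype groups.
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, r2 = one_way_anova(example_groups)
print(f_stat, r2)
```

The F statistic alone says only that some means differ; R² is the kind of importance measure the drawbacks above ask for.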

SLIDE 24

Combinatorial Partition Method

(Nelson et al. 2001)

Evaluates every partition of a multilocus genotype matrix for the amount of phenotypic variation explained

Advantages:

Provides an epistatic model for further investigation
Relates the partition to a measure of importance: R²

Drawbacks:

Computation (impractical for more than 2 loci)
No easy way to assess statistical significance

SLIDE 25

CPM algorithm for 2-locus analyses

CPM (Nelson et al. 2001. Genome Research 11:458-470)

Thanks to Taylor Maxwell

SLIDE 27

Computations for CPM

Ways to partition g genotypes into k sets:

\[ S(g,k) = \frac{1}{k!} \sum_{i=0}^{k-1} (-1)^i \binom{k}{i} (k-i)^g \]

21,146 partitions evaluated for each pair of bi-allelic candidate loci

Approximately 10^21 partitions for each combination of 3 loci

Evaluating 1 million partitions each second, checking the partitions for the first three loci would take 31 million years
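
The counts above can be reproduced from the formula for S(g,k). The sketch below assumes that the 21,146 figure is the sum of S(9,k) over k = 2 to 9, that is, every partition of the 9 two-locus genotypes except the one that puts them all in a single set; the function names are illustrative.

```python
from math import comb, factorial


def stirling2(g, k):
    """Number of ways to partition g labelled genotypes into k non-empty sets."""
    total = sum((-1) ** i * comb(k, i) * (k - i) ** g for i in range(k))
    return total // factorial(k)


def partitions_with_at_least_two_sets(g):
    """All partitions of g genotypes except the trivial one-set partition."""
    return sum(stirling2(g, k) for k in range(2, g + 1))


two_locus = partitions_with_at_least_two_sets(9)     # 3 x 3 genotypes
three_locus = partitions_with_at_least_two_sets(27)  # 3 x 3 x 3 genotypes

# Time to evaluate 10^21 partitions at one million partitions per second.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years = 1e21 / 1e6 / SECONDS_PER_YEAR

print(two_locus, f"{three_locus:.2e}", f"{years:.3g} years")
```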

SLIDE 28

Why a 3-locus analysis might be good:

Serum triglyceride: 2 loci explain 9.3% of the trait variation; 3 loci explain 20.1%

[Figure: genotype tables for InDel13 (I/I, I/D, D/D), HincII (+/+, +/-, -/-) and PON192 (+/+, +/-, -/-), showing cell counts with the mean and STD of the trait for each group; variation explained 9.26% (2 loci) and 20.1% (3 loci).]

Thanks to Taylor Maxwell

SLIDE 29

Observation

No partition that merges genotypes with widely differing means can be efficient at explaining the variation.

This fact can be used to restrict the number of partitions evaluated.

SLIDE 30

Observation

[Figure: quantitative trait plotted against genotypes]

SLIDE 31

Restricted Partition Method

Algorithm:

Test cells for different means (using multiple comparison method)
Merge two nearest groups (that are not significantly different)
Iterate until groups all different or all cells are merged

If more than one group remains, evaluate the model for variation explained (R²)
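
The steps above can be sketched as follows. This is a minimal illustration, not the published implementation: the multiple comparison method is replaced by an unadjusted Welch-type statistic compared with a normal critical value (`z_crit`, an assumption), and every cell is assumed to hold at least two trait values with non-zero variance. The names `welch_z` and `restricted_partition` are hypothetical.

```python
import math
from statistics import mean, variance


def welch_z(a, b):
    """Absolute difference in means divided by its Welch standard error."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se


def restricted_partition(cells, z_crit=1.96):
    """cells maps genotype -> list of trait values. Returns (groups, R2)."""
    groups = [([g], list(v)) for g, v in cells.items()]
    while len(groups) > 1:
        # Pairs of groups whose means are NOT significantly different.
        candidates = [
            (abs(mean(groups[i][1]) - mean(groups[j][1])), i, j)
            for i in range(len(groups))
            for j in range(i + 1, len(groups))
            if welch_z(groups[i][1], groups[j][1]) < z_crit
        ]
        if not candidates:
            break  # all remaining groups differ
        _, i, j = min(candidates)  # merge the two nearest such groups
        merged = (groups[i][0] + groups[j][0], groups[i][1] + groups[j][1])
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    values = [x for _, v in groups for x in v]
    grand = mean(values)
    ss_total = sum((x - grand) ** 2 for x in values)
    ss_between = sum(len(v) * (mean(v) - grand) ** 2 for _, v in groups)
    r2 = ss_between / ss_total if len(groups) > 1 else 0.0
    return groups, r2


# Made-up data: a 3 x 3 genotype grid with means 10 and 0 in a checkerboard.
offsets = [-1.0, -0.5, 0.0, 0.5, 1.0]
demo_cells = {
    (a, b): [(10.0 if (a + b) % 2 == 0 else 0.0) + d for d in offsets]
    for a in range(3)
    for b in range(3)
}
demo_groups, demo_r2 = restricted_partition(demo_cells)
print(len(demo_groups), round(demo_r2, 3))
```

On this toy grid the nine cells collapse into two groups, one per checkerboard colour, and only that single partition is scored for R².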

SLIDE 32

[Figure: 3 x 3 grid of two-locus genotypes (aa, Aa, AA by bb, Bb, BB)]

SLIDE 40

Computational Complexity for RPM

Simultaneous loci analyzed   CPM        RPM
2                            21,146     8 iterations to find the partition, one partition evaluated
3                            > 10^21    26 iterations, one evaluation
4                            > 10^88    80 iterations, one evaluation
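
The RPM iteration counts in the table match a simple bound: with three genotypes per locus there are 3^n cells, and each merge removes one group, so at most 3^n - 1 merges are possible. A short check, with `max_rpm_merges` as an illustrative name:

```python
def max_rpm_merges(n_loci, genotypes_per_locus=3):
    """Upper bound on merge steps: each merge reduces the group count by one."""
    return genotypes_per_locus ** n_loci - 1


merge_bounds = [max_rpm_merges(n) for n in (2, 3, 4)]
print(merge_bounds)
```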

SLIDE 41

What to do with the extra clock cycles?

Use permutation tests to obtain p-values for the results
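
A permutation p-value can be sketched as below. For brevity the test statistic here is the plain R² of the trait grouped by genotype label; in an RPM analysis the whole merge procedure would presumably be rerun on each permuted data set, which this sketch does not do. The function names, the `(hits + 1) / (n_perm + 1)` estimator and the example data are assumptions for illustration.

```python
import random


def r_squared(labels, values):
    """Share of trait variance explained by grouping values by label."""
    grand = sum(values) / len(values)
    groups = {}
    for lab, val in zip(labels, values):
        groups.setdefault(lab, []).append(val)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    return ss_between / ss_total


def permutation_p_value(labels, values, n_perm=1000, seed=0):
    """Share of trait-shuffled data sets with R2 at least as large as observed."""
    rng = random.Random(seed)
    observed = r_squared(labels, values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-trait association
        if r_squared(labels, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Made-up example: two genotype groups with clearly separated trait values.
demo_labels = ["g1"] * 20 + ["g2"] * 20
demo_values = [0.1 * i for i in range(20)] + [5.0 + 0.1 * i for i in range(20)]
demo_p = permutation_p_value(demo_labels, demo_values, n_perm=500)
print(demo_p)
```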

SLIDE 42

Testing the RPM

Initial Simulations:

A class of purely epistatic quantitative trait models
2 contributing and 8 unlinked loci simulated (allele freq = 0.5 for all)
Groups had different mean trait values = µi
Traits of individuals = µi + ε (ε from N(0,1))
4 distances between the group means examined
500 unrelated subjects in each simulation

Checkerboard
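
A data set of this kind can be simulated in a few lines. The sketch assumes the checkerboard assigns one mean to two-locus genotypes whose allele counts sum to an even number and another mean, d lower, to the rest, with loci 0 and 1 contributing. Under that assumption the two groups are equally frequent, so the model R² is d² / (4 + d²), which for d = 0.25, 0.5, 1.0, 2.0 gives 0.015, 0.059, 0.200 and 0.500. All names are illustrative.

```python
import random


def model_r2(d):
    """R2 of two equally frequent groups with means d apart and unit noise variance."""
    return d ** 2 / (4 + d ** 2)


def simulate_checkerboard(n_subjects=500, n_loci=10, d=1.0, seed=0):
    """Return (genotypes, traits); loci 0 and 1 contribute, the rest do not."""
    rng = random.Random(seed)
    genotypes, traits = [], []
    for _ in range(n_subjects):
        # Genotype = number of copies of one allele, allele frequency 0.5.
        g = [rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_loci)]
        mu = d if (g[0] + g[1]) % 2 == 0 else 0.0  # checkerboard group mean
        genotypes.append(g)
        traits.append(mu + rng.gauss(0.0, 1.0))  # noise from N(0, 1)
    return genotypes, traits


sim_genotypes, sim_traits = simulate_checkerboard()
model_values = [round(model_r2(d), 3) for d in (0.25, 0.5, 1.0, 2.0)]
print(model_values)
```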

SLIDE 43

Testing the RPM

(Simulated Data - 1000 data sets, 500 individuals each)

sd     Model R²   TP%    TP%
0.25   0.015      9.7    37.8
0.5    0.059      51.4   35.8
1.0    0.200      79.3   38.3
2.0    0.500      77.9   37.6

R² ≠ 0:   0.014   0.014   0.015   0.014
RPM R²:   0.024   0.066   0.209   0.508

Contributing Loci / Other loci

FP%:      90.0   40.2   1.1

SLIDE 51

Power tests for the RPM
(100 data sets, 10 loci, 5000 permutations/locus pair)

               Contributing Loci (Power)   Other loci (False Positives)
sd     R²      pc < 0.05   pc < 0.01       pu < 0.05     pc < 0.05    pc < 0.01
0.25   0.015   8%          7%              256 (5.8%)    6 (0.14%)    5 (0.11%)
0.5    0.059   87%         81%             204 (4.6%)    2 (0.05%)    1 (0.02%)
1.0    0.200   100%        100%            259 (5.8%)    5 (0.11%)    1 (0.02%)
2.0    0.500   100%        100%            230 (5.2%)    2 (0.04%)    1 (0.02%)

SLIDE 55

Unequal Allele Frequency Models

(100 data sets each, N=500, 5000 permutations/locus pair)

Examined epistatic models with various R²: 0.05, 0.10, 0.30

Contributing loci allele frequencies: .1 .3 .5 .1 .3 .5 .2 .1 .2

Non-contributing loci allele frequencies: .3 .4 .5 .3 .4 .5

SLIDE 56

Unequal Allele Frequency Models

(100 data sets each, N=500, 5000 permutations/locus pair)

Examined epistatic models with various R²: 0.05, 0.10, 0.30

Results for R² = 0.05

               Contributing Loci (power)   Other Loci Combined (false positives)
Allele Freq    pc < 0.05   pc < 0.01       pu < 0.05    pc < 0.05   pc < 0.01
0.5   0.5      0.78        0.68            59 (5.4%)    1
0.5   0.3      1.00        0.99            62 (5.6%)    3           1
0.5   0.1      1.00        1.00            49 (4.5%)    1           1
0.3   0.3      0.85        0.71            55 (5.0%)    1
0.3   0.1      1.00        1.00            46 (4.2%)
0.1   0.1      0.71        0.64            68 (6.2%)    1

SLIDE 61

Applying the RPM to real data

Etoposide metabolism data:

Etoposide is a commonly used anticancer agent with a broad range of anti-tumor activity.
Data provided by the St. Jude Children’s Research Hospital
Phenotypes: 2 pharmacokinetic assessments of etoposide metabolism
Predictor covariates: Genotypes from 8 candidate loci, Race, Sex

(Data: genotypes and phenotypes of 102 individuals)

None of the predictors were significant in univariate analyses

SLIDE 62

Etoposide Metabolism

First analysis: p-values corrected for 2 x C(10,2) = 90 comparisons

Result for Trait 2 (AUC)

[Figure: grid of UGT1A1 genotype (56, 57, 66, 67, 68, 77, 78) by MDRC ex 26 genotype (11, 12, 22)]

Group mean   N
0.63         27
1.07         66
1.96         9
Total        102

R² = 0.266, p-value = 0.045 (corrected)

SLIDE 63

Etoposide Metabolism

Second analysis: 4 Subpopulations: AA, CA, Male, Female

p-values corrected for a total of 378 tests (including the original 90)

Trait 1 (clearance) (AA)

[Figure: grid of Sex (M, F) by GSTP genotype (11, 12, 22)]

Group mean   N
3.68         9
3.91         11
4.17         5

R² = 0.628, p-value = 0.018 (corrected)

Trait 2 (AUC) (CA)

[Figure: grid of GSTP genotype (11, 12, 22) by UGT1A1 genotype (66, 67, 77)]

Group mean   N
0.75         35
1.13         37
2.21         5

R² = 0.291, p-value = 0.036 (corrected)
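
The test counts used for the corrections can be reproduced with a little arithmetic. The first figure follows the formula on the earlier slide (2 traits, 10 predictors). The breakdown of the second analysis is an assumption that happens to reproduce the stated total: within each of the 4 subpopulations the stratifying covariate (race or sex) is constant, leaving 9 predictors to pair.

```python
from math import comb

n_traits = 2
n_predictors = 10  # 8 candidate loci + race + sex

first_analysis = n_traits * comb(n_predictors, 2)

# Assumed breakdown: 4 subpopulations, each with one covariate held fixed.
n_subpopulations = 4
second_analysis = n_subpopulations * n_traits * comb(n_predictors - 1, 2)

total_tests = first_analysis + second_analysis
print(first_analysis, second_analysis, total_tests)
```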

SLIDE 65

Continuing work

Further testing:
Models with 3 and 4 contributing loci
Effect of model misspecification
Greater number of simulations for robustness
Varying the merging parameters (now merges if p > 0.05)
Applying to real data (including gene x environment interactions)
Adapting the method for qualitative traits
Difficulties to address:
Computation time for permutation tests
Multiple testing correction (FDR?)
Robustness (cross validation?)

SLIDE 66

For more information

Detecting Epistatic Interactions Contributing to Quantitative Traits
Robert Culverhouse, Tsvika Klein, and William Shannon
Online in Genetic Epidemiology

SLIDE 68

Variability in Ln(Triglyceride) explained by single-locus vs two-locus analyses
(Nelson et al. 2001)

Females, n = 241

[Figure: CPM results, ln(triglyceride) variability explained by the best partitions into three genotypic classes. Axis: % of variation explained (ticks 2 to 10). Single-site contributions: C112R (APOE) 1.6, InDel (APOB) 1.8. Best set: InDel & C112R 8.2.]