Fun with Mixed Models Vic Biostats Seminar 30th April 2015 - - PowerPoint PPT Presentation

fun with mixed models
SMART_READER_LITE
LIVE PREVIEW

Fun with Mixed Models Vic Biostats Seminar 30th April 2015 - - PowerPoint PPT Presentation

Fun with Mixed Models Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk Overview 1 Estimating SNP Heritability 2 Extensions 3 Computational Technicalities 4 Classification Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk The


slide-1
SLIDE 1

Fun with Mixed Models

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-2
SLIDE 2

Overview

1 Estimating SNP Heritability 2 Extensions 3 Computational Technicalities 4 Classification

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-3
SLIDE 3

The Linear Mixed Model

Suppose our GWAS data comprise Phenotype Y (vector of length n) SNP calls S (matrix of size n × N) Plus any covariates Z. Y = Zα + g + e with g ∼ N(0, Kσ2

g)

and e ∼ N(0, Iσ2

e)

α is a vector of fixed effects, g and e are the genetic and environmental random effects (with corresponding components of variance σ2

g and σ2 e).

Typically, use kinship matrix K = XX T

N , where Xij = Sij− ¯ Sj SD(Sj).

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-4
SLIDE 4

Traditionally Used for Heritability Estimation

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-5
SLIDE 5

Solved Via REML

Y = Zα + g + e with g ∼ N(0, Kσ2

g)

and e ∼ N(0, Iσ2

e)

The raw model likelihood follows from assuming Y ∼ N(Zα, V ) where V = Kσ2

g + Iσ2 e :

l(Y |α, K, σ2

g, σ2 e) = −n

2 log(2πσ2

e)− 1

2(Y −Zα)TV −1(Y −Zα)− 1 2 log |V |. The restricted likelihood is obtained by “integrating across” α. Y ∼ N(0, P) where P = V −1 − V −1Z(Z TV −1Z)−1Z TV −1 : l(Y |K, σ2

g, σ2 e) = −n-p

2 log(2πσ2

e)−1

2Y TPY −1 2 log |V |−1 2 log |Z TV −1Z|.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-6
SLIDE 6

1 Estimating SNP Heritability 2 Extensions 3 Computational Technicalities 4 Classification

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-7
SLIDE 7

Estimating Total SNP Heritability

Jian Yang et al. realised by applying to “unrelated individuals”, could estimate total proportion of phenotypic variance explained by all SNPs.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-8
SLIDE 8

Linear Random Effects Regression Model

Suppose we assume the following relationship: Y = α + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7 + β8X8 + β9X9 + β10X10 + β11X11 + β12X12 + β13X13 + β14X14 + β15X15 + β16X16 + β17X17 + β18X18 + β19X19 + β20X20 + β21X21 + β22X22 + β23X23 + β24X24 + β25X25 + β26X26 + β27X27 + β28X28 + . . . + β500 000X500 000 + e, where βj ∼ N(0, σ2

g/N) and e ∼ N(0, σ2 e).

Then g = N

j=1 βjXj = Xβ ∼ N(0, XX T N σ2 g) and Y ∼ N(α, Kσ2 g + Iσ2 e)

where K = XX T

N

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-9
SLIDE 9

Estimating Total SNP Heritability

The heritability of human height is 80%. Jian Yang et al. calculated that 45% of phenotypic variance could be explained by common SNPs.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-10
SLIDE 10

Estimating Total SNP Heritability

The heritability of human height is 80%. Jian Yang et al. calculated that 45% of phenotypic variance could be explained by common SNPs.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-11
SLIDE 11

Solves the “Missing Heritability” Problem

Environment Genetics

Human Height

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-12
SLIDE 12

Solves the “Missing Heritability” Problem

Environment Genetics

Human Height

Missing Herit. GWAS SNPs

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-13
SLIDE 13

Solves the “Missing Heritability” Problem

Environment Genetics

Human Height

Missing Herit. GWAS SNPs Still Missing ALL SNPs

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-14
SLIDE 14

Solves the “Missing Heritability” Problem

Environment Other Genetics Other SNPs GWAS SNPs

Human Height Schizophrenia Obesity Crohn's Disease Bipolar Disorder Epilepsy

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-15
SLIDE 15

The Method Generally Works

REML Assumptions: All SNPs are Causal Gaussian Effect Sizes Gaussian Noise Terms Inverse Relationship between MAF and Effect Size Overall, we found the approach amazingly robust to misspecification

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-16
SLIDE 16

Computing K = XX T

N

tet

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-17
SLIDE 17

Impact of Uneven Tagging

Regions of high LD have disproportionately large contribution

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-18
SLIDE 18

Impact of Uneven Tagging

A common problem when performing principal component analysis.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-19
SLIDE 19

Estimates Can be Sensitive to LD of Causal Variants

Causal variants in high LD areas ⇒ over-estimation of h2

SNP

Causal variants in low LD areas ⇒ under-estimation of h2

SNP

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-20
SLIDE 20

Adjusting for Uneven Tagging

Genotyped SNPs Underlying Variation β1 β2 β7 U1 U2 U3 U4 X1 X2 X3 X4 Weightings 1 1 1 1 β3

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-21
SLIDE 21

Adjusting for Uneven Tagging

Genotyped SNPs Underlying Variation β1 β5 β2 β3β6 β8β7β8 U1 U2 U3 U4 X1 X2 X3 X4 X1 X3 X4 X4 Weightings ½ ½ 1 ½½ ¼¼¼ β9 ¼ X4

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-22
SLIDE 22

Weightings Reduce the Biases

LDAK weightings down-weight SNPs well-tagged by neighbours and up-weight SNPs poorly-tagged by neighbours

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-23
SLIDE 23

LDAK: Linkage Disequilibrium Adjusted Kinships

LDAK weights offer an alternative to pruning. e.g., when computing genetic profile risk scores.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-24
SLIDE 24

GCTA vs LDAK

In the end, whether SNPs explain 50% or 60% of heritability not a big deal.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-25
SLIDE 25

1 Estimating SNP Heritability 2 Extensions 3 Computational Technicalities 4 Classification

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-26
SLIDE 26

Basic Model

Y = α + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7 + β8X8 + β9X9 + β10X10 + β11X11 + β12X12 + β13X13 + β14X14 + β15X15 + β16X16 + β17X17 + β18X18 + β19X19 + β20X20 + β21X21 + β22X22 + β23X23 + β24X24 + β25X25 + β26X26 + β27X27 + β28X28 + . . . + β500 000X500 000 + e. Assume βj ∼ N(0, σ2

g/N) and e ∼ N(0, σ2 e).

Then Y ∼ N(α, Kσ2

g + Iσ2 e), where K = XX T N

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-27
SLIDE 27

Bivariate Analysis

Trait 1: Y1 = Zα1 + β1X1 + β2X2 + . . . + β500 000X500 000 + e1 = Zα1 + g1 + e1 Trait 2: Y2 = Zα2 + γ1X1 + γ2X2 + . . . + γ500 000X500 000 + e2 = Zα2 + g2 + e2 Now interested in the correlation between genetic effects: ρ = cor(g1, g2). Or equivalently can think of the average correlation between effect sizes: ρ = cor(βj, γj)

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-28
SLIDE 28

Examining Concordance Between Traits

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-29
SLIDE 29

Genome Partitioning

Y = α + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7 + β8X8 + β9X9 + β10X10 + β11X11 + β12X12 + β13X13 + β14X14 + β15X15 + β16X16 + β17X17 + β18X18 + β19X19 + β20X20 + β21X21 + β22X22 + β23X23 + β24X24 + β25X25 + β26X26 + β27X27 + β28X28 + . . . + β500 000X500 000 + e. Assume βj ∼ N(0, σ2

g/N) and e ∼ N(0, σ2 e).

Then Y ∼ N(α, Kσ2

g + Iσ2 e), where K = XX T N

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-30
SLIDE 30

Genome Partitioning

Y = α + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7 +β8X8 + β9X9 + β10X10 + β11X11 + β12X12 + β13X13 + β14X14 +β15X15 + β16X16 + β17X17 + β18X18 + β19X19 + β20X20 + β21X21 +β22X22 + β23X23 + β24X24 + β25X25 + β26X26 + β27X27 + β28X28 + . . . + β500 000X500 000 + e. Assume βj ∼ N(0, σ2

g1/N1) and βk ∼ N(0, σ2 g2/N2).

Then Y ∼ N(α, K1σ2

g1 + K2σ2 g2 + Iσ2 e), where K1 = X1X T

1

N1

and K2 = X2X T

2

N2

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-31
SLIDE 31

Genome Partitioning

Height BMI vWF QTi

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-32
SLIDE 32

Intensity of Heritability

We define the “intensity of heritability” of a set of SNPs as their heritability divided by their genetic variation. Can then test for differences in intensity of heritability.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-33
SLIDE 33

Intensity of Heritability

Are genic SNPs more important than inter-genic SNPs? Inter-genic defined as all SNPs > 100kbp from a coding region.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-34
SLIDE 34

Intensity of Heritability

Are genic SNPs more important than inter-genic SNPs? Can test eQTLs vs non-eQTLs; high-quality SNPs vs low-quality, etc.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-35
SLIDE 35

Concordance Between Traits

Are SNPs associated with one trait more important for others. p-values for Schizophrenia and Crohn’s obtained from independent studies.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-36
SLIDE 36

Concordance Between Traits

Are SNPs associated with one trait more important for others. SNPs associated with Crohn’s are more important for Crohn’s. (Good!)

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-37
SLIDE 37

Concordance Between Traits

Are SNPs associated with one trait more important for others. SNPs associated with Schizophrenia are more important for Bipolar.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-38
SLIDE 38

Concordance Between Traits

Are SNPs associated with one trait more important for others. Find concordance between Crohn’s and Type 1 Diabetes.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-39
SLIDE 39

Using Intensity of Heritability for Prediction

We are interested in obtaining linear prediction models of the form: Y = X1β1 + X2 ˆ β2 + . . . + X500 000 ˆ β500 000 ˆ βj is the estimated effect size of SNP j (can be zero). BLUP (Best Linear Unbiased Prediction) assumes the linear mixed model: Y = Zα + g + e with g ∼ N(0, Kσ2

g)

and e ∼ N(0, Iσ2

e).

When K = XX T

N , this is equivalent to assuming βj ∼ N(0, σ2 g/N).

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-40
SLIDE 40

BLUP Excessively Shrinks

Single-SNP Analysis BLUP

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-41
SLIDE 41

MultiBLUP

MultiBLUP generalizes BLUP: Y = Zα + g1 + g2 + . . . + gM + e with gm ∼ N(0, Kmσ2

gm).

If K1 = X1X T

1

N1 , K2 = X2X T

2

N2

. . . , KM = XMX T

M

NM

then this is equivalent to a random effects regression model where SNPs in X1 allowed to have a different effect size prior to those in X2, etc. MultiBLUP improves prediction when random effects correspond to distinct intensities of heritability.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-42
SLIDE 42

MHC / Non-MHC MultiBLUP

MHC/non-MHC Trait BLUP MultiBLUP Bipolar Disorder 0.27 0.27 Coronary Artery Disease 0.13 0.13 Crohn’s Disease 0.32 0.29 Hypertension 0.15 0.14 Rheumatoid Arthritis 0.21 0.35 Type 1 Diabetes 0.25 0.56 Type 2 Diabetes 0.16 0.16 Mean 0.21 0.27

Dividing into MHC and Non-MHC advantageous for RA and TID (but same or worse for others).

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-43
SLIDE 43

Gene-Based Association Testing

To estimate the total contribution of all SNPs, we use the model: Y = α + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7 + β8X8 + β9X9 + β10X10 + β11X11 + β12X12 + β13X13 + β14X14 + β15X15 + β16X16 + β17X17 + β18X18 + β19X19 + β20X20 + β21X21 + β22X22 + β23X23 + β24X24 + β25X25 + β26X26 + β27X27 + β28X28 + . . . + β500 000X500 000 + e. blank

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-44
SLIDE 44

Gene-Based Association Testing

To test a set of SNPs S, can reduce it to: Y = α + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7 + β8X8 + β9X9 + β10X10 + β11X11 + β12X12 + β13X13 + β14X14 + β15X15 + β16X16 + β17X17 + β18X18 + β19X19 + β20X20 + β21X21 + β22X22 + β23X23 + β24X24 + β25X25 + β26X26 + β27X27 + β28X28 + . . . + β500 000X500 000 + e. i.e., Y = Zα +

j∈S Xjβj + e with βj ∼ N(0, σ2 S/NS).

Perform a likelihood ratio test for P(σ2

S > 0).

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-45
SLIDE 45

Simulation Study

Generate phenotypes where 50/1000 genes contribute heritability.

1−Specificity Sensitivity 0.00 0.01 0.02 0.03 0.04 0.05 0.0 0.1 0.2 0.3 0.4 0.5 0.6

1 Causal SNP per Gene

minSNP GATES VEGAS GWiS GBAT−Classical GBAT−Bayesian

1−Specificity Sensitivity 0.00 0.01 0.02 0.03 0.04 0.05 0.0 0.1 0.2 0.3 0.4 0.5 0.6

2 Causal SNPs per Gene

minSNP GATES VEGAS GWiS GBAT−Classical GBAT−Bayesian

1−Specificity Sensitivity 0.00 0.01 0.02 0.03 0.04 0.05 0.0 0.1 0.2 0.3 0.4 0.5 0.6

3 Causal SNPs per Gene

minSNP GATES VEGAS GWiS GBAT−Classical GBAT−Bayesian

minSNP GATES VEGAS GWiS GBAT Classical GBAT Bayesian 100 150 200 250 300

Average Rank of Causals

  • minSNP

GATES VEGAS GWiS GBAT Classical GBAT Bayesian 100 150 200 250 300

Average Rank of Causals

  • minSNP

GATES VEGAS GWiS GBAT Classical GBAT Bayesian 100 150 200 250 300

Average Rank of Causals

GBAT most powerful (and fastest).

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-46
SLIDE 46

GBAT

GBAT provides a complement to single-SNP tests of association. p-values are well-suited for pathway analysis.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-47
SLIDE 47

Gene-Based Association Testing

Effectively, you are testing for increased intensity of heritability on a very local level. Therefore, can incorporate this gene-based (or regional) testing in MultiBLUP.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-48
SLIDE 48

Adaptive MultiBLUP

Step 1: Divide genome into (say) 75kbp overlapping chunks. Step 2: Test each chunk for association (using GBAT). Step 3: Identify all significant chunks (say P < 10−5). (Merge these chunks with neighbouring chunks with P < 0.01.) E.g., for Type 1 Diabetes, obtain 4 local regions. Step 4: Run MultiBLUP with five random effects.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-49
SLIDE 49

Adaptive MultiBLUP Offers Flexible Shrinkage

Genetic Profile Risk Scores BLUP Adaptive MultiBLUP

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-50
SLIDE 50

Performance of Adaptive MultiBLUP

Current methods MultiBLUP Risk Stepwise Trait∗ BLUP Score Regression BSLMM MHC/non-MHC Adaptive BD 0.27 0.25 (1) 0.02 0.27 0.27 0.27 CAD 0.13 0.12 (1) 0.08 0.15 0.13 0.16 CD 0.32 0.28 (1) 0.18 0.34 0.29 0.36 HT 0.15 0.14 (1) 0.00 0.14 0.14 0.17 RA 0.21 0.28 (3) 0.32 0.33 0.35 0.37 T1D 0.25 0.34 (5) 0.54 0.57 0.56 0.59 T2D 0.16 0.14 (1) 0.10 0.17 0.16 0.18 Mean 0.21 0.22 0.18 0.28 0.27 0.30

Other methods offer performance similar to Adaptive MultiBLUP: BSLMM (Bayesian Sparse Linear Mixed Models), BayesR, SparSNP. But I believe Adaptive MultiBLUP most computationally efficient: Can be applied to upwards of 50 000 samples.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-51
SLIDE 51

Different Kinship Matrices

The “Yang” kinship standardizes each SNP: Kik = N

j (Sij− ¯ Sj)((Skj− ¯ Sj) Var(Sj)

This corresponds to assumption each SNP contributes equal heritability. Animal/plant geneticists generally use: Kik =

N

j (Sij− ¯

Sj)((Skj− ¯ Sj) Constant

This corresponds to assumption each SNP has equal effect magnitude.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-52
SLIDE 52

Different Kinship Matrices

Can interrogate genetic architecture by trying alternative scalings. Rare Variants Bigger Effects ← → Common Variants Bigger Effects Evidence that rarer variants tend to have (slightly) larger effects.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-53
SLIDE 53

1 Estimating SNP Heritability 2 Extensions 3 Computational Technicalities 4 Classification

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-54
SLIDE 54

Null Distribution for Likelihood Ratio Test

Likelihood ratio test statistic: T = 2 × (l1 − l0). l1 & l0 are maximum log likelihoods under null & alternative models. Standard MLE theory assumes T ∼ χ2(1). However, because testing σ2

g > 0 (one sided / on boundary?)

MLE theory (?) assumes T ∼ 1

2δ{0} + 1 2χ2(1).

This leads to conservative p-values (because a random effect?).

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-55
SLIDE 55

Null Distribution for Likelihood Ratio Test

Likelihood ratio test statistic: T = 2 × (l1 − l0). l1 & l0 are maximum log likelihoods under null & alternative models. Standard MLE theory assumes T ∼ χ2(1). However, because testing σ2

g > 0 (one sided / on boundary?)

MLE theory (?) assumes T ∼ 1

2δ{0} + 1 2χ2(1).

This leads to conservative p-values (because a random effect?).

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-56
SLIDE 56

Null Distribution for Likelihood Ratio Test

Null Distribution: 1 2 χ2(1)

Expected −log10(P) Observed −log10(P) 1 2 3 4 0.5 1.5 2.5 3.5

Estimate pNULL, a, and b from permutation.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-57
SLIDE 57

Null Distribution for Likelihood Ratio Test

Null Distribution: 1 2 χ2(1)

Expected −log10(P) Observed −log10(P) 1 2 3 4 0.5 1.5 2.5 3.5

Null Distribution: p Γ(a,b)

Expected −log10(P) Observed −log10(P) 1 2 3 4 1 2 3 4

Estimate p, a, and b from permutation - important for gene-based tests.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-58
SLIDE 58

Bayesian REML

The basic model: Y = Zα + g + e with g ∼ N(0, Kσ2

g)

and e ∼ N(0, Iσ2

e)

In LDAK, the user can specify a beta prior for h2 =

σ2

g

σ2

g+σ2 e .

For the general model uses a Dirichlet prior for the M heritabilities. LDAK estimates the posterior distribution of h2 and reports Bayes Factors. Particularly useful for gene-based tests.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-59
SLIDE 59

Improved Estimation of h2

Standard REML estimates assumes a normal distribution for h2. Figure too large - but basically shows breakdown for small h2. LDAK assumes a normal distribution for logit(h2) = h2/(1 − h2).

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-60
SLIDE 60

Leads to More Precise Inferences

Can compute Bayes Factors more accurately.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-61
SLIDE 61

Heritability Meta Analysis

Previous approaches obtain a global estimate of P(h2

S) by inverse-variance

weighting of individual cohort estimates of h2

S.

Then either use a Wald Test, or combine individual cohort p-values using Fisher’s Method.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-62
SLIDE 62

Heritability Meta Analysis

By contrast, valid to use inverse-variance weighting on logit scale. Can derive an approximate global likelihood ratio test statistic for the fixed effect model. But we prefer combining Bayes Factors (∼ random effect model).

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-63
SLIDE 63

Using Eigen-Decompositions

Estimate variance components using Average Information REML. (Similar to Newton-Raphson, but approximates the second derivative.) For basic model, each iteration requires inversion of V = Kσ2

g + Iσ2 e.

This becomes trivial given the eigen-decomposition K = UEUT: V = U(Eσ2

g + Iσ2 e)UT and V −1 = U(Eσ2 g + Iσ2 e)−1UT

Alternatively, can use Woodbury Matrix Identity: V −1 = ( XX T

N σ2 g + Iσ2 e)−1 = 1 σ2

e I − X

σ2

e ( X T X

σ2

e

+ N

σ2

g )−1 X T

σ2

e .

When K low-rank (N < n) this leads to huge speed ups.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-64
SLIDE 64

Missing phenotypic values

Suppose wish to estimate h2 but fraction 1 − p phenotypes missing:

15 2

  • 5

19

  • 7

NA NA

Kinships C

  • v

a r Y ~ K and Z

Would typically just exclude affected individuals.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-65
SLIDE 65

Missing phenotypic values

Suppose wish to estimate h2 but fraction 1 − p phenotypes missing:

15 2

  • 5

19

  • 7

NA NA

Kinships C

  • v

a r Y ~ K and Z

Would typically just exclude affected individuals.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-66
SLIDE 66

Missing phenotypic values

Alternative: replace missing with the mean and analyse all:

15 2

  • 5

19

  • 7

4.5 4.5

Kinships C

  • v

a r Y ~ K and Z

Resulting estimate will have expected value ph2.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-67
SLIDE 67

The Dentist Method

Therefore, we propose you first “fill” missing values, then “scale” the resulting estimate of h2 by 1

p.

Can use the same trick for BLUP or mixed model association analysis.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-68
SLIDE 68

The Dentist Method

Simulated data with 10% of phenotypic values missing. Heritability Analysis BLUP Performance MM Association Testing

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-69
SLIDE 69

The Dentist Method

Simulated data with 50% of phenotypic values missing. Heritability Analysis BLUP Performance MM Association Testing

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-70
SLIDE 70

Why Use the Dentist Method?

When analysing high-dimensional data, filling in missing phenotypic values allows us to use the same decomposition for each phenotype. Appllied to breast cancer data; n = 1674, 27k gene expressions. Less than 1% missing values, but these affect ∼ 80% of phenotypes. Filling reduces analysis from weeks to hours.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-71
SLIDE 71

Outstanding Issues

Have efficient algorithm for inverting V when: V = Kσ2

g + Iσ2 e

for K full-rank. V = Kσ2

g + Iσ2 e

for K low-rank. V = K1σ2

g1 + K2σ2 g2 + Iσ2 e

for K1 full-rank, K2 low-rank. How about for V = K1σ2

g1 + K2σ2 g2 + Iσ2 e

when K1 and K2 full-rank? Some applications approximate K2 with a low-rank matrix. OK if σ2

g2 a nuisance parameter.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-72
SLIDE 72

REML for Case/Control Data

Liability model for binary traits: Desired Model: L = Zα + g + e with g ∼ N(0, Kσ2

g)

and e ∼ N(0, Iσ2

e)

where Yi = 1 iff Zi > T. Common practice is to analyse as if a continuous phenotype, then transform to liability scale.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-73
SLIDE 73

Probit REML

If we knew each individual’s liability values, estimating σ2

g and σ2 e would

be straightforward (and trivial given eigen-decomposition of K). Therefore, envisage MCMC scheme where at each step we first sample Li, then maximise (or sample from) σ2

g and σ2 e.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-74
SLIDE 74

Summary

Estimating SNP heritability Intensity of Heritability Gene-based Association Testing Adaptive MultiBLUP Bayesian REML and Heritability Meta-Analysis Efficient REML with Multiple Random Effects??? Probit REML???

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-75
SLIDE 75

Acknowledgments

Work with David Balding. Money from Medical Research Council. LDAK available at www.ldak.org. I’m here til end of May Find me in Old Geology South

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

slide-76
SLIDE 76

Concordance Between Traits

Are SNPs associated with one trait more important for others. Repeated with MHC region excluded.

Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk