Fun with Mixed Models Vic Biostats Seminar 30th April 2015 - PowerPoint PPT Presentation


  1. Fun with Mixed Models Vic Biostats Seminar 30th April 2015 doug.speed@ucl.ac.uk

  2. Overview: (1) Estimating SNP Heritability; (2) Extensions; (3) Computational Technicalities; (4) Classification

  3. The Linear Mixed Model
     Suppose our GWAS data comprise a phenotype $Y$ (vector of length $n$), SNP calls $S$ (matrix of size $n \times N$), plus any covariates $Z$.
     $$Y = Z\alpha + g + e \quad \text{with} \quad g \sim N(0, K\sigma_g^2) \ \text{and} \ e \sim N(0, I\sigma_e^2)$$
     $\alpha$ is a vector of fixed effects; $g$ and $e$ are the genetic and environmental random effects (with corresponding components of variance $\sigma_g^2$ and $\sigma_e^2$).
     Typically, use the kinship matrix $K = XX^T / N$, where $X_{ij} = (S_{ij} - \bar{S}_j) / \mathrm{SD}(S_j)$.
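
As a rough illustration of the kinship calculation on this slide, here is a minimal NumPy sketch. The function name `kinship`, the use of the population SD (`ddof=0`), the dropping of monomorphic SNPs, and the simulated genotypes are all assumptions made for the example, not details from the talk.

```python
import numpy as np

def kinship(S):
    """Return K = X X^T / N from an n x N matrix of SNP calls S."""
    S = np.asarray(S, dtype=float)
    sd = S.std(axis=0)          # per-SNP standard deviation (ddof=0, an assumption)
    keep = sd > 0               # monomorphic SNPs cannot be standardized, so drop them
    X = (S[:, keep] - S[:, keep].mean(axis=0)) / sd[keep]
    return X @ X.T / X.shape[1]

# Toy data: 20 individuals, 100 SNPs coded 0/1/2 (simulated, not real genotypes)
rng = np.random.default_rng(1)
maf = rng.uniform(0.2, 0.5, size=100)
S = rng.binomial(2, maf, size=(20, 100))
K = kinship(S)
print(K.shape)
```

With this standardization the diagonal of `K` averages 1 and every row sums to zero, which is a quick sanity check on real data.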

  4. Traditionally Used for Heritability Estimation

  5. Solved Via REML
     $$Y = Z\alpha + g + e \quad \text{with} \quad g \sim N(0, K\sigma_g^2) \ \text{and} \ e \sim N(0, I\sigma_e^2)$$
     The raw model likelihood follows from assuming $Y \sim N(Z\alpha, V)$, where $V = K\sigma_g^2 + I\sigma_e^2$:
     $$l(Y \mid \alpha, K, \sigma_g^2, \sigma_e^2) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}(Y - Z\alpha)^T V^{-1} (Y - Z\alpha) - \frac{1}{2}\log|V|.$$
     The restricted likelihood is obtained by “integrating across” $\alpha$. With $P = V^{-1} - V^{-1}Z(Z^T V^{-1} Z)^{-1} Z^T V^{-1}$ and $p$ the number of columns of $Z$:
     $$l(Y \mid K, \sigma_g^2, \sigma_e^2) = -\frac{n-p}{2}\log(2\pi) - \frac{1}{2}Y^T P Y - \frac{1}{2}\log|V| - \frac{1}{2}\log|Z^T V^{-1} Z|.$$
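
The restricted log likelihood can be evaluated directly for given variance components. The sketch below is one way to do so with NumPy, using the standard multivariate-normal constant $\log(2\pi)$ and $p$ equal to the number of columns of $Z$. The function name `reml_loglik` and the toy data are assumptions for illustration; it uses dense $O(n^3)$ linear algebra and is not how any particular software package implements REML.

```python
import numpy as np

def reml_loglik(Y, Z, K, sigma2_g, sigma2_e):
    """Restricted log likelihood of Y given kinship K and variance components."""
    n, p = Z.shape
    V = sigma2_g * K + sigma2_e * np.eye(n)
    Vinv_Y = np.linalg.solve(V, Y)
    Vinv_Z = np.linalg.solve(V, Z)
    ZtVinvZ = Z.T @ Vinv_Z
    ZtVinvY = Z.T @ Vinv_Y
    # Y^T P Y = Y^T V^-1 Y - Y^T V^-1 Z (Z^T V^-1 Z)^-1 Z^T V^-1 Y
    quad = Y @ Vinv_Y - ZtVinvY @ np.linalg.solve(ZtVinvZ, ZtVinvY)
    _, logdet_V = np.linalg.slogdet(V)
    _, logdet_ZVZ = np.linalg.slogdet(ZtVinvZ)
    return -0.5 * ((n - p) * np.log(2 * np.pi) + quad + logdet_V + logdet_ZVZ)

# Toy data (simulated): intercept plus one covariate, random "genotypes"
rng = np.random.default_rng(0)
n = 50
Z = np.column_stack([np.ones(n), rng.standard_normal(n)])
X = rng.standard_normal((n, 200))
K = X @ X.T / 200
Y = rng.standard_normal(n)
ll = reml_loglik(Y, Z, K, 0.5, 0.5)
print(ll)
```

Because $PZ = 0$, the value is unchanged if any multiple of the covariates is added to `Y`, which is what “integrating across” $\alpha$ buys. In practice the function would be maximized over `sigma2_g` and `sigma2_e`.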

  6. Outline: (1) Estimating SNP Heritability; (2) Extensions; (3) Computational Technicalities; (4) Classification

  7. Estimating Total SNP Heritability
     Jian Yang et al. realised that, by applying the mixed model to “unrelated individuals”, one could estimate the total proportion of phenotypic variance explained by all SNPs.

  8. Linear Random Effects Regression Model
     Suppose we assume the following relationship:
     $$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \ldots + \beta_{500\,000} X_{500\,000} + e,$$
     where $\beta_j \sim N(0, \sigma_g^2 / N)$ and $e \sim N(0, \sigma_e^2)$.
     Then $g = \sum_{j=1}^{N} \beta_j X_j = X\beta \sim N\!\left(0, \frac{XX^T}{N}\sigma_g^2\right)$ and $Y \sim N(\alpha, K\sigma_g^2 + I\sigma_e^2)$, where $K = XX^T / N$.
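
The step from the prior on the effect sizes to the distribution of $g$ can be spelled out as follows. This assumes the $\beta_j$ are independent of each other and of $e$, and that $X$ is treated as fixed; independence is implied rather than stated on the slide.

$$\mathrm{E}[g] = X\,\mathrm{E}[\beta] = 0, \qquad \mathrm{Cov}(g) = X\,\mathrm{Cov}(\beta)\,X^T = X\left(\frac{\sigma_g^2}{N} I\right) X^T = \frac{XX^T}{N}\,\sigma_g^2 = K\sigma_g^2.$$

Since $g$ is a linear combination of Gaussian variables it is itself Gaussian, and adding the independent noise term gives $\mathrm{Cov}(Y) = K\sigma_g^2 + I\sigma_e^2$.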

  9. Estimating Total SNP Heritability
     The heritability of human height is 80%. Jian Yang et al. calculated that 45% of phenotypic variance could be explained by common SNPs.

  11. Solves the “Missing Heritability” Problem
      [Figure: variance in Human Height split into Environment and Genetics]

  12. Solves the “Missing Heritability” Problem
      [Figure: Human Height; Genetics further split into GWAS SNPs and Missing Herit.]

  13. Solves the “Missing Heritability” Problem
      [Figure: Human Height; labels GWAS SNPs, ALL SNPs, Still Missing, Missing Herit., Genetics, Environment]

  14. Solves the “Missing Heritability” Problem
      [Figure: charts for Human Height, Schizophrenia, Obesity, Crohn's Disease, Bipolar Disorder and Epilepsy, each split into GWAS SNPs, Other SNPs, Other Genetics and Environment]

  15. The Method Generally Works
      REML assumptions: all SNPs are causal; Gaussian effect sizes; Gaussian noise terms; inverse relationship between MAF and effect size.
      Overall, we found the approach amazingly robust to misspecification.

  16. Computing $K = XX^T / N$

  17. Impact of Uneven Tagging
      Regions of high LD have a disproportionately large contribution.

  18. Impact of Uneven Tagging
      A common problem when performing principal component analysis.

  19. Estimates Can be Sensitive to LD of Causal Variants
      Causal variants in high LD areas ⇒ over-estimation of $h^2_{\mathrm{SNP}}$.
      Causal variants in low LD areas ⇒ under-estimation of $h^2_{\mathrm{SNP}}$.

  20. Adjusting for Uneven Tagging
      [Diagram: underlying variation U1, U2, U3, U4, each tagged by one genotyped SNP (X1, X2, X3, X4) with its own effect size β; weightings 1, 1, 1, 1]

  21. Adjusting for Uneven Tagging
      [Diagram: underlying variation U1, U2, U3, U4 tagged unevenly by genotyped SNPs X1, X1, X2, X3, X3, X4, X4, X4, X4, each with its own effect size β; weightings ½, ½, 1, ½, ½, ¼, ¼, ¼, ¼]

  22. Weightings Reduce the Biases
      LDAK weightings down-weight SNPs well-tagged by neighbours and up-weight SNPs poorly-tagged by neighbours.
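
The toy example from slides 20 and 21 can be reproduced numerically. The sketch below is not the LDAK algorithm, which has to derive its weightings from the observed LD; it simply takes the weightings shown on slide 21 as given and assumes that repeated SNPs are perfect copies. The function name `weighted_kinship` and the normalization by the sum of the weights are assumptions for illustration.

```python
import numpy as np

def weighted_kinship(X, w):
    """Return sum_j w_j X_j X_j^T / sum_j w_j for an n x N matrix X."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    return (X * w) @ X.T / w.sum()

rng = np.random.default_rng(2)
U = rng.standard_normal((10, 4))        # stand-in for underlying variation U1..U4

# Slide 21 tagging pattern: X1 X1 X2 X3 X3 X4 X4 X4 X4
index = [0, 0, 1, 2, 2, 3, 3, 3, 3]
X = U[:, index]
w = [0.5, 0.5, 1, 0.5, 0.5, 0.25, 0.25, 0.25, 0.25]

K_even = weighted_kinship(U, np.ones(4))        # evenly tagged (slide 20)
K_unweighted = weighted_kinship(X, np.ones(9))  # uneven tagging, no correction
K_weighted = weighted_kinship(X, w)             # uneven tagging, slide 21 weightings

print(np.allclose(K_weighted, K_even), np.allclose(K_unweighted, K_even))
```

Without weightings the fourth region counts four times as much as the second; with them, each underlying signal contributes once.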

  23. LDAK: Linkage Disequilibrium Adjusted Kinships
      LDAK weights offer an alternative to pruning, e.g., when computing genetic profile risk scores.

  24. GCTA vs LDAK
      In the end, whether SNPs explain 50% or 60% of heritability is not a big deal.

  25. Outline: (1) Estimating SNP Heritability; (2) Extensions; (3) Computational Technicalities; (4) Classification

  26. Basic Model
      $$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \ldots + \beta_{500\,000} X_{500\,000} + e.$$
      Assume $\beta_j \sim N(0, \sigma_g^2 / N)$ and $e \sim N(0, \sigma_e^2)$.
      Then $Y \sim N(\alpha, K\sigma_g^2 + I\sigma_e^2)$, where $K = XX^T / N$.

  27. Bivariate Analysis
      Trait 1: $Y_1 = Z\alpha_1 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_{500\,000} X_{500\,000} + e_1 = Z\alpha_1 + g_1 + e_1$
      Trait 2: $Y_2 = Z\alpha_2 + \gamma_1 X_1 + \gamma_2 X_2 + \ldots + \gamma_{500\,000} X_{500\,000} + e_2 = Z\alpha_2 + g_2 + e_2$
      We are now interested in the correlation between genetic effects, $\rho = \mathrm{cor}(g_1, g_2)$. Equivalently, we can think of the average correlation between effect sizes, $\rho = \mathrm{cor}(\beta_j, \gamma_j)$.
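
The equivalence between the two views of $\rho$ can be checked by simulation. The sketch below draws effect sizes for two traits with a chosen correlation and compares it with the correlation of the resulting genetic values. The sample sizes, the value of `rho`, and the use of independent standard-normal columns in place of real standardized genotypes are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, N, rho = 2000, 1000, 0.5

X = rng.standard_normal((n, N))   # stand-in for standardized genotypes (assumption)

# Effect sizes with cor(beta_j, gamma_j) = rho and variance 1/N each
z1 = rng.standard_normal(N)
z2 = rng.standard_normal(N)
beta = z1 / np.sqrt(N)
gamma = (rho * z1 + np.sqrt(1 - rho**2) * z2) / np.sqrt(N)

g1 = X @ beta     # genetic values for trait 1
g2 = X @ gamma    # genetic values for trait 2

cor_effects = np.corrcoef(beta, gamma)[0, 1]
cor_genetic = np.corrcoef(g1, g2)[0, 1]
print(round(cor_effects, 2), round(cor_genetic, 2))
```

Both printed values should land near `rho`, up to sampling noise. A real analysis estimates $\rho$ from the phenotypes via a bivariate mixed model, since the effect sizes themselves are not observed.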

  28. Examining Concordance Between Traits

  29. Genome Partitioning
      $$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \ldots + \beta_{500\,000} X_{500\,000} + e.$$
      Assume $\beta_j \sim N(0, \sigma_g^2 / N)$ and $e \sim N(0, \sigma_e^2)$.
      Then $Y \sim N(\alpha, K\sigma_g^2 + I\sigma_e^2)$, where $K = XX^T / N$.

  30. Genome Partitioning
      $$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \ldots + \beta_{500\,000} X_{500\,000} + e.$$
      Split the SNPs into two sets of sizes $N_1$ and $N_2$. Assume $\beta_j \sim N(0, \sigma_{g1}^2 / N_1)$ for SNPs $j$ in the first set and $\beta_k \sim N(0, \sigma_{g2}^2 / N_2)$ for SNPs $k$ in the second.
      Then $Y \sim N(\alpha, K_1\sigma_{g1}^2 + K_2\sigma_{g2}^2 + I\sigma_e^2)$, where $K_1 = X_{(1)}X_{(1)}^T / N_1$ and $K_2 = X_{(2)}X_{(2)}^T / N_2$, with $X_{(1)}$ and $X_{(2)}$ the genotype matrices restricted to each set.
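
A minimal sketch of the partitioned covariance structure follows, assuming the SNPs have already been standardized and split into two sets. The split point, the variance component values and the simulated genotypes are hypothetical and chosen only to make the example run.

```python
import numpy as np

rng = np.random.default_rng(4)
n, N1, N2 = 30, 60, 140
X = rng.standard_normal((n, N1 + N2))   # stand-in for standardized genotypes (assumption)
X1, X2 = X[:, :N1], X[:, N1:]           # hypothetical split, e.g. genic vs inter-genic SNPs

# One kinship matrix per SNP set
K1 = X1 @ X1.T / N1
K2 = X2 @ X2.T / N2

# Illustrative variance components only
sigma2_g1, sigma2_g2, sigma2_e = 0.3, 0.2, 0.5
V = K1 * sigma2_g1 + K2 * sigma2_g2 + np.eye(n) * sigma2_e

# The single-kinship model is the special case where both sets share one
# per-SNP effect variance: K is then the size-weighted average of K1 and K2.
K_all = X @ X.T / (N1 + N2)
print(np.allclose((N1 * K1 + N2 * K2) / (N1 + N2), K_all))
```

The same restricted likelihood applies with this `V`; there are simply two genetic variance components to estimate instead of one.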

  31. Genome Partitioning
      [Figure: genome partitioning results for Height, BMI, vWF and QTi]

  32. Intensity of Heritability
      We define the “intensity of heritability” of a set of SNPs as their heritability divided by their genetic variation. We can then test for differences in intensity of heritability.

  33. Intensity of Heritability
      Are genic SNPs more important than inter-genic SNPs? Inter-genic SNPs are defined as all SNPs more than 100 kbp from a coding region.

  34. Intensity of Heritability
      Are genic SNPs more important than inter-genic SNPs? We can also test eQTLs vs non-eQTLs, high-quality SNPs vs low-quality SNPs, etc.

  35. Concordance Between Traits
      Are SNPs associated with one trait more important for others? $p$-values for Schizophrenia and Crohn’s obtained from independent studies.
