
ANALYSIS OF MULTIPLE RELATED PHENOTYPES IN GENOME-WIDE ASSOCIATION STUDIES



  1. GIW 2016: ANALYSIS OF MULTIPLE RELATED PHENOTYPES IN GENOME-WIDE ASSOCIATION STUDIES
     Taesung Park (1), Sohee Oh (1), Iksoo Huh (1), and Seung-Yeoun Lee (2)
     (1) Department of Statistics, Seoul National University, South Korea
     (2) Department of Applied Statistics, Sejong University, South Korea

  2. Contents 1 Introduction 2 Multivariate analysis 3 Application: Korean Association REsource (KARE) Project 4 Simulation Study 5 Conclusion

  3. Genome Wide Association Studies (GWAS)
     - Studies of genetic variation across the entire genome
     - Single Nucleotide Polymorphism (SNP): DNA sequence variations that occur when a single nucleotide is altered
     - Designed to identify associations between genetic markers and observable traits, or the presence/absence of a disease or condition
     - Rely on SNP chip technologies

  4. Genome Wide Association Studies (GWAS)
     - Successful in complex traits and diseases
       - height, body mass index, blood pressure
       - asthma, cancer, diabetes, heart disease, and mental illnesses

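  5. Association test
     - Univariate and single SNP analysis
     - Focus on one trait and a single SNP

     $$y_i = \beta_0 + \beta_1\,\mathrm{Sex}_i + \beta_2\,\mathrm{Age}_i + \beta_3\,\mathrm{SNP}_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$

     [Diagram: one trait (Trait 1) tested against each SNP individually, SNP 1 through SNP 1M]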
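
A minimal sketch of the single-trait, single-SNP test on this slide, using ordinary least squares on simulated data. The additive 0/1/2 genotype coding, the simulated effect sizes, and the function name `single_snp_test` are illustrative assumptions, not details taken from the slides.

```python
import numpy as np

def single_snp_test(y, sex, age, snp):
    """Fit y = b0 + b1*sex + b2*age + b3*snp + e by OLS.

    Returns the estimated SNP effect b3 and its t statistic.
    """
    X = np.column_stack([np.ones(len(y)), sex, age, snp])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof                   # estimate of sigma^2
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)     # estimated Var(beta_hat)
    t_snp = beta[3] / np.sqrt(cov_beta[3, 3])
    return beta[3], t_snp

# Simulated data; all effect sizes below are hypothetical.
rng = np.random.default_rng(0)
n = 2000
sex = rng.integers(0, 2, n).astype(float)
age = rng.uniform(40, 70, n)
snp = rng.binomial(2, 0.3, n).astype(float)        # assumed additive coding: 0, 1, 2
y = 1.0 + 0.5 * sex + 0.02 * age + 0.3 * snp + rng.normal(0, 1, n)

b3, t3 = single_snp_test(y, sex, age, snp)
print(f"estimated SNP effect = {b3:.3f}, t = {t3:.2f}")
```

In a genome-wide scan the same fit would be repeated once per SNP, which is why the significance threshold has to be so strict.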

  6. Improving power
     - Common complex traits are related to many genes
     - It is not easy to identify genetic variants with high significance at $\alpha = 5 \times 10^{-8}$
     - Further, these variants explain only a small fraction of disease etiology
     - Need to develop a more powerful method for identifying genetic variants
       - Meta-analysis, which increases the sample size
       - Multiple SNP analysis: gene-gene interaction
       - Joint analysis with the correlated phenotypes

  7. Association test
     - Univariate + multiple SNP analysis
     - Focus on one trait and multiple SNPs

     [Diagram: one trait (Trait 1) tested against multiple SNPs jointly (SNP 1 through SNP 500K), showing accumulated additive effects of multiple SNPs and SNP-SNP interaction]

  8. Multivariate approach
     - Multivariate analysis
     - Focus on multiple related traits and a single SNP

     [Diagram: related traits (Trait 1 through Trait 5) tested jointly against each SNP individually, SNP 1 through SNP 1M]

  9. Multivariate approach
     - Examples: multiple related phenotypes
       - Obesity: BMI, waist circumference, weight, WHR, body fat
       - Hyperlipidemia: total cholesterol, HDL/LDL cholesterol, triglyceride
       - Metabolic syndrome: waist circumference, triglyceride, HDL cholesterol, blood pressure (SBP, DBP), insulin resistance

  10. Multivariate approach
      - Existing methods
        - MultiPhen (O’Reilly et al., 2012): proportional odds model
        - Efficient algorithm for GWAS (Zhou and Stephens, 2014): linear mixed model

  11. Contents 1 Introduction 2 Multivariate analysis 3 Application: Korean Association REsource (KARE) Project 4 Simulation Study 5 Conclusion

  12. Multivariate approach
      - Identify genetic variants associated with multiple related traits
      - Extension of the univariate linear model to the multivariate linear model with a response vector
        - Univariate variances are replaced by a covariance matrix
      - Joint analysis
        - Analyze several traits simultaneously
        - Account for the correlation structure of multiple traits in the model
      - Different slopes (SNP effects) model: allows a separate slope for each trait
        - Different association directions => Heterogeneous model
      - Common slope (SNP effect) model
        - Same association direction with similar effect sizes => Homogeneous model

  13. Multivariate general linear model (1)
      - Let $y_{ij}$ denote the value of trait $j$ from subject $i$, for $i = 1, \dots, n$, $j = 1, \dots, m$
      - The linear model for trait $j$:

        $$y_{ij} = \sum_{k=1}^{p} x_{ik}\beta_{kj} + \varepsilon_{ij} = x_i^{T}\beta_j + \varepsilon_{ij}$$

      - $x_i = (x_{i1}, \dots, x_{ip})^{T}$ is a vector of SNPs and covariates
      - $\beta_j = (\beta_{1j}, \dots, \beta_{pj})^{T}$ is a vector of $p$ unknown parameters
      - $\beta_{kj}$ represents the effect of the $k$th SNP on trait $j$
      - This model allows one SNP to have different effects on the traits

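  14. The multivariate general linear model (2)
      - $y_i = (y_{i1}, \dots, y_{im})^{T}$ is a vector of $m$ responses from the $i$th subject
      - $\varepsilon_i = (\varepsilon_{i1}, \dots, \varepsilon_{im})^{T}$ is a vector of $m$ residuals for the $i$th subject
      - $\varepsilon_i \sim N_m(0, \Sigma)$, where $\Sigma$ is an $m \times m$ covariance matrix
      - The $nm \times 1$ vector

        $$\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix} \sim N_{nm}(0,\; I_n \otimes \Sigma)$$

        where $I_n$ denotes the $n \times n$ identity matrix and the operator $\otimes$ is the direct (Kronecker) product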
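
To make the Kronecker structure concrete, the sketch below builds $I_n \otimes \Sigma$ for a toy case and checks that it is block diagonal: each subject gets the same within-subject covariance and different subjects are uncorrelated. The sizes and the values in `Sigma` are hypothetical.

```python
import numpy as np

n, m = 3, 2                            # toy sizes: 3 subjects, 2 traits
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])         # hypothetical within-subject covariance

# Covariance of the stacked nm x 1 error vector (eps_1, ..., eps_n)
V = np.kron(np.eye(n), Sigma)

# Each diagonal m x m block equals Sigma ...
for i in range(n):
    block = V[i * m:(i + 1) * m, i * m:(i + 1) * m]
    assert np.allclose(block, Sigma)

# ... and blocks linking different subjects are zero.
assert np.allclose(V[0:m, m:2 * m], 0.0)

print(V.shape)
```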

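  15. Multivariate general linear model (3)
      - Covariance (correlation) structure: an $m \times m$ matrix
      - Specifies how the traits within a subject are related
      - Unstructured (UN):

        $$\begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m} \\ \sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1m} & \sigma_{2m} & \cdots & \sigma_m^2 \end{pmatrix}$$

      - Structured covariance
        - Compound Symmetry (CS):

          $$\begin{pmatrix} \sigma^2 + \sigma_1^2 & \sigma_1^2 & \cdots & \sigma_1^2 \\ \sigma_1^2 & \sigma^2 + \sigma_1^2 & \cdots & \sigma_1^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_1^2 & \sigma_1^2 & \cdots & \sigma^2 + \sigma_1^2 \end{pmatrix}$$

        - First-order autoregressive (AR(1)):

          $$\begin{pmatrix} \sigma^2 & \rho\sigma^2 & \cdots & \rho^{m-1}\sigma^2 \\ \rho\sigma^2 & \sigma^2 & \cdots & \rho^{m-2}\sigma^2 \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{m-1}\sigma^2 & \rho^{m-2}\sigma^2 & \cdots & \sigma^2 \end{pmatrix}$$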
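
The structured forms can be generated from two parameters each, whereas the unstructured form needs $m(m+1)/2$. A small sketch, with hypothetical function names and parameter values:

```python
import numpy as np

def compound_symmetry(m, sigma2, sigma1_2):
    """CS: sigma1_2 in every entry, with sigma2 added on the diagonal."""
    return sigma1_2 * np.ones((m, m)) + sigma2 * np.eye(m)

def ar1(m, sigma2, rho):
    """AR(1): entry (j, k) equals sigma2 * rho**|j - k|."""
    idx = np.arange(m)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

def n_cov_params(m):
    """Number of free covariance parameters under each structure."""
    return {"UN": m * (m + 1) // 2, "CS": 2, "AR(1)": 2}

print(compound_symmetry(3, 1.0, 0.5))
print(ar1(3, 2.0, 0.5))
print(n_cov_params(4))
```

With four traits, UN estimates ten parameters while CS and AR(1) estimate two. That flexibility is the usual reason for choosing UN when the traits have no natural ordering.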

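  16. The multivariate general linear model (4)
      - Matrix formulation:

        $$Y = XB + E$$

      - $Y$: $n \times m$ data matrix

        $$Y = \begin{pmatrix} y_{11} & \cdots & y_{1m} \\ \vdots & \ddots & \vdots \\ y_{n1} & \cdots & y_{nm} \end{pmatrix} = \begin{pmatrix} y_1^{T} \\ \vdots \\ y_n^{T} \end{pmatrix}$$

      - $X$: $n \times p$ known design matrix

        $$X = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{pmatrix} = \begin{pmatrix} x_1^{T} \\ \vdots \\ x_n^{T} \end{pmatrix}$$

      - $B$: $p \times m$ parameter matrix

        $$B = \begin{pmatrix} \beta_{11} & \cdots & \beta_{1m} \\ \vdots & \ddots & \vdots \\ \beta_{p1} & \cdots & \beta_{pm} \end{pmatrix} = (\beta_1 \; \cdots \; \beta_m)$$

      - $E$: $n \times m$ matrix of random errors

        $$E = \begin{pmatrix} \varepsilon_{11} & \cdots & \varepsilon_{1m} \\ \vdots & \ddots & \vdots \\ \varepsilon_{n1} & \cdots & \varepsilon_{nm} \end{pmatrix} = \begin{pmatrix} \varepsilon_1^{T} \\ \vdots \\ \varepsilon_n^{T} \end{pmatrix}$$

      - where

        $$\mathrm{E}(Y) = XB \quad \text{and} \quad \mathrm{Var}\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = I_n \otimes \Sigma$$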
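
A minimal sketch of fitting the matrix model by least squares on simulated data. It also checks a standard property of this model: each column of the estimated parameter matrix coincides with the univariate fit for that trait, so the multivariate gain comes from the joint test rather than from the point estimates. Dimensions, covariates and covariance values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 500, 3, 4                               # subjects, predictors, traits

# Design matrix: intercept, one continuous covariate, one SNP (assumed 0/1/2 coding)
X = np.column_stack([np.ones(n),
                     rng.normal(size=n),
                     rng.binomial(2, 0.3, n)])
B_true = rng.normal(size=(p, m))                  # hypothetical parameter matrix
Sigma = 0.5 * np.eye(m) + 0.5                     # unit variances, correlation 0.5
E = rng.multivariate_normal(np.zeros(m), Sigma, size=n)
Y = X @ B_true + E                                # Y = XB + E

# Least-squares estimate of B (p x m) and unbiased estimate of Sigma (m x m)
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ B_hat
Sigma_hat = resid.T @ resid / (n - p)

# Column j of B_hat equals the univariate OLS fit for trait j
b_trait0, *_ = np.linalg.lstsq(X, Y[:, 0], rcond=None)
assert np.allclose(B_hat[:, 0], b_trait0)

print(np.round(B_hat - B_true, 2))
```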

  17. The multivariate general linear model (5)
      - Consider related phenotypes simultaneously
      - Allow for correlation between phenotypes in the model
      - Detect genetic variants that have only modest effects in the univariate approach
      - Provide some chance of capturing pleiotropic genes
      - Model
        - Hetero model with separate slopes (different genetic effects on each phenotype)
        - Homo model with a common slope (same genetic effect on all phenotypes)
        - Unstructured variance-covariance structure
      - Test statistic
        - Wilks' $\Lambda$ statistic:

          $$\Lambda = \frac{|E|}{|H + E|} = \prod_{i=1}^{k} \frac{1}{1 + \lambda_i}$$

          where $H$ and $E$ here denote the hypothesis and error sums-of-squares-and-cross-products matrices and $\lambda_i$ are the nonzero eigenvalues of $E^{-1}H$
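
The following sketch computes Wilks' Lambda for the separate-slopes (hetero) hypothesis that a SNP has no effect on any trait, by comparing a model with the SNP to one without it. It also checks numerically that the determinant ratio equals the eigenvalue product. The simulated effects are hypothetical, and the function name `wilks_lambda` is illustrative. The F conversion is the standard exact result for a one-degree-of-freedom hypothesis, not something stated on the slides.

```python
import numpy as np

def wilks_lambda(Y, X_full, X_reduced):
    """Wilks' Lambda for the terms in X_full that are absent from X_reduced."""
    def residual_sscp(X):
        B, *_ = np.linalg.lstsq(X, Y, rcond=None)
        R = Y - X @ B
        return R.T @ R                            # m x m residual SSCP matrix

    E = residual_sscp(X_full)                     # error SSCP
    H = residual_sscp(X_reduced) - E              # hypothesis SSCP
    lam = np.linalg.det(E) / np.linalg.det(H + E)
    eigvals = np.linalg.eigvals(np.linalg.solve(E, H)).real  # eigenvalues of E^{-1} H
    return lam, eigvals

rng = np.random.default_rng(2)
n, m = 1000, 4
sex = rng.integers(0, 2, n).astype(float)
age = rng.uniform(40, 70, n)
snp = rng.binomial(2, 0.3, n).astype(float)       # assumed additive coding
X_reduced = np.column_stack([np.ones(n), sex, age])
X_full = np.column_stack([X_reduced, snp])

Sigma = 0.5 * np.eye(m) + 0.5                     # hypothetical trait covariance
snp_effects = np.array([0.10, 0.15, -0.10, 0.05]) # hypothetical, one slope per trait
Y = np.outer(snp, snp_effects) + rng.multivariate_normal(np.zeros(m), Sigma, size=n)

lam, eigvals = wilks_lambda(Y, X_full, X_reduced)

# |E| / |H + E| equals the product of 1 / (1 + lambda_i)
assert np.isclose(lam, np.prod(1.0 / (1.0 + eigvals)))

# With a single tested term, Lambda converts exactly to F(m, n - p - m + 1)
p = X_full.shape[1]
F = (1.0 - lam) / lam * (n - p - m + 1) / m
print(f"Wilks' Lambda = {lam:.4f}, F = {F:.2f} on ({m}, {n - p - m + 1}) df")
```

Because the SNP contributes a single column, only one eigenvalue is materially nonzero; the remaining ones are zero up to rounding and do not change the product.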

  18. Contents 1 Introduction 2 Multivariate analysis 3 Application: Korean Association REsource (KARE) Project 4 Simulation Study 5 Conclusion

  19. Korea Association Resource (KARE) Project
      - Objective: to identify genetic factors of quantitative clinical traits and lifestyle-related diseases (e.g. T2DM) from a genome-wide association study using population-based cohorts
      - Genotyping
        - Over 10,000 subjects from two community-based cohorts in Korea (Ansung and Ansan cohorts)
        - Affymetrix 5.0
      - First high-density, large-scale GWA study performed in the East Asian population
      - Courtesy of KNIH

  20. KARE: Characteristics

      | Baseline study  | Ansung      | Ansan       |
      |-----------------|-------------|-------------|
      | Participants    | 5,018       | 5,020       |
      | Sex (women/men) | 2,778/2,240 | 2,497/2,523 |
      | Age (mean)      | 55.5        | 49.1        |
      | 40s (%)         | 31.2        | 62.8        |
      | 50s (%)         | 29.1        | 23.0        |
      | 60 or older (%) | 39.6        | 14.3        |

      Courtesy of KNIH

  21. KARE data
      - Data description
        - 8,842 subjects from two community-based cohorts in Korea (Ansung and Ansan cohorts)
      - Filtering thresholds
        - HWE p-value < $10^{-6}$
        - MAF < 0.01
        - Missing proportion in each genotype > 0.05
      - Missing imputation: HapMap JPT/CHB reference panel
      - SNPs: 327,872
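
A sketch of how the three SNP-level filters might be applied to one marker, assuming the listed thresholds are exclusion criteria. The slide does not say which Hardy-Weinberg test was used; this example assumes a 1-df chi-square goodness-of-fit test, whereas many pipelines use an exact test instead. The function name `snp_qc` and the 0/1/2 coding with `nan` for missing calls are illustrative assumptions.

```python
import math
import numpy as np

def snp_qc(genotypes, maf_min=0.01, miss_max=0.05, hwe_p_min=1e-6):
    """QC summary for one SNP given 0/1/2 allele counts (np.nan = missing call)."""
    g = np.asarray(genotypes, dtype=float)
    called = g[~np.isnan(g)]
    miss_rate = 1.0 - called.size / g.size
    if called.size == 0:
        return {"maf": 0.0, "miss_rate": 1.0, "hwe_p": 1.0, "keep": False}

    n = called.size
    p = called.sum() / (2 * n)                     # frequency of the counted allele
    maf = min(p, 1.0 - p)

    if maf == 0.0:
        hwe_p = 1.0                                # monomorphic: HWE test undefined
    else:
        obs = np.array([(called == k).sum() for k in (0, 1, 2)], dtype=float)
        exp = n * np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])
        chi2 = ((obs - exp) ** 2 / exp).sum()
        hwe_p = math.erfc(math.sqrt(chi2 / 2.0))   # upper tail of chi-square, 1 df

    keep = bool(maf >= maf_min and miss_rate <= miss_max and hwe_p >= hwe_p_min)
    return {"maf": float(maf), "miss_rate": miss_rate, "hwe_p": hwe_p, "keep": keep}

rng = np.random.default_rng(3)
in_hwe = rng.binomial(2, 0.3, 5000).astype(float)  # genotypes drawn under HWE
all_het = np.ones(1000)                            # extreme excess of heterozygotes
print(snp_qc(in_hwe)["keep"], snp_qc(all_het)["keep"])
```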

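  22. Obesity
      - Obesity-related phenotypes
        - BMI, waist circumference, weight, and WHR
        - $\mathrm{BMI} = \mathrm{Weight} / \mathrm{Height\,(m)}^2$
        - $\mathrm{WHR} = \mathrm{Waist} / \mathrm{Hip\ circumference}$
      - Which genes are associated with obesity-related phenotypes?
      - Correlations between the phenotypes:

      |        | BMI    | Waist  | Weight | WHR |
      |--------|--------|--------|--------|-----|
      | BMI    | 1      |        |        |     |
      | Waist  | 0.7607 | 1      |        |     |
      | Weight | 0.7308 | 0.6862 | 1      |     |
      | WHR    | 0.3819 | 0.7971 | 0.2920 | 1   |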
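
For illustration, the two derived phenotypes and a trait correlation matrix can be computed as below. Weight in kilograms is an assumption (the slide only gives height in metres), and the simulated measurements are independent random draws used purely to show the mechanics. They will not reproduce the KARE correlations.

```python
import numpy as np

def bmi(weight_kg, height_m):
    """Body mass index: weight divided by squared height in metres."""
    return weight_kg / height_m ** 2

def whr(waist, hip):
    """Waist-hip ratio; both circumferences must be in the same unit."""
    return waist / hip

print(round(bmi(70.0, 1.75), 2))   # 70 / 1.75**2 = 22.86

# Hypothetical measurements, for demonstration only
rng = np.random.default_rng(4)
n = 500
height_m = rng.normal(1.62, 0.08, n)
weight_kg = rng.normal(63.0, 10.0, n)
waist_cm = rng.normal(82.0, 9.0, n)
hip_cm = rng.normal(93.0, 6.0, n)

# Columns in the same order as the table: BMI, Waist, Weight, WHR
traits = np.column_stack([bmi(weight_kg, height_m), waist_cm,
                          weight_kg, whr(waist_cm, hip_cm)])
R = np.corrcoef(traits, rowvar=False)   # 4 x 4 Pearson correlation matrix
print(np.round(R, 4))
```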

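  23. Obesity: Univariate Analysis
      - Most GWAS are conducted under this framework
      - Focus on one phenotype and a single SNP
      - Obesity-related phenotypes
      - Separate univariate analyses:
        - BMI: $Y_1 = \beta_{01} + \beta_{11}\,\mathrm{Sex} + \beta_{21}\,\mathrm{Age} + \beta_{31}\,\mathrm{Area} + \beta_{41}\,\mathrm{SNP} + \varepsilon_1$
        - Waist: $Y_2 = \beta_{02} + \beta_{12}\,\mathrm{Sex} + \beta_{22}\,\mathrm{Age} + \beta_{32}\,\mathrm{Area} + \beta_{42}\,\mathrm{SNP} + \varepsilon_2$
        - Weight: $Y_3 = \beta_{03} + \beta_{13}\,\mathrm{Sex} + \beta_{23}\,\mathrm{Age} + \beta_{33}\,\mathrm{Area} + \beta_{43}\,\mathrm{SNP} + \varepsilon_3$
        - WHR: $Y_4 = \beta_{04} + \beta_{14}\,\mathrm{Sex} + \beta_{24}\,\mathrm{Age} + \beta_{34}\,\mathrm{Area} + \beta_{44}\,\mathrm{SNP} + \varepsilon_4$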

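  24. Obesity: Univariate Analysis Results
      - Number of significant genetic variants at a given level of $\alpha$

      | P-value | $p \le 10^{-7}$ | $10^{-7} < p \le 10^{-6}$ | $10^{-6} < p \le 10^{-5}$ | $10^{-5} < p \le 10^{-4}$ |
      |---------|-----------------|---------------------------|---------------------------|---------------------------|
      | BMI     | 1               | 0                         | 6                         | 23                        |
      | Waist   | 0               | 0                         | 7                         | 39                        |
      | Weight  | 0               | 3                         | 5                         | 32                        |
      | WHR     | 0               | 4                         | 7                         | 25                        |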
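
A tally like the one in this table can be produced by binning per-SNP p-values into the same half-open intervals. The sketch below uses a handful of made-up p-values; the function name `count_by_pvalue_bin` is hypothetical.

```python
import numpy as np

def count_by_pvalue_bin(pvals):
    """Count p-values in the four bins used in the results table."""
    p = np.asarray(pvals, dtype=float)
    return {
        "p <= 1e-7": int((p <= 1e-7).sum()),
        "1e-7 < p <= 1e-6": int(((p > 1e-7) & (p <= 1e-6)).sum()),
        "1e-6 < p <= 1e-5": int(((p > 1e-6) & (p <= 1e-5)).sum()),
        "1e-5 < p <= 1e-4": int(((p > 1e-5) & (p <= 1e-4)).sum()),
    }

# Made-up p-values for one phenotype; values above 1e-4 fall outside every bin
example_pvals = [5e-8, 5e-7, 2e-6, 3e-6, 3e-5, 0.2]
counts = count_by_pvalue_bin(example_pvals)
print(counts)
```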

  25. [Figure: four panels, one per phenotype: BMI, Waist, Weight, WHR]
