Population Structure and Association Analysis 02-715 Advanced - PowerPoint PPT Presentation

Population Structure and Association Analysis
02-715 Advanced Topics in Computational Genomics

Population Structure and Association Analysis
• Population structure in data causes false positives
  – Samples in the case population are usually more related
  – Any SNPs more prevalent in the case population will be found significantly associated with the trait.

Accounting for Population Structure in Association Analysis
• Association mapping needs to account for population structure.
• Careful study design with each population represented in case/control groups in a balanced way.
  – Can be hard to control
  – The effect of cryptic population structure

Family-based Design vs. Population-based Design
• Family-based studies
  – The effect of population structure can be controlled by the use of parents' genotypes.
  – In practice, collecting genotypes from multiple individuals in a family can be hard (e.g., late-onset diseases).
• Population-based design
  – Data collection is easier for a large number of unrelated individuals than for a large number of families.
  – The control samples can be reused in different studies.

Accounting for Population Structure in Association Analysis
• Family-based method
  – Transmission disequilibrium test (TDT)
• Population-based methods
  – Genomic control (Devlin & Roeder, Biometrics 1999)
  – Structured association (Pritchard et al., AJHG 2000)
  – EigenStrat: principal component analysis (Price et al., Nature Genetics 2006)

Transmission Disequilibrium Test (TDT)
• Genotype affected individuals and their parents (trio)

                          Non-transmitted alleles
  Transmitted alleles     M        m        Total
  M                       a        b        a+b
  m                       c        d        c+d
  Total                   a+c      b+d      2N

• Null hypothesis: (b/(b+c), c/(b+c)) is compatible with (0.5, 0.5)
• Test statistic is given as (b-c)^2/(b+c)
• The non-transmitted alleles play the role of controls
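
The TDT statistic on this slide can be computed directly from the two off-diagonal counts. The sketch below is illustrative only: the function name `tdt` and the example counts are made up, and the p-value assumes the usual large-sample approximation in which the statistic follows a chi-square distribution with 1 degree of freedom under the null.

```python
import math


def tdt(b, c):
    """TDT statistic from the transmission table.

    b: transmissions where M was transmitted and m was not.
    c: transmissions where m was transmitted and M was not.
    Returns (statistic, approximate p-value).
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts, chosen only for illustration.
stat, p = tdt(b=60, c=40)
print(f"TDT = {stat:.2f}, p = {p:.4f}")  # TDT = 4.00, p = 0.0455
```

Only b and c enter the statistic because transmissions counted in cells a and d carry no information about which allele is preferentially transmitted.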

Genomic Control (GC)
• Idea: Use the SNPs that are not associated with the trait to remove the effect of population stratification
• Genotype data consist of
  – Candidate genes to be tested
  – L supplementary loci (null loci) for estimating the inflation factor λ
• GC uses the inflation factor λ to correct the association statistic of the SNP in the candidate gene
• Limitation: the inflation factor λ is assumed to be the same across the genome, ignoring population admixture
Devlin & Roeder, Biometrics 1999
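
A minimal sketch of the GC correction, under assumptions the slide does not spell out: λ is estimated as the median of the 1-df chi-square statistics at the null loci divided by the median of a chi-square(1) distribution (about 0.455), and λ is floored at 1 so that statistics are never inflated. The function names and example values are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(null_stats):
    """Estimate lambda from association statistics at the L null loci."""
    lam = statistics.median(null_stats) / CHI2_1DF_MEDIAN
    # Assumption: never deflate below 1 (a common convention).
    return max(lam, 1.0)


def gc_correct(candidate_stat, lam):
    """Divide the candidate SNP's statistic by the inflation factor."""
    return candidate_stat / lam


# Hypothetical statistics at five null loci and one candidate SNP.
null_stats = [0.2, 0.5, 0.91, 1.3, 3.0]
lam = inflation_factor(null_stats)
print(lam, gc_correct(8.0, lam))
```

Using the median rather than the mean keeps the estimate robust if a few of the supposedly null loci are in fact associated with the trait.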

STRAT: Structured Association (Pritchard et al., AJHG 2000)
• Idea: Within each subpopulation, an association between a genetic marker and the trait is a true association.
• Two-stage method
  – Step 1: Using Structure (Pritchard et al., Genetics 2000) and unlinked genetic markers,
    • estimate the population structure
    • assign sampled individuals to putative subpopulations
  – Step 2:
    • Test for association within the subpopulations inferred in Step 1
• Limitation
  – Running Structure is computationally demanding
Pritchard et al., AJHG 2000

STRAT: Step 2
• Given ancestry proportions q_k(i) for population k and individual i, estimated by STRUCTURE
• H0: The probability model for genotypes c's under the null hypothesis of no association
• H1: The probability model for genotypes c's under the alternative hypothesis of association

STRAT: Step 2
• Likelihood ratio test:
  – Large values indicate that the alternative hypothesis explains the data better.

Simulation Studies: No Admixture
• Assume two discrete populations
• Simulate genotypes of 150 affected and 150 control individuals at 100 unlinked loci
  – With sample size N, we have 2N chromosomes
  – Assume the two populations split 0.05N generations ago without migration
  – Controls: half of the controls came from each of the two subpopulations
  – Affected group: 100 from population 1, 50 from population 2

STRAT: Simulation Results
• Rejection rates under the null hypothesis of no association
• p1, p2: allele frequencies for populations 1 and 2 at the given locus

Simulation Studies: With Admixture
• Assume two discrete populations
• Simulate genotypes of 500 affected and 500 control individuals at 150 unlinked microsatellite loci
  – With sample size N, we have 2N chromosomes
  – Assume the two populations split 0.15N generations ago, followed by two generations of admixing
  – Controls: random draws from the whole population
  – Affected group: random draws from the whole population assuming a disease risk model for grandparents

Structure: Simulation Results
• Learning population structure using genotypes from two recently admixed populations
  – Dashed line: case group

STRAT: Simulation Results
• Rejection rates under the null hypothesis
• p1, p2: allele frequencies for populations 1 and 2 at the given locus

TDT vs. STRAT
• TDT
  – Requires genotyping parents of the affected offspring
• STRAT
  – Requires genotypes for additional loci to infer population structure with STRUCTURE

EigenStrat
• Structured association approach
• Step 1: Run PCA on genotype data to infer the population structure
• Step 2: Perform association analysis after correcting for the population effects in genotype/phenotype data
• Advantages: low computational cost compared to STRAT

EigenStrat: Structured Association with PCA
• Step 1: (Inferring Ancestry) PCA is applied to genotype data to infer continuous axes of genetic variation
Price et al., Nature Genetics 2006

What are the new axes?
[Figure: data plotted against Original Variable A and Original Variable B, with the PC 1 and PC 2 axes overlaid]
• Orthogonal directions of greatest variance in data
• Projections along PC1 discriminate the data most along any one axis
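
For two variables, the new axes can be written in closed form from the sample covariance matrix. The sketch below is a toy illustration of that idea, not the procedure used by EigenStrat: the function name `pca_2d` and the data points are invented, and the standard-library-only implementation only covers the two-variable case.

```python
import math


def pca_2d(points):
    """Return (pc1, pc2, variances) for a list of (a, b) observations."""
    n = len(points)
    mean_a = sum(a for a, _ in points) / n
    mean_b = sum(b for _, b in points) / n
    # Entries of the 2x2 sample covariance matrix.
    s_aa = sum((a - mean_a) ** 2 for a, _ in points) / (n - 1)
    s_bb = sum((b - mean_b) ** 2 for _, b in points) / (n - 1)
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in points) / (n - 1)
    # Angle of the direction of greatest variance.
    theta = 0.5 * math.atan2(2.0 * s_ab, s_aa - s_bb)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))  # orthogonal to pc1
    # Eigenvalues of the covariance matrix: variance along pc1 and pc2.
    centre = (s_aa + s_bb) / 2.0
    spread = math.hypot((s_aa - s_bb) / 2.0, s_ab)
    return pc1, pc2, (centre + spread, centre - spread)


# Hypothetical observations of "Original Variable A" and "Original Variable B".
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
pc1, pc2, variances = pca_2d(points)
print(pc1, pc2, variances)
```

The variance along PC 1 is at least as large as the variance along either original variable, which is the sense in which projections along PC 1 discriminate the data most.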

EigenStrat: Structured Association with PCA
• Step 2: (Removing Ancestry Effects) Genotype at a candidate SNP and phenotype are continuously adjusted by amounts attributable to ancestry along each axis
• Step 3: (Association test)
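
Steps 2 and 3 can be sketched as follows, taking the ancestry axes from Step 1 as given. The slide's formula for the association test did not survive, so this sketch makes its own assumptions: each axis is mean-zero and the axes are mutually orthogonal (as principal components are), the adjustment is a least-squares regression on each axis, and the statistic is (N - K - 1) times the squared correlation of the adjusted genotype and phenotype, for N individuals and K axes. All names and data are hypothetical.

```python
def remove_ancestry(values, axes):
    """Subtract from `values` the part predicted by each ancestry axis."""
    mean = sum(values) / len(values)
    adjusted = [v - mean for v in values]
    for axis in axes:
        # Least-squares coefficient of the values on this axis.
        gamma = sum(a * v for a, v in zip(axis, adjusted)) / sum(a * a for a in axis)
        adjusted = [v - gamma * a for a, v in zip(axis, adjusted)]
    return adjusted


def eigenstrat_stat(genotype, phenotype, axes):
    """(N - K - 1) * squared correlation of ancestry-adjusted data."""
    n, k = len(genotype), len(axes)
    g = remove_ancestry(genotype, axes)
    y = remove_ancestry(phenotype, axes)
    s_gg = sum(v * v for v in g)
    s_yy = sum(v * v for v in y)
    if s_gg < 1e-12 or s_yy < 1e-12:
        return 0.0  # nothing left to test once ancestry is removed
    s_gy = sum(a * b for a, b in zip(g, y))
    return (n - k - 1) * s_gy * s_gy / (s_gg * s_yy)


# Hypothetical data: the phenotype depends on ancestry only.
axis = [-1, -1, -1, 1, 1, 1]          # one ancestry axis from Step 1
genotype = [0, 1, 0, 2, 1, 2]         # allele counts at a candidate SNP
phenotype = [0, 0, 0, 1, 1, 1]        # case/control status
print(eigenstrat_stat(genotype, phenotype, []))      # no correction
print(eigenstrat_stat(genotype, phenotype, [axis]))  # ancestry removed
```

In this toy example the uncorrected statistic is positive only because allele frequency and disease status both track the ancestry axis; after adjustment no signal remains.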

Simulation Procedure
• Given F_ST, for each SNP
  – Draw an ancestral population allele frequency p from the uniform distribution on [0.1, 0.9]
  – Allele frequencies for populations 1 and 2, p1 and p2, are drawn from Beta(p(1-F_ST)/F_ST, (1-p)(1-F_ST)/F_ST)
  – Draw SNPs using population allele frequencies p1 and p2
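
The procedure above can be sketched with Python's standard library. The slide does not say how genotypes are drawn from p1 and p2, so the diploid step here (two independent allele draws per individual, giving a count of 0, 1, or 2) is an assumption, as are the function name `simulate_snp`, the sample sizes, and the F_ST value in the example.

```python
import random


def simulate_snp(fst, n1, n2, rng):
    """Simulate one SNP for n1 individuals from population 1 and n2 from 2."""
    # Ancestral allele frequency.
    p = rng.uniform(0.1, 0.9)
    # Beta parameters from the slide: p(1-F_ST)/F_ST and (1-p)(1-F_ST)/F_ST.
    alpha = p * (1.0 - fst) / fst
    beta = (1.0 - p) * (1.0 - fst) / fst
    p1 = rng.betavariate(alpha, beta)
    p2 = rng.betavariate(alpha, beta)

    def draw(freq, n):
        # Assumption: diploid genotype = number of copies among two alleles.
        return [sum(rng.random() < freq for _ in range(2)) for _ in range(n)]

    return p1, p2, draw(p1, n1), draw(p2, n2)


rng = random.Random(0)  # fixed seed so the run is repeatable
p1, p2, geno1, geno2 = simulate_snp(fst=0.01, n1=5, n2=5, rng=rng)
print(p1, p2, geno1, geno2)
```

Smaller F_ST values make the Beta distribution more concentrated around p, so p1 and p2 stay closer to the ancestral frequency and to each other.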
