Genotype Imputation in Genome-wide Association Studies
Fernando Rivadeneira (1,2)
(1) Department of Internal Medicine
(2) Department of Epidemiology
Course “SNPs and Human Diseases”, Rotterdam, November 12th, 2018
GWAS data with missing genotypes → Association study of genotyped data
GWAS data with missing genotypes → Imputation to a reference panel → GWAS imputed data → Association study of imputed data
Adapted from Marchini & Howie, Nat. Rev. Genet. 2010
Willer et al, Nat Genet 40: 161-9, 2008
Observed Genotypes
. . . . A . . . . . . . A . . . . A . . .
. . . . G . . . . . . . C . . . . A . . .
Reference Haplotypes
C G A G A T C T C C T T C T T C T G T G C
C G A G A T C T C C C G A C C T C A T G G
C C A A G C T C T T T T C T T C T G T G C
C G A A G C T C T T T T C T T C T G T G C
C G A G A C T C T C C G A C C T T A T G C
T G G G A T C T C C C G A C C T C A T G C
C G A G A T C T C C C G A C C T T G T G C
C G A G A C T C T T T T C T T T T G T A C
C G A G A C T C T C C G A C C T C G T G C
C G A A G C T C T T T T C T C C T G T G C
The next step is to impute the missing genotypes from the reference panel, using an algorithm that models each haplotype conditional on all the others.
Observed Genotypes
c g a g A t c t c c c g A c c t c A t g g
c g a a G c t c t t t t C t c t c A t g c
Reference Haplotypes
C G A G A T C T C C T T C T T C T G T G C
C G A G A T C T C C C G A C C T C A T G G
C C A A G C T C T T T T C T T C T G T G C
C G A A G C T C T T T T C T T C T G T G C
C G A G A C T C T C C G A C C T T A T G C
T G G G A T C T C C C G A C C T C A T G C
C G A G A T C T C C C G A C C T T G T G C
C G A G A C T C T T T T C T T T T G T A C
C G A G A C T C T C C G A C C T C G T G C
C G A A G C T C T T T T C T C C T G T G C
missing genotypes), allowing higher LD and decreasing error
Genotype:    GG   GA   AA
Best guess:  0    1    2
Allele dose (example): 0.68
Best-guess information is never used with low imputation quality scores!
A little information is always better than NO information!
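The difference between the best guess and the allele dose can be sketched in a few lines of Python. This is only an illustration, assuming the imputation software reports a posterior probability for each of the three genotypes; the function names and the probabilities are hypothetical, chosen so that the dose comes out at the 0.68 of the example.

```python
def allele_dose(p_gg, p_ga, p_aa):
    """Expected count of the A allele, with GG = 0, GA = 1, AA = 2."""
    return 0.0 * p_gg + 1.0 * p_ga + 2.0 * p_aa


def best_guess(p_gg, p_ga, p_aa):
    """Genotype with the highest posterior probability."""
    probs = {"GG": p_gg, "GA": p_ga, "AA": p_aa}
    return max(probs, key=probs.get)


# Hypothetical posterior probabilities for one individual at one SNP.
posterior = (0.40, 0.52, 0.08)
dose = allele_dose(*posterior)
guess = best_guess(*posterior)
print(guess, round(dose, 2))
```

The best guess collapses the three probabilities into a single genotype (here GA, coded 1), whereas the dose keeps the uncertainty as a fractional allele count.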
Figure: Imputed vs. Experimental
Li et al, Nat Genet 38: 1049-54, 2006
Allele frequency       P-value                      Odds ratio
Imputed  Genotyped     Imputed       Genotyped      Imputed  Genotyped
.024     .021          2.5 x 10^-6   6.3 x 10^-6    2.57     2.20
.543     .540          5.3 x 10^-6   1.1 x 10^-5    1.33     1.31
.114     .136          2.0 x 10^-5   4.1 x 10^-5    1.47     1.41
.494     .490          6.6 x 10^-5   5.5 x 10^-5    1.28     1.28
.927     .924          7.5 x 10^-5   9.0 x 10^-5    1.72     1.65
.744     .753          1.4 x 10^-4   3.9 x 10^-4    1.33     1.30
.289     .291          1.7 x 10^-4   1.2 x 10^-4    1.27     1.28
.970     .973          1.9 x 10^-4   3.6 x 10^-5    2.47     2.58
.401     .361          6.3 x 10^-4   1.6 x 10^-3    1.26     1.22
.817     .816          9.5 x 10^-4   1.0 x 10^-3    1.31     1.30
.605     .605          9.9 x 10^-4   1.2 x 10^-3    1.23     1.22
Scott et al, Science 316: 1341-5, 2007
Hidden State Sm: The
Data Gm: Observed
Goal: Infer Sm
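The idea of inferring a hidden state at every marker can be illustrated with a toy hidden Markov model. The sketch below is a deliberately simplified haploid copying model, not the diploid algorithm used by MaCH or minimac: the hidden state is the reference haplotype being copied at each site, and the `switch` and `error` parameters, the function name and the two-haplotype panel are all assumptions made for illustration.

```python
def impute_haplotype(observed, reference, switch=0.05, error=0.01):
    """Forward-backward over a haploid copying HMM.

    observed  : string in which '.' marks an untyped site
    reference : list of equal-length reference haplotype strings
    Returns the imputed string (imputed alleles in lower case) and a
    per-site posterior probability for the allele that was chosen.
    """
    n_ref, n_sites = len(reference), len(observed)

    def emit(k, m):
        # Untyped sites carry no information about the hidden state.
        if observed[m] == ".":
            return 1.0
        return 1.0 - error if reference[k][m] == observed[m] else error

    def step(vec):
        # Keep the same template with prob. 1 - switch, else jump uniformly.
        total = sum(vec)
        return [(1.0 - switch) * v + switch * total / n_ref for v in vec]

    def normalise(vec):
        total = sum(vec)
        return [v / total for v in vec]

    # Forward pass: state probabilities given the data up to each site.
    fwd = [normalise([emit(k, 0) / n_ref for k in range(n_ref)])]
    for m in range(1, n_sites):
        moved = step(fwd[-1])
        fwd.append(normalise([moved[k] * emit(k, m) for k in range(n_ref)]))

    # Backward pass: likelihood of the data after each site.
    bwd = [[1.0] * n_ref for _ in range(n_sites)]
    for m in range(n_sites - 2, -1, -1):
        weighted = [bwd[m + 1][k] * emit(k, m + 1) for k in range(n_ref)]
        bwd[m] = normalise(step(weighted))

    imputed, confidence = [], []
    for m in range(n_sites):
        post = normalise([fwd[m][k] * bwd[m][k] for k in range(n_ref)])
        allele_post = {}
        for k in range(n_ref):
            allele = reference[k][m]
            allele_post[allele] = allele_post.get(allele, 0.0) + post[k]
        best = max(allele_post, key=allele_post.get)
        if observed[m] == ".":
            imputed.append(best.lower())
            confidence.append(allele_post[best])
        else:
            imputed.append(observed[m])  # typed sites are kept as observed
            confidence.append(1.0)
    return "".join(imputed), confidence


# Hypothetical two-haplotype panel and a haplotype typed at two sites only.
imputed, confidence = impute_haplotype("A...A", ["ACGTA", "TGCAT"])
print(imputed)
```

Because both typed alleles match the first reference haplotype, the untyped sites in between are filled in from that haplotype. With a realistic panel several templates usually fit the typed sites, which is why the posterior probabilities matter more than any single path.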
Iteration 1       g    A    c    c    t    c    A    t    g    g
                  t    C    t    t    t    c    A    t    g    g
Iteration 2       g    A    c    c    t    c    A    t    g    g
                  t    C    c    c    t    c    A    t    g    c
Iteration 3       g    A    c    c    t    c    A    t    g    g
                  t    C    c    c    t    c    A    t    g    c
Iteration 4       g    A    c    c    t    c    A    t    g    g
                  t    C    c    c    t    c    A    t    g    c
Consensus         g    A    c    c    t    c    A    t    g    g
                  t    C    c    c    t    c    A    t    g    c
Quality Score     1    1    3/4  3/4  1    1    1    1    1    3/4
Reference Allele  g    A    c    c    t    c    A    t    g    g
Dosage            1    1    7/4  7/4  2    2    2    2    2    5/4
Dosage: estimated fractional count of the reference allele
Consensus: best guess, or the most frequent genotype guess across all iterations
Quality score: proportion of iterations where the guessed genotype agrees with the consensus
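The three summaries can be recomputed from the four sampled haplotype pairs of the example. The sketch below is an illustration of the definitions, not the MaCH implementation; the function name `summarise` is hypothetical, and genotypes are treated as unordered allele pairs.

```python
from collections import Counter
from fractions import Fraction

# Haplotype pairs sampled in the four iterations of the example.
iterations = [
    ("gAcctcAtgg", "tCtttcAtgg"),
    ("gAcctcAtgg", "tCcctcAtgc"),
    ("gAcctcAtgg", "tCcctcAtgc"),
    ("gAcctcAtgg", "tCcctcAtgc"),
]
reference_allele = "gAcctcAtgg"


def summarise(iterations, reference_allele):
    """Return per-site consensus genotype, quality score and dosage."""
    n_iter = len(iterations)
    consensus, quality, dosage = [], [], []
    for site, ref in enumerate(reference_allele):
        # Unordered genotype at this site in every iteration.
        genotypes = [
            tuple(sorted((h1[site].lower(), h2[site].lower())))
            for h1, h2 in iterations
        ]
        # Consensus: the genotype sampled most often.
        best, count = Counter(genotypes).most_common(1)[0]
        consensus.append(best)
        # Quality: share of iterations that agree with the consensus.
        quality.append(Fraction(count, n_iter))
        # Dosage: average number of copies of the reference allele.
        ref_count = sum(g.count(ref.lower()) for g in genotypes)
        dosage.append(Fraction(ref_count, n_iter))
    return consensus, quality, dosage


consensus, quality, dosage = summarise(iterations, reference_allele)
print("Quality:", " ".join(str(q) for q in quality))
print("Dosage: ", " ".join(str(d) for d in dosage))
```

Only the three sites where iteration 1 disagrees with the later iterations get a quality score below 1 and a fractional dosage.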
⇒ http://genome.sph.umich.edu/wiki/Minimac:_1000_Genomes_Imputation_Cookbook
Genotyping → Data QC-ing → Phasing & Imputing → Analyzing
Genotyping → Data QC-ing → Phasing → Imputing (x3) → Analyzing (x3)
ID         TYPE     SNP1   SNP2  SNP3  SNP4   SNP5
RS3->232   ML_DOSE  2      2     2     2      2
RS3->2921  ML_DOSE  2      1     2     2      2
RS3->3370  ML_DOSE  1.999  1     2     2      2
RS3->3542  ML_DOSE  2      1     2     1.968  1.998

SNP         Al1  Al2  Freq1   MAF     Quality  Rsq
rs12828708  A    G    0.9603  0.0397  0.9707   0.7232
rs10880855  T    C    0.5149  0.4851  0.9991   0.9985
rs7979218   G    A    0.9673  0.0327  0.9826   0.7903
rs7315793   C    T    0.9537  0.0463  0.9554   0.6538
rs4768098   A    G    0.6954  0.3046  0.9984   0.9971
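A per-SNP summary of this kind is typically used to drop poorly imputed markers before analysis. The sketch below parses the five SNP rows shown above and filters on Rsq; the helper names are hypothetical, and the cut-off of 0.7 is an arbitrary value picked for illustration, not a recommendation from the slides.

```python
INFO_TEXT = """\
SNP Al1 Al2 Freq1 MAF Quality Rsq
rs12828708 A G 0.9603 0.0397 0.9707 0.7232
rs10880855 T C 0.5149 0.4851 0.9991 0.9985
rs7979218 G A 0.9673 0.0327 0.9826 0.7903
rs7315793 C T 0.9537 0.0463 0.9554 0.6538
rs4768098 A G 0.6954 0.3046 0.9984 0.9971
"""


def read_info(text):
    """Parse whitespace-separated info rows into a list of dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    records = []
    for line in lines[1:]:
        record = dict(zip(header, line.split()))
        for key in ("Freq1", "MAF", "Quality", "Rsq"):
            record[key] = float(record[key])
        records.append(record)
    return records


def filter_by_rsq(records, min_rsq):
    """Keep only the SNPs whose Rsq reaches the chosen threshold."""
    return [r for r in records if r["Rsq"] >= min_rsq]


records = read_info(INFO_TEXT)
# 0.7 is an arbitrary threshold used only for this example.
kept = filter_by_rsq(records, min_rsq=0.7)
for r in kept:
    print(r["SNP"], r["MAF"], r["Rsq"])
```

In these rows MAF is the smaller of Freq1 and 1 - Freq1, and the SNPs with the lowest MAF also have the lowest Rsq.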
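$$E(r^2 \text{ with true genotypes}) = \frac{\mathrm{Var}(\text{dosage})}{E[\mathrm{Var}(\text{dosage})]}$$

where, under HWE,

$$
\begin{aligned}
E[\mathrm{Var}(\text{dosage})] &= E(\text{dosage}^2) - [E(\text{dosage})]^2 \\
&= 2^2 \cdot p_{AA} + 1^2 \cdot p_{Aa} - (2 \cdot p_{AA} + 1 \cdot p_{Aa})^2 \\
&= 2^2 \cdot p_A^2 + 1^2 \cdot 2 p_A (1 - p_A) - [2 \cdot p_A^2 + 1 \cdot 2 p_A (1 - p_A)]^2 \\
&= 2 p_A (1 - p_A)
\end{aligned}
$$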
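The ratio of observed to expected dosage variance can be computed directly from a vector of dosages. The sketch below assumes the allele frequency is estimated as half the mean dosage and uses the population variance (dividing by n); the estimator in MaCH may differ in such details, and the function name is hypothetical.

```python
def mach_rsq(dosages):
    """Observed dosage variance divided by 2p(1-p), its expectation under HWE."""
    n = len(dosages)
    mean = sum(dosages) / n
    observed_var = sum((d - mean) ** 2 for d in dosages) / n
    p = mean / 2.0  # estimated frequency of the counted allele
    expected_var = 2.0 * p * (1.0 - p)
    if expected_var == 0.0:
        return float("nan")  # monomorphic marker: the ratio is undefined
    return observed_var / expected_var


# Hypothetical dosages for four individuals.
well_imputed = [0.0, 1.0, 2.0, 1.0]    # dosages close to whole genotypes
poorly_imputed = [0.5, 1.0, 1.5, 1.0]  # dosages shrunk towards the mean
print(mach_rsq(well_imputed), mach_rsq(poorly_imputed))
```

Uncertain imputation pulls dosages towards the population mean, which reduces their variance and therefore the ratio.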
Figure panels: Original imputation panel; More comprehensive imputation panel; Larger imputation panel
Figure panels: Before imputation; After imputation
1000GP R06-2010: 60 samples in reference
1000GP R08-2010: 283 samples in reference
APOE P = 9.7 x 10^-20
APOE MACH RSQ = 0.6
APOE MACH RSQ = 0.4
APOE P = 7.1 x 10^-15
rs429358
*Rotterdam Study I data on Alzheimer's dementia for the IGAP consortium
– A more complete catalog of common variants and a catalog of rare variants
– Using our imputation strategy to reconstruct haplotypes
– All sequence data available from Short Read Archive
Chart: 1.31%, 0.88%, 0.52% and 0.40% at 60, 100, 200 and 500 (vertical axis from 0.00% to 1.50%)
Figure: Existing SNPs vs. New SNPs (1000G, Nature, 2010)
. . . . G . . . . . . . C . . . . A . . .
. . . . G . . . . . . . C . . . . A . . .

C G A G A C T C T C C G A C C T T A T G C
C G A A G C T C T T T T C T C C T G T G C
T G G G A T C T C C C G A C C T C A T G C
C G A G A C T C T C C G A C C T C G T G C
C G A G A T C T C C C G A C C T T G T G C
C G A G A C T C T C C G A C C T T A T G C
C G A G A C T C T C C G A C C T C G T G C
C G A G A T C T C C C G A C C T T G T G C
C G A A G C T C T T T T C T C C T G T G C
T G G G A T C T C C C G A C C T C A T G C

Figure labels: GONL Reference Panel; 1000G Reference Panel; Study Sample; Study Sample
Eskil Kreiner
Increase in Power/Coverage
GENETIC INVESTIGATIONS OF ANTHROPOMETRIC TRAITS
McGill University, CA: Brent Richards, Vince Forgetta, Houfeng Zheng (China)
NIH/AGES, US/Iceland: Tamara Harris, Vilmundur Gudnason, Albert Vernon Smith, Guðny Eiriksdottir
University of Maryland, US: Laura M Yerges-Armstrong
Indiana University, US: Daniel L Koller, Michael J. Econs, Munro Peacock
University of Pittsburgh, US: Jane Cauley
HKSC, Hong Kong: Annie Kung
Aarhus University, Denmark: Bente Langdahl, Lisa Husted
VU MC, The Netherlands: Paul Lips, Natasja van Schoor
Greek Osteoporosis Study: Panagoula Kollia
ErasmusMC, Netherlands: André Uitterlinden, Joyce van Meurs, Pascal Arp, Mila Jhamai, Jeroen van Rooij, Robert Kraaij, Carola Zillikens, Cornelia van Duijn, Albert Hofman, Oscar Franco, Vincent Jaddoe
University of Ioannina, GRE: Vangelis Evangelou, Evangelia Ntzani, John Ioannidis
deCODE, Iceland: Unnur Styrkarsdottir, Unnur Thorsteinsdottir
University of Edinburgh, UK: Stuart Ralston, Omar Albagha, James Wilson, Nerea Alonso
Oxford University, UK: Jonathan Reeve
Cambridge University, UK: Stephen Kaptoge
Bristol University, UK: David Evans, John Tobias, John Kemp
CHOP, Philadelphia, US: Struan Grant, Babette Zemel, Alessandra Chesi
Sahlgrenska Academy, SWE: Claes Ohlsson, Joel Eriksson, Mattias Lorentzon, Liesbeth Vandenput
King’s College, UK: Tim Spector, Scott Wilson (UWE)
Brisbane University, AUS: Matthew Brown, Emma Duncan
University of Sydney, AUS: John Eisman
HSL, Harvard University, US: Doug Kiel, David Karasik, Yi-Hsiang Hsu
Boston University: Adrienne Cupples, Ching-Ti Liu
University of Washington, US: John Robbins
University of Barcelona, SPA: Susana Balcells & Daniel Grinberg
Sheffield University, UK: Eugene McCloskey
University of Southampton, UK: Cyrus Cooper & Elaine Dennison
University of Aberdeen: Lynne Hocking
University of Ljubljana: Janja Marc
University of Western Australia: Richard Prince & Joshua Lewis
Women’s Health Initiative: Rebecca Jackson
Graz University, Austria: Barbara Obermayer-Pietsch
Malmo University, Sweden: Kristina Åkesson & Fiona McGuigan
Jose A. Riancho
Quebec University, Canada: Francois Rousseau
Inst Biochem & Genetics, Russia: Elsa K. Khusnutdinova

Phenotypes
Johns Hopkins University, US: Thomas J. Beck
University of Geneva, SWI: Didier Hans

Functional work
Rochester University, US: Cheryl Ackert-Bicknell
ErasmusMC, Netherlands: Jeroen van de Peppel, Bram van der Eerden
University of Oslo, Norway: Kaare Gautvik & Sjur Reppe
Sanger Institute, UK: Vijay Yadav
New York University, US: Matthew Maurano
University of Missouri, KC, US: Lynda Bonewald