Topic outline - Quick look at the pioneers: HapMap - 1000 Genomes
Carolina Medina Gomez, PhD
SNPs and Diseases Molecular School of Medicine
Thursday, November 16th, 2017
Topic outline
- Quick look at the pioneers: HapMap
- 1000 Genomes project
- Description
- Diversity Panel
- The HRC consortium
- Local Panels
Learning Aims
- Gain awareness of the implications of population diversity
- Comprehend the utility of large haplotype reference panels and large biobank data
- Use this knowledge for the mapping of complex traits
AIM
Perform a comprehensive sampling of common genetic variation that may form the basis of phenotypic differences in humans
The HapMap Project
YRI CEU CHB+JPT
A second generation human haplotype map of over 3.1 million SNPs 2007, Nature 449: 851-861.
HapMap II r22, Build 36 (2007): 270 samples
HapMap III r22, Build 36 (2010): 1,184 samples – DEPICT, LDSC
Name  Population                                                                          # of samples
ASW   African ancestry in Southwest USA                                                   53
CEU   Utah residents with Northern and Western European ancestry from the CEPH collection 112
CHB   Han Chinese in Beijing, China                                                       137
CHD   Chinese in Metropolitan Denver, Colorado                                            109
GIH   Gujarati Indians in Houston, Texas                                                  101
JPT   Japanese in Tokyo, Japan                                                            113
LWK   Luhya in Webuye, Kenya                                                              110
MEX   Mexican ancestry in Los Angeles, California                                         58
MKK   Maasai in Kinyawa, Kenya                                                            156
TSI   Toscani in Italia                                                                   102
YRI   Yoruba in Ibadan, Nigeria                                                           147
Integrating common and rare genetic variation in diverse populations 2010, Nature 467: 52-58.
Phase 1 – 2010
AIM
Catalogue human genetic variation by sequencing the whole genomes of 1,092 individuals from 14 worldwide populations. Discover human genetic variation of all types (95% of variation > 1% frequency) at the population level.
The 1000 Genomes project – Build 37
The 1000 Genomes Project Consortium 2010, Nature 467: 1061-1073.
Phase 3 – 2015
AIM
Catalogue human genetic variation by sequencing the whole genomes of 2,504 individuals from 26 worldwide populations. Discover human genetic variation of all types (99% of variation > 1% frequency) at the population level.
The 1000 Genomes project – Build 37
A global reference for human genetic variation 2015, Nature 526: 68-.
The American Journal of Human Genetics 96, 37–53, January 8, 2015
Phased design
Generation R HapMap imputation: 3,021,329 SNPs; 2,671,724 with MAF > 0.01 and r2 >= 0.3
Generation R 1KG imputation: 47,072,644 SNPs; 11,361,791 with MAF > 0.01 and r2 >= 0.3
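The MAF and r2 cut-offs quoted for the Generation R imputations can be illustrated with a short filter. This is only a minimal sketch: the `ImputedVariant` record, its field names, and the example variants are hypothetical, and real imputation software reports these values in its own file formats.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ImputedVariant:
    """Hypothetical record for one imputed variant (field names are assumptions)."""
    variant_id: str
    alt_allele_freq: float  # frequency of the alternate allele, between 0 and 1
    r2: float               # imputation quality reported by the imputation software


def minor_allele_frequency(alt_allele_freq: float) -> float:
    """The minor allele is whichever allele is rarer."""
    return min(alt_allele_freq, 1.0 - alt_allele_freq)


def passes_qc(variant: ImputedVariant, min_maf: float = 0.01, min_r2: float = 0.3) -> bool:
    """Apply the slide's thresholds: MAF > 0.01 and r2 >= 0.3."""
    maf = minor_allele_frequency(variant.alt_allele_freq)
    return maf > min_maf and variant.r2 >= min_r2


def filter_variants(variants: Iterable[ImputedVariant]) -> List[ImputedVariant]:
    """Keep only the variants that pass both thresholds."""
    return [v for v in variants if passes_qc(v)]


# Made-up example variants, for illustration only.
example = [
    ImputedVariant("common_well_imputed", 0.25, 0.95),
    ImputedVariant("rare_well_imputed", 0.002, 0.90),
    ImputedVariant("common_poorly_imputed", 0.40, 0.10),
    ImputedVariant("alt_is_major_allele", 0.97, 0.80),
]
kept_ids = [v.variant_id for v in filter_variants(example)]
print(kept_ids)
```

Taking the minimum of the allele frequency and its complement matters because the alternate allele is not always the minor one.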
2012 2017
Neither of the two variants was present in, or tagged by, HapMap variants (one is common)
Imputation Servers
Phase 1 – 2016
AIM
To bring together as many whole-genome sequencing data sets as possible. This reference panel consists of 64,976 haplotypes at 39,235,157 SNPs.
The Haplotype reference consortium – Build 37
A reference panel of 64,976 haplotypes for genotype imputation. Nature Genetics 48 10
34% increase r2 at 0.1% r2 at ~0.4%
http://www.nealelab.is/blog/2017/7/19/rapid-gwas-of-thousands-of-phenotypes-for-337000-samples-in-the-uk-biobank
The Million Veteran Program began collecting data in 2011, and it has the goal of reaching 1 million participants by 2020 or 2021.
Now… imagine if we combine data
Predictions for next GIANT freeze: 1.5 million
Analytical Issues!
Current challenges
- Perception challenge: Are we using the correct multiple-testing threshold, or should we change it now that we include more rare variants, which constitute independent tests (low LD)?
- Methodological challenge: Is it necessary to correct further for population stratification (e.g., by implementing mixed models) to avoid false-positive signals?
- Computational challenge: Can we store and analyze the data with our current computational power?
- Follow-up challenge: Can we correctly identify variants/genes for follow-up studies?
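The multiple-testing question can be made concrete with a Bonferroni-style calculation. This is a sketch of one common approach, not necessarily the one the slides have in mind: the conventional genome-wide threshold of 5x10^-8 corresponds to alpha = 0.05 divided by about one million independent tests, so more independent (rare, low-LD) variants imply a stricter threshold. The function names below are hypothetical.

```python
def bonferroni_threshold(n_independent_tests: float, alpha: float = 0.05) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_independent_tests <= 0:
        raise ValueError("n_independent_tests must be positive")
    return alpha / n_independent_tests


def implied_number_of_tests(threshold: float, alpha: float = 0.05) -> float:
    """Inverse view: how many independent tests a given threshold corresponds to."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return alpha / threshold


# One million independent tests gives the conventional genome-wide threshold.
conventional = bonferroni_threshold(1_000_000)

# Doubling the number of independent tests halves the threshold.
stricter = bonferroni_threshold(2_000_000)

print(conventional, stricter, implied_number_of_tests(conventional))
```

Bonferroni treats all tests as independent, so it is conservative when variants are correlated through LD; that is why the effective number of independent tests, rather than the raw variant count, is the quantity of interest.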
The discovery of genetic variants associated with a trait or disease is determined by different parameters
- We are surpassing the 1M barrier
- New imputation panels allow us to explore variants with MAF ~0.1%
- More variants mean more opportunities to tag the causal variant
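To see how sample size and allele frequency interact, here is a rough power sketch. It assumes (this is not stated in the slides) a standardized quantitative trait, an additive per-allele effect `beta` in standard-deviation units, and a 1-df test whose Z statistic is approximately normal with mean sqrt(N * 2 * MAF * (1 - MAF) * beta^2). The function name and the example effect size are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def approx_gwas_power(n_samples: int, maf: float, beta: float, alpha: float = 5e-8) -> float:
    """Approximate two-sided power of a single-variant association test.

    Assumes a standardized quantitative trait and an additive effect, so the
    variance explained by the variant is 2 * maf * (1 - maf) * beta**2.
    """
    if not 0.0 < maf <= 0.5:
        raise ValueError("maf must be in (0, 0.5]")
    variance_explained = 2.0 * maf * (1.0 - maf) * beta ** 2
    mean_z = (n_samples * variance_explained) ** 0.5   # expected Z under the alternative
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)    # two-sided critical value
    upper_tail = _STD_NORMAL.cdf(mean_z - z_crit)      # P(Z > z_crit)
    lower_tail = _STD_NORMAL.cdf(-mean_z - z_crit)     # P(Z < -z_crit)
    return upper_tail + lower_tail


# Illustrative comparison for a rare variant (MAF 0.1%) with a made-up effect size.
for n in (100_000, 1_000_000):
    print(n, approx_gwas_power(n, maf=0.001, beta=0.1))
```

Under these assumptions, a variant with MAF ~0.1% is hard to detect in a sample of 100,000 at genome-wide significance, but becomes more detectable as the sample approaches one million.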
The new GWAS era is a treasure trove for making new fundamental discoveries in human genetics.
10 Years of GWAS Discovery: Biology, Function, and Translation. AJHG, July 2017.
Resolution magnification
P <= 6.6x10^-9
Identification of 153 new loci associated with heel bone mineral density and functional involvement of GPC6 in osteoporosis. Nature Genetics.
Original 2x2 scenario of genetic architecture needs to be redefined under the scope of the 1000G and other projects
Debatable Facts
- Understanding human genome diversity is key for the design of genetic studies
- Larger and more comprehensive panels provide the best performance and yield in terms of quality and MAF coverage, resulting in greater power (even more so in combination with MegaBiobanks!)
- Most of the novel variants to be discovered are of “low to rare” allele frequency, highly population specific & enriched for functional aspects
- Upscaling of technology, either through interfacing with -omic data or