Polymorphic variation in the human genome and susceptibility to disease
Samuel Deutsch, PhD
Department of Genetic Medicine and Development, University of Geneva
Very large amount of sequence variation in human populations
SNPs
Microsatellites
Large-scale indels
Key to human genetics
Phenotype (normal variation, disease)
Evolution
Risk prediction, lifestyle
Forensics
Pharmacogenomics, personal medicine
Clear genetic effect
Gene x environment interactions
Disease frequency due to genome sharing
λs = disease frequency in siblings of an affected individual / disease frequency in the general population
Disease                     λs
Schizophrenia               12
Asthma                       8
Type I diabetes             12
Crohn's disease             25
Multiple sclerosis          24
Aortic stenosis             59
Ventricular septal defect   25
Cleft lip                   40
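The sibling recurrence risk ratio can be sketched as a simple division. This is a minimal illustration; the two input frequencies are hypothetical values chosen for the example, not figures from the slides.

```python
def sibling_risk_ratio(freq_in_siblings: float, freq_in_population: float) -> float:
    """Return lambda_s: disease frequency in siblings of affected
    individuals divided by disease frequency in the general population."""
    if not 0 < freq_in_population <= 1:
        raise ValueError("population frequency must be in (0, 1]")
    if not 0 <= freq_in_siblings <= 1:
        raise ValueError("sibling frequency must be in [0, 1]")
    return freq_in_siblings / freq_in_population


# Hypothetical inputs: 6% of siblings affected, 0.5% of the population affected.
lambda_s = sibling_risk_ratio(0.06, 0.005)
print(f"lambda_s = {lambda_s:.1f}")
```

A value of 1 would mean siblings carry no excess risk; the larger the ratio, the stronger the familial clustering.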
λs broken further into multiple loci!
Disease frequency in monozygotic versus dizygotic twins
Monozygotic: share 100% of alleles
Dizygotic: share 50% of alleles

% concordance
Disease                 MZ   DZ
Epilepsy                70    6
Multiple sclerosis      18    2
Type 1 diabetes         40    5
Schizophrenia           53   15
Osteoarthritis          32   16
Rheumatoid arthritis    12    3
Psoriasis               72   15
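One quick way to read the twin data is the ratio of MZ to DZ concordance. The sketch below uses the concordance values from the slide; the ratio is a descriptive summary only, not a heritability estimate, and the variable names are illustrative.

```python
# Concordance (%) in monozygotic (MZ) and dizygotic (DZ) twins, from the slide.
concordance = {
    "Epilepsy": (70, 6),
    "Multiple sclerosis": (18, 2),
    "Type 1 diabetes": (40, 5),
    "Schizophrenia": (53, 15),
    "Osteoarthritis": (32, 16),
    "Rheumatoid arthritis": (12, 3),
    "Psoriasis": (72, 15),
}

# A ratio well above 1 is consistent with a genetic contribution.
ratios = {disease: mz / dz for disease, (mz, dz) in concordance.items()}

for disease, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{disease:22s} MZ/DZ = {ratio:.1f}")
```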
Heritability can be calculated using variance-component (VC) methods
[Figure: covariance for phenotype plotted against kinship (calculated as average IBD); slope = h2r]
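The idea that heritability appears as a slope can be illustrated with an ordinary least-squares fit of pairwise phenotypic covariance on IBD sharing. This is only a sketch: the data points are hypothetical, the x-values are assumed to be the average proportion of alleles shared IBD, and variance-component software would typically fit the model by maximum likelihood rather than by this simple regression.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    return sxy / sxx


# Hypothetical relative pairs: proportion of alleles shared IBD and the
# covariance of their standardized phenotypes.
ibd_sharing = [0.0, 0.25, 0.5, 0.5, 1.0]
phenotype_cov = [0.02, 0.13, 0.27, 0.23, 0.50]

h2r_estimate = ols_slope(ibd_sharing, phenotype_cov)
print(f"estimated h2r (slope) = {h2r_estimate:.2f}")
```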
Variation in number of repeats
Multi-allelic in population
Highly informative
Mostly non-functional
Most useful for family studies
[Figure: family genotypes at a microsatellite marker: 5 8; 2, 5; 3, 8; 2, 3; 3, 5; 2, 8; 2, 3]
Can be used for LINKAGE ANALYSIS
[Figure: chromosomes 1 to 22]
Panel of microsatellites evenly spaced throughout the genome
Look at co-segregation patterns of disease with alleles of specific markers
Alignment and crossing-over
CHIASMA
Co-segregation of alleles with disease depends on:
1. Chromosomal localisation.
2. Distance between marker and disease locus.
Minimal recombination region
LOD score calculated by maximum likelihood:
LOD = log10 (likelihood of observation under linkage / likelihood of observation by chance)
LOD > 3 is usually considered to be significant on a genome-wide basis
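For the simplest case, the LOD score has a closed form. The sketch below assumes phase-known, fully informative meioses, so that the likelihood at recombination fraction θ is θ^R (1-θ)^(N-R) and the "by chance" likelihood is the same expression at θ = 0.5. The function names and the grid search are illustrative; real pedigrees need dedicated linkage software.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD at recombination fraction theta for phase-known meioses."""
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must be in [0, 0.5]")
    non_recombinants = meioses - recombinants
    if theta == 0:
        if recombinants > 0:
            return float("-inf")  # any recombinant rules out theta = 0
        log_linked = 0.0
    else:
        log_linked = (recombinants * math.log10(theta)
                      + non_recombinants * math.log10(1 - theta))
    log_chance = meioses * math.log10(0.5)  # free recombination
    return log_linked - log_chance


def max_lod(recombinants: int, meioses: int, steps: int = 500):
    """Grid search for the theta that maximises the LOD score."""
    thetas = [0.5 * i / steps for i in range(steps + 1)]
    best_theta = max(thetas, key=lambda t: lod_score(recombinants, meioses, t))
    return best_theta, lod_score(recombinants, meioses, best_theta)


# Ten meioses with no recombinants just reach the LOD > 3 threshold.
print(f"LOD = {lod_score(0, 10, 0.0):.2f}")
best_theta, best_lod = max_lod(1, 12)
print(f"theta = {best_theta:.3f}, LOD = {best_lod:.2f}")
```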
DEC-05
1794
Examples include:
Parkinson's disease (4q21)
Cystic fibrosis (7q31)
Muscular dystrophy (X)
Deafness (about 45 different loci!)
Variation in single position
Bi-allelic in population
Less informative
Can be functional
Most useful in population studies
Most common type of variation: any two chromosomes differ every 600 bp (about 10 million genome-wide)
(SNPs, deletions/duplications, repeats, transposable elements)
Coding variation leading to protein changes
Non-coding variation affecting transcription of genes
Non-coding variation affecting chromatin structure
If an allele i in gene x is involved in disease pathogenesis, we expect an increase in its frequency in affected groups
Genotypes: controls (N=60), patients (N=60)
Compare f(i) between groups: logistic regression, χ2 test
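The χ2 comparison of allele frequencies can be sketched on a 2 x 2 table of allele counts. The counts below are hypothetical (60 patients and 60 controls give 120 chromosomes per group), and the p-value uses the identity that holds for one degree of freedom, P(χ2 > x) = erfc(sqrt(x / 2)).

```python
import math


def allele_chi_square(case_counts, control_counts):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Each argument is (count of allele i, count of all other alleles).
    Returns (chi2 statistic, p-value).
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / total
            if expected == 0:
                raise ValueError("table has an empty row or column")
            chi2 += (table[r][c] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: allele i on 48/120 patient and 30/120 control chromosomes.
chi2, p_value = allele_chi_square((48, 72), (30, 90))
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```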
Candidate gene: limited set of SNPs in a set of candidate genes. In general gives an incomplete picture.
Indirect association: genome-wide set of SNPs, no prior hypothesis, potentially could give a complete view. Only possible with important technology advances.
$$ r^2 = \frac{\left[ f(AB) - f(A)\,f(B) \right]^2}{f(A)\,f(a)\,f(B)\,f(b)} $$
LD can be measured in several ways. For association studies r² (coefficient of determination) is most common.
8 'tag' SNPs for 50 SNPs in region
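The r² measure between alleles A and B at two bi-allelic loci can be computed from the haplotype frequency f(AB) and the allele frequencies f(A) and f(B), with f(a) = 1 - f(A) and f(b) = 1 - f(B). A minimal sketch with hypothetical frequencies:

```python
def ld_r_squared(f_AB: float, f_A: float, f_B: float) -> float:
    """r^2 between two bi-allelic loci from haplotype and allele frequencies."""
    d = f_AB - f_A * f_B                       # disequilibrium coefficient D
    denominator = f_A * (1 - f_A) * f_B * (1 - f_B)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denominator


# Hypothetical frequencies.
print(ld_r_squared(0.30, 0.30, 0.30))   # A always travels with B: complete LD
print(ld_r_squared(0.15, 0.30, 0.50))   # f(AB) = f(A) f(B): no LD
```

An r² close to 1 means one SNP is a near-perfect proxy for the other, which is what makes tag SNPs possible.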
www.hapmap.org
Ultimate goal: find the minimal set of SNPs that capture most of the sequence variation information to perform association studies.
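One common way to approach this goal is greedy selection on pairwise r². The sketch below is an illustration only, not the method used by HapMap or by the author: the r² values, the SNP names, and the 0.8 threshold are all assumptions made for the example.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Pick tag SNPs greedily from a symmetric pairwise r^2 lookup.

    r2 maps each SNP name to a dict of {other SNP: r^2}. At each step the
    SNP that captures the most still-untagged SNPs (r^2 >= threshold, or
    itself) is chosen as a tag.
    """
    untagged = set(r2)
    tags = []
    while untagged:
        def captured(snp):
            return {s for s in untagged
                    if s == snp or r2[snp].get(s, 0.0) >= threshold}
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(r2), key=lambda snp: len(captured(snp)))
        tags.append(best)
        untagged -= captured(best)
    return tags


# Hypothetical region: snp1 to snp3 are in strong LD, snp4 stands alone.
pairwise_r2 = {
    "snp1": {"snp2": 0.95, "snp3": 0.90, "snp4": 0.10},
    "snp2": {"snp1": 0.95, "snp3": 0.85, "snp4": 0.05},
    "snp3": {"snp1": 0.90, "snp2": 0.85, "snp4": 0.20},
    "snp4": {"snp1": 0.10, "snp2": 0.05, "snp3": 0.20},
}
tags = greedy_tag_snps(pairwise_r2)
print(tags)
```

Here two tags stand in for four SNPs; a greedy pass gives a small set but is not guaranteed to give the smallest possible one.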
Hirschhorn and Daly, Nat Rev Genet 2005
Based on Affymetrix array technology
New 300K bead array based on HapMap
Illumina 300K array expected to capture about 70% of common variation
Genome-wide association feasible
Cost
Many studies underpowered. For diseases with complex inheritance (λs < 20) and many loci with minor contributions (each allele with GRR < 3.0), 1000s rather than 100s of samples needed!
How to deal with the multiple testing problem?
Need new methods to extract G x G and G x E interactions!
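The simplest answer to the multiple testing problem raised above is a Bonferroni correction: divide the desired significance level by the number of tests. The sketch assumes a family-wise α of 0.05 (an assumption, not stated in the slides) and uses the 300K array size mentioned earlier as the number of tests.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumed alpha = 0.05; 300,000 SNPs tested once each.
threshold = bonferroni_threshold(0.05, 300_000)
print(f"per-SNP threshold = {threshold:.2e}")
```

Because neighbouring SNPs are correlated through LD, the tests are not independent and this correction is conservative.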
DG Clayton, JA Todd et al. 2005
[Figure panels: λ=1.2, λ=2.0, λ=1.3, λ=1.5]
VITAL-IT PLATFORM NCCR Genomics Platform
Large difference in natural history of disease, two interesting groups:
Exposed non-infected
Infected non-progressors (rare, familial segregation)
Highly concordant susceptibility in twins
Several polymorphisms known to play a role:
Viral co-receptors
Chemotactic molecules
Co-receptor ligands
Main aim:
Develop a cellular system in which to dissect genetic factors
Validation:
Can an in-vitro cellular system recapitulate the in-vivo situation?
Would such a system be reproducible?
(CMVpromoter)
expression by FACS (72h)
15 CEPH families = ~200 individuals
Measured cellular phenotypes in triplicate
Obtained information for 2600 SNPs genome-wide, publicly available in DBs.
CEPH : Centre d'Etude du Polymorphisme Humain
#   Trait           H2r        p value
1   CMVGFPper       0.5367977  0.0000016
2   CMVGFPMFI all   0.4354729  0.0000087
10  CD39per         0.8036432  0.0003233
11  CD39MFI all     1          8.53E-68
12  CD39ratio       1          6.27E-55
13  LMP1per         0.496049   3.47E-10
14  LMP1MFI all     0.732135   1.24E-14
15  LMP1ratio       0.6194324  2.01E-17
16  CD11aMFI all    0.996293   0.0000185
17  CD11aper        0.1366545  0.1273509
18  CD11aratio      1          0.0002416
19  CD19MFI all     0.9050112  3.48E-13
20  CD19per         0.8286046  0.0000001
21  CD19ratio       1          0.0000579
22  CD21MFI all     1          0.000226
23  CD21per         1          2.41E-10
24  CD21ratio       0.4659657  0.0136537
25  CD23MFI all     0.6998939  0.0000449
26  CD23per         0.7914027  0.005566
27  CD23ratio       0.517473   0.0006306
28  CD39MFI all     1          2.28E-23

Susceptibility, EBV marker, innate immunity
[Figure: CMVper multipoint, chromosomes 1 to 22; simulation threshold (Vital-IT)]
Chromosome 8 CMVper association: Tag SNPs 3Mb centered on linkage finding
Bonferroni threshold
Tag SNPs
Analysis of Variance for CMV GFPper
Source     DF      SS      MS      F      P
rs257288    1   932.4   932.4  18.30  0.000
Error      53  2699.7    50.9
Total      54  3632.0

Individual 95% CIs for mean, based on pooled StDev
Level   N    Mean   StDev
AG      7  35.934   8.800   (--------*--------)
GG     48  23.580   6.896   (--*---)
Pooled StDev = 7.137
CI axis: 24.0  30.0  36.0  42.0
x-axis: SNP
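The F statistic and pooled standard deviation in the ANOVA output follow directly from the sums of squares and degrees of freedom. A short check using only the numbers reported in the table (variable names are illustrative):

```python
import math

# Values copied from the ANOVA table for rs257288.
ss_factor, df_factor = 932.4, 1
ss_error, df_error = 2699.7, 53

ms_factor = ss_factor / df_factor   # mean square for the SNP
ms_error = ss_error / df_error      # mean square error
f_stat = ms_factor / ms_error       # F = MS(factor) / MS(error)
pooled_sd = math.sqrt(ms_error)     # pooled StDev is the root of MS(error)

print(f"MS error = {ms_error:.1f}, F = {f_stat:.2f}, pooled StDev = {pooled_sd:.3f}")
```

The recomputed values agree with the reported MS of 50.9, F of 18.30, and pooled StDev of 7.137.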
Chromosome 8 CMVper association: Fine mapping using all HapMap phase 2.0 data
Candidate SNP
Gene 1, Gene 2, Gene 3, Gene 4, Gene 5, Gene 6, Gene 7, Gene 8