
SLIDE 1

Polymorphic variation in the human genome and susceptibility to disease

Samuel Deutsch, PhD
Department of Genetic Medicine and Development
University of Geneva

SLIDE 2

Human genome sequence

Only first phase!
Consensus sequence for species
Annotation possible!

SLIDE 3

Human genome sequence: Diversity

Very large amount of sequence variation in human populations

SNPs
Microsatellites
Large-scale indels

Key to human genetics

SLIDE 4

Why is sequence diversity important?

Phenotype (normal variation, disease)
Evolution
Risk prediction, lifestyle
Forensics
Pharmacogenomics, personal medicine

SLIDE 5

Genes and disease

Is a trait genetically determined?

Autosomal dominant
Fully penetrant
Clear genetic effect

SLIDE 6

Sequence Variation: most traits are not monogenic!

SLIDE 7

Sequence Variation: most traits are not monogenic!

Gene x environment interactions

SLIDE 8

Association studies: is the trait genetically determined?

λs = f(sibs of affecteds) / f(gen pop)

f(sibs of affecteds): disease frequency in siblings of affected individuals
f(gen pop): disease frequency in the general population
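A minimal Python sketch of the λs ratio on this slide, assuming the two disease frequencies are already known. The function name and the example frequencies are hypothetical and are not taken from the slides.

```python
def sibling_relative_risk(freq_sibs_of_affecteds, freq_general_population):
    """Return lambda_s: disease frequency in siblings of affected
    individuals divided by the frequency in the general population."""
    if freq_general_population <= 0:
        raise ValueError("population frequency must be positive")
    return freq_sibs_of_affecteds / freq_general_population


# Hypothetical numbers: 6% of siblings affected, 0.5% population prevalence.
lambda_s = sibling_relative_risk(0.06, 0.005)
print(round(lambda_s, 1))  # prints 12.0
```

A value well above 1 indicates familial clustering. Siblings also share their environment, so λs alone does not prove a genetic cause.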

SLIDE 9

Association studies: is the trait genetically determined?

Disease frequency due to genome sharing

Disease                      λs
Schizophrenia                12
Asthma                        8
Type I diabetes              12
Crohn's disease              25
Multiple sclerosis           24
Aortic stenosis              59
Ventricular septal defect    25
Cleft lip                    40

λs broken down further into multiple loci!

SLIDE 10

Disease Disease frequency frequency I n I n Monozygotic Monozygotic versus versus Dizygotic Dizygotic tw ins tw ins Monozygotic Monozygotic Share Share 100% 100% of

f alleles

alleles Dizygotic Dizygotic Share Share 50% 50% of

f alleles

alleles % concordance % concordance MZ MZ DZ DZ Epilepsy Epilepsy 70 70 6 6 Multiple Multiple sclerosis sclerosis 18 18 2 2 Type 1 Type 1 diabetes diabetes 40 40 5 5 Schizophrenia Schizophrenia 53 53 15 15 Osteoarthritis Osteoarthritis 32 32 16 16 Rheumatoid Rheumatoid arthritis arthritis 12 12 3 3 Psoriasis Psoriasis 72 72 15 15

Association studies: is the trait genetically determined ? Association studies: is the trait genetically determined ?
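One rough way to read the twin table is the ratio of MZ to DZ concordance: the further it lies above 1, the stronger the suggestion of a genetic contribution. The Python sketch below only does that arithmetic on the slide's values. The helper name is illustrative, and the ratio is a descriptive summary, not a formal heritability estimate.

```python
# Percent concordance (MZ, DZ) as listed on the slide.
concordance = {
    "Epilepsy": (70, 6),
    "Multiple sclerosis": (18, 2),
    "Type 1 diabetes": (40, 5),
    "Schizophrenia": (53, 15),
    "Osteoarthritis": (32, 16),
    "Rheumatoid arthritis": (12, 3),
    "Psoriasis": (72, 15),
}


def mz_dz_ratio(mz_percent, dz_percent):
    """Ratio of monozygotic to dizygotic twin concordance."""
    return mz_percent / dz_percent


for disease, (mz, dz) in concordance.items():
    print(f"{disease}: MZ/DZ = {mz_dz_ratio(mz, dz):.1f}")
```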

SLIDE 11

Is a quantitative trait genetically controlled?

VP = VE + VG

Total variance of a trait. What fraction is genetic?

h2 = VG / VP

Can calculate heritability using variance component (VC) methods
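As a small worked example of h2 = VG / VP with VP = VE + VG, the Python sketch below takes two variance components as given. The input values are hypothetical. In practice the components would come from a variance component analysis of family data, which is not shown here.

```python
def heritability(v_genetic, v_environment):
    """Return h2 = VG / VP, where VP = VE + VG."""
    v_phenotype = v_environment + v_genetic
    if v_phenotype <= 0:
        raise ValueError("total phenotypic variance must be positive")
    return v_genetic / v_phenotype


# Hypothetical variance components: VG = 3.0, VE = 1.0.
print(heritability(3.0, 1.0))  # prints 0.75
```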

SLIDE 12

Heritability

[Figure: covariance for phenotype plotted against kinship (calculated as average IBD); slope = h2r]
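The idea on this slide, regressing phenotypic covariance on kinship and reading heritability off the slope, can be sketched with an ordinary least-squares fit. This is an illustration only: the data points are hypothetical, the phenotype is assumed to be standardized to unit variance, and real analyses use variance component software rather than a four-point regression.

```python
def regression_slope(x_values, y_values):
    """Ordinary least-squares slope of y on x."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    return sxy / sxx


# Hypothetical relative classes: expected IBD sharing for unrelated pairs,
# half sibs, full sibs and MZ twins, with made-up phenotype covariances.
ibd_sharing = [0.0, 0.25, 0.5, 1.0]
phenotype_covariance = [0.02, 0.15, 0.27, 0.52]

h2_estimate = regression_slope(ibd_sharing, phenotype_covariance)
print(f"estimated slope (h2): {h2_estimate:.2f}")
```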

SLIDE 13

Sequence Variation: Types and uses

Microsatellites

Variation in number of repeats
Multi-allelic in population
Highly informative
Mostly non-functional
Most useful for family studies

[Pedigree figure with microsatellite genotypes: 5 8; 2, 5; 3, 8; 2, 3; 3, 5; 2, 8; 2, 3]

Can be used for LINKAGE ANALYSIS

SLIDE 14

Sequence Variation: Linkage Analysis

[Figure: chromosomes 1 to 22]

Panel of microsatellites evenly spaced throughout the genome
Look at co-segregation patterns of disease with alleles of specific markers

SLIDE 15

Sequence Variation: Linkage Analysis

[Figure: alignment and crossing-over; chiasma]

Co-segregation of alleles with disease depends on:
1. Chromosomal localisation.
2. Physical/genetic distance between marker and disease locus.

SLIDE 16

Sequence Variation: Linkage Analysis

Minimal recombination region

SLIDE 17

Sequence Variation: Linkage Analysis

LOD score calculated by maximum likelihood:
LOD = log10 (likelihood of observation given linkage / likelihood of observation by chance)
LOD > 3 is usually considered to be significant on a genome-wide basis
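A minimal Python sketch of a two-point LOD score, under simplifying assumptions that are not on the slide: phase-known, fully informative meioses, so the likelihood depends only on the number of recombinants. "By chance" is taken as free recombination (θ = 0.5), and the maximum is found on a simple grid of θ values. Function names are illustrative.

```python
import math


def lod_score(recombinants, meioses, theta):
    """log10 of L(theta) / L(0.5) for phase-known meioses."""
    non_recombinants = meioses - recombinants
    likelihood_linked = theta ** recombinants * (1 - theta) ** non_recombinants
    likelihood_unlinked = 0.5 ** meioses
    if likelihood_linked == 0:
        return float("-inf")
    return math.log10(likelihood_linked / likelihood_unlinked)


def max_lod(recombinants, meioses):
    """Maximize the LOD score over theta = 0.00, 0.01, ..., 0.50."""
    thetas = [i / 100 for i in range(51)]
    best_theta = max(thetas, key=lambda t: lod_score(recombinants, meioses, t))
    return lod_score(recombinants, meioses, best_theta), best_theta


# Ten informative meioses with no recombinants: LOD = 10 * log10(2).
best_lod, best_theta = max_lod(0, 10)
print(f"max LOD = {best_lod:.2f} at theta = {best_theta}")
```

Under these assumptions, ten meioses without a recombinant are just enough to pass the LOD > 3 threshold.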

SLIDE 18

Mapping monogenic disorders: Great success story!

Genes with mutations causing human disorders: 1794 (Dec 2005), out of a total of ~25,000 genes

Examples include:
Parkinson's disease (4q21)
Cystic Fibrosis (7q31)
Muscular dystrophy (X)
Deafness (about 45 different loci!)

SLIDE 19

Linkage Analysis: Limits

SLIDE 20

Sequence Variation: SNPs

Variation in single position
Bi-allelic in population
Less informative
Can be functional
Most useful in population studies

Most common type of variation: any two chromosomes differ every 600 bp (about 10 million genome-wide).

SLIDE 21

Functional consequences of variation

Sequence variation (SNPs, deletions/duplications, repeats, transposable elements)

Coding variation leading to protein changes
Non-coding variation affecting transcription of genes
Non-coding variation affecting chromatin structure

SLIDE 22

Sequence Variation: SNPs

SLIDE 23

Population-based association studies

If an allele i in gene x is involved in disease pathogenesis, one expects a significant increase in its frequency in affected groups vs. controls.

[Figure: genotypes and frequency f(i) of allele i in controls (N=60) and patients (N=60)]

Tests: logistic regression, χ2 test
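The χ2 test named on this slide can be sketched in Python as a 2x2 comparison of allele counts between patients and controls. The counts below are hypothetical (60 individuals per group gives 120 alleles each). The statistic is the standard Pearson χ2 without continuity correction, and the p value uses the 1-degree-of-freedom identity p = erfc(sqrt(χ2 / 2)). The function name is illustrative.

```python
import math


def allele_chi_square(case_i, case_other, control_i, control_other):
    """Pearson chi-square (1 df) for a 2x2 table of allele counts."""
    a, b, c, d = case_i, case_other, control_i, control_other
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column needs at least one count")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: allele i seen 50 times in patients, 30 in controls.
chi2, p_value = allele_chi_square(50, 70, 30, 90)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Logistic regression, the other test on the slide, additionally allows adjustment for covariates. It is not shown here.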

SLIDE 24

Population-based association studies

Two main approaches:

Candidate gene: limited set of SNPs in a set of candidate genes. In general gives an incomplete picture of phenotype determination.

Indirect association: genome-wide set of SNPs, no prior hypothesis, potentially could give a complete view of phenotype determination. Depends on LD. Only possible with important technology advances.

SLIDE 25

Association studies: Linkage disequilibrium

r^2 = [f(AB) − f(A) f(B)]^2 / [f(A) f(a) f(B) f(b)]

LD can be measured in several ways. For association studies, rsq (coefficient of determination) is most common.

8 'tag' SNPs for 50 SNPs in region
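A minimal Python sketch of the r^2 measure of LD between two bi-allelic SNPs, assuming haplotype and allele frequencies are already estimated. Because each SNP has two alleles, f(a) = 1 − f(A) and f(b) = 1 − f(B). The function name and example frequencies are hypothetical.

```python
def ld_r_squared(f_ab, f_a, f_b):
    """r^2 between two bi-allelic loci.

    f_ab: frequency of the haplotype carrying alleles A and B
    f_a:  frequency of allele A at the first locus
    f_b:  frequency of allele B at the second locus
    """
    for freq in (f_a, f_b):
        if not 0 < freq < 1:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = f_ab - f_a * f_b  # disequilibrium coefficient D
    return d ** 2 / (f_a * (1 - f_a) * f_b * (1 - f_b))


# Hypothetical frequencies: f(AB) = 0.4, f(A) = 0.5, f(B) = 0.5.
print(round(ld_r_squared(0.4, 0.5, 0.5), 2))  # prints 0.36
```

An r^2 near 1 means one SNP is almost a perfect proxy for the other, which is what lets a few tag SNPs stand in for many SNPs in a region.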

SLIDE 26

Association studies: HapMap project

www.hapmap.org

Ultimate goal: find the minimal set of SNPs that capture most of the sequence variation information to perform association studies.

SLIDE 27

Association studies: Genotyping technologies

Hirschhorn and Daly, Nat Rev Genet 2005

Based on Affymetrix array technology
New 300K bead array based on HapMap
Illumina 300K array expected to capture about 70% of common variation
Genome-wide association feasible
Cost

SLIDE 28

Association studies: Main problems

Many studies underpowered. For diseases with complex inheritance (λs < 20) and many loci with minor contributions (each allele with GRR < 3.0), 1000s rather than 100s of samples needed!

How to deal with multiple testing problem?

Need new methods to extract G x G and G x E interactions!

DG Clayton, JA Todd et al. 2005

[Figure panels: λ=1.2, λ=2.0, λ=1.3, λ=1.5]
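One common, conservative answer to the multiple testing problem is a Bonferroni correction. The Python sketch below assumes a family-wise error rate of 0.05 and 300,000 tests, the latter chosen to match the array size quoted earlier in the talk. Both numbers and the function name are illustrative choices, not a method stated on this slide.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wise error rate alpha."""
    if n_tests < 1:
        raise ValueError("need at least one test")
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 300_000)
print(f"per-test p threshold: {threshold:.2e}")
print(f"-log10 threshold: {-math.log10(threshold):.2f}")
```

Bonferroni treats all tests as independent. SNPs in LD are correlated, so the correction is stricter than necessary, which is one reason simulation-based thresholds are also used.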

SLIDE 29

Targeted drugs in the near future?

SLIDE 30

VITAL-IT PLATFORM
NCCR Genomics Platform

Mapping genetic susceptibility to HIV infection

Collaborative study between the labs of S. Antonarakis, A. Telenti (Corinne Loeuillet) and J. Beckmann

SLIDE 31

Susceptibility to HIV: Genetics role?

Large difference in natural history of disease, two interesting groups:
Exposed non-infected
Infected non-progressors (rare, familial segregation)

Highly concordant susceptibility in twins

Several polymorphisms known to play a role.

SLIDE 32

Susceptibility to HIV: known genetic factors

Viral co-receptors
Chemotactic molecules
Co-receptor ligands

SLIDE 33

Susceptibility to HIV: viral life cycle

SLIDE 34

Susceptibility to HIV: cellular system

Main aim:
Develop cellular system in which to dissect genetic factors

Validation:
Can an in-vitro cellular system recapitulate the in-vivo situation?
Would such a system be reproducible?

SLIDE 35

An in vitro system for identification of lentiviral susceptibility

SYSTEM:
Cell transduction of B-lymphoblastoid cells
VSV-G pseudotyped lentiviral vector, expression of eGFP (CMV promoter)
Infection by spinoculation (3000 rpm, 3 h)
Wash, detection of eGFP expression by FACS (72 h)

SLIDE 36

Susceptibility to HIV: Genetic analysis

15 CEPH families = ~200 individuals
Measured cellular phenotypes in triplicate
Obtained information for 2600 SNPs genome-wide, publicly available in databases

SLIDE 37

CEPH families

CEPH : Centre d'Etude du Polymorphisme Humain

Created in 1984 to provide resources for human genome mapping
We used 15 families (N=200)

SLIDE 38

Susceptibility to HIV: Genetic analysis

Heritability:

Row  Trait          H2r        p value
1    CMVGFPper      0.5367977  0.0000016
2    CMVGFPMFI all  0.4354729  0.0000087
10   CD39per        0.8036432  0.0003233
11   CD39MFI all    1          8.53E-68
12   CD39ratio      1          6.27E-55
13   LMP1per        0.496049   3.47E-10
14   LMP1MFI all    0.732135   1.24E-14
15   LMP1ratio      0.6194324  2.01E-17
16   CD11aMFI all   0.996293   0.0000185
17   CD11aper       0.1366545  0.1273509
18   CD11aratio     1          0.0002416
19   CD19MFI all    0.9050112  3.48E-13
20   CD19per        0.8286046  0.0000001
21   CD19ratio      1          0.0000579
22   CD21MFI all    1          0.000226
23   CD21per        1          2.41E-10
24   CD21ratio      0.4659657  0.0136537
25   CD23MFI all    0.6998939  0.0000449
26   CD23per        0.7914027  0.005566
27   CD23ratio      0.517473   0.0006306
28   CD39MFI all    1          2.28E-23

Trait groups: Susceptibility, EBV marker, Innate immunity

SLIDE 39

Susceptibility to HIV: Linkage Results I

[Figure: CMVper multipoint linkage across chromosomes 1 to 22; vertical axis 0.5 to 3.5; simulation threshold (Vital-IT) marked]

SLIDE 40

Susceptibility to HIV: Association using HapMap Tag SNPs

[Figure: chromosome 8 CMVper association, Tag SNPs in 3 Mb centered on linkage finding; vertical axis log(p), 0.5 to 4.5; Bonferroni threshold marked]

SLIDE 41

Trait distribution according to phenotype

Analysis of Variance for CMV GFPper

Source     DF  SS      MS     F      P
rs257288    1   932.4  932.4  18.30  0.000
Error      53  2699.7   50.9
Total      54  3632.0

Individual 95% CIs for mean, based on pooled StDev

Level  N   Mean    StDev
AG      7  35.934  8.800
GG     48  23.580  6.896

Pooled StDev = 7.137

[Plot: 95% CI of the mean for each genotype of the SNP; axis 24.0 to 42.0]
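The ANOVA table on this slide can be checked from its own group summaries (N, mean, StDev). The Python sketch below recomputes the between-group and error sums of squares and the F statistic for a one-way ANOVA. The function name is illustrative, and only the summary values come from the slide.

```python
def one_way_anova_from_summary(groups):
    """One-way ANOVA from (n, mean, sd) per group.

    Returns (ss_between, ss_within, f_statistic).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_statistic


# Genotype groups for rs257288 as given on the slide: (N, mean, StDev).
groups = [(7, 35.934, 8.800), (48, 23.580, 6.896)]
ss_b, ss_w, f_stat = one_way_anova_from_summary(groups)
print(f"SS between = {ss_b:.1f}, SS error = {ss_w:.1f}, F = {f_stat:.2f}")
```

The recomputed values match the table (932.4, 2699.7 and F = 18.30), so the summary statistics and the ANOVA output are mutually consistent.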

SLIDE 42

Chromosome 8 CMVper association: Fine mapping using all HapMap phase 2.0 data

[Figure: candidate SNP shown relative to Gene 1 through Gene 8]