Introduction to Genetic Epidemiology CM van Duijn Genetic - - PowerPoint PPT Presentation

slide-1
SLIDE 1

Introduction to Genetic Epidemiology

CM van Duijn Genetic Epidemiology Unit

slide-2
SLIDE 2

Gene Discovery

  • Basic principles
  • Candidate gene studies
  • Genome screening
  • Genome sequencing
  • Genetic architecture disease
slide-3
SLIDE 3

Gene ⇒ Protein ⇒ Disease

Rationale Genetic Epidemiology

slide-4
SLIDE 4

Genetic code

AGGAGTCCAAAGCGCGCAGTGCGCAGCGCGCA CCAGTCGTGACTCCAAAGCGATTCGATAGCAAC CCGATCCTATGAGGGCGCAGGAGTCCAAAGCGC GCAGTGCGCGAGAGGAGTCGGAGTCCGGCAATT GCCCAATGCCGATCGAACGACGTAACCGACTTA GGCCAGAGAGCTAGCGATCCGACTCTAAGAGCA GCTAAAGACTCCAAAGCGATTCGATAGCAACCC GCCGATCGAAGGAGTCCAAAGTCGGAGTCCGGC AACAGTCGTTGCCCAATGCCGGCGATTCGAATC GAACGACGTAACGGCAACAGTCGTGACTTGCCC AATGCCCGACCAGTCGTGACACTCCAAAGTGCC CAATGCCGATCCGATTCGATAGCACCAATGCCGA TCCAAACGAACGACGTCCAAAAACCGACTT

slide-5
SLIDE 5

Genetic code

AGGAGTCCAAAGCGCGCAGTGCGCAGCGCGCA CCAGTCGTGACTCCAAAGCGATTCGATAGCAAC CCGATCCTATGAGGGCGCAGGAGTCCAAAGCGC GCAGTGCGCGAGAGGAGTCGGAGTCCGGCAATT GCCCAATGCCGATCGAACGACGTAACCGACTTA GGCCAGAGAGCTAGCGATCCGACTCTAAGAGCA GCTAAAGACTCCAAAGCGATTCGATAGCAACCC GCCGATCGAAGGAGGCCAAAGTCGGAGTCCGG CAACAGTCGTTGCCCAATGCCGGCGATTCGAATC GAACGACGTAACGGCAACAGTCGTGACTTGCCC AATGCCCGACCAGTCGTGACACTCCAAAGTGCC CAATGCCGATCCGATTCGATAGCACCAATGCCGA TCCAAACGAACGACGTCCAAAAACCGACTT

slide-6
SLIDE 6

Founder chromosome with disease associated mutation

Mutation = change in base pair
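The sequences shown on slides 4 and 5 appear to differ at a single base (a T replaced by a G). A minimal Python sketch of how such a change can be located by comparing two aligned sequences base by base; the function name `point_differences` is an illustrative choice, and the two strings are a short excerpt taken from around the differing position:

```python
def point_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, reference base, sample base) for every mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must have equal length for a base-by-base comparison")
    return [
        (position, ref_base, alt_base)
        for position, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


# Excerpt from slide 4 (reference) and slide 5 (mutated).
reference = "GCCGATCGAAGGAGTCCAAAGTCGG"
mutated = "GCCGATCGAAGGAGGCCAAAGTCGG"

print(point_differences(reference, mutated))  # [(14, 'T', 'G')]
```

This only handles substitutions in sequences of equal length; insertions and deletions would need a proper alignment step.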

slide-7
SLIDE 7

Basic Rationale

A mutation/polymorphism causally related to the disease should be found more often in affected than unaffected individuals

slide-8
SLIDE 8

Recombination

slide-9
SLIDE 9

Recombination

slide-10
SLIDE 10

A A A A A A A A A A A A A A A

Mutation

slide-11
SLIDE 11

Founder chromosome with disease associated mutation

Mutation

Region that is identical by descent (IBD) including the disease locus (haplotype)

slide-12
SLIDE 12

Basic Rationale

A mutation/polymorphism not causally related to disease, but close to the disease gene should also be found more often in affected than unaffected individuals

slide-13
SLIDE 13

Association

Look for the hayfork instead of the needle

slide-14
SLIDE 14

Approaches to Gene Finding (indirect)

Candidate gene: disease ⇒ protein ⇒ gene?

Genome screen: disease ⇒ gene? ⇒ protein?

Public health, clinical decision?

New drug targets & biomarkers

slide-15
SLIDE 15

Gene Discovery

  • Basic principles
  • Candidate gene studies
  • Genome screening
  • Genome sequencing
  • Genetic architecture disease
slide-16
SLIDE 16

Candidate gene approach

Unknown disease mutation, e.g. change in base pair

Promoter

  • Not translated into protein
  • May determine level of protein

Gene

  • May determine function of protein
  • May determine level of protein
slide-17
SLIDE 17

Genetic code

AGGAGTCCAAAGCGCGCAGTGCGCAGCGCGCA CCAGTCGTGACTCCAAAGCGATTCGATAGCAAC CCGATCCTATGAGGGCGCAGGAGTCCAAAGCGC GCAGTGCGCGAGAGGAGTCGGAGTCCGGCAATT GCCCAATGCCGATCGAACGACGTAACCGACTTA GGCCAGAGAGCTAGCGATCCGACTCTAAGAGCA GCTAAAGACTCCAAAGCGATTCGATAGCAACCC GCCGATCGAAGGAGTCCAAAGTCGGAGTCCGGC AACAGTCGTTGCCCAATGCCGGCGATTCGAATC GAACGACGTAACGGCAACAGTCGTGACTTGCCC AATGCCCGACCAGTCGTGACACTCCAAAGTGCC CAATGCCGATCCGATTCGATAGCACCAATGCCGA TCCAAACGAACGACGTCCAAAAACCGACTT

slide-18
SLIDE 18

              APOE    COL2A1
Base pairs    3597    31 001
Amino acids   299     1418
Exons         4       54

Diversity Genes

slide-19
SLIDE 19

Marker   Allele
1        12
2        A
3        9
4        2
5        3

Candidate gene approach

Select markers in gene or its promoter using literature and bioinformatics and test these in affected and unaffected subjects
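Testing a marker in affected and unaffected subjects can be sketched as a chi-square test on a 2x2 table of allele counts. The following is a minimal illustration using only the Python standard library; the function name and the allele counts are hypothetical and not taken from the slides:

```python
import math


def allelic_chi_square(case_minor, case_major, control_minor, control_major):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts."""
    table = [[case_minor, case_major], [control_minor, control_major]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom the chi-square tail probability equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 500 cases and 500 controls, two alleles per person.
chi2, p_value = allelic_chi_square(300, 700, 250, 750)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Counting alleles rather than genotypes assumes the two alleles of a person are independent observations (Hardy-Weinberg equilibrium); a genotype-based or regression-based test would be the usual alternative when that assumption is doubtful.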

slide-20
SLIDE 20

Genetic Markers (SNPs)

  • Flag a locus on chromosome
  • May be located in / out gene
  • May be located in / out exon
slide-21
SLIDE 21

Example: Alzheimer’s disease (AD)


slide-22
SLIDE 22

Pathology Alzheimer’s disease (AD)

  • Senile plaques - amyloid Aβ
  • Neurofibrillary tangles - tau
  • Aβ amyloid angiopathy

APP MAPT


slide-23
SLIDE 23

Rotterdam Study

  • 12,000 subjects aged 55+ years who have been followed for 15 years
  • Screening for major diseases and risk factors every 5 years
  • 700 patients with Alzheimer’s disease
  • Genotyping: Taqman / Illumina 500 k
  • Basically compare the frequency of rare variants in cases and controls

slide-24
SLIDE 24

High-density genotyping >3,500,000 SNPs to:

  • validate SNPs, determine frequency, assays
  • determine the correlation structure of alleles and number of independent haplotypes

ENCODE: sequencing 10 typical 500kb regions

slide-25
SLIDE 25

HapMap defined blocks of linkage disequilibrium (LD) in the genome

Block 1: LD   Block 2: LD   Block 3: LD   Block 4: LD   Block 5: LD

Blocks are artificial but very useful
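How strongly two markers within a block are correlated is commonly summarised by D, D' and r². A minimal sketch of these standard measures for two biallelic markers follows; the function name and the haplotype frequencies are hypothetical illustrations, not HapMap data:

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A at marker 1 and allele B at marker 2
    p_a:  frequency of allele A at marker 1
    p_b:  frequency of allele B at marker 2
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both markers must be polymorphic (0 < frequency < 1)")

    d = p_ab - p_a * p_b  # deviation from independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical frequencies: partial LD between the two markers.
d, d_prime, r_squared = ld_measures(p_ab=0.2, p_a=0.3, p_b=0.4)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r_squared:.3f}")
```

A high r² between two markers means that genotyping one of them captures most of the information in the other, which is what makes one marker per block a workable strategy.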

slide-26
SLIDE 26

Genetic variations in MAPT


slide-27
SLIDE 27

Frequency of the minor allele in cases and controls

 #   Marker        Position   Cases    Controls
 1   hCV2536908    40526680   0.2371   0.2099
 2   hCV341577     40538554   0.4454   0.4146    p<0.02
 3   hCV9254243    40571807   0.3683   0.3736
 4   hCV2032862    40598477   0.233    0.2813    p<0.01
 5   hCV2032865    40603713   0.4187   0.4999
 6   hCV2554844    40717672   0.4968   0.4938
 7   hCV2541205    40828104   0.4708   0.4389
 8   hCV2265271    41070456   0.1755   0.2126
 9   hCV2544843    41235818   0.4495   0.3671
10   hCV2257689    41241147   0.459    0.3671
11   hCV2544830    41256855   0.446    0.4631
12   hCV2257669    41301901   0.1837   0.2049
13   hCV7450857    41340226   0.1887   0.235
14   hCV3202946    41350591   0.1347   0.1357
15   hCV3202949    41352389   0.4547   0.4368
16   hCV1016016    41375573   0.383    0.3536
17   hCV3202956    41381748   0.1863   0.2346
18   hCV7563692    41407682   0.1808   0.2135
19   hCV3202960    41424176   0.1695   0.1446
20   hCV2042903    41424329   0.2682   0.3078
21   hCV11936104   41439239   0.1734   0.2376
22   hCV2560317    41461242   0.4803   0.4819
23   hCV2264293    41465690   0.194    0.2194
24   hCV2560314    41472690   0.4335   0.4405
25   hCV11936132   41497167   0.1745   0.2074
26   hCV15858203   41511550   0.1912   0.2188
27   hCV7563831    41551932   0.1776   0.2084
28   hCV2560260    41560151   0.1659   0.1250
29   hCV338624     41604276   0.1565   0.1125
30   hCV2598655    41615467   0.1543   0.1325
31   hCV2554114    42150418   0.2083   0.2013
32   hCV2261778    42164185   0.1703   0.2013
33   hCV2261785    42184098   0.1733   0.2049
34   hCV2261819    42220763   0.1740   0.2063
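The minor allele frequencies in cases and controls can be turned into an allelic odds ratio, which shows the direction and size of each difference. A small sketch using the two markers annotated with p-values (markers 2 and 4); the function name is an illustrative choice, and no confidence interval is computed because the slide does not give the number of alleles counted:

```python
def allelic_odds_ratio(freq_cases, freq_controls):
    """Odds of carrying the minor allele in cases divided by the odds in controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


# Frequencies copied from the table on slide 27.
markers = {
    "hCV341577 (marker 2)": (0.4454, 0.4146),
    "hCV2032862 (marker 4)": (0.233, 0.2813),
}

for name, (cases, controls) in markers.items():
    print(f"{name}: OR = {allelic_odds_ratio(cases, controls):.2f}")
```

An odds ratio above 1 means the minor allele is more frequent in cases, below 1 that it is more frequent in controls; the values here are roughly 1.13 and 0.78.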

slide-28
SLIDE 28

Multiple Testing

  • A large number of tests are performed with no strong a priori hypothesis
  • There is no a priori hypothesis about which allele is involved
  • There is no a priori hypothesis about the direction of the effect: increase or decrease in risk

slide-29
SLIDE 29

Multiple Testing

Two tests, each performed at p = 0.05:

Test 1   Test 2
ok       ok       0.95 × 0.95 = 0.90
ok       wrong
wrong    ok
wrong    wrong

Probability of at least one wrong result: 1 − 0.90 = 0.10 instead of 0.05

If you do 34 tests, the probability of at least 1 false positive is 1 − 0.95^34 ≈ 0.83 => adjust the p-value threshold to 0.05/34 ≈ 1.5 × 10^-3 (Bonferroni correction)

If you test with p = 0.05/2, the probability of at least 1 false positive is 1 − 0.975^2 ≈ 0.05
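The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch that assumes the tests are independent, which is the assumption behind the 0.95 × 0.95 calculation; the function names are illustrative:

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate near alpha."""
    return alpha / n_tests


print(f"2 tests at 0.05:    {familywise_error_rate(0.05, 2):.4f}")   # 0.0975
print(f"34 tests at 0.05:   {familywise_error_rate(0.05, 34):.4f}")  # about 0.83
print(f"Bonferroni for 34:  {bonferroni_threshold(0.05, 34):.5f}")   # about 0.00147
print(f"2 tests at 0.05/2:  {familywise_error_rate(0.025, 2):.4f}")  # about 0.049
```

Markers in linkage disequilibrium are correlated rather than independent, so in practice the Bonferroni threshold is conservative.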

slide-30
SLIDE 30

Frequency of the minor allele in cases and controls

 #   Marker        Position   Cases    Controls
 1   hCV2536908    40526680   0.2371   0.2099
 2   hCV341577     40538554   0.4454   0.4146    p<0.02 NOT SIGNIFICANT
 3   hCV9254243    40571807   0.3683   0.3736
 4   hCV2032862    40598477   0.233    0.2813    p<0.01 NOT SIGNIFICANT
 5   hCV2032865    40603713   0.4187   0.4999
 6   hCV2554844    40717672   0.4968   0.4938
 7   hCV2541205    40828104   0.4708   0.4389
 8   hCV2265271    41070456   0.1755   0.2126
 9   hCV2544843    41235818   0.4495   0.3671
10   hCV2257689    41241147   0.459    0.3671
11   hCV2544830    41256855   0.446    0.4631
12   hCV2257669    41301901   0.1837   0.2049
13   hCV7450857    41340226   0.1887   0.235
14   hCV3202946    41350591   0.1347   0.1357
15   hCV3202949    41352389   0.4547   0.4368
16   hCV1016016    41375573   0.383    0.3536
17   hCV3202956    41381748   0.1863   0.2346
18   hCV7563692    41407682   0.1808   0.2135
19   hCV3202960    41424176   0.1695   0.1446
20   hCV2042903    41424329   0.2682   0.3078
21   hCV11936104   41439239   0.1734   0.2376
22   hCV2560317    41461242   0.4803   0.4819
23   hCV2264293    41465690   0.194    0.2194
24   hCV2560314    41472690   0.4335   0.4405
25   hCV11936132   41497167   0.1745   0.2074
26   hCV15858203   41511550   0.1912   0.2188
27   hCV7563831    41551932   0.1776   0.2084
28   hCV2560260    41560151   0.1659   0.1250
29   hCV338624     41604276   0.1565   0.1125
30   hCV2598655    41615467   0.1543   0.1325
31   hCV2554114    42150418   0.2083   0.2013
32   hCV2261778    42164185   0.1703   0.2013
33   hCV2261785    42184098   0.1733   0.2049
34   hCV2261819    42220763   0.1740   0.2063

slide-31
SLIDE 31

Gene Discovery

  • Basic principles
  • Candidate gene studies
  • Genome screening
  • Genome sequencing
  • Genetic architecture disease
slide-32
SLIDE 32

Human Genome

  • 3 billion base pairs
  • Average gene size: 30,000 base pairs
  • Genes make up <10% of the DNA
slide-33
SLIDE 33

Chromosomes from “apparently unrelated” individuals with a certain trait

Founder chromosome with disease associated mutation

Region that is identical by descent (IBD) including the disease locus

[Figure: numbered marker alleles along each chromosome]

slide-34
SLIDE 34

Marker   Allele
1        13
2        A
3        4
4        9
5        10
6        I
7        3

Genome screen

Select markers covering the full genome and test these in patients and controls or families

Unknown disease mutation, e.g. change in base pair

slide-35
SLIDE 35

Marker   Allele
1        13
2        A
3        4
4        9
5        10
6        I
7        3

Unknown disease mutation

How many markers do you need?

Marker 3 or 4 should flag the block of DNA

slide-36
SLIDE 36

General EU population

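  • About 500,000 blocks are found in Caucasians
  • If you select 1 marker per block you need about 500,000 markers
  • This yields a threshold for significance of 0.05/500,000 = 1 × 10^-7 (the widely used genome-wide threshold of 5 × 10^-8 corresponds to about 1,000,000 independent tests)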
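To see why such a strict threshold is needed, it helps to count how many markers would be expected to pass an uncorrected threshold by chance alone. A short sketch under the assumption of one independent test per block; the function names are illustrative:

```python
def expected_false_positives(per_test_alpha, n_tests):
    """Expected number of tests that pass the threshold when no marker is truly associated."""
    return per_test_alpha * n_tests


def bonferroni_threshold(familywise_alpha, n_tests):
    """Per-test threshold that keeps the overall false positive probability near alpha."""
    return familywise_alpha / n_tests


n_blocks = 500_000  # one marker per LD block, as on the slide

print(expected_false_positives(0.05, n_blocks))   # about 25,000 markers by chance alone
print(bonferroni_threshold(0.05, n_blocks))       # about 1e-07
print(bonferroni_threshold(0.05, 2 * n_blocks))   # about 5e-08 for 1,000,000 tests
```

Note that 0.05 / 500,000 evaluates to 1 × 10^-7; the commonly quoted 5 × 10^-8 corresponds to roughly one million independent tests.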

slide-37
SLIDE 37

General African population

  • LD blocks are smaller
  • Are there more, fewer or an equal number of blocks compared to the EU population?
  • Do you need more or fewer markers in Africans?

slide-38
SLIDE 38

How many patients do you need?

[Figure: curves labelled 2, 1.5, 1.3 and 1.2; Wang et al., Nat Rev Genet, 2005]

Allele frequency and odds ratio determine the number of patients and controls needed

Common variants are easier to find with association than rare ones
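How allele frequency and odds ratio drive the required sample size can be illustrated with the textbook normal-approximation formula for comparing two proportions. This is a rough sketch, not the calculation behind the cited figure: it assumes equal numbers of cases and controls, treats the two alleles of each person as independent observations, and reads the values 2, 1.5, 1.3 and 1.2 on the slide as odds ratios. All function names and default settings are illustrative:

```python
import math
from statistics import NormalDist


def case_allele_frequency(control_freq, odds_ratio):
    """Allele frequency in cases implied by the control frequency and the allelic odds ratio."""
    return odds_ratio * control_freq / (1 + control_freq * (odds_ratio - 1))


def cases_needed(control_freq, odds_ratio, alpha=5e-8, power=0.8):
    """Approximate number of cases (with an equal number of controls) for a two-sided test."""
    p0 = control_freq
    p1 = case_allele_frequency(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    return math.ceil(alleles_per_group / 2)  # two alleles per person


for odds_ratio in (2, 1.5, 1.3, 1.2):
    common = cases_needed(0.30, odds_ratio)
    rare = cases_needed(0.01, odds_ratio)
    print(f"OR {odds_ratio}: common allele (30%) {common} cases, rare allele (1%) {rare} cases")
```

For a given odds ratio the rare allele needs far more subjects than the common one, and smaller odds ratios need more subjects at any frequency.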

slide-39
SLIDE 39
slide-40
SLIDE 40

Genome wide association analyses (GWAs) of LDL cholesterol: p-plot

slide-41
SLIDE 41

Gene Discovery

  • Basic principles
  • Candidate gene studies
  • Genome screening
  • Genome sequencing
  • Genetic architecture disease
slide-42
SLIDE 42

Genome wide sequencing

  • Exome: targets the areas of the genome where proteins are coded
  • Whole genome: searches through the full genome to find disease-associated variants
  • Major advantage: finding the functionally relevant variants implicated in disease

slide-43
SLIDE 43

Genetic code

AGGAGTCCAAAGCGCGCAGTGCGCAGCGCGCA CCAGTCGTGACTCCAAAGCGATTCGATAGCAAC CCGATCCTATGAGGGCGCAGGAGTCCAAAGCGC GCAGTGCGCGAGAGGAGTCGGAGTCCGGCAATT GCCCAATGCCGATCGAACGACGTAACCGACTTA GGCCAGAGAGCTAGCGATCCGACTCTAAGAGCA GCTAAAGACTCCAAAGCGATTCGATAGCAACCC GCCGATCGAAGGAGTCCAAAGTCGGAGTCCGGC AACAGTCGTTGCCCAATGCCGGCGATTCGAATC GAACGACGTAACGGCAACAGTCGTGACTTGCCC AATGCCCGACCAGTCGTGACACTCCAAAGTGCC CAATGCCGATCCGATTCGATAGCACCAATGCCGA TCCAAACGAACGACGTCCAAAAACCGACTT

slide-44
SLIDE 44

Genetic code

AGGAGTCCAAAGCGCGCAGTGCGCAGCGCGCA CCAGTCGTGACTCCAAAGCGATTCGATAGCAAC CCGATCCTATGAGGGCGCAGGAGTCCAAAGCGC GCAGTGCGCGAGAGGAGTCGGAGTCCGGCAATT GCCCAATGCCGATCGAACGACGTAACCGACTTA GGCCAGAGAGCTAGCGATCCGACTCTAAGAGCA GCTAAAGACTCCAAAGCGATTCGATAGCAACCC GCCGATCGAAGGAGGCCAAAGTCGGAGTCCGG CAACAGTCGTTGCCCAATGCCGGCGATTCGAATC GAACGACGTAACGGCAACAGTCGTGACTTGCCC AATGCCCGACCAGTCGTGACACTCCAAAGTGCC CAATGCCGATCCGATTCGATAGCACCAATGCCGA TCCAAACGAACGACGTCCAAAAACCGACTT

slide-45
SLIDE 45

G A T C C G A C

In association studies we make jumps over the genome, counting on the power of LD

Genetic association

slide-46
SLIDE 46

Problem: rare variants are common !

AGGAGTCCAAAGCGCGCAGTGCGCAGCGCGCA CCAGTCGTGACTCCAAAGCGATTCGATAGCAAC CCGATCCTATGAGGGCGCAGGAGTCCAAAGCGC GCAGTGCGCGAGAGGAGTCGGAGTCCGGCAATT GCCCAATGCCGATCGAACGACGTAACCGACTTA GGCCAGAGAGCTAGCGATCCGACTCTAAGAGCA GCTAAAGACTCCAAAGCGATTCGATAGCAACCC GCCGATCGAAGGAGGCCAAAGTCGGAGTCCGG CAACAGTCGTTGCCCAATGCCGGCGATTCGAATC GAACGACGTAACGGCAACAGTCGTGACTTGCCC AATGCCCGACCAGTCGTGACACTCCAAAGTGCC CAATGCCGATCCGATTCGATAGCACCAATGCCGA TCCAAACGAACGACGTCCAAAAACCGACTT

  • Rare variants are often new mutations => persons (or rather families) have a private mutation, which is mixed with others over generations
  • Due to the rarity, the sample size needed to find a mutation is even larger than that of common variants
slide-47
SLIDE 47

Sequencing

Next generation technology and statistics to find rare variants in population and family studies

Many beeps are heard

slide-48
SLIDE 48

Analyses of sequence data: often gene based, combining variants

Unknown disease mutations, e.g. change in base pair

Gene

  • Count the number of patients with a damaging rare variant (+, -, o)
  • Compare the frequency of carriers to that in controls
  • Coding areas (genes) are easiest to study; the candidate gene approach is popular!
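The carrier-counting idea above can be sketched as a one-sided Fisher exact test on the number of carriers in patients and controls. This is only one simple form of gene-based (burden) analysis; the function name and the counts are hypothetical, and only the Python standard library is used:

```python
from math import comb


def fisher_one_sided(case_carriers, case_total, control_carriers, control_total):
    """Probability of seeing at least `case_carriers` carriers among the cases
    if carrier status were unrelated to disease (hypergeometric tail)."""
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    ways_to_pick_cases = comb(total, case_total)

    tail = 0
    for k in range(case_carriers, min(carriers, case_total) + 1):
        # k carriers and (case_total - k) non-carriers end up in the case group.
        tail += comb(carriers, k) * comb(total - carriers, case_total - k)
    return tail / ways_to_pick_cases


# Hypothetical gene: 8 of 500 patients and 1 of 500 controls carry a damaging rare variant.
p_value = fisher_one_sided(8, 500, 1, 500)
print(f"one-sided p = {p_value:.4f}")
```

Collapsing all damaging variants in a gene into a single carrier status raises the carrier frequency, which is why gene-based tests can succeed where each individual rare variant is too infrequent to test alone.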

slide-49
SLIDE 49

A A A A A A A A A A A A A A A

Family studies help rare variant research

Rare variants in the population are not rare in a family

slide-50
SLIDE 50

Gene Discovery

  • Basic principles
  • Candidate gene studies
  • Genome screening
  • Genome sequencing
  • Genetic architecture disease
slide-51
SLIDE 51

Monogenic disorders: Sequencing

Complex diseases, e.g. type 2 diabetes: GWAS

slide-52
SLIDE 52

Complexity of complex disease: many genes

[Diagram: eight Gene ⇒ Protein pairs and Environment, all contributing to Complex disease]

slide-53
SLIDE 53

Why did we catch only common variants with relatively small effects?

  1) Entity of a complex disease
  2) Common variant (50% carrier) cannot have large relative risk (<2)
  3) Hundreds of variants should have relative risks far below 2

slide-54
SLIDE 54
slide-55
SLIDE 55

Cohort                    Sex      Genes   BMI
Rotterdam Study           Female   4.0     ?
Rotterdam Study           Male     3.5     ?
Monozygotic twins         Female   3.2     ?
Northern Finnish cohort   Female   4.8     ?
Northern Finnish cohort   Male     3.6     ?

Percentage of variance in total cholesterol explained by 16 genes

slide-56
SLIDE 56

Cohort                    Sex      Genes   BMI
Rotterdam Study           Female   4.0     1.4
Rotterdam Study           Male     3.5     0.5
Monozygotic twins         Female   3.2     0.2
Northern Finnish cohort   Female   4.8     2.5
Northern Finnish cohort   Male     3.6     4.1

Percentage of variance in total cholesterol explained by 16 genes

slide-57
SLIDE 57

Genome wide association: work in action

Up-scaling GWAS will be successful: gene discovery and impact of Journal!

slide-58
SLIDE 58

Is genome wide association coming to an end?

slide-59
SLIDE 59

Take home message

  • Genetic association is a powerful approach to discover new genetic variants implicated in complex disease
  • Direct sequencing is a less powerful approach but allows causal variants to be identified
  • Both genetic association and sequencing will occur side by side in the coming years

slide-60
SLIDE 60

Questions?