Statistics and analytical issues: SNP and haplotype - PowerPoint PPT Presentation

Statistics and analytical issues: SNP and haplotype. Overview: Sampling, Effect estimation, Hardy-Weinberg Equilibrium, Linkage Disequilibrium, Haplotypes. Linda Broer (l.broer@erasmusmc.nl), Genetic Laboratory, Department of


  1. Statistics and analytical issues: SNP and haplotype
     Linda Broer (l.broer@erasmusmc.nl), Genetic Laboratory, Department of Internal Medicine, Erasmus MC, Rotterdam

     Overview
     • Sampling
     • Effect estimation
     • Hardy-Weinberg Equilibrium
     • Linkage Disequilibrium
     • Haplotypes

     Sampling
     Research question: how much of the variance in my trait is explained by genetic differences between individuals?

  2. Sampling
     Question to you: which samples do you select for your study?

     Sampling
     Important to take samples representative of source population!

     Sampling
     • What about case/control studies?
       • Osteoporosis
       • Breast Cancer
       • Coronary Heart Disease
       • Etc.
     • Your selected cases must represent cases from source population
     • Your selected controls must represent the source population

     Sampling in practice: the Rotterdam Study
     • Objective: to study health and disease in an elderly population
     • We want an area in Rotterdam where individuals are mostly of European ancestry and are relatively old.
     • Look up in public registries information about districts in Rotterdam.
     • Ommoord
       • < 10% not of European ancestry
       • > 60% are 45+ years old
     • Who to sample?

  3. Sampling in practice: the Rotterdam Study

     Sampling in practice: one more example
     • You want to study the development of children, starting before birth
     • Where would you look for your study population?
       • Gynaecologist
     • Whom would you include?
       • All pregnant women entering the clinic between set time points (e.g., 1 year)

  4. Effect estimation in case/control studies

                  Cases    Controls   Total
       Allele +   a        b          a + b
       Allele -   c        d          c + d
       Total      a + c    b + d      a + b + c + d

     • Odds of Allele + in cases = a/c
     • Odds of Allele + in controls = b/d
     • Thus odds ratio: $\mathrm{OR} = \frac{a/c}{b/d} = \frac{a \cdot d}{b \cdot c}$
     • No association: OR = 1
     • Association: OR ≠ 1
       • OR > 1: increased risk
       • OR < 1: decreased risk (protective)

     Effect estimation in case/control studies
     • Example:
       • 122 cases, 1428 controls
       • 73 cases did not have the risk allele, while 49 did have the risk allele
       • 1204 controls did not have the risk allele, while 224 did
     • Let's fill in the formula

                  Cases    Controls   Total
       Allele +   49       224        273
       Allele -   73       1204       1277
       Total      122      1428       1550

     • Odds ratio: $\mathrm{OR} = \frac{49/73}{224/1204} = \frac{49 \cdot 1204}{224 \cdot 73} = 3.6$
     • Carriers of this variant have an increased risk for the disease
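
The odds ratio calculation on this slide can be checked with a few lines of Python. This is a minimal sketch, not part of the original deck; the function name `odds_ratio` is chosen here for illustration, and the arguments follow the slide's 2x2 table layout (a, b, c, d).

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 allele-by-status table.

    a: cases with the allele       b: controls with the allele
    c: cases without the allele    d: controls without the allele
    """
    if b == 0 or c == 0:
        # The ratio (a*d)/(b*c) is undefined when b or c is zero.
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Numbers from the slide: 49/73 cases and 224/1204 controls with/without the allele.
print(round(odds_ratio(49, 224, 73, 1204), 1))  # 3.6
```

An odds ratio above 1, as here, points to increased risk for carriers; a value below 1 would suggest a protective effect.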

  5. Effect estimation in continuous traits
     [Figure: mean BMD (g/cm²) by SNP genotype (AA, AG, GG)]
     Compare mean values across genetic groups

     Effect estimation and genetic model
     • Additive model: every additional allele increases your trait
       • Most often tested in genetic studies
     • Recessive model: your trait is increased only if you have two risk alleles
     • Dominant model: your trait is increased by the same amount for one or two risk alleles

     Hardy-Weinberg Equilibrium (HWE)
     • In a population, allele and genotype frequencies will remain constant over generations
     • Assumptions:
       • Large population
       • Random mating
       • No new mutations
       • No natural selection
       • No migration
       • Adequate genotyping
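
The three genetic models above differ only in how a genotype is turned into a number before it is compared with the trait. The sketch below illustrates one common coding; it is an illustration rather than the presenter's method, and the function name `encode_genotype` and the choice of G as the risk allele are assumptions made for the example.

```python
def encode_genotype(genotype, risk_allele, model="additive"):
    """Code a two-letter genotype (e.g. "AG") under a genetic model."""
    count = genotype.count(risk_allele)  # number of risk alleles: 0, 1 or 2
    if model == "additive":
        return count            # each extra risk allele adds one unit
    if model == "dominant":
        return int(count >= 1)  # one or two risk alleles have the same effect
    if model == "recessive":
        return int(count == 2)  # only two risk alleles have an effect
    raise ValueError(f"unknown model: {model}")


# Hypothetical example: G is treated as the risk allele.
for model in ("additive", "dominant", "recessive"):
    codes = [encode_genotype(g, "G", model) for g in ("AA", "AG", "GG")]
    print(model, codes)
```

The coded values would then typically be used as the predictor in a regression of the trait on genotype.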

  6. Hardy-Weinberg Equilibrium (HWE)
     • Variant with 2 alleles (A and B)
     • Frequency of allele A = p
     • Frequency of allele B = q
     • Thus: $p + q = 1$
     • Punnett square of AB x AB crossing

               A (p)    B (q)
       A (p)   p^2      pq
       B (q)   pq       q^2

     • Results in the formula for HWE

       Genotype   Formula
       AA         p^2
       AB         2pq
       BB         q^2

     • Thus: $p^2 + 2pq + q^2 = 1$

     Hardy-Weinberg Equilibrium (HWE)
     • Controls represent the general population
       • Thus must be in HWE!
     • What if they are not?
       • Most likely genotyping error occurred in lab. Can't use the data!
     • Cases can be out of HWE. Why?
       • Usually at or near the disease-causing variant
       • Rest of genome should be in HWE!

     Hardy-Weinberg Equilibrium (HWE)
     • $p^2 + 2pq + q^2 = 1$: Use to calculate expected numbers

       Genotype    N
       AA (p^2)    30
       AB (2pq)    50
       BB (q^2)    20

     • $p^2 = 30/100 = 0.30$
     • $p = \sqrt{0.30} = 0.55$ and $q = 1 - 0.55 = 0.45$
     • Let's calculate HWE

     Hardy-Weinberg Equilibrium (HWE)
     • Example

       Genotype   Observed   Expected                      (O-E)^2/E
       AA         30         0.55^2 * 100 = 30.25          0.002
       AB         50         2 * 0.45 * 0.55 * 100 = 49.5  0.005
       BB         20         0.45^2 * 100 = 20.25          0.003

     • Chi^2 = 0.01 (1 degree of freedom)
     • P-value = 0.92
     • This population is in HWE!
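
The worked HWE example above can be reproduced in code. The following is a sketch under two assumptions that are not on the slides: the allele frequency is estimated by allele counting (which also gives p = 0.55 here, matching the slide's rounded value), and the 1-degree-of-freedom chi-square P-value is computed from the standard library's `math.erfc`. The function name `hwe_chi_square` is made up for this example.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test for Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A by allele counting
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    if min(expected) == 0:
        raise ValueError("test is undefined when an allele is absent")
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, expected, chi2, p_value


p, expected, chi2, p_value = hwe_chi_square(30, 50, 20)
print(round(p, 2), [round(e, 2) for e in expected])  # 0.55 [30.25, 49.5, 20.25]
print(round(chi2, 2), round(p_value, 2))             # 0.01 0.92
```

A large P-value such as this one gives no evidence against HWE; a very small P-value in controls would usually prompt a check for genotyping errors.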

  7. HWE: another example
     • Population of cats (n=100)
       • 16 white & 84 black
       • White is a recessive trait (bb)
       • Black cats are BB and Bb
     • What are the genotype frequencies in this population of cats?
       • White cats: $q^2 = 0.16$
       • Therefore: $q = \sqrt{0.16} = 0.4$
       • Thus: $p = 1 - q = 1 - 0.4 = 0.6$
       • Frequency of BB: $p^2 = 0.6^2 = 0.36$
       • Frequency of Bb: $2pq = 2 \cdot 0.6 \cdot 0.4 = 0.48$

     Mendel's law of independent assortment
     • Each pair of alleles segregates independently of the other pairs and all possible combinations of alleles can occur in the resulting gametes
     • More simply put: two SNPs (or traits) will inherit independently

     What is linkage disequilibrium (LD)?
     • Mendel got lucky!
     • LD: co-occurrence of alleles at adjacent loci more frequently than expected by the allele frequencies and recombination rate
     [Figure labels: SNP1, SNP2, SNP3]
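
The cat example can be written as a short calculation. This sketch assumes, as the slide does, that the population is in HWE, so that the frequency of the recessive phenotype equals q²; the function name `genotype_freqs_from_recessive` is chosen here for illustration.

```python
import math


def genotype_freqs_from_recessive(n_recessive, n_total):
    """Genotype frequencies from the count of recessive (bb) individuals.

    Assumes Hardy-Weinberg equilibrium, so freq(bb) = q^2.
    """
    q = math.sqrt(n_recessive / n_total)  # frequency of allele b
    p = 1 - q                             # frequency of allele B
    return {"BB": p * p, "Bb": 2 * p * q, "bb": q * q}


freqs = genotype_freqs_from_recessive(16, 100)
print({g: round(f, 2) for g, f in freqs.items()})
# {'BB': 0.36, 'Bb': 0.48, 'bb': 0.16}
```

So of the 84 black cats, roughly 36 would be expected to be BB and 48 to be Bb.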

  8. What is the use of knowing about LD?
     • Currently ~60 Million SNPs known in human genome
     • Latest array 'only' measures 5 Million SNPs
     • Is your disease-causing SNP one of the 5 Million measured?
     [Figure labels: M1 (SNP1), D (SNP, DIP, CNV), M2 (SNP2)]

     Where does LD come from?
     Alleles that exist in the population today arose through ancient mutation events

       Before mutation:    A T
       After mutation 1:   A T
                           G T   (mutation)
       After mutation 2:   A T
                           G T
                           G C   (mutation)

     Where does LD come from?
     Recombination generates new arrangements of ancestral alleles

       Before recombination:   A T
                               G T
                               G C
       After recombination:    A T
                               G T
                               G C
                               A C

     Recombination?
     • Recombination is NOT random
       • Hot spots
     • In between hot spots, variants are in LD
       • They are correlated
       • Often measured as $r^2$
       • $r^2 = 1$: two variants provide same information
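
To make the r² measure concrete, here is a minimal sketch that computes it from a list of two-SNP haplotypes using the standard definition $r^2 = D^2 / (p_1 (1 - p_1) p_2 (1 - p_2))$ with $D = p_{12} - p_1 p_2$. The function name `r_squared` and the haplotype counts are hypothetical; only the allele labels are borrowed from the slide's figure.

```python
def r_squared(haplotypes, allele1, allele2):
    """r^2 between two SNPs from phased two-letter haplotypes such as "AT"."""
    n = len(haplotypes)
    p1 = sum(h[0] == allele1 for h in haplotypes) / n   # allele freq at SNP 1
    p2 = sum(h[1] == allele2 for h in haplotypes) / n   # allele freq at SNP 2
    p12 = sum(h[0] == allele1 and h[1] == allele2 for h in haplotypes) / n
    d = p12 - p1 * p2                                    # disequilibrium coefficient D
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Only two haplotypes present: knowing SNP 1 tells you SNP 2 exactly.
print(r_squared(["AT", "AT", "GC", "GC"], "A", "T"))  # 1.0
# All four combinations equally common: the SNPs are uncorrelated.
print(r_squared(["AT", "GT", "GC", "AC"], "A", "T"))  # 0.0
```

The first case corresponds to the slide's statement that r² = 1 means two variants provide the same information.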

  9. What is a haplotype?
     • Combination of alleles at multiple loci that are transmitted together on the same chromosome
     [Figure labels: Ancestor, Present-day, Region in LD]

     Genetic variation is structured in blocks of high LD

     Genotype or haplotype?

                      SNP1   SNP2   SNP3   SNP4   SNP5   SNP6
       Maternal chr   A      C      T      A      C      T
       Paternal chr   G      A      C      G      A      C

     [Figure labels: Genotype, Haplotype]
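
The difference between a haplotype and a genotype in the table above can be shown in a few lines. This is only an illustrative sketch: each chromosome is stored as a string of alleles (the haplotype), and the per-SNP genotype is the unordered pair of alleles, which no longer records which chromosome each allele came from.

```python
# Haplotypes from the slide: one string of alleles per chromosome (SNP1..SNP6).
maternal = "ACTACT"
paternal = "GACGAC"

# Genotype per SNP: the two alleles without phase (sorted, so order carries no meaning).
genotypes = ["/".join(sorted(pair)) for pair in zip(maternal, paternal)]
print(genotypes)  # ['A/G', 'A/C', 'C/T', 'A/G', 'A/C', 'C/T']
```

Going from haplotypes to genotypes is trivial, as here; going back requires phasing, because many haplotype pairs are compatible with the same genotypes.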

  10. How to use haplotypes
      [Figure: six SNPs (1 to 6) across observed haplotypes, with groups of SNPs in high r². After Carlson et al. (2004) AJHG 74:106]
      • Tags: SNP 1, SNP 3, SNP 6 (3 in total)
      • Test for association: SNP 1, SNP 3, SNP 6

      How to use haplotypes
      [Figure: same six SNPs. After Carlson et al. (2004) AJHG 74:106]
      • Tags: SNP 1, SNP 3 (2 in total)
      • Test for association:
        • SNP 1 captures 1+2
        • SNP 3 captures 3+5
        • "AG" haplotype captures SNP 4+6

      How to use haplotypes: imputations
      • Correlation between variants used to 'guess' what the genotype of previously untyped variants is

      So where do these reference haplotypes come from?
      • Large sample collections genotyped/sequenced
        • HapMap
        • 1000 Genomes Project
        • Local sequencing projects
          • UK10K
          • GoNL
          • …
        • Haplotype Reference Consortium

  11. So where do these reference haplotypes come from?

      What about haplotypes themselves?
      • APOE and Alzheimer's Disease

      In summary / Take Home Messages
      • Sampling must be representative of source population
      • Additive model is primary model for effect estimation
      • Hardy-Weinberg Equilibrium mostly warns you about genotyping errors
      • Linkage Disequilibrium is the correlation between adjacent loci
      • Haplotypes are the combination of alleles on adjacent loci
      • Haplotypes are used for imputations
      • Don't forget that haplotypes themselves can also influence phenotypes

      Questions
