
Multiple Comparisons Methods in Genetic Epidemiology Studies

  1. Multiple Comparisons Methods in Genetic Epidemiology Studies
     Yi Ren Wang, MPH
     Department of Epidemiology, UCLA School of Public Health

  2. Genetic Epidemiology Today
     • Genetic association studies have become more ambitious:
       ◦ Early studies focused on one or a few candidate SNPs
       ◦ Recent studies target many SNPs and haplotypes using high-throughput platforms

  3. Genome-wide Association Study
     • Large number of genetic variations involved
       ◦ One test for each of 500,000 SNPs
       ◦ 25,000 expected to be significant at p < 0.05 by chance alone
     • To make things worse:
       ◦ Dominance (additive/dominant/recessive models)
       ◦ Epistasis (multiple combinations of SNPs)
       ◦ Multiple phenotype definitions
       ◦ Subgroup analyses
       ◦ Multiple analytic methods
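The 25,000 figure is simply 500,000 × 0.05. A minimal Python sketch (Python is used only for illustration; the study itself used SAS) makes it concrete by simulating 500,000 tests under the global null hypothesis, assuming independent tests whose p-values are uniform on [0, 1). The function name `count_chance_hits` and the fixed seed are hypothetical choices.

```python
import random


def count_chance_hits(n_tests: int, alpha: float, seed: int = 2024) -> int:
    """Count how many null p-values fall below alpha.

    Assumes no SNP has a true effect and the tests are independent,
    so each p-value is drawn uniformly from [0, 1).
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(n_tests) if rng.random() < alpha)


if __name__ == "__main__":
    m = 500_000
    hits = count_chance_hits(m, alpha=0.05)
    print(f"expected {m * 0.05:,.0f}, observed {hits:,}")
```

The observed count fluctuates around 25,000 with a standard deviation of roughly 150, so a genome-wide scan at p < 0.05 produces thousands of "hits" even when nothing is associated.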

  4. Motivating Example: DNA Double-Strand Break Repair (DNA-DSBR) Pathway and Lung & Upper Aerodigestive Tract (UADT) Cancer Study

  5. Goal of the Study
     • This study intends to cover the genetic variations across the whole DNA-DSBR pathway in order to systematically reveal how genetic polymorphisms in the double-strand break pathway alter the risks of lung cancer and UADT cancer
     • Potential gene-gene and gene-environment interactions will be explored

  6. Study Design
     • Population-based case-control study in Los Angeles
     • 611 new cases of lung cancer
     • 601 new cases of UADT cancer
     • 1040 cancer-free controls matched to cases by age (within 10-year categories) and gender

  7. Gene Selection
     • 19 genes involved in the DNA-DSBR pathway were selected for evaluation based on evidence for their role in either the homologous recombination repair (HR) or the non-homologous end joining (NHEJ) pathway

  8. SNP Selection
     • Known functional SNPs within the DNA double-strand break repair pathway were selected
     • Potentially functional SNPs, such as amino-acid-changing (nonsynonymous) SNPs (nsSNPs), were also selected
     • Selected SNPs have a minor allele frequency (MAF) greater than 5%
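The MAF criterion can be sketched as a filter on genotype counts. This minimal Python sketch assumes biallelic SNPs summarized as counts of the two homozygous genotypes and the heterozygote; the function names are hypothetical and the code is not taken from the study.

```python
def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Return the minor allele frequency from biallelic genotype counts."""
    n_individuals = n_hom_ref + n_het + n_hom_alt
    if n_individuals == 0:
        raise ValueError("no genotyped individuals")
    # Each individual carries two alleles; heterozygotes carry one of each.
    alt_freq = (2 * n_hom_alt + n_het) / (2 * n_individuals)
    return min(alt_freq, 1.0 - alt_freq)


def passes_maf_filter(counts: tuple[int, int, int], threshold: float = 0.05) -> bool:
    """Keep a SNP only if its MAF is strictly greater than the threshold (5% here)."""
    return minor_allele_frequency(*counts) > threshold


if __name__ == "__main__":
    print(passes_maf_filter((360, 480, 160)))  # MAF 0.4
    print(passes_maf_filter((990, 10, 0)))     # MAF 0.005
```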

  9. SNP Selection
     • The 189 SNPs analyzed are in or near one of the 19 DNA-DSBR genes

  10. Study Design
      • SAS 9.1 software will be used for data analysis
      • ORs and 95% confidence limits (CLs) will be computed using unconditional logistic regression
      • Potential confounding factors adjusted for:
        ◦ Lung cancer: age, gender, ethnicity, educational level and tobacco smoking
        ◦ UADT cancer: age, gender, ethnicity, educational level, tobacco smoking, alcohol drinking and diet
      • A χ² test is performed to evaluate Hardy-Weinberg equilibrium
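Two of the steps above follow standard formulas that can be sketched outside SAS: the 1-degree-of-freedom Pearson χ² test for Hardy-Weinberg equilibrium from genotype counts, and the conversion of a logistic-regression coefficient β with standard error SE into an OR with Wald limits exp(β ± z·SE). This is a minimal Python sketch under those assumptions (biallelic SNP, both alleles observed); the function names are hypothetical, and it does not reproduce the study's SAS code or its covariate adjustment.

```python
import math
from statistics import NormalDist


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Returns (statistic, p_value). Assumes both alleles are observed,
    otherwise an expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def odds_ratio_ci(beta: float, se: float, level: float = 0.95) -> tuple[float, float, float]:
    """Convert a logistic-regression coefficient into (OR, lower, upper) Wald limits."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

For example, `hwe_chi_square(360, 480, 160)` gives a statistic of essentially zero, because those counts match the HWE proportions for an allele frequency of 0.6 exactly.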

  11. Stratified Analyses
      • Lung cancer:
        ◦ Non-small cell lung carcinoma (NSCLC)
        ◦ Small cell lung carcinoma (SCLC)
      • Head and neck cancer:
        ◦ Oral cancer
        ◦ Pharyngeal cancer
        ◦ Laryngeal cancer
        ◦ Esophageal cancer

  12. Stratified and Multivariate Analyses
      • Interaction between DSBR and smoking for lung cancer
      • Interaction between DSBR and smoking for UADT cancer
      • Interaction between DSBR and alcohol drinking for UADT cancer
      • Haplotype analysis

  13. What are the Genetic Epidemiology Issues?
      • Population stratification
        ◦ Variation of SNP frequency by ethnicity
        ◦ The genomic control parameter will be calculated to assess the validity of the results
      • High-dimensional data
        ◦ Gene-environment interactions: interaction of host genetics with environment
        ◦ Gene-gene interactions: interaction of different SNPs
      • Multiple comparisons
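The genomic control parameter is commonly estimated as λ = median(observed 1-df χ² statistics) / 0.4549, where 0.4549 is the median of a χ² distribution with 1 degree of freedom; λ near 1 suggests little inflation from population stratification. The slide does not say which estimator the study used, so this is a minimal Python sketch of that common version, with hypothetical function names.

```python
from statistics import NormalDist, median

# Median of chi-square with 1 df: P(Z^2 <= c) = 0.5 means Phi(sqrt(c)) = 0.75.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2  # about 0.4549


def genomic_control_lambda(chi2_stats: list[float]) -> float:
    """Estimate the genomic inflation factor from 1-df association statistics."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_correct(chi2_stats: list[float]) -> list[float]:
    """Divide every statistic by lambda when lambda > 1; leave them unchanged otherwise."""
    lam = max(1.0, genomic_control_lambda(chi2_stats))
    return [x / lam for x in chi2_stats]
```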

  14. Multiple comparisons issue

  15. Hypothesis Testing
      • $H_0$: null hypothesis vs. $H_1$: alternative hypothesis
      • $T$: test statistic; $C$: critical value
      • If $|T| > C$, $H_0$ is rejected; otherwise $H_0$ is retained
      • Ex) $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$
        ◦ $T = (\bar{x}_1 - \bar{x}_2) / \widehat{\mathrm{SE}}_{\text{pooled}}$
        ◦ If $|T| > z_{1-\alpha/2}$, $H_0$ is rejected at significance level $\alpha$; the critical value is $C_\alpha = z_{1-\alpha/2}$
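The decision rule in the example can be sketched as a two-sample test with a pooled-variance standard error. This minimal Python sketch compares |T| with the normal critical value z_{1−α/2}, as on the slide; with small samples a t critical value would normally be used instead. The function name is hypothetical.

```python
from statistics import NormalDist, mean, variance


def pooled_two_sample_test(x1: list[float], x2: list[float],
                           alpha: float = 0.05) -> tuple[float, bool]:
    """Return (T, reject) for H0: mu1 == mu2 vs H1: mu1 != mu2.

    T uses the pooled-variance standard error; the decision compares |T|
    with the normal critical value z_{1 - alpha/2} (large-sample approximation).
    """
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    se = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    t_stat = (mean(x1) - mean(x2)) / se
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # C_alpha
    return t_stat, abs(t_stat) > critical
```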

  16. Hypothesis Testing
      Truth \ Result | H0 retained   | H0 rejected
      H0             | correct       | Type I error
      H1             | Type II error | correct
      • Type I error rate = false-positive rate ($\alpha$: significance level)
      • Type II error rate = false-negative rate
      • Power = 1 − Type II error rate
      • P-value: $p = \inf\{\alpha : H_0 \text{ is rejected at significance level } \alpha\}$
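The p-value definition and the power formula can both be made concrete for the two-sided z test. Because H0 is rejected exactly when |T| > z_{1−α/2}, the infimum of those α values is 2(1 − Φ(|T|)). For power, this sketch assumes a hypothetical standardized effect (true mean difference divided by its standard error); the function names are also hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_p_value(t_stat: float) -> float:
    """Infimum of the alpha levels at which |T| > z_{1-alpha/2} rejects H0."""
    return 2 * (1 - STD_NORMAL.cdf(abs(t_stat)))


def two_sided_power(effect: float, alpha: float = 0.05) -> float:
    """Power (1 - Type II error rate) of a two-sided z test.

    `effect` is a hypothetical standardized effect: the true mean
    difference divided by its standard error.
    """
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(effect - z) + STD_NORMAL.cdf(-effect - z)
```

With no true effect, the power equals α, the Type I error rate; a standardized effect of about 2.8 gives roughly 80% power at α = 0.05.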

  17. Issues in Multiple Comparisons
      • Q: Given k treatments, which pairs of treatments are significantly different? (simultaneous testing)
        cf) Is treatment A different from treatment B? (a single comparison)
      • Ex) k treatment means $\mu_1, \ldots, \mu_k$
        ◦ $H_{ij}: \mu_i = \mu_j$ for $i \neq j$
        ◦ $T_{ij} = (\bar{x}_i - \bar{x}_j) / \widehat{\mathrm{SE}}_{\text{pooled}}$
      • Probability of at least one Type I error when each of n independent comparisons is tested at the 0.05 level: $1 - 0.95^n$
        ◦ Inflated Type I error, e.g. $n = 10$: $1 - 0.95^{10} = 0.401263$
      • Remedy: Bonferroni method, testing each comparison at $\alpha / n$, where n is the number of comparisons

  18. Multiple Comparisons
      • Probability of at least one false association by chance among n independent tests at the 0.05 level = $1 - 0.95^n$
        ◦ n = 10: p = 40%
        ◦ n = 100: p = 99.4%
      • Our data:
        ◦ 189 SNPs, 2 cancer sites, 10 subgroup analyses
        ◦ N = 189 × (2 + 10) = 2268 tests, p > 99.99999%
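The percentages above follow from 1 − 0.95^n, which assumes independent tests with every null hypothesis true. The sketch below reproduces them and, for comparison, applies the Bonferroni level 0.05/n to each test. The decomposition 189 × (2 + 10) of N = 2268 is an assumption inferred from the slide's numbers, and the function name is hypothetical.

```python
def fwer_independent(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) among n independent true-null tests."""
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    # Assumed decomposition: SNPs x (cancer sites + subgroup analyses) = 2268.
    n_study = 189 * (2 + 10)
    for n in (10, 100, n_study):
        unadjusted = fwer_independent(n)
        bonferroni = fwer_independent(n, alpha=0.05 / n)
        print(f"n={n:5d}  unadjusted={unadjusted:.6f}  Bonferroni={bonferroni:.6f}")
```

The Bonferroni column stays just under 0.05 for every n, which is the point of dividing the significance level by the number of tests.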

  19. Type I Error Rates
      Truth \ Result | # retained | # rejected | Total
      H0             | U          | V          | m0
      H1             | T          | S          | m1
      Total          | m − R      | R          | m
      • Per-comparison error rate (PCER) = $E(V)/m$
      • Per-family error rate (PFER) = $E(V)$
      • Family-wise error rate (FWER) = $\Pr(V \geq 1)$
      • False discovery rate (FDR) = $E(Q)$, where $Q = V/R$ if $R > 0$ and $Q = 0$ if $R = 0$
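The four error rates can be estimated by simulation. This is a minimal Monte Carlo sketch under assumed, illustrative settings (90 true nulls, 10 true alternatives with z-scores centered at 3, independent two-sided tests at an unadjusted 0.05 level); none of these numbers come from the study. V, S and R follow the table on this slide, and the function name is hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simulate_error_rates(m0=90, m1=10, effect=3.0, alpha=0.05, reps=2000, seed=1):
    """Monte Carlo estimates of PCER, PFER, FWER and FDR for unadjusted tests.

    Assumptions (illustrative only): tests are independent; null z-scores are
    N(0, 1); alternative z-scores are N(effect, 1); each test is two-sided.
    """
    rng = random.Random(seed)
    m = m0 + m1
    sum_v = sum_q = fwer_hits = 0
    for _ in range(reps):
        v = s = 0  # false (V) and true (S) rejections in this replicate
        for j in range(m):
            mu = 0.0 if j < m0 else effect
            z = rng.gauss(mu, 1.0)
            p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
            if p < alpha:
                if j < m0:
                    v += 1
                else:
                    s += 1
        r = v + s
        sum_v += v
        sum_q += v / r if r > 0 else 0.0  # Q = V/R, or 0 when nothing is rejected
        fwer_hits += int(v >= 1)
    pfer = sum_v / reps
    return {"PCER": pfer / m, "PFER": pfer, "FWER": fwer_hits / reps, "FDR": sum_q / reps}


if __name__ == "__main__":
    print(simulate_error_rates())
```

Under these settings the PFER is close to 90 × 0.05 = 4.5 while the FWER is near 1, which shows how differently the criteria judge the same unadjusted analysis.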

  20. False Positives
      • In the absence of bias, three factors determine the probability that a statistically significant finding is actually a false positive:
        ◦ the magnitude of the P value
        ◦ statistical power
        ◦ the fraction of tested hypotheses that are true associations
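Wacholder et al. (2004), cited in the methods list on a later slide, combine these three factors into the false positive report probability, FPRP = α(1 − π) / [α(1 − π) + (1 − β)π], where α is the P value or significance level, 1 − β is the power, and π is the prior probability that the association is real. A minimal Python sketch of that formula, with a hypothetical function name:

```python
def fprp(alpha: float, power: float, prior: float) -> float:
    """False positive report probability.

    alpha: significance level (or observed P value) used to declare the finding
    power: probability of detecting the association if it is real
    prior: assumed prior probability that the tested association is real
    """
    false_pos = alpha * (1 - prior)
    true_pos = power * prior
    return false_pos / (false_pos + true_pos)


if __name__ == "__main__":
    print(round(fprp(0.05, 0.8, 0.01), 3))  # modest prior: most "hits" are false
```

With α = 0.05, 80% power and a prior of 0.01, the FPRP is about 0.86; lowering α or raising the prior reduces it.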

  21. Multiple Comparisons
      • There is no consensus on the optimal approach to addressing the false-positive probability of single nucleotide polymorphism (SNP) associations

  22. Methods for Multiple Comparisons
      • Ignore it
      • Adjust p-values
        ◦ Family-wise error rate (FWER): chance of any false positives
        ◦ False discovery rate (FDR): Benjamini et al. 2001
      • Use Bayesian methods
        ◦ False positive report probability (FPRP): Wacholder et al. 2004
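As a sketch of p-value adjustment for the FDR, the following implements the Benjamini-Hochberg (1995) step-up procedure. This is a swapped-in standard method: the slide cites Benjamini et al. 2001, which may describe a different variant, so treat this as an illustration rather than the study's method. A hypothesis is declared significant at FDR level q when its adjusted p-value is at most q; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """BH step-up adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```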

  23. FWER-Controlling Procedures
      • Bonferroni: adjusted p-value = min(n × p-value, 1)
      • Holm (1979)
      • Hochberg (1988)
      • Westfall & Young (1993): maxT and minP
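Bonferroni, Holm and Hochberg can each be written as adjusted p-values, which also illustrates the single-step (Bonferroni), step-down (Holm) and step-up (Hochberg) procedures named on the next slide. Below is a minimal Python sketch with hypothetical function names; Westfall & Young's permutation-based maxT and minP are omitted because they need the raw data, not just p-values. Holm controls the FWER under any dependence between tests, while Hochberg's guarantee requires independence or certain forms of positive dependence.

```python
def bonferroni(pvalues: list[float]) -> list[float]:
    """Single-step: adj p = min(m * p, 1)."""
    m = len(pvalues)
    return [min(m * p, 1.0) for p in pvalues]


def holm(pvalues: list[float]) -> list[float]:
    """Step-down: adj p_(i) = max over j <= i of min(1, (m - j + 1) p_(j))."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank is 0-based, so the multiplier is m - rank
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted


def hochberg(pvalues: list[float]) -> list[float]:
    """Step-up: adj p_(i) = min over j >= i of min(1, (m - j + 1) p_(j))."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):  # start from the largest p-value
        i = order[rank]
        running_min = min(running_min, (m - rank) * pvalues[i])
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.005]
    for name, fn in (("Bonferroni", bonferroni), ("Holm", holm), ("Hochberg", hochberg)):
        print(name, [round(x, 4) for x in fn(p)])
```

On this example the adjusted p-values never increase from Bonferroni to Holm to Hochberg, reflecting the gain in power from the step-down and step-up refinements.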

  24. Bonferroni Correction
      • For testing 500,000 SNPs, the numbers expected to be significant by chance alone are:
        ◦ 5,000 at p < 0.01
        ◦ 500 at p < 0.001
        ◦ …
        ◦ 0.05 at p < 0.0000001
      • Suggests setting the significance level to $\alpha = 10^{-7}$*
      • Bonferroni correction for m tests: set the significance level for p-values to $\alpha = 0.05 / m$ (for m = 500,000, $0.05 / 500{,}000 = 10^{-7}$)
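The counts above are m × threshold for m = 500,000 null tests, and the Bonferroni level for that m is 0.05 / 500,000 = 10^-7. A minimal Python sketch of both calculations, assuming uniform null p-values; the function names are hypothetical.

```python
def expected_false_positives(n_tests: int, threshold: float) -> float:
    """Expected count of null tests with p < threshold (p uniform under H0)."""
    return n_tests * threshold


def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test significance level that bounds the FWER at family_alpha."""
    return family_alpha / n_tests


if __name__ == "__main__":
    m = 500_000
    for t in (0.05, 0.01, 0.001, 1e-7):
        print(f"p < {t:g}: {expected_false_positives(m, t):,.2f} expected by chance")
    print(f"Bonferroni level: {bonferroni_threshold(m):g}")
```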

  25. Multiple Testing Procedures Based on P-values That Control the Family-Wise Error Rate
      • For a single hypothesis $H_1$: $p_1 = \inf\{\alpha : H_1 \text{ is rejected at significance level } \alpha\}$
        ◦ If $p_1 < \alpha$, $H_1$ is rejected; otherwise $H_1$ is retained
      • Adjusted p-values for multiple testing ($p^*$): $p_j^* = \inf\{\alpha : H_j \text{ is rejected at FWER} = \alpha\}$
        ◦ If $p_j^* < \alpha$, $H_j$ is rejected; otherwise $H_j$ is retained
      • Single-step, step-down and step-up procedures
