Bag of Naïve Bayes: biomarker selection and classification



  1. Bag of Naïve Bayes: biomarker selection and classification from Genome-Wide SNP data
     Francesco Sambo

  2. Context
     Complex disease, with hypothesized but still not understood genetic origin
     Genome-Wide Association Study (GWAS)
     • O(10^6) Single Nucleotide Polymorphisms (SNPs)
     • O(10^3) case/control individuals
     Objectives:
     1. Biomarker selection
     2. Classification

  3. Bag of Naïve Bayes (BoNB)
     • Both classification and biomarker selection
     • Based on Naïve Bayes classification
     • Main features:
       a) Ensemble of Naïve Bayes Classifiers (NBC), for robustness
       b) Novel strategy for ranking and selecting attributes for each NBC, for attribute independence
       c) Permutation-based procedure for biomarker selection, based on marginal utility
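As an illustration of the Naïve Bayes building block named on this slide, here is a minimal sketch of a categorical Naïve Bayes classifier over SNP genotypes. It is not the author's implementation: the 0/1/2 genotype coding, the use of `None` for a missing genotype, the Laplace smoothing constant `alpha`, and the function names `train_nbc` and `predict_proba` are all assumptions made for the example.

```python
import math

def train_nbc(X, y, n_values=3, alpha=1.0):
    """Fit a categorical Naive Bayes model (illustrative sketch).

    X: list of samples, each a list of genotypes in {0, 1, 2} or None (missing).
    y: list of class labels (assumed coding: 0 = control, 1 = case).
    """
    classes = sorted(set(y))
    n_attrs = len(X[0])
    prior = {c: y.count(c) / len(y) for c in classes}
    # counts[c][j][v] = number of class-c samples with genotype v at attribute j
    counts = {c: [[0] * n_values for _ in range(n_attrs)] for c in classes}
    for xi, yi in zip(X, y):
        for j, v in enumerate(xi):
            if v is not None:  # missing genotypes are simply not counted
                counts[yi][j][v] += 1
    # Laplace-smoothed conditional probabilities P(x_j = v | class)
    cond = {
        c: [[(cnt + alpha) / (sum(row) + alpha * n_values) for cnt in row]
            for row in counts[c]]
        for c in classes
    }
    return prior, cond

def predict_proba(model, x):
    """Posterior class probabilities for one sample, skipping missing attributes."""
    prior, cond = model
    log_post = {}
    for c in prior:
        lp = math.log(prior[c])
        for j, v in enumerate(x):
            if v is not None:
                lp += math.log(cond[c][j][v])
        log_post[c] = lp
    # Normalize in log space for numerical stability
    m = max(log_post.values())
    z = sum(math.exp(lp - m) for lp in log_post.values())
    return {c: math.exp(lp - m) / z for c, lp in log_post.items()}
```

Because each genotype value gets its own conditional probability, the sketch does not impose an additive, dominant, or recessive effect model, and a missing genotype just drops out of the product.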

  4. Bagging (Bootstrap AGGregatING)
     [Diagram: Bootstrap Ensemble of NBCs. The GWAS data D (subjects × SNPs) is resampled into D_1, ..., D_B; each D_b trains NBC_b, which yields Prediction_b; the B predictions are combined into a weighted prediction.]
     • B bootstrap replicates, sampled with replacement from D
     • B Naïve Bayes Classifiers, each trained on a D_b
     • Outcome: average of the B predictions
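The bagging step on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: `bootstrap_indices` and `bagged_predict` are hypothetical helper names, `predict` is a caller-supplied function returning a score in [0, 1], and the optional `weights` argument is only a guess at what the "weighted prediction" in the diagram means (with no weights, the result is the plain average described in the bullets).

```python
import random

def bootstrap_indices(n, rng):
    """Sample n subject indices with replacement.

    Returns (in_bag, oob): the bootstrap replicate and the out-of-bag
    indices that were never drawn.
    """
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    oob = [i for i in range(n) if i not in chosen]
    return in_bag, oob

def bagged_predict(models, predict, x, weights=None):
    """Weighted average of the scores of all ensemble members for sample x.

    `predict(model, x)` is a hypothetical callable; equal weights give
    the simple average of the B predictions.
    """
    if weights is None:
        weights = [1.0] * len(models)
    total = sum(weights)
    return sum(w * predict(m, x) for m, w in zip(models, weights)) / total

# Usage: draw B replicates of a data set with 50 subjects
rng = random.Random(0)
replicates = [bootstrap_indices(50, rng) for _ in range(10)]
```

Each replicate leaves some subjects out of the bag; those out-of-bag subjects act as a built-in validation set for the classifier trained on that replicate.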

  5. NBC attribute selection (SNPs)
     [Diagram: Bootstrap Ensemble of NBCs, with an Attribute Selection step for each NBC_b and out-of-bag sets oob_1, ..., oob_B]
     • Ranking: training error when the SNP is used as the single attribute
     • Selection: top-ranked, uncorrelated SNPs (r^2 < 0.1 if distance < 1 Mb)
     • The number of selected attributes is increased as long as classification accuracy increases on the Out-Of-Bag (OOB) sets
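The selection rule on this slide can be sketched as a greedy filter over an existing ranking. This is an illustration, not the author's code: the ranking is assumed to be precomputed (best SNP first), r^2 is taken to be the squared Pearson correlation of genotype vectors without missing values, and `select_uncorrelated`, `grow_attribute_set`, and the `oob_accuracy` callable are hypothetical names. Stopping at the first non-improving SNP is also an assumption about how "as long as accuracy increases" is applied.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant vector carries no correlation information
    return cov * cov / (va * vb)

def select_uncorrelated(ranked, genotypes, chrom, pos, k,
                        r2_max=0.1, window=1_000_000):
    """Walk the ranking best-first and keep a SNP unless it is within
    `window` base pairs of an already kept SNP on the same chromosome
    and has r^2 >= r2_max with it. Stops after k SNPs."""
    kept = []
    for s in ranked:
        ok = True
        for t in kept:
            close = chrom[s] == chrom[t] and abs(pos[s] - pos[t]) < window
            if close and r_squared(genotypes[s], genotypes[t]) >= r2_max:
                ok = False
                break
        if ok:
            kept.append(s)
            if len(kept) == k:
                break
    return kept

def grow_attribute_set(candidates, oob_accuracy):
    """Add candidates one at a time while OOB accuracy keeps increasing.

    `oob_accuracy(attrs)` is a hypothetical callable that trains on the
    bootstrap replicate and scores on its out-of-bag subjects.
    """
    selected, best = [], float("-inf")
    for s in candidates:
        acc = oob_accuracy(selected + [s])
        if acc <= best:
            break
        selected.append(s)
        best = acc
    return selected
```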

  6. Biomarker Selection
     [Diagram: Bootstrap Ensemble of NBCs, with Attribute Selection, out-of-bag sets oob_1, ..., oob_B, and a Biomarker Selection step fed by the OOB sets]
     • Random permutation of the genotype of NBC attributes in the OOB sets
     • Measure the decrease in accuracy on the OOB sets
     • Wilcoxon signed-rank test for significance
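The three bullets above can be sketched in plain Python. The slide does not say how the accuracy decreases are paired for the test, so this example assumes one decrease per bootstrap replicate for a given SNP, tested one-sidedly against zero. The function names are hypothetical, `predict` is a caller-supplied classifier, and the p-value uses the normal approximation without tie or continuity correction, which is only reasonable for a fairly large number of replicates.

```python
import math
import random

def permuted_accuracy_drop(predict, X_oob, y_oob, j, rng):
    """OOB accuracy minus OOB accuracy after shuffling attribute j.

    X_oob is a list of lists; `predict(x)` returns a class label.
    """
    def accuracy(X):
        return sum(predict(x) == t for x, t in zip(X, y_oob)) / len(y_oob)
    col = [x[j] for x in X_oob]
    rng.shuffle(col)  # break the link between attribute j and the outcome
    X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X_oob, col)]
    return accuracy(X_oob) - accuracy(X_perm)

def wilcoxon_signed_rank(diffs):
    """One-sided Wilcoxon signed-rank test that the median of diffs is > 0.

    Returns (W_plus, p_value), with the p-value from the normal approximation.
    """
    d = [x for x in diffs if x != 0]  # zero differences are discarded
    n = len(d)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average rank shared by a group of ties
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return w_plus, p
```

A SNP whose permutation consistently lowers OOB accuracy across replicates gets a small p-value and would be reported as a biomarker; a SNP whose permutation makes no systematic difference would not.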

  7. Results
     WTCCC case/control study on Type 1 Diabetes (T1D)
     • 458376 SNPs, 1963 T1D cases, 2938 controls
     [Figure: Predictive accuracy, measured by Matthews Correlation Coefficient]
     Biomarker Selection:

     | rs ID     | chr | gene       |
     |-----------|-----|------------|
     | rs6679677 | 1   | RSBN1      |
     | rs9273363 | 6   | MHC region |
     | rs3101942 | 6   | MHC region |
     | rs492899  | 6   | MHC region |
     | rs6936863 | 6   | MHC region |
     | rs805301  | 6   | MHC region |
     | rs9275418 | 6   | MHC region |
     | rs2856688 | 6   | MHC region |
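For reference, the Matthews Correlation Coefficient used as the accuracy measure on this slide can be computed from the confusion matrix as below. This is a minimal sketch; the 0/1 label coding (1 = case) and the convention of returning 0 when the denominator vanishes are assumptions of the example.

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0 when a row or column of the table is empty
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which makes it a reasonable choice when cases and controls are not balanced, as in this study.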

  8. Conclusions
     • BoNB is effective for both classification and biomarker selection
     • Advantages of bagging:
       - Higher generalization ability
       - Sound and principled procedure for biomarker selection
     • Advantages of Naïve Bayes:
       - No pre-specified model of genetic effect
       - Seamless handling of missing values
