4. Applications in Computational Biology, Karsten Borgwardt


  1. 4. Applications in Computational Biology. Karsten Borgwardt, Department Biosysteme. Data Mining 2 Course, Basel, Spring Semester 2016.

  2. 4.1 Co-training for Phenotype Prediction. Based on: Damian Roqueiro, Menno Witteveen, Verneri Anttila, Gisela Terwindt, Arn van den Maagdenberg, Karsten Borgwardt. In silico phenotyping via co-training for improved phenotype prediction from genotype. ISMB 2015, Bioinformatics (2015) 31 (12): i303-i310.

  6. Goal
     - Construction of a genotype classifier h
     - Important implications for disease diagnosis and therapy
     - Yet h relies on a training dataset with labeled examples
     - Genotype data are increasingly available
     - Disease phenotypes are not available for enough of the genotyped samples
     - Can we boost the performance of a classifier when few labeled examples are available? → Use co-training

  19. Co-training (Blum & Mitchell, 1998)
      - Dataset D with two kinds of data: labeled (L) and unlabeled (U)
      - X: the features
      - X is the union of two feature subsets (views) X_1 and X_2
      - A labeled object: x = ((x_1, x_2), y)
      - Learn classifiers h_1 and h_2, one on each view of L
      - Iteratively:
        - use h_1 to label instances in U and add them to L
        - use h_2 to label instances in U and add them to L
        - repeat until U = ∅ or another stopping condition is met
      - Two requirements:
        - x_1 and x_2 should be conditionally independent of each other given y
        - each of X_1 and X_2 alone should be sufficient to train h_1 or h_2, respectively, to classify data points in D
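
The co-training loop on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from the paper. The nearest-centroid base learner, the names `CentroidClassifier`, `co_train`, `per_round` and `max_rounds`, and the synthetic two-view data are all hypothetical. The slide only says that each classifier labels instances in U; picking the most confident instances per round is an assumption made here.

```python
import random


class CentroidClassifier:
    """Tiny nearest-centroid classifier; a stand-in for any base learner."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_with_confidence(self, x):
        # Euclidean distance from x to every class centroid.
        dists = {
            label: sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
            for label, c in self.centroids.items()
        }
        label = min(dists, key=dists.get)
        # Confidence = gap between the farthest and the nearest centroid.
        return label, max(dists.values()) - dists[label]


def co_train(labeled, unlabeled, per_round=2, max_rounds=50):
    """labeled: list of ((x_1, x_2), y); unlabeled: list of (x_1, x_2)."""
    L, U = list(labeled), list(unlabeled)
    h1, h2 = CentroidClassifier(), CentroidClassifier()
    rounds = 0
    while True:
        # (Re)train one classifier per view on the current labeled set.
        ys = [y for _, y in L]
        h1.fit([x[0] for x, _ in L], ys)
        h2.fit([x[1] for x, _ in L], ys)
        if not U or rounds == max_rounds:
            return h1, h2, L
        # Each classifier labels its most confident instances from U,
        # using only its own view, and moves them into L.
        for h, view in ((h1, 0), (h2, 1)):
            ranked = sorted(
                U, key=lambda x: h.predict_with_confidence(x[view])[1], reverse=True
            )
            for x in ranked[:per_round]:
                L.append((x, h.predict_with_confidence(x[view])[0]))
                U.remove(x)
        rounds += 1


# Synthetic toy data (assumption): two one-dimensional views that are
# sampled independently given the class label.
random.seed(0)


def sample(label):
    center = 4.0 * label
    return ((random.gauss(center, 0.5),), (random.gauss(center, 0.5),))


truth = {}
for label in (0, 1):
    for _ in range(22):
        truth[sample(label)] = label
points = list(truth)

labeled = [(x, truth[x]) for x in points[:2] + points[22:24]]  # 2 per class
unlabeled = points[2:22] + points[24:]                         # 40 unlabeled

h1, h2, enlarged = co_train(labeled, unlabeled)
accuracy = sum(truth[x] == y for x, y in enlarged) / len(enlarged)
print(len(enlarged), accuracy)
```

The toy views are drawn independently given the label, which mirrors the first requirement on the slide. Each view is also well separated on its own, which mirrors the second. With real data neither property is guaranteed, and wrongly labeled instances added to L can reinforce themselves in later rounds.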

  20. Proposed approach
      - Apply co-training to a migraine dataset
      - Dutch cohorts, 1,938 patients
      - Two disease phenotypes:
        - migraine with aura (820)
        - migraine without aura (1,118)
      - Data available for each patient:
        - disease phenotype (aura vs. no aura)
        - clinical covariates (e.g. pulsating quality?)
        - genotype data: single nucleotide polymorphisms (SNPs)

  22. Assumption: implicit price tag of data
      - Disease phenotype (diagnosis)
      - Clinical covariates (results of tests)
      - Genotype data (DNA sequencing)
      - Source: http://www.flaticon.com/authors/freepik

  23. Decaying cost of sequencing/genotyping
      - Source: National Human Genome Research Institute, http://www.genome.gov/
      - Cost of genotyping (array): ∼$110 per sample (HumanOmniExpress-24 BeadChips, 713,014 markers)
