

SLIDE 1

4. Applications in Computational Biology

Karsten Borgwardt Department Biosysteme Data Mining 2 Course, Basel Spring Semester 2016 231 / 253

SLIDE 2

4.1 Co-training for Phenotype Prediction

based on: Damian Roqueiro, Menno Witteveen, Verneri Anttila, Gisela Terwindt, Arn van den Maagdenberg, Karsten Borgwardt. In silico phenotyping via co-training for improved phenotype prediction from genotype. ISMB 2015, Bioinformatics (2015) 31 (12): i303-i310.


SLIDES 3–6

Goal

Construction of a genotype classifier h

Important implications for disease diagnosis and therapy
Yet, h relies on a training dataset with labeled examples
Increasingly large availability of genotype data
Not enough disease phenotypes for the genotype samples

Can we boost the performance of a classifier when few labeled examples are available? → Use co-training

SLIDES 7–19

Co-training

Blum & Mitchell, 1998

Dataset D with

two classes of data: labeled (L) & unlabeled (U)
X features; X is the union of subsets of features X1 and X2
A labeled object x = ((x1, x2), y)

Learn classifiers h1 and h2 on each view of L

Iteratively

use h1 to label instances in U and add them to L
use h2 to label instances in U and add them to L
repeat until U = ∅ ... or another stopping condition

Two requirements

x1 and x2 should be conditionally independent of each other given y
X1 or X2 alone is sufficient to train h1 or h2 to classify data points in D
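The iterative loop above can be sketched in a few lines of stdlib Python. This is an illustrative sketch, not the paper's implementation: `ThresholdClassifier` is a toy one-feature stand-in for h1/h2 (any pair of fit/predict learners works), and each round moves one instance per view from U to L.

```python
class ThresholdClassifier:
    """Toy one-feature classifier standing in for h1/h2: predicts 1
    when the feature exceeds the midpoint between the class means."""
    def fit(self, xs, ys):
        pos = [x for x, y in zip(xs, ys) if y == 1]
        neg = [x for x, y in zip(xs, ys) if y == 0]
        self.t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def predict(self, x):
        return 1 if x > self.t else 0


def co_train(L, U, max_rounds=100):
    """L: labeled objects ((x1, x2), y); U: unlabeled objects (x1, x2).
    Each round, retrain h1 and h2 on their view of L, let each label
    one instance from U, and add the newly labeled pairs to L."""
    for _ in range(max_rounds):
        if not U:      # repeat until U = ∅ ...
            break      # ... or another stopping condition
        ys = [y for (_, y) in L]
        h1 = ThresholdClassifier().fit([x[0] for (x, _) in L], ys)
        h2 = ThresholdClassifier().fit([x[1] for (x, _) in L], ys)
        x = U.pop(0)
        L.append((x, h1.predict(x[0])))      # h1 labels via view X1
        if U:
            x = U.pop(0)
            L.append((x, h2.predict(x[1])))  # h2 labels via view X2
    return L
```

With two well-separated labeled points and two unlabeled ones, both classifiers recover the correct labels and U empties after one round.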

SLIDES 20–21

Proposed approach

Apply co-training to a migraine dataset

Dutch cohorts, 1,938 patients

Two disease phenotypes:
migraine with aura (820)
migraine without aura (1,118)

Data available for each patient:
disease phenotype (aura vs. no aura)
clinical covariates (e.g. pulsating quality?)
genotype data: single nucleotide polymorphisms (SNPs)

SLIDE 22

Assumption: implicit price tag of data

Disease phenotype (diagnosis)
Clinical covariates (results of tests)
Genotype data (DNA sequencing)

Icon source: http://www.flaticon.com/authors/freepik


SLIDE 23

Declining cost of sequencing/genotyping

Source: National Human Genome Research Institute http://www.genome.gov/

Cost of genotyping (array): ∼$110 per sample
HumanOmniExpress-24 BeadChips, 713,014 markers


SLIDE 24

Clinical covariates

International Classification of Headache Disorders (ICHD) guidelines

Traits recorded as binary values:
attack length
pulsation
unilaterality
aggravation by physical exercise
vomiting
photophobia
and others...

age of onset

Source: The ICHD, 3rd edition (beta version). Cephalalgia 2013

SLIDES 25–26

Genotype data

Patients genotyped with Illumina arrays covering ∼500,000 SNPs

For each patient and SNP, the two alleles were coded as:
0 if the patient is homozygous for the major allele
1 if heterozygous
2 if homozygous for the minor allele

No population stratification present
SNPs were pre-processed and filtered
After filtering → 463,825 SNPs
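The 0/1/2 coding is simply a count of minor alleles; a minimal sketch (the function name and the allele-pair representation are illustrative):

```python
def encode_genotype(alleles, major):
    """Code one SNP for one patient as the number of minor alleles:
    0 = homozygous major, 1 = heterozygous, 2 = homozygous minor."""
    return sum(1 for allele in alleles if allele != major)
```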

SLIDE 27

In silico phenotyping: methodology

Data preparation

Randomly partition the data into 3 sets

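A sketch of the random three-way split; the 10/70/20 default matches the proportions used in the experiments later in the deck, and the function name and rounding are illustrative:

```python
import random

def partition(samples, fracs=(0.10, 0.70, 0.20), seed=0):
    """Randomly partition samples into sets I, II, III."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n1 = round(fracs[0] * len(shuffled))   # size of set I
    n2 = round(fracs[1] * len(shuffled))   # size of set II
    return shuffled[:n1], shuffled[n1:n1 + n2], shuffled[n1 + n2:]
```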

SLIDE 28

In silico phenotyping: methodology

Assumptions:

set II → no labels
set III → no clinical covariates


SLIDE 29

In silico phenotyping: methodology

Step 1

train classifier hc on clinical covariates in set I
impute the phenotype in set II
hc: bagging predictors, Breiman, L. (1996)


SLIDE 30

In silico phenotyping: methodology

Step 2

train classifier hg on true + imputed labels in sets I + II
predict the phenotype in set III
hg: random forest, Breiman, L. (2001)


SLIDE 31

Construction of hc (clinical covariates)

Used bagging predictors, Breiman, L. (1996)

5,000 predictors: logistic regressors
√c of all c features were assigned to each predictor
the predicted disease phenotype was the average of all predictors

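The bagging scheme can be sketched with the standard library alone. Two assumptions to note: a toy nearest-centroid learner stands in for the 5,000 logistic regressors, and the bootstrap is drawn per class so every sample contains both labels (a simplification, not from the slide):

```python
import math
import random

class NearestCentroid:
    """Toy base learner standing in for a logistic regressor."""
    def fit(self, X, y):
        def centroid(label):
            rows = [x for x, t in zip(X, y) if t == label]
            return [sum(col) / len(rows) for col in zip(*rows)]
        self.c0, self.c1 = centroid(0), centroid(1)
        return self

    def predict(self, x):
        d0 = sum((a - b) ** 2 for a, b in zip(x, self.c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, self.c1))
        return 1 if d1 < d0 else 0


def bagging_predict(X, y, x_new, n_predictors=25, seed=0):
    """Bagging after Breiman (1996): each predictor is trained on a
    bootstrap sample restricted to sqrt(c) random features, and the
    predicted phenotype is the average over all predictors."""
    rng = random.Random(seed)
    c = len(X[0])
    m = max(1, math.isqrt(c))            # sqrt(c) features per predictor
    idx0 = [i for i, t in enumerate(y) if t == 0]
    idx1 = [i for i, t in enumerate(y) if t == 1]
    votes = []
    for _ in range(n_predictors):
        feats = rng.sample(range(c), m)
        # bootstrap within each class so both classes are present
        idx = [rng.choice(idx0) for _ in idx0] + [rng.choice(idx1) for _ in idx1]
        clf = NearestCentroid().fit(
            [[X[i][f] for f in feats] for i in idx], [y[i] for i in idx])
        votes.append(clf.predict([x_new[f] for f in feats]))
    return sum(votes) / len(votes)       # average of all predictors
```

On well-separated data the averaged score lands close to 0 or 1, which is the soft phenotype that later gets imputed into set II.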

SLIDE 32

Construction of hg (genotype data)

Used random forest, Breiman, L. (2001)

10,000 trees
√k of the top k features were used at each node
imputed labels in II had to be binarized
k = 2,000

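Before hg can be trained, the imputed phenotypes (averages in [0, 1] from hc) must be turned into hard labels; a one-line sketch, where the 0.5 cutoff is an assumption the slide does not state:

```python
def binarize(imputed_scores, cutoff=0.5):
    """Turn soft imputed phenotypes in [0, 1] into hard 0/1 labels.
    The 0.5 cutoff is an assumed default, not taken from the paper."""
    return [1 if s >= cutoff else 0 for s in imputed_scores]
```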

SLIDE 33

Univariate feature selection for hg

To reduce the dimensionality of the genotype data, each SNPi was ranked according to the Pearson correlation p-value between:

the genotype values of SNPi
the true + imputed disease phenotypes

Selected the top k = 2,000

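For a fixed sample size, ranking SNPs by the Pearson correlation p-value orders them exactly as ranking by |r|, so a sketch can skip the p-value itself (function names and data layout are illustrative):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def top_k_snps(genotypes, phenotypes, k):
    """genotypes: dict of SNP id -> 0/1/2 codes across patients;
    phenotypes: true + imputed 0/1 labels. Keep the k SNPs whose
    genotype column is most correlated (largest |r|) with the
    phenotype, which matches ranking by smallest p-value."""
    ranked = sorted(genotypes,
                    key=lambda s: abs(pearson_r(genotypes[s], phenotypes)),
                    reverse=True)
    return ranked[:k]
```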

SLIDE 34

Are classifiers hc and hg correlated?

Created 100 random partitions

training set = 80%
test set = 20%

Correlation coefficient: µ = 0.082, σ = 0.047
Manhattan distance: µ = 172.2, σ = 8.8


SLIDES 35–36

Classification performance: range

Lower bound: genotype classifier hg was trained only on I and evaluated on III
Upper bound: hg was trained on I + II
Univariate feature selection only on I, training on I (true) + II (imputed)

SLIDE 37

Classification performance

Created 100 random partitions of the data:
set I = 10%, set II = 70%, set III = 20%

AUC scores
Metric                                     µ      σ
Lower bound, training only on I            0.574  0.034
Univ. feat. sel. on I, training on I+II    0.608  0.035
In silico phenotyping (co-training)        0.646  0.029
Upper bound, I+II with true labels         0.689  0.025


SLIDE 38

Domain knowledge in univariate feature selection

Obtained 1,000 SNPs from a migraine meta-analysis, Anttila, V. et al. Nat Genet (2013):
23,285 migraine patients
81,453 migraine-free control individuals

Only 168 SNPs overlapped with our genotype arrays

Comparison of the 168 SNPs vs. the top k in univariate feature selection

AUC scores
Metric                                     µ       σ
Using domain knowledge (168 SNPs)          0.6460  0.0293
Univariate feature selection (2,000 SNPs)  0.6457  0.0289


SLIDE 39

Selecting the top k SNPs

Used k = 2,000; examined the effect of this choice with two analyses

Sets: I = 10%, II = 70%, III = 20% (100 random partitions)

Method 1: build hg with varying values of k; for each k, compute the mean AUC

AUC scores
Number of top k SNPs  µ      σ
200                   0.624  0.031
400                   0.631  0.032
800                   0.638  0.031
1,600                 0.644  0.028
2,000                 0.646  0.029
3,200                 0.648  0.028
6,400                 0.651  0.030
12,800                0.650  0.027
25,600                0.648  0.029
51,200                0.643  0.026

Method 2: internal cross-validation on I + II

AUC scores
Number of top k SNPs  µ      σ
200                   0.541  0.053
400                   0.549  0.054
800                   0.558  0.056
1,600                 0.565  0.062
2,000                 0.569  0.060
3,200                 0.577  0.055
6,400                 0.590  0.057
12,800                0.592  0.054
25,600                0.595  0.058
51,200                0.593  0.061
102,400               0.590  0.054
204,800               0.581  0.058

SLIDES 40–41

Varying the size of one set

Size of set II = 40→70% (fixed: I = 10%, III = 20%)

AUC scores
Number of samples in II     µ      σ
774    (40% of the data)    0.597  0.038
969    (50%)                0.604  0.035
1,162  (60%)                0.611  0.035
1,356  (70%)                0.646  0.029

Size of set I = 10→1% (fixed: II = 70%, III = 20%)

AUC scores
Number of samples in I      µ      σ
193    (10% of the data)    0.646  0.029
96     (5%)                 0.619  0.034
19     (1%)                 0.605  0.035

SLIDE 42

Varying the sizes of I and II simultaneously

Size of sets I & II = 40%; III = 20%

100 random partitions


SLIDE 43

Varying the sizes of I and II simultaneously (contd.)

For each cell in the grid, compute ∆AUC between:

in silico phenotyping (co-training)
the lower bound (training only on I)


SLIDES 44–46

Summary

Presented an approach to in silico phenotyping

impute the disease phenotype using genotype and clinical covariates
augment datasets → train models for phenotype prediction

Factors that affect co-training

original training dataset → small: allows improvement by augmenting
co-training dataset → large: adding few samples will not change the classification performance
clinical covariates → not redundant to the genetic data: predictive for the disease phenotype

What the future may hold

large sequencing projects create more genotypic data
biobanks collect more samples
health record DBs collect more clinical information on patients
