The Analysis of Biomedical Data – Caveats and Challenges (PowerPoint PPT Presentation)



SLIDE 1

The Analysis of Biomedical Data - Caveats and Challenges

Ray L. Somorjai

Head, Biomedical Informatics Group

Institute for Biodiagnostics National Research Council Canada Winnipeg, MB Canada

SLIDE 2

The Prime Caveat:

“There Are No Panaceas in Data Analysis”

  • P. J. Huber, Annals of Statistics (1985)

SLIDE 3

Two Goals of Biomedical Data Classification:

  • 1. Develop Robust Classifiers
  • Capable of Reliably Classifying Unknown Patterns
  • 2. Identify Fewest Maximally Discriminatory Features

(genes, proteins, chemical compounds)

  • Find Biologically Relevant, Interpretable Features

Not All Classifiers Satisfy Both Requirements

SLIDE 4

The Two Realities of Biomedical Data

{Microarrays (Genomics), Mass Spectra (Proteomics) Magnetic Resonance, Raman & Infrared Spectra}:

The Clinical Reality:

Few Samples, K = O(10) – O(100)

The “Acquisitional” Reality:

Many Features (genes, M/Z values, spectral data points), N = O(1 000) – O(10 000)

SLIDE 5

Contrast

Classical Statistics –

The Art of Asymptotics: N → ∞

with

Modern “Statistics“ –

Methods Applicable when N → 0 ?

SLIDE 6

Two Realities ⇒ Two Curses:

The Curse of Dimensionality:

Penalty for Too Many Features

The Curse of Dataset Sparsity:

Penalty for Too Few Samples

SLIDE 7

The Curse of Dimensionality


Penalty for Too Many Features:

A Robust Classifier Needs a Sample to Feature Ratio (SFR) ≥ 10

For Biomedical Data SFR ~ 1/20 – 1/200

SLIDE 8

The Curse of Dataset Sparsity:

If Too Few Samples, Trivial to Classify Them Perfectly

More Samples, More Realistic Assessment of Intrinsic Class Overlap (Bayes Error)
SLIDE 9

Consequences of the Curses:

  • 1. Curse of Dimensionality (SFR low)
  • Danger of Overfitting
  • Conclusions Are Suspect
  • No Discriminatory Features Identified
  • 2. Curse of Dataset Sparsity

Insidious:

  • Practically Anything Seems to Work!
  • Several Equally Good Solutions:

Uniqueness Problematic - Classifier Robustness Is Suspect

SLIDE 10

Steps of Classifier Development:

  • 1. Partition Dataset into Training & Validation Sets
  • 2. Create Optimal Classifier Using Training Set Only
  • Important to Use External Crossvalidation
  • 3. Whenever Possible or Feasible, Validate Classifier with an Independent Validation Set Not Involved in Developing the Classifier
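These three steps can be sketched in a few lines of numpy. Everything below is illustrative: the toy Gaussian data, the nearest-centroid stand-in classifier, and the 5-fold split are assumptions, not the talk's actual LDA-based pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-class data (illustrative only, not the lecture's datasets)
X = np.vstack([rng.normal(0, 1, (40, 5)), rng.normal(1.5, 1, (40, 5))])
y = np.array([0] * 40 + [1] * 40)

# Step 1: partition the dataset into training and validation sets
idx = rng.permutation(len(y))
train, valid = idx[:60], idx[60:]

def nearest_centroid_fit(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(model, X):
    classes = sorted(model)
    d = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes], axis=1)
    return np.array(classes)[d.argmin(axis=1)]

# Step 2: build/tune the classifier with external cross-validation,
# touching the TRAINING set only
folds = np.array_split(rng.permutation(train), 5)
cv_acc = []
for k in range(5):
    test_f = folds[k]
    train_f = np.concatenate([folds[j] for j in range(5) if j != k])
    m = nearest_centroid_fit(X[train_f], y[train_f])
    cv_acc.append((nearest_centroid_predict(m, X[test_f]) == y[test_f]).mean())

# Step 3: validate the final classifier on the untouched validation set
final = nearest_centroid_fit(X[train], y[train])
val_acc = (nearest_centroid_predict(final, X[valid]) == y[valid]).mean()
print(round(float(np.mean(cv_acc)), 2), round(float(val_acc), 2))
```

The key point of step 2 is that every cross-validation fold is carved out of the training set alone; the validation samples never influence classifier construction.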

SLIDE 11

A Classifier is Claimed Robust if

Training and Validation Set Results Are of

Comparable Accuracy

Fallacious when Curses Are “Active”!

SLIDE 12

Developed Statistical Classification Strategy (SCS) – Divide and Conquer:

Four-Stage, Multivariate, Robust

  • 1. Visualization of High-Dimensional Data
  • 2. Preprocessing/Feature Extraction (GA_ORS)
  • 3. Robust Classifier (“Bootstrap” Aggregation)
  • 4. Classifier Fusion (e.g. Stacked Generalization)

Very Successful!
SLIDE 13

Stage 1 – Visualization (later)
Stage 2 – Preprocessing

a) Normalization (alignment, common area)
b) Transformation (derivatives, rank ordering)
c) “Feature Space Reduction”:

Critical ⇒ Optimal Feature Selector
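The Stage 2 operations a) and b) can be illustrated in a few lines of numpy on synthetic "spectra" (the alignment step is omitted; the data and window sizes are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
spectra = rng.random((10, 200))  # 10 synthetic "spectra", 200 data points each

# a) Normalization: scale each spectrum to unit total area
area_norm = spectra / spectra.sum(axis=1, keepdims=True)

# b) Transformation, option 1: first derivative (finite differences)
deriv = np.diff(area_norm, axis=1)

# b) Transformation, option 2: rank ordering
# (replace intensities by their within-spectrum ranks; robust to scaling)
ranks = area_norm.argsort(axis=1).argsort(axis=1)

print(area_norm.shape, deriv.shape, ranks.shape)
```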

SLIDE 14

For Biomedical Spectra

Optimal Feature Selector ⇒ Optimal Region Selector (ORS_GA)

Characteristics of ORS_GA:

a) Retains Spectral Identity
b) Feature is Some Function of Adjacent Data Points (e.g. Average or Variance)
c) Genetic Algorithm (GA)-Driven

⇒ M < K << N Attributes
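A sketch of the region-averaging idea behind ORS: each feature is a function (here the mean) of adjacent data points in a selected spectral window. The windows below are hypothetical stand-ins for what a GA search might return; the GA itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(2)
spectra = rng.random((5, 1000))  # N = 1000 data points per spectrum

# Hypothetical regions a GA might have selected: (start, end) index windows
regions = [(100, 120), (340, 360), (700, 750)]

def region_features(S, regions, func=np.mean):
    """Each feature is some function (here: the mean) of adjacent data points."""
    return np.stack([func(S[:, a:b], axis=1) for a, b in regions], axis=1)

feats = region_features(spectra, regions)  # M = 3 features << N = 1000 points
print(feats.shape)
```

Because each region spans contiguous data points, the reduced features keep their spectral identity: feature j corresponds to a named wavelength/M-Z window, not an abstract linear combination.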

SLIDE 15

Stage 3- Robust Classifier Development

How Do We “Robustify”?

  • 1. Already Completed: Feature Selection [ORS] to Satisfy Sample/Feature Ratio K / N ~ 5 – 10

  • 2. “Bootstrap-Inspired Classifier Aggregation”:

SLIDE 16

How Do We Create a Robust Classifier?

a) Training set: Pick randomly ~half of the samples
b) Using these, create optimum classifier (e.g., LDA/LOO)
c) Test set: the other half is used for validation
d) Repeat a) – c) B times (restarting with the full dataset), B ~ 5000 – 10000 (B sets of LDA coefficients)
e) Create a single classifier as the Q_test-weighted average of these B sets

Q_m^test = (C_m^test)^(1/2) · κ_m^test ;  0 ≤ κ_m, C_m ≤ 1

(κ_m^test is the chance-corrected accuracy, C_m^test the crispness of the m-th of the B test sets)

Classifier Outcome is a Class Probability
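A compressed numpy sketch of steps a)–e): a minimal Fisher-LDA implementation, random half-splits, and an aggregate weighted by test-half accuracy. The accuracy weight is a simplified stand-in for the talk's quality Q (which combines chance-corrected accuracy κ and crispness C); the data are toy Gaussians and B is reduced for speed.

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(2, 1, (30, 4))])
y = np.array([0] * 30 + [1] * 30)

def fisher_lda(X, y):
    """Fisher discriminant direction w and bias b for two classes."""
    m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
    Sw = np.cov(X[y == 0].T) + np.cov(X[y == 1].T)
    w = np.linalg.solve(Sw + 1e-6 * np.eye(X.shape[1]), m1 - m0)
    b = -w @ (m0 + m1) / 2
    return w, b

B = 200  # the talk uses B ~ 5000-10000; smaller here for speed
W = np.zeros(X.shape[1]); bias = 0.0; Qsum = 0.0
for _ in range(B):
    idx = rng.permutation(len(y))
    tr, te = idx[:30], idx[30:]          # a) random ~half for training
    w, b = fisher_lda(X[tr], y[tr])      # b) build the classifier on it
    acc = (((X[te] @ w + b) > 0).astype(int) == y[te]).mean()  # c) validate
    q = float(acc)                       # simplified quality weight
    W += q * w; bias += q * b; Qsum += q

W /= Qsum; bias /= Qsum                  # e) single Q-weighted aggregate
agg_acc = (((X @ W + bias) > 0).astype(int) == y).mean()
print(round(float(agg_acc), 2))
```

Averaging many half-split classifiers damps the variance that any single split would suffer in the sparse-sample regime.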

SLIDE 17

Stage 4 - Classifier Aggregation / Fusion

Activated when the best single, N-attribute, C-class classifier is inaccurate and/or fuzzy:
a) Create L “independent” classifiers;
b) Treat their C-class probability outputs as L(C−1) features for a new classifier to be trained

Stacked Generalizer (Wolpert)
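A toy sketch of stacked generalization with L = 2 base classifiers: crude distance-based class-probability outputs on disjoint feature subsets become the L(C−1) = 2 meta-features of a simple second-level classifier. All data, feature subsets, and classifiers here are illustrative stand-ins, not the talk's actual components.

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (40, 6)), rng.normal(1.5, 1, (40, 6))])
y = np.array([0] * 40 + [1] * 40)

def base_prob(Xtr, ytr, Xte, cols):
    """Crude base classifier on a feature subset: class-1 probability
    from distances to the two class centroids (softmax of -distance)."""
    m0 = Xtr[ytr == 0][:, cols].mean(axis=0)
    m1 = Xtr[ytr == 1][:, cols].mean(axis=0)
    d0 = np.linalg.norm(Xte[:, cols] - m0, axis=1)
    d1 = np.linalg.norm(Xte[:, cols] - m1, axis=1)
    return np.exp(-d1) / (np.exp(-d0) + np.exp(-d1))

idx = rng.permutation(len(y))
tr, te = idx[:50], idx[50:]
subsets = [[0, 1, 2], [3, 4, 5]]   # L = 2 "independent" base classifiers

# Their probability outputs become the meta-features of a new classifier
meta_tr = np.stack([base_prob(X[tr], y[tr], X[tr], s) for s in subsets], axis=1)
meta_te = np.stack([base_prob(X[tr], y[tr], X[te], s) for s in subsets], axis=1)

# Second-level classifier trained on the meta-features (nearest mean)
mu0 = meta_tr[y[tr] == 0].mean(axis=0)
mu1 = meta_tr[y[tr] == 1].mean(axis=0)
pred = (np.linalg.norm(meta_te - mu1, axis=1)
        < np.linalg.norm(meta_te - mu0, axis=1)).astype(int)
acc = (pred == y[te]).mean()
print(round(float(acc), 2))
```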

DATA → Classifier 1, Classifier 2, …, Classifier L → Aggregate Classifier
SLIDE 18

Further Considerations / Problems

  • Tarnished Gold Standard
  • The “Reject” Class (Multimorbidity) Problem
  • The Multi-Class (K > 2) Problem
  • Nonlinear Mapping of Feature Space
  • Regression instead of Classification

(for diseases with steady progression)

SLIDE 19

Tarnished Gold Standard

  • Dangerous Assumption of Error-Free Class Labels
  • Incorrect Class Labels ⇒ Unreliable Classifiers
  • Accurate Class Labels ⇒ Robust Classifiers

Solutions?

  • Fuzzy Labels
  • Regression (2-Class Case)
  • Unsupervised Pattern Recognition (e.g., Clustering)?
SLIDE 20

What is a “Reject” Class?

The Stage:

  • K-Class Classifier System
  • Classify Unknown Sample SNew

Important Concepts:

  • Multivariate Outliers
  • “Open” vs. “Closed” K-Class Systems
  • Ambiguity vs. Distance “Rejects”

Example: Normal vs. Diabetes; Test: Arthritis

SLIDE 21

Classification Systems

OPEN system: the unknown sample SNew may fall outside every class C1 … CK and be assigned to the Reject Class
CLOSED system: SNew is forced into one of the classes C1 … CK
SLIDE 22

Multivariate Outliers:

Distance “Reject”
Ambiguity “Reject”
SLIDE 23

A Solution to the K-Class Problem:

1) Develop K(K−1)/2 Pair Classifiers Cmn for Classes Cm and Cn, m < n = 1,…,K
2) Combine Outcome Probabilities pmn(x) of Pair Classifiers Cmn for Sample x:

pm(x) = [1 + ∑n≠m (1/pmn(x) − 1)]⁻¹

Generalizable: Can Include Quality Qm of Cm
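The combination formula translates directly into code. The 3-class pairwise probabilities below are made-up inputs for illustration:

```python
import numpy as np

def combine_pairwise(p_pair, K):
    """Combine pairwise outcome probabilities into per-class probabilities.

    p_pair[(m, n)] (with m < n) is the probability that sample x belongs
    to class m rather than class n; for n < m use p_mn = 1 - p_nm.
    Implements p_m = [1 + sum_{n != m} (1/p_mn - 1)]^(-1).
    """
    def pmn(m, n):
        return p_pair[(m, n)] if m < n else 1.0 - p_pair[(n, m)]
    return np.array([1.0 / (1.0 + sum(1.0 / pmn(m, n) - 1.0
                                      for n in range(K) if n != m))
                     for m in range(K)])

# Toy example: class 0 wins both of its pairwise contests
p_pair = {(0, 1): 0.9, (0, 2): 0.8, (1, 2): 0.6}
p = combine_pairwise(p_pair, 3)
print(p.round(3), int(p.argmax()))
```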

SLIDE 24

Example of Combining Pair-Classifiers: “Simpler is Better”

6-Class, 12600 Gene Microarray Dataset for Acute Leukemia (ALL):

(Yeoh et al. 2002; Li et al. Bioinformatics 2003)

Class            # of Samples
1: t-all              43
2: e2a-pbx1           27
3: tel-aml1           79
4: bcr-abl            15
5: mll                20
6: hyperdip>50        64

SLIDE 25

Best 15 Pair-Classifiers (LDA-LOO):

Pair       Discriminatory Gene(s)
1 vs. 2    7715
1 vs. 3    9101
1 vs. 4    8763
1 vs. 5    3767, 4428
1 vs. 6    8273, 12436
2 vs. 3    7715
2 vs. 4    7715
2 vs. 5    7715
2 vs. 6    7715
3 vs. 4    5478, 8709
3 vs. 5    3391, 5478
3 vs. 6    2610, 4339, 6557
4 vs. 5    1137
4 vs. 6*   6863, 7521, 11072, 12157
5 vs. 6    7188, 11463

* 1 of 79 samples misclassified

SLIDE 26

Combining the Output Probabilities of 15 Pair Classifiers:

Confusion Matrix:

      1    2    3    4    5    6
1    43
2     0   27
3     0        79
4     0             14    1
5     0                  20
6     0                       64

SLIDE 27

An Alternate Solution to the K-Class Problem

(For K Large)

1) Develop K Pair Classifiers Cm,{n≠m} for Classes Cm and C{n≠m}, m = 1,…,K
2) Combine Pair Classifier Outcome Probabilities pm,{n≠m}(x) for Sample x

Advantage: K << K(K−1)/2 for Large K
Disadvantage: Unbalanced Classes

SLIDE 28

Nonlinear Mapping of Feature Space

E.g., Original Features: x1, x2
Best Hyperplane by LDA: a Line – Cannot Separate the Classes
Best Separating Hypersurface: a Circle

Nonlinear Mapping: y1 = x1², y2 = x2²
Transformed Features: y1, y2

In the Transformed Space, LDA Separates the Classes
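The circle example can be reproduced numerically: on the raw coordinates a Fisher-LDA line fails, while after the squaring map a hyperplane separates the classes. The data (an inner disk vs. an outer ring) and the minimal LDA implementation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Class 1: points inside a circle of radius 1; Class 2: ring of radius 2-3
r1 = rng.uniform(0, 1, 100); r2 = rng.uniform(2, 3, 100)
t = rng.uniform(0, 2 * np.pi, 200)
X = np.column_stack([np.r_[r1, r2] * np.cos(t), np.r_[r1, r2] * np.sin(t)])
y = np.array([0] * 100 + [1] * 100)

def lda_accuracy(X, y):
    """Training accuracy of a minimal Fisher LDA with midpoint threshold."""
    m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
    Sw = np.cov(X[y == 0].T) + np.cov(X[y == 1].T)
    w = np.linalg.solve(Sw + 1e-6 * np.eye(X.shape[1]), m1 - m0)
    b = -w @ (m0 + m1) / 2
    return (((X @ w + b) > 0).astype(int) == y).mean()

acc_original = lda_accuracy(X, y)   # a line cannot separate disk from ring
Y = X ** 2                          # nonlinear map: y1 = x1^2, y2 = x2^2
acc_mapped = lda_accuracy(Y, y)     # now a hyperplane works
print(round(float(acc_original), 2), round(float(acc_mapped), 2))
```

In the squared coordinates the circle x1² + x2² = const becomes the line y1 + y2 = const, which is exactly what LDA can find.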

SLIDE 29

Classification vs. Regression

(Same for Linear 2-Class Problems)

Classification vs. Regression figure: Class 1 samples are assigned target +1, Class 2 samples target −1; regression fits a continuous output to the same points

Advantages of Regression:

  • 1. Better for Continuous Transition
  • 2. “Robustifiable” – Outlier Detection
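For the linear 2-class case, the equivalence can be seen by regressing onto ±1 targets and thresholding the fitted output at zero. A least-squares sketch on toy data (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 1, (30, 3)), rng.normal(2, 1, (30, 3))])
t = np.array([+1.0] * 30 + [-1.0] * 30)   # regression targets: +1 / -1

# Linear least-squares regression on the +/-1 targets
A = np.column_stack([X, np.ones(len(t))])  # add an intercept column
coef, *_ = np.linalg.lstsq(A, t, rcond=None)

# Thresholding the regression output at 0 yields a linear classifier
pred = np.where(A @ coef > 0, 1.0, -1.0)
acc = (pred == t).mean()
print(round(float(acc), 2))
```

Replacing the ±1 targets with graded values (e.g. disease stage) gives the continuous-transition variant, and robust regression losses allow the outlier detection mentioned above.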

SLIDE 30

Some Successes of the SCS:

  • Brain, Prostate, Ovarian, Breast, Thyroid Cancer (MR; T)
  • Colon Cancer (MR, IR; T, F)
  • Alzheimer’s (IR; T)
  • Arthritis (Various forms), Diabetes I & II (IR; F)
  • Breast tumour grade & steroid receptor status (IR; T)
  • Kidney allograft rejection (MR, IR; F)
  • Barrett’s (in vivo Raman)
  • Proteomics: Bladder, Colon, Prostate, Ovarian Cancer (MS, F)
  • cDNA Microarray data (T): Public Domain Data; Scrapie

T = Tissue F = Fluid MR = Magnetic Resonance IR = Infrared MS = Mass Spectroscopy

SLIDE 31

Consequences of Dataset Sparsity

Example: Microarray Expression Profiles

for

Small Round Blue-Cell Tumours

(SRBCT)

SLIDE 32

4-Class SRBCT:

2308 Features: Expression Levels

                 EWS   BL   NB   RMS
Training set:     23    8   12    20
Validation set:    6    3    6     5

Developed 6 Pair Classifiers (LDA with LOO-CV) Exhaustive Search for Best Gene Pairs

SLIDE 33

Pair Classifiers   Number of Gene Pairs
EWS vs. BL         102
EWS vs. NB          14
EWS vs. RMS         36
BL vs. NB          163
BL vs. RMS         199*
NB vs. RMS          23

Results:

Small Round Blue-Cell Tumours (SRBCT): Perfect Classification of both Training & Validation Sets

*Single genes: 509 and 1932

Common Fallacy: If Both Training and Validation Sets Classify Accurately, Then the Classifier is Robust!

Curse of Dataset Sparsity

Biological Significance of Individual Results Doubtful

SLIDE 34

Examples

Proteomics:

SELDI Mass Spectra

SLIDE 35

The Datasets*:

1: Healthy vs. Ovarian Cancer (“6-19-02”)
2: Healthy vs. Prostate Cancer (“JNCI 7-3-02”)

15154 Features: M/Z values

Five Random Partitions, D1 – D5, of Datasets into:

Ovarian:   Training set: 122 (61 + 61)   Validation set: 116 (30 + 101)
Prostate:  Training set: 84 (42 + 42)    Validation set: 48 (21 + 27)

*http://clinicalproteomics.steem.com

SLIDE 36

Applying the SCS:

Feature Selection:

Exhaustive Search Not Feasible (Even for K = 2) 1 Million Random Sets of K ≥ 2 Features from 15154

“Wrapper” Classifier:

Linear Discriminant Analysis (LDA)

(with Leave-One-Out Crossvalidation)
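A scaled-down sketch of this random-subset wrapper search: 300 random feature pairs instead of 1 million sets, a nearest-centroid stand-in instead of LDA, and synthetic data with one weakly planted feature. The leave-one-out evaluation loop is the part that mirrors the talk's procedure.

```python
import numpy as np

rng = np.random.default_rng(7)
n, N = 40, 500                      # 40 samples, 500 features (sparse regime)
X = rng.normal(0, 1, (n, N))
y = np.array([0] * 20 + [1] * 20)
X[y == 1, 7] += 2.0                 # plant one weakly informative feature

def loo_accuracy(F, y):
    """Leave-one-out accuracy of a nearest-centroid 'wrapper' classifier."""
    hits = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        m0 = F[mask & (y == 0)].mean(0)
        m1 = F[mask & (y == 1)].mean(0)
        pred = int(np.linalg.norm(F[i] - m1) < np.linalg.norm(F[i] - m0))
        hits += pred == y[i]
    return hits / len(y)

best = (0.0, None)
for _ in range(300):                # the talk samples 1 million random sets
    cols = rng.choice(N, size=2, replace=False)
    acc = loo_accuracy(X[:, cols], y)
    if acc > best[0]:
        best = (acc, tuple(int(c) for c in cols))
print(best)
```

Even on mostly random features, the best of many random subsets scores well by chance, which is exactly the sparsity warning the following slides quantify.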

SLIDE 37

Results:

Healthy vs. Ovarian Cancer

Number of Solutions with 0 Errors for both Training & Validation Sets (D1 – D5)

# of Features   D1    D2   D3   D4   D5   All
2                3     1    1    –    –    –
3               27     1   16    1    –   18
4              142    13   79    3    1  108

Curse of Dataset Sparsity:

Biological Significance of Individual Results Doubtful –

Independent Biological Validation is Imperative!

SLIDE 38

Results:

Healthy vs. Prostate Cancer

Average of D1 – D5:

# of Features   Training (TS)   Validation (VS)
3               98.1%           92.9%
5               99.1%           94.6%
6               99.8%           95.0%

Multiple Solutions of Identical Accuracy!

E.g. D5 (5 Features): Two Sets with 100% (TS & VS)

SLIDE 39

Generic Problem:

Visualization & Display of High-Dimensional Data:

“A Picture is Worth a 1000 Words”

Stage 1 of the SCS

SLIDE 40

Previous Approaches:

Mapping from N Dimensions to 2-3, by Minimizing an Objective Function to Approximately Preserve All Inter-Pattern Distances:

Sammon’s Mapping
Niemann’s Mapping
Multidimensional Scaling
Kohonen’s Self-Organizing Map
Projection Pursuit

All Require Nonlinear Optimization

SLIDE 41

Our Approach: Mapping to the Relative Distance Plane (RDP)
SLIDE 42

The RDP Mapping Procedure:

Given: K N-Dimensional Patterns

  • 1. Select a Distance (Similarity) Measure
  • 2. Calculate the K X K Distance Matrix
  • 3. Select Any Two of K Patterns as

Reference Patterns R1 and R2

  • 4. Display the Positions of the Remaining K−2 Patterns Xm Relative to R1 and R2
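The four steps translate into a short numpy routine. With R1 placed at the origin and R2 at (D12, 0), each pattern's RDP coordinates follow from its two reference distances alone; this is a sketch assuming Euclidean distance and random toy patterns.

```python
import numpy as np

rng = np.random.default_rng(8)
K, N = 12, 50
P = rng.normal(0, 1, (K, N))        # K patterns in N dimensions

# Steps 1-2: choose a distance measure and compute the K x K distance matrix
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)

# Step 3: pick any two patterns as reference patterns R1 and R2
r1, r2 = 0, 1
d12 = D[r1, r2]

# Step 4: place each remaining pattern by its exact distances to R1 and R2
# (R1 sits at (0, 0), R2 at (d12, 0) in the relative distance plane)
coords = {}
for m in range(K):
    if m in (r1, r2):
        continue
    x = (D[r1, m] ** 2 + d12 ** 2 - D[r2, m] ** 2) / (2 * d12)
    ysq = D[r1, m] ** 2 - x ** 2    # non-negative for real patterns
    coords[m] = (x, np.sqrt(max(ysq, 0.0)))

# The mapping preserves each pattern's distances to both references exactly
m = 5
x, yv = coords[m]
print(round(float(np.hypot(x, yv)), 4), round(float(D[r1, m]), 4))
```

Cycling through different reference pairs (r1, r2) gives the "multiple viewpoints" of the data mentioned on a later slide, with no optimization needed.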

SLIDE 43

Mapping to the Relative Distance Plane (RDP)

[Figure: schematic of the RDP mapping — reference patterns R1 and R2 separated by D12; a new sample Xm placed by its distances D1m and D2m.]

slide-44
SLIDE 44

2-Class RDP Mapping with Training (TS) & Validation (VS) Sets

TS (red & blue disks), VS (yellow & turquoise triangles); Full Display and Zoomed Display

[Figure: full and zoomed RDP displays — R1, R2, D12, D1m, D2m, Xm; train and test margins; optimal LDA “hyperplane”.]

slide-45
SLIDE 45

RDP Mapping Preserves Exactly the Relative Distances of Any N-Dimensional Pattern P to Any Two Reference Patterns, R1, R2

RDP Mapping Provides Multiple Viewpoints of the Data: (“Discretized” Projection Pursuit)

Requires No Optimization!


slide-46
SLIDE 46

Fisher Iris

RDP Map (Mahalanobis), From 4 Dimensions; 2 vs. 3 (63 – 74)


slide-47
SLIDE 47

Generalization to 3D and Higher: Three Reference Points; Best Separating Plane (Still Exact!)

slide-48
SLIDE 48

Fisher Iris

RDP Map, from 4 Dimensions


slide-49
SLIDE 49

Distance Measures Implemented:

  • 1. L2 (Euclidean): D_ij = {Σ_k=1..N (x_ik − x_jk)²}^(1/2)
  • 2. L1 (City block): D_ij = Σ_k=1..N |x_ik − x_jk|
  • 3. L∞ (Max norm): D_ij = max_k |x_ik − x_jk|
  • 4. Anderson–Bahadur (AB) Distance:

D_ij² = (x_i − µ1)ᵀ Sα⁻¹ (x_j − µ2);  Sα = α·S1 + (1 − α)·S2;  0 ≤ α ≤ 1

(α controls the amount of mixing between S1 & S2; α = 0.5 gives the Mahalanobis Distance)

Generalized Discriminant Function: f_AB(x | µ1, µ2; α; β) = [x − (βµ1 + (1 − β)µ2)]ᵀ Sα⁻¹ (µ1 − µ2);  0 ≤ α, β ≤ 1  (β controls the position of the “hyperplane”)
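The four measures can be sketched in NumPy as follows (our function names; for the AB entry we use the symmetric pattern-difference form (x_i − x_j)ᵀ Sα⁻¹ (x_i − x_j), which matches the slide’s α = 0.5 Mahalanobis remark, rather than the slide’s exact expression):

```python
# Sketches of the distance measures above; all names are our own.
import numpy as np

def d_L2(xi, xj):                      # 1. Euclidean
    return float(np.sqrt(np.sum((xi - xj) ** 2)))

def d_L1(xi, xj):                      # 2. City block
    return float(np.sum(np.abs(xi - xj)))

def d_Linf(xi, xj):                    # 3. Max norm
    return float(np.max(np.abs(xi - xj)))

def d_AB(xi, xj, S1, S2, alpha=0.5):   # 4. AB-style, with Sα = α·S1 + (1−α)·S2
    Sa = alpha * S1 + (1 - alpha) * S2
    diff = xi - xj
    return float(np.sqrt(diff @ np.linalg.solve(Sa, diff)))
```

With S1 = S2 = I, `d_AB` reduces to the Euclidean distance, as expected from the Mahalanobis special case.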


slide-50
SLIDE 50

RDP Mapping of High-Dimensional Data: Examples of Applications

slide-51
SLIDE 51

Microarray Expression Profiles (Genomics) RDP Mapping with Three Classes

Visualizing the Distribution of a 3rd Class Relative to the Two Classes of Interest


slide-52
SLIDE 52

Visual Assessment of Dataset Sparsity

slide-53
SLIDE 53

CNS Tumors:

RDP Maps (L2) from 7129 Dimensions

A vs. B: 1505 / 1505 (4,72);  A vs. C: 233 / 2623 (8,78);  B vs. C: 607 / 2135 (14,38)


slide-54
SLIDE 54

Assessing the Relevance of Feature Set Dimensionality Reduction

(RDP Results Likely Provide Worst Case Classification Scenario)


slide-55
SLIDE 55

Ovarian Cancer

23-103 (D = 15154; 7,15,80 red);  66-152 (D = 2; 1120/14996; “Outliers”: 146, 188)


slide-56
SLIDE 56

Detection of Outliers

Proteomics Mass Spectra Ovarian Cancer

Confirmation Using Different Distance Measures and/or Different Reference Pairs


slide-57
SLIDE 57

Ovarian Cancer

RDP Mapping from 3 Features: 2193, 2241, 2349

AB Distance (α = 0.7; β = 0.45; 44,102) 2792 / 3844

“Outliers”: 11, 15, 22, 25, 191, 195, 216


slide-58
SLIDE 58

Ovarian Cancer

RDP Mapping from 3 Features: 2193, 2241, 2349

L2 norm; 35 / 3844; “Outliers”: 11,15,22
L1 norm; 3 / 3844; “Outliers”: 11,15


slide-59
SLIDE 59

Comparison of, and Distinction between, Equally Accurate Feature Sets

E.g. D5 (5 Features): Two Sets with 100% (TS & VS)


slide-60
SLIDE 60

Prostate Cancer

RDP Mapping (L2 Distance) from 5 Features:

7-12-17-37-53    9-22-26-43-54    3 of 1849 with:
TS: 0, VS: 0    TS: 7 (4 + 3); VS: 2 (1 + 1)


slide-61
SLIDE 61

Prostate Cancer

RDP Mapping (AB Distance: α = 0.72; β = 0.52) from 5 Features (7-12-17-37-53); 238 of 1849 with No TS or VS Errors


slide-62
SLIDE 62

Prostate Cancer

RDP Mapping (AB Distance: α = 0.61; β = 0.46) from 5 Features (9-22-26-43-54); 32 of 1849 with No TS or VS Errors


slide-63
SLIDE 63

Different Reference Point Pairs: Multiple Views of a High-Dimensional Dataset

slide-64
SLIDE 64

Prostate Cancer

5 Features: 7, 12, 17, 37, 53 (AB Distance: α = 0.72)

Reference Pair: 1 – 66


slide-65
SLIDE 65

Prostate Cancer

5 Features: 7, 12, 17, 37, 53 (AB Distance: α = 0.72; Reference Pair from Same Class)

41 – 46 (Red Class)    83 – 78 (Blue Class)


slide-66
SLIDE 66

Major Challenge: Dataset Sparsity. What to Do?

slide-67
SLIDE 67

Choice #1: Obtain More Real Data

Generally NOT Feasible for Biomedical Data

Choice #2: Create Surrogate Data

Possibilities:

  • 1. “Noise Injection” (Raudys)
  • 2. Convex Pseudo-Data (Breiman)
  • 3. Fit Gaussian Distribution to Data
  • 4. Within-Class Feature Permutation
  • 5. Random Selection: Feature-by-Feature CDF
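Option 4 is the easiest of these to sketch (our code; `permute_within_class` is a hypothetical name): shuffling each feature independently within a class preserves every marginal distribution while breaking inter-feature correlations.

```python
# Illustrative surrogate-data generator (within-class feature permutation).
import numpy as np

def permute_within_class(X_class, rng=None):
    """Surrogate patterns: permute each feature column within one class."""
    rng = rng if rng is not None else np.random.default_rng()
    surrogate = X_class.copy()
    for k in range(surrogate.shape[1]):
        surrogate[:, k] = rng.permutation(surrogate[:, k])
    return surrogate

X = np.random.default_rng(2).normal(size=(30, 5))     # one class, 5 features
X_surr = permute_within_class(X, np.random.default_rng(3))
```

Sorting each column of X and X_surr yields identical arrays: the marginals survive, and only the joint structure is scrambled.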


slide-68
SLIDE 68

Simultaneous Visualization and Comparison of Distributional Properties of Classes and/or Training and Validation Sets: Kolmogorov – Smirnov (KS) Test

CDF from Histograms on Reference Axis


slide-69
SLIDE 69

Relevant Properties:

If two distributions originate from different populations, the KS statistic is ~1. If the KS statistic is ~0, the two distributions derive from the same population.

Some Applications:

  • 1. Verify that a Training / Test Set Split is Meaningful
  • 2. Check Distributional Relevance of Surrogate Data
  • 3. Confirm Dataset Sparsity: Little or No Distinction

Between Training and Test Sets
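A hedged sketch of application 1, using SciPy’s two-sample KS test (the data here are synthetic stand-ins, not from any slide):

```python
# Illustrative only: KS statistic near 0 for a meaningful split,
# near 1 when the two samples come from different populations.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(4)
same = ks_2samp(rng.normal(size=200), rng.normal(size=200))
diff = ks_2samp(rng.normal(size=200), rng.normal(loc=3.0, size=200))
```

In practice one compares the training split against the test split (or against surrogate data), feature by feature or along a reference axis as in the CDF-from-histograms construction above.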


slide-70
SLIDE 70

SRBCT: EWS (23 + 8) vs. BL (6 + 3)

RDP Mapping (L2) from 2308 Features

Class 1 (TS) vs. Class 1 (VS) Class 2 (TS) vs. Class 2 (VS) KS = 0.96; p = 0.00008 KS = 1.00; p = 0.007


slide-71
SLIDE 71

Ovarian Cancer Surrogate Samples

Mapping (L2) from 15154 M/Z (Yellow (3) and Turquoise (4) Triangles) Misclassifications: 11 + 13 Misclassifications: 10 + 11 KS (1 vs. 2): 0.789 KS (1 vs. 3): 0.099; (2 vs. 4): 0.062


slide-72
SLIDE 72

Exploring the Possibility of Direct Classification in the RDP

slide-73
SLIDE 73

Fisher Iris

RDP Map (AB Distance: α = 0.5; β = 0.5) From 4 Dimensions; 2 vs. 3 (40 – 101) Two misclassified

Optimal LDA “hyperplane”

slide-74
SLIDE 74

Fisher Iris

RDP Map (AB Distance: α = 0.07; β = 0.53) From 4 Dimensions; 2 vs. 3 (54 – 99). One misclassified

Optimal LDA “hyperplane”

slide-75
SLIDE 75

Influence of α on RDP Mapping Distribution and on Classification: α = 0.0; α = 0.7 (Best); α = 1.0

slide-76
SLIDE 76

Work in Progress

  • 1. Introduce Quadratic (Nonlinear) Features:

Augment RDP x, y by x², y² & xy: From 2 to only 5 Dimensions

  • 2. Combine Different Reference Pair Classifiers:
  • Leave-Two-Out Crossvalidation
  • Classifier Fusion
  • 3. Create 2D Classifiers from Two ~Orthogonal

Reference Axes Ak and Am. Sample xj with Coordinates Sk(xj) and Sm(xj)

  • 4. Optimal Rotation of Class Separator Line
  • 5. Unsupervised RDP
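Item 1 admits a short sketch (our code; `augment_quadratic` is a hypothetical name): augmenting the RDP coordinates (x, y) with x², y², xy lifts the map from 2 to only 5 dimensions, so a linear separator there acts quadratically in the plane.

```python
# Illustrative quadratic-feature augmentation of RDP coordinates.
import numpy as np

def augment_quadratic(P):
    """(K, 2) RDP coordinates -> (K, 5): columns x, y, x^2, y^2, x*y."""
    x, y = P[:, 0], P[:, 1]
    return np.column_stack([x, y, x**2, y**2, x * y])

Q = augment_quadratic(np.array([[1.0, 2.0], [3.0, 4.0]]))
```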


slide-77
SLIDE 77

Ovarian Cancer

Optimal Rotation of Class Separator “Hyperplane”: AB Distance (α = 0.7; β = 0.45; γ = ??)


slide-78
SLIDE 78

Coworkers & Collaborators:

Richard Baumgartner Chris Bowman Stephanie Booth (National Microbiology Lab) Aleks Demko Brion Dolenko Marina Mandelzweig Sasha Nikulin Nick Pizzi Randy Summers Peter Zhilkin
