EHR-Based Phenotyping: Bulk Learning and Evaluation (with Infectious Diseases)

EHR-Based Phenotyping: Bulk Learning and Evaluation (with Infectious Diseases)
Po-Hsiang (Barnett) Chiu

Phenotypes and phenotyping
• Physically observable traits of genotypes (and their interactions with environments)
• Biochemical or physiological properties, behavior, and products of behavior
• Attributions of diseases (e.g. susceptibility)
• Diseases (and disease subtypes)

Data-Driven Phenotyping
• Data-driven phenotyping
  – Two main methodologies
    • Rule-based approach (e.g. eMerge, https://emerge.mc.vanderbilt.edu)
    • Predictive analytics
  – Data sources:
    • EHRs/EMRs: medicinal treatments, diagnoses, lab measurements, etc.
    • Genomic data: SNP arrays, copy number variations (CNVs), etc.
  – Phenotypes
    • Diseases, subtypes, or variables attributed to disease predictions

Diagnostic Concept Units
• Various diseases share the same set of diagnostic concept units
• Infectious diseases
  – Lab tests
    • Microorganism, blood, urine, body tissues, stool
  – Medications
    • Antibiotic, antiviral, anthelmintic
• Build statistical models for each diagnostic component and combine them appropriately
  – Ensemble learning

Bulk Learning in a Nutshell …
Bulk learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. the bulk learning set) as a substrate for model learning and evaluation. A given medical ontology is used to perform feature selection, and model stacking is used to construct an abstract feature representation of low sample complexity in order to reduce training requirements.
Key Concepts:
1. Build phenotyping models on top of multiple diseases
2. Automatic feature selection using an existing ontology
3. Models are combined via model stacking (a form of ensemble learning)
4. Abstract features for dimensionality reduction
5. Less labeled data required for model evaluations

Phenotyping via Bulk Learning
• Under model stacking, we arrive at the notion of "concept-driven phenotyping"
  – A subset or combination of lab tests is more attributable to some diseases, while others are better explained by medications
• In this study, infectious diseases associated with 100 ICD-9 codes serve as the domain of study for bulk learning
  – For simplicity, consider different diagnostic codes as different diseases …
  – Why 100 codes?
  – Code selection strategy?

Bulk Learning Basics I
• Addresses two central issues in the predictive analytics approach to computational phenotyping
  – Feature engineering
    • Medical ontology for feature decomposition
    • Medical Entities Dictionary (http://med.dmi.columbia.edu)
  – Data annotation
    • Ensemble learning (e.g. stacked generalization [Wolpert 1992])
    • Feature abstraction for dimensionality reduction

Medical Ontology for Grouping Features
[Figure: snapshot of the Medical Entities Dictionary (http://med.dmi.columbia.edu)]
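Ontology-based feature grouping can be pictured as a lookup from each raw feature to the diagnostic concept unit it belongs to. The sketch below is only an illustration: the feature names and the mapping are hypothetical, and a real mapping would be derived from the medical ontology rather than written by hand.

```python
from collections import defaultdict

# Hypothetical ontology fragment: raw feature name -> diagnostic concept unit.
# These entries are illustrative only, not taken from the Medical Entities Dictionary.
CONCEPT_OF = {
    "urine_culture_ecoli": "microbiology",
    "blood_culture_staph_aureus": "microbiology",
    "ceftriaxone": "antibiotic",
    "vancomycin": "antibiotic",
    "serum_wbc": "blood_test",
    "urine_leukocyte_esterase": "urine_test",
}


def group_features(feature_names, concept_of):
    """Partition feature names by concept; features with no mapping are dropped."""
    groups = defaultdict(list)
    for name in feature_names:
        concept = concept_of.get(name)
        if concept is not None:
            groups[concept].append(name)
    return dict(groups)


groups = group_features(
    ["ceftriaxone", "serum_wbc", "urine_culture_ecoli", "unmapped_feature", "vancomycin"],
    CONCEPT_OF,
)
print(groups)
```

Each resulting group would then supply the inputs for one base model.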

Model Stacking
• Why inspect multiple (infectious) diseases?
  – Use multiple diseases as a substrate and identify their common elements
  – Example stacking architecture (under the stacked generalization method)
[Figure: example stacking architecture]
• Level 2
  – Attributes: level-1 probabilities and ICD-9
  – Target: true labels (gold standard)
• Level 1
  – Attributes: level-0 probabilities and indicators
  – Target: diagnostic codes (silver standard)
• Level 0
  – Urinary chemistry measure
  – Microbiology measure
  – Intravenous chemistry measure
  – Antibiotic measure
  – Other phenotypic measures (e.g. antiviral)
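The lower two tiers of such an architecture can be sketched in plain Python. This is a toy illustration under loud assumptions, not the study's implementation: the data are synthetic, the four feature groups and their column indices are made up, and the logistic regression is a hand-rolled gradient descent. For brevity both tiers are trained on the same rows, whereas stacked generalization normally feeds the upper tier with held-out (cross-validated) predictions. The gold-standard tier is omitted.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, x):
    """Probability of the positive class; the last entry of w is the bias."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic-regression weights by batch gradient descent on log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xij in enumerate(xi):
                grad[j] += err * xij
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


random.seed(0)
# Hypothetical concept groups and the raw feature columns each one owns.
GROUPS = {"microbiology": [0, 1], "antibiotic": [2, 3], "blood_test": [4], "urine_test": [5]}


def make_patient():
    """Synthetic case: binary features fire more often when the code is present."""
    has_code = random.random() < 0.5
    rate = 0.7 if has_code else 0.2
    return [1.0 if random.random() < rate else 0.0 for _ in range(6)], int(has_code)


data = [make_patient() for _ in range(200)]
X = [x for x, _ in data]
silver = [s for _, s in data]  # diagnostic code used as the surrogate label

# Lower tier: one model per concept group, trained against the silver label.
base_models = {
    name: fit_logistic([[x[i] for i in idx] for x in X], silver)
    for name, idx in GROUPS.items()
}


def abstract_features(x):
    """Map a raw feature vector to one probability per concept group."""
    return [predict_proba(base_models[n], [x[i] for i in idx]) for n, idx in GROUPS.items()]


# Upper tier: a model over the four probabilities, again against the silver label.
Z = [abstract_features(x) for x in X]
meta_model = fit_logistic(Z, silver)
print(len(X[0]), "raw features ->", len(Z[0]), "abstract features")
```

The point of the sketch is the change of representation: six sparse binary inputs become four dense probabilities, and any later model works in that smaller space.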

Surrogate Labels vs True Labels
• Model stacking is used to:
  – Improve upon base model performance
  – Transform EHR data into a denser form
• Uses diagnostic codes (e.g. ICD-9) as surrogate labels to establish "approximate predictive models"
• Why surrogate labels (e.g. ICD-9)?
  – The set of features extracted from EHRs can be large
  – Used to derive a compact representation of the training data
  – "Free" supervised signals that are sufficiently close but can be obtained without extra work
• Objective: build statistical models in the abstract feature space
  – Create a sparse annotation set (i.e. gold standard) that serves as a proxy dataset for downstream model evaluations
  – 83 annotated cases

Four Example Base Models
[Figure: raw features (f11, f12, …, f1j; f21, …, f2j; f31, …, f3j; f41, …) feed four logistic units, m1 (microbiology), a1 (antibiotic), b1 (blood test), and u1 (urine test), shown for diseases (1), (i-1), (i), (i+1); the global units m1g, a1g, b1g, u1g feed a global level-2 unit (global2)]
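Reading the figure's labels as logistic units, each base model can be written as a logistic function of a weighted sum of its own group of raw features. The weights $w_{kj}$ and offsets $c_k$ below are assumed notation introduced for this sketch; they do not appear on the slide.

```latex
\begin{align*}
\sigma(z) &= \frac{1}{1 + e^{-z}} \\
m_1 &= \sigma\Big(\sum_j w_{1j} f_{1j} + c_1\Big) && \text{(microbiology)} \\
a_1 &= \sigma\Big(\sum_j w_{2j} f_{2j} + c_2\Big) && \text{(antibiotic)} \\
b_1 &= \sigma\Big(\sum_j w_{3j} f_{3j} + c_3\Big) && \text{(blood test)} \\
u_1 &= \sigma\Big(\sum_j w_{4j} f_{4j} + c_4\Big) && \text{(urine test)}
\end{align*}
```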

Performance Evaluations
• How well does the model predict ICD-9 codes (using separate test data)?
• How well does the model predict annotated data (associated with "true labels")?
  – The (binarized) ICD-9 code becomes a candidate feature among the abstract features (e.g. probability scores, indicators)
• The annotated sample consists of randomly selected cases in which ICD-9 coding errors are corrected
• Data annotation and coding procedures are two independent processes
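The second evaluation question can be made concrete with a toy example in which the binarized ICD-9 code is appended to the abstract features and both are scored against annotated labels. Everything below is hypothetical: the four cases, the probability values, and the "mean above 0.5" rule are invented purely to show the mechanics and say nothing about the study's results.

```python
def binarize_icd9(assigned_codes, target_code):
    """Return 1.0 if the target ICD-9 code was assigned to the case, else 0.0."""
    return 1.0 if target_code in assigned_codes else 0.0


def level2_row(abstract_probs, assigned_codes, target_code):
    """Concatenate abstract features with the binarized ICD-9 indicator."""
    return list(abstract_probs) + [binarize_icd9(assigned_codes, target_code)]


def accuracy(predicted, truth):
    return sum(int(p == t) for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical annotated cases: (abstract probabilities, assigned codes, true label).
cases = [
    ([0.9, 0.8, 0.7, 0.6], {"053.9"}, 1),
    ([0.2, 0.1, 0.3, 0.2], {"053.9"}, 0),  # coded, but the annotation disagrees
    ([0.8, 0.7, 0.9, 0.6], set(), 1),      # not coded, but the annotation says yes
    ([0.1, 0.2, 0.1, 0.3], set(), 0),
]
target = "053.9"
rows = [level2_row(probs, codes, target) for probs, codes, _ in cases]
truth = [label for _, _, label in cases]

# Baseline that trusts the ICD-9 indicator alone.
icd9_only = [int(row[-1]) for row in rows]
# Toy rule over the abstract features only: mean probability above 0.5.
abstract_only = [int(sum(row[:-1]) / len(row[:-1]) > 0.5) for row in rows]

print("ICD-9 only:", accuracy(icd9_only, truth))
print("abstract features only:", accuracy(abstract_only, truth))
```

In this contrived sample the ICD-9 indicator is wrong on exactly the two miscoded cases, which is the situation the annotated set is meant to expose.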

Base Level Performances

127.4 Enterobiasis
009.1 Gastroenteritis
...
117.9 Mycoses
047.8 (Other) viral meningitis
053.9 Herpes zoster

Other Components
• Semi-supervised learning and virtual annotation set
• The 3rd tier in the model stacking hierarchy
  – Trade-off between learned abstract features and the ICD-9 codes as surrogate labels
  – Performance evaluation on predicting annotated labels
• Ontology-based feature engineering
• Proper design of treatment and control (training) data

Modeling Perspective
• EHR data consist of observations and latent variables
  – Observations can be directly answered via simple queries
    • Did the patient have tests for E. coli?
    • Did the patient take ceftriaxone?
• Latent variables represent quantities that cannot be directly observed in the EHR or computed via simple queries
  – Does the patient have an infection?
  – Diagnostic question: specifically which infections does the patient have?
• Learn classifiers to predict latent variables (with access only to observations)
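The distinction can be illustrated with a toy record: observations are answered by direct lookups, while the latent variable has no field to look up. The record layout and field names below are hypothetical and do not reflect any real EHR schema.

```python
# Hypothetical patient record; field names are illustrative only.
record = {
    "lab_tests": [
        {"name": "urine culture", "organism": "Escherichia coli"},
        {"name": "complete blood count", "organism": None},
    ],
    "medications": ["Ceftriaxone", "Acetaminophen"],
}


def had_test_for(record, organism):
    """Observation: was any lab test in the record aimed at this organism?"""
    target = organism.lower()
    return any((t.get("organism") or "").lower() == target for t in record["lab_tests"])


def took_medication(record, drug):
    """Observation: does the medication list contain this drug?"""
    target = drug.lower()
    return any(m.lower() == target for m in record["medications"])


observations = {
    "tested_for_e_coli": had_test_for(record, "Escherichia coli"),
    "took_ceftriaxone": took_medication(record, "ceftriaxone"),
}
print(observations)
# "Does the patient have an infection?" has no corresponding field: it is the
# latent variable a classifier would have to predict from such observations.
```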

Medical Perspective
• Seemingly different infectious diseases may share similar sets of lab tests and medications
  – Staph. aureus
    • Skin infections, pneumonia, blood poisoning
  – Ceftriaxone
    • Meningitis
    • Infections at different sites of the body (e.g. bloodstream, lungs, urinary tract)
• Multiple classifiers for the same disease
  – 4 classifiers per ICD-9 code, each of which is a binary classifier
    • 400 classifiers at the base level
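The count of base classifiers follows from pairing every code with every concept group. In the sketch below the code identifiers are placeholders, not the study's actual ICD-9 selection, and the four group names are taken from the labels in the base-model figure.

```python
from itertools import product

CONCEPTS = ["microbiology", "antibiotic", "blood_test", "urine_test"]
# Placeholder identifiers standing in for the 100 ICD-9 codes.
CODES = [f"code_{i:03d}" for i in range(100)]

# One binary classifier per (code, concept group) pair.
base_model_keys = list(product(CODES, CONCEPTS))
print(len(base_model_keys))  # 100 codes x 4 groups
```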

Data Distribution Perspective
"Can we build a joint model applicable to all diseases?"

Abstract Feature Representation: Design Choices
• Related work in constructing high-level features
  – PCA, unsupervised feature learning, manifold learning, etc.
• Design choices
  – Data characteristics
  – Interpretability
• Deep neural network
  – Linear combination
  – Non-linear transformation (e.g. sigmoid, rectifier, etc.)
• Feature set: continuous, dense, and "homogeneous"
  – Image pixels
  – Time series of lab measurements
  – word2vec
• EHR data, however, are very different
  – Sparse and incomplete
  – Consist of many different types (binary, categorical, continuous, etc.)
  – Features associated with multiple concepts

Moving Forward …
• Summary
  – Bulk learning is a framework with at least the following system choices
    • The bulk learning set (of target conditions) => base models
    • Classification algorithms (guideline: probabilistic, well-calibrated classifiers)
    • Stacking architecture (multiple tiers => levels of abstraction)
    • Strategy for combining individual (local) disease models into a global model
  – Advantage: a small annotated sample can be used for model construction and evaluation within the abstract feature space (e.g. level-1 data)
    • 83 clinical cases were labeled in this study
  – Challenge: the model involving the interaction between abstract features and ICD-9 does not generalize well to the region of the data where the ICD-9 coding was incorrect
• Ongoing and future work
  – Multiple types of surrogate labels
  – Complex decision boundary?
  – Other surrogate labels
  – Semi-supervised learning
  – Active learning
[Figure: stacking diagram with units m1, a1, b1, u1 indexed by disease (1), (i-1), (i), (i+1), a local level-2 unit (local2), global units m1g, a1g, b1g, u1g, and a global level-2 unit (global2)]
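The "well-calibrated" guideline can be checked with a proper scoring rule. The Brier score used below is a swapped-in, standard choice for illustration; the slides do not say which calibration measure, if any, was applied, and the example probabilities are made up.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


labels = [1, 0, 1]
confident_and_right = brier_score([0.9, 0.1, 0.8], labels)
uninformative = brier_score([0.5, 0.5, 0.5], labels)
print(confident_and_right < uninformative)
```

Lower is better: a base model whose probabilities track outcomes scores closer to 0, which matters here because those probabilities become the features of the next tier.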

References
[1] D.H. Wolpert, Stacked generalization, Neural Networks. 5 (1992) 241–259.
[2] K.M. Ting, I.H. Witten, Issues in stacked generalization, J. Artif. Intell. Res. 10 (1999) 271–289.
[3] J. Chen, C. Wang, R. Wang, Using Stacked Generalization to Combine SVMs in Magnitude and Shape Feature Spaces for Classification of Hyperspectral Data, IEEE Trans. Geosci. Remote Sens. 47 (2009) 2193–2205.
[4] David Baorto, James Cimino, et al. Available: http://med.dmi.columbia.edu. Access date: Oct 20, 2016.
[5] T.A. Lasko, J.C. Denny, M.A. Levy, Computational Phenotype Discovery Using Unsupervised Feature Learning over Noisy, Sparse, and Irregular Clinical Data, PLoS One. 8 (2013) e66341.

Thank You

[Figure: Level 0 raw features (f11, f12, …, f1j; f21, …, f2j; f31, …, f3j; f41, …) feed Level 1 logistic units m1 (Microbiology), a1 (Antibiotic), b1 (Blood test), and u1 (Urine test)]

Example Features
