SLIDE 1

INTRODUCING MACHINE LEARNING FOR HEALTHCARE RESEARCH

Dr Stephen Weng
NIHR Research Fellow (School for Primary Care Research)
Primary Care Stratified Medicine (PRISM)
Division of Primary Care, School of Medicine, University of Nottingham

SLIDE 2

What is Machine Learning?

Machine learning teaches computers to do what comes naturally to humans and animals: learn from experience. Machine learning algorithms use computational methods to “learn” information directly from data, without relying on a predetermined equation as a model. The algorithms adaptively improve their performance as the number of data samples available for learning increases.

SLIDE 3

When Should We Use Machine Learning?

Considerations:

  • Complex task or problem
  • Large amount of data
  • Lots of variables
  • No existing formula or equation
  • Limited prior knowledge

Examples:

  • Hand-written rules and equations are too complex – images, speech, linguistics
  • Rules of the task are dynamic – financial transactions
  • The nature of the input and the quantity of data keep changing – hospital admissions, health care records

SLIDE 4

How Machine Learning Works

  • Supervised learning, which trains a model on known input and output data to predict future outputs
  • Unsupervised learning, which finds hidden patterns or intrinsic structures in the input data
  • Semi-supervised learning, which uses a mixture of both techniques: some of the learning uses labelled (supervised) data, some uses unlabelled (unsupervised) data

Machine Learning splits into two branches:
  • Supervised learning – develop a model based on both input and output data (Classification, Regression)
  • Unsupervised learning – group and interpret data based only on input data (Clustering)
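
A minimal R contrast of the two approaches, using the built-in iris data (purely illustrative):

    # Supervised: learn a mapping from known inputs to a known output
    fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)

    # Unsupervised: look for structure in the inputs alone, with no output labels
    set.seed(1)
    groups <- kmeans(scale(iris[, 1:4]), centers = 3)$cluster
    table(groups, iris$Species)   # clusters often line up with the (unused) species labels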

SLIDE 5

Supervised Learning

  • To build a model that makes predictions based on evidence in the presence of uncertainty
  • Takes a known set of input data and known responses to the data (output)
  • Trains a model to generate reasonable predictions for the response to new data

  • Classification: predict discrete responses – for instance, whether an email is genuine or spam, or whether a tumour is cancerous or not
  • Regression: predict continuous responses – for example, change in body mass index, cholesterol levels

Using supervised learning to predict cardiovascular disease

  • Suppose we want to predict whether someone will have a heart attack in the future.
  • We have data on previous patients' characteristics, including biometrics, clinical history, lab test results, co-morbidities, drug prescriptions.
  • Importantly, your data requires “the truth”: whether or not the patient did in fact have a heart attack.
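
A minimal R sketch of this idea using simulated data; the variable names (age, sbp, smoker) are placeholders, not the study's actual features:

    set.seed(1)
    n <- 1000
    d <- data.frame(
      age    = rnorm(n, 60, 10),
      sbp    = rnorm(n, 135, 15),     # systolic blood pressure
      smoker = rbinom(n, 1, 0.3)
    )
    # "The truth": simulate whether each patient actually went on to have a heart attack
    d$heart_attack <- rbinom(n, 1, plogis(-12 + 0.12 * d$age + 0.03 * d$sbp + 0.7 * d$smoker))

    # Train a classifier on known inputs and known outcomes
    fit <- glm(heart_attack ~ age + sbp + smoker, data = d, family = binomial)
    # Predicted risk for a new, unseen patient
    predict(fit, newdata = data.frame(age = 70, sbp = 150, smoker = 1), type = "response")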

SLIDE 6

Predicting cardiovascular disease using electronic health records

  • 681 UK General Practices
  • 383,592 patients free from CVD, registered on 1st of January 2005 and followed up for 10 years
  • Two-fold cross-validation (similar to other epidemiological studies): n = 295,267 “training set”; n = 82,989 “validation set”
  • 30 separate included features, including biometrics, clinical history, lifestyle, test results, prescribing
  • Four types of models: logistic regression, random forest, gradient boosting machines, and neural networks

Weng SF, Reps J, Kai J, Garibaldi JM, Qureshi N (2017) Can machine-learning improve cardiovascular risk prediction using routine clinical data? PLOS ONE 12(4): e0174944. https://doi.org/10.1371/journal.pone.0174944
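
A generic sketch of a single train/validation split in R (simulated data; not the study's code or variables):

    set.seed(2)
    n <- 5000
    d <- data.frame(age = rnorm(n, 60, 10), smoker = rbinom(n, 1, 0.3))
    d$cvd <- rbinom(n, 1, plogis(-9 + 0.12 * d$age + 0.8 * d$smoker))

    # Hold out a validation set and train on the rest (roughly the training/validation proportions above)
    train_idx <- sample(seq_len(n), size = round(0.78 * n))
    fit  <- glm(cvd ~ age + smoker, data = d[train_idx, ], family = binomial)
    pred <- predict(fit, newdata = d[-train_idx, ], type = "response")
    # Discrimination (c-statistic) on the held-out set could then be computed, e.g. with pROC::auc(d$cvd[-train_idx], pred)
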
SLIDE 7

Predicting cardiovascular disease using electronic health records

Top 10 risk factors identified by each machine learning algorithm (ranked 1–10):

  • ML: Logistic Regression – Ethnicity; Age; SES: Townsend Deprivation Index; Gender; Smoking; Atrial Fibrillation; Chronic Kidney Disease; Rheumatoid Arthritis; Family history of premature CHD; COPD
  • ML: Random Forest – Age; Gender; Ethnicity; Smoking; HDL cholesterol; HbA1c; Triglycerides; SES: Townsend Deprivation Index; BMI; Total Cholesterol
  • ML: Gradient Boosting Machines – Age; Gender; Ethnicity; Smoking; HDL cholesterol; Triglycerides; Total Cholesterol; HbA1c; Systolic Blood Pressure; SES: Townsend Deprivation Index
  • ML: Neural Networks – Atrial Fibrillation; Ethnicity; Oral Corticosteroid Prescribed; Age; Severe Mental Illness; SES: Townsend Deprivation Index; Chronic Kidney Disease; BMI missing; Smoking; Gender

Weng SF, Reps J, Kai J, Garibaldi JM, Qureshi N (2017) Can machine-learning improve cardiovascular risk prediction using routine clinical data? PLOS ONE 12(4): e0174944. https://doi.org/10.1371/journal.pone.0174944
SLIDE 8

Predicting cardiovascular disease using electronic health records

Weng SF, Reps J, Kai J, Garibaldi JM, Qureshi N (2017) Can machine-learning improve cardiovascular risk prediction using routine clinical data? PLOS ONE 12(4): e0174944. https://doi.org/10.1371/journal.pone.0174944

Figure legend: green indicates positive weight; red indicates negative weight. I1–I20: input variables; O1: outcome variable; H1–H3: hidden layers.

SLIDE 9

Unsupervised Learning

  • To find hidden patterns or intrinsic structures in the data
  • Primarily used to draw inferences from datasets consisting of input data without labelled responses
  • Exploratory data analysis to find hidden patterns or groupings in the data
  • Clustering is the most common unsupervised learning technique

Example applications:
  • Genomic sequence analysis
  • Market research
  • Object recognition
  • Feature selection

SLIDE 10

Improving phenotyping of heart failure patients to improve therapeutic strategies

  • 172 patients hospitalised with acute decompensated heart failure from the ESCAPE trial
  • Performed cluster analysis (hierarchical clustering) to determine similar patient groups based on combined measured characteristics
  • Researchers conducting the analysis had no knowledge of clinical outcomes for patients
  • 14 candidate variables, including demographics, biometrics, cardiac biomarkers

Ahmad T, Desai N, Wilson F, Schulte P, Dunning A, et al. (2016) Clinical Implications of Cluster Analysis-Based Classification of Acute Decompensated Heart Failure and Correlation with Bedside Hemodynamic Profiles. PLOS ONE 11(2): e0145881. https://doi.org/10.1371/journal.pone.0145881
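
A minimal illustrative sketch of this kind of analysis in R (simulated values and hypothetical variable names; not the ESCAPE data):

    set.seed(3)
    pts <- data.frame(
      age = rnorm(172, 60, 12), bnp = rlnorm(172, 6, 0.8),
      sbp = rnorm(172, 110, 15), creatinine = rnorm(172, 1.4, 0.4)
    )
    # Hierarchical clustering on scaled candidate variables, blind to any outcome
    hc <- hclust(dist(scale(pts)), method = "ward.D2")
    pts$cluster <- cutree(hc, k = 4)   # cut the tree into four patient groups
    table(pts$cluster)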

SLIDE 11

Improving phenotyping of heart failure patients to improve therapeutic strategies

Ahmad T, Desai N, Wilson F, Schulte P, Dunning A, et al. (2016) Clinical Implications of Cluster Analysis-Based Classification of Acute Decompensated Heart Failure and Correlation with Bedside Hemodynamic Profiles. PLOS ONE 11(2): e0145881. https://doi.org/10.1371/journal.pone.0145881

SLIDE 12

Improving phenotyping of heart failure patients to improve therapeutic strategies

  • Cluster 1: male Caucasians with ischemic cardiomyopathy, multiple comorbidities, lowest BNP levels
  • Cluster 2: females with non-ischemic cardiomyopathy, few co-morbidities, most favourable hemodynamics, advanced disease
  • Cluster 3: young African American males with non-ischemic cardiomyopathy, most adverse hemodynamics, advanced disease
  • Cluster 4: older Caucasians with ischemic cardiomyopathy, concomitant renal insufficiency, highest BNP levels

Ahmad T, Desai N, Wilson F, Schulte P, Dunning A, et al. (2016) Clinical Implications of Cluster Analysis-Based Classification of Acute Decompensated Heart Failure and Correlation with Bedside Hemodynamic Profiles. PLOS ONE 11(2): e0145881. https://doi.org/10.1371/journal.pone.0145881

  • Cluster 2 had the least adverse outcomes; Cluster 4 the worst outcomes
  • Clusters 1–3 had 45–70% lower risk of all-cause mortality

SLIDE 13

How do you decide which algorithm to use?

Choosing the right algorithm can seem overwhelming – there are about a dozen supervised and unsupervised learning algorithms, each taking a different approach.

Considerations:
  • There is no best method or one size fits all
  • Trial and error
  • Size and type of data
  • The research question and purpose
  • How will the outputs be used?

Selecting an algorithm – some examples:

Machine Learning
  • Supervised Learning – Classification: Logistic regression; Support vector machines; Discriminant analysis; Naive Bayes; Nearest neighbour; Decision Trees; Ensemble methods; Neural networks
  • Supervised Learning – Regression: Linear regression, GLM; Support vector regressor; Decision Trees; Ensemble methods; Neural networks
  • Unsupervised Learning – Clustering: K-Means, K-Medoids, Fuzzy C-Means; Hierarchical; Gaussian mixture; Neural networks (SOM); Hidden Markov models

SLIDE 14

Supervised Learning

A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains a model to generate reasonable predictions for the response to new input data. Use supervised learning if you have existing data for the output you are trying to predict. Using larger training datasets yields models that generalise better to new data.

SLIDE 15

Common classification algorithms

Logistic regression

How it works
  • Fits a model that can predict the probability of a binary response belonging to one class or the other
  • Simple – commonly used as a starting point for binary classification problems

Best used…
  • When data can be clearly separated by a single, linear boundary
  • As a baseline for evaluating more complex classification methods
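
A minimal R sketch (the built-in mtcars data and the choice of predictors are purely illustrative):

    # Logistic regression: probability of a binary response (am = manual transmission)
    fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
    summary(fit)
    # Predicted probability for a hypothetical car
    predict(fit, newdata = data.frame(wt = 2.5, hp = 120), type = "response")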

k Nearest Neighbour (kNN)

How it works
  • Categorises objects based on the classes of their nearest neighbours in the dataset
  • Assumes that objects near each other are similar
  • Distance metrics (e.g. Euclidean) are used to determine nearness

Best used…
  • When you need a simple algorithm to establish benchmark learning rules
  • When memory usage and prediction speed are lesser concerns
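
A small sketch using the class package and the built-in iris data (the split is arbitrary, for illustration):

    library(class)
    set.seed(4)
    idx  <- sample(nrow(iris), 100)
    # Classify each held-out flower by majority vote of its 5 nearest training neighbours
    pred <- knn(train = iris[idx, 1:4], test = iris[-idx, 1:4], cl = iris$Species[idx], k = 5)
    table(pred, iris$Species[-idx])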

SLIDE 16

Common classification algorithms

Support vector machine (SVM)

How it works
  • Classifies data by finding the linear decision boundary (hyperplane) that separates all data points of one class from those of another class
  • Points on the wrong side of the hyperplane are penalised using a loss function
  • Uses a kernel transformation to transform non-linearly separable data into higher dimensions where a linear decision boundary can be found

Best used…
  • For data that has exactly two classes (binary)
  • For high-dimensional, non-linearly separable data
  • When you need a classifier that's simple, easy to interpret, and accurate
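
A sketch with the e1071 package, using a binary "setosa vs the rest" problem built from iris and a radial kernel (all choices are illustrative):

    library(e1071)
    iris2 <- transform(iris, is_setosa = factor(Species == "setosa"))
    fit <- svm(is_setosa ~ Sepal.Length + Sepal.Width, data = iris2, kernel = "radial")
    table(predict(fit, iris2), iris2$is_setosa)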

SLIDE 17

Common classification algorithms

Neural Network

How it works
  • Consists of highly connected networks of neurons that relate the inputs to the desired outputs
  • The network is trained by iteratively modifying the strengths of the connections so that a given input maps to the correct response

Best used…
  • For modelling highly non-linear systems
  • When data is available incrementally and you wish to constantly update the model
  • When there may be unexpected changes in your input data
  • When model interpretability is not a key concern
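
A minimal sketch with the nnet package (a single hidden layer with 3 units; iris is used only for illustration):

    library(nnet)
    set.seed(5)
    fit <- nnet(Species ~ ., data = iris, size = 3, maxit = 200, trace = FALSE)
    table(predict(fit, iris, type = "class"), iris$Species)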

Naïve Bayes

How it works
  • Assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature
  • Data is classified on the highest probability of its belonging to a particular class

Best used…
  • For small datasets containing many parameters
  • When you need a classifier that's easy to interpret
  • When the model will encounter scenarios that weren't in the training data
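
A one-line fit with the e1071 package (iris again, purely illustrative):

    library(e1071)
    fit <- naiveBayes(Species ~ ., data = iris)
    table(predict(fit, iris), iris$Species)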

SLIDE 18

Common classification algorithms

Discriminant analysis

How it works
  • Classifies data by finding linear combinations of features
  • Assumes that different classes generate data based on Gaussian distributions
  • Training involves finding the parameters of a Gaussian distribution for each class
  • The distribution parameters are used to calculate boundaries, which can be linear or quadratic functions
  • The boundaries are used to determine the class of new data

Best used…
  • When you need an easy-to-interpret, simple model
  • When efficiency matters – memory usage is low and modelling speed is fast
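
A linear discriminant analysis sketch with the MASS package (iris, for illustration):

    library(MASS)
    fit <- lda(Species ~ ., data = iris)
    table(predict(fit, iris)$class, iris$Species)
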
SLIDE 19

Common classification algorithms

Decision Tree

How it works
  • Predicts responses to data by following the decisions in the tree from the root down to a leaf node
  • Branching conditions compare the value of a predictor to a trained weight
  • The number of branches and the values of the weights are determined in the training process

Best used…
  • When you need an algorithm that is easy to interpret and fast to fit
  • To minimise memory usage
  • When high predictive accuracy is not a requirement
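
A classification tree sketch with the rpart package (iris, for illustration):

    library(rpart)
    fit <- rpart(Species ~ ., data = iris)
    fit   # printing the tree shows the learned branching rules
    table(predict(fit, iris, type = "class"), iris$Species)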

Bagged and Boosted Decision Trees (Ensemble)

How it works
  • Several "weaker" decision trees are combined into a "stronger" ensemble
  • Bagging – trees are trained independently on data that is bootstrapped from the input data
  • Boosting – "weak" learner models are added iteratively, adjusting the weight of each weak learner to focus on misclassified examples

Best used…
  • When predictors are categorical or behave non-linearly
  • When time to train the model is less of a concern
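
A bagging-style ensemble sketch with the randomForest package (many trees on bootstrap samples, predictions by majority vote; iris is illustrative):

    library(randomForest)
    set.seed(6)
    fit <- randomForest(Species ~ ., data = iris, ntree = 500)
    fit   # prints the out-of-bag error estimate
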
SLIDE 20

Common regression algorithms

Linear regression

How it works
  • Used to describe a continuous response variable as a linear function of one or more predictor variables

Best used…
  • When you need a model that is easy to interpret and fast to fit
  • As a baseline for evaluating other, more complex regression models

Nonlinear regression

How it works
  • The model is described by a nonlinear equation
  • "Nonlinear" refers to a fit function that is a nonlinear function of the parameters

Best used…
  • When data has strong nonlinear trends and cannot be easily transformed into a linear space
  • For fitting custom models to data
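
A quick sketch of both on the built-in mtcars data (the exponential form and starting values are arbitrary choices):

    # Linear regression: mpg as a linear function of weight
    lin <- lm(mpg ~ wt, data = mtcars)

    # Nonlinear regression: an exponential fit with nls (nonlinear in the parameters a and b)
    nl <- nls(mpg ~ a * exp(b * wt), data = mtcars, start = list(a = 50, b = -0.3))
    coef(lin); coef(nl)
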
SLIDE 21

Common regression algorithms

Gaussian process regression model

How it works
  • Nonparametric models used for predicting the value of a continuous response variable
  • Used in spatial analysis for interpolation in the presence of uncertainty

Best used…
  • For interpolating spatial data
  • To facilitate optimisation of complex systems/designs

Support vector regressor

How it works
  • Similar to support vector machines for classification, but modified to predict a continuous response
  • Does not fit a hyperplane but rather a model that deviates from the measured data by no more than a small amount (error)

Best used…
  • For high-dimensional data (where there is a large number of predictor variables)
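
A sketch of each with the kernlab and e1071 packages (mtcars again; the predictors are illustrative):

    library(kernlab)
    library(e1071)
    gpr <- gausspr(mpg ~ wt + hp, data = mtcars)   # Gaussian process regression
    svr <- svm(mpg ~ wt + hp, data = mtcars)       # numeric response => support vector regression
    predict(gpr, mtcars[1:3, ]); predict(svr, mtcars[1:3, ])
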
SLIDE 22

Common regression algorithms

Generalised linear model

How it works
  • A special case of a nonlinear model that uses linear methods
  • Involves fitting a linear combination of the inputs to a non-linear function (link function) of the outputs

Best used…
  • When the response variables have non-normal distributions, such as a response variable that is always expected to be positive

Regression tree

How it works
  • Decision trees for regression are similar to decision trees for classification, but modified to be able to predict continuous responses

Best used…
  • When predictors are categorical (discrete) or behave nonlinearly
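
A sketch of both: a Poisson GLM with a log link for the built-in InsectSprays counts, and a regression tree on mtcars (both purely illustrative):

    # GLM for a count response that must be non-negative
    pois <- glm(count ~ spray, data = InsectSprays, family = poisson)

    # Regression tree: predicting a continuous response (mpg)
    library(rpart)
    rt <- rpart(mpg ~ ., data = mtcars)
    predict(rt, mtcars[1:3, ])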

SLIDE 23

Unsupervised Learning

Unsupervised learning is useful when you want to explore your data but don’t yet have a specific goal, or are not sure what information the data contains. It’s also a good way to reduce the dimensionality of your data.

Clustering algorithms fall into two broad groups:
  • Hard clustering: each data point belongs to only one group
  • Soft clustering: each data point can belong to more than one group

SLIDE 24

Common hard clustering algorithms

k Means

How it works
  • Partitions data into k mutually exclusive clusters
  • Cluster membership is determined by the distance from each point to the cluster’s centre

Best used…
  • When the number of clusters is known
  • For fast clustering of large datasets

k Medoids

How it works
  • Similar to k Means, but with the requirement that the cluster centres coincide with points in the data

Best used…
  • When the number of clusters is known
  • For fast clustering of categorical data
  • For large datasets
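
A sketch of both on the built-in USArrests data (k = 4 is an arbitrary choice; PAM is the k-medoids implementation in the cluster package):

    library(cluster)
    x  <- scale(USArrests)
    km <- kmeans(x, centers = 4, nstart = 25)   # k-means
    pm <- pam(x, k = 4)                         # k-medoids: centres are actual states in the data
    table(km$cluster, pm$clustering)
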
SLIDE 25

Common hard clustering algorithms

Hierarchical clustering

How it works
  • Produces nested sets of clusters by analysing similarities between pairs of points
  • Groups objects into a binary hierarchical tree

Best used…
  • When you don’t know how many clusters are in your data
  • When you want a visualisation to guide your selection

Self-organising map (SOM)

How it works
  • Neural-network-based clustering that transforms a dataset into a topology-preserving 2D heat map

Best used…
  • To visualise high-dimensional data in 2D or 3D
  • To reduce the dimensionality of the data
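
A hierarchical clustering sketch on the built-in USArrests data (the same dataset used in the uc-r tutorial linked on the last slide); for SOMs, the kohonen package provides an implementation:

    hc <- hclust(dist(scale(USArrests)), method = "complete")
    plot(hc)             # the dendrogram helps you choose how many clusters to keep
    cutree(hc, k = 4)    # cut the tree into 4 groups
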
SLIDE 26

Common soft clustering algorithms

Fuzzy c-Means

How it works
  • Partition-based clustering for when data points may belong to more than one cluster

Best used…
  • When the number of clusters is known
  • For pattern recognition
  • When clusters overlap

Gaussian mixture model

How it works
  • Partition-based clustering where data points come from different multivariate normal distributions with certain probabilities

Best used…
  • When a data point might belong to more than one cluster
  • When clusters have different sizes and correlation structures within them
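
A sketch of both soft-clustering approaches with the e1071 and mclust packages (iris measurements, 3 clusters; illustrative only):

    library(e1071)
    library(mclust)
    x  <- scale(iris[, 1:4])
    fc <- cmeans(x, centers = 3)   # fuzzy c-means
    gm <- Mclust(x, G = 3)         # Gaussian mixture model
    head(fc$membership)            # membership degrees (each row sums to 1)
    head(gm$z)                     # posterior probabilities of cluster membership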

SLIDE 27

Key challenges for healthcare data

Most challenges come from handling your data and finding the “right” model

  • Data comes in all shapes and sizes: real-world datasets are messy, incomplete, and come in a variety of formats
  • Pre-processing your data requires clinical knowledge and the right tools: for example, to select the correct features (variables) and codes to use in primary care datasets, you’ll need clinical verification, knowledge of NHS coding, and content expertise
  • Can your question be answered without ML? Many research questions don’t actually require ML; for instance, accurate risk prediction models can be developed using stepwise regression models
  • Choosing the “right” model: highly flexible models tend to over-fit, while simple models make too many assumptions. Trial and error is at the core of machine learning
  • Understand the limitations: not recommended for causal inference, and interpretation of results can be difficult

SLIDE 28

Simplified workflow

  • 1. ACCESS: format and load the data
  • 2. PREPROCESS: data management, cleaning, coding, organising
  • 3. DERIVE: create features (variables) using the cleaned data
  • 4. TRAINING: select an algorithm and train models using the derived features
  • 5. ITERATE: try different algorithms to find the best model
  • 6. VALIDATE: test the trained model on a separate dataset
  • 7. INTERPRETATION: clinical verification and interpretation of outputs
  • 8. DISSEMINATION: integrate into a production system/publish in journals
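
A skeleton of these steps in R (the file name, derived feature, and model choices are placeholders; mtcars stands in for a real cohort):

    set.seed(7)
    # 1. ACCESS: d <- read.csv("cohort.csv"); here a built-in dataset stands in
    d <- mtcars
    # 2-3. PREPROCESS and DERIVE: e.g. derive a binary feature from weight
    d$heavy <- as.integer(d$wt > 3)
    # 4. TRAINING on a random subset
    idx <- sample(nrow(d), 22)
    fit <- glm(am ~ hp + heavy, data = d[idx, ], family = binomial)
    # 5. ITERATE: refit with other algorithms (rpart, randomForest, ...) and compare
    # 6. VALIDATE on the held-out rows
    mean((predict(fit, d[-idx, ], type = "response") > 0.5) == d$am[-idx])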

SLIDE 29

Popular Programmes

  • R: https://www.r-project.org/
  • RStudio: https://www.rstudio.com/
  • Python: https://www.python.org/
  • Anaconda (Python): https://anaconda.org/anaconda/python
  • Matlab (University of Nottingham): http://workspace.nottingham.ac.uk/display/Software/Matlab
  • Microsoft Azure: https://azure.microsoft.com/en-gb/pricing/
  • Apache Spark: https://spark.apache.org/

SLIDE 30

Open Source Training

Follow these tutorials for Deep Learning: http://rstudio.github.io/sparklyr/articles/guides-h2o.html (simple)

  • Uses the built-in R dataset ‘mtcars’

https://shiring.github.io/machine_learning/2017/02/27/h2o (advanced)

  • Download external open access dataset from https://archive.ics.uci.edu/ml/datasets/arrhythmia

Follow this tutorial for Neural Networks: https://datascienceplus.com/fitting-neural-network-in-r/

  • Uses a dataset from the built-in R package ‘MASS’

Follow this tutorial for Hierarchical Clustering: http://uc-r.github.io/hc_clustering

  • Uses the built-in R dataset ‘USArrests’