Effectiveness of Deep Learning vs. Machine Learning in a Health Care Use Case


SLIDE 1

Analytics Machine Learning/Deep Learning Showcases

RxToDx

A Data Science Machine Learning/Deep Learning Showcase

Effectiveness of Deep Learning vs. Machine Learning in a Health Care Use Case

Dima Rekesh / Julie Zhu / Ravi Rajagopalan, Optum Technology, November 2017

SLIDE 2

Objective: what have we learned from a Deep Learning health care use case?

  • Evaluate Deep Learning on a well-known, production Machine Learning problem using a typical time-series data set.
  • Seek best practices from an industry leader (NVIDIA).
  • Impute and predict the likelihood of an individual having a medical condition, using members' previous two years of prescription pharmacy claims data.

SLIDE 3

Deep Learning

How is it different?

  • Multiple layers in a neural network, with intermediate data representations that facilitate dimensionality reduction.
  • Interprets non-linear relationships in the data.
  • Derives patterns from data with very high dimensionality.

Why do we care?

  • Ability to create value with little or no domain knowledge required.
  • Ability to incorporate data from across multiple, seemingly unrelated sources.
  • Ability to tolerate very noisy data.
SLIDE 4

What have we learned?

  • Doesn't need SMEs' inputs; eliminates manual feature engineering.
  • Predicts multiple targets at a time.
  • Higher performance with higher-volume data.
  • Capable of automating model development.

SLIDE 5

Results: summary and take-aways

  • DL proved to be more accurate than conventional ML methods.
  • Neural nets required no manual feature engineering, pointing to a reduction in the person-hours required to create and maintain them.
  • A deep neural network was capable of predicting at least four different diseases more accurately than conventional ML models (which can predict only one disease at a time). This points to a drastic reduction in costs.
  • Modern GPUs are required: it takes ~24 hours to train on the full data set of 4.5M records on the latest (NVIDIA P100) GPU.

SLIDE 6

Depression Impact -- $126M Rx cost per year

www.slideshare.net/psychiatryjfn/disorders-of-mood1

Total annual claims at Optum (for the cohort): $965 million
Depression-related claims (for the cohort): $126 million

SLIDE 7

Deep Learning doesn't rely on SMEs' inputs

SME inputs + ML features for the XGBoost model (old markers, by feature importance):

Feature                                   Importance
DEPR_327                                  0.0601
number_prescribers_(10, inf]              0.0315
sum_amt_standard_cost_(621.48, inf]       0.0286
DEPR_31702                                0.0258
DEPR_31705                                0.0258
number_rx_(9, inf]                        0.0200
DEPR_31700                                0.0186
DEPR_31500                                0.0172
number_rx_(4, 9]                          0.0172
number_rx_(3, 4]                          0.0157
tot_drug_units_(5651, inf]                0.0143
DEPR_29702                                0.0143
number_prescribers_(5, 7]                 0.0129
tot_days_supply_(7, 27]                   0.0129
number_prescribers_(7, 10]                0.0129
sum_amt_standard_cost_(270.11, 477.88]    0.0114
DEPR_57018                                0.0114
tot_days_supply_(1027, 1551]              0.0100
DEPR_321                                  0.0100
DEPR_606                                  0.0100

SME inputs for the logistic model (old markers, by feature importance):

Feature                                   Importance
DEPR_320_32005                            0.2155
DEPR_322                                  0.1178
DEPR_2                                    0.0977
DEPR_32306_18_19_20_24_26                 0.0512

The Deep Learning model, by contrast, takes raw data (drug codes) without SME inputs or feature engineering.

SLIDE 8

Machine Learning Model Process Flow Chart

SLIDE 9

The revolution: Machine Learning vs. Deep Learning

Machine Learning: class-specific feature creation and feature engineering; predicts one class at a time (e.g., Depression: 1 or 0).

Deep Learning: auto-encoded embeddings over raw data; predicts multiple classes at a time (e.g., Depression 1, Asthma 1, ATDD 1, ...).

By directly using raw data without feature engineering, and by predicting multiple targets at a time, the Deep Learning approach saves more than 50% of model development time and resources.

For decades, ML relied on human-engineered features in fields as diverse as image processing (e.g., edge detection) and NLP (linguistics, stop words). DL renders feature engineering obsolete.

SLIDE 10

Higher performance with higher volume of data

[Chart: XGBoost model vs. LSTM with 4.5 million records vs. RNN with 0.5 million records]

SLIDE 11

Deep Learning vs. ML model (XGBoost): cost analysis (annual cost)

[Charts: count comparison and cost comparison, Deep Learning vs. ML model; values shown: 3.2K, 22K, 259K, 806M, 56M, 9M]
* Includes non-depression-related claims

  • Deep Learning identifies an additional USD 56 million in claims that are not identified by the ML model (XGBoost).
  • Deep Learning identifies an additional 22K patients that are not identified by the ML model (XGBoost).

SLIDE 12

Automated ML/DL platform in an AI system

  • Autonomous, instant learning
  • Hyperparameter tuning
  • Free of manual feature engineering and model selection
  • Handles sparse, highly dimensional data
  • Data-driven results
  • Multiple targets at a time
  • Able to explain results: how and why

Machine Learning/Deep Learning Robot

SLIDE 13

Multi-disease predictions

SLIDE 14

Hypertension – 4.5M records

[Chart: specialist model vs. 4-disease model]

SLIDE 15

ATDD – 4.5M records

[Chart: specialist model vs. 4-disease model]

SLIDE 16

Depression – 4.5M records

[Chart: specialist model vs. 4-disease model]

SLIDE 17

Asthma – 4.5M records

[Chart: specialist model vs. 4-disease model]

SLIDE 18

1D Convolutional Networks: simple, fast, local

[Diagram: a kernel of size 4 slides with stride 2 along the time axis over an input of length 16, producing a feature map of length 7 (f1..f7); each filter sees only a short range]

  • Kernel size: 4
  • Stride: 2
  • Input = 16, feature map = 7

Fewer weights. Observe that in images, objects are "local".
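A minimal Keras sketch of the arithmetic above (assumed Keras 2.x API; the filter count is illustrative): a size-4 kernel with stride 2 over a length-16 input yields a feature map of length 7.

```python
from keras.models import Sequential
from keras.layers import Conv1D

# One 1-D convolution over a length-16 sequence with a single channel.
# Output length = (16 - 4) // 2 + 1 = 7, matching the slide.
model = Sequential([
    Conv1D(filters=8, kernel_size=4, strides=2, input_shape=(16, 1))
])
model.summary()  # output shape: (None, 7, 8)
```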

SLIDE 19

RNNs: n inputs, m outputs

[Diagram: a chain of recurrent cells unrolled over time consumes the input "He went to school"; the state accumulates enough information to generate one output or a sequential output, e.g. the translation "él fue a la escuela"]

RNNs are pervasively used for NLP and language-to-language translation.
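A hedged sketch of the idea (not the deck's model; vocabulary and layer sizes are assumptions): a recurrent layer consumes an n-step sequence, and its final state is enough to generate one output.

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

# n token ids in, one output read off the final hidden state.
model = Sequential([
    Embedding(input_dim=1000, output_dim=32),  # token ids -> vectors
    LSTM(64),                                  # state summarizes the sequence
    Dense(1, activation='sigmoid'),            # single (non-sequential) output
])
model.compile(optimizer='adam', loss='binary_crossentropy')
```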

SLIDE 20

Zero padding

Helpful with inputs consisting of different-length sequences.

[Diagram: three inputs of irregular lengths are zero-padded into a regular 3 x 8 matrix before being fed to the neural network]
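For illustration, a minimal sketch using Keras's pad_sequences (the toy sequences are made up): three inputs of different lengths become the regular 3 x 8 matrix shown in the diagram.

```python
from keras.preprocessing.sequence import pad_sequences

# Three variable-length code sequences, zero-padded to a common length 8.
seqs = [[20, 220, 575],
        [12, 700],
        [20, 220, 575, 12, 700, 12, 220, 575]]
padded = pad_sequences(seqs, maxlen=8, value=0)
print(padded.shape)  # (3, 8)
```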

SLIDE 21

Embeddings

Helpful with categorical, non-contiguous inputs.

Each input is a number from 1 to 1,000 (e.g., codes for prescription drugs), and numerically adjacent codes such as 779 and 780 are not "close" in meaning. One-hot encoding turns a 1x4 input into a 1000x4 matrix: each input becomes a vector of 1,000 ones and zeros. Better, but a lot of numbers and a lot of memory. An embedding transformation of dimension 3 turns the input into a 3x4 matrix instead: each input is a vector of just 3 numbers, and hopefully "close" vectors really are close.
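A NumPy sketch of the transformation (random weights stand in for a learned embedding matrix; codes are illustrative): four codes from a 1,000-code vocabulary go from a 1,000-wide one-hot representation down to 3 dimensions.

```python
import numpy as np

codes = np.array([779, 780, 12, 3])    # e.g. drug codes in 1..1000
one_hot = np.eye(1000)[codes - 1]      # 4 x 1000: sparse and memory-hungry
E = np.random.randn(1000, 3)           # embedding matrix (learned in practice)
dense = one_hot @ E                    # 4 x 3: each code is now 3 numbers
# Equivalent, much cheaper lookup: dense == E[codes - 1]
```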

SLIDE 22

Keras Learned Embeddings

Use a fully connected layer; the embedding is learned together with the rest of the model.

The transformation is the same as on the previous slide (one-hot 1000x4, embedding dimension 3, output 3x4), but implemented as a fully connected layer whose weights are learned during training.
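The same transformation as a trainable Keras layer (assumed Keras 2.x API); the lookup-table weights are learned jointly with the rest of the model by backpropagation.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

# A 1000 x 3 weight table, trained together with downstream layers.
model = Sequential([Embedding(input_dim=1000, output_dim=3, input_length=4)])
out = model.predict(np.array([[779, 780, 12, 3]]))
print(out.shape)  # (1, 4, 3): four codes, three dimensions each
```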

SLIDE 23

Word2vec embeddings: unsupervised

CBOW (Continuous Bag of Words): predict the word given its context. Skip-grams: predict the context (including far-away words) given a word.

[Diagram: a window of 3 over the sequence 32 11 45, training for co-occurrence. Skip-grams: input 11, output pairs [11, 32] and [11, 45]. CBOW: input [32, 45], output 11.]
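A hedged gensim sketch of both training modes on toy code sequences (gensim 4.x parameter names; earlier versions use size= instead of vector_size=):

```python
from gensim.models import Word2Vec

sentences = [['32', '11', '45'], ['11', '32', '45']]  # toy code sequences
cbow = Word2Vec(sentences, vector_size=3, window=3, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=3, window=3, min_count=1, sg=1)
print(cbow.wv['11'])  # 3-dimensional vector trained on co-occurrence
```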

SLIDE 24

Word2vec Embeddings (CBOW)

[Diagram: context DCC codes from the input sequence (e.g., 33907) feed a hidden layer of linear neurons through weights w_ij, followed by a softmax output layer. Given DCC = 33907, the output is the probability that the DCC code at the nearby location is 45501 (the target DCC), versus 45502, 45503, 83600, ...]
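A toy NumPy forward pass of the network above (vocabulary, hidden size, and indices are illustrative; weights are random rather than trained): context codes are averaged through the linear hidden layer, then a softmax scores every candidate target code.

```python
import numpy as np

V, H = 1000, 3                       # vocab size, hidden size (assumed)
W_in = np.random.randn(V, H)         # input-side weights w_ij
W_out = np.random.randn(H, V)        # hidden-to-output weights
context = [7, 42]                    # toy indices standing in for DCC codes
h = W_in[context].mean(axis=0)       # hidden layer of linear neurons
logits = h @ W_out
p = np.exp(logits - logits.max())
p /= p.sum()                         # softmax over candidate target codes
# p[k] = probability that the DCC code at the nearby location is code k
```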

SLIDE 25

Embeddings: Word2Vec + LSTM

  • Approach 1:
  • Build a Word2Vec model on drug sequences using gensim
  • Replace the drug codes with their respective vectors
  • Use the vectorized inputs for the LSTM model
  • Approach 2 (sketched below):
  • Build a Word2Vec model on drug sequences using gensim
  • Initialize the weights of the Keras Embedding layer using the Word2vec output
  • Run the Embedding + LSTM model in Keras
  • Observations:
  • Approach 1 gave promising results but wasn't scalable: memory constraints kicked in during vectorization
  • Approach 2 gave good enough results, but not enough to beat a pure Keras Embedding + LSTM model
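A hedged sketch of Approach 2 (toy data; gensim 4.x and Keras 2.x APIs assumed): seed a Keras Embedding layer with Word2Vec vectors, then train Embedding + LSTM end to end.

```python
import numpy as np
from gensim.models import Word2Vec
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

sentences = [['20', '220', '575'], ['12', '700', '12']]  # toy drug sequences
w2v = Word2Vec(sentences, vector_size=16, window=3, min_count=1)

vocab = {w: i + 1 for i, w in enumerate(w2v.wv.index_to_key)}  # 0 = padding
weights = np.zeros((len(vocab) + 1, 16))
for word, idx in vocab.items():
    weights[idx] = w2v.wv[word]

model = Sequential([
    Embedding(len(vocab) + 1, 16, weights=[weights]),  # Word2vec-initialized
    LSTM(32),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
```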

SLIDE 26

Rx2dx project: evaluated network architectures

[Diagram: zero-padded time sequences (up to 256 steps) pass through an embedding and an RNN (up to 256 units) or 1-D CNN; static variables pass through a fully connected layer (up to 64 units); the two branches are concatenated and fed to a classifier over 1..N classes.]

Specialist as well as multi-disease networks were examined; a sketch follows.
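A hedged sketch of this architecture in the Keras functional API (layer sizes follow the slide's upper bounds; the vocabulary size, static-variable width, and class count are assumptions):

```python
from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Dense, concatenate

seq_in = Input(shape=(256,))                  # zero-padded sequence, up to 256
static_in = Input(shape=(8,))                 # static vars: age, gender, costs...
x = Embedding(input_dim=1000, output_dim=32)(seq_in)
x = LSTM(256)(x)                              # RNN branch (or a 1-D CNN)
s = Dense(64, activation='relu')(static_in)   # FC branch, up to 64 units
merged = concatenate([x, s])
out = Dense(4, activation='sigmoid')(merged)  # classifier over 1..N diseases
model = Model(inputs=[seq_in, static_in], outputs=out)
model.compile(optimizer='adam', loss='binary_crossentropy')
```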

SLIDE 27

Hardware: IBM Minsky, a 4x GPU server

The only server offering NVLink between CPUs and GPUs on the Power architecture.

  • 20 cores POWER8 at 3.25 GHz (x8 hardware threads)
  • 1024 GB RAM
  • 2 x 2.5" 1 TB SSDs
  • Mellanox QDR InfiniBand

This architecture will make a difference on mixed workloads with a lot of CPU-to-GPU communication (real-time batch generation).
SLIDE 28

The software stack

nvidia-docker with framework-enabled docker containers; we predominantly used Keras + Theano. Swift for long-term reference data sets, inputs, and results.

[Diagram: GPUs 0..N exposed through nvidia-docker (a base docker image with device pass-through) to framework containers such as TensorFlow, Theano, MXNet, and Torch, backed by a docker registry, with command-line or web/GUI access via Jupyter]

  • CUDA drivers: the bare-metal machine is loaded with CUDA drivers; then one installs docker and then nvidia-docker.
  • At this point, hundreds of open-source DL-enabled containers become available for instant download.
  • At Optum, we already have an internal docker registry that we can utilize to store and manage internal images.

SLIDE 29

Can we do better? Analyzing data limitations

  • Issue 1: Contradicting labels. There were several cases with the same sequence but different labels.
  • Issue 2: Small sample representation. There were several sequences that had only one sample in the data; 99% of misclassifications were from these small-sample sequences.

Example of Issue 1: IDs 1, 2, and 101 share the identical DCC sequence 20 220 575 12 700 12 220 575 but carry different labels.
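A pandas sketch of how both issues can be flagged (toy data; the deck does not show the actual pipeline):

```python
import pandas as pd

df = pd.DataFrame({
    'seq':   [(20, 220, 575, 12, 700, 12, 220, 575),
              (20, 220, 575, 12, 700, 12, 220, 575),
              (30, 5, 85)],
    'label': [1, 0, 1],
})

# Issue 1: identical sequences carrying more than one distinct label.
label_variety = df.groupby('seq')['label'].nunique()
contradicting = label_variety[label_variety > 1]

# Issue 2: sequences represented by a single sample.
counts = df['seq'].value_counts()
singletons = counts[counts == 1]
```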

SLIDE 30

Can we do better: what has the model learned?

  • Identify the sub-sequences that have the most impact on prediction.
  • Sequentially eliminate codes from a sequence and estimate the impact on the predicted probability.
  • Identify the sub-sequence that maximizes the predicted probability (see the sketch below).

Original sequence: 27 10 27 27 30 5 85 35 40 27 30 75 27 27 30 50
Most effective sub-sequence: 27 27 27 30 40 27 30 27 27 30 50
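A hedged sketch of the elimination procedure (a greedy variant; `model` is an assumed trained Keras binary classifier over zero-padded sequences, not the deck's actual code):

```python
from keras.preprocessing.sequence import pad_sequences

def most_effective_subsequence(model, seq, maxlen=256):
    """Greedily drop codes whose removal raises the predicted probability."""
    def prob(s):
        x = pad_sequences([s], maxlen=maxlen)
        return float(model.predict(x)[0, 0])

    best, best_p = list(seq), prob(seq)
    improved = True
    while improved and len(best) > 1:
        improved = False
        for i in range(len(best)):
            cand = best[:i] + best[i + 1:]
            p = prob(cand)
            if p > best_p:                  # removal increased the probability
                best, best_p, improved = cand, p, True
                break
    return best, best_p
```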
SLIDE 31

Summary of approaches attempted (see the callbacks sketch below):
▪ Static variables: age, gender, costs, etc.
▪ Zero padding to a fixed length
▪ Specialist models (each disease gets a separate model) or a generalist model (all diseases are imputed by the same model); multi-disease models were explored as well
▪ Explored GRU and Conv1D networks; LSTMs came out to be the best
▪ Regularization (weight decay and dropout)
▪ Automatic hyper-parameter grid search
▪ Termination by patience (once validation loss no longer decreases)
▪ Saving checkpoints / restarting from them
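Termination by patience and checkpointing map directly onto standard Keras callbacks; a minimal sketch (the file name and patience value are illustrative):

```python
from keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    EarlyStopping(monitor='val_loss', patience=5),          # stop on plateau
    ModelCheckpoint('rx2dx_best.h5', monitor='val_loss',
                    save_best_only=True),                   # save checkpoints
]
# model.fit(x_train, y_train, validation_split=0.1, callbacks=callbacks)
```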

SLIDE 32

Future
▪ More work on summarizing what the model has learned
▪ Multi-GPU training
▪ Accelerated LSTM libraries (PyTorch, Keras)
▪ Attention
▪ Multi-network ensembles
▪ More complex models that better accommodate rare sequences