Analytics Machine Learning/Deep Learning Show Cases
RxToDx
A Data Science Machine Learning/Deep Learning Show Case
Effectiveness of Deep Learning Vs. Machine Learning in a Health Care Use Case
Dima Rekesh/Julie Zhu/Ravi Rajagopalan
Optum Technology
November, 2017
www.slideshare.net/psychiatryjfn/disorders-of-mood1
(for the cohort)
SME inputs + ML Features For XGB Model
Old Markers Feature Importance
Feature                                  Importance
DEPR_327                                 0.0601
number_prescribers_(10, inf]             0.0315
sum_amt_standard_cost_(621.48, inf]      0.0286
DEPR_31702                               0.0258
DEPR_31705                               0.0258
number_rx_(9, inf]                       0.0200
DEPR_31700                               0.0186
DEPR_31500                               0.0172
number_rx_(4, 9]                         0.0172
number_rx_(3, 4]                         0.0157
tot_drug_units_(5651, inf]               0.0143
DEPR_29702                               0.0143
number_prescribers_(5, 7]                0.0129
tot_days_supply_(7, 27]                  0.0129
number_prescribers_(7, 10]               0.0129
sum_amt_standard_cost_(270.11, 477.88]   0.0114
DEPR_57018                               0.0114
tot_days_supply_(1027, 1551]             0.0100
DEPR_321                                 0.0100
DEPR_606                                 0.0100
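Feature names such as number_rx_(4, 9] suggest that raw counts were bucketed into half-open intervals and then one-hot encoded. The deck does not show how this was done, so the following is only a minimal sketch: the helper name bucket_feature and the lowest bin edge are assumptions; only the three number_rx intervals come from the table.

```python
import math

def bucket_feature(name, value, edges):
    """Return the feature name of the half-open bin (lo, hi] that contains value."""
    for lo, hi in zip(edges[:-1], edges[1:]):
        if lo < value <= hi:
            hi_txt = "inf" if math.isinf(hi) else str(hi)
            return f"{name}_({lo}, {hi_txt}]"
    return None  # value falls outside every listed bin

# Only (3, 4], (4, 9] and (9, inf] appear in the table; lower bins are not shown there.
NUMBER_RX_EDGES = [3, 4, 9, math.inf]

# Hypothetical prescription counts for four members.
features = [bucket_feature("number_rx", n, NUMBER_RX_EDGES) for n in (4, 7, 12, 2)]
print(features)
```

Each member then gets a 1 in the column named by its bucket and a 0 elsewhere, which is the kind of hand-built input the XGB model consumes.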
Deep Learning Raw Data
Drug Codes
Deep Learning Model takes raw data without SME inputs and Feature Engineering
Old Markers Feature Importance
Feature                      Importance
DEPR_320_32005               0.2155
DEPR_322                     0.1178
DEPR_2                       0.0977
DEPR_32306_18_19_20_24_26    0.0512
SME inputs For Logistic Model
Class-specific Feature Creation Feature Engineering Predict one Class at a time
Auto Encode Embedding
Predict multiple classes at a time
By directly using raw data without feature engineering, and by predicting multiple targets at a time, the Deep Learning approach saves > 50% of model development time and resources.
For decades, ML relied on human-engineered features in fields as diverse as image processing (e.g. edge detection) and NLP (linguistics, stop words). DL renders feature engineering obsolete.
Depression: 1 or 0
Depression 1, Asthma 1, ATDD 1, ….
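The difference between predicting one class at a time and several at once comes down to the shape of the target. A minimal sketch, assuming a hypothetical member record and only the three condition names visible on the slide (the slide's list is truncated, so the real one is longer):

```python
# Condition names taken from the slide; the slide's list continues beyond these.
CONDITIONS = ["Depression", "Asthma", "ATDD"]

def single_target(diagnoses, condition):
    """Target for a class-specific model: one 0/1 value."""
    return 1 if condition in diagnoses else 0

def multi_target(diagnoses, conditions=CONDITIONS):
    """Target for a multi-class model: one 0/1 value per condition."""
    return [1 if c in diagnoses else 0 for c in conditions]

# Hypothetical member with two of the three conditions.
member_diagnoses = {"Depression", "ATDD"}

depression_only = single_target(member_diagnoses, "Depression")
all_conditions = multi_target(member_diagnoses)
print(depression_only, all_conditions)
```

With the single-target form, each condition needs its own model and its own feature set; with the multi-target form, one network is trained against the whole vector.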
Deep Learning ML Model
Deep Learning ML model
* Includes non-depression related claims
Specialist 4 diseases
time Short range
Input=16 Feature map = 7
Fewer weights. Observe that in images, objects are "local"
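The slide gives only the sizes (input of 16, feature map of 7), not the kernel or stride. One combination that produces those sizes is a kernel of 4 with a stride of 2, since (16 - 4) / 2 + 1 = 7. The sketch below assumes that combination to illustrate the "fewer weights" point; the averaging kernel is arbitrary.

```python
def conv1d_valid(signal, kernel, stride):
    """1-D convolution without padding: slide the kernel and take dot products."""
    out = []
    for start in range(0, len(signal) - len(kernel) + 1, stride):
        window = signal[start:start + len(kernel)]
        out.append(sum(w * x for w, x in zip(kernel, window)))
    return out

signal = list(range(16))               # input of length 16, as on the slide
kernel = [0.25, 0.25, 0.25, 0.25]      # assumed kernel size 4 (not stated on the slide)
feature_map = conv1d_valid(signal, kernel, stride=2)  # assumed stride 2

conv_weights = len(kernel)                       # the same 4 weights are reused at every position
dense_weights = len(signal) * len(feature_map)   # a fully connected 16 -> 7 layer needs one weight per pair
print(len(feature_map), conv_weights, dense_weights)
```

Because the kernel only looks at a short range of neighbouring inputs and is shared across positions, the convolution needs 4 weights where a fully connected layer of the same shape would need 112.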
He went to school
él fue a la escuela
State has enough information to generate one-time or sequential output
RNNs are pervasively used for NLP and language-to-language translation
helpful with inputs consisting of different length sequences
Input 1, Input 2, Input 3 (irregular lengths): zero padding to 3 x 8, then to neural network
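A minimal sketch of the padding step, turning three sequences of irregular length into a 3 x 8 block. The sequences are made up, the helper name is hypothetical, and two choices are assumptions: zeros are appended at the end (padding at the front is equally common), and 0 is not used as a real code.

```python
def pad_sequences(sequences, length, pad_value=0):
    """Truncate or right-pad every sequence to exactly `length` items."""
    padded = []
    for seq in sequences:
        seq = list(seq)[:length]
        padded.append(seq + [pad_value] * (length - len(seq)))
    return padded

# Three hypothetical inputs of different lengths.
inputs = [
    [20, 220, 575],
    [12, 700, 12, 220, 575, 20, 220, 575],
    [12, 700],
]

padded = pad_sequences(inputs, length=8)
for row in padded:
    print(row)
```

Every row now has the same length, so the batch can be handed to the network as a single rectangular array.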
helpful with categorical, non-contiguous inputs
Input = 1x4. Each input is a number from 1 to 1,000. 779 and 780 are not close (e.g. codes for prescription drugs).
One-hot: 1000x4. Each input is a vector of 1,000 ones or zeros. Better, but a lot of numbers / a lot of memory.
Embedding transformation: embedding dimension = 3, becomes 3x4. Each input is a vector of 3 numbers. Hopefully "close" vectors are really "close".
To neural network.
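The sizes on this slide can be checked with a small sketch. The four input codes are made up, and the embedding table is filled with random numbers only as a stand-in for values that would be learned during training.

```python
import random

VOCAB_SIZE = 1000   # codes run from 1 to 1,000, as on the slide
EMBED_DIM = 3       # embedding dimension from the slide

def one_hot(code):
    """Vector of 1,000 zeros with a single 1 at the code's position."""
    vec = [0] * VOCAB_SIZE
    vec[code - 1] = 1
    return vec

# Random stand-in for a learned embedding table: one row of 3 numbers per code.
rng = random.Random(0)
embedding_table = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
                   for _ in range(VOCAB_SIZE)]

def embed(code):
    """Look up the 3-number vector for a code."""
    return embedding_table[code - 1]

codes = [779, 780, 12, 779]   # hypothetical input of 4 codes

one_hot_numbers = sum(len(one_hot(c)) for c in codes)   # 1000 x 4
embedded_numbers = sum(len(embed(c)) for c in codes)    # 3 x 4
print(one_hot_numbers, embedded_numbers)
```

The one-hot form says nothing about which codes are related, since every pair of distinct codes is equally far apart; the embedding leaves room for related codes to end up near each other once the table is trained.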
use a fully connected layer, learn together with rest of model
One-hot: 1000x4. Each input is a vector of 1,000 ones or zeros. Better, but a lot of numbers / a lot of memory.
Fully connected layer (embedding transformation): embedding dimension = 3, becomes 3x4. Each input is a vector of 3 numbers. Hopefully "close" vectors are really "close".
To neural network.
Learn these weights
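Why a fully connected layer can serve as the embedding: multiplying a one-hot vector by a 1000 x 3 weight matrix simply selects one row of that matrix. A minimal sketch with random stand-in weights and no bias term (both assumptions; in the real model the weights are learned together with the rest of the network):

```python
import random

VOCAB_SIZE, EMBED_DIM = 1000, 3

rng = random.Random(0)
# Stand-in for the weights the slide says are learned.
weights = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
           for _ in range(VOCAB_SIZE)]

def one_hot(code):
    vec = [0.0] * VOCAB_SIZE
    vec[code - 1] = 1.0
    return vec

def dense_no_bias(x, w):
    """Fully connected layer without bias or activation: y_j = sum_i x_i * w[i][j]."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

code = 779
via_dense = dense_no_bias(one_hot(code), weights)   # 1,000 multiplications per output
via_lookup = weights[code - 1]                      # same result, just a row lookup
print(via_dense, via_lookup)
```

This equivalence is why embedding layers are usually implemented as a table lookup: the result matches the fully connected layer while skipping the multiplications by zero.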
Training for co-occurrence
Skipgrams: sequence 32 11 45, window = 3. Input: 11. Outputs: [11,32], [11,45].
CBOW: sequence 32 11 45, window = 3. Input: [32,45]. Output: 11.
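The two training schemes can be sketched on the slide's own sequence 32 11 45. The function names are hypothetical, and a window of 3 is read here as the centre item plus one neighbour on each side.

```python
def skipgram_pairs(seq, window=3):
    """Skipgram: each centre item is paired with every neighbour inside the window."""
    half = window // 2
    pairs = []
    for i, center in enumerate(seq):
        for j in range(max(0, i - half), min(len(seq), i + half + 1)):
            if j != i:
                pairs.append((center, seq[j]))
    return pairs

def cbow_examples(seq, window=3):
    """CBOW: the neighbours inside the window are the input, the centre item is the output."""
    half = window // 2
    examples = []
    for i in range(half, len(seq) - half):
        context = seq[i - half:i] + seq[i + 1:i + half + 1]
        examples.append((context, seq[i]))
    return examples

sequence = [32, 11, 45]

pairs_for_11 = [p for p in skipgram_pairs(sequence) if p[0] == 11]
cbow = cbow_examples(sequence)
print(pairs_for_11)
print(cbow)
```

For the centre item 11 this reproduces the slide: skipgram yields the pairs (11, 32) and (11, 45), and CBOW yields the context [32, 45] with target 11.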
Context DCC
33907 45501 DCC Sequence Input
Hidden Layer Linear Neurons Output Layer Softmax Classifier
DCC = 33907 Probability that DCC code at the nearby location is “45501” (Target DCC) “45502” “45503” “83600”
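A minimal sketch of the forward pass in this diagram: the input DCC selects a row of the first weight matrix (the linear hidden layer), and a softmax over the second matrix gives a probability for each candidate nearby code. The five-code vocabulary is just the set of codes shown on the slide, the hidden size is an assumption, and the weights are random rather than trained, so the probabilities themselves are meaningless here.

```python
import math
import random

VOCAB = ["33907", "45501", "45502", "45503", "83600"]  # codes shown on the slide
HIDDEN_DIM = 4                                         # assumed hidden-layer size

rng = random.Random(0)
w_in = [[rng.uniform(-0.5, 0.5) for _ in range(HIDDEN_DIM)] for _ in VOCAB]
w_out = [[rng.uniform(-0.5, 0.5) for _ in VOCAB] for _ in range(HIDDEN_DIM)]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def nearby_probabilities(code):
    """Probability of each vocabulary code appearing near the input code."""
    hidden = w_in[VOCAB.index(code)]   # linear hidden layer: one-hot input picks a row
    scores = [sum(hidden[h] * w_out[h][v] for h in range(HIDDEN_DIM))
              for v in range(len(VOCAB))]
    return dict(zip(VOCAB, softmax(scores)))

probs = nearby_probabilities("33907")
print(probs)
```

After training on co-occurring codes, the rows of w_in are what gets kept as the word2vec output: one dense vector per DCC.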
Word2vec Output
Zero-padded time sequences (up to 256)
Embedding
RNN (up to 256) or 1-D CNN
Static vars: FC (up to 64)
Concatenate
Classifier (1..N classes)
Specialist as well as multi-disease networks examined
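The data flow of this architecture can be sketched as a forward pass in NumPy. This is not the authors' Keras model: the sequence length of 256 is the only size taken from the slide, while every other size, the plain tanh recurrence, the omission of bias terms, and the random weights are assumptions chosen to keep the sketch short.

```python
import numpy as np

MAX_LEN = 256                      # "up to 256" on the slide
VOCAB_SIZE, EMBED_DIM = 1000, 8    # assumed
RNN_UNITS, STATIC_DIM, FC_UNITS, N_CLASSES = 32, 5, 16, 4  # assumed

rng = np.random.default_rng(0)

def init(*shape):
    return rng.normal(scale=0.1, size=shape)

params = {
    "embed": init(VOCAB_SIZE + 1, EMBED_DIM),   # row 0 is reserved for padding
    "w_x": init(EMBED_DIM, RNN_UNITS),
    "w_h": init(RNN_UNITS, RNN_UNITS),
    "w_static": init(STATIC_DIM, FC_UNITS),
    "w_out": init(RNN_UNITS + FC_UNITS, N_CLASSES),
}

def forward(codes, static_vars, p=params):
    # 1. Zero-pad (or truncate) the code sequence to MAX_LEN.
    codes = list(codes)[-MAX_LEN:]
    padded = np.zeros(MAX_LEN, dtype=int)
    padded[:len(codes)] = codes
    # 2. Embedding lookup: (MAX_LEN,) -> (MAX_LEN, EMBED_DIM).
    embedded = p["embed"][padded]
    # 3. Simple recurrent layer over time, skipping padded steps.
    h = np.zeros(RNN_UNITS)
    for code, x_t in zip(padded, embedded):
        if code != 0:
            h = np.tanh(x_t @ p["w_x"] + h @ p["w_h"])
    # 4. Static variables go through their own fully connected layer (ReLU).
    static_features = np.maximum(0.0, static_vars @ p["w_static"])
    # 5. Concatenate both branches and classify, one sigmoid per class.
    merged = np.concatenate([h, static_features])
    return 1.0 / (1.0 + np.exp(-(merged @ p["w_out"])))

# Hypothetical member: a short code sequence plus five static variables.
scores = forward([20, 220, 575, 12, 700], np.array([1.0, 0.0, 34.0, 2.0, 0.5]))
print(scores)
```

Setting N_CLASSES to 1 gives a specialist network for a single disease; a larger value gives the multi-disease variant with one output per condition.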
The only server offering NVLink between CPUs and GPUs on Power Architecture
Swift for long term reference data sets, inputs and results
nvidia-docker with framework-enabled docker containers
We predominantly used Keras + Theano
GPU 0 GPU 1 GPU 2 GPU 3
Nvidia-docker (base docker image with device pass-through)
TensorFlow Theano Mxnet
docker registry
CUDA drivers
– The bare metal machine is loaded with CUDA drivers; then one installs docker and then nvidia-docker
– At this point, hundreds of open source DL-enabled containers become available for instant download
– At Optum, we already have an internal docker registry that we can utilize to store and manage internal images
Torch
GPU N
Command line or web / GUI access
Jupyter
ID    DCC Sequence                     Label
1     20 220 575 12 700 12 220 575
2     20 220 575 12 700 12 220 575
101   20 220 575 12 700 12 220 575
27 10 27 27 30 5 85 35 40 27 30 75 27 27 30 50
27 27 27 30 40 27 30 27 27 30 50