SLIDE 1 Workshop: Machine Learning and Deep Learning
Mark Hoffman PhD
Robert Hoyt MD, FACP, FAMIA, ABPM-CI
Kevin Lyman
www.aimed.events/northamerica-2019/
AIMed NORTH AMERICA, CALIFORNIA 11–14 DECEMBER 2019
SLIDE 2 Speaker #1 Mark Hoffman
- Presentation: The Promise and Perils of Real-World EHR Data
- Title: Chief Research Information Officer, Children’s Mercy
Hospital and Children’s Research Institute, Kansas City MO
- Bio: Dr Hoffman worked for Cerner Corp. for 16 years as Vice
President for Genomics and Research before joining Children’s Mercy Hospital in 2016. He is also faculty at the University of Missouri Kansas City and is the principal investigator on a CDC grant. His goal is to improve capabilities in genomics, public health and big data. He has delivered a TED talk and is an inventor with 19 issued patents.
SLIDE 3 Speaker #2 Robert Hoyt
- Presentation: Machine Learning for Non-Data Scientists
- Title: Associate Clinical Professor, Internal Medicine
Department, Virginia Commonwealth University, Richmond, VA
- Bio: Dr Hoyt has taught Health Informatics for many years
and is the co-editor and author of Health Informatics: Practical Guide, seventh edition. His second textbook, Introduction to Biomedical Data Science, will be published in December. His goal is to help educate clinicians and informatics students about new trends in data science, including machine learning and artificial intelligence.
SLIDE 4 Speaker #3 Kevin Lyman
- Presentation: Practical Applications in Clinical AI
- Title: CEO, Enlitic Corp., San Francisco, CA
- Bio: Kevin Lyman is an engineer and entrepreneur who
received a BS in Computer Science from RPI. Prior to working at Enlitic he was employed at Hasbro, SpaceX and Microsoft. As CEO of Enlitic, his focus is on integrating AI into Radiology workflow. Enlitic was twice named one of MIT Technology Review’s 50 smartest companies. He is also the founder of The Inventor’s Guild and is a highly sought-after speaker on AI.
SLIDE 5 Machine Learning for Non-Data Scientists
Robert Hoyt MD, FACP, FAMIA, ABPM-CI
SLIDE 6 Learning Objectives
After viewing, participants should be able to:
- Discuss the importance of machine learning for clinicians
- Enumerate the challenges of learning a programming
language such as R or Python for machine learning
- List some of the open source machine learning programs
that do not require higher math or programming skills
- Use RapidMiner as an example of ML software
SLIDE 7
Disclaimer
I have no conflicts of interest to report
SLIDE 8 Why clinicians should understand machine learning
- Machine learning is commonly employed for predictive
analytics, in addition to statistical approaches
- Some knowledge of ML is important in order to
intelligently read or review medical articles today
- Understanding ML is a logical step towards also
understanding deep learning and artificial intelligence
SLIDE 9
So You Want To Be a Data Scientist?
SLIDE 10 Machine Learning Challenges
- To learn machine learning by using a programming
language probably means 1-2 years of education and experience
- To fully understand AI implies comfort with calculus and
linear algebra
- Pre-requisites for some data science Master’s degrees
include a programming language and higher math
SLIDE 11 Machine Learning Challenges - Caveats
- Because 60-80% of the time spent by data scientists is
spent in data preparation/exploration, some knowledge of spreadsheets, visualization and biostats is mandatory
- You must understand little data before big data and
shallow learning before deep learning
- Machine learning software provides the algorithmic phase
of data analysis, but there is much more to know
- However, machine learning software promotes the
“democratization of data science”
SLIDE 12
Is This Our Current Status?
SLIDE 13 What is the Path Forward?
- Master’s in Data Science or Biomedical Data Science?
- Take multiple online courses on your own: Coursera,
Udacity, etc.?
- Learn Python or R?
- Use Machine Learning software?
SLIDE 14 Open Source or Free for Academic Use
Name | Dependency | Uniqueness | Limitations
WEKA | Windows, Mac, Linux | GUI based. Associated with courses and textbook | Outdated appearance
KNIME | Windows, Mac, Linux | Visual operators | Mild-moderate learning curve
Orange | Windows, Mac, Linux | Python based. Visual | Limited community forum
H2O.ai | Web-based | Advanced | Mild-moderate learning curve
BigML | Web-based | Advanced | Mild-moderate learning curve
BlueSky Statistics | Windows only | R based | Does not include neural networks
RapidMiner | Windows, Mac, Linux | Visual operators and GUI based. Automated analysis | None. “Best of breed”?
SLIDE 15 RapidMiner
- Web based. Free for academic use. Free 30-day trial;
after that, visual operators only
- Comprehensive: data preparation, visualization, statistics,
machine learning and deep learning
- Excellent algorithm performance metrics
- Automated steps: TurboPrep® and AutoModel®
- Runs multiple algorithms simultaneously
- Embedded help
- User community
SLIDE 16 RapidMiner
- Turbo Prep
  - Transform (filter, sort, split)
  - Clean (auto clean, PCA, normalize, remove low quality, highly correlated variables and duplicates, create dummy codes)
  - Merge datasets
  - Create pivot tables
- Extensive data visualization
- Extensions: NLP, DL, Stats, and link to Hadoop
- Extensive algorithm library
- Auto Model
  - Screens variables for quality
  - Select the column of interest and it selects the appropriate algorithms for classification or regression
  - Clustering (k-means, x-means)
  - Runs multiple algorithms at the same time
  - Output: AUC, accuracy, F score, sensitivity, specificity, precision, recall, classification errors
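For readers who want to see what the output measures listed above mean, here is a minimal Python sketch that derives several of them from the four cells of a binary confusion matrix. It is an illustration only, not RapidMiner's implementation: the function name `binary_metrics` and the example labels are made up for this sketch. AUC is left out because it requires predicted probabilities rather than 0/1 predictions.

```python
# Illustrative sketch only; `binary_metrics` is a hypothetical helper,
# not part of RapidMiner or any library.
def binary_metrics(y_true, y_pred):
    """Compute common classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # also called recall
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "classification_error": ratio(fp + fn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": sensitivity,
        "f_score": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Made-up example: 4 positive and 6 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that sensitivity and recall are the same quantity under two names, which is why both appear in the list with identical values.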
SLIDE 17
TurboPrep - General
SLIDE 18
TurboPrep - Transform
SLIDE 19
TurboPrep - Charts
SLIDE 20
TurboPrep - Cleanse
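Three of the cleaning steps named earlier for Turbo Prep (remove duplicates, normalize, create dummy codes) can be sketched in plain Python to show what the software automates. This is a rough illustration under stated assumptions, not how RapidMiner works internally: the function names and the tiny `patients` dataset are invented for the example, and min-max scaling is only one of several possible normalization methods.

```python
# Illustrative sketch only; all names and data here are hypothetical.
def drop_duplicates(rows):
    """Keep the first copy of each identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def min_max_normalize(rows, column):
    """Rescale a numeric column to the range 0..1 (one common choice)."""
    values = [row[column] for row in rows]
    low, span = min(values), max(values) - min(values)
    return [
        {**row, column: (row[column] - low) / span if span else 0.0}
        for row in rows
    ]


def dummy_code(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    levels = sorted({row[column] for row in rows})
    coded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        coded.append(new_row)
    return coded


# Made-up records; the third row is an exact duplicate of the second.
patients = [
    {"age": 40, "sex": "F"},
    {"age": 60, "sex": "M"},
    {"age": 60, "sex": "M"},
    {"age": 80, "sex": "F"},
]
cleaned = dummy_code(min_max_normalize(drop_duplicates(patients), "age"), "sex")
for row in cleaned:
    print(row)
```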
SLIDE 21
AutoModel - General
SLIDE 22
AutoModel - Predict
SLIDE 23
AutoModel - Select Class
SLIDE 24
AutoModel - Select Inputs
SLIDE 25
AutoModel - Select Algorithms
SLIDE 26
AutoModel - Results
SLIDE 27
AutoModel - Performance
SLIDE 28
AutoModel - Weights
SLIDE 29 Weighted Predictors
Naïve Bayes - Simulator
SLIDE 30
AutoModel - Descriptive Stats
SLIDE 31 AutoModel - Correlation Matrix
SLIDE 32
Processes Running in the Background
SLIDE 33 Clustering
Unsupervised Learning – Clustering
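To make the idea of clustering concrete, the following is a minimal k-means sketch in plain Python. It is a teaching illustration, not RapidMiner's algorithm: the function `k_means`, the fixed iteration count, and the choice to start from the first k points are assumptions made here for simplicity (real tools usually pick random starting points and stop on convergence). The x-means variant mentioned earlier, which also chooses the number of clusters, is not shown.

```python
import math


# Illustrative sketch only; `k_means` is a hypothetical helper.
def k_means(points, k, iterations=10):
    """Group points into k clusters by repeatedly averaging members."""
    # Assumption: use the first k points as starting centroids so the
    # result is reproducible.
    centroids = [points[i] for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Step 1: assign each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 2: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, assignments


# Made-up 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, assignments = k_means(points, k=2)
print(centroids)
print(assignments)
```

No outcome column is supplied, which is what makes this unsupervised: the groups emerge from distances between the points alone.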
SLIDE 34 Conclusions
- Machine learning software allows clinicians to use
supervised and unsupervised machine learning to model data, without programming languages or higher math
- Supplemental reading in stats, visualization,
performance, etc. is important
- Collaboration with experts is always advised