Designing with the Crowd: Heart Disease Diagnosis in the Data Science Bowl with Deep Learning

SLIDE 1

Aaron Sander Data Scientist

Designing with the Crowd: Heart Disease Diagnosis in the Data Science Bowl with Deep Learning

SLIDE 2

Data Science for Social Good

• Why did we start doing data science for social good?
  – Desire to show that data science can take on a variety of problems for Social Improvement
  – Consistent with many people’s core values
  – Chance to tackle a problem area regardless of the bottom-line business impact
  – Focus on the intrinsic value of the work

• 1st Data Science Bowl focused on Ocean Health
• 2nd Data Science Bowl focused on Cardiac Health

SLIDE 3

Crowdsourcing Platforms

• Kaggle crowdsourcing platform
  – Focus on Machine Learning
  – 600,000 data scientists across 200 countries
  – Motivations: Money, Pride, Challenge, and Feedback
  – Reputation for well-designed problems
  – World-class algorithm development
  – Real-time leaderboards
  – Forums rich with insights at each plateau in the competition
• Top Coder
  – Focus on Algorithm Development, UI Development, Design
• Algorithmia
  – Algorithm development for API and web

SLIDE 4

Signs of a Good Competition

• Lifecycle of a competition
• Encouraging people to compete and keep trying
  – Provide them a good start to engage the largest possible base
  – Answer questions and provide public guidance
  – Make sure that you lead them through an incomplete solution all the way to submission
  – Find the largest audience for the problem

SLIDE 5

Competition Pitfalls

• Understand platform limitations
  – Need a well-defined problem, a metric, and the right level of complexity
• Discourage Cheating and Avoid Data Leaks
  – Keep them solving your problem, not another problem
  – Avoid leaking test data into the training set or the ground truth
  – Eliminate potential proxies for the ground truth
  – Discard metadata that might not be present in the model’s production environment
• Encourage Openness
  – Dr. Arai specifically wanted to look for novel solutions, including solutions that may not include segmentation
  – By describing two approaches in the tutorials, one based on deep learning and another that relied on Fourier spatial-temporal decomposition, we enriched the final solutions
  – One of the top internal teams used a parametric approach, beating out some deep learning models
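The data-leak advice above can be sketched in a few lines. This is a minimal illustration, not the competition's actual tooling: the record layout and the field names `patient_id`, `pixels`, and `scanner_site` are hypothetical. It splits by patient so that no patient appears on both sides, and keeps only fields expected to exist in production.

```python
def split_by_patient(records, test_patient_ids):
    """Split records so every image of a given patient lands on exactly one side."""
    train = [r for r in records if r["patient_id"] not in test_patient_ids]
    test = [r for r in records if r["patient_id"] in test_patient_ids]
    return train, test


def check_no_patient_overlap(train, test):
    """Raise if any patient contributes data to both the training and test sets."""
    overlap = {r["patient_id"] for r in train} & {r["patient_id"] for r in test}
    if overlap:
        raise ValueError(f"patients leaked across the split: {sorted(overlap)}")


def strip_metadata(record, allowed_fields):
    """Keep only the fields that will also be available in production."""
    return {k: v for k, v in record.items() if k in allowed_fields}


# Hypothetical records: two images for patient 1, one image for patient 2.
records = [
    {"patient_id": 1, "pixels": [0, 1], "scanner_site": "A"},
    {"patient_id": 1, "pixels": [1, 1], "scanner_site": "A"},
    {"patient_id": 2, "pixels": [1, 0], "scanner_site": "B"},
]
train, test = split_by_patient(records, test_patient_ids={2})
check_no_patient_overlap(train, test)
train = [strip_metadata(r, {"patient_id", "pixels"}) for r in train]
```

Splitting on the patient rather than on the image is the design choice that matters here: images from one patient are highly correlated, so a per-image split would let the model see near-copies of test data.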

SLIDE 6

What is Deep Learning?

• Revitalization of the neural nets from the 1980s and 1990s
  – Key differences are the number of layers used in the network and the unsupervised pretraining phase
  – Increased computing power and GPU availability improved training times and enabled rapid experimentation
• An extremely effective method for creating models that work with large amounts of perception data (images, sounds, language)
• Large training datasets needed (sometimes)
• Encompasses a large range of networks, including recurrent neural nets (RNN) and convolutional neural nets (CNN) used in image recognition

SLIDE 7

Convolutional Neural Net

• Each hidden neuron applies the same localized linear filter to the input
• Convolutional layers convolve the input with a bank of filters and apply a point-wise non-linearity (usually a rectified linear unit, f(x) = max(0, x), or tanh)
• Weights can first be initialized from data in an unsupervised manner
• Trained using the backpropagation algorithm, which applies the chain rule repeatedly to adjust the neural net parameters to minimize the error
• Learns the visual feature elements instead of relying on manually created features of interest
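To make the "same localized linear filter" idea concrete, here is a minimal pure-Python sketch of one convolutional filter followed by a ReLU. It assumes a single channel, stride 1, and no padding, and it computes cross-correlation (no kernel flip), which is what most deep learning libraries call convolution. The image and kernel values are made up for illustration.

```python
def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return max(0.0, x)


def conv2d_relu(image, kernel):
    """Slide one kernel over the image (stride 1, no padding), then apply ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # The same kernel weights are reused at every (i, j) position.
            response = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(relu(response))
        output.append(row)
    return output


# Illustrative input: dark on the left, bright on the right.
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
# Hand-written kernel that responds to a dark-to-bright vertical edge.
edge_kernel = [[-1.0, 1.0]]
feature_map = conv2d_relu(image, edge_kernel)
print(feature_map)
```

In a trained network the kernel weights are learned by backpropagation rather than written by hand, and each layer holds a whole bank of such kernels.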

SLIDE 8

2016 Data Science Bowl Challenge: Predicting EF

SLIDE 9

2016 Data Science Bowl

• Challenge: automate calculation of the ejection fraction (EF) of the heart

  $$\mathrm{EF} = 100 \times \frac{\mathrm{EDV} - \mathrm{ESV}}{\mathrm{EDV}}$$

  (EDV = end-diastolic volume, ESV = end-systolic volume)
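As a quick worked example of the ejection fraction formula, the sketch below computes EF in percent from EDV and ESV. The function name and the input volumes are illustrative assumptions, not values from the competition data.

```python
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Return the ejection fraction in percent: 100 * (EDV - ESV) / EDV."""
    if edv_ml <= 0:
        raise ValueError("EDV must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("ESV must lie between 0 and EDV")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


# Illustrative volumes in mL (made up for the example).
print(ejection_fraction(150.0, 60.0))  # → 60.0
```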

• Data
  – 700 labeled examples provided for initial model development
  – 440 additional labeled examples provided in the final week of the competition
  – Labeled example = set of cardiac MRIs
    • Approximately 30 short-axis (SAX) slices per patient
    • Full cardiac cycle per SAX slice
    • Single 2-chamber and 4-chamber views also provided
• Challenging computer vision problem
  – Many images, few labels

SLIDE 10

2016 Data Science Bowl MRI Study

[Figure: example MRI study with panels SAX 10, SAX 11, SAX 12, SAX 15, 2-Chamber View, and 4-Chamber View]

SLIDE 11

2016 Data Science Bowl Techniques

• 1st and 3rd place finishers developed models for producing explicit segmentations of the left ventricle of the heart

  Single MRI Image ⟼ Segmented MRI Image

[Figure: U-Net architecture, similar to the 3rd place model [cite1]]
[Figure: Fully-convolutional architecture, similar to the 1st place model [cite2]]
[Figure: End result]
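The slides do not say how the segmentations were turned into volumes. One common approach, assumed here, is to sum the segmented area of each short-axis slice times the slice spacing, then take the largest and smallest volume over the cardiac cycle as EDV and ESV. All names, spacings, and masks below are hypothetical.

```python
def slice_area_mm2(mask, pixel_spacing_mm):
    """Area of the segmented region in one slice: pixel count times pixel area."""
    row_mm, col_mm = pixel_spacing_mm
    return sum(sum(row) for row in mask) * row_mm * col_mm


def volume_ml(masks, pixel_spacing_mm, slice_spacing_mm):
    """Sum area times slice spacing over the stack; 1 mL = 1000 mm^3."""
    total_area = sum(slice_area_mm2(m, pixel_spacing_mm) for m in masks)
    return total_area * slice_spacing_mm / 1000.0


def edv_esv(masks_by_frame, pixel_spacing_mm, slice_spacing_mm):
    """EDV is the largest volume over the cardiac cycle, ESV the smallest."""
    volumes = [
        volume_ml(frame, pixel_spacing_mm, slice_spacing_mm)
        for frame in masks_by_frame
    ]
    return max(volumes), min(volumes)


# Two time frames, each a stack of two tiny binary masks (1 = left ventricle).
frame_full = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
frame_contracted = [[[1, 1], [0, 0]], [[1, 1], [0, 0]]]
edv, esv = edv_esv(
    [frame_full, frame_contracted],
    pixel_spacing_mm=(10.0, 10.0),
    slice_spacing_mm=10.0,
)
print(edv, esv)
```

Real studies would need the pixel and slice spacing read from the image headers, and slices with missing or misordered positions handled explicitly.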

SLIDE 12

2016 Data Science Bowl Techniques

• 2nd place finisher created models that directly predict ejection fraction

  {MRI study} ⟼ EDV/ESV prediction

SLIDE 13

2016 Data Science Bowl Results

• Results of the top three teams were quite impressive, not only in the competition metric but also in metrics prevalent in the medical community
• Bland-Altman analysis demonstrates that DSB results are comparable with inter-observer error in human measurements
• Belenger et al. [cite] list the human inter-observer Bland-Altman limits of agreement (BA LoA) at 23.9 mL for heart failure cases and 7.35 mL for normal cases
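For readers unfamiliar with Bland-Altman analysis, the sketch below shows the usual calculation: the bias is the mean of the paired differences, and the limits of agreement are the bias plus or minus 1.96 sample standard deviations of those differences. The function name and the volume values are made up; they are not competition results.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - half_width, bias + half_width


# Illustrative volumes in mL: model predictions vs. a human reader (made up).
model_ml = [100, 120, 140, 160]
reader_ml = [98, 123, 139, 162]
bias, lower, upper = bland_altman(model_ml, reader_ml)
print(f"bias={bias:.2f} mL, limits of agreement=({lower:.2f}, {upper:.2f}) mL")
```

Comparing the width of these limits for model-versus-reader against the width for reader-versus-reader is what allows a statement like "comparable with inter-observer error".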

SLIDE 14

Error Characteristics of Top Submissions

SLIDE 15

Conclusions

• We obtained three state-of-the-art solutions to our problem with fundamentally different architectures, approaches, and performance trade-offs
• These allow the client to choose the best combination of
  – Ease of implementation
  – Simplicity of the model
  – Scalability to the problem in the wild
• Status:
  – Winning algorithms are currently being developed for use in Dr. Arai's clinical research lab

SLIDE 16

Partners

• Dr. Andrew Arai and Dr. Michael Hansen
  – Provided the challenge, guidance, and countless hours of their time to help participants understand the context of the challenge
• Corporate Sponsors

SLIDE 17

Thank you!

Email: Sander_Aaron@bah.com

Submit your ideas to DataScienceBowl@bah.com for the next challenge.