Designing with the Crowd: Heart Disease Diagnosis in the Data Science Bowl with Deep Learning
Aaron Sander, Data Scientist
Data Science for Social Good
• Why did we start doing data science for social good?
  – Desire to show that data science can take on a variety of problems for social improvement
  – Consistent with many people's core values
  – Chance to tackle a problem area regardless of the bottom-line business impact
  – Focus on the intrinsic value of the work
• 1st Data Science Bowl focused on ocean health
• 2nd Data Science Bowl focused on cardiac health
Crowdsourcing Platforms
• Kaggle crowdsourcing platform
  – Focus on machine learning
  – 600,000 data scientists across 200 countries
  – Motivations: money, pride, challenge, and feedback
  – Reputation for well-designed problems
  – World-class algorithm development
  – Real-time leaderboards
  – Forums rich with insights at each plateau in the competition
• Top Coder
  – Focus on algorithm development, UI development, design
• Algorithmia
  – Algorithm development for API and web
Signs of a Good Competition
• Lifecycle of a competition
• Encouraging people to compete and keep trying
  – Provide them a good start to engage the largest possible base
  – Answer questions and provide public guidance
  – Make sure that you lead them through an incomplete solution to submission
  – Find the largest audience for the problem
Competition Pitfalls
• Understand platform limitations
  – Need a well-defined problem, metric, and the right level of complexity
• Discourage cheating and avoid data leaks
  – Keep them solving your problem, not another problem
  – Avoid leaking test data into the training set or the ground truth
  – Eliminate potential proxies for the ground truth
  – Discard metadata that might not be present in the model's production environment
• Encourage openness
  – Dr. Arai specifically wanted to look for novel solutions, including solutions that may not include segmentation
  – By describing two approaches in the tutorials, one based on deep learning and another that relied on Fourier spatial-temporal decomposition, we enriched the final solutions
  – One of the top internal teams used a parametric approach, beating out some deep learning models
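The data-leak points on this slide can be made concrete with a minimal sketch. It assumes each record is a dict with a hypothetical `patient_id` field; the field names, the 20% test fraction, and the helper names are illustrative assumptions, not the competition's actual pipeline. The idea is to split by patient rather than by image, so no patient contributes to both sets, and to keep only the fields that would exist in production.

```python
import random

def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split records so that no patient appears in both train and test."""
    patient_ids = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [r for r in records if r["patient_id"] not in test_ids]
    test = [r for r in records if r["patient_id"] in test_ids]
    return train, test

def strip_fields(record, allowed=("patient_id", "pixels")):
    """Drop metadata that would not be available in production."""
    return {k: v for k, v in record.items() if k in allowed}

# Hypothetical toy data: 10 patients, 3 slices each.
records = [
    {"patient_id": p, "slice": s, "pixels": None, "scanner_note": "x"}
    for p in range(10)
    for s in range(3)
]
train, test = split_by_patient(records)
overlap = {r["patient_id"] for r in train} & {r["patient_id"] for r in test}
print(len(train), len(test), overlap)
```

Splitting on the patient identifier rather than on individual images is the design choice that matters here: slices from one patient are highly correlated, so a per-image split would let near-duplicates of test images into training.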
What is Deep Learning?
• Revitalization of the neural nets from the 1980s and 1990s
  – Key difference is the number of layers used in the network and the unsupervised pretraining phase
  – Increased computing power and GPU availability improved training times and enabled rapid experimentation
• An extremely effective method for creating models that work with large amounts of perception data (images, sounds, language)
• Large training datasets needed (sometimes)
• Encompasses a large range of networks, including recurrent neural nets (RNN) and convolutional neural nets (CNN) used in image recognition
Convolutional Neural Net
• Each hidden neuron applies the same localized linear filter to the input
• Convolutional layers convolve the input with a bank of filters and apply a point-wise non-linearity (usually a rectified linear unit, f(x) = max(0, x), or tanh)
• First initialized with data in an unsupervised manner
• Trained using a backpropagation algorithm that applies the chain rule repeatedly to adjust the neural net parameters and minimize the error
• Learns the visual feature elements instead of relying on manually created features of interest
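The first two bullets can be sketched in a few lines of plain Python. This is an illustration only, not code from any of the competition entries: one shared filter is slid over a toy image (no padding, stride 1) and followed by the ReLU non-linearity. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation. The function names and the toy image are assumptions made for the example.

```python
def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return max(0.0, x)

def conv2d_valid(image, kernel):
    """Apply one shared filter at every position, then a point-wise ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            row.append(relu(s))
        out.append(row)
    return out

# Toy 4x4 image with a dark-to-bright vertical edge in the middle.
image = [[0, 0, 1, 1] for _ in range(4)]
# Filter that responds to dark-to-bright transitions from left to right.
edge_filter = [[-1, 1], [-1, 1]]
feature_map = conv2d_valid(image, edge_filter)
print(feature_map)
```

The feature map is non-zero only where the window straddles the edge. In a trained CNN the filter weights are learned by backpropagation rather than written by hand.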
2016 Data Science Bowl Challenge: Predicting EF
2016 Data Science Bowl
• Challenge: automate calculation of the ejection fraction (EF) of the heart

  $$\mathrm{EF} = 100 \times \frac{\mathrm{EDV} - \mathrm{ESV}}{\mathrm{EDV}}$$

  (EDV = end-diastolic volume, ESV = end-systolic volume)
• Data
  – 700 labeled examples provided for initial model development
  – 440 additional labeled examples provided in the final week of the competition
  – Labeled example = set of cardiac MRIs
    - Approximately 30 short-axis (SAX) slice positions per patient
    - Full cardiac cycle per SAX slice
    - Single 2-chamber and 4-chamber views also provided
• Challenging computer vision problem
  – Many images, few labels
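The ejection fraction defined on this slide is simple to compute once the two volumes are known. A minimal sketch follows; the function name and the input checks are assumptions, and the example volumes are illustrative, not taken from the competition data.

```python
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Return the ejection fraction in percent from EDV and ESV in mL."""
    if edv_ml <= 0:
        raise ValueError("EDV must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("ESV must lie between 0 and EDV")
    return 100.0 * (edv_ml - esv_ml) / edv_ml

# Illustrative volumes only.
ef = ejection_fraction(edv_ml=120.0, esv_ml=50.0)
print(round(ef, 1))
```

The hard part of the challenge is estimating EDV and ESV from the MRI study, not this final arithmetic step.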
2016 Data Science Bowl MRI Study
[Figure: example MRI study with panels SAX 10, SAX 11, SAX 12, SAX 15, 2-chamber view, and 4-chamber view]
2016 Data Science Bowl Techniques
• 1st and 3rd place finishers developed models for producing explicit segmentations of the left ventricle of the heart
  – Single MRI image ⟼ segmented MRI image
[Figure: U-Net architecture, similar to 3rd place model [cite1]]
[Figure: fully convolutional architecture, similar to 1st place model [cite2]]
[Figure: end result]
2016 Data Science Bowl Techniques
• 2nd place finisher created models that directly predict ejection fraction
  – {MRI study} ⟼ EDV/ESV prediction
2016 Data Science Bowl Results
• Results of the top three teams were quite impressive, not only in the competition metric but also in metrics prevalent in the medical community
• Bland-Altman analysis demonstrates that DSB results are comparable with inter-observer error in human measurements
• Belenger et al [cite] list human interobserver Bland-Altman limits of agreement (BA LoA) at 23.9 mL for heart failure cases, and 7.35 mL for normal cases
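For readers unfamiliar with Bland-Altman analysis, a minimal sketch follows. It uses the conventional definition of the limits of agreement, the mean difference plus or minus 1.96 standard deviations of the differences. The function name and the volumes are illustrative assumptions, not competition results.

```python
from statistics import mean, stdev

def bland_altman(a, b):
    """Return (bias, lower LoA, upper LoA) for paired measurements."""
    if len(a) != len(b):
        raise ValueError("measurements must be paired")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample standard deviation
    return bias, bias - half_width, bias + half_width

# Illustrative volumes in mL: model predictions vs. a human reader.
model = [100.0, 110.0, 120.0, 130.0]
reader = [98.0, 112.0, 119.0, 131.0]
bias, lower, upper = bland_altman(model, reader)
print(bias, lower, upper)
```

Comparing the width of these limits for model-versus-human against human-versus-human is what lets the model error be placed next to inter-observer error.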
Error Characteristics of Top Submissions
Conclusions
• We obtained three state-of-the-art solutions to our problem with fundamentally different architectures, approaches, and performance trade-offs
• These allow the client to choose the best combination of
  – Ease of implementation
  – Simplicity of the model
  – Scalability to the problem in the wild
• Status:
  – Winning algorithms are currently being developed for application in the clinical research lab of Dr. Arai
Partners
• Dr. Andrew Arai and Dr. Michael Hansen
  – Provided the challenge, guidance, and countless hours of their time to help participants understand the context of the challenge