Designing with the Crowd: Heart Disease Diagnosis in the Data Science Bowl with Deep Learning
Aaron Sander, Data Scientist
Data Science for Social Good
• Why did we start doing data science for social good?
  – Desire to show that data science can take on a variety of problems for social improvement
  – Consistent with many people's core values
  – Chance to tackle a problem area regardless of the bottom-line business impact
  – Focus on the intrinsic value of the work
• 1st Data Science Bowl focused on ocean health
• 2nd Data Science Bowl focused on cardiac health
Crowdsourcing Platforms
• Kaggle crowdsourcing platform
  – Focus on machine learning
  – 600,000 data scientists across 200 countries
  – Motivations: money, pride, challenge, and feedback
  – Reputation for well-designed problems
  – World-class algorithm development
  – Real-time leaderboards
  – Forums rich with insights at each plateau in the competition
• Top Coder
  – Focus on algorithm development, UI development, design
• Algorithmia
  – Algorithm development for API and web
Signs of a Good Competition
• Lifecycle of a competition
• Encouraging people to compete and keep trying
  – Provide them a good start to engage the largest possible base
  – Answer questions and provide public guidance
  – Make sure that you lead them from an incomplete solution through to a submission
  – Find the largest audience for the problem
Competition Pitfalls
• Understand platform limitations
  – Need a well-defined problem, a metric, and the right level of complexity
• Discourage cheating and avoid data leaks
  – Keep them solving your problem, not another problem
  – Avoid leaking test data into the training set or the ground truth
  – Eliminate potential proxies for the ground truth
  – Discard metadata that might not be present in the model's production environment
• Encourage openness
  – Dr. Arai specifically wanted to look for novel solutions, including solutions that may not include segmentation
  – By describing two approaches in the tutorials, one based on deep learning and another that relied on Fourier spatial-temporal decomposition, we enriched the final solutions
  – One of the top internal teams used a parametric approach, beating out some deep learning models
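One way to guard against the train/test leak mentioned above is to split by patient rather than by image, so that no patient contributes images to both sides. The following is a minimal sketch under that assumption; the `patient_id` field, the function names, and the hashing scheme are hypothetical illustrations, not the procedure actually used for the competition.

```python
import hashlib


def assign_split(patient_id, test_fraction=0.2):
    """Deterministically map a patient to 'train' or 'test' (hypothetical scheme)."""
    digest = hashlib.sha256(patient_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform-ish value in [0, 1)
    return "test" if bucket < test_fraction else "train"


def split_records(records, test_fraction=0.2):
    """Split records so that all records of one patient land on the same side."""
    train, test = [], []
    for record in records:
        side = assign_split(record["patient_id"], test_fraction)
        (test if side == "test" else train).append(record)
    return train, test


def leaked_patients(train, test):
    """Return the patient IDs that appear in both sets (should be empty)."""
    return {r["patient_id"] for r in train} & {r["patient_id"] for r in test}


# Illustrative data: 50 made-up patients with 3 records each.
records = [{"patient_id": f"p{i}", "slice": s} for i in range(50) for s in range(3)]
train, test = split_records(records)
print(len(train), len(test), leaked_patients(train, test))
```

Because the assignment depends only on the patient ID, re-running the split later, or adding new records for an existing patient, cannot move that patient across the boundary.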
What is Deep Learning?
• Revitalization of the neural nets from the 1980s and 1990s
  – Key difference is the number of layers used in the network and the unsupervised pretraining phase
  – Increased computing power and GPU availability improved training times and enabled rapid experimentation
• An extremely effective method for creating models that work with large amounts of perception data (images, sounds, language)
• Large training datasets needed (sometimes)
• Encompasses a large range of networks, including recurrent neural nets (RNN) and the convolutional neural nets (CNN) used in image recognition
Convolutional Neural Net
• Each hidden neuron applies the same localized linear filter to the input
• Convolutional layers convolve the input with a bank of filters and apply a point-wise non-linearity (usually a rectified linear unit, f(x) = max(0, x), or tanh)
• First initialized with data in an unsupervised manner
• Trained using a backpropagation algorithm that just applies the chain rule multiple times to adjust the neural net parameters to minimize the error
• Learns the visual feature elements instead of relying on manually created features of interest
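To make "the same localized linear filter" concrete, here is a minimal pure-Python sketch of one convolutional filter followed by a ReLU. The image, the kernel, and the function names are illustrative assumptions; real frameworks use optimized tensor operations and many filters per layer.

```python
def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return max(0.0, x)


def conv2d_valid(image, kernel):
    """Slide one kernel over the image (stride 1, no padding), then apply ReLU.

    As in most deep learning libraries, the kernel is not flipped, so this is
    technically a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(relu(acc))
        output.append(row)
    return output


# Toy image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [[-1, 1]]  # responds where intensity rises from left to right
feature_map = conv2d_valid(image, edge_kernel)
print(feature_map)  # → [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
```

The same two weights are reused at every position, which is why the filter fires at the edge in every row. In a trained network these weights are learned by backpropagation rather than written by hand.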
2016 Data Science Bowl Challenge: Predicting EF
2016 Data Science Bowl
• Challenge: automate calculation of the ejection fraction of the heart
  – $\mathrm{EF} = 100 \times \dfrac{\mathrm{EDV} - \mathrm{ESV}}{\mathrm{EDV}}$ (EDV = end-diastolic volume, ESV = end-systolic volume)
• Data
  – 700 labeled examples provided for initial model development
  – 440 additional labeled examples provided in the final week of the competition
  – Labeled example = set of cardiac MRIs
    • Approximately 30 short-axis (SAX) slice positions per patient
    • Full cardiac cycle per SAX slice
    • Single 2-chamber and 4-chamber views also provided
• Challenging computer vision problem: many images, few labels
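The ejection fraction definition above translates directly into code. A minimal sketch; the function name and the example volumes are illustrative, not values from the competition data.

```python
def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent: 100 * (EDV - ESV) / EDV."""
    if edv_ml <= 0:
        raise ValueError("EDV must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("ESV must lie between 0 and EDV")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


# Illustrative volumes in mL (made up for this example).
print(ejection_fraction(100.0, 40.0))  # → 60.0
```

Because EF is a ratio of two estimated volumes, errors in EDV and ESV both propagate into it, which is one reason the volumes themselves are also worth evaluating.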
2016 Data Science Bowl MRI Study
[Figure: example MRI study with panels SAX 10, SAX 11, SAX 12, SAX 15, 2 Chamber View, and 4 Chamber View]
2016 Data Science Bowl Techniques
• 1st and 3rd place finishers developed models for producing explicit segmentations of the left ventricle of the heart
  – Single MRI image ⟼ segmented MRI image
  – U-Net architecture: similar to 3rd place model [cite1]
  – Fully-convolutional architecture: similar to 1st place model [cite2]
[Figure: U-Net architecture, fully-convolutional architecture, and end result]
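The slide does not spell out how a segmentation becomes a volume. A common approach, assumed here rather than taken from the winning solutions, is to sum the segmented area over the stacked slices and multiply by the slice spacing. In the sketch below the masks, the spacings, and the function names are all hypothetical.

```python
def mask_area_mm2(mask, pixel_spacing_mm):
    """Area of a binary mask (rows of 0/1 values) in square millimetres."""
    pixel_count = sum(sum(row) for row in mask)
    return pixel_count * pixel_spacing_mm[0] * pixel_spacing_mm[1]


def volume_ml(masks, pixel_spacing_mm, slice_spacing_mm):
    """Approximate volume by stacking slice areas (assumes uniform slice spacing)."""
    total_area = sum(mask_area_mm2(m, pixel_spacing_mm) for m in masks)
    volume_mm3 = total_area * slice_spacing_mm
    return volume_mm3 / 1000.0  # 1 mL = 1000 mm^3


# Two made-up 2x2 masks, 1 mm x 1 mm pixels, 10 mm between slices.
masks = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
print(volume_ml(masks, (1.0, 1.0), 10.0))
```

Applying this to the masks at end diastole and at end systole would give the EDV and ESV estimates that feed the ejection fraction.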
2016 Data Science Bowl Techniques
• 2nd place finisher created models that directly predict ejection fraction
  – {MRI study} ⟼ EDV/ESV prediction
2016 Data Science Bowl Results
• Results of the top three teams were quite impressive, not only in the competition metric but also in metrics prevalent in the medical community
• Bland-Altman analysis demonstrates that the DSB results are comparable with inter-observer error in human measurements
• Belenger et al. [cite] list human inter-observer Bland-Altman limits of agreement (BA LoA) at 23.9 mL for heart failure cases and 7.35 mL for normal cases
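For readers unfamiliar with Bland-Altman analysis, the sketch below computes the bias (mean difference) and the conventional 95% limits of agreement, bias ± 1.96 standard deviations of the paired differences. The function name and the sample volumes are illustrative assumptions, not competition results.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    spread = 1.96 * statistics.stdev(diffs)  # sample standard deviation
    return bias, bias - spread, bias + spread


# Made-up volumes in mL: model predictions vs. manual measurements.
model_ml = [100, 110, 120, 130]
manual_ml = [98, 112, 118, 132]
bias, lower, upper = bland_altman(model_ml, manual_ml)
print(bias, lower, upper)
```

Narrow limits centred near zero mean the two methods can be used interchangeably for practical purposes, which is the sense in which a model can be compared with a second human observer.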
Error Characteristics of Top Submissions
Conclusions
• We obtained three state-of-the-art solutions to our problem with fundamentally different architectures, approaches, and performance trade-offs
• These allow the client to choose the best combination of
  – Ease of implementation
  – Simplicity of the model
  – Scalability to the problem in the wild
• Status:
  – Winning algorithms are currently being developed for use in the clinical research lab of Dr. Arai
Partners
• Dr. Andrew Arai and Dr. Michael Hansen
  – Provided the challenge, guidance, and countless hours of their time to help participants understand the context of the challenge
• Corporate sponsors
Thank you!
Email: Sander_Aaron@bah.com
Submit your ideas for the next challenge to DataScienceBowl@bah.com.