
Designing with the Crowd: Heart Disease Diagnosis in the Data Science Bowl with Deep Learning



  1. Designing with the Crowd: Heart Disease Diagnosis in the Data Science Bowl with Deep Learning. Aaron Sander, Data Scientist

  2. Data Science for Social Good
     - Why did we start doing data science for social good?
       - Desire to show that data science can take on a variety of problems for social improvement
       - Consistent with many people's core values
       - Chance to tackle a problem area regardless of the bottom-line business impact
       - Focus on the intrinsic value of the work
     - 1st Data Science Bowl focused on ocean health
     - 2nd Data Science Bowl focused on cardiac health

  3. Crowdsourcing Platforms
     - Kaggle crowdsourcing platform
       - Focus on machine learning
       - 600,000 data scientists across 200 countries
       - Motivations: money, pride, challenge, and feedback
       - Reputation for well-designed problems
       - World-class algorithm development
       - Real-time leaderboards
       - Forums rich with insights at each plateau in the competition
     - Topcoder
       - Focus on algorithm development, UI development, and design
     - Algorithmia
       - Algorithm development for API and web

  4. Signs of a Good Competition
     - Lifecycle of a competition
     - Encouraging people to compete and keep trying
       - Provide them a good start to engage the largest possible base
       - Answer questions and provide public guidance
       - Make sure that you lead them from an incomplete solution all the way through to a submission
       - Find the largest audience for the problem

  5. Competition Pitfalls
     - Understand platform limitations
       - Need a well-defined problem, a metric, and the right level of complexity
     - Discourage cheating and avoid data leaks
       - Keep them solving your problem, not another problem
       - Avoid leaking test data into the training set or the ground truth
       - Eliminate potential proxies for the ground truth
       - Discard metadata that might not be present in the model's production environment
     - Encourage openness
       - Dr. Arai specifically wanted to look for novel solutions, including solutions that may not include segmentation
       - By describing two approaches in the tutorials, one using deep learning and another relying on Fourier spatial-temporal decomposition, we enriched the final solutions
       - One of the top internal teams used a parametric approach, beating out some deep learning models
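
The leak-avoidance advice on this slide can be illustrated with a minimal sketch. It assumes each study carries a patient identifier and a flat metadata record; every name here (`shared_ids`, `drop_columns`, `scanner_serial`, the sample IDs) is hypothetical and not taken from the competition data.

```python
def shared_ids(train_ids, test_ids):
    """Return identifiers that appear in both splits (a sign of test data leaking into training)."""
    return sorted(set(train_ids) & set(test_ids))


def drop_columns(record, forbidden):
    """Return a copy of a metadata record without fields that could act as label proxies
    or that would not exist in the production environment."""
    return {key: value for key, value in record.items() if key not in forbidden}


# Hypothetical identifiers, made up for illustration only.
train_ids = ["p001", "p002", "p003"]
test_ids = ["p003", "p004"]
leaked = shared_ids(train_ids, test_ids)

# Hypothetical metadata record; "scanner_serial" stands in for a field unavailable in production.
record = {"patient_id": "p001", "scanner_serial": "X-17", "age": 54}
clean = drop_columns(record, {"scanner_serial"})

print(leaked)
print(clean)
```

A check like this only catches exact identifier overlap; proxies for the ground truth usually need a manual review of each metadata field.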

  6. What is Deep Learning?
     - Revitalization of the neural nets from the 1980s and 1990s
       - Key difference is the number of layers used in the network and the unsupervised pretraining phase
       - Increased computing power and GPU availability improved training times and enabled rapid experimentation
     - An extremely effective method for creating models that work with large amounts of perception data (images, sounds, language)
     - Large training datasets needed (sometimes)
     - Encompasses a large range of networks, including recurrent neural nets (RNNs) and the convolutional neural nets (CNNs) used in image recognition

  7. Convolutional Neural Net
     - Each hidden neuron applies the same localized linear filter to the input
     - Convolutional layers convolve the input with a bank of filters and apply a point-wise non-linearity (usually a rectified linear unit, f(x) = max(0, x), or tanh)
     - First initialized with data in an unsupervised manner
     - Trained using a backpropagation algorithm that just applies the chain rule multiple times to adjust the neural net parameters to minimize the error
     - Learns the visual feature elements instead of relying on manually created features of interest
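
To make the convolutional layer on this slide concrete, here is a minimal pure-Python sketch of one forward pass: a bank of filters slid over an image, followed by the point-wise non-linearity f(x) = max(0, x). The function names, toy image, and filters are illustrative assumptions. As in most deep learning libraries, the "convolution" is implemented as cross-correlation (the kernel is not flipped).

```python
def conv2d_valid(image, kernel):
    """Slide one filter over the image (stride 1, no padding) and return its response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Apply f(x) = max(0, x) to every element of a feature map."""
    return [[max(0.0, value) for value in row] for row in feature_map]


def conv_layer(image, filter_bank):
    """Apply every filter in the bank to the same image, then the non-linearity."""
    return [relu(conv2d_valid(image, kernel)) for kernel in filter_bank]


# Toy 3x4 image with a vertical dark-to-bright edge (made-up values).
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
filter_bank = [
    [[-1.0, 1.0]],  # responds to dark-to-bright transitions, left to right
    [[1.0, -1.0]],  # responds to bright-to-dark transitions
]
feature_maps = conv_layer(image, filter_bank)
for feature_map in feature_maps:
    print(feature_map)
```

The first filter fires only at the edge column, while the second filter's negative response is zeroed by the rectifier. In a trained network the filter values are learned by backpropagation rather than written by hand.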

  8. 2016 Data Science Bowl Challenge: Predicting EF

  9. 2016 Data Science Bowl
     - Challenge: automate calculation of the ejection fraction (EF) of the heart, $EF = 100 \times \frac{EDV - ESV}{EDV}$ (EDV = end-diastolic volume, ESV = end-systolic volume)
     - Data
       - 700 labeled examples provided for initial model development
       - 440 additional labeled examples provided in the final week of the competition
       - Labeled example = set of cardiac MRIs
         - Approximately 30 short-axis (SAX) positions per patient
         - Full cardiac cycle per SAX slice
         - Single 2-chamber and 4-chamber views also provided
     - Challenging computer vision problem: many images, few labels
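
As a worked example of the ejection fraction definition on this slide, the sketch below computes EF from a pair of volumes. The function name and the sample volumes are illustrative assumptions, not competition data.

```python
def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent: EF = 100 * (EDV - ESV) / EDV."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("end-systolic volume must lie between 0 and the end-diastolic volume")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


# Made-up volumes in mL, for illustration only.
ef = ejection_fraction(edv_ml=150.0, esv_ml=60.0)
print(ef)  # 60.0
```

Because EF is a ratio of the two volumes, an error in either predicted volume propagates into the final figure, which is why both EDV and ESV have to be estimated well.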

  10. 2016 Data Science Bowl MRI Study
     - [Figure: example MRI study with panels SAX 10, SAX 11, SAX 12, SAX 15, 2-Chamber View, and 4-Chamber View]

  11. 2016 Data Science Bowl Techniques
     - 1st and 3rd place finishers developed models for producing explicit segmentations of the left ventricle of the heart
       - Single MRI image ⟼ segmented MRI image
     - [Figure: architecture and end result]
       - U-Net architecture: similar to 3rd place model [cite1]
       - Fully convolutional architecture: similar to 1st place model [cite2]

  12. 2016 Data Science Bowl Techniques
     - 2nd place finisher created models that directly predict ejection fraction
       - {MRI study} ⟼ EDV/ESV prediction

  13. 2016 Data Science Bowl Results
     - Results of the top three teams were quite impressive, not only in the competition metric but also in metrics prevalent in the medical community
     - Bland-Altman analysis demonstrates that the DSB results are comparable with inter-observer error in human measurements
     - Belenger et al. [cite] list the human inter-observer Bland-Altman limits of agreement (BA LoA) at 23.9 mL for heart failure cases and 7.35 mL for normal cases
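
For readers unfamiliar with Bland-Altman analysis, here is a minimal sketch of how the bias and limits of agreement are commonly computed: the mean of the paired differences plus or minus 1.96 sample standard deviations. The function name and the paired volumes are made up for illustration; the slides do not give the presenters' exact procedure.

```python
from statistics import mean, stdev


def bland_altman(measure_a, measure_b):
    """Return (bias, lower limit, upper limit) for two paired measurement series."""
    if len(measure_a) != len(measure_b):
        raise ValueError("the two series must be paired")
    differences = [a - b for a, b in zip(measure_a, measure_b)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)  # sample standard deviation of the differences
    return bias, bias - spread, bias + spread


# Hypothetical paired volumes in mL (e.g. model output vs. a human reader).
model_ml = [100.0, 120.0, 140.0, 160.0]
reader_ml = [98.0, 123.0, 139.0, 164.0]
bias, lower, upper = bland_altman(model_ml, reader_ml)
print(f"bias = {bias:.2f} mL, limits of agreement = [{lower:.2f}, {upper:.2f}] mL")
```

Narrower limits of agreement mean the two methods disagree less, which is the sense in which model results can be compared with human inter-observer variation.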

  14. Error Characteristics of Top Submissions

  15. Conclusions
     - We obtained three state-of-the-art solutions to our problem with fundamentally different architectures, approaches, and performance trade-offs
     - These allow the client to choose the best combination of
       - Ease of implementation
       - Simplicity of the model
       - Scalability to the problem in the wild
     - Status
       - Winning algorithms are currently being developed for use in the clinical research lab of Dr. Arai

  16. Partners
     - Dr. Andrew Arai and Dr. Michael Hansen
       - Provided the challenge, guidance, and countless hours of their time to help participants understand the context of the challenge
     - Corporate sponsors

  17. Thank you!
     - Email: Sander_Aaron@bah.com
     - Submit your ideas for the next challenge to DataScienceBowl@bah.com.
