BIOPHOTONICS With PredictionIO, Spark and Deep Learning




slide-1
SLIDE 1

BIOPHOTONICS

With PredictionIO, Spark and Deep Learning

ApacheCon Big Data North America May 2017, Miami, USA

Prajod Vettiyattil, Architect, Wipro @prajods https://in.linkedin.com/in/prajod

slide-2
SLIDE 2

ABOUT ME

  • Architect at Wipro
  • Big Data division of Open Source Solutions team
  • Machine Learning
  • Video Analytics
  • Platform design and implementation
  • Domain solutions
  • Spark, Java, Python, DL4J, TensorFlow

Biophotonics using Apache PredictionIO, Spark and Deep Learning #apacheconbigdata @prajods 2

slide-3
SLIDE 3

AGENDA

  • Biophotonics
  • Applications
  • PredictionIO
  • Apache Spark
  • DeepLearning4J and TensorFlow
  • Cell detection process
  • Deep learning and CNN
  • Solution Architecture


slide-4
SLIDE 4

SESSION OVERVIEW

In 4 slides


slide-5
SLIDE 5

APPLICATIONS


  • Self driving cars
  • Robots
  • Drones
  • Industrial automation
  • Physical security
  • Medical labs
  • Wherever images or videos are used
slide-6
SLIDE 6

SESSION OVERVIEW

  • Need in the healthcare domain
  • Speed up and automate cell detection, counting and analysis
  • Diagnosis
  • Medical research
  • Solution
  • Train a Deep Learning Model using digital images of living cells
  • Recognize test images with high accuracy
  • Technology used
  • Training process


slide-7
SLIDE 7

CLASSIFICATION NEED

[Figure: input from the microscope and the expected output]


slide-8
SLIDE 8

CLASSIFIED OUTPUT


slide-9
SLIDE 9

BACKGROUND

How it's done


slide-10
SLIDE 10

INTRODUCTION

  • Photonics: the study and harnessing of light
  • The World of Small Things
  • Microscopic life
  • High end microscopes
  • Data set scarcity
  • Accessibility


Ref: nigms.nih.gov

slide-11
SLIDE 11

LIVE CELL IMAGING


slide-12
SLIDE 12

CONFOCAL MICROSCOPE


  • Very high resolution
  • Spatial features
slide-13
SLIDE 13

IMAGE COMPARISON


Ref: meyerinst.com

slide-14
SLIDE 14

ELECTRON MICROSCOPE


Ref: emc.sc.edu

slide-15
SLIDE 15

What to do with all these images of micro stuff?


slide-16
SLIDE 16

Spend hours peering through the lens?


Ref: wisegeek.org

slide-17
SLIDE 17
  • Even then
  • How many cells can one count in a minute?
  • How accurately can we visually differentiate between bacterium A and bacterium B?
  • How many patient blood samples can one analyze in an hour?
  • Can a doctor detect all abnormalities with an endoscope?
  • How accurate is human visual diagnosis?


slide-18
SLIDE 18

AUTOMATED ANALYSIS OF CELLS

  • Detection of cells
  • Count cells
  • Distinguish cell A vs cell B
  • Detect physical abnormalities
  • Cell lifecycle analysis
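
As a rough illustration of the "count cells" step, here is a minimal sketch that counts connected blobs in a binary mask. It assumes the image has already been segmented into foreground (1) and background (0); the function name `count_cells` and the toy mask are hypothetical, and the talk does not say which counting method was actually used.

```python
from collections import deque

def count_cells(mask):
    """Count 4-connected foreground regions in a binary mask (list of lists of 0/1)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Found an unvisited foreground pixel: it starts a new region.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

# Hypothetical toy mask with three separate blobs.
toy_mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
print(count_cells(toy_mask))
```

Touching or overlapping cells would be merged into one region by this approach, which is one reason a learned model can be preferable to plain blob counting.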


slide-19
SLIDE 19

TECHNOLOGY


slide-20
SLIDE 20

PREDICTIONIO

  • Simplifies Machine Learning projects
  • Data storage
  • Training
  • Evaluate models
  • Deploy models
  • Serving predictions


slide-21
SLIDE 21

PREDICTIONIO


  • DASE architecture
  • Data
  • Algorithm
  • Serving
  • Evaluation
slide-22
SLIDE 22

PREDICTIONIO

  • Ready-made ML templates
  • Classification
  • Regression
  • Recommendation
  • NLP
  • Clustering
  • Similarity


slide-23
SLIDE 23

PREDICTIONIO: LOGICAL VIEW

Diagram components:
  • Client application
  • PredictionIO
  • Event Server
  • Evaluator
  • Serving Engine
  • Training Engine
  • Storage
  • Other components

slide-24
SLIDE 24

PREDICTIONIO: PRODUCT VIEW

Diagram components:
  • Client application
  • PredictionIO
  • Event Server (Spray + Storage)
  • Evaluator
  • Serving Engine (Spray + Spark)
  • Training Engine (Spark)
  • Storage (HBase/Postgres/MySQL)
  • Other components
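
To make the client application's role more concrete, the sketch below builds the two HTTP requests a client would typically send: one to the Event Server to store data, one to a deployed engine to ask for a prediction. It only constructs the requests and does not send them. The ports are PredictionIO's usual defaults; the access key, entity type, property name and query field (`imagePath`) are hypothetical and depend on the engine template in use.

```python
import json
import urllib.request

EVENT_SERVER = "http://localhost:7070"  # usual default Event Server address
ENGINE = "http://localhost:8000"        # usual default address of a deployed engine

def build_event_request(access_key, image_id, label):
    """Build a POST that stores a labeled image record in the Event Server."""
    payload = {
        "event": "$set",             # reserved event that sets entity properties
        "entityType": "cell_image",  # hypothetical entity type
        "entityId": image_id,
        "properties": {"label": label},
    }
    url = f"{EVENT_SERVER}/events.json?accessKey={access_key}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def build_query_request(image_path):
    """Build a POST that asks the deployed engine for a prediction."""
    payload = {"imagePath": image_path}  # hypothetical query schema
    return urllib.request.Request(
        f"{ENGINE}/queries.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

event_req = build_event_request("YOUR_ACCESS_KEY", "img-001", "cell_a")
query_req = build_query_request("/data/sample.png")
print(event_req.full_url)
print(query_req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it, assuming the servers are running at those addresses.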

slide-25
SLIDE 25

APACHE SPARK

  • Fast in-memory data processing
  • Real-time and batch modes
  • Complements Hadoop
  • Replaces Hadoop MapReduce
  • Adds
  • In-memory processing
  • Stream processing
  • Fast for interactive queries
  • YARN or Mesos for clustering
  • Java, Scala, Python, R


slide-26
SLIDE 26

SPARK: LOGICAL VIEW

Diagram components:
  • Apache Spark Core
  • Spark SQL
  • Spark Streaming
  • SparkML
  • GraphX


slide-27
SLIDE 27

SPARK: DEPLOYMENT VIEW

Diagram components:
  • Spark Driver
  • Spark’s Cluster Manager
  • Master Node
  • Worker Nodes, each running Executors
  • Executors, each running Tasks and holding a Cache
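
The driver, executor and task roles can be mimicked on a single machine. The sketch below is only an analogy built on Python's standard thread pool, not Spark code: a "driver" function splits the data into partitions, a pool of workers runs one task per partition, and the driver merges the partial results. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(data, n):
    """Split data into n slices, similar in spirit to partitions of a distributed data set."""
    return [data[i::n] for i in range(n)]

def count_task(chunk):
    """One task: count the items labeled 'cell' in a single partition."""
    return sum(1 for label in chunk if label == "cell")

def driver(data, executors=2):
    """Schedule one task per partition, then combine the partial counts."""
    parts = partition(data, executors)
    with ThreadPoolExecutor(max_workers=executors) as pool:
        partial_counts = list(pool.map(count_task, parts))
    return sum(partial_counts)

labels = ["cell", "debris", "cell", "cell", "debris"]
print(driver(labels))
```

In a real cluster the partitions live on different worker nodes and the cluster manager decides where executors run; the thread pool here only stands in for that machinery.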


slide-28
SLIDE 28

DEEPLEARNING4J (DL4J)

  • Deep learning library
  • Open source
  • Apache 2.0 license
  • Java based
  • Distributed execution
  • Runs on Spark and Hadoop


slide-29
SLIDE 29

TENSORFLOW

  • Deep Learning framework
  • from the Google Brain Team
  • Python and C++ SDKs
  • Dataflow graph based processing
  • Tensors and Operations
  • Numerical operations
  • Lazy evaluation
  • Distributed and parallel
  • Training and inference
  • Good documentation
  • Useful examples
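
The idea of a dataflow graph with lazy evaluation can be shown without TensorFlow itself. The sketch below is a toy graph in plain Python, not the TensorFlow API: building the graph only records operations, and nothing is computed until `run` is called on a node. The class and function names are hypothetical.

```python
class Node:
    """A graph node: an operation plus the nodes that feed it."""
    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = tuple(inputs)
        self.value = value

def constant(value):
    return Node("const", value=value)

def add(a, b):
    return Node("add", (a, b))

def multiply(a, b):
    return Node("mul", (a, b))

def run(node):
    """Evaluate a node by first evaluating the nodes it depends on."""
    if node.op == "const":
        return node.value
    args = [run(n) for n in node.inputs]
    if node.op == "add":
        return args[0] + args[1]
    if node.op == "mul":
        return args[0] * args[1]
    raise ValueError(f"unknown op: {node.op}")

# Building the graph performs no arithmetic yet.
graph = multiply(add(constant(2), constant(3)), constant(4))

# Computation happens only when the graph is run.
print(run(graph))
```

Separating graph construction from execution is what lets a framework analyze the whole computation first and then place its operations on different devices or machines.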


Ref: tensorflow.org

slide-30
SLIDE 30

TENSORFLOW

  • CPU, GPU
  • Mobile: iOS and Android
  • Core API in C
  • Compiled models
  • Visualization using TensorBoard
  • TensorFlow Serving


slide-31
SLIDE 31

WHAT DOES IT INVOLVE ?


slide-32
SLIDE 32

THE CELL DETECTION PROCESS

  • Data gathering
  • Data preparation
  • Data extraction
  • Model training
  • Evaluation


slide-33
SLIDE 33

DATA GATHERING

  • “Google” it?
  • Cell image data sets are not common
  • Very few YouTube videos
  • Get the data set from the labs
  • Caveat: Competitive information


Ref: davidbarlowarchive.com

slide-34
SLIDE 34

DATA EXTRACTION

  • Extract your own data sets from videos
  • Different angles, lighting, perspective
  • Multiple cells
  • Image processing techniques
  • Edge detection
  • Segmentation
  • Background subtraction
  • Otsu thresholding
  • Watershed
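
Of the techniques listed, Otsu thresholding is compact enough to sketch in full. The version below works on a flat list of 8-bit grayscale values and picks the threshold that maximizes the between-class variance of the two resulting pixel groups. It is a plain-Python illustration under those assumptions; in practice a library such as OpenCV would normally be used, and the function names here are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that best separates pixels <= t from pixels > t."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * n for level, n in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of their gray levels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t

def binarize(pixels, t):
    """Mark pixels brighter than the threshold as foreground (1)."""
    return [1 if p > t else 0 for p in pixels]

# Hypothetical toy data: a dark background and a bright cell region.
toy_pixels = [10] * 50 + [200] * 50
t = otsu_threshold(toy_pixels)
mask = binarize(toy_pixels, t)
```

The resulting mask can then feed later steps such as watershed segmentation or blob counting.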


slide-35
SLIDE 35

MODEL TRAINING

  • Custom models
  • Build your own
  • High difficulty in hyperparameter tuning
  • Very high training effort
  • Small sizes
  • Poor accuracy
  • Transfer learning
  • Reuse an existing image detection model
  • TensorFlow’s Inception
  • Replace its final layer(s)
  • Very little hyperparameter tuning
  • Lower training time
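
The transfer learning idea (keep the pretrained layers fixed, train only a new final layer) can be illustrated with a toy example. The sketch below does not use TensorFlow or Inception: `frozen_features` is a hypothetical stand-in for the pretrained network's fixed feature output, and the only trained part is a small logistic-regression layer on top of those features. The data is made up for illustration.

```python
import math

def frozen_features(image):
    """Stand-in for a pretrained network's features: fixed and never updated."""
    mean = sum(image) / len(image)
    spread = max(image) - min(image)
    return [mean, spread]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_final_layer(images, labels, epochs=500, lr=0.5):
    """Train only the new final layer (weights w, bias b) with gradient descent."""
    feats = [frozen_features(img) for img in images]
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss with respect to the pre-activation
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(image, w, b):
    x = frozen_features(image)
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical toy "images": dark patches are class 0, bright patches are class 1.
train_images = [
    [0.1, 0.1, 0.2, 0.1],
    [0.2, 0.1, 0.1, 0.2],
    [0.8, 0.9, 0.9, 0.8],
    [0.9, 0.7, 0.8, 0.9],
]
train_labels = [0, 0, 1, 1]
w, b = train_final_layer(train_images, train_labels)
```

Because only two weights and a bias are learned, training is fast and needs little tuning, which mirrors the lower effort noted for the transfer learning route.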


slide-36
SLIDE 36

EVALUATION

  • Test, test, test
  • Primary tests
  • Accuracy
  • Precision
  • Recall
  • F1 score
  • Cross validate
  • With data sets
  • With different algorithms
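
The four primary metrics follow directly from counts of true positives, false positives and false negatives. Here is a minimal sketch for a binary classifier; the function name and the sample labels are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical ground truth and predictions for eight test images.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Accuracy alone can mislead when one cell type is rare, which is why precision, recall and F1 are listed alongside it.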


slide-37
SLIDE 37

DEEP LEARNING AND CNN


slide-38
SLIDE 38

DEEP LEARNING: WHAT

  • Neural Network based Machine Learning
  • Neural Network
  • 1 Input layer
  • 1 or more hidden layers
  • 1 Output layer
  • Basic unit of NN
  • Neuron
  • Combine neurons in many ways using multiple parameters
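
A neuron and the layers built from it fit in a few lines. The sketch below assumes a sigmoid activation and made-up weights; it shows a forward pass through one hidden layer and one output layer, with no training involved.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, squashed by a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weight_rows, biases):
    """A layer is a set of neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)

# Hypothetical parameters: 2 inputs, 3 hidden neurons, 1 output neuron.
hidden_w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_b = [0.0, 0.1, -0.1]
out_w = [[0.7, -0.5, 0.2]]
out_b = [0.05]
output = forward([1.0, 2.0], hidden_w, hidden_b, out_w, out_b)
```

Training consists of adjusting the weights and biases so that outputs like this one move closer to the desired labels.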


Ref: ucalgary.ca

slide-39
SLIDE 39

A TWO LAYER NEURAL NETWORK


slide-40
SLIDE 40

DEEP LEARNING: WHY

  • For complex input patterns
  • Higher accuracy
  • For image analysis
  • For spatial data analysis
  • Higher training time
  • Parallel execution
  • Higher level architectures for evolving needs
  • CNN, RNN, LSTM


slide-41
SLIDE 41

CONVOLUTIONAL NEURAL NETWORKS


slide-42
SLIDE 42

CONVOLUTIONAL NEURAL NETWORK

  • A type of Neural Network
  • Pass many filters (kernels) over an image
  • Capture its features


slide-43
SLIDE 43

CNN: APPLICATIONS

  • Self driving cars
  • Robotics
  • Drones
  • Industrial automation
  • Physical security
  • Medical labs
  • Wherever images or videos are used


slide-44
SLIDE 44

CNN ARCHITECTURE


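  • 1st Convolution
  • Non-linear transform (ReLU, tanh, etc.)
  • Pooling
  • …
  • Nth Convolution
  • Non-linear transform (ReLU, tanh, etc.)
  • Pooling
  • Fully connected layers
  • Output probabilities

Feature extraction + dimension reduction: the convolution, non-linear transform and pooling stages
Classification: the fully connected layers and output probabilities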
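
The stages that follow a convolution can each be sketched in a line or two. The example below assumes a feature map has already been produced by a convolution and walks it through ReLU, 2x2 max pooling, one fully connected layer and a softmax. All weights and values are made up for illustration.

```python
import math

def relu(feature_map):
    """Non-linear transform: negative activations become zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]

def flatten(feature_map):
    return [v for row in feature_map for v in row]

def dense(vector, weights, biases):
    """Fully connected layer: one weighted sum per output class."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4x4 feature map coming out of a convolution.
fm = [
    [-1, 2, 0, -3],
    [4, -5, 1, 1],
    [0, 0, -2, -2],
    [3, -1, -4, 6],
]
pooled = max_pool(relu(fm))
scores = dense(flatten(pooled), [[1, 0, 0, 0], [0, 0, 0, 1]], [0, 0])
probs = softmax(scores)
```

Pooling shrinks the 4x4 map to 2x2, which is the dimension reduction step, and the softmax output is the list of class probabilities.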

slide-45
SLIDE 45

SOLUTION ARCHITECTURE

For cell detection


slide-46
SLIDE 46

SOLUTION ARCHITECTURE: TRAINING


  • OpenCV
  • Extract approx. images
  • Segment images
  • TensorFlow
  • Retrain Inception
  • Evaluate accuracy, precision, recall, F1 score

slide-47
SLIDE 47

SOLUTION ARCHITECTURE: PREDICTION


Diagram components:
  • Client App
  • PredictionIO
  • Spray
  • Spark
  • PySpark
  • TensorFlow + Cell Model

slide-48
SLIDE 48

OTHER PROJECTS

  • Diabetic retinopathy
  • Google, IBM
  • Cancer cell detection
  • Many institutions
  • Startups
  • Universities
  • CT, MRI scans


slide-49
SLIDE 49

SUMMARY

  • Photonics
  • Analyzing the small stuff
  • Automating using Deep Learning
  • PredictionIO + Spark + TensorFlow
  • Distributed training and evaluation


slide-50
SLIDE 50

REFERENCES

  • Analysis of GPU tech used for biological research
  • https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3496509/
  • Rethinking biology, by an MIT startup co-founder
  • http://nikhilbuduma.com/2014/12/27/the-cell-reimagined/
  • Live cell imaging
  • https://www.leica-microsystems.com/science-lab/topics/what-is-live-cell-imaging/topic/Topic////page_t/8/
  • Good explanation of neural nets
  • http://karpathy.github.io/neuralnets/
  • Stanford course material on CNN
  • http://cs231n.github.io/convolutional-networks/
  • Intuitive explanation of CNN
  • https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
  • The current generation of automation for cell analysis
  • https://en.wikipedia.org/wiki/Automated_analyser


slide-51
SLIDE 51

QUESTIONS


Prajod Vettiyattil, Architect, Wipro @prajods https://in.linkedin.com/in/prajod