SLIDE 1


Integrity Applications Incorporated

15020 Conference Center Drive Chantilly, VA 20151 • (703) 378‐8672 • www.integrity‐apps.com

Recognizing Patterns of Cancer in Histology Imagery Using Deep Learning

Ted Hromadka1, LCDR Niels Olson2 MD, LT Daniel Ward2 MD, CDR Arash Mohtashamian2 MD, Ken Abeloe1

1Integrity Applications Incorporated℠, 2US Navy NMCSD

Presented at GTC 2016

SLIDE 2

Background – prostate cancer is a significant problem

  • US military hospitals care for disproportionately more male patients
  • Prostate cancer is the second‐leading cause of cancer death in American men

– Approximately 220,000 new cases per year

  • Early screening involves a blood test for prostate‐specific antigen (PSA) or a digital rectal exam (DRE)

– If those tests generate abnormal results, then a prostate biopsy may be required

http://www.va.gov/vetdata/docs/quickfacts/Population_slideshow.pdf
http://seer.cancer.gov/statfacts/html/prost.html
http://www.cancer.org/cancer/prostatecancer/detailedguide/prostate-cancer-key-statistics

SLIDE 3

Each biopsy procedure creates around 12 samples

  • Prostate biopsy is conducted by taking “core samples” using a hollow needle
  • After processing, 5 micron sections of these samples are placed on glass slides, stained, and manually interpreted by a pathologist under a microscope.

SLIDE 4

Analysis is very labor-intensive

  • Digital scans are opened with custom viewing software from the microscope vendor

– Multiple zoom levels available up to 40x. This dataset was scanned at 20x.

  • Pathologist will annotate cancerous regions with polygons drawn by hand with a mouse
  • Process requires careful judgment and is susceptible to fatigue and stress factors. Polygons cannot be edited once drawn (e.g., at higher magnification).

SLIDE 5

Biopsy analysis is challenging

  • Tissues can be difficult to differentiate
  • Cancerous region may be only partially sampled by the needle
  • This is an image classification problem

SLIDE 6

Apply deep learning techniques to this image classification problem

  • IAI was using Caffe for ship detection and classification in maritime aerial imagery
  • Believed NVIDIA’s DIGITS software offered a promising approach for the histology problem

SLIDE 7

Deep learning in a nutshell

  • GPU‐enabled evolution of artificial neural networks from the 1990s
  • Each layer is a set of “neurons” with weighted connections
  • Each neuron responds to its unique aspect of the input data with varying degrees of strength
  • Different weights compute different functions
  • Training the network “teaches” it a complicated function

– Supervised vs unsupervised learning
– Reinforcement learning

  • Modern computing hardware allows more layers of neurons… “deep” learning

  • Several open, GPU‐enabled frameworks (Caffe, Torch, Theano, DL4J, TensorFlow)
  • Convolutional neural networks excel at image recognition
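
The bullets about weighted connections and layers can be illustrated with a small forward pass. This is a minimal sketch in plain Python, not the network used in this project: the weights, biases, input values, and the sigmoid activation are all made-up assumptions chosen only to show how each neuron combines its inputs.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """Compute the outputs of one fully connected layer.

    weights[j][i] is the weight on the connection from input i to neuron j.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        # Each neuron takes a weighted sum of its inputs, then applies
        # a nonlinearity so that stacked layers can model complex functions.
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(sigmoid(total))
    return outputs

# Two stacked layers: 3 inputs -> 2 hidden neurons -> 1 output neuron.
# All numbers below are illustrative, not trained values.
hidden_weights = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_biases = [0.0, 0.1]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]

pixels = [0.2, 0.7, 0.1]
hidden = layer_forward(pixels, hidden_weights, hidden_biases)
score = layer_forward(hidden, output_weights, output_biases)[0]
print(round(score, 3))
```

Training would adjust the weights and biases so that the final score matches the labels; adding more such layers is what makes the network "deep".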

SLIDE 8

Puppy or bagel?

SLIDE 9

Specifications

  • Imagery

– 202 annotated full‐size color SVS images → 106,024 image chips

  • Average full size image ~ 845 MB

– Annotated by Navy pathologists

  • System

– NVIDIA GeForce GTX980 GPU (single card) via Intel Haswell‐E PCIe 3.0

  • Maxwell architecture, 2048 CUDA cores, 4GB memory, NV driver 352.63

– 6‐core Intel Xeon E5‐2603 v3 at 1.60 GHz with 16GB DDR4
– Ubuntu 14.04, DIGITS 3.0‐rc3, CUDA 7.5, cuDNN v4, NVCaffe 0.14

SLIDE 10

Used MATLAB image chipper to prepare the images

  • Split SVS into image chips of size 256x256 pixels at the 4:1 zoom level
  • Chipper labels each image chip based on XML annotation polygons (50% inclusion rule)
  • Chipper 2.0 also used pixel averaging and histograms to determine if chip was a “blank” or an “ink” smear

http://caffe.berkeleyvision.org/
XML parser built on work by Andrew Janowczyk (http://www.andrewjanowczyk.com/)
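
The chipper itself was written in MATLAB and its code is not shown here. The labeling steps above can be sketched in Python under some assumptions: the 50% inclusion rule is read as "at least half of the chip area lies inside the annotation polygon", the coverage is estimated by sampling a grid of points, and the function names, label strings, and the blank threshold of 230 are all hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon [(x0, y0), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def inclusion_fraction(chip_x, chip_y, chip_size, polygon, step=16):
    """Estimate the fraction of a chip covered by the polygon via grid sampling."""
    hits = 0
    total = 0
    for dy in range(step // 2, chip_size, step):
        for dx in range(step // 2, chip_size, step):
            total += 1
            if point_in_polygon(chip_x + dx, chip_y + dy, polygon):
                hits += 1
    return hits / total

def label_chip(chip_x, chip_y, polygon, chip_size=256, min_fraction=0.5):
    """Apply the (assumed) 50% inclusion rule to one chip."""
    fraction = inclusion_fraction(chip_x, chip_y, chip_size, polygon)
    return "cancer" if fraction >= min_fraction else "not-cancer"

def is_blank(gray_pixels, blank_threshold=230):
    """Flag a chip as blank when its mean gray level is close to white.

    The threshold is an illustrative guess, not the value used by the chipper.
    """
    return sum(gray_pixels) / len(gray_pixels) >= blank_threshold

# A made-up square annotation covering pixels 0..400 in both directions.
annotation = [(0, 0), (400, 0), (400, 400), (0, 400)]
for origin in [(0, 0), (256, 0), (256, 256), (512, 0)]:
    print(origin, label_chip(origin[0], origin[1], annotation))
```

Detecting ink smears would need a color or histogram test in addition to the mean-intensity check; that part is left out of this sketch.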

SLIDE 11

Naïve results were terrible

  • Simple “cancer / not‐cancer” labeling was a disaster
  • Accuracy sat at 50% from the start, which for a binary classifier is no better than a random guess

SLIDE 12

Solution: refine the training categories

  • Bad data (blank areas, ink marks)
  • More tissue types (fat)
  • Manually inspect the input data for anomalies
  • Still using stock GoogLeNet network
  • Additional training epochs had minimal effect

SLIDE 13

Cancer or not cancer?

SLIDE 14

5 categories of refined training data => raised accuracy to 90%

SLIDE 15

How accurate is the measure of accuracy?

  • Elmore et al., Breast Biopsy Concordance study, found only 75% agreement between expert pathologists

– JAMA, 2015: http://jama.jamanetwork.com/article.aspx?articleid=2203798

  • Need protocol for the confidence levels

– What threshold to use when the network assigns a chip a substantial probability of cancer?
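
The threshold question can be made concrete with a small sketch. It assumes the classifier emits one raw score per category and that a softmax turns those scores into probabilities; the category names, the scores, and the two candidate thresholds are invented for illustration and are not results from this project.

```python
import math

def softmax(scores):
    """Convert raw network scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def flag_for_review(probabilities, cancer_index, threshold):
    """Flag a chip when its cancer probability reaches the threshold,
    even if another category has the highest probability."""
    return probabilities[cancer_index] >= threshold

# Hypothetical category names and scores for a single chip.
categories = ["cancer", "healthy", "fat", "blank", "ink"]
chip_scores = [1.0, 1.5, 0.2, -1.0, -1.0]

chip_probs = softmax(chip_scores)
top_category = categories[chip_probs.index(max(chip_probs))]
print("top category:", top_category)
for threshold in (0.25, 0.5):
    print(threshold, flag_for_review(chip_probs, 0, threshold))
```

In this example the top category is "healthy", yet the chip is flagged at a threshold of 0.25 and not at 0.5. A lower threshold sends more chips to the pathologist in exchange for fewer missed cancers, which is the trade-off a protocol would have to settle.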

SLIDE 16

In progress - adding more categories to improve accuracy

  • Seminal vesicles
  • Lymphocytes
  • Corpora amylacea
  • Blood
  • Nerves
  • Muscle (healthy)
  • Stroma
  • Gleason scale
  • Perineural invasion
  • Atrophic glands
  • Atrophic prostate necrosis

SLIDE 17

In progress – looking for ways to handle pragmatic labeling

  • Training data suffers from inaccuracies

– Annotation was not meant for training neural networks
– Not pixel‐perfect

  • Artifacts due to the scanner or tissue preparation

– Striping
– Ink

  • Experimenting with statistical solutions to noisy data

SLIDE 18

Project assessment: bulk of time was spent on data preparation

  • DIGITS greatly facilitated the DL training
  • MATLAB time mostly spent moving data

[Chart of project tasks: annotate images; write MATLAB chipper; run MATLAB chipper on data set; install & configure DIGITS; DIGITS ‐ create database; DIGITS ‐ train 1 network; DIGITS ‐ run 1 chip on network; Caffe ‐ run 1 full image on DNN. Callout: labeling images]

SLIDE 19

Automated image classification step takes about half the time of a pathologist

  • Chipper, classifier, output rendering = 29 minutes, vs “less than an hour” for a pathologist
  • Still needs a pathologist to review the output for final determination
  • Will be faster on better hardware
  • Data transport is a bottleneck to using HPC assets, but not an impossibility

– Upload raw microscope image to Navy DSRC
– Run image processing on those GPU nodes
– HPCMP Portal “Virtual App” for final pathologist image review

  • Also considering deployment on Google/AWS/Azure services, but HIPAA requirements complicate this

SLIDE 20

Next steps – fully automated process

  • No signs of overfitting – seek more data
  • Try 128x128 chips to reduce chance of multiple tissue types per image
  • Software pipeline

– Digitization scan > Chipper > DL Classifier > Heat Map > Viewer
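
The Chipper > DL Classifier > Heat Map stages of this pipeline can be sketched as follows. This is only an outline under stated assumptions: the image is tiled into non-overlapping chips, the heat map holds one cancer probability per chip, and `fake_classifier` is a hypothetical stand-in for the trained network that returns made-up values.

```python
def chip_origins(width, height, chip_size):
    """Yield (row, col, x, y) for every full chip that fits in the image."""
    for row, y in enumerate(range(0, height - chip_size + 1, chip_size)):
        for col, x in enumerate(range(0, width - chip_size + 1, chip_size)):
            yield row, col, x, y

def build_heat_map(width, height, chip_size, classify_chip):
    """Run the classifier on each chip and collect one probability per chip."""
    rows = height // chip_size
    cols = width // chip_size
    heat = [[0.0] * cols for _ in range(rows)]
    for row, col, x, y in chip_origins(width, height, chip_size):
        heat[row][col] = classify_chip(x, y, chip_size)
    return heat

def fake_classifier(x, y, chip_size):
    """Stand-in for the trained network: returns a made-up cancer probability."""
    return 0.9 if x >= 512 and y < 256 else 0.1

# A made-up 1024x512 image tiled into 256x256 chips gives a 2x4 heat map.
heat_map = build_heat_map(1024, 512, 256, fake_classifier)
for row in heat_map:
    print(" ".join(f"{p:.1f}" for p in row))
```

A viewer could then overlay this grid on the scan so the pathologist sees which regions the network considers suspicious. Switching to 128x128 chips would only change the `chip_size` argument and give a finer grid.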

SLIDE 21

Other approaches

  • Similar work ongoing by Hao Chen at Chinese University of Hong Kong

– https://news.developer.nvidia.com/diagnosing-cancer-with-deep-learning-and-gpus/

  • Multispectral imaging

– Mansfield JR, Levenson RM. Paving a New Path: Multispectral Imaging in Pathology. InFocus(14). Royal Microscopical Society. 2009.

  • ImageJ – Cell morphology
  • Andy Beck’s lab at Harvard
  • Antonio Criminisi and Steve White with glioblastoma: WhiteMarshLabs
  • Aperio GENIE (uses older machine learning algorithms, entirely CPU‐based)

– http://www.leicabiosystems.com/digital-pathology/image-analysis-solutions/details/product/genie/

SLIDE 22

Conclusions

  • Vast majority of time spent on preparing the data
  • Unsupervised learning is the future
  • Works surprisingly well