SLIDE 1

Apaar Sadhwani, Leo Tam, Jason Su
Advisors: Robert Chang, Jeff Ullman, Andreas Paepcke

*All contributors are/were affiliated with Stanford University at the time of their contributions. Leo Tam now works at Nvidia. Address correspondence to Apaar Sadhwani at apaars@stanford.edu or Jason Su at sujason@stanford.edu.
SLIDE 2

• Motivation:
  • Affects ~100M people, many in developed countries, ~45% of diabetics
  • Make the grading process faster, assist ophthalmologists, enable self-help
  • Widespread disease; enable early diagnosis and care

• Given a fundus image:
  • Rate the severity of Diabetic Retinopathy
  • 5 classes: 0 (Normal), 1, 2, 3, 4 (Severe)
  • Hard classification (though it could be posed as an ordinal problem)
  • Metric: quadratic weighted kappa (QWK), with a (pred - real)² penalty (see the kappa sketch below)

• Data from Kaggle:
  • ~35,000 training images, ~54,000 test images
  • High resolution: variable, more than 2560 x 1920
  • Additional unlabeled data from Stanford
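
Below is a minimal NumPy sketch of the quadratic weighted kappa metric named above (the standard definition; this is not code from the project itself, and the function name and 5-class default are illustrative).

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    """Agreement between two ratings with (i - j)^2 disagreement weights,
    normalized by the agreement expected by chance."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)

    # Observed confusion matrix O and quadratic weight matrix W.
    O = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    i, j = np.indices((n_classes, n_classes))
    W = (i - j) ** 2 / (n_classes - 1) ** 2

    # Expected matrix E from the outer product of the marginal histograms.
    hist_true = O.sum(axis=1)
    hist_pred = O.sum(axis=0)
    E = np.outer(hist_true, hist_pred) / O.sum()

    return 1.0 - (W * O).sum() / (W * E).sum()

# Perfect agreement gives kappa = 1; far-off predictions are penalized quadratically.
print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4]))  # 1.0
```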

SLIDE 3

SLIDE 4

[Figure: example fundus images, Class 0 (normal) vs. Class 4 (severe)]

SLIDE 5

SLIDE 6

• High resolution images: atypical in vision, GPU batch-size issues (see the table and sketch below)
• Discriminative features are small
• Grading criteria: not clear (EyePACS guidelines); learn from the data
• Incorrect labeling
• Artifacts in ~40% of images
• How to optimize for QWK
• Severe class imbalance: class 0 dominates
• Too few training examples

Image size    Batch size
224 x 224     128
2K x 2K       2
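
The batch sizes in the table follow from how activation memory scales with image area. The arithmetic below is illustrative only; the 96-channel first block and 4-byte floats are assumptions, not measured numbers from the project.

```python
# Rough activation-memory arithmetic behind the batch-size table above.
bytes_per_float = 4
depth_first_block = 96  # depth of the first conv block used later in the deck

for side in (224, 2048):
    input_mb = side * side * 3 * bytes_per_float / 2**20
    act_mb = side * side * depth_first_block * bytes_per_float / 2**20
    print(f"{side}x{side}: input ~{input_mb:.1f} MB, "
          f"first-block activations ~{act_mb:.0f} MB per image")
# 224x224:   input ~0.6 MB, first-block activations ~18 MB per image
# 2048x2048: input ~48 MB,  first-block activations ~1536 MB per image
```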

SLIDE 7

[Figure: example images for classes 1–4]

SLIDE 8

[Figure: a Class 2 example]

SLIDE 9

SLIDE 10

• Mentioned in problem statement
• Confirmed with doctors
SLIDE 11

SLIDE 12

• Hard classification non-differentiable
• Backprop difficult

[Figure: penalty/loss vs. predicted class, true class marked]

SLIDE 13

[Figure: penalty/loss when predicting class 1, true class marked]

SLIDE 14

[Figure: penalty/loss when predicting class 2, true class marked]

SLIDE 15

[Figure: penalty/loss when predicting class 3, true class marked]

SLIDE 16

SLIDE 17

• Squared error approximation?
• Differentiable (see the sketch below)

[Figure: squared-error penalty for a continuous prediction of 2.5]
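
A minimal PyTorch sketch of the squared-error idea above: predict the grade as a real number, train with MSE (mirroring the (pred - real)² kappa penalty), and round to a hard grade at test time. The project itself trained in Lua Torch, so `train_step`, `predict_grade`, and the single-output head are illustrative assumptions.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()

def train_step(model, images, grades, optimizer):
    optimizer.zero_grad()
    pred = model(images).squeeze(1)      # assumes the model ends in one output unit
    loss = mse(pred, grades.float())     # differentiable surrogate for the QWK penalty
    loss.backward()
    optimizer.step()
    return loss.item()

def predict_grade(model, images):
    with torch.no_grad():
        pred = model(images).squeeze(1)
    return pred.round().clamp(0, 4).long()   # hard grade for kappa evaluation
```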

SLIDE 18

• Naïve: 3-class problem, or all zeros!
• Learn all classes separately: 1 vs. all?
• Balanced while training
• At test time? (see the sampling sketch below)
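
One way to get balanced classes while training but keep the true distribution at test time is a weighted sampler. The sketch below is a PyTorch translation (the project used Lua Torch) and samples each class equally often in expectation.

```python
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler, DataLoader

def balanced_loader(dataset, labels, batch_size=32):
    """Sample every class equally often during training ("even sampling"),
    even though class 0 dominates the raw data. The test set keeps its
    natural ("true") distribution."""
    labels = np.asarray(labels)
    class_counts = np.bincount(labels)
    weights = 1.0 / class_counts[labels]          # rare classes get large weights
    sampler = WeightedRandomSampler(torch.as_tensor(weights, dtype=torch.double),
                                    num_samples=len(labels), replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```
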
SLIDE 19

• Big learning models need more data!
• Harness the test set?
SLIDE 20

• Literature survey:
  • Hand-designed features to pick each component
  • Clean images, small datasets
  • Optic disk and exudate segmentation: fail due to artifacts
  • SVM: poor performance

SLIDE 21

SLIDE 22
1. Registration, Pre-processing
2. Convolutional Neural Nets (CNNs)
3. Hybrid Architecture
SLIDE 23

• Registration (see the sketch below)
  • Hough circles; remove the portion outside the circle
  • Downsize to a common size (224 x 224, 1K x 1K)
• Color correction
• Normalization (mean, variance)
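
A sketch of the registration step with OpenCV: Hough circles to find the fundus disc, masking and cropping away the outside, resizing to a common size, and per-image normalization. The parameter values (blur size, Hough thresholds, radius bounds) are illustrative assumptions, and the color-correction step is omitted.

```python
import cv2
import numpy as np

def register_fundus(path, out_size=224):
    """Find the circular fundus with a Hough transform, crop to it, mask the
    outside, and resize to a common size. Parameter values are illustrative."""
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)

    h, w = gray.shape
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=2,
                               minDist=max(h, w),
                               param1=50, param2=30,
                               minRadius=h // 4, maxRadius=h)
    if circles is None:                      # fall back to the full frame
        cx, cy, r = w // 2, h // 2, min(h, w) // 2
    else:
        cx, cy, r = np.round(circles[0, 0]).astype(int)

    # Mask everything outside the detected circle, then crop its bounding box.
    mask = np.zeros_like(gray)
    cv2.circle(mask, (cx, cy), r, 255, thickness=-1)
    img = cv2.bitwise_and(img, img, mask=mask)
    x0, x1 = max(cx - r, 0), min(cx + r, w)
    y0, y1 = max(cy - r, 0), min(cy + r, h)
    crop = img[y0:y1, x0:x1]

    out = cv2.resize(crop, (out_size, out_size)).astype(np.float32)
    out = (out - out.mean()) / (out.std() + 1e-6)   # per-image mean/variance normalization
    return out
```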

SLIDE 24

Input image -> 3 conv layers (depth 96) -> MaxPool (stride 2) -> 3 conv layers (depth 256) -> MaxPool (stride 2) -> 3 conv layers (depth 384) -> MaxPool (stride 2) -> 3 conv layers (depth 1024) -> MaxPool (stride 2) -> AvgPool -> class probabilities

• Network in Network architecture (see the model sketch below)
  • 7.5M parameters
  • No FC layers; spatial average pooling instead
• Transfer learning (ImageNet)
• Variable learning rates
  • Low for "ImageNet" layers
  • Learning-rate schedule
• Combat lack of data and over-fitting
  • Dropout, early stopping
  • Data augmentation (flips, rotation)
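
A PyTorch sketch of the ideas on this slide: Network-in-Network style blocks with 1 x 1 convolutions, no FC layers (global average pooling over a 5-channel map), a lower learning rate for the pre-trained layers, and flip/rotation augmentation. The block depths only loosely follow the slide; this is not the authors' exact Lua Torch model, and the kernel sizes, learning rates, and dropout value are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import transforms

def nin_block(in_ch, d1, d2, d3):
    # One conv layer followed by two 1x1 "MLP" convs, as in Network in Network.
    return nn.Sequential(
        nn.Conv2d(in_ch, d1, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(d1, d2, kernel_size=1), nn.ReLU(inplace=True),
        nn.Conv2d(d2, d3, kernel_size=1), nn.ReLU(inplace=True),
    )

class NiNGrader(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        self.pretrained = nn.Sequential(   # blocks meant to start from ImageNet weights (loading not shown)
            nin_block(3, 96, 96, 96),      nn.MaxPool2d(3, stride=2),
            nin_block(96, 256, 256, 256),  nn.MaxPool2d(3, stride=2),
            nin_block(256, 384, 384, 384), nn.MaxPool2d(3, stride=2),
        )
        self.dropout = nn.Dropout2d(0.5)
        self.head = nn.Sequential(         # new layers, ending in n_classes channels
            nin_block(384, 384, 64, n_classes),
            nn.AdaptiveAvgPool2d(1),       # spatial average pooling, no FC layers
        )

    def forward(self, x):
        x = self.dropout(self.pretrained(x))
        return self.head(x).flatten(1)     # (N, n_classes) class scores

model = NiNGrader()
# Variable learning rates: small steps for the pre-trained layers, larger for the head.
optimizer = torch.optim.SGD([
    {"params": model.pretrained.parameters(), "lr": 1e-4},
    {"params": model.head.parameters(), "lr": 1e-2},
], momentum=0.9)

# Data augmentation as listed on the slide: flips and rotations.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(degrees=180),
    transforms.ToTensor(),
])
```
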
SLIDE 25
SLIDE 26

Input image -> 3 conv layers (depth 96) -> MaxPool (stride 2) -> 3 conv layers (depth 256) -> MaxPool (stride 2) -> 3 conv layers (depth 384) -> MaxPool (stride 2) -> 3 conv layers (depths 384, 64, 5) -> MaxPool (stride 2) -> AvgPool -> class probabilities

• Network in Network architecture, final block reduced to depths 384, 64, 5: 2.2M parameters
SLIDE 27

SLIDE 28
SLIDE 29

SLIDE 30

• Amazon EC2, GPU nodes, StarCluster
  • Single-GPU nodes for 224 x 224 (g2.2xlarge)
  • Multi-GPU nodes for 1K x 1K (g2.8xlarge)
• Python used for processing
• Torch library (Lua) for training

SLIDE 31

• What image size to use?
  • Strategize using 224 x 224 -> extend to 1024 x 1024
• What loss function? (see the loss sketch below)
  • Mean squared error (MSE)
  • Negative log likelihood (NLL)
  • Linear combination (annealing)
• Class imbalance
  • Even sampling -> true sampling
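
A sketch of the "linear combination (annealing)" idea: start from the classification loss (NLL) and shift weight toward a squared-error term on the expected grade as training progresses. The linear schedule and the expected-grade formulation are assumptions, written in PyTorch rather than the Lua Torch used in the project.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

nll = nn.NLLLoss()

def combined_loss(log_probs, labels, epoch, total_epochs):
    """Anneal from NLL toward a squared-error penalty on the expected grade."""
    alpha = min(1.0, epoch / float(total_epochs))     # 0 -> 1 over training
    grades = torch.arange(log_probs.size(1), dtype=torch.float,
                          device=log_probs.device)
    expected = (log_probs.exp() * grades).sum(dim=1)  # soft (expected) grade
    mse = F.mse_loss(expected, labels.float())
    return (1 - alpha) * nll(log_probs, labels) + alpha * mse
```
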
SLIDE 32

Image size: 224 x 224; pre-trained layers frozen (no learning)

SLIDE 33

SLIDE 34

SLIDE 35

SLIDE 36

Image size: 224 x 224; pre-trained layers frozen (no learning)

Loss function   Sampling   Result
MSE                        Fails to learn
MSE                        Fails to learn
NLL                        Kappa < 0.1
NLL                        Kappa = 0.29

SLIDE 37

SLIDE 38

SLIDE 39

SLIDE 40

Image size: 224 x 224; pre-trained layers trained at 0.01x step size

Loss function           Sampling   Result
NLL (top layers only)              Kappa = 0.29
NLL                                Kappa = 0.42
NLL                                Kappa = 0.51
MSE                                Kappa = 0.56

SLIDE 41

SLIDE 42

SLIDE 43

[Diagram: 2048 / 1024 input, 64 tiles of 256 x 256; Lesion Detector and Main Network -> Fuse -> class probabilities. A tiling sketch follows below.]
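
A small sketch of the tiling implied by the diagram: a 2048 x 2048 image split into 8 x 8 = 64 non-overlapping 256 x 256 tiles. This is a NumPy illustration; the exact tiling used in the project may differ.

```python
import numpy as np

def tile_image(image, tile=256):
    """Split an image (H x W x C) into non-overlapping tile x tile patches.
    A 2048 x 2048 fundus image yields 8 x 8 = 64 tiles of 256 x 256."""
    h, w, c = image.shape
    rows, cols = h // tile, w // tile
    tiles = (image[:rows * tile, :cols * tile]
             .reshape(rows, tile, cols, tile, c)
             .transpose(0, 2, 1, 3, 4)
             .reshape(rows * cols, tile, tile, c))
    return tiles

tiles = tile_image(np.zeros((2048, 2048, 3), dtype=np.float32))
print(tiles.shape)   # (64, 256, 256, 3)
```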

SLIDE 44

• Web Viewer tool
• Lesion annotation
• Extract image patches
• Train lesion classifier

SLIDE 45

SLIDE 46

SLIDE 47

SLIDE 48

SLIDE 49

• Only hemorrhages so far
• Positives: 1,866 extracted patches from 216 images/subjects
• Negatives: ~25k class-0 images
• Pre-processing/augmentation (see the sketch below)
  • Crop a random 256 x 256 patch from the input; flips
• Pre-trained Network in Network architecture
• Accuracy: 99% on negatives, 76% on positives
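
A sketch of the patch pre-processing/augmentation listed above: random 256 x 256 crops with random flips for the hemorrhage patch classifier. NumPy illustration; the crop and flip probabilities are assumptions.

```python
import random
import numpy as np

def sample_patch(image, size=256):
    """Take a random size x size crop from an image (H x W x 3 array) and
    apply random horizontal/vertical flips."""
    h, w = image.shape[:2]
    y = random.randint(0, h - size)
    x = random.randint(0, w - size)
    patch = image[y:y + size, x:x + size]
    if random.random() < 0.5:
        patch = patch[:, ::-1]     # horizontal flip
    if random.random() < 0.5:
        patch = patch[::-1, :]     # vertical flip
    return np.ascontiguousarray(patch)
```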

SLIDE 50

SLIDE 51

SLIDE 52

SLIDE 53

SLIDE 54

SLIDE 55

[Diagram: Main Network features (64 x 31 x 31) and Lesion Detector output (2 x 31 x 31) are fused into 66 x 31 x 31, followed by 2 conv layers -> class probabilities]

SLIDE 56

[Diagram: as above, additionally showing a 2 x 56 x 56 Lesion Detector map. A fusion sketch follows below.]
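
A PyTorch sketch of the fusion step on this slide: concatenating the Main Network's 64 x 31 x 31 features with the Lesion Detector's 2 x 31 x 31 map to get 66 x 31 x 31, then two conv layers and average pooling to class scores. The specific layers after fusion are assumptions; the project itself used Lua Torch.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Fuse main-network features (64 x 31 x 31) with the lesion detector's
    output map (2 x 31 x 31) by channel concatenation, then classify."""
    def __init__(self, n_classes=5):
        super().__init__()
        self.fuse = nn.Sequential(                 # the "2 conv layers" after fusion
            nn.Conv2d(66, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, n_classes, kernel_size=1),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, main_feats, lesion_maps):
        x = torch.cat([main_feats, lesion_maps], dim=1)   # 64 + 2 = 66 channels
        return self.fuse(x).flatten(1)                    # (N, n_classes)

head = FusionHead()
out = head(torch.randn(1, 64, 31, 31), torch.randn(1, 2, 31, 31))
print(out.shape)   # torch.Size([1, 5])
```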

SLIDE 57

SLIDE 58

SLIDE 59

[Diagram: hybrid architecture, with the backprop path indicated]

SLIDE 60

SLIDE 61

• Supervised-unsupervised learning
• Distillation
• Hard-negative mining
• Other lesion detectors
• Attention CNNs
• Both eyes
• Ensemble

SLIDE 62

• 3-class problem
• True "4" problem
• Combining imaging modalities (OCT)
• Longitudinal analysis

SLIDE 63

• Robert Chang
• Jeff Ullman
• Andreas Paepcke
• Amazon