SLIDE 1
Apaar Sadhwani, Leo Tam, Jason Su
Advisors: Robert Chang, Jeff Ullman, Andreas Paepcke
*All contributors are/were affiliated with Stanford University at the time of their contributions. Leo Tam now works at Nvidia. Address correspondence to Apaar Sadhwani at apaars@stanford.edu or Jason Su at sujason@stanford.edu.
SLIDE 2
Motivation:
- Affects ~100M people, many in the developed world; ~45% of diabetics
- Make the process faster, assist ophthalmologists, enable self-help
- Widespread disease; enable early diagnosis and care
Given a fundus image:
- Rate the severity of Diabetic Retinopathy
- 5 classes: 0 (Normal), 1, 2, 3, 4 (Severe)
- Hard classification (may solve as ordinal though)
- Metric: quadratic weighted kappa, (pred - real)^2 penalty
Data from Kaggle:
- ~35,000 training images, ~54,000 test images
- High resolution: variable, more than 2560 x 1920
- Other unlabeled data from Stanford
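The quadratic weighted kappa metric can be sketched as follows (a minimal NumPy illustration, not the competition's reference implementation; some variants normalize the weights by (n_classes - 1)^2, which does not change the score):

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    """1 - (weighted observed disagreement / weighted chance disagreement),
    with (i - j)^2 penalty weights between true class i and predicted class j."""
    # Observed confusion matrix
    O = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    # Quadratic penalty: 0 on the diagonal, growing with squared distance
    idx = np.arange(n_classes)
    W = (idx[:, None] - idx[None, :]) ** 2
    # Expected matrix if the two "raters" (labels and predictions) were independent
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return 1.0 - (W * O).sum() / (W * E).sum()
```

Perfect agreement scores 1, chance-level agreement scores ~0, and systematic disagreement can go negative; the squared weights are why far-off predictions (e.g. 0 vs. 4) are penalized so heavily.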
SLIDE 4
Class 0 (normal) Class 4 (severe)
SLIDE 6 Challenges
- High resolution images: atypical in vision, causes GPU batch-size issues; discriminative features are small
- Grading criteria: not clearly specified (EyePACS guidelines); learn from data
- Incorrect labeling
- Artifacts in ~40% of images
- Optimizing for the QWK metric
- Severe class imbalance: class 0 dominates
- Too few training examples

Image size vs. GPU batch size:
  224 x 224  ->  128
  2K x 2K    ->  2
SLIDE 7
[Figure: example fundus images, panels labeled 1, 2, 3, 4]
SLIDE 8
[Figure: example fundus image, Class 2]
SLIDE 10 Incorrect labeling
- Mentioned in problem statement
- Confirmed with doctors
SLIDE 12-16 Optimizing for QWK
- Hard classification is non-differentiable
- Backprop is difficult
[Figure, built up across slides: penalty/loss vs. predicted class (1-4) for true class 1; the quadratic penalty grows as the prediction moves further from the truth]
SLIDE 17 Optimizing for QWK
- Squared error approximation?
- Differentiable
[Figure: continuous squared-error penalty over the class axis; a fractional prediction such as 2.5 is now meaningful]
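The squared-error idea can be sketched as follows (an illustrative NumPy surrogate, one plausible reading of the slide: collapse the softmax output to an expected grade, then apply the quadratic penalty, which is differentiable):

```python
import numpy as np

def expected_value_mse(probs, true_class):
    """Differentiable surrogate for the quadratic kappa penalty:
    treat the softmax output as a distribution over grades 0-4,
    take its expectation, and penalize squared distance to the label."""
    grades = np.arange(len(probs))
    pred = float((probs * grades).sum())  # e.g. 2.5 if mass is split between 2 and 3
    return (pred - true_class) ** 2
```

Unlike the argmax of a hard classifier, the expectation varies smoothly with the network's outputs, so gradients flow back through it.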
SLIDE 18 Severe class imbalance
- Naive approach: collapses to a 3-class problem, or predicts all zeros!
- Learn all classes separately: one-vs-all?
- Balanced sampling while training
- At test time?
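Balanced sampling during training can be sketched as follows (a hypothetical helper, not the project's actual data loader; label values and batch size are illustrative):

```python
import random
from collections import defaultdict

def balanced_batch(labels, batch_size, rng=random):
    """Sample a training batch with equal counts per class, so that
    class 0 cannot dominate the gradient even when it dominates the data.
    Minority classes are sampled with replacement."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    classes = sorted(by_class)
    per_class = batch_size // len(classes)
    batch = []
    for c in classes:
        batch.extend(rng.choices(by_class[c], k=per_class))
    return batch
```

The open question on the slide remains: a model trained on balanced batches sees a different class prior than the test set, which is why the experiments later move from even sampling back toward true sampling.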
SLIDE 19 Too few training examples
- Big learning models need more data!
- Harness the test set?
SLIDE 20
Literature survey:
- Hand-designed features to pick out each component
- Clean images, small datasets
- Optic disk and exudate segmentation: fail due to artifacts
- SVM: poor performance
SLIDE 22
- 1. Registration, Pre-processing
- 2. Convolutional Neural Nets (CNNs)
- 3. Hybrid Architecture
SLIDE 23 Registration
- Hough circles, remove outside portion
- Downsize to a common size (224 x 224, 1K x 1K)
- Color correction
- Normalization (mean, variance)
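The masking and normalization steps can be sketched as follows (NumPy only, an illustrative sketch: the circle parameters are assumed to come from a Hough transform, e.g. OpenCV's `HoughCircles`, which is omitted here):

```python
import numpy as np

def normalize_fundus(img, cx, cy, r):
    """Given a detected fundus circle (center cx, cy and radius r),
    zero out everything outside it and standardize the pixels inside
    to zero mean / unit variance, per channel."""
    h, w = img.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    mask = (xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2
    out = np.zeros_like(img, dtype=np.float64)
    for c in range(img.shape[2]):
        vals = img[..., c][mask].astype(np.float64)
        out[..., c][mask] = (vals - vals.mean()) / (vals.std() + 1e-8)
    return out
```

Computing the statistics only inside the circle matters: the black border outside the fundus would otherwise drag the mean and variance toward zero.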
SLIDE 24 Network in Network architecture
[Diagram: Input image -> 3 conv layers (depth 96) -> MaxPool (stride 2) -> 3 conv layers (depth 256) -> MaxPool (stride 2) -> 3 conv layers (depth 384) -> MaxPool (stride 2) -> 3 conv layers (depth 1024) -> MaxPool (stride 2) -> AvgPool -> class probabilities]
- 7.5M parameters
- No FC layers; spatial average pooling instead
Transfer learning (ImageNet), variable learning rates:
- Low for "ImageNet" layers
- Learning-rate schedule
Combat lack of data and over-fitting:
- Dropout, early stopping
- Data augmentation (flips, rotation)
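The "no FC layers" choice can be sketched as follows (a NumPy illustration, not the Torch training code): with the final conv depth equal to the number of classes, spatial average pooling plus a softmax yields class probabilities directly.

```python
import numpy as np

def nin_head(feature_maps):
    """NiN-style classifier head: one feature map per class,
    spatially average-pooled into a logit, then softmaxed.
    feature_maps has shape (n_classes, H, W)."""
    logits = feature_maps.mean(axis=(1, 2))  # (C, H, W) -> (C,)
    exp = np.exp(logits - logits.max())      # numerically stable softmax
    return exp / exp.sum()
```

Replacing fully connected layers with this head removes most of the parameters a classifier would otherwise spend on the flattened feature map, which is one reason the network stays at a few million parameters.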
SLIDE 26 Network in Network architecture (reduced)
[Diagram: Input image -> 3 conv layers (depth 96) -> MaxPool (stride 2) -> 3 conv layers (depth 256) -> MaxPool (stride 2) -> 3 conv layers (depth 384) -> MaxPool (stride 2) -> 3 conv layers (depth 384, 64, 5) -> MaxPool (stride 2) -> AvgPool -> class probabilities]
- 2.2M parameters
- No FC layers; spatial average pooling instead
SLIDE 27
[Diagram, final block: Input image -> 3 conv layers (depth 384, 64, 5) -> MaxPool (stride 2) -> AvgPool -> class probabilities]
SLIDE 30 Infrastructure
- Amazon EC2 GPU nodes, StarCluster
- Single-GPU nodes for 224 x 224 (g2.2xlarge); multi-GPU nodes for 1K x 1K (g2.8xlarge)
- Python for processing; Torch library (Lua) for training
SLIDE 31 What image size to use?
- Strategize using 224 x 224 -> extend to 1024 x 1024
What loss function?
- Mean squared error (MSE)
- Negative Log Likelihood (NLL)
- Linear Combination (annealing)
Class imbalance
- Even sampling -> true sampling
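The linear combination of the two losses can be sketched as follows (an illustrative NumPy version; the actual annealing schedule and weights used in training are not specified on the slides):

```python
import numpy as np

def combined_loss(probs, true_class, alpha):
    """Anneal between NLL (stable early gradients) and the expected-value
    MSE surrogate for the kappa penalty; alpha is driven from 1 toward 0
    over the course of training."""
    nll = -np.log(probs[true_class] + 1e-12)          # cross-entropy term
    grades = np.arange(len(probs))
    mse = (float((probs * grades).sum()) - true_class) ** 2  # kappa surrogate
    return alpha * nll + (1 - alpha) * mse
```

At alpha = 1 this is plain NLL classification; at alpha = 0 it optimizes only the squared-distance surrogate that matches the QWK metric.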
SLIDE 32-36 Experiments at 224 x 224 (pre-trained layers frozen: "no learning")
[Diagram: Input image -> 3 conv layers (depth 384, 64, 5) -> MaxPool (stride 2) -> AvgPool -> class probabilities]

  Loss Function   Result
  MSE             Fails to learn
  MSE             Fails to learn
  NLL             Kappa < 0.1
  NLL             Kappa = 0.29

(The Sampling column of the original table was shown pictorially and is not recoverable.)

SLIDE 37-40 Experiments at 224 x 224 (pre-trained layers fine-tuned at 0.01x step size)

  Loss Function           Result
  NLL (top layers only)   Kappa = 0.29
  NLL                     Kappa = 0.42
  NLL                     Kappa = 0.51
  MSE                     Kappa = 0.56
SLIDE 41
SLIDE 42
SLIDE 43 Hybrid architecture
[Diagram: high-resolution input (2048 / 1024) split into 64 tiles of 256 x 256 -> Lesion Detector and Main Network -> Fuse -> class probabilities]
SLIDE 44 Lesion annotation
- Web viewer tool for lesion annotation
- Extract image patches
- Train lesion classifier
SLIDE 45
SLIDE 46
SLIDE 47
SLIDE 48
SLIDE 49 Lesion detector
- Only hemorrhages so far
- Positives: 1,866 extracted patches from 216 images/subjects
- Negatives: ~25k class-0 images
- Pre-processing/augmentation: crop a random 256 x 256 patch from the input, flips
- Pre-trained Network in Network architecture
- Accuracy: 99% on negatives, 76% on positives
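The crop-and-flip augmentation can be sketched as follows (NumPy; the 256 x 256 patch size follows the slide, the 50% flip probabilities are an assumption):

```python
import numpy as np

def augment_patch(img, size=256, rng=None):
    """Crop a random size x size patch from an (H, W, C) image and apply
    random horizontal / vertical flips, as for the hemorrhage classifier."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    patch = img[y:y + size, x:x + size]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal flip
    if rng.random() < 0.5:
        patch = patch[::-1, :]  # vertical flip
    return patch
```

Flips are safe label-preserving transforms here because a hemorrhage is a hemorrhage under any mirroring; they multiply the effective number of positive patches, which are scarce (1,866 vs. ~25k negatives).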
SLIDE 54-56 Fusing the lesion detector with the main network
[Diagram: high-resolution input (2048 / 1024) split into 64 tiles of 256 x 256 -> Lesion Detector (2 conv layers) produces lesion maps (2 x 56 x 56, reduced to 2 x 31 x 31); these are concatenated with the Main Network's 64 x 31 x 31 feature maps to give 66 x 31 x 31 -> Fuse -> class probabilities]
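The fusion step can be sketched as channel-wise concatenation (NumPy; the 64/2/66-channel dimensions follow the slide, and the lesion maps are assumed already reduced to the main network's 31 x 31 spatial grid):

```python
import numpy as np

def fuse(main_feats, lesion_maps):
    """Concatenate the main network's feature maps (64 x 31 x 31) with the
    lesion detector's output maps (2 x 31 x 31) along the channel axis,
    producing the 66 x 31 x 31 input to the fusion conv layers."""
    assert main_feats.shape[1:] == lesion_maps.shape[1:], "spatial grids must match"
    return np.concatenate([main_feats, lesion_maps], axis=0)
```

Because the lesion maps enter as extra channels on the same spatial grid, the downstream conv layers can weigh "a lesion was detected here" against the main network's own evidence at each location.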
SLIDE 57
SLIDE 58-60 End-to-end training
[Diagram: high-resolution input (2048 / 1024) split into 64 tiles of 256 x 256 -> Lesion Detector and Main Network -> Fuse -> class probabilities]
- Backprop flows through the full fused architecture
SLIDE 61
- Supervised-unsupervised learning
- Distillation
- Hard-negative mining
- Other lesion detectors
- Attention CNNs
- Both eyes
- Ensemble
SLIDE 62
- 3-class problem
- True "4" problem
- Combining imaging modalities (OCT)
- Longitudinal analysis
SLIDE 63
Robert Chang, Jeff Ullman, Andreas Paepcke, Amazon