Image Classification BVM 2018 Tutorial: Advanced Deep Learning Methods


SLIDE 1

1 Image Classification

BVM 2018 Tutorial: Advanced Deep Learning Methods Jakob Wasserthal, Division of Medical Image Computing

SLIDE 2

02.11.16 | Author Division | Jakob Wasserthal 2

Classification of skin cancer

[Figure: dermatologist vs. deep neural network]

Esteva et al., Dermatologist-level classification of skin cancer with deep neural networks, Nature, 2017

SLIDE 3

Classification of skin cancer

[Figure: example lesions, malignant vs. benign]

Esteva et al., Dermatologist-level classification of skin cancer with deep neural networks, Nature, 2017

SLIDE 4

Classification

p(malignant) = 0.98, p(benign) = 0.02
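The two probabilities shown here are typically produced by a softmax output layer. A minimal sketch (the logit values are made up for illustration):

```python
import math

def softmax(logits):
    """Turn raw network outputs (logits) into probabilities that sum to 1."""
    shifted = [z - max(logits) for z in logits]  # shift for numerical stability
    exps = [math.exp(z) for z in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the two classes:
p_malignant, p_benign = softmax([3.9, 0.0])
# p_malignant is close to 0.98, p_benign close to 0.02
```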

SLIDE 5

ILSVRC challenge / ImageNet

[Chart: top-5 error on the ILSVRC challenge over time]

2012: AlexNet 15.3% (second best: 26.2%)
2013: ZFNet 11.2%
2014: VGG 7.3%, GoogLeNet 6.67%
2015: ResNet 3.57%, Inception v3 3.5% (human: 5.1%)
2017: DenseNet ~3.5%

SLIDE 6

ILSVRC challenge / ImageNet

[Chart repeated: top-5 error on the ILSVRC challenge over time]

SLIDE 7

VGG

Simonyan et al., Very deep convolutional networks for large-scale image recognition, arXiv, 2014
He et al., Deep Residual Learning for Image Recognition, arXiv, 2015

  • simple structure
  • 160M parameters
SLIDE 8

ILSVRC challenge / ImageNet

[Chart repeated: top-5 error on the ILSVRC challenge over time]

SLIDE 9

GoogLeNet


Inception module

Szegedy et al., Going Deeper with Convolutions, arXiv, 2014

SLIDE 10

GoogLeNet

stride = 1

Szegedy et al., Going Deeper with Convolutions, arXiv, 2014

SLIDE 11

GoogLeNet

stride = 1

Four parallel branches each output W x H x 256; concatenated: W x H x (256+256+256+256) = W x H x 1024

[Width x Height x Nr of Filters]

Szegedy et al., Going Deeper with Convolutions, arXiv, 2014
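The concatenation step can be sketched in plain Python as a shape computation (28 x 28 is an arbitrary example spatial size; any W x H works as long as all branches agree):

```python
def concat_channels(shapes):
    """Concatenate feature maps of identical spatial size along the channel axis."""
    w, h = shapes[0][0], shapes[0][1]
    assert all(s[0] == w and s[1] == h for s in shapes), "spatial sizes must match"
    return (w, h, sum(s[2] for s in shapes))

# Four parallel branches, each producing W x H x 256 (stride 1, padded to keep W x H):
out_shape = concat_channels([(28, 28, 256)] * 4)
# out_shape == (28, 28, 1024)
```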

SLIDE 12

GoogLeNet

stride = 1

Without 1x1 reductions: four branches of W x H x 256 concatenate to W x H x (256+256+256+256) = W x H x 1024

With 1x1 convolutions reducing the channel count first, the branch outputs are W x H x 128, W x H x 192, W x H x 96 and W x H x 64; concatenated: W x H x (128+192+96+64) = W x H x 480

[Width x Height x Nr of Filters]

Szegedy et al., Going Deeper with Convolutions, arXiv, 2014
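Why the 1x1 "bottleneck" convolutions pay off can be seen from a quick parameter count (the reduction width of 16 channels is an illustrative choice, not the exact figure from the paper):

```python
def conv_weights(c_in, c_out, k):
    """Weights of a k x k convolution, ignoring biases."""
    return k * k * c_in * c_out

# 5x5 branch applied directly to 256 input channels, producing 32 output channels:
naive = conv_weights(256, 32, 5)                              # 204,800
# Same branch with a 1x1 reduction to 16 channels first:
reduced = conv_weights(256, 16, 1) + conv_weights(16, 32, 5)  # 4,096 + 12,800 = 16,896
```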

SLIDE 13

GoogLeNet


Szegedy et al., Going Deeper with Convolutions, arXiv, 2014

VGG vs. GoogLeNet: the final classifier

VGG: data dimensions 7 x 7 x 512 = 25,088, fully connected to 4096 units: 25,088 * 4096 ≈ 102M parameters
GoogLeNet: data dimensions 7 x 7 x 1024, global average pooling to 1 x 1 x 1024, fully connected to 1000 classes: 1024 * 1000 ≈ 1M parameters
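The parameter counts in this comparison can be checked directly:

```python
# VGG: the last pooling layer leaves 7 x 7 x 512 values, which are flattened and
# fed into a 4096-unit fully connected layer.
vgg_inputs = 7 * 7 * 512           # 25,088
vgg_fc = vgg_inputs * 4096         # about 102M weights

# GoogLeNet: global average pooling collapses 7 x 7 x 1024 to 1 x 1 x 1024,
# so only one fully connected layer to the 1000 ImageNet classes remains.
googlenet_fc = 1024 * 1000         # about 1M weights
```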

SLIDE 14

GoogLeNet


Inception module

  • 4M parameters (VGG: 160M)
  • 22 trained layers

Szegedy et al., Going Deeper with Convolutions, arXiv, 2014

SLIDE 15

Inception v3 - Improvement 1


Szegedy et al., Rethinking the Inception Architecture for Computer Vision, arXiv, 2015

Parameters: one 5x5 convolution: 5*5 = 25; two stacked 3x3 convolutions: 2 * (3*3) = 18 => ~30% fewer parameters and computations
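The saving from replacing one 5x5 convolution with two stacked 3x3 convolutions (same receptive field) is easy to verify:

```python
five_by_five = 5 * 5              # 25 weights per input/output channel pair
two_three_by_three = 2 * (3 * 3)  # 18 weights for the stacked pair
saving = 1 - two_three_by_three / five_by_five  # 0.28, i.e. roughly 30% fewer
```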

SLIDE 16

Inception v3 - Improvement 1


Szegedy et al., Rethinking the Inception Architecture for Computer Vision, arXiv, 2015

SLIDE 17

Inception v3 - Improvement 2


Szegedy et al., Rethinking the Inception Architecture for Computer Vision, arXiv, 2015

Parameters: one 3x3 convolution: 3*3 = 9; a 1x3 convolution followed by a 3x1 convolution: 2 * (1*3) = 6 => ~33% fewer parameters and computations
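The same check for the asymmetric factorisation:

```python
three_by_three = 3 * 3      # 9 weights
factorised = 1 * 3 + 3 * 1  # 6 weights: a 1x3 followed by a 3x1 convolution
saving = 1 - factorised / three_by_three  # 1/3, i.e. ~33% fewer
```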

SLIDE 18

Inception v3 - Improvement 2


Szegedy et al., Rethinking the Inception Architecture for Computer Vision, arXiv, 2015

SLIDE 19

Inception v3 - Improvement 3


Pooling before convolution: representational bottleneck. Convolution before pooling: 3x more computations.

Szegedy et al., Rethinking the Inception Architecture for Computer Vision, arXiv, 2015

SLIDE 20

Inception v3 - Improvement 3


Pooling before convolution: representational bottleneck. Convolution before pooling: 3x more computations.

Szegedy et al., Rethinking the Inception Architecture for Computer Vision, arXiv, 2015

  • No bottleneck
  • 1x computations
SLIDE 21

Inception v3 - Improvement 3


Optimised Inception module

Szegedy et al., Rethinking the Inception Architecture for Computer Vision, arXiv, 2015

SLIDE 22

Inception v3


  • 3.5% top-5 error
  • 42 Layers
  • 2.5x the number of parameters of GoogLeNet

Szegedy et al., Rethinking the Inception Architecture for Computer Vision, arXiv, 2015

SLIDE 23

ILSVRC challenge / ImageNet

[Chart repeated: top-5 error on the ILSVRC challenge over time]

SLIDE 24

Classification of skin cancer


  • Inception v3 pretrained on ImageNet
  • Dermatologist-level accuracy

Esteva et al., Dermatologist-level classification of skin cancer with deep neural networks, Nature, 2017

SLIDE 25

Classification of diabetic retinopathy


  • Inception v3 pretrained on ImageNet
  • Expert-level accuracy

Gulshan et al., Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs, JAMA, 2016

SLIDE 26

ILSVRC challenge / ImageNet

[Chart repeated: top-5 error on the ILSVRC challenge over time]

SLIDE 27

ResNet


He et al., Deep Residual Learning for Image Recognition, arXiv, 2015

SLIDE 28

ResNet


He et al., Deep Residual Learning for Image Recognition, arXiv, 2015

SLIDE 29

ResNet


  • 152 Layers
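ResNet's core idea is the skip connection y = relu(F(x) + x): each block only has to learn the residual F(x). A toy sketch on plain lists (no real convolutions):

```python
def relu(values):
    return [max(0.0, v) for v in values]

def residual_block(x, residual_fn):
    """y = relu(F(x) + x): the block learns the residual F(x), not the full mapping."""
    fx = residual_fn(x)
    return relu([f + xi for f, xi in zip(fx, x)])

# If the residual function outputs zeros, the block is a clean identity mapping,
# which is what makes very deep stacks (152 layers) trainable.
out = residual_block([1.0, 2.0, 3.0], lambda v: [0.0] * len(v))
# out == [1.0, 2.0, 3.0]
```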
SLIDE 30

ResNet


He et al., Deep Residual Learning for Image Recognition, arXiv, 2015

SLIDE 31

ILSVRC challenge / ImageNet

[Chart repeated: top-5 error on the ILSVRC challenge over time]

SLIDE 32

DenseNet


Huang et al., Densely Connected Convolutional Networks, CVPR, 2017

SLIDE 33

DenseNet


Huang et al., Densely Connected Convolutional Networks, CVPR, 2017
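DenseNet's defining feature is that every layer receives the concatenation of all previous feature maps. A toy sketch, with list concatenation standing in for channel-wise concatenation:

```python
def dense_block(x, layers):
    """Each layer sees the block input plus the outputs of ALL earlier layers."""
    features = [x]
    for layer in layers:
        concatenated = [v for f in features for v in f]
        features.append(layer(concatenated))
    return [v for f in features for v in f]

seen_widths = []
def toy_layer(inputs):
    seen_widths.append(len(inputs))
    return [0.0, 0.0]  # every layer adds 2 new 'channels' (the growth rate)

out = dense_block([0.0] * 4, [toy_layer, toy_layer])
# first layer sees 4 values, second sees 4 + 2; final output has 4 + 2 + 2 values
```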

SLIDE 34

Challenges in medical image classification


Source: The Radiology Assistant : Bi-RADS for Mammography and Ultrasound 2013

  • few training examples
  • no RGB images
  • small lesions
  • large images
  • interpretability
SLIDE 35

Interpretability of predictions

p(normal) = 0.98, p(diabetic) = 0.02 (but why?)

A deep neural network is often considered a “black box”.

SLIDE 36

Interpretability of predictions


“What parts of the input image affect the decision?”

Gulshan et al., Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs, 2016

SLIDE 37

Recap: Training via Backpropagation

[Figure: network maps input image x to p_dog(x) = 0.98, p_cat(x) = 0.02]

Cost: c = −log(p_dog(x))

Backpropagation computes dc/dw_ij for every weight w_ij.

Slides by courtesy of Paul Jäger
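The update rule can be checked numerically on a toy model with a single weight (the sigmoid 'network' here is purely illustrative):

```python
import math

def p_dog(w):
    """Toy one-weight 'network': probability via a sigmoid."""
    return 1 / (1 + math.exp(-w))

def cost(w):
    return -math.log(p_dog(w))  # c = -log(p_dog(x))

def numerical_gradient(f, w, eps=1e-6):
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w = 0.0
grad = numerical_gradient(cost, w)  # negative: increasing w raises p_dog
w_new = w - 0.1 * grad              # one gradient-descent step
# cost(w_new) < cost(w): the step lowered the cost, as training should
```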

SLIDE 38

Saliency maps

“What parts of the input image affect the decision?”

“Backprop into the image”: for every pixel x_ij compute d p_dog(x) / d x_ij

[Figure: network maps input image x to p_dog(x) = 0.98, p_cat(x) = 0.02]

Slides by courtesy of Paul Jäger
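For a toy linear 'network' the saliency map can be computed by finite differences and compared to the known answer: the gradient of a linear score with respect to each pixel is just that pixel's weight (weights and image values below are made up):

```python
weights = [0.0, 2.0, -1.5, 0.1]
image = [0.3, 0.8, 0.5, 0.9]

def score(x):
    """Linear stand-in for p_dog(x)."""
    return sum(w * xi for w, xi in zip(weights, x))

eps = 1e-6
saliency = []
for i in range(len(image)):
    perturbed = list(image)
    perturbed[i] += eps
    saliency.append((score(perturbed) - score(image)) / eps)
# saliency matches weights: pixel 1 drives the decision, pixel 0 not at all
```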

SLIDE 39

Saliency maps

[Figure: saliency map of d p_dog(x) / d x_ij over the input pixels x_ij]

Slides by courtesy of Paul Jäger

SLIDE 40

Interpretability of predictions


Jamaludin et al., SpineNet: Automated classification and evidence visualization in spinal MRIs, Medical image analysis, 2017

SLIDE 41

Questions


SLIDE 42

Backup


SLIDE 43

Advanced: Saliency via Perturbation

Trick: backprop into a mask m (multiplied elementwise with the image) so that m becomes the “minimal destroying region”.

Fong and Vedaldi, Interpretable Explanations of Black Boxes by Meaningful Perturbation, arXiv, 2017

For a linear layer w: d[w ∗ (x ∗ m)] / dm = w ∗ x

SLIDE 44

Saliency via Perturbation

Network training vs. image perturbation:

  • Objective: find the smallest destroying region.

Network training: w_ij ← w_ij − α · dc(p_dog(x)) / dw_ij

Image perturbation: m_ij ← m_ij − α · dc*(p_dog(x), m) / dm_ij

c* = λ1 · ‖1 − m‖₁ + p_dog(φ(x; m)) + λ2 · TV(m)

SLIDE 45

Saliency via Perturbation

Avoid high-frequency artefacts by enforcing a smooth structure via the total variation term:

c* = λ1 · ‖1 − m‖₁ + p_dog(φ(x; m)) + λ2 · TV(m)
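The three terms of c* can be sketched for a 1-D toy mask (the λ values are arbitrary, and p_dog(φ(x; m)) is given as a fixed number instead of coming from a real network):

```python
def mask_size(m):
    return sum(abs(1 - v) for v in m)  # ||1 - m||_1: prefer deleting as little as possible

def total_variation(m):
    return sum(abs(a - b) for a, b in zip(m, m[1:]))  # penalises rough, noisy masks

def objective(p_dog_masked, m, lam1=0.1, lam2=0.2):
    """c* = lam1 * ||1 - m||_1 + p_dog(masked image) + lam2 * TV(m)"""
    return lam1 * mask_size(m) + p_dog_masked + lam2 * total_variation(m)

# Two masks deleting the same number of pixels; the smooth one scores better:
smooth = objective(0.05, [1, 1, 0, 0, 1])
noisy = objective(0.05, [1, 0, 1, 0, 1])
```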

SLIDE 46

Saliency via Perturbation

Result: the ability to verify the network's underlying functionality.

SLIDE 47

Why can CNNs be fooled so easily?

  • Take the wrong class's probability as the cost function
  • Backprop into the image -> gradients for optimal “fooling”
  • Optimisation on the image pixels
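These steps can be sketched end to end on a toy model: a sigmoid over a fixed weight vector stands in for the network, and gradient ascent on the pixels drives up the wrong class's probability (all numbers are made up):

```python
import math

w = [1.0, -2.0, 0.5]  # frozen 'network' weights

def p_cat(x):
    """Wrong-class probability of the toy network."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1 / (1 + math.exp(-z))

x = [0.1, 0.9, 0.2]   # initially classified as 'dog' (p_cat is small)
p_start = p_cat(x)

for _ in range(100):
    p = p_cat(x)
    # Chain rule through the sigmoid: d p_cat / d x_i = p * (1 - p) * w_i.
    # Ascend this gradient on the PIXELS; the weights stay untouched.
    x = [xi + 0.1 * p * (1 - p) * wi for xi, wi in zip(x, w)]

p_fooled = p_cat(x)   # the slightly perturbed image now 'fools' the network
```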

(source: Fei-Fei Li & Justin Johnson & Serena Yeung, cs231n 2017, Lecture 12)

SLIDE 48

Why can CNNs be fooled so easily?


“Primary cause of NN’s vulnerability to adversarial perturbations is their [piecewise] linear nature”

(Explaining and Harnessing Adversarial Examples, Goodfellow et al., 2015) (source: Ian Goodfellow, cs231n 2017, Lecture 16)

[Figure: ReLU and sigmoid activation functions]