

slide-1
SLIDE 1

Unsupervised learning in medical imaging

Discovering phenotypes and detecting anomalies

Johannes Hofmanninger (many slides by Georg Langs)

Medical University of Vienna, Department for Biomedical Imaging and Image-guided Therapy, Computational Imaging Research Lab
www.cir.meduniwien.ac.at

slide-2
SLIDE 2

Computational Imaging Research Lab

CIR Lab, Department of Biomedical Imaging and Image-guided Therapy, Medical University of Vienna

  • Neuroimaging
  • Computer Aided Diagnosis
  • Machine Learning / Data analysis

www.cir.meduniwien.ac.at

Johannes Hofmanninger www.cir.meduniwien.ac.at

2

slide-3
SLIDE 3

Outline

  • Machine learning in medical imaging
  • Image segmentation
  • Detection of disease
  • Prediction of disease course
  • Unsupervised learning
  • Deep generative models
  • (Variational) Autoencoders
  • GANs
  • Big Data/Routine medical imaging data

Johannes Hofmanninger www.cir.meduniwien.ac.at

3

slide-4
SLIDE 4

Applications in medical imaging

  • Image Segmentation
  • Detection of disease patterns
  • Prediction of disease course

Johannes Hofmanninger www.cir.meduniwien.ac.at

4


slide-6
SLIDE 6

Segmentation

  • Segmentation of anatomical structures
  • Liver, Brain, Vessels, Cells, Bones, …

2015 Olaf Ronneberger et al.

U-NET

Johannes Hofmanninger www.cir.meduniwien.ac.at

6

slide-7
SLIDE 7

Applications in medical imaging

  • Image Segmentation
  • Detection of disease patterns
  • Prediction of disease course

Johannes Hofmanninger www.cir.meduniwien.ac.at

7

slide-8
SLIDE 8

Detecting disease patterns

  • Creating training data requires experts
  • We can learn with minimal supervision
  • Using a pretrained network
  • Unsupervised learning

Inject unlabelled data to improve the representation

Use a small set of labelled data to train the classifier [Thomas Schlegl et al.]

Johannes Hofmanninger www.cir.meduniwien.ac.at

8

slide-9
SLIDE 9

Pretraining

  • We can pretrain lower layers in an unsupervised fashion: the objective is to become very good at representing the data
  • 1. pretrain
  • 2. finetune
  • It turns out you can transfer this across visual problems: the lower the layer, the easier the transfer
  • Currently many approaches start from a pretrained network, and then optimize for a specific problem

Figures from Gonzalez & Woods, Chap. 13

Johannes Hofmanninger www.cir.meduniwien.ac.at

9

slide-10
SLIDE 10

Lung pattern classification

[Schlegl et al. MICCAI-MCV 2014]

Johannes Hofmanninger www.cir.meduniwien.ac.at

10

slide-11
SLIDE 11

Quantify multi-modal imaging markers: breast imaging

Internal Medicine / Oncology, Dep. Surgery, Dep. Biomed. Imag. & Img-guided Th., Molecular and Gender Imaging

Multi-modal, multi-parametric imaging Breast lesion detection and segmentation Probability map Computational segmentation

Johannes Hofmanninger www.cir.meduniwien.ac.at

11

slide-12
SLIDE 12

Applications in medical imaging

  • Image Segmentation
  • Detection of disease patterns
  • Prediction of disease course

Johannes Hofmanninger www.cir.meduniwien.ac.at

12

slide-13
SLIDE 13

High-level objective: predict individual disease course

[Diagram] Now → Prediction, based on images/reports, lab reports, and clinical information

Johannes Hofmanninger www.cir.meduniwien.ac.at

13

slide-14
SLIDE 14

… predict individual treatment response

[Diagram] Treatment → Prediction, based on images/reports, lab reports, and clinical information

Johannes Hofmanninger www.cir.meduniwien.ac.at

14

slide-15
SLIDE 15

Retinal disease

Johannes Hofmanninger www.cir.meduniwien.ac.at

15

slide-16
SLIDE 16

Predict progression and response

  • Can we predict outcome from available information?
  • Can we predict the course of disease and treatment?
  • Identify the predictive features

[Vogl et al. 2015] OCT scan, macular edema

Johannes Hofmanninger www.cir.meduniwien.ac.at

16

slide-17
SLIDE 17

Predict progression and response

[Timeline: the data we observe → the future we want to predict]

Johannes Hofmanninger www.cir.meduniwien.ac.at

17

slide-18
SLIDE 18

Predict progression and response

Predict if recurrence occurs

Predict time to recurrence to ensure timely treatment

Predicting recurrence and time of recurrence: signatures

Vogl 2017, Bogunovic 2017

Johannes Hofmanninger www.cir.meduniwien.ac.at

18

slide-19
SLIDE 19

Identify predictive markers in the data

Predicting recurrence and time of recurrence: signatures

Most informative region: the predictive signature of response (Vogl 2017, Bogunovic 2017)

Identify predictive markers

Johannes Hofmanninger www.cir.meduniwien.ac.at

19

slide-20
SLIDE 20

Learn from a large-scale population, and lots of data

Integrate multivariate data: we need AI to link observation to prediction

Treatment Imaging Genomics Clinical information …

Johannes Hofmanninger www.cir.meduniwien.ac.at

20

slide-21
SLIDE 21

What is a diagnosis? A name for an observation

We give a name to a set of observations if it occurs often and is useful during decision making, e.g., for treatment decisions. In supervised learning we use known diagnoses, or markers, as labels.

Johannes Hofmanninger www.cir.meduniwien.ac.at

21

slide-22
SLIDE 22

What is outcome?

?

Johannes Hofmanninger www.cir.meduniwien.ac.at

22

slide-23
SLIDE 23

Limitations of supervision: scale vs. effort. It's a lot of expert work

Johannes Hofmanninger www.cir.meduniwien.ac.at

23

slide-24
SLIDE 24

Limitations of supervision: interreader variability

  • Sometimes the labels are not very reliable
  • Interobserver agreement, even among experts, can be low
  • Possible reasons:
  • Patterns are difficult to assess or detect
  • The definition of patterns/names is hard to replicate

Courtesy Helmut Prosch; results: Walsh S et al. Thorax. 2016 Jan;71(1):45-51.

Johannes Hofmanninger www.cir.meduniwien.ac.at

24

slide-25
SLIDE 25

A few objectives of unsupervised learning

  • Find structure in the data: find groups of patients with similar progression or response paths
  • Identify relationships between different variables
  • Extend the vocabulary of markers or signatures we use for prediction
  • Revisit and advance diagnostic categories
  • Semi-supervised learning: exploit both unlabeled and labeled data
  • Weakly-supervised learning: exploit clinical information that already exists

Johannes Hofmanninger www.cir.meduniwien.ac.at

25

slide-26
SLIDE 26

Exploiting large-scale medical data

One patient Identifying predictive markers in heterogeneous clinical routine data

Johannes Hofmanninger www.cir.meduniwien.ac.at

26

slide-27
SLIDE 27

Supervised / Unsupervised

Langs et al. 2018 Radiologe

Johannes Hofmanninger www.cir.meduniwien.ac.at

27

slide-28
SLIDE 28

Ian Goodfellow 2016 - GAN Tutorial - https://arxiv.org/abs/1701.00160

Generative Models

Johannes Hofmanninger www.cir.meduniwien.ac.at

28

slide-29
SLIDE 29

29

Generative modelling

A distribution in the data / observation space: $p_{\mathrm{data}}$

  • We train a model from samples drawn from a distribution: $p_{\mathrm{data}}$
  • It learns an estimate of this distribution: $p_{\mathrm{model}}$
  • Task 1: learn an explicit representation of $p_{\mathrm{model}}$
  • Task 2: learn to generate samples from $p_{\mathrm{model}}$

Johannes Hofmanninger www.cir.meduniwien.ac.at


slide-33
SLIDE 33

Maximum likelihood

  • Given training data and a model with model parameters, we choose the parameters to maximize the likelihood of the training examples:

$$\theta^{*} = \arg\max_{\theta} \prod_{i=1}^{m} p_{\mathrm{model}}(x^{(i)}; \theta)$$

where $p_{\mathrm{model}}(x; \theta)$ is the model distribution, $\theta$ are the parameters of the model, and the $x^{(i)}$ are training examples.

A distribution in the data / observation space: $p_{\mathrm{data}}$

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-34
SLIDE 34

Maximum likelihood

  • Given training data and a model with model parameters, we choose the parameters to maximize the likelihood of the training examples:

$$\theta^{*} = \arg\max_{\theta} \prod_{i=1}^{m} p_{\mathrm{model}}(x^{(i)}; \theta)$$

In practice we work in log-space:

$$\theta^{*} = \arg\max_{\theta} \sum_{i=1}^{m} \log p_{\mathrm{model}}(x^{(i)}; \theta)$$

Johannes Hofmanninger www.cir.meduniwien.ac.at
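The log-space objective can be made concrete with a toy example (a sketch of my own, not from the slides): fitting a univariate Gaussian by maximum likelihood, where the argmax over the parameters has a closed form (sample mean and standard deviation).

```python
import numpy as np

def gaussian_log_likelihood(x, mu, sigma):
    """Sum of log p_model(x_i; theta) for a univariate Gaussian, theta = (mu, sigma)."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2))

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.5, size=10_000)   # samples drawn from p_data

# For the Gaussian model, the argmax has a closed form: sample mean and std.
mu_hat, sigma_hat = x.mean(), x.std()

# The maximum-likelihood estimate scores at least as high as any other candidate.
assert gaussian_log_likelihood(x, mu_hat, sigma_hat) >= gaussian_log_likelihood(x, 2.0, 1.0)
print(mu_hat, sigma_hat)  # close to the true parameters (3.0, 1.5)
```

Working with the sum of logs rather than the product of densities avoids numerical underflow once there are many training examples.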

slide-35
SLIDE 35

Explicit density models

  • Explicit representation of the model density: $p_{\mathrm{model}}(x; \theta) \approx p_{\mathrm{data}}(x)$
  • Examples:
  • Gaussian mixture models
  • Variational autoencoders

[Taxonomy diagram: maximum likelihood → explicit density / implicit density]

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-36
SLIDE 36

Implicit density models

  • Implicit density: a model that can generate samples from its density without representing it explicitly
  • Examples: GANs, GSN

[Taxonomy diagram: maximum likelihood → explicit density / implicit density; a generator produces samples $x$ from latent $z$]

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-37
SLIDE 37

Example: Clustering / Gaussian mixture model (GMM)

37

[Figure: observations and the fitted model distribution]

Johannes Hofmanninger www.cir.meduniwien.ac.at
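The GMM as an explicit density model can be sketched with a minimal expectation-maximization fit in 1D (illustrative code, not the implementation behind the slide; two components, hand-rolled E and M steps):

```python
import numpy as np

def normal_pdf(x, mu, var):
    return np.exp(-(x - mu)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def fit_gmm_1d(x, n_iter=100):
    """EM for a two-component 1D Gaussian mixture: an explicit density model."""
    mu = np.array([x.min(), x.max()])          # crude initialisation at the extremes
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = p(component k | x_i)
        r = pi * normal_pdf(x[:, None], mu, var)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from responsibility-weighted samples
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu)**2).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, var, pi

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(3, 0.8, 500)])
mu, var, pi = fit_gmm_1d(x)
print(sorted(mu))  # component means near -2 and 3
```

Each EM iteration increases the data log-likelihood, which is exactly the maximum-likelihood objective from the earlier slide, applied to a mixture density.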

slide-38
SLIDE 38

Taxonomy of generative models

38

Taxonomy following I. Goodfellow 2016

[Taxonomy diagram: maximum likelihood → explicit density (tractable density / approximate density: variational, Markov chain) and implicit density (Markov chain / direct)]

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-39
SLIDE 39

Taxonomy of generative models

39

[Taxonomy diagram: maximum likelihood → explicit density (tractable density / approximate density: variational, Markov chain) and implicit density (Markov chain / direct)]

Variational autoencoder: explicit, approximate (variational) density

Taxonomy following I. Goodfellow 2016

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-40
SLIDE 40

Taxonomy of generative models

40

Taxonomy following I. Goodfellow 2016

[Taxonomy diagram: maximum likelihood → explicit density (tractable density / approximate density: variational, Markov chain) and implicit density (Markov chain / direct)]

GAN: implicit density, direct

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-41
SLIDE 41

Auto Encoders

Ian Goodfellow 2016 - GAN Tutorial - https://arxiv.org/abs/1701.00160

Johannes Hofmanninger www.cir.meduniwien.ac.at

41

slide-42
SLIDE 42

Convolutional Neural Network

Johannes Hofmanninger www.cir.meduniwien.ac.at

42

slide-43
SLIDE 43

Autoencoder

Johannes Hofmanninger www.cir.meduniwien.ac.at

43

Figure from [Guo et al. 2017]

slide-44
SLIDE 44

Autoencoder

High-dimensional image $x$ → Encoder → low-dimensional representation (bottleneck neurons) → Decoder → reconstruction $\hat{x}$. Loss function: $\mathrm{MSE}(x, \hat{x})$

Johannes Hofmanninger www.cir.meduniwien.ac.at

44
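The encoder-bottleneck-decoder idea can be sketched with a tiny tied-weight *linear* autoencoder trained by gradient descent on synthetic low-rank data (an illustrative sketch; the autoencoders on these slides are nonlinear and deep, and all sizes here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "images": 20-D observations that really live on a 3-D subspace.
latent = rng.normal(size=(500, 3))
basis = rng.normal(size=(3, 20))
X = latent @ basis + 0.01 * rng.normal(size=(500, 20))

# Tied-weight linear autoencoder: encode h = x W^T, decode x_hat = h W.
W = 0.1 * rng.normal(size=(3, 20))   # 3 bottleneck neurons

def mse(X, W):
    X_hat = (X @ W.T) @ W
    return np.mean((X - X_hat)**2)

loss_before = mse(X, W)
lr = 0.01
for _ in range(500):
    H = X @ W.T                      # encode
    X_hat = H @ W                    # decode
    E = X_hat - X                    # reconstruction error
    # Gradient of the MSE w.r.t. W (tied weights: W appears twice, hence two terms)
    grad = (H.T @ E + (E @ W.T).T @ X) / X.size
    W -= lr * grad
loss_after = mse(X, W)
print(loss_before, loss_after)       # reconstruction error drops sharply
```

With a linear encoder/decoder and MSE loss, the bottleneck learns the principal subspace of the data, which is why the later face example compares a (nonlinear) autoencoder against PCA at the same code size.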

slide-45
SLIDE 45

Stacked Autoencoder

Layerwise pretraining Finetuning

Johannes Hofmanninger www.cir.meduniwien.ac.at

45

slide-46
SLIDE 46

#TBT ... Lung pattern classification

[Schlegl et al. MICCAI-MCV 2014]

Johannes Hofmanninger www.cir.meduniwien.ac.at

46

slide-47
SLIDE 47

Example: faces

47

[Figure: input faces, autoencoder output (30-dim code), PCA output (30 dim); from Hinton & Salakhutdinov, Science 2006]

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-48
SLIDE 48

The code layer represents structure

  • Autoencoder: 784-1000-500-250-2 layers.

48

Figure from [Hinton & Salakhutdinov Science 2006]

Look at the code layer

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-49
SLIDE 49

Variational Autoencoder

49

Johannes Hofmanninger www.cir.meduniwien.ac.at

Loss function: $\mathrm{MSE}(x, \hat{x}) + \gamma \sum_{j} \mathrm{KL}\big(q_j(z \mid x) \,\|\, \mathcal{N}(0, 1)\big)$

reconstruction term + property of the latent space
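The combined loss can be written out directly: for a diagonal Gaussian posterior, the KL term against $\mathcal{N}(0,1)$ has the closed form $\tfrac{1}{2}\sum_j (\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1)$. A sketch with made-up encoder outputs (all values are illustrative, there is no trained network here):

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1.0)

def vae_loss(x, x_hat, mu, log_var, gamma=1.0):
    """MSE reconstruction term plus gamma-weighted KL regulariser of the latent code."""
    return np.mean((x - x_hat)**2) + gamma * kl_to_standard_normal(mu, log_var)

# Toy values standing in for encoder/decoder outputs.
x = np.array([0.2, 0.8, 0.5])
x_hat = np.array([0.25, 0.7, 0.55])
mu = np.array([0.1, -0.2])            # latent means from the encoder
log_var = np.array([-0.1, 0.05])      # latent log-variances from the encoder

print(vae_loss(x, x_hat, mu, log_var))
# KL is zero exactly when the latent posterior matches the prior N(0, 1):
assert kl_to_standard_normal(np.zeros(2), np.zeros(2)) == 0.0
```

The KL term is what enforces the "property of the latent space": it pulls every posterior toward the prior, which is what later allows sampling new cases from $\mathcal{N}(0,1)$.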

slide-50
SLIDE 50

Variational Autoencoder

Generative Model we can sample new cases

Johannes Hofmanninger www.cir.meduniwien.ac.at

50

slide-51
SLIDE 51

Generative adversarial networks

Goodfellow et al. 2014 NIPS - arXiv:1406.2661 Ian Goodfellow 2016 - GAN Tutorial - arXiv:1701.00160

Johannes Hofmanninger www.cir.meduniwien.ac.at

51

slide-52
SLIDE 52

A generative model: generates observations from a latent variable

52

$G(\cdot\,; \theta^{(G)})$ implicitly defines a model distribution $p_{\mathrm{model}}(x; \theta^{(G)})$

$G$ has a latent prior in z-space: $z \sim p_z(z)$, $z \in \mathcal{Z}$

Generator: $G: \mathcal{Z} \to \mathcal{X}$, $z \mapsto x$

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-53
SLIDE 53

A generative model: generates observations from a latent variable

53

Generator: $G: \mathcal{Z} \to \mathcal{X}$, $z \mapsto x$

How do we train it to become good at sampling? Game: the generator generates fakes; the discriminator has to tell fakes and real examples apart.

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-54
SLIDE 54

A generative model: generates observations from latent variable

54

Generator: $G: \mathcal{Z} \to \mathcal{X}$

Discriminator: $D(\cdot\,; \theta^{(D)})$ scores how fake/real a sample looks, $D: \mathcal{X} \to \mathbb{R}$

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-55
SLIDE 55

Adversarial learning

55

Latent variable $z$, observed variable $x$ (e.g., a real or faked image); decision $d$: is the input real or fake?

Discriminator D: parameters $\theta^{(D)}$, cost function $J^{(D)}(\theta^{(D)}, \theta^{(G)})$
Generator G: parameters $\theta^{(G)}$, cost function $J^{(G)}(\theta^{(D)}, \theta^{(G)})$

Both generator and discriminator are differentiable.

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-56
SLIDE 56

Adversarial learning

56

Discriminator D: parameters $\theta^{(D)}$, cost function $J^{(D)}(\theta^{(D)}, \theta^{(G)})$
Generator G: parameters $\theta^{(G)}$, cost function $J^{(G)}(\theta^{(D)}, \theta^{(G)})$

The discriminator learns to discriminate between real examples and generated samples: minimize $J^{(D)}$ by changing $\theta^{(D)}$.

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-57
SLIDE 57

Adversarial learning

57

Discriminator D: parameters $\theta^{(D)}$, cost function $J^{(D)}(\theta^{(D)}, \theta^{(G)})$
Generator G: parameters $\theta^{(G)}$, cost function $J^{(G)}(\theta^{(D)}, \theta^{(G)})$

The discriminator's primary purpose is to provide the generator's cost function with a reward function for evaluating the quality of generated samples.

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-58
SLIDE 58

Adversarial learning

58

Discriminator D: parameters $\theta^{(D)}$, cost function $J^{(D)}(\theta^{(D)}, \theta^{(G)})$
Generator G: parameters $\theta^{(G)}$, cost function $J^{(G)}(\theta^{(D)}, \theta^{(G)})$

The generator learns to generate samples that are hard to discern from real examples. Its cost function is penalized by the discriminator.

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-59
SLIDE 59

Training: simultaneous stochastic gradient descent (SGD)

59

Discriminator D: parameters $\theta^{(D)}$, cost function $J^{(D)}(\theta^{(D)}, \theta^{(G)})$
Generator G: parameters $\theta^{(G)}$, cost function $J^{(G)}(\theta^{(D)}, \theta^{(G)})$

Two minibatches of samples:

  • z values drawn from the model prior in z-space, generating x
  • x from the training example set

Two gradient steps:

  • Update $\theta^{(D)}$ to get better at discriminating generated from real data
  • Update $\theta^{(G)}$ to minimize $J^{(G)}$, which can be, e.g., $J^{(G)} = -J^{(D)}$

Johannes Hofmanninger www.cir.meduniwien.ac.at
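The two-minibatch / two-gradient-step scheme can be sketched end-to-end on a 1-D toy problem. All modelling choices here (a linear generator $G(z) = az + b$, a logistic discriminator, hand-derived gradients, and $J^{(G)} = -J^{(D)}$) are illustrative assumptions for this sketch, not a recipe from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Toy 1-D GAN: p_data = N(2, 0.5); generator G(z) = a*z + b; discriminator D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0          # generator parameters theta_G
w, c = 0.0, 0.0          # discriminator parameters theta_D
lr_d, lr_g = 0.1, 0.05

for _ in range(2000):
    # Two minibatches: z drawn from the prior, x from the training set.
    z = rng.normal(size=64)
    x_real = rng.normal(2.0, 0.5, size=64)
    x_fake = a * z + b

    # Step 1: update theta_D to get better at discriminating real from generated
    # (gradients of the cross-entropy cost J_D, derived by hand for this tiny model).
    d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
    grad_w = -np.mean((1 - d_real) * x_real) + np.mean(d_fake * x_fake)
    grad_c = -np.mean(1 - d_real) + np.mean(d_fake)
    w, c = w - lr_d * grad_w, c - lr_d * grad_c

    # Step 2: update theta_G to minimize J_G = -J_D
    # (only the fake-sample term of J_D depends on the generator).
    d_fake = sigmoid(w * (a * z + b) + c)
    grad_a = -np.mean(d_fake * w * z)
    grad_b = -np.mean(d_fake * w)
    a, b = a - lr_g * grad_a, b - lr_g * grad_b

print(b)  # the generator's offset drifts toward the data mean (2.0)
```

Even in this toy setting the structure of the slide is visible: each iteration uses one minibatch from the prior and one from the training set, and takes one gradient step per player.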

slide-60
SLIDE 60

Training

60

[Diagram: G maps random samples in z-space (sampled from the prior) to generated "faked" observations x; D receives real examples x and generated x, and decides real / fake]

A minimax game using a value function:

$$\arg\min_{\theta^{(G)}} \max_{\theta^{(D)}} V(\theta^{(D)}, \theta^{(G)}), \qquad V(\theta^{(D)}, \theta^{(G)}) = -J^{(D)}(\theta^{(D)}, \theta^{(G)})$$

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-61
SLIDE 61

Training to reach equilibrium

61

This is a game where each player wishes to minimize a cost function that depends on the parameters of both players, while only having control over its own parameters. The solution to this game is a Nash equilibrium: a tuple $(\theta^{(D)}, \theta^{(G)})$ such that $J^{(D)}$ is a local minimum w.r.t. $\theta^{(D)}$, and $J^{(G)}$ is a local minimum w.r.t. $\theta^{(G)}$.

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-62
SLIDE 62

Example: adversarial learning in 1D

62

Figure from Goodfellow et al. 2014 Generative Adversarial Nets arXiv:1406.2661

[Figure: $p_{\mathrm{data}}$ and $p_{\mathrm{model}}$ in 1D over training: init, updated D, updated G, equilibrium]

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-63
SLIDE 63

Deep Convolutional GANs (DCGAN)

63

Goodfellow et al. 2014; Radford et al. 2015

[Architecture: the generator maps $z$ through DeConv layers (4x4x1024 → 8x8x512 → 16x16x256 → 32x32x128 → 64x64x3) to an image $x$; the discriminator maps $x$ to a decision $d$]

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-64
SLIDE 64

Learning the distribution of data

  • We learn a manifold of plausible data
  • We can produce plausible data

64

[Goodfellow et al. 2014 Generative Adversarial Nets] Figure from [Karras et al. 2017]

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-65
SLIDE 65

Disentangling concepts - vector arithmetic in z space

65

Radford et al. 2015

In z-space, vector arithmetic is feasible to some extent

Johannes Hofmanninger www.cir.meduniwien.ac.at
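That arithmetic can be sketched on made-up latent codes. The concept names below are illustrative; in Radford et al.'s experiments each code is the average $z$ of several images showing the concept, and the result is decoded by a trained generator, which this sketch does not include:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 100

# Hypothetical latent codes, standing in for averaged z vectors of real images.
z_smiling_woman = rng.normal(size=dim)
z_neutral_woman = rng.normal(size=dim)
z_neutral_man = rng.normal(size=dim)

# Vector arithmetic in z-space: remove the "neutral woman" direction, add "neutral man".
z_smiling_man = z_smiling_woman - z_neutral_woman + z_neutral_man

# Feeding z_smiling_man through a trained generator G would, to some extent,
# produce an image of a smiling man.
print(z_smiling_man.shape)
```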

slide-66
SLIDE 66

Conditional GANs - cGANs

  • A condition c is fed as an additional input to both the generator and the discriminator; the model represents $p(x \mid c)$

Unconditional: $G: \mathcal{Z} \to \mathcal{X}$, $z \mapsto x$. Conditional: $G: (\mathcal{Z}, \mathcal{C}) \to \mathcal{X}$, $(z, c) \mapsto x$; discriminator $D: (x, c) \mapsto d$

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-67
SLIDE 67

conditional GAN

67

[Diagram: generator maps $(z, c)$ to $x$; discriminator maps $(x, c)$ to a decision $d$]

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-68
SLIDE 68

Conditional GANs: image generation from labels

68

Odena et al. 2016 - Conditional image synthesis - arXiv:1610.09585

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-69
SLIDE 69

Image to image translation

Isola et al. 2016 Image to Image Translation - https://arxiv.org/abs/1611.07004

  • Map from image c to image x.
  • Use an image c as the condition for the generator and discriminator

[Diagram: an encoder-decoder generator with U-net style skip connections maps $c$ to $x$; the discriminator scores the pair $(x, c)$ as $d$]

69

slide-70
SLIDE 70

Image to image translation: label map to image

70

Isola et al. 2016 Image to Image Translation - https://arxiv.org/abs/1611.07004

https://phillipi.github.io/pix2pix/

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-71
SLIDE 71

Problem: mode collapse

71

Instead of covering the entire data distribution, the generator has extremely reduced output diversity, hopping from one narrow area to the next while the discriminator catches up. (Metz et al. 2016; Arjovsky et al. 2017)

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-72
SLIDE 72

Wasserstein GANs - WGANs

  • Critic instead of discriminator: instead of a divergence, we use an approximation of the earth mover's (EM) distance
  • If the data actually lies on a low-dimensional manifold, a divergence can saturate, and gradients can vanish
  • The Wasserstein distance, as an EM distance approximation, does not suffer from this
  • Less prone to mode collapse

72

Arjovsky et al 2017 - Theory - arXiv:1701.04862 Figure from Arjovsky et al 2017 Wasserstein GAN - arXiv:1701.07875v3

Johannes Hofmanninger www.cir.meduniwien.ac.at
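The advantage over a saturating divergence is easiest to see in 1D, where the empirical earth mover's distance between two equally sized samples is simply the mean absolute difference of their sorted values. This is a toy sketch of the distance itself, not of the WGAN critic:

```python
import numpy as np

def wasserstein_1d(u, v):
    """Empirical 1-D earth mover's distance for equally sized samples:
    mean absolute difference of the matched (sorted) values."""
    return np.mean(np.abs(np.sort(u) - np.sort(v)))

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=10_000)

# For distributions with essentially disjoint support, the JS divergence saturates
# (at log 2) regardless of the gap, but the EM distance keeps tracking the gap,
# so it still provides a useful training signal.
far = wasserstein_1d(real, rng.normal(10.0, 1.0, size=10_000))
near = wasserstein_1d(real, rng.normal(3.0, 1.0, size=10_000))
print(far, near)  # roughly 10 and 3: the distance shrinks as the model distribution moves closer
assert near < far
```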

slide-73
SLIDE 73

Detecting anomalies with GANs

https://www.cir.meduniwien.ac.at/team/thomas-schlegl Schlegl et al. IPMI 2017 - https://arxiv.org/abs/1703.05921

Work by Thomas Schlegl et al.

Johannes Hofmanninger www.cir.meduniwien.ac.at

73

slide-74
SLIDE 74

Detect anomalies by having a good model of normal

74

Unseen data Anomalies

Model Model

Normal data Look at residual

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-75
SLIDE 75

Normality mapping of a query image

75

[Diagram: the generator G maps $z$ to $G(z)$; a loss between the query image and the generated image $G(z)$ is backpropagated to update $z$, moving the mapping from anomalous toward normal]

Schlegl et al. IPMI 2017 - https://arxiv.org/abs/1703.05921

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-76
SLIDE 76

Normality mapping: Ingredient 1

76

Residual loss: $\mathcal{L}_R(z_\gamma) = \sum \big| x - G(z_\gamma) \big|$

[Diagram: query image $x$ compared with the generated image $G(z)$]

Schlegl et al. IPMI 2017 - https://arxiv.org/abs/1703.05921

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-77
SLIDE 77

Normality mapping: Ingredient 2

77

Discrimination loss: $\mathcal{L}_D(z_\gamma) = -\log D\big(G(z_\gamma)\big)$

[Diagram: query image, generator G producing $G(z)$, discriminator D]

Schlegl et al. IPMI 2017 - https://arxiv.org/abs/1703.05921

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-78
SLIDE 78

Normality mapping: Ingredient 2 (revised)

78

Discrimination loss based on feature matching [Salimans et al., 2016]: $\mathcal{L}_D(z_\gamma) = \sum \big| \mathbf{f}(x) - \mathbf{f}(G(z_\gamma)) \big|$

[Diagram: query image, generator G producing $G(z)$, discriminator D providing the features $\mathbf{f}$]

Schlegl et al. IPMI 2017 - https://arxiv.org/abs/1703.05921

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-79
SLIDE 79

Normality mapping: Combined loss function

79

Combined loss: $\mathcal{L}(z_\gamma) = (1 - \lambda) \cdot \mathcal{L}_R(z_\gamma) + \lambda \cdot \mathcal{L}_D(z_\gamma)$

[Diagram: query image, generator G producing $G(z)$, discriminator D]

Schlegl et al. IPMI 2017 - https://arxiv.org/abs/1703.05921

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-80
SLIDE 80

Anomaly detection

  • 1. Anomaly score, for detection of anomalous images: $A(x) = (1 - \lambda) \cdot R(x) + \lambda \cdot D(x)$, combining a residual score $R(x)$ and a discrimination score $D(x)$ to separate 'anomalous' from 'normal'
  • 2. Residual image, for detection of anomalous regions within images: $x_R = \big| x - G(z_\Gamma) \big|$

Schlegl et al. IPMI 2017 - https://arxiv.org/abs/1703.05921

Johannes Hofmanninger www.cir.meduniwien.ac.at
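The combined score can be sketched with stand-ins for the trained networks. The random-projection "discriminator features", the patch values, and the zero reconstruction below are all illustrative assumptions; in the paper, $G(z_\Gamma)$ comes from the latent-space optimization and the features from the trained discriminator:

```python
import numpy as np

def anomaly_score(x, g_z, disc_features, lam=0.1):
    """AnoGAN-style score: residual part R(x) = sum |x - G(z)|, discrimination part
    D(x) via feature matching; combined as A(x) = (1 - lam)*R(x) + lam*D(x)."""
    r = np.sum(np.abs(x - g_z))                                 # residual score R(x)
    d = np.sum(np.abs(disc_features(x) - disc_features(g_z)))   # feature-matching score D(x)
    return (1 - lam) * r + lam * d

# Toy stand-in for discriminator features: a fixed random projection.
rng = np.random.default_rng(0)
proj = rng.normal(size=(8, 16))
features = lambda v: proj @ v

normal_patch = rng.normal(0.0, 0.05, size=16)    # close to what the model can reproduce
anomalous_patch = normal_patch.copy()
anomalous_patch[4:8] += 2.0                       # a region the "normal" model cannot explain
reconstruction = np.zeros(16)                     # stand-in for the best normal reconstruction

score_normal = anomaly_score(normal_patch, reconstruction, features)
score_anomalous = anomaly_score(anomalous_patch, reconstruction, features)
assert score_anomalous > score_normal             # anomalies receive higher scores
print(score_normal, score_anomalous)
```

The residual part localizes anomalies in image space (the residual image), while the feature-matching part scores how implausible the query looks to the discriminator.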

slide-81
SLIDE 81

Experiments – Data

81

  • Unsupervised GAN training:
  • 270 OCT scans of 'healthy' subjects (non-fluid)
  • 1,000,000 2D image patches
  • Testing, detecting anomalies:
  • 10 OCT 'healthy'; 10 OCT 'pathological' (macular fluid)
  • In total: 8,192 image patches

Preprocessing Input data

Schlegl et al. IPMI 2017 - https://arxiv.org/abs/1703.05921

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-82
SLIDE 82

Training process

82

[Generated samples over training: epoch 1 (iter 1, iter 1,000, iter 16,000), then epochs 3, 5, 10, 20]

Schlegl et al. IPMI 2017 - https://arxiv.org/abs/1703.05921

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-83
SLIDE 83

Can the model generate realistic images?

83

Schlegl et al. IPMI 2017 - https://arxiv.org/abs/1703.05921

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-84
SLIDE 84

Can the model generate similar images?

84

Training set (normal) Test set (normal) Test set (diseased)

Generated image Query image

Schlegl et al. IPMI 2017 - https://arxiv.org/abs/1703.05921

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-85
SLIDE 85

Pixel-level detection of anomalies

85

Training set (normal) Test set (normal) Test set (diseased)

Schlegl et al. IPMI 2017 - https://arxiv.org/abs/1703.05921

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-86
SLIDE 86

Pixel-level detection of anomalies

86

Anomalous

Schlegl et al. IPMI 2017 - https://arxiv.org/abs/1703.05921

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-87
SLIDE 87

Pixel-level detection of anomalies

87

Normal

Schlegl et al. IPMI 2017 - https://arxiv.org/abs/1703.05921

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-88
SLIDE 88

Image-level detection of anomalies: Anomaly score components

88

[Figure: ROC curves for the residual score and the discrimination score components]

Schlegl et al. IPMI 2017 - https://arxiv.org/abs/1703.05921

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-89
SLIDE 89

Image-level detection of anomalies: Model comparison

89

[Figure: ROC comparison of models for image-level anomaly detection: AnoGAN (G and D), GAN_R, aCAE, and DCAE [Pathak et al., 2016]]

Schlegl et al. IPMI 2017 - https://arxiv.org/abs/1703.05921

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-90
SLIDE 90

Identifying phenotypes Routine Radiology Imaging Data

Johannes Hofmanninger www.cir.meduniwien.ac.at

90

slide-91
SLIDE 91

The reason for big data analytics

  • Why real-world datasets?
  • Learn from a representative sample to identify robust marker patterns
  • Capture natural variability
  • Reality is highly variable, but training sets are limited
  • Nobody has time to annotate 1000s of cases
  • Inter-rater concordance may be low

Johannes Hofmanninger www.cir.meduniwien.ac.at

91

slide-92
SLIDE 92

Typical study data

  • 100 Cases (10MB/case)
  • Carefully selected
  • Evaluated
  • Annotated
  • Homogeneous cohorts

Johannes Hofmanninger www.cir.meduniwien.ac.at

92

slide-93
SLIDE 93

Collected within one month...

>4TB CT/MR Data

Johannes Hofmanninger www.cir.meduniwien.ac.at

93

slide-94
SLIDE 94

Handle heterogeneity in real life data: correspondence

  • Algorithmic localization of anatomical structures
  • Mapping and comparison of positions across individuals
  • Tracking of positions over time

Hofmanninger et al. 2017

Johannes Hofmanninger www.cir.meduniwien.ac.at

94

slide-95
SLIDE 95

Multi-template normalization

Hofmanninger et al. 2017

Johannes Hofmanninger www.cir.meduniwien.ac.at

95

slide-96
SLIDE 96


… in clinical data: imaging + semantic information.

Rich but unstructured information

Johannes Hofmanninger www.cir.meduniwien.ac.at

96

slide-97
SLIDE 97

Reports as weak annotations

[Thomas Schlegl et al.]

Johannes Hofmanninger www.cir.meduniwien.ac.at

97

slide-98
SLIDE 98

Linking semantics and imaging to map reported markers to new images

  • Machine learning can extract structured information from unstructured reports
  • Link this to imaging data
  • Algorithms can learn maps of findings based only on imaging data and reports

Hofmanninger, Langs 2015

Johannes Hofmanninger www.cir.meduniwien.ac.at

98

slide-99
SLIDE 99

Mapping report terms to imaging data

Algorithm

Image information can be used to capture variability in the data

Hofmanninger, Langs 2015

Johannes Hofmanninger www.cir.meduniwien.ac.at

99

slide-100
SLIDE 100

Mapping report terms to imaging data

Expert

Hofmanninger, Langs 2015

Image information can be used to capture variability in the data

Johannes Hofmanninger www.cir.meduniwien.ac.at

100

slide-101
SLIDE 101

Search based on learned features

Semantic re-mapping of features improves retrieval accuracy: it ties the visual representation more closely to diagnostically relevant categories

[Hofmanninger et al. CVPR 2015]

Johannes Hofmanninger www.cir.meduniwien.ac.at

101

slide-102
SLIDE 102
Can we detect patient subgroups?

  • Why?
  • Discover hidden disease subtypes
  • Discover unknown imaging markers
  • Fully unsupervised
  • Data-driven grouping of patients

Johannes Hofmanninger www.cir.meduniwien.ac.at

102

slide-103
SLIDE 103

Local low-level features / Bag of visual words

[Retrieval examples: query and nearest neighbors, using Haralick features and 3DSIFT]

Johannes Hofmanninger www.cir.meduniwien.ac.at

103

slide-104
SLIDE 104

Deep stacked autoencoder: volumes leading to the highest activation in bottleneck neurons

Johannes Hofmanninger www.cir.meduniwien.ac.at

104

slide-105
SLIDE 105

Population Landscape

5000 volumes mapped into a metric space based on visual cues in volumetric lung CT

Hofmanninger et al. 2016

Johannes Hofmanninger www.cir.meduniwien.ac.at

105

slide-106
SLIDE 106

Population phenotypes

Hofmanninger et al.

Johannes Hofmanninger www.cir.meduniwien.ac.at

106

slide-107
SLIDE 107

Identify phenotypes: clusters in the routine population

Terms in reports

Data-driven signatures of individuals reveal clusters of patients

Hofmanninger et al. 2016, Hofmanninger et al. 2017

N = 5000 chest CT volumes

Johannes Hofmanninger www.cir.meduniwien.ac.at

107

slide-108
SLIDE 108

Imaging phenotypes encode clinical information

Johannes Hofmanninger www.cir.meduniwien.ac.at

108

slide-109
SLIDE 109

Two LUTX phenotypes

109

Johannes Hofmanninger www.cir.meduniwien.ac.at

slide-110
SLIDE 110

Two LUTX phenotypes

[Figure: Phenotype I vs. Phenotype II, survival, and associated terms: cystic fibrosis, other specified disease of pancreas, effusion, acute respiratory failure, atelectasis, ground glass opacity, pneumonia, chronic kidney disease, LUTX, congestion, acute renal failure, sepsis]

Johannes Hofmanninger www.cir.meduniwien.ac.at

110

slide-111
SLIDE 111

Outlook

  • We will have more powerful marker patterns and prediction models at our disposal
  • We will discover novel disease and response phenotypes
  • Diagnosis categories might lose relevance, while a more continuous landscape of predictive phenotype patterns drives individual treatment decisions
  • We will use imaging technologies more effectively to predict disease and response
  • These markers will be available to a broader population of patients

www.cir.meduniwien.ac.at

Johannes Hofmanninger www.cir.meduniwien.ac.at

111

slide-112
SLIDE 112

112

Questions?

Johannes Hofmanninger www.cir.meduniwien.ac.at