

SLIDE 1

Deep Learning Theory with Application to Cancer Research

Leonie Zeune, Stephan van Gils, Guus van Dalum, Leon Terstappen, Christoph Brune

Inverse Problems and Machine Learning, Pasadena, USA, Feb 9-11, 2018

SLIDE 2

Deep Learning as a Black Box

April 2017: "No one really knows how the most advanced algorithms do what they do. That could be a problem."

June 2017: "Artificial intelligence is a black box that thinks in ways we don't understand. That's thrilling and scary."

Breakthrough Technologies: Deep Learning (2013), Reinforcement Learning (2017). [Figure: Google Trends benchmark, 2010–2017.]

SLIDE 3

Closing the Gap between Math and Machine Learning

Mathematics (model-based, theory): Calculus of Variations, Partial Differential Equations, Inverse Problems, Optimization, Uncertainty Quantification, Regularization.

Machine Learning (data-based, application): Graphs / Networks, Deep Learning, Big Data, Biomedical Imaging, Segmentation, Clustering, Classification, Life Sciences.

SLIDE 4

Deeper Insights into Deep Inversion

SLIDE 5

Deep Learning for Inverse Problems

Inverse Problem: $Ku = f$

Supervised Learning: $(u^*, f^*)$ available

◮ Learning variational networks
◮ Learning unrolled proximal schemes (LISTA, learned primal-dual)

Semi-supervised Learning: $u^*$ available

$$\min_\theta \; \big[D(Ku, f) - \log(\mu_\theta(u))\big], \quad \text{with } \mu_\theta = P_\theta(U = u)$$

Unsupervised Learning: $f^*$ available

$$\min_\theta \; \mathbb{E}_f\big[K(K_\theta^\dagger(f)) - f\big]$$

◮ related: Autoencoder (AE), i.e. $K^T K(u) \approx u$, and GANs
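As a concrete instance of the unrolled proximal schemes mentioned under supervised learning, here is a minimal LISTA-style sketch. It assumes PyTorch; the class name, layer count, and initialization are illustrative, not the implementation behind the talk.

```python
# Minimal LISTA-style unrolled proximal scheme (sketch; assumes PyTorch).
# ISTA for min_u 1/2||Ku - f||^2 + lam*||u||_1 iterates
#   u <- soft(u + (1/L) K^T (f - K u), lam/L);
# LISTA unrolls a fixed number of steps and learns the matrices and thresholds.
import torch
import torch.nn as nn

class LISTA(nn.Module):
    def __init__(self, m, n, n_layers=10):
        super().__init__()
        self.We = nn.Linear(m, n, bias=False)                  # learned stand-in for (1/L) K^T
        self.S = nn.Linear(n, n, bias=False)                   # learned stand-in for I - (1/L) K^T K
        self.theta = nn.Parameter(0.1 * torch.ones(n_layers))  # learned thresholds
        self.n_layers = n_layers

    def forward(self, f):
        u = torch.zeros(f.shape[0], self.S.in_features, device=f.device)
        for k in range(self.n_layers):
            v = self.We(f) + self.S(u)
            u = torch.sign(v) * torch.clamp(v.abs() - self.theta[k], min=0.0)  # soft shrinkage
        return u
```

Training is supervised on pairs $(u^*, f^*)$, e.g. by minimizing $\|\mathrm{LISTA}(f^*) - u^*\|_2^2$.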

SLIDE 6

Challenges in Deep Learning

Mathematical / ML questions:

◮ Network architecture: Which activation functions (nonlinearities, norms) should be used? What is the importance of depth (scale), width (fully connected), and convolution (diffusion)?

◮ Network as a generalized ODE: How can we add robustness to the learning of a network? Can deep learning be viewed as a metric learning problem? What are the statistical properties of images (patterns) captured by deep learning networks?

◮ Nonconvex network learning: What is the optimal selection and amount of training data? How do we deal with nonconvexity and the fact that many local minima share a similar performance?

Structure/Patterns · Parameters/Design · Optimization/Learning

SLIDE 7

Cancer-ID Project

Cancer-ID aims to validate blood-based biomarkers for cancer:

◮ cells dissociate from the primary tumor and invade the blood circulation
◮ rare cell events, challenging to detect
◮ circulating tumor cell (CTC) count has prognostic value for survival outcome
◮ no overall CTC definition exists yet

SLIDE 8

Automatic and Platform Independent CTC Definition


Find and classify CTCs in various data sets!

SLIDE 9

Semi- or Unsupervised Analysis of Structure and Scale?

Idea: Artefacts, intact cells, and fragments of cells have different sizes and intensities. Can we detect that automatically?

→ Scale information might help to improve classification results.

SLIDE 10

Goal

Goal: find similarities between the two approaches.

|                  | Variational Methods                    | Deep Learning                        |
|------------------|----------------------------------------|--------------------------------------|
| Lower-level task | Denoising by nonlinear diffusion       | Denoising by CNN autoencoder         |
| High-level task  | Segmentation using nonlinear diffusion | Classification using CNN autoencoder |

SLIDE 11

Spectral Transformation and Filtering (Fourier, Wavelet)


More informative signal representation!

SLIDE 12

Spectral Analysis for TV Denoising

Forward Total Variation (TV) flow:

$$u_t = -p \;\; \text{for } p \in \partial \mathrm{TV}(u), \qquad u|_{t=0} = f^\delta$$

→ discrete case: solve in every step the ROF problem [Rudin et al. 92]:

$$\min_u \; \tfrac{1}{2}\|u - u^n\|_2^2 + \alpha\,\mathrm{TV}(u)$$

Idea: A solution of the nonlinear eigenvalue problem $\lambda u \in \partial J(u)$ with $J(u) = \mathrm{TV}(u)$ is transformed to a peak in the spectral domain.

[Gilboa, 2013], [Gilboa, 2014], [Horesh, Gilboa 15], [Burger et al., 2015], [Gilboa et al., 2015], [Burger et al., 2016]

SLIDE 13

Spectral Transform

Spectral Transform and Response (acc. to [Gilboa 13/14]):

$$\varphi(t) = t\,u_{tt}, \qquad S(t) := \|\varphi(t; x)\|_{L^1(\Omega)}$$

Example taken from [Burger et al., 2015].

Signal representation: $f(x) = \int_0^\infty \varphi(t; x)\,dt + \bar{f}$

Filtering: $f_H(x) = \int_0^\infty \varphi_H(t; x)\,dt + H(\infty)\,\bar{f}$ with $\varphi_H(t; x) = H(t)\,\varphi(t; x)$. A Parseval identity is also available!
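A numerical sketch of this transform: discretize the TV flow by iterated ROF steps (previous slide) and read off $\varphi$ and $S$ by finite differences in $t$. It assumes scikit-image's `denoise_tv_chambolle` as the ROF solver; treating its `weight` as the time step is only approximate, and the function name is ours.

```python
# Sketch: TV spectral response S(t) = ||t * u_tt||_L1 via a discretized TV flow.
# Assumes scikit-image; f should be a float 2D array.
import numpy as np
from skimage.restoration import denoise_tv_chambolle

def tv_spectral_response(f, dt=0.05, n_steps=40):
    us, u = [f], f
    for _ in range(n_steps + 1):
        u = denoise_tv_chambolle(u, weight=dt)   # one ROF step of the forward TV flow
        us.append(u)
    ts, S = [], []
    for k in range(1, n_steps):
        u_tt = (us[k + 1] - 2.0 * us[k] + us[k - 1]) / dt**2  # central second time derivative
        phi = k * dt * u_tt                                   # phi(t) = t * u_tt
        ts.append(k * dt)
        S.append(np.abs(phi).sum())                           # L1 norm over the domain
    return np.array(ts), np.array(S)
```

Peaks of $S(t)$ then indicate the dominant scales of $f$.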

SLIDE 14

Variational Methods for Segmentation

Which variational models can be used to partition an image into two regions?

Active Contour without Edges model (Chan-Vese)

$$J_{CV}(c_1, c_2, C) = \int_{\Omega_{in}} (f(x) - c_1)^2\,dx + \int_{\Omega_{out}} (f(x) - c_2)^2\,dx + \alpha \cdot \mathrm{Length}(C) \;\to\; \min_{C,\,c_1,\,c_2}$$

[Osher, Sethian 88], [Mumford, Shah 89], [Chan, Vese 01], [Ambrosio, Tortorelli 90] → related to the level-set method (Hamilton-Jacobi)
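scikit-image ships a level-set implementation of this model, so a two-region segmentation can be tried directly; the parameter values below are illustrative ($\mu$ plays the role of $\alpha$), and the keyword `max_num_iter` assumes a recent scikit-image version.

```python
# Two-region Chan-Vese segmentation (assumes scikit-image).
from skimage import data, img_as_float
from skimage.segmentation import chan_vese

f = img_as_float(data.camera())
# mu weights the length term; lambda1/lambda2 weight the two region fidelities
mask = chan_vese(f, mu=0.25, lambda1=1.0, lambda2=1.0, max_num_iter=200)
print(mask.shape, mask.dtype)  # boolean mask: True = inside the contour
```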

SLIDE 15

Relation of Total Variation and Perimeter

Function Space of Bounded Variation

$$\mathrm{BV}(\Omega) := \{u \in L^1(\Omega) \,:\, \mathrm{TV}(u) < \infty\} \quad \text{with} \quad \mathrm{TV}(u) := \sup_{\substack{\varphi \in C_c^\infty(\Omega;\,\mathbb{R}^2) \\ \|\varphi\|_\infty \le 1}} \int_\Omega u\,\nabla \cdot \varphi\,dx$$

Relation with the CV segmentation model?

$$\mathrm{Length}(C) = \mathrm{TV}(u) \quad \text{with} \quad u(x) = \begin{cases} 1 & \text{if } x \in \Omega_{in} \cup C \\ 0 & \text{if } x \in \Omega_{out} \end{cases}$$

TV-based formulation of the CV model:

$$J_{CV2}(c_1, c_2, u) = \int_\Omega u\,\big[(f(x) - c_1)^2 - (f(x) - c_2)^2\big]\,dx + \alpha\,\mathrm{TV}(u) \;\to\; \min_{\substack{u \in \mathrm{BV}(\Omega),\, c_1,\, c_2 \\ u(x) \in \{0,1\}}}$$

For fixed $c_1, c_2$ this corresponds to ROF with a binary constraint ([Burger et al. 12]):

$$\min_{\substack{u \in \mathrm{BV}(\Omega) \\ u(x) \in \{0,1\}}} \tfrac{1}{2}\|u(x) - r(x)\|_2^2 + \alpha\,\mathrm{TV}(u) \quad \text{with} \quad r(x) = (f(x) - c_2)^2 - (f(x) - c_1)^2 + \tfrac{1}{2}.$$
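The equivalence is a one-line computation: for binary $u$ we have $u^2 = u$, so

$$\tfrac{1}{2}\|u - r\|_2^2 + \alpha\,\mathrm{TV}(u) = \int_\Omega u\,\big(\tfrac{1}{2} - r\big)\,dx + \tfrac{1}{2}\|r\|_2^2 + \alpha\,\mathrm{TV}(u),$$

and choosing $\tfrac{1}{2} - r = (f - c_1)^2 - (f - c_2)^2$, i.e. the $r(x)$ above, recovers $J_{CV2}$ up to a constant.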

SLIDE 16

Scale spaces for Segmentation

"Forward scale space" for filtering and segmentation:

Nonlinear Filtering (ROF): $\min_u \tfrac{1}{2}\|u - f\|_2^2 + \alpha\,\mathrm{TV}(u)$ with scale parameter $\alpha$

Nonlinear Segmentation (CV): $\min_u \int_\Omega u\,\big[(f - c_1)^2 - (f - c_2)^2\big]\,dx + \alpha\,\mathrm{TV}(u)$ with scale parameter $\alpha$

◮ Inverse scale space for nonlocal filtering through Bregman iterations

"Inverse scale space" for filtering:

$$u^{k+1} = \arg\min_{u \in \mathrm{BV}(\Omega)} \tfrac{1}{2}\|u - f\|_2^2 + \alpha\big(\mathrm{TV}(u) - \langle u, p^k \rangle\big)$$

with $p^k \in \partial\mathrm{TV}(u^k)$, $p^0 = 0$, and scale parameter $k$.

◮ How can we construct an "inverse scale space" for segmentation?

[Osher et al. 05]
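A sketch of the Bregman iteration in its equivalent "add back the residual" form $u^{k+1} = \arg\min_u \tfrac{1}{2}\|u - (f + b^k)\|_2^2 + \alpha\,\mathrm{TV}(u)$, $b^{k+1} = b^k + f - u^{k+1}$, again assuming scikit-image's ROF solver; the names are ours.

```python
# Inverse scale space via Bregman iterations (sketch; assumes scikit-image, float input f).
import numpy as np
from skimage.restoration import denoise_tv_chambolle

def bregman_rof(f, alpha=0.5, n_iter=10):
    b = np.zeros_like(f)
    iterates = []
    for _ in range(n_iter):
        u = denoise_tv_chambolle(f + b, weight=alpha)  # ROF subproblem
        b = b + f - u                                  # Bregman update: lost fine scales return
        iterates.append(u)
    return iterates  # coarse structures appear first; scale grows with k
```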

SLIDE 17

Spectral Transform for Segmentation

Spectral Transform and Response:

$$\varphi(t; x) = \begin{cases} -u_t(x) & \text{(forward case)} \\ u_t(x) & \text{(inverse case)} \end{cases} \qquad S(t) = \|\varphi(t; x)\|_{L^1(\Omega)}$$

Is $S(t) = \langle \Phi(t), 2t\,p(t) \rangle$ better? ([Burger et al., 2015])

Segmentation representation via: $f_{seg}(x) = \int_0^\infty \varphi(t; x)\,dt$

Filtering via: $f_{seg,H}(x) = \int_0^\infty \varphi_H(t; x)\,dt$ with $\varphi_H(t; x) = H(t)\,\varphi(t; x)$

SLIDES 18–48

Detection of Different Sizes

Bregman-CV with $\alpha = 100$, discs with fixed intensity and varying size. [Figure sequence, one slide per step: segmentation results for Bregman iterations 1–30, followed by a color-coded segmentation $\int_t \Phi(x, t) \cdot t \, dt$.]

SLIDES 49–54

Detection of Different Intensities

Bregman-CV with $\alpha = 30$, discs with fixed size and varying intensity. [Figure sequence, one slide per step: segmentation results for Bregman iterations 1, 4, 6, 11, and 27, followed by a color-coded segmentation $\int_t \Phi(x, t) \cdot t \, dt$.]

SLIDE 55

Robustness Against Noise

Bregman-CV with α = 100 and 30 Bregman Iterations.

[Figure: results for $\sigma = 0.25$ ($c_1 = 1.01$), $\sigma = 0.5$ ($c_1 = 1.09$), $\sigma = 0.75$ ($c_1 = 1.25$), $\sigma = 1$ ($c_1 = 1.44$).]

Very high robustness against noise. No parameter adaptation needed!

SLIDE 56

Finding Different Shapes

What determines the shape of the eigenfunctions?

Generalized definition of TV:

$$\mathrm{TV}(u) := \int \gamma(\nabla u)\,dx \quad \text{(primal)}, \qquad \mathrm{TV}(u) := \sup_{\substack{\varphi \in C^1_c(\Omega;\,\mathbb{R}^d) \\ \varphi(x) \cdot n \,\le\, \gamma(n)\ \forall n \in \mathbb{R}^d}} -\int u\,\nabla \cdot \varphi\,dx \quad \text{(dual)}$$

Frank diagram: $F_\gamma := \{z \in \mathbb{R}^d : \gamma(z) \le 1\}$

Wulff shape: $W_\gamma := \{z \in \mathbb{R}^d : z \cdot x \le \gamma(x) \text{ for all } x \in \mathbb{R}^d\} = \{z \in \mathbb{R}^d : \gamma^*(z) := \sup_{x \in \mathbb{R}^d} \tfrac{z \cdot x}{\gamma(x)} \le 1\}$
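A concrete case: for the anisotropic choice $\gamma = \|\cdot\|_1$,

$$\gamma^*(z) = \sup_{x \neq 0} \frac{z \cdot x}{\|x\|_1} = \|z\|_\infty, \qquad W_\gamma = \{z \in \mathbb{R}^d : \|z\|_\infty \le 1\},$$

so the Wulff shape is a square. Since indicator functions of Wulff shapes act as eigenfunctions in this generalized setting, the corresponding segmentations favor square structures (compare the $\gamma^*$ choices on the next slide).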

SLIDE 57

Finding Different Shapes

For different choices of $\gamma$ we can find different eigenfunctions: $\gamma^* = \|\cdot\|_2$, $\gamma^* = \|\cdot\|_\infty$, $\gamma^* = \|\cdot\|_1$. Results for Bregman-CV with $\alpha = 100$.

SLIDE 58

Mixture of Shapes and Scales

[Figure: data and segmentation scales; spectral response $S(t)$.]

◮ successive peaks (usually) correspond to comparable scales
◮ reconstruction of different segmentation scales based on $S$
◮ non-eigenfunctions are reshaped over time

[Florack, Kuijper, The topological structure of scale-space images, 2000]

SLIDE 59

Convolutional Neural Network

A CNN layer maps an input "image" $u$ at pixel position $x$ through a convolution kernel $W$, a bias $b$, and a differentiable activation function $\sigma$:

$$f(u, x; \sigma, W, b) = \sigma\big((W * u)(x) + b(x)\big)$$

SLIDE 60

CNN Layer Visualization

SLIDE 61

Autoencoder

• generalization of the dimensionality-reduction and filtering approach
• Fourier filtering and PCA are special cases of tied-weight AEs
• the AE reconstructs its input: $u \approx \mathrm{decode}\big(\mathrm{encode}(u; \theta_F); \theta_C\big)$
• special choice of autoencoder (AE): fix $\theta_C = \theta_F$, called an AE with tied weights

→ learn the identity map: $u \approx W^T \varphi(W u)$
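A minimal tied-weight AE as a sketch, assuming PyTorch; the dimensions and the choice $\varphi = \tanh$ are ours.

```python
# Tied-weight autoencoder u -> W^T phi(W u) (sketch; assumes PyTorch).
import torch
import torch.nn as nn

class TiedAE(nn.Module):
    def __init__(self, n, m):
        super().__init__()
        self.W = nn.Parameter(0.01 * torch.randn(m, n))  # one matrix shared by encoder and decoder
        self.phi = nn.Tanh()

    def forward(self, u):              # u: (batch, n)
        z = self.phi(u @ self.W.t())   # encoder: phi(W u)
        return z @ self.W              # decoder: W^T phi(W u)

ae = TiedAE(n=256, m=64)
u = torch.randn(8, 256)
loss = ((ae(u) - u) ** 2).mean()       # train towards the identity map on the data
```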

SLIDE 62

Autoencoder and Gradient Flow

• Assume a trained AE with tied weights, i.e. $f(u) = W^T \varphi(W u)$.
• $u$ and $f(u)$ live in the same vector space, i.e. $G(u) = f(u) - u$ is a vector field pointing from $u$ to the reconstruction $f(u)$.
• Under assumptions, $G(u) = f(u) - u \sim -\nabla_u E$ is a gradient field: there exists an energy $E(u; W)$ with $f(u) = W^T \varphi(W u) = u - \nabla_u E(u; W)$, where

$$E(u) = \sum_{i=1}^N \Phi\big((W u)_i\big) + \tfrac{1}{2}\|u\|_2^2, \qquad \varphi = -\Phi'.$$
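The gradient check is immediate: with $E$ as above and $\varphi = -\Phi'$,

$$\nabla_u E(u) = W^T \Phi'(W u) + u = u - W^T \varphi(W u) = u - f(u),$$

so $G(u) = f(u) - u = -\nabla_u E(u)$: applying a tied-weight AE is a gradient-descent step on $E$.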

SLIDE 63

CNN Denoising Autoencoder - Scale

Trained on noise-free data; 6000 training samples, 1000 test samples. 3 convolution (32 filters in total) + pooling blocks for encoder + decoder → 4963 parameters. [Figure: GT and reconstructions for std. dev. 0.5, 0.2, 0.1, 0.05, and no noise.]

SLIDE 64

CNN Denoising Autoencoder - Scale

Trained on noisy data (std. dev. 0.2); 6000 training samples, 1000 test samples. 3 convolution (32 filters in total) + pooling blocks for encoder + decoder → 4963 parameters. [Figure: GT and reconstructions for std. dev. 0.5, 0.2, 0.1, 0.05, and no noise.]
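The denoising training setup can be sketched as below, assuming PyTorch; the small encoder-decoder here is illustrative, not the exact 4963-parameter architecture from the slide.

```python
# Denoising-autoencoder training step (sketch; architecture is illustrative).
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
dec = nn.Sequential(nn.Upsample(scale_factor=2), nn.Conv2d(16, 8, 3, padding=1), nn.ReLU(),
                    nn.Upsample(scale_factor=2), nn.Conv2d(8, 1, 3, padding=1))
model = nn.Sequential(enc, dec)

u_clean = torch.rand(16, 1, 64, 64)
u_noisy = u_clean + 0.2 * torch.randn_like(u_clean)  # additive Gaussian noise, std. dev. 0.2
loss = ((model(u_noisy) - u_clean) ** 2).mean()      # the target is the clean image
loss.backward()
```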

SLIDE 65

CNN Denoising Autoencoder - Shape

Trained on noise-free data; 6000 training samples, 1000 test samples. 3 convolution (32 filters in total) + pooling blocks for encoder + decoder → 4963 parameters. [Figure: GT, no noise, different radius, and different radius when trained on the correct radii.]

SLIDE 66

Deep Learning for Circulating Tumor Cells

GT data: CTC | non-CTC (4910 each). Network: 15-layer CNN. 80% | 20% training | testing. SGD, 75 epochs, mini-batch 64.

96.5% accuracy
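The optimization recipe on this slide (SGD, 75 epochs, mini-batch 64) as a runnable sketch; the model and data are trivial placeholders, since the 15-layer CNN itself is not spelled out here.

```python
# Training loop matching the slide's recipe (sketch; model and data are placeholders).
import torch
from torch.utils.data import DataLoader, TensorDataset

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 2))  # stand-in for the 15-layer CNN
train_set = TensorDataset(torch.randn(512, 1, 28, 28), torch.randint(0, 2, (512,)))

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = torch.nn.CrossEntropyLoss()                      # two classes: CTC vs. non-CTC
loader = DataLoader(train_set, batch_size=64, shuffle=True)  # mini-batch 64

for epoch in range(75):                                      # 75 epochs
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```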

SLIDE 67

Deep Learning for Circulating Tumor Cells

GT data: CTC (10415) | non-CTC (41523). Network: ResNet-18. 75% | 25% training | testing. Adadelta.

98.6% accuracy, 97.2% sensitivity, 99.0% specificity

SLIDE 68

Deep Learning for Circulating Tumor Cells

GT data: CTC | tdEV | WBC | "New" | Rest (1467 each). Network: 15-layer CNN. 80% | 20% training | testing. SGD, 75 epochs, mini-batch 64.

78.8% accuracy

SLIDE 69

Deep Learning for Circulating Tumor Cells

GT data: CTC | tdEV | WBC | "New" | Rest. Network: ResNet-18. 75% | 25% training | testing. Adadelta.

86.6% accuracy

SLIDE 70

SLIDE 71

EU-IMI key opinion leaders: "A notable highlight is the development of the open source image analysis program ACCEPT for automated comparison of CTC images. As proof of principle, the consortium has demonstrated that the tool has improved the concordance between operators in assigning CTCs in patients with MBC as either HER2 positive or HER2 negative."

SLIDE 72

Summary & Open Questions

◮ Spectral decomposition for nonlinear denoising & segmentation
◮ Scale/stability: multiscale properties of CNNs
◮ Eigenfunctions/invariants: inverse NF equations, kernel properties of CNNs
◮ ACCEPT: open-source software tool for cancer research
◮ Deep learning for Cancer-ID is possible; however, multimodal and multiclass representation in CNNs is challenging
◮ Outlook: combination of CNN autoencoder, classifier, and GAN

Thanks to my collaborators: Leonie Zeune, Stephan van Gils, Guus van Dalum, Leon Terstappen

SLIDE 73

Papers:

◮ Zeune et al., Multiscale Segmentation via Bregman Distances and Nonlinear Spectral Analysis, SIAM J. Imaging Sciences, 10(1), 2017.
◮ Zeune et al., Combining Contrast Invariant L1 Data Fidelities with Nonlinear Spectral Image Decomposition, Springer LNCS, SSVM 2017, 80-93, 2017.
◮ Zeune et al., Quantifying HER-2 expression on Circulating Tumor Cells by ACCEPT, PLOS ONE, 12(10), 2017.
◮ Eissa et al., Cross-scale effects of neural interactions during human neocortical seizure activity, PNAS, 114(40), 2017.
◮ Boink et al., A framework for directional and higher-order reconstruction in photoacoustic tomography, Physics in Medicine & Biology, accepted, Jan 2018.

This work is supported by the IMI EU program #115749 CANCER-ID.

Thanks for your attention!

SLIDE 74

Spectral decomposition for p = 2

Scale spaces: doubly nonlinear evolution equation. [Figure: $L^2$-norm vs. $L^1$-norm scale spaces.]

[Nikolova 02, Chan 05, Darbon 05, Duval 09, Zeune 17]

SLIDE 75

L2-norm L1-norm