SLIDE 1

How intelligent is artificial intelligence? – On the surprising and mysterious secrets of deep learning

Vegard Antun (UiO), Anders C. Hansen (Cambridge, UiO)

Joint work with:

  • B. Adcock (SFU)
  • M. Colbrook (Cambridge)
  • N. Gottschling (Cambridge)
  • C. Poon (Bath)
  • F. Renna (Porto)

22 May 2019

SLIDE 2

SLIDE 3

What could possibly go wrong? AI replacing standard algorithms

SLIDE 4

Mathematical setup

Image reconstruction in medical imaging

◮ x ∈ C^N: the true image (interpreted as a vector).
◮ A ∈ C^(m×N): the measurement matrix (m < N).
◮ y = Ax: the measurements.
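
A minimal numerical sketch of this setup, with a subsampled Fourier matrix standing in for A; the sizes and the sampling pattern are illustrative, not taken from the talk.

```python
import numpy as np

# Forward model y = Ax with a subsampled Fourier (MRI-style) matrix.
N = 64                                        # pixels in the vectorised image
m = 32                                        # number of measurements, m < N
rng = np.random.default_rng(0)

F = np.fft.fft(np.eye(N), norm="ortho")       # full N x N unitary Fourier matrix
rows = rng.choice(N, size=m, replace=False)   # sampling pattern Omega
A = F[rows, :]                                # A in C^(m x N)

x = rng.standard_normal(N)                    # stand-in for the true image
y = A @ x                                     # the measurements
x_tilde = A.conj().T @ y                      # back projection A* y
print(y.shape, x_tilde.shape)                 # (32,) (64,)
```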

SLIDE 5

[Figure: True image x; sampling pattern Ω; |x̃| = |A*y|; sinogram y = Ax; back projection x̂1 = A*y; FBP x̂2 = By]

SLIDE 6

Image reconstruction methods

◮ Deep learning approach: Given a training set {x1, . . . , xn}, train a neural network f : C^m → C^N such that ‖f(Axi) − xi‖ ≪ ‖A*Axi − xi‖.

SLIDE 7

Image reconstruction methods

◮ Deep learning approach: Given a training set {x1, . . . , xn}, train a neural network f : C^m → C^N such that ‖f(Axi) − xi‖ ≪ ‖A*Axi − xi‖.
◮ Sparse regularization:

minimize_{z ∈ C^N} ‖Wz‖_ℓ1 subject to ‖Az − y‖_ℓ2 ≤ η

[Figure: image x and its coefficients Wx for W = wavelets and W = ∇]
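
In practice the sparse regularization problem is often attacked through its unconstrained (LASSO) form min_z ½‖Az − y‖²_ℓ2 + λ‖Wz‖_ℓ1. Below is a minimal ISTA sketch for that form with W unitary; it is a standard solver chosen for illustration, not necessarily the one used in the talk.

```python
import numpy as np

def ista(A, W, y, lam=0.01, iters=200):
    """Minimal ISTA sketch for min_z 0.5*||Az - y||_2^2 + lam*||Wz||_1,
    with W unitary so the proximal step is soft-thresholding of Wz."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1/L, L = ||A||_2^2
    z = A.conj().T @ y                            # warm start with A* y
    for _ in range(iters):
        g = A.conj().T @ (A @ z - y)              # gradient of the data term
        u = W @ (z - step * g)                    # move to the sparse domain
        shrink = np.maximum(1.0 - step * lam / np.maximum(np.abs(u), 1e-12), 0.0)
        z = W.conj().T @ (shrink * u)             # soft-threshold, map back
    return z
```

With W the identity this is plain LASSO on the image itself; for the wavelet case, W would be an orthogonal wavelet transform.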

SLIDE 8

Sparse regularization reconstruction

[Figure: sparse regularization reconstructions with DB4 wavelets (left) and TV (right)]

SLIDE 9

Typical sparse regularization result

Let A ∈ C^(m×N) with m < N and y = Ax + e, with ‖e‖2 ≤ η. Let W ∈ C^(N×N) be unitary, and suppose that AW⁻¹ satisfies the restricted isometry property in levels (RIPL). Then any minimizer x̂ of

minimize_{z ∈ C^N} ‖Wz‖1 subject to ‖Az − y‖2 ≤ η

satisfies

‖x̂ − x‖2 ≲ σ_{s,M}(Wx)1 / √s + η,

where σ_{s,M}(Wx)1 = inf{‖Wx − z‖1 : z is (s, M)-sparse}.

SLIDE 10

Neural network image reconstruction approaches

◮ Pure denoisers. Train a neural network φ to learn the noise:

f(y) = A*y − φ(A*y)
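
A minimal PyTorch-style sketch of this residual idea; the CNN architecture is illustrative and not from any of the papers discussed here.

```python
import torch
import torch.nn as nn

class ResidualDenoiser(nn.Module):
    """Sketch of the 'pure denoiser' idea: a small CNN phi predicts the
    artefacts in the back projection A*y, and f(y) = A*y - phi(A*y)."""
    def __init__(self, channels=1):
        super().__init__()
        self.phi = nn.Sequential(
            nn.Conv2d(channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, kernel_size=3, padding=1),
        )

    def forward(self, back_projection):       # back_projection = A*y as an image
        return back_projection - self.phi(back_projection)
```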

SLIDE 11

Neural network image reconstruction approaches

◮ Pure denoisers. Train a neural network φ to learn the noise: f(y) = A*y − φ(A*y).
◮ Data consistent denoisers. Train n networks φi, i = 1, . . . , n, and ensure that the final image is consistent with your data:

1: Pick α ∈ [0, 1].
2: Set ỹ1 = y.
3: for i = 1, . . . , n do
4:   x̃i = A*ỹi − φi(A*ỹi)
5:   ŷ = Ax̃i
6:   ỹ_{i+1} = αŷ + (1 − α)y   (enforce data consistency)
7: Return: x̃n.
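
A sketch of the loop above in Python, assuming the trained networks φi are given as callables on back-projected images; everything here is illustrative.

```python
import numpy as np

def data_consistent_reconstruction(A, y, denoisers, alpha=0.5):
    """Sketch of the data-consistent denoiser loop: denoise the back
    projection, re-measure it, and blend the simulated measurements
    with the observed data y before the next stage."""
    y_tilde = y                                    # y_1 = y
    x_tilde = A.conj().T @ y_tilde
    for phi in denoisers:                          # phi_i, i = 1, ..., n
        bp = A.conj().T @ y_tilde                  # back projection A* y_i
        x_tilde = bp - phi(bp)                     # x_i = A* y_i - phi_i(A* y_i)
        y_hat = A @ x_tilde                        # re-measure the estimate
        y_tilde = alpha * y_hat + (1 - alpha) * y  # enforce data consistency
    return x_tilde                                 # x_n
```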

SLIDE 12

Neural network image reconstruction approaches

◮ Learn the physics. Do not warm start your network with A*. Rather, learn f(yi) = xi, i = 1, . . . , n, directly.

SLIDE 13

Neural network image reconstruction approaches

◮ Learn the physics. Do not warm start your network with A*. Rather, learn f(yi) = xi, i = 1, . . . , n, directly.
◮ Unravel n steps of a sparse regularization solver. Learn λi, Ki, and Ψi for i = 1, . . . , n:

1: x̃1 = A*y
2: for i = 1, . . . , n do
3:   x̃_{i+1} = x̃i − (Ki)ᵀ Ψi(Ki x̃i) + λi A*(Ax̃i − y)
4: Return: x̃_{n+1}. (omitting some details here)
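
A sketch of such an unrolled network in PyTorch, with linear Ki, a learned pointwise nonlinearity standing in for Ψi, and scalar λi; like the slide, this omits details (and assumes a real-valued A).

```python
import torch
import torch.nn as nn

class UnrolledNet(nn.Module):
    """Sketch of n unrolled iterations
    x_{i+1} = x_i - K_i^T Psi_i(K_i x_i) + lambda_i A^T (A x_i - y),
    where K_i, Psi_i and lambda_i are learned and A is fixed."""
    def __init__(self, A, n=5, features=32):
        super().__init__()
        self.A = A                                      # m x N real matrix
        self.K = nn.ModuleList(nn.Linear(A.shape[1], features, bias=False)
                               for _ in range(n))
        self.psi = nn.ModuleList(nn.Sequential(nn.Linear(features, features),
                                               nn.Tanh())
                                 for _ in range(n))
        self.lam = nn.Parameter(torch.full((n,), 0.1))  # one lambda_i per step

    def forward(self, y):                               # y: (batch, m)
        x = y @ self.A                                  # x_1 = A^T y
        for i in range(len(self.K)):
            reg = self.psi[i](self.K[i](x)) @ self.K[i].weight  # K_i^T Psi_i(K_i x)
            grad = (x @ self.A.T - y) @ self.A          # A^T (A x - y)
            x = x - reg - self.lam[i] * grad
        return x
```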

SLIDE 14

Networks considered

◮ AUTOMAP

◮ Low resolution images, 60% subsampling, single coil MRI.
◮ B. Zhu, J. Z. Liu, S. F. Cauley, B. R. Rosen and M. S. Rosen, 'Image reconstruction by domain-transform manifold learning', Nature, vol. 555, no. 7697, p. 487, Mar. 2018.

◮ DAGAN

◮ Medium resolution, 20% subsampling, single coil MRI.
◮ G. Yang, S. Yu, H. Dong, G. Slabaugh, P. L. Dragotti, X. Ye, F. Liu, S. Arridge, J. Keegan, Y. Guo et al., 'DAGAN: Deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction', IEEE Transactions on Medical Imaging, 2017.

◮ Deep MRI

◮ Medium resolution, 33% subsampling, single coil MRI.
◮ J. Schlemper, J. Caballero, J. V. Hajnal, A. Price and D. Rueckert, 'A deep cascade of convolutional neural networks for MR image reconstruction', in International Conference on Information Processing in Medical Imaging, Springer, 2017, pp. 647–658.

SLIDE 15

Networks considered

◮ Ell 50 and Med 50 (FBPConvNet)

◮ CT or any Radon transform based inverse problem, with 50 uniformly spaced lines.
◮ K. H. Jin, M. T. McCann, E. Froustey and M. Unser, 'Deep convolutional neural network for inverse problems in imaging', IEEE Transactions on Image Processing, vol. 26, no. 9, pp. 4509–4522, 2017.

◮ MRI-VN

◮ Medium to high resolution, parallel MRI with 15 coil elements and 15% subsampling.
◮ K. Hammernik, T. Klatzer, E. Kobler, M. P. Recht, D. K. Sodickson, T. Pock and F. Knoll, 'Learning a variational network for reconstruction of accelerated MRI data', Magnetic Resonance in Medicine, vol. 79, no. 6, pp. 3055–3071, 2018.

SLIDE 16

How to measure image quality?

[Figure: six test images: (1) base image; (2) translated; (3) ε = 0.32 added to all pixels; (4) noisy; (5) another bird; (6) a different image]

Image:                (2)     (3)     (4)     (5)     (6)
ℓ2-distance to (1):   215.04  204.80  167.26  216.44  193.15

Figure from: M. Lohne, Parseval Reconstruction Networks, Master's thesis, UiO, 2019.
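
A tiny illustration of the point of this slide: the ℓ2-distance barely distinguishes a harmless translation or brightness shift from a genuinely different image. The toy image below is mine, not from the thesis.

```python
import numpy as np

# A one-pixel translation of a sharp edge moves many pixel values, so its
# l2-distance is large even though the image is visually almost unchanged.
x = np.zeros((8, 8)); x[:, :4] = 1.0          # image with a vertical edge
shifted = np.roll(x, 1, axis=1)               # translated by one pixel
brighter = np.clip(x + 0.32, 0.0, 1.0)        # add eps = 0.32 to all pixels

print(np.linalg.norm(x - shifted))            # 4.0  (large for a tiny shift)
print(np.linalg.norm(x - brighter))           # ~1.81 (structure unchanged)
```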

SLIDE 17

Three types of instabilities

(1) Instabilities with respect to tiny perturbations. That is, ỹ = A(x + r) with r very small.
(2) Instabilities with respect to small structural changes: a detail, for example a tumour, may not be captured in the reconstructed image.
(3) Instabilities with respect to changes in the number of samples. Having more information should increase performance.

  • V. Antun, F. Renna, C. Poon, B. Adcock, A. C. Hansen, 'On instabilities of deep learning in image reconstruction - Does AI come at a cost?', arXiv, 2019.

SLIDE 18

Finding tiny perturbations

Try to maximize

Q_x(r) = (1/2)‖f(A(x + r)) − f(Ax)‖²_ℓ2 − (λ/2)‖r‖²_ℓ2,   λ > 0,

using a gradient ascent procedure.
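
A sketch of this gradient ascent in PyTorch, assuming f is differentiable and A is a real matrix; the step size, λ and iteration count are illustrative.

```python
import torch

def find_tiny_perturbation(f, A, x, lam=1.0, step=0.01, iters=100):
    """Gradient-ascent sketch for the objective on the slide:
    maximise Q_x(r) = 0.5*||f(A(x+r)) - f(Ax)||^2 - (lam/2)*||r||^2."""
    with torch.no_grad():
        target = f(A @ x)                    # f(Ax), fixed during the ascent
    r = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        Q = 0.5 * (f(A @ (x + r)) - target).pow(2).sum() \
            - 0.5 * lam * r.pow(2).sum()
        g, = torch.autograd.grad(Q, r)       # dQ/dr
        with torch.no_grad():
            r += step * g                    # ascent, not descent
    return r.detach()
```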

SLIDE 19

Tiny perturbation – Deep MRI net

|x| |f (Ax)|

SLIDE 20

Tiny perturbation – Deep MRI net

|x + r1| |f (A(x + r1))|

SLIDE 21

Tiny perturbation – Deep MRI net

|x + r2| |f (A(x + r2))|

SLIDE 22

Tiny perturbation – Deep MRI net

|x + r3| |f (A(x + r3))|

SLIDE 23

Tiny perturbation – Deep MRI net

SoA from Ax    SoA from A(x + r3)   (SoA = state-of-the-art reconstruction)

SLIDE 24

Tiny perturbation – AUTOMAP

|x| |f (Ax)| SoA from Ax

SLIDE 25

Tiny perturbation – AUTOMAP

|x + r1| |f (A(x + r1))| SoA from A(x + r1)

SLIDE 26

Tiny perturbation – AUTOMAP

|x + r2| |f (A(x + r2))| SoA from A(x + r2)

SLIDE 27

Tiny perturbation – AUTOMAP

|x + r3| |f (A(x + r3))| SoA from A(x + r3)

SLIDE 28

Tiny perturbation – AUTOMAP

|x + r4| |f (A(x + r4))| SoA from A(x + r4)

SLIDE 29

Finding tiny perturbations

What if we tried to maximize

Q_x(r) = (1/2)‖f(A(x + r)) − x‖²_ℓ2 − (λ/2)‖r‖²_ℓ2,   λ > 0,

instead?

SLIDE 30

Tiny perturbation – AUTOMAP

|x| |f (Ax)| SoA from Ax

SLIDE 31

Tiny perturbation – AUTOMAP

|x + r1| |f (A(x + r1))| SoA from A(x + r1)

SLIDE 32

Tiny perturbation – AUTOMAP

|x + r2| |f (A(x + r2))| SoA from A(x + r2)

SLIDE 33

Tiny perturbation – AUTOMAP

|x + r3| |f (A(x + r3))| SoA from A(x + r3)

SLIDE 34

Tiny perturbation – AUTOMAP

|x + r4| |f (A(x + r4))| SoA from A(x + r4)

SLIDE 35

Tiny perturbation – MRI-VN

Original x x + r1

SLIDE 36

Tiny perturbation – MRI-VN

f (Ax) (zoomed) f (A(x + r1)) (zoomed)

SLIDE 37

Tiny perturbation – MRI-VN

SoA from Ax (zoomed) SoA from A(x + r1) (zoomed)

SLIDE 38

Tiny perturbation – Med 50

Original x x + r1

SLIDE 39

Tiny perturbation – Med 50

f (Ax) (zoomed) f (A(x + r1)) (zoomed)

SLIDE 40

Tiny perturbation – Med 50

SoA from Ax (zoomed) SoA from A(x + r1) (zoomed)

SLIDE 41

Small structural change – Ell 50

SLIDE 42

Small structural change – Ell 50

f (Ax) SoA from Ax

SLIDE 43

Small structural change – DAGAN

SLIDE 44

Small structural change – DAGAN

f (Ax) SoA from Ax

SLIDE 45

Small structural change – Deep MRI

SLIDE 46

Small structural change – Deep MRI

f (Ax) SoA from Ax

SLIDE 47

Adding more samples

[Plots: reconstruction quality as the number of samples increases, for Ell 50/Med 50, DAGAN, MRI-VN and Deep MRI]

SLIDE 48

Summary so far...

◮ Tiny perturbations lead to a myriad of different artefacts.
◮ Variety in failure of recovering structural changes.
◮ Networks must be retrained on any subsampling pattern?
◮ Universality – instabilities regardless of architecture?
◮ Rare events? – Empirical tests are needed.

SLIDE 49

Can we fix it?

◮ Computational power is increasing. We can train and test at a substantially higher rate than just a few years ago.
◮ The datasets are growing.
◮ Increased knowledge about good learning techniques.

SLIDE 50

SLIDE 51

Theorem 1

Let A : C^N → C^m be a linear sampling map and let f : C^m → C^N. Suppose that there are x, η, ξ_η, ξ_x ∈ C^N with ‖ξ_η‖, ‖ξ_x‖ ≤ δ ∈ (0, 1/2) such that

f(Ax) = x + ξ_x,   f(A(x + η)) = x + η + ξ_η,   (1)

where ‖η‖ = 1 and ‖Aη‖ = δ > 0. Then we have the following.

(i) (Instabilities) The local Lipschitz constant of f at Ax, defined for ε ≥ δ > 0, satisfies

L^ε_Ax = sup_{0 < ‖Az‖ ≤ ε} ‖f(Ax + Az) − f(Ax)‖ / ‖Az‖ ≥ (1 − 2δ)/ε.
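
A short reconstruction of why (i) follows from (1); this is my sketch of the standard argument, not the talk's proof.

```latex
% Take z = \eta, which is admissible since \|A\eta\| = \delta \le \epsilon.
% By (1),
\|f(Ax + A\eta) - f(Ax)\|
  = \|(x + \eta + \xi_\eta) - (x + \xi_x)\|
  \ge \|\eta\| - \|\xi_\eta\| - \|\xi_x\|
  \ge 1 - 2\delta,
% hence
L^\epsilon_{Ax}
  \ge \frac{1 - 2\delta}{\|A\eta\|}
  = \frac{1 - 2\delta}{\delta}
  \ge \frac{1 - 2\delta}{\epsilon}.
```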

slide-52
SLIDE 52

Theorem 2

Let A : C^N → C^m be a linear sampling map and let f : C^m → C^N. Suppose that there are x, η, ξ_η, ξ_x ∈ C^N with ‖ξ_η‖, ‖ξ_x‖ ≤ δ ∈ (0, 1/2) such that

f(Ax) = x + ξ_x,   f(A(x + η)) = x + η + ξ_η,   (2)

where ‖η‖ = 1 and ‖Aη‖ = δ > 0. Then we have the following.

(ii) (False positives) There exists a perturbation r ∈ C^m with ‖r‖ = δ such that ‖f(r + Ax) − (x + η)‖ ≤ δ.

(iii) (False negatives) There exists a perturbation r ∈ C^m with ‖r‖ = δ such that ‖f(r + A(x + η)) − x‖ ≤ δ.
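
The constructions behind (ii) and (iii) can be read off from (2); again, my sketch.

```latex
% (ii) Choose r = A\eta, so \|r\| = \delta. By (2),
f(Ax + r) = f(A(x + \eta)) = x + \eta + \xi_\eta
\quad\Longrightarrow\quad
\|f(Ax + r) - (x + \eta)\| = \|\xi_\eta\| \le \delta.
% The detail \eta appears in the reconstruction although it was never
% measured: a false positive. For (iii), take r = -A\eta, so that
f(A(x + \eta) + r) = f(Ax) = x + \xi_x
\quad\Longrightarrow\quad
\|f(A(x + \eta) + r) - x\| = \|\xi_x\| \le \delta,
% and the true detail \eta is removed: a false negative.
```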

SLIDE 53

Neural network reconstruction

SLIDE 54

Neural network reconstruction

SLIDE 55

Neural network reconstruction

SLIDE 56

Neural network reconstruction

SLIDE 57

Neural network reconstruction

SLIDE 58

Neural network reconstruction

[Figure: the perturbation η and its Fourier transform |Fη|, where F is the Fourier matrix]

SLIDE 59

Neural network reconstruction

SLIDE 60

Neural network reconstruction

SLIDE 61

Neural network reconstruction

SLIDE 62

Neural network reconstruction

SLIDE 63

Neural network reconstruction

SLIDE 64

Typical sparse regularization result

Recall that if AW⁻¹ satisfies the restricted isometry property in levels (RIPL), then any

x̂ ∈ argmin_{z ∈ C^N} ‖Wz‖1 subject to ‖Az − y‖2 ≤ η

satisfies

‖x̂ − x‖2 ≲ σ_{s,M}(Wx)1 / √s + η,

where σ_{s,M}(Wx)1 = inf{‖Wx − z‖1 : z is (s, M)-sparse}.

SLIDE 65

Wavelet reconstruction

SLIDE 66

Wavelet reconstruction

SLIDE 67

Summary

◮ Kernel awareness is important.
◮ It seems hard to protect against instabilities while maintaining high performance.
◮ Universality – instabilities regardless of architecture.
