SLIDE 1

Learning the Sampling for MRI

Matthias J. Ehrhardt

Institute for Mathematical Innovation, University of Bath, UK

June 24, 2020

Joint work with:

F. Sherry, M. Graves, G. Maierhofer, G. Williams, C.-B. Schönlieb (all Cambridge, UK), M. Benning (Queen Mary, UK), J.C. De los Reyes (EPN, Ecuador)

SLIDE 2

Outline

1) Motivation

$\min_x \tfrac{1}{2}\|SFx - y\|_2^2 + \lambda R(x)$

2) Bilevel Learning

$\min_{x,y} f(x, y)$ subject to $x = \arg\min_z g(z, y)$

3) Learn sampling pattern in MRI

SLIDE 3

Inverse problems

$Ax = y$

$x$: desired solution, $y$: observed data, $A$: mathematical model

Goal: recover $x$ given $y$

SLIDE 4

Inverse problems

$Ax = y$

$x$: desired solution, $y$: observed data, $A$: mathematical model

Goal: recover $x$ given $y$

Hadamard (1902): We call an inverse problem $Ax = y$ well-posed if
(1) a solution $x^*$ exists,
(2) the solution $x^*$ is unique,
(3) $x^*$ depends continuously on the data $y$.
Otherwise, it is called ill-posed.

Jacques Hadamard

Most interesting problems are ill-posed.

SLIDE 5

Example: Magnetic Resonance Imaging (MRI)

[Figure: MRI scanner; $T_2^*$-weighted image]

Continuous model: Fourier transform $Ax(s) = \int_{\mathbb{R}^2} x(t)\exp(-i s\cdot t)\,dt$

Discrete model: $A = F \in \mathbb{C}^{N\times N}$

SLIDE 6

Example: Magnetic Resonance Imaging (MRI)

[Figure: MRI scanner; $T_2^*$-weighted image]

Continuous model: Fourier transform $Ax(s) = \int_{\mathbb{R}^2} x(t)\exp(-i s\cdot t)\,dt$

Discrete model: $A = SF \in \mathbb{C}^{n\times N}$

Solution not unique.
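To make the discrete model concrete, here is a minimal numpy sketch of $A = SF$ and its adjoint as a masked orthonormal FFT; the Boolean array `mask` marking the sampled frequencies is a hypothetical input, not part of the slides:

```python
import numpy as np

def forward(x, mask):
    """A = S F: unitary 2D FFT followed by subsampling at the masked frequencies."""
    return np.fft.fft2(x, norm="ortho")[mask]

def adjoint(y, mask):
    """A* = F* S*: zero-fill the unsampled frequencies, then inverse FFT."""
    k = np.zeros(mask.shape, dtype=complex)
    k[mask] = y
    return np.fft.ifft2(k, norm="ortho")
```

Non-uniqueness is visible directly: any image whose spectrum is supported on the unsampled frequencies lies in the null space of $A$, so adding it to $x$ leaves the data unchanged.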

SLIDE 7

How to solve inverse problems?

Variational regularization (∼2000): approximate a solution $x^*$ of $Ax = y$ via

$\hat{x} \in \arg\min_x \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda R(x)$

• $R$, regularizer: penalizes unwanted features, ensures stability and uniqueness
• $\lambda \geq 0$, regularization parameter: if $\lambda = 0$, an original solution is recovered; as $\lambda \to \infty$, more and more weight is given to the regularizer $R$.

Textbooks: Scherzer et al. 2008, Ito and Jin 2015, Benning and Burger 2018
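As a sketch of how $\hat{x}$ might be computed when $R$ is smooth, plain gradient descent on the variational objective looks as follows; the operator handles `A`, `At` (its adjoint), the regularizer gradient `gradR`, the step size, and the iteration count are illustrative placeholders:

```python
def reconstruct(y, A, At, gradR, lam, step=0.5, iters=200):
    """Gradient descent on 1/2 ||Ax - y||_2^2 + lam * R(x) for smooth R."""
    x = At(y)                                  # start from the adjoint image
    for _ in range(iters):
        grad = At(A(x) - y) + lam * gradR(x)   # gradient of the variational objective
        x = x - step * grad
    return x
```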

SLIDE 8

Example: Regularizers

Tikhonov regularization (∼1960): $R(x) = \tfrac{1}{2}\|x\|_2^2$

A. Tikhonov
SLIDE 9

Example: Regularizers

Tikhonov regularization (∼1960): $R(x) = \tfrac{1}{2}\|x\|_2^2$ (A. Tikhonov)

Total Variation (Rudin, Osher, Fatemi 1992): $R(x) = \|\nabla x\|_1$ (S. Osher)

$H^1$ (∼1960–1990?): $R(x) = \tfrac{1}{2}\|\nabla x\|_2^2$

Wavelet sparsity (∼1990): $R(x) = \|Wx\|_1$

Total Generalized Variation (Bredies, Kunisch, Pock 2010): $R(x) = \inf_v \|\nabla x - v\|_1 + \beta\|\nabla v\|_1$
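For intuition, discrete versions of several of these regularizers are one-liners on top of a finite-difference gradient. A minimal numpy sketch (forward differences with Neumann boundary; the discretization is an illustrative choice, not the one from any particular paper):

```python
import numpy as np

def grad2d(x):
    """Forward-difference gradient with Neumann (replicate) boundary."""
    dx = np.diff(x, axis=0, append=x[-1:, :])
    dy = np.diff(x, axis=1, append=x[:, -1:])
    return dx, dy

def tikhonov(x):
    return 0.5 * np.sum(x ** 2)                  # R(x) = 1/2 ||x||_2^2

def h1(x):
    dx, dy = grad2d(x)
    return 0.5 * np.sum(dx ** 2 + dy ** 2)       # R(x) = 1/2 ||grad x||_2^2

def tv(x):
    dx, dy = grad2d(x)
    return np.sum(np.sqrt(dx ** 2 + dy ** 2))    # isotropic TV: ||grad x||_1
```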

SLIDE 10

Example: MRI reconstruction

Compressed Sensing MRI: $A = S \circ F$ (Lustig, Donoho, Pauly 2007); Fourier transform $F$, sampling $Sw = (w_i)_{i\in\Omega}$

$\hat{x} \in \arg\min_x \tfrac{1}{2}\|SFx - y\|_2^2 + \lambda\|\nabla x\|_1$

Miki Lustig
SLIDE 11

Example: MRI reconstruction

Compressed Sensing MRI: $A = S \circ F$ (Lustig, Donoho, Pauly 2007); Fourier transform $F$, sampling $Sw = (w_i)_{i\in\Omega}$

$\hat{x} \in \arg\min_x \tfrac{1}{2}\|SFx - y\|_2^2 + \lambda\|\nabla x\|_1$

Miki Lustig

[Figure: sampling, $S^*y$, $\lambda = 0$, $\lambda = 1$]

SLIDE 12

Example: MRI reconstruction

Compressed Sensing MRI: $A = S \circ F$ (Lustig, Donoho, Pauly 2007); Fourier transform $F$, sampling $Sw = (w_i)_{i\in\Omega}$

$\hat{x} \in \arg\min_x \tfrac{1}{2}\|SFx - y\|_2^2 + \lambda\|\nabla x\|_1$

Miki Lustig

[Figure: sampling, $S^*y$, $\lambda = 0$, $\lambda = 10^{-4}$]

SLIDE 13

Example: MRI reconstruction

Compressed Sensing MRI: $A = S \circ F$ (Lustig, Donoho, Pauly 2007); Fourier transform $F$, sampling $Sw = (w_i)_{i\in\Omega}$

$\hat{x} \in \arg\min_x \tfrac{1}{2}\|SFx - y\|_2^2 + \lambda\|\nabla x\|_1$

Miki Lustig

[Figure: sampling, $S^*y$, $\lambda = 0$, $\lambda = 10^{-4}$]

SLIDE 14

Example: MRI reconstruction

Compressed Sensing MRI: $A = S \circ F$ (Lustig, Donoho, Pauly 2007); Fourier transform $F$, sampling $Sw = (w_i)_{i\in\Omega}$

$\hat{x} \in \arg\min_x \tfrac{1}{2}\|SFx - y\|_2^2 + \lambda\|\nabla x\|_1$

Miki Lustig

[Figure: sampling, $S^*y$, $\lambda = 0$, $\lambda = 10^{-3}$]

SLIDE 15

Example: MRI reconstruction

Compressed Sensing MRI: $A = S \circ F$ (Lustig, Donoho, Pauly 2007); Fourier transform $F$, sampling $Sw = (w_i)_{i\in\Omega}$

$\hat{x} \in \arg\min_x \tfrac{1}{2}\|SFx - y\|_2^2 + \lambda\|\nabla x\|_1$

Miki Lustig

[Figure: sampling, $S^*y$, $\lambda = 0$, $\lambda = 10^{-3}$]

How to choose the sampling $S$? Is there an optimal sampling?

SLIDE 16

Example: MRI reconstruction

Compressed Sensing MRI: $A = S \circ F$ (Lustig, Donoho, Pauly 2007); Fourier transform $F$, sampling $Sw = (w_i)_{i\in\Omega}$

$\hat{x} \in \arg\min_x \tfrac{1}{2}\|SFx - y\|_2^2 + \lambda\|\nabla x\|_1$

Miki Lustig

[Figure: sampling, $S^*y$, $\lambda = 0$, $\lambda = 10^{-3}$]

How to choose the sampling $S$? Is there an optimal sampling? Does the optimal sampling depend on $R$ and $\lambda$?

SLIDE 17

Some important works on sampling for MRI

Uninformed
◮ Cartesian, radial, variable density, ... e.g. Lustig et al. 2007
  ✓ simple to implement
  ✗ not tailored to application
  ✗ not tailored to regularizer / reconstruction method
◮ compressed sensing theory: random sampling, mostly uniform, e.g. Candès and Romberg 2007
  ✓ mathematical guarantees
  ✗ limited to few sparsity-promoting regularizers: mostly ℓ1 type
  ✗ specific yet uninformed class of recoverable signals: sparse

SLIDE 18

Some important works on sampling for MRI

Uninformed
◮ Cartesian, radial, variable density, ... e.g. Lustig et al. 2007
  ✓ simple to implement
  ✗ not tailored to application
  ✗ not tailored to regularizer / reconstruction method
◮ compressed sensing theory: random sampling, mostly uniform, e.g. Candès and Romberg 2007
  ✓ mathematical guarantees
  ✗ limited to few sparsity-promoting regularizers: mostly ℓ1 type
  ✗ specific yet uninformed class of recoverable signals: sparse

Learned
◮ largest Fourier coefficients of training set, Knoll et al. 2011
  ✓ simple to implement, computationally light
  ✗ not tailored to regularizer / reconstruction method
◮ greedy: iteratively select "best" sample, Gözcü et al. 2018
  ✓ adaptive to dataset, regularizer / reconstruction method
  ✗ only discrete values, e.g. can't learn regularization parameter
  ✗ computationally heavy

SLIDE 19

Bilevel Learning

SLIDE 20

Bilevel learning for inverse problems

$\hat{x} = \arg\min_x \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda R(x)$

$R$ smooth and strongly convex

SLIDE 21

Bilevel learning for inverse problems

Upper level (learning): Given $(x^\dagger, y)$, $y = Ax^\dagger + \varepsilon$, solve

$\min_{\lambda\geq 0,\,\hat{x}} \|\hat{x} - x^\dagger\|_2^2$

Lower level (solve inverse problem):

$\hat{x} = \arg\min_x \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda R(x)$

$R$ smooth and strongly convex

Carola Schönlieb

von Stackelberg 1934, Kunisch and Pock 2013, De los Reyes and Schönlieb 2013
SLIDE 22

Bilevel learning for inverse problems

Upper level (learning): Given $(x_i^\dagger, y_i)_{i=1}^n$, $y_i = Ax_i^\dagger + \varepsilon_i$, solve

$\min_{\lambda\geq 0,\,\hat{x}_i} \frac{1}{n}\sum_{i=1}^n \|\hat{x}_i - x_i^\dagger\|_2^2$

Lower level (solve inverse problem):

$\hat{x}_i = \arg\min_x \tfrac{1}{2}\|Ax - y_i\|_2^2 + \lambda R(x)$

$R$ smooth and strongly convex

Carola Schönlieb

von Stackelberg 1934, Kunisch and Pock 2013, De los Reyes and Schönlieb 2013
SLIDE 23

Bilevel learning: Reduced formulation

Upper level: $\min_{\lambda\geq 0,\,\hat{x}} \|\hat{x} - x^\dagger\|_2^2$

Lower level: $\hat{x} = \arg\min_x \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda R(x)$

SLIDE 24

Bilevel learning: Reduced formulation

Upper level: $\min_{\lambda\geq 0,\,\hat{x}} U(\hat{x})$

Lower level: $\hat{x} = \arg\min_x \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda R(x)$

SLIDE 25

Bilevel learning: Reduced formulation

Upper level: $\min_{\lambda\geq 0,\,\hat{x}} U(\hat{x})$

Lower level: $\hat{x} = \arg\min_x L(x, \lambda)$

SLIDE 26

Bilevel learning: Reduced formulation

Upper level: $\min_{\lambda\geq 0,\,\hat{x}} U(\hat{x})$

Lower level: $x_\lambda := \hat{x} = \arg\min_x L(x, \lambda)$

Reduced formulation: $\min_{\lambda\geq 0} U(x_\lambda) =: \tilde{U}(\lambda)$

SLIDE 27

Bilevel learning: Reduced formulation

Upper level: $\min_{\lambda\geq 0,\,\hat{x}} U(\hat{x})$

Lower level: $x_\lambda := \hat{x} = \arg\min_x L(x, \lambda) \;\Leftrightarrow\; \partial_x L(x_\lambda, \lambda) = 0$

Reduced formulation: $\min_{\lambda\geq 0} U(x_\lambda) =: \tilde{U}(\lambda)$

SLIDE 28

Bilevel learning: Reduced formulation

Upper level: $\min_{\lambda\geq 0,\,\hat{x}} U(\hat{x})$

Lower level: $x_\lambda := \hat{x} = \arg\min_x L(x, \lambda) \;\Leftrightarrow\; \partial_x L(x_\lambda, \lambda) = 0$

Reduced formulation: $\min_{\lambda\geq 0} U(x_\lambda) =: \tilde{U}(\lambda)$

Differentiating the optimality condition in $\lambda$ (implicit function theorem):

$0 = \partial_x^2 L(x_\lambda, \lambda)\,\partial_\lambda x_\lambda + \partial_\lambda\partial_x L(x_\lambda, \lambda) \;\Leftrightarrow\; \partial_\lambda x_\lambda = -B^{-1}A$, with $B := \partial_x^2 L(x_\lambda, \lambda)$, $A := \partial_\lambda\partial_x L(x_\lambda, \lambda)$

SLIDE 29

Bilevel learning: Reduced formulation

Upper level: $\min_{\lambda\geq 0,\,\hat{x}} U(\hat{x})$

Lower level: $x_\lambda := \hat{x} = \arg\min_x L(x, \lambda) \;\Leftrightarrow\; \partial_x L(x_\lambda, \lambda) = 0$

Reduced formulation: $\min_{\lambda\geq 0} U(x_\lambda) =: \tilde{U}(\lambda)$

Differentiating the optimality condition in $\lambda$ (implicit function theorem):

$0 = \partial_x^2 L(x_\lambda, \lambda)\,\partial_\lambda x_\lambda + \partial_\lambda\partial_x L(x_\lambda, \lambda) \;\Leftrightarrow\; \partial_\lambda x_\lambda = -B^{-1}A$, with $B := \partial_x^2 L(x_\lambda, \lambda)$, $A := \partial_\lambda\partial_x L(x_\lambda, \lambda)$

$\nabla\tilde{U}(\lambda) = (\partial_\lambda x_\lambda)^* \nabla U(x_\lambda)$

SLIDE 30

Bilevel learning: Reduced formulation

Upper level: $\min_{\lambda\geq 0,\,\hat{x}} U(\hat{x})$

Lower level: $x_\lambda := \hat{x} = \arg\min_x L(x, \lambda) \;\Leftrightarrow\; \partial_x L(x_\lambda, \lambda) = 0$

Reduced formulation: $\min_{\lambda\geq 0} U(x_\lambda) =: \tilde{U}(\lambda)$

Differentiating the optimality condition in $\lambda$ (implicit function theorem):

$0 = \partial_x^2 L(x_\lambda, \lambda)\,\partial_\lambda x_\lambda + \partial_\lambda\partial_x L(x_\lambda, \lambda) \;\Leftrightarrow\; \partial_\lambda x_\lambda = -B^{-1}A$, with $B := \partial_x^2 L(x_\lambda, \lambda)$, $A := \partial_\lambda\partial_x L(x_\lambda, \lambda)$

$\nabla\tilde{U}(\lambda) = (\partial_\lambda x_\lambda)^* \nabla U(x_\lambda) = -A^* B^{-1}\nabla U(x_\lambda) = -A^* w$, where $w$ solves $Bw = \nabla U(x_\lambda)$.

SLIDE 31

Algorithm for Bilevel learning

Upper level: $\min_{\lambda\geq 0,\,\hat{x}} U(\hat{x})$

Lower level: $x_\lambda := \arg\min_x L(x, \lambda)$

Reduced formulation: $\min_{\lambda\geq 0} U(x_\lambda) =: \tilde{U}(\lambda)$

◮ Solve the reduced formulation via L-BFGS-B, Nocedal and Wright 2000
◮ Compute gradients: given $\lambda$,
  (1) compute $x_\lambda$, e.g. via PDHG, Chambolle and Pock 2011
  (2) solve $Bw = \nabla U(x_\lambda)$, $B := \partial_x^2 L(x_\lambda, \lambda)$, e.g. via CG
  (3) compute $\nabla\tilde{U}(\lambda) = -A^* w$, $A := \partial_\lambda\partial_x L(x_\lambda, \lambda)$
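A compact numerical sketch of steps (1)–(3): to keep every derivative explicit it uses the simplest admissible regularizer $R(x) = \tfrac{1}{2}\|x\|_2^2$, a closed-form lower-level solve in place of PDHG, and dense linear algebra; it is an illustration of the scheme, not the implementation behind these slides.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.sparse.linalg import cg

def value_and_grad(lam_vec, A, y, x_true):
    """Reduced objective U~(lam) and its hypergradient for R(x) = 1/2 ||x||^2."""
    lam = lam_vec[0]
    n = A.shape[1]
    B = A.T @ A + lam * np.eye(n)          # B = Hessian of the lower-level objective
    x_lam = np.linalg.solve(B, A.T @ y)    # (1) lower level (closed form here)
    r = x_lam - x_true                     # grad U(x_lam) for U = 1/2 ||x - x_true||^2
    w, _ = cg(B, r)                        # (2) solve B w = grad U(x_lam) via CG
    return 0.5 * r @ r, np.array([-x_lam @ w])   # (3) here d_lam d_x L = x_lam

# L-BFGS-B on the reduced formulation with the constraint lam >= 0
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
x_true = rng.standard_normal(10)
y = A @ x_true + 0.1 * rng.standard_normal(20)
res = minimize(value_and_grad, x0=[1.0], args=(A, y, x_true),
               jac=True, method="L-BFGS-B", bounds=[(0.0, None)])
print("learned lambda:", res.x[0])
```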

SLIDE 32

Learn sampling pattern in MRI

SLIDE 33

Learn sampling pattern in MRI

Lower level (MRI reconstruction):

$R(\lambda, s, y) = \arg\min_x \tfrac{1}{2}\|S(Fx - y)\|_2^2 + \lambda R(x)$, with $S = \operatorname{diag}(s)$, $s_i \in \{0, 1\}$

Sherry et al. 2019, https://arxiv.org/pdf/1906.08754.pdf
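For orientation, a minimal sketch of this lower-level problem with $S = \operatorname{diag}(s)$: here $y$ is k-space data on the full grid, plain gradient descent stands in for the PDHG solver mentioned earlier, and `gradR` (the gradient of a smooth $R$), the step size, and the iteration count are illustrative assumptions:

```python
import numpy as np

def lower_level(y, s, lam, gradR, step=0.5, iters=200):
    """Gradient descent on 1/2 ||S(Fx - y)||^2 + lam R(x), S = diag(s)."""
    x = np.fft.ifft2(s * y, norm="ortho")                        # warm start: F* S* y
    for _ in range(iters):
        k_resid = s ** 2 * (np.fft.fft2(x, norm="ortho") - y)    # S*S (Fx - y)
        x = x - step * (np.fft.ifft2(k_resid, norm="ortho") + lam * gradR(x))
    return x
```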

SLIDE 34

Learn sampling pattern in MRI

Upper level (learning): Given training data $(x_i^\dagger, y_i)_{i=1}^n$, solve

$\min_{\lambda\geq 0,\, s\in\{0,1\}^m} \frac{1}{n}\sum_{i=1}^n \|R(\lambda, s, y_i) - x_i^\dagger\|_2^2$

Lower level (MRI reconstruction):

$R(\lambda, s, y) = \arg\min_x \tfrac{1}{2}\|S(Fx - y)\|_2^2 + \lambda R(x)$, with $S = \operatorname{diag}(s)$, $s_i \in \{0, 1\}$

Sherry et al. 2019, https://arxiv.org/pdf/1906.08754.pdf

SLIDE 35

Learn sampling pattern in MRI

Upper level (learning): Given training data $(x_i^\dagger, y_i)_{i=1}^n$, solve

$\min_{\lambda\geq 0,\, s\in[0,1]^m} \frac{1}{n}\sum_{i=1}^n \|R(\lambda, s, y_i) - x_i^\dagger\|_2^2$

Lower level (MRI reconstruction):

$R(\lambda, s, y) = \arg\min_x \tfrac{1}{2}\|S(Fx - y)\|_2^2 + \lambda R(x)$, with $S = \operatorname{diag}(s)$, $s_i \in [0, 1]$

Sherry et al. 2019, https://arxiv.org/pdf/1906.08754.pdf

SLIDE 36

Learn sampling pattern in MRI

Upper level (learning): Given training data $(x_i^\dagger, y_i)_{i=1}^n$, solve

$\min_{\lambda\geq 0,\, s\in[0,1]^m} \frac{1}{n}\sum_{i=1}^n \|R(\lambda, s, y_i) - x_i^\dagger\|_2^2 + \beta_1\|s\|_1 + \beta_2\|s(1-s)\|_1$

Lower level (MRI reconstruction):

$R(\lambda, s, y) = \arg\min_x \tfrac{1}{2}\|S(Fx - y)\|_2^2 + \lambda R(x)$, with $S = \operatorname{diag}(s)$, $s_i \in [0, 1]$

Sherry et al. 2019, https://arxiv.org/pdf/1906.08754.pdf
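In code, the relaxed upper-level objective is the training loss plus the two penalties. A minimal sketch, where the lower-level solver `reconstruct(lam, s, y)` is a hypothetical callable matching the slides' $R(\lambda, s, y)$:

```python
import numpy as np

def upper_level(lam, s, data, beta1, beta2, reconstruct):
    """Training loss over pairs (x_dagger_i, y_i) plus penalties, s in [0,1]^m."""
    fit = np.mean([np.sum(np.abs(reconstruct(lam, s, y) - x_dag) ** 2)
                   for x_dag, y in data])
    sparsity = beta1 * np.sum(s)               # beta_1 ||s||_1 (valid since s >= 0)
    binarity = beta2 * np.sum(s * (1 - s))     # beta_2 ||s(1-s)||_1, zero iff s binary
    return fit + sparsity + binarity
```

The $\beta_1$ term drives entries of $s$ to zero (fewer samples), while $s(1-s)$ vanishes exactly at $s_i \in \{0, 1\}$, pushing the relaxed pattern back towards a binary one.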

SLIDE 37

Classical compressed sensing versus learned

Sherry et al. 2019

SLIDE 38

Increasing sparsity

Reminder: Upper level (learning)

$\min_{\lambda\geq 0,\, s\in[0,1]^m} \frac{1}{n}\sum_{i=1}^n \|R(\lambda, s, y_i) - x_i^\dagger\|_2^2 + \beta_1\|s\|_1 + \beta_2\|s(1-s)\|_1$, with $\beta = \beta_1 = \beta_2$

Sherry et al. 2019

SLIDE 39

Compare regularizers

SLIDE 40

Compare "free" samplings

"ours" = Sherry et al. 2019; [41] = Knoll et al. 2011; [2] = Lustig et al. 2007; regularizer = dTV (Ehrhardt and Betcke 2016)

SLIDE 41

Compare Cartesian samplings

"ours" = Sherry et al. 2019; [23] = Gözcü et al. 2018; [2] = Lustig et al. 2007; regularizer = TV

SLIDE 42

More insights: sampling and amount of training data

Sherry et al. 2019

SLIDE 43

High-resolution imaging: $1024^2$

Sherry et al. 2019

SLIDE 44

Conclusions and outlook

Conclusions
◮ Learn parameters via bilevel / machine learning
◮ Learned sampling is better than generic sampling
◮ "Optimal" sampling depends on the regularizer
◮ Very little data needed

Outlook
◮ Investigate other algorithms tailored to the problem
  ◮ DFO with errors in the objective (ongoing work with Lindon Roberts)
  ◮ not based on the reduced formulation, e.g. nonlinear ADMM
◮ Unrolling: replace the lower-level problem by an algorithm
◮ End-to-end learning: learn reconstruction and sampling