SLIDE 1

Faster PET Reconstruction with Non-Smooth Anatomical Priors by Randomization and Preconditioning

Matthias J. Ehrhardt

Institute for Mathematical Innovation University of Bath, UK

November 4, 2019

Joint work with:
Mathematics: Chambolle (Paris), Richtárik (KAUST), Schönlieb (Cambridge)
PET imaging: Markiewicz, Schott (both UCL)

SLIDE 2

Outline

1) PET reconstruction via optimization: $\min_x \sum_{i=1}^n f_i(B_i x) + g(x)$

2) Randomized algorithm for convex optimization: the $f_i$ are non-smooth and $B_i x$ is expensive to compute

3) Numerical evaluation: clinical PET imaging

SLIDES 3-4

PET Reconstruction¹

$u_\lambda \in \arg\min_u \Big\{ \sum_{i=1}^N \mathrm{KL}(b_i; A_i u + r_i) + \lambda R(u; v) + \imath_+(u) \Big\}$

◮ Kullback-Leibler divergence:
$\mathrm{KL}(b; y) = \begin{cases} y - b + b \log(b/y) & \text{if } y > 0 \\ \infty & \text{else} \end{cases}$

◮ Nonnegativity constraint:
$\imath_+(u) = \begin{cases} 0 & \text{if } u_i \ge 0 \text{ for all } i \\ \infty & \text{else} \end{cases}$

◮ Regularizer: e.g. $R(u; v) = \mathrm{TV}(u)$. How to incorporate MRI information into $R$?

¹Brune '10, Brune et al. '10, Setzer et al. '10, Müller et al. '11, Anthoine et al. '12, Knoll et al. '16, Ehrhardt et al. '16, Hohage and Werner '16, Schramm et al. '17, Rasch et al. '17, Ehrhardt et al. '17, Mehranian et al. '17 and many, many more
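To make the data term concrete, here is a minimal NumPy sketch of the Kullback-Leibler divergence above applied to a forward-projected image $Au + r$; the example arrays and the convention for bins with $b = 0$ are illustrative assumptions, not code from the talk.

```python
import numpy as np

def kl(b, y):
    """Kullback-Leibler divergence KL(b; y) = sum(y - b + b*log(b/y)).

    Returns infinity if any bin of y is non-positive, matching the
    extended-value definition on the slide.
    """
    if np.any(y <= 0):
        return np.inf
    # Convention: bins with b = 0 contribute y - b only (b*log(b/y) -> 0).
    mask = b > 0
    val = np.sum(y - b)
    val += np.sum(b[mask] * np.log(b[mask] / y[mask]))
    return val

# Example: data fit KL(b; A u + r) with hypothetical counts b,
# forward projection Au and background r.
b = np.array([3.0, 0.0, 7.0])
Au = np.array([2.5, 0.1, 6.0])
r = np.array([0.5, 0.2, 0.8])
print(kl(b, Au + r))
```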

SLIDES 5-6

Directional Total Variation

Let $\|\nabla v\| = 1$. Then $u$ and $v$ have parallel level sets iff
$u \sim v \;\Leftrightarrow\; \nabla u \parallel \nabla v \;\Leftrightarrow\; \nabla u - \langle \nabla u, \nabla v \rangle \nabla v = 0$

Definition: The directional total variation (dTV) of $u$ is
$\mathrm{dTV}(u) := \sum_i \big\| [I - \xi_i \xi_i^T] \nabla u_i \big\|, \qquad 0 \le \|\xi_i\| \le 1$

Ehrhardt and Betcke '16, related to Kaipio et al. '99, Bayram and Kamasak '12

◮ If $\xi_i = 0$, then dTV = TV.
◮ $\xi_i = \dfrac{\nabla v_i}{\|\nabla v_i\|_\eta}$ with $\|\nabla v_i\|_\eta^2 = \|\nabla v_i\|^2 + \eta^2$, $\eta > 0$.
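A minimal NumPy sketch of how the vector field $\xi$ and the dTV sum above could be evaluated on a 2D image; using np.gradient as the discrete $\nabla$ and the parameter name eta are assumptions for illustration, not the talk's implementation.

```python
import numpy as np

def dtv(u, v, eta=0.01):
    """Directional TV of u guided by side information v (2D arrays).

    xi = grad(v) / sqrt(|grad(v)|^2 + eta^2) per pixel, and each pixel
    contributes |(I - xi xi^T) grad(u)|.
    """
    gu = np.stack(np.gradient(u), axis=-1)   # (H, W, 2) gradient of u
    gv = np.stack(np.gradient(v), axis=-1)   # (H, W, 2) gradient of v
    norm_eta = np.sqrt(np.sum(gv**2, axis=-1) + eta**2)
    xi = gv / norm_eta[..., None]            # |xi| <= 1 by construction
    # Project out the component of grad(u) parallel to xi:
    inner = np.sum(gu * xi, axis=-1, keepdims=True)
    proj = gu - inner * xi                   # (I - xi xi^T) grad(u)
    return np.sum(np.sqrt(np.sum(proj**2, axis=-1)))

# Sanity check: with flat side information, xi = 0 and dTV reduces to TV.
u = np.random.rand(8, 8)
print(dtv(u, np.zeros((8, 8))))              # equals TV(u)
```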

SLIDES 7-9

PET Reconstruction

Partition the data into subsets $S_j$ and set $D_j(y) := \sum_{i \in S_j} \mathrm{KL}(b_i; y_i)$:

$\min_u \Big\{ \sum_{j=1}^m D_j(A_j u) + \lambda \|D \nabla u\|_1 + \imath_+(u) \Big\}$

This is an instance of the generic problem

$\min_x \sum_{i=1}^n f_i(B_i x) + g(x)$

with $n = m + 1$ and
$g(x) = \imath_+(x), \quad B_i = A_i, \; f_i = D_i \;\; (i = 1, \dots, m), \quad B_n = D\nabla, \; f_n = \lambda \|\cdot\|_1.$

◮ $f_i$ and $g$ are non-smooth, but their proximal operators can be computed (a sketch follows below):
$\mathrm{prox}_f(x) := \arg\min_z \big\{ \tfrac{1}{2}\|z - x\|^2 + f(z) \big\}$
◮ The proximal operator of $f_i \circ B_i$ cannot be computed.
◮ $B_i x$ is expensive to compute.
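Both proximal operators appearing here have closed forms: for $g = \imath_+$ the prox is the projection onto the nonnegative orthant, and for $f_n = \lambda\|\cdot\|_1$ it is componentwise soft-thresholding. A minimal NumPy sketch of these standard facts (not the talk's code):

```python
import numpy as np

def prox_nonneg(x, tau=None):
    """prox of the indicator of {x >= 0}: projection onto the
    nonnegative orthant (independent of the step size tau)."""
    return np.maximum(x, 0.0)

def prox_l1(x, tau, lam):
    """prox of tau * lam * ||.||_1: componentwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - tau * lam, 0.0)

x = np.array([-1.5, 0.2, 3.0])
print(prox_nonneg(x))                 # [0.  0.2 3. ]
print(prox_l1(x, tau=1.0, lam=0.5))   # [-1.  0.  2.5]
```

The algorithms below actually use the prox of the conjugates $f_i^*$, which can be obtained from these via the Moreau identity.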
SLIDE 10

Optimization

SLIDES 11-12

Primal-Dual Hybrid Gradient (PDHG) Algorithm¹

Given $x^0, y^0$ and $\bar y^0 = y^0$:

(1) $x^{k+1} = \mathrm{prox}_g^T \big( x^k - T \sum_{i=1}^n B_i^* \bar y_i^k \big)$
(2) $y_i^{k+1} = \mathrm{prox}_{f_i^*}^{S_i} \big( y_i^k + S_i B_i x^{k+1} \big), \quad i = 1, \dots, n$
(3) $\bar y_i^{k+1} = y_i^{k+1} + \theta \big( y_i^{k+1} - y_i^k \big), \quad i = 1, \dots, n$

◮ requires evaluation of $B_i$ and $B_i^*$ for all $i$ in every iteration
◮ proximal operator: $\mathrm{prox}_f^S(x) := \arg\min_z \big\{ \tfrac{1}{2}\|z - x\|_S^2 + f(z) \big\}$
◮ convergence: $\theta = 1$ and, with $C_i = S_i^{1/2} B_i T^{1/2}$, $\|(C_1, \dots, C_n)\|^2 < 1$

¹Pock, Cremers, Bischof, Chambolle '09, Chambolle and Pock '11
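A minimal sketch of the iteration above for generic block operators; the interface (scipy-style objects with matvec/rmatvec, scalar step sizes T and S[i], prox callables taking a point and a step size) is an assumption for illustration, not the talk's implementation.

```python
import numpy as np

def pdhg(x, ys, ops, prox_g, prox_fstar, S, T, theta=1.0, iters=100):
    """Deterministic PDHG. Every dual block i is updated in every
    iteration, so all B_i and B_i^* are evaluated each time."""
    ybar = [y.copy() for y in ys]
    for _ in range(iters):
        # (1) primal step: needs the full adjoint sum over all blocks
        x = prox_g(x - T * sum(B.rmatvec(yb) for B, yb in zip(ops, ybar)), T)
        for i, B in enumerate(ops):
            # (2) dual proximal step on block i
            y_new = prox_fstar[i](ys[i] + S[i] * B.matvec(x), S[i])
            # (3) extrapolation ybar = y_new + theta * (y_new - y_old)
            ybar[i] = y_new + theta * (y_new - ys[i])
            ys[i] = y_new
    return x
```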

SLIDES 13-14

Stochastic PDHG Algorithm¹

Given $x^0, y^0$ and $\bar y^0 = y^0$:

(1) $x^{k+1} = \mathrm{prox}_g^T \big( x^k - T \sum_{i=1}^n B_i^* \bar y_i^k \big)$
Select $j_{k+1} \in \{1, \dots, n\}$ at random.
(2) $y_i^{k+1} = \begin{cases} \mathrm{prox}_{f_i^*}^{S_i} \big( y_i^k + S_i B_i x^{k+1} \big) & \text{if } i = j_{k+1} \\ y_i^k & \text{else} \end{cases}$
(3) $\bar y_i^{k+1} = \begin{cases} y_i^{k+1} + \frac{\theta}{p_i} \big( y_i^{k+1} - y_i^k \big) & \text{if } i = j_{k+1} \\ y_i^{k+1} & \text{else} \end{cases}$

◮ probabilities $p_i := \mathbb{P}(i = j_{k+1}) > 0$ (proper sampling)
◮ $\sum_{i=1}^n B_i^* y_i^k$ can be maintained as a running sum in memory, so only $B_i^*$ for $i = j_{k+1}$ is applied
◮ evaluation of $B_i$ and $B_i^*$ only for $i = j_{k+1}$

¹Chambolle, E., Richtárik, Schönlieb '18
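A minimal sketch of SPDHG under the same assumed interface as the PDHG sketch above. The running sum z = sum_i B_i^* y_i (the memory trick from the slide) is updated incrementally, so each iteration applies only B_j and B_j^* for the sampled block j.

```python
import numpy as np

def spdhg(x, ys, ops, prox_g, prox_fstar, S, T, p, theta=1.0, iters=1000):
    """Stochastic PDHG: one randomly sampled dual block per iteration."""
    rng = np.random.default_rng(0)
    z = sum(B.rmatvec(y) for B, y in zip(ops, ys))   # sum_i B_i^* y_i
    zbar = z.copy()                                  # sum_i B_i^* ybar_i
    for _ in range(iters):
        x = prox_g(x - T * zbar, T)                  # (1) primal step
        j = rng.choice(len(ops), p=p)                # sample j with prob p_j
        y_new = prox_fstar[j](ys[j] + S[j] * ops[j].matvec(x), S[j])  # (2)
        dz = ops[j].rmatvec(y_new - ys[j])           # only B_j^* is applied
        ys[j] = y_new
        z = z + dz                                   # keep running sum exact
        zbar = z + (theta / p[j]) * dz               # (3) extrapolated sum
    return x
```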
SLIDES 15-16

Convergence of SPDHG

Definition: Let $p \in \partial f(v)$. The Bregman distance of $f$ is defined as
$D_f^p(u, v) = f(u) - \big( f(v) + \langle p, u - v \rangle \big).$

[Figure: $f$ together with its linearization $f(v) + \langle p, u - v \rangle$ at $v$; the gap between the two at $u$ is $D_f^p(u, v)$.]

Theorem (Chambolle, E., Richtárik, Schönlieb '18): Let $(x^\sharp, y^\sharp)$ be a saddle point, choose $\theta = 1$ and step sizes $S_i$, $T := \min_i T_i$, such that
$\big\| S_i^{1/2} B_i T_i^{1/2} \big\|^2 < p_i, \quad i = 1, \dots, n.$
Then, almost surely,
$D_g^{r^\sharp}(x^k, x^\sharp) + D_{f^*}^{q^\sharp}(y^k, y^\sharp) \to 0$
(with $r^\sharp$, $q^\sharp$ subgradients at the saddle point).
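A worked special case of the definition (a standard fact, not from the slides): for the squared norm the Bregman distance reduces to the usual squared distance, a useful sanity check.

```latex
% f(u) = (1/2)||u||^2 is differentiable, so p = \nabla f(v) = v and
\[
  D_f^p(u, v)
  = \tfrac{1}{2}\|u\|^2 - \tfrac{1}{2}\|v\|^2 - \langle v,\, u - v \rangle
  = \tfrac{1}{2}\|u - v\|^2 .
\]
% For general convex f the distance is nonnegative but not symmetric,
% which is the sense of convergence in the theorem above.
```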

SLIDES 17-18

Step-sizes and Preconditioning

Theorem (E., Markiewicz, Schönlieb '18): Let $\rho < 1$. Then $\|S_i^{1/2} B_i T_i^{1/2}\|^2 < p_i$ is satisfied by the scalar step sizes
$S_i = \frac{\rho}{\|B_i\|} I, \qquad T_i = \frac{p_i}{\|B_i\|} I.$
If $B_i \ge 0$ (elementwise), the step-size condition is also satisfied by the preconditioned step sizes
$S_i = \mathrm{diag}\Big( \frac{\rho}{B_i \mathbf{1}} \Big), \qquad T_i = \mathrm{diag}\Big( \frac{p_i}{B_i^T \mathbf{1}} \Big).$
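A minimal sketch of both step-size choices from the theorem for a sparse nonnegative operator; scipy's svds for the operator norm and the eps guard against empty rows/columns are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def scalar_steps(B, p_i, rho=0.99):
    """Scalar step sizes S_i = rho/||B_i||, T_i = p_i/||B_i||."""
    nrm = spla.svds(B, k=1, return_singular_vectors=False)[0]  # ||B_i||
    return rho / nrm, p_i / nrm

def precond_steps(B, p_i, rho=0.99, eps=1e-10):
    """Diagonal step sizes S_i = diag(rho/(B 1)), T_i = diag(p_i/(B^T 1)),
    valid when B has nonnegative entries."""
    row_sums = np.asarray(B.sum(axis=1)).ravel()   # B 1
    col_sums = np.asarray(B.sum(axis=0)).ravel()   # B^T 1
    return rho / np.maximum(row_sums, eps), p_i / np.maximum(col_sums, eps)

B = sp.random(50, 80, density=0.1, format="csr")   # nonnegative entries
S, T = precond_steps(B, p_i=0.5)
```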
SLIDE 19

Application

SLIDE 20

Sanity Check: Convergence to Saddle Point (dTV)

[Figure: saddle point (5000 PDHG iterations) vs SPDHG (20 epochs, 252 subsets)]

SLIDE 21

More subsets are faster

Number of subsets: m = 1, 21, 100, 252

SLIDE 22

"Balanced sampling" is faster

uniform sampling: $p_i = 1/n$
balanced sampling: $p_i = \begin{cases} \frac{1}{2m} & \text{if } i < n \\ \frac{1}{2} & \text{if } i = n \end{cases}$

(The regularizer block $i = n$ is sampled half of the time; the $m$ data subsets share the other half.)
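A small sketch of the two distributions, laid out as m data blocks followed by the regularizer block (an assumed array layout), ready to pass as p to the SPDHG sketch above.

```python
import numpy as np

m = 252                                  # data subsets
n = m + 1                                # plus one regularizer block
p_uniform = np.full(n, 1.0 / n)
p_balanced = np.concatenate([np.full(m, 0.5 / m), [0.5]])
assert np.isclose(p_balanced.sum(), 1.0)
```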

SLIDE 23

Preconditioning is faster

Scalar step sizes: $S_i = \frac{\rho}{\|B_i\|} I$, $T_i = \frac{p_i}{\|B_i\|} I$

Preconditioned (vector-valued) step sizes: $S_i = \mathrm{diag}\big(\frac{\rho}{B_i \mathbf{1}}\big)$, $T_i = \mathrm{diag}\big(\frac{p_i}{B_i^T \mathbf{1}}\big)$

SLIDE 24

FDG

[Figure: PDHG (5000 iterations) vs SPDHG (252 subsets, precond, balanced, 20 epochs)]

SLIDE 25

FDG, 20 epochs

[Figure: PDHG vs SPDHG (252 subsets, precond, balanced)]

SLIDE 26

FDG, 10 epochs

[Figure: PDHG vs SPDHG (252 subsets, precond, balanced)]

SLIDE 27

FDG, 5 epochs

[Figure: PDHG vs SPDHG (252 subsets, precond, balanced)]

SLIDE 28

FDG, 1 epoch

[Figure: PDHG vs SPDHG (252 subsets, precond, balanced)]

SLIDE 29

Florbetapir

[Figure: PDHG (5000 iterations) vs SPDHG (252 subsets, precond, balanced, 20 epochs)]

SLIDE 30

Florbetapir, 20 epochs

[Figure: PDHG vs SPDHG (252 subsets, precond, balanced)]

SLIDE 31

Florbetapir, 10 epochs

[Figure: PDHG vs SPDHG (252 subsets, precond, balanced)]

SLIDE 32

Florbetapir, 5 epochs

[Figure: PDHG vs SPDHG (252 subsets, precond, balanced)]

SLIDE 33

Florbetapir, 1 epoch

[Figure: PDHG vs SPDHG (252 subsets, precond, balanced)]

SLIDE 34

Conclusions and Outlook

Summary:
◮ Randomized optimization that exploits the "separable structure" of the problem
◮ More subsets, balanced sampling and preconditioning all speed up convergence
◮ Only 5-20 epochs are needed for advanced models on clinical data

Future work:
◮ Evaluation in concrete clinical situations (with Addenbrooke's Hospital, Cambridge)
◮ Sampling: 1) optimal, 2) adaptive; deterministic vs randomized