
Faster PET Reconstruction with Non-Smooth Anatomical Priors by Randomization and Preconditioning (PowerPoint PPT Presentation)



  1. Faster PET Reconstruction with Non-Smooth Anatomical Priors by Randomization and Preconditioning
Matthias J. Ehrhardt, Institute for Mathematical Innovation, University of Bath, UK
November 4, 2019
Joint work with: Mathematics: Chambolle (Paris), Richtárik (KAUST), Schönlieb (Cambridge); PET imaging: Markiewicz, Schott (both UCL)

  2. Outline
1) PET reconstruction via optimization: $\min_x \sum_{i=1}^{n} f_i(B_i x) + g(x)$, where the $f_i$ and $g$ are non-smooth and $B_i x$ is expensive to compute
2) Randomized algorithm for convex optimization
3) Numerical evaluation: clinical PET imaging

  3. PET Reconstruction¹
$$u_\lambda \in \arg\min_u \left\{ \sum_{i=1}^{N} \mathrm{KL}(b_i; A_i u + r_i) + \lambda R(u; v) + \imath_+(u) \right\}$$
◮ Kullback–Leibler divergence: $\mathrm{KL}(b; y) = y - b + b \log(b/y)$ if $y > 0$, $\infty$ else
◮ Nonnegativity constraint: $\imath_+(u) = 0$ if $u_i \ge 0$ for all $i$, $\infty$ else
◮ Regularizer: e.g. $R(u; v) = \mathrm{TV}(u)$
¹ Brune '10, Brune et al. '10, Setzer et al. '10, Müller et al. '11, Anthoine et al. '12, Knoll et al. '16, Ehrhardt et al. '16, Hohage and Werner '16, Schramm et al. '17, Rasch et al. '17, Ehrhardt et al. '17, Mehranian et al. '17 and many, many more

  4. PET Reconstruction¹
$$u_\lambda \in \arg\min_u \left\{ \sum_{i=1}^{N} \mathrm{KL}(b_i; A_i u + r_i) + \lambda R(u; v) + \imath_+(u) \right\}$$
◮ Kullback–Leibler divergence: $\mathrm{KL}(b; y) = y - b + b \log(b/y)$ if $y > 0$, $\infty$ else
◮ Nonnegativity constraint: $\imath_+(u) = 0$ if $u_i \ge 0$ for all $i$, $\infty$ else
◮ Regularizer: e.g. $R(u; v) = \mathrm{TV}(u)$
How to incorporate MRI information into $R$?
¹ Brune '10, Brune et al. '10, Setzer et al. '10, Müller et al. '11, Anthoine et al. '12, Knoll et al. '16, Ehrhardt et al. '16, Hohage and Werner '16, Schramm et al. '17, Rasch et al. '17, Ehrhardt et al. '17, Mehranian et al. '17 and many, many more
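For reference, a minimal NumPy sketch (not part of the slides) of the two building blocks above, the Kullback–Leibler fidelity and the nonnegativity indicator; the array names are illustrative.

```python
import numpy as np

def kl_divergence(b, y):
    """KL(b; y) summed over all entries: y - b + b*log(b/y) where y > 0, +inf otherwise
    (0*log(0) is treated as 0)."""
    b = np.asarray(b, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        return np.inf
    out = y - b
    mask = b > 0
    out[mask] += b[mask] * np.log(b[mask] / y[mask])
    return out.sum()

def nonneg_indicator(u):
    """Indicator i_+(u): 0 if every entry of u is nonnegative, +inf otherwise."""
    return 0.0 if np.all(np.asarray(u) >= 0) else np.inf
```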

  5. Directional Total Variation
Let $\|\nabla v\| = 1$. Then $u$ and $v$ have parallel level sets iff
$$u \sim v \;\Leftrightarrow\; \nabla u \parallel \nabla v \;\Leftrightarrow\; \nabla u - \langle \nabla u, \nabla v \rangle \nabla v = 0$$

  6. Directional Total Variation
Let $\|\nabla v\| = 1$. Then $u$ and $v$ have parallel level sets iff
$$u \sim v \;\Leftrightarrow\; \nabla u \parallel \nabla v \;\Leftrightarrow\; \nabla u - \langle \nabla u, \nabla v \rangle \nabla v = 0$$
Definition: The Directional Total Variation (dTV) of $u$ is
$$\mathrm{dTV}(u) := \sum_i \big\| [I - \xi_i \xi_i^T] \nabla u_i \big\|, \qquad 0 \le \|\xi_i\| \le 1$$
Ehrhardt and Betcke '16, related to Kaipio et al. '99, Bayram and Kamasak '12
◮ If $\xi_i = 0$, then dTV = TV.
◮ $\xi_i = \dfrac{\nabla v_i}{\|\nabla v_i\|_\eta}$ with $\|\nabla v_i\|_\eta = \sqrt{\|\nabla v_i\|^2 + \eta^2}$, $\eta > 0$
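A small NumPy sketch (not from the slides) of how the weight field $\xi$ and the dTV value could be computed for 2D images with forward differences; the function names and the choice $\eta = 10^{-2}$ are illustrative.

```python
import numpy as np

def gradient(img):
    """Forward-difference gradient of a 2D image; returns an array of shape (2, ny, nx)."""
    gy = np.zeros_like(img)
    gx = np.zeros_like(img)
    gy[:-1, :] = img[1:, :] - img[:-1, :]
    gx[:, :-1] = img[:, 1:] - img[:, :-1]
    return np.stack([gy, gx])

def xi_field(v, eta=1e-2):
    """xi_i = grad(v)_i / sqrt(||grad(v)_i||^2 + eta^2), computed from the side image v."""
    g = gradient(v)
    norm_eta = np.sqrt((g ** 2).sum(axis=0) + eta ** 2)
    return g / norm_eta

def dtv(u, xi):
    """dTV(u) = sum_i || (I - xi_i xi_i^T) grad(u)_i ||."""
    g = gradient(u)
    proj = g - xi * (xi * g).sum(axis=0)  # remove the component of grad(u) parallel to xi
    return np.sqrt((proj ** 2).sum(axis=0)).sum()
```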

  7. PET Reconstruction
Partition the data into subsets $S_j$: $D_j(y) := \sum_{i \in S_j} \mathrm{KL}(b_i; y_i)$
$$\min_u \left\{ \sum_{j=1}^{m} D_j(A_j u) + \lambda \| D \nabla u \|_1 + \imath_+(u) \right\}$$
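One common way to form such subsets in PET is to interleave the projection views; the slides do not specify the partition, so the sketch below is an illustrative assumption.

```python
import numpy as np

def partition_views(num_views, m):
    """Partition view indices 0, ..., num_views-1 into m interleaved subsets S_j."""
    return [np.arange(j, num_views, m) for j in range(m)]

# e.g. partition_views(8, 3) -> [array([0, 3, 6]), array([1, 4, 7]), array([2, 5])]
```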

  8. PET Reconstruction
Partition the data into subsets $S_j$: $D_j(y) := \sum_{i \in S_j} \mathrm{KL}(b_i; y_i)$
$$\min_u \left\{ \sum_{j=1}^{m} D_j(A_j u) + \lambda \| D \nabla u \|_1 + \imath_+(u) \right\}$$
This is of the generic form
$$\min_x \sum_{i=1}^{n} f_i(B_i x) + g(x)$$
with $n = m + 1$, $g(x) = \imath_+(x)$, $B_i = A_i$ and $f_i = D_i$ for $i = 1, \dots, m$, $B_n = D\nabla$ and $f_n = \lambda \|\cdot\|_1$.

  9. PET Reconstruction
Partition the data into subsets $S_j$: $D_j(y) := \sum_{i \in S_j} \mathrm{KL}(b_i; y_i)$
$$\min_u \left\{ \sum_{j=1}^{m} D_j(A_j u) + \lambda \| D \nabla u \|_1 + \imath_+(u) \right\}$$
This is of the generic form
$$\min_x \sum_{i=1}^{n} f_i(B_i x) + g(x)$$
with $n = m + 1$, $g(x) = \imath_+(x)$, $B_i = A_i$ and $f_i = D_i$ for $i = 1, \dots, m$, $B_n = D\nabla$ and $f_n = \lambda \|\cdot\|_1$.
• $f_i$, $g$ are non-smooth but we can compute the proximal operator
$$\mathrm{prox}_f(x) := \arg\min_z \left\{ \tfrac{1}{2} \| z - x \|^2 + f(z) \right\}.$$
• We cannot compute the proximal operator of $f_i \circ B_i$.
• $B_i x$ is expensive to compute.
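Two proximal operators relevant to this problem have simple closed forms; the sketch below (not from the slides) gives the projection onto the nonnegative orthant, i.e. the prox of $\imath_+$, and the standard closed-form prox of the convex conjugate of $y \mapsto \mathrm{KL}(b; y)$, with the background term $r$ omitted for simplicity.

```python
import numpy as np

def prox_nonneg(x, tau=None):
    """prox of the indicator i_+: projection onto the nonnegative orthant (step size irrelevant)."""
    return np.maximum(x, 0)

def prox_kl_conjugate(y, sigma, b):
    """prox_{sigma * f*}(y) for f(y) = KL(b; y), namely
    0.5 * (y + 1 - sqrt((y - 1)^2 + 4*sigma*b)).
    The background r is ignored in this sketch."""
    return 0.5 * (y + 1 - np.sqrt((y - 1) ** 2 + 4 * sigma * b))
```

In the algorithm sketches further below these would be passed in as callables, e.g. `prox_fstar[i] = lambda y, s: prox_kl_conjugate(y, s, b_subset)`.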

  10. Optimization

  11. Primal-Dual Hybrid Gradient (PDHG)¹
Given $x^0, y^0$, $\bar y^0 = y^0$:
(1) $x^{k+1} = \mathrm{prox}_g^T \big( x^k - T \sum_{i=1}^{n} B_i^* \bar y_i^k \big)$
(2) $y_i^{k+1} = \mathrm{prox}_{f_i^*}^{S_i} \big( y_i^k + S_i B_i x^{k+1} \big)$, $i = 1, \dots, n$
(3) $\bar y_i^{k+1} = y_i^{k+1} + \theta \big( y_i^{k+1} - y_i^k \big)$, $i = 1, \dots, n$
◮ evaluation of $B_i$ and $B_i^*$
◮ proximal operator: $\mathrm{prox}_f^S(x) := \arg\min_z \big\{ \tfrac{1}{2} \| z - x \|_S^2 + f(z) \big\}$
◮ convergence: $\theta = 1$, $C_i = S_i^{1/2} B_i T^{1/2}$, $\left\| \begin{pmatrix} C_1 \\ \vdots \\ C_n \end{pmatrix} \right\|^2 < 1$
¹ Pock, Cremers, Bischof, Chambolle '09, Chambolle and Pock '11
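A minimal Python sketch (not from the slides) of the PDHG iteration above, with the operators and proximal maps passed in as callables and scalar step sizes; all names are illustrative.

```python
import numpy as np

def pdhg(x0, y0, B, Bt, prox_g, prox_fstar, T, S, theta=1.0, n_iter=100):
    """Deterministic PDHG for min_x sum_i f_i(B_i x) + g(x).

    B[i](x)               : applies B_i to x
    Bt[i](y_i)            : applies the adjoint B_i^*
    prox_g(x, T)          : prox of g with step T
    prox_fstar[i](y, S_i) : prox of f_i^* with step S_i
    """
    n = len(B)
    x = x0.copy()
    y = [yi.copy() for yi in y0]
    ybar = [yi.copy() for yi in y0]
    for _ in range(n_iter):
        # (1) primal step: descent using all adjoints, then prox of g
        x = prox_g(x - T * sum(Bt[i](ybar[i]) for i in range(n)), T)
        for i in range(n):
            # (2) dual step for every block
            y_new = prox_fstar[i](y[i] + S[i] * B[i](x), S[i])
            # (3) over-relaxation (extrapolation) of the dual variable
            ybar[i] = y_new + theta * (y_new - y[i])
            y[i] = y_new
    return x
```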

  12. Primal-Dual Hybrid Gradient (PDHG): build slide, same content as slide 11.

  13. Stochastic PDHG¹
Given $x^0, y^0$, $\bar y^0 = y^0$:
(1) $x^{k+1} = \mathrm{prox}_g^T \big( x^k - T \sum_{i=1}^{n} B_i^* \bar y_i^k \big)$
Select $j_{k+1} \in \{1, \dots, n\}$ randomly.
(2) $y_i^{k+1} = \begin{cases} \mathrm{prox}_{f_i^*}^{S_i} \big( y_i^k + S_i B_i x^{k+1} \big) & \text{if } i = j_{k+1} \\ y_i^k & \text{else} \end{cases}$
(3) $\bar y_i^{k+1} = \begin{cases} y_i^{k+1} + \frac{\theta}{p_i} \big( y_i^{k+1} - y_i^k \big) & \text{if } i = j_{k+1} \\ y_i^{k+1} & \text{else} \end{cases}$
◮ probabilities $p_i := \mathbb{P}(i = j_{k+1}) > 0$ (proper sampling)
◮ Compute $\sum_{i=1}^{n} B_i^* \bar y_i^k$ using only $B_i^* y_i^k$ for $i = j_{k+1}$ + memory
¹ Chambolle, E, Richtárik, Schönlieb '18

  14. Stochastic PDHG¹
Given $x^0, y^0$, $\bar y^0 = y^0$:
(1) $x^{k+1} = \mathrm{prox}_g^T \big( x^k - T \sum_{i=1}^{n} B_i^* \bar y_i^k \big)$
Select $j_{k+1} \in \{1, \dots, n\}$ randomly.
(2) $y_i^{k+1} = \begin{cases} \mathrm{prox}_{f_i^*}^{S_i} \big( y_i^k + S_i B_i x^{k+1} \big) & \text{if } i = j_{k+1} \\ y_i^k & \text{else} \end{cases}$
(3) $\bar y_i^{k+1} = \begin{cases} y_i^{k+1} + \frac{\theta}{p_i} \big( y_i^{k+1} - y_i^k \big) & \text{if } i = j_{k+1} \\ y_i^{k+1} & \text{else} \end{cases}$
◮ probabilities $p_i := \mathbb{P}(i = j_{k+1}) > 0$ (proper sampling)
◮ Compute $\sum_{i=1}^{n} B_i^* \bar y_i^k$ using only $B_i^* y_i^k$ for $i = j_{k+1}$ + memory
◮ evaluation of $B_i$ and $B_i^*$ only for $i = j_{k+1}$
¹ Chambolle, E, Richtárik, Schönlieb '18
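A minimal Python sketch (not from the slides) of SPDHG in the same style as the PDHG sketch above; it keeps the running sums $\sum_i B_i^* y_i$ and $\sum_i B_i^* \bar y_i$ in memory so that each iteration applies only the sampled $B_j$ and $B_j^*$. Names and the use of NumPy's random generator are illustrative.

```python
import numpy as np

def spdhg(x0, y0, B, Bt, prox_g, prox_fstar, T, S, p, theta=1.0, n_iter=1000, seed=None):
    """Stochastic PDHG: one randomly sampled dual block is updated per iteration.

    p[i] = P(i = j_{k+1}) are the (strictly positive) sampling probabilities.
    """
    rng = np.random.default_rng(seed)
    n = len(B)
    x = x0.copy()
    y = [yi.copy() for yi in y0]
    z = sum(Bt[i](y[i]) for i in range(n))   # z    = sum_i B_i^* y_i
    zbar = z.copy()                          # zbar = sum_i B_i^* ybar_i (with ybar^0 = y^0)
    for _ in range(n_iter):
        # (1) primal step uses the stored sum, no operator evaluations needed here
        x = prox_g(x - T * zbar, T)
        # select one block index at random
        j = rng.choice(n, p=p)
        # (2) dual step only for block j
        y_new = prox_fstar[j](y[j] + S[j] * B[j](x), S[j])
        dz = Bt[j](y_new - y[j])             # change of z caused by block j
        y[j] = y_new
        z = z + dz
        # (3) extrapolation, folded into the running sum: zbar = z + (theta / p_j) * dz
        zbar = z + (theta / p[j]) * dz
    return x
```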

  15. Convergence of SPDHG
Definition: Let $p \in \partial f(v)$. The Bregman distance of $f$ is defined as
$$D_f^p(u, v) = f(u) - \big[ f(v) + \langle p, u - v \rangle \big].$$
[Figure: the graph of $f$, its linearization $f(v) + \langle p, u - v \rangle$ at $v$, and the gap $D_f^p(u, v)$ at $u$.]
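A small worked example (not on the slides) to make the definition concrete: for the quadratic $f(u) = \tfrac{1}{2}\|u\|^2$ we have $\partial f(v) = \{v\}$, so $p = v$ and
$$D_f^p(u, v) = \tfrac{1}{2}\|u\|^2 - \tfrac{1}{2}\|v\|^2 - \langle v, u - v \rangle = \tfrac{1}{2}\|u\|^2 - \langle u, v \rangle + \tfrac{1}{2}\|v\|^2 = \tfrac{1}{2}\|u - v\|^2,$$
i.e. for this choice the Bregman distance reduces to the squared Euclidean distance.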

  16. Convergence of SPDHG
Definition: Let $p \in \partial f(v)$. The Bregman distance of $f$ is defined as
$$D_f^p(u, v) = f(u) - \big[ f(v) + \langle p, u - v \rangle \big].$$
Theorem (Chambolle, E, Richtárik, Schönlieb '18): Let $(x^\sharp, y^\sharp)$ be a saddle point, choose $\theta = 1$ and step sizes $S_i$, $T := \min_i T_i$ such that
$$\big\| S_i^{1/2} B_i T_i^{1/2} \big\|^2 < p_i, \qquad i = 1, \dots, n.$$
Then almost surely $D_g^{r^\sharp}(x^k, x^\sharp) + D_{f^*}^{q^\sharp}(y^k, y^\sharp) \to 0$.

  17. Step-sizes and Preconditioning
Theorem (E, Markiewicz, Schönlieb '18): Let $\rho < 1$. Then $\big\| S_i^{1/2} B_i T_i^{1/2} \big\|^2 < p_i$ is satisfied by
$$S_i = \frac{\rho}{\|B_i\|} I, \qquad T_i = \frac{p_i}{\|B_i\|} I.$$
If $B_i \ge 0$, then the step-size condition is also satisfied by
$$S_i = \mathrm{diag}\left( \frac{\rho}{B_i \mathbf{1}} \right), \qquad T_i = \mathrm{diag}\left( \frac{p_i}{B_i^T \mathbf{1}} \right).$$
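A short sketch (not from the slides) of how the diagonal, preconditioned step sizes above could be computed when $B_i$ is available as a nonnegative (dense or sparse) matrix; for an actual PET projector one would instead apply the operator and its adjoint to arrays of ones. The epsilon guard and the choice $\rho = 0.99$ are illustrative.

```python
import numpy as np

def preconditioned_steps(Bi, p_i, rho=0.99, eps=1e-12):
    """Pointwise step sizes S_i = rho / (B_i 1) and T_i = p_i / (B_i^T 1) for a nonnegative B_i."""
    row_sums = np.asarray(Bi.sum(axis=1)).ravel()   # B_i 1   (one entry per measurement)
    col_sums = np.asarray(Bi.sum(axis=0)).ravel()   # B_i^T 1 (one entry per voxel)
    S_i = rho / np.maximum(row_sums, eps)           # eps avoids division by zero for empty rows
    T_i = p_i / np.maximum(col_sums, eps)
    return S_i, T_i
```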

  18. Step-sizes and Preconditioning: build slide, same content as slide 17.

  19. Application

  20. Sanity Check: Convergence to Saddle Point (dTV)
[Figure: saddle point (5000 iterations of PDHG) vs. SPDHG (20 epochs, 252 subsets)]

  21. More subsets are faster
Number of subsets: m = 1, 21, 100, 252

  22. "Balanced sampling" is faster
uniform sampling: $p_i = 1/n$
balanced sampling: $p_i = \frac{1}{2m}$ for $i < n$ (data subsets), $p_i = \frac{1}{2}$ for $i = n$ (regularizer block)
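As a small illustration (not from the slides), the two sampling schemes can be encoded as probability vectors over the $n = m + 1$ dual blocks, i.e. the $m$ data subsets plus the regularizer block:

```python
import numpy as np

def sampling_probabilities(m, balanced=True):
    """Probabilities over n = m + 1 blocks: m data subsets and one regularizer block (last entry)."""
    n = m + 1
    if balanced:
        p = np.full(n, 1.0 / (2 * m))   # each data subset gets 1/(2m) ...
        p[-1] = 0.5                     # ... and the regularizer block gets 1/2
    else:
        p = np.full(n, 1.0 / n)         # uniform sampling
    return p
```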

  23. Preconditioning is faster
Scalar step sizes: $S_i = \frac{\rho}{\|B_i\|} I$, $T_i = \frac{p_i}{\|B_i\|} I$
Preconditioned (vector-valued) step sizes: $S_i = \mathrm{diag}\left( \frac{\rho}{B_i \mathbf{1}} \right)$, $T_i = \mathrm{diag}\left( \frac{p_i}{B_i^T \mathbf{1}} \right)$

  24. FDG PDHG (5000 iter) SPDHG (252 subsets, precond, balanced, 20 epochs)

  25. FDG, 20 epochs PDHG SPDHG (252 subsets, precond, balanced)

  26. FDG, 10 epochs PDHG SPDHG (252 subsets, precond, balanced)

  27. FDG, 5 epochs PDHG SPDHG (252 subsets, precond, balanced)

  28. FDG, 1 epoch PDHG SPDHG (252 subsets, precond, balanced)

  29. Florbetapir PDHG (5000 iter) SPDHG (252 subsets, precond, balanced, 20 epochs)

  30. Florbetapir, 20 epochs PDHG SPDHG (252 subsets, precond, balanced)

  31. Florbetapir, 10 epochs PDHG SPDHG (252 subsets, precond, balanced)

  32. Florbetapir, 5 epochs PDHG SPDHG (252 subsets, precond, balanced)

  33. Florbetapir, 1 epoch PDHG SPDHG (252 subsets, precond, balanced)

  34. Conclusions and Outlook
Summary:
◮ Randomized optimization which exploits the "separable structure" of the problem
◮ More subsets, balanced sampling and randomized preconditioning all speed up convergence
◮ Only 5-20 epochs are needed for advanced models on clinical data
Future work:
◮ Evaluation in concrete clinical situations (with Addenbrooke's, Cambridge)
◮ Sampling: 1) optimal, 2) adaptive
