SLIDE 1

cuDIMOT: A CUDA toolbox for modelling the brain tissue microstructure from diffusion-MRI

Moisés Hernández Fernández, Istvan Reguly, Mike Giles, Stephen Smith and Stamatios N. Sotiropoulos

GPU Technology Conference Europe 2017 Talk ID: 23165

SLIDE 2

Brain tissue microstructure

We want to gain information about tissue microstructure from diffusion MRI (dMRI) data to:

  • understand brain mechanisms;
  • develop new biomarkers.

[Figure: example maps of fibre dispersion and fibre orientation (orientation colour key: superior-inferior, anterior-posterior, medial-lateral)]

Moisés Hernández Fernández, FMRIB cuDIMOT 2 / 23

SLIDE 3

Outline

  • 1. Diffusion MRI
  • 2. cuDIMOT: CUDA Diffusion Modelling Toolbox
      • Parallel design
      • Functionality and features
  • 3. Results: Validation & Performance gains
  • 4. Conclusions

SLIDE 4

Diffusion MRI (dMRI)

Molecules are in constant motion. We want to quantify water diffusion within a tissue. Different tissues:

  • Grey matter: diffusion without a preferred direction.
  • White matter: diffusion along a preferred direction.

Information about tissue microstructure features can be gained by acquiring several diffusion-weighted measurements and modelling the diffusion process using biophysical parameters.
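To make "modelling the diffusion process using biophysical parameters" concrete, here is a minimal sketch using the classic diffusion tensor signal equation S(g) = S0·exp(−b·gᵀDg). It is chosen only as an illustration and is not one of the models fitted in this talk. An isotropic tensor mimics grey matter, where the signal is the same for every gradient direction g; a tensor elongated along x mimics a white-matter fibre, where attenuation is strongest along the fibre. The helper names (`dw_signal`, `isotropic`, `fibre_along_x`) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Illustrative diffusion tensor signal model (not one of cuDIMOT's models):
// S(g) = S0 * exp(-b * g^T D g), with g a unit gradient direction.
using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

double dw_signal(double S0, double b, const Mat3& D, const Vec3& g) {
  assert(S0 >= 0.0 && b >= 0.0);
  double adc = 0.0;  // g^T D g: apparent diffusivity along g
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j) adc += g[i] * D[i][j] * g[j];
  return S0 * std::exp(-b * adc);
}

// Isotropic tensor (grey-matter-like): same diffusivity in every direction.
Mat3 isotropic(double d) { return {{{d, 0, 0}, {0, d, 0}, {0, 0, d}}}; }

// Axially symmetric tensor with the fibre along x (white-matter-like):
// fast diffusion along the fibre, slow across it.
Mat3 fibre_along_x(double d_par, double d_perp) {
  return {{{d_par, 0, 0}, {0, d_perp, 0}, {0, 0, d_perp}}};
}
```

For example, with b = 1000 s/mm², S0 = 1 and d = 0.001 mm²/s, `dw_signal(1, 1000, isotropic(0.001), g)` returns exp(−1) for any unit direction g, whereas `fibre_along_x(0.0017, 0.0003)` gives a lower signal along x than along y.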

SLIDE 5

Diffusion MRI (dMRI)

[Figure: three particles (Particle 1, 2 and 3) diffusing in CSF, grey matter and white matter]

SLIDE 7

cuDIMOT: Motivation

SLIDE 8

cuDIMOT: Design

SLIDE 9

cuDIMOT: Parallel design v1

[Figure: a slice of Vx × Vy voxels processed by Vx*Vy/K thread blocks (Thread Block 0 to Thread Block (Vx*Vy/K)-1), each with K threads (threads 0 to K-1), one thread per voxel]
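Design v1 maps one CUDA thread to one voxel. The sketch below emulates that index arithmetic on the CPU (block × K + thread), assuming a one-dimensional launch and rounding the block count up so that a last, partial block still covers every voxel; the slide writes the count simply as Vx*Vy/K. The function and struct names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// CPU emulation of design v1: one thread per voxel. A slice of Vx*Vy voxels
// is split into blocks of K threads; thread t of block `block` handles voxel
// block*K + t (the same arithmetic as blockIdx.x*blockDim.x + threadIdx.x).
struct Launch {
  int blocks;
  int threads_per_block;
};

Launch v1_launch(int Vx, int Vy, int K) {
  assert(Vx > 0 && Vy > 0 && K > 0);
  const int voxels = Vx * Vy;
  return {(voxels + K - 1) / K, K};  // round up: the last block may be partial
}

// Counts how many threads land on each voxel; every entry should be exactly 1.
std::vector<int> v1_coverage(int Vx, int Vy, int K) {
  const Launch L = v1_launch(Vx, Vy, K);
  const int voxels = Vx * Vy;
  std::vector<int> hits(voxels, 0);
  for (int block = 0; block < L.blocks; ++block)
    for (int t = 0; t < L.threads_per_block; ++t) {
      const int voxel = block * K + t;
      if (voxel < voxels) ++hits[voxel];  // threads past the end stay idle
    }
  return hits;
}
```

With this layout each thread reads all M measurements of its own voxel, so the work per thread grows with M; design v2 below spreads those measurements over a warp instead.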

SLIDE 10

cuDIMOT: Parallel design v2

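[Figure: a warp of 32 threads, with Thread 0 as the leader, shares the M measurements of one voxel: thread 0 handles m0, m32, m64, ...; thread 1 handles m1, m33, m65, ...; thread 2 handles m2, m34, ...; thread 31 handles m31, m63, ...]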
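The strided assignment in the figure lets each lane accumulate a partial result over its own measurements before the warp combines them. Below is a serial CPU sketch for one voxel, assuming the quantity being reduced is the sum of squared residuals; the tree reduction follows the pattern that a loop of CUDA's `__shfl_down_sync` with offsets 16, 8, 4, 2, 1 produces, leaving the total in lane 0 (the leader). Whether cuDIMOT uses exactly this reduction is an assumption, and `warp_sum_squared_residuals` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kWarpSize = 32;

// CPU emulation of design v2 for one voxel. Lane t handles measurements
// t, t+32, t+64, ... and keeps a private partial sum of squared residuals.
// The partials are then combined by a tree reduction with offsets
// 16, 8, 4, 2, 1, the pattern a loop of __shfl_down_sync(0xffffffff, v, offset)
// follows on the GPU, so the total ends up in lane 0 (the leader).
double warp_sum_squared_residuals(const std::vector<double>& meas,
                                  const std::vector<double>& pred) {
  assert(meas.size() == pred.size());
  std::array<double, kWarpSize> lane{};  // one "register" per lane, zeroed
  for (int t = 0; t < kWarpSize; ++t)
    for (std::size_t m = static_cast<std::size_t>(t); m < meas.size(); m += kWarpSize) {
      const double r = meas[m] - pred[m];
      lane[t] += r * r;
    }
  for (int offset = kWarpSize / 2; offset > 0; offset /= 2)
    for (int t = 0; t < offset; ++t)  // lane t receives the value of lane t + offset
      lane[t] += lane[t + offset];
  return lane[0];
}
```

With M = 70 measurements, lanes 0 to 5 each process three measurements and the remaining lanes two, matching the m0/m32/m64 pattern in the figure.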

SLIDE 11

cuDIMOT: Parallel design v2

[Figure: the Vx × Vy voxels are processed by Vx*Vy/B thread blocks (Thread Block 0 to Thread Block (Vx*Vy/B)-1); each block contains B groups (Group 0 to Group B-1) of 32 threads (threads 0 to 31), one group per voxel]
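With B groups of 32 threads per block, a thread's position in the block determines both its voxel and its lane. The sketch below spells out that arithmetic, assuming a one-dimensional launch with blockDim.x = 32·B and a rounded-up block count; B = 2 would correspond to the "2 warps (and voxels) per block" used in the Levenberg-Marquardt kernel. The names are hypothetical.

```cpp
#include <cassert>

// Index arithmetic for design v2 (a sketch): each thread block holds B groups
// of 32 threads, one group (warp) per voxel, so Vx*Vy voxels need
// ceil(Vx*Vy / B) blocks. In a kernel, `block` and `thread_in_block` would be
// blockIdx.x and threadIdx.x, with blockDim.x = 32 * B.
struct V2Index {
  int voxel;    // voxel handled by this thread's group
  int lane;     // position of the thread inside its warp (0..31)
  bool leader;  // lane 0 acts as the group leader
};

V2Index v2_index(int block, int thread_in_block, int B) {
  assert(B > 0 && thread_in_block >= 0 && thread_in_block < 32 * B);
  const int group = thread_in_block / 32;
  const int lane = thread_in_block % 32;
  return {block * B + group, lane, lane == 0};
}

int v2_blocks(int Vx, int Vy, int B) {
  assert(Vx > 0 && Vy > 0 && B > 0);
  return (Vx * Vy + B - 1) / B;  // round up for a partial last block
}
```

For example, thread 33 of block 3 with B = 2 belongs to group 1, so it works on voxel 3·2 + 1 = 7 as lane 1.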

SLIDE 12

cuDIMOT: Levenberg-Marquardt implementation

[Figure: timeline of threads 0 to 31 in a block over I iterations, marking working and idle threads]

STEPS:

  • 1. Calculate partial derivatives for all the parameters.
  • 2. Calculate the gradient, Jacobian and Hessian.
  • 3. LU solver.
  • 4. Compute the model-predicted signal and squared residuals.
  • 5. Calculate the cost function, accept or reject the step, and adapt λ.

Execution order in a block:

  • 1. Step 1
  • 2. synchronise()
  • 3. Step 2
  • 4. Step 3
  • 5. synchronise()
  • 6. Step 4
  • 7. synchronise()
  • 8. Step 5

  • Cost function: sum of squared differences between measurements and model predictions.
  • Gradient-based method: needs partial derivatives for the gradient and the Jacobian (NParameters × NMeasurements).
  • Threads collaborate to compute the partial derivatives.
  • Hessian approximated as Jacobian ∗ Jacobianᵀ.
  • Shuffle instructions exchange values between the threads of a warp.
  • 2 warps (and voxels) per block.
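The five steps can be sketched serially for a single voxel. This is a minimal sketch under stated assumptions: Marquardt's damping H + λ·diag(H), a factor-of-10 update of λ, and the two-parameter model f(x) = θ1·exp(−θ2·x) that the toolbox uses as its example. With only two parameters, Cramer's rule stands in for the LU solver of step 3. In cuDIMOT the sums over measurements in steps 1, 2 and 4 are shared by the 32 threads of a warp; here they are plain loops. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Serial single-voxel sketch of the five Levenberg-Marquardt steps for the
// example model f(x) = θ1 * exp(-θ2 * x), with θ1 = p[0] and θ2 = p[1].
struct Fit {
  double p[2];
  double cost;  // sum of squared residuals at p
};

double sum_sq_residuals(const double p[2], const std::vector<double>& x,
                        const std::vector<double>& y) {
  double s = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double r = y[i] - p[0] * std::exp(-p[1] * x[i]);
    s += r * r;
  }
  return s;
}

Fit levenberg_marquardt(const std::vector<double>& x, const std::vector<double>& y,
                        double theta1, double theta2, int iterations) {
  assert(x.size() == y.size());
  double p[2] = {theta1, theta2};
  double lambda = 1e-3;
  double cost = sum_sq_residuals(p, x, y);
  for (int it = 0; it < iterations; ++it) {
    // Steps 1-2: partial derivatives, gradient g = J r, Hessian approximation H = J J^T.
    double H[2][2] = {{0.0, 0.0}, {0.0, 0.0}};
    double g[2] = {0.0, 0.0};
    for (std::size_t i = 0; i < x.size(); ++i) {
      const double e = std::exp(-p[1] * x[i]);
      const double d[2] = {e, -p[0] * x[i] * e};  // df/dθ1, df/dθ2
      const double r = y[i] - p[0] * e;            // residual
      for (int a = 0; a < 2; ++a) {
        g[a] += d[a] * r;
        for (int b = 0; b < 2; ++b) H[a][b] += d[a] * d[b];
      }
    }
    // Step 3: solve (H + λ diag(H)) δ = g; Cramer's rule replaces LU for 2x2.
    const double a00 = H[0][0] * (1.0 + lambda), a11 = H[1][1] * (1.0 + lambda);
    const double a01 = H[0][1];
    const double det = a00 * a11 - a01 * a01;
    if (det == 0.0) break;
    const double trial[2] = {p[0] + (g[0] * a11 - a01 * g[1]) / det,
                             p[1] + (a00 * g[1] - a01 * g[0]) / det};
    // Steps 4-5: predicted signal and cost at the trial point, accept/reject, adapt λ.
    const double trial_cost = sum_sq_residuals(trial, x, y);
    if (trial_cost < cost) {
      p[0] = trial[0]; p[1] = trial[1]; cost = trial_cost; lambda /= 10.0;
    } else {
      lambda *= 10.0;
    }
  }
  return {{p[0], p[1]}, cost};
}
```

Run on noise-free data generated with θ1 = 100 and θ2 = 1.2 at x = 0, 0.1, ..., 1.9 and started from (80, 0.5), the sketch should recover the generating values. A small λ makes the step close to Gauss-Newton; a rejected step raises λ, which shortens the step and turns it towards the gradient direction.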

SLIDE 13

A Generic toolbox

Options at compilation time:

  • Bounds: lower and/or upper limits (any routine). The Levenberg kernel implements reparameterisations internally.
  • Priors (MCMC):
      • Gaussian probability distribution
      • Gamma probability distribution
      • Automatic Relevance Determination (ARD)
      • Angle uniformly distributed on a sphere
  • Constraints: relations between parameters
  • Different noise models: Gaussian & Rician
  • Numerical differentiation in the Levenberg kernel
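One way to honour bounds inside an unconstrained optimiser, and a plausible reading of "reparameterisations internally", is to let Levenberg-Marquardt move a free variable x and map it into the allowed range. The transforms below (a sine map for two-sided bounds, a squared offset for one-sided bounds) are an assumption for illustration, not cuDIMOT's documented scheme, and `Bounds`, `to_bounded` and `to_unbounded` are hypothetical names. The test uses the bounds from the toolbox example, (80,120) and (,1.5).

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Sketch of bound handling by reparameterisation (an assumed scheme): the
// optimiser moves an unconstrained x, and the model sees θ = to_bounded(x),
// which can never leave [lb, ub].
struct Bounds {
  double lb = -std::numeric_limits<double>::infinity();
  double ub = std::numeric_limits<double>::infinity();
};

double to_bounded(double x, const Bounds& b) {
  const bool has_lb = std::isfinite(b.lb), has_ub = std::isfinite(b.ub);
  if (has_lb && has_ub) return b.lb + (b.ub - b.lb) * (std::sin(x) + 1.0) / 2.0;
  if (has_lb) return b.lb + x * x;  // lower bound only
  if (has_ub) return b.ub - x * x;  // upper bound only
  return x;                         // unbounded
}

// Inverse map, used to convert an initial θ into the optimiser's x.
double to_unbounded(double theta, const Bounds& b) {
  const bool has_lb = std::isfinite(b.lb), has_ub = std::isfinite(b.ub);
  if (has_lb && has_ub) return std::asin(2.0 * (theta - b.lb) / (b.ub - b.lb) - 1.0);
  if (has_lb) return std::sqrt(theta - b.lb);
  if (has_ub) return std::sqrt(b.ub - theta);
  return theta;
}
```

The optimiser then needs derivatives with respect to x, which follow from the chain rule: ∂f/∂x = ∂f/∂θ · dθ/dx, where dθ/dx = (ub − lb)·cos(x)/2 for the two-sided map.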

SLIDE 14

A Generic toolbox

Options at compilation time:

Model predicted signal f(θ) = θ1·exp(−θ2·x) and its partial derivatives (P[0] = θ1, P[1] = θ2, CFP[0] = x):

    MACRO T Predicted_Signal(int npar, T* P, T* CFP, T* FixP){
        return P[0]*exp(-P[1]*CFP[0]);
    }

    MACRO void Partial_Derivatives(int npar, T* P, T* CFP, T* FixP, T* derivatives){
        derivatives[0] = exp(-P[1]*CFP[0]);                // ∂f/∂θ1
        derivatives[1] = -P[0]*CFP[0]*exp(-P[1]*CFP[0]);   // ∂f/∂θ2
    }

Parameter bounds and priors:

    bounds[0] = (80,120)        // θ1 bounded between 80 and 120
    bounds[1] = (,1.5)          // θ2 has only an upper bound of 1.5
    prior[0] = Gaussian(100,10)
    prior[1] = ARD(1)
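The hand-written derivatives above can be checked against a central-difference approximation, which is also the kind of estimate a numerical-differentiation option computes (the step size cuDIMOT uses is not stated here). In this sketch the two user functions are copied as host C++ templates with the toolbox's MACRO qualifier dropped; `Numerical_Derivative` and its cube-root-of-epsilon step are hypothetical, illustrative choices.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Host-side copies of the two user functions on the slide, with the toolbox's
// MACRO qualifier dropped and T made a template parameter.
template <typename T>
T Predicted_Signal(int npar, T* P, T* CFP, T* FixP) {
  (void)npar; (void)FixP;  // not used by this model
  return P[0] * std::exp(-P[1] * CFP[0]);
}

template <typename T>
void Partial_Derivatives(int npar, T* P, T* CFP, T* FixP, T* derivatives) {
  (void)npar; (void)FixP;
  derivatives[0] = std::exp(-P[1] * CFP[0]);
  derivatives[1] = -P[0] * CFP[0] * std::exp(-P[1] * CFP[0]);
}

// Hypothetical helper: central-difference estimate of df/dP[k],
// (f(P + h e_k) - f(P - h e_k)) / (2h), with h scaled to the size of P[k].
template <typename T>
T Numerical_Derivative(int npar, T* P, T* CFP, T* FixP, int k) {
  assert(k >= 0 && k < npar);
  const T saved = P[k];
  const T h = std::cbrt(std::numeric_limits<T>::epsilon()) *
              std::fmax(std::fabs(saved), T(1));
  P[k] = saved + h;
  const T f_plus = Predicted_Signal(npar, P, CFP, FixP);
  P[k] = saved - h;
  const T f_minus = Predicted_Signal(npar, P, CFP, FixP);
  P[k] = saved;  // restore the parameter
  return (f_plus - f_minus) / (T(2) * h);
}
```

A step of about ε^(1/3) balances the O(h²) truncation error of the central difference against floating-point cancellation, so agreement to roughly six significant digits or better is expected when the analytic derivatives are correct.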

SLIDE 15

A Flexible toolbox

Functionality at execution time:

  • Choice of fitting routine: grid search, Levenberg-Marquardt, MCMC
  • Number of iterations in Levenberg-Marquardt
  • Number of iterations in MCMC (burn-in, total, sample thinning interval)
  • Cascaded model fitting (initialising parameters)
  • Choice of model parameters to be kept fixed during the fitting process
  • Bayesian and Akaike Information Criteria
  • The toolbox can be easily extended
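The information criteria can be computed from the fitted residuals. Assuming i.i.d. Gaussian noise with unknown variance, and dropping constants that are shared by all models fitted to the same data, they reduce to the expressions below; with a Rician noise model the log-likelihood, and hence the criteria, would differ, and how cuDIMOT computes them is not shown here. n is the number of measurements and k the number of free parameters; `aic` and `bic` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Gaussian-noise information criteria from the residual sum of squares (RSS),
// with constants shared by every model fitted to the same data dropped.
// n: number of measurements, k: number of free parameters. Lower is better.
double aic(double rss, int n, int k) {
  assert(rss > 0.0 && n > 0 && k >= 0);
  return n * std::log(rss / n) + 2.0 * k;
}

double bic(double rss, int n, int k) {
  assert(rss > 0.0 && n > 0 && k >= 0);
  return n * std::log(rss / n) + k * std::log(static_cast<double>(n));
}
```

When comparing, for instance, a ball & 1 stick fit with a ball & 2 sticks fit of the same voxel, the model with the lower score is preferred. BIC charges ln n per parameter instead of 2, so it favours simpler models whenever n > e² ≈ 7.4.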

SLIDE 16

Validation: Fibre Orientation

We have implemented several diffusion models. Fibre orientation estimation:

[Figure: fibre orientation maps from the CPU tool and from cuDIMOT (GPU) in coronal, sagittal and axial views; colour key: superior-inferior, anterior-posterior, medial-lateral]

SLIDE 17

Validation: Fibre Orientation

Mean estimates: 1000 repeats

[Figure: mean estimates of S0, d, f1, f2, th1 and ph1 over 1000 repeats in the corpus callosum, centrum semiovale and grey matter, CPU vs GPU (cuDIMOT)]

SLIDE 18

Validation: Fibre Orientation

Standard deviation estimates: 1000 repeats

[Figure: standard deviations of S0, d, f1, f2 and the orientation uncertainty of fibre 1 over 1000 repeats in the corpus callosum, centrum semiovale and grey matter, CPU vs GPU (cuDIMOT)]

SLIDE 19

Validation: Fibre Dispersion

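[Figure: Fiso, Fintra and OD maps estimated with NODDI Watson in MATLAB and with NODDI Watson in cuDIMOT, with percentage-difference maps for each parameter (colour scale 0% to 20%)]

Resources / Time:

  • NODDI Watson MATLAB: 72 CPU cores, 40 hours
  • NODDI Watson cuDIMOT: 1 GPU, 6.8 minutes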
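The NODDI Watson timings translate into a speedup factor as follows:

$$
\text{speedup} = \frac{40\,\text{h} \times 60\,\text{min/h}}{6.8\,\text{min}} = \frac{2400\,\text{min}}{6.8\,\text{min}} \approx 352.9
$$

This is in line with the 352x reported for NODDI Watson in the performance comparison. The CPU run also used 72 × 40 = 2880 core-hours, against 6.8 minutes on a single GPU.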

SLIDE 20

Validation: Fibre Dispersion

SLIDE 21

Performance

Time fitting different dMRI models (seconds, logarithmic scale): CPU tool on 72 cores vs cuDIMOT on a single K80 GPU.

  • NODDI Watson (MATLAB): speedup 352x
  • NODDI Bingham (MATLAB): speedup 6.98x
  • Ball & 1 Stick (C++): speedup 3.7x
  • Ball & 1 Stick Gamma (C++): speedup 3.26x
  • Ball & 2 Sticks (C++): speedup 3.88x
  • Ball & 2 Sticks Gamma (C++): speedup 3.85x

SLIDE 22

Performance limitations

  • Global memory load efficiency: 91.4%
  • Global memory store efficiency: 90.5%
  • Global memory bandwidth: reads 95.23 GB/s, writes 61.50 GB/s, total 156.73 GB/s (ECC overhead: 40.36 GB/s)
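Load and store efficiencies around 90% mean that most of each memory transaction carries useful data. The sketch below counts how many memory segments one warp-wide load touches for two hypothetical layouts: consecutive lanes reading consecutive measurements of one voxel (as in the strided assignment of design v2, assuming a voxel's measurements are stored contiguously), versus one thread per voxel reading the same measurement from voxels stored M = 64 floats apart. The 128-byte segment size and both layouts are assumptions for illustration; `segments_touched` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Counts the distinct 128-byte segments touched when the 32 lanes of a warp
// each load one float from index base + lane * stride. Fewer segments per
// request means better coalescing and a higher load efficiency.
int segments_touched(std::size_t base, std::size_t stride) {
  constexpr std::size_t kSegmentBytes = 128;  // assumed transaction size
  std::set<std::size_t> segments;
  for (std::size_t lane = 0; lane < 32; ++lane) {
    const std::size_t byte = (base + lane * stride) * sizeof(float);
    segments.insert(byte / kSegmentBytes);
  }
  return static_cast<int>(segments.size());
}
```

A contiguous, aligned read (`segments_touched(0, 1)`) touches a single segment, a read shifted by 16 floats spills into two, and a stride of 64 floats touches 32 separate segments, so only a small fraction of the bytes fetched would be used.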

SLIDE 23

Conclusions

  • Diffusion MRI allows brain microstructure to be studied non-invasively and in vivo, but modelling the data can be very time-consuming.
  • cuDIMOT: we have designed and implemented a generic and flexible CUDA toolbox for nonlinear model fitting (new models can easily be implemented on GPUs).
  • It reduces computation times (∼200x).
  • These accelerations are especially beneficial in very large recent studies such as:
      • the Human Connectome Project (HCP): data from 1,200 adults;
      • the UK Biobank Project: data from 100,000 adults.
  • cuDIMOT can be used with other modalities and can easily be extended.

SLIDE 24

Acknowledgements

Funding:

  • Human Connectome Project (1U54MH091657-01)
  • UK EPSRC (EP/L023067/1)

SLIDE 25

cuDIMOT
