cuDIMOT: A CUDA toolbox for modelling the brain tissue microstructure from diffusion-MRI
Moisés Hernández Fernández, Istvan Reguly, Mike Giles, Stephen Smith and Stamatios N. Sotiropoulos
GPU Technology Conference Europe 2017 Talk ID: 23165
We want to gain information about tissue microstructure from diffusion MRI (dMRI) data:
- Understand the brain mechanisms
- Develop new biomarkers
[Figure: maps of fibre dispersion and fibre orientation. Orientation legend: Superior-Inferior, Anterior-Posterior, Medial-Lateral.]
Moisés Hernández Fernández, FMRIB
Outline
- Parallel design
- Functionality and features
Molecules are in constant motion: dMRI is sensitive to water diffusion within a tissue.
Different tissues:
- Grey matter: diffusion without a preferred direction.
- White matter: diffusion along a preferred direction.
Information about tissue microstructure can be gained by acquiring several diffusion-weighted measurements and modelling the diffusion process with biophysical parameters.
[Figure: Particle 1, Particle 2 and Particle 3 shown in CSF, grey matter and white matter.]
[Figure: a slice of Vx × Vy voxels is divided among thread blocks 0 to (Vx*Vy/K)-1, each covering K voxels (0 to K-1).]
[Figure: the 32 threads of a warp, Thread 0 (leader) to Thread 31, work together on the M measurements of a voxel.]
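The way 32 threads can share the M measurements of one voxel can be illustrated on the host. The sketch below is plain C++ that emulates the access pattern, not cuDIMOT's kernel code: each of 32 lanes accumulates a strided share of the per-measurement terms, and the 32 partial sums are then folded with a halving tree reduction, which is the exchange pattern that `__shfl_down_sync` provides between registers on the GPU. The names `warp_style_sum` and `kWarpSize` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kWarpSize = 32;  // threads per warp

// Host-side emulation of a warp summing M per-measurement terms.
// Lane l handles measurements l, l+32, l+64, ... (strided split);
// the 32 partial sums are then folded in log2(32) = 5 halving steps.
double warp_style_sum(const std::vector<double>& terms) {
    std::array<double, kWarpSize> partial{};  // one accumulator per lane, zero-initialised

    for (int lane = 0; lane < kWarpSize; ++lane) {
        for (std::size_t m = static_cast<std::size_t>(lane); m < terms.size(); m += kWarpSize) {
            partial[lane] += terms[m];
        }
    }

    // Tree reduction: at each step lane l adds the value held by lane l + offset.
    // On the GPU this exchange would be a shuffle-down between registers.
    for (int offset = kWarpSize / 2; offset > 0; offset /= 2) {
        for (int lane = 0; lane < offset; ++lane) {
            assert(lane + offset < kWarpSize);  // the partner lane always exists
            partial[lane] += partial[lane + offset];
        }
    }
    return partial[0];  // lane 0 (the leader) ends up holding the total
}
```

Called with the squared residuals of one voxel, `warp_style_sum` returns the sum-of-squares cost; lanes whose index exceeds M simply contribute zero.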
[Figure: the Vx × Vy voxels are divided among thread blocks 0 to (Vx*Vy/B)-1; each block contains groups 0 to B-1, and each group contains threads 0 to 31.]
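The block, group and thread layout can be expressed as index arithmetic. The following is a minimal host-side sketch, assuming one 32-thread group per voxel and B groups per block; rounding the block count up when the voxel count is not a multiple of B is an assumption, and `num_blocks`, `thread_role` and `ThreadRole` are hypothetical names rather than cuDIMOT identifiers.

```cpp
#include <cassert>

constexpr int kWarpSize = 32;  // threads per group

struct ThreadRole {
    int voxel;  // voxel handled by this thread's group
    int lane;   // position inside the group, 0..31 (0 is the leader)
};

// Number of thread blocks needed for n_voxels voxels with groups_per_block
// groups in each block. Rounding up is an assumption for inexact divisions.
int num_blocks(int n_voxels, int groups_per_block) {
    assert(n_voxels >= 0 && groups_per_block > 0);
    return (n_voxels + groups_per_block - 1) / groups_per_block;
}

// Maps a (block index, thread index within the block) pair to a voxel and lane,
// the role that blockIdx.x and threadIdx.x would play inside a kernel.
ThreadRole thread_role(int block_idx, int thread_idx, int groups_per_block) {
    assert(block_idx >= 0);
    assert(thread_idx >= 0 && thread_idx < groups_per_block * kWarpSize);
    const int group = thread_idx / kWarpSize;
    return ThreadRole{block_idx * groups_per_block + group, thread_idx % kWarpSize};
}
```

With B = 2, thread 33 of block 3 belongs to group 1, so it works on voxel 7 as lane 1. In a kernel, a group whose voxel index falls beyond the last voxel would have to exit early.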
Levenberg-Marquardt
[Figure: working and idle threads (0 to 31) in a block at each step, over I iterations.]
- Cost function: sum of squared differences between measurements and model predictions.
- Gradient descent method: needs the partial derivatives for the gradient and the Jacobian (NParameters × NMeasurements).
- Threads collaborate to compute the partial derivatives.
- Hessian = Jacobian × Jacobian^T (NParameters × NParameters).
- Shuffle instructions.
- 2 warps (and voxels) per block.
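The quantities named above can be written out for a single voxel. This is a serial host-side sketch, not the parallel kernel: it assumes the cost is scaled as 0.5 × the sum of squared residuals, so that the gradient is Jacobian × residual and the Gauss-Newton approximation of the Hessian is Jacobian × Jacobian^T. The names `Matrix`, `gradient` and `approx_hessian` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// J is NParameters x NMeasurements: J[p][m] = d(prediction m) / d(parameter p).
// residual[m] = prediction m - measurement m.
// For the cost 0.5 * sum(residual^2), the gradient is J * residual.
std::vector<double> gradient(const Matrix& J, const std::vector<double>& residual) {
    std::vector<double> g(J.size(), 0.0);
    for (std::size_t p = 0; p < J.size(); ++p) {
        assert(J[p].size() == residual.size());
        for (std::size_t m = 0; m < residual.size(); ++m) {
            g[p] += J[p][m] * residual[m];
        }
    }
    return g;
}

// Gauss-Newton approximation of the Hessian: H = J * J^T,
// an NParameters x NParameters symmetric matrix.
Matrix approx_hessian(const Matrix& J) {
    const std::size_t n = J.size();
    Matrix H(n, std::vector<double>(n, 0.0));
    for (std::size_t a = 0; a < n; ++a) {
        for (std::size_t b = 0; b < n; ++b) {
            assert(J[a].size() == J[b].size());
            for (std::size_t m = 0; m < J[a].size(); ++m) {
                H[a][b] += J[a][m] * J[b][m];
            }
        }
    }
    return H;
}
```

Every entry of the gradient and of H is a sum over measurements, which is why the work divides naturally among the threads of a warp. A Levenberg-Marquardt step would then solve a damped linear system built from H and the gradient; the damping schedule is not given on the slide.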
Options at compilation time:
- Bounds: lower and/or upper limits (any routine). The Levenberg kernel implements reparameterisations internally.
- Priors (MCMC):
  - Gaussian probability distribution
  - Gamma probability distribution
  - Automatic Relevance Determination (ARD)
  - Angle uniformly distributed on a sphere
- Constraints: relation between parameters
- Different noise models: Gaussian and Rician
- Numerical differentiation in the Levenberg kernel
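To make the bounds and priors options concrete, here is a hedged host-side sketch. The slide does not say which reparameterisation the Levenberg kernel uses, so the logistic map below is an assumption chosen for illustration, as is the parameterisation of the priors (standard deviation for the Gaussian, shape and rate for the Gamma). All four function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Maps an unconstrained value x to a parameter between lo and hi with a
// logistic function, so an unconstrained optimiser cannot leave the bounds.
double to_bounded(double x, double lo, double hi) {
    assert(lo < hi);
    return lo + (hi - lo) / (1.0 + std::exp(-x));
}

// Inverse of to_bounded, used to initialise x from a starting value theta.
double to_unbounded(double theta, double lo, double hi) {
    assert(lo < theta && theta < hi);
    return std::log((theta - lo) / (hi - theta));
}

// Negative log of a Gaussian prior, up to an additive constant.
// sd is taken to be a standard deviation (assumption).
double neg_log_gaussian_prior(double theta, double mean, double sd) {
    assert(sd > 0.0);
    const double z = (theta - mean) / sd;
    return 0.5 * z * z;
}

// Negative log of a Gamma(shape, rate) prior, up to an additive constant.
double neg_log_gamma_prior(double theta, double shape, double rate) {
    assert(theta > 0.0 && shape > 0.0 && rate > 0.0);
    return rate * theta - (shape - 1.0) * std::log(theta);
}
```

In an MCMC routine, terms like these would be added to the negative log-likelihood to form the energy that the sampler evaluates for each proposed parameter value.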
Options at compilation time:

Model predicted signal f(θ) = θ1*exp(-θ2*x) and partial derivatives:

    MACRO T Predicted_Signal (int npar, T* P, T* CFP, T* FixP){
        return P[0]*exp(-P[1]*CFP[0]);
    }
    MACRO void Partial_Derivatives (int npar, T* P, T* CFP, T* FixP, T* derivatives){
        derivatives[0]=exp(-P[1]*CFP[0]);
        derivatives[1]=-P[0]*CFP[0]*exp(-P[1]*CFP[0]);
    }

Parameter bounds and priors:

    bounds[0] = (80,120)
    bounds[1] = (,1.5)
    prior[0] = Gaussian(100,10)
    prior[1] = ARD(1)
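The analytic derivatives of the example model can be checked against a central finite difference, which is also the idea behind a numerical-differentiation option. In this host-side sketch `MACRO` is defined as empty and `T` as `double` so that the two functions from the slide compile with a plain C++ compiler; both definitions are assumptions, as is the helper `Numerical_Derivative`. Here `P[0]` and `P[1]` play the roles of θ1 and θ2, and `CFP[0]` the role of x.

```cpp
#include <cassert>
#include <cmath>

#define MACRO          // empty on the host; an assumption, not the toolbox's definition
typedef double T;      // assumption: the toolbox's T may be float or double

// Model functions as given on the slide: f = P[0] * exp(-P[1] * CFP[0]).
MACRO T Predicted_Signal(int npar, T* P, T* CFP, T* FixP) {
    return P[0] * std::exp(-P[1] * CFP[0]);
}

MACRO void Partial_Derivatives(int npar, T* P, T* CFP, T* FixP, T* derivatives) {
    derivatives[0] = std::exp(-P[1] * CFP[0]);
    derivatives[1] = -P[0] * CFP[0] * std::exp(-P[1] * CFP[0]);
}

// Central finite difference of Predicted_Signal with respect to parameter p.
// P[p] is perturbed by +h and -h and restored before returning.
T Numerical_Derivative(int npar, T* P, T* CFP, T* FixP, int p, T h) {
    assert(p >= 0 && p < npar && h > 0);
    const T saved = P[p];
    P[p] = saved + h;
    const T up = Predicted_Signal(npar, P, CFP, FixP);
    P[p] = saved - h;
    const T down = Predicted_Signal(npar, P, CFP, FixP);
    P[p] = saved;
    return (up - down) / (2 * h);
}
```

Comparing `Partial_Derivatives` with `Numerical_Derivative` at a few parameter values is a cheap way to catch a sign or index mistake when writing the derivatives of a new model.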
Functionality at execution time:
- Choosing fitting routines: grid search, Levenberg-Marquardt, MCMC
- Selecting the number of iterations in Levenberg-Marquardt
- Selecting the number of iterations in MCMC (burn-in, total, sample thinning interval)
- Cascaded model fitting (initialising parameters)
- Choosing the parameters of the model to be kept fixed during the fitting process
- Bayesian and Akaike Information Criteria (BIC, AIC)
The toolbox can be easily extended.
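The two model-selection criteria listed above have standard textbook definitions, sketched here for reference. How the toolbox evaluates the likelihood internally is not stated on the slide, so the functions take the maximised log-likelihood as an input; `aic` and `bic` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Akaike Information Criterion: AIC = 2k - 2 ln(L),
// with k free parameters and ln(L) the maximised log-likelihood.
double aic(double log_likelihood, int n_params) {
    assert(n_params >= 0);
    return 2.0 * n_params - 2.0 * log_likelihood;
}

// Bayesian Information Criterion: BIC = k ln(n) - 2 ln(L),
// with n the number of measurements used in the fit.
double bic(double log_likelihood, int n_params, int n_measurements) {
    assert(n_params >= 0 && n_measurements > 0);
    return n_params * std::log(static_cast<double>(n_measurements)) - 2.0 * log_likelihood;
}
```

For both criteria, lower values indicate the preferred model. BIC penalises extra parameters more strongly than AIC once ln(n) exceeds 2, that is, for more than about 7 measurements.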
We have implemented several diffusion models.
Fibre orientation estimation:
[Figure: fibre orientation maps in coronal, sagittal and axial views, comparing CPU, GPU and cuDIMOT. Orientation legend: Superior-Inferior, Anterior-Posterior, Medial-Lateral.]
Mean estimates: 1000 repeats
[Figure: mean estimates of S0, d, f1, f2, th1 and ph1 in the Corpus Callosum, Centrum Semiovale and Grey Matter, comparing CPU, GPU and cuDIMOT.]
Standard deviation estimates: 1000 repeats
[Figure: standard deviations of S0, d, f1, f2 and orientation uncertainty in the Corpus Callosum, Centrum Semiovale and Grey Matter, comparing CPU, GPU and cuDIMOT.]
NODDI Watson: MATLAB vs cuDIMOT
[Figure: maps of Fiso, Fintra and OD from the MATLAB and cuDIMOT implementations, with percentage-difference maps for each (scale 0% to 20%).]
Resources / Time:
- MATLAB: 72 CPU cores, 40 hours
- cuDIMOT: 1 GPU, 6.8 minutes
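The two run times quoted for NODDI Watson can be turned into a speedup factor once both are expressed in the same unit. The helper below is a trivial sketch with a hypothetical name, `speedup`; it only restates the arithmetic.

```cpp
#include <cassert>

// Ratio of two wall-clock times, both given in seconds.
double speedup(double baseline_seconds, double accelerated_seconds) {
    assert(baseline_seconds > 0.0 && accelerated_seconds > 0.0);
    return baseline_seconds / accelerated_seconds;
}
```

For 40 hours against 6.8 minutes, `speedup(40.0 * 3600.0, 6.8 * 60.0)` is 144000 / 408, about 352.9. Note that this ratio compares 72 CPU cores with a single GPU, not one core with one GPU.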
Time fitting different dMRI models
[Figure: fitting times in seconds (logarithmic scale, 10^1 to 10^5), CPU tool on 72 cores vs cuDIMOT on a single K80 GPU.]
Speedups:
- NODDI Watson (Matlab): 352x
- NODDI Bingham (Matlab): 6.98x
- Ball & 1 Stick (C++): 3.7x
- Ball & 1 Stick Gamma (C++): 3.26x
- Ball & 2 Sticks (C++): 3.88x
- Ball & 2 Sticks Gamma (C++): 3.85x
- Global memory load efficiency: 91.4%
- Global memory store efficiency: 90.5%
- Global memory bandwidth: reads 95.23 GB/s, writes 61.50 GB/s, total 156.73 GB/s, ECC overhead 40.36 GB/s
Conclusions
- Diffusion MRI allows the study of brain microstructure non-invasively and in vivo, but it can be very time-consuming.
- cuDIMOT: we have designed and implemented a generic and flexible CUDA toolbox for nonlinear model fitting (new models can easily be implemented on GPUs).
- It reduces computation times: ~200x.
- These accelerations are tremendously beneficial, especially in very large recent studies such as:
  - The Human Connectome Project (HCP): data from 1,200 adults
  - The UK Biobank Project: data from 100,000 adults
- cuDIMOT can be used in other modalities and can be easily extended.
Acknowledgements
Funding: Human Connectome Project (1U54MH091657-01), UK EPSRC (EP/L023067/1)