White Matter Tractography and Human Brain Connections Using GPUs - - PowerPoint PPT Presentation
White Matter Tractography and Human Brain Connections Using GPUs - - PowerPoint PPT Presentation
White Matter Tractography and Human Brain Connections Using GPUs Mois es Hern andez Fern andez Istvan Reguly, Mike Giles, Stephen Smith and Stamatios N. Sotiropoulos GPU Technology Conference 2016 dMRI & Tractography Low Path
dMRI & Tractography
Low Path Probability High Path Probability
Mois´ es Hern´ andez Fern´ andez, FMRIB
- 1. dMRI & Tractography
2 / 23
Outline
- 1. Introduction to diffusion MRI & Tractography
- 2. Parallelization of computational diffusion MRI using GPUs
Voxelwise Fibre Orientation Estimation on GPUs Probabilistic tractography on GPUs
- 3. Conclusions
Mois´ es Hern´ andez Fern´ andez, FMRIB
- 1. dMRI & Tractography
3 / 23
diffusion MRI (dMRI)
Molecules are in constant
- motion. We want to quantify
water diffusion within a tissue. Isotropic vs Anisotropic
- diffusion. Different tissues: Grey
Matter vs White Matter. It is possible to estimate the principal diffusion orientations in each voxel of the brain.
Mois´ es Hern´ andez Fern´ andez, FMRIB
- 1. dMRI & Tractography
4 / 23
diffusion MRI (dMRI)
Molecules are in constant
- motion. We want to quantify
water diffusion within a tissue. Isotropic vs Anisotropic
- diffusion. Different tissues: Grey
Matter vs White Matter. It is possible to estimate the principal diffusion orientations in each voxel of the brain.
Mois´ es Hern´ andez Fern´ andez, FMRIB
- 1. dMRI & Tractography
4 / 23
diffusion MRI (dMRI)
Molecules are in constant
- motion. We want to quantify
water diffusion within a tissue. Isotropic vs Anisotropic
- diffusion. Different tissues: Grey
Matter vs White Matter. It is possible to estimate the principal diffusion orientations in each voxel of the brain.
Mois´ es Hern´ andez Fern´ andez, FMRIB
- 1. dMRI & Tractography
4 / 23
Tractography
Post-mortem dissection
- f some white matter
fibre bundles (Tracts). Tractography: The post-imaging reconstruction of fibre bundles anatomical connections in the brain. In-vivo virtual dissection.
Mois´ es Hern´ andez Fern´ andez, FMRIB
- 1. dMRI & Tractography
5 / 23
Tractography
Post-mortem dissection
- f some white matter
fibre bundles (Tracts). Tractography: The post-imaging reconstruction of fibre bundles anatomical connections in the brain. In-vivo virtual dissection.
Mois´ es Hern´ andez Fern´ andez, FMRIB
- 1. dMRI & Tractography
5 / 23
dMRI &Tractography Applications
Neurosurgical Planning. Neurological and psychiatric disorders: e.g. Tracts Deterioration in Alzheimer. Brain systems wiring and network analysis.
www.humanconnectome.org.
Mois´ es Hern´ andez Fern´ andez, FMRIB
- 1. dMRI & Tractography
6 / 23
dMRI &Tractography Applications
Jin, Yan, et al. 2015
Neurosurgical Planning. Neurological and psychiatric disorders: e.g. Tracts Deterioration in Alzheimer. Brain systems wiring and network analysis.
www.humanconnectome.org.
Mois´ es Hern´ andez Fern´ andez, FMRIB
- 1. dMRI & Tractography
6 / 23
dMRI &Tractography Applications
Neurosurgical Planning. Neurological and psychiatric disorders: e.g. Tracts Deterioration in Alzheimer. Brain systems wiring and network analysis.
www.humanconnectome.org.
Mois´ es Hern´ andez Fern´ andez, FMRIB
- 1. dMRI & Tractography
6 / 23
dMRI &Tractography
These tractography approaches can require very high computational resources. Computation times can be significant: Hours to Days depending on data & application. We have designed and implemented a CUDA framework to massively accelerate these estimations
- 1. Voxelwise local modelling to estimate fibre orientations: Hundreds of
thousands model fits to estimate orientations. Quite suitable for GPUs.
- 2. Global modelling: Integrates these orientations and estimate
long-range connections across the brain volume. Not as suitable for GPUs.
Mois´ es Hern´ andez Fern´ andez, FMRIB
- 1. dMRI & Tractography
7 / 23
Voxelwise Fibre Orientation Estimation
Sk = S0[(1 −
L
- j=1
fj) exp(−bkd) +
L
- j=1
fj exp(−bkd(gT
k vj)2)]
We want to estimate the main fibre orientations at each voxel. Input: Many diffusion-weighted measurements in each voxel Output: Orientations of fibres in each voxel. Ball & sticks model. Bayesian inference framework: P(Parameters|Data) = P(Data|Params)P(Params) P(Data) MCMC Sampling (∼5000 iters) Parameters initialized using Levenberg-Marquardt.
Mois´ es Hern´ andez Fern´ andez, FMRIB Voxelwise application 8 / 23
Voxelwise Fibre Orientation Estimation
Sk = S0[(1 −
L
- j=1
fj) exp(−bkd) +
L
- j=1
fj exp(−bkd(gT
k vj)2)]
We want to estimate the main fibre orientations at each voxel. Input: Many diffusion-weighted measurements in each voxel Output: Orientations of fibres in each voxel. Ball & sticks model. Bayesian inference framework: P(Parameters|Data) = P(Data|Params)P(Params) P(Data) MCMC Sampling (∼5000 iters) Parameters initialized using Levenberg-Marquardt.
Mois´ es Hern´ andez Fern´ andez, FMRIB Voxelwise application 8 / 23
Inferring fibre orientations on GPUs
M VOXELS N VOXELS
...
Thread Block 0
......................
Thread Block X
................................... 1 2 ................................... 1 2 Q-1 Q-2 ............................... Q-1 Q-2
Each voxel is assigned to a Thread.
Mois´ es Hern´ andez Fern´ andez, FMRIB Voxelwise application 9 / 23
Inferring fibre orientations on GPUs
M VOXELS N VOXELS
...
Thread Block 0
......................
Thread Block MxN-1
................................... 1 2 ................................... 1 2 Q-1 Q-2 ............................... Q-1 Q-2
Each voxel is assigned to a Thread Block.
Mois´ es Hern´ andez Fern´ andez, FMRIB Voxelwise application 9 / 23
Inferring fibre orientations on GPUs
Thread 1
g2Q+1 g1 gQ+1
Thread 0 (Leader)
g2Q g0 gQ g2 gQ+2
Thread 2 Thread Q -1
... K Gradient Directions
gQ-1
...
g2Q-1
... ...
Thread working Idle thread
TASKS:
- 1. Propose a new Parameter
- 2. Calculate Predicted Signal - Likelihood
- 3. Calculate Posterior & Accept/Reject Parameter
MCMC T iterations Active Threads in a Block
. . .
Param 1
- 1. Task 1
- 3. Task 2
- 2. synchronise()
- 5. Task 3
- 4. synchronise()
Q-2 Q-1 0 1 2 3 4 ......
...... ...... ...... ......
Param R
- 1. Task 1
- 3. Task 2
- 2. synchronise()
- 5. Task 3
- 4. synchronise()
Q-2 Q-1 0 1 2 3 4 ......
...... ...... ...... ...... . . .
Levemberg Marquardt S iterations Active Threads in a Block
- 1. Task 1
- 3. Task 2
- 2. synchronise()
- 5. Task 3
- 6. synchronise()
- 7. Task 4
- 8. synchronise()
- 9. Task 5
- 4. synchronise()
Q-2 Q-1 0 1 2 3 4 ......
...... ...... ...... ...... ...... ...... ...... ......
TASKS:
- 1. Calculate Gradient
- 2. Calculate Hessian Matrix
- 3. Update Parameters
- 4. Calculate Cost Function
- 5. Check Convergence
Mois´ es Hern´ andez Fern´ andez, FMRIB Voxelwise application 10 / 23
Fibre orientations: Speedup
Number of Fibres Time in seconds CPU GPU Directions number: 64 256 128
100 1000 10000 1000000 100000 2 3 1
1 Tesla C2050 GPU vs 1 Intel Xeon E5620 core. Speedup: 112X 102 Tesla M2090 CPU vs 102 Intel Xeon X5650. Speedup: 120X
Mois´ es Hern´ andez Fern´ andez, FMRIB Voxelwise application 11 / 23
Fibre orientations: Speedup
Directions number: 64 256 128 Multi GPU Number of Fibres Time in seconds Multi CPU
10 100 1000 10000 100000 1 2 3
1 Tesla C2050 GPU vs 1 Intel Xeon E5620 core. Speedup: 112X 102 Tesla M2090 CPU vs 102 Intel Xeon X5650. Speedup: 120X
Mois´ es Hern´ andez Fern´ andez, FMRIB Voxelwise application 11 / 23
Fibre Orientations: Validation
Corpus Callosum Centrum Semiovale Grey Matter
GPU CPU f1 S0 d
1 3 1 3 1 1 3 2 5 1 1 5 2 2 5 3 1 2 8 1 2 9 1 3 5 1 1 5 2 2 5 2 9 2 9 2 2 9 4 5 1 1 5 2 2 5 .4 5 .4 7 .4 9 5 1 1 5 2 2 5 3 .2 4 .2 6 .2 8 5 1 1 5 2 2 5 .0 5 .0 1 .0 1 5 4 8 1 2 1 6 2 .9 1 x 1
- 3
1 2 3 4 5 .9 1 x 1
- 3
5 1 1 5 2 2 5 9 .4 9 .5 x 1
- 4
5 1 1 5 2 2 5 3 9 .6 1.1 1.1
Distribution of the estimated parameters after repeating the experiment 1000 times with CPU and GPU
Mois´ es Hern´ andez Fern´ andez, FMRIB Voxelwise application 12 / 23
Probabilistic Tractography
High memory requirements Compute a global anatomical brain connectivity map integrating the brain fibre orientations
- f continuous voxels.
Propagate N streamlines from seeds, choosing
- rientations randomly.
Probability of connection between 2 regions: PAB = M/N. M: number of streamlines that go through B.
Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 13 / 23
Probabilistic Tractography
High memory requirements Compute a global anatomical brain connectivity map integrating the brain fibre orientations
- f continuous voxels.
Propagate N streamlines from seeds, choosing
- rientations randomly.
Probability of connection between 2 regions: PAB = M/N. M: number of streamlines that go through B.
Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 13 / 23
Probabilistic tractography on GPUs: Design
Probabilistic Tractography on GPUs
SEED 0
SL 0 SL 1 SL 2 SL 3 SL Q-4 SL Q-3 SL Q-2 SL Q-1 SL N-4 SL N-3 SL N-2 SL N-1 SL Q SL Q+1 SL Q+2 SL Q+3
Thread Block 0
................................... 1 Q-1 Q-1
Thread Block1 SEED M-1
SL 0 SL 1 SL 2 SL 3 SL N-4 SL N-3 SL N-2 SL N-1 ................................... 1 Q-1 Q-1 ................................... 1 Q-1 Q-1
Thread Block F-1
MxN streamlines (SLs) are launched from M seeds. Each SL is assigned to a thread.
This time, 1 thread per voxel is not efficient: most threads would be
- idle. Unbalanced
- workload. Better, one
thread per streamline. High memory requirements: Limitation
- n the number of
parallel threads.
Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 14 / 23
Probabilistic tractography on GPUs: Design
SAMPLES ~1.5 GB
Device Memory
MASKS ~0.5 GB
X Y Z X Y Z X Y Z
...
X Y Z X Y Z X Y Z
... ...
X Y Z X Y Z X Y Z
...
X Y Z X Y Z X Y Z
... Streamlines Coordinates
This time, 1 thread per voxel is not efficient: most threads would be
- idle. Unbalanced
- workload. Better, one
thread per streamline. High memory requirements: Limitation
- n the number of
parallel threads.
Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 14 / 23
Probabilistic tractography on GPUs: Design
STEP N STEP N+1 STEP N+2 STEP N+3 Segment-Triangle Intersection
- 1. Select Sample & Jump
- 2. Check Imposed
Anatomical Constraints (SEVERAL)
- 3. Update Connectivity
Heavy tasks assigned to each thread: Divide tasks into several CUDA kernels. Need to Update Connectivity Matrix: 91282 x 91282 elements (∼32 GB). CPU can do
- it. Overlap tasks using
CUDA streams. Divergences: After few steps, remove finished streamlines.
Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 15 / 23
Probabilistic tractography on GPUs: Design
GPU Kernels: Calculate paths, Check Anatomical Constraint Masks & Check connectivity between nodes. Copy Path Coordinates & Nodes Connections CPU->GPU Update Connectivity Matrix (CPU)
Heavy tasks assigned to each thread: Divide tasks into several CUDA kernels. Need to Update Connectivity Matrix: 91282 x 91282 elements (∼32 GB). CPU can do
- it. Overlap tasks using
CUDA streams. Divergences: After few steps, remove finished streamlines.
Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 15 / 23
Probabilistic tractography on GPUs: Design
THREADS SAME WARP IDLE THREADS STEP N STEP N+1 STEP N+2 STEP N+3
Heavy tasks assigned to each thread: Divide tasks into several CUDA kernels. Need to Update Connectivity Matrix: 91282 x 91282 elements (∼32 GB). CPU can do
- it. Overlap tasks using
CUDA streams. Divergences: After few steps, remove finished streamlines.
Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 15 / 23
Probabilistic Tractography: Speedup
2466 154 16 11816 78 151 9707 102 95 9066 55 164 44977 339 132 56578 314 180
Time & Speedup: Some major brain white maer tracts
1 10 100000 10000 1000 100
Time in seconds \ Speedup factor (Log scale)
1 Core Intel Xeon E5-2680 v3 2.50 GHz Single GPU Nvidia K80 Speedup
Corticospinal Tract Superior Thalamic Radiation Inferior Fronto-occipital Fasciculus Posterior Thalamic Radiation Forceps Minor Cingulate gyrus part of cingulum
Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 16 / 23
Probabilistic Tractography: Speedup
Time & Speedup: Brain Connecvity Matrix
1 Core Intel Xeon E5-2680 v3 2.50 GHz Single GPU Nvidia K80 Speedup
1 10 100 1000 10000 100000 1000000
68.29 10788 736720
Time in seconds \ Speedup factor (Log scale)
Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 17 / 23
Probabilistic Tractography:Performance
Global Load Throughput Global Load Efficiency Global Store Throughput Global Store Efficiency Instrucons executed per cycle (IPC)
Single-precision floang-point
- peraons per second (FLOPS)
Single-precision floang-point instrucons per second
Integer instrucons per second
Control instrucons per second
Brain Connecvity Matrix White maer tracts
23.11 % 6.55 GB/s 12.58 % 1.07 12.25 GFLOPS 284.40 GIPS 389.30 GIPS 27.78 GB/s 51.74 GIPS 23.56 % 7.57 GB/s 12.62 % 1.15 14.88 GFLOPS 344.47 GIPS 477.80 GIPS 32.81 GB/s 61.59 GIPS
Aer few iteraons threads of the same warp access to remote memory posions (Uncoalesced): Different brain paths. Different brain paths cause divergences. A lot of other kind of instrucons.
GPU Performance Analysis
Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 18 / 23
Results: Probabilistic Tractography
Superior thalamic radiation Corticospinal tract Cingulate gyrus part of cingulum Forceps minor Posterior thalamic radiation Inferior fronto-occipital fasciculus
Coronal Sagittal Axial CPU GPU
Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 19 / 23
Results: Probabilistic Tractography
Brain Connectivity Map from a Vertex in Motor Cortex (Grey Point)
CPU GPU
Low Path Probability High Path Probability
Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 20 / 23
Probabilistic Tractography: Validation
Correlaon Coefficient and Variability (σ) for a Human Connectome (full brain connecvity map)
σ=8.95894E-06 σ=3.18924E-06
CPU-GPU CPU-CPU
0.999 0.9992 0.9994 0.9996 0.9998 1
Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 21 / 23
Conclusions
Diffusion MRI uniquely allows the study of brain microstructure and long-range brain connections, non-invasively and in-vivo. The analysis of dMRI data can require high computational resources. We have designed and implemented frameworks using GPUs that reduce computational times: ∼150X These accelerations are tremendously beneficial, especially in very large recent studies such as:
The Human Connectome Project (HCP): data from 1,200 adults. The Developing Human Connectome Project (dHCP): data from 1,000 babies. The UK Biobank Project: data from 100,000 adults.
Mois´ es Hern´ andez Fern´ andez, FMRIB Conclusions 22 / 23
Acknowledgements
Funding: Human Connectome Project (1U54MH091657-01) UK EPSRC (EP/L023067/1)
Mois´ es Hern´ andez Fern´ andez, FMRIB Acknowledgements 23 / 23