White Matter Tractography and Human Brain Connections Using GPUs - - PowerPoint PPT Presentation

white matter tractography and human brain connections
SMART_READER_LITE
LIVE PREVIEW

White Matter Tractography and Human Brain Connections Using GPUs - - PowerPoint PPT Presentation

White Matter Tractography and Human Brain Connections Using GPUs Mois es Hern andez Fern andez Istvan Reguly, Mike Giles, Stephen Smith and Stamatios N. Sotiropoulos GPU Technology Conference 2016 dMRI & Tractography Low Path


slide-1
SLIDE 1

White Matter Tractography and Human Brain Connections Using GPUs

Mois´ es Hern´ andez Fern´ andez Istvan Reguly, Mike Giles, Stephen Smith and Stamatios N. Sotiropoulos

GPU Technology Conference 2016

slide-2
SLIDE 2

dMRI & Tractography

Low Path Probability High Path Probability

Mois´ es Hern´ andez Fern´ andez, FMRIB

  • 1. dMRI & Tractography

2 / 23

slide-3
SLIDE 3

Outline

  • 1. Introduction to diffusion MRI & Tractography
  • 2. Parallelization of computational diffusion MRI using GPUs

Voxelwise Fibre Orientation Estimation on GPUs Probabilistic tractography on GPUs

  • 3. Conclusions

Mois´ es Hern´ andez Fern´ andez, FMRIB

  • 1. dMRI & Tractography

3 / 23

slide-4
SLIDE 4

diffusion MRI (dMRI)

Molecules are in constant

  • motion. We want to quantify

water diffusion within a tissue. Isotropic vs Anisotropic

  • diffusion. Different tissues: Grey

Matter vs White Matter. It is possible to estimate the principal diffusion orientations in each voxel of the brain.

Mois´ es Hern´ andez Fern´ andez, FMRIB

  • 1. dMRI & Tractography

4 / 23

slide-5
SLIDE 5

diffusion MRI (dMRI)

Molecules are in constant

  • motion. We want to quantify

water diffusion within a tissue. Isotropic vs Anisotropic

  • diffusion. Different tissues: Grey

Matter vs White Matter. It is possible to estimate the principal diffusion orientations in each voxel of the brain.

Mois´ es Hern´ andez Fern´ andez, FMRIB

  • 1. dMRI & Tractography

4 / 23

slide-6
SLIDE 6

diffusion MRI (dMRI)

Molecules are in constant

  • motion. We want to quantify

water diffusion within a tissue. Isotropic vs Anisotropic

  • diffusion. Different tissues: Grey

Matter vs White Matter. It is possible to estimate the principal diffusion orientations in each voxel of the brain.

Mois´ es Hern´ andez Fern´ andez, FMRIB

  • 1. dMRI & Tractography

4 / 23

slide-7
SLIDE 7

Tractography

Post-mortem dissection

  • f some white matter

fibre bundles (Tracts). Tractography: The post-imaging reconstruction of fibre bundles anatomical connections in the brain. In-vivo virtual dissection.

Mois´ es Hern´ andez Fern´ andez, FMRIB

  • 1. dMRI & Tractography

5 / 23

slide-8
SLIDE 8

Tractography

Post-mortem dissection

  • f some white matter

fibre bundles (Tracts). Tractography: The post-imaging reconstruction of fibre bundles anatomical connections in the brain. In-vivo virtual dissection.

Mois´ es Hern´ andez Fern´ andez, FMRIB

  • 1. dMRI & Tractography

5 / 23

slide-9
SLIDE 9

dMRI &Tractography Applications

Neurosurgical Planning. Neurological and psychiatric disorders: e.g. Tracts Deterioration in Alzheimer. Brain systems wiring and network analysis.

www.humanconnectome.org.

Mois´ es Hern´ andez Fern´ andez, FMRIB

  • 1. dMRI & Tractography

6 / 23

slide-10
SLIDE 10

dMRI &Tractography Applications

Jin, Yan, et al. 2015

Neurosurgical Planning. Neurological and psychiatric disorders: e.g. Tracts Deterioration in Alzheimer. Brain systems wiring and network analysis.

www.humanconnectome.org.

Mois´ es Hern´ andez Fern´ andez, FMRIB

  • 1. dMRI & Tractography

6 / 23

slide-11
SLIDE 11

dMRI &Tractography Applications

Neurosurgical Planning. Neurological and psychiatric disorders: e.g. Tracts Deterioration in Alzheimer. Brain systems wiring and network analysis.

www.humanconnectome.org.

Mois´ es Hern´ andez Fern´ andez, FMRIB

  • 1. dMRI & Tractography

6 / 23

slide-12
SLIDE 12

dMRI &Tractography

These tractography approaches can require very high computational resources. Computation times can be significant: Hours to Days depending on data & application. We have designed and implemented a CUDA framework to massively accelerate these estimations

  • 1. Voxelwise local modelling to estimate fibre orientations: Hundreds of

thousands model fits to estimate orientations. Quite suitable for GPUs.

  • 2. Global modelling: Integrates these orientations and estimate

long-range connections across the brain volume. Not as suitable for GPUs.

Mois´ es Hern´ andez Fern´ andez, FMRIB

  • 1. dMRI & Tractography

7 / 23

slide-13
SLIDE 13

Voxelwise Fibre Orientation Estimation

Sk = S0[(1 −

L

  • j=1

fj) exp(−bkd) +

L

  • j=1

fj exp(−bkd(gT

k vj)2)]

We want to estimate the main fibre orientations at each voxel. Input: Many diffusion-weighted measurements in each voxel Output: Orientations of fibres in each voxel. Ball & sticks model. Bayesian inference framework: P(Parameters|Data) = P(Data|Params)P(Params) P(Data) MCMC Sampling (∼5000 iters) Parameters initialized using Levenberg-Marquardt.

Mois´ es Hern´ andez Fern´ andez, FMRIB Voxelwise application 8 / 23

slide-14
SLIDE 14

Voxelwise Fibre Orientation Estimation

Sk = S0[(1 −

L

  • j=1

fj) exp(−bkd) +

L

  • j=1

fj exp(−bkd(gT

k vj)2)]

We want to estimate the main fibre orientations at each voxel. Input: Many diffusion-weighted measurements in each voxel Output: Orientations of fibres in each voxel. Ball & sticks model. Bayesian inference framework: P(Parameters|Data) = P(Data|Params)P(Params) P(Data) MCMC Sampling (∼5000 iters) Parameters initialized using Levenberg-Marquardt.

Mois´ es Hern´ andez Fern´ andez, FMRIB Voxelwise application 8 / 23

slide-15
SLIDE 15

Inferring fibre orientations on GPUs

M VOXELS N VOXELS

...

Thread Block 0

......................

Thread Block X

................................... 1 2 ................................... 1 2 Q-1 Q-2 ............................... Q-1 Q-2

Each voxel is assigned to a Thread.

Mois´ es Hern´ andez Fern´ andez, FMRIB Voxelwise application 9 / 23

slide-16
SLIDE 16

Inferring fibre orientations on GPUs

M VOXELS N VOXELS

...

Thread Block 0

......................

Thread Block MxN-1

................................... 1 2 ................................... 1 2 Q-1 Q-2 ............................... Q-1 Q-2

Each voxel is assigned to a Thread Block.

Mois´ es Hern´ andez Fern´ andez, FMRIB Voxelwise application 9 / 23

slide-17
SLIDE 17

Inferring fibre orientations on GPUs

Thread 1

g2Q+1 g1 gQ+1

Thread 0 (Leader)

g2Q g0 gQ g2 gQ+2

Thread 2 Thread Q -1

... K Gradient Directions

gQ-1

...

g2Q-1

... ...

Thread working Idle thread

TASKS:

  • 1. Propose a new Parameter
  • 2. Calculate Predicted Signal - Likelihood
  • 3. Calculate Posterior & Accept/Reject Parameter

MCMC T iterations Active Threads in a Block

. . .

Param 1

  • 1. Task 1
  • 3. Task 2
  • 2. synchronise()
  • 5. Task 3
  • 4. synchronise()

Q-2 Q-1 0 1 2 3 4 ......

...... ...... ...... ......

Param R

  • 1. Task 1
  • 3. Task 2
  • 2. synchronise()
  • 5. Task 3
  • 4. synchronise()

Q-2 Q-1 0 1 2 3 4 ......

...... ...... ...... ...... . . .

Levemberg Marquardt S iterations Active Threads in a Block

  • 1. Task 1
  • 3. Task 2
  • 2. synchronise()
  • 5. Task 3
  • 6. synchronise()
  • 7. Task 4
  • 8. synchronise()
  • 9. Task 5
  • 4. synchronise()

Q-2 Q-1 0 1 2 3 4 ......

...... ...... ...... ...... ...... ...... ...... ......

TASKS:

  • 1. Calculate Gradient
  • 2. Calculate Hessian Matrix
  • 3. Update Parameters
  • 4. Calculate Cost Function
  • 5. Check Convergence

Mois´ es Hern´ andez Fern´ andez, FMRIB Voxelwise application 10 / 23

slide-18
SLIDE 18

Fibre orientations: Speedup

Number of Fibres Time in seconds CPU GPU Directions number: 64 256 128

100 1000 10000 1000000 100000 2 3 1

1 Tesla C2050 GPU vs 1 Intel Xeon E5620 core. Speedup: 112X 102 Tesla M2090 CPU vs 102 Intel Xeon X5650. Speedup: 120X

Mois´ es Hern´ andez Fern´ andez, FMRIB Voxelwise application 11 / 23

slide-19
SLIDE 19

Fibre orientations: Speedup

Directions number: 64 256 128 Multi GPU Number of Fibres Time in seconds Multi CPU

10 100 1000 10000 100000 1 2 3

1 Tesla C2050 GPU vs 1 Intel Xeon E5620 core. Speedup: 112X 102 Tesla M2090 CPU vs 102 Intel Xeon X5650. Speedup: 120X

Mois´ es Hern´ andez Fern´ andez, FMRIB Voxelwise application 11 / 23

slide-20
SLIDE 20

Fibre Orientations: Validation

Corpus Callosum Centrum Semiovale Grey Matter

GPU CPU f1 S0 d

1 3 1 3 1 1 3 2 5 1 1 5 2 2 5 3 1 2 8 1 2 9 1 3 5 1 1 5 2 2 5 2 9 2 9 2 2 9 4 5 1 1 5 2 2 5 .4 5 .4 7 .4 9 5 1 1 5 2 2 5 3 .2 4 .2 6 .2 8 5 1 1 5 2 2 5 .0 5 .0 1 .0 1 5 4 8 1 2 1 6 2 .9 1 x 1

  • 3

1 2 3 4 5 .9 1 x 1

  • 3

5 1 1 5 2 2 5 9 .4 9 .5 x 1

  • 4

5 1 1 5 2 2 5 3 9 .6 1.1 1.1

Distribution of the estimated parameters after repeating the experiment 1000 times with CPU and GPU

Mois´ es Hern´ andez Fern´ andez, FMRIB Voxelwise application 12 / 23

slide-21
SLIDE 21

Probabilistic Tractography

High memory requirements Compute a global anatomical brain connectivity map integrating the brain fibre orientations

  • f continuous voxels.

Propagate N streamlines from seeds, choosing

  • rientations randomly.

Probability of connection between 2 regions: PAB = M/N. M: number of streamlines that go through B.

Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 13 / 23

slide-22
SLIDE 22

Probabilistic Tractography

High memory requirements Compute a global anatomical brain connectivity map integrating the brain fibre orientations

  • f continuous voxels.

Propagate N streamlines from seeds, choosing

  • rientations randomly.

Probability of connection between 2 regions: PAB = M/N. M: number of streamlines that go through B.

Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 13 / 23

slide-23
SLIDE 23

Probabilistic tractography on GPUs: Design

Probabilistic Tractography on GPUs

SEED 0

SL 0 SL 1 SL 2 SL 3 SL Q-4 SL Q-3 SL Q-2 SL Q-1 SL N-4 SL N-3 SL N-2 SL N-1 SL Q SL Q+1 SL Q+2 SL Q+3

Thread Block 0

................................... 1 Q-1 Q-1

Thread Block1 SEED M-1

SL 0 SL 1 SL 2 SL 3 SL N-4 SL N-3 SL N-2 SL N-1 ................................... 1 Q-1 Q-1 ................................... 1 Q-1 Q-1

Thread Block F-1

MxN streamlines (SLs) are launched from M seeds. Each SL is assigned to a thread.

This time, 1 thread per voxel is not efficient: most threads would be

  • idle. Unbalanced
  • workload. Better, one

thread per streamline. High memory requirements: Limitation

  • n the number of

parallel threads.

Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 14 / 23

slide-24
SLIDE 24

Probabilistic tractography on GPUs: Design

SAMPLES ~1.5 GB

Device Memory

MASKS ~0.5 GB

X Y Z X Y Z X Y Z

...

X Y Z X Y Z X Y Z

... ...

X Y Z X Y Z X Y Z

...

X Y Z X Y Z X Y Z

... Streamlines Coordinates

This time, 1 thread per voxel is not efficient: most threads would be

  • idle. Unbalanced
  • workload. Better, one

thread per streamline. High memory requirements: Limitation

  • n the number of

parallel threads.

Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 14 / 23

slide-25
SLIDE 25

Probabilistic tractography on GPUs: Design

STEP N STEP N+1 STEP N+2 STEP N+3 Segment-Triangle Intersection

  • 1. Select Sample & Jump
  • 2. Check Imposed

Anatomical Constraints (SEVERAL)

  • 3. Update Connectivity

Heavy tasks assigned to each thread: Divide tasks into several CUDA kernels. Need to Update Connectivity Matrix: 91282 x 91282 elements (∼32 GB). CPU can do

  • it. Overlap tasks using

CUDA streams. Divergences: After few steps, remove finished streamlines.

Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 15 / 23

slide-26
SLIDE 26

Probabilistic tractography on GPUs: Design

GPU Kernels: Calculate paths, Check Anatomical Constraint Masks & Check connectivity between nodes. Copy Path Coordinates & Nodes Connections CPU->GPU Update Connectivity Matrix (CPU)

Heavy tasks assigned to each thread: Divide tasks into several CUDA kernels. Need to Update Connectivity Matrix: 91282 x 91282 elements (∼32 GB). CPU can do

  • it. Overlap tasks using

CUDA streams. Divergences: After few steps, remove finished streamlines.

Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 15 / 23

slide-27
SLIDE 27

Probabilistic tractography on GPUs: Design

THREADS SAME WARP IDLE THREADS STEP N STEP N+1 STEP N+2 STEP N+3

Heavy tasks assigned to each thread: Divide tasks into several CUDA kernels. Need to Update Connectivity Matrix: 91282 x 91282 elements (∼32 GB). CPU can do

  • it. Overlap tasks using

CUDA streams. Divergences: After few steps, remove finished streamlines.

Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 15 / 23

slide-28
SLIDE 28

Probabilistic Tractography: Speedup

2466 154 16 11816 78 151 9707 102 95 9066 55 164 44977 339 132 56578 314 180

Time & Speedup: Some major brain white maer tracts

1 10 100000 10000 1000 100

Time in seconds \ Speedup factor (Log scale)

1 Core Intel Xeon E5-2680 v3 2.50 GHz Single GPU Nvidia K80 Speedup

Corticospinal Tract Superior Thalamic Radiation Inferior Fronto-occipital Fasciculus Posterior Thalamic Radiation Forceps Minor Cingulate gyrus part of cingulum

Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 16 / 23

slide-29
SLIDE 29

Probabilistic Tractography: Speedup

Time & Speedup: Brain Connecvity Matrix

1 Core Intel Xeon E5-2680 v3 2.50 GHz Single GPU Nvidia K80 Speedup

1 10 100 1000 10000 100000 1000000

68.29 10788 736720

Time in seconds \ Speedup factor (Log scale)

Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 17 / 23

slide-30
SLIDE 30

Probabilistic Tractography:Performance

Global Load Throughput Global Load Efficiency Global Store Throughput Global Store Efficiency Instrucons executed per cycle (IPC)

Single-precision floang-point

  • peraons per second (FLOPS)

Single-precision floang-point instrucons per second

Integer instrucons per second

Control instrucons per second

Brain Connecvity Matrix White maer tracts

23.11 % 6.55 GB/s 12.58 % 1.07 12.25 GFLOPS 284.40 GIPS 389.30 GIPS 27.78 GB/s 51.74 GIPS 23.56 % 7.57 GB/s 12.62 % 1.15 14.88 GFLOPS 344.47 GIPS 477.80 GIPS 32.81 GB/s 61.59 GIPS

Aer few iteraons threads of the same warp access to remote memory posions (Uncoalesced): Different brain paths. Different brain paths cause divergences. A lot of other kind of instrucons.

GPU Performance Analysis

Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 18 / 23

slide-31
SLIDE 31

Results: Probabilistic Tractography

Superior thalamic radiation Corticospinal tract Cingulate gyrus part of cingulum Forceps minor Posterior thalamic radiation Inferior fronto-occipital fasciculus

Coronal Sagittal Axial CPU GPU

Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 19 / 23

slide-32
SLIDE 32

Results: Probabilistic Tractography

Brain Connectivity Map from a Vertex in Motor Cortex (Grey Point)

CPU GPU

Low Path Probability High Path Probability

Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 20 / 23

slide-33
SLIDE 33

Probabilistic Tractography: Validation

Correlaon Coefficient and Variability (σ) for a Human Connectome (full brain connecvity map)

σ=8.95894E-06 σ=3.18924E-06

CPU-GPU CPU-CPU

0.999 0.9992 0.9994 0.9996 0.9998 1

Mois´ es Hern´ andez Fern´ andez, FMRIB Probabilistic Tractography 21 / 23

slide-34
SLIDE 34

Conclusions

Diffusion MRI uniquely allows the study of brain microstructure and long-range brain connections, non-invasively and in-vivo. The analysis of dMRI data can require high computational resources. We have designed and implemented frameworks using GPUs that reduce computational times: ∼150X These accelerations are tremendously beneficial, especially in very large recent studies such as:

The Human Connectome Project (HCP): data from 1,200 adults. The Developing Human Connectome Project (dHCP): data from 1,000 babies. The UK Biobank Project: data from 100,000 adults.

Mois´ es Hern´ andez Fern´ andez, FMRIB Conclusions 22 / 23

slide-35
SLIDE 35

Acknowledgements

Funding: Human Connectome Project (1U54MH091657-01) UK EPSRC (EP/L023067/1)

Mois´ es Hern´ andez Fern´ andez, FMRIB Acknowledgements 23 / 23