A GPU Implementation of a Spatial GLMM: Assessing Spatially Varying Coefficients of Multiple Sclerosis Lesions
Timothy D. Johnson
Department of Biostatistics University of Michigan
Johnson (University of Michigan) MS-GPU Jan 6, 2014 1 / 18
A GPU Implementation of a Spatial GLMM: Assessing Spatially Varying - - PowerPoint PPT Presentation
A GPU Implementation of a Spatial GLMM: Assessing Spatially Varying Coefficients of Multiple Sclerosis Lesions Timothy D. Johnson Department of Biostatistics University of Michigan Johnson (University of Michigan) MS-GPU Jan 6, 2014 1 / 18
Timothy D. Johnson
Department of Biostatistics University of Michigan
Johnson (University of Michigan) MS-GPU Jan 6, 2014 1 / 18
1
Disease Background
2
The Model
3
Intro to GPUs
4
Results
5
Acknowledgements
Johnson (University of Michigan) MS-GPU Jan 6, 2014 2 / 18
Disease Background
Autoimmune disease affecting the central nervous system
Johnson (University of Michigan) MS-GPU Jan 6, 2014 3 / 18
Disease Background
Autoimmune disease affecting the central nervous system Affects women more than men
Johnson (University of Michigan) MS-GPU Jan 6, 2014 3 / 18
Disease Background
Autoimmune disease affecting the central nervous system Affects women more than men Commonly diagnosed between 20 and 40 years of age.
Johnson (University of Michigan) MS-GPU Jan 6, 2014 3 / 18
Disease Background
Autoimmune disease affecting the central nervous system Affects women more than men Commonly diagnosed between 20 and 40 years of age. Immune system attacks the myelin sheath surrounding axons
Slows nerve impulse
Johnson (University of Michigan) MS-GPU Jan 6, 2014 3 / 18
Disease Background
Autoimmune disease affecting the central nervous system Affects women more than men Commonly diagnosed between 20 and 40 years of age. Immune system attacks the myelin sheath surrounding axons
Slows nerve impulse
Immune system attacks the myelin
Johnson (University of Michigan) MS-GPU Jan 6, 2014 3 / 18
Disease Background
Autoimmune disease affecting the central nervous system Affects women more than men Commonly diagnosed between 20 and 40 years of age. Immune system attacks the myelin sheath surrounding axons
Slows nerve impulse
Immune system attacks the myelin Causes inflammation visible on MRI
Johnson (University of Michigan) MS-GPU Jan 6, 2014 3 / 18
Disease Background
Autoimmune disease affecting the central nervous system Affects women more than men Commonly diagnosed between 20 and 40 years of age. Immune system attacks the myelin sheath surrounding axons
Slows nerve impulse
Immune system attacks the myelin Causes inflammation visible on MRI Cause of autoimmune disease is unknown, theories include
Viral infection Genetics Environmental factors
Johnson (University of Michigan) MS-GPU Jan 6, 2014 3 / 18
Disease Background
Autoimmune disease affecting the central nervous system Affects women more than men Commonly diagnosed between 20 and 40 years of age. Immune system attacks the myelin sheath surrounding axons
Slows nerve impulse
Immune system attacks the myelin Causes inflammation visible on MRI Cause of autoimmune disease is unknown, theories include
Viral infection Genetics Environmental factors
No known cure
Johnson (University of Michigan) MS-GPU Jan 6, 2014 3 / 18
Disease Background
FSS
Johnson (University of Michigan) MS-GPU Jan 6, 2014 4 / 18
Disease Background
FSS Visual function — blind spots, decrease in visual acuity
Johnson (University of Michigan) MS-GPU Jan 6, 2014 4 / 18
Disease Background
FSS Visual function — blind spots, decrease in visual acuity Cerebellar function — coordination
Johnson (University of Michigan) MS-GPU Jan 6, 2014 4 / 18
Disease Background
FSS Visual function — blind spots, decrease in visual acuity Cerebellar function — coordination Brainstem function — difficulty in speech, swallowing, ...
Johnson (University of Michigan) MS-GPU Jan 6, 2014 4 / 18
Disease Background
FSS Visual function — blind spots, decrease in visual acuity Cerebellar function — coordination Brainstem function — difficulty in speech, swallowing, ... Pyramidal function — walking
Johnson (University of Michigan) MS-GPU Jan 6, 2014 4 / 18
Disease Background
FSS Visual function — blind spots, decrease in visual acuity Cerebellar function — coordination Brainstem function — difficulty in speech, swallowing, ... Pyramidal function — walking Sensory function — touch and pain
Johnson (University of Michigan) MS-GPU Jan 6, 2014 4 / 18
Disease Background
FSS Visual function — blind spots, decrease in visual acuity Cerebellar function — coordination Brainstem function — difficulty in speech, swallowing, ... Pyramidal function — walking Sensory function — touch and pain Bowel and Bladder function
Johnson (University of Michigan) MS-GPU Jan 6, 2014 4 / 18
Disease Background
FSS Visual function — blind spots, decrease in visual acuity Cerebellar function — coordination Brainstem function — difficulty in speech, swallowing, ... Pyramidal function — walking Sensory function — touch and pain Bowel and Bladder function Cerebral function — memory, mood, dementia
Johnson (University of Michigan) MS-GPU Jan 6, 2014 4 / 18
Disease Background
FSS Visual function — blind spots, decrease in visual acuity Cerebellar function — coordination Brainstem function — difficulty in speech, swallowing, ... Pyramidal function — walking Sensory function — touch and pain Bowel and Bladder function Cerebral function — memory, mood, dementia PASAT score Paced auditory serial addition test
Assesses auditory speed/flexibility + calculation ability
Johnson (University of Michigan) MS-GPU Jan 6, 2014 4 / 18
Disease Background
Data are T2 hyperintense lesions
Johnson (University of Michigan) MS-GPU Jan 6, 2014 5 / 18
Disease Background
Data are T2 hyperintense lesions
Johnson (University of Michigan) MS-GPU Jan 6, 2014 5 / 18
The Model
sj, j = 1, . . . , M will denote the jth voxel at location (x, y, z) (lexicographical ordering)
Johnson (University of Michigan) MS-GPU Jan 6, 2014 6 / 18
The Model
sj, j = 1, . . . , M will denote the jth voxel at location (x, y, z) (lexicographical ordering) Yi(sj) ∈ {0, 1}: outcome for subject i = 1, . . . , N at voxel, or site, sj
Johnson (University of Michigan) MS-GPU Jan 6, 2014 6 / 18
The Model
sj, j = 1, . . . , M will denote the jth voxel at location (x, y, z) (lexicographical ordering) Yi(sj) ∈ {0, 1}: outcome for subject i = 1, . . . , N at voxel, or site, sj Xi: subject specific covariate vector.
Johnson (University of Michigan) MS-GPU Jan 6, 2014 6 / 18
The Model
sj, j = 1, . . . , M will denote the jth voxel at location (x, y, z) (lexicographical ordering) Yi(sj) ∈ {0, 1}: outcome for subject i = 1, . . . , N at voxel, or site, sj Xi: subject specific covariate vector. β(sj): parameter vector at voxel sj.
Johnson (University of Michigan) MS-GPU Jan 6, 2014 6 / 18
The Model
sj, j = 1, . . . , M will denote the jth voxel at location (x, y, z) (lexicographical ordering) Yi(sj) ∈ {0, 1}: outcome for subject i = 1, . . . , N at voxel, or site, sj Xi: subject specific covariate vector. β(sj): parameter vector at voxel sj. MRF (Markov Random Field) model
Johnson (University of Michigan) MS-GPU Jan 6, 2014 6 / 18
The Model
sj, j = 1, . . . , M will denote the jth voxel at location (x, y, z) (lexicographical ordering) Yi(sj) ∈ {0, 1}: outcome for subject i = 1, . . . , N at voxel, or site, sj Xi: subject specific covariate vector. β(sj): parameter vector at voxel sj. MRF (Markov Random Field) model
Voxels are rectangular prisms (commonly cubes)
Johnson (University of Michigan) MS-GPU Jan 6, 2014 6 / 18
The Model
sj, j = 1, . . . , M will denote the jth voxel at location (x, y, z) (lexicographical ordering) Yi(sj) ∈ {0, 1}: outcome for subject i = 1, . . . , N at voxel, or site, sj Xi: subject specific covariate vector. β(sj): parameter vector at voxel sj. MRF (Markov Random Field) model
Voxels are rectangular prisms (commonly cubes) If sj and sk are neighboring sites, we denote this by sj ∼ sk
Johnson (University of Michigan) MS-GPU Jan 6, 2014 6 / 18
The Model
sj, j = 1, . . . , M will denote the jth voxel at location (x, y, z) (lexicographical ordering) Yi(sj) ∈ {0, 1}: outcome for subject i = 1, . . . , N at voxel, or site, sj Xi: subject specific covariate vector. β(sj): parameter vector at voxel sj. MRF (Markov Random Field) model
Voxels are rectangular prisms (commonly cubes) If sj and sk are neighboring sites, we denote this by sj ∼ sk The set of neighbors of sj is denoted by ∂sj
Johnson (University of Michigan) MS-GPU Jan 6, 2014 6 / 18
The Model
sj, j = 1, . . . , M will denote the jth voxel at location (x, y, z) (lexicographical ordering) Yi(sj) ∈ {0, 1}: outcome for subject i = 1, . . . , N at voxel, or site, sj Xi: subject specific covariate vector. β(sj): parameter vector at voxel sj. MRF (Markov Random Field) model
Voxels are rectangular prisms (commonly cubes) If sj and sk are neighboring sites, we denote this by sj ∼ sk The set of neighbors of sj is denoted by ∂sj The number of neighbors of sj is N(sj)
Johnson (University of Michigan) MS-GPU Jan 6, 2014 6 / 18
The Model
sj, j = 1, . . . , M will denote the jth voxel at location (x, y, z) (lexicographical ordering) Yi(sj) ∈ {0, 1}: outcome for subject i = 1, . . . , N at voxel, or site, sj Xi: subject specific covariate vector. β(sj): parameter vector at voxel sj. MRF (Markov Random Field) model
Voxels are rectangular prisms (commonly cubes) If sj and sk are neighboring sites, we denote this by sj ∼ sk The set of neighbors of sj is denoted by ∂sj The number of neighbors of sj is N(sj) Two voxels are neighbors if they shared a common face
each voxel has at most 6 neighbors
Johnson (University of Michigan) MS-GPU Jan 6, 2014 6 / 18
The Model
sj, j = 1, . . . , M will denote the jth voxel at location (x, y, z) (lexicographical ordering) Yi(sj) ∈ {0, 1}: outcome for subject i = 1, . . . , N at voxel, or site, sj Xi: subject specific covariate vector. β(sj): parameter vector at voxel sj. MRF (Markov Random Field) model
Voxels are rectangular prisms (commonly cubes) If sj and sk are neighboring sites, we denote this by sj ∼ sk The set of neighbors of sj is denoted by ∂sj The number of neighbors of sj is N(sj) Two voxels are neighbors if they shared a common face
each voxel has at most 6 neighbors
If two voxels are not neighbors, they are cond. indep. given their neighbors (Markov property)
Johnson (University of Michigan) MS-GPU Jan 6, 2014 6 / 18
The Model
Let βT = (βT(s1), . . . , βT(sM)) where each β(si) is a p-col vector
Johnson (University of Michigan) MS-GPU Jan 6, 2014 7 / 18
The Model
Let βT = (βT(s1), . . . , βT(sM)) where each β(si) is a p-col vector Following Mardia (1988) the full conditionals are given by [β(si) | β(s−i), Σ] ∼ MVN
N(si) , Σ N(si)
MS-GPU Jan 6, 2014 7 / 18
The Model
Let βT = (βT(s1), . . . , βT(sM)) where each β(si) is a p-col vector Following Mardia (1988) the full conditionals are given by [β(si) | β(s−i), Σ] ∼ MVN
N(si) , Σ N(si)
π [β | Σ] ∝ exp −1 2
T Σ−1 β(si) − β(sj)
Johnson (University of Michigan) MS-GPU Jan 6, 2014 7 / 18
The Model
Let βT = (βT(s1), . . . , βT(sM)) where each β(si) is a p-col vector Following Mardia (1988) the full conditionals are given by [β(si) | β(s−i), Σ] ∼ MVN
N(si) , Σ N(si)
π [β | Σ] ∝ exp −1 2
T Σ−1 β(si) − β(sj)
We write [β | Σ] ∼ G-MRF(Σ)
Johnson (University of Michigan) MS-GPU Jan 6, 2014 7 / 18
The Model
Probit Regression Model Φ−1 Pr(Yi(sj) = 1 | ·)
X T
i α + X T i β(sj)
Johnson (University of Michigan) MS-GPU Jan 6, 2014 8 / 18
The Model
Probit Regression Model Φ−1 Pr(Yi(sj) = 1 | ·)
X T
i α + X T i β(sj)
Xi: vector of p covariates α: associated parameter vector (fixed effects, intercepts and slopes)
Johnson (University of Michigan) MS-GPU Jan 6, 2014 8 / 18
The Model
Probit Regression Model Φ−1 Pr(Yi(sj) = 1 | ·)
X T
i α + X T i β(sj)
Xi: vector of p covariates β(sj): associated parameter vector (spatial random effects) Let βT = (βT(s1), . . . , βT(sM))
Johnson (University of Michigan) MS-GPU Jan 6, 2014 8 / 18
The Model
Probit Regression Model Φ−1 Pr(Yi(sj) = 1 | ·)
X T
i α + X T i β(sj)
= X T
i β∗(sj)
Xi: vector of p covariates Reparametrize: β∗(sj) = α + β(sj)—aids in mixing Let β∗T = (β∗T(s1), . . . , β∗T(sM)) [β∗ | Σ] ∼ G-MRF(Σ) Σ−1 ∼ W(ν, I) ν = 0 gives an improper, uninformative prior
Johnson (University of Michigan) MS-GPU Jan 6, 2014 8 / 18
The Model
We use the Latent Variable rep introduced by Albert & Chib (1993)
1
Introduce latent variable di(sj) ∼ N(X T
i β∗(sj), 1)
Johnson (University of Michigan) MS-GPU Jan 6, 2014 9 / 18
The Model
We use the Latent Variable rep introduced by Albert & Chib (1993)
1
Introduce latent variable di(sj) ∼ N(X T
i β∗(sj), 1)
2
Define Pr[Yi(sj) = 1 | di(sj)] = 1 : di(sj) > 0 : di(sj) ≤ 0
binary regression transformed into a cont. regression problem
Johnson (University of Michigan) MS-GPU Jan 6, 2014 9 / 18
The Model
We use the Latent Variable rep introduced by Albert & Chib (1993)
1
Introduce latent variable di(sj) ∼ N(X T
i β∗(sj), 1)
2
Define Pr[Yi(sj) = 1 | di(sj)] = 1 : di(sj) > 0 : di(sj) ≤ 0
binary regression transformed into a cont. regression problem
3
Given the di(sj) our model now a spatial (continuous) mixed effects model with normal errors.
Johnson (University of Michigan) MS-GPU Jan 6, 2014 9 / 18
The Model
We use the Latent Variable rep introduced by Albert & Chib (1993)
1
Introduce latent variable di(sj) ∼ N(X T
i β∗(sj), 1)
2
Define Pr[Yi(sj) = 1 | di(sj)] = 1 : di(sj) > 0 : di(sj) ≤ 0
binary regression transformed into a cont. regression problem
3
Given the di(sj) our model now a spatial (continuous) mixed effects model with normal errors.
4
Update the full conditional of each di(sj) from its truncated normal distribution
Johnson (University of Michigan) MS-GPU Jan 6, 2014 9 / 18
The Model
We use the Latent Variable rep introduced by Albert & Chib (1993)
1
Introduce latent variable di(sj) ∼ N(X T
i β∗(sj), 1)
2
Define Pr[Yi(sj) = 1 | di(sj)] = 1 : di(sj) > 0 : di(sj) ≤ 0
binary regression transformed into a cont. regression problem
3
Given the di(sj) our model now a spatial (continuous) mixed effects model with normal errors.
4
Update the full conditional of each di(sj) from its truncated normal distribution
5
Given di(sj), model parameters can be drawn from full conditionals due to conjugacy
Johnson (University of Michigan) MS-GPU Jan 6, 2014 9 / 18
The Model
We use the Latent Variable rep introduced by Albert & Chib (1993)
1
Introduce latent variable di(sj) ∼ N(X T
i β∗(sj), 1)
2
Define Pr[Yi(sj) = 1 | di(sj)] = 1 : di(sj) > 0 : di(sj) ≤ 0
binary regression transformed into a cont. regression problem
3
Given the di(sj) our model now a spatial (continuous) mixed effects model with normal errors.
4
Update the full conditional of each di(sj) from its truncated normal distribution
5
Given di(sj), model parameters can be drawn from full conditionals due to conjugacy
6
Marginalizing over di(sj), retrieve probit regression model.
Johnson (University of Michigan) MS-GPU Jan 6, 2014 9 / 18
The Model
15 Subject Specific Covariates 7 functional system scores, PASAT score MS subtype (4 distinct subtypes) age, disease duration, gender—nuisance parameters
Johnson (University of Michigan) MS-GPU Jan 6, 2014 10 / 18
The Model
15 Subject Specific Covariates 7 functional system scores, PASAT score MS subtype (4 distinct subtypes) age, disease duration, gender—nuisance parameters Problem Size 3D image Approx 66 million observed outcomes
275K voxels times 239 subjects
Approx 41 million spatially varying coefficients
275K voxels times 15 covariates
Johnson (University of Michigan) MS-GPU Jan 6, 2014 10 / 18
Intro to GPUs
GPUs dedicate more space to ALUs
Johnson (University of Michigan) MS-GPU Jan 6, 2014 11 / 18
Intro to GPUs
MS-GPU Jan 6, 2014 12 / 18
Intro to GPUs
Three basic strategies for performance optimization
1
maximize parallel execution
find parts of code that can execute in parallel
Johnson (University of Michigan) MS-GPU Jan 6, 2014 13 / 18
Intro to GPUs
Three basic strategies for performance optimization
1
maximize parallel execution
find parts of code that can execute in parallel
2
restructure data reduce memory transfers (esp CPU ⇐ ⇒ GPU) load global into shared memory and process
Johnson (University of Michigan) MS-GPU Jan 6, 2014 13 / 18
Intro to GPUs
Three basic strategies for performance optimization
1
maximize parallel execution
find parts of code that can execute in parallel
2
restructure data reduce memory transfers (esp CPU ⇐ ⇒ GPU) load global into shared memory and process
3
avoid conditional statements (if, switch, do, for, while) replace regular functions by intrinsic functions
e.g. replace x/y by fdividef(x, y) however a little less precise
Johnson (University of Michigan) MS-GPU Jan 6, 2014 13 / 18
Intro to GPUs
My GPU has 2496 processors 5 Gb DRAM All data and parameters fit in 0.53 Gb DRAM Combination of GPU/CPU computation
Johnson (University of Michigan) MS-GPU Jan 6, 2014 14 / 18
Intro to GPUs
2 4 6 8 2 4 6 8 Johnson (University of Michigan) MS-GPU Jan 6, 2014 15 / 18
Intro to GPUs
Update CPU (ms) GPU (ms) Efficiency Gain Latent Var 25059 56 447
10944 130 84
870 24 36
301 14 22 Total 37174 224 166 Times are per iteration For 30K iterations
CPU approx 13 days GPU approx 2 hrs 25 minutes
Johnson (University of Michigan) MS-GPU Jan 6, 2014 16 / 18
Results
Coronal View β
Johnson (University of Michigan) MS-GPU Jan 6, 2014 17 / 18
Results
Coronal View β
Johnson (University of Michigan) MS-GPU Jan 6, 2014 17 / 18
Acknowledgements
Tian Ge
Department of Computer Science, University of Warwick
Thomas E. Nichols
Department of Statistics, University of Warwick
Ernst Wilhelm Rad¨ u
Department of Neurology, University Hospital Basel
US NIH/NINDS grant: 1 R01 NS075066-01A1
Johnson (University of Michigan) MS-GPU Jan 6, 2014 18 / 18