 
Using Multiple GPUs To Reconstruct The Brain From Histological Images
Prof. Dr. Katrin Amunts, Dr. Markus Axer, Dr. Timo Dickscheid, Jiri Kraus
INSTITUTE OF NEUROSCIENCE AND MEDICINE (INM-1)
Requirements:
1. High Resolution Image Data
2. Accurate Reconstruction Algorithms
Part 1 Data & Algorithm
Preparation, Imaging, Analysis*, Registration
• Blockface images (150 sections)
• Histologies (800 sections)
* M. Axer, A novel approach to the human connectome (NeuroImage, 2011)
1. Geometric Transformation (B-Spline model)*
2. Metric function (Mutual Information, MI)**
3. Optimizer
* D. Rueckert, Nonrigid registration using free-form deformations (IEEE Trans Med Imaging, 1999)
** J. Pluim, Mutual information based registration of medical images (IEEE Trans Med Imaging, 2003)
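As a minimal sketch of the metric in step 2, mutual information can be computed from a joint intensity histogram of two images with the standard definition MI = Σ p(a,b) · log( p(a,b) / (p(a) · p(b)) ). The function names, the bin count, and the flat pixel lists below are illustrative assumptions, not the implementation behind these slides.

```python
import math


def joint_histogram(img_a, img_b, bins=8, max_val=256):
    """Count co-occurring intensity bins of two equally sized images.

    img_a, img_b: flat sequences of integer intensities in [0, max_val).
    """
    if len(img_a) != len(img_b):
        raise ValueError("images must have the same number of pixels")
    hist = [[0] * bins for _ in range(bins)]
    for a, b in zip(img_a, img_b):
        hist[a * bins // max_val][b * bins // max_val] += 1
    return hist


def mutual_information(hist):
    """Mutual information (in nats) of a joint histogram."""
    total = sum(sum(row) for row in hist)
    if total == 0:
        return 0.0
    # Marginal histograms are the row and column sums of the joint one.
    marg_a = [sum(row) for row in hist]
    marg_b = [sum(col) for col in zip(*hist)]
    mi = 0.0
    for i, row in enumerate(hist):
        for j, count in enumerate(row):
            if count:
                # p_ab / (p_a * p_b) == count * total / (marg_a * marg_b)
                mi += (count / total) * math.log(
                    count * total / (marg_a[i] * marg_b[j])
                )
    return mi
```

Two identical images give the highest value the histogram allows, while statistically independent images give zero, which is why MI is usable as a similarity measure between differently stained or differently acquired images.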
[Figure panels: Blockfaces; Histologies (b-spline)]
[Figure panels: Blockfaces; Histologies (affine)]
Assumptions:
• Layers: 1000
• Grid size: 10x10
200,000 Parameters
Solution:
• Efficient global metric
• Efficient optimizer (MRF)
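The parameter total can be reproduced with a short calculation. The factor of two displacement components (x and y) per grid node is an assumption here; the slide states only the layer count, the grid size, and the total.

```python
def count_parameters(layers, grid_x, grid_y, components=2):
    """Number of free parameters for one control-point grid per layer.

    components=2 assumes an x and a y displacement per grid node.
    """
    return layers * grid_x * grid_y * components


print(count_parameters(1000, 10, 10))  # 200000
```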
Efficient global metric*:
1. Similarity between Histology and Blockface
   Data terms: #Displ · #Nodes · #Images = 40 · 100 · 1000 = 4,000,000
2. Similarity between consecutive Histologies
   Data terms: #Displ² · #Nodes · #Gaps = 40² · 100 · 999 ≈ 160,000,000
3. Optimizer (MRF*): Best node displacements
Refinement (a few hundred iterations)
* B. Glocker, Dense Image Registration through MRFs and efficient linear programming (Medical Image Analysis, 2008)
  M. Feuerstein, Reconstruction of 3D Histology Images by Simultaneous Deformable Registration (MICCAI, 2011)
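The two data-term counts follow directly from the stated formulas. The helper names below are illustrative; the inputs (40 candidate displacements, 100 nodes, 1000 images) are the values from the slide.

```python
def horizontal_terms(n_displ, n_nodes, n_images):
    """Histology-to-blockface terms: one per displacement, node and image."""
    return n_displ * n_nodes * n_images


def vertical_terms(n_displ, n_nodes, n_images):
    """Terms between consecutive histologies.

    Each gap pairs every displacement of one section with every
    displacement of its neighbour, hence the squared factor.
    """
    n_gaps = n_images - 1
    return n_displ ** 2 * n_nodes * n_gaps


print(horizontal_terms(40, 100, 1000))  # 4000000
print(vertical_terms(40, 100, 1000))    # 159840000
```

The exact value for the second count is 159,840,000, which the slide rounds to about 160,000,000. The terms between consecutive sections outnumber the blockface terms by a factor of roughly 40.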
Section-wise Registration: (150 CPUs → 14 hours)
Simultaneous (global) Registration:
Part 2 GPU-accelerated Implementation
1. Distribution of the image sections among multiple GPUs
2. Each GPU delivers the data terms depending on its assigned image sections

calcDataTerms_Horizontally( bf_image, histo_image, nodeDisp d )
• establishJointHistograms(d): 100 Joint Histograms
• establishMarginalHistograms(): 300 Histograms
• calculate_MIValues(): 100 MI values

calcDataTerms_Vertically( histo_1, histo_2, d1, d2 )
• establishJointHistograms(d1,d2): 100 Joint Histograms
• establishMarginalHistograms(): 300 Histograms
• calculate_MIValues(): 100 MI values
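One way to picture step 1 is a contiguous block partition of the section indices, sketched below. The slides do not say which distribution scheme is used, so the contiguous split and both function names are assumptions. With such a split, only the gaps at block boundaries need a section that is assigned to a neighbouring GPU.

```python
def partition_sections(n_sections, n_gpus):
    """Split section indices 0..n_sections-1 into contiguous blocks.

    Returns one range per GPU; block sizes differ by at most one.
    """
    if n_gpus <= 0:
        raise ValueError("n_gpus must be positive")
    base, extra = divmod(n_sections, n_gpus)
    blocks = []
    start = 0
    for gpu in range(n_gpus):
        size = base + (1 if gpu < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def boundary_gaps(blocks):
    """Gaps (i, i + 1) whose two sections are assigned to different GPUs."""
    n_sections = blocks[-1].stop if blocks else 0
    return [
        (b.stop - 1, b.stop)
        for b in blocks
        if len(b) > 0 and b.stop < n_sections
    ]


blocks = partition_sections(1000, 8)
print([len(b) for b in blocks])  # [125, 125, 125, 125, 125, 125, 125, 125]
print(boundary_gaps(blocks)[0])  # (124, 125)
```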
JuDGE (Westmere + Fermi) PSG-Cluster (Ivy Bridge + Kepler)
2 images with 1 GPU (PSG-Cluster)
1. Multiple CUDA streams (double buffering): 40 sec → 33 sec (17.5 % on Kepler)
2. Incremental atomic operations
   Non-atomic: 92 sec vs. 39 sec (2.4 x)
   Atomic: 342 sec vs. 33 sec (10.3 x)
Transfer additional load to the GPU!
[Chart: M2090 and K40; labels 342 sec, 33 sec, 81 %, 30 %]
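The quoted factors can be checked from the timings, assuming each pair is a slower and a faster run time of the same workload. The helper names are illustrative.

```python
def speedup(t_slow, t_fast):
    """Ratio of the slower to the faster run time."""
    return t_slow / t_fast


def reduction_percent(t_before, t_after):
    """Relative run-time reduction in percent."""
    return 100.0 * (t_before - t_after) / t_before


print(f"{speedup(92, 39):.2f} x")             # 2.36 x
print(f"{speedup(342, 33):.2f} x")            # 10.36 x
print(f"{reduction_percent(40, 33):.1f} %")   # 17.5 %
```

The ratio 342 / 33 is about 10.36, which the slide gives as 10.3 x; the other two figures match the slide after rounding.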
• Multiple GPUs offer the power to solve a simultaneous registration within a reasonable time
• In the future: Optimization for microscopic images (memory limitations)
JSC, Research Centre Jülich: Oliver Bücker, Andrew V. Adinetz
Nvidia Support: Jiri Kraus
INM, Research Centre Jülich: Prof. Dr. Katrin Amunts, Dr. Markus Axer, Dr. Timo Dickscheid, David Graessel, Philipp Schlömer, Daniel Schmitz, Martin Schober, Nicole Schubert
Contact: Marcel Huysegoms, m.huysegoms@fz-juelich.de
Appendix
Preparation, Imaging, Analysis*

             Low Resolution          High Resolution
Size:        3,000 × 3,000 pixel     100,000 × 100,000 pixel
Pixel size:  64 μm × 64 μm           1.3 μm × 1.3 μm
File size:   10 MB (8 bit)           10 GB (8 bit)

* M. Axer, A novel approach to the human connectome (NeuroImage, 2011)
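The file sizes follow from the pixel counts at 8 bit per pixel, assuming uncompressed single-channel storage (an assumption; the slide gives only the rounded sizes). The helper name is illustrative.

```python
def image_bytes(width_px, height_px, bits_per_pixel=8):
    """Uncompressed size in bytes of a single-channel image."""
    return width_px * height_px * bits_per_pixel // 8


low = image_bytes(3_000, 3_000)
high = image_bytes(100_000, 100_000)
print(low)          # 9000000      (about 9 MB, quoted as 10 MB)
print(high)         # 10000000000  (10 GB)
print(high // low)  # 1111
```

A single high-resolution section is therefore on the order of a thousand times larger than a low-resolution one.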
1. Distribution of the image sections among multiple GPUs
2. Each GPU delivers the data terms depending on its assigned image sections

calcDataTerms_Horizontally( bf_image, histo_image, nodeDisp d ): 1000 × 40 times
• establishJointHistograms(d): 100 Joint Histograms
• establishMarginalHistograms(): 300 Histograms
• calculate_MIValues(): 100 MI values

calcDataTerms_Vertically( histo_1, histo_2, d1, d2 ): 999 × 1600 times
• establishJointHistograms(d1,d2): 100 Joint Histograms
• establishMarginalHistograms(): 300 Histograms
• calculate_MIValues(): 100 MI values
JuDGE (Westmere + Fermi) vs. PSG-Cluster (Ivy Bridge + Kepler)