the context bioimage data explosion
play

The context : bioimage data explosion High-throughput imaging - PowerPoint PPT Presentation

The context : bioimage data explosion High-throughput imaging techniques have led to bioimage data explosion Submicrometer resolution Whole mouse brain mapping TeraByte scale images Heterogeneous images Need for tools capable to


  1. The context : bioimage data explosion  High-throughput imaging techniques have led to bioimage data explosion  Submicrometer resolution Whole mouse brain mapping  TeraByte scale images  Heterogeneous images  Need for tools capable to process these huge datasets  Which requirements?  Need for automation to extract information  Fully- or semi-automated analysis?  Is human assistance needed? March 26, 2018 2 University Campus Bio-Medico of Rome

  2. TeraStitcher : born for Confocal Light Sheet Microscope Images  Generates a 2D matrix of 3D tiles (image stacks)  Tiles are regularly spaced with reasonably reliable nominal positions - Stack acquisition is performed with a single movement guaranteeing a reliable constant inter-slices distance along z - Long range movements (order of cm) make necessary an alignment step in all directions  Sufficient overlap between adjacent tiles March 26, 2018 4 University Campus Bio-Medico of Rome

  3. TeraStitcher : Ultra-terabyte images  Need to process 16 bits images larger than 20,000 x 20,000 x 3,000 voxels ( ~2.5 TB)  Increasingly high resolutions in X-Y  Multiple channels  Very large slices (e.g. 40 K x 30 K pixels)  Solutions:  Thousands of slices  Sparse datasets  Compressed formats  Parallelism  Multi-channel support March 26, 2018 5 University Campus Bio-Medico of Rome

  4. ?

  5. What is image stitching? Image stitching is the process of combining multiple photographic images with overlapping fields of view to produce a segmented panorama or high-resolution image (from Wikipedia) March 26, 2018 7 University Campus Bio-Medico of Rome

  6. TeraStitcher : overview  Initially developed for confocal light sheet microscopy (CLSM) images  Fast 2D approach to align adjacent stacks  Efficient use of memory resources  Minimum I/O workload (2 readings, 1 writing)  Can generate a multi-resolution representation suited for further processing March 26, 2018 8 University Campus Bio-Medico of Rome

  7. TeraStitcher : software structure VolumeManager Stack StackedVolume 1 N « use s» Stitcher 1 Stitcher + computeDisplacements(algorithm_type: int , … ) + projectDisplacements() + thresholdDisplacements(thres: float ) + computeTilesPlacement(algorithm_type: int ) + mergeTiles(blending_type: int , … )  The software structure provides N « executes » « executes » « use s» means for incorporating new « interfac e» « interfac e» « interfac e» functionalities Ti l esPl acem ent Al go Pai r w i seD i spl Al go D i spl acem ent # def int[3] _di spl acem ent s: … + execut … + execut Displacement e( ) : e( ) « produces » int[3] # di spl acem ent s: + eval float Rel i abi l i t y( ) :  Object-oriented software int + get D i spl acem ent ( di r ect i on: i nt ) : « use s» float ) + t hr eshol d( t hr es: Displacement ) + pr oj ect ( di spl : architecture and design patterns … … MST MIP-NCC … + execute( … ) + execute (… MIP-NCC-Displ ): Displacement  Functional decomposition visible to - MIP_displacements: int[6] - NCC_widths: float[3] Strategy - NCC_maxs: float[3] the user through command line - reliab_factors: float[3] + evalReliability(): float options + … Abstract Factory March 26, 2018 9 University Campus Bio-Medico of Rome

  8. TeraStitcher : separating the strategy from the algorithm  Strategy (aims at minimizing memory requirements) o Divide the volume into layers (substacks) along Z o For each layer o Keep in memory a row (or column) of substacks at the time o For each pair of adjacent substacks, call  Algorithm o Compute the Maximum Intensity Projections (MIPs) along X, Y, and Z for the two substacks o Apply 2D Normalized Cross-Correlation (NCC) to the three pairs of MIPs to find three 2D displacements o Output “most - reliable” displacement for each direction March 26, 2018 10 University Campus Bio-Medico of Rome

  9. TeraStitcher : global optimization step  Global optimization is horizontal axis H performed using V:375( ฀ ) V:375( ฀ ) V:375( ฀ ) V:0( ฀ ) computed (or even H:375( ฀ ) H:0( ฀ ) H:0( ฀ ) H:0( ฀ ) D:0 ( ฀ ) D:0( ฀ ) D:0( ฀ ) D:0( ฀ ) V:375( ฀ ) V:375( ฀ ) V:375( ฀ ) V:375( ฀ ) V:372( 1,41 ) MST along V H:0( ฀ ) H:0( ฀ ) H:0( ฀ ) H:0( ฀ ) nominal) alignments H:3( 1,22 ) D:0( ฀ ) D:0( ฀ ) D:0( ฀ ) D:0( ฀ ) D:-4( 1,35 ) V:0( ฀ ) V:0( ฀ ) V:0( ฀ ) V:1( 1,37 ) and their reliability H:375( ฀ ) H:375( ฀ ) H:373( 1,15 ) H:371( 1,35 ) D:0 ( ฀ ) D:0 ( ฀ ) D:2( 1,11 ) D:-3( 1,37 ) vertical axis V V:375( ฀ ) V:375( ฀ ) V:371( 1,36 ) V:372( 1,21 ) V:374( 1,25 ) H:0( ฀ ) H:0( ฀ ) H:2( 1,35 ) H:4( 1,35 ) H:6( 1,15 ) D:0( ฀ ) D:0( ฀ ) D:2( 1,33 ) D:-1( 1,31 ) D:2( 1,12 )  The resulting V:0( ฀ ) V:0( ฀ ) V:1( 1,40 ) V:3( 1,34 ) MST along H H:375( ฀ ) H:375( ฀ ) H:375( 1,26 ) H:373( 1,23 ) D:0 ( ฀ ) D:0 ( ฀ ) D:-1( 1,12 ) D:0( 1,08 ) alignments are used to V:375( ฀ ) V:375( ฀ ) V:375( ฀ ) V:375( ฀ ) V:372( 1,35 ) H:0( ฀ ) H:0( ฀ ) H:0( ฀ ) H:0( ฀ ) H:3( 1,22 ) D:0( ฀ ) D:0( ฀ ) D:0( ฀ ) D:0( ฀ ) D:-1( 1,37 ) generate the xml file V:0( ฀ ) V:0( ฀ ) V:0( ฀ ) V:0( ฀ ) H:375( ฀ ) H:375( ฀ ) H:375 ( ฀ ) H:375( ฀ ) D:0 ( ฀ ) D:0( ฀ ) D:0( ฀ ) D:0 ( ฀ ) V:375( ฀ ) V:375( ฀ ) V:375( ฀ ) V:372( 1,37 ) V:374( 1,35 ) MST along D H:0( ฀ ) H:0( ฀ ) H:0( ฀ ) H:1( 1,30 ) H:5( 1,22 )  Manual intervention is D:0( ฀ ) D:0( ฀ ) D:0( ฀ ) D:0( 1,40 ) D:-1( 1,12 ) V:0( ฀ ) V:0( ฀ ) V:1( 1,35 ) V:0( 1,32 ) H:375( ฀ ) H:375( ฀ ) H:372( 1,40 ) H:373( 1,22 ) possible to correct D:0 ( ฀ ) D:0 ( ฀ ) D:-3( 1,09 ) D:-1( 1,10 ) isolated errors March 26, 2018 12 University Campus Bio-Medico of Rome

  10. TeraStitcher : developments and issues  Parallelization: maintenance issues  different alternatives to exploit parallel processing  different code for sequential and parallel version  Solution: launching multiple instances on disjoint sub- regions  command line options to specify the sub-region to process  use a scripting language (python) to implement a driver  Alignment computation (computation bound)  each instance processes a group of adjacent sub-stacks  merge the xml files with alignment information  Generation of the stitched image (I/O bound)  the directory structure is first generated  each instance writes disjoint groups of tiles in parallel  metadata to be used in subsequent steps are finally generated March 26, 2018 13 University Campus Bio-Medico of Rome

  11. Why do we need GPUs?  Tile size: 2048 x 2048 x 2950 voxel (16 bits per voxel)  Tile matrix: 3 x 3  Total dataset size: Tile size x #tiles ( >220 GByte )  Stitching on one Power8 (2.061 Ghz) core: ~15000 secs !  Stitching on one Xeon ES2640 (2.6 Ghz) core: ~20000 secs !  GPUs come to rescue! March 26, 2018 16

  12. Normalized Cross Correlation  The most time-consuming part of the stitching process is the evaluation of the Normalized Cross- Correlation (NCC) between the Maximum Intensity Projections (MIP) of the tiles.  NCC is a variant of the classic Cross Correlation in which data are normalized by subtracting the mean and dividing by the standard deviation of the two datasets.  TeraStitcher implements a fast NCC from J. P. Lewis, “Fast Template Matching”, Vision Interface (1995). March 26, 2018 17

  13. CPU Normalized Cross Correlation f_mean = t_mean = 0.; for ( i=0, pxl1=im1, pxl2=im2; i<dimi; i++, pxl1+=stride, pxl2+=stride ) { for ( j=0; j<dimj; j++, pxl1++, pxl2++ ) { f_mean += *pxl1; t_mean += *pxl2; } } f_mean /= (dimi*dimj); t_mean /= (dimi*dimj); numerator = factor1 = factor2 = 0.; for ( ij=0, pxl1=im1, pxl2=im2; ij <dimi*dimj; ij++) { f_prime = pxl1[(ij%dimj)+(stride+dimj)*(ij/dimj)] - f_mean; t_prime = pxl2[(ij%dimj)+(stride+dimj)*(ij/dimj)] - t_mean; numerator += pxl1[(ij%dimj)+(stride+dimj)*(ij/dimj)] * t_prime; factor1 += f_prime * f_prime; factor2 += t_prime * t_prime; } March 26, 2018 18

  14. From CPU TO GPU NCC  The previous fragment of code is repeated ( 2*delayu+1 ) x ( 2*delayv+1 ) times at each iteration (there are few hundreds iterations per run).  A typical value of delayu and delayv is ~ 100 We developed a gpu_NCC kernel that does all the work with a single invocation per iteration:  minimize the overhead due to the invocation  allow to overlap memory copies from CPU to GPU (using streams ) March 26, 2018 19

  15. GPU NCC  The gpu_NCC kernel heavily relies on shuffle primitives to compute the mean values required by the NCC #define CALCSUM(v) (v)=warpReduceSumD((v)); \ if(lane==0) shared[wid]=(v); \ __syncthreads(); \ (v) = (threadIdx.x<(blockDim.x/warpSize))*shared[lane]; \ if(wid==0) (v)=warpReduceSumD((v)); \ if(tid==0) shared[0]=(v); \ __syncthreads(); \ (v)=shared[0]; \ __syncthreads(); #define CALCAVE(v) CALCSUM(v) \ (v) /= (dimi*dimj); March 26, 2018 20

  16. Test Platform  IBM S822LC HPC, courtesy of IBM Italy (G. Richelli)  512 GB of RAM,  16 Power8 cores (SMT 8)  2 960 GB solid state disks  4 P100 GPUs. March 26, 2018 21

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend