
Presentation of "Real-Time Surface Extraction and Visualization - PDF document



  1. Presentation of "Real-Time Surface Extraction and Visualization of Medical Images using OpenCL and GPUs" · Erik Smistad, Anne C. Elster and Frank Lindseth · October 2014 · https://www.researchgate.net/publication/267623041

  2. 1 Real-time Surface Extraction and Visualization of Medical Images using OpenCL and GPUs Erik Smistad, Anne C. Elster and Frank Lindseth Erik Smistad, 2012

  3. 2 Introduction • Visualization of medical images – Diagnosis – Progress evaluation – Surgery • Medical data – Raw data from: CT, MRI, ultrasound – Segmentation results • Serial surface extraction and visualization of large medical datasets is time-consuming

  4. 3 Introduction • Our goal: Utilize GPUs to extract and visualize surfaces as fast as possible • Why do we want it to go faster? – Enable visualization of real-time data such as ultrasound – During surgery, we don't want to wait for results – Easier and faster to experiment with parameters if results can be visualized immediately

  5. 4 Marching Cubes • Algorithm by Lorensen and Cline (1987) for surface extraction from 3D datasets • Divides the dataset into a grid of cubes • Creates triangles by examining values at the corner points – Each corner is compared with a threshold parameter called the iso value – Only a limited set of configurations is possible (256 in total, 15 unique up to symmetry); see the sketch below
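
The following is a minimal C++ sketch (not the authors' actual OpenCL kernel) of how one cube's configuration index can be computed from its eight corner samples and the iso value; the resulting 8-bit index selects one of the 256 entries in the standard Marching Cubes lookup tables.

```cpp
#include <array>
#include <cstdint>
#include <iostream>

// Sketch: derive the Marching Cubes configuration index of one cube.
// Each of the 8 corners contributes one bit, giving 2^8 = 256 possible
// configurations. Whether a set bit means "below" or "above" the iso
// value is a convention that must match the triangle lookup tables used.
uint8_t cubeConfiguration(const std::array<float, 8>& corners, float isoValue) {
    uint8_t index = 0;
    for (int i = 0; i < 8; ++i)
        if (corners[i] < isoValue)
            index |= (1u << i);
    return index;
}

int main() {
    std::array<float, 8> corners = {10, 80, 20, 15, 90, 85, 30, 5};
    std::cout << "configuration = " << int(cubeConfiguration(corners, 50.0f)) << "\n";
}
```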

  6. 5 Marching Cubes in parallel • Completely data-parallel – Each cube can be processed in parallel – No dependencies among the cubes – Large datasets • => Ideal for execution on Graphics Processing Units

  7. 6 Marching Cubes in parallel • Problem: How to store triangles in parallel? – Only a few of the cubes will actually produce triangles – Assuming every cube produces the maximum number of triangles would exhaust memory – Need to remove the empty cubes somehow • Solution: Stream compaction – Removing unwanted elements from a large stream of elements (see the sketch below)
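
A small serial C++ sketch to make the stream compaction idea concrete (illustrative only; in the actual implementation this happens on the GPU): an exclusive prefix sum over the per-cube triangle counts yields both the total number of triangles to allocate and a unique output offset for every non-empty cube, so the empty cubes leave no holes in the output.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    // Per-cube triangle counts; most cubes are empty (count 0).
    std::vector<int> triangleCount = {0, 0, 2, 0, 1, 0, 0, 3, 0};

    // Exclusive prefix sum: offset[i] = sum of all counts before cube i.
    std::vector<int> offset(triangleCount.size());
    int total = 0;
    for (std::size_t i = 0; i < triangleCount.size(); ++i) {
        offset[i] = total;
        total += triangleCount[i];
    }

    // 'total' tells us how much memory to allocate, and each non-empty
    // cube writes its triangles starting at offset[i], so the output
    // stream is densely packed.
    std::cout << "total triangles = " << total << "\n";
    for (std::size_t i = 0; i < triangleCount.size(); ++i)
        if (triangleCount[i] > 0)
            std::cout << "cube " << i << " writes at offset " << offset[i] << "\n";
}
```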

  8. 7 Related work • Many have accelerated MC on the GPU – Most of them use shader programming – NVIDIA has both a CUDA and an OpenCL version included in their SDKs • The implementations differ a lot in how they handle the problem of storing the triangles in parallel – Reck et al. (2004) removed empty cells on the CPU in advance – NVIDIA uses prefix sum scan as its stream compaction method – Dyken et al. (2008) use a stream compaction method called Histogram Pyramids

  9. 8 Histogram Pyramids • Problem: How to store triangles in parallel • What do we need? – A way to filter out all of the empty cells – The total number of triangles, to allocate memory – A unique index for each cube that produces output • One solution: Histogram Pyramids by Ziegler et al. (2006) – A data structure that performs stream compaction – Consists of a stack of textures – Suits the GPU's texture memory system

  10. 9 Histogram Pyramids • Construction takes O(log N) when run in parallel • Log N reduction steps • 2x2 (or 2x2x2 in 3D) cells are summed up in one level and written to the next level • Caching with 2D/3D spatial locality significantly reduces memory access times
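
A serial C++ sketch of the construction, assuming a cubic, power-of-two base level; on the GPU each cell of the coarser level would be computed by its own work-item. Every level sums 2×2×2 cells of the level below, so log₂(N) reduction steps end in a single top cell holding the total triangle count.

```cpp
#include <iostream>
#include <vector>

// One Histogram Pyramid level stored as a flat N*N*N array.
struct HPLevel {
    int size;              // side length N of this level
    std::vector<int> data; // N*N*N cell values
    int& at(int x, int y, int z) { return data[(z * size + y) * size + x]; }
};

// Build the next (coarser) level by summing 2x2x2 blocks of 'fine'.
// Serial sketch; in the OpenCL version each coarse cell is one work-item.
HPLevel reduce(HPLevel& fine) {
    int n = fine.size / 2;
    HPLevel coarse{n, std::vector<int>(n * n * n, 0)};
    for (int z = 0; z < n; ++z)
        for (int y = 0; y < n; ++y)
            for (int x = 0; x < n; ++x)
                for (int dz = 0; dz < 2; ++dz)
                    for (int dy = 0; dy < 2; ++dy)
                        for (int dx = 0; dx < 2; ++dx)
                            coarse.at(x, y, z) +=
                                fine.at(2 * x + dx, 2 * y + dy, 2 * z + dz);
    return coarse;
}

int main() {
    // Base level: per-cube triangle counts for a tiny 4^3 example grid.
    HPLevel base{4, std::vector<int>(4 * 4 * 4, 0)};
    base.at(0, 0, 0) = 1;
    base.at(1, 2, 3) = 2;

    std::vector<HPLevel> pyramid = {base};
    while (pyramid.back().size > 1)          // log2(N) reduction steps
        pyramid.push_back(reduce(pyramid.back()));

    std::cout << "total triangles = " << pyramid.back().at(0, 0, 0) << "\n";
}
```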

  11. 10 Histogram Pyramids • Each element can be retrieved in O(log N) • Create indexes using the final sum • Benefits from cache
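
A C++ sketch of the retrieval step, shown on a 1D pyramid for brevity (the 3D case descends through 2×2×2 children instead of pairs): starting at the top cell, the output index is compared with the children's partial sums to decide which branch to follow, one step per level, hence O(log N) per element.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// 1D Histogram Pyramid: levels[0] is the base level, levels.back() the top.
// Each cell in a level is the sum of two cells in the level below.
// Given a global output index, find which base cell produces it.
int traverse(const std::vector<std::vector<int>>& levels, int index) {
    int cell = 0; // start at the single top cell
    for (int level = (int)levels.size() - 2; level >= 0; --level) {
        int left = 2 * cell;                // the two children in the level below
        if (index < levels[level][left]) {
            cell = left;                    // output lies in the left child
        } else {
            index -= levels[level][left];   // skip the left child's outputs
            cell = left + 1;                // descend into the right child
        }
    }
    return cell; // base-level cell (cube) that produces output 'index'
}

int main() {
    std::vector<int> base = {0, 2, 0, 1, 0, 0, 3, 0};
    std::vector<std::vector<int>> levels = {base};
    while (levels.back().size() > 1) {
        const auto& prev = levels.back();
        std::vector<int> next(prev.size() / 2);
        for (std::size_t i = 0; i < next.size(); ++i)
            next[i] = prev[2 * i] + prev[2 * i + 1];
        levels.push_back(next);
    }
    // Map every output index back to the cube that produced it.
    for (int i = 0; i < levels.back()[0]; ++i)
        std::cout << "output " << i << " comes from cube " << traverse(levels, i) << "\n";
}
```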

  12. 11 Histogram Pyramids • We have extended the data structure to 3D • NVIDIA's OpenCL does not support writing directly to 3D textures • Use buffers and Morton codes on NVIDIA GPUs (see the sketch below)
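
A minimal sketch of Morton (Z-order) encoding, the kind of code used to lay the 3D levels out in linear buffers: interleaving the bits of x, y and z keeps cells that are close in 3D close together in the 1D buffer, preserving the cache-friendly spatial locality. The bit ordering chosen here (x in the lowest bit) is just one convention.

```cpp
#include <cstdint>
#include <iostream>

// Morton (Z-order) code: interleave the bits of x, y and z so that
// cells that are close in 3D stay close in the linear buffer.
uint32_t mortonCode(uint32_t x, uint32_t y, uint32_t z) {
    uint32_t code = 0;
    for (int bit = 0; bit < 10; ++bit) { // 10 bits per axis -> up to 1024^3 cells
        code |= ((x >> bit) & 1u) << (3 * bit);
        code |= ((y >> bit) & 1u) << (3 * bit + 1);
        code |= ((z >> bit) & 1u) << (3 * bit + 2);
    }
    return code;
}

int main() {
    // Neighbouring cells get nearby buffer indices.
    std::cout << mortonCode(0, 0, 0) << " "   // 0
              << mortonCode(1, 0, 0) << " "   // 1
              << mortonCode(0, 1, 0) << " "   // 2
              << mortonCode(1, 1, 1) << "\n"; // 7
}
```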

  13. 12 Our implementation • 6 steps • OpenCL • OpenGL

  14. 13 Our implementation • In the first step the dataset is transferred to the GPU • Each cube is processed in parallel and each corner is compared to the iso value • This determines the cube configuration • The number of triangles needed for each cube is stored in the base level of the HP

  15. 14 Histogram Pyramids for MC • Construction of the HP in 3D • Over 90% cache hits • The total sum is used to allocate memory for the triangles in a vertex buffer object (VBO)

  16. 15 HistoPyramids for MC • Traversal • For every cube that produces triangles – Traverse the HP to find the 3D coordinates – Construct triangles and normals and store these in the VBO • Over 90% cache hits

  17. 16 Rendering: OpenCL-GL interop. • OpenGL is used for rendering – Cross-platform graphics library • OpenCL and OpenGL can share data • Makes it possible to render data from OpenCL without transferring it back to the CPU • Synchronization between the two APIs is needed – and currently requires CPU interaction (see the sketch below)
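
A host-side C++ sketch of the acquire/release pattern the slide refers to, assuming a shared OpenCL/OpenGL context, a command queue, a kernel and a VBO created with clCreateFromGLBuffer already exist (error checking omitted): OpenGL must finish before OpenCL acquires the shared buffer, OpenCL must finish before OpenGL renders from it, and both synchronization points are currently issued from the CPU.

```cpp
#include <CL/cl_gl.h>
#include <GL/gl.h>

// Sketch: run an OpenCL kernel that writes triangles into a GL VBO,
// then hand the buffer back to OpenGL for rendering.
// 'queue', 'kernel' and 'sharedVBO' (a cl_mem created with
// clCreateFromGLBuffer from an existing GL buffer) are assumed to exist.
void fillVBOWithTriangles(cl_command_queue queue, cl_kernel kernel,
                          cl_mem sharedVBO, size_t globalWorkSize) {
    // Make sure OpenGL is done with the buffer before OpenCL touches it.
    glFinish();

    // Hand the shared buffer over to OpenCL.
    clEnqueueAcquireGLObjects(queue, 1, &sharedVBO, 0, nullptr, nullptr);

    // Run the kernel that writes vertices/normals into the VBO.
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &sharedVBO);
    clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &globalWorkSize,
                           nullptr, 0, nullptr, nullptr);

    // Give the buffer back to OpenGL and wait until OpenCL is done,
    // so the subsequent draw call sees the finished triangles.
    clEnqueueReleaseGLObjects(queue, 1, &sharedVBO, 0, nullptr, nullptr);
    clFinish(queue);
}
```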

  18. 17 Memory optimizations • Dyken et al. (2008) packed the 3D Histogram Pyramid into a 2D texture and created all levels with mipmapping – Had to use 32-bit integer values for all levels – Requires address translation: 4M² >= N³ • Since our implementation creates one 3D texture per level, smaller integer types (8-bit and 16-bit) can be used for the first, largest levels
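
A back-of-the-envelope C++ sketch of the per-level layout's memory footprint. The bit widths assumed here (8-bit cells for the two largest levels, 16-bit for the next two, 32-bit for the rest) are an illustrative assumption, but with them the estimate for a 512³ volume lands close to the optimized column in the table on the next slide.

```cpp
#include <cstdint>
#include <iostream>

// Rough estimate of Histogram Pyramid memory when every level is its own
// 3D texture. Assumed bit widths (for illustration only): the two largest
// levels use 8-bit cells, the next two 16-bit, the remaining small levels 32-bit.
int main() {
    const uint64_t N = 512; // volume side length
    uint64_t totalBytes = 0;
    int level = 0;
    for (uint64_t side = N; side >= 1; side /= 2, ++level) {
        uint64_t bytesPerCell = (level < 2) ? 1 : (level < 4) ? 2 : 4; // assumed widths
        totalBytes += side * side * side * bytesPerCell;
    }
    std::cout << "approx. HP size: " << totalBytes / (1024 * 1024) << " MB\n";
}
```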

  19. 18 Memory optimizations • Memory consumption of the HP in MB:
      Size      Default    Optimized
      64³       1          < 1
      128³      21         2
      256³      85         18
      512³      1 365      148
      1024³     5 461      1 188
      2048³     87 381     9 509

  20. 19 Results • Interactive speeds for datasets with sizes up to 512³ and 1024³ (performance charts)

  21. 20 Comparison • Performance comparison of Dyken et al.'s implementation and ours (chart)

  22. 21 Other implementations • NVIDIA has CUDA and OpenCL implementations of MC in their SDKs – Both use prefix sum scan to perform stream compaction – The prefix sum scan uses regular buffers • The size of buffers is limited • 512³ ~ 134 million elements – NVIDIA OpenCL MC: largest possible volume: 64³ – NVIDIA CUDA MC: largest possible volume: 256³

  23. 22 Conclusions • Our implementation can handle larger datasets – Due to a compressed memory storage scheme for the HP • For smaller datasets ours is slower than other methods – This was found to be mainly due to an expensive OpenGL-OpenCL synchronization where the CPU is used – OpenCL and OpenGL extensions are proposed to deal with this – Hopefully this synchronization will happen on the GPU in the future

  24. 23 Questions? • Source code can be downloaded from: http://github.com/smistad/GPU-Marching-Cubes/ • Thanks to NVIDIA, AMD, and IDI and IME at NTNU for their contributions to the HPC Lab at NTNU
