Presentation of "Real-Time Surface Extraction and Visualization of Medical Images using OpenCL and GPUs" - PDF document


SLIDE 1

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/267623041

Presentation of "Real-Time Surface Extraction and Visualization of Medical Images using OpenCL and GPUs"

Data · October 2014


3 authors, including:

Erik Smistad, SINTEF (44 publications, 615 citations)

Anne C. Elster, Norwegian University of Science and Technology (80 publications, 864 citations)

Related projects: Virtuell Samhandling (VirSam), CloudLightning

All content following this page was uploaded by Erik Smistad on 31 October 2014.


SLIDE 2

Real-time Surface Extraction and Visualization of Medical Images using OpenCL and GPUs

Erik Smistad, Anne C. Elster and Frank Lindseth

Erik Smistad, 2012

SLIDE 3

Introduction

  • Visualization of medical images
    – Diagnosis
    – Progress evaluation
    – Surgery
  • Medical data
    – Raw data from: CT, MRI, ultrasound
    – Segmentation result
  • Serial surface extraction and visualization of large medical datasets is time consuming

SLIDE 4

Introduction

  • Our goal: Utilize GPUs to extract and visualize surfaces as fast as possible
  • Why do we want it to go faster?
    – Enable visualization of real-time data such as ultrasound
    – During surgery, we don't want to wait for results
    – Easier and faster to experiment with parameters if results can be visualized immediately

SLIDE 5

Marching Cubes

  • Algorithm by Lorensen and Cline (1987) for surface extraction from 3D datasets
  • Divides the dataset into a grid of cubes
  • Creates triangles by examining values at corner points
    – Compare with a threshold parameter called the iso value
    – Limited set of possible configurations:
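
The corner examination above can be illustrated with a short serial sketch (Python for clarity, not the authors' OpenCL kernels; the function name and test values are hypothetical). Each of the 8 corner samples is compared to the iso value, and the resulting bits form an index into the 256-entry triangle configuration table:

```python
def cube_index(corner_values, iso_value):
    """Pack the 8 corner/iso comparisons into an 8-bit configuration index."""
    index = 0
    for bit, value in enumerate(corner_values):
        if value < iso_value:       # corner is "inside" the surface
            index |= 1 << bit
    return index

# A cube entirely inside the surface is one of the two empty configurations:
assert cube_index([0] * 8, iso_value=100) == 255
# A cube cut by the surface yields one of the 254 triangle-producing cases:
assert 0 < cube_index([0, 0, 0, 0, 200, 200, 200, 200], iso_value=100) < 255
```

Only the configuration index is computed per cube here; the actual triangle geometry comes from the lookup table that Marching Cubes implementations carry.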

SLIDE 6

Marching Cubes in parallel

  • Completely data-parallel
    – Each cube can be processed in parallel
    – No dependencies among the cubes
    – Large datasets
  • => Ideal for execution on Graphics Processing Units
SLIDE 7

Marching Cubes in parallel

  • Problem: How to store triangles in parallel?
    – Only a few of the cubes will actually produce triangles
    – Assuming all cubes produce the maximum number of triangles would exhaust memory
    – Need to remove the empty cubes somehow
  • Solution: Stream compaction
    – Removing unwanted elements from a large stream of elements
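
The core of stream compaction can be sketched serially (Python stand-in for the parallel GPU scan; function name and sample counts are illustrative): an exclusive prefix sum over per-cube triangle counts gives every non-empty cube a unique output offset, and the final total tells how much memory to allocate.

```python
def compact_offsets(triangle_counts):
    """Return (offsets, total): exclusive prefix sum of the counts."""
    offsets, total = [], 0
    for count in triangle_counts:
        offsets.append(total)
        total += count
    return offsets, total

counts = [0, 2, 0, 1, 0, 4]        # most cubes produce no triangles
offsets, total = compact_offsets(counts)
assert offsets == [0, 0, 2, 2, 3, 3]
assert total == 7                  # allocate room for exactly 7 triangles
```

Cubes with a count of zero simply write nothing; the offsets guarantee that the non-empty cubes write to disjoint, densely packed slots.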

SLIDE 8

Related work

  • Many have accelerated MC on the GPU
    – Most of them use shader programming
    – NVIDIA has both a CUDA and an OpenCL version included in their SDKs
  • The implementations differ a lot in how they handle the problem of storing the triangles in parallel
    – Reck et al. (2004) removed empty cells on the CPU in advance
    – NVIDIA uses prefix sum scan as the stream compaction method
    – Dyken et al. (2008) use a stream compaction method called Histogram Pyramids

SLIDE 9

Histogram Pyramids

  • Problem: How to store triangles in parallel
  • What do we need?
    – A way to filter out all of the empty cells
    – The total number of triangles, to allocate memory
    – A unique index for each cube that produces output
  • One solution: Histogram Pyramids by Ziegler et al. (2006)
    – A data structure that performs stream compaction
    – Consists of a stack of textures
    – Suits the GPU's texture memory system

SLIDE 10

Histogram Pyramids

  • Construction takes O(log N) when run in parallel
  • log N reduction steps
  • 2x2 (or 2x2x2 in 3D) cells are summed up in one level and written to the next level
  • Caching with 2D/3D spatial locality significantly reduces memory access times
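
The reduction steps above can be sketched as follows (a serial Python stand-in for the parallel construction kernels; nested lists stand in for 3D textures, and the test volume is invented): each level sums 2x2x2 blocks of the level below, so an N³ base needs log2(N) steps and the top level holds the total count.

```python
def build_pyramid(base):
    """base: N x N x N nested list of counts, N a power of two.
    Returns all levels, from the base up to the single-cell top."""
    levels = [base]
    n = len(base)
    while n > 1:
        prev = levels[-1]
        n //= 2
        nxt = [[[sum(prev[2*z + dz][2*y + dy][2*x + dx]
                     for dz in (0, 1) for dy in (0, 1) for dx in (0, 1))
                 for x in range(n)] for y in range(n)] for z in range(n)]
        levels.append(nxt)
    return levels

# A 4x4x4 base with a sparse pattern of triangle-producing cells:
base = [[[1 if (x + y + z) % 4 == 0 else 0
          for x in range(4)] for y in range(4)] for z in range(4)]
levels = build_pyramid(base)
assert len(levels) == 3            # base + log2(4) = 2 reduction steps
# The top of the pyramid equals the total count over the whole base:
assert levels[-1][0][0][0] == sum(v for plane in base for row in plane for v in row)
```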

SLIDE 11

Histogram Pyramids

  • Each element can be retrieved in O(log N)
  • Create indexes using the final sum
  • Benefits from cache
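
The O(log N) retrieval can be illustrated with a 1D pyramid for brevity (the paper's version is 3D; this serial Python sketch and its sample counts are illustrative): starting from the top, each step descends to the child whose partial sum contains the requested output index.

```python
def traverse(levels, index):
    """levels[0] is the base; levels[-1] is the single-element top.
    Returns the base cell that produces output element `index`."""
    pos = 0
    for level in reversed(levels[:-1]):      # walk top -> base
        pos *= 2
        if index >= level[pos]:              # skip the left child's elements
            index -= level[pos]
            pos += 1
    return pos

base = [0, 2, 0, 1]                          # per-cell output counts
levels = [base, [2, 1], [3]]                 # the 1D pyramid over `base`
# Elements 0 and 1 come from cell 1, element 2 from cell 3:
assert [traverse(levels, i) for i in range(3)] == [1, 1, 3]
```

Every output element thus finds its producing cell independently, which is what lets each triangle be written by its own GPU thread.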
SLIDE 12

Histogram Pyramids

  • We have extended the data structure to 3D
  • NVIDIA does not support writing directly to 3D textures
  • Use buffers and Morton codes on NVIDIA GPUs
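
A Morton (Z-order) code interleaves the bits of x, y and z so that nearby 3D cells map to nearby buffer addresses, which preserves spatial locality in a flat buffer. A minimal sketch (Python for illustration; the bit width and function name are assumptions, not the authors' code):

```python
def morton3d(x, y, z, bits=10):
    """Interleave the low `bits` bits of x, y, z as ...z1 y1 x1 z0 y0 x0."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

assert morton3d(0, 0, 0) == 0
assert morton3d(1, 0, 0) == 1      # x contributes bit 0
assert morton3d(0, 1, 0) == 2      # y contributes bit 1
assert morton3d(0, 0, 1) == 4      # z contributes bit 2
assert morton3d(3, 3, 3) == 63     # a 4x4x4 block fills codes 0..63
```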

SLIDE 13

Our implementation

  • 6 steps
  • OpenCL
  • OpenGL
SLIDE 14

Our implementation

  • In the first step the dataset is transferred to the GPU
  • Each cube is processed in parallel and each corner is compared to the iso value
  • Determines the cube configuration
  • Store the number of triangles needed for each cube in the base level of the HP

SLIDE 15

Histogram Pyramids for MC

  • Construction of the HP in 3D
  • Over 90% cache hits
  • Total sum is used to allocate memory for the triangles in a VBO

SLIDE 16

HistoPyramids for MC

  • Traversal
  • For every cube that produces triangles:
    – Traverse the HP to find the 3D coordinates
    – Construct triangles and normals and store these in the VBO
  • Over 90% cache hits
SLIDE 17

Rendering: OpenCL-GL interop.

  • OpenGL is used for rendering
    – Cross-platform graphics library
  • OpenCL and OpenGL can share data
  • Makes rendering of data from OpenCL possible without data transfer back to the CPU
  • Synchronization between the two APIs is needed
    – CPU interaction is needed to do this

SLIDE 18

Memory optimizations

  • Dyken et al. (2008) packed the 3D Histogram Pyramid into a 2D texture and created all levels with mipmapping
    – Had to use 32-bit integer values for all levels
    – Address translation
  • 4*M² >= N³
  • Since our implementation creates one 3D texture per level, smaller integers (8-bit and 16-bit) can be used for the first levels
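
The per-level width idea can be sketched roughly as follows (an illustrative Python calculation under stated assumptions, not the paper's exact scheme): a Marching Cubes cell emits at most 5 triangles, and each pyramid level sums 2x2x2 cells, so the maximum representable value grows by 8x per level; levels whose maximum fits in 8 or 16 bits can use smaller integer textures than a uniform 32-bit layout.

```python
def level_widths(num_levels, max_per_cell=5):
    """Smallest of 8/16/32 bits that can hold each level's maximum value,
    assuming at most `max_per_cell` triangles per base cell."""
    widths, max_value = [], max_per_cell
    for _ in range(num_levels):
        if max_value <= 0xFF:
            widths.append(8)
        elif max_value <= 0xFFFF:
            widths.append(16)
        else:
            widths.append(32)
        max_value *= 8              # a 2x2x2 block of the level below
    return widths

# A 256^3 volume has 9 levels (256^3 base down to 1^3):
assert level_widths(9) == [8, 8, 16, 16, 16, 32, 32, 32, 32]
```

Since the base level is by far the largest, storing it in 8 bits instead of 32 already cuts most of the pyramid's footprint, which matches the direction of the savings in the table below.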

SLIDE 19

Memory optimizations

Memory consumption of the HP in MB:

Size     Default   Optimized
64³      1         < 1
128³     21        2
256³     85        18
512³     1 365     148
1024³    5 461     1 188
2048³    87 381    9 509

SLIDE 20

Results

Interactive speeds for datasets with sizes up to 512³ and 1024³

SLIDE 21

Comparison

(Figure: Dyken et al. vs. our implementation)

SLIDE 22

Other implementations

  • NVIDIA has a CUDA and an OpenCL implementation of MC in their SDKs
    – Both use prefix sum scan to perform stream compaction
    – Prefix sum scan uses regular buffers
  • The size of buffers is limited
  • 512³ ~ 134 million elements
    – NVIDIA OpenCL MC: largest volume possible: 64³
    – NVIDIA CUDA MC: largest volume possible: 256³

SLIDE 23

Conclusions

  • Our implementation can handle larger datasets
    – Due to a compressed memory storage scheme for the HP
  • For smaller datasets ours is slower than other methods
    – This was found to be mainly due to an expensive OpenGL-OpenCL synchronization where the CPU is used
    – OpenCL and OpenGL extensions are proposed to deal with this
    – Hopefully this synchronization will happen on the GPU in the future

SLIDE 24

Questions?

  • Source code can be downloaded from: http://github.com/smistad/GPU-Marching-Cubes/
  • Thanks to NVIDIA, AMD, and IDI and IME at NTNU for their contributions to the HPC Lab at NTNU
