Load Balanced Parallel GPU Out-of-Core for Continuous LOD Model



SLIDE 1

Ultrascale Visualization Workshop 2012

Department of Computer Science, Virginia Tech, Blacksburg, Virginia, USA

Load Balanced Parallel GPU Out-of-Core for Continuous LOD Model Visualization

Chao Peng, Peng Mi and Yong Cao

SLIDE 2

Motivation

  • How to efficiently render a large 3D model that contains a lot of objects and triangles?

The Boeing 777 model:

– Triangles: 332 million
– Vertices: 223 million
– Objects: 719 thousand

Rendering difficulties:

– The objects have dramatically different shapes and are topologically disconnected.
– The data size is far beyond the GPU rendering capabilities.
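To make the data-size point concrete, a rough estimate can be sketched as follows. The memory layout is an assumption, not something the slides state: three 32-bit floats per vertex position and three 32-bit indices per triangle, with no normals, colors, or LOD data. The 3 GB figure is the per-GPU memory listed on the Implementation slide.

```python
# Rough footprint of the raw Boeing 777 geometry under an ASSUMED layout.
TRIANGLES = 332_000_000          # from the slide
VERTICES = 223_000_000           # from the slide
BYTES_PER_VERTEX = 3 * 4         # assumption: x, y, z as 32-bit floats
BYTES_PER_TRIANGLE = 3 * 4       # assumption: three 32-bit vertex indices
GPU_MEMORY_BYTES = 3 * 1024**3   # 3 GB per GPU (Implementation slide)


def raw_geometry_bytes(triangles: int, vertices: int) -> int:
    """Bytes needed for vertex positions plus triangle indices only."""
    return vertices * BYTES_PER_VERTEX + triangles * BYTES_PER_TRIANGLE


total = raw_geometry_bytes(TRIANGLES, VERTICES)
print(f"raw geometry: {total / 1024**3:.2f} GiB")
print(f"fits in one GPU: {total <= GPU_MEMORY_BYTES}")
```

Under these assumptions, positions and indices alone come to roughly 6.2 GiB, about twice the memory of one GPU, before any per-vertex attributes are counted.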

SLIDE 3

The Previous Approach

  • Our GPU-based approach from Eurographics 2012:

– Parallel Continuous LOD: triangle-level mesh simplification.
– GPU Out-of-Core: CPU-GPU data streaming.

SLIDE 4

A Multi-GPU and Multi-Display System

[Figure: system diagram. Labels: the input triangle data set; two CPU Core and GPU Device pairs.]

SLIDE 5

The approach on a single GPU

LOD Selection → GPU Out-of-Core → Triangle Reformation → Rendering (VBO)

SLIDE 6

LOD Selection → GPU Out-of-Core → Triangle Reformation → Rendering (VBO)

[Figure: objects O1 to O7. Labels: Existing Data, Coherence Evaluation, CPU, GPU, Defragmentation.]
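The slide gives only labels, so the following is one possible reading rather than the authors' implementation. It assumes that each object's data is stored in refinement order, so that a frame needs a prefix of it, that coherence evaluation compares what is already resident on the GPU with what the new frame needs, and that defragmentation repacks the needed data contiguously. The names `coherence_plan`, `resident`, and `needed` are hypothetical.

```python
def coherence_plan(resident, needed):
    """Plan one frame of streaming under the assumptions stated above.

    resident: object id -> number of elements already on the GPU
    needed:   object id -> number of elements the new frame requires
    Returns (uploads, offsets, total):
      uploads: object id -> (start, end) range that must be streamed from the CPU
      offsets: object id -> start position in the compacted GPU buffer
      total:   number of elements in the compacted buffer
    """
    uploads = {}
    for obj, want in needed.items():
        have = resident.get(obj, 0)
        if want > have:
            # Only the part that is missing is streamed; the rest is reused.
            uploads[obj] = (have, want)

    # Defragmentation: pack the needed objects back to back (prefix sum).
    offsets = {}
    cursor = 0
    for obj in sorted(needed):
        if needed[obj] > 0:
            offsets[obj] = cursor
            cursor += needed[obj]
    return uploads, offsets, cursor


resident = {"O1": 100, "O2": 50, "O3": 80}
needed = {"O1": 120, "O2": 50, "O3": 0, "O4": 30}
uploads, offsets, total = coherence_plan(resident, needed)
print(uploads, offsets, total)
```

In this toy frame only 50 of the 200 needed elements are streamed; O2 is reused entirely and O3 is dropped from the compacted layout.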

SLIDE 7

Performance Bottleneck

[Figure: objects O1 to O7. Labels: Coherence Evaluation, CPU, GPU, Defragmentation.]

Breakdown by stage:

– GPU Out-of-Core: 45%
– Triangle Reformation: 20%
– OpenGL VBO Rendering: 28%

SLIDE 8

Contributions

[Figure: two copies of the pipeline LOD Selection → GPU Out-of-Core → Triangle Reformation → Rendering (VBO), each ending in a Final Display. Additional labels: Load Balancing, Inter-GPU Communication.]

SLIDE 9

Load Balancing

[Figure: objects 1 to 5 seen from the viewpoint. LOD selection result: triangle counts n1 to n5 for objects 1 to 5.]

SLIDE 10

Load Balancing

[Figure: objects 1 to 5 seen from the viewpoint, split between GPU1 and GPU2. LOD selection result: n1 to n5.]

– GPU1: n1+n2
– GPU2: n3+n4+n5
– Tolerance interval: [1-t, 1+t]

SLIDE 11

Load Balancing

[Figure: objects 1 to 5 seen from the viewpoint, split between GPU1 and GPU2. LOD selection result: n1 to n5.]

– GPU1: n1+n2
– GPU2: n3+n4+n5
– Tolerance interval: [1-t, 1+t]

SLIDE 12

Load Balancing

[Figure: objects 1 to 5 seen from the viewpoint, split between GPU1 and GPU2. LOD selection result: n1 to n5.]

– GPU1: n1+n2+n3+n4
– GPU2: n5
– Tolerance interval: [1-t, 1+t]

SLIDE 13

Load Balancing

[Figure: objects 1 to 5 seen from the viewpoint, split between GPU1 and GPU2. LOD selection result: n1 to n5.]

– GPU1: n1+n2+n3+n4
– GPU2: n5
– Tolerance interval: [1-t, 1+t]

SLIDE 14

Load Balancing

[Figure: objects 1 to 5 seen from the viewpoint, split between GPU1 and GPU2. LOD selection result: n1 to n5.]

– GPU1: n1+n2+n3
– GPU2: n4+n5
– Tolerance interval: [1-t, 1+t]
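The Load Balancing slides move the boundary between GPU1 and GPU2 until the two workloads are acceptably close. A minimal sketch of that idea follows. It assumes that the objects are ordered across the view frustum, that the test is whether the ratio of the two triangle totals lies in [1-t, 1+t], and that scanning every boundary is affordable. The function name `balance_split` and the scan strategy are illustrative and are not taken from the slides.

```python
def balance_split(counts, t):
    """Choose how many leading objects go to GPU1.

    counts: per-object triangle counts from LOD selection (n1, n2, ...),
            in the assumed left-to-right order across the view frustum.
    t:      tolerance; a split counts as balanced when
            1 - t <= (GPU1 total) / (GPU2 total) <= 1 + t.
    Returns (k, balanced): GPU1 gets counts[:k], GPU2 gets counts[k:].
    """
    total = sum(counts)
    best_k, best_err = 1, float("inf")
    left = 0
    for k in range(1, len(counts)):
        left += counts[k - 1]
        right = total - left
        if right == 0:
            continue  # GPU2 would get no triangles; ratio undefined
        err = abs(left / right - 1.0)
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err <= t


# Hypothetical counts (thousands of triangles) for objects 1 to 5.
k, balanced = balance_split([30, 20, 50, 60, 40], t=0.1)
print(k, balanced)
```

With these made-up counts the scan settles on GPU1 = n1+n2+n3 and GPU2 = n4+n5, the same shape of split as the last Load Balancing slide. A real system would likely adjust the previous frame's boundary incrementally instead of rescanning.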

SLIDE 15

Inter-GPU Communication

[Figure: the rendered images on GPU1 and GPU2 and the displayed images on GPU1 and GPU2.]

CUDA Inter-Process Communication (CUDA 4.1 IPC) is used for transferring the image buffer.
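Why an image buffer has to move at all can be sketched as follows. This is an interpretation under stated assumptions, not the authors' code: two equal-width displays sit side by side, the balanced view-frustum boundary maps to a pixel column `split`, GPU1 renders the columns left of it, and GPU2 renders the rest. Whenever the boundary is not at the seam between the displays, one GPU holds pixels that the other must show. The function and parameter names are hypothetical, and the actual transfer (CUDA IPC on the slide) is not modeled.

```python
def columns_to_transfer(split, display_width):
    """Return (sender, receiver, first_col, end_col) or None.

    split:         pixel column of the load-balanced boundary (assumption)
    display_width: width in pixels of each of the two displays (assumption)
    Columns are global, half-open: [first_col, end_col).
    """
    if split > display_width:
        # GPU1 rendered past the seam; the overhang belongs on display 2.
        return ("GPU1", "GPU2", display_width, split)
    if split < display_width:
        # GPU2 rendered part of display 1's image.
        return ("GPU2", "GPU1", split, display_width)
    return None  # boundary sits exactly on the seam; nothing to move


print(columns_to_transfer(1200, 1024))
print(columns_to_transfer(800, 1024))
```

The width of the strip that moves grows with the distance between the balanced boundary and the display seam, so the communication cost depends on how unevenly the scene is distributed across the view.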

SLIDE 16

Implementation

  • Two GPUs:

– NVIDIA GTX 580.
– 512 cores, 3 GB GDDR5.
– 192.4 GB/s memory bandwidth.

  • CPU Main Memory:

– 16 GB RAM.

  • Rendering performance:

– An average of 20 fps on a Linux system with MPI and CUDA 4.2.

SLIDE 17

SLIDE 18

Performance Evaluation

  • Comparison

– Dual-GPU with load balancing (our approach).
– Dual-GPU without load balancing.
– Single GPU.

SLIDE 19

Performance Evaluation

SLIDE 20

Performance Evaluation

Approach       FPS    Diff. Triangle Num.  Visible Triangle Num.  Load Balancing  GPU Out-of-Core  Triangle Reformation  GL Rendering
Single-GPU     14.94  -                    12.29 M                -               29.62 ms         3.62 ms               30.24 ms
Dual-GPU (NB)  17.84  7.94 M               12.29 M                -               24.54 ms         2.85 ms               25.31 ms
Dual-GPU (B)   20.40  0.37 M               12.29 M                5.38 ms         18.56 ms         1.97 ms               19.13 ms

NB: without load balancing. B: with load balancing (our approach).
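The relative gains implied by these measurements can be worked out directly. The sketch below uses only the reported FPS and per-stage times. The helper names are illustrative, and comparing the sum of the listed stages with the frame time 1000/FPS is an added sanity check, not an analysis from the slides.

```python
# Reported frame rates (frames per second).
fps = {"Single-GPU": 14.94, "Dual-GPU (NB)": 17.84, "Dual-GPU (B)": 20.40}

# Reported per-stage times in milliseconds for the load-balanced run:
# load balancing, GPU out-of-core, triangle reformation, GL rendering.
stages_b_ms = [5.38, 18.56, 1.97, 19.13]


def speedup(approach, baseline):
    """Frame-rate ratio of one approach over another."""
    return fps[approach] / fps[baseline]


def frame_ms(approach):
    """Average time per frame implied by the reported FPS."""
    return 1000.0 / fps[approach]


print(f"B  vs single: {speedup('Dual-GPU (B)', 'Single-GPU'):.2f}x")
print(f"NB vs single: {speedup('Dual-GPU (NB)', 'Single-GPU'):.2f}x")
print(f"B  vs NB:     {speedup('Dual-GPU (B)', 'Dual-GPU (NB)'):.2f}x")
print(f"B stages: {sum(stages_b_ms):.2f} ms of {frame_ms('Dual-GPU (B)'):.2f} ms per frame")
```

The balanced dual-GPU run is about 1.37 times as fast as the single GPU, against about 1.19 times without balancing, so most of the benefit of the second GPU appears only once the triangle difference between the GPUs drops from 7.94 M to 0.37 M.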

SLIDE 21

Performance Evaluation

SLIDE 22

Conclusion

  • A rendering system with two GPUs:

– A workload balancer based on the view-frustum partitioning method.

  • Inter-GPU communication for image re-arrangement.

  • Future work:

– Scalability beyond two GPUs.

SLIDE 23

Acknowledgment

SLIDE 24

Thank you.