

SLIDE 1

Massively Parallel Ray Tracing

Masahiro Fujita, LTE Inc. Kenji Ono, The University of Tokyo, and Riken

Sunday, November 13, 2011

SLIDE 2

Agenda

  • Motivation and Challenges
  • Architecture
  • Result
  • Conclusion
  • Future work

SLIDE 3

Motivation

  • Visualize large-scale data
  • Compute Units increase
    • Peta: 100K cores; Exa: 10M+ cores
  • Simulation data increase
    • Storage becomes a problem
  • Memory per Compute Unit decreases
  • Increasing demand for visual quality

SLIDE 4

LSV

The Large Scale Visualization system being developed at Riken

SLIDE 5

[Architecture diagram: the LSV framework. Clients (CLI, GUI, and others) talk through an API for Clients and a Communication Relay Service to the Controller, which passes steering and control parameters to the Visualization Core. The core selects a renderer (HW or SW, local or remote, batch or interactive mode) via the Visualization Library, with extensions for isosurfaces, volumes, streamlines, particles, and molecular structure/skeleton rendering over structured and unstructured meshes. A Data Reader ingests simulator result files (File Format A, File Format B, extended formats) through an API for Simulators; raw data, primitives, and images flow through a Process Invocation Service, and image compositing assembles the final picture.]

SLIDE 7

[Diagram: pipeline patterns over Compute Units. "Simulate, then Visualize", a combined "Simulate & Visualize", and separate "Simulate" / "Visualize" stages passing data, shown at scales of ~100, ~10,000, and ~1,000,000 Compute Units.]

SLIDE 8

[Same diagram as slide 7, with our focus highlighted.]

SLIDE 9

Key components

Out-of-core

Massively Parallel Ray Tracing

SLIDE 10
Why raytracing?

  • Visual quality: better than OpenGL!
  • More scalable than OpenGL
  • Correct handling of transparency, reflection, and indirect illumination
  • Runs on many CPU architectures

Sponza model: (C) Marko Dabrovic

SLIDE 11

Primitives

Curves Particles Polygons Volumes

SLIDE 12

Example

SLIDE 13

Challenges

  • The parallel raytracing algorithm itself is challenging for 1000+ compute units
  • Limited memory per compute unit
    • We assume 1GB per compute unit
  • 1000+ compute units
    • MPI problems arise: parallel performance, etc.

SLIDE 14

Agenda

  • Motivation and Challenges
  • Architecture
  • Result
  • Conclusion
  • Future work

SLIDE 15

Architecture

  • Out-of-core raytracing
    • Acceleration structure building, traversal of primitives
  • Exchange rays between Compute Units
    • Enables correct indirect illumination

[Image: local illumination vs. global illumination (+indirect).]

SLIDE 16

[Pipeline diagram: accel build → raytracing → shade → image output.]

SLIDE 17

Acceleration structure

  • 2-level BVH (Bounding Volume Hierarchy)
  • Top level: bounding information
  • Bottom level: primitive data, BVH data
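As a concrete sketch of this two-level split (the function names and the slab test are our illustration, not from the talk): the small top level holds only per-CU bounding boxes and is replicated on every Compute Unit, so intersecting a ray against it tells you which CUs' bottom-level data the ray may need to visit.

```python
# Illustrative sketch of the 2-level BVH idea: the top level holds only
# bounding boxes (small, replicated everywhere); each bottom level holds
# the actual primitive and BVH data on one compute unit.

def aabb_hit(bounds, origin, inv_dir):
    """Slab test: does the ray hit the axis-aligned box `bounds`?"""
    tmin, tmax = 0.0, float("inf")
    for axis in range(3):
        lo, hi = bounds[axis]
        t0 = (lo - origin[axis]) * inv_dir[axis]
        t1 = (hi - origin[axis]) * inv_dir[axis]
        if t0 > t1:
            t0, t1 = t1, t0
        tmin, tmax = max(tmin, t0), min(tmax, t1)
    return tmin <= tmax

def toplevel_hits(top_level, origin, direction):
    """Return the compute units whose bottom-level BVH the ray may hit.

    `top_level` is a list of (cu_rank, bounds) pairs, shared by all CUs;
    bounds is ((xlo, xhi), (ylo, yhi), (zlo, zhi)).
    """
    inv_dir = tuple(1.0 / d if d != 0.0 else float("inf") for d in direction)
    return [rank for rank, bounds in top_level
            if aabb_hit(bounds, origin, inv_dir)]
```

Only the ranks returned here need to receive the ray, which is what keeps the per-CU bottom-level data (~500 MB) out of every other CU's memory.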

SLIDE 18

Scene

SLIDE 19

[Diagram: the two levels.]

  • Top level: BoundingBox data, ~100 KB, shared by all CUs
  • Bottom level: primitive & acceleration data, ~500 MB per CU

SLIDE 20

[Diagram: Trace → intersection tests (×N) → find nearest intersection → Shade.]
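A minimal sketch of this trace stage, assuming sphere primitives and a trivial "shade" step (both illustrative; the real renderer traverses BVHs over the primitive types shown on slide 11):

```python
import math

# Minimal sketch of the trace stage: run intersection tests against every
# candidate primitive, keep the nearest hit, then shade it. Spheres and
# the id-returning "shade" are stand-ins for the real primitives/shaders.

def intersect_sphere(center, radius, origin, direction):
    """Return the nearest positive ray parameter t, or None on a miss.

    `direction` is assumed normalized.
    """
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 0.0 else None

def trace(spheres, origin, direction):
    """Find the nearest intersection, then 'shade' (here: return its id)."""
    nearest_t, nearest_id = float("inf"), None
    for sphere_id, (center, radius) in enumerate(spheres):
        t = intersect_sphere(center, radius, origin, direction)
        if t is not None and t < nearest_t:
            nearest_t, nearest_id = t, sphere_id
    return nearest_id, nearest_t
```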

SLIDE 21

[Same diagram; the top-level traversal stage highlighted.]

SLIDE 22

[Same diagram; the bottom-level traversal stage highlighted.]

SLIDE 23

[Same diagram as slide 20.]

SLIDE 24

[Same diagram as slide 20.]

SLIDE 25

Reorder rays by destination node
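The reordering step can be sketched as a bucketing pass over the outgoing rays (pure-Python stand-in; the function name and payload format are hypothetical), so that each destination CU receives one contiguous batch in the exchange:

```python
from collections import defaultdict

# Sketch of "reorder by destination node": before exchanging rays, bucket
# them by the compute unit that owns the bottom-level BVH they must visit
# next, so each destination gets a single contiguous batch.

def reorder_by_dst(rays):
    """`rays` is a list of (dst_rank, ray_payload) pairs.

    Returns {dst_rank: [ray_payload, ...]}, preserving input order
    within each bucket.
    """
    buckets = defaultdict(list)
    for dst, payload in rays:
        buckets[dst].append(payload)
    return dict(buckets)
```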

SLIDE 26

Agenda

  • Motivation and Challenges
  • Architecture
  • Result
  • Conclusion
  • Future work

SLIDE 27

Result

  • Terrain
    • Procedural terrain
    • 1 path/pixel
  • CT
    • Volume → polygons
    • 1 path/pixel

SLIDE 28

Measured on RICC

  • x86 Cluster at Riken
  • 1024 nodes
  • 4 Cores x 2 / node
  • 12GB / node
  • 8192 cores in total, 1.5GB/core
  • InfiniBand DDR
  • MPI

SLIDE 29

Terrain

  • Generates 2M polygons per CU
  • 1024 procs: 2B polys; 4096: 8B; 8192: 16B

[Chart: render time in secs (up to ~9000) vs. MPI processes (1024, 2048, 4096, 8192).]

SLIDE 30

Performance factor

  • # of surfaces visible to the screen
  • # of BVH node hits in top-level BVH traversal
  • # of computation units (MPI processes)
    • N^2 communication
    • So render time ≈ N^2 is the expected result

SLIDE 31

CT

  • 100 GB volume data input
  • Generate isosurface polygons from the volume
  • 1024 MPI processes
  • 14.34 secs for out-of-core mesh build
  • 26.87 secs for render

SLIDE 32

Discussion

  • Unstructured N×N communication
  • MPI gather/scatter doesn't work well
    • Memory is soon exhausted
  • Async, dynamic communication: tried ADLB, but it didn't scale
  • Simple MPI sendrecv is the only working solution so far
  • Hierarchical communication would improve the performance
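One common way to organize such a sendrecv-based N×N exchange is a round-based pairwise schedule (partner = rank XOR round). That schedule is our illustration, not necessarily the talk's implementation, and the snippet simulates it in plain Python where the real code would issue one MPI_Sendrecv per round:

```python
# Simulation of a pairwise-exchange schedule for an unstructured N x N
# ray exchange. In round r, rank i pairs with rank i XOR r (power-of-two
# process counts assumed); a real implementation would call MPI_Sendrecv
# once per round with the two buckets for that partner.

def pairwise_exchange(outboxes):
    """`outboxes[i][j]` = list of rays rank i must send to rank j.

    Returns `inboxes`, where inboxes[j] collects everything destined
    for rank j (local rays first).
    """
    size = len(outboxes)
    assert size & (size - 1) == 0, "power-of-two size assumed"
    inboxes = [list(outboxes[i][i]) for i in range(size)]  # keep local rays
    for r in range(1, size):
        for i in range(size):
            partner = i ^ r          # i's send/recv partner this round
            inboxes[partner].extend(outboxes[i][partner])
    return inboxes
```

Each ordered transfer happens exactly once, and every rank is busy in every round, which is why this pattern avoids the root-process memory blow-up that makes gather/scatter fail here.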

SLIDE 33

Discussion, cont.

  • Memory per process (8,192 MPI procs)
    • 400~500 MB for the MPI library
    • 200~300 MB for Prim/BVH
    • 100~200 MB for ray data
    • Avg 100~200 rays/process at max
  • Need a more frequent ray exchange to reduce memory (MPI comm increases)
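A quick check of these numbers (our arithmetic, not from the slides) shows why memory is the bottleneck: the fixed overheads alone consume most of RICC's ~1.5 GB/core.

```python
# Sanity check of the per-process memory budget (MB) reported above
# against the ~1.5 GB/core available on the RICC cluster.

budget_mb = {
    "MPI library": (400, 500),
    "Prim/BVH":    (200, 300),
    "ray data":    (100, 200),
}

low = sum(lo for lo, hi in budget_mb.values())
high = sum(hi for lo, hi in budget_mb.values())
print(f"total budget: {low}-{high} MB of ~1500 MB/core")
```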

SLIDE 34

Agenda

  • Motivation and Challenges
  • Architecture
  • Result
  • Conclusion
  • Future work

SLIDE 35

Conclusion

  • Out-of-core, massively parallel raytracing architecture
  • Confirmed it works at up to 8,192 MPI processes
  • Memory and MPI are the bottlenecks in a massive environment
  • Need to find a new way for 10k+ compute units

SLIDE 36

Agenda

  • Motivation and Challenges
  • Architecture
  • Result
  • Conclusion
  • Future work

SLIDE 37

Future work

  • Simulate, then Visualize
    • A possible architecture for the Exa era
  • Porting to the K computer
    • Initial trials succeeded
    • K-specific optimization remains
  • Integrate fully into the LSV framework
    • Partially integrated already

SLIDE 38

Acknowledgements

  • LSV team
  • Riken
  • RICC cluster
  • K computer
  • FOCUS for x86 cluster
  • Simon Premoze

SLIDE 39

References

  • Matt Pharr, Craig Kolb, Reid Gershbein, and Pat Hanrahan. Rendering Complex Scenes with Memory-Coherent Ray Tracing. Proc. SIGGRAPH 1997.
  • Johannes Hanika, Alexander Keller, and Hendrik P. A. Lensch. Two-Level Ray Tracing with Reordering for Highly Complex Scenes. Graphics Interface 2010: 145-152.
  • Kirill Garanzha, Alexander Bely, Simon Premoze, and Vladimir Galaktionov. Out-of-Core GPU Ray Tracing of Complex Scenes. Technical talk at SIGGRAPH 2011.

SLIDE 40

Thank you!

  • syoyo@lighttransport.com
