  1. Massively Parallel Ray Tracing
  Masahiro Fujita, LTE Inc.
  Kenji Ono, The University of Tokyo, and Riken
  Sunday, November 13, 2011

  2. Agenda • Motivation and Challenges • Architecture • Result • Conclusion • Future work

  3. Motivation
  • Visualize large-scale data
  • The number of compute units keeps increasing: Peta ~ 100K cores, Exa ~ 10M+ cores
  • Simulation data keep growing, so storage becomes a problem
  • Memory per compute unit decreases
  • Increasing demand for visual quality

  4. LSV: the Large Scale Visualization system being developed at Riken

  5. [LSV architecture diagram: clients (GUI and CLI, interactive and batch modes) and simulators/data readers (files in format A, format B, and extended formats) talk to the system through an API for clients and an API for simulators and data readers (steering parameters, results); a controller drives the visualization core, which generates primitives (molecular, isosurface, volume, streamlines, extensions) from structured mesh, unstructured mesh, molecular structure, basis-function, particle, and extension-file data; a skeleton layer handles switching and process invocation for local or remote services; rendering uses SW or HW renderers with image compositing and renderer selection, connected through a communication relay service]

  6. [Same architecture diagram as slide 5]

  7. [Diagram: compute-unit counts of ~100, ~10,000, and ~1,000,000; each column shows a simulate / data / visualize workflow, with one column coupling simulate & visualize]

  8. [Same diagram as slide 7, with "Our focus" marked]

  9. Key components: Massively Parallel, Ray Tracing, Out-of-core

  10. Why raytracing?
  • Visual quality: better than OpenGL!
  • More scalable than OpenGL
  • Correct handling of transparency, reflection, and indirect illumination
  • Runs on many CPU architectures
  (Sponza model: (C) Marko Dabrovic)

  11. Primitives: polygons, volumes, curves, particles

  12. Example

  13. Challenges
  • The parallel raytracing algorithm itself is challenging for 1000+ compute units
  • Limited memory per compute unit: we assume 1 GB per compute unit
  • 1000+ compute units: MPI problems arise (parallel performance, etc.)

  14. Agenda • Motivation and Challenges • Architecture • Result • Conclusion • Future work

  15. Architecture
  • Out-of-core raytracing: acceleration structure building and traversal of primitives
  • Exchange rays between compute units
  • Enables correct indirect illumination (local illumination + indirect illumination = global illumination)

  16. Accel build -> Raytracing -> Shade -> Image output

  17. Acceleration structure
  • 2-level BVH (Bounding Volume Hierarchy)
  • Toplevel: bounding information
  • Bottomlevel: primitive data and BVH data

  18. Scene

  19. Toplevel: bounding box data, ~100 KB, shared by all CUs. Bottomlevel: primitive & acceleration data, ~500 MB per CU.
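
The two-level split maps naturally onto two data structures: a tiny toplevel that every compute unit can afford to replicate, and a large bottomlevel that only the owning compute unit keeps in memory. A minimal C++ sketch of such a layout (my illustration of the idea on slides 17 and 19, not the authors' actual code; all type names are assumptions):

```cpp
// Illustrative sketch of a two-level BVH layout: a small toplevel shared by
// every compute unit, and a large bottomlevel that lives only on the compute
// unit owning that part of the scene.
#include <cstdint>
#include <vector>

struct AABB {                       // axis-aligned bounding box
    float min[3];
    float max[3];
};

// Toplevel node: bounding information only (~100 KB total), replicated on all CUs.
struct ToplevelNode {
    AABB bounds;                    // bounds of one compute unit's sub-scene
    int32_t owner_rank;             // MPI rank that holds the actual geometry
};

// Bottomlevel BVH node: refers to primitives stored locally on one CU.
struct BottomlevelNode {
    AABB bounds;
    int32_t left, right;            // child indices, or -1 for a leaf
    int32_t first_prim, prim_count; // leaf: range into the local primitive array
};

struct LocalScene {                 // ~500 MB per compute unit
    std::vector<float> vertices;    // primitive data owned by this CU
    std::vector<BottomlevelNode> bvh;
};

int main() {
    std::vector<ToplevelNode> toplevel;   // shared by all CUs
    LocalScene local;                     // only this CU's slice of the scene
    (void)toplevel; (void)local;
    return 0;
}
```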

  20. [Trace diagram: intersection tests against each candidate, find the nearest intersection, then shade]

  21. [Trace diagram, Toplevel highlighted: intersection test, find nearest intersection, shade]

  22. [Trace diagram, Bottomlevel highlighted: intersection test, find nearest intersection, shade]

  23.-24. [Trace diagram repeated: intersection test, find nearest intersection, shade]
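
The trace step on slides 20-24 boils down to a nearest-hit loop: test the ray against each candidate, keep the closest intersection, and shade it. A self-contained C++ sketch of that loop (illustrative only; a sphere stands in for the real primitive types, and the names are my assumptions):

```cpp
// Minimal sketch of the per-ray trace step: run intersection tests against
// candidate primitives, keep the nearest hit, then shade it.
#include <cmath>
#include <cstdio>
#include <limits>
#include <vector>

struct Ray    { float org[3], dir[3]; };
struct Hit    { float t = std::numeric_limits<float>::max(); int prim = -1; };
struct Sphere { float center[3], radius; };   // hypothetical primitive, for brevity

bool intersect(const Ray& r, const Sphere& s, float& t) {
    float oc[3] = { r.org[0]-s.center[0], r.org[1]-s.center[1], r.org[2]-s.center[2] };
    float b = oc[0]*r.dir[0] + oc[1]*r.dir[1] + oc[2]*r.dir[2];
    float c = oc[0]*oc[0] + oc[1]*oc[1] + oc[2]*oc[2] - s.radius*s.radius;
    float disc = b*b - c;
    if (disc < 0.0f) return false;
    t = -b - std::sqrt(disc);
    return t > 0.0f;
}

Hit trace(const Ray& ray, const std::vector<Sphere>& prims) {
    Hit hit;
    for (size_t i = 0; i < prims.size(); ++i) {   // "Isect test" per candidate
        float t;
        if (intersect(ray, prims[i], t) && t < hit.t) {
            hit.t = t;                            // keep the nearest intersection
            hit.prim = static_cast<int>(i);
        }
    }
    return hit;
}

int main() {
    std::vector<Sphere> prims = { {{0, 0, 5}, 1.0f} };
    Ray ray = { {0, 0, 0}, {0, 0, 1} };
    Hit h = trace(ray, prims);
    std::printf("hit prim %d at t=%f\n", h.prim, h.t);  // then shade the hit
    return 0;
}
```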

  25. Reorder by destination node
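
Since a ray's next intersection tests may need primitive data owned by a different compute unit, rays are grouped by the rank that owns their destination before being sent. A small sketch of that bucketing step (my illustration, assuming a `dst_rank` field filled in by toplevel traversal; not the authors' code):

```cpp
// Group rays so that all rays bound for the same compute unit are contiguous
// and can be sent in one message.
#include <map>
#include <vector>

struct Ray { float org[3], dir[3]; int dst_rank; };  // dst from toplevel BVH traversal

std::map<int, std::vector<Ray>> reorder_by_dst(const std::vector<Ray>& rays) {
    std::map<int, std::vector<Ray>> buckets;
    for (const Ray& r : rays)
        buckets[r.dst_rank].push_back(r);
    return buckets;
}

int main() {
    std::vector<Ray> rays = { {{0,0,0},{0,0,1},2}, {{0,0,0},{1,0,0},0} };
    auto buckets = reorder_by_dst(rays);   // buckets[rank] -> rays to send to rank
    return buckets.size() == 2 ? 0 : 1;
}
```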

  26. Agenda • Motivation and Challenges • Architecture • Result • Conclusion • Future work

  27. Result
  • Terrain: procedural terrain, 1 path/pixel
  • CT: volume -> polygons, 1 path/pixel

  28. Measured on RICC
  • x86 cluster at Riken
  • 1024 nodes, 4 cores x 2 per node, 12 GB per node
  • 8192 cores in total, 1.5 GB/core
  • InfiniBand DDR, MPI

  29. Terrain
  • Generates 2M polygons per CU
  • 1024 procs: 2B polys, 4096: 8B, 8192: 16B
  [Chart: render time in seconds (0-9000) vs. number of MPI processes (1024, 2048, 4096, 8192)]

  30. Performance factors
  • Number of surfaces visible on screen
  • Number of BVH node hits in Toplevel BVH traversal
  • Number of compute units (MPI processes): N^2 communication
  • So render time growing roughly as N^2 is the expected result
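
A rough way to read this list (my own illustrative cost model, not a formula from the slides): the first two factors scale with the image and the scene, while the unstructured ray exchange produces on the order of N^2 point-to-point messages for N MPI processes, so

$$T_{\text{render}} \approx c_1 \, N_{\text{visible}} + c_2 \, N_{\text{toplevel hits}} + c_3 \, N_{\text{proc}}^2$$

which matches the expectation that render time grows roughly as N^2 once communication dominates.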

  31. CT
  • 100 GB volume data input
  • Generate isosurface polygons from the volume
  • 1024 MPI processes
  • 14.34 secs for out-of-core mesh build
  • 26.87 secs for rendering

  32. Discussion
  • Unstructured NxN communication
  • MPI gather/scatter doesn't work well: memory is exhausted quickly
  • Async, dynamic communication: tried ADLB, but it didn't scale
  • Plain MPI sendrecv is the only working solution so far
  • Hierarchical communication would improve performance
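
A hedged sketch of what a plain sendrecv-based ray exchange could look like (this is my reconstruction under the assumptions above, not the authors' implementation): each rank walks over all peers, first exchanging ray counts and then the ray payloads with MPI_Sendrecv, so no collective buffer ever has to hold the full NxN traffic at once.

```cpp
// Pairwise ray exchange with MPI_Sendrecv: counts first, then payloads.
#include <mpi.h>
#include <vector>

struct Ray { float org[3], dir[3]; };

void exchange_rays(const std::vector<std::vector<Ray>>& outgoing,  // [dst rank] -> rays
                   std::vector<Ray>& incoming, MPI_Comm comm) {
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    for (int peer = 0; peer < nprocs; ++peer) {
        if (peer == rank) {                       // keep local rays locally
            incoming.insert(incoming.end(),
                            outgoing[peer].begin(), outgoing[peer].end());
            continue;
        }
        int send_count = static_cast<int>(outgoing[peer].size());
        int recv_count = 0;
        MPI_Sendrecv(&send_count, 1, MPI_INT, peer, 0,
                     &recv_count, 1, MPI_INT, peer, 0,
                     comm, MPI_STATUS_IGNORE);

        std::vector<Ray> recv_buf(recv_count);
        MPI_Sendrecv(outgoing[peer].data(), send_count * (int)sizeof(Ray), MPI_BYTE, peer, 1,
                     recv_buf.data(),       recv_count * (int)sizeof(Ray), MPI_BYTE, peer, 1,
                     comm, MPI_STATUS_IGNORE);
        incoming.insert(incoming.end(), recv_buf.begin(), recv_buf.end());
    }
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int nprocs; MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    std::vector<std::vector<Ray>> outgoing(nprocs);   // filled by the reorder step
    std::vector<Ray> incoming;
    exchange_rays(outgoing, incoming, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
```

Looping over peers in a fixed order keeps memory bounded at the cost of some serialization; the slide's point is that this simple scheme was the only one that held up at scale.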

  33. Discussion, cont.
  • Memory per process (8,192 MPI procs): 400~500 MB for the MPI library, 200~300 MB for primitives/BVH, 100~200 MB for ray data
  • Average 100~200 rays/process at max
  • Need more frequent ray exchange to reduce memory (but MPI communication increases)
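
One way to picture the trade-off in the last bullet (an illustrative sketch, not the authors' code; the budget value and the flush hook are assumptions): cap the per-process ray buffer and trigger an exchange whenever the cap is reached, so a smaller cap means less ray memory but more MPI traffic.

```cpp
// Per-process ray buffer that flushes (exchanges rays) when a memory budget
// is exceeded: smaller budget -> less ray memory, more frequent MPI exchange.
#include <cstddef>
#include <vector>

struct Ray { float org[3], dir[3]; int dst_rank; };

class RayQueue {
public:
    explicit RayQueue(std::size_t budget_bytes) : budget_(budget_bytes) {}

    template <class ExchangeFn>
    void push(const Ray& r, ExchangeFn exchange) {
        rays_.push_back(r);
        if (rays_.size() * sizeof(Ray) >= budget_) {  // budget hit: exchange now
            exchange(rays_);                          // e.g. the sendrecv loop above
            rays_.clear();
        }
    }
private:
    std::size_t budget_;
    std::vector<Ray> rays_;
};

int main() {
    RayQueue queue(100ull << 20);   // ~100 MB ray budget per process (slide's lower figure)
    queue.push(Ray{{0,0,0},{0,0,1},0}, [](std::vector<Ray>&) { /* exchange rays */ });
    return 0;
}
```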

  34. Agenda • Motivation and Challenges • Architecture • Result • Conclusion • Future work

  35. Conclusion
  • An out-of-core, massively parallel raytracing architecture
  • Confirmed that it works up to 8,192 MPI processes
  • Memory and MPI are the bottlenecks in a massive environment
  • Need to find a new approach for 10k+ compute units

  36. Agenda • Motivation and Challenges • Architecture • Result • Conclusion • Future work

  37. Future work
  • Simulate, then visualize: a possible architecture for the Exa era
  • Porting to the K computer: initial trial succeeded; K-specific optimizations remain
  • Integrate fully into the LSV framework (partially integrated already)

  38. Acknowledgements
  • LSV team
  • Riken (RICC cluster, K computer)
  • FOCUS for the x86 cluster
  • Simon Premoze

  39. References
  • Matt Pharr, Craig Kolb, Reid Gershbein, and Pat Hanrahan. Rendering Complex Scenes with Memory-Coherent Ray Tracing. Proc. SIGGRAPH 1997.
  • Johannes Hanika, Alexander Keller, and Hendrik P. A. Lensch. Two-Level Ray Tracing with Reordering for Highly Complex Scenes. Graphics Interface 2010: 145-152.
  • Kirill Garanzha, Alexander Bely, Simon Premoze, and Vladimir Galaktionov. Out-of-Core GPU Ray Tracing of Complex Scenes. Technical talk at SIGGRAPH 2011.

  40. Thank you! • syoyo@lighttransport.com
