Porting Maxwell to the GPU Top Challenges Juan Cañada Head of Visualization Next Limit Technologies

Agenda - Maxwell overview - Why porting to the GPU was challenging - Performance considerations - Using the CPU to improve the GPU engine - Summary

Maxwell Overview Visualization Fluids Physics

Maxwell Overview • First physically based renderer on the market (2004) • Ground-truth reference renderer • Predictive rendering tool • Light analysis tool

Maxwell in use - Animation & VFX - Architecture - Industrial Design - Science - Others

Challenges • Keep pixel accuracy • Use GPU for predictive rendering • Improve performance • Spectral, unbiased, accurate PBR • Support CPU & GPU resuming & merging • …

Predictive Rendering

Fast ☺ Correct Correct  Fast

Maxwell GPU Architecture (diagram) • Geometry • Voxelization • Thread Mapping • Ray Generation • Ray Sorting • Ray Tracing • Direct Light • Visibility Test • Materials Evaluation • TM?

GPU Maxwell • Voxelization • Same Voxelization system as the CPU render • Currently performed on the CPU just once • BVH • Binary tree (each node has 2 children) • Coherent traversal + All threads fetch same amount of data / node + Increased coherence in performance - Trees become bigger
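To make the "binary tree, two children per node" point concrete, here is a minimal CPU-side sketch of a binary BVH stored as a flat list of same-shaped nodes and traversed with an explicit stack. It is an illustration only, not Maxwell's implementation: the `Node` layout, the median split on the widest axis, and the names `build`, `hits_box`, and `traverse` are all assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class Node:
    lo: tuple    # AABB min corner
    hi: tuple    # AABB max corner
    left: int    # index of left child, -1 for a leaf
    right: int   # index of right child, -1 for a leaf
    prim: int    # primitive index for a leaf, -1 for an interior node

def build(boxes, ids, nodes):
    """Build a binary BVH over boxes[ids]; returns the index of the new node."""
    lo = tuple(min(boxes[i][0][a] for i in ids) for a in range(3))
    hi = tuple(max(boxes[i][1][a] for i in ids) for a in range(3))
    index = len(nodes)
    nodes.append(Node(lo, hi, -1, -1, -1))
    if len(ids) == 1:
        nodes[index].prim = ids[0]
        return index
    axis = max(range(3), key=lambda a: hi[a] - lo[a])  # split the widest axis
    ids = sorted(ids, key=lambda i: boxes[i][0][axis] + boxes[i][1][axis])
    mid = len(ids) // 2                                # median split (assumption)
    nodes[index].left = build(boxes, ids[:mid], nodes)
    nodes[index].right = build(boxes, ids[mid:], nodes)
    return index

def hits_box(origin, direction, lo, hi):
    """Slab test: does the ray (t >= 0) overlap the axis-aligned box?"""
    t0, t1 = 0.0, float("inf")
    for a in range(3):
        if direction[a] == 0.0:
            if not (lo[a] <= origin[a] <= hi[a]):
                return False
            continue
        ta = (lo[a] - origin[a]) / direction[a]
        tb = (hi[a] - origin[a]) / direction[a]
        t0 = max(t0, min(ta, tb))
        t1 = min(t1, max(ta, tb))
    return t0 <= t1

def traverse(nodes, origin, direction):
    """Return the sorted primitive indices whose boxes the ray overlaps."""
    stack, found = [0], []
    while stack:
        node = nodes[stack.pop()]
        if not hits_box(origin, direction, node.lo, node.hi):
            continue
        if node.prim >= 0:
            found.append(node.prim)
        else:
            stack.extend((node.left, node.right))  # always exactly two children
    return found and sorted(found)
```

Because every node has the same fields and every interior node has exactly two children, each traversal step reads the same amount of data, which is the property the slide credits for coherent traversal. The cost is also visible here: with one primitive per leaf, n primitives produce 2n - 1 nodes.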

GPU Maxwell • Thread Mapping • Module that manages THREAD / PIXEL mapping • Sampling Level (SL) • Low: Morton Curve • Medium: Balances SPP • High: Uses Variance (figure: Morton Curve)
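As an illustration of a Morton-curve thread/pixel mapping, the sketch below interleaves the bits of x and y so that consecutive thread indices land on nearby pixels. It uses the standard bit-interleaving construction for 16-bit coordinates; the function names are hypothetical, and the slides do not say how Maxwell implements its mapping.

```python
def part1by1(v):
    """Spread the low 16 bits of v so that a zero bit follows each original bit."""
    v &= 0x0000FFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v

def compact1by1(v):
    """Inverse of part1by1: keep every other bit and pack them together."""
    v &= 0x55555555
    v = (v | (v >> 1)) & 0x33333333
    v = (v | (v >> 2)) & 0x0F0F0F0F
    v = (v | (v >> 4)) & 0x00FF00FF
    v = (v | (v >> 8)) & 0x0000FFFF
    return v

def morton_encode(x, y):
    """Pixel (x, y) -> Morton index; x takes the even bits, y the odd bits."""
    return part1by1(x) | (part1by1(y) << 1)

def morton_decode(code):
    """Morton index -> pixel (x, y)."""
    return compact1by1(code), compact1by1(code >> 1)

def thread_to_pixel(thread_id):
    """Hypothetical mapping: thread i renders the i-th pixel along the curve."""
    return morton_decode(thread_id)
```

With this mapping, 32 consecutive thread indices cover a compact 8 x 4 pixel tile rather than a 32-pixel scanline strip, so neighboring threads tend to trace neighboring rays.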

GPU Maxwell • Ray Generation Module • Primary Rays (PR) • Rays shot from camera • High degree of coherence • Two neighboring rays will hit nearby, similar objects • Secondary Rays (SR) • Rays shot from surfaces • No coherence • Two neighboring rays might hit different objects

GPU Maxwell • Ray Generation Module • Thread blocks with just PR • High degree of coherence • Best performance situation • Thread blocks with just SR • All will take much more time than PR • The worst SR will drive the performance • Thread blocks with PR and SR • SR will hurt PR performance
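The three cases above can be reasoned about with a toy cost model in which a thread block finishes only when its slowest ray finishes. This is a deliberate simplification of lockstep GPU execution, and the per-ray costs used with it are made-up numbers, not measurements from Maxwell.

```python
def block_cost(ray_costs):
    """Toy model: a block runs as long as its slowest ray."""
    return max(ray_costs)

def total_cost(ray_costs, block_size):
    """Sum of block costs when rays are packed into blocks in the given order."""
    blocks = [ray_costs[i:i + block_size]
              for i in range(0, len(ray_costs), block_size)]
    return sum(block_cost(block) for block in blocks)

# Hypothetical per-ray costs (arbitrary units).
pr_only = [1, 1, 1, 1]   # coherent primary rays: best case
sr_only = [3, 5, 4, 9]   # the worst secondary ray (9) sets the block time
mixed = [1, 1, 9, 1]     # a single SR makes the three PRs wait
```

In this model the mixed block costs as much as the secondary-ray block even though three of its four rays are cheap, which is the sense in which SR hurts PR performance.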

GPU Maxwell • Ray Generation Module • How do we handle it? • GPU Ray sorting by Ray Type • Unsorted: PR 0 PR 1 SR 0 PR 2 SR 1 PR 3 SR 2 PR 4 • Sorted: PR 0 PR 1 PR 2 PR 3 PR 4 SR 0 SR 1 SR 2

GPU Maxwell • Ray Generation Module • How do we handle it? • GPU Ray sorting by Ray Type • Sorting is really fast • Simple, yet powerful • Do it just after 2nd bounce • Not needed for PR • Performance boost is scene dependent

GPU Maxwell • Ray Generation Module • How do we handle it? • GPU Ray sorting by Ray Type • Considerations • Not useful for medium to small-res images • Use an indirection buffer • Cleaner code • Avoids moving global data • Much better performance
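A minimal sketch of sorting by ray type through an indirection buffer: instead of moving ray records, a stable counting sort produces a list of ray indices with all PR first and all SR after. The constants `PR` and `SR` and the function name are assumptions for this example; on a GPU the same idea would typically be expressed as a parallel key sort or partition.

```python
PR, SR = 0, 1  # hypothetical ray-type keys

def sort_by_ray_type(ray_types):
    """Return an indirection buffer: ray indices grouped by type, PR first.

    Stable counting sort over two keys; the ray data itself is never moved.
    """
    counts = [0, 0]
    for t in ray_types:
        counts[t] += 1
    offsets = [0, counts[PR]]          # first output slot for each type
    order = [0] * len(ray_types)
    for index, t in enumerate(ray_types):
        order[offsets[t]] = index
        offsets[t] += 1
    return order

# The ray stream from the slide: PR 0 PR 1 SR 0 PR 2 SR 1 PR 3 SR 2 PR 4
labels = ["PR 0", "PR 1", "SR 0", "PR 2", "SR 1", "PR 3", "SR 2", "PR 4"]
types = [PR, PR, SR, PR, SR, PR, SR, PR]
order = sort_by_ray_type(types)
sorted_labels = [labels[i] for i in order]
```

Later stages read ray `order[k]` instead of ray `k`, so only small integer indices are written during the sort while the larger per-ray records stay where they are in global memory.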

GPU Maxwell • Ray Tracing Module • GPU architecture dependent kernels • Fermi, Kepler, Maxwell • Use every architecture's strengths

GPU Maxwell • Direct Light Module 1. Sample scene emitters at each path node • Two strategies • Sample 1 random emitter / sample • Sample all emitters / sample 2. Visibility test • Trace shadow rays • Incoherent rays → Ray sorting does not help 3. Many other optimizations
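The two emitter-sampling strategies can be sketched as follows. The example assumes uniform emitter selection, so the one-emitter estimate is divided by the pick probability 1/N to stay unbiased; the slides do not state which selection probabilities Maxwell uses. The `power` field, the `is_visible` callback standing in for the shadow-ray test, and all function names are hypothetical.

```python
import random

def direct_light_all(emitters, contribution):
    """Sample all emitters: sum every emitter's contribution at this path node."""
    return sum(contribution(e) for e in emitters)

def direct_light_one(emitters, contribution, rng):
    """Sample one random emitter: pick uniformly, then divide by the pick
    probability 1/N (multiply by N) so the expected value matches the sum."""
    emitter = rng.choice(emitters)
    return contribution(emitter) * len(emitters)

def make_contribution(is_visible):
    """Hypothetical contribution: the emitter's power if the shadow ray
    reaches it, zero if it is occluded."""
    def contribution(emitter):
        return emitter["power"] if is_visible(emitter) else 0.0
    return contribution
```

Sampling one emitter traces a single shadow ray per path node but adds noise; sampling all emitters traces N shadow rays and removes the selection noise. Averaged over many samples, both converge to the same value.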

GPU Maxwell • Materials Evaluation Module • Maxwell materials are complex • Many layers and many BSDFs / layer → very generic

GPU Maxwell • Materials Evaluation Module • Big kernels are harmful • Samples evaluating different materials • Access different data • Execute different code

GPU Maxwell • Materials Evaluation Module • Materials Group Queue System (MGQS) 1. Every material is assigned a Material Group ID 2. Queue system for Material Groups (MG) 3. Every queue has specific kernels (Avoid big kernels) 4. Samples are queued to the corresponding MG Queue 5. All samples evaluating the same MG are executed together • Increased coherence in execution time • Increased coherence in data access
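The five MGQS steps can be sketched on the CPU as a dictionary of queues keyed by Material Group ID, with one small kernel per group. Everything concrete here is an assumption for illustration: the material names, the group IDs, the sample record layout, and the kernels, which are plain Python callables standing in for specialized GPU kernels.

```python
from collections import defaultdict

def enqueue_by_group(samples, material_group):
    """Steps 1, 2 and 4: look up each sample's Material Group ID and append
    the sample to that group's queue."""
    queues = defaultdict(list)
    for sample in samples:
        queues[material_group[sample["material"]]].append(sample)
    return queues

def evaluate(queues, kernels):
    """Steps 3 and 5: run each group's own kernel over its whole queue, so
    samples that share code and data are processed together."""
    results = {}
    for group_id, queue in sorted(queues.items()):
        kernel = kernels[group_id]
        for sample in queue:
            results[sample["id"]] = kernel(sample)
    return results

# Hypothetical scene data.
material_group = {"glass": 0, "water": 0, "car_paint": 1}
kernels = {
    0: lambda s: "dielectric:" + s["material"],
    1: lambda s: "layered:" + s["material"],
}
samples = [
    {"id": 0, "material": "glass"},
    {"id": 1, "material": "car_paint"},
    {"id": 2, "material": "water"},
    {"id": 3, "material": "car_paint"},
]
queues = enqueue_by_group(samples, material_group)
results = evaluate(queues, kernels)
```

Each kernel only needs the code paths for its own group, which is how the queue system avoids one large kernel that handles every layer and BSDF combination.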
