Lecture 9 - GPU Ray Tracing (2)

INFOMAGR – Advanced Graphics
Jacco Bikker - November 2018 - February 2019
Lecture 9 - “GPU Ray Tracing (2)”
Welcome!
\[ I(x, x') = g(x, x') \left[ \epsilon(x, x') + \int_S \rho(x, x', x'') \, I(x', x'') \, dx'' \right] \]

Today’s Agenda:
▪ Lecture 8 – Loose Ends
▪ State of the Art
▪ Wavefront Path Tracing
▪ Random Numbers

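Advanced Graphics – Variance Reduction
Lecture 8
Incoming direct light:
\[ \int_{\Omega} L_d(x, \omega_i) \cos\theta_i \, d\omega_i \approx \frac{2\pi}{N} \sum_{i=1}^{N} L_d(x, \Omega_i) \cos\theta_i \]
The integrand is non-zero only in directions where a light is visible from x, so the integral reduces to the ranges A..B and C..D:
\[ \int_{\Omega} L_d(x, \omega_i) \cos\theta_i \, d\omega_i = \int_{A..B} L_d(x, \omega_i) \cos\theta_i \, d\omega_i + \int_{C..D} L_d(x, \omega_i) \cos\theta_i \, d\omega_i \]
[Figure: point x with its hemisphere running from −½π to +½π; the angular ranges A..B and C..D are marked.]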
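The estimator on this slide can be checked numerically. The following is a minimal sketch, not code from the lecture: `estimateDirectLight`, `Vec3` and the callback `Ld` are hypothetical names, the normal is assumed to be (0,0,1), and radiance is a scalar. It draws N uniform directions on the hemisphere and applies the 2π/N weight.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <random>

struct Vec3 { double x, y, z; };

// Estimates the hemisphere integral of Ld(w) * cos(theta) around the normal
// (0,0,1) using N uniformly distributed directions (pdf = 1 / (2*pi) each).
double estimateDirectLight(const std::function<double(const Vec3&)>& Ld,
                           int N, unsigned seed = 1) {
    assert(N > 0);
    const double PI = 3.14159265358979323846;
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double sum = 0.0;
    for (int i = 0; i < N; ++i) {
        // Uniform hemisphere sampling: z = cos(theta) is uniform in [0,1).
        double z = uniform(rng);
        double phi = 2.0 * PI * uniform(rng);
        double r = std::sqrt(1.0 - z * z);
        Vec3 w{ r * std::cos(phi), r * std::sin(phi), z };
        sum += Ld(w) * z;  // Ld * cos(theta)
    }
    return 2.0 * PI / N * sum;  // divide the average by the pdf 1 / (2*pi)
}
```

With constant radiance `Ld = 1` the exact integral is π, so the estimate should approach π as N grows. A radiance function that is zero outside a small cone converges more slowly, which is the motivation for sampling only the ranges covered by lights.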

Next Event Estimation
Per surface interaction, we trace two random rays.
▪ Ray A returns (via point x) the energy reflected by y (estimates indirect light for x).
▪ Ray B returns the direct illumination on point x (estimates direct light on x).
▪ Ray C returns the direct illumination on point y, which will reach the sensor via ray A.
▪ Ray D leaves the scene.
[Figure: rays A and B leave point x; rays C and D leave point y.]

Next Event Estimation

Color Sample( Ray ray )
{
    // trace ray
    I, N, material = Trace( ray );
    // terminate if ray left the scene
    if (ray.NOHIT) return BLACK;
    // terminate if we hit a light source
    if (material.isLight) return BLACK;
    BRDF = material.albedo / PI;
    // sample a random light source
    L, Nl, dist, A = RandomPointOnLight();
    Ray lr( I, L, dist );
    Ld = BLACK; // stays black if the light faces away or is occluded
    if (N∙L > 0 && Nl∙-L > 0) if (!Trace( lr ))
    {
        solidAngle = ((Nl∙-L) * A) / (dist * dist);
        Ld = lightColor * solidAngle * BRDF * (N∙L);
    }
    // continue random walk
    R = DiffuseReflection( N );
    Ray r( I, R );
    Ei = Sample( r ) * (N∙R);
    return PI * 2.0f * BRDF * Ei + Ld;
}
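The `solidAngle` and `Ld` lines of the pseudocode can be isolated in a small function. This is a sketch under assumptions: `directLight`, `Vec3`, `sub` and `dot` are hypothetical helpers, colors are reduced to scalars, and the shadow ray is assumed to be unoccluded (the `Trace( lr )` test is left out).

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
static Vec3 sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Direct light at surface point I (unit normal N) from point P on an area
// light with unit normal Nl and area A. Assumes nothing blocks the path.
double directLight(Vec3 I, Vec3 N, Vec3 P, Vec3 Nl,
                   double A, double lightColor, double albedo) {
    const double PI = 3.14159265358979323846;
    Vec3 toLight = sub(P, I);
    double dist2 = dot(toLight, toLight);
    assert(dist2 > 0.0);
    double dist = std::sqrt(dist2);
    Vec3 L{ toLight.x / dist, toLight.y / dist, toLight.z / dist };
    double cosSurface = dot(N, L);   // N . L
    double cosLight = -dot(Nl, L);   // Nl . -L
    if (cosSurface <= 0.0 || cosLight <= 0.0) return 0.0;  // facing away
    double solidAngle = cosLight * A / dist2;  // area projected onto the hemisphere
    double brdf = albedo / PI;                 // diffuse BRDF
    return lightColor * solidAngle * brdf * cosSurface;
}
```

For a unit-area light two units straight above an upward-facing white surface, the solid angle is 1/4 and the result is 0.25/π. Flipping the light normal away from the surface yields zero.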

Next Event Estimation
Some vertices require special attention:
▪ If the first vertex after the camera is emissive, its energy can’t be reflected to the camera.
▪ For specular surfaces, the BRDF to a light is always 0.
Since a light ray doesn’t make sense for specular vertices, we will include emission from a vertex directly following a specular vertex. The same goes for the first vertex after the camera: if it is emissive, we also include its emission.
This means we need to keep track of the type of the previous vertex during the random walk.

Next Event Estimation

// camera rays are traced with lastSpecular = true
Color Sample( Ray ray, bool lastSpecular )
{
    // trace ray
    I, N, material = Trace( ray );
    // terminate if ray left the scene
    if (ray.NOHIT) return BLACK;
    // terminate if we hit a light source; count its emission only if the
    // previous vertex could not sample the light directly
    if (material.isLight)
    {
        if (lastSpecular) return material.emissive; else return BLACK;
    }
    BRDF = material.albedo / PI;
    // sample a random light source
    L, Nl, dist, A = RandomPointOnLight();
    Ray lr( I, L, dist );
    Ld = BLACK; // stays black if the light faces away or is occluded
    if (N∙L > 0 && Nl∙-L > 0) if (!Trace( lr ))
    {
        solidAngle = ((Nl∙-L) * A) / (dist * dist);
        Ld = lightColor * solidAngle * BRDF * (N∙L);
    }
    // continue random walk; this bounce is diffuse, so lastSpecular is false
    R = DiffuseReflection( N );
    Ray r( I, R );
    Ei = Sample( r, false ) * (N∙R);
    return PI * 2.0f * BRDF * Ei + Ld;
}

Today’s Agenda:
▪ Lecture 8 – Loose Ends
▪ State of the Art
▪ Wavefront Path Tracing
▪ Random Numbers

Advanced Graphics – GPU Ray Tracing (2)
STAR
Previously in Advanced Graphics: A Brief History of GPU Ray Tracing
2002: Purcell et al., multi-pass shaders with stencil, grid, low efficiency
2005: Foley & Sugerman, kD-tree, stack-less traversal with kdrestart
2007: Horn et al., kD-tree with short stack, single pass with flow control
2007: Popov et al., kD-tree with ropes
2007: Günther et al., BVH with packets.
▪ The use of BVHs allowed for complex scenes on the GPU (millions of triangles);
▪ CPU is now outperformed by the GPU;
▪ GPU compute potential is not realized;
▪ Aspects that affect efficiency are poorly understood.

Understanding the Efficiency of Ray Traversal on GPUs*
Observations on BVH traversal:
▪ Ray/scene intersection consists of an unpredictable sequence of node traversal and primitive intersection operations. This is a major cause of inefficiency on the GPU.
▪ Random access of the scene gives ray tracing a high bandwidth requirement.
▪ BVH packet traversal as proposed by Günther et al. should alleviate bandwidth strain and yield near-optimal performance.
Packet traversal doesn’t yield near-optimal performance. Why not?
*: Understanding the Efficiency of Ray Traversal on GPUs, Aila & Laine, 2009, and: Understanding the Efficiency of Ray Traversal on GPUs – Kepler and Fermi Addendum, 2012.

Understanding the Efficiency of Ray Traversal on GPUs
Simulator:
1. Dump sequence of traversal, leaf and triangle intersection operations required for each ray.
2. Use generated GPU assembly code to obtain a sequence of instructions that need to be executed for each ray.
3. Execute this sequence assuming ideal circumstances:
   ▪ Execute two instructions in parallel;
   ▪ Make memory access ‘free’.
The simulator reports on estimated execution speed and SIMD efficiency.
➔ The same program running on an actual GPU can never do better;
➔ The simulator provides an upper bound on performance.

Understanding the Efficiency of Ray Traversal on GPUs
Test setup
Scene: “Conference”, 282K tris, 164K nodes
Ray distributions:
1. Primary: coherent rays
2. AO: short divergent rays
3. Diffuse: long divergent rays
Hardware: NVIDIA GTX285.

Understanding the Efficiency of Ray Traversal on GPUs
Simulator, results, in MRays/s:
Packet traversal as proposed by Günther et al. is a factor 1.7-2.4 off from simulated performance:

            Simulated   Actual   % of simulated
Primary       149.2      63.6         43
AO            100.7      39.4         39
Diffuse        36.7      16.6         45

(this does not take into account algorithmic inefficiencies)
Hardware: NVIDIA GTX285.

Simulating Alternative Traversal Loops
Variant 1: ‘while-while’

    while ray not terminated
        while node is interior node
            traverse to the next node
        while node contains untested primitives
            perform ray/prim intersection

Here, every ray has its own stack. This is simply a GPU implementation of typical CPU BVH traversal.
Compared to packet traversal, memory access is less coherent. One would expect a larger gap between simulated and actual performance. However, this is not the case (not even for divergent rays).
Conclusion: bandwidth is not the problem.

Results, in MRays/s (right-hand columns: packet traversal, Günther-style, from the previous slide):

            while-while                 packet traversal
            Simulated   Actual    %     Simulated   Actual    %
Primary       166.7      88.0    53       149.2      63.6    43
AO            160.7      86.3    54       100.7      39.4    39
Diffuse        81.4      44.5    55        36.7      16.6    45

Hardware: NVIDIA GTX285.

Simulating Alternative Traversal Loops
Variant 2: ‘if-if’

    while ray not terminated
        if node is interior node
            traverse to the next node
        if node contains untested primitives
            perform a ray/prim intersection

This time, each loop iteration either executes a traversal step or a primitive intersection.
Memory access is even less coherent in this case. Nevertheless, it is faster than while-while. Why?
While-while leads to a small number of long-running warps. Some threads stall while others are still traversing, after which they stall again while others are still intersecting.

Results, in MRays/s (right-hand columns: while-while):

            if-if                       while-while
            Simulated   Actual    %     Simulated   Actual    %
Primary       129.3      90.1    70       166.7      88.0    53
AO            131.6      88.8    67       160.7      86.3    54
Diffuse        70.5      45.3    64        81.4      44.5    55

Hardware: NVIDIA GTX285.
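The two loop shapes, while-while and if-if, can be compared side by side on a toy problem. The sketch below is illustrative only and makes strong simplifications: the "BVH" is one-dimensional (bounds are intervals, primitives are points, a "ray" is a query range), and all names (`Node`, `Query`, `countHitsWhileWhile`, `countHitsIfIf`, the demo scene) are hypothetical. Both functions visit the same nodes and return the same hit count; only the control flow differs.

```cpp
#include <cassert>
#include <vector>

// Toy 1D stand-in for a BVH: bounds are intervals, primitives are points.
struct Node {
    float lo, hi;      // bounds of everything below this node
    int left, right;   // child node indices (interior nodes)
    int first, count;  // primitive range (leaves have count > 0)
    bool isLeaf() const { return count > 0; }
};
struct Query { float a, b; };  // stand-in for a ray
const int NONE = -1;

static bool overlaps(const Node& n, Query q) { return n.lo <= q.b && q.a <= n.hi; }
static bool inside(float p, Query q) { return q.a <= p && p <= q.b; }
static int pop(const int* stack, int& sp) { return sp > 0 ? stack[--sp] : NONE; }

// One traversal step from an interior node: returns the next node to visit.
static int traverse(const std::vector<Node>& nodes, int node, Query q,
                    int* stack, int& sp) {
    const Node& n = nodes[node];
    bool l = overlaps(nodes[n.left], q), r = overlaps(nodes[n.right], q);
    if (l && r) { assert(sp < 64); stack[sp++] = n.right; return n.left; }
    if (l) return n.left;
    if (r) return n.right;
    return pop(stack, sp);
}

int countHitsWhileWhile(const std::vector<Node>& nodes,
                        const std::vector<float>& prims, Query q) {
    int stack[64], sp = 0, hits = 0;
    int node = overlaps(nodes[0], q) ? 0 : NONE;
    while (node != NONE) {                             // ray not terminated
        while (node != NONE && !nodes[node].isLeaf())  // descend to a leaf
            node = traverse(nodes, node, q, stack, sp);
        if (node == NONE) break;
        int i = nodes[node].first, end = i + nodes[node].count;
        while (i < end)                                // test the whole leaf
            if (inside(prims[i++], q)) hits++;
        node = pop(stack, sp);
    }
    return hits;
}

int countHitsIfIf(const std::vector<Node>& nodes,
                  const std::vector<float>& prims, Query q) {
    int stack[64], sp = 0, hits = 0;
    int node = overlaps(nodes[0], q) ? 0 : NONE;
    int next = NONE;  // next untested primitive of the current leaf
    while (node != NONE) {
        if (!nodes[node].isLeaf())                     // at most one traversal step
            node = traverse(nodes, node, q, stack, sp);
        if (node != NONE && nodes[node].isLeaf()) {    // at most one primitive test
            const Node& n = nodes[node];
            if (next == NONE) next = n.first;
            if (inside(prims[next], q)) hits++;
            if (++next == n.first + n.count) { node = pop(stack, sp); next = NONE; }
        }
    }
    return hits;
}

// Hypothetical demo scene: six points stored in three leaves.
std::vector<float> demoPrims() { return { 1.0f, 2.0f, 4.0f, 7.0f, 8.0f, 9.0f }; }
std::vector<Node> demoNodes() {
    return {
        { 1.0f, 9.0f, 1, 2, 0, 0 },        // 0: root
        { 1.0f, 4.0f, NONE, NONE, 0, 3 },  // 1: leaf with points 1, 2, 4
        { 7.0f, 9.0f, 3, 4, 0, 0 },        // 2: interior
        { 7.0f, 7.0f, NONE, NONE, 3, 1 },  // 3: leaf with point 7
        { 8.0f, 9.0f, NONE, NONE, 4, 2 },  // 4: leaf with points 8, 9
    };
}
```

On a CPU the two variants behave the same. The difference discussed on the slides only appears when many rays run in lockstep in one warp: in the if-if form a thread is never held in one inner loop while its neighbours finish the other.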

Simulating Alternative Traversal Loops
Variant 3: ‘persistent while-while’
Idea: rather than spawning a thread per ray, we spawn the ideal number of threads for the hardware.
Each thread increases an atomic counter to fetch a ray from a pool, until the pool is depleted*.
Benefit: we bypass the hardware thread scheduler.
This test shows what the limiting factor was: thread scheduling. By handling this explicitly, we get much closer to theoretical optimal performance.

Results, in MRays/s (right-hand columns: if-if):

            persistent while-while      if-if
            Simulated   Actual    %     Simulated   Actual    %
Primary       166.7     135.6    81       129.3      90.1    70
AO            160.7     130.7    81       131.6      88.8    67
Diffuse        81.4      62.4    77        70.5      45.3    64

*: In practice, this is done per warp: the first thread in the warp increases the counter by 32. This reduces the number of atomic operations.
Hardware: NVIDIA GTX285.
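The ray pool with an atomic counter can be sketched on the CPU. This is an analogy, not GPU code and not the implementation from the paper: `tracePool` is a hypothetical name, each `std::thread` plays the role of one persistent warp, and "tracing" a ray is replaced by counting how often it was fetched. As in the footnote, one atomic add reserves a batch of 32 rays.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Launches a fixed number of workers that fetch batches of ray indices from
// a shared atomic counter until the pool is depleted. Returns, per ray, how
// many times it was fetched (expected: exactly once).
std::vector<int> tracePool(int rayCount, int workerCount, int batch = 32) {
    assert(rayCount >= 0 && workerCount > 0 && batch > 0);
    std::vector<int> timesTraced(rayCount, 0);
    std::atomic<int> nextRay{0};

    auto worker = [&]() {
        for (;;) {
            // One atomic operation reserves a whole batch, like the first
            // thread of a warp adding 32 to the counter.
            int first = nextRay.fetch_add(batch);
            if (first >= rayCount) break;  // pool depleted
            int last = std::min(first + batch, rayCount);
            for (int i = first; i < last; ++i)
                timesTraced[i]++;          // stand-in for tracing ray i
        }
    };

    std::vector<std::thread> workers;
    for (int w = 0; w < workerCount; ++w) workers.emplace_back(worker);
    for (auto& t : workers) t.join();
    return timesTraced;
}
```

Because every batch is handed out by a single `fetch_add`, no two workers receive the same index, so the per-ray writes need no further synchronization. The worker count stays fixed regardless of how many rays are in the pool, which is the point of the persistent variant.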
