
RTX-RSim: Accelerated Vulkan Room Response Simulation for Time-of-Flight Imaging
Peter Thoman, Markus Wippler, Robert Hranitzky, and Thomas Fahringer
peter.thoman@uibk.ac.at
IWOCL 2020

Background and Motivation

The Basic Idea
In room response simulation for time-of-flight imaging, we are interested in computing the propagation of light
- from a light source ($L$)
- through a room (defined by some geometry and surface properties $G$)
- to a sensor array ($S$)
In the real world, $L$ and $S$ are part of a time-of-flight (ToF) camera assembly.

The Goal
- Unlike in e.g. image rendering or lighting computations, the goal of the simulation is to compute a radiosity time series for each geometric primitive
- Based on this time series, which simulates the actual photons received by a ToF camera sensor, scene depth can be reconstructed
- With RSim, since the exact depth is known, different scenes and reconstruction schemes can be easily evaluated
- Use during development of better ToF hardware implementations or software algorithms
[Figure: plot of radiosity r over time t]

Algorithm Overview
1. Read input data, including geometric primitives ($G$), their surface material information ($\rho$), and the initial impulse
2. Pre-computation of the per-triangle area ($A_i$)
3. Mutual signal delay computation, storing the signal delay for each triangle pair ($g_i$, $g_j$) in $\tau_{ij}$
4. Mutual visibility computation, evaluating the energy transfer between each triangle pair stochastically and storing it in $K_{ij}$
5. For each timestep $t \in [0, T)$: propagate radiosity, computing $\mathrm{rad}_{t,i}$ for each triangle $g_i$ in all pairs ($g_i$, $g_j$) based on $K_{ij}$ and $\mathrm{rad}_{t-1,i}$
6. Compute the distance from the light/sensor position to each triangle $g_i$, based on $\mathrm{rad}_{[0,T),i}$

Algorithm Performance and Data Requirement Analysis

Algorithm Steps
1. Input data preparation
2. Pre-compute $A_i$
3. Pre-compute $\tau_{ij}$
4. Mutual visibility computation → $K_{ij}$
5. Radiosity propagation → $\mathrm{rad}_{[0,T),i}$
6. Compute distance
Analyse the time complexity of each step of the algorithm.

Algorithm Steps: 1. Input Data Preparation and 2. Pre-compute $A_i$
Steps 1 and 2 iterate over $N$ triangles, with simple I/O operations and an area computation for each element. Readily identified as $O(N)$ complexity.
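As an illustration of why step 2 is linear in the number of triangles, the following minimal sketch computes one area per triangle from an indexed vertex list. The function names and the tiny two-triangle scene are hypothetical and not taken from RSim.

```python
import math

def triangle_area(a, b, c):
    """Area of the 3D triangle (a, b, c): half the norm of the edge cross product."""
    ux, uy, uz = b[0] - a[0], b[1] - a[1], b[2] - a[2]
    vx, vy, vz = c[0] - a[0], c[1] - a[1], c[2] - a[2]
    cx = uy * vz - uz * vy
    cy = uz * vx - ux * vz
    cz = ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

def precompute_areas(vertices, indices):
    """One constant-time area computation per triangle, so O(N) overall."""
    return [triangle_area(vertices[i0], vertices[i1], vertices[i2])
            for (i0, i1, i2) in indices]

# Hypothetical scene: a unit square split into two triangles.
vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
indices = [(0, 1, 2), (1, 3, 2)]
areas = precompute_areas(vertices, indices)
```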

Algorithm Steps: 3. Pre-compute $\tau_{ij}$
Computing the propagation delay for each pair of triangles → $O(N^2)$
However, the fixed factor is low, and compared to the remaining phases, even $O(N^2)$ complexity is largely negligible.
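The per-pair work in step 3 is small, which is consistent with the low fixed factor mentioned above. The sketch below assumes (this is not stated in the slides) that the delay is measured between triangle centroids and discretised to whole timesteps of length `dt`; `signal_delay_steps` is a hypothetical name.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def centroid(tri):
    """Centroid of a triangle given as three (x, y, z) vertices."""
    return tuple(sum(v[k] for v in tri) / 3.0 for k in range(3))

def signal_delay_steps(tri_a, tri_b, dt):
    """Delay between two triangles in timesteps (assumption: centroid distance)."""
    distance = math.dist(centroid(tri_a), centroid(tri_b))
    return round(distance / (SPEED_OF_LIGHT * dt))

# Hypothetical pair: the second triangle is the first one shifted along x.
shift = 2.99792458  # metres
tri_a = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
tri_b = [(x + shift, y, z) for (x, y, z) in tri_a]
delay = signal_delay_steps(tri_a, tri_b, dt=1e-9)  # 1 ns timesteps
```

Because each delay costs only a handful of arithmetic operations, recomputing it on demand is a plausible alternative to storing an $N^2$ table.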

Algorithm Steps: 4. Mutual Visibility Computation → $K_{ij}$
Stochastically evaluate the visibility between every pair of triangles. A naïve implementation requires a ray-triangle intersection check against all other triangles in the scene. With $S$ stochastic samples: → $O(N^3 \cdot S)$.
In practice, use a geometric acceleration structure. Current RSim on CPU uses octrees, resulting in a reduction of average-case query complexity from $O(N)$ to $O(\log(N))$.
→ $O(N^2 \cdot \log(N) \cdot S)$
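To make the naïve cost concrete, here is a minimal sketch of the brute-force variant for a single pair: each of the `samples` rays is tested against every other triangle, giving the extra factor of $N$ per pair. It estimates only the unoccluded fraction of sample segments, not the full energy transfer term stored in $K_{ij}$, and it uses a standard Möller-Trumbore intersection test; all function names are hypothetical.

```python
import random

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def sample_point(tri, rng):
    """Uniform random point on a triangle (square-root parameterisation)."""
    s, r2 = rng.random() ** 0.5, rng.random()
    w = (1.0 - s, s * (1.0 - r2), s * r2)
    return tuple(sum(w[m] * tri[m][k] for m in range(3)) for k in range(3))

def segment_hits_triangle(p, q, tri, eps=1e-9):
    """Moeller-Trumbore test restricted to the open segment from p to q."""
    d = sub(q, p)
    e1, e2 = sub(tri[1], tri[0]), sub(tri[2], tri[0])
    h = cross(d, e2)
    det = dot(e1, h)
    if abs(det) < eps:
        return False  # segment is parallel to the triangle plane
    inv = 1.0 / det
    s = sub(p, tri[0])
    u = dot(s, h) * inv
    if u < 0.0 or u > 1.0:
        return False
    qv = cross(s, e1)
    v = dot(d, qv) * inv
    if v < 0.0 or u + v > 1.0:
        return False
    t = dot(e2, qv) * inv
    return eps < t < 1.0 - eps

def naive_visibility(triangles, i, j, samples, rng):
    """Fraction of sample segments between triangles i and j that are unoccluded."""
    visible = 0
    for _ in range(samples):
        p = sample_point(triangles[i], rng)
        q = sample_point(triangles[j], rng)
        blocked = any(segment_hits_triangle(p, q, tri)
                      for k, tri in enumerate(triangles) if k not in (i, j))
        visible += not blocked
    return visible / samples

# Hypothetical scene: two facing triangles with a large occluder in between.
tri_a = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
tri_b = ((0, 0, 2), (1, 0, 2), (0, 1, 2))
occluder = ((-10, -10, 1), (10, -10, 1), (0, 10, 1))
rng = random.Random(0)
open_fraction = naive_visibility([tri_a, tri_b], 0, 1, 16, rng)
blocked_fraction = naive_visibility([tri_a, tri_b, occluder], 0, 1, 16, rng)
```

An acceleration structure replaces the inner loop over all triangles with a tree traversal, which is where the $\log(N)$ term comes from.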

Algorithm Steps: 5. Radiosity Propagation → $\mathrm{rad}_{[0,T),i}$
Uses signal delay $\tau_{ij}$ and mutual visibility information $K_{ij}$, as well as the previous radiosity up to the currently computed timestep, $\mathrm{rad}_{[0,t),i}$.
For each timestep $t$ and each pair ($g_i$, $g_j$): propagate energy between the triangles in the pair from time $t - \tau_{i,j}$ according to mutual visibility as well as their surface properties.
→ $O(N^2 \cdot T)$
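The loop structure behind the $O(N^2 \cdot T)$ bound can be sketched as follows. This is a simplified model under stated assumptions, not RSim's actual update rule: surface properties are reduced to a single scalar reflectance `rho[i]`, delays are whole timesteps with `tau[i][j] >= 1` for distinct triangles, and the initial impulse is placed in `rad[0]`.

```python
def propagate(rad, K, tau, rho):
    """Fill rad[t][i] for t >= 1 from delayed radiosity of all other triangles.

    rad: T x N list of lists, rad[0] holds the initial impulse
    K:   N x N mutual visibility, tau: N x N delays in timesteps (>= 1 off-diagonal)
    rho: per-triangle scalar reflectance (simplifying assumption)
    """
    steps, n = len(rad), len(K)
    for t in range(1, steps):            # T timesteps
        for i in range(n):               # N receivers
            for j in range(n):           # N senders
                if i == j:
                    continue
                src = t - tau[i][j]      # time at which the energy left g_j
                if src >= 0:
                    rad[t][i] += rho[i] * K[i][j] * rad[src][j]
    return rad

# Hypothetical two-triangle scene with a delay of two timesteps.
K = [[0.0, 0.5], [0.5, 0.0]]
tau = [[0, 2], [2, 0]]
rho = [1.0, 0.5]
rad = [[0.0, 0.0] for _ in range(5)]
rad[0][0] = 1.0                          # impulse on triangle 0 at t = 0
propagate(rad, K, tau, rho)
```

In this example the impulse reaches triangle 1 at t = 2 and returns, attenuated, to triangle 0 at t = 4.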

Algorithm Steps: 6. Compute Distance
Distance computation is usually based on cross-correlation of the radiosity time series.
→ $O(N \cdot T^2)$
$T$ is usually much smaller than $N$, and the fixed factor is very small as well. Usually negligible overall, similar to step 3.
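A direct cross-correlation evaluates up to $T$ lags with up to $T$ products each, which gives the $T^2$ term per triangle. The sketch below is one possible reading of this step: it correlates a hypothetical emitted signal with one triangle's radiosity series and converts the best lag to a distance. Whether the lag corresponds to a one-way or a round-trip path is not stated in the slides, so it is exposed as the `round_trip` parameter.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def best_lag(emitted, received):
    """Lag (in timesteps) maximising the cross-correlation; O(T^2) for length T."""
    steps = len(received)
    best, best_score = 0, float("-inf")
    for lag in range(steps):
        score = sum(emitted[t] * received[t + lag] for t in range(steps - lag))
        if score > best_score:
            best, best_score = lag, score
    return best

def lag_to_distance(lag, dt, round_trip):
    """Convert a lag to metres; round_trip halves the path (assumption)."""
    path = lag * dt * SPEED_OF_LIGHT
    return 0.5 * path if round_trip else path

# Hypothetical series: an impulse at t = 0 and a response peaking at t = 3.
emitted = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
received = [0.0, 0.0, 0.0, 0.2, 0.1, 0.0]
lag = best_lag(emitted, received)
distance = lag_to_distance(lag, dt=1e-9, round_trip=False)
```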

Measured Performance
[Figure: bar chart of relative performance (Small = 1) for the Small, Medium and Large scenes, broken down into Mutual Visibility, Radiosity Simulation and Other]
- Scaling trend matches observations on algorithmic complexity
- Clearly, mutual visibility computation and radiosity simulation are the main priority

Vulkan Raytracing and Compute for Room Response Simulation

Data Management  A Vulkan implementation needs to be massively data-parallel to be efficient  And we are constrained in the amount of data we can store on a GPU  Data-centric view of the algorithm IWOCL 2020 – RTX-RSim 15

Data Management

Contents                        Format                  Size
Triangles ($G$)                 Indexed vertex buffer   $N$
Material information ($\rho$)   3 * FP32                $N$
Raytracing Buffers              Internal / opaque       $O(N)$
Sample Coordinates              2 * FP32                $S$
Mutual Visibility ($K_{ij}$)    FP16                    $N^2$
Radiosity ($\mathrm{rad}$)      4 * FP32                $N \cdot T$
Distance                        FP32                    $N$

- Generally, $S \ll T \ll N$, therefore $K_{ij}$ dominates. → FP16 sufficient!
- Signal delay $\tau_{ij}$ is recomputed instead of stored.
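A quick calculation shows how strongly the mutual visibility buffer dominates. The sketch assumes the table is read as: material 3 * FP32 per triangle, sample coordinates 2 * FP32 per sample, mutual visibility one FP16 per ordered pair, radiosity 4 * FP32 per triangle and timestep, and distance one FP32 per triangle. The vertex buffer and the opaque raytracing buffers are left out because their exact sizes are not given, and the scene dimensions below are hypothetical.

```python
def buffer_sizes_bytes(n, t, s, visibility_bytes=2):
    """Approximate buffer sizes in bytes for n triangles, t timesteps, s samples."""
    fp32 = 4
    return {
        "material": n * 3 * fp32,
        "sample_coordinates": s * 2 * fp32,
        "mutual_visibility": n * n * visibility_bytes,  # FP16 by default
        "radiosity": n * t * 4 * fp32,
        "distance": n * fp32,
    }

# Hypothetical scene size, chosen only to illustrate S << T << N.
sizes = buffer_sizes_bytes(n=50_000, t=1_000, s=64)
sizes_fp32 = buffer_sizes_bytes(n=50_000, t=1_000, s=64, visibility_bytes=4)

for name, size in sizes.items():
    print(f"{name:20s} {size / 1e9:8.3f} GB")
```

With these numbers the FP16 visibility buffer takes 5 GB against 0.8 GB for radiosity, and storing it as FP32 would double it to 10 GB, which suggests why the reduced precision matters on a memory-constrained GPU.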

Hardware Raytracing for Mutual Visibility Input Geometry Top-level AS Descriptor Set Build … … Dataset buff buff [ ] [ ] [ ] Acceleration Shader Binding Table Structures … Operation Bottom-level AS … Raygen Hit … Fixed function Miss GPU operation … GPU data structures Raytracing … RT shader Closest Hit yes Acceleration 𝐿 𝑗𝑘 … RT shader invocation Hit? Ray Generation Structure Traversal Miss no  Schematic representation of HW raytracing process IWOCL 2020 – RTX-RSim 17

Hardware Raytracing for Mutual Visibility Input Geometry Top-level AS Descriptor Set Build … … Dataset buff buff [ ] [ ] [ ] Acceleration Shader Binding Table Structures … Operation Bottom-level AS … Raygen Hit … Fixed function Miss GPU operation … GPU data structures Raytracing … RT shader Closest Hit yes Acceleration 𝐿 𝑗𝑘 … RT shader invocation Hit? Ray Generation Structure Traversal Miss no  Geometry is static  we can optimize AS build for traversal speed rather than build/update performance IWOCL 2020 – RTX-RSim 18

Hardware Raytracing for Mutual Visibility Input Geometry Top-level AS Descriptor Set Build … … Dataset buff buff [ ] [ ] [ ] Acceleration Shader Binding Table Structures … Operation Bottom-level AS … Raygen Hit … Fixed function Miss GPU operation … GPU data structures Raytracing … RT shader Closest Hit yes Acceleration 𝐿 𝑗𝑘 … RT shader invocation Hit? Ray Generation Structure Traversal Miss no  Descriptor Set: our RT shaders require read-only access to 𝐻 , 𝜍 , and the Sample Coordinates buffer, as well as write access to 𝐿 𝑗𝑘  Shaders: only require ray generation and a single hit and miss shader IWOCL 2020 – RTX-RSim 19

Hardware Raytracing for Mutual Visibility Input Geometry Top-level AS Descriptor Set Build … … Dataset buff buff [ ] [ ] [ ] Acceleration Shader Binding Table Structures … Operation Bottom-level AS … Raygen Hit … Fixed function Miss GPU operation … GPU data structures Raytracing … RT shader Closest Hit yes Acceleration 𝐿 𝑗𝑘 … RT shader invocation Hit? Ray Generation Structure Traversal Miss no  Ray generation: generate 𝑇 rays for every pair of triangles (order independent, thus 𝑂²/2 − 𝑂 required size, 1D grid)  Aggregate results and write to 𝐿 𝑗𝑘 IWOCL 2020 – RTX-RSim 20

Hardware Raytracing for Mutual Visibility Input Geometry Top-level AS Descriptor Set Build … … Dataset buff buff [ ] [ ] [ ] Acceleration Shader Binding Table Structures … Operation Bottom-level AS … Raygen Hit … Fixed function Miss GPU operation … GPU data structures Raytracing … RT shader Closest Hit yes Acceleration 𝐿 𝑗𝑘 … RT shader invocation Hit? Ray Generation Structure Traversal Miss no  Miss shader: trivial, simply set visible=false for use in raygen shader  Closest hit: check if expected triangle hit IWOCL 2020 – RTX-RSim 21
