Ray-Traced Reflections in Battlefield V

"It Just Works": Ray-Traced Reflections in Battlefield V. Johannes Deligiannis, Jan Schmid (EA DICE).


  1. "It Just Works": Ray-Traced Reflections in 'Battlefield V' • Johannes Deligiannis • Jan Schmid • EA DICE

  2. *PLACEHOLDER* *PLAY GAMESCOM TRAILER OR SIMILAR*

  3. TODAY we present Raytracing • Project background • GPU Raytracing Pipeline • Engine integration of DXR • GPU Performance

  4. Battlefield V • FPS set in WWII • Released Nov 2018 • Raytracing work began Dec 2017 • First DXR game released!

  5. Project Background • ~10 months dev time • Use DXR in Battlefield V • AO • GI • Shadows • Reflections • Engineering • Yasin Uludag (EA DICE) • Johannes Deligiannis (EA DICE) • Jiho Choi (NVIDIA) • Pawel Kozlowski (NVIDIA) • And a bunch of other people! ☺

  6. Main Challenges • Not a Tech Demo • Content is set • Game in full production • Scope of Engine changes • Performance • Denoising vs Ray Count • Early adopter tax • API not final • Driver hang/bugs • BSoD • No capture tool (Nsight, Pix) • No RTX cards • But we shipped it ☺

  8. (Simple) raytracing pipeline: Generate Rays → Intersect/Material Data → Light Rays → Light Combine

  9. Generate Rays • Lookup Texture • G Buffer • *Tomasz Stachowiak and Yasin Uludag, SIGGRAPH 2015. "Stochastic Screen-Space Reflections"

  10. Raytracing MAGIC

  11. Light Rays

      // Accumulate the radiance arriving along rayDir from every light type.
      float4 light(MaterialData surfaceInfo, float3 rayDir)
      {
          float4 radiance = 0;
          foreach (light : pointLights)
              radiance += calcPoint(surfaceInfo, rayDir, light);
          foreach (light : spotLights)
              radiance += calcSpot(surfaceInfo, rayDir, light);
          foreach (light : reflectionVolumes)
              radiance += calcReflVol(surfaceInfo, rayDir, light);
          …
          return radiance;
      }

  12. Light Combine • Lookup Texture • Lit Raster result

  13. Unhappy rays contribute less • Bad bad bad, very sad crying faces • Sloooow • Very noisy

  14. Improving raytracing pipeline: Variable Rate Tracing → Generate Rays → Intersect/Material Data → Light Rays → Light Combine

  15. Variable Rate Tracing • Classify • Max Ratio • Normalize (figure: grids of per-pixel ratios from 0 to 1 and ray counts from 0 to 256)

  16. Variable Rate Tracing • 256 rays • 128 rays • 64 rays • 32 rays

  17. Variable Rate Tracing • Success! • More rays on water • More rays on grazing angles
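
The classify, max-ratio and normalize steps of variable rate tracing can be sketched roughly as follows. This is a minimal Python sketch rather than the shipped shader: the function name `tile_ray_counts`, the 2x2 tile size and the 256-ray budget for the strongest tile are assumptions chosen to mirror the numbers on the slides.

```python
def tile_ray_counts(ratio, tile=2, max_rays=256):
    """Turn a per-pixel reflection ratio grid into per-tile ray counts.

    ratio    -- 2D list of per-pixel weights in [0, 1] (the "classify" output)
    tile     -- tile edge length in pixels (assumed value)
    max_rays -- ray budget of the most reflective tile (assumed value)
    """
    rows, cols = len(ratio), len(ratio[0])
    # Max Ratio: each tile keeps the largest weight of the pixels it covers.
    tile_max = [
        [
            max(
                ratio[y][x]
                for y in range(ty, min(ty + tile, rows))
                for x in range(tx, min(tx + tile, cols))
            )
            for tx in range(0, cols, tile)
        ]
        for ty in range(0, rows, tile)
    ]
    # Normalize: scale so that the strongest tile receives max_rays.
    peak = max(max(row) for row in tile_max)
    if peak == 0:
        return [[0 for _ in row] for row in tile_max]
    return [[round(max_rays * value / peak) for value in row] for row in tile_max]


# Ratio grid as it appears on the slide, read row by row.
slide_ratios = [
    [0.0, 0.1, 0.1, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.1, 0.2, 0.2, 0.1],
    [0.5, 1.0, 1.0, 0.5],
]
print(tile_ray_counts(slide_ratios))
```

With the slide's 4x4 grid this yields 128 rays for each of the two upper tiles and 256 for each of the two lower ones.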

  18. Problem

  19. Improving raytracing pipeline: Variable Rate Tracing → Generate Rays → Ray Binning → Intersect/Material Data → Light Rays → Light Combine

  20. Ray Binning • Bin Index (3012) combines Screen Offset and Angle

  21. Ray Binning

  22. Ray Binning • Atomic Increment • Ray 1000 → Bin 3011, local offset 0 • Ray 1001 → Bin 3013, local offset 0 • Ray 1002 → Bin 3011, local offset 1 • Bin counts: Bin 3011 = 2, Bin 3012 = 0, Bin 3013 = 1

  23. Ray Binning • Bin counts: Bin 3011 = 2, Bin 3012 = 0, Bin 3013 = 1 • Exclusive Parallel Sum* gives bin start offsets: Bin 3011 = 1000, Bin 3012 = 1002, Bin 3013 = 1002 • Local Offsets: 0, 0, 1 • *Mark Harris, Shubhabrata Sengupta, and John Owens. "Parallel Prefix Sum (Scan) with CUDA"

  24. Ray Binning • Lookup bin start offset (Bin 3011 = 1000, Bin 3012 = 1002, Bin 3013 = 1002) • Add local offset • Resulting order: Ray 1000, Ray 1002, Ray 1001
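
The three binning steps (atomic increment, exclusive sum, lookup and add) amount to a counting sort. The following is a minimal sequential Python sketch under stated assumptions: bins are renumbered from 0, a plain loop stands in for the GPU atomics and the parallel scan, and `bin_rays` with its return layout is a hypothetical name, not something taken from the talk.

```python
from itertools import accumulate


def bin_rays(ray_bins, num_bins):
    """Sort ray indices by bin using per-bin counters and an exclusive scan.

    ray_bins -- ray_bins[i] is the bin index of ray i
    Returns (counts, local_offsets, bin_starts, order).
    """
    counts = [0] * num_bins
    local_offsets = []
    for b in ray_bins:
        # On the GPU this is the value returned by an atomic increment.
        local_offsets.append(counts[b])
        counts[b] += 1

    # Exclusive prefix sum over the counts gives the first slot of each bin.
    bin_starts = list(accumulate(counts, initial=0))[:-1]

    # Lookup the bin start, add the local offset, write the ray there.
    order = [None] * len(ray_bins)
    for ray, (b, local) in enumerate(zip(ray_bins, local_offsets)):
        order[bin_starts[b] + local] = ray
    return counts, local_offsets, bin_starts, order


# Slide example: rays 1000, 1001, 1002 fall into bins 3011, 3013, 3011,
# written here as ray indices 0, 1, 2 and bin indices 0, 2, 0.
counts, local_offsets, bin_starts, order = bin_rays([0, 2, 0], num_bins=3)
print(counts, local_offsets, bin_starts, order)
```

For the slide's example the resulting order is ray 0, ray 2, ray 1, which corresponds to Ray 1000, Ray 1002, Ray 1001.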

  25. Problem

  26. Improving raytracing pipeline: Variable Rate Tracing → Generate Rays → Ray Binning → SSR Hybridization → Intersect/Material Data → Light Rays → Light Combine

  27. SS-Hybridization • Rays → Hierarchical Screen Space Trace ([Stachowiak et al 15] "Stochastic Screen-Space Reflections") • Miss / Rejected / Give Up → Intersect/Material Data • Material Data → Light → Radiance

  28. SS-Hybridization

  29. SS-Hybridization

  30. Problem • Raytrace results (Hit / Miss) are interleaved, so Light Shader wavefront lanes alternate between Busy and Idle

  31. Improving raytracing pipeline: Variable Rate Tracing → Generate Rays → Ray Binning → SSR Hybridization → Intersect/Material Data → Defrag → Light Rays → Light Combine

  32. Defrag • Hit flags: 1 0 1 0 0 1 1 0 1 0 1 0 • Exclusive Parallel Sum*: 0 1 1 2 2 2 3 4 4 5 5 6 • Compacted: Hit Hit Hit Hit Hit Hit • *Mark Harris, Shubhabrata Sengupta, and John Owens. "Parallel Prefix Sum (Scan) with CUDA"
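
The defrag step is a stream compaction: an exclusive prefix sum over the hit flags gives every hit its slot in a densely packed list. The Python sketch below is a sequential illustration under that reading of the slide; the name `defrag` and the choice to return ray indices are assumptions.

```python
from itertools import accumulate


def defrag(hit_flags):
    """Compact the indices of rays that hit, using an exclusive prefix sum."""
    # offsets[i] counts the hits before ray i, i.e. its slot if it is a hit.
    offsets = list(accumulate(hit_flags, initial=0))[:-1]
    compacted = [None] * sum(hit_flags)
    for ray, (flag, slot) in enumerate(zip(hit_flags, offsets)):
        if flag:
            compacted[slot] = ray
    return offsets, compacted


# Hit flags from the slide.
flags = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0]
offsets, compacted = defrag(flags)
print(offsets)
print(compacted)
```

The offsets reproduce the row shown on the slide, and the six hits end up contiguous so that a lighting wavefront only sees busy lanes.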

  33. Problem • Light Shader: all lanes Busy, 2.0ms

  34. Improving raytracing pipeline: Variable Rate Tracing → Generate Rays → Ray Binning → SSR Hybridization → Intersect/Material Data → Defrag → Light Rays (Per Cell Light List Lighting) → Light Combine

  35. Per Cell Light Lists • Each cell holds a linked list of lights (Light, Next) • Light 2 → Light 3 • Light 0 → Light 1 → Light 3
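
A per-cell light list of this kind is commonly stored as one head index per cell plus a shared pool of (light, next) nodes. The Python sketch below assumes that layout; `build_cell_light_lists`, `lights_in_cell` and the `-1` end marker are hypothetical names and conventions, not details from the talk.

```python
def build_cell_light_lists(cell_light_pairs, num_cells):
    """Build linked lists of light indices, one list per cell.

    cell_light_pairs -- iterable of (cell, light) assignments
    Returns (head, nodes): head[cell] is the first node index or -1,
    nodes[i] is a (light, next_node) pair.
    """
    head = [-1] * num_cells
    nodes = []
    for cell, light in cell_light_pairs:
        # Push to the front: the new node points at the previous head.
        nodes.append((light, head[cell]))
        head[cell] = len(nodes) - 1
    return head, nodes


def lights_in_cell(head, nodes, cell):
    """Follow the Next pointers of one cell and collect its lights."""
    lights = []
    node = head[cell]
    while node != -1:
        light, node = nodes[node]
        lights.append(light)
    return lights


# Two cells as on the slide: {Light 2, Light 3} and {Light 0, Light 1, Light 3}.
head, nodes = build_cell_light_lists(
    [(0, 3), (0, 2), (1, 3), (1, 1), (1, 0)], num_cells=2
)
print(lights_in_cell(head, nodes, 0), lights_in_cell(head, nodes, 1))
```

A lighting pass would then walk only the list of the cell a hit point falls into instead of looping over every light in the scene.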

  36. Problem

  37. Improving raytracing pipeline: Variable Rate Tracing → Generate Rays → Ray Binning → SSR Hybridization → Intersect/Material Data → Defrag → Per Cell Light List Lighting → Denoise → Light Combine

  38. Denoising • Reuse Spatial Information: BRDF Filter • Reuse Temporal Information: Temporal Filter • [Stachowiak et al 15] "Stochastic Screen-Space Reflections"

  39. BRDF Denoise Filter • $L_o \approx FG \cdot \dfrac{\sum_{k=1}^{N} L_i(l_k)\, f_s(l_k \to v) \cos\theta_{l_k} / p_k}{\sum_{k=1}^{N} f_s(l_k \to v) \cos\theta_{l_k} / p_k}$ • Kernel Size????
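
The BRDF denoise filter on this slide is a ratio estimator: neighbouring samples are weighted by BRDF times cosine over PDF, and the weighted average is scaled by the pre-integrated FG term. A minimal scalar Python sketch, assuming each sample has already been reduced to a (radiance, BRDF times cosine, PDF) triple; the function name and the sample layout are illustrative assumptions.

```python
def brdf_weighted_resolve(samples, fg):
    """Ratio estimator: fg * sum(L * w) / sum(w), with w = brdf_cos / pdf.

    samples -- iterable of (radiance, brdf_cos, pdf) triples, where brdf_cos
               stands for f_s(l_k -> v) * cos(theta_l_k) of that sample
    fg      -- pre-integrated FG term for the shaded pixel
    """
    weighted_radiance = 0.0
    weight_sum = 0.0
    for radiance, brdf_cos, pdf in samples:
        if pdf <= 0.0:
            continue  # a sample with zero probability carries no information
        weight = brdf_cos / pdf
        weighted_radiance += radiance * weight
        weight_sum += weight
    if weight_sum == 0.0:
        return 0.0
    return fg * weighted_radiance / weight_sum


print(brdf_weighted_resolve([(1.0, 0.5, 0.5), (3.0, 0.25, 0.25)], fg=0.5))
```

Because the weights appear in both numerator and denominator, a neighbourhood of constant radiance resolves to that radiance times FG regardless of how the samples were distributed.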

  40. BRDF Denoise Filter ?????

  41. BRDF Denoise Filter

  42. BRDF Denoise Filter • Frame N-1 • Frame N

  43. BRDF Denoise Filter • Thread tile surrounded by padding (Pad / Thread grid) • Actual: 6 • Actual: up to 13 • Actual: 16 • Actual: 6

  44. BRDF Denoise Filter

  45. Temporal Denoise Filter • Is it a good sample? • If only... • BRDF Denoiser!

  46. Temporal Denoise Filter • Still noisy

  47. Image Denoise Filter • Generate LUT: { angle, roughness } to { width, height } for unit length ray

  48. Image Denoise Filter

  49. Image Denoise Filter

  50. Image Denoise Filter

  51. New Pipeline • Variable Rate Tracing • Generate Rays • Ray Binning • Screen Space Hybrid • Intersect/Material Data • Timings as listed on the slide: 0.37ms, 0.19ms, 0.15ms, 0.36ms, 1.98ms

  52. New Pipeline, 6.29ms total • Intersect/Material Data • Defrag • 'Improved' Lighting • Spatial Filter • Temporal Filter • Image Filter • Timings in extraction order, not matched to stages: 0.46ms, 1.45ms, 0.24ms, 1.00ms, 1.98ms, 0.08ms
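
As a quick sanity check on the quoted total, the per-pass timings from the two "New Pipeline" slides can be added up. This assumes that the 1.98ms entry appearing on both slides is the same pass and is counted once; the small gap to 6.29ms is then consistent with rounding of the individual figures.

```latex
\begin{align*}
0.37 + 0.19 + 0.15 + 0.36 + 1.98 &= 3.05\ \text{ms} \\
0.46 + 1.45 + 0.24 + 1.00 + 0.08 &= 3.23\ \text{ms} \\
3.05 + 3.23 &= 6.28\ \text{ms} \approx 6.29\ \text{ms}
\end{align*}
```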

  54. DXR, a.k.a. "BLACK BOX" • No DXR • Intersection • Shading

  55. DXR Basics • BLAS - Bottom Level Acceleration Structure • TLAS - Top Level Acceleration Structure • Skinning, Destruction • Compute shader (CS) • Update each frame • BLAS can update incrementally • (figure: TLAS instances A, B, C, D, each pairing a 4x4 transform matrix with a BLAS)

  56. Acceleration Structure • Which objects? • Frustum Culling • Occlusion Culling • Easy... no culling!

  57. Acceleration Structure: First Pass • Rotterdam • 20200 TLAS instances... • 5000 BLAS rebuilds... • GPU rebuild 64 ms (!)

  58. What to do? • Idea: Reduce instance count • Use a culling heuristic • Accept (some) minor artifacts

  59. Culling Heuristic • Assumption: • Far away objects not important • Except for large objects • Bridge, building etc • Need some kind of measurement...

  60. Culling • Project bounding sphere of radius $r$ at distance $d$ • $\theta = \arctan(r / d)$ • If $\theta < \text{Threshold}$: Cull

  61. Culling • Reference (no culling) • $\theta = 4^\circ$ • $\theta = 15^\circ$

  62. Culling • Culled objects • Reference (no culling) • $\theta = 4^\circ$

  63. Culling - Results • 4 deg culling • 5000 -> 400 BLAS rebuilds each frame • 20000 -> 2800 TLAS instances • TLAS + BLAS build (GPU): 64 ms -> 14.5 ms • Pros • Faster • Cons • Occasional popping • Missing objects
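
The culling heuristic can be sketched as a projected-angle test on each object's bounding sphere. The Python sketch below assumes the angle is computed as the arctangent of radius over distance and compared against a threshold in degrees; `should_cull`, its parameters and the rule of never culling a sphere that contains the camera are assumptions added for illustration.

```python
import math


def should_cull(center, radius, camera_pos, threshold_deg=4.0):
    """Return True when the bounding sphere subtends less than threshold_deg."""
    distance = math.dist(center, camera_pos)
    if distance <= radius:
        return False  # camera is inside the sphere: always keep the object
    theta_deg = math.degrees(math.atan(radius / distance))
    return theta_deg < threshold_deg


camera = (0.0, 0.0, 0.0)
print(should_cull((100.0, 0.0, 0.0), 1.0, camera))   # small, far away object
print(should_cull((100.0, 0.0, 0.0), 50.0, camera))  # large object, e.g. a bridge
```

The angle grows with object size and shrinks with distance, so a single threshold keeps large distant objects such as bridges and buildings while dropping small distant ones.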

  64. BLAS Update Optimizations • Still expensive! More ideas: 1. Stagger full and incremental BLAS rebuild • N frames incremental before full rebuild 2. D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_PREFER_FAST_BUILD 3. Avoid redundant rebuilds • Check CS input (bone matrix) • 400 -> 50 • Overlap BLAS update with GFX • Gbuffer, shadowmaps
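
Ideas 1 and 3 can be combined into a small per-object decision: skip the update when the compute shader input is unchanged, otherwise do an incremental update (refit) until N of them have accumulated, then do a full rebuild. The Python sketch below is only an illustration of that policy; `BlasState`, `plan_update` and the default of four refits are hypothetical, and the real engine logic is not shown in the slides.

```python
class BlasState:
    """Per-object bookkeeping for the update policy (hypothetical helper)."""

    def __init__(self):
        self.last_input = None          # last seen CS input, e.g. bone matrices
        self.refits_since_rebuild = 0   # incremental updates since last full build


def plan_update(state, cs_input, max_refits=4):
    """Return 'skip', 'refit' or 'rebuild' for this frame."""
    first_build = state.last_input is None
    if not first_build and cs_input == state.last_input:
        return "skip"  # redundant: the skinning input did not change
    state.last_input = cs_input
    if first_build or state.refits_since_rebuild >= max_refits:
        state.refits_since_rebuild = 0
        return "rebuild"
    state.refits_since_rebuild += 1
    return "refit"


state = BlasState()
frames = ["a", "a", "b", "c", "d", "d"]
decisions = [plan_update(state, pose, max_refits=2) for pose in frames]
print(decisions)
```

Because each object keeps its own counter, full rebuilds naturally spread across frames instead of all landing on the same one.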

  65. Results • TLAS + BLAS build (GPU): 14.5 ms -> 1.15 ms • RayGen (GPU): 0.71 ms -> 0.81 ms (staggered refit + flags) • Much better ☺

  66. Shading (Opaque) • RT ON | SHADING OFF • RT ON | SHADING ON
