"It Just Works": Ray-Traced Reflections in 'Battlefield V'




slide-1
SLIDE 1

"It Just Works": Ray-Traced Reflections in 'Battlefield V'

Johannes Deligiannis Jan Schmid EA DICE

slide-2
SLIDE 2

*PLACEHOLDER*

* PLAY GAMESCOM TRAILER OR SIMILAR *

slide-3
SLIDE 3

TODAY we present Raytracing

  • Project background
  • GPU Raytracing Pipeline
  • Engine integration of DXR
  • GPU Performance
slide-4
SLIDE 4

Battlefield V

  • FPS set in WWII
  • Released Nov 2018
  • Raytracing work began Dec 2017
  • First DXR game released!
slide-5
SLIDE 5

Project Background

  • ~10 months dev time
  • Use DXR in Battlefield V
  • AO
  • GI
  • Shadows
  • Reflections
  • Engineering
  • Yasin Uludag (EA DICE)
  • Johannes Deligiannis (EA DICE)
  • Jiho Choi (NVIDIA)
  • Pawel Kozlowski (NVIDIA)
  • And a bunch of other people! ☺
slide-6
SLIDE 6

Main Challenges

  • Not a Tech Demo
  • Content is set
  • Game in full production
  • Scope of Engine changes
  • Performance
  • Denoising vs Ray Count
  • No RTX cards
  • Early adopter tax
  • API not final
  • Driver hang/bugs
  • BSoD
  • No capture tool (Nsight, Pix)
  • But we shipped it☺
slide-7
SLIDE 7


slide-8
SLIDE 8

(simple) raytracing pipeline

Generate Rays -> Intersect/Material Data -> Light Rays -> Light Combine

slide-9
SLIDE 9

Generate Rays

G Buffer

Lookup Texture

*Tomasz Stachowiak and Yasin Uludag, SIGGRAPH 2015. "Stochastic Screen-Space Reflections"

slide-10
SLIDE 10

Raytracing

MAGIC


slide-11
SLIDE 11

Light Rays

float4 light(MaterialData surfaceInfo, float3 rayDir)
{
    foreach (light : pointLights)
        radiance += calcPoint(surfaceInfo, rayDir, light);
    foreach (light : spotLights)
        radiance += calcSpot(surfaceInfo, rayDir, light);
    foreach (light : reflectionVolumes)
        radiance += calcReflVol(surfaceInfo, rayDir, light);
    …
}

slide-12
SLIDE 12

Light Combine

Lit Raster result

Lookup Texture

slide-13
SLIDE 13

unhappy

Bad bad bad, very sad crying faces

  • Very Noisy
  • Rays Contribute Less
  • Sloooow

slide-14
SLIDE 14

Improving raytracing pipeline

Generate Rays -> Intersect/Material Data -> Light Rays -> Light Combine
Added stage: Variable Rate Tracing

slide-15
SLIDE 15

Variable Rate Tracing

Ratio:                   .1  .1  .1  .2  .2  .1  .1  .1
Normalize (by Max):      .5  .5  .5  1   1   .5  .5  .5
Classify (rays):         128 128 128 256 256 128 128 128
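
The Ratio -> Normalize -> Classify steps on this slide can be sketched on the CPU as follows. This is only an illustration in Python, not the shipped shader: the function name, the power-of-two bucketing rule, and the 32 to 256 ray range (taken from the legend on the next slide) are assumptions.

```python
def classify_ray_counts(ratios, max_rays=256, min_rays=32):
    """Normalize per-region ratios by their maximum, then snap each one
    to a power-of-two ray budget between min_rays and max_rays.
    (Hypothetical helper; the bucketing rule is an assumption.)"""
    peak = max(ratios)
    if peak <= 0:
        return [min_rays] * len(ratios)
    counts = []
    for ratio in ratios:
        target = max_rays * (ratio / peak)  # normalized ratio scaled to the budget
        bucket = max_rays
        # Halve the bucket until it no longer exceeds the target.
        while bucket > min_rays and bucket > target:
            bucket //= 2
        counts.append(bucket)
    return counts


if __name__ == "__main__":
    print(classify_ray_counts([.1, .1, .1, .2, .2, .1, .1, .1]))
```

With the slide's ratios, the regions at the maximum ratio receive 256 rays and the regions at half the maximum receive 128.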

slide-16
SLIDE 16

Variable Rate Tracing

Legend: 256 rays, 128 rays, 64 rays, 32 rays

slide-17
SLIDE 17

Variable Rate Tracing

Success!

  • More Rays on Water
  • More Rays on grazing angles

slide-18
SLIDE 18

Problem


slide-19
SLIDE 19

Improving raytracing pipeline

Generate Rays -> Intersect/Material Data -> Light Rays -> Light Combine
Added stages: Variable Rate Tracing, Ray Binning

slide-20
SLIDE 20

Ray Binning

Bin Index = Screen Offset | Angle

Example: Screen Offset 3, Angle 012 -> Bin Index 3012

slide-21
SLIDE 21

Ray Binning


slide-22
SLIDE 22

Ray Binning

Rays: Ray 1000 -> Bin 3011, Ray 1001 -> Bin 3013, Ray 1002 -> Bin 3011
Atomic Increment (per-bin count): Bin 3011 = 2, Bin 3012 = 0, Bin 3013 = 1
Local Offsets (per ray): Ray 1000 = 0, Ray 1001 = 0, Ray 1002 = 1

slide-23
SLIDE 23

Ray Binning

Per-bin counts: Bin 3011 = 2, Bin 3012 = 0, Bin 3013 = 1
Exclusive Parallel Sum*: Bin 3011 = 1000, Bin 3012 = 1002, Bin 3013 = 1002

*Mark Harris, Shubhabrata Sengupta, and John Owens. "Parallel Prefix Sum (Scan) with CUDA"

slide-24
SLIDE 24

Ray Binning

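Lookup: bin start (1000, 1002, 1002 for Bin 3011, Bin 3012, Bin 3013), then Add the ray's Local Offset
Ray 1000 -> 1000 + 0 = 1000, Ray 1001 -> 1002 + 0 = 1002, Ray 1002 -> 1000 + 1 = 1001
Sorted rays: Ray 1000, Ray 1002, Ray 1001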
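
The three Ray Binning slides (atomic increment, exclusive parallel sum, lookup and add) amount to a counting sort. A minimal sequential sketch in Python is below; the function name is hypothetical, the loop stands in for what the GPU does with atomics and a parallel scan, and bins are numbered 0..N-1 instead of 3011..3013.

```python
def bin_rays(ray_bins, num_bins, base_offset=0):
    """Return the sorted destination slot for every ray.

    ray_bins[i] is the bin index of ray i. Sequential stand-in for the
    atomic-increment + exclusive-scan passes (an illustration only).
    """
    counts = [0] * num_bins
    local_offsets = []
    for b in ray_bins:
        # An atomic increment returns the previous counter value:
        # that value is the ray's local offset inside its bin.
        local_offsets.append(counts[b])
        counts[b] += 1

    # Exclusive prefix sum over the bin counts gives each bin's start slot.
    starts = []
    running = base_offset
    for c in counts:
        starts.append(running)
        running += c

    # Lookup the bin start and add the local offset.
    return [starts[b] + local for b, local in zip(ray_bins, local_offsets)]


if __name__ == "__main__":
    # Rays 1000, 1001, 1002 fall in bins 3011, 3013, 3011 (here 0, 2, 0).
    print(bin_rays([0, 2, 0], num_bins=3, base_offset=1000))
```

Using the slide's example, the three rays land in slots 1000, 1002 and 1001, so the two rays of Bin 3011 end up adjacent.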

slide-25
SLIDE 25

Problem


slide-26
SLIDE 26

Improving raytracing pipeline

Generate Rays -> Intersect/Material Data -> Light Rays -> Light Combine
Added stages: Variable Rate Tracing, Ray Binning, SSR Hybridization

slide-27
SLIDE 27

SS-Hybridization

[Diagram: Rays first go through a Hierarchical Screen Space Trace. On Miss, Give Up, or Rejected, they fall back to Intersect/Material Data. Both paths produce Material Data, which Light turns into Radiance.]

[Stachowiak et al 15] "Stochastic Screen-Space Reflections"

slide-28
SLIDE 28

SS-Hybridization

slide-29
SLIDE 29

SS-Hybridization

slide-30
SLIDE 30

Problem

[Diagram: Raytrace results per lane (Hit or Miss) shown above the Light Shader wavefront. Hit lanes are Busy, Miss lanes are Idle.]

slide-31
SLIDE 31

Improving raytracing pipeline

Generate Rays -> Intersect/Material Data -> Light Rays -> Light Combine
Added stages: Variable Rate Tracing, Ray Binning, SSR Hybridization, Defrag

slide-32
SLIDE 32

Defrag

[Diagram: Hit flags (1 per Hit) -> Exclusive Parallel Sum* -> compacted index for each Hit]

*Mark Harris, Shubhabrata Sengupta, and John Owens. "Parallel Prefix Sum (Scan) with CUDA"
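
Defrag as drawn here is stream compaction: an exclusive prefix sum over the hit flags gives every hit its slot in a dense list, so the light shader only sees busy lanes. A sequential Python sketch follows, with a hypothetical function name; on the GPU the scan would run in parallel as in the cited reference.

```python
def compact_hits(hit_flags):
    """Return the lane indices of all hits, packed densely.

    hit_flags[i] is truthy when lane i hit something.
    """
    # Exclusive prefix sum: slot[i] = number of hits before lane i.
    slots = []
    running = 0
    for flag in hit_flags:
        slots.append(running)
        running += 1 if flag else 0

    # Scatter each hit lane to its slot; misses are dropped.
    compacted = [None] * running
    for lane, flag in enumerate(hit_flags):
        if flag:
            compacted[slots[lane]] = lane
    return compacted


if __name__ == "__main__":
    print(compact_hits([1, 0, 1, 0, 1, 1]))
```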

slide-33
SLIDE 33

Problem

Light Shader: every lane Busy

2.0ms

slide-34
SLIDE 34

Improving raytracing pipeline

Generate Rays -> Intersect/Material Data -> Per Cell Light List Lighting -> Light Combine
Added stages: Variable Rate Tracing, Ray Binning, SSR Hybridization, Defrag
Replaced stage: Light Rays

slide-35
SLIDE 35

Per Cell Light Lists

[Diagram: per-cell linked lists of lights. Each node stores a light (Light 3, Light 2, Light 3, Light 1, Light 0) and a Next pointer.]
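
One way to read the Light / Next diagram is a head-pointer array with a shared node pool, so that shading a hit only walks the lights registered for its cell instead of every light in the level. The Python sketch below assumes that layout; the class, the cell indexing, and the -1 end marker are all hypothetical.

```python
class CellLightLists:
    """Linked lists of light indices, one list per cell (illustrative layout)."""

    END = -1  # assumed end-of-list marker

    def __init__(self, num_cells):
        self.heads = [self.END] * num_cells  # first node of each cell's list
        self.nodes = []                      # pool of (light_index, next_node)

    def add(self, cell, light_index):
        # Push to the front: the new node's Next is the old head.
        self.nodes.append((light_index, self.heads[cell]))
        self.heads[cell] = len(self.nodes) - 1

    def lights(self, cell):
        # Follow Next pointers until the end marker.
        node = self.heads[cell]
        while node != self.END:
            light_index, node = self.nodes[node]
            yield light_index


if __name__ == "__main__":
    lists = CellLightLists(num_cells=2)
    lists.add(0, 3)
    lists.add(0, 2)
    lists.add(1, 3)
    lists.add(1, 1)
    lists.add(1, 0)
    print(list(lists.lights(0)), list(lists.lights(1)))
```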

slide-36
SLIDE 36

Problem


slide-37
SLIDE 37

Improving raytracing pipeline

Generate Rays -> Intersect/Material Data -> Per Cell Light List Lighting -> Light Combine
Added stages: Variable Rate Tracing, Ray Binning, SSR Hybridization, Defrag, Denoise

slide-38
SLIDE 38

Denoising

  • BRDF Filter: Reuse Spatial Information
  • Temporal Filter: Reuse Temporal Information

[Stachowiak et al 15] "Stochastic Screen-Space Reflections"

slide-39
SLIDE 39

BRDF Denoise Filter

Kernel Size????

$$L_o \approx FG \, \frac{\sum_{k=1}^{N} L_i(l_k)\, f_s(l_k \to v)\, \cos\theta_{l_k} / p_k}{\sum_{k=1}^{N} f_s(l_k \to v)\, \cos\theta_{l_k} / p_k}$$
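
The filter on this slide is the ratio estimator from [Stachowiak et al 15]: neighbouring samples are weighted by BRDF times cosine over PDF, and the weighted radiance sum is divided by the weight sum before being scaled by the pre-integrated FG term. A scalar Python sketch is below; the function name and the sample tuple layout are assumptions, and radiance is a single float rather than an RGB value.

```python
def resolve_reflection(fg, samples):
    """Ratio estimator over N neighbouring samples.

    samples: iterable of (radiance, brdf, cos_theta, pdf) tuples, where
    brdf stands for f_s(l_k -> v) and pdf for p_k (assumed layout).
    """
    weighted_radiance = 0.0
    weight_sum = 0.0
    for radiance, brdf, cos_theta, pdf in samples:
        weight = brdf * cos_theta / pdf
        weighted_radiance += radiance * weight
        weight_sum += weight
    if weight_sum <= 0.0:
        return 0.0  # no usable neighbour samples
    return fg * weighted_radiance / weight_sum


if __name__ == "__main__":
    print(resolve_reflection(0.5, [(2.0, 1.0, 1.0, 1.0), (4.0, 1.0, 1.0, 1.0)]))
```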

slide-40
SLIDE 40

BRDF Denoise Filter


?????

slide-41
SLIDE 41

BRDF Denoise Filter


slide-42
SLIDE 42

BRDF Denoise Filter

Frame N, Frame N-1

slide-43
SLIDE 43

BRDF Denoise Filter

[Diagram: 16 Thread slots followed by Pad slots]

Actual: 6, Actual: 6, Actual: 16, Actual: up to 13

slide-44
SLIDE 44

BRDF Denoise Filter


slide-45
SLIDE 45

Temporal Denoise Filter

Is it a good sample? If only... BRDF Denoiser!

slide-46
SLIDE 46

Temporal Denoise Filter

Still Noisy

slide-47
SLIDE 47

Image Denoise Filter

Generate LUT { angle, roughness } to { width, height } for unit length ray

slide-48
SLIDE 48

Image Denoise Filter


∗ =

slide-49
SLIDE 49

Image Denoise Filter


∗ =

1 2 1 2

slide-50
SLIDE 50

Image Denoise Filter


slide-51
SLIDE 51

New Pipeline

  • Variable Rate Tracing: 0.37ms
  • Generate Rays: 0.19ms
  • Ray Binning: 0.15ms
  • Screen Space Hybrid: 0.36ms
  • Intersect/Material Data: 1.98ms

slide-52
SLIDE 52

New Pipeline

  • Intersect/Material Data: 1.98ms
  • Defrag: 0.08ms
  • 'Improved' Lighting: 0.46ms
  • Spatial Filter: 1.45ms
  • Temporal Filter: 0.24ms
  • Image Filter: 1.00ms

6.29ms total

slide-53
SLIDE 53


slide-54
SLIDE 54

DXR - a.k.a. "BLACK BOX"

Intersection Shading No DXR

slide-55
SLIDE 55

DXR basics

  • BLAS - Bottom Level Acceleration Structure
  • TLAS - Top Level Acceleration Structure
  • CS
  • Skinning, Destruction
  • Compute shader
  • Update each frame
  • BLAS can update incrementally

[Diagram: TLAS with instances A, D, B, C, A, A, D, each carrying a 4x4 transform matrix; CS; BLAS with D, B, C, A, A, D]

slide-56
SLIDE 56

ACCELERATION STRUCTURE

  • Which objects?
  • Frustum Culling
  • Occlusion Culling
  • Easy... no culling!
slide-57
SLIDE 57

Acceleration structure - FIRST PASS

  • Rotterdam
  • 20200 TLAS instances...
  • 5000 BLAS rebuilds...
  • GPU rebuild 64 ms (!)
slide-58
SLIDE 58

What to do?

  • Idea: Reduce instance count
  • Use a culling heuristic
  • Accept (some) minor artifacts
slide-59
SLIDE 59

Culling HEURISTIC

  • Assumption:
  • Far away objects not important
  • Except for large objects
  • Bridge, building etc
  • Need some kind of measurement...
slide-60
SLIDE 60

Culling

  • Project bounding sphere
  • $\theta = \tan^{-1}(r / d)$
  • If $\theta$ < Threshold (in degrees): Cull

[Diagram labels: θ, d, r]
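
The heuristic on this slide can be sketched as below, assuming r is the bounding-sphere radius, d is the distance from the camera to the sphere centre, and the slide's formula is an inverse tangent (the result is compared against a threshold in degrees). The function name and the default threshold, taken from the 4 degree setting on the results slide, are illustrative.

```python
import math


def should_cull(radius, distance, threshold_deg=4.0):
    """Cull an instance whose projected bounding sphere subtends a small angle.

    Hypothetical helper: radius and distance share the same world units.
    """
    if distance <= 0.0:
        return False  # camera is at or inside the sphere centre: keep it
    angle_deg = math.degrees(math.atan(radius / distance))
    return angle_deg < threshold_deg


if __name__ == "__main__":
    print(should_cull(radius=1.0, distance=100.0))   # small, far object
    print(should_cull(radius=50.0, distance=100.0))  # large object, e.g. a bridge
```

Because the test uses the angle rather than the distance alone, a large far-away object such as a bridge or building survives while a small prop at the same distance is dropped.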

slide-61
SLIDE 61

Culling

θ = 15°
θ = 4°
Reference - no culling

slide-62
SLIDE 62

Culling

θ = 4°
Reference - no culling
Culled Objects

slide-63
SLIDE 63

CULLING - RESULTS

  • 4 deg culling
  • 5000 -> 400 BLAS rebuilds each frame
  • 20000 -> 2800 TLAS instances
  • TLAS + BLAS build (GPU): 64 ms -> 14.5 ms
  • Pros
  • Faster
  • Cons
  • Occasional popping
  • Missing objects
slide-64
SLIDE 64

BLAS update optimizations

  • Still expensive! More ideas:

1. Stagger full and incremental BLAS rebuild

  • N frames incremental before full rebuild

2. D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_PREFER_FAST_BUILD

3. Avoid redundant rebuilds

  • Check CS input (bone matrix)
  • 400 -> 50
  • Overlap BLAS update with GFX
  • Gbuffer, shadowmaps
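
Ideas 1 and 3 can be combined into a small per-BLAS decision: skip the update when the compute-shader input is unchanged, otherwise do an incremental update, with a full rebuild after N incremental ones. The Python sketch below shows only that bookkeeping; the names, the use of a hash to detect unchanged bone matrices, and the interval value are assumptions, and no D3D12 call is made.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlasState:
    incremental_count: int = 0
    last_input_hash: Optional[int] = None  # None until the first build


def plan_blas_update(state, input_hash, full_rebuild_interval=8):
    """Return "skip", "incremental" or "full" for this frame."""
    first_build = state.last_input_hash is None
    if not first_build and state.last_input_hash == input_hash:
        return "skip"  # same bone matrices as last time: nothing to rebuild
    state.last_input_hash = input_hash
    if first_build or state.incremental_count >= full_rebuild_interval:
        state.incremental_count = 0
        return "full"
    state.incremental_count += 1
    return "incremental"


if __name__ == "__main__":
    state = BlasState()
    for frame_hash in [1, 2, 3, 3, 4, 5]:
        print(plan_blas_update(state, frame_hash, full_rebuild_interval=2))
```

Giving each BLAS a different starting count would spread the full rebuilds over several frames instead of letting them land on the same one.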
slide-65
SLIDE 65

Results

  • TLAS + BLAS build (GPU): 14.5 ms -> 1.15 ms
  • RayGen (GPU): 0.71 ms -> 0.81 ms (staggered refit + flags)
  • Much better ☺
slide-66
SLIDE 66

SHADING (OPAQUE)

RT ON | SHADING OFF
RT ON | SHADING ON

slide-67
SLIDE 67

Raytracing Requirements

  • Shader output must match!
  • ClosestHit Shader
  • AnyHit Shader

Raster vs. Raytrace: Same?

slide-68
SLIDE 68

Shaders in FROSTBITE

  • VS – Handwritten
  • PS - Shader Graphs
  • Graph -> .hlsl
  • Manual conversion... no
  • 1000s of shaders
  • Auto VS + PS to HitGroup


slide-69
SLIDE 69

Hit shader template

[shader("closesthit")]
void chMain()
{
    IA0 = iaMain(id + 0)
    IA1 = iaMain(id + 1)
    IA2 = iaMain(id + 2)
    V0 = vsMain(IA0)
    V1 = vsMain(IA1)
    V2 = vsMain(IA2)
    V = lerp(V0, V1, V2, U, V, W)
    P = psMain(V)
    writePayload(P)
}

  • Vertex buffer: UV, Normal
  • VS - VertexFragment: World Space Normal
  • PS - ShaderGraph

  • ... ddx/ddy? #define ddx(x) x, #define ddy(x) x
  • Texture MIP? #define Sample(s, uv) SampleLevel(s, uv, 0)
  • ...clip? ... ¯\_(ツ)_/¯

slide-70
SLIDE 70

ALPHA TESTING

  • AnyHit Shader:
  • If (AlphaTest(alphaValue)) IgnoreHit();

slide-71
SLIDE 71

ALPHA TEST

ANY HIT OFF
ANY HIT ON

slide-72
SLIDE 72

ALPHA TEST

slide-73
SLIDE 73

Alpha testing video

slide-74
SLIDE 74


Summary opaque

  • Closest Hit Shader
  • Always
  • Any Hit Shader (Optional)
  • Alpha tested
  • Compute Shader (Optional)
  • Skinning, destruction etc
slide-75
SLIDE 75

RAY PAYLOAD

struct GbufferPayloadPacked
{
    uint data0;  // R10G10B10A2_UNORM
    uint data1;  // R8G8B8A8_SRGB
    uint data2;  // R8G8B8A8_UNORM
    uint data3;  // R11G11B10_FLOAT
    float hitT;  // Ray length
};

  • Payload: returned on ray intersection
  • Same format as Gbuffer RTV
  • Contains Material Data
  • Normal
  • Base Color
  • Smoothness
  • …etc
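
To make the packed payload concrete, here is a sketch of how four normalized floats fit into one 32-bit word in the R10G10B10A2_UNORM layout used by data0, with R in the low bits and A in the top two. The helper names are hypothetical, and which material attribute lives in which field is not stated on the slide.

```python
def _quantize(value, bits):
    """Map a float in [0, 1] to an unsigned integer code with round-to-nearest."""
    max_code = (1 << bits) - 1
    clamped = min(max(value, 0.0), 1.0)
    return int(clamped * max_code + 0.5)


def pack_r10g10b10a2_unorm(r, g, b, a):
    # R occupies bits 0-9, G bits 10-19, B bits 20-29, A bits 30-31.
    return (_quantize(r, 10)
            | (_quantize(g, 10) << 10)
            | (_quantize(b, 10) << 20)
            | (_quantize(a, 2) << 30))


def unpack_r10g10b10a2_unorm(packed):
    return ((packed & 0x3FF) / 1023.0,
            ((packed >> 10) & 0x3FF) / 1023.0,
            ((packed >> 20) & 0x3FF) / 1023.0,
            ((packed >> 30) & 0x3) / 3.0)


if __name__ == "__main__":
    word = pack_r10g10b10a2_unorm(1.0, 0.0, 1.0, 1.0)
    print(hex(word), unpack_r10g10b10a2_unorm(word))
```

Keeping the payload bit-identical to the Gbuffer render target formats is what allows the same lighting code to consume rasterized and ray-traced material data.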
slide-76
SLIDE 76

Verifying correctness

  • 1. Rasterizer output
  • 2. Shoot primary rays into the scene
  • 3. Compare Payload with Gbuffer
  • 4. Non zero output? Bug!
  • 5. Fix bug

[Figure: Gbuffer (BaseColor), Reference, minus Raytraced (BaseColor), Primary Rays, = Delta]
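
The comparison in steps 3 and 4 is a per-pixel difference between the rasterized Gbuffer and the payload returned by primary rays: any non-zero pixel in the delta points at a shader mismatch. A small Python sketch on nested lists is below; the function names and the optional tolerance are assumptions.

```python
def payload_delta(gbuffer, raytraced):
    """Per-pixel absolute difference between two equally sized 2D images."""
    return [[abs(a - b) for a, b in zip(row_g, row_r)]
            for row_g, row_r in zip(gbuffer, raytraced)]


def find_mismatches(gbuffer, raytraced, tolerance=0):
    """Return (x, y) coordinates where the delta exceeds the tolerance."""
    delta = payload_delta(gbuffer, raytraced)
    return [(x, y)
            for y, row in enumerate(delta)
            for x, value in enumerate(row)
            if value > tolerance]


if __name__ == "__main__":
    raster = [[10, 20], [30, 40]]
    rays = [[10, 20], [30, 41]]
    print(find_mismatches(raster, rays))  # non-empty list means: bug
```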

slide-77
SLIDE 77

Shader compilation

  • All shaders generated ☺
  • ~3000 per level
  • ~250 per frame
  • Single RT PSO
  • Runtime compile times?

Color coded Closest Hit Shaders

slide-78
SLIDE 78

PSO generation

  • Dx12 GFX PSO...
  • ... DXR 3000 shaders 
  • Compile times?
  • Majority > 100ms
  • Cold cache
  • 7 min 30 sec thread time
  • 6 threads: 1 min 30 sec
  • Warm cache
  • 1 min 30 sec thread time
  • 6 threads: 15 sec

[Chart axis: milliseconds]

slide-79
SLIDE 79

Particles

Smoke, Fire and Explosions. Important elements in BFV!

slide-80
SLIDE 80

Particles

  • Particle = Transparent+Billboard
  • Basic algorithm
  • 1. Shoot ray in Opaque TLAS
  • 2. Shoot again in Particle TLAS (Max ray length from Opaque)

  • 3. Blend particles with opaque hit
slide-81
SLIDE 81

THE Problem with Particles

  • Camera aligned billboards
  • Rotate odd particles 90 deg around Y?

Billboards visible when viewed from the side.

slide-82
SLIDE 82

Rotated billboards

Before: Billboards visible in reflection
After: Rotating odd quads produces a more volumetric look

slide-83
SLIDE 83

Performance

  • Accumulate intersections along ray
  • 1 rpp => N rpp 
  • RayGen loop
  • Sounds... expensive?

*... init ray using opaqueHitT and currT*
for (hitCount = 0; hitCount < MaxIntersectionCount; ++hitCount)
{
    ...
    ForwardPayloadPacked forwardPayloadPacked;
    initForwardPayloadPacked(forwardPayloadPacked);
    TraceRay(g_tlasPartices, 0, 0xFF, 0, 1, 0, ray, forwardPayloadPacked);
    ForwardPayload forwardPayload = unpackForwardPayload(forwardPayloadPacked);
    if (forwardPayload.hitT <= 0.0f) // Miss, tracing done
        break;
    * ... update ray using forwardPayload.hitT, accumulate color, alpha *
}

RayGen Shader

slide-84
SLIDE 84

THE (second) Problem with Particles

RayGen loop: 0.96ms

slide-85
SLIDE 85

Optimizing particles

  • Loop Idea: AnyHit shader?
  • Same... but different
  • Inspired by WBOIT*
  • Weight = max(luminance,r,g,b alpha)
  • Emissive, fire

RayGen Shader

... init ray using maxT and currT
TraceRay(g_tlasPartices, 0, 0xFF, 0, 1, 0, ray, forwardPayloadPacked);
* ... process payload and calculate weighted average *

Any Hit Shader

struct Attributes { float2 barycentrics; };

[shader("anyhit")]
void main(inout ForwardPayloadPacked payloadPacked, in Attributes attributes)
{
    *... Calculate color, transparency *
    payloadPacked.alpha += alpha * weight;
    payloadPacked.color += color * weight;
    payloadPacked.weight += weight;
    IgnoreHit();
}
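
The any-hit approach accumulates weighted sums and divides once at the end, so the result does not depend on the order in which particles are visited. The Python sketch below mirrors the three accumulators in the any-hit shader and the "weighted average" step in the RayGen shader; the function name and tuple layout are assumptions, and the weights are taken as given rather than computed from the slide's weight formula.

```python
def blend_particles(hits):
    """Weighted average of particle hits along one ray.

    hits: iterable of (color, alpha, weight), color being an (r, g, b) tuple.
    Returns ((r, g, b), alpha).
    """
    color_sum = [0.0, 0.0, 0.0]
    alpha_sum = 0.0
    weight_sum = 0.0
    for color, alpha, weight in hits:
        # Same accumulation as the any-hit shader body.
        for channel in range(3):
            color_sum[channel] += color[channel] * weight
        alpha_sum += alpha * weight
        weight_sum += weight
    if weight_sum == 0.0:
        return (0.0, 0.0, 0.0), 0.0  # ray touched no particles
    return tuple(c / weight_sum for c in color_sum), alpha_sum / weight_sum


if __name__ == "__main__":
    hits = [((1.0, 0.0, 0.0), 0.5, 1.0), ((0.0, 1.0, 0.0), 0.5, 3.0)]
    print(blend_particles(hits))
    print(blend_particles(reversed(hits)))  # same result in any order
```

This is why a single TraceRay call with IgnoreHit() in the any-hit shader can replace the RayGen loop, at the cost of a slightly different look from true sorted blending.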

*Weighted Blended Order-Independent Transparency: http://jcgt.org/published/0002/02/09/

slide-86
SLIDE 86

Particles - RESULTS

  • 'Naive' Closest Hit: 0.96ms. Slow but accurate
  • Order Independent AnyHit: 0.34ms. Really fast, but slightly different look

slide-87
SLIDE 87


THANK YOU!

...Any Questions?