Practical DirectX 12 - Programming Model and Hardware Capabilities
Gareth Thomas & Alex Dunn, AMD & NVIDIA
Agenda: DX12 Best Practices; DX12 Hardware Capabilities; Questions
Expectations
Who is DX12 for? Aiming to achieve
all consoles, PC is no different
[Diagram: DX11 vs DX12, Driver vs Application]
stopping concurrent-use
scheduler can submit new ones
Example: What happens if not enough work is submitted?
Can give you a nice CPU boost
Queue types: 3D Queue, Compute Queue, Copy Queue
Good pairing: Shadow Render (graphics, I/O limited) + Light culling (compute, ALU heavy)
Poor pairing: G-Buffer (graphics, bandwidth limited) + SSAO (compute, bandwidth limited)
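A minimal sketch of the pairing rule above, assuming each pass has already been profiled and tagged with the hardware unit it mostly stresses. The `Limiter` enum, `Pass` struct and `PickOverlapPartner` helper are hypothetical illustrations, not D3D12 API; in practice the tags would come from GPU profiler data.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical tag for the hardware unit a pass mostly stresses.
enum class Limiter { Bandwidth, ALU, IO };

struct Pass {
    std::string name;
    Limiter limiter;
};

// Returns the index of the first graphics pass whose limiter differs from
// the compute pass, so the two can overlap without competing for the same
// unit. std::nullopt means no good partner: run the compute pass serially.
std::optional<std::size_t> PickOverlapPartner(const std::vector<Pass>& graphics,
                                              const Pass& compute) {
    for (std::size_t i = 0; i < graphics.size(); ++i) {
        if (graphics[i].limiter != compute.limiter) {
            return i;
        }
    }
    return std::nullopt;
}
```

With `{"Shadow render", IO}` first in the list, ALU-heavy light culling pairs with it; a bandwidth-bound SSAO pass offered only a bandwidth-bound G-Buffer pass gets no partner.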
Unrestricted scheduling creates
[Diagram: command lists, one of them depth only, synchronized by fences]
Prefer explicit scheduling of async compute tasks through smart use of fences
[Diagram: command lists (one depth only) scheduled with explicitly placed fences]
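The fence-based scheduling above can be modeled on the CPU to show the ordering guarantee. In D3D12 the real calls are `ID3D12CommandQueue::Signal`, `ID3D12CommandQueue::Wait` and `ID3D12Fence::GetCompletedValue`; this sketch substitutes a hypothetical `ModelFence` built on `std::condition_variable`, with two threads standing in for the graphics and compute queues. Light culling on the "compute queue" cannot start until the depth-only pass has signaled.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// CPU-side model of a D3D12-style fence: a monotonically increasing value.
class ModelFence {
public:
    void Signal(std::uint64_t value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            completed_ = value;
        }
        cv_.notify_all();
    }
    void Wait(std::uint64_t value) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return completed_ >= value; });
    }
    std::uint64_t GetCompletedValue() {
        std::lock_guard<std::mutex> lock(m_);
        return completed_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::uint64_t completed_ = 0;
};

// Thread-safe log of the order in which simulated passes execute.
class Timeline {
public:
    void Record(const std::string& pass) {
        std::lock_guard<std::mutex> lock(m_);
        order_.push_back(pass);
    }
    std::vector<std::string> Order() {
        std::lock_guard<std::mutex> lock(m_);
        return order_;
    }

private:
    std::mutex m_;
    std::vector<std::string> order_;
};

// Graphics: depth prepass, signal, then shadow maps.
// Compute: wait for the depth fence, then light culling.
std::vector<std::string> RunFrame() {
    ModelFence depthDone;
    Timeline timeline;
    const std::uint64_t kDepthFenceValue = 1;

    std::thread compute([&] {
        depthDone.Wait(kDepthFenceValue);  // like ID3D12CommandQueue::Wait
        timeline.Record("light culling (compute)");
    });
    std::thread graphics([&] {
        timeline.Record("depth prepass (graphics)");
        depthDone.Signal(kDepthFenceValue);  // like ID3D12CommandQueue::Signal
        timeline.Record("shadow maps (graphics)");
    });
    graphics.join();
    compute.join();
    return timeline.Order();
}
```

Only the first entry of the returned order is deterministic; after the signal, the shadow pass and light culling are free to overlap, which is exactly the I/O-bound plus ALU-bound overlap the good pairing aims for.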
Pipeline State Objects (PSOs); Root Signature Tables (RSTs)
generate the PSOs
hundred milliseconds
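Because PSO creation is expensive (the deck mentions costs in the hundred-millisecond range), one approach is to create every PSO a level needs up front and only look them up at draw time. This sketch assumes a hypothetical, reduced `PsoKey`; in a real engine the slow path would call `ID3D12Device::CreateGraphicsPipelineState` with a full `D3D12_GRAPHICS_PIPELINE_STATE_DESC`, and `Pso` would wrap an `ID3D12PipelineState`.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <initializer_list>
#include <unordered_map>
#include <vector>

// Hypothetical, reduced PSO key: IDs standing in for the shaders and state
// blocks of a full D3D12_GRAPHICS_PIPELINE_STATE_DESC.
struct PsoKey {
    std::uint32_t vertexShaderId;
    std::uint32_t pixelShaderId;
    std::uint32_t blendStateId;
    std::uint32_t rasterStateId;
    bool operator==(const PsoKey& o) const {
        return vertexShaderId == o.vertexShaderId && pixelShaderId == o.pixelShaderId &&
               blendStateId == o.blendStateId && rasterStateId == o.rasterStateId;
    }
};

struct PsoKeyHash {
    std::size_t operator()(const PsoKey& k) const {
        std::size_t h = 0;
        for (std::uint32_t v : {k.vertexShaderId, k.pixelShaderId, k.blendStateId, k.rasterStateId}) {
            // boost-style hash_combine
            h ^= std::hash<std::uint32_t>{}(v) + 0x9e3779b9 + (h << 6) + (h >> 2);
        }
        return h;
    }
};

struct Pso {  // stand-in for an ID3D12PipelineState
    PsoKey key;
};

class PsoCache {
public:
    // Slow path creates the PSO; in D3D12 this is where
    // CreateGraphicsPipelineState would run.
    const Pso& GetOrCreate(const PsoKey& key) {
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++creations_;
            it = cache_.emplace(key, Pso{key}).first;
        }
        return it->second;
    }

    // Call at load time for every combination the level uses, so
    // draw-time lookups never take the slow path.
    void Prewarm(const std::vector<PsoKey>& keys) {
        for (const PsoKey& k : keys) GetOrCreate(k);
    }

    int Creations() const { return creations_; }

private:
    std::unordered_map<PsoKey, Pso, PsoKeyHash> cache_;
    int creations_ = 0;
};
```

After `Prewarm`, repeated `GetOrCreate` calls with a known key are hash lookups; only a combination that was missed at load time pays the creation cost.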
Keep the RST small
- Put frequently changed slots first
- Aim to change one slot per draw call
- Limit resource visibility to the minimum set of stages
- Beware, no bounds checking is done on the RST!
- Don’t leave resource bindings undefined after a change of Root Signature
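To act on "keep the RST small", it helps to count the root signature's size. The documented D3D12 costs are 1 DWORD per descriptor table, 1 DWORD per 32-bit root constant and 2 DWORDs per root descriptor, with a 64-DWORD limit. The `RootParam` struct and its `changesPerFrame` field are hypothetical; the sort simply puts frequently changed slots first, as the guideline suggests.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

enum class RootParamKind { DescriptorTable, RootConstants, RootDescriptor };

struct RootParam {
    std::string name;
    RootParamKind kind;
    std::uint32_t num32BitValues;   // only used for RootConstants
    std::uint32_t changesPerFrame;  // hypothetical profiling data
};

constexpr std::uint32_t kMaxRootSignatureDwords = 64;

// Cost in DWORDs, following the documented D3D12 root signature limits.
std::uint32_t DwordCost(const RootParam& p) {
    switch (p.kind) {
        case RootParamKind::DescriptorTable: return 1;
        case RootParamKind::RootConstants:   return p.num32BitValues;
        case RootParamKind::RootDescriptor:  return 2;
    }
    return 0;
}

std::uint32_t TotalDwords(const std::vector<RootParam>& params) {
    std::uint32_t total = 0;
    for (const RootParam& p : params) total += DwordCost(p);
    return total;
}

// Order slots so the most frequently changed parameters come first.
void SortByChangeFrequency(std::vector<RootParam>& params) {
    std::stable_sort(params.begin(), params.end(),
                     [](const RootParam& a, const RootParam& b) {
                         return a.changesPerFrame > b.changesPerFrame;
                     });
}
```

A table (1) plus four root constants (4) plus one root CBV (2) totals 7 DWORDs, well under the limit; inline root descriptors are the most expensive entries per slot.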
Command Allocators; Resources; Residency
to place resource
[Diagram: Texture2D, Buffer]
- Create larger heaps
- Call MakeResident/Evict per heap
- This requires the app to keep track of allocations: the free/used ranges of memory in each heap
[Diagram: Texture2D and Buffer placed in a single Heap]
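Tracking the free/used ranges of a heap can be as simple as a first-fit allocator over offsets. This sketch assumes the default 64 KiB placement alignment (`D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT`) and leaves out the D3D12 calls: the returned offset is what would be passed to `ID3D12Device::CreatePlacedResource`. `HeapSuballocator` is a hypothetical name; a production allocator could add size classes or buddy allocation to limit fragmentation.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

// First-fit suballocator over one heap. Offsets are multiples of kAlignment.
class HeapSuballocator {
public:
    static constexpr std::uint64_t kAlignment = 64 * 1024;  // assumed 64 KiB

    explicit HeapSuballocator(std::uint64_t heapSize) { free_[0] = heapSize; }

    std::optional<std::uint64_t> Allocate(std::uint64_t size) {
        size = AlignUp(size);
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->second >= size) {
                const std::uint64_t offset = it->first;
                const std::uint64_t remaining = it->second - size;
                free_.erase(it);
                if (remaining > 0) free_[offset + size] = remaining;
                used_[offset] = size;
                return offset;
            }
        }
        return std::nullopt;  // heap full: create another heap or overflow
    }

    void Free(std::uint64_t offset) {
        auto used = used_.find(offset);
        if (used == used_.end()) return;
        const std::uint64_t size = used->second;
        used_.erase(used);
        auto it = free_.emplace(offset, size).first;
        // Coalesce with the following free range.
        auto next = std::next(it);
        if (next != free_.end() && it->first + it->second == next->first) {
            it->second += next->second;
            free_.erase(next);
        }
        // Coalesce with the preceding free range.
        if (it != free_.begin()) {
            auto prev = std::prev(it);
            if (prev->first + prev->second == it->first) {
                prev->second += it->second;
                free_.erase(it);
            }
        }
    }

    std::uint64_t LargestFreeRange() const {
        std::uint64_t best = 0;
        for (const auto& range : free_) best = std::max(best, range.second);
        return best;
    }

private:
    static std::uint64_t AlignUp(std::uint64_t v) {
        return (v + kAlignment - 1) / kAlignment * kAlignment;
    }
    std::map<std::uint64_t, std::uint64_t> free_;  // offset -> size
    std::map<std::uint64_t, std::uint64_t> used_;  // offset -> size
};
```

Coalescing on `Free` matters: without it, a heap that is entirely free could still refuse a large placement because its free space is split into small neighbouring ranges.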
UpdateTileMappings
place resources. You're stuck until it returns
How much vidmem do I have?
App must handle MakeResident failure.
A non-resident read is a page fault, likely resulting in a fatal crash.
What to do when there isn’t enough memory?
Create overflow heaps in sysmem, and move some resources over from vidmem heaps.
important to keep in vidmem
Idea: Test your application with 2 instances running
[Diagram: Video Memory holds a Heap with a Texture2D, a Heap with a Texture3D, and a Vertex Buffer; System Memory holds an Overflow Heap containing a Vertex Buffer]
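One possible policy for filling overflow heaps, assuming the budget has already been queried (for example from the `Budget` field returned by `IDXGIAdapter3::QueryVideoMemoryInfo`) and that each heap carries an app-assigned priority for how important it is to keep in vidmem. `HeapInfo`, `Placement` and `PlaceHeaps` are hypothetical. Heaps are considered in priority order and kept in video memory while they fit; the rest go to system-memory overflow heaps.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

struct HeapInfo {
    std::string name;
    std::uint64_t sizeBytes;
    int priority;  // hypothetical: higher = more important to keep in vidmem
};

struct Placement {
    std::vector<std::string> vidmem;
    std::vector<std::string> sysmemOverflow;
};

// Greedy sketch: keep the highest-priority heaps in video memory until the
// budget is used up; everything else moves to system-memory overflow heaps.
Placement PlaceHeaps(std::vector<HeapInfo> heaps, std::uint64_t vidmemBudget) {
    std::stable_sort(heaps.begin(), heaps.end(),
                     [](const HeapInfo& a, const HeapInfo& b) { return a.priority > b.priority; });
    Placement out;
    std::uint64_t used = 0;
    for (const HeapInfo& h : heaps) {
        if (used + h.sizeBytes <= vidmemBudget) {
            used += h.sizeBytes;
            out.vidmem.push_back(h.name);
        } else {
            out.sysmemOverflow.push_back(h.name);
        }
    }
    return out;
}
```

Because the loop keeps going after the first heap that does not fit, a smaller, lower-priority heap can still land in video memory. Running two instances of the application, as suggested above, is a cheap way to exercise the overflow path.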
Barriers; Fences
Multi-GPU; Swap Chains; Set Stable Power State; Pixel vs Compute
HRESULT ID3D12Device::SetStablePowerState( BOOL Enable );
| Pixel Shader | Compute Shader |
| --- | --- |
| No shared memory? | Using group shared memory? |
| Threads complete at same time? | Expect out-of-order thread completion? |
| High-frequency cbuffer accesses? | Using high # regs? |
| 2D buffer stores? | 1D/3D buffer stores? |

| Pixel Shader | Compute Shader |
| --- | --- |
| Benefit from depth/stencil rejection? | Everything else |
| Requires graphics pipeline? | |
| Want to leverage color compression? | |

Best performance gained from following these guidelines (consider the perf benefit of using async compute)
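The guidelines above could be encoded as a heuristic. This is one interpretation, not the authors' rule: graphics-only needs (depth/stencil rejection, the graphics pipeline, color compression) force a pixel shader, group shared memory (available only to compute shaders) forces compute, and the remaining questions are counted as votes, with ties going to compute ("everything else"). `PassTraits` and `ChooseStage` are hypothetical.

```cpp
// Hypothetical description of a screen-space pass, one flag per question
// in the guideline tables.
struct PassTraits {
    bool usesGroupSharedMemory = false;
    bool outOfOrderCompletion = false;  // threads finish at very different times
    bool highRegisterCount = false;
    bool highFrequencyCbufferAccess = false;
    bool stores2D = false;              // false means 1D/3D buffer stores
    bool needsDepthStencilRejection = false;
    bool requiresGraphicsPipeline = false;
    bool wantsColorCompression = false;
};

enum class ShaderStage { Pixel, Compute };

ShaderStage ChooseStage(const PassTraits& t) {
    // Graphics-only features force a pixel shader.
    if (t.needsDepthStencilRejection || t.requiresGraphicsPipeline || t.wantsColorCompression)
        return ShaderStage::Pixel;
    // Group shared memory only exists in compute shaders.
    if (t.usesGroupSharedMemory)
        return ShaderStage::Compute;
    // Remaining questions treated as votes (an interpretation of the tables).
    int pixelVotes = int(!t.outOfOrderCompletion) + int(t.highFrequencyCbufferAccess) +
                     int(t.stores2D);
    int computeVotes = int(t.outOfOrderCompletion) + int(t.highRegisterCount) +
                       int(!t.stores2D);
    return pixelVotes > computeVotes ? ShaderStage::Pixel : ShaderStage::Compute;  // ties: compute
}
```

A tonemap-like pass (2D stores, frequent cbuffer reads, uniform thread completion) lands on pixel; a register-heavy pass writing a 3D buffer lands on compute. Whatever the heuristic says, the slide's reminder still applies: compute work may gain more from running on the async compute queue.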
Conservative Rasterization; Volume Tiled Resources; Raster Ordered Views; Typed UAV Loads; Stencil Output
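| Capability | AMD Radeon GCN 1.1 | AMD Radeon GCN 1.2 | NVIDIA GeForce Kepler | NVIDIA GeForce Maxwell 2 | Intel HD Graphics Skylake |
| --- | --- | --- | --- | --- | --- |
| Feature Level | 12_0 | 12_0 | 11_0 | 12_1 | 12_1 |
| Resource Binding | Tier 3 | Tier 3 | Tier 2 | Tier 2 | Tier 3 |
| Tiled Resources | Tier 2 | Tier 2 | Tier 1 | Tier 3 | Tier 3 |
| Typed UAV Loads | Yes | Yes | No | Yes | Yes |
| Conservative Rasterization | No | No | No | Tier 1 | Tier 3 |
| Rasterizer-Ordered Views | No | No | No | Yes | Yes |
| Stencil Reference Output | Yes | Yes | No | No | Yes |
| UAV Slots | full heap | full heap | 64 | 64 | full heap |
| Resource Heap | Tier 2 | Tier 2 | Tier 1 | Tier 1 | Tier 2 |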
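With support varying this much across architectures, features have to be selected at runtime from the device caps. In D3D12 these come from `ID3D12Device::CheckFeatureSupport` with `D3D12_FEATURE_D3D12_OPTIONS`; the `Options` struct here is a simplified stand-in for a few fields of `D3D12_FEATURE_DATA_D3D12_OPTIONS` (tiers as plain ints), and `RenderPaths`/`SelectPaths` are hypothetical. Volume tiled resources require Tiled Resources Tier 3.

```cpp
// Stand-in for a few fields of D3D12_FEATURE_DATA_D3D12_OPTIONS;
// a real app would fill these from ID3D12Device::CheckFeatureSupport.
struct Options {
    int conservativeRasterizationTier;   // 0 = not supported, 1..3 = tier
    bool rovsSupported;
    bool typedUavLoadAdditionalFormats;  // beyond R32_FLOAT/R32_UINT/R32_SINT
    bool psSpecifiedStencilRefSupported;
    int tiledResourcesTier;              // 0 = not supported, 1..3 = tier
};

// Hypothetical set of optional rendering paths an engine might toggle.
struct RenderPaths {
    bool conservativeRasterShadows;
    bool rovTransparency;
    bool typedUavReadModifyWrite;
    bool volumeTiledSimulation;
    bool stencilRefFromShader;
};

RenderPaths SelectPaths(const Options& o) {
    RenderPaths p{};
    p.conservativeRasterShadows = o.conservativeRasterizationTier >= 1;
    p.rovTransparency = o.rovsSupported;
    p.typedUavReadModifyWrite = o.typedUavLoadAdditionalFormats;
    p.volumeTiledSimulation = o.tiledResourcesTier >= 3;  // 3D tiled resources need Tier 3
    p.stencilRefFromShader = o.psSpecifiedStencilRefSupported;
    return p;
}
```

Plugging in the table's Maxwell 2 column enables the conservative-raster, ROV and volume-tiled paths but not shader stencil output; the GCN 1.2 column enables only typed UAV read-modify-write and shader stencil output.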
Draws all pixels a primitive touches
"Conservative Rasterization", GPU Gems 2
Shadows”, D3D day - GDC 2015
Ray traced shadows in ‘Tom Clancy’s The Division’ using conservative rasterization
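The difference between standard and conservative coverage ("draws all pixels a primitive touches") can be shown with a tiny software rasterizer. This sketch ignores the hardware's tie-breaking rules and uncertainty regions: standard coverage tests the pixel center, while conservative coverage runs an exact separating-axis overlap test between the pixel square and the triangle (three edge normals plus the x and y axes). Triangles are assumed to have positive signed area under `Edge`; all names are hypothetical.

```cpp
#include <algorithm>
#include <array>

struct Vec2 { float x, y; };
using Triangle = std::array<Vec2, 3>;

// Edge function: >= 0 on the inside of edge a->b for a positively wound triangle.
float Edge(Vec2 a, Vec2 b, Vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Standard rasterization: covered only if the pixel center is inside.
bool CoversCenter(const Triangle& t, int px, int py) {
    const Vec2 c{px + 0.5f, py + 0.5f};
    for (int i = 0; i < 3; ++i)
        if (Edge(t[i], t[(i + 1) % 3], c) < 0) return false;
    return true;
}

// Conservative rasterization: covered if the pixel square overlaps the triangle.
bool OverlapsPixel(const Triangle& t, int px, int py) {
    const Vec2 corners[4] = {{float(px), float(py)}, {px + 1.0f, float(py)},
                             {float(px), py + 1.0f}, {px + 1.0f, py + 1.0f}};
    // Edge-normal axes: the pixel must reach the inside of every edge.
    for (int i = 0; i < 3; ++i) {
        float best = Edge(t[i], t[(i + 1) % 3], corners[0]);
        for (const Vec2& c : corners) best = std::max(best, Edge(t[i], t[(i + 1) % 3], c));
        if (best < 0) return false;
    }
    // x/y axes: bounding boxes must overlap.
    const float minX = std::min({t[0].x, t[1].x, t[2].x});
    const float maxX = std::max({t[0].x, t[1].x, t[2].x});
    const float minY = std::min({t[0].y, t[1].y, t[2].y});
    const float maxY = std::max({t[0].y, t[1].y, t[2].y});
    return maxX >= px && minX <= px + 1.0f && maxY >= py && minY <= py + 1.0f;
}

int CountCovered(const Triangle& t, int width, int height, bool conservative) {
    int count = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            count += conservative ? OverlapsPixel(t, x, y) : CoversCenter(t, x, y);
    return count;
}
```

A triangle smaller than a pixel that misses the pixel center produces no pixels with standard rasterization but one with the conservative test; that guarantee of never missing a touched pixel is what techniques needing complete coverage depend on.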
updated from the CPU
Extreme memory/performance benefits
Simulation [Alex Dunn, D3D Day – GDC 2015]
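The memory benefit of volume tiled resources comes from mapping physical memory only for tiles that contain data. This sketch counts such tiles for a 32-bit-per-texel volume, using the standard 64 KiB 3D tile shape of 32 x 32 x 16 texels for that format; the occupancy mask, `Volume` struct and helper names are hypothetical. In D3D12 the resulting mappings would be applied from the CPU with `ID3D12CommandQueue::UpdateTileMappings`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Standard 64 KiB tile shape for 3D resources with 32-bit texels.
constexpr std::uint32_t kTileW = 32, kTileH = 32, kTileD = 16;
constexpr std::uint64_t kTileBytes = 64 * 1024;

// Hypothetical occupancy mask: true where the simulation has fluid.
struct Volume {
    std::uint32_t width, height, depth;
    std::vector<bool> occupied;

    std::size_t Index(std::uint32_t x, std::uint32_t y, std::uint32_t z) const {
        return (std::size_t(z) * height + y) * width + x;
    }
    void Set(std::uint32_t x, std::uint32_t y, std::uint32_t z) { occupied[Index(x, y, z)] = true; }
};

std::uint32_t TilesAlong(std::uint32_t texels, std::uint32_t tile) {
    return (texels + tile - 1) / tile;
}

// Memory if every tile were backed, i.e. a dense texture.
std::uint64_t DenseBytes(const Volume& v) {
    return std::uint64_t(TilesAlong(v.width, kTileW)) * TilesAlong(v.height, kTileH) *
           TilesAlong(v.depth, kTileD) * kTileBytes;
}

// Tiles containing at least one occupied texel; only these need memory mapped.
std::uint64_t CountResidentTiles(const Volume& v) {
    const std::uint32_t tx = TilesAlong(v.width, kTileW);
    const std::uint32_t ty = TilesAlong(v.height, kTileH);
    const std::uint32_t tz = TilesAlong(v.depth, kTileD);
    std::vector<bool> used(std::size_t(tx) * ty * tz, false);
    for (std::uint32_t z = 0; z < v.depth; ++z)
        for (std::uint32_t y = 0; y < v.height; ++y)
            for (std::uint32_t x = 0; x < v.width; ++x)
                if (v.occupied[v.Index(x, y, z)])
                    used[(std::size_t(z / kTileD) * ty + y / kTileH) * tx + x / kTileW] = true;
    std::uint64_t count = 0;
    for (bool b : used) count += b ? 1 : 0;
    return count;
}
```

For a 64 x 64 x 32 volume with fluid in only two tiles, the dense texture needs 512 KiB while the tiled version backs just 128 KiB; the ratio grows with volume size and sparsity.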
Ordered writes
SIGGRAPH, 2014]
fixed HW
Use with care! Not free
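Ordered writes matter because blending is order dependent. Fixed-function blending already follows primitive order, but a shader doing its own read-modify-write on a UAV (programmable blending, order-independent transparency) gets no ordering guarantee between overlapping pixels; ROVs restore it. This CPU sketch, with hypothetical names and premultiplied alpha, shows that compositing the same two fragments in different orders gives different colors.

```cpp
#include <vector>

struct Rgba { float r, g, b, a; };  // premultiplied alpha

// Porter-Duff "over": src composited on top of dst.
Rgba Over(const Rgba& src, const Rgba& dst) {
    const float k = 1.0f - src.a;
    return {src.r + dst.r * k, src.g + dst.g * k, src.b + dst.b * k, src.a + dst.a * k};
}

// Composite fragments in submission order (later fragments end up on top),
// as a ROV-protected read-modify-write in a pixel shader would.
Rgba Composite(const Rgba& background, const std::vector<Rgba>& fragments) {
    Rgba result = background;
    for (const Rgba& f : fragments) result = Over(f, result);
    return result;
}
```

Half-transparent red then blue over black yields (0.25, 0, 0.5); the reverse order yields (0.5, 0, 0.25). Without ordering, overlapping invocations could produce either result from frame to frame.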
- Finally, no more 32-bit restriction from the API
- May allow you to remove console-specific paths in the engine
- Loading from a UAV is slower than loading from an SRV, so keep using SRVs for read-only access
// Can do this, e.g. typed loads from a float4 UAV:
RWTexture2D<float4> uav;
// ...and in conjunction with ROVs :)
RasterizerOrderedTexture2D<float4> rov;
Alex Dunn: adunn@nvidia.com, @AlexWDunn, #HappyGPU
Gareth Thomas: gareth.thomas@amd.com, #DX12PerfTweet