SPARSE FLUID SIMULATION IN DIRECTX Alex Dunn Graphics Dev. Tech. - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Alex Dunn – Graphics Dev. Tech.

SPARSE FLUID SIMULATION IN DIRECTX

slide-2
SLIDE 2

2

AGENDA

  • We want more fluid in games!
  • Eulerian Fluid Simulation.
  • Sparse Eulerian Fluid.
  • Feature Level 11.3 Enhancements.

slide-3
SLIDE 3

3

WHY DO WE NEED FLUID IN GAMES?

Replace particle kinematics!

more realistic == better user immersion

More than just eye candy?

game mechanics?

slide-4
SLIDE 4

4

EULERIAN SIMULATION #1

My (simple) DX11.0 Eulerian fluid simulation:

Inject → Advect → Pressure → Vorticity → Evolve

2x Velocity, 2x Pressure, 1x Vorticity

slide-5
SLIDE 5

5

EULERIAN SIMULATION #2

  • Inject: Add fluid to simulation
  • Advect: Move data at XYZ → (XYZ + Velocity)
  • Pressure: Calculate localized pressure
  • Vorticity: Calculate localized rotational flow
  • Evolve: Tick simulation
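The advect step can be sketched on the CPU as follows. This is a minimal 1D illustration, not the shader from the talk: it uses a semi-Lagrangian backtrace (each destination cell looks backwards along the velocity and samples there), which is a common way to implement advection on a grid; the slides do not say which scheme was actually used. The function name `Advect1D` and the units (cells per second) are assumptions made for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Semi-Lagrangian advection on a 1D grid (a stand-in for the 3D volume).
// For each destination cell we trace backwards along the velocity and
// sample the source field there with linear interpolation.
std::vector<float> Advect1D(const std::vector<float>& src,
                            const std::vector<float>& velocity, // assumed: cells per second
                            float dt)
{
    const std::size_t n = src.size();
    std::vector<float> dst(n, 0.0f);
    if (n == 0) return dst;

    for (std::size_t i = 0; i < n; ++i)
    {
        // Position the data arriving in cell i came from, clamped to the grid.
        float x = static_cast<float>(i) - velocity[i] * dt;
        x = std::max(0.0f, std::min(x, static_cast<float>(n - 1)));

        const std::size_t i0 = static_cast<std::size_t>(std::floor(x));
        const std::size_t i1 = std::min(i0 + 1, n - 1);
        const float t = x - static_cast<float>(i0);

        dst[i] = (1.0f - t) * src[i0] + t * src[i1];
    }
    return dst;
}
```

With a uniform velocity of one cell per second and `dt = 1`, the field shifts by one cell per call; on the GPU the linear interpolation would normally be done by the texture sampler.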

slide-6
SLIDE 6

6

**(some imagination required)**

slide-7
SLIDE 7

7

slide-8
SLIDE 8

8

TOO MANY VOLUMES SPOIL THE…

Fluid isn’t box shaped.

  • clipping
  • wastage

Simulated separately.

  • authoring
  • GPU state
  • volume-to-volume interaction

Tricky to render.

slide-9
SLIDE 9

9

slide-10
SLIDE 10

10

PROBLEM!

N-order problem

  • 64^3 = ~0.25M cells
  • 128^3 = ~2M cells
  • 256^3 = ~16M cells
  • …

Applies to:

  • computational complexity
  • memory requirements

[Chart: Memory (MB), axis up to 8192, against Dimensions (X = Y = Z) from 256 to 1024, for a Texture3D with 4x16F texels]

And that’s just 1 texture…
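The growth shown on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes a cubic grid and 8 bytes per cell (4 channels of 16-bit float, the "4x16F" format named on the chart); the helper names are made up for the example.

```cpp
#include <cstdint>

// Number of cells in a cubic grid with `dim` cells along each axis.
constexpr std::uint64_t CellCount(std::uint64_t dim)
{
    return dim * dim * dim;
}

// Bytes needed for one cubic volume texture.
// 4x16F = 4 channels * 2 bytes = 8 bytes per cell.
constexpr std::uint64_t VolumeBytes(std::uint64_t dim, std::uint64_t bytesPerCell = 8)
{
    return CellCount(dim) * bytesPerCell;
}

// Converts bytes to whole megabytes (1 MB = 1024 * 1024 bytes here).
constexpr std::uint64_t ToMB(std::uint64_t bytes)
{
    return bytes / (1024ull * 1024ull);
}
```

For example, `ToMB(VolumeBytes(512))` gives 1024 and `ToMB(VolumeBytes(1024))` gives 8192, which matches the top of the chart's memory axis: doubling the resolution multiplies the cost by eight.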

slide-11
SLIDE 11

11

BRICKS

Split simulation space into groups of cells (each known as a brick). Simulate each brick independently.

slide-12
SLIDE 12

12

BRICK MAP

Need to track which bricks contain fluid.

Texture3D<uint>, 1 voxel per brick:

  • 0 → Ignore
  • 1 → Simulate

Could also use packed binary grids [Gruen15], but this requires atomics.
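A CPU-side mirror of such a brick map might look like the sketch below. It is only an illustration of the "one flag per brick" idea: the class name `BrickMap`, its methods, and the flat array layout are assumptions, and the real map lives in a `Texture3D<uint>` on the GPU.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct BrickCoord { std::uint32_t x, y, z; };

// Hypothetical CPU mirror of the brick map: one flag per brick.
class BrickMap
{
public:
    // gridDim: cells per axis of the full grid; brickDim: cells per axis of one brick.
    BrickMap(std::uint32_t gridDim, std::uint32_t brickDim)
        : m_brickDim(brickDim),
          m_bricksPerAxis((gridDim + brickDim - 1) / brickDim),
          m_flags(static_cast<std::size_t>(m_bricksPerAxis) * m_bricksPerAxis * m_bricksPerAxis, 0u)
    {
    }

    // Brick that contains the given cell.
    BrickCoord BrickOf(std::uint32_t cx, std::uint32_t cy, std::uint32_t cz) const
    {
        return { cx / m_brickDim, cy / m_brickDim, cz / m_brickDim };
    }

    void Mark(const BrickCoord& b) { m_flags[Index(b)] = 1u; }           // 1 = simulate
    bool IsSimulated(const BrickCoord& b) const { return m_flags[Index(b)] != 0u; }
    void Clear() { m_flags.assign(m_flags.size(), 0u); }                 // 0 = ignore
    std::uint32_t BricksPerAxis() const { return m_bricksPerAxis; }

private:
    std::size_t Index(const BrickCoord& b) const
    {
        return (static_cast<std::size_t>(b.z) * m_bricksPerAxis + b.y) * m_bricksPerAxis + b.x;
    }

    std::uint32_t m_brickDim;
    std::uint32_t m_bricksPerAxis;
    std::vector<std::uint32_t> m_flags;
};
```

A 64^3 grid with 16^3 bricks needs only 4 x 4 x 4 flags, so the map itself is tiny compared with the simulation volumes.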

slide-13
SLIDE 13

13

TRACKING BRICKS #1

Initialise using fluid emitters. (easy with primitives)

slide-14
SLIDE 14

14

TRACKING BRICKS #2

Simulating air is important for accuracy. Simulate? = |Velocity| > 0

slide-15
SLIDE 15

15

TRACKING BRICKS #3

Expansion (ignore → simulate)

if |Vx|, |Vy| or |Vz| > |Dbrick|: expand simulation in that axis

Reduction (simulate → ignore)

  • inverse of Expansion
  • handled automatically by clear
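One possible reading of the expansion rule is sketched below. The slide text is terse, so this rests on assumptions: `velocity` is a single cell's velocity, `threshold` stands in for |Dbrick|, both are in the same units, and the sign of the velocity component picks which neighbouring brick to wake up. `ExpansionOffsets` and `Int3` are names invented for the example.

```cpp
#include <array>
#include <cmath>
#include <vector>

struct Int3 { int x, y, z; };

// Returns the neighbour-brick offsets that should switch from "ignore" to
// "simulate" for one cell. Assumption: an axis expands when the magnitude
// of the velocity component on that axis exceeds `threshold`.
std::vector<Int3> ExpansionOffsets(const std::array<float, 3>& velocity, float threshold)
{
    std::vector<Int3> offsets;
    for (int axis = 0; axis < 3; ++axis)
    {
        if (std::fabs(velocity[axis]) > threshold)
        {
            // Expand towards the side the fluid is moving to.
            const int dir = velocity[axis] > 0.0f ? 1 : -1;
            Int3 o{0, 0, 0};
            if (axis == 0) o.x = dir;
            if (axis == 1) o.y = dir;
            if (axis == 2) o.z = dir;
            offsets.push_back(o);
        }
    }
    return offsets;
}
```

Reduction needs no counterpart: if the brick map is cleared each frame and only re-marked where this test (or an emitter) fires, bricks that stop qualifying fall back to "ignore" on their own.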

slide-16
SLIDE 16

16

SPARSE SIMULATION

Inject → Advect → Pressure → Vorticity → Evolve*

*Includes expansion

Clear BrickMap: Reset all to 0 (ignore) in brick map.

Fill List: Read value from brick map. Append brick coordinate to list if occupied.

Texture3D<uint> g_BrickMapRO;
AppendStructuredBuffer<uint3> g_ListRW;

if(g_BrickMapRO[idx] != 0)
{
    g_ListRW.Append(idx);
}
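The fill-list pass can be mirrored on the CPU to make the data flow explicit. This is a sketch only: `FillList` and `UInt3` are names made up here, and the brick map is assumed to be a flat array in x-fastest order. On the GPU the append order depends on thread scheduling, whereas this loop is deterministic.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct UInt3 { std::uint32_t x, y, z; };

// Scans every voxel of the brick map and appends the coordinate of each
// occupied brick, mirroring what the compute shader does per thread.
std::vector<UInt3> FillList(const std::vector<std::uint32_t>& brickMap,
                            std::uint32_t bricksPerAxis)
{
    std::vector<UInt3> list;
    const std::uint32_t n = bricksPerAxis;
    for (std::uint32_t z = 0; z < n; ++z)
    {
        for (std::uint32_t y = 0; y < n; ++y)
        {
            for (std::uint32_t x = 0; x < n; ++x)
            {
                const std::size_t idx = (static_cast<std::size_t>(z) * n + y) * n + x;
                if (brickMap[idx] != 0u)
                {
                    list.push_back({x, y, z});
                }
            }
        }
    }
    return list;
}
```

The later simulation passes then only need to run over the entries of this list rather than over the whole grid.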

slide-17
SLIDE 17

17

PROBLEM!

N-order problem

  • 64^3 = ~0.25M cells
  • 128^3 = ~2M cells
  • 256^3 = ~16M cells
  • …

Applies to:

  • computational complexity
  • memory requirements

[Chart: Memory (MB), axis up to 8192, against Dimensions (X = Y = Z) from 256 to 1024, for a Texture3D with 4x16F texels]

And that’s just 1 texture…

slide-18
SLIDE 18

18

UNCOMPRESSED STORAGE

Allocate everything; forget about unoccupied cells.

Pros:

  • simulation is coherent in memory.
  • works in DX11.0.

Cons:

  • no reduction in memory usage.

[Figure: grid of bricks, each marked Simulate or Ignore]

slide-19
SLIDE 19

19

COMPRESSED STORAGE

Similar to List<Brick>.

Pros:

  • good memory consumption.
  • works in DX11.0.

Cons:

  • allocation strategies.
  • indirect lookup.
  • “software translation”
  • filtering particularly costly

[Figure: Indirection Table → Physical Memory]
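The "software translation" cost is easiest to see in code. The sketch below is an assumed, simplified CPU model of compressed storage: an indirection table maps each logical brick to a slot in a packed physical array, and every read or write has to go through that table. `BrickPool`, its append-only allocation strategy, and the choice to return 0 for unmapped bricks are all illustrative decisions, not details from the talk.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kUnmapped = 0xFFFFFFFFu;

// Hypothetical compressed brick storage with one level of indirection.
class BrickPool
{
public:
    BrickPool(std::size_t logicalBricks, std::size_t cellsPerBrick)
        : m_cellsPerBrick(cellsPerBrick), m_indirection(logicalBricks, kUnmapped)
    {
    }

    // Maps a logical brick to the next free physical slot (append-only strategy).
    std::uint32_t Allocate(std::size_t logicalBrick)
    {
        if (m_indirection[logicalBrick] == kUnmapped)
        {
            m_indirection[logicalBrick] =
                static_cast<std::uint32_t>(m_physical.size() / m_cellsPerBrick);
            m_physical.resize(m_physical.size() + m_cellsPerBrick, 0.0f);
        }
        return m_indirection[logicalBrick];
    }

    // Every access pays for the table lookup; unmapped bricks read as 0.
    float Read(std::size_t logicalBrick, std::size_t cell) const
    {
        const std::uint32_t slot = m_indirection[logicalBrick];
        return slot == kUnmapped ? 0.0f : m_physical[slot * m_cellsPerBrick + cell];
    }

    void Write(std::size_t logicalBrick, std::size_t cell, float value)
    {
        const std::uint32_t slot = Allocate(logicalBrick);
        m_physical[slot * m_cellsPerBrick + cell] = value;
    }

    std::size_t PhysicalBrickCount() const { return m_physical.size() / m_cellsPerBrick; }

private:
    std::size_t m_cellsPerBrick;
    std::vector<std::uint32_t> m_indirection; // logical brick -> physical slot
    std::vector<float> m_physical;            // packed brick data
};
```

Filtering is costly in this layout because neighbouring cells in logical space may sit in unrelated physical slots, so an interpolated sample can require several table lookups.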

slide-20
SLIDE 20

20

PADDING TO REDUCE EDGE CASES

1 Brick = (4)^3 = 64

slide-21
SLIDE 21

21

PADDING TO REDUCE EDGE CASES

1 Brick = (1+4+1)^3 = 216

  • New problem;
  • “6n^2 + 12n + 8” problem.

Can we do better?
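The "6n^2 + 12n + 8" figure is the number of extra cells a one-cell border adds to a brick of n^3 cells, that is (n + 2)^3 - n^3. A short check, with helper names chosen for the example:

```cpp
// Cells in a brick of n^3 cells once a one-cell border is added on every side.
constexpr unsigned PaddedCells(unsigned n)
{
    return (n + 2) * (n + 2) * (n + 2);
}

// Cells spent purely on padding.
constexpr unsigned PaddingCells(unsigned n)
{
    return PaddedCells(n) - n * n * n;
}

// Closed form quoted on the slide.
constexpr unsigned PaddingFormula(unsigned n)
{
    return 6 * n * n + 12 * n + 8;
}

static_assert(PaddingCells(4) == PaddingFormula(4), "expansion of (n + 2)^3 - n^3");
```

For n = 4 the padded brick holds 216 cells, of which 152 are padding, so more than two thirds of the storage is border.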

slide-22
SLIDE 22

22

ENTER; FEATURE LEVEL 11.3

Volume Tiled Resources (VTR)!

Extends 2D functionality in FL11.2

Must query HW support (DX11.3 != FL11.3):

ID3D11Device3* pDevice3 = nullptr;
pDevice->QueryInterface(&pDevice3);

D3D11_FEATURE_DATA_D3D11_OPTIONS2 support;
pDevice3->CheckFeatureSupport(D3D11_FEATURE_D3D11_OPTIONS2, &support, sizeof(support));

m_UseTiledResources = support.TiledResourcesTier == D3D11_TILED_RESOURCES_TIER_3;

slide-23
SLIDE 23

23

TILED RESOURCES #1

Pros:

  • Only mapped memory is allocated in VRAM

  • “hardware translation”
  • logically a volume texture
  • all samplers supported
  • 1 Tile = 64KB (= 1 Brick)
  • fast loads
slide-24
SLIDE 24

24

TILED RESOURCES #2

1 Tile = 64KB (= 1 Brick)

BPP    Tile Dimensions
8      64x32x32
16     32x32x32
32     32x32x16
64     32x16x16
128    16x16x16
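Each row of the table works out to exactly 64KB, which can be checked mechanically. The sketch below hard-codes the table from the slide; `TileShape`, `TileBytes` and `TilesForCube` are names invented here, and the tile count assumes a cubic volume.

```cpp
// One row of the volume tile table: bits per texel and tile size in texels.
struct TileShape { unsigned bitsPerTexel, width, height, depth; };

constexpr TileShape kVolumeTileShapes[] = {
    {   8, 64, 32, 32 },
    {  16, 32, 32, 32 },
    {  32, 32, 32, 16 },
    {  64, 32, 16, 16 },
    { 128, 16, 16, 16 },
};

// Size of one tile in bytes.
constexpr unsigned TileBytes(const TileShape& s)
{
    return s.width * s.height * s.depth * (s.bitsPerTexel / 8);
}

constexpr unsigned CeilDiv(unsigned a, unsigned b)
{
    return (a + b - 1) / b;
}

// Number of tiles needed to cover a cubic volume of `dim` texels per axis.
constexpr unsigned TilesForCube(unsigned dim, const TileShape& s)
{
    return CeilDiv(dim, s.width) * CeilDiv(dim, s.height) * CeilDiv(dim, s.depth);
}
```

A 256^3 volume at 64 bits per texel, for instance, is covered by 8 x 16 x 16 = 2048 tiles, so the brick map for it would have non-cubic dimensions.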

slide-25
SLIDE 25

25

TILED RESOURCES #3

Letting the driver know which bricks/tiles should be resident:

HRESULT ID3D11DeviceContext2::UpdateTileMappings(
    ID3D11Resource *pTiledResource,
    UINT NumTiledResourceRegions,
    const D3D11_TILED_RESOURCE_COORDINATE *pTiledResourceRegionStartCoordinates,
    const D3D11_TILE_REGION_SIZE *pTiledResourceRegionSizes,
    ID3D11Buffer *pTilePool,
    UINT NumRanges,
    const UINT *pRangeFlags,
    const UINT *pTilePoolStartOffsets,
    const UINT *pRangeTileCounts,
    UINT Flags
);

slide-26
SLIDE 26

26

UPDATE TILE MAPPINGS – TIP

Don’t update all tiles every frame.

Track tile deltas and use the range flags (const UINT *pRangeFlags):

  • Ignore (unmapped) → D3D11_TILE_RANGE_NULL
  • Simulate (mapped) → D3D11_TILE_RANGE_REUSE_SINGLE_TILE
  • Unchanged → D3D11_TILE_RANGE_SKIP
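Tracking tile deltas amounts to diffing last frame's brick map against this frame's. The sketch below does that on the CPU and classifies each tile; it deliberately stops short of calling Direct3D, and the `TileAction` enum and `DiffTiles` function are hypothetical. Translating each action into the range flag listed on the slide is left to the caller.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// What to do with a tile this frame.
enum class TileAction
{
    Skip,  // unchanged since last frame (slide: D3D11_TILE_RANGE_SKIP)
    Map,   // newly occupied, needs memory
    Unmap  // no longer occupied (slide: D3D11_TILE_RANGE_NULL)
};

// Compares the previous and current occupancy flags tile by tile.
std::vector<TileAction> DiffTiles(const std::vector<std::uint8_t>& previous,
                                  const std::vector<std::uint8_t>& current)
{
    std::vector<TileAction> actions(current.size(), TileAction::Skip);
    for (std::size_t i = 0; i < current.size(); ++i)
    {
        const bool was = i < previous.size() && previous[i] != 0;
        const bool is = current[i] != 0;
        if (is && !was)
        {
            actions[i] = TileAction::Map;
        }
        else if (!is && was)
        {
            actions[i] = TileAction::Unmap;
        }
    }
    return actions;
}
```

In a steady-state simulation most tiles come back as `Skip`, which is what keeps the per-frame mapping update small.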

slide-27
SLIDE 27

27

CPU READ BACKS

  • Taboo in real time graphics
  • CPU read backs are fine, if done correctly! (and bad if not)
  • 2 frame latency (more for SLI)
  • Profile map/unmap calls if unsure

[Figure: CPU and GPU frame timelines (Frame N to Frame N+3), marking when the data for frames N, N+1 and N+2 is ready and when the tiles for frame N are mapped]
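The two-frame latency can be modelled with a small ring of buffers: the result produced in frame N is only consumed in frame N+2, so the CPU never waits on work the GPU has not finished. The sketch below is a plain C++ model of that schedule with no Direct3D calls; `ReadbackRing` and `Tick` are hypothetical names, and in a real renderer each slot would correspond to a staging resource.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Ring of (latency + 1) slots: one being written, the rest in flight.
template <typename T>
class ReadbackRing
{
public:
    explicit ReadbackRing(std::size_t latencyFrames)
        : m_latency(latencyFrames), m_slots(latencyFrames + 1)
    {
    }

    // Call once per frame. Stores what this frame produced and, if enough
    // frames have passed, hands back what was produced `latency` frames ago.
    bool Tick(std::uint64_t frame, const T& produced, T& consumed)
    {
        m_slots[frame % m_slots.size()] = produced;
        if (frame < m_latency)
        {
            return false; // nothing old enough yet
        }
        consumed = m_slots[(frame - m_latency) % m_slots.size()];
        return true;
    }

private:
    std::size_t m_latency;
    std::vector<T> m_slots;
};
```

With a latency of 2, the first two calls return `false`; from frame 2 onwards every call returns the value submitted two frames earlier. Multi-GPU setups would need a larger latency, as the slide notes for SLI.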

slide-28
SLIDE 28

28

LATENCY RESISTANT SIMULATION #1

Naïve Approach:

clamp velocity to Vmax

CPU Read-back:

  • occupied bricks.

2 frames of latency!

extrapolate “probable” tiles.

slide-29
SLIDE 29

29

LATENCY RESISTANT SIMULATION #2

Better Approach:

CPU Read-back:

  • occupied bricks.
  • max{|V|} within brick.

2 frames of latency!

extrapolate “probable” tiles.
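One way to extrapolate "probable" tiles from the read-back data is sketched below. The talk does not spell out the prediction rule, so this is an assumption: fluid in a brick can travel at most max{|V|} cells per frame, so after the latency period it may have reached any brick within ceil(maxSpeed * latencyFrames / brickDim) bricks on each axis. `PredictedReach` and `PredictTiles` are invented names, and the brick grid is assumed cubic and stored x-fastest.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// How many bricks away fluid could get during the read-back latency.
// Assumption: speed is measured in cells per frame.
int PredictedReach(float maxSpeedCellsPerFrame, int latencyFrames, int brickDim)
{
    return static_cast<int>(std::ceil(maxSpeedCellsPerFrame * latencyFrames / brickDim));
}

// Marks every brick that an occupied brick could reach within the latency.
std::vector<std::uint8_t> PredictTiles(const std::vector<std::uint8_t>& occupied,
                                       const std::vector<float>& maxSpeed,
                                       int bricksPerAxis, int latencyFrames, int brickDim)
{
    const int n = bricksPerAxis;
    std::vector<std::uint8_t> predicted(occupied.size(), 0);
    for (int z = 0; z < n; ++z)
    for (int y = 0; y < n; ++y)
    for (int x = 0; x < n; ++x)
    {
        const std::size_t idx = (static_cast<std::size_t>(z) * n + y) * n + x;
        if (!occupied[idx]) continue;

        const int r = PredictedReach(maxSpeed[idx], latencyFrames, brickDim);
        for (int nz = std::max(0, z - r); nz <= std::min(n - 1, z + r); ++nz)
        for (int ny = std::max(0, y - r); ny <= std::min(n - 1, y + r); ++ny)
        for (int nx = std::max(0, x - r); nx <= std::min(n - 1, x + r); ++nx)
        {
            predicted[(static_cast<std::size_t>(nz) * n + ny) * n + nx] = 1;
        }
    }
    return predicted;
}
```

Compared with clamping every brick to a global Vmax, using the per-brick maximum lets slow regions stay tight while only fast regions map extra tiles.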

slide-30
SLIDE 30

30

LATENCY RESISTANT SIMULATION #3

[Flow chart. GPU: Sparse Eulerian Simulation → Read back Brick List. CPU: CPU Read back Ready? Yes → Prediction Engine; No → Emitter Bricks; then UpdateTileMappings.]

slide-31
SLIDE 31

31

DEMO

slide-32
SLIDE 32

32

PERFORMANCE #1

NOTE: Numbers captured on a GeForce GTX980

Sim. Time (ms)

Grid Resolution    Full Grid    Sparse Grid
128                2.3          0.4
256                19.9         1.8
384                64.7         2.7
512                -            2.9
1024               -            6.0

slide-33
SLIDE 33

33

PERFORMANCE #2

NOTE: Numbers captured on a GeForce GTX980

Memory (MB)

Grid Resolution    Full Grid    Sparse Grid
128                80           11
256                640          46
384                2,160        57
512                5,120        83
1024               40,960       138

slide-34
SLIDE 34

Thank you!

Alex Dunn - adunn@nvidia.com Twitter: @AlexWDunn

Other “latency resistant” techniques using tiled resources??